Diffstat (limited to 'doc/gawktexi.in')
-rw-r--r-- | doc/gawktexi.in | 2007 |
1 files changed, 1726 insertions, 281 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in index 57cb727a..d13ea969 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -44,6 +44,14 @@ @set MINUS @end ifdocbook +@iftex +@set TIMES @times +@end iftex +@ifnottex +@set TIMES * +@end ifnottex + + @set xref-automatic-section-title @c The following information should be updated here only! @@ -51,7 +59,7 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH January, 2017 +@set UPDATE-MONTH July, 2017 @set VERSION 4.1 @set PATCHLEVEL 4 @@ -555,7 +563,13 @@ particular records in a file and perform operations upon them. field. * Field Splitting Summary:: Some final points and a summary table. * Constant Size:: Reading constant width data. +* Fixed width data:: Processing fixed-width data. +* Skipping intervening:: Skipping intervening fields. +* Allowing trailing data:: Capturing optional trailing data. +* Fields with fixed data:: Field values with fixed-width data. * Splitting By Content:: Defining Fields By Content +* Testing field creation:: Checking how @command{gawk} is + splitting records. * Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the @code{getline} @@ -576,6 +590,7 @@ particular records in a file and perform operations upon them. @code{getline}. * Getline Summary:: Summary of @code{getline} Variants. * Read Timeout:: Reading input with a timeout. +* Retrying Input:: Retrying input after certain errors. * Command-line directories:: What happens if you put a directory on the command line. * Input Summary:: Input summary. @@ -605,6 +620,7 @@ particular records in a file and perform operations upon them. * Special Caveats:: Things to watch out for. * Close Files And Pipes:: Closing Input and Output Files and Pipes. +* Nonfatal:: Enabling Nonfatal Output. * Output Summary:: Output summary. * Output Exercises:: Exercises. * Values:: Constants, Variables, and Regular @@ -614,6 +630,9 @@ particular records in a file and perform operations upon them. * Nondecimal-numbers:: What are octal and hex numbers. * Regexp Constants:: Regular Expression constants. * Using Constant Regexps:: When and how to use a regexp constant. +* Standard Regexp Constants:: Regexp constants in standard + @command{awk}. +* Strong Regexp Constants:: Strongly typed regexp constants. * Variables:: Variables give names to values for later use. * Using Variables:: Using variables in your programs. @@ -884,6 +903,7 @@ particular records in a file and perform operations upon them. * Setting the rounding mode:: How to set the rounding mode. * Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with @command{gawk}. +* Checking for MPFR:: How to check if MPFR is available. * POSIX Floating Point Problems:: Standards Versus Existing Practice. * Floating point summary:: Summary of floating point discussion. * Extension Intro:: What is an extension. @@ -916,11 +936,14 @@ particular records in a file and perform operations upon them. * Array Functions:: Functions for working with arrays. * Flattening Arrays:: How to flatten arrays. * Creating Arrays:: How to create and populate arrays. +* Redirection API:: How to access and manipulate + redirections. * Extension API Variables:: Variables provided by the API. * Extension Versioning:: API Version information. * Extension API Informational Variables:: Variables providing information about @command{gawk}'s invocation. * Extension API Boilerplate:: Boilerplate code for using the API. 
+* Changes from API V1:: Changes from V1 of the API. * Finding Extensions:: How @command{gawk} finds compiled extensions. * Extension Example:: Example C code for an extension. @@ -974,14 +997,16 @@ particular records in a file and perform operations upon them. * Unix Installation:: Installing @command{gawk} under various versions of Unix. * Quick Installation:: Compiling @command{gawk} under Unix. +* Shell Startup Files:: Shell convenience functions. * Additional Configuration Options:: Other compile-time options. * Configuration Philosophy:: How it's all supposed to work. * Non-Unix Installation:: Installation on Other Operating Systems. -* PC Installation:: Installing and Compiling @command{gawk} on - Microsoft Windows. +* PC Installation:: Installing and Compiling + @command{gawk} on Microsoft Windows. * PC Binary Installation:: Installing a prepared distribution. -* PC Compiling:: Compiling @command{gawk} for Windows32. +* PC Compiling:: Compiling @command{gawk} for + Windows32. * PC Using:: Running @command{gawk} on Windows32. * Cygwin:: Building and running @command{gawk} for Cygwin. @@ -2933,14 +2958,59 @@ it is worth addressing. @cindex Brink, Jeroen The ``shells'' on Microsoft Windows systems use the double-quote character for quoting, and make it difficult or impossible to include an -escaped double-quote character in a command-line script. -The following example, courtesy of Jeroen Brink, shows -how to print all lines in a file surrounded by double quotes: +escaped double-quote character in a command-line script. The following +example, courtesy of Jeroen Brink, shows how to escape the double quotes +from this one-liner script that prints all lines in a file surrounded by +double quotes: + +@example +@{ print "\"" $0 "\"" @} +@end example + +@noindent +On the MS-Windows command line, the one-liner script above may be passed as +follows: @example gawk "@{ print \"\042\" $0 \"\042\" @}" @var{file} @end example +In this example, the @samp{\042} is the octal code for a double-quote; +@command{gawk} converts it into a real double-quote for output by +the @code{print} statement. + +In MS-Windows, escaping double-quotes is a little tricky because you use +backslashes to escape double-quotes, but backslashes themselves are not +escaped in the usual way; indeed they are either duplicated or not, +depending upon whether there is a subsequent double-quote. The MS-Windows +rule for double-quoting a string is the following: + +@enumerate +@item +For each double quote in the original string, let @var{N} be the number +of backslash(es) before it (@var{N} might be zero). Replace these @var{N} +backslash(es) by @math{2@value{TIMES}@var{N}+1} backslash(es). + +@item +Let @var{N} be the number of backslash(es) trailing the original string +(@var{N} might be zero). Replace these @var{N} backslash(es) by +@math{2@value{TIMES}@var{N}} backslash(es). + +@item +Surround the resulting string by double-quotes. +@end enumerate + +So to double-quote the one-liner script @samp{@{ print "\"" $0 "\"" @}} +from the previous example, you would do it this way: + +@example +gawk "@{ print \"\\\"\" $0 \"\\\"\" @}" @var{file} +@end example + +@noindent +However, the use of @samp{\042} instead of @samp{\\\"} is also possible +and easier to read, because backslashes that are not followed by a +double-quote don't need duplication.
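To make the three quoting rules concrete, here is a small illustrative sketch, written in @command{awk} itself, that applies them to a string. The @code{win_quote()} and @code{repeat()} function names are invented for this example; they are not part of @command{gawk}:

@example
function repeat(str, n,    r)
@{
    while (n-- > 0)
        r = r str
    return r
@}

function win_quote(s,    out, i, c, nbs)
@{
    nbs = 0    # length of the pending run of backslashes
    for (i = 1; i <= length(s); i++) @{
        c = substr(s, i, 1)
        if (c == "\\")
            nbs++
        else if (c == "\"") @{
            # Rule 1: N backslashes before a quote become 2N+1
            out = out repeat("\\", 2 * nbs + 1) c
            nbs = 0
        @} else @{
            out = out repeat("\\", nbs) c   # other backslashes pass through
            nbs = 0
        @}
    @}
    out = out repeat("\\", 2 * nbs)    # Rule 2: double any trailing backslashes
    return "\"" out "\""               # Rule 3: surround with double quotes
@}

BEGIN @{ print win_quote("@{ print \"\\\"\" $0 \"\\\"\" @}") @}
@end example

@noindent
When run, this prints the double-quoted form of the one-liner shown in the previous example.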
@node Sample Data Files @section @value{DDF}s for the Examples @@ -3809,6 +3879,24 @@ This is particularly useful when you have library functions that you want to use from your command-line programs (@pxref{AWKPATH Variable}). +Note that @command{gawk} treats each string as if it ended with +a newline character (even if it doesn't). This makes building +the total program easier. + +@quotation CAUTION +At the moment, there is no requirement that each @var{program-text} +be a full syntactic unit. I.e., the following currently works: + +@example +$ @kbd{gawk -e 'BEGIN @{ a = 5 ;' -e 'print a @}'} +@print{} 5 +@end example + +@noindent +However, this could change in the future, so it's not a +good idea to rely upon this feature. +@end quotation + @item @option{-E} @var{file} @itemx @option{--exec} @var{file} @cindex @option{-E} option @@ -3960,6 +4048,7 @@ when parsing numeric input data (@pxref{Locales}). @cindex @option{-o} option @cindex @option{--pretty-print} option Enable pretty-printing of @command{awk} programs. +Implies @option{--no-optimize}. By default, the output program is created in a file named @file{awkprof.out} (@pxref{Profiling}). The optional @var{file} argument allows you to specify a different @@ -3968,18 +4057,22 @@ No space is allowed between the @option{-o} and @var{file}, if @var{file} is supplied. @quotation NOTE -Due to the way @command{gawk} has evolved, with this option -your program still executes. This will change in the -next major release, such that @command{gawk} will only -pretty-print the program and not run it. +In the past, this option would also execute your program. +This is no longer the case. @end quotation @item @option{-O} @itemx @option{--optimize} @cindex @option{--optimize} option @cindex @option{-O} option -Enable some optimizations on the internal representation of the program. -At the moment, this includes just simple constant folding. +Enable @command{gawk}'s default optimizations on the internal +representation of the program. At the moment, this includes simple +constant folding and tail recursion elimination in function calls. + +These optimizations are enabled by default. +This option remains primarily for backwards compatibility. However, it may +be used to cancel the effect of an earlier @option{-s} option +(see later in this list). @item @option{-p}[@var{file}] @itemx @option{--profile}[@code{=}@var{file}] @@ -3988,6 +4081,7 @@ At the moment, this includes just simple constant folding. @cindex @command{awk} profiling, enabling Enable profiling of @command{awk} programs (@pxref{Profiling}). +Implies @option{--no-optimize}. By default, profiles are created in a file named @file{awkprof.out}. The optional @var{file} argument allows you to specify a different @value{FN} for the profile file. @@ -4017,11 +4111,6 @@ restrictions apply: @cindex newlines @cindex whitespace, newlines as @item -Newlines do not act as whitespace to separate fields when @code{FS} is -equal to a single space -(@pxref{Fields}). - -@item Newlines are not allowed after @samp{?} or @samp{:} (@pxref{Conditional Exp}). @@ -4059,6 +4148,13 @@ This is now @command{gawk}'s default behavior. Nevertheless, this option remains (both for backward compatibility and for use in combination with @option{--traditional}). +@item @option{-s} +@itemx @option{--no-optimize} +@cindex @option{--no-optimize} option +@cindex @option{-s} option +Disable @command{gawk}'s default optimizations on the internal +representation of the program. 
+ @item @option{-S} @itemx @option{--sandbox} @cindex @option{-S} option @cindex @option{--sandbox} option @@ -4372,6 +4468,9 @@ searches first in the current directory and then in @file{/usr/local/share/awk}. In practice, this means that you will rarely need to change the value of @env{AWKPATH}. +@xref{Shell Startup Files}, for information on functions that help to +manipulate the @env{AWKPATH} variable. + @command{gawk} places the value of the search path that it used into @code{ENVIRON["AWKPATH"]}. This provides access to the actual search path value from within an @command{awk} program. @@ -4403,6 +4502,9 @@ an empty value, @command{gawk} uses a default path; this is typically @samp{/usr/local/lib/gawk}, although it can vary depending upon how @command{gawk} was built. +@xref{Shell Startup Files}, for information on functions that help to +manipulate the @env{AWKLIBPATH} variable. + @command{gawk} places the value of the search path that it used into @code{ENVIRON["AWKLIBPATH"]}. This provides access to the actual search path value from within an @command{awk} program. @@ -4430,6 +4532,8 @@ wait for input before returning with an error. Controls the number of times @command{gawk} attempts to retry a two-way TCP/IP (socket) connection before giving up. @xref{TCP/IP Networking}. +Note that when nonfatal I/O is enabled (@pxref{Nonfatal}), +@command{gawk} only tries to open a TCP/IP socket once. @item POSIXLY_CORRECT Causes @command{gawk} to switch to POSIX-compatibility @@ -4484,14 +4588,6 @@ two regexp matchers that @command{gawk} uses internally. (There aren't supposed to be differences, but occasionally theory and practice don't coordinate with each other.) -@item GAWK_NO_PP_RUN -When @command{gawk} is invoked with the @option{--pretty-print} option, -it will not run the program if this environment variable exists. - -@quotation CAUTION -This variable will not survive into the next major release. -@end quotation - @item GAWK_STACKSIZE This specifies the amount by which @command{gawk} should grow its internal evaluation stack, when needed. @@ -4789,6 +4885,13 @@ Similarly, you may use @code{print} or @code{printf} statements in the @var{init} and @var{increment} parts of a @code{for} loop. This is another long-undocumented ``feature'' of Unix @command{awk}. +@command{gawk} lets you use the names of built-in functions that are +@command{gawk} extensions as the names of parameters in user-defined functions. +This is intended to ``future-proof'' old code that happens to use +function names added by @command{gawk} after the code was written. +Standard @command{awk} built-in functions, such as @code{sin()} or +@code{substr()}, are @emph{not} shadowed in this way. + @end ignore @node Invoking Summary @section Summary @@ -5071,17 +5174,21 @@ between @samp{0} and @samp{7}. For example, the code for the ASCII ESC @item \x@var{hh}@dots{} The hexadecimal value @var{hh}, where @var{hh} stands for a sequence of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F} -or @samp{a}--@samp{f}). Like the same construct -in ISO C, the escape sequence continues until the first nonhexadecimal -digit is seen. @value{COMMONEXT} -However, using more than two hexadecimal digits produces -undefined results. (The @samp{\x} escape sequence is not allowed in -POSIX @command{awk}.) +or @samp{a}--@samp{f}). A maximum of two digits is allowed after +the @samp{\x}. Any further hexadecimal digits are treated as simple +letters or numbers. @value{COMMONEXT} +(The @samp{\x} escape sequence is not allowed in POSIX @command{awk}.)
@quotation CAUTION -The next major release of @command{gawk} will change, such -that a maximum of two hexadecimal digits following the -@samp{\x} will be used. +In ISO C, the escape sequence continues until the first nonhexadecimal +digit is seen. +For many years, @command{gawk} would continue incorporating +hexadecimal digits into the value until a non-hexadecimal digit +or the end of the string was encountered. +However, using more than two hexadecimal digits produced +undefined results. +As of @value{PVERSION} 4.2, only two digits +are processed. @end quotation @cindex @code{\} (backslash), @code{\/} escape sequence @@ -6096,10 +6203,13 @@ used with it do not have to be named on the @command{awk} command line * Field Separators:: The field separator and how to change it. * Constant Size:: Reading constant width data. * Splitting By Content:: Defining Fields By Content +* Testing field creation:: Checking how @command{gawk} is splitting + records. * Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the @code{getline} function. * Read Timeout:: Reading input with a timeout. +* Retrying Input:: Retrying input after certain errors. * Command-line directories:: What happens if you put a directory on the command line. * Input Summary:: Input summary. @@ -6417,16 +6527,12 @@ Readfile} for another option. @cindex fields @cindex accessing fields @cindex fields, examining -@cindex POSIX @command{awk}, field separators and -@cindex field separators, POSIX and -@cindex separators, field, POSIX and When @command{awk} reads an input record, the record is automatically @dfn{parsed} or separated by the @command{awk} utility into chunks called @dfn{fields}. By default, fields are separated by @dfn{whitespace}, like words in a line. Whitespace in @command{awk} means any string of one or more spaces, -TABs, or newlines;@footnote{In POSIX @command{awk}, newlines are not -considered whitespace for separating fields.} other characters +TABs, or newlines; other characters that are considered whitespace by other languages (such as formfeed, vertical tab, etc.) are @emph{not} considered whitespace by @command{awk}. @@ -6840,7 +6946,6 @@ can massage it first with a separate @command{awk} program.) @node Default Field Splitting @subsection Whitespace Normally Separates Fields -@cindex newlines, as field separators @cindex whitespace, as field separators Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not @@ -7241,18 +7346,30 @@ feature of @command{gawk}. If you are a novice @command{awk} user, you might want to skip it on the first reading. @command{gawk} provides a facility for dealing with fixed-width fields -with no distinctive field separator. For example, data of this nature -arises in the input for old Fortran programs where numbers are run -together, or in the output of programs that did not anticipate the use -of their output as input for other programs. - -An example of the latter is a table where all the columns are lined up by -the use of a variable number of spaces and @emph{empty fields are just -spaces}. Clearly, @command{awk}'s normal field splitting based on @code{FS} -does not work well in this case. Although a portable @command{awk} program -can use a series of @code{substr()} calls on @code{$0} -(@pxref{String Functions}), -this is awkward and inefficient for a large number of fields. +with no distinctive field separator. 
We discuss this feature in +the following @value{SUBSECTION}s. + +@menu +* Fixed width data:: Processing fixed-width data. +* Skipping intervening:: Skipping intervening fields. +* Allowing trailing data:: Capturing optional trailing data. +* Fields with fixed data:: Field values with fixed-width data. +@end menu + +@node Fixed width data +@subsection Processing Fixed-Width Data + +An example of fixed-width data would be the input for old Fortran programs +where numbers are run together, or the output of programs that did not +anticipate the use of their output as input for other programs. + +An example of the latter is a table where all the columns are lined up +by the use of a variable number of spaces and @emph{empty fields are +just spaces}. Clearly, @command{awk}'s normal field splitting based +on @code{FS} does not work well in this case. Although a portable +@command{awk} program can use a series of @code{substr()} calls on +@code{$0} (@pxref{String Functions}), this is awkward and inefficient +for a large number of fields. @cindex troubleshooting, fatal errors, field widths@comma{} specifying @cindex @command{w} utility @@ -7260,11 +7377,12 @@ this is awkward and inefficient for a large number of fields. @cindex @command{gawk}, @code{FIELDWIDTHS} variable in The splitting of an input record into fixed-width fields is specified by assigning a string containing space-separated numbers to the built-in -variable @code{FIELDWIDTHS}. Each number specifies the width of the field, -@emph{including} columns between fields. If you want to ignore the columns -between fields, you can specify the width as a separate field that is -subsequently ignored. -It is a fatal error to supply a field width that has a negative value. +variable @code{FIELDWIDTHS}. Each number specifies the width of the +field, @emph{including} columns between fields. If you want to ignore +the columns between fields, you can specify the width as a separate +field that is subsequently ignored. It is a fatal error to supply a +field width that has a negative value. + The following data is the output of the Unix @command{w} utility. It is useful to illustrate the use of @code{FIELDWIDTHS}: @@ -7294,7 +7412,7 @@ NR > 2 @{ sub(/^ +/, "", idle) # strip leading spaces if (idle == "") idle = 0 - if (idle ~ /:/) @{ + if (idle ~ /:/) @{ # hh:mm split(idle, t, ":") idle = t[1] * 60 + t[2] @} @@ -7333,30 +7451,90 @@ program for processing such data could use the @code{FIELDWIDTHS} feature to simplify reading the data. (Of course, getting @command{gawk} to run on a system with card readers is another story!) -@cindex @command{gawk}, splitting fields and -Assigning a value to @code{FS} causes @command{gawk} to use -@code{FS} for field splitting again. Use @samp{FS = FS} to make this happen, -without having to know the current value of @code{FS}. -In order to tell which kind of field splitting is in effect, -use @code{PROCINFO["FS"]} -(@pxref{Auto-set}). -The value is @code{"FS"} if regular field splitting is being used, -or @code{"FIELDWIDTHS"} if fixed-width field splitting is being used: +@node Skipping intervening +@subsection Skipping Intervening Fields + +Starting in @value{PVERSION} 4.2, each field width may optionally be +preceded by a colon-separated value specifying the number of characters +to skip before the field starts. 
Thus, the preceding program could be +rewritten to specify @code{FIELDWIDTHS} like so: @example -if (PROCINFO["FS"] == "FS") - @var{regular field splitting} @dots{} -else if (PROCINFO["FS"] == "FIELDWIDTHS") - @var{fixed-width field splitting} @dots{} -else - @var{content-based field splitting} @dots{} @ii{(see next @value{SECTION})} +BEGIN @{ FIELDWIDTHS = "8 1:5 4:7 6 1:6 1:6 2:33" @} +@end example + +This strips away some of the white space separating the fields. With such +a change, the program produces the following results: + +@example +hzang ttyV3 50 +eklye ttyV5 0 +dportein ttyV6 107 +gierd ttyD3 1 +dave ttyD4 0 +brent ttyp0 286 +dave ttyq4 1296000 @end example -This information is useful when writing a function -that needs to temporarily change @code{FS} or @code{FIELDWIDTHS}, -read some records, and then restore the original settings -(@pxref{Passwd Functions} -for an example of such a function). +@node Allowing trailing data +@subsection Capturing Optional Trailing Data + +There are times when fixed-width data may be followed by additional data +that has no fixed length. Such data may or may not be present, but if +it is, it should be possible to get at it from an @command{awk} program. + +Starting with version 4.2, in order to provide a way to say ``anything +else in the record after the defined fields,'' @command{gawk} +allows you to add a final @samp{*} character to the value of +@code{FIELDWIDTHS}. There can only be one such character, and it must +be the final non-whitespace character in @code{FIELDWIDTHS}. +For example: + +@example +$ @kbd{cat fw.awk} @ii{Show the program} +@print{} BEGIN @{ FIELDWIDTHS = "2 2 *" @} +@print{} @{ print NF, $1, $2, $3 @} +$ @kbd{cat fw.in} @ii{Show sample input} +@print{} 1234abcdefghi +$ @kbd{gawk -f fw.awk fw.in} @ii{Run the program} +@print{} 3 12 34 abcdefghi +@end example + +@node Fields with fixed data +@subsection Field Values With Fixed-Width Data + +So far, so good. But what happens if there isn't as much data as there +should be based on the contents of @code{FIELDWIDTHS}? Or, what happens +if there is more data than expected? + +For many years, what happens in these cases was not well defined. Starting +with version 4.2, the rules are as follows: + +@table @asis +@item Enough data for some fields +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aabbb}. In this case, @code{NF} is set to two. + +@item Not enough data for a field +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aab}. In this case, @code{NF} is set to two and +@code{$2} has the value @code{"b"}. The idea is that even though there +aren't as many characters as were expected, there are some, so the data +should be made available to the program. + +@item Too much data +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aabbbccccddd}. In this case, @code{NF} is set to +three and the extra characters (@samp{ddd}) are ignored. If you want +@command{gawk} to capture the extra characters, supply a final @samp{*} +in the value of @code{FIELDWIDTHS}. + +@item Too much data, but with @samp{*} supplied +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4 *"} and the +input record is @samp{aabbbccccddd}. In this case, @code{NF} is set to +four, and @code{$4} has the value @code{"ddd"}. + +@end table @node Splitting By Content @section Defining Fields by Content @@ -7457,8 +7635,6 @@ affects field splitting with @code{FPAT}. 
Assigning a value to @code{FPAT} overrides field splitting with @code{FS} and with @code{FIELDWIDTHS}. -Similar to @code{FIELDWIDTHS}, the value of @code{PROCINFO["FS"]} -will be @code{"FPAT"} if content-based field splitting is being used. @quotation NOTE Some programs export CSV data that contains embedded newlines between @@ -7485,11 +7661,44 @@ FPAT = "([^,]*)|(\"[^\"]+\")" Finally, the @code{patsplit()} function makes the same functionality available for splitting regular strings (@pxref{String Functions}). -To recap, @command{gawk} provides three independent methods -to split input records into fields. -The mechanism used is based on which of the three -variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was -last assigned to. + +@node Testing field creation +@section Checking How @command{gawk} Is Splitting Records + +@cindex @command{gawk}, splitting fields and +As we've seen, @command{gawk} provides three independent methods to split +input records into fields. The mechanism used is based on which of the +three variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was +last assigned to. In addition, an API input parser may choose to override +the record parsing mechanism; please refer to @ref{Input Parsers} for +further information about this feature. + +To restore normal field splitting after using @code{FIELDWIDTHS} +and/or @code{FPAT}, simply assign a value to @code{FS}. +You can use @samp{FS = FS} to do this, +without having to know the current value of @code{FS}. + +In order to tell which kind of field splitting is in effect, +use @code{PROCINFO["FS"]} (@pxref{Auto-set}). +The value is @code{"FS"} if regular field splitting is being used, +@code{"FIELDWIDTHS"} if fixed-width field splitting is being used, +or @code{"FPAT"} if content-based field splitting is being used: + +@example +if (PROCINFO["FS"] == "FS") + @var{regular field splitting} @dots{} +else if (PROCINFO["FS"] == "FIELDWIDTHS") + @var{fixed-width field splitting} @dots{} +else if (PROCINFO["FS"] == "FPAT") + @var{content-based field splitting} +else + @var{API input parser field splitting} @dots{} @ii{(advanced feature)} +@end example + +This information is useful when writing a function that needs to +temporarily change @code{FS} or @code{FIELDWIDTHS}, read some records, +and then restore the original settings (@pxref{Passwd Functions} for an +example of such a function). @node Multiple Line @section Multiple-Line Records @@ -7707,6 +7916,13 @@ a record, such as a file that cannot be opened, then @code{getline} returns @minus{}1. In this case, @command{gawk} sets the variable @code{ERRNO} to a string describing the error that occurred. +If @code{ERRNO} indicates that the I/O operation may be +retried, and @code{PROCINFO["@var{input}", "RETRY"]} is set, +then @code{getline} returns @minus{}2 +instead of @minus{}1, and further calls to @code{getline} +may be attempted. @xref{Retrying Input} for further information about +this feature. + In the following examples, @var{command} stands for a string value that represents a shell command. @@ -8361,7 +8577,8 @@ on a per-command or per-connection basis. the attempt to read from the underlying device may succeed in a later attempt. This is a limitation, and it also means that you cannot use this to multiplex input from -two or more sources. +two or more sources. @xref{Retrying Input} for a way to enable +later I/O attempts to succeed. Assigning a timeout value prevents read operations from blocking indefinitely. 
But bear in mind that there are other ways @@ -8371,6 +8588,36 @@ a connection before it can start reading any data, or the attempt to open a FIFO special file for reading can block indefinitely until some other process opens it for writing. +@node Retrying Input +@section Retrying Reads After Certain Input Errors +@cindex retrying input + +@cindex differences in @command{awk} and @command{gawk}, retrying input +This @value{SECTION} describes a feature that is specific to @command{gawk}. + +When @command{gawk} encounters an error while reading input, by +default @code{getline} returns @minus{}1, and subsequent attempts to +read from that file result in an end-of-file indication. However, you +may optionally instruct @command{gawk} to allow I/O to be retried when +certain errors are encountered by setting a special element in +the @code{PROCINFO} array (@pxref{Auto-set}): + +@example +PROCINFO["@var{input_name}", "RETRY"] = 1 +@end example + +When this element exists, @command{gawk} checks the value of the system +(C language) +@code{errno} variable when an I/O error occurs. If @code{errno} indicates +a subsequent I/O attempt may succeed, @code{getline} instead returns +@minus{}2 and +further calls to @code{getline} may succeed. This applies to the @code{errno} +values @code{EAGAIN}, @code{EWOULDBLOCK}, @code{EINTR}, or @code{ETIMEDOUT}. + +This feature is useful in conjunction with +@code{PROCINFO["@var{input_name}", "READ_TIMEOUT"]} or situations where a file +descriptor has been configured to behave in a non-blocking fashion. + @node Command-line directories @section Directories on the Command Line @cindex differences in @command{awk} and @command{gawk}, command-line directories @@ -8532,6 +8779,7 @@ and discusses the @code{close()} built-in function. @command{gawk} allows access to inherited file descriptors. * Close Files And Pipes:: Closing Input and Output Files and Pipes. +* Nonfatal:: Enabling Nonfatal Output. * Output Summary:: Output summary. * Output Exercises:: Exercises. @end menu @@ -9912,17 +10160,26 @@ a system problem closing the file or process. In these cases, @command{gawk} sets the predefined variable @code{ERRNO} to a string describing the problem. -In @command{gawk}, -when closing a pipe or coprocess (input or output), -the return value is the exit status of the command.@footnote{ -This is a full 16-bit value as returned by the @code{wait()} -system call. See the system manual pages for information on -how to decode this value.} -Otherwise, it is the return value from the system's @code{close()} or -@code{fclose()} C functions when closing input or output -files, respectively. -This value is zero if the close succeeds, or @minus{}1 if -it fails. +In @command{gawk}, starting with @value{PVERSION} 4.2, when closing a pipe or +coprocess (input or output), the return value is the exit status of the +command, as described in @ref{table-close-pipe-return-values}.@footnote{Prior +to @value{PVERSION} 4.2, the return value from closing a pipe or co-process +was the full 16-bit exit value as defined by the @code{wait()} system +call.} Otherwise, it is the return value from the system's @code{close()} +or @code{fclose()} C functions when closing input or output files, +respectively. This value is zero if the close succeeds, or @minus{}1 +if it fails. 
+ +@float Table,table-close-pipe-return-values +@caption{Return values from @code{close()} of a pipe} +@multitable @columnfractions .40 .60 +@headitem Situation @tab Return value from @code{close()} +@item Normal exit of command @tab Command's exit status +@item Death by signal of command @tab 256 + number of murderous signal +@item Death by signal of command with core dump @tab 512 + number of murderous signal +@item Some kind of error @tab @minus{}1 +@end multitable +@end float The POSIX standard is very vague; it says that @code{close()} returns zero on success and a nonzero value otherwise. In general, @@ -9933,6 +10190,70 @@ In POSIX mode (@pxref{Options}), @command{gawk} just returns zero when closing a pipe. @end sidebar +@node Nonfatal +@section Enabling Nonfatal Output + +This @value{SECTION} describes a @command{gawk}-specific feature. + +In standard @command{awk}, output with @code{print} or @code{printf} +to a nonexistent file, or some other I/O error (such as filling up the +disk) is a fatal error. + +@example +$ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'} +@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No such file or directory) +@end example + +@command{gawk} makes it possible to detect that an error has +occurred, allowing you to possibly recover from the error, or +at least print an error message of your choosing before exiting. +You can do this in one of two ways: + +@itemize @bullet +@item +For all output files, by assigning any value to @code{PROCINFO["NONFATAL"]}. + +@item +On a per-file basis, by assigning any value to +@code{PROCINFO[@var{filename}, "NONFATAL"]}. +Here, @var{filename} is the name of the file to which +you wish output to be nonfatal. +@end itemize + +Once you have enabled nonfatal output, you must check @code{ERRNO} +after every relevant @code{print} or @code{printf} statement to +see if something went wrong. It is also a good idea to initialize +@code{ERRNO} to zero before attempting the output. For example: + +@example +$ @kbd{gawk '} +> @kbd{BEGIN @{} +> @kbd{ PROCINFO["NONFATAL"] = 1} +> @kbd{ ERRNO = 0} +> @kbd{ print "hi" > "/no/such/file"} +> @kbd{ if (ERRNO) @{} +> @kbd{ print("Output failed:", ERRNO) > "/dev/stderr"} +> @kbd{ exit 1} +> @kbd{ @}} +> @kbd{@}'} +@error{} Output failed: No such file or directory +@end example + +Here, @command{gawk} did not produce a fatal error; instead +it let the @command{awk} program code detect the problem and handle it. + +This mechanism works also for standard output and standard error. +For standard output, you may use @code{PROCINFO["-", "NONFATAL"]} +or @code{PROCINFO["/dev/stdout", "NONFATAL"]}. For standard error, use +@code{PROCINFO["/dev/stderr", "NONFATAL"]}. + +When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}), +@command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES} +environment variable (@pxref{Other Environment Variables}) allows you to +override @command{gawk}'s builtin default number of attempts. However, +once nonfatal I/O is enabled for a given socket, @command{gawk} only +retries once, relying on @command{awk}-level code to notice that there +was a problem. @node Output Summary @section Summary @@ -9962,6 +10283,12 @@ Use @code{close()} to close open file, pipe, and coprocess redirections. For coprocesses, it is possible to close only one direction of the communications. +@item +Normally errors with @code{print} or @code{printf} are fatal. 
+@command{gawk} lets you make output errors be nonfatal either for +all files or on a per-file basis. You must then check for errors +after every relevant output statement. + @end itemize @c EXCLUDE START @@ -10109,7 +10436,7 @@ Just as @samp{11} in decimal is 1 times 10 plus 1, so @samp{11} in octal is 1 times 8 plus 1. This equals 9 in decimal. In hexadecimal, there are 16 digits. Because the everyday decimal number system only has ten digits (@samp{0}--@samp{9}), the letters -@samp{a} through @samp{f} are used to represent the rest. +@samp{a} through @samp{f} represent the rest. (Case in the letters is usually irrelevant; hexadecimal @samp{a} and @samp{A} have the same value.) Thus, @samp{11} in @@ -10212,6 +10539,20 @@ but could be more complex expressions). @node Using Constant Regexps @subsection Using Regular Expression Constants +Regular expression constants consist of text describing +a regular expression enclosed in slashes (such as @code{/the +answer/}). +This @value{SECTION} describes how such constants work in +POSIX @command{awk} and @command{gawk}, and then goes on to describe +@dfn{strongly typed regexp constants}, which are a @command{gawk} extension. + +@menu +* Standard Regexp Constants:: Regexp constants in standard @command{awk}. +* Strong Regexp Constants:: Strongly typed regexp constants. +@end menu + +@node Standard Regexp Constants +@subsubsection Standard Regular Expression Constants + @cindex dark corner, regexp constants When used on the righthand side of the @samp{~} or @samp{!~} operators, a regexp constant merely stands for the regexp that is to be @@ -10319,6 +10660,90 @@ or not @code{$0} matches @code{/hi/}. a parameter to a user-defined function, because passing a truth value in this way is probably not what was intended. +@node Strong Regexp Constants +@subsubsection Strongly Typed Regexp Constants + +This @value{SECTION} describes a @command{gawk}-specific feature. + +As we saw in the previous @value{SECTION}, +regexp constants (@code{/@dots{}/}) hold a strange position in the +@command{awk} language. In most contexts, they act like an expression: +@samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to +be matched. In no case are they really a ``first class citizen'' of the +language. That is, you cannot define a scalar variable whose type is +``regexp'' in the same sense that you can define a variable to be a +number or a string: + +@example +num = 42 @ii{Numeric variable} +str = "hi" @ii{String variable} +re = /foo/ @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/ +@end example + +For a number of more advanced use cases, +it would be nice to have regexp constants that +are @dfn{strongly typed}; in other words, that denote a regexp useful +for matching, and not an expression. + +@command{gawk} provides this feature. A strongly typed regexp constant +looks almost like a regular regexp constant, except that it is preceded +by an @samp{@@} sign: + +@example +re = @@/foo/ @ii{Regexp variable} +@end example + +Strongly typed regexp constants @emph{cannot} be used everywhere that a +regular regexp constant can, because this would make the language even more +confusing. Instead, you may use them only in certain contexts: + +@itemize @bullet +@item +On the righthand side of the @samp{~} and @samp{!~} operators: @samp{some_var ~ @@/foo/} +(@pxref{Regexp Usage}). + +@item +In the @code{case} part of a @code{switch} statement +(@pxref{Switch Statement}). 
+ +@item +As an argument to one of the built-in functions that accept regexp constants: +@code{gensub()}, +@code{gsub()}, +@code{match()}, +@code{patsplit()}, +@code{split()}, +and +@code{sub()} +(@pxref{String Functions}). + +@item +As a parameter in a call to a user-defined function +(@pxref{User-defined}). + +@item +On the righthand side of an assignment to a variable: @samp{some_var = @@/foo/}. +In this case, the type of @code{some_var} is regexp. Additionally, @code{some_var} +can be used with @samp{~} and @samp{!~}, passed to one of the built-in functions +listed above, or passed as a parameter to a user-defined function. +@end itemize + +You may use the @code{typeof()} built-in function +(@pxref{Type Functions}) +to determine if a variable or function parameter is +a regexp variable. + +The true power of this feature comes from the ability to create variables that +have regexp type. Such variables can be passed on to user-defined functions, +without the confusing aspects of computed regular expressions created from +strings or string constants. They may also be passed through indirect function +calls (@pxref{Indirect Calls}) +and on to the built-in functions that accept regexp constants. + +When used in numeric conversions, strongly typed regexp variables convert +to zero. When used in string conversions, they convert to the string +value of the original regexp text. + @node Variables @subsection Variables @@ -11355,17 +11780,94 @@ compares variables. @node Variable Typing @subsubsection String Type versus Numeric Type +Scalar objects in @command{awk} (variables, array elements, and fields) +are @emph{dynamically} typed. This means their type can change as the +program runs, from @dfn{untyped} before any use,@footnote{@command{gawk} +calls this @dfn{unassigned}, as the following example shows.} to string +or number, and then from string to number or number to string, as the +program progresses. (@command{gawk} also provides regexp-typed scalars, +but let's ignore that for now; @pxref{Strong Regexp Constants}.) + +You can't do much with untyped variables, other than tell that they +are untyped. The following program tests @code{a} against @code{""} +and @code{0}; the test succeeds when @code{a} has never been assigned +a value. It also uses the built-in @code{typeof()} function +(not presented yet; @pxref{Type Functions}) to show @code{a}'s type: + +@example +$ @kbd{gawk 'BEGIN @{ print (a == "" && a == 0 ?} +> @kbd{"a is untyped" : "a has a type!") ; print typeof(a) @}'} +@print{} a is untyped +@print{} unassigned +@end example + +A scalar has numeric type when assigned a numeric value, +such as from a numeric constant, or from another scalar +with numeric type: + +@example +$ @kbd{gawk 'BEGIN @{ a = 42 ; print typeof(a)} +> @kbd{b = a ; print typeof(b) @}'} +number +number +@end example + +Similarly, a scalar has string type when assigned a string +value, such as from a string constant, or from another scalar +with string type: + +@example +$ @kbd{gawk 'BEGIN @{ a = "forty two" ; print typeof(a)} +> @kbd{b = a ; print typeof(b) @}'} +string +string +@end example + +So far, this is all simple and straightforward. What happens, though, +when @command{awk} has to process data from a user? Let's start with +field data. What should the following command produce as output? + +@example +echo hello | awk '@{ printf("%s %s < 42\n", $1, + ($1 < 42 ? "is" : "is not")) @}' +@end example + +@noindent +Since @samp{hello} is alphabetic data, @command{awk} can only do a string +comparison. 
Internally, it converts @code{42} into @code{"42"} and compares +the two string values @code{"hello"} and @code{"42"}. Here's the result: + +@example +$ @kbd{echo hello | awk '@{ printf("%s %s < 42\n", $1,} +> @kbd{ ($1 < 42 ? "is" : "is not")) @}'} +@print{} hello is not < 42 +@end example + +However, what happens when data from a user @emph{looks like} a number? +On the one hand, in reality, the input data consists of characters, not +binary numeric +values. But, on the other hand, the data looks numeric, and @command{awk} +really ought to treat it as such. And indeed, it does: + +@example +$ @kbd{echo 37 | awk '@{ printf("%s %s < 42\n", $1,} +> @kbd{ ($1 < 42 ? "is" : "is not")) @}'} +@print{} 37 is < 42 +@end example + +Here are the rules for when @command{awk} +treats data as a number, and for when it treats data as a string. + @cindex numeric, strings @cindex strings, numeric @cindex POSIX @command{awk}, numeric strings and -The POSIX standard introduced -the concept of a @dfn{numeric string}, which is simply a string that looks -like a number---for example, @code{@w{" +2"}}. This concept is used -for determining the type of a variable. -The type of the variable is important because the types of two variables -determine how they are compared. -Variable typing follows these rules: +The POSIX standard uses the term @dfn{numeric string} for input data that +looks numeric. The @samp{37} in the previous example is a numeric string. +So what is the type of a numeric string? Answer: numeric. +The type of a variable is important because the types of two variables +determine how they are compared. +Variable typing follows these definitions and rules: @itemize @value{BULLET} @item @@ -11380,7 +11882,9 @@ attribute. Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements, @code{ENVIRON} elements, and the elements of an array created by @code{match()}, @code{split()}, and @code{patsplit()} that are numeric -strings have the @dfn{strnum} attribute. Otherwise, they have +strings have the @dfn{strnum} attribute.@footnote{Thus, a POSIX +numeric string and @command{gawk}'s strnum are the same thing.} +Otherwise, they have the @dfn{string} attribute. Uninitialized variables also have the @dfn{strnum} attribute. @@ -11454,7 +11958,7 @@ STRNUM &&string &numeric &numeric\cr @end tex @ifnottex @ifnotdocbook -@display +@verbatim +---------------------------------------------- | STRING NUMERIC STRNUM --------+---------------------------------------------- @@ -11465,7 +11969,7 @@ NUMERIC | string numeric numeric | STRNUM | string numeric numeric --------+---------------------------------------------- -@end display +@end verbatim @end ifnotdocbook @end ifnottex @docbook @@ -11524,10 +12028,14 @@ purposes. In short, when one operand is a ``pure'' string, such as a string constant, then a string comparison is performed. Otherwise, a numeric comparison is performed. +(The primary difference between a number and a strnum is that +for strnums @command{gawk} preserves the original string value that +the scalar had when it came in.) + +This point bears additional emphasis: +Input that looks numeric @emph{is} numeric. +All other input is treated as strings. -This point bears additional emphasis: All user input is made of characters, -and so is first and foremost of string type; input strings -that look numeric are additionally given the strnum attribute. Thus, the six-character input string @w{@samp{ +3.14}} receives the strnum attribute. 
In contrast, the eight characters @w{@code{" +3.14"}} appearing in program text comprise a string constant. @@ -11554,6 +12062,14 @@ $ @kbd{echo ' +3.14' | awk '@{ print($1 == 3.14) @}'} @ii{True} @print{} 1 @end example +You can see the type of an input field (or other user input) +using @code{typeof()}: + +@example +$ @kbd{echo hello 37 | gawk '@{ print typeof($1), typeof($2) @}'} +@print{} string strnum +@end example + @node Comparison Operators @subsubsection Comparison Operators @@ -11713,19 +12229,19 @@ One special place where @code{/foo/} is @emph{not} an abbreviation for where this is discussed in more detail. @node POSIX String Comparison -@subsubsection String Comparison with POSIX Rules +@subsubsection String Comparison Based on Locale Collating Order -The POSIX standard says that string comparison is performed based -on the locale's @dfn{collating order}. This is the order in which -characters sort, as defined by the locale (for more discussion, -@pxref{Locales}). This order is usually very different -from the results obtained when doing straight character-by-character -comparison.@footnote{Technically, string comparison is supposed -to behave the same way as if the strings were compared with the C -@code{strcoll()} function.} +The POSIX standard used to say that all string comparisons are +performed based on the locale's @dfn{collating order}. This +is the order in which characters sort, as defined by the locale +(for more discussion, @pxref{Locales}). This order is usually very +different from the results obtained when doing straight byte-by-byte +comparison.@footnote{Technically, string comparison is supposed to behave +the same way as if the strings were compared with the C @code{strcoll()} +function.} Because this behavior differs considerably from existing practice, -@command{gawk} only implements it when in POSIX mode (@pxref{Options}). +@command{gawk} only implemented it when in POSIX mode (@pxref{Options}). Here is an example to illustrate the difference, in an @code{en_US.UTF-8} locale: @@ -11738,6 +12254,26 @@ $ @kbd{gawk --posix 'BEGIN @{ printf("ABC < abc = %s\n",} @print{} ABC < abc = FALSE @end example +Fortunately, as of August 2016, comparison based on locale +collating order is no longer required for the @code{==} and @code{!=} +operators.@footnote{See @uref{http://austingroupbugs.net/view.php?id=1070, +the Austin Group website}.} However, comparison based on locales is still +required for @code{<}, @code{<=}, @code{>}, and @code{>=}. POSIX thus +recommends as follows: + +@quotation +Since the @code{==} operator checks whether strings are identical, +not whether they collate equally, applications needing to check whether +strings collate equally can use: + +@example +a <= b && a >= b +@end example +@end quotation + +As of @value{PVERSION} 4.2, @command{gawk} continues to use locale +collating order for @code{<}, @code{<=}, @code{>}, and @code{>=} only +in POSIX mode. @node Boolean Ops @subsection Boolean Expressions @@ -13867,6 +14403,9 @@ Its default value is @code{"%.6g"}. @item FIELDWIDTHS # A space-separated list of columns that tells @command{gawk} how to split input with fixed columnar boundaries. +Starting in @value{PVERSION} 4.2, each field width may optionally be +preceded by a colon-separated value specifying the number of characters to skip +before the field starts. Assigning a value to @code{FIELDWIDTHS} overrides the use of @code{FS} and @code{FPAT} for field splitting. @xref{Constant Size} for more information. 
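As a minimal sketch of the colon-skip syntax just described (the sample data here is invented for illustration), the following splits a record into a two-character field and a three-character field, skipping the single character between them:

@example
$ @kbd{echo AB-CDE | gawk 'BEGIN @{ FIELDWIDTHS = "2 1:3" @} @{ print $1, $2 @}'}
@print{} AB CDE
@end example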
@@ -13897,12 +14436,11 @@ specify the behavior when @code{FS} is the null string. Nonetheless, some other versions of @command{awk} also treat @code{""} specially.) -@cindex POSIX @command{awk}, @code{FS} variable and The default value is @w{@code{" "}}, a string consisting of a single -space. As a special exception, this value means that any -sequence of spaces, TABs, and/or newlines is a single separator.@footnote{In -POSIX @command{awk}, newline does not count as whitespace.} It also causes -spaces, TABs, and newlines at the beginning and end of a record to be ignored. +space. As a special exception, this value means that any sequence of +spaces, TABs, and/or newlines is a single separator. It also causes +spaces, TABs, and newlines at the beginning and end of a record to +be ignored. You can set the value of @code{FS} on the command line using the @option{-F} option: @@ -14126,10 +14664,24 @@ opens the next file. An associative array containing the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For example, -@code{ENVIRON["HOME"]} might be @code{"/home/arnold"}. Changing this array -does not affect the environment passed on to any programs that -@command{awk} may spawn via redirection or the @code{system()} function. -(In a future version of @command{gawk}, it may do so.) +@code{ENVIRON["HOME"]} might be @code{/home/arnold}. + +For POSIX @command{awk}, changing this array does not affect the +environment passed on to any programs that @command{awk} may spawn via +redirection or the @code{system()} function. + +However, beginning with @value{PVERSION} 4.2, if not in POSIX +compatibility mode, @command{gawk} does update its own environment when +@code{ENVIRON} is changed, thus changing the environment seen by programs +that it creates. You should therefore be especially careful if you +modify @code{ENVIRON["PATH"]}, which is the search path for finding +executable programs. + +This can also affect the running @command{gawk} program, since some of the +built-in functions may pay attention to certain environment variables. +The most notable instance of this is @code{mktime()} (@pxref{Time +Functions}), which pays attention to the value of the @env{TZ} environment +variable on many systems. Some operating systems may not have environment variables. On such systems, the @code{ENVIRON} array is empty (except for @@ -14163,6 +14715,11 @@ value to be meaningful when an I/O operation returns a failure value, such as @code{getline} returning @minus{}1. You are, of course, free to clear it yourself before doing an I/O operation. +If the value of @code{ERRNO} corresponds to a system error in the C +@code{errno} variable, then @code{PROCINFO["errno"]} will be set to the value +of @code{errno}. For non-system errors, @code{PROCINFO["errno"]} will +be zero. + @cindex @code{FILENAME} variable @cindex dark corner, @code{FILENAME} variable @item @code{FILENAME} @@ -14227,10 +14784,35 @@ The following elements (listed alphabetically) are guaranteed to be available: @table @code +@item PROCINFO["argv"] +@cindex command line arguments, @code{PROCINFO["argv"]} +The @code{PROCINFO["argv"]} array contains all of the command-line arguments +(after glob expansion and redirection processing on platforms where that must +be done manually by the program) with subscripts ranging from 0 through +@code{argc} @minus{} 1. For example, @code{PROCINFO["argv"][0]} will contain +the name by which @command{gawk} was invoked.
Here is an example of how this +feature may be used: + +@example +gawk ' +BEGIN @{ + for (i = 0; i < length(PROCINFO["argv"]); i++) + print i, PROCINFO["argv"][i] +@}' +@end example + +Please note that this differs from the standard @code{ARGV} array which does +not include command-line arguments that have already been processed by +@command{gawk} (@pxref{ARGC and ARGV}). + @cindex effective group ID of @command{gawk} user @item PROCINFO["egid"] The value of the @code{getegid()} system call. +@item PROCINFO["errno"] +The value of the C @code{errno} variable when @code{ERRNO} is set to +the associated error message. + @item PROCINFO["euid"] @cindex effective user ID of @command{gawk} user The value of the @code{geteuid()} system call. @@ -14239,7 +14821,8 @@ The value of the @code{geteuid()} system call. This is @code{"FS"} if field splitting with @code{FS} is in effect, @code{"FIELDWIDTHS"} if field splitting with @code{FIELDWIDTHS} is in effect, -or @code{"FPAT"} if field matching with @code{FPAT} is in effect. +@code{"FPAT"} if field matching with @code{FPAT} is in effect, +or @code{"API"} if field splitting is controlled by an API input parser. @item PROCINFO["gid"] @cindex group ID of @command{gawk} user @@ -14354,6 +14937,14 @@ to test for these elements The following elements allow you to change @command{gawk}'s behavior: @table @code +@item PROCINFO["NONFATAL"] +If this element exists, then I/O errors for all output redirections become nonfatal. +@xref{Nonfatal}. + +@item PROCINFO["@var{output_name}", "NONFATAL"] +Make output errors for @var{output_name} be nonfatal. +@xref{Nonfatal}. + @item PROCINFO["@var{command}", "pty"] For two-way communication to @var{command}, use a pseudo-tty instead of setting up a two-way pipe. @@ -16248,6 +16839,27 @@ truncated toward zero. For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)} is @minus{}3, and @code{int(-3)} is @minus{}3 as well. +@item @code{intdiv(@var{numerator}, @var{denominator}, @var{result})} +@cindexawkfunc{intdiv} +@cindex intdiv +Perform integer division, similar to the standard C @code{div()} function. +First, truncate @code{numerator} and @code{denominator} +towards zero, creating integer values. Clear the @code{result} +array, and then set @code{result["quotient"]} to the result of +@samp{numerator / denominator}, truncated towards zero to an integer, +and set @code{result["remainder"]} to the result of @samp{numerator % +denominator}, truncated towards zero to an integer. +Attempting division by zero causes a fatal error. +The function returns zero upon success, and @minus{}1 upon error. + +This function is +primarily intended for use with arbitrary length integers; it avoids +creating MPFR arbitrary precision floating-point values (@pxref{Arbitrary +Precision Integers}). + +This function is a @code{gawk} extension. It is not available in +compatibility mode (@pxref{Options}). + @item @code{log(@var{x})} @cindexawkfunc{log} @cindex logarithm @@ -16767,7 +17379,7 @@ using a third argument is a fatal error. @cindexgawkfunc{patsplit} @cindex split string into array Divide -@var{string} into pieces defined by @var{fieldpat} +@var{string} into pieces (or ``fields'') defined by @var{fieldpat} and store the pieces in @var{array} and the separator strings in the @var{seps} array. The first piece is stored in @code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so @@ -16778,9 +17390,11 @@ It may be either a regexp constant or a string. 
If @var{fieldpat} is omitted, the value of @code{FPAT} is used. @code{patsplit()} returns the number of elements created. @code{@var{seps}[@var{i}]} is -the separator string -between @code{@var{array}[@var{i}]} and @code{@var{array}[@var{i}+1]}. -Any leading separator will be in @code{@var{seps}[0]}. +the possibly null separator string +after @code{@var{array}[@var{i}]}. +The possibly null leading separator will be in @code{@var{seps}[0]}. +So a non-null @var{string} with @var{n} fields will have @var{n+1} separators. +A null @var{string} will have neither fields nor separators. The @code{patsplit()} function splits strings into pieces in a manner similar to the way input lines are split into fields using @code{FPAT} @@ -17733,7 +18347,7 @@ Optional parameters are enclosed in square brackets ([ ]): @c @asis for docbook @table @asis -@item @code{mktime(@var{datespec})} +@item @code{mktime(@var{datespec}} [@code{, @var{utc-flag}} ]@code{)} @cindexgawkfunc{mktime} @cindex generate time values Turn @var{datespec} into a timestamp in the same form @@ -17752,7 +18366,9 @@ The values of these numbers need not be within the ranges specified; for example, an hour of @minus{}1 means 1 hour before midnight. The origin-zero Gregorian calendar is assumed, with year 0 preceding year 1 and year @minus{}1 preceding year 0. -The time is assumed to be in the local time zone. +If @var{utc-flag} is present and is either nonzero or non-null, the time +is assumed to be in the UTC time zone; otherwise, the +time is assumed to be in the local time zone. If the daylight-savings flag is positive, the time is assumed to be daylight savings time; if zero, the time is assumed to be standard time; and if negative (the default), @code{mktime()} attempts to determine @@ -18252,12 +18868,12 @@ Return the value of @var{val}, shifted right by @var{count} bits. Return the bitwise XOR of the arguments. There must be at least two. @end table -For all of these functions, first the double-precision floating-point value is -converted to the widest C unsigned integer type, then the bitwise operation is -performed. If the result cannot be represented exactly as a C @code{double}, -leading nonzero bits are removed one by one until it can be represented -exactly. The result is then converted back into a C @code{double}. (If -you don't understand this paragraph, don't worry about it.) +@quotation CAUTION +Beginning with @command{gawk} @value{PVERSION} 4.2, negative +operands are not allowed for any of these functions. A negative +operand produces a fatal error. See the sidebar +``Beware The Smoke and Mirrors!'' for more information as to why. +@end quotation Here is a user-defined function (@pxref{User-defined}) that illustrates the use of these functions: @@ -18362,19 +18978,128 @@ decimal and octal values for the same numbers and then demonstrates the results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions. +@sidebar Beware The Smoke and Mirrors! + +In other languages, bitwise operations are performed on integer values, +not floating-point values. As a general statement, such operations work +best when performed on unsigned integers. + +@command{gawk} attempts to treat the arguments to the bitwise functions +as unsigned integers. For this reason, negative arguments produce a +fatal error. + +In normal operation, for all of these functions, first the +double-precision floating-point value is converted to the widest C +unsigned integer type, then the bitwise operation is performed.
If the +result cannot be represented exactly as a C @code{double}, leading +nonzero bits are removed one by one until it can be represented exactly. +The result is then converted back into a C @code{double}.@footnote{If you don't +understand this paragraph, the upshot is that @command{gawk} can only +store a particular range of integer values; numbers outside that range +are reduced to fit within the range.} + +However, when using arbitrary precision arithmetic with the @option{-M} +option (@pxref{Arbitrary Precision Arithmetic}), the results may differ. +This is particularly noticeable with the @code{compl()} function: + +@example +$ @kbd{gawk 'BEGIN @{ print compl(42) @}'} +@print{} 9007199254740949 +$ @kbd{gawk -M 'BEGIN @{ print compl(42) @}'} +@print{} -43 +@end example + +What's going on becomes clear when printing the results +in hexadecimal: + +@example +$ @kbd{gawk 'BEGIN @{ printf "%#x\n", compl(42) @}'} +@print{} 0x1fffffffffffd5 +$ @kbd{gawk -M 'BEGIN @{ printf "%#x\n", compl(42) @}'} +@print{} 0xffffffffffffffd5 +@end example + +When using the @option{-M} option, under the hood, @command{gawk} uses +GNU MP arbitrary precision integers which have at least 64 bits of precision. +When not using @option{-M}, @command{gawk} stores integral values in +regular double-precision floating point, which only maintain 53 bits of +precision. Furthermore, the GNU MP library treats (or at least seems to treat) +the leading bit as a sign bit; thus the result with @option{-M} in this case is +a negative number. + +In short, using @command{gawk} for any but the simplest kind of bitwise +operations is probably a bad idea; caveat emptor! + +@end sidebar + @node Type Functions @subsection Getting Type Information -@command{gawk} provides a single function that lets you distinguish -an array from a scalar variable. This is necessary for writing code +@command{gawk} provides two functions that let you distinguish +the type of a variable. +This is necessary for writing code that traverses every element of an array of arrays -(@pxref{Arrays of Arrays}). +(@pxref{Arrays of Arrays}), and in other contexts. @table @code @cindexgawkfunc{isarray} @cindex scalar or array @item isarray(@var{x}) Return a true value if @var{x} is an array. Otherwise, return false. + +@cindexgawkfunc{typeof} +@cindex variable type +@cindex type, of variable +@item typeof(@var{x}) +Return one of the following strings, depending upon the type of @var{x}: + +@c nested table +@table @code +@item "array" +@var{x} is an array. + +@item "regexp" +@var{x} is a strongly typed regexp (@pxref{Strong Regexp Constants}). + +@item "number" +@var{x} is a number. + +@item "string" +@var{x} is a string. + +@item "strnum" +@var{x} is a number that started life as user input, such as a field or +the result of calling @code{split()}. (I.e., @var{x} has the strnum +attribute; @pxref{Variable Typing}.) + +@item "unassigned" +@var{x} is a scalar variable that has not been assigned a value yet. +For example: + +@example +BEGIN @{ + # creates a[1] but it has no assigned value + a[1] + print typeof(a[1]) # unassigned +@} +@end example + +@item "untyped" +@var{x} has not yet been used yet at all; it can become a scalar or an +array. +For example: + +@example +BEGIN @{ + print typeof(x) # x never used --> untyped + mk_arr(x) + print typeof(x) # x now an array --> array +@} + +function mk_arr(a) @{ a[1] = 1 @} +@end example + +@end table @end table @code{isarray()} is meant for use in two circumstances. 
The first is when @@ -18392,6 +19117,14 @@ that has not been previously used to @code{isarray()}, @command{gawk} ends up turning it into a scalar. @end quotation +The @code{typeof()} function is general; it allows you to determine +if a variable or function parameter is a scalar, an array, or a strongly +typed regexp. + +@code{isarray()} is deprecated; you should use @code{typeof()} instead. +You should replace any existing uses of @samp{isarray(var)} in your +code with @samp{typeof(var) == "array"}. + @node I18N Functions @subsection String-Translation Functions @cindex @command{gawk}, string-translation functions @@ -26622,9 +27355,16 @@ your program to hang. (Thus, this particular feature is of much less use in practice than being able to close the @code{"to"} end.) @quotation CAUTION -It is a fatal error to write to the @code{"to"} end of a two-way -pipe which has been closed. It is also a fatal error to read +Normally, +it is a fatal error to write to the @code{"to"} end of a two-way +pipe which has been closed, and it is also a fatal error to read from the @code{"from"} end of a two-way pipe that has been closed. + +You may set @code{PROCINFO["@var{command}", "NONFATAL"]} to +make such operations become nonfatal. If you do so, you then need +to check @code{ERRNO} after each @code{print}, @code{printf}, +or @code{getline}. +@xref{Nonfatal}, for more information. @end quotation @cindex @command{gawk}, @code{PROCINFO} array in @@ -27008,8 +27748,7 @@ The profiled version of your program may not look exactly like what you typed when you wrote it. This is because @command{gawk} creates the profiled version by ``pretty-printing'' its internal representation of the program. The advantage to this is that @command{gawk} can produce -a standard representation. The disadvantage is that all source code -comments are lost. +a standard representation. Also, things such as: @example @@ -27103,10 +27842,39 @@ When called this way, @command{gawk} ``pretty-prints'' the program into @file{awkprof.out}, without any execution counts. @quotation NOTE -The @option{--pretty-print} option still runs your program. -This will change in the next major release. +Once upon a time, the @option{--pretty-print} option would also run +your program. This is is no longer the case. @end quotation +There is a significant difference between the output created when +profiling, and that created when pretty-printing. Pretty-printed output +preserves the original comments that were in the program, although their +placement may not correspond exactly to their original locations in the +source code.@footnote{@command{gawk} does the best it can to preserve +the distinction between comments at the end of a statement and comments +on lines by themselves. Due to implementation constraints, it does not +always do so correctly, particularly for @code{switch} statements. The +@command{gawk} maintainers hope to improve this in a subsequent +release.} + +However, as a deliberate design decision, profiling output @emph{omits} +the original program's comments. This allows you to focus on the +execution count data and helps you avoid the temptation to use the +profiler for pretty-printing. + +Additionally, pretty-printed output does not have the leading indentation +that the profiling output does. This makes it easy to pretty-print your +code once development is completed, and then use the result as the final +version of your program. 
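For example, assuming a program in @file{myprog.awk} and data in
@file{mydata.txt} (both names are arbitrary), you might pretty-print the
program into a file of your choosing, or profile a run of it:

@example
$ @kbd{gawk --pretty-print=myprog-clean.awk -f myprog.awk}
$ @kbd{gawk --profile=myprog.prof -f myprog.awk mydata.txt}
@end example

The first command only reformats the program into @file{myprog-clean.awk},
without running it; the second runs the program on @file{mydata.txt} and
writes the execution counts to @file{myprog.prof}.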
+ +Because the internal representation of your program is formatted to +recreate an @command{awk} program, profiling and pretty-printing +automatically disable @command{gawk}'s default optimizations. + +Pretty printing also preserves the original format of numeric +constants; if you used an octal or hexadecimal value in your source +code, it will appear that way in the output. + @node Advanced Features Summary @section Summary @@ -27147,8 +27915,7 @@ you tune them more easily. Sending the @code{USR1} signal while profiling cause @command{gawk} to dump the profile and keep going, including a function call stack. @item -You can also just ``pretty-print'' the program. This currently also runs -the program, but that will change in the next major release. +You can also just ``pretty-print'' the program. @end itemize @@ -29341,6 +30108,68 @@ The @command{gawk} debugger only accepts source code supplied with the @option{- @end itemize @ignore +@c 11/2016: This no longer applies after all the type cleanup work that's been done. +One other point is worth discussing. Conventional debuggers run in a +separate process (and thus address space) from the programs that they +debug (the @dfn{debuggee}, if you will). + +The @command{gawk} debugger is different; it is an integrated part +of @command{gawk} itself. This makes it possible, in rare cases, +for @command{gawk} to become an excellent demonstrator of Heisenberg +Uncertainty physics, where the mere act of observing something can change +it. Consider the following:@footnote{Thanks to Hermann Peifer for +this example.} + +@example +$ @kbd{cat test.awk} +@print{} @{ print typeof($1), typeof($2) @} +$ @kbd{cat test.data} +@print{} abc 123 +$ @kbd{gawk -f test.awk test.data} +@print{} strnum strnum +@end example + +This is all as expected: field data has the STRNUM attribute +(@pxref{Variable Typing}). Now watch what happens when we run +this program under the debugger: + +@example +$ @kbd{gawk -D -f test.awk test.data} +gawk> @kbd{w $1} @ii{Set watchpoint on} $1 +@print{} Watchpoint 1: $1 +gawk> @kbd{w $2} @ii{Set watchpoint on} $2 +@print{} Watchpoint 2: $2 +gawk> @kbd{r} @ii{Start the program} +@print{} Starting program: +@print{} Stopping in Rule ... +@print{} Watchpoint 1: $1 @ii{Watchpoint fires} +@print{} Old value: "" +@print{} New value: "abc" +@print{} main() at `test.awk':1 +@print{} 1 @{ print typeof($1), typeof($2) @} +gawk> @kbd{n} @ii{Keep going @dots{}} +@print{} Watchpoint 2: $2 @ii{Watchpoint fires} +@print{} Old value: "" +@print{} New value: "123" +@print{} main() at `test.awk':1 +@print{} 1 @{ print typeof($1), typeof($2) @} +gawk> @kbd{n} @ii{Get result from} typeof() +@print{} strnum number @ii{Result for} $2 @ii{isn't right} +@print{} Program exited normally with exit value: 0 +gawk> @kbd{quit} +@end example + +In this case, the act of comparing the new value of @code{$2} +with the old one caused @command{gawk} to evaluate it and determine that it +is indeed a number, and this is reflected in the result of +@code{typeof()}. + +Cases like this where the debugger is not transparent to the program's +execution should be rare. If you encounter one, please report it +(@pxref{Bugs}). +@end ignore + +@ignore Look forward to a future release when these and other missing features may be added, and of course feel free to try to add them yourself! @end ignore @@ -29376,6 +30205,10 @@ If the GNU Readline library is available when @command{gawk} is compiled, it is used by the debugger to provide command-line history and editing. 
+@item +Usually, the debugger does not not affect the +program being debugged, but occasionally it can. + @end itemize @node Arbitrary Precision Arithmetic @@ -29408,6 +30241,7 @@ this is the place to be. * FP Math Caution:: Things to know. * Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with @command{gawk}. +* Checking for MPFR:: How to check if MPFR is available. * POSIX Floating Point Problems:: Standards Versus Existing Practice. * Floating point summary:: Summary of floating point discussion. @end menu @@ -30193,6 +31027,174 @@ to just use the following: gawk -M 'BEGIN @{ n = 13; print n % 2 @}' @end example +When dividing two arbitrary precision integers with either +@samp{/} or @samp{%}, the result is typically an arbitrary +precision floating point value (unless the denominator evenly +divides into the numerator). In order to do integer division +or remainder with arbitrary precision integers, use the built-in +@code{intdiv()} function (@pxref{Numeric Functions}). + +You can simulate the @code{intdiv()} function in standard @command{awk} +using this user-defined function: + +@example +@c file eg/lib/intdiv.awk +# intdiv --- do integer division + +@c endfile +@ignore +@c file eg/lib/intdiv.awk +# +# Arnold Robbins, arnold@@skeeve.com, Public Domain +# July, 2014 +# +# Name changed from div() to intdiv() +# April, 2015 + +@c endfile + +@end ignore +@c file eg/lib/intdiv.awk +function intdiv(numerator, denominator, result) +@{ + split("", result) + + numerator = int(numerator) + denominator = int(denominator) + result["quotient"] = int(numerator / denominator) + result["remainder"] = int(numerator % denominator) + + return 0.0 +@} +@c endfile +@end example + +The following example program, contributed by Katie Wasserman, +uses @code{intdiv()} to +compute the digits of @value{PI} to as many places as you +choose to set: + +@example +@c file eg/prog/pi.awk +# pi.awk --- compute the digits of pi +@c endfile +@c endfile +@ignore +@c file eg/prog/pi.awk +# +# Katie Wasserman, katie@@wass.net +# August 2014 +@c endfile +@end ignore +@c file eg/prog/pi.awk + +BEGIN @{ + digits = 100000 + two = 2 * 10 ^ digits + pi = two + for (m = digits * 4; m > 0; --m) @{ + d = m * 2 + 1 + x = pi * m + intdiv(x, d, result) + pi = result["quotient"] + pi = pi + two + @} + print pi +@} +@c endfile +@end example + +@ignore +Date: Wed, 20 Aug 2014 10:19:11 -0400 +To: arnold@skeeve.com +From: Katherine Wasserman <katie@wass.net> +Subject: Re: computation of digits of pi? + +Arnold, + +>The program that you sent to compute the digits of pi using div(). Is +>that some standard algorithm that every math student knows? If so, +>what's it called? + +It's not that well known but it's not that obscure either + +It's Euler's modification to Newton's method for calculating pi. + +Take a look at lines (23) - (25) here: http://mathworld.wolfram.com/PiFormulas.htm + +The algorithm I wrote simply expands the multiply by 2 and works from the innermost expression outwards. I used this to program HP calculators because it's quite easy to modify for tiny memory devices with smallish word sizes. + +http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899 + +-Katie +@end ignore + +When asked about the algorithm used, Katie replied: + +@quotation +It's not that well known but it's not that obscure either. +It's Euler's modification to Newton's method for calculating pi. +Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.html}. 
+ +The algorithm I wrote simply expands the multiply by 2 and works from +the innermost expression outwards. I used this to program HP calculators +because it's quite easy to modify for tiny memory devices with smallish +word sizes. See +@uref{http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899}. +@end quotation + +@node Checking for MPFR +@section How To Check If MPFR Is Available + +@cindex MPFR, checking availability of +@cindex checking for MPFR +Occasionally, you might like to be able to check if @command{gawk} +was invoked with the @option{-M} option, enabling arbitrary-precision +arithmetic. You can do so with the following function, contributed +by Andrew Schorr: + +@example +@c file eg/lib/have_mpfr.awk +# adequate_math_precision --- return true if we have enough bits +@c endfile +@ignore +@c file eg/lib/have_mpfr.awk +# +# Andrew Schorr, aschorr@@telemetry-investments.com, Public Domain +# May 2017 +@c endfile +@end ignore +@c file eg/lib/have_mpfr.awk + +function adequate_math_precision(n) +@{ + return (1 != (1+(1/(2^(n-1))))) +@} +@c endfile +@end example + +Here is code that invokes the function in order to check +if arbitrary-precision arithmetic is available: + +@example +BEGIN @{ + # How many bits of mantissa precision are required + # for this program to function properly? + fpbits = 123 + + # We hope that we were invoked with MPFR enabled. If so, the + # following statement should configure calculations to our desired + # precision. + PREC = fpbits + + if (! adequate_math_precision(fpbits)) @{ + print("Error: insufficient computation precision available.\n" \ + "Try again with the -M argument?") > "/dev/stderr" + exit 1 + @} +@} +@end example + @node POSIX Floating Point Problems @section Standards Versus Existing Practice @@ -30592,8 +31594,11 @@ This (rather large) @value{SECTION} describes the API in detail. * Symbol Table Access:: Functions for accessing global variables. * Array Manipulation:: Functions for working with arrays. +* Redirection API:: How to access and manipulate + redirections. * Extension API Variables:: Variables provided by the API. * Extension API Boilerplate:: Boilerplate code for using the API. +* Changes from API V1:: Changes from V1 of the API. @end menu @node Extension API Functions Introduction @@ -30667,6 +31672,10 @@ Clearing an array @item Flattening an array for easy C-style looping over all its indices and elements @end itemize + +@item +Accessing and manipulating redirections. + @end itemize Some points about using the API: @@ -30720,14 +31729,26 @@ and is managed by @command{gawk} from then on. The API defines several simple @code{struct}s that map values as seen from @command{awk}. A value can be a @code{double}, a string, or an array (as in multidimensional arrays, or when creating a new array). + String values maintain both pointer and length, because embedded @sc{nul} characters are allowed. @quotation NOTE -By intent, strings are maintained using the current multibyte encoding (as -defined by @env{LC_@var{xxx}} environment variables) and not using wide -characters. This matches how @command{gawk} stores strings internally -and also how characters are likely to be input into and output from files. +By intent, @command{gawk} maintains strings using the current multibyte +encoding (as defined by @env{LC_@var{xxx}} environment variables) +and not using wide characters. This matches how @command{gawk} stores +strings internally and also how characters are likely to be input into +and output from files. 
+@end quotation + +@quotation NOTE +String values passed to an extension by @command{gawk} are always +@sc{nul}-terminated. Thus it is safe to pass such string values to +standard library and system routines. However, because @command{gawk} +allows embedded @sc{nul} characters in string data, before using the data +as a regular C string, you should check that the length for that string +passed to the extension matches the return value of @code{strlen()} +for it. @end quotation @item @@ -30810,6 +31831,8 @@ multibyte encoding. @itemx @ @ @ @ AWK_UNDEFINED, @itemx @ @ @ @ AWK_NUMBER, @itemx @ @ @ @ AWK_STRING, +@itemx @ @ @ @ AWK_REGEX, +@itemx @ @ @ @ AWK_STRNUM, @itemx @ @ @ @ AWK_ARRAY, @itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */ @itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ @ /* for updating a previously created value */ @@ -30832,6 +31855,8 @@ The @code{val_type} member indicates what kind of value the @code{union} holds, and each member is of the appropriate type. @item #define str_value@ @ @ @ @ @ u.s +@itemx #define strnum_value@ @ @ str_value +@itemx #define regex_value@ @ @ @ str_value @itemx #define num_value@ @ @ @ @ @ u.d @itemx #define array_cookie@ @ @ u.a @itemx #define scalar_cookie@ @ u.scl @@ -30852,7 +31877,7 @@ and in more detail in @ref{Cached values}. @end table -Scalar values in @command{awk} are either numbers or strings. The +Scalar values in @command{awk} are numbers, strings, strnums, or typed regexps. The @code{awk_value_t} struct represents values. The @code{val_type} member indicates what is in the @code{union}. @@ -30861,6 +31886,26 @@ require more work. Because @command{gawk} allows embedded @sc{nul} bytes in string values, a string must be represented as a pair containing a data pointer and length. This is the @code{awk_string_t} type. +A strnum (numeric string) value is represented as a string and consists +of user input data that appears to be numeric. +When an extension creates a strnum value, the result is a string flagged +as user input. Subsequent parsing by @command{gawk} then determines whether it +looks like a number and should be treated as a strnum, or as a regular string. + +This is useful in cases where an extension function would like to do something +comparable to the @code{split()} function which sets the strnum attribute +on the array elements it creates. For example, an extension that implements +CSV splitting would want to use this feature. This is also useful for a +function that retrieves a data item from a database. The PostgreSQL +@code{PQgetvalue()} function, for example, returns a string that may be numeric +or textual depending on the contents. + +Typed regexp values (@pxref{Strong Regexp Constants}) are not of +much use to extension functions. Extension functions can tell that +they've received them, and create them for scalar values. Otherwise, +they can examine the text of the regexp through @code{regex_value.str} +and @code{regex_value.len}. + Identifiers (i.e., the names of global variables) can be associated with either scalar values or with arrays. In addition, @command{gawk} provides true arrays of arrays, where any given array element can @@ -30947,8 +31992,8 @@ to use its version of @code{free()} when the memory came from an unrelated version of @code{malloc()}, unexpected behavior would likely result. 
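To make the earlier note about embedded @sc{nul} characters concrete,
an extension that wants to hand string data to routines expecting
ordinary C strings might guard itself with a small check such as the
following sketch (the helper name is invented for illustration, and the
usual @file{gawkapi.h} boilerplate is assumed):

@example
#include <string.h>

/* string_is_nul_free --- true if the data is safe to use as a C string */

static awk_bool_t
string_is_nul_free(const awk_value_t *val)
@{
    /*
     * gawk NUL-terminates the data it passes in, so strlen() stops at
     * the first NUL byte.  If that is shorter than the stated length,
     * the value contains embedded NUL characters.
     */
    return (strlen(val->str_value.str) == val->str_value.len
            ? awk_true : awk_false);
@}
@end example

If the two lengths do not match, the extension should treat the value
as a (pointer, length) pair instead of as an ordinary C string.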
-Two convenience macros may be used for allocating storage -from @code{gawk_malloc()} and +Three convenience macros may be used for allocating storage +from @code{gawk_malloc()}, @code{gawk_calloc}, and @code{gawk_realloc()}. If the allocation fails, they cause @command{gawk} to exit with a fatal error message. They should be used as if they were procedure calls that do not return a value: @@ -30987,6 +32032,12 @@ strcpy(message, greet); make_malloced_string(message, strlen(message), & result); @end example +@item #define ezalloc(pointer, type, size, message) @dots{} +This is like @code{emalloc()}, but it calls @code{gawk_calloc()} +instead of @code{gawk_malloc()}. +The arguments are the same as for the @code{emalloc()} macro, but this +macro guarantees that the memory returned is initialized to zero. + @item #define erealloc(pointer, type, size, message) @dots{} This is like @code{emalloc()}, but it calls @code{gawk_realloc()} instead of @code{gawk_malloc()}. @@ -31027,6 +32078,31 @@ It returns @code{result}. @itemx make_number(double num, awk_value_t *result); This function simply creates a numeric value in the @code{awk_value_t} variable pointed to by @code{result}. + +@item static inline awk_value_t * +@itemx make_const_user_input(const char *string, size_t length, awk_value_t *result); +This function is identical to @code{make_const_string()}, but the string is +flagged as user input that should be treated as a strnum value if the contents +of the string are numeric. + +@item static inline awk_value_t * +@itemx make_malloced_user_input(const char *string, size_t length, awk_value_t *result); +This function is identical to @code{make_malloced_string()}, but the string is +flagged as user input that should be treated as a strnum value if the contents +of the string are numeric. + +@item static inline awk_value_t * +@itemx make_const_regex(const char *string, size_t length, awk_value_t *result); +This function creates a strongly typed regexp value by allocating a copy of the string. +@code{string} is the regular expression of length @code{len}. + +@item static inline awk_value_t * +@itemx make_malloced_regex(const char *string, size_t length, awk_value_t *result); +This function creates a strongly typed regexp value. @code{string} is +the regular expression of length @code{len}. It expects @code{string} +to be a @samp{char *} value pointing to data previously obtained from +@code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}. + @end table @node Registration Functions @@ -31054,8 +32130,13 @@ Extension functions are described by the following record: @example typedef struct awk_ext_func @{ @ @ @ @ const char *name; -@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result); -@ @ @ @ size_t num_expected_args; +@ @ @ @ awk_value_t *(*const function)(int num_actual_args, +@ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result, +@ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct awk_ext_func *finfo); +@ @ @ @ const size_t max_expected_args; +@ @ @ @ const size_t min_required_args; +@ @ @ @ awk_bool_t suppress_lint; +@ @ @ @ void *data; /* opaque pointer to any extra state */ @} awk_ext_func_t; @end example @@ -31073,36 +32154,94 @@ or an underscore, which may be followed by any number of letters, digits, and underscores. Letter case in function names is significant. 
-@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result); +@item awk_value_t *(*const function)(int num_actual_args, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct awk_ext_func *finfo); This is a pointer to the C function that provides the extension's functionality. -The function must fill in @code{*result} with either a number -or a string. @command{gawk} takes ownership of any string memory. +The function must fill in @code{*result} with either a number, +a string, or a regexp. +@command{gawk} takes ownership of any string memory. As mentioned earlier, string memory @emph{must} come from one of @code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}. The @code{num_actual_args} argument tells the C function how many actual parameters were passed from the calling @command{awk} code. +The @code{finfo} parameter is a pointer to the @code{awk_ext_func_t} for +this function. The called function may access data within it as desired, or not. + The function must return the value of @code{result}. This is for the convenience of the calling code inside @command{gawk}. -@item size_t num_expected_args; -This is the number of arguments the function expects to receive. -Each extension function may decide what to do if the number of -arguments isn't what it expected. As with real @command{awk} functions, it -is likely OK to ignore extra arguments. +@item const size_t max_expected_args; +This is the maximum number of arguments the function expects to receive. +If called with more arguments than this, and if lint checking has +been enabled, then @command{gawk} prints a warning message. For more +information, see the entry for @code{suppress_lint}, later in this list. + +@item const size_t min_required_args; +This is the minimum number of arguments the function expects to receive. +If called with fewer arguments, @command{gawk} prints a fatal error +message and exits. + +@item awk_bool_t suppress_lint; +This flag tells @command{gawk} not to print a lint message if lint +checking has been enabled and if more arguments were supplied in the call +than expected. An extension function can tell if @command{gawk} already +printed at least one such message by checking if @samp{num_actual_args > +finfo->max_expected_args}. If so, and the function does not want more +lint messages to be printed, it should set @code{finfo->suppress_lint} +to @code{awk_true}. + +@item void *data; +This is an opaque pointer to any data that an extension function may +wish to have available when called. Passing the @code{awk_ext_func_t} +structure to the extension function, and having this pointer available +in it enable writing a single C or C++ function that implements multiple +@command{awk}-level extension functions. @end table Once you have a record representing your extension function, you register it with @command{gawk} using this API function: @table @code -@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func); +@item awk_bool_t add_ext_func(const char *namespace, awk_ext_func_t *func); This function returns true upon success, false otherwise. The @code{namespace} parameter is currently not used; you should pass in an empty string (@code{""}). The @code{func} pointer is the address of a @code{struct} representing your function, as just described. 
+ +@command{gawk} does not modify what @code{func} points to, but the +extension function itself receives this pointer and can modify what it +points to, thus it is purposely not declared to be @code{const}. +@end table + +The combination of @code{min_required_args}, @code{max_expected_args}, +and @code{suppress_lint} may be confusing. Here is how you should +set things up. + +@table @asis +@item Any number of arguments is valid +Set @code{min_required_args} and @code{max_expected_args} to zero and +set @code{suppress_lint} to @code{awk_true}. + +@item A minimum number of arguments is required, no limit on maximum number of arguments +Set @code{min_required_args} to the minimum required. Set +@code{max_expected_args} to zero and +set @code{suppress_lint} to @code{awk_true}. + +@item A minimum number of arguments is required, a maximum number is expected +Set @code{min_required_args} to the minimum required. Set +@code{max_expected_args} to the maximum expected. +Set @code{suppress_lint} to @code{awk_false}. + +@item A minimum number of arguments is required, and no more than a maximum is allowed +Set @code{min_required_args} to the minimum required. Set +@code{max_expected_args} to the maximum expected. +Set @code{suppress_lint} to @code{awk_false}. +In your extension function, check that @code{num_actual_args} does not +exceed @code{f->max_expected_args}. If it does, issue a fatal error message. @end table @node Exit Callback Functions @@ -31242,7 +32381,8 @@ typedef struct awk_input @{ #define INVALID_HANDLE (-1) void *opaque; /* private data for input parsers */ int (*get_record)(char **out, struct awk_input *iobuf, - int *errcode, char **rt_start, size_t *rt_len); + int *errcode, char **rt_start, size_t *rt_len, + const awk_fieldwidth_info_t **field_width); ssize_t (*read_func)(); void (*close_func)(struct awk_input *iobuf); struct stat sbuf; /* stat buf */ @@ -31294,7 +32434,8 @@ is not required to use this pointer. @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len); +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_fieldwidth_info_t **field_width); This function pointer should point to a function that creates the input records. Said function is the core of the input parser. Its behavior is described in the text following this list. @@ -31346,6 +32487,21 @@ If the concept of a ``record terminator'' makes sense, then data. Otherwise, @code{*rt_len} should be set to zero. @command{gawk} makes its own copy of this data, so the extension must manage this storage. + +@item const awk_fieldwidth_info_t **field_width +If @code{field_width} is not @code{NULL}, then @code{*field_width} will be initialized +to @code{NULL}, and the function may set it to point to a structure +supplying field width information to override the default +field parsing mechanism. Note that this structure will not +be copied by @command{gawk}; it must persist at least until the next call +to @code{get_record} or @code{close_func}. Note also that @code{field_width} is +@code{NULL} when @code{getline} is assigning the results to a variable, thus +field parsing is not needed. 
If the parser does set @code{*field_width}, +then @command{gawk} uses this layout to parse the input record, +and the @code{PROCINFO["FS"]} value will be @code{"API"} while this record +is active in @code{$0}. +The @code{awk_fieldwidth_info_t} data structure +is described below. @end table The return value is the length of the buffer pointed to by @@ -31404,6 +32560,50 @@ Register the input parser pointed to by @code{input_parser} with @command{gawk}. @end table +If you would like to override the default field parsing mechanism for a given +record, then you must populate an @code{awk_fieldwidth_info_t} structure, +which looks like this: + +@example +typedef struct @{ + awk_bool_t use_chars; /* false ==> use bytes */ + size_t nf; /* number of fields in record (NF) */ + struct awk_field_info @{ + size_t skip; /* amount to skip before field starts */ + size_t len; /* length of field */ + @} fields[1]; /* actual dimension should be nf */ +@} awk_fieldwidth_info_t; +@end example + +The fields are: + +@table @code +@item awk_bool_t use_chars; +Set this to @code{awk_true} if the field lengths are specified in terms +of potentially multi-byte characters, and set it to @code{awk_false} if +the lengths are in terms of bytes. +Performance will be better if the values are supplied in +terms of bytes. + +@item size_t nf; +Set this to the number of fields in the input record, i.e. @code{NF}. + +@item struct awk_field_info fields[nf]; +This is a variable-length array whose actual dimension should be @code{nf}. +For each field, the @code{skip} element should be set to the number +of characters or bytes, as controlled by the @code{use_chars} flag, +to skip before the start of this field. The @code{len} element provides +the length of the field. The values in @code{fields[0]} provide the information +for @code{$1}, and so on through the @code{fields[nf-1]} element containing the information for @code{$NF}. +@end table + +A convenience macro @code{awk_fieldwidth_info_size(NF)} is provided to +calculate the appropriate size of a variable-length +@code{awk_fieldwidth_info_t} structure containing @code{NF} fields. This can +be used as an argument to @code{malloc()} or in a union to allocate space +statically. Please refer to the @code{readdir_test} sample extension for an +example. + @node Output Wrappers @subsubsection Customized Output Wrappers @cindex customized output wrapper @@ -31594,6 +32794,9 @@ that parameter. More's the pity.} @item void fatal(awk_ext_id_t id, const char *format, ...); Print a message and then cause @command{gawk} to exit immediately. +@item void nonfatal(awk_ext_id_t id, const char *format, ...); +Print a nonfatal error message. + @item void warning(awk_ext_id_t id, const char *format, ...); Print a warning message. @@ -31646,21 +32849,25 @@ value type, as appropriate. 
This behavior is summarized in @caption{API value types returned} @docbook <informaltable> -<tgroup cols="6"> - <colspec colwidth="16.6*"/> - <colspec colwidth="16.6*"/> - <colspec colwidth="19.8*" colname="c3"/> - <colspec colwidth="15*" colname="c4"/> - <colspec colwidth="15*" colname="c5"/> - <colspec colwidth="16.6*" colname="c6"/> - <spanspec spanname="hspan" namest="c3" nameend="c6" align="center"/> +<tgroup cols="8"> + <colspec colname="c1"/> + <colspec colname="c2"/> + <colspec colname="c3"/> + <colspec colname="c4"/> + <colspec colname="c5"/> + <colspec colname="c6"/> + <colspec colname="c7"/> + <colspec colname="c8"/> + <spanspec spanname="hspan" namest="c3" nameend="c8" align="center"/> <thead> <row><entry></entry><entry spanname="hspan"><para>Type of Actual Value</para></entry></row> <row> <entry></entry> <entry></entry> <entry><para>String</para></entry> + <entry><para>Strnum</para></entry> <entry><para>Number</para></entry> + <entry><para>Regex</para></entry> <entry><para>Array</para></entry> <entry><para>Undefined</para></entry> </row> @@ -31671,48 +32878,80 @@ value type, as appropriate. This behavior is summarized in <entry><para><emphasis role="bold">String</emphasis></para></entry> <entry><para>String</para></entry> <entry><para>String</para></entry> - <entry><para>False</para></entry> - <entry><para>False</para></entry> + <entry><para>String</para></entry> + <entry><para>String</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry></entry> + <entry><para><emphasis role="bold">Strnum</emphasis></para></entry> + <entry><para>false</para></entry> + <entry><para>Strnum</para></entry> + <entry><para>Strnum</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> </row> <row> <entry></entry> <entry><para><emphasis role="bold">Number</emphasis></para></entry> - <entry><para>Number if can be converted, else false</para></entry> <entry><para>Number</para></entry> - <entry><para>False</para></entry> - <entry><para>False</para></entry> + <entry><para>Number</para></entry> + <entry><para>Number</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> </row> <row> <entry><para><emphasis role="bold">Type</emphasis></para></entry> + <entry><para><emphasis role="bold">Regex</emphasis></para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>Regex</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry><para><emphasis role="bold">Requested</emphasis></para></entry> <entry><para><emphasis role="bold">Array</emphasis></para></entry> - <entry><para>False</para></entry> - <entry><para>False</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> <entry><para>Array</para></entry> - <entry><para>False</para></entry> + <entry><para>false</para></entry> </row> <row> - <entry><para><emphasis role="bold">Requested</emphasis></para></entry> + <entry></entry> <entry><para><emphasis role="bold">Scalar</emphasis></para></entry> <entry><para>Scalar</para></entry> <entry><para>Scalar</para></entry> - <entry><para>False</para></entry> - <entry><para>False</para></entry> + <entry><para>Scalar</para></entry> + <entry><para>Scalar</para></entry> + <entry><para>false</para></entry> + 
<entry><para>false</para></entry> </row> <row> <entry></entry> <entry><para><emphasis role="bold">Undefined</emphasis></para></entry> <entry><para>String</para></entry> + <entry><para>Strnum</para></entry> <entry><para>Number</para></entry> + <entry><para>Regex</para></entry> <entry><para>Array</para></entry> <entry><para>Undefined</para></entry> </row> <row> <entry></entry> <entry><para><emphasis role="bold">Value cookie</emphasis></para></entry> - <entry><para>False</para></entry> - <entry><para>False</para></entry> - <entry><para>False</para> - </entry><entry><para>False</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> </row> </tbody> </tgroup> @@ -31728,41 +32967,45 @@ value type, as appropriate. This behavior is summarized in @tex \vglue-1.1\baselineskip @end tex -@multitable @columnfractions .166 .166 .198 .15 .15 .166 -@headitem @tab @tab String @tab Number @tab Array @tab Undefined -@item @tab @b{String} @tab String @tab String @tab False @tab False -@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab False @tab False -@item @b{Type} @tab @b{Array} @tab False @tab False @tab Array @tab False -@item @b{Requested} @tab @b{Scalar} @tab Scalar @tab Scalar @tab False @tab False -@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined -@item @tab @b{Value cookie} @tab False @tab False @tab False @tab False +@c @multitable @columnfractions .166 .166 .198 .15 .15 .166 +@multitable {Requested} {Undefined} {Number} {Number} {Scalar} {Regex} {Array} {Undefined} +@headitem @tab @tab String @tab Strnum @tab Number @tab Regex @tab Array @tab Undefined +@item @tab @b{String} @tab String @tab String @tab String @tab String @tab false @tab false +@item @tab @b{Strnum} @tab false @tab Strnum @tab Strnum @tab false @tab false @tab false +@item @tab @b{Number} @tab Number @tab Number @tab Number @tab false @tab false @tab false +@item @b{Type} @tab @b{Regex} @tab false @tab false @tab false @tab Regex @tab false @tab false +@item @b{Requested} @tab @b{Array} @tab false @tab false @tab false @tab false @tab Array @tab false +@item @tab @b{Scalar} @tab Scalar @tab Scalar @tab Scalar @tab Scalar @tab false @tab false +@item @tab @b{Undefined} @tab String @tab Strnum @tab Number @tab Regex @tab Array @tab Undefined +@item @tab @b{Value cookie} @tab false @tab false @tab false @tab false @tab false @tab false @end multitable @end ifnotdocbook @end ifnotplaintext @ifplaintext -@example - +-------------------------------------------------+ - | Type of Actual Value: | - +------------+------------+-----------+-----------+ - | String | Number | Array | Undefined | -+-----------+-----------+------------+------------+-----------+-----------+ -| | String | String | String | False | False | -| |-----------+------------+------------+-----------+-----------+ -| | Number | Number if | Number | False | False | -| | | can be | | | | -| | | converted, | | | | -| | | else false | | | | -| |-----------+------------+------------+-----------+-----------+ -| Type | Array | False | False | Array | False | -| Requested |-----------+------------+------------+-----------+-----------+ -| | Scalar | Scalar | Scalar | False | False | -| |-----------+------------+------------+-----------+-----------+ -| | Undefined | String | Number | Array | Undefined | -| 
|-----------+------------+------------+-----------+-----------+ -| | Value | False | False | False | False | -| | cookie | | | | | -+-----------+-----------+------------+------------+-----------+-----------+ -@end example +@verbatim + +-------------------------------------------------------+ + | Type of Actual Value: | + +--------+--------+--------+--------+-------+-----------+ + | String | Strnum | Number | Regex | Array | Undefined | ++-----------+-----------+--------+--------+--------+--------+-------+-----------+ +| | String | String | String | String | String | false | false | +| +-----------+--------+--------+--------+--------+-------+-----------+ +| | Strnum | false | Strnum | Strnum | false | false | false | +| +-----------+--------+--------+--------+--------+-------+-----------+ +| | Number | Number | Number | Number | false | false | false | +| +-----------+--------+--------+--------+--------+-------+-----------+ +| | Regex | false | false | false | Regex | false | false | +| Type +-----------+--------+--------+--------+--------+-------+-----------+ +| Requested | Array | false | false | false | false | Array | false | +| +-----------+--------+--------+--------+--------+-------+-----------+ +| | Scalar | Scalar | Scalar | Scalar | Scalar | false | false | +| +-----------+--------+--------+--------+--------+-------+-----------+ +| | Undefined | String | Strnum | Number | Regex | Array | Undefined | +| +-----------+--------+--------+--------+--------+-------+-----------+ +| | Value | false | false | false | false | false | false | +| | Cookie | | | | | | | ++-----------+-----------+--------+--------+--------+--------+-------+-----------+ +@end verbatim @end ifplaintext @end float @@ -31840,13 +33083,6 @@ An extension can look up the value of @command{gawk}'s special variables. However, with the exception of the @code{PROCINFO} array, an extension cannot change any of those variables. -@quotation CAUTION -It is possible for the lookup of @code{PROCINFO} to fail. This happens if -the @command{awk} program being run does not reference @code{PROCINFO}; -in this case, @command{gawk} doesn't bother to create the array and -populate it. -@end quotation - @node Symbol table by cookie @subsubsection Variable Access and Update by Cookie @@ -31868,7 +33104,7 @@ Return false if the value cannot be retrieved. @item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value); Update the value associated with a scalar cookie. Return false if -the new value is not of type @code{AWK_STRING} or @code{AWK_NUMBER}. +the new value is not of type @code{AWK_STRING}, @code{AWK_STRNUM}, @code{AWK_REGEX}, or @code{AWK_NUMBER}. Here too, the predefined variables may not be updated. @end table @@ -31989,7 +33225,7 @@ is what the routines in this @value{SECTION} let you do. The functions are as f @table @code @item awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result); Create a cached string or numeric value from @code{value} for -efficient later assignment. Only values of type @code{AWK_NUMBER} +efficient later assignment. Only values of type @code{AWK_NUMBER}, @code{AWK_REGEX}, @code{AWK_STRNUM}, and @code{AWK_STRING} are allowed. Any other type is rejected. @code{AWK_UNDEFINED} could be allowed, but doing so would result in inferior performance. @@ -32215,9 +33451,10 @@ The array remains an array, but after calling this function, it has no elements. This is equivalent to using the @code{delete} statement (@pxref{Delete}). 
-@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data); +@item awk_bool_t flatten_array_typed(awk_array_t a_cookie, awk_flat_array_t **data, awk_valtype_t index_type, awk_valtype_t value_type); For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t} -structure and fill it in. Set the pointer whose address is passed as @code{data} +structure and fill it in with indices and values of the requested types. +Set the pointer whose address is passed as @code{data} to point to this structure. Return true upon success, or false otherwise. @ifset FOR_PRINT @@ -32229,6 +33466,14 @@ See the next @value{SECTION} for a discussion of how to flatten an array and work with it. +@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data); +For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t} +structure and fill it in with @code{AWK_STRING} indices and +@code{AWK_UNDEFINED} values. +This is superseded by @code{flatten_array_typed()}. +It is provided as a macro, and remains for convenience and for source code +compatibility with the previous version of the API. + @item awk_bool_t release_flattened_array(awk_array_t a_cookie, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data); When done with a flattened array, release the storage using this function. @@ -32341,7 +33586,7 @@ to double-check that the count in the @code{awk_flat_array_t} is the same as the count just retrieved: @example - if (! flatten_array(value2.array_cookie, & flat_array)) @{ + if (! flatten_array_typed(value2.array_cookie, & flat_array, AWK_STRING, AWK_UNDEFINED)) @{ printf("dump_array_and_delete: could not flatten array\n"); goto out; @} @@ -32637,6 +33882,75 @@ $ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk} (@xref{Finding Extensions} for more information on the @env{AWKLIBPATH} environment variable.) +@node Redirection API +@subsection Accessing and Manipulating Redirections + +The following function allows extensions to access and manipulate redirections. + +@table @code +@item awk_bool_t get_file(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t name_len, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const char *filetype, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int fd, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_input_buf_t **ibufp, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_output_buf_t **obufp); +Look up file @code{name} in @command{gawk}'s internal redirection table. +If @code{name} is @code{NULL} or @code{name_len} is zero, return +data for the currently open input file corresponding to @code{FILENAME}. +(This does not access the @code{filetype} argument, so that may be undefined). +If the file is not already open, attempt to open it. +The @code{filetype} argument must be zero-terminated and should be one of: + +@table @code +@item ">" +A file opened for output. + +@item ">>" +A file opened for append. + +@item "<" +A file opened for input. + +@item "|>" +A pipe opened for output. + +@item "|<" +A pipe opened for input. + +@item "|&" +A two-way coprocess. +@end table + +On error, return an @code{awk_false} value. Otherwise, return +@code{awk_true}, and return additional information about the redirection +in the @code{ibufp} and @code{obufp} pointers. For input +redirections, the @code{*ibufp} value should be non-@code{NULL}, +and @code{*obufp} should be @code{NULL}. 
For output redirections, +the @code{*obufp} value should be non-@code{NULL}, and @code{*ibufp} +should be @code{NULL}. For two-way coprocesses, both values should +be non-@code{NULL}. + +In the usual case, the extension is interested in @code{(*ibufp)->fd} +and/or @code{fileno((*obufp)->fp)}. If the file is not already +open, and the @code{fd} argument is nonnegative, @command{gawk} +will use that file descriptor instead of opening the file in the +usual way. If @code{fd} is nonnegative, but the file exists already, +@command{gawk} ignores @code{fd} and returns the existing file. It is +the caller's responsibility to notice that neither the @code{fd} in +the returned @code{awk_input_buf_t} nor the @code{fd} in the returned +@code{awk_output_buf_t} matches the requested value. + +Note that supplying a file descriptor is currently @emph{not} supported +for pipes. However, supplying a file descriptor should work for input, +output, append, and two-way (coprocess) sockets. If @code{filetype} +is two-way, @command{gawk} assumes that it is a socket! Note that in +the two-way case, the input and output file descriptors may differ. +To check for success, you must check whether either matches. +@end table + +It is anticipated that this API function will be used to implement I/O +multiplexing and a socket library. + @node Extension API Variables @subsection API Variables @@ -32663,10 +33977,10 @@ debugging: @float Table,gawk-api-version @caption{gawk API version constants} -@multitable @columnfractions .33 .33 .33 -@headitem API Version @tab C preprocessor define @tab enum constant -@item Major @tab gawk_api_major_version @tab GAWK_API_MAJOR_VERSION -@item Minor @tab gawk_api_minor_version @tab GAWK_API_MINOR_VERSION +@multitable {@b{API Version}} {@code{gawk_api_major_version}} {@code{GAWK_API_MAJOR_VERSION}} +@headitem API Version @tab C Preprocessor Define @tab enum constant +@item Major @tab @code{gawk_api_major_version} @tab @code{GAWK_API_MAJOR_VERSION} +@item Minor @tab @code{gawk_api_minor_version} @tab @code{GAWK_API_MINOR_VERSION} @end multitable @end float @@ -32685,10 +33999,10 @@ constant integers: @table @code @item api->major_version -The major version of the running @command{gawk} +The major version of the running @command{gawk}. @item api->minor_version -The minor version of the running @command{gawk} +The minor version of the running @command{gawk}. @end table It is up to the extension to decide if there are API incompatibilities. @@ -32761,7 +34075,7 @@ static awk_ext_id_t ext_id; static const char *ext_version = NULL; /* or @dots{} = "some string" */ static awk_ext_func_t func_table[] = @{ - @{ "name", do_name, 1 @}, + @{ "name", do_name, 1, 0, awk_false, NULL @}, /* @dots{} */ @}; @@ -32862,6 +34176,19 @@ If @code{ext_version} is not @code{NULL}, register the version string with @command{gawk}. @end enumerate + +@node Changes from API V1 +@subsection Changes From Version 1 of the API + +The current API is @emph{not} binary compatible with version 1 of the API. +You will have to recompile your extensions in order to use them with +the current version of @command{gawk}. + +Fortunately, at the possible expense of some compile-time warnings, the API remains +source-code--compatible with the previous API. The major differences are +the additional members in the @code{awk_ext_func_t} structure, and the +addition of the third argument to the C implementation function. 
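As a minimal sketch of what the conversion involves (the
@code{do_sample()} function and its registration are invented for
illustration), an extension function and its @code{func_table} entry
now look like this:

@example
/* Under API version 1 this would have been declared as
 *     awk_value_t *do_sample(int nargs, awk_value_t *result);
 * and registered as @{ "sample", do_sample, 1 @} in func_table.  */

static awk_value_t *
do_sample(int nargs, awk_value_t *result, struct awk_ext_func *finfo)
@{
    (void) nargs;   /* unused in this trivial example */
    (void) finfo;   /* extra information is available but not needed */
    return make_number(0.0, result);
@}

static awk_ext_func_t func_table[] = @{
    /* name, function, max expected args, min required args,
       suppress lint, opaque data pointer */
    @{ "sample", do_sample, 1, 1, awk_false, NULL @},
@};
@end example

An extension function that has no use for the @code{finfo} pointer can
simply ignore it, as shown here.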
+ @node Finding Extensions @section How @command{gawk} Finds Extensions @cindex extension search path @@ -33102,17 +34429,12 @@ The second is a pointer to an @code{awk_value_t} structure, usually named /* do_chdir --- provide dynamically loaded chdir() function for gawk */ static awk_value_t * -do_chdir(int nargs, awk_value_t *result) +do_chdir(int nargs, awk_value_t *result, struct awk_ext_func *unused) @{ awk_value_t newdir; int ret = -1; assert(result != NULL); - - if (do_lint && nargs != 1) - lintwarn(ext_id, - _("chdir: called with incorrect number of arguments, " - "expecting 1")); @end example The @code{newdir} @@ -33121,8 +34443,8 @@ with @code{get_argument()}. Note that the first argument is numbered zero. If the argument is retrieved successfully, the function calls the -@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} -is updated: +@code{chdir()} system call. Otherwise, if the @code{chdir()} fails, +it updates @code{ERRNO}: @example if (get_argument(0, AWK_STRING, & newdir)) @{ @@ -33326,15 +34648,11 @@ is set to point to @code{stat()}, instead. Here is the @code{do_stat()} function, which starts with variable declarations and argument checking: -@ignore -Changed message for page breaking. Used to be: - "stat: called with incorrect number of arguments (%d), should be 2", -@end ignore @example /* do_stat --- provide a stat() function for gawk */ static awk_value_t * -do_stat(int nargs, awk_value_t *result) +do_stat(int nargs, awk_value_t *result, struct awk_ext_func *unused) @{ awk_value_t file_param, array_param; char *name; @@ -33345,13 +34663,6 @@ do_stat(int nargs, awk_value_t *result) int (*statfunc)(const char *path, struct stat *sbuf) = lstat; assert(result != NULL); - - if (nargs != 2 && nargs != 3) @{ - if (do_lint) - lintwarn(ext_id, - _("stat: called with wrong number of arguments")); - return make_number(-1, result); - @} @end example Then comes the actual work. First, the function gets the arguments. 
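In outline, that step looks something like the following sketch (the
code in the actual sample extension may differ in its details and
messages):

@example
    /* file is first argument, array to hold results is second */
    if (   ! get_argument(0, AWK_STRING, & file_param)
        || ! get_argument(1, AWK_ARRAY, & array_param)) @{
        warning(ext_id, _("stat: bad parameters"));
        return make_number(-1, result);
    @}
@end example

With the two arguments in hand, the function can go on to call
@code{stat()} (or @code{lstat()}) and fill in the result array.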
@@ -33419,11 +34730,9 @@ structures for loading each function into @command{gawk}: @example static awk_ext_func_t func_table[] = @{ - @{ "chdir", do_chdir, 1 @}, - @{ "stat", do_stat, 2 @}, -#ifndef __MINGW32__ - @{ "fts", do_fts, 3 @}, -#endif + @{ "chdir", do_chdir, 1, 1, awk_false, NULL @}, + @{ "stat", do_stat, 3, 2, awk_false, NULL @}, + @dots{} @}; @end example @@ -34238,18 +35547,21 @@ As of this writing, there are seven extensions: GD graphics library extension @item +MPFR library extension +(this provides access to a number of MPFR functions that @command{gawk}'s +native MPFR support does not) + +@item PDF extension @item PostgreSQL extension @item -MPFR library extension -(this provides access to a number of MPFR functions that @command{gawk}'s -native MPFR support does not) +Redis extension @item -Redis extension +Select extension @item XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} @@ -34349,7 +35661,7 @@ output wrappers, and two-way processors) @item -Printing fatal, warning, and ``lint'' warning messages +Printing fatal, nonfatal, warning, and ``lint'' warning messages @item Updating @code{ERRNO}, or unsetting it @@ -34878,6 +36190,10 @@ Indirect function calls @item Directories on the command line produce a warning and are skipped (@pxref{Command-line directories}) + +@item +Output with @code{print} and @code{printf} need not be fatal +(@pxref{Nonfatal}) @end itemize @item @@ -34965,6 +36281,11 @@ The @code{isarray()} function to check if a variable is an array or not The @code{bindtextdomain()}, @code{dcgettext()}, and @code{dcngettext()} functions for internationalization (@pxref{Programmer i18n}) + +@item +The @code{intdiv()} function for doing integer +division and remainder +(@pxref{Numeric Functions}) @end itemize @item @@ -35003,6 +36324,7 @@ The @option{-p}, @option{-P}, @option{-r}, +@option{-s}, @option{-S}, @option{-t}, and @@ -35027,6 +36349,7 @@ and the @option{--load}, @option{--non-decimal-data}, @option{--optimize}, +@option{--no-optimize}, @option{--posix}, @option{--pretty-print}, @option{--profile}, @@ -35097,6 +36420,19 @@ for @command{gawk} @value{PVERSION} 4.1: Ultrix @end itemize +@item +Support for the following systems was removed from the code +for @command{gawk} @value{PVERSION} 4.2: + +@c nested table +@itemize @value{MINUS} +@item +MirBSD + +@item +GNU/Linux on Alpha +@end itemize + @end itemize @c XXX ADD MORE STUFF HERE @@ -35723,6 +37059,56 @@ Support for Ultrix was removed. @end itemize +Version 4.2 introduced the following changes: + +@itemize @bullet +@item +Changes to @code{ENVIRON} are reflected into @command{gawk}'s +environment and that of programs that it runs. +@xref{Auto-set}. + +@item +The @code{PROCINFO["argv"} array. +@xref{Auto-set}. + +@item +The @option{--pretty-print} option no longer runs the @command{awk} +program too. +@xref{Options}. + +@item +The @command{igawk} program and its manual page are no longer +installed when @command{gawk} is built. +@xref{Igawk Program}. + +@item +The @code{intdiv()} function. +@xref{Numeric Functions}. + +@item +The maximum number of hexadecimal digits in @samp{\x} escapes +is now two. +@xref{Escape Sequences}. + +@item +Nonfatal output with @code{print} and @code{printf}. +@xref{Nonfatal}. + +@item +For many years, POSIX specified that default field splitting +only allowed spaces and tabs to separate fields, and this was +how @command{gawk} behaved with @option{--posix}. 
As of 2013, +the standard restored historical behavior, and now default +field splitting with @option{--posix} also allows newlines to +separate fields. + +@item +Support for MirBSD was removed. + +@item +Support for GNU/Linux on Alpha was removed. +@end itemize + @c XXX ADD MORE STUFF HERE @end ifclear @@ -35852,7 +37238,7 @@ and @uref{http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05, its rationale}.} By using this lovely technical term, the standard gives license -to implementors to implement ranges in whatever way they choose. +to implementers to implement ranges in whatever way they choose. The @command{gawk} maintainer chose to apply the pre-POSIX meaning both with the default regexp matching and when @option{--traditional} or @option{--posix} are used. @@ -36289,6 +37675,12 @@ These files contain the actual @command{gawk} source code. @end table @table @file +@item support/* +C header and source files for routines that @command{gawk} +uses, but that are not part of its core functionality. +For example, argument parsing, regular expression matching, +and random number generating routines are all kept here. + @item ABOUT-NLS A file containing information about GNU @command{gettext} and translations. @@ -36389,6 +37781,8 @@ The generated Info file for The @command{troff} source for a manual page describing the @command{igawk} program presented in @ref{Igawk Program}. +(Since @command{gawk} can do its own @code{@@include} processing, +neither @command{igawk} nor @file{igawk.1} are installed.) @item doc/Makefile.in The input file used during the configuration process to generate the @@ -36433,8 +37827,6 @@ source file for this @value{DOCUMENT}. It also contains a @file{Makefile.in} fil @file{Makefile.am} is used by GNU Automake to create @file{Makefile.in}. The library functions from @ref{Library Functions}, -and the @command{igawk} program from -@ref{Igawk Program} are included as ready-to-use files in the @command{gawk} distribution. They are installed as part of the installation process. The rest of the programs in this @value{DOCUMENT} are available in appropriate @@ -36445,6 +37837,12 @@ The source code, manual pages, and infrastructure files for the sample extensions included with @command{gawk}. @xref{Dynamic Extensions}, for more information. +@item extras/* +Additional non-essential files. Currently, this directory contains some shell +startup files to be installed in @file{/etc/profile.d} to aid in manipulating +the @env{AWKPATH} and @env{AWKLIBPATH} environment variables. +@xref{Shell Startup Files}, for more information. + @item posix/* Files needed for building @command{gawk} on POSIX-compliant systems. @@ -36473,6 +37871,7 @@ to configure @command{gawk} for your system yourself. @menu * Quick Installation:: Compiling @command{gawk} under Unix. +* Shell Startup Files:: Shell convenience functions. * Additional Configuration Options:: Other compile-time options. * Configuration Philosophy:: How it's all supposed to work. @end menu @@ -36553,6 +37952,44 @@ is likely that you will be asked for your password, and you will have to have been set up previously as a user who is allowed to run the @command{sudo} command. +@node Shell Startup Files +@appendixsubsec Shell Startup Files + +The distribution contains shell startup files @file{gawk.sh} and +@file{gawk.csh}, containing functions to aid in manipulating +the @env{AWKPATH} and @env{AWKLIBPATH} environment variables. 
+On a Fedora GNU/Linux system, these files should be installed in @file{/etc/profile.d}; +on other platforms, the appropriate location may be different. + +@table @command + +@cindex @command{gawkpath_default} shell function +@item gawkpath_default +Reset the @env{AWKPATH} environment variable to its default value. + +@cindex @command{gawkpath_prepend} shell function +@item gawkpath_prepend +Add the argument to the front of the @env{AWKPATH} environment variable. + +@cindex @command{gawkpath_append} shell function +@item gawkpath_append +Add the argument to the end of the @env{AWKPATH} environment variable. + +@cindex @command{gawklibpath_default} shell function +@item gawklibpath_default +Reset the @env{AWKLIBPATH} environment variable to its default value. + +@cindex @command{gawklibpath_prepend} shell function +@item gawklibpath_prepend +Add the argument to the front of the @env{AWKLIBPATH} environment variable. + +@cindex @command{gawklibpath_append} shell function +@item gawklibpath_append +Add the argument to the end of the @env{AWKLIBPATH} environment variable. + +@end table + + @node Additional Configuration Options @appendixsubsec Additional Configuration Options @cindex @command{gawk}, configuring, options @@ -36594,6 +38031,13 @@ Using this option will cause some of the tests in the test suite to fail. This option may be removed at a later date. @end quotation +@cindex @option{--disable-mpfr} configuration option +@cindex configuration option, @code{--disable-mpfr} +@item --disable-mpfr +Skip checking for the MPFR and GMP libraries. This is useful +mainly for the developers, to make sure nothing breaks if +MPFR support is not available. + @cindex @option{--disable-nls} configuration option @cindex configuration option, @code{--disable-nls} @item --disable-nls @@ -41136,6 +42580,7 @@ Consistency issues: Use MS-DOS not MS DOS Use an empty set of parentheses after built-in and awk function names. Use "multiFOO" without a hyphen. + Use "time zone" as two words, not "timezone". Date: Wed, 13 Apr 94 15:20:52 -0400 From: rms@gnu.org (Richard Stallman) |