diff options
Diffstat (limited to 'doc/gawktexi.in')
-rw-r--r-- | doc/gawktexi.in | 885 |
1 files changed, 797 insertions, 88 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in index 207c85bb..afcf749a 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -534,6 +534,7 @@ particular records in a file and perform operations upon them. * Computed Regexps:: Using Dynamic Regexps. * GNU Regexp Operators:: Operators specific to GNU software. * Case-sensitivity:: How to do case-insensitive matching. +* Strong Regexp Constants:: Strongly typed regexp constants. * Regexp Summary:: Regular expressions summary. * Records:: Controlling how data is split into records. @@ -576,6 +577,7 @@ particular records in a file and perform operations upon them. @code{getline}. * Getline Summary:: Summary of @code{getline} Variants. * Read Timeout:: Reading input with a timeout. +* Retrying Input:: Retrying input after certain errors. * Command-line directories:: What happens if you put a directory on the command line. * Input Summary:: Input summary. @@ -605,6 +607,7 @@ particular records in a file and perform operations upon them. * Special Caveats:: Things to watch out for. * Close Files And Pipes:: Closing Input and Output Files and Pipes. +* Nonfatal:: Enabling Nonfatal Output. * Output Summary:: Output summary. * Output Exercises:: Exercises. * Values:: Constants, Variables, and Regular @@ -916,6 +919,7 @@ particular records in a file and perform operations upon them. * Array Functions:: Functions for working with arrays. * Flattening Arrays:: How to flatten arrays. * Creating Arrays:: How to create and populate arrays. +* Redirection API:: How to access and manipulate redirections. * Extension API Variables:: Variables provided by the API. * Extension Versioning:: API Version information. * Extension API Informational Variables:: Variables providing information about @@ -974,6 +978,7 @@ particular records in a file and perform operations upon them. * Unix Installation:: Installing @command{gawk} under various versions of Unix. * Quick Installation:: Compiling @command{gawk} under Unix. +* Shell Startup Files:: Shell convenience functions. * Additional Configuration Options:: Other compile-time options. * Configuration Philosophy:: How it's all supposed to work. * Non-Unix Installation:: Installation on Other Operating @@ -3955,6 +3960,7 @@ when parsing numeric input data (@pxref{Locales}). @cindex @option{-o} option @cindex @option{--pretty-print} option Enable pretty-printing of @command{awk} programs. +Implies @option{--no-optimize}. By default, the output program is created in a file named @file{awkprof.out} (@pxref{Profiling}). The optional @var{file} argument allows you to specify a different @@ -3963,18 +3969,22 @@ No space is allowed between the @option{-o} and @var{file}, if @var{file} is supplied. @quotation NOTE -Due to the way @command{gawk} has evolved, with this option -your program still executes. This will change in the -next major release, such that @command{gawk} will only -pretty-print the program and not run it. +In the past, this option would also execute your program. +This is no longer the case. @end quotation @item @option{-O} @itemx @option{--optimize} @cindex @option{--optimize} option @cindex @option{-O} option -Enable some optimizations on the internal representation of the program. -At the moment, this includes just simple constant folding. +Enable @command{gawk}'s default optimizations on the internal +representation of the program. At the moment, this includes simple +constant folding and tail recursion elimination in function calls. + +These optimizations are enabled by default. +This option remains primarily for backwards compatibility. However, it may +be used to cancel the effect of an earlier @option{-s} option +(see later in this list). @item @option{-p}[@var{file}] @itemx @option{--profile}[@code{=}@var{file}] @@ -3983,6 +3993,7 @@ At the moment, this includes just simple constant folding. @cindex @command{awk} profiling, enabling Enable profiling of @command{awk} programs (@pxref{Profiling}). +Implies @option{--no-optimize}. By default, profiles are created in a file named @file{awkprof.out}. The optional @var{file} argument allows you to specify a different @value{FN} for the profile file. @@ -4012,11 +4023,6 @@ restrictions apply: @cindex newlines @cindex whitespace, newlines as @item -Newlines do not act as whitespace to separate fields when @code{FS} is -equal to a single space -(@pxref{Fields}). - -@item Newlines are not allowed after @samp{?} or @samp{:} (@pxref{Conditional Exp}). @@ -4054,6 +4060,13 @@ This is now @command{gawk}'s default behavior. Nevertheless, this option remains (both for backward compatibility and for use in combination with @option{--traditional}). +@item @option{-s} +@itemx @option{--no-optimize} +@cindex @option{--no-optimize} option +@cindex @option{-s} option +Disable @command{gawk}'s default optimizations on the internal +representation of the program. + @item @option{-S} @itemx @option{--sandbox} @cindex @option{-S} option @@ -4367,6 +4380,9 @@ searches first in the current directory and then in @file{/usr/local/share/awk}. In practice, this means that you will rarely need to change the value of @env{AWKPATH}. +@xref{Shell Startup Files}, for information on functions that help to +manipulate the @env{AWKPATH} variable. + @command{gawk} places the value of the search path that it used into @code{ENVIRON["AWKPATH"]}. This provides access to the actual search path value from within an @command{awk} program. @@ -4398,6 +4414,9 @@ an empty value, @command{gawk} uses a default path; this is typically @samp{/usr/local/lib/gawk}, although it can vary depending upon how @command{gawk} was built. +@xref{Shell Startup Files}, for information on functions that help to +manipulate the @env{AWKLIBPATH} variable. + @command{gawk} places the value of the search path that it used into @code{ENVIRON["AWKLIBPATH"]}. This provides access to the actual search path value from within an @command{awk} program. @@ -4425,6 +4444,8 @@ wait for input before returning with an error. Controls the number of times @command{gawk} attempts to retry a two-way TCP/IP (socket) connection before giving up. @xref{TCP/IP Networking}. +Note that when nonfatal I/O is enabled (@pxref{Nonfatal}), +@command{gawk} only tries to open a TCP/IP socket once. @item POSIXLY_CORRECT Causes @command{gawk} to switch to POSIX-compatibility @@ -4479,14 +4500,6 @@ two regexp matchers that @command{gawk} uses internally. (There aren't supposed to be differences, but occasionally theory and practice don't coordinate with each other.) -@item GAWK_NO_PP_RUN -When @command{gawk} is invoked with the @option{--pretty-print} option, -it will not run the program if this environment variable exists. - -@quotation CAUTION -This variable will not survive into the next major release. -@end quotation - @item GAWK_STACKSIZE This specifies the amount by which @command{gawk} should grow its internal evaluation stack, when needed. @@ -4784,6 +4797,32 @@ Similarly, you may use @code{print} or @code{printf} statements in the @var{init} and @var{increment} parts of a @code{for} loop. This is another long-undocumented ``feature'' of Unix @command{awk}. +@command{gawk} lets you use the names of built-in functions that are +@command{gawk} extensions as the names of parameters in user-defined functions. +This is intended to ``future-proof'' old code that happens to use +function names added by @command{gawk} after the code was written. +Standard @command{awk} built-in functions, such as @code{sin()} or +@code{substr()} are @emph{not} shadowed in this way. + +The @code{PROCINFO["argv"]} array contains all of the command-line arguments +(after glob expansion and redirection processing on platforms where that must +be done manually by the program) with subscripts ranging from 0 through +@code{argc} @minus{} 1. For example, @code{PROCINFO["argv"][0]} will contain +the name by which @command{gawk} was invoked. Here is an example of how this +feature may be used: + +@example +awk ' +BEGIN @{ + for (i = 0; i < length(PROCINFO["argv"]); i++) + print i, PROCINFO["argv"][i] +@}' +@end example + +Please note that this differs from the standard @code{ARGV} array which does +not include command-line arguments that have already been processed by +@command{gawk} (@pxref{ARGC and ARGV}). + @end ignore @node Invoking Summary @@ -4876,6 +4915,7 @@ regular expressions work, we present more complicated instances. * Computed Regexps:: Using Dynamic Regexps. * GNU Regexp Operators:: Operators specific to GNU software. * Case-sensitivity:: How to do case-insensitive matching. +* Strong Regexp Constants:: Strongly typed regexp constants. * Regexp Summary:: Regular expressions summary. @end menu @@ -5066,17 +5106,21 @@ between @samp{0} and @samp{7}. For example, the code for the ASCII ESC @item \x@var{hh}@dots{} The hexadecimal value @var{hh}, where @var{hh} stands for a sequence of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F} -or @samp{a}--@samp{f}). Like the same construct -in ISO C, the escape sequence continues until the first nonhexadecimal -digit is seen. @value{COMMONEXT} -However, using more than two hexadecimal digits produces -undefined results. (The @samp{\x} escape sequence is not allowed in -POSIX @command{awk}.) +or @samp{a}--@samp{f}). A maximum of two digts are allowed after +the @samp{\x}. Any further hexadecimal digits are treated as simple +letters or numbers. @value{COMMONEXT} +(The @samp{\x} escape sequence is not allowed in POSIX awk.) @quotation CAUTION -The next major release of @command{gawk} will change, such -that a maximum of two hexadecimal digits following the -@samp{\x} will be used. +In ISO C, the escape sequence continues until the first nonhexadecimal +digit is seen. +For many years, @command{gawk} would continue incorporating +hexadecimal digits into the value until a non-hexadecimal digit +or the end of the string was encountered. +However, using more than two hexadecimal digits produced +undefined results. +As of @value{PVERSION} 4.2, only two digits +are processed. @end quotation @cindex @code{\} (backslash), @code{\/} escape sequence @@ -6005,6 +6049,25 @@ The value of @code{IGNORECASE} has no effect if @command{gawk} is in compatibility mode (@pxref{Options}). Case is always significant in compatibility mode. +@node Strong Regexp Constants +@section Strongly Typed Regexp Constants + +This @value{SECTION} describes a @command{gawk}-specific feature. + +Regexp constants (@code{/@dots{}/}) hold a strange position in the +@command{awk} language. In most contexts, they act like an expression: +@samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to +be matched. In no case are they really a ``first class citizen'' of the +language. That is, you cannot define a scalar variable whose type is +``regexp'' in the same sense that you can define a variable to be a +number or a string: + +@example +num = 42 @ii{Numeric variable} +str = "hi" @ii{String variable} +re = /foo/ @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/ +@end example + @node Regexp Summary @section Summary @@ -6095,6 +6158,7 @@ used with it do not have to be named on the @command{awk} command line * Getline:: Reading files under explicit program control using the @code{getline} function. * Read Timeout:: Reading input with a timeout. +* Retrying Input:: Retrying input after certain errors. * Command-line directories:: What happens if you put a directory on the command line. * Input Summary:: Input summary. @@ -6412,16 +6476,12 @@ Readfile} for another option. @cindex fields @cindex accessing fields @cindex fields, examining -@cindex POSIX @command{awk}, field separators and -@cindex field separators, POSIX and -@cindex separators, field, POSIX and When @command{awk} reads an input record, the record is automatically @dfn{parsed} or separated by the @command{awk} utility into chunks called @dfn{fields}. By default, fields are separated by @dfn{whitespace}, like words in a line. Whitespace in @command{awk} means any string of one or more spaces, -TABs, or newlines;@footnote{In POSIX @command{awk}, newlines are not -considered whitespace for separating fields.} other characters +TABs, or newlines; other characters that are considered whitespace by other languages (such as formfeed, vertical tab, etc.) are @emph{not} considered whitespace by @command{awk}. @@ -6835,7 +6895,6 @@ can massage it first with a separate @command{awk} program.) @node Default Field Splitting @subsection Whitespace Normally Separates Fields -@cindex newlines, as field separators @cindex whitespace, as field separators Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not @@ -7702,6 +7761,13 @@ a record, such as a file that cannot be opened, then @code{getline} returns @minus{}1. In this case, @command{gawk} sets the variable @code{ERRNO} to a string describing the error that occurred. +If @code{ERRNO} indicates that the I/O operation may be +retried, and @code{PROCINFO["@var{input}", "RETRY"]} is set, +then @code{getline} returns @minus{}2 +instead of @minus{}1, and further calls to @code{getline} +may be attempted. @xref{Retrying Input} for further information about +this feature. + In the following examples, @var{command} stands for a string value that represents a shell command. @@ -8356,7 +8422,8 @@ on a per-command or per-connection basis. the attempt to read from the underlying device may succeed in a later attempt. This is a limitation, and it also means that you cannot use this to multiplex input from -two or more sources. +two or more sources. @xref{Retrying Input} for a way to enable +later I/O attempts to succeed. Assigning a timeout value prevents read operations from blocking indefinitely. But bear in mind that there are other ways @@ -8366,6 +8433,36 @@ a connection before it can start reading any data, or the attempt to open a FIFO special file for reading can block indefinitely until some other process opens it for writing. +@node Retrying Input +@section Retrying Reads After Certain Input Errors +@cindex retrying input + +@cindex differences in @command{awk} and @command{gawk}, retrying input +This @value{SECTION} describes a feature that is specific to @command{gawk}. + +When @command{gawk} encounters an error while reading input, by +default @code{getline} returns @minus{}1, and subsequent attempts to +read from that file result in an end-of-file indication. However, you +may optionally instruct @command{gawk} to allow I/O to be retried when +certain errors are encountered by setting a special element in +the @code{PROCINFO} array (@pxref{Auto-set}): + +@example +PROCINFO["@var{input_name}", "RETRY"] = 1 +@end example + +When this element exists, @command{gawk} checks the value of the system +(C language) +@code{errno} variable when an I/O error occurs. If @code{errno} indicates +a subsequent I/O attempt may succeed, @code{getline} instead returns +@minus{}2 and +further calls to @code{getline} may succeed. This applies to the @code{errno} +values @code{EAGAIN}, @code{EWOULDBLOCK}, @code{EINTR}, or @code{ETIMEDOUT}. + +This feature is useful in conjunction with +@code{PROCINFO["@var{input_name}", "READ_TIMEOUT"]} or situations where a file +descriptor has been configured to behave in a non-blocking fashion. + @node Command-line directories @section Directories on the Command Line @cindex differences in @command{awk} and @command{gawk}, command-line directories @@ -8527,6 +8624,7 @@ and discusses the @code{close()} built-in function. @command{gawk} allows access to inherited file descriptors. * Close Files And Pipes:: Closing Input and Output Files and Pipes. +* Nonfatal:: Enabling Nonfatal Output. * Output Summary:: Output summary. * Output Exercises:: Exercises. @end menu @@ -9907,17 +10005,26 @@ a system problem closing the file or process. In these cases, @command{gawk} sets the predefined variable @code{ERRNO} to a string describing the problem. -In @command{gawk}, -when closing a pipe or coprocess (input or output), -the return value is the exit status of the command.@footnote{ -This is a full 16-bit value as returned by the @code{wait()} -system call. See the system manual pages for information on -how to decode this value.} -Otherwise, it is the return value from the system's @code{close()} or -@code{fclose()} C functions when closing input or output -files, respectively. -This value is zero if the close succeeds, or @minus{}1 if -it fails. +In @command{gawk}, starting with version 4.2, when closing a pipe or +coprocess (input or output), the return value is the exit status of the +command, as described in @ref{table-close-pipe-return-values}.@footnote{Prior +to version 4.2, the return value from closing a pipe or co-process +was the full 16-bit exit value as defined by the @code{wait()} system +call.} Otherwise, it is the return value from the system's @code{close()} +or @code{fclose()} C functions when closing input or output files, +respectively. This value is zero if the close succeeds, or @minus{}1 +if it fails. + +@float Table,table-close-pipe-return-values +@caption{Return values from @code{close()} of a pipe} +@multitable @columnfractions .40 .60 +@headitem Situation @tab Return value from @code{close()} +@item Normal exit of command @tab Command's exit status +@item Death by signal of command @tab 256 + number of murderous signal +@item Death by signal of command with core dump @tab 512 + number of murderous signal +@item Some kind of error @tab @minus{}1 +@end multitable +@end float The POSIX standard is very vague; it says that @code{close()} returns zero on success and a nonzero value otherwise. In general, @@ -9928,6 +10035,70 @@ In POSIX mode (@pxref{Options}), @command{gawk} just returns zero when closing a pipe. @end sidebar +@node Nonfatal +@section Enabling Nonfatal Output + +This @value{SECTION} describes a @command{gawk}-specific feature. + +In standard @command{awk}, output with @code{print} or @code{printf} +to a nonexistent file, or some other I/O error (such as filling up the +disk) is a fatal error. + +@example +$ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'} +@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No such file or directory) +@end example + +@command{gawk} makes it possible to detect that an error has +occurred, allowing you to possibly recover from the error, or +at least print an error message of your choosing before exiting. +You can do this in one of two ways: + +@itemize @bullet +@item +For all output files, by assigning any value to @code{PROCINFO["NONFATAL"]}. + +@item +On a per-file basis, by assigning any value to +@code{PROCINFO[@var{filename}, "NONFATAL"]}. +Here, @var{filename} is the name of the file to which +you wish output to be nonfatal. +@end itemize + +Once you have enabled nonfatal output, you must check @code{ERRNO} +after every relevant @code{print} or @code{printf} statement to +see if something went wrong. It is also a good idea to initialize +@code{ERRNO} to zero before attempting the output. For example: + +@example +$ @kbd{gawk '} +> @kbd{BEGIN @{} +> @kbd{ PROCINFO["NONFATAL"] = 1} +> @kbd{ ERRNO = 0} +> @kbd{ print "hi" > "/no/such/file"} +> @kbd{ if (ERRNO) @{} +> @kbd{ print("Output failed:", ERRNO) > "/dev/stderr"} +> @kbd{ exit 1} +> @kbd{ @}} +> @kbd{@}'} +@error{} Output failed: No such file or directory +@end example + +Here, @command{gawk} did not produce a fatal error; instead +it let the @command{awk} program code detect the problem and handle it. + +This mechanism works also for standard output and standard error. +For standard output, you may use @code{PROCINFO["-", "NONFATAL"]} +or @code{PROCINFO["/dev/stdout", "NONFATAL"]}. For standard error, use +@code{PROCINFO["/dev/stderr", "NONFATAL"]}. + +When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}), +@command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES} +environment variable (@pxref{Other Environment Variables}) allows you to +override @command{gawk}'s builtin default number of attempts. However, +once nonfatal I/O is enabled for a given socket, @command{gawk} only +retries once, relying on @command{awk}-level code to notice that there +was a problem. @node Output Summary @section Summary @@ -9957,6 +10128,12 @@ Use @code{close()} to close open file, pipe, and coprocess redirections. For coprocesses, it is possible to close only one direction of the communications. +@item +Normally errors with @code{print} or @code{printf} are fatal. +@command{gawk} lets you make output errors be nonfatal either for +all files or on a per-file basis. You must then check for errors +after every relevant output statement. + @end itemize @c EXCLUDE START @@ -11708,19 +11885,19 @@ One special place where @code{/foo/} is @emph{not} an abbreviation for where this is discussed in more detail. @node POSIX String Comparison -@subsubsection String Comparison with POSIX Rules +@subsubsection String Comparison Based on Locale Collating Order -The POSIX standard says that string comparison is performed based -on the locale's @dfn{collating order}. This is the order in which -characters sort, as defined by the locale (for more discussion, -@pxref{Locales}). This order is usually very different -from the results obtained when doing straight character-by-character -comparison.@footnote{Technically, string comparison is supposed -to behave the same way as if the strings were compared with the C -@code{strcoll()} function.} +The POSIX standard used to say that all string comparisons are +performed based on the locale's @dfn{collating order}. This +is the order in which characters sort, as defined by the locale +(for more discussion, @pxref{Locales}). This order is usually very +different from the results obtained when doing straight byte-by-byte +comparison.@footnote{Technically, string comparison is supposed to behave +the same way as if the strings were compared with the C @code{strcoll()} +function.} Because this behavior differs considerably from existing practice, -@command{gawk} only implements it when in POSIX mode (@pxref{Options}). +@command{gawk} only implemented it when in POSIX mode (@pxref{Options}). Here is an example to illustrate the difference, in an @code{en_US.UTF-8} locale: @@ -11733,6 +11910,26 @@ $ @kbd{gawk --posix 'BEGIN @{ printf("ABC < abc = %s\n",} @print{} ABC < abc = FALSE @end example +Fortunately, as of August 2016, comparison based on locale +collating order is no longer required for the @code{==} and @code{!=} +operators.@footnote{See @uref{http://austingroupbugs.net/view.php?id=1070, +the Austin Group website}.} However, comparison based on locales is still +required for @code{<}, @code{<=}, @code{>}, and @code{>=}. POSIX thus +recommends as follows: + +@quotation +Since the @code{==} operator checks whether strings are identical, +not whether they collate equally, applications needing to check whether +strings collate equally can use: + +@example +a <= b && a >= b +@end example +@end quotation + +As of @value{PVERSION} 4.2, @command{gawk} continues to use locale +collating order for @code{<}, @code{<=}, @code{>}, and @code{>=} only +in POSIX mode. @node Boolean Ops @subsection Boolean Expressions @@ -13892,12 +14089,11 @@ specify the behavior when @code{FS} is the null string. Nonetheless, some other versions of @command{awk} also treat @code{""} specially.) -@cindex POSIX @command{awk}, @code{FS} variable and The default value is @w{@code{" "}}, a string consisting of a single -space. As a special exception, this value means that any -sequence of spaces, TABs, and/or newlines is a single separator.@footnote{In -POSIX @command{awk}, newline does not count as whitespace.} It also causes -spaces, TABs, and newlines at the beginning and end of a record to be ignored. +space. As a special exception, this value means that any sequence of +spaces, TABs, and/or newlines is a single separator. It also causes +spaces, TABs, and newlines at the beginning and end of a record to +be ignored. You can set the value of @code{FS} on the command line using the @option{-F} option: @@ -14121,10 +14317,24 @@ opens the next file. An associative array containing the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For example, -@code{ENVIRON["HOME"]} might be @code{"/home/arnold"}. Changing this array -does not affect the environment passed on to any programs that -@command{awk} may spawn via redirection or the @code{system()} function. -(In a future version of @command{gawk}, it may do so.) +@code{ENVIRON["HOME"]} might be @code{/home/arnold}. + +For POSIX @command{awk}, changing this array does not affect the +environment passed on to any programs that @command{awk} may spawn via +redirection or the @code{system()} function. + +However, beginning with version 4.2, if not in POSIX +compatibility mode, @command{gawk} does update its own environment when +@code{ENVIRON} is changed, thus changing the environment seen by programs +that it creates. You should therefore be especially careful if you +modify @code{ENVIRON["PATH"]}, which is the search path for finding +executable programs. + +This can also affect the running @command{gawk} program, since some of the +built-in functions may pay attention to certain environment variables. +The most notable instance of this is @code{mktime()} (@pxref{Time +Functions}), which pays attention the value of the @env{TZ} environment +variable on many systems. Some operating systems may not have environment variables. On such systems, the @code{ENVIRON} array is empty (except for @@ -14158,6 +14368,11 @@ value to be meaningful when an I/O operation returns a failure value, such as @code{getline} returning @minus{}1. You are, of course, free to clear it yourself before doing an I/O operation. +If the value of @code{ERRNO} corresponds to a system error in the C +@code{errno} variable, then @code{PROCINFO["errno"]} will be set to the value +of @code{errno}. For non-system errors, @code{PROCINFO["errno"]} will +be zero. + @cindex @code{FILENAME} variable @cindex dark corner, @code{FILENAME} variable @item @code{FILENAME} @@ -14226,6 +14441,10 @@ are guaranteed to be available: @item PROCINFO["egid"] The value of the @code{getegid()} system call. +@item PROCINFO["errno"] +The value of the C @code{errno} variable when @code{ERRNO} is set to +the associated error message. + @item PROCINFO["euid"] @cindex effective user ID of @command{gawk} user The value of the @code{geteuid()} system call. @@ -14349,6 +14568,14 @@ to test for these elements The following elements allow you to change @command{gawk}'s behavior: @table @code +@item PROCINFO["NONFATAL"] +If this element exists, then I/O errors for all output redirections become nonfatal. +@xref{Nonfatal}. + +@item PROCINFO["@var{output_name}", "NONFATAL"] +Make output errors for @var{output_name} be nonfatal. +@xref{Nonfatal}. + @item PROCINFO["@var{command}", "pty"] For two-way communication to @var{command}, use a pseudo-tty instead of setting up a two-way pipe. @@ -16243,6 +16470,23 @@ truncated toward zero. For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)} is @minus{}3, and @code{int(-3)} is @minus{}3 as well. +@item @code{intdiv(@var{numerator}, @var{denominator}, @var{result})} +@cindexawkfunc{intdiv} +@cindex intdiv +Perform integer division, similar to the standard C function of the +same name. First, truncate @code{numerator} and @code{denominator} +towards zero, creating integer values. Clear the @code{result} +array, and then set @code{result["quotient"]} to the result of +@samp{numerator / denominator}, truncated towards zero to an integer, +and set @code{result["remainder"]} to the result of @samp{numerator % +denominator}, truncated towards zero to an integer. This function is +primarily intended for use with arbitrary length integers; it avoids +creating MPFR arbitrary precision floating-point values (@pxref{Arbitrary +Precision Integers}). + +This function is a @code{gawk} extension. It is not available in +compatibility mode (@pxref{Options}). + @item @code{log(@var{x})} @cindexawkfunc{log} @cindex logarithm @@ -18360,16 +18604,67 @@ results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions. @node Type Functions @subsection Getting Type Information -@command{gawk} provides a single function that lets you distinguish -an array from a scalar variable. This is necessary for writing code +@command{gawk} provides two functions that lets you distinguish +the type of a variable. +This is necessary for writing code that traverses every element of an array of arrays -(@pxref{Arrays of Arrays}). +(@pxref{Arrays of Arrays}), and in other contexts. @table @code @cindexgawkfunc{isarray} @cindex scalar or array @item isarray(@var{x}) Return a true value if @var{x} is an array. Otherwise, return false. + +@cindexgawkfunc{typeof} +@cindex variable type +@cindex type, of variable +@item typeof(@var{x}) +Return one of the following strings, depending upon the type of @var{x}: + +@c nested table +@table @code +@item "array" +@var{x} is an array. + +@item "number" +@var{x} is a number. + +@item "string" +@var{x} is a string. + +@item "strnum" +@var{x} is a string that might be a number, such as a field or +the result of calling @code{split()}. (I.e., @var{x} has the STRNUM +attribute; @pxref{Variable Typing}.) + +@item "unassigned" +@var{x} is a scalar variable that has not been assigned a value yet. +For example: + +@example +BEGIN @{ + a[1] # creates a[1] but it has no assigned value + print typeof(a[1]) # scalar_u +@} +@end example + +@item "untyped" +@var{x} has not yet been used yet at all; it can become a scalar or an +array. +For example: + +@example +BEGIN @{ + print typeof(x) # x never used --> untyped + mk_arr(x) + print typeof(x) # x now an array --> array +@} + +function mk_arr(a) @{ a[1] = 1 @} +@end example + +@end table @end table @code{isarray()} is meant for use in two circumstances. The first is when @@ -18387,6 +18682,13 @@ that has not been previously used to @code{isarray()}, @command{gawk} ends up turning it into a scalar. @end quotation +The @code{typeof()} function is general; it allows you to determine +if a variable or function parameter is a scalar, an array. + +@code{isarray()} is deprecated; you should use @code{typeof()} instead. +You should replace any existing uses of @samp{isarray(var)} in your +code with @samp{typeof(var) == "array"}. + @node I18N Functions @subsection String-Translation Functions @cindex @command{gawk}, string-translation functions @@ -26617,9 +26919,16 @@ your program to hang. (Thus, this particular feature is of much less use in practice than being able to close the @code{"to"} end.) @quotation CAUTION -It is a fatal error to write to the @code{"to"} end of a two-way -pipe which has been closed. It is also a fatal error to read +Normally, +it is a fatal error to write to the @code{"to"} end of a two-way +pipe which has been closed, and it is also a fatal error to read from the @code{"from"} end of a two-way pipe that has been closed. + +You may set @code{PROCINFO["@var{command}", "NONFATAL"]} to +make such operations become nonfatal, in which case you then need +to check @code{ERRNO} after each @code{print}, @code{printf}, +or @code{getline}. +@xref{Nonfatal}, for more information. @end quotation @cindex @command{gawk}, @code{PROCINFO} array in @@ -27003,8 +27312,7 @@ The profiled version of your program may not look exactly like what you typed when you wrote it. This is because @command{gawk} creates the profiled version by ``pretty-printing'' its internal representation of the program. The advantage to this is that @command{gawk} can produce -a standard representation. The disadvantage is that all source code -comments are lost. +a standard representation. Also, things such as: @example @@ -27098,10 +27406,35 @@ When called this way, @command{gawk} ``pretty-prints'' the program into @file{awkprof.out}, without any execution counts. @quotation NOTE -The @option{--pretty-print} option still runs your program. -This will change in the next major release. +Once upon a time, the @option{--pretty-print} option would also run +your program. This is is no longer the case. @end quotation +There is a significant difference between the output created when +profiling, and that created when pretty-printing. Pretty-printed output +preserves the original comments that were in the program, although their +placement may not correspond exactly to their original locations in the +source code.@footnote{@command{gawk} does the best it can to preserve +the distinction between comments at the end of a statement and comments +on lines by themselves. Due to implementation constraints, it does not +always do so correctly, particularly for @code{switch} statements. The +@command{gawk} maintainers hope to improve this in a subsequent +release.} + +However, as a deliberate design decision, profiling output @emph{omits} +the original program's comments. This allows you to focus on the +execution count data and helps you avoid the temptation to use the +profiler for pretty-printing. + +Additionally, pretty-printed output does not have the leading indentation +that the profiling output does. This makes it easy to pretty-print your +code once development is completed, and then use the result as the final +version of your program. + +Because the internal representation of your program is formatted to +recreate an @command{awk} program, profiling and pretty-printing +automatically disable @command{gawk}'s default optimizations. + @node Advanced Features Summary @section Summary @@ -27142,8 +27475,7 @@ you tune them more easily. Sending the @code{USR1} signal while profiling cause @command{gawk} to dump the profile and keep going, including a function call stack. @item -You can also just ``pretty-print'' the program. This currently also runs -the program, but that will change in the next major release. +You can also just ``pretty-print'' the program. @end itemize @@ -29333,6 +29665,65 @@ executing, short programs. The @command{gawk} debugger only accepts source code supplied with the @option{-f} option. @end itemize +One other point is worth discussing. Conventional debuggers run in a +separate process (and thus address space) from the programs that they +debug (the @dfn{debuggee}, if you will). + +The @command{gawk} debugger is different; it is an integrated part +of @command{gawk} itself. This makes it possible, in rare cases, +for @command{gawk} to become an excellent demonstrator of Heisenberg +Uncertainty physics, where the mere act of observing something can change +it. Consider the following:@footnote{Thanks to Hermann Peifer for +this example.} + +@example +$ @kbd{cat test.awk} +@print{} @{ print typeof($1), typeof($2) @} +$ @kbd{cat test.data} +@print{} abc 123 +$ @kbd{gawk -f test.awk test.data} +@print{} strnum strnum +@end example + +This is all as expected: field data has the STRNUM attribute +(@pxref{Variable Typing}). Now watch what happens when we run +this program under the debugger: + +@example +$ @kbd{gawk -D -f test.awk test.data} +gawk> @kbd{w $1} @ii{Set watchpoint on} $1 +@print{} Watchpoint 1: $1 +gawk> @kbd{w $2} @ii{Set watchpoint on} $2 +@print{} Watchpoint 2: $2 +gawk> @kbd{r} @ii{Start the program} +@print{} Starting program: +@print{} Stopping in Rule ... +@print{} Watchpoint 1: $1 @ii{Watchpoint fires} +@print{} Old value: "" +@print{} New value: "abc" +@print{} main() at `test.awk':1 +@print{} 1 @{ print typeof($1), typeof($2) @} +gawk> @kbd{n} @ii{Keep going @dots{}} +@print{} Watchpoint 2: $2 @ii{Watchpoint fires} +@print{} Old value: "" +@print{} New value: "123" +@print{} main() at `test.awk':1 +@print{} 1 @{ print typeof($1), typeof($2) @} +gawk> @kbd{n} @ii{Get result from} typeof() +@print{} strnum number @ii{Result for} $2 @ii{isn't right} +@print{} Program exited normally with exit value: 0 +gawk> @kbd{quit} +@end example + +In this case, the act of comparing the new value of @code{$2} +with the old one caused @command{gawk} to evaluate it and determine that it +is indeed a number, and this is reflected in the result of +@code{typeof()}. + +Cases like this where the debugger is not transparent to the program's +execution should be rare. If you encounter one, please report it +(@pxref{Bugs}). + @ignore Look forward to a future release when these and other missing features may be added, and of course feel free to try to add them yourself! @@ -29369,6 +29760,10 @@ If the GNU Readline library is available when @command{gawk} is compiled, it is used by the debugger to provide command-line history and editing. +@item +Usually, the debugger does not not affect the +program being debugged, but occasionally it can. + @end itemize @node Arbitrary Precision Arithmetic @@ -30186,6 +30581,122 @@ to just use the following: gawk -M 'BEGIN @{ n = 13; print n % 2 @}' @end example +When dividing two arbitrary precision integers with either +@samp{/} or @samp{%}, the result is typically an arbitrary +precision floating point value (unless the denominator evenly +divides into the numerator). In order to do integer division +or remainder with arbitrary precision integers, use the built-in +@code{intdiv()} function (@pxref{Numeric Functions}). + +You can simulate the @code{intdiv()} function in standard @command{awk} +using this user-defined function: + +@example +@c file eg/lib/intdiv.awk +# intdiv --- do integer division + +@c endfile +@ignore +@c file eg/lib/intdiv.awk +# +# Arnold Robbins, arnold@@skeeve.com, Public Domain +# July, 2014 +# +# Name changed from div() to intdiv() +# April, 2015 + +@c endfile + +@end ignore +@c file eg/lib/intdiv.awk +function intdiv(numerator, denominator, result) +@{ + split("", result) + + numerator = int(numerator) + denominator = int(denominator) + result["quotient"] = int(numerator / denominator) + result["remainder"] = int(numerator % denominator) + + return 0.0 +@} +@c endfile +@end example + +The following example program, contributed by Katie Wasserman, +uses @code{intdiv()} to +compute the digits of @value{PI} to as many places as you +choose to set: + +@example +@c file eg/prog/pi.awk +# pi.awk --- compute the digits of pi +@c endfile +@c endfile +@ignore +@c file eg/prog/pi.awk +# +# Katie Wasserman, katie@@wass.net +# August 2014 +@c endfile +@end ignore +@c file eg/prog/pi.awk + +BEGIN @{ + digits = 100000 + two = 2 * 10 ^ digits + pi = two + for (m = digits * 4; m > 0; --m) @{ + d = m * 2 + 1 + x = pi * m + intdiv(x, d, result) + pi = result["quotient"] + pi = pi + two + @} + print pi +@} +@c endfile +@end example + +@ignore +Date: Wed, 20 Aug 2014 10:19:11 -0400 +To: arnold@skeeve.com +From: Katherine Wasserman <katie@wass.net> +Subject: Re: computation of digits of pi? + +Arnold, + +>The program that you sent to compute the digits of pi using div(). Is +>that some standard algorithm that every math student knows? If so, +>what's it called? + +It's not that well known but it's not that obscure either + +It's Euler's modification to Newton's method for calculating pi. + +Take a look at lines (23) - (25) here: http://mathworld.wolfram.com/PiFormulas.htm + +The algorithm I wrote simply expands the multiply by 2 and works from the innermost expression outwards. I used this to program HP calculators because it's quite easy to modify for tiny memory devices with smallish word sizes. + +http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899 + +-Katie +@end ignore + +When asked about the algorithm used, Katie replied: + +@quotation +It's not that well known but it's not that obscure either. +It's Euler's modification to Newton's method for calculating pi. +Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.html}. + +The algorithm I wrote simply expands the multiply by 2 and works from +the innermost expression outwards. I used this to program HP calculators +because it's quite easy to modify for tiny memory devices with smallish +word sizes. See +@uref{http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899}. +@end quotation + @node POSIX Floating Point Problems @section Standards Versus Existing Practice @@ -30585,6 +31096,7 @@ This (rather large) @value{SECTION} describes the API in detail. * Symbol Table Access:: Functions for accessing global variables. * Array Manipulation:: Functions for working with arrays. +* Redirection API:: How to access and manipulate redirections. * Extension API Variables:: Variables provided by the API. * Extension API Boilerplate:: Boilerplate code for using the API. @end menu @@ -30660,6 +31172,10 @@ Clearing an array @item Flattening an array for easy C-style looping over all its indices and elements @end itemize + +@item +Accessing and manipulating redirections. + @end itemize Some points about using the API: @@ -31048,7 +31564,7 @@ Extension functions are described by the following record: typedef struct awk_ext_func @{ @ @ @ @ const char *name; @ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result); -@ @ @ @ size_t num_expected_args; +@ @ @ @ size_t max_expected_args; @} awk_ext_func_t; @end example @@ -31080,11 +31596,17 @@ actual parameters were passed from the calling @command{awk} code. The function must return the value of @code{result}. This is for the convenience of the calling code inside @command{gawk}. -@item size_t num_expected_args; -This is the number of arguments the function expects to receive. +@item size_t max_expected_args; +This is the maximum number of arguments the function expects to receive. Each extension function may decide what to do if the number of arguments isn't what it expected. As with real @command{awk} functions, it -is likely OK to ignore extra arguments. +is likely OK to ignore extra arguments. This value does not affect +actual program execution. + +Extension functions should compare this value to the number of actual +arguments passed and possibly issue a lint warning if there is an +undesirable mismatch. Of course, if +@samp{--lint=fatal} is used, this would cause the program to exit. @end table Once you have a record representing your extension function, you register @@ -31587,6 +32109,9 @@ that parameter. More's the pity.} @item void fatal(awk_ext_id_t id, const char *format, ...); Print a message and then cause @command{gawk} to exit immediately. +@item void nonfatal(awk_ext_id_t id, const char *format, ...); +Print a nonfatal error message. + @item void warning(awk_ext_id_t id, const char *format, ...); Print a warning message. @@ -32630,6 +33155,75 @@ $ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk} (@xref{Finding Extensions} for more information on the @env{AWKLIBPATH} environment variable.) +@node Redirection API +@subsection Accessing and Manipulating Redirections + +The following function allows extensions to access and manipulate redirections. + +@table @code +@item awk_bool_t get_file(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t name_len, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const char *filetype, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int fd, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_input_buf_t **ibufp, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_output_buf_t **obufp); +Look up a file in @command{gawk}'s internal redirection table. +If @code{name} is @code{NULL} or @code{name_len} is zero, return +data for the currently open input file corresponding to @code{FILENAME}. +(This does not access the @code{filetype} argument, so that may be undefined). +If the file is not already open, attempt to open it. +The @code{filetype} argument must be zero-terminated and should be one of: + +@table @code +@item ">" +A file opened for output. + +@item ">>" +A file opened for append. + +@item "<" +A file opened for input. + +@item "|>" +A pipe opened for output. + +@item "|<" +A pipe opened for input. + +@item "|&" +A two-way coprocess. +@end table + +On error, return a @code{false} value. Otherwise, return +@code{true}, and return additional information about the redirection +in the @code{ibufp} and @code{obufp} pointers. For input +redirections, the @code{*ibufp} value should be non-@code{NULL}, +and @code{*obufp} should be @code{NULL}. For output redirections, +the @code{*obufp} value should be non-@code{NULL}, and @code{*ibufp} +should be @code{NULL}. For two-way coprocesses, both values should +be non-@code{NULL}. + +In the usual case, the extension is interested in @code{(*ibufp)->fd} +and/or @code{fileno((*obufp)->fp)}. If the file is not already +open, and the @code{fd} argument is non-negative, @command{gawk} +will use that file descriptor instead of opening the file in the +usual way. If @code{fd} is non-negative, but the file exists already, +@command{gawk} ignores @code{fd} and returns the existing file. It is +the caller's responsibility to notice that neither the @code{fd} in +the returned @code{awk_input_buf_t} nor the @code{fd} in the returned +@code{awk_output_buf_t} matches the requested value. + +Note that supplying a file descriptor is currently @emph{not} supported +for pipes. However, supplying a file descriptor should work for input, +output, append, and two-way (coprocess) sockets. If @code{filetype} +is two-way, @command{gawk} assumes that it is a socket! Note that in +the two-way case, the input and output file descriptors may differ. +To check for success, you must check whether either matches. +@end table + +It is anticipated that this API function will be used to implement I/O +multiplexing and a socket library. + @node Extension API Variables @subsection API Variables @@ -34197,18 +34791,21 @@ As of this writing, there are seven extensions: GD graphics library extension @item +MPFR library extension +(this provides access to a number of MPFR functions that @command{gawk}'s +native MPFR support does not) + +@item PDF extension @item PostgreSQL extension @item -MPFR library extension -(this provides access to a number of MPFR functions that @command{gawk}'s -native MPFR support does not) +Redis extension @item -Redis extension +Select extension @item XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} @@ -34308,7 +34905,7 @@ output wrappers, and two-way processors) @item -Printing fatal, warning, and ``lint'' warning messages +Printing fatal, nonfatal, warning, and ``lint'' warning messages @item Updating @code{ERRNO}, or unsetting it @@ -34837,6 +35434,10 @@ Indirect function calls @item Directories on the command line produce a warning and are skipped (@pxref{Command-line directories}) + +@item +Output with @code{print} and @code{printf} need not be fatal +(@pxref{Nonfatal}) @end itemize @item @@ -34924,6 +35525,11 @@ The @code{isarray()} function to check if a variable is an array or not The @code{bindtextdomain()}, @code{dcgettext()}, and @code{dcngettext()} functions for internationalization (@pxref{Programmer i18n}) + +@item +The @code{intdiv()} function for doing integer +division and remainder +(@pxref{Numeric Functions}) @end itemize @item @@ -34962,6 +35568,7 @@ The @option{-p}, @option{-P}, @option{-r}, +@option{-s}, @option{-S}, @option{-t}, and @@ -34986,6 +35593,7 @@ and the @option{--load}, @option{--non-decimal-data}, @option{--optimize}, +@option{--no-optimize}, @option{--posix}, @option{--pretty-print}, @option{--profile}, @@ -35056,6 +35664,16 @@ for @command{gawk} @value{PVERSION} 4.1: Ultrix @end itemize +@item +Support for the following systems was removed from the code +for @command{gawk} @value{PVERSION} 4.2: + +@c nested table +@itemize @value{MINUS} +@item +MirBSD +@end itemize + @end itemize @c XXX ADD MORE STUFF HERE @@ -35682,6 +36300,52 @@ Support for Ultrix was removed. @end itemize +Version 4.2 introduced the following changes: + +@itemize @bullet +@item +Changes to @code{ENVIRON} are reflected into @command{gawk}'s +environment and that of programs that it runs. +@xref{Auto-set}. + +@item +The @option{--pretty-print} option no longer runs the @command{awk} +program too. +@xref{Options}. + +@item +The @command{igawk} program and its manual page are no longer +installed when @command{gawk} is built. +@xref{Igawk Program}. + +@item +The @code{intdiv()} function. +@xref{Numeric Functions}. + +@item +The maximum number of hexadecimal digits in @samp{\x} escapes +is now two. +@xref{Escape Sequences}. + +@item +Nonfatal output with @code{print} and @code{printf}. +@xref{Nonfatal}. + +@item +For many years, POSIX specified that default field splitting +only allowed spaces and tabs to separate fields, and this was +how @command{gawk} behaved with @option{--posix}. As of 2013, +the standard restored historical behavior, and now default +field splitting with @option{--posix} also allows newlines to +separate fields. + +@item +Support for MirBSD was removed. + +@item +Support for GNU/Linux on Alpha was removed. +@end itemize + @c XXX ADD MORE STUFF HERE @end ifclear @@ -35811,7 +36475,7 @@ and @uref{http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05, its rationale}.} By using this lovely technical term, the standard gives license -to implementors to implement ranges in whatever way they choose. +to implementers to implement ranges in whatever way they choose. The @command{gawk} maintainer chose to apply the pre-POSIX meaning both with the default regexp matching and when @option{--traditional} or @option{--posix} are used. @@ -36348,6 +37012,8 @@ The generated Info file for The @command{troff} source for a manual page describing the @command{igawk} program presented in @ref{Igawk Program}. +(Since @command{gawk} can do its own @code{@@include} processing, +neither @command{igawk} nor @file{igawk.1} are installed.) @item doc/Makefile.in The input file used during the configuration process to generate the @@ -36392,8 +37058,6 @@ source file for this @value{DOCUMENT}. It also contains a @file{Makefile.in} fil @file{Makefile.am} is used by GNU Automake to create @file{Makefile.in}. The library functions from @ref{Library Functions}, -and the @command{igawk} program from -@ref{Igawk Program} are included as ready-to-use files in the @command{gawk} distribution. They are installed as part of the installation process. The rest of the programs in this @value{DOCUMENT} are available in appropriate @@ -36404,6 +37068,12 @@ The source code, manual pages, and infrastructure files for the sample extensions included with @command{gawk}. @xref{Dynamic Extensions}, for more information. +@item extras/* +Additional non-essential files. Currently, this directory contains some shell +startup files to be installed in @file{/etc/profile.d} to aid in manipulating +the @env{AWKPATH} and @env{AWKLIBPATH} environment variables. +@xref{Shell Startup Files}, for more information. + @item posix/* Files needed for building @command{gawk} on POSIX-compliant systems. @@ -36432,6 +37102,7 @@ to configure @command{gawk} for your system yourself. @menu * Quick Installation:: Compiling @command{gawk} under Unix. +* Shell Startup Files:: Shell convenience functions. * Additional Configuration Options:: Other compile-time options. * Configuration Philosophy:: How it's all supposed to work. @end menu @@ -36512,6 +37183,44 @@ is likely that you will be asked for your password, and you will have to have been set up previously as a user who is allowed to run the @command{sudo} command. +@node Shell Startup Files +@appendixsubsec Shell Startup Files + +The distribution contains shell startup files @file{gawk.sh} and +@file{gawk.csh} containing functions to aid in manipulating +the @env{AWKPATH} and @env{AWKLIBPATH} environment variables. +On a Fedora system, these files should be installed in @file{/etc/profile.d}; +on other platforms, the appropriate location may be different. + +@table @command + +@cindex @command{gawkpath_default} shell function +@item gawkpath_default +Reset the @env{AWKPATH} environment variable to its default value. + +@cindex @command{gawkpath_prepend} shell function +@item gawkpath_prepend +Add the argument to the front of the @env{AWKPATH} environment variable. + +@cindex @command{gawkpath_append} shell function +@item gawkpath_append +Add the argument to the end of the @env{AWKPATH} environment variable. + +@cindex @command{gawklibpath_default} shell function +@item gawklibpath_default +Reset the @env{AWKLIBPATH} environment variable to its default value. + +@cindex @command{gawklibpath_prepend} shell function +@item gawklibpath_prepend +Add the argument to the front of the @env{AWKLIBPATH} environment variable. + +@cindex @command{gawklibpath_append} shell function +@item gawklibpath_append +Add the argument to the end of the @env{AWKLIBPATH} environment variable. + +@end table + + @node Additional Configuration Options @appendixsubsec Additional Configuration Options @cindex @command{gawk}, configuring, options |