Diffstat (limited to 'doc/gawktexi.in')
-rw-r--r-- | doc/gawktexi.in | 636 |
1 files changed, 332 insertions, 304 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in index dfc710b5..ecd6a972 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -721,12 +721,12 @@ particular records in a file and perform operations upon them. elements. * Controlling Scanning:: Controlling the order in which arrays are scanned. -* Delete:: The @code{delete} statement removes an - element from an array. * Numeric Array Subscripts:: How to use numbers as subscripts in @command{awk}. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. +* Delete:: The @code{delete} statement removes an + element from an array. * Multidimensional:: Emulating multidimensional arrays in @command{awk}. * Multiscanning:: Scanning multidimensional arrays. @@ -10110,7 +10110,7 @@ if (/barfly/ || /camelot/) @noindent are exactly equivalent. One rather bizarre consequence of this rule is that the following -Boolean expression is valid, but does not do what the user probably +Boolean expression is valid, but does not do what its author probably intended: @example @@ -10156,10 +10156,9 @@ Modern implementations of @command{awk}, including @command{gawk}, allow the third argument of @code{split()} to be a regexp constant, but some older implementations do not. @value{DARKCORNER} -This can lead to confusion when attempting to use regexp constants -as arguments to user-defined functions -(@pxref{User-defined}). -For example: +Because some built-in functions accept regexp constants as arguments, +it can be confusing when attempting to use regexp constants as arguments +to user-defined functions (@pxref{User-defined}). For example: @example function mysub(pat, repl, str, global) @@ -10227,8 +10226,8 @@ variable's current value. Variables are given new values with @dfn{decrement operators}. @xref{Assignment Ops}. 
In addition, the @code{sub()} and @code{gsub()} functions can -change a variable's value, and the @code{match()}, @code{patsplit()} -and @code{split()} functions can change the contents of their +change a variable's value, and the @code{match()}, @code{split()} +and @code{patsplit()} functions can change the contents of their array parameters. @xref{String Functions}. @cindex variables, built-in @@ -10244,7 +10243,7 @@ Variables in @command{awk} can be assigned either numeric or string values. The kind of value a variable holds can change over the life of a program. By default, variables are initialized to the empty string, which is zero if converted to a number. There is no need to explicitly -``initialize'' a variable in @command{awk}, +initialize a variable in @command{awk}, which is what you would do in C and in most other traditional languages. @node Assignment Options @@ -10452,7 +10451,7 @@ $ @kbd{echo 4,321 | LC_ALL=en_DK.utf-8 gawk '@{ print $1 + 1 @}'} @noindent The @code{en_DK.utf-8} locale is for English in Denmark, where the comma acts as the decimal point separator. In the normal @code{"C"} locale, @command{gawk} -treats @samp{4,321} as @samp{4}, while in the Danish locale, it's treated +treats @samp{4,321} as 4, while in the Danish locale, it's treated as the full number, 4.321. Some earlier versions of @command{gawk} fully complied with this aspect @@ -11004,7 +11003,7 @@ awk '/[=]=/' /dev/null @end example @command{gawk} does not have this problem; BWK @command{awk} -and @command{mawk} also do not (@pxref{Other Versions}). +and @command{mawk} also do not. @end sidebar @c ENDOFRANGE exas @c ENDOFRANGE opas @@ -11257,7 +11256,7 @@ attribute. 
@item Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements, @code{ENVIRON} elements, and the elements of an array created by -@code{patsplit()}, @code{split()} and @code{match()} that are numeric +@code{match()}, @code{split()} and @code{patsplit()} that are numeric strings have the @var{strnum} attribute. Otherwise, they have the @var{string} attribute. Uninitialized variables also have the @var{strnum} attribute. @@ -11412,22 +11411,23 @@ Thus, the six-character input string @w{@samp{ +3.14}} receives the The following examples print @samp{1} when the comparison between the two different constants is true, @samp{0} otherwise: +@c 22.9.2014: Tested with mawk and BWK awk, got same results. @example -$ @kbd{echo ' +3.14' | gawk '@{ print $0 == " +3.14" @}'} @ii{True} +$ @kbd{echo ' +3.14' | awk '@{ print($0 == " +3.14") @}'} @ii{True} @print{} 1 -$ @kbd{echo ' +3.14' | gawk '@{ print $0 == "+3.14" @}'} @ii{False} +$ @kbd{echo ' +3.14' | awk '@{ print($0 == "+3.14") @}'} @ii{False} @print{} 0 -$ @kbd{echo ' +3.14' | gawk '@{ print $0 == "3.14" @}'} @ii{False} +$ @kbd{echo ' +3.14' | awk '@{ print($0 == "3.14") @}'} @ii{False} @print{} 0 -$ @kbd{echo ' +3.14' | gawk '@{ print $0 == 3.14 @}'} @ii{True} +$ @kbd{echo ' +3.14' | awk '@{ print($0 == 3.14) @}'} @ii{True} @print{} 1 -$ @kbd{echo ' +3.14' | gawk '@{ print $1 == " +3.14" @}'} @ii{False} +$ @kbd{echo ' +3.14' | awk '@{ print($1 == " +3.14") @}'} @ii{False} @print{} 0 -$ @kbd{echo ' +3.14' | gawk '@{ print $1 == "+3.14" @}'} @ii{True} +$ @kbd{echo ' +3.14' | awk '@{ print($1 == "+3.14") @}'} @ii{True} @print{} 1 -$ @kbd{echo ' +3.14' | gawk '@{ print $1 == "3.14" @}'} @ii{False} +$ @kbd{echo ' +3.14' | awk '@{ print($1 == "3.14") @}'} @ii{False} @print{} 0 -$ @kbd{echo ' +3.14' | gawk '@{ print $1 == 3.14 @}'} @ii{True} +$ @kbd{echo ' +3.14' | awk '@{ print($1 == 3.14) @}'} @ii{True} @print{} 1 @end example @@ -11501,9 +11501,8 @@ part of the test always succeeds. 
Because the operators are so similar, this kind of error is very difficult to spot when scanning the source code. -@cindex @command{gawk}, comparison operators and -The following list of expressions illustrates the kind of comparison -@command{gawk} performs, as well as what the result of the comparison is: +The following list of expressions illustrates the kinds of comparisons +@command{awk} performs, as well as what the result of each comparison is: @table @code @item 1.5 <= 2.0 @@ -11576,7 +11575,7 @@ dynamic regexp (@pxref{Regexp Usage}; also @cindex @command{awk}, regexp constants and @cindex regexp constants -In modern implementations of @command{awk}, a constant regular +A constant regular expression in slashes by itself is also an expression. The regexp @code{/@var{regexp}/} is an abbreviation for the following comparison expression: @@ -11596,7 +11595,7 @@ where this is discussed in more detail. The POSIX standard says that string comparison is performed based on the locale's @dfn{collating order}. This is the order in which characters sort, as defined by the locale (for more discussion, -@pxref{Ranges and Locales}). This order is usually very different +@pxref{Locales}). This order is usually very different from the results obtained when doing straight character-by-character comparison.@footnote{Technically, string comparison is supposed to behave the same way as if the strings are compared with the C @@ -11676,7 +11675,7 @@ no substring @samp{foo} in the record. True if at least one of @var{boolean1} or @var{boolean2} is true. For example, the following statement prints all records in the input that contain @emph{either} @samp{edu} or -@samp{li} or both: +@samp{li}: @example if ($0 ~ /edu/ || $0 ~ /li/) print @@ -11685,6 +11684,9 @@ if ($0 ~ /edu/ || $0 ~ /li/) print The subexpression @var{boolean2} is evaluated only if @var{boolean1} is false. This can make a difference when @var{boolean2} contains expressions that have side effects. 
+(Thus, this test never really distinguishes records that contain both +@samp{edu} and @samp{li}---as soon as @samp{edu} is matched, +the full test succeeds.) @item ! @var{boolean} True if @var{boolean} is false. For example, @@ -11694,7 +11696,7 @@ variable is not defined: @example BEGIN @{ if (! ("HOME" in ENVIRON)) - print "no home!" @} + print "no home!" @} @end example (The @code{in} operator is described in @@ -12150,8 +12152,8 @@ system about the local character set and language. The ISO C standard defines a default @code{"C"} locale, which is an environment that is typical of what many C programmers are used to. -Once upon a time, the locale setting used to affect regexp matching -(@pxref{Ranges and Locales}), but this is no longer true. +Once upon a time, the locale setting used to affect regexp matching, +but this is no longer true (@pxref{Ranges and Locales}). Locales can affect record splitting. For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant. For other single-character @@ -12205,7 +12207,8 @@ Locales can influence the conversions. @item @command{awk} provides the usual arithmetic operators (addition, subtraction, multiplication, division, modulus), and unary plus and minus. -It also provides comparison operators, boolean operators, and regexp +It also provides comparison operators, boolean operators, array membership +testing, and regexp matching operators. String concatenation is accomplished by placing two expressions next to each other; there is no explicit operator. The three-operand @samp{?:} operator provides an ``if-else'' test within @@ -12220,7 +12223,7 @@ In @command{awk}, a value is considered to be true if it is non-zero @emph{or} non-null. Otherwise, the value is false. @item -A value's type is set upon each assignment and may change over its +A variable's type is set upon each assignment and may change over its lifetime. The type determines how it behaves in comparisons (string or numeric). 
@@ -12300,7 +12303,7 @@ is nonzero (if a number) or non-null (if a string). (@xref{Expression Patterns}.) @item @var{begpat}, @var{endpat} -A pair of patterns separated by a comma, specifying a range of records. +A pair of patterns separated by a comma, specifying a @dfn{range} of records. The range includes both the initial record that matches @var{begpat} and the final record that matches @var{endpat}. (@xref{Ranges}.) @@ -12390,8 +12393,8 @@ $ @kbd{awk '$1 ~ /li/ @{ print $2 @}' mail-list} @cindex regexp constants, as patterns @cindex patterns, regexp constants as A regexp constant as a pattern is also a special case of an expression -pattern. The expression @code{/li/} has the value one if @samp{li} -appears in the current input record. Thus, as a pattern, @code{/li/} +pattern. The expression @samp{/li/} has the value one if @samp{li} +appears in the current input record. Thus, as a pattern, @samp{/li/} matches any record containing @samp{li}. @cindex Boolean expressions, as patterns @@ -12573,7 +12576,7 @@ input is read. For example: @example $ @kbd{awk '} > @kbd{BEGIN @{ print "Analysis of \"li\"" @}} -> @kbd{/li/ @{ ++n @}} +> @kbd{/li/ @{ ++n @}} > @kbd{END @{ print "\"li\" appears in", n, "records." @}' mail-list} @print{} Analysis of "li" @print{} "li" appears in 4 records. @@ -12653,9 +12656,10 @@ The POSIX standard specifies that @code{NF} is available in an @code{END} rule. It contains the number of fields from the last input record. Most probably due to an oversight, the standard does not say that @code{$0} is also preserved, although logically one would think that it should be. -In fact, @command{gawk} does preserve the value of @code{$0} for use in -@code{END} rules. Be aware, however, that BWK @command{awk}, and possibly -other implementations, do not. +In fact, all of BWK @command{awk}, @command{mawk}, and @command{gawk} +preserve the value of @code{$0} for use in @code{END} rules. 
Be aware, +however, that some other implementations and many older versions +of Unix @command{awk} do not. The third point follows from the first two. The meaning of @samp{print} inside a @code{BEGIN} or @code{END} rule is the same as always: @@ -12750,8 +12754,8 @@ level of the @command{awk} program. @cindex @code{next} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and The @code{next} statement (@pxref{Next Statement}) is not allowed inside -either a @code{BEGINFILE} or and @code{ENDFILE} rule. The @code{nextfile} -statement (@pxref{Nextfile Statement}) is allowed only inside a +either a @code{BEGINFILE} or an @code{ENDFILE} rule. The @code{nextfile} +statement is allowed only inside a @code{BEGINFILE} rule, but not inside an @code{ENDFILE} rule. @cindex @code{getline} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and @@ -12815,7 +12819,7 @@ There are two ways to get the value of the shell variable into the body of the @command{awk} program. @cindex shells, quoting -The most common method is to use shell quoting to substitute +A common method is to use shell quoting to substitute the variable's value into the program inside the script. For example, consider the following program: @@ -13072,20 +13076,21 @@ If the @var{condition} is true, it executes the statement @var{body}. is not zero and not a null string.) @end ifinfo After @var{body} has been executed, -@var{condition} is tested again, and if it is still true, @var{body} is -executed again. This process repeats until the @var{condition} is no longer -true. If the @var{condition} is initially false, the body of the loop is -never executed and @command{awk} continues with the statement following +@var{condition} is tested again, and if it is still true, @var{body} +executes again. This process repeats until the @var{condition} is no longer +true. If the @var{condition} is initially false, the body of the loop +never executes and @command{awk} continues with the statement following the loop. 
This example prints the first three fields of each record, one per line: @example -awk '@{ - i = 1 - while (i <= 3) @{ - print $i - i++ - @} +awk ' +@{ + i = 1 + while (i <= 3) @{ + print $i + i++ + @} @}' inventory-shipped @end example @@ -13119,14 +13124,14 @@ do while (@var{condition}) @end example -Even if the @var{condition} is false at the start, the @var{body} is -executed at least once (and only once, unless executing @var{body} +Even if the @var{condition} is false at the start, the @var{body} +executes at least once (and only once, unless executing @var{body} makes @var{condition} true). Contrast this with the corresponding @code{while} statement: @example while (@var{condition}) - @var{body} + @var{body} @end example @noindent @@ -13136,11 +13141,11 @@ The following is an example of a @code{do} statement: @example @{ - i = 1 - do @{ - print $0 - i++ - @} while (i <= 10) + i = 1 + do @{ + print $0 + i++ + @} while (i <= 10) @} @end example @@ -13177,9 +13182,10 @@ compares it against the desired number of iterations. For example: @example -awk '@{ - for (i = 1; i <= 3; i++) - print $i +awk ' +@{ + for (i = 1; i <= 3; i++) + print $i @}' inventory-shipped @end example @@ -13207,7 +13213,7 @@ between 1 and 100: @example for (i = 1; i <= 100; i *= 2) - print i + print i @end example If there is nothing to be done, any of the three expressions in the @@ -13527,7 +13533,7 @@ The @code{next} statement is not allowed inside @code{BEGINFILE} and @cindex functions, user-defined, @code{next}/@code{nextfile} statements and According to the POSIX standard, the behavior is undefined if the @code{next} statement is used in a @code{BEGIN} or @code{END} rule. -@command{gawk} treats it as a syntax error. Although POSIX permits it, +@command{gawk} treats it as a syntax error. Although POSIX does not disallow it, most other @command{awk} implementations don't allow the @code{next} statement inside function bodies (@pxref{User-defined}). 
Just as with any other @code{next} statement, a @code{next} statement inside a function @@ -13582,7 +13588,7 @@ opened with redirections. It is not related to the main processing that @quotation NOTE For many years, @code{nextfile} was a -@command{gawk} extension. As of September, 2012, it was accepted for +common extension. In September, 2012, it was accepted for inclusion into the POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}. @end quotation @@ -13591,8 +13597,8 @@ See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}. @cindex @code{nextfile} statement, user-defined functions and @cindex Brian Kernighan's @command{awk} @cindex @command{mawk} utility -The current version of BWK @command{awk}, and @command{mawk} (@pxref{Other -Versions}) also support @code{nextfile}. However, they don't allow the +The current version of BWK @command{awk}, and @command{mawk} +also support @code{nextfile}. However, they don't allow the @code{nextfile} statement inside function bodies (@pxref{User-defined}). @command{gawk} does; a @code{nextfile} inside a function body reads the next record and starts processing it with the first rule in the program, @@ -13624,8 +13630,8 @@ the program to stop immediately. An @code{exit} statement that is not part of a @code{BEGIN} or @code{END} rule stops the execution of any further automatic rules for the current record, skips reading any remaining input records, and executes the -@code{END} rule if there is one. -Any @code{ENDFILE} rules are also skipped; they are not executed. +@code{END} rule if there is one. @command{gawk} also skips +any @code{ENDFILE} rules; they do not execute. In such a case, if you don't want the @code{END} rule to do its job, set a variable @@ -13733,7 +13739,7 @@ respectively, should use binary I/O. A string value of @code{"rw"} or @code{"wr"} indicates that all files should use binary I/O. 
Any other string value is treated the same as @code{"rw"}, but causes @command{gawk} to generate a warning message. @code{BINMODE} is described in more -detail in @ref{PC Using}. @command{mawk} @pxref{Other Versions}), +detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}), also supports this variable, but only using numeric values. @cindex @code{CONVFMT} variable @@ -13860,7 +13866,7 @@ printing with the @code{print} statement. It works by being passed as the first argument to the @code{sprintf()} function (@pxref{String Functions}). Its default value is @code{"%.6g"}. Earlier versions of @command{awk} -also used @code{OFMT} to specify the format for converting numbers to +used @code{OFMT} to specify the format for converting numbers to strings in general expressions; this is now done by @code{CONVFMT}. @cindex @code{sprintf()} function, @code{OFMT} variable and @@ -14012,8 +14018,8 @@ successive instances of the same @value{FN} on the command line. @cindex file names, distinguishing While you can change the value of @code{ARGIND} within your @command{awk} -program, @command{gawk} automatically sets it to a new value when the -next file is opened. +program, @command{gawk} automatically sets it to a new value when it +opens the next file. @cindex @code{ENVIRON} array @cindex environment variables, in @code{ENVIRON} array @@ -14070,10 +14076,10 @@ can give @code{FILENAME} a value. @cindex @code{FNR} variable @item @code{FNR} -The current record number in the current file. @code{FNR} is -incremented each time a new record is read -(@pxref{Records}). It is reinitialized -to zero each time a new input file is started. +The current record number in the current file. @command{awk} increments +@code{FNR} each time it reads a new record (@pxref{Records}). +@command{awk} resets @code{FNR} to zero each time it starts a new +input file. @cindex @code{NF} variable @item @code{NF} @@ -14105,7 +14111,7 @@ array causes a fatal error. 
Any attempt to assign to an element of The number of input records @command{awk} has processed since the beginning of the program's execution (@pxref{Records}). -@code{NR} is incremented each time a new record is read. +@command{awk} increments @code{NR} each time it reads a new record. @cindex @command{gawk}, @code{PROCINFO} array in @cindex @code{PROCINFO} array @@ -14185,7 +14191,7 @@ The parent process ID of the current process. @item PROCINFO["sorted_in"] If this element exists in @code{PROCINFO}, its value controls the order in which array indices will be processed by -@samp{for (@var{index} in @var{array})} loops. +@samp{for (@var{indx} in @var{array})} loops. Since this is an advanced feature, we defer the full description until later; see @ref{Scanning an Array}. @@ -14206,7 +14212,7 @@ The version of @command{gawk}. The following additional elements in the array are available to provide information about the MPFR and GMP libraries -if your version of @command{gawk} supports arbitrary precision numbers +if your version of @command{gawk} supports arbitrary precision arithmetic (@pxref{Arbitrary Precision Arithmetic}): @table @code @@ -14255,14 +14261,14 @@ The @code{PROCINFO} array has the following additional uses: @itemize @value{BULLET} @item -It may be used to cause coprocesses to communicate over pseudo-ttys -instead of through two-way pipes; this is discussed further in -@ref{Two-way I/O}. - -@item It may be used to provide a timeout when reading from any open input file, pipe, or coprocess. @xref{Read Timeout}, for more information. + +@item +It may be used to cause coprocesses to communicate over pseudo-ttys +instead of through two-way pipes; this is discussed further in +@ref{Two-way I/O}. @end itemize @cindex @code{RLENGTH} variable @@ -14504,6 +14510,12 @@ following @option{-v} are passed on to the @command{awk} program. (@xref{Getopt Function}, for an @command{awk} library function that parses command-line options.) 
+When designing your program, you should choose options that don't +conflict with @command{gawk}'s, since it will process any options +that it accepts before passing the rest of the command line on to +your program. Using @samp{#!} with the @option{-E} option may help +(@pxref{Executable Scripts}, and @pxref{Options}). + @node Pattern Action Summary @section Summary @@ -14538,7 +14550,7 @@ input and output statements, and deletion statements. The control statements in @command{awk} are @code{if}-@code{else}, @code{while}, @code{for}, and @code{do}-@code{while}. @command{gawk} adds the @code{switch} statement. There are two flavors of @code{for} -statement: one for for performing general looping, and the other iterating +statement: one for performing general looping, and the other for iterating through an array. @item @@ -14555,12 +14567,17 @@ The @code{exit} statement terminates your program. When executed from an action (or function body) it transfers control to the @code{END} statements. From an @code{END} statement body, it exits immediately. You may pass an optional numeric value to be used -at @command{awk}'s exit status. +as @command{awk}'s exit status. @item Some built-in variables provide control over @command{awk}, mainly for I/O. Other variables convey information from @command{awk} to your program. +@item +@code{ARGC} and @code{ARGV} make the command-line arguments available +to your program. Manipulating them from a @code{BEGIN} rule lets you +control how @command{awk} will process the provided @value{DF}s. + @end itemize @node Arrays @@ -14581,24 +14598,13 @@ The @value{CHAPTER} moves on to discuss @command{gawk}'s facility for sorting arrays, and ends with a brief description of @command{gawk}'s ability to support true arrays of arrays. 
-@cindex variables, names of -@cindex functions, names of -@cindex arrays, names of, and names of functions/variables -@cindex names, arrays/variables -@cindex namespace issues -@command{awk} maintains a single set -of names that may be used for naming variables, arrays, and functions -(@pxref{User-defined}). -Thus, you cannot have a variable and an array with the same name in the -same @command{awk} program. - @menu * Array Basics:: The basics of arrays. -* Delete:: The @code{delete} statement removes an element - from an array. * Numeric Array Subscripts:: How to use numbers as subscripts in @command{awk}. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. +* Delete:: The @code{delete} statement removes an element + from an array. * Multidimensional:: Emulating multidimensional arrays in @command{awk}. * Arrays of Arrays:: True multidimensional arrays. @@ -15026,14 +15032,14 @@ begin with a number: @example @c file eg/misc/arraymax.awk @{ - if ($1 > max) - max = $1 - arr[$1] = $0 + if ($1 > max) + max = $1 + arr[$1] = $0 @} END @{ - for (x = 1; x <= max; x++) - print arr[x] + for (x = 1; x <= max; x++) + print arr[x] @} @c endfile @end example @@ -15073,9 +15079,9 @@ program's @code{END} rule, as follows: @example END @{ - for (x = 1; x <= max; x++) - if (x in arr) - print arr[x] + for (x = 1; x <= max; x++) + if (x in arr) + print arr[x] @} @end example @@ -15097,7 +15103,7 @@ an array: @example for (@var{var} in @var{array}) - @var{body} + @var{body} @end example @noindent @@ -15170,7 +15176,7 @@ BEGIN @{ @} @end example -Here is what happens when run with @command{gawk}: +Here is what happens when run with @command{gawk} (and @command{mawk}): @example $ @kbd{gawk -f loopcheck.awk} @@ -15288,7 +15294,8 @@ does not affect the loop. 
For example: @example -$ @kbd{gawk 'BEGIN @{} +$ @kbd{gawk '} +> @kbd{BEGIN @{} > @kbd{ a[4] = 4} > @kbd{ a[3] = 3} > @kbd{ for (i in a)} @@ -15296,7 +15303,8 @@ $ @kbd{gawk 'BEGIN @{} > @kbd{@}'} @print{} 4 4 @print{} 3 3 -$ @kbd{gawk 'BEGIN @{} +$ @kbd{gawk '} +> @kbd{BEGIN @{} > @kbd{ PROCINFO["sorted_in"] = "@@ind_str_asc"} > @kbd{ a[4] = 4} > @kbd{ a[3] = 3} @@ -15345,118 +15353,6 @@ the @code{delete} statement. In addition, @command{gawk} provides built-in functions for sorting arrays; see @ref{Array Sorting Functions}. -@node Delete -@section The @code{delete} Statement -@cindex @code{delete} statement -@cindex deleting elements in arrays -@cindex arrays, elements, deleting -@cindex elements in arrays, deleting - -To remove an individual element of an array, use the @code{delete} -statement: - -@example -delete @var{array}[@var{index-expression}] -@end example - -Once an array element has been deleted, any value the element once -had is no longer available. It is as if the element had never -been referred to or been given a value. -The following is an example of deleting elements in an array: - -@example -for (i in frequencies) - delete frequencies[i] -@end example - -@noindent -This example removes all the elements from the array @code{frequencies}. -Once an element is deleted, a subsequent @code{for} statement to scan the array -does not report that element and the @code{in} operator to check for -the presence of that element returns zero (i.e., false): - -@example -delete foo[4] -if (4 in foo) - print "This will never be printed" -@end example - -@cindex null strings, and deleting array elements -It is important to note that deleting an element is @emph{not} the -same as assigning it a null value (the empty string, @code{""}). -For example: - -@example -foo[4] = "" -if (4 in foo) - print "This is printed, even though foo[4] is empty" -@end example - -@cindex lint checking, array elements -It is not an error to delete an element that does not exist. 
-However, if @option{--lint} is provided on the command line -(@pxref{Options}), -@command{gawk} issues a warning message when an element that -is not in the array is deleted. - -@cindex common extensions, @code{delete} to delete entire arrays -@cindex extensions, common@comma{} @code{delete} to delete entire arrays -@cindex arrays, deleting entire contents -@cindex deleting entire arrays -@cindex @code{delete} @var{array} -@cindex differences in @command{awk} and @command{gawk}, array elements, deleting -All the elements of an array may be deleted with a single statement -by leaving off the subscript in the @code{delete} statement, -as follows: - - -@example -delete @var{array} -@end example - -Using this version of the @code{delete} statement is about three times -more efficient than the equivalent loop that deletes each element one -at a time. - -@cindex Brian Kernighan's @command{awk} -@quotation NOTE -For many years, -using @code{delete} without a subscript was a @command{gawk} extension. -As of September, 2012, it was accepted for -inclusion into the POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=544, -the Austin Group website}. This form of the @code{delete} statement is also supported -by BWK @command{awk} and @command{mawk}, as well as -by a number of other implementations (@pxref{Other Versions}). -@end quotation - -@cindex portability, deleting array elements -@cindex Brennan, Michael -The following statement provides a portable but nonobvious way to clear -out an array:@footnote{Thanks to Michael Brennan for pointing this out.} - -@example -split("", array) -@end example - -@cindex @code{split()} function, array elements@comma{} deleting -The @code{split()} function -(@pxref{String Functions}) -clears out the target array first. This call asks it to split -apart the null string. Because there is no data to split out, the -function simply clears the array and then returns. 
- -@quotation CAUTION -Deleting an array does not change its type; you cannot -delete an array and then use the array's name as a scalar -(i.e., a regular variable). For example, the following does not work: - -@example -a[1] = 3 -delete a -a = 3 -@end example -@end quotation - @node Numeric Array Subscripts @section Using Numbers to Subscript Arrays @@ -15497,7 +15393,7 @@ since @code{"12.15"} is different from @code{"12.153"}. @cindex integer array indices According to the rules for conversions (@pxref{Conversion}), integer -values are always converted to strings as integers, no matter what the +values always convert to strings as integers, no matter what the value of @code{CONVFMT} may happen to be. So the usual case of the following works: @@ -15520,7 +15416,7 @@ and all refer to the same element! As with many things in @command{awk}, the majority of the time -things work as one would expect them to. But it is useful to have a precise +things work as you would expect them to. But it is useful to have a precise knowledge of the actual rules since they can sometimes have a subtle effect on your programs. @@ -15584,6 +15480,119 @@ Even though it is somewhat unusual, the null string if @option{--lint} is provided on the command line (@pxref{Options}). +@node Delete +@section The @code{delete} Statement +@cindex @code{delete} statement +@cindex deleting elements in arrays +@cindex arrays, elements, deleting +@cindex elements in arrays, deleting + +To remove an individual element of an array, use the @code{delete} +statement: + +@example +delete @var{array}[@var{index-expression}] +@end example + +Once an array element has been deleted, any value the element once +had is no longer available. It is as if the element had never +been referred to or been given a value. 
+The following is an example of deleting elements in an array: + +@example +for (i in frequencies) + delete frequencies[i] +@end example + +@noindent +This example removes all the elements from the array @code{frequencies}. +Once an element is deleted, a subsequent @code{for} statement to scan the array +does not report that element and the @code{in} operator to check for +the presence of that element returns zero (i.e., false): + +@example +delete foo[4] +if (4 in foo) + print "This will never be printed" +@end example + +@cindex null strings, and deleting array elements +It is important to note that deleting an element is @emph{not} the +same as assigning it a null value (the empty string, @code{""}). +For example: + +@example +foo[4] = "" +if (4 in foo) + print "This is printed, even though foo[4] is empty" +@end example + +@cindex lint checking, array elements +It is not an error to delete an element that does not exist. +However, if @option{--lint} is provided on the command line +(@pxref{Options}), +@command{gawk} issues a warning message when an element that +is not in the array is deleted. + +@cindex common extensions, @code{delete} to delete entire arrays +@cindex extensions, common@comma{} @code{delete} to delete entire arrays +@cindex arrays, deleting entire contents +@cindex deleting entire arrays +@cindex @code{delete} @var{array} +@cindex differences in @command{awk} and @command{gawk}, array elements, deleting +All the elements of an array may be deleted with a single statement +by leaving off the subscript in the @code{delete} statement, +as follows: + + +@example +delete @var{array} +@end example + +Using this version of the @code{delete} statement is about three times +more efficient than the equivalent loop that deletes each element one +at a time. + +This form of the @code{delete} statement is also supported +by BWK @command{awk} and @command{mawk}, as well as +by a number of other implementations. 
+ +@cindex Brian Kernighan's @command{awk} +@quotation NOTE +For many years, using @code{delete} without a subscript was a common +extension. In September, 2012, it was accepted for inclusion into the +POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=544, +the Austin Group website}. +@end quotation + +@cindex portability, deleting array elements +@cindex Brennan, Michael +The following statement provides a portable but nonobvious way to clear +out an array:@footnote{Thanks to Michael Brennan for pointing this out.} + +@example +split("", array) +@end example + +@cindex @code{split()} function, array elements@comma{} deleting +The @code{split()} function +(@pxref{String Functions}) +clears out the target array first. This call asks it to split +apart the null string. Because there is no data to split out, the +function simply clears the array and then returns. + +@quotation CAUTION +Deleting all the elements from an array does not change its type; you cannot +clear an array and then use the array's name as a scalar +(i.e., a regular variable). For example, the following does not work: + +@example +a[1] = 3 +delete a +a = 3 +@end example +@end quotation + @node Multidimensional @section Multidimensional Arrays @@ -15595,7 +15604,7 @@ on the command line (@pxref{Options}). @cindex arrays, multidimensional A multidimensional array is an array in which an element is identified by a sequence of indices instead of a single index. For example, a -two-dimensional array requires two indices. The usual way (in most +two-dimensional array requires two indices. The usual way (in many languages, including @command{awk}) to refer to an element of a two-dimensional array named @code{grid} is with @code{grid[@var{x},@var{y}]}. @@ -15770,8 +15779,9 @@ a[1][3][1, "name"] = "barney" Each subarray and the main array can be of different length. In fact, the elements of an array or its subarray do not all have to have the same type. 
This means that the main array and any of its subarrays can be -non-rectangular, or jagged in structure. One can assign a scalar value to -the index @code{4} of the main array @code{a}: +non-rectangular, or jagged in structure. You can assign a scalar value to +the index @code{4} of the main array @code{a}, even though @code{a[1]} +is itself an array and not a scalar: @example a[4] = "An element in a jagged array" @@ -15853,6 +15863,8 @@ for (i in array) @{ print array[i][j] @} @} + else + print array[i] @} @end example @@ -16120,8 +16132,9 @@ Often random integers are needed instead. Following is a user-defined function that can be used to obtain a random non-negative integer less than @var{n}: @example -function randint(n) @{ - return int(n * rand()) +function randint(n) +@{ + return int(n * rand()) @} @end example @@ -16141,8 +16154,7 @@ function roll(n) @{ return 1 + int(rand() * n) @} # Roll 3 six-sided dice and # print total number of points. @{ - printf("%d points\n", - roll(6)+roll(6)+roll(6)) + printf("%d points\n", roll(6) + roll(6) + roll(6)) @} @end example @@ -16231,7 +16243,7 @@ doing index calculations, particularly if you are used to C. In the following list, optional parameters are enclosed in square brackets@w{ ([ ]).} Several functions perform string substitution; the full discussion is provided in the description of the @code{sub()} function, which comes -towards the end since the list is presented in alphabetic order. +towards the end since the list is presented alphabetically. Those functions that are specific to @command{gawk} are marked with a pound sign (@samp{#}). They are not available in compatibility mode @@ -16275,6 +16287,7 @@ When comparing strings, @code{IGNORECASE} affects the sorting (@pxref{Array Sorting Functions}). If the @var{source} array contains subarrays as values (@pxref{Arrays of Arrays}), they will come last, after all scalar values. +Subarrays are @emph{not} recursively sorted. 
For example, if the contents of @code{a} are as follows: @@ -16411,7 +16424,10 @@ $ @kbd{awk 'BEGIN @{ print index("peanut", "an") @}'} @noindent If @var{find} is not found, @code{index()} returns zero. -It is a fatal error to use a regexp constant for @var{find}. +With BWK @command{awk} and @command{gawk}, +it is a fatal error to use a regexp constant for @var{find}. +Other implementations allow it, simply treating the regexp +constant as an expression meaning @samp{$0 ~ /regexp/}. @item @code{length(}[@var{string}]@code{)} @cindexawkfunc{length} @@ -16525,13 +16541,12 @@ For example: @example @c file eg/misc/findpat.awk @{ - if ($1 == "FIND") - regex = $2 - else @{ - where = match($0, regex) - if (where != 0) - print "Match of", regex, "found at", - where, "in", $0 + if ($1 == "FIND") + regex = $2 + else @{ + where = match($0, regex) + if (where != 0) + print "Match of", regex, "found at", where, "in", $0 @} @} @c endfile @@ -16627,7 +16642,7 @@ Any leading separator will be in @code{@var{seps}[0]}. The @code{patsplit()} function splits strings into pieces in a manner similar to the way input lines are split into fields using @code{FPAT} -(@pxref{Splitting By Content}. +(@pxref{Splitting By Content}). Before splitting the string, @code{patsplit()} deletes any previously existing elements in the arrays @var{array} and @var{seps}. @@ -16640,8 +16655,7 @@ and store the pieces in @var{array} and the separator strings in the @code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so forth. The string value of the third argument, @var{fieldsep}, is a regexp describing where to split @var{string} (much as @code{FS} can -be a regexp describing where to split input records; -@pxref{Regexp Field Splitting}). +be a regexp describing where to split input records). If @var{fieldsep} is omitted, the value of @code{FS} is used. @code{split()} returns the number of elements created. 
@var{seps} is a @command{gawk} extension with @code{@var{seps}[@var{i}]} @@ -16936,6 +16950,26 @@ Nonalphabetic characters are left unchanged. For example, @code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}. @end table +@sidebar Matching the Null String +@cindex matching, null strings +@cindex null strings, matching +@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching +@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching + +In @command{awk}, the @samp{*} operator can match the null string. +This is particularly important for the @code{sub()}, @code{gsub()}, +and @code{gensub()} functions. For example: + +@example +$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'} +@print{} XaXbXcX +@end example + +@noindent +Although this makes a certain amount of sense, it can be surprising. +@end sidebar + + @node Gory Details @subsubsection More About @samp{\} and @samp{&} with @code{sub()}, @code{gsub()}, and @code{gensub()} @@ -16949,7 +16983,7 @@ Nonalphabetic characters are left unchanged. For example, @cindex ampersand (@code{&}), @code{gsub()}/@code{gensub()}/@code{sub()} functions and @quotation CAUTION -This section has been known to cause headaches. +This subsubsection has been reported to cause headaches. You might want to skip it upon first reading. @end quotation @@ -17240,25 +17274,6 @@ and the special cases for @code{sub()} and @code{gsub()}, we recommend the use of @command{gawk} and @code{gensub()} when you have to do substitutions. -@sidebar Matching the Null String -@cindex matching, null strings -@cindex null strings, matching -@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching -@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching - -In @command{awk}, the @samp{*} operator can match the null string. -This is particularly important for the @code{sub()}, @code{gsub()}, -and @code{gensub()} functions. 
For example: - -@example -$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'} -@print{} XaXbXcX -@end example - -@noindent -Although this makes a certain amount of sense, it can be surprising. -@end sidebar - @node I/O Functions @subsection Input/Output Functions @cindex input/output functions @@ -17311,10 +17326,9 @@ buffers its output and the @code{fflush()} function forces @cindex extensions, common@comma{} @code{fflush()} function @cindex Brian Kernighan's @command{awk} -@code{fflush()} was added to BWK @command{awk} in -April of 1992. For two decades, it was not part of the POSIX standard. -As of December, 2012, it was accepted for inclusion into the POSIX -standard. +Brian Kernighan added @code{fflush()} to his @command{awk} in April +of 1992. For two decades, it was a common extension. In December, +2012, it was accepted for inclusion into the POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=634, the Austin Group website}. POSIX standardizes @code{fflush()} as follows: If there @@ -17583,7 +17597,7 @@ is out of range, @code{mktime()} returns @minus{}1. @cindex @command{gawk}, @code{PROCINFO} array in @cindex @code{PROCINFO} array -@item @code{strftime(} [@var{format} [@code{,} @var{timestamp} [@code{,} @var{utc-flag}] ] ]@code{)} +@item @code{strftime(}[@var{format} [@code{,} @var{timestamp} [@code{,} @var{utc-flag}] ] ]@code{)} @c STARTOFRANGE strf @cindexgawkfunc{strftime} @cindex format time string @@ -17850,7 +17864,7 @@ the string. For example: @example $ date '+Today is %A, %B %d, %Y.' -@print{} Today is Monday, May 05, 2014. +@print{} Today is Monday, September 22, 2014. @end example Here is the @command{gawk} version of the @command{date} utility. @@ -18044,17 +18058,16 @@ shows that 0's come in on the left side. For @command{gawk}, this is always true, but in some languages, it's possible to have the left side fill with 1's.} @c Purposely decided to use 0's and 1's here. 2/2001. 
-If you start over -again with @samp{10111001} and shift it left by three bits, you end up -with @samp{11001000}. -@command{gawk} provides built-in functions that implement the -bitwise operations just described. They are: +If you start over again with @samp{10111001} and shift it left by three +bits, you end up with @samp{11001000}. The following list describes +@command{gawk}'s built-in functions that implement the bitwise operations. +Optional parameters are enclosed in square brackets ([ ]): @cindex @command{gawk}, bitwise operations in @table @code @cindexgawkfunc{and} @cindex bitwise AND -@item @code{and(@var{v1}, @var{v2}} [@code{,} @dots{}]@code{)} +@item @code{and(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)} Return the bitwise AND of the arguments. There must be at least two. @cindexgawkfunc{compl} @@ -18069,7 +18082,7 @@ Return the value of @var{val}, shifted left by @var{count} bits. @cindexgawkfunc{or} @cindex bitwise OR -@item @code{or(@var{v1}, @var{v2}} [@code{,} @dots{}]@code{)} +@item @code{or(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)} Return the bitwise OR of the arguments. There must be at least two. @cindexgawkfunc{rshift} @@ -18079,7 +18092,7 @@ Return the value of @var{val}, shifted right by @var{count} bits. @cindexgawkfunc{xor} @cindex bitwise XOR -@item @code{xor(@var{v1}, @var{v2}} [@code{,} @dots{}]@code{)} +@item @code{xor(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)} Return the bitwise XOR of the arguments. There must be at least two. @end table @@ -18202,7 +18215,7 @@ results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions. @command{gawk} provides a single function that lets you distinguish an array from a scalar variable. This is necessary for writing code -that traverses every element of an array of arrays. +that traverses every element of an array of arrays (@pxref{Arrays of Arrays}). @table @code @@ -18218,12 +18231,14 @@ an array or not. 
The second is inside the body of a user-defined function (not discussed yet; @pxref{User-defined}), to test if a parameter is an array or not. -Note, however, that using @code{isarray()} at the global level to test +@quotation NOTE +Using @code{isarray()} at the global level to test variables makes no sense. Since you are the one writing the program, you are supposed to know if your variables are arrays or not. And in fact, due to the way @command{gawk} works, if you pass the name of a variable that has not been previously used to @code{isarray()}, @command{gawk} -will end up turning it into a scalar. +ends up turning it into a scalar. +@end quotation @node I18N Functions @subsection String-Translation Functions @@ -18484,7 +18499,7 @@ extra whitespace signifies the start of the local variable list): function delarray(a, i) @{ for (i in a) - delete a[i] + delete a[i] @} @end example @@ -18495,7 +18510,7 @@ Instead of having to repeat this loop everywhere that you need to clear out an array, your program can just call @code{delarray}. (This guarantees portability. The use of @samp{delete @var{array}} to delete -the contents of an entire array is a recent@footnote{Late in 2012.} +the contents of an entire array is a relatively recent@footnote{Late in 2012.} addition to the POSIX standard.) The following is an example of a recursive function. It takes a string @@ -18525,7 +18540,7 @@ $ @kbd{echo "Don't Panic!" |} @print{} !cinaP t'noD @end example -The C @code{ctime()} function takes a timestamp and returns it in a string, +The C @code{ctime()} function takes a timestamp and returns it as a string, formatted in a well-known fashion. 
The following example uses the built-in @code{strftime()} function (@pxref{Time Functions}) @@ -18540,13 +18555,19 @@ to create an @command{awk} version of @code{ctime()}: function ctime(ts, format) @{ - format = PROCINFO["strftime"] + format = "%a %b %e %H:%M:%S %Z %Y" + if (ts == 0) ts = systime() # use current time as default return strftime(format, ts) @} @c endfile @end example + +You might think that @code{ctime()} could use @code{PROCINFO["strftime"]} +for its format string. That would be a mistake, since @code{ctime()} is +supposed to return the time formatted in a standard fashion, and user-level +code could have changed @code{PROCINFO["strftime"]}. @c ENDOFRANGE fdef @node Function Caveats @@ -19195,7 +19216,7 @@ function quicksort(data, left, right, less_than, i, last) # quicksort_swap --- helper function for quicksort, should really be inline -function quicksort_swap(data, i, j, temp) +function quicksort_swap(data, i, j, temp) @{ temp = data[i] data[i] = data[j] @@ -19346,10 +19367,11 @@ functions. @item POSIX @command{awk} provides three kinds of built-in functions: numeric, -string, and I/O. @command{gawk} provides functions that work with values -representing time, do bit manipulation, sort arrays, and internationalize -and localize programs. @command{gawk} also provides several extensions to -some of standard functions, typically in the form of additional arguments. +string, and I/O. @command{gawk} provides functions that sort arrays, work +with values representing time, do bit manipulation, determine variable +type (array vs.@: scalar), and internationalize and localize programs. +@command{gawk} also provides several extensions to some of standard +functions, typically in the form of additional arguments. @item Functions accept zero or more arguments and return a value. 
The @@ -19600,8 +19622,9 @@ are very difficult to track down: function lib_func(x, y, l1, l2) @{ @dots{} - @var{use variable} some_var # some_var should be local - @dots{} # but is not by oversight + # some_var should be local but by oversight is not + @var{use variable} some_var + @dots{} @} @end example @@ -19712,7 +19735,7 @@ function mystrtonum(str, ret, n, i, k, c) # a[5] = "123.45" # a[6] = "1.e3" # a[7] = "1.32" -# a[7] = "1.32E2" +# a[8] = "1.32E2" # # for (i = 1; i in a; i++) # print a[i], strtonum(a[i]), mystrtonum(a[i]) @@ -19723,9 +19746,12 @@ function mystrtonum(str, ret, n, i, k, c) The function first looks for C-style octal numbers (base 8). If the input string matches a regular expression describing octal numbers, then @code{mystrtonum()} loops through each character in the -string. It sets @code{k} to the index in @code{"01234567"} of the current -octal digit. Since the return value is one-based, the @samp{k--} -adjusts @code{k} so it can be used in computing the return value. +string. It sets @code{k} to the index in @code{"1234567"} of the current +octal digit. +The return value will either be the same number as the digit, or zero +if the character is not there, which will be true for a @samp{0}. +This is safe, since the regexp test in the @code{if} ensures that +only octal values are converted. Similar logic applies to the code that checks for and converts a hexadecimal value, which starts with @samp{0x} or @samp{0X}. @@ -19758,7 +19784,7 @@ that a condition or set of conditions is true. Before proceeding with a particular computation, you make a statement about what you believe to be the case. Such a statement is known as an @dfn{assertion}. The C language provides an @code{<assert.h>} header file -and corresponding @code{assert()} macro that the programmer can use to make +and corresponding @code{assert()} macro that a programmer can use to make assertions. 
If an assertion fails, the @code{assert()} macro arranges to print a diagnostic message describing the condition that should have been true but was not, and then it kills the program. In C, using @@ -20228,7 +20254,7 @@ function getlocaltime(time, ret, now, i) now = systime() # return date(1)-style output - ret = strftime(PROCINFO["strftime"], now) + ret = strftime("%a %b %e %H:%M:%S %Z %Y", now) # clear out target array delete time @@ -20343,6 +20369,9 @@ if (length(contents) == 0) This tests the result to see if it is empty or not. An equivalent test would be @samp{contents == ""}. +@xref{Extension Sample Readfile}, for an extension function that +also reads an entire file into memory. + @node Data File Management @section @value{DDF} Management @@ -20400,15 +20429,14 @@ Besides solving the problem in only nine(!) lines of code, it does so @c # Arnold Robbins, arnold@@skeeve.com, Public Domain @c # January 1992 -FILENAME != _oldfilename \ -@{ +FILENAME != _oldfilename @{ if (_oldfilename != "") endfile(_oldfilename) _oldfilename = FILENAME beginfile(FILENAME) @} -END @{ endfile(FILENAME) @} +END @{ endfile(FILENAME) @} @end example This file must be loaded before the user's ``main'' program, so that the @@ -20461,7 +20489,7 @@ FNR == 1 @{ beginfile(FILENAME) @} -END @{ endfile(_filename_) @} +END @{ endfile(_filename_) @} @c endfile @end example |