Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 549 |
1 files changed, 295 insertions, 254 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index b418d4cf..74dd35f8 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -526,10 +526,10 @@ particular records in a file and perform operations upon them.
 * Escape Sequences:: How to write nonprinting characters.
 * Regexp Operators:: Regular Expression Operators.
 * Bracket Expressions:: What can go between @samp{[...]}.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
 * Leftmost Longest:: How much text matches.
 * Computed Regexps:: Using Dynamic Regexps.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
 * Regexp Summary:: Regular expressions summary.
 
 * Records:: Controlling how data is split into records.
@@ -1774,6 +1774,7 @@ They also appear in the index under the heading ``dark corner.''
 As noted by the opening quote, though, any
 coverage of dark corners is, by definition, incomplete.
 
+@cindex c.e., See common extensions
 Extensions to the standard @command{awk} language that are supported by
 more than one @command{awk} implementation are marked
 @ifclear FOR_PRINT
@@ -2341,24 +2342,19 @@ For example, on OS/2, it is @kbd{Ctrl-z}.)
 As an example, the following program prints a friendly piece of advice
 (from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}),
 to keep you from worrying about the complexities of computer
-programming (@code{BEGIN} is a feature we haven't discussed yet):
+programming:
 
 @example
-$ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"}
+$ @kbd{awk "BEGIN @{ print "Don\47t Panic!" @}"}
 @print{} Don't Panic!
 @end example
 
-@cindex shell quoting, double quote
-@cindex double quote (@code{"}) in shell commands
-@cindex @code{"} (double quote) in shell commands
-@cindex @code{\} (backslash) in shell commands
-@cindex backslash (@code{\}) in shell commands
-This program does not read any input. The @samp{\} before each of the
-inner double quotes is necessary because of the shell's quoting
-rules---in particular because it mixes both single quotes and
-double quotes.@footnote{Although we generally recommend the use of single
-quotes around the program text, double quotes are needed here in order to
-put the single quote into the message.}
+@command{awk} executes statements associated with @code{BEGIN} before
+reading any input. If there are no other statements in your program,
+as is the case here, @command{awk} just stops, instead of trying to read
+input it doesn't know how to process.
+The @samp{\47} is a magic way of getting a single quote into
+the program, without having to engage in ugly shell quoting tricks.
 
 @quotation NOTE
 As a side note, if you use Bash as your shell, you should execute the
@@ -3046,6 +3042,9 @@ awk '@{ if (length($0) > max) max = length($0) @}
     END @{ print max @}' data
 @end example
 
+The code associated with @code{END} executes after all
+input has been read; it's the other side of the coin to @code{BEGIN}.
+
 @cindex @command{expand} utility
 @item
 Print the length of the longest line in @file{data}:
@@ -4130,6 +4129,11 @@ included.
 As each element of @code{ARGV} is processed, @command{gawk}
 sets the variable @code{ARGIND} to the index in @code{ARGV} of the current element.
 
+@c FIXME: One day, move the ARGC and ARGV node closer to here.
+Changing @code{ARGC} and @code{ARGV} in your @command{awk} program lets
+you control how @command{awk} processes the input files; this is described
+in more detail in @ref{ARGC and ARGV}.
+
 @cindex input files, variable assignments and
 @cindex variable assignments and input files
 The distinction between @value{FN} arguments and variable-assignment
@@ -4765,10 +4769,10 @@ regular expressions work, we present more complicated instances.
 * Escape Sequences:: How to write nonprinting characters.
 * Regexp Operators:: Regular Expression Operators.
 * Bracket Expressions:: What can go between @samp{[...]}.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
 * Leftmost Longest:: How much text matches.
 * Computed Regexps:: Using Dynamic Regexps.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
 * Regexp Summary:: Regular expressions summary.
 @end menu
 
@@ -4979,8 +4983,11 @@ However, using more than two hexadecimal digits produces
 @item \/
 A literal slash (necessary for regexp constants only).
 This sequence is used when you want to write a regexp
-constant that contains a slash. Because the regexp is delimited by
-slashes, you need to escape the slash that is part of the pattern,
+constant that contains a slash
+(such as @code{/.*:\/home\/[[:alnum:]]+:.*/}; the @samp{[[:alnum:]]}
+notation is discussed shortly, in @ref{Bracket Expressions}).
+Because the regexp is delimited by
+slashes, you need to escape any slash that is part of the pattern,
 in order to tell @command{awk} to keep processing the rest of the regexp.
 
 @cindex @code{\} (backslash), @code{\"} escape sequence
@@ -4988,8 +4995,10 @@ in order to tell @command{awk} to keep processing the rest of the regexp.
 @item \"
 A literal double quote (necessary for string constants only).
 This sequence is used when you want to write a string
-constant that contains a double quote. Because the string is delimited by
-double quotes, you need to escape the quote that is part of the string,
+constant that contains a double quote
+(such as @code{"He said \"hi!\" to her."}).
+Because the string is delimited by
+double quotes, you need to escape any quote that is part of the string,
 in order to tell @command{awk} to keep processing the rest of the string.
 @end table
 
@@ -5550,6 +5559,204 @@ they do not recognize collating symbols or equivalence classes.
 @c maybe one day ...
 @c ENDOFRANGE charlist
 
+@node Leftmost Longest
+@section How Much Text Matches?
+
+@cindex regular expressions, leftmost longest match
+@c @cindex matching, leftmost longest
+Consider the following:
+
+@example
+echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
+@end example
+
+This example uses the @code{sub()} function (which we haven't discussed yet;
+@pxref{String Functions})
+to make a change to the input record. Here, the regexp @code{/a+/}
+indicates ``one or more @samp{a} characters,'' and the replacement
+text is @samp{<A>}.
+
+The input contains four @samp{a} characters.
+@command{awk} (and POSIX) regular expressions always match
+the leftmost, @emph{longest} sequence of input characters that can
+match. Thus, all four @samp{a} characters are
+replaced with @samp{<A>} in this example:
+
+@example
+$ @kbd{echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'}
+@print{} <A>bcd
+@end example
+
+For simple match/no-match tests, this is not so important. But when doing
+text matching and substitutions with the @code{match()}, @code{sub()}, @code{gsub()},
+and @code{gensub()} functions, it is very important.
+@ifinfo
+@xref{String Functions},
+for more information on these functions.
+@end ifinfo
+Understanding this principle is also important for regexp-based record
+and field splitting (@pxref{Records},
+and also @pxref{Field Separators}).
+
+@node Computed Regexps
+@section Using Dynamic Regexps
+
+@c STARTOFRANGE dregexp
+@cindex regular expressions, computed
+@c STARTOFRANGE regexpd
+@cindex regular expressions, dynamic
+@cindex @code{~} (tilde), @code{~} operator
+@cindex tilde (@code{~}), @code{~} operator
+@cindex @code{!} (exclamation point), @code{!~} operator
+@cindex exclamation point (@code{!}), @code{!~} operator
+@c @cindex operators, @code{~}
+@c @cindex operators, @code{!~}
+The righthand side of a @samp{~} or @samp{!~} operator need not be a
+regexp constant (i.e., a string of characters between slashes). It may
+be any expression. The expression is evaluated and converted to a string
+if necessary; the contents of the string are then used as the
+regexp. A regexp computed in this way is called a @dfn{dynamic
+regexp} or a @dfn{computed regexp}:
+
+@example
+BEGIN @{ digits_regexp = "[[:digit:]]+" @}
+$0 ~ digits_regexp @{ print @}
+@end example
+
+@noindent
+This sets @code{digits_regexp} to a regexp that describes one or more digits,
+and tests whether the input record matches this regexp.
+
+@quotation NOTE
+When using the @samp{~} and @samp{!~}
+operators, there is a difference between a regexp constant
+enclosed in slashes and a string constant enclosed in double quotes.
+If you are going to use a string constant, you have to understand that
+the string is, in essence, scanned @emph{twice}: the first time when
+@command{awk} reads your program, and the second time when it goes to
+match the string on the lefthand side of the operator with the pattern
+on the right. This is true of any string-valued expression (such as
+@code{digits_regexp}, shown previously), not just string constants.
+@end quotation
+
+@cindex regexp constants, slashes vs.@: quotes
+@cindex @code{\} (backslash), in regexp constants
+@cindex backslash (@code{\}), in regexp constants
+@cindex @code{"} (double quote), in regexp constants
+@cindex double quote (@code{"}), in regexp constants
+What difference does it make if the string is
+scanned twice? The answer has to do with escape sequences, and particularly
+with backslashes. To get a backslash into a regular expression inside a
+string, you have to type two backslashes.
+
+For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
+Only one backslash is needed. To do the same thing with a string,
+you have to type @code{"\\*"}. The first backslash escapes the
+second one so that the string actually contains the
+two characters @samp{\} and @samp{*}.
+
+@cindex troubleshooting, regexp constants vs.@: string constants
+@cindex regexp constants, vs.@: string constants
+@cindex string constants, vs.@: regexp constants
+Given that you can use both regexp and string constants to describe
+regular expressions, which should you use? The answer is ``regexp
+constants,'' for several reasons:
+
+@itemize @value{BULLET}
+@item
+String constants are more complicated to write and
+more difficult to read. Using regexp constants makes your programs
+less error-prone. Not understanding the difference between the two
+kinds of constants is a common source of errors.
+
+@item
+It is more efficient to use regexp constants. @command{awk} can note
+that you have supplied a regexp and store it internally in a form that
+makes pattern matching more efficient. When using a string constant,
+@command{awk} must first convert the string into this internal form and
+then perform the pattern matching.
+
+@item
+Using regexp constants is better form; it shows clearly that you
+intend a regexp match.
+@end itemize
+
+@cindex sidebar, Using @code{\n} in Bracket Expressions of Dynamic Regexps
+@ifdocbook
+@docbook
+<sidebar><title>Using @code{\n} in Bracket Expressions of Dynamic Regexps</title>
+@end docbook
+
+@cindex regular expressions, dynamic, with embedded newlines
+@cindex newlines, in dynamic regexps
+
+Some versions of @command{awk} do not allow the newline
+character to be used inside a bracket expression for a dynamic regexp:
+
+@example
+$ @kbd{awk '$0 ~ "[ \t\n]"'}
+@error{} awk: newline in character class [
+@error{} ]...
+@error{} source line number 1
+@error{} context is
+@error{} >>> <<<
+@end example
+
+@cindex newlines, in regexp constants
+But a newline in a regexp constant works with no problem:
+
+@example
+$ @kbd{awk '$0 ~ /[ \t\n]/'}
+@kbd{here is a sample line}
+@print{} here is a sample line
+@kbd{Ctrl-d}
+@end example
+
+@command{gawk} does not have this problem, and it isn't likely to
+occur often in practice, but it's worth noting for future reference.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Using @code{\n} in Bracket Expressions of Dynamic Regexps}
+
+
+@cindex regular expressions, dynamic, with embedded newlines
+@cindex newlines, in dynamic regexps
+
+Some versions of @command{awk} do not allow the newline
+character to be used inside a bracket expression for a dynamic regexp:
+
+@example
+$ @kbd{awk '$0 ~ "[ \t\n]"'}
+@error{} awk: newline in character class [
+@error{} ]...
+@error{} source line number 1
+@error{} context is
+@error{} >>> <<<
+@end example
+
+@cindex newlines, in regexp constants
+But a newline in a regexp constant works with no problem:
+
+@example
+$ @kbd{awk '$0 ~ /[ \t\n]/'}
+@kbd{here is a sample line}
+@print{} here is a sample line
+@kbd{Ctrl-d}
+@end example
+
+@command{gawk} does not have this problem, and it isn't likely to
+occur often in practice, but it's worth noting for future reference.
+@end cartouche
+@end ifnotdocbook
+@c ENDOFRANGE dregexp
+@c ENDOFRANGE regexpd
+
 @node GNU Regexp Operators
 @section @command{gawk}-Specific Regexp Operators
 
@@ -5825,204 +6032,6 @@ Case is always significant in compatibility mode.
 @c ENDOFRANGE csregexp
 @c ENDOFRANGE regexpcs
 
-@node Leftmost Longest
-@section How Much Text Matches?
-
-@cindex regular expressions, leftmost longest match
-@c @cindex matching, leftmost longest
-Consider the following:
-
-@example
-echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
-@end example
-
-This example uses the @code{sub()} function (which we haven't discussed yet;
-@pxref{String Functions})
-to make a change to the input record. Here, the regexp @code{/a+/}
-indicates ``one or more @samp{a} characters,'' and the replacement
-text is @samp{<A>}.
-
-The input contains four @samp{a} characters.
-@command{awk} (and POSIX) regular expressions always match
-the leftmost, @emph{longest} sequence of input characters that can
-match. Thus, all four @samp{a} characters are
-replaced with @samp{<A>} in this example:
-
-@example
-$ @kbd{echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'}
-@print{} <A>bcd
-@end example
-
-For simple match/no-match tests, this is not so important. But when doing
-text matching and substitutions with the @code{match()}, @code{sub()}, @code{gsub()},
-and @code{gensub()} functions, it is very important.
-@ifinfo
-@xref{String Functions},
-for more information on these functions.
-@end ifinfo
-Understanding this principle is also important for regexp-based record
-and field splitting (@pxref{Records},
-and also @pxref{Field Separators}).
-
-@node Computed Regexps
-@section Using Dynamic Regexps
-
-@c STARTOFRANGE dregexp
-@cindex regular expressions, computed
-@c STARTOFRANGE regexpd
-@cindex regular expressions, dynamic
-@cindex @code{~} (tilde), @code{~} operator
-@cindex tilde (@code{~}), @code{~} operator
-@cindex @code{!} (exclamation point), @code{!~} operator
-@cindex exclamation point (@code{!}), @code{!~} operator
-@c @cindex operators, @code{~}
-@c @cindex operators, @code{!~}
-The righthand side of a @samp{~} or @samp{!~} operator need not be a
-regexp constant (i.e., a string of characters between slashes). It may
-be any expression. The expression is evaluated and converted to a string
-if necessary; the contents of the string are then used as the
-regexp. A regexp computed in this way is called a @dfn{dynamic
-regexp} or a @dfn{computed regexp}:
-
-@example
-BEGIN @{ digits_regexp = "[[:digit:]]+" @}
-$0 ~ digits_regexp @{ print @}
-@end example
-
-@noindent
-This sets @code{digits_regexp} to a regexp that describes one or more digits,
-and tests whether the input record matches this regexp.
-
-@quotation NOTE
-When using the @samp{~} and @samp{!~}
-operators, there is a difference between a regexp constant
-enclosed in slashes and a string constant enclosed in double quotes.
-If you are going to use a string constant, you have to understand that
-the string is, in essence, scanned @emph{twice}: the first time when
-@command{awk} reads your program, and the second time when it goes to
-match the string on the lefthand side of the operator with the pattern
-on the right. This is true of any string-valued expression (such as
-@code{digits_regexp}, shown previously), not just string constants.
-@end quotation
-
-@cindex regexp constants, slashes vs.@: quotes
-@cindex @code{\} (backslash), in regexp constants
-@cindex backslash (@code{\}), in regexp constants
-@cindex @code{"} (double quote), in regexp constants
-@cindex double quote (@code{"}), in regexp constants
-What difference does it make if the string is
-scanned twice? The answer has to do with escape sequences, and particularly
-with backslashes. To get a backslash into a regular expression inside a
-string, you have to type two backslashes.
-
-For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
-Only one backslash is needed. To do the same thing with a string,
-you have to type @code{"\\*"}. The first backslash escapes the
-second one so that the string actually contains the
-two characters @samp{\} and @samp{*}.
-
-@cindex troubleshooting, regexp constants vs.@: string constants
-@cindex regexp constants, vs.@: string constants
-@cindex string constants, vs.@: regexp constants
-Given that you can use both regexp and string constants to describe
-regular expressions, which should you use? The answer is ``regexp
-constants,'' for several reasons:
-
-@itemize @value{BULLET}
-@item
-String constants are more complicated to write and
-more difficult to read. Using regexp constants makes your programs
-less error-prone. Not understanding the difference between the two
-kinds of constants is a common source of errors.
-
-@item
-It is more efficient to use regexp constants. @command{awk} can note
-that you have supplied a regexp and store it internally in a form that
-makes pattern matching more efficient. When using a string constant,
-@command{awk} must first convert the string into this internal form and
-then perform the pattern matching.
-
-@item
-Using regexp constants is better form; it shows clearly that you
-intend a regexp match.
-@end itemize
-
-@cindex sidebar, Using @code{\n} in Bracket Expressions of Dynamic Regexps
-@ifdocbook
-@docbook
-<sidebar><title>Using @code{\n} in Bracket Expressions of Dynamic Regexps</title>
-@end docbook
-
-@cindex regular expressions, dynamic, with embedded newlines
-@cindex newlines, in dynamic regexps
-
-Some versions of @command{awk} do not allow the newline
-character to be used inside a bracket expression for a dynamic regexp:
-
-@example
-$ @kbd{awk '$0 ~ "[ \t\n]"'}
-@error{} awk: newline in character class [
-@error{} ]...
-@error{} source line number 1
-@error{} context is
-@error{} >>> <<<
-@end example
-
-@cindex newlines, in regexp constants
-But a newline in a regexp constant works with no problem:
-
-@example
-$ @kbd{awk '$0 ~ /[ \t\n]/'}
-@kbd{here is a sample line}
-@print{} here is a sample line
-@kbd{Ctrl-d}
-@end example
-
-@command{gawk} does not have this problem, and it isn't likely to
-occur often in practice, but it's worth noting for future reference.
-
-@docbook
-</sidebar>
-@end docbook
-@end ifdocbook
-
-@ifnotdocbook
-@cartouche
-@center @b{Using @code{\n} in Bracket Expressions of Dynamic Regexps}
-
-
-@cindex regular expressions, dynamic, with embedded newlines
-@cindex newlines, in dynamic regexps
-
-Some versions of @command{awk} do not allow the newline
-character to be used inside a bracket expression for a dynamic regexp:
-
-@example
-$ @kbd{awk '$0 ~ "[ \t\n]"'}
-@error{} awk: newline in character class [
-@error{} ]...
-@error{} source line number 1
-@error{} context is
-@error{} >>> <<<
-@end example
-
-@cindex newlines, in regexp constants
-But a newline in a regexp constant works with no problem:
-
-@example
-$ @kbd{awk '$0 ~ /[ \t\n]/'}
-@kbd{here is a sample line}
-@print{} here is a sample line
-@kbd{Ctrl-d}
-@end example
-
-@command{gawk} does not have this problem, and it isn't likely to
-occur often in practice, but it's worth noting for future reference.
-@end cartouche
-@end ifnotdocbook
-@c ENDOFRANGE dregexp
-@c ENDOFRANGE regexpd
-
 @node Regexp Summary
 @section Summary
 
@@ -7965,32 +7974,48 @@ finished processing the current record, but want to do some special
 processing on the next record @emph{right now}. For example:
 
 @example
+# Remove text between /* and */, inclusive
 @{
-    if ((t = index($0, "/*")) != 0) @{
-        # value of `tmp' will be "" if t is 1
-        tmp = substr($0, 1, t - 1)
-        u = index(substr($0, t + 2), "*/")
-        offset = t + 2
-        while (u == 0) @{
-            if (getline <= 0) @{
+    if ((i = index($0, "/*")) != 0) @{
+        out = substr($0, 1, i - 1) # leading part of the string
+        rest = substr($0, i + 2) # ... */ ...
+        j = index(rest, "*/") # is */ in trailing part?
+        if (j > 0) @{
+            rest = substr(rest, j + 2) # remove comment
+        @} else @{
+            while (j == 0) @{
+                # get more text
+                if (getline <= 0) @{
                     m = "unexpected EOF or error"
                     m = (m ": " ERRNO)
                     print m > "/dev/stderr"
                     exit
-            @}
-            u = index($0, "*/")
-            offset = 0
-        @}
-        # substr() expression will be "" if */
-        # occurred at end of line
-        $0 = tmp substr($0, offset + u + 2)
-    @}
-    print $0
+                @}
+                # build up the line using string concatenation
+                rest = rest $0
+                j = index(rest, "*/") # is */ in trailing part?
+                if (j != 0) @{
+                    rest = substr(rest, j + 2)
+                    break
+                @}
+            @}
+        @}
+        # build up the output line using string concatenation
+        $0 = out rest
+    @}
+    print $0
 @}
 @end example
 
 This @command{awk} program deletes C-style comments (@samp{/* @dots{}
-*/}) from the input. By replacing the @samp{print $0} with other
+*/}) from the input.
+It uses a number of features we haven't covered yet, including
+string concatenation
+(@pxref{Concatenation})
+and the @code{index()} and @code{substr()} built-in
+functions
+(@pxref{String Functions}).
+By replacing the @samp{print $0} with other
 statements, you could perform more complicated processing on the
 decommented input, such as searching for matches of a regular
 expression. (This program has a subtle problem---it does not work if one
@@ -8681,7 +8706,7 @@ including abstentions, for each item.
 comments (@samp{/* @dots{} */}) from the input. That program
 does not work if one comment ends on one line and another one
 starts later on the same line.
-Write a program that does handle multiple comments on the line.
+That can be fixed by making one simple change. What is it?
 
 @end enumerate
 @c EXCLUDE END
@@ -10511,7 +10536,8 @@ A regexp constant is a regular expression description enclosed in
 slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in
 @command{awk} programs are constant, but the @samp{~} and @samp{!~}
 matching operators can also match computed or dynamic regexps
-(which are just ordinary strings or variables that contain a regexp).
+(which are typically just ordinary strings or variables that contain a regexp,
+but could be a more complex expression).
 @c ENDOFRANGE cnst
 
 @node Using Constant Regexps
@@ -12302,7 +12328,7 @@ program is one way to print lines in between special bracketing lines:
 
 @example
 $1 == "START" @{ interested = ! interested; next @}
-interested == 1 @{ print @}
+interested @{ print @}
 $1 == "END" @{ interested = ! interested; next @}
 @end example
 
@@ -12322,6 +12348,16 @@ bogus input data, but the point is to illustrate the use of `!',
 so we'll leave well enough alone.
 @end ignore
 
+Most commonly, the @samp{!} operator is used in the conditions of
+@code{if} and @code{while} statements, where it often makes more
+sense to phrase the logic in the negative:
+
+@example
+if (! @var{some condition} || @var{some other condition}) @{
+    @var{@dots{} do whatever processing @dots{}}
+@}
+@end example
+
 @cindex @code{next} statement
 @quotation NOTE
 The @code{next} statement is discussed in
@@ -14114,7 +14150,8 @@ starts over with the first rule in the program.
 If the @code{nextfile} statement causes the end of the input to be reached,
 then the code in any @code{END} rules is executed. An exception to this is
 when @code{nextfile} is invoked during execution of any statement in an
-@code{END} rule; In this case, it causes the program to stop immediately. @xref{BEGIN/END}.
+@code{END} rule; in this case, it causes the program to stop immediately.
+@xref{BEGIN/END}.
 
 The @code{nextfile} statement is useful when there are many @value{DF}s
 to process but it isn't necessary to process every record in every file.
@@ -14124,13 +14161,10 @@ would have to continue scanning the unwanted records.
 The @code{nextfile} statement accomplishes this much more efficiently.
 
 In @command{gawk}, execution of @code{nextfile} causes additional things
-to happen:
-any @code{ENDFILE} rules are executed except in the case as
-mentioned below,
-@code{ARGIND} is incremented,
-and
-any @code{BEGINFILE} rules are executed.
-(@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.)
+to happen: any @code{ENDFILE} rules are executed if @command{gawk} is
+not currently in an @code{END} or @code{BEGINFILE} rule, @code{ARGIND} is
+incremented, and any @code{BEGINFILE} rules are executed. (@code{ARGIND}
+hasn't been introduced yet. @xref{Built-in Variables}.)
 
 With @command{gawk}, @code{nextfile} is useful inside a @code{BEGINFILE}
 rule to skip over a file that would otherwise cause @command{gawk}
@@ -16152,7 +16186,7 @@ $ @kbd{echo 'line 1}
 > @kbd{line 2}
 > @kbd{line 3' | awk '@{ l[lines] = $0; ++lines @}}
 > @kbd{END @{}
-> @kbd{for (i = lines-1; i >= 0; --i)}
+> @kbd{for (i = lines - 1; i >= 0; i--)}
 > @kbd{print l[i]}
 > @kbd{@}'}
 @print{} line 3
@@ -16176,7 +16210,7 @@ The following version of the program works correctly:
 @example
 @{ l[lines++] = $0 @}
 END @{
-    for (i = lines - 1; i >= 0; --i)
+    for (i = lines - 1; i >= 0; i--)
         print l[i]
 @}
 @end example
@@ -20455,8 +20489,9 @@ function mystrtonum(str, ret, n, i, k, c)
         ret = 0
         for (i = 1; i <= n; i++) @{
             c = substr(str, i, 1)
-            if ((k = index("01234567", c)) > 0)
-                k-- # adjust for 1-basing in awk
+            # index() returns 0 if c not in string,
+            # includes c == "0"
+            k = index("1234567", c)
 
             ret = ret * 8 + k
         @}
@@ -20468,6 +20503,8 @@ function mystrtonum(str, ret, n, i, k, c)
         for (i = 1; i <= n; i++) @{
             c = substr(str, i, 1)
             c = tolower(c)
+            # index() returns 0 if c not in string,
+            # includes c == "0"
             k = index("123456789abcdef", c)
 
             ret = ret * 16 + k
@@ -21070,7 +21107,12 @@ function readfile(file, tmp, contents)
 
 This function reads from @code{file} one record at a time, building
 up the full contents of the file in the local variable @code{contents}.
-It works, but is not necessarily efficient.
+It works, but is not necessarily
+@c 8/2014. Thanks to BWK for pointing this out:
+efficient.@footnote{Execution time grows quadratically in the size of
+the input; for each record, @command{awk} has to allocate a bigger
+internal buffer for @code{contents}, copy the old contents into it,
+and then append the contents of the new record.}
 
 The following function, based on a suggestion by Denis
 Shirokov, reads the entire contents of the named file in one shot:
@@ -21743,8 +21785,7 @@ it is not an option, and it ends option processing. Continuing on:
         i = index(options, thisopt)
         if (i == 0) @{
             if (Opterr)
-                printf("%c -- invalid option\n",
-                                      thisopt) > "/dev/stderr"
+                printf("%c -- invalid option\n", thisopt) > "/dev/stderr"
             if (_opti >= length(argv[Optind])) @{
                 Optind++
                 _opti = 0
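The rewritten comment-removal rule in the @@ -7965 hunk is easier to trace in a direct port. The sketch below is a hypothetical Python translation, not code from the manual. It assumes that awk's 1-based index() and substr() map onto str.find() + 1 and slicing, and it models getline as pulling the next record from an iterator. Like the awk rule, it removes only the first /* ... */ in a record, which is the "subtle problem" the exercise in the @@ -8681 hunk refers to.

```python
# Hypothetical Python port of the revised awk comment-removal rule.
# index() is emulated as str.find() + 1 (1-based, 0 if absent);
# substr(s, m) is emulated as s[m - 1:]; getline as next(iterator).
from typing import Iterable, List


def strip_comments(records: Iterable[str]) -> List[str]:
    """Remove the first /* ... */ from each record, consuming further
    records (as getline does) until the closing */ is found."""
    result = []
    it = iter(records)
    for record in it:
        i = record.find("/*") + 1          # i = index($0, "/*")
        if i != 0:
            out = record[: i - 1]          # substr($0, 1, i - 1)
            rest = record[i + 1 :]         # substr($0, i + 2)
            j = rest.find("*/") + 1        # is */ in trailing part?
            if j > 0:
                rest = rest[j + 1 :]       # substr(rest, j + 2)
            else:
                while j == 0:
                    # getline: consume the next record, or fail at EOF
                    nxt = next(it, None)
                    if nxt is None:
                        raise EOFError("unexpected EOF or error")
                    rest = rest + nxt      # rest = rest $0
                    j = rest.find("*/") + 1
                    if j != 0:
                        rest = rest[j + 1 :]
                        break
            record = out + rest            # $0 = out rest
        result.append(record)
    return result


if __name__ == "__main__":
    print(strip_comments(["a /* x */ b", "x /* start", "end */ y"]))
```

Feeding it a record that holds two comments, such as `a /* x */ b /* y */ c`, leaves the second comment in place, just as the awk version does.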
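The two mystrtonum() hunks remove the explicit `k--` adjustment by leaving "0" out of the lookup string, so index() returns 0 both for "0" and for any character not in the string. One way to check that the rewrite preserves behavior is to compare the old and new digit lookups character by character. The following is a minimal Python sketch, not part of gawk. It assumes that awk's 1-based index(s, t) can be emulated for a single character t as s.find(t) + 1, and all function names are hypothetical.

```python
# Hypothetical Python mirror of the digit lookups in the mystrtonum()
# example; not part of gawk itself.


def awk_index(s: str, t: str) -> int:
    """Emulate awk's 1-based index(s, t) for a single character t:
    the position of t in s, or 0 if t does not occur."""
    return s.find(t) + 1


def old_octal_digit(c: str) -> int:
    # Pre-change logic: look c up in "01234567", then undo the 1-basing.
    k = awk_index("01234567", c)
    if k > 0:
        k -= 1
    return k


def new_octal_digit(c: str) -> int:
    # Post-change logic: "0" is deliberately absent from the table, so
    # the lookup yields 0 both for "0" and for a character not in it.
    return awk_index("1234567", c)


def new_hex_digit(c: str) -> int:
    # Post-change hex logic: lowercase first, then look up without "0".
    return awk_index("123456789abcdef", c.lower())


def octal_value(digits: str) -> int:
    # Accumulate as the awk loop does: ret = ret * 8 + k.
    ret = 0
    for c in digits:
        ret = ret * 8 + new_octal_digit(c)
    return ret


def hex_value(digits: str) -> int:
    # Accumulate as the awk loop does: ret = ret * 16 + k.
    ret = 0
    for c in digits:
        ret = ret * 16 + new_hex_digit(c)
    return ret


if __name__ == "__main__":
    # Old and new octal lookups agree on every character tried here.
    for ch in "0123456789abcxyz":
        assert old_octal_digit(ch) == new_octal_digit(ch), ch
    print(octal_value("0777"), hex_value("ff"))  # prints: 511 255
```

The agreement holds because the old table shifted every digit up by one and then subtracted it back, while the new table simply starts at "1".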