Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 969
1 files changed, 919 insertions, 50 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 94de0af8..b579592e 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -1,3 +1,8 @@
+% *****************************************************
+% * DO NOT MODIFY THIS FILE!!!! *
+% * It was generated from gawkman.texi by sidebar.awk *
+% * Edit gawkman.texi instead. *
+% *****************************************************
\input texinfo @c -*-texinfo-*-
@c %**start of header (This is for running Texinfo on a region.)
@setfilename gawk.info
@@ -1101,22 +1106,47 @@ has been removed.)
@unnumberedsec History of @command{awk} and @command{gawk}
@cindex recipe for a programming language
@cindex programming language, recipe for
+@cindex sidebar, Recipe For A Programming Language
+@ifdocbook
+@docbook
+<sidebar><title>Recipe For A Programming Language</title>
+@end docbook
+
+
+@multitable {2 parts} {1 part @code{egrep}} {1 part @code{snobol}}
+@item @tab 1 part @code{egrep} @tab 1 part @code{snobol}
+@item @tab 2 parts @code{ed} @tab 3 parts C
+@end multitable
+
+Blend all parts well using @code{lex} and @code{yacc}.
+Document minimally and release.
+
+After eight years, add another part @code{egrep} and two
+more parts C. Document very well and release.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
@cartouche
@center @b{Recipe For A Programming Language}
+
+
@multitable {2 parts} {1 part @code{egrep}} {1 part @code{snobol}}
@item @tab 1 part @code{egrep} @tab 1 part @code{snobol}
@item @tab 2 parts @code{ed} @tab 3 parts C
@end multitable
-@quotation
Blend all parts well using @code{lex} and @code{yacc}.
Document minimally and release.
After eight years, add another part @code{egrep} and two
more parts C. Document very well and release.
-@end quotation
@end cartouche
+@end ifnotdocbook
@cindex Aho, Alfred
@cindex Weinberger, Peter
@@ -1235,13 +1265,11 @@ You should also ignore the many cross-references; they are for the
expert user and for the online Info and HTML versions of the document.
@end ifnotinfo
-There are
-subsections labeled
-as @strong{Advanced Notes}
+There are sidebars
scattered throughout the @value{DOCUMENT}.
They add a more complete explanation of points that are relevant, but not likely
to be of interest on first reading.
-All appear in the index, under the heading ``advanced features.''
+All appear in the index, under the heading ``sidebar.''
Most of the time, the examples use complete @command{awk} programs.
Some of the more advanced sections show only the part of the @command{awk}
@@ -2166,8 +2194,12 @@ Self-contained @command{awk} scripts are useful when you want to write a
program that users can invoke without their having to know that the program is
written in @command{awk}.
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Portability Issues with @samp{#!}
+@cindex sidebar, Portability Issues with @samp{#!}
+@ifdocbook
+@docbook
+<sidebar><title>Portability Issues with @samp{#!}</title>
+@end docbook
+
@cindex portability, @code{#!} (executable scripts)
Some systems limit the length of the interpreter name to 32 characters.
@@ -2191,6 +2223,41 @@ of your script (@samp{advice}). @value{DARKCORNER}
Don't rely on the value of @code{ARGV[0]}
to provide your script name.
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Portability Issues with @samp{#!}}
+
+
+@cindex portability, @code{#!} (executable scripts)
+
+Some systems limit the length of the interpreter name to 32 characters.
+Often, this can be dealt with by using a symbolic link.
+
+You should not put more than one argument on the @samp{#!}
+line after the path to @command{awk}. It does not work. The operating system
+treats the rest of the line as a single argument and passes it to @command{awk}.
+Doing this leads to confusing behavior---most likely a usage diagnostic
+of some sort from @command{awk}.
+
+@cindex @code{ARGC}/@code{ARGV} variables, portability and
+@cindex portability, @code{ARGV} variable
+Finally,
+the value of @code{ARGV[0]}
+(@pxref{Built-in Variables})
+varies depending upon your operating system.
+Some systems put @samp{awk} there, some put the full pathname
+of @command{awk} (such as @file{/bin/awk}), and some put the name
+of your script (@samp{advice}). @value{DARKCORNER}
+Don't rely on the value of @code{ARGV[0]}
+to provide your script name.
+@end cartouche
+@end ifnotdocbook
+
@node Comments
@subsection Comments in @command{awk} Programs
@cindex @code{#} (number sign), commenting
@@ -4495,8 +4562,12 @@ A backslash before any other character means to treat that character
literally.
@end itemize
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Backslash Before Regular Characters
+@cindex sidebar, Backslash Before Regular Characters
+@ifdocbook
+@docbook
+<sidebar><title>Backslash Before Regular Characters</title>
+@end docbook
+
@cindex portability, backslash in escape sequences
@cindex POSIX @command{awk}, backslashes in string constants
@cindex backslash (@code{\}), in escape sequences, POSIX and
@@ -4528,8 +4599,83 @@ In such implementations, typing @code{"a\qc"} is the same as typing
@code{"a\\qc"}.
@end table
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Escape Sequences for Metacharacters
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Backslash Before Regular Characters}
+
+
+@cindex portability, backslash in escape sequences
+@cindex POSIX @command{awk}, backslashes in string constants
+@cindex backslash (@code{\}), in escape sequences, POSIX and
+@cindex @code{\} (backslash), in escape sequences, POSIX and
+
+@cindex troubleshooting, backslash before nonspecial character
+If you place a backslash in a string constant before something that is
+not one of the characters previously listed, POSIX @command{awk} purposely
+leaves what happens as undefined. There are two choices:
+
+@c @cindex automatic warnings
+@c @cindex warnings, automatic
+@table @asis
+@item Strip the backslash out
+This is what Brian Kernighan's @command{awk} and @command{gawk} both do.
+For example, @code{"a\qc"} is the same as @code{"aqc"}.
+(Because this is such an easy bug both to introduce and to miss,
+@command{gawk} warns you about it.)
+Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars
+surrounded by whitespace as the field separator. There should be
+two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.
+@c I did this! This is why I added the warning.
+
+@cindex @command{gawk}, escape sequences
+@cindex Unix @command{awk}, backslashes in escape sequences
+@item Leave the backslash alone
+Some other @command{awk} implementations do this.
+In such implementations, typing @code{"a\qc"} is the same as typing
+@code{"a\\qc"}.
+@end table
+@end cartouche
+@end ifnotdocbook
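As a quick check of the two-backslash form, the following sketch splits a sample line on vertical bars surrounded by whitespace. The input line is made up for illustration; the @code{FS} value is the one given in the sidebar.

```shell
# Sample input is hypothetical. In the string constant, "\\|" becomes the
# regexp \|, which matches a literal vertical bar.
echo 'alpha | beta | gamma' |
awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print $2 }'
```

With the doubled backslash this should print @samp{beta} on any @command{awk}, because the result no longer depends on how a lone backslash before @samp{|} is handled.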
+
+@cindex sidebar, Escape Sequences for Metacharacters
+@ifdocbook
+@docbook
+<sidebar><title>Escape Sequences for Metacharacters</title>
+@end docbook
+
+@cindex metacharacters, escape sequences for
+
+Suppose you use an octal or hexadecimal
+escape to represent a regexp metacharacter.
+(See @ref{Regexp Operators}.)
+Does @command{awk} treat the character as a literal character or as a regexp
+operator?
+
+@cindex dark corner, escape sequences, for metacharacters
+Historically, such characters were taken literally.
+@value{DARKCORNER}
+However, the POSIX standard indicates that they should be treated
+as real metacharacters, which is what @command{gawk} does.
+In compatibility mode (@pxref{Options}),
+@command{gawk} treats the characters represented by octal and hexadecimal
+escape sequences literally when used in regexp constants. Thus,
+@code{/a\52b/} is equivalent to @code{/a\*b/}.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Escape Sequences for Metacharacters}
+
+
@cindex metacharacters, escape sequences for
Suppose you use an octal or hexadecimal
@@ -4547,6 +4693,8 @@ In compatibility mode (@pxref{Options}),
@command{gawk} treats the characters represented by octal and hexadecimal
escape sequences literally when used in regexp constants. Thus,
@code{/a\52b/} is equivalent to @code{/a\*b/}.
+@end cartouche
+@end ifnotdocbook
@node Regexp Operators
@section Regular Expression Operators
@@ -5316,7 +5464,50 @@ intend a regexp match.
@end itemize
@c fakenode --- for prepinfo
-@subheading Advanced Notes: Using @code{\n} in Bracket Expressions of Dynamic Regexps
+@cindex sidebar, Using @code{\n} in Bracket Expressions of Dynamic Regexps
+@ifdocbook
+@docbook
+<sidebar><title>Using @code{\n} in Bracket Expressions of Dynamic Regexps</title>
+@end docbook
+
+@cindex regular expressions, dynamic, with embedded newlines
+@cindex newlines, in dynamic regexps
+
+Some commercial versions of @command{awk} do not allow the newline
+character to be used inside a bracket expression for a dynamic regexp:
+
+@example
+$ @kbd{awk '$0 ~ "[ \t\n]"'}
+@error{} awk: newline in character class [
+@error{} ]...
+@error{} source line number 1
+@error{} context is
+@error{} >>> <<<
+@end example
+
+@cindex newlines, in regexp constants
+But a newline in a regexp constant works with no problem:
+
+@example
+$ @kbd{awk '$0 ~ /[ \t\n]/'}
+@kbd{here is a sample line}
+@print{} here is a sample line
+@kbd{@value{CTL}-d}
+@end example
+
+@command{gawk} does not have this problem, and it isn't likely to
+occur often in practice, but it's worth noting for future reference.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Using @code{\n} in Bracket Expressions of Dynamic Regexps}
+
+
@cindex regular expressions, dynamic, with embedded newlines
@cindex newlines, in dynamic regexps
@@ -5344,6 +5535,8 @@ $ @kbd{awk '$0 ~ /[ \t\n]/'}
@command{gawk} does not have this problem, and it isn't likely to
occur often in practice, but it's worth noting for future reference.
+@end cartouche
+@end ifnotdocbook
@c ENDOFRANGE dregexp
@c ENDOFRANGE regexpd
@c ENDOFRANGE regexp
@@ -5635,10 +5828,12 @@ compatibility mode
In compatibility mode, only the first character of the value of
@code{RS} is used to determine the end of the record.
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: @code{RS = "\0"} Is Not Portable
+@cindex sidebar, @code{RS = "\0"} Is Not Portable
+@ifdocbook
+@docbook
+<sidebar><title>@code{RS = "\0"} Is Not Portable</title>
+@end docbook
-@cindex advanced features, @value{DF}s as single record
@cindex portability, @value{DF}s as single record
There are times when you might want to treat an entire @value{DF} as a
single record. The only way to make this happen is to give @code{RS}
@@ -5673,6 +5868,53 @@ about.} store strings internally as C-style strings. C strings use the
The best way to treat a whole file as a single record is to
simply read the file in, one record at a time, concatenating each
record onto the end of the previous ones.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{@code{RS = "\0"} Is Not Portable}
+
+
+@cindex portability, @value{DF}s as single record
+There are times when you might want to treat an entire @value{DF} as a
+single record. The only way to make this happen is to give @code{RS}
+a value that you know doesn't occur in the input file. This is hard
+to do in a general way, such that a program always works for arbitrary
+input files.
+@c can you say `understatement' boys and girls?
+
+You might think that for text files, the @sc{nul} character, which
+consists of a character with all bits equal to zero, is a good
+value to use for @code{RS} in this case:
+
+@example
+BEGIN @{ RS = "\0" @} # whole file becomes one record?
+@end example
+
+@cindex differences in @command{awk} and @command{gawk}, strings, storing
+@command{gawk} in fact accepts this, and uses the @sc{nul}
+character for the record separator.
+However, this usage is @emph{not} portable
+to other @command{awk} implementations.
+
+@cindex dark corner, strings, storing
+All other @command{awk} implementations@footnote{At least that we know
+about.} store strings internally as C-style strings. C strings use the
+@sc{nul} character as the string terminator. In effect, this means that
+@samp{RS = "\0"} is the same as @samp{RS = ""}.
+@value{DARKCORNER}
+
+@cindex records, treating files as
+@cindex files, as single records
+The best way to treat a whole file as a single record is to
+simply read the file in, one record at a time, concatenating each
+record onto the end of the previous ones.
+@end cartouche
+@end ifnotdocbook
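The concatenation approach recommended at the end of the sidebar might look like the following minimal sketch. It assumes the default newline record separator; the variable name @code{whole} and the sample input are illustrative only.

```shell
# A sketch: accumulate every record, then work on the whole text in END.
printf 'line one\nline two\nline three\n' |
awk '
    { whole = whole $0 "\n" }            # append each record and restore its newline
END { print length(whole) " characters" }
'
```

For this input the program should report 29 characters. Because nothing depends on a special value of @code{RS}, the same idea should carry over to other @command{awk} implementations.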
@c ENDOFRANGE inspl
@c ENDOFRANGE recspl
@@ -6001,8 +6243,37 @@ This also applies to any built-in function that updates @code{$0},
such as @code{sub()} and @code{gsub()}
(@pxref{String Functions}).
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Understanding @code{$0}
+@cindex sidebar, Understanding @code{$0}
+@ifdocbook
+@docbook
+<sidebar><title>Understanding @code{$0}</title>
+@end docbook
+
+
+It is important to remember that @code{$0} is the @emph{full}
+record, exactly as it was read from the input. This includes
+any leading or trailing whitespace, and the exact whitespace (or other
+characters) that separate the fields.
+
+It is a not-uncommon error to try to change the field separators
+in a record simply by setting @code{FS} and @code{OFS}, and then
+expecting a plain @samp{print} or @samp{print $0} to print the
+modified record.
+
+But this does not work, since nothing was done to change the record
+itself. Instead, you must force the record to be rebuilt, typically
+with a statement such as @samp{$1 = $1}, as described earlier.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Understanding @code{$0}}
+
+
It is important to remember that @code{$0} is the @emph{full}
record, exactly as it was read from the input. This includes
@@ -6017,6 +6288,8 @@ modified record.
But this does not work, since nothing was done to change the record
itself. Instead, you must force the record to be rebuilt, typically
with a statement such as @samp{$1 = $1}, as described earlier.
+@end cartouche
+@end ifnotdocbook
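A small sketch of the point made in the sidebar: setting @code{OFS} alone leaves @code{$0} unchanged, and the record is rebuilt only after a field assignment. The input line is hypothetical.

```shell
echo 'a b c' | awk '
BEGIN { OFS = "-" }
{
    print       # $0 is untouched, so this prints: a b c
    $1 = $1     # assigning to any field forces $0 to be rebuilt with OFS
    print       # now prints: a-b-c
}'
```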
@c ENDOFRANGE ficon
@@ -6433,8 +6706,12 @@ Each individual character in the record becomes a separate field.
POSIX standard.)
@end table
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Changing @code{FS} Does Not Affect the Fields
+@cindex sidebar, Changing @code{FS} Does Not Affect the Fields
+@ifdocbook
+@docbook
+<sidebar><title>Changing @code{FS} Does Not Affect the Fields</title>
+@end docbook
+
@cindex POSIX @command{awk}, field separators and
@cindex field separators, POSIX and
@@ -6478,8 +6755,97 @@ prints something like:
root:nSijPlPhZZwgE:0:0:Root:/:
@end example
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: @code{FS} and @code{IGNORECASE}
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Changing @code{FS} Does Not Affect the Fields}
+
+
+
+@cindex POSIX @command{awk}, field separators and
+@cindex field separators, POSIX and
+According to the POSIX standard, @command{awk} is supposed to behave
+as if each record is split into fields at the time it is read.
+In particular, this means that if you change the value of @code{FS}
+after a record is read, the value of the fields (i.e., how they were split)
+should reflect the old value of @code{FS}, not the new one.
+
+@cindex dark corner, field separators
+@cindex @command{sed} utility
+@cindex stream editors
+However, many older implementations of @command{awk} do not work this way. Instead,
+they defer splitting the fields until a field is actually
+referenced. The fields are split
+using the @emph{current} value of @code{FS}!
+@value{DARKCORNER}
+This behavior can be difficult
+to diagnose. The following example illustrates the difference
+between the two methods.
+(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.''
+Its behavior is also defined by the POSIX standard.}
+command prints just the first line of @file{/etc/passwd}.)
+
+@example
+sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}'
+@end example
+
+@noindent
+which usually prints:
+
+@example
+root
+@end example
+
+@noindent
+on an incorrect implementation of @command{awk}, while @command{gawk}
+prints something like:
+
+@example
+root:nSijPlPhZZwgE:0:0:Root:/:
+@end example
+@end cartouche
+@end ifnotdocbook
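One way to sidestep the difference described in the sidebar is to assign @code{FS} in a @code{BEGIN} rule, so that it is already in effect before the first record is read. The following is a sketch using a made-up @file{passwd}-style line rather than the real file.

```shell
# Hypothetical passwd-style line, used instead of reading /etc/passwd.
echo 'root:x:0:0:Root:/:' | awk 'BEGIN { FS = ":" } { print $1 }'
```

Since the separator is set before any input is split, this should print @samp{root} whether an implementation splits fields eagerly or lazily.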
+
+@cindex sidebar, @code{FS} and @code{IGNORECASE}
+@ifdocbook
+@docbook
+<sidebar><title>@code{FS} and @code{IGNORECASE}</title>
+@end docbook
+
+
+The @code{IGNORECASE} variable
+(@pxref{User-modified})
+affects field splitting @emph{only} when the value of @code{FS} is a regexp.
+It has no effect when @code{FS} is a single character, even if
+that character is a letter. Thus, in the following code:
+
+@example
+FS = "c"
+IGNORECASE = 1
+$0 = "aCa"
+print $1
+@end example
+
+@noindent
+The output is @samp{aCa}. If you really want to split fields on an
+alphabetic character while ignoring case, use a regexp that will
+do it for you. E.g., @samp{FS = "[c]"}. In this case, @code{IGNORECASE}
+will take effect.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{@code{FS} and @code{IGNORECASE}}
+
+
The @code{IGNORECASE} variable
(@pxref{User-modified})
@@ -6499,6 +6865,8 @@ The output is @samp{aCa}. If you really want to split fields on an
alphabetic character while ignoring case, use a regexp that will
do it for you. E.g., @samp{FS = "[c]"}. In this case, @code{IGNORECASE}
will take effect.
+@end cartouche
+@end ifnotdocbook
@c ENDOFRANGE fisepr
@c ENDOFRANGE fisepg
@@ -8619,9 +8987,44 @@ program may have open to just one! In @command{gawk}, there is no such limit.
@command{gawk} allows a program to
open as many pipelines as the underlying operating system permits.
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Piping into @command{sh}
-@cindex advanced features, piping into @command{sh}
+@cindex sidebar, Piping into @command{sh}
+@ifdocbook
+@docbook
+<sidebar><title>Piping into @command{sh}</title>
+@end docbook
+
+@cindex shells, piping commands into
+
+A particularly powerful way to use redirection is to build command lines
+and pipe them into the shell, @command{sh}. For example, suppose you
+have a list of files brought over from a system where all the @value{FN}s
+are stored in uppercase, and you wish to rename them to have names in
+all lowercase. The following program is both simple and efficient:
+
+@c @cindex @command{mv} utility
+@example
+@{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @}
+
+END @{ close("sh") @}
+@end example
+
+The @code{tolower()} function returns its argument string with all
+uppercase characters converted to lowercase
+(@pxref{String Functions}).
+The program builds up a list of command lines,
+using the @command{mv} utility to rename the files.
+It then sends the list to the shell for execution.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Piping into @command{sh}}
+
+
@cindex shells, piping commands into
A particularly powerful way to use redirection is to build command lines
@@ -8643,6 +9046,8 @@ uppercase characters converted to lowercase
The program builds up a list of command lines,
using the @command{mv} utility to rename the files.
It then sends the list to the shell for execution.
+@end cartouche
+@end ifnotdocbook
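Before piping generated commands into @command{sh}, it can help to look at them first. The sketch below is a dry run of the renaming program from the sidebar: it prints the @command{mv} command lines instead of executing them. The file names are made up for illustration.

```shell
# Dry run: print the mv commands rather than piping them to sh.
printf 'README\nMAIN.C\n' |
awk '{ printf("mv %s %s\n", $0, tolower($0)) }'
```

This should print @samp{mv README readme} and @samp{mv MAIN.C main.c}. Names containing whitespace or shell metacharacters would need quoting before the real pipeline is safe to run.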
@c ENDOFRANGE outre
@c ENDOFRANGE reout
@@ -8997,9 +9402,12 @@ delayed until
@ref{Two-way I/O},
which discusses it in more detail and gives an example.
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Using @code{close()}'s Return Value
-@cindex advanced features, @code{close()} function
+@cindex sidebar, Using @code{close()}'s Return Value
+@ifdocbook
+@docbook
+<sidebar><title>Using @code{close()}'s Return Value</title>
+@end docbook
+
@cindex dark corner, @code{close()} function
@cindex @code{close()} function, return values
@cindex return values@comma{} @code{close()} function
@@ -9046,6 +9454,64 @@ pipes; thus the return value cannot be used portably.
In POSIX mode (@pxref{Options}), @command{gawk} just returns zero
when closing a pipe.
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Using @code{close()}'s Return Value}
+
+
+@cindex dark corner, @code{close()} function
+@cindex @code{close()} function, return values
+@cindex return values@comma{} @code{close()} function
+@cindex differences in @command{awk} and @command{gawk}, @code{close()} function
+@cindex Unix @command{awk}, @code{close()} function and
+
+In many versions of Unix @command{awk}, the @code{close()} function
+is actually a statement. It is a syntax error to try to use the return
+value from @code{close()}:
+@value{DARKCORNER}
+
+@example
+command = "@dots{}"
+command | getline info
+retval = close(command) # syntax error in many Unix awks
+@end example
+
+@cindex @command{gawk}, @code{ERRNO} variable in
+@cindex @code{ERRNO} variable
+@command{gawk} treats @code{close()} as a function.
+The return value is @minus{}1 if the argument names something
+that was never opened with a redirection, or if there is
+a system problem closing the file or process.
+In these cases, @command{gawk} sets the built-in variable
+@code{ERRNO} to a string describing the problem.
+
+In @command{gawk},
+when closing a pipe or coprocess (input or output),
+the return value is the exit status of the command.@footnote{
+This is a full 16-bit value as returned by the @code{wait()}
+system call. See the system manual pages for information on
+how to decode this value.}
+Otherwise, it is the return value from the system's @code{close()} or
+@code{fclose()} C functions when closing input or output
+files, respectively.
+This value is zero if the close succeeds, or @minus{}1 if
+it fails.
+
+The POSIX standard is very vague; it says that @code{close()}
+returns zero on success and nonzero otherwise. In general,
+different implementations vary in what they report when closing
+pipes; thus the return value cannot be used portably.
+@value{DARKCORNER}
+In POSIX mode (@pxref{Options}), @command{gawk} just returns zero
+when closing a pipe.
+@end cartouche
+@end ifnotdocbook
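To see what a particular @command{awk} reports when closing a pipe, a sketch along the following lines can be used. It assumes an implementation in which @code{close()} is a function; the command @samp{exit 3} is an arbitrary choice that gives a nonzero exit status.

```shell
awk 'BEGIN {
    cmd = "exit 3"
    cmd | getline          # run the command; there is no output to read
    rc = close(cmd)        # what this returns is implementation dependent
    print "close() returned " rc
}'
```

The number printed can differ between implementations and modes, which is exactly why the sidebar advises against relying on it in portable programs.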
+
@c ENDOFRANGE ifc
@c ENDOFRANGE ofc
@c ENDOFRANGE pc
@@ -9232,8 +9698,35 @@ If @command{gawk} is in compatibility mode
they are not available.
@c fakenode --- for prepinfo
-@subheading Advanced Notes: A Constant's Base Does Not Affect Its Value
-@cindex advanced features, constants@comma{} values of
+@cindex sidebar, A Constant's Base Does Not Affect Its Value
+@ifdocbook
+@docbook
+<sidebar><title>A Constant's Base Does Not Affect Its Value</title>
+@end docbook
+
+
+Once a numeric constant has
+been converted internally into a number,
+@command{gawk} no longer remembers
+what the original form of the constant was; the internal value is
+always used. This has particular consequences for conversion of
+numbers to strings:
+
+@example
+$ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'}
+@print{} 0x11 is <17>
+@end example
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{A Constant's Base Does Not Affect Its Value}
+
+
Once a numeric constant has
been converted internally into a number,
@@ -9246,6 +9739,8 @@ numbers to strings:
$ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'}
@print{} 0x11 is <17>
@end example
+@end cartouche
+@end ifnotdocbook
@node Regexp Constants
@subsubsection Regular Expression Constants
@@ -10130,9 +10625,60 @@ Only the @samp{^=} operator is specified by POSIX.
For maximum portability, do not use the @samp{**=} operator.
@end quotation
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Syntactic Ambiguities Between @samp{/=} and Regular Expressions
-@cindex advanced features, regexp constants
+@cindex sidebar, Syntactic Ambiguities Between @samp{/=} and Regular Expressions
+@ifdocbook
+@docbook
+<sidebar><title>Syntactic Ambiguities Between @samp{/=} and Regular Expressions</title>
+@end docbook
+
+@cindex dark corner, regexp constants, @code{/=} operator and
+@cindex @code{/} (forward slash), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant
+@cindex forward slash (@code{/}), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant
+@cindex regexp constants, @code{/=@dots{}/}, @code{/=} operator and
+
+@c derived from email from "Nelson H. F. Beebe" <beebe@math.utah.edu>
+@c Date: Mon, 1 Sep 1997 13:38:35 -0600 (MDT)
+
+@cindex dark corner
+@cindex ambiguity, syntactic: @code{/=} operator vs. @code{/=@dots{}/} regexp constant
+@cindex syntactic ambiguity: @code{/=} operator vs. @code{/=@dots{}/} regexp constant
+@cindex @code{/=} operator vs. @code{/=@dots{}/} regexp constant
+There is a syntactic ambiguity between the @code{/=} assignment
+operator and regexp constants whose first character is an @samp{=}.
+@value{DARKCORNER}
+This is most notable in some commercial @command{awk} versions.
+For example:
+
+@example
+$ awk /==/ /dev/null
+@error{} awk: syntax error at source line 1
+@error{} context is
+@error{} >>> /= <<<
+@error{} awk: bailing out at source line 1
+@end example
+
+@noindent
+A workaround is:
+
+@example
+awk '/[=]=/' /dev/null
+@end example
+
+@command{gawk} does not have this problem,
+nor do the other
+freely available versions described in
+@ref{Other Versions}.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Syntactic Ambiguities Between @samp{/=} and Regular Expressions}
+
+
@cindex dark corner, regexp constants, @code{/=} operator and
@cindex @code{/} (forward slash), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant
@cindex forward slash (@code{/}), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant
@@ -10170,6 +10716,8 @@ awk '/[=]=/' /dev/null
nor do the other
freely available versions described in
@ref{Other Versions}.
+@end cartouche
+@end ifnotdocbook
@c ENDOFRANGE exas
@c ENDOFRANGE opas
@c ENDOFRANGE asop
@@ -10249,9 +10797,64 @@ as the value of the expression.
like @samp{@var{lvalue}++}, but instead of adding, it subtracts.)
@end table
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Operator Evaluation Order
-@cindex advanced features, operators@comma{} precedence
+@cindex sidebar, Operator Evaluation Order
+@ifdocbook
+@docbook
+<sidebar><title>Operator Evaluation Order</title>
+@end docbook
+
+@cindex precedence
+@cindex operators, precedence
+@cindex portability, operators
+@cindex evaluation order
+@cindex Marx, Groucho
+@quotation
+@i{Doctor, doctor! It hurts when I do this!@*
+So don't do that!}@*
+Groucho Marx
+@end quotation
+
+@noindent
+What happens for something like the following?
+
+@example
+b = 6
+print b += b++
+@end example
+
+@noindent
+Or something even stranger?
+
+@example
+b = 6
+b += ++b + b++
+print b
+@end example
+
+@cindex side effects
+In other words, when do the various side effects prescribed by the
+postfix operators (@samp{b++}) take effect?
+When side effects happen is @dfn{implementation defined}.
+In other words, it is up to the particular version of @command{awk}.
+The result for the first example may be 12 or 13, and for the second, it
+may be 22 or 23.
+
+In short, doing things like this is not recommended and definitely
+not anything that you can rely upon for portability.
+You should avoid such things in your own programs.
+@c You'll sleep better at night and be able to look at yourself
+@c in the mirror in the morning.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Operator Evaluation Order}
+
+
@cindex precedence
@cindex operators, precedence
@cindex portability, operators
@@ -10293,6 +10896,8 @@ not anything that you can rely upon for portability.
You should avoid such things in your own programs.
@c You'll sleep better at night and be able to look at yourself
@c in the mirror in the morning.
+@end cartouche
+@end ifnotdocbook
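One way to avoid the ambiguity is to give each side effect its own statement, so that the order is stated by the program rather than left to the implementation. The sketch below picks one of the two possible readings of @samp{b += b++}; the variable @code{old} is introduced only for illustration.

```shell
awk 'BEGIN {
    b = 6
    old = b       # remember the value before the increment
    b++           # the side effect happens here, and only here
    b += old      # 7 + 6
    print b
}'
```

Written this way, the result should be 13 on every @command{awk}. If the other reading is intended, reorder the statements accordingly.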
@c ENDOFRANGE inop
@c ENDOFRANGE opde
@c ENDOFRANGE deop
@@ -13366,11 +13971,54 @@ are available as elements within the @code{SYMTAB} array.
@c ENDOFRANGE bvconi
@c ENDOFRANGE vbconi
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Changing @code{NR} and @code{FNR}
+@cindex sidebar, Changing @code{NR} and @code{FNR}
+@ifdocbook
+@docbook
+<sidebar><title>Changing @code{NR} and @code{FNR}</title>
+@end docbook
+
+@cindex @code{NR} variable, changing
+@cindex @code{FNR} variable, changing
+@cindex dark corner, @code{FNR}/@code{NR} variables
+@command{awk} increments @code{NR} and @code{FNR}
+each time it reads a record, instead of setting them to the absolute
+value of the number of records read. This means that a program can
+change these variables and their new values are incremented for
+each record.
+@value{DARKCORNER}
+The following example shows this:
+
+@example
+$ @kbd{echo '1}
+> @kbd{2}
+> @kbd{3}
+> @kbd{4' | awk 'NR == 2 @{ NR = 17 @}}
+> @kbd{@{ print NR @}'}
+@print{} 1
+@print{} 17
+@print{} 18
+@print{} 19
+@end example
+
+@noindent
+Before @code{FNR} was added to the @command{awk} language
+(@pxref{V7/SVR3.1}),
+many @command{awk} programs used this feature to track the number of
+records in a file by resetting @code{NR} to zero when @code{FILENAME}
+changed.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Changing @code{NR} and @code{FNR}}
+
+
@cindex @code{NR} variable, changing
@cindex @code{FNR} variable, changing
-@cindex advanced features, @code{FNR}/@code{NR} variables
@cindex dark corner, @code{FNR}/@code{NR} variables
@command{awk} increments @code{NR} and @code{FNR}
each time it reads a record, instead of setting them to the absolute
@@ -13398,6 +14046,8 @@ Before @code{FNR} was added to the @command{awk} language
many @command{awk} programs used this feature to track the number of
records in a file by resetting @code{NR} to zero when @code{FILENAME}
changed.
+@end cartouche
+@end ifnotdocbook
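With @code{FNR} available, the per-file count no longer requires resetting @code{NR} by hand. The following sketch prints both counters for two small scratch files; the file names and contents are illustrative only.

```shell
# Two small scratch files, created in a temporary directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n'    > "$tmp/two"
(cd "$tmp" && awk '{ print FILENAME, FNR, NR }' one two)
rm -r "$tmp"
```

This should print @samp{one 1 1}, @samp{one 2 2}, and @samp{two 1 3}: @code{FNR} restarts at the second file while @code{NR} keeps counting.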
@node ARGC and ARGV
@subsection Using @code{ARGC} and @code{ARGV}
@@ -15969,9 +16619,39 @@ and the special cases for @code{sub()} and @code{gsub()},
we recommend the use of @command{gawk} and @code{gensub()} when you have
to do substitutions.
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Matching the Null String
-@cindex advanced features, null strings@comma{} matching
+@cindex sidebar, Matching the Null String
+@ifdocbook
+@docbook
+<sidebar><title>Matching the Null String</title>
+@end docbook
+
+@cindex matching, null strings
+@cindex null strings, matching
+@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
+@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
+
+In @command{awk}, the @samp{*} operator can match the null string.
+This is particularly important for the @code{sub()}, @code{gsub()},
+and @code{gensub()} functions. For example:
+
+@example
+$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
+@print{} XaXbXcX
+@end example
+
+@noindent
+Although this makes a certain amount of sense, it can be surprising.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Matching the Null String}
+
+
@cindex matching, null strings
@cindex null strings, matching
@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
@@ -15988,6 +16668,8 @@ $ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
@noindent
Although this makes a certain amount of sense, it can be surprising.
+@end cartouche
+@end ifnotdocbook
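If the intent is to replace only actual runs of a character, the @samp{+} operator (one or more) avoids the null-string matches that the sidebar describes. A sketch, using the same input as the sidebar's example:

```shell
echo abc | awk '{ gsub(/m+/, "X"); print }'
```

Because @samp{m+} cannot match the null string, nothing is substituted and the line should be printed unchanged as @samp{abc}.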
@node I/O Functions
@subsection Input/Output Functions
@@ -16121,9 +16803,12 @@ When @option{--sandbox} is specified, the @code{system()} function is disabled
@end table
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Interactive Versus Noninteractive Buffering
-@cindex advanced features, buffering
+@cindex sidebar, Interactive Versus Noninteractive Buffering
+@ifdocbook
+@docbook
+<sidebar><title>Interactive Versus Noninteractive Buffering</title>
+@end docbook
+
@cindex buffering, interactive vs.@: noninteractive
As a side point, buffering issues can be even more confusing, depending
@@ -16165,9 +16850,130 @@ $ @kbd{awk '@{ print $1 + $2 @}' | cat}
Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because
it is all buffered and sent down the pipe to @command{cat} in one shot.
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: Controlling Output Buffering with @code{system()}
-@cindex advanced features, buffering
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Interactive Versus Noninteractive Buffering}
+
+
+@cindex buffering, interactive vs.@: noninteractive
+
+As a side point, buffering issues can be even more confusing, depending
+upon whether your program is @dfn{interactive}, i.e., communicating
+with a user sitting at a keyboard.@footnote{A program is interactive
+if the standard output is connected to a terminal device. On modern
+systems, this means your keyboard and screen.}
+
+@c Thanks to Walter.Mecky@dresdnerbank.de for this example, and for
+@c motivating me to write this section.
+Interactive programs generally @dfn{line buffer} their output; i.e., they
+write out every line. Noninteractive programs wait until they have
+a full buffer, which may be many lines of output.
+Here is an example of the difference:
+
+@example
+$ @kbd{awk '@{ print $1 + $2 @}'}
+@kbd{1 1}
+@print{} 2
+@kbd{2 3}
+@print{} 5
+@kbd{@value{CTL}-d}
+@end example
+
+@noindent
+Each line of output is printed immediately. Compare that behavior
+with this example:
+
+@example
+$ @kbd{awk '@{ print $1 + $2 @}' | cat}
+@kbd{1 1}
+@kbd{2 3}
+@kbd{@value{CTL}-d}
+@print{} 2
+@print{} 5
+@end example
+
+@noindent
+Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because
+it is all buffered and sent down the pipe to @command{cat} in one shot.
+@end cartouche
+@end ifnotdocbook
+
+@cindex sidebar, Controlling Output Buffering with @code{system()}
+@ifdocbook
+@docbook
+<sidebar><title>Controlling Output Buffering with @code{system()}</title>
+@end docbook
+
+@cindex buffers, flushing
+@cindex buffering, input/output
+@cindex output, buffering
+
+The @code{fflush()} function provides explicit control over output buffering for
+individual files and pipes. However, its use is not portable to many older
+@command{awk} implementations. An alternative method to flush output
+buffers is to call @code{system()} with a null string as its argument:
+
+@example
+system("") # flush output
+@end example
+
+@noindent
+@command{gawk} treats this use of the @code{system()} function as a special
+case and is smart enough not to run a shell (or other command
+interpreter) with the empty command. Therefore, with @command{gawk}, this
+idiom is not only useful, it is also efficient. While this method should work
+with other @command{awk} implementations, it does not necessarily avoid
+starting an unnecessary shell. (Other implementations may only
+flush the buffer associated with the standard output and not necessarily
+all buffered output.)
+
+If you think about what a programmer expects, it makes sense that
+@code{system()} should flush any pending output. The following program:
+
+@example
+BEGIN @{
+ print "first print"
+ system("echo system echo")
+ print "second print"
+@}
+@end example
+
+@noindent
+must print:
+
+@example
+first print
+system echo
+second print
+@end example
+
+@noindent
+and not:
+
+@example
+system echo
+first print
+second print
+@end example
+
+If @command{awk} did not flush its buffers before calling @code{system()},
+you would see the latter (undesirable) output.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Controlling Output Buffering with @code{system()}}
+
+
@cindex buffers, flushing
@cindex buffering, input/output
@cindex output, buffering
@@ -16222,6 +17028,8 @@ second print
If @command{awk} did not flush its buffers before calling @code{system()},
you would see the latter (undesirable) output.
+@end cartouche
+@end ifnotdocbook
@node Time Functions
@subsection Time Functions
@@ -18997,8 +19805,35 @@ END @{ endfile(_filename_) @}
shows how this library function can be used and
how it simplifies writing the main program.
-@c fakenode --- for prepinfo
-@subheading Advanced Notes: So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?
+@cindex sidebar, So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?
+@ifdocbook
+@docbook
+<sidebar><title>So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?</title>
+@end docbook
+
+
+You are probably wondering, if @code{beginfile()} and @code{endfile()}
+functions can do the job, why does @command{gawk} have
+@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
+
+Good question. Normally, if @command{awk} cannot open a file, this
+causes an immediate fatal error. In this case, there is no way for a
+user-defined function to deal with the problem, since the mechanism for
+calling it relies on the file being open and at the first record. Thus,
+the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
+files that cannot be processed. @code{ENDFILE} exists for symmetry,
+and because it provides an easy way to do per-file cleanup processing.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?}
+
+
You are probably wondering, if @code{beginfile()} and @code{endfile()}
functions can do the job, why does @command{gawk} have
@@ -19011,6 +19846,8 @@ calling it relies on the file being open and at the first record. Thus,
the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
files that cannot be processed. @code{ENDFILE} exists for symmetry,
and because it provides an easy way to do per-file cleanup processing.
+@end cartouche
+@end ifnotdocbook
@node Rewind Function
@subsection Rereading the Current File
@@ -37528,3 +38365,35 @@ Suggestions:
% in the two sample code chapters.
% 2. Nuke the BBS stuff and use something that won't be obsolete
% 3. Turn the advanced notes into sidebars by using @cartouche
+
+Better sidebars can almost sort of be done with:
+
+ @ifdocbook
+ @macro sidebar{title, content}
+ @inlinefmt{docbook, <sidebar><title>}
+ \title\
+ @inlinefmt{docbook, </title>}
+ \content\
+ @inlinefmt{docbook, </sidebar>}
+ @end macro
+ @end ifdocbook
+
+
+ @ifnotdocbook
+ @macro sidebar{title, content}
+ @cartouche
+ @center @b{\title\}
+
+ \content\
+ @end cartouche
+ @end macro
+ @end ifnotdocbook
+
+But to use it you have to say
+
+ @sidebar{Title Here,
+ @include file-with-content
+ }
+
+which sorta sucks.
+