aboutsummaryrefslogtreecommitdiffstats
path: root/doc/gawk.texi
diff options
context:
space:
mode:
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r--doc/gawk.texi134
1 files changed, 73 insertions, 61 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 1dd75e51..63489dae 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -3289,8 +3289,8 @@ The following list describes @command{gawk}-specific options:
@table @code
@item -b
@itemx --characters-as-bytes
-@cindex @code{-b} option
-@cindex @code{--characters-as-bytes} option
+@cindex @option{-b} option
+@cindex @option{--characters-as-bytes} option
Cause @command{gawk} to treat all input data as single-byte characters.
In addition, all output written with @code{print} or @code{printf}
are treated as single-byte characters.
@@ -3304,8 +3304,8 @@ multibyte characters. This option is an easy way to tell @command{gawk}:
@item -c
@itemx --traditional
-@cindex @code{-c} option
-@cindex @code{--traditional} option
+@cindex @option{-c} option
+@cindex @option{--traditional} option
@cindex compatibility mode (@command{gawk}), specifying
Specify @dfn{compatibility mode}, in which the GNU extensions to
the @command{awk} language are disabled, so that @command{gawk} behaves just
@@ -3316,17 +3316,17 @@ which summarizes the extensions. Also see
@item -C
@itemx --copyright
-@cindex @code{-C} option
-@cindex @code{--copyright} option
+@cindex @option{-C} option
+@cindex @option{--copyright} option
@cindex GPL (General Public License), printing
Print the short version of the General Public License and then exit.
@item -d@r{[}@var{file}@r{]}
@itemx --dump-variables@r{[}=@var{file}@r{]}
-@cindex @code{-d} option
-@cindex @code{--dump-variables} option
-@cindex @code{awkvars.out} file
-@cindex files, @code{awkvars.out}
+@cindex @option{-d} option
+@cindex @option{--dump-variables} option
+@cindex @file{awkvars.out} file
+@cindex files, @file{awkvars.out}
@cindex variables, global, printing list of
Print a sorted list of global variables, their types, and final values
to @var{file}. If no @var{file} is provided, print this
@@ -3345,8 +3345,8 @@ names like @code{i}, @code{j}, etc.)
@item -D@r{[}@var{file}@r{]}
@itemx --debug=@r{[}@var{file}@r{]}
-@cindex @code{-D} option
-@cindex @code{--debug} option
+@cindex @option{-D} option
+@cindex @option{--debug} option
@cindex @command{awk} debugging, enabling
Enable debugging of @command{awk} programs
(@pxref{Debugging}).
@@ -3358,8 +3358,8 @@ No space is allowed between the @option{-D} and @var{file}, if
@item -e @var{program-text}
@itemx --source @var{program-text}
-@cindex @code{-e} option
-@cindex @code{--source} option
+@cindex @option{-e} option
+@cindex @option{--source} option
@cindex source code, mixing
Provide program source code in the @var{program-text}.
This option allows you to mix source code in files with source
@@ -3370,8 +3370,8 @@ programs (@pxref{AWKPATH Variable}).
@item -E @var{file}
@itemx --exec @var{file}
-@cindex @code{-E} option
-@cindex @code{--exec} option
+@cindex @option{-E} option
+@cindex @option{--exec} option
@cindex @command{awk} programs, location of
@cindex CGI, @command{awk} scripts for
Similar to @option{-f}, read @command{awk} program text from @var{file}.
@@ -3401,8 +3401,8 @@ with @samp{#!} scripts (@pxref{Executable Scripts}), like so:
@item -g
@itemx --gen-pot
-@cindex @code{-g} option
-@cindex @code{--gen-pot} option
+@cindex @option{-g} option
+@cindex @option{--gen-pot} option
@cindex portable object files, generating
@cindex files, portable object, generating
Analyze the source program and
@@ -3413,8 +3413,8 @@ for information about this option.
@item -h
@itemx --help
-@cindex @code{-h} option
-@cindex @code{--help} option
+@cindex @option{-h} option
+@cindex @option{--help} option
@cindex GNU long options, printing list of
@cindex options, printing list of
@cindex printing, list of options
@@ -3439,8 +3439,8 @@ find the main source code via the @option{-f} option or on the command-line.
@item -l @var{lib}
@itemx --load @var{lib}
-@cindex @code{-l} option
-@cindex @code{--load} option
+@cindex @option{-l} option
+@cindex @option{--load} option
@cindex loading, library
Load a shared library @var{lib}. This searches for the library using the @env{AWKLIBPATH}
environment variable. The correct library suffix for your platform will be
@@ -3451,8 +3451,8 @@ a shared library.
@item -L @r{[}value@r{]}
@itemx --lint@r{[}=value@r{]}
-@cindex @code{-l} option
-@cindex @code{--lint} option
+@cindex @option{-l} option
+@cindex @option{--lint} option
@cindex lint checking, issuing warnings
@cindex warnings, issuing
Warn about constructs that are dubious or nonportable to
@@ -3474,16 +3474,16 @@ care to search for all occurrences of each inappropriate construct. As
@item -M
@itemx --bignum
-@cindex @code{-M} option
-@cindex @code{--bignum} option
+@cindex @option{-M} option
+@cindex @option{--bignum} option
Force arbitrary precision arithmetic on numbers. This option has no effect
if @command{gawk} is not compiled to use the GNU MPFR and MP libraries
(@pxref{Arbitrary Precision Arithmetic}).
@item -n
@itemx --non-decimal-data
-@cindex @code{-n} option
-@cindex @code{--non-decimal-data} option
+@cindex @option{-n} option
+@cindex @option{--non-decimal-data} option
@cindex hexadecimal values@comma{} enabling interpretation of
@cindex octal values@comma{} enabling interpretation of
@cindex troubleshooting, @code{--non-decimal-data} option
@@ -3498,15 +3498,15 @@ Use with care.
@item -N
@itemx --use-lc-numeric
-@cindex @code{-N} option
-@cindex @code{--use-lc-numeric} option
+@cindex @option{-N} option
+@cindex @option{--use-lc-numeric} option
Force the use of the locale's decimal point character
when parsing numeric input data (@pxref{Locales}).
@item -o@r{[}@var{file}@r{]}
@itemx --pretty-print@r{[}=@var{file}@r{]}
-@cindex @code{-o} option
-@cindex @code{--pretty-print} option
+@cindex @option{-o} option
+@cindex @option{--pretty-print} option
Enable pretty-printing of @command{awk} programs.
By default, output program is created in a file named @file{awkprof.out}.
The optional @var{file} argument allows you to specify a different
@@ -3516,16 +3516,16 @@ No space is allowed between the @option{-o} and @var{file}, if
@item -O
@itemx --optimize
-@cindex @code{--optimize} option
-@cindex @code{-O} option
+@cindex @option{--optimize} option
+@cindex @option{-O} option
Enable some optimizations on the internal representation of the program.
At the moment this includes just simple constant folding. The @command{gawk}
maintainer hopes to add more optimizations over time.
@item -p@r{[}@var{file}@r{]}
@itemx --profile@r{[}=@var{file}@r{]}
-@cindex @code{-p} option
-@cindex @code{--profile} option
+@cindex @option{-p} option
+@cindex @option{--profile} option
@cindex @command{awk} profiling, enabling
Enable profiling of @command{awk} programs
(@pxref{Profiling}).
@@ -3540,8 +3540,8 @@ in the left margin, and function call counts for each function.
@item -P
@itemx --posix
-@cindex @code{-P} option
-@cindex @code{--posix} option
+@cindex @option{-P} option
+@cindex @option{--posix} option
@cindex POSIX mode
@cindex @command{gawk}, extensions@comma{} disabling
Operate in strict POSIX mode. This disables all @command{gawk}
@@ -3590,8 +3590,8 @@ also issues a warning if both options are supplied.
@item -r
@itemx --re-interval
-@cindex @code{-r} option
-@cindex @code{--re-interval} option
+@cindex @option{-r} option
+@cindex @option{--re-interval} option
@cindex regular expressions, interval expressions and
Allow interval expressions
(@pxref{Regexp Operators})
@@ -3602,8 +3602,8 @@ and for use in combination with the @option{--traditional} option.
@item -S
@itemx --sandbox
-@cindex @code{-S} option
-@cindex @code{--sandbox} option
+@cindex @option{-S} option
+@cindex @option{--sandbox} option
@cindex sandbox mode
Disable the @code{system()} function,
input redirections with @code{getline},
@@ -3615,16 +3615,16 @@ can't access your system (other than the specified input data file).
@item -t
@itemx --lint-old
-@cindex @code{-L} option
-@cindex @code{--lint-old} option
+@cindex @option{-L} option
+@cindex @option{--lint-old} option
Warn about constructs that are not available in the original version of
@command{awk} from Version 7 Unix
(@pxref{V7/SVR3.1}).
@item -V
@itemx --version
-@cindex @code{-V} option
-@cindex @code{--version} option
+@cindex @option{-V} option
+@cindex @option{--version} option
@cindex @command{gawk}, versions of, information about@comma{} printing
Print version information for this particular copy of @command{gawk}.
This allows you to determine if your copy of @command{gawk} is up to date
@@ -5043,8 +5043,8 @@ These sequences are:
@item Collating symbols
Multicharacter collating elements enclosed between
@samp{[.} and @samp{.]}. For example, if @samp{ch} is a collating element,
-then @code{[[.ch.]]} is a regexp that matches this collating element, whereas
-@code{[ch]} is a regexp that matches either @samp{c} or @samp{h}.
+then @samp{[[.ch.]]} is a regexp that matches this collating element, whereas
+@samp{[ch]} is a regexp that matches either @samp{c} or @samp{h}.
@cindex bracket expressions, equivalence classes
@item Equivalence classes
@@ -5052,7 +5052,7 @@ Locale-specific names for a list of
characters that are equal. The name is enclosed between
@samp{[=} and @samp{=]}.
For example, the name @samp{e} might be used to represent all of
-``e,'' ``@`e,'' and ``@'e.'' In this case, @code{[[=e=]]} is a regexp
+``e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp
that matches any of @samp{e}, @samp{@'e}, or @samp{@`e}.
@end table
@@ -5096,7 +5096,7 @@ or underscores (@samp{_}):
@item \s
Matches any whitespace character.
Think of it as shorthand for
-@w{@code{[[:space:]]}}.
+@w{@samp{[[:space:]]}}.
@c @cindex operators, @code{\S} (@command{gawk})
@cindex backslash (@code{\}), @code{\S} operator (@command{gawk})
@@ -5104,7 +5104,7 @@ Think of it as shorthand for
@item \S
Matches any character that is not whitespace.
Think of it as shorthand for
-@w{@code{[^[:space:]]}}.
+@w{@samp{[^[:space:]]}}.
@c @cindex operators, @code{\w} (@command{gawk})
@cindex backslash (@code{\}), @code{\w} operator (@command{gawk})
@@ -5112,7 +5112,7 @@ Think of it as shorthand for
@item \w
Matches any word-constituent character---that is, it matches any
letter, digit, or underscore. Think of it as shorthand for
-@w{@code{[[:alnum:]_]}}.
+@w{@samp{[[:alnum:]_]}}.
@c @cindex operators, @code{\W} (@command{gawk})
@cindex backslash (@code{\}), @code{\W} operator (@command{gawk})
@@ -5120,7 +5120,7 @@ letter, digit, or underscore. Think of it as shorthand for
@item \W
Matches any character that is not word-constituent.
Think of it as shorthand for
-@w{@code{[^[:alnum:]_]}}.
+@w{@samp{[^[:alnum:]_]}}.
@c @cindex operators, @code{\<} (@command{gawk})
@cindex backslash (@code{\}), @code{\<} operator (@command{gawk})
@@ -5231,7 +5231,7 @@ are allowed.
@item @code{--traditional}
Traditional Unix @command{awk} regexps are matched. The GNU operators
are not special, and interval expressions are not available.
-The POSIX character classes (@code{[[:alnum:]]}, etc.) are supported,
+The POSIX character classes (@samp{[[:alnum:]]}, etc.) are supported,
as Brian Kernighan's @command{awk} does support them.
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
@@ -5857,21 +5857,27 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
@command{gawk} in fact accepts this, and uses the @sc{nul}
character for the record separator.
However, this usage is @emph{not} portable
-to other @command{awk} implementations.
+to most other @command{awk} implementations.
@cindex dark corner, strings, storing
-All other @command{awk} implementations@footnote{At least that we know
+Almost all other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings. C strings use the
@sc{nul} character as the string terminator. In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER}
+It happens that recent versions of @command{mawk} can use the @sc{nul}
+character as a record separator. However, this is a special case:
+@command{mawk} does not allow embedded @sc{nul} characters in strings.
+
@cindex records, treating files as
@cindex files, as single records
The best way to treat a whole file as a single record is to
simply read the file in, one record at a time, concatenating each
record onto the end of the previous ones.
+@c @strong{FIXME}: Using @sc{nul} is good for @file{/proc/environ} etc.
+
@docbook
</sidebar>
@end docbook
@@ -5902,20 +5908,26 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
@command{gawk} in fact accepts this, and uses the @sc{nul}
character for the record separator.
However, this usage is @emph{not} portable
-to other @command{awk} implementations.
+to most other @command{awk} implementations.
@cindex dark corner, strings, storing
-All other @command{awk} implementations@footnote{At least that we know
+Almost all other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings. C strings use the
@sc{nul} character as the string terminator. In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER}
+It happens that recent versions of @command{mawk} can use the @sc{nul}
+character as a record separator. However, this is a special case:
+@command{mawk} does not allow embedded @sc{nul} characters in strings.
+
@cindex records, treating files as
@cindex files, as single records
The best way to treat a whole file as a single record is to
simply read the file in, one record at a time, concatenating each
record onto the end of the previous ones.
+
+@c @strong{FIXME}: Using @sc{nul} is good for @file{/proc/environ} etc.
@end cartouche
@end ifnotdocbook
@c ENDOFRANGE inspl
@@ -10105,7 +10117,7 @@ point when reading the @command{awk} program source code, and for command-line
variable assignments (@pxref{Other Arguments}).
However, when interpreting input data, for @code{print} and @code{printf} output,
and for number to string conversion, the local decimal point character is used.
-@value{DARKCORNER}.
+@value{DARKCORNER}
Here are some examples indicating the difference in behavior,
on a GNU/Linux system:
@@ -34088,7 +34100,7 @@ The option for raw sockets was removed, since it was never implemented
(@pxref{TCP/IP Networking}).
@item
-Ranges of the form @code{[d-h]} are treated as if they were in the
+Ranges of the form @samp{[d-h]} are treated as if they were in the
C locale, no matter what kind of regexp is being used, and even if
@option{--posix}
(@pxref{Ranges and Locales}).
@@ -34296,7 +34308,7 @@ When @command{gawk} switched to using locale-aware regexp matchers,
the problems began; especially as both GNU/Linux and commercial Unix
vendors started implementing non-ASCII locales, @emph{and making them
the default}. Perhaps the most frequently asked question became something
-like ``why does @code{[A-Z]} match lowercase letters?!?''
+like ``why does @samp{[A-Z]} match lowercase letters?!?''
This situation existed for close to 10 years, if not more, and
the @command{gawk} maintainer grew weary of trying to explain that