Diffstat (limited to 'doc/gawk.texi')
-rwxr-xr-x[-rw-r--r--] | doc/gawk.texi | 221 |
1 files changed, 165 insertions, 56 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 4ba65e3f..c0b738c7 100644..100755 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -20,9 +20,9 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH June, 2004 +@set UPDATE-MONTH June, 2005 @set VERSION 3.1 -@set PATCHLEVEL 4 +@set PATCHLEVEL 5 @set FSF @@ -110,7 +110,7 @@ Some comments on the layout for TeX. @end iftex @copying -Copyright @copyright{} 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc. +Copyright @copyright{} 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc. @sp 2 This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, @@ -187,8 +187,8 @@ Published by: @sp 1 Free Software Foundation @* -59 Temple Place --- Suite 330 @* -Boston, MA 02111-1307 USA @* +51 Franklin Street, Fifth Floor @* +Boston, MA 02110-1301 USA @* Phone: +1-617-542-5942 @* Fax: +1-617-542-2652 @* Email: @email{gnu@@gnu.org} @* @@ -2465,7 +2465,7 @@ the file was last modified. Its output looks like this: -rw-r--r-- 1 arnold user 1933 Nov 7 13:05 Makefile -rw-r--r-- 1 arnold user 10809 Nov 7 13:03 awk.h -rw-r--r-- 1 arnold user 983 Apr 13 12:14 awk.tab.h --rw-r--r-- 1 arnold user 31869 Jun 15 12:20 awk.y +-rw-r--r-- 1 arnold user 31869 Jun 15 12:20 awkgram.y -rw-r--r-- 1 arnold user 22414 Nov 7 13:03 awk1.c -rw-r--r-- 1 arnold user 37455 Nov 7 13:03 awk2.c -rw-r--r-- 1 arnold user 27511 Dec 9 13:07 awk3.c @@ -4129,6 +4129,16 @@ if the input text that could match the trailing part is fairly long. @command{gawk} attempts to avoid this problem, but currently, there's no guarantee that this will never happen. +@quotation NOTE +Remember that in @command{awk}, the @samp{^} and @samp{$} anchor +metacharacters match the beginning and end of a @emph{string}, and not +the beginning and end of a @emph{line}. 
As a result, something like +@samp{RS = "^[[:upper:]]"} can only match at the beginning of a file. +This is because @command{gawk} views the input file as one long string +that happens to contain newline characters in it. +It is thus best to avoid anchor characters in the value of @code{RS}. +@end quotation + @cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables The use of @code{RS} as a regular expression and the @code{RT} variable are @command{gawk} extensions; they are not available in @@ -6320,12 +6330,12 @@ $ @kbd{LC_ALL=en_US.UTF-8 gawk -f thousands.awk} @i{Run in US Engli @noindent For more information about locales and internationalization issues, -@strong{FIXME: see xxxx}. +see @ref{Locales}. @quotation NOTE The @samp{'} flag is a nice feature, but its use complicates things: it now becomes difficult to use it in command-line programs. For information -on appropriate quoting tricks, @strong{FIXME: see XXXX}. +on appropriate quoting tricks, see @ref{Quoting}. @end quotation @item @var{width} @@ -7191,7 +7201,7 @@ In these cases, @command{gawk} sets the built-in variable @code{ERRNO} to a string describing the problem. In @command{gawk}, -when closing a pipe or coprocess, +when closing a pipe or coprocess (input or output), the return value is the exit status of the command.@footnote{ This is a full 16-bit value as returned by the @code{wait} system call. See the system manual pages for information on @@ -10132,11 +10142,13 @@ for more information on this version of the @code{for} loop. @cindex @code{case} keyword @cindex @code{default} keyword -@strong{NOTE:} This @value{SUBSECTION} describes an experimental feature +@quotation NOTE +This @value{SUBSECTION} describes an experimental feature added in @command{gawk} 3.1.3. It is @emph{not} enabled by default. To enable it, use the @option{--enable-switch} option to @command{configure} when @command{gawk} is being configured and built. 
@xref{Additional Configuration Options}, for more information. +@end quotation The @code{switch} statement allows the evaluation of an expression and the execution of statements based on a @code{case} match. Case statements @@ -12429,6 +12441,18 @@ version of the standard. Therefore, for programs to be maximally portable, always supply the parentheses. @end quotation +@cindex differences between @command{gawk} and @command{awk} +Beginning with @command{gawk} @value{PVERSION} 3.2, when supplied an +array argument, the @code{length} function returns the number of elements +in the array. This is less useful than it might seem at first, as the +array is not guaranteed to be indexed from one to the number of elements +in it. +If @option{--lint} is provided on the command line +(@pxref{Options}), +@command{gawk} warns that passing an array argument is not portable. +If @option{--posix} is supplied, using an array argument is a fatal error +(@pxref{Arrays}). + @item match(@var{string}, @var{regexp} @r{[}, @var{array}@r{]}) @cindex @code{match} function The @code{match} function searches @var{string} for the @@ -12645,6 +12669,9 @@ works only for decimal data, not for octal or hexadecimal.@footnote{Unless you use the @option{--non-decimal-data} option, which isn't recommended. @xref{Nondecimal Data}, for more information.} +Note also that @code{strtonum} uses the current locale's decimal point +for recognizing numbers. + @cindex differences in @command{awk} and @command{gawk}, @code{strtonum} function (@command{gawk}) @code{strtonum} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). @@ -15516,14 +15543,6 @@ As of this writing, the latest version of GNU @code{gettext} is If a translation of @command{gawk}'s messages exists, then @command{gawk} produces usage messages, warnings, and fatal errors in the local language. 
- -@cindex @code{--with-included-gettext} configuration option -@cindex configuration option, @code{--with-included-gettext} -On systems that do not use @value{PVERSION} 2 (or later) of the GNU C library, you should -configure @command{gawk} with the @option{--with-included-gettext} option -before compiling and installing it. -@xref{Additional Configuration Options}, -for more information. @c ENDOFRANGE inloc @node Advanced Features @@ -16444,6 +16463,29 @@ inadvertently use global variables that you meant to be local. (This is a particularly easy mistake to make with simple variable names like @code{i}, @code{j}, etc.) +@item -W exec @var{file} +@itemx --exec @var{file} +@cindex @code{--exec} option +@cindex @command{awk} programs, location of +@cindex CGI, @command{awk} scripts for +Similar to @option{-f}, reads @command{awk} program text from @var{file}. +There are two differences. The fist is that this option also terminates option processing; anything +else on the command line is passed on directly to the @command{awk} program. +The second is that command line variable assignments of the form +@samp{@var{var}=@var{value}} are disallowed. + +This option is particularly necessary for World Wide Web CGI applications +that pass arguments through the URL; using this option prevents a malicious +(or other) user from passing in options, assignments, or @command{awk} source code (via +@option{--source}) to the CGI application. This option should be used +with @samp{#!} scripts (@pxref{Executable Scripts}), like so: + +@example +#! /usr/local/bin/gawk --exec + +@var{awk program here @dots{}} +@end example + @item -W gen-po @itemx --gen-po @cindex @code{--gen-po} option @@ -23291,6 +23333,15 @@ at compile time POSIX compliance for @code{sub} and @code{gsub} (@pxref{Gory Details}). +@item +The @option{--exec} option, for use in CGI scripts. +(@pxref{Options}). 
+ +@item +The @code{length} function was extended to accept an array argument +and return the number of elements in the array +(@pxref{String Functions}). + @end itemize @c XXX ADD MORE STUFF HERE @@ -23533,8 +23584,8 @@ Their address is: @display Free Software Foundation -59 Temple Place, Suite 330 -Boston, MA 02111-1307 USA +51 Franklin Street, Fifth Floor +Boston, MA 02110-1301 USA Phone: +1-617-542-5942 Fax (including Japan): +1-617-542-2652 Email: @email{gnu@@gnu.org} @@ -23714,11 +23765,8 @@ These files and subdirectories are used when configuring @command{gawk} for various Unix systems. They are explained in @ref{Unix Installation}. -@item intl/* -@itemx po/* -The @file{intl} directory provides the GNU @code{gettext} library, which implements -@command{gawk}'s internationalization features, while the @file{po} library -contains message translations. +@item po/* +The @file{po} library contains message translations. @item awklib/extract.awk @itemx awklib/Makefile.am @@ -23868,17 +23916,6 @@ Enable the recognition and execution of C-style @code{switch} statements in @command{awk} programs (@pxref{Switch Statement}.) -@cindex Linux -@cindex GNU/Linux -@cindex @code{--with-included-gettext} configuration option -@cindex @code{--with-included-gettext} configuration option, configuring @command{gawk} with -@cindex configuration option, @code{--with-included-gettext} -@item --with-included-gettext -Use the version of the @code{gettext} library that comes with @command{gawk}. -This option should be used on systems that do @emph{not} use @value{PVERSION} 2 (or later) -of the GNU C library. -All known modern GNU/Linux systems use Glibc 2. Use this option on any other system. - @cindex @code{--disable-lint} configuration option @cindex configuration option, @code{--disable-lint} @item --disable-lint @@ -23905,10 +23942,12 @@ to fail. This option may be removed at a later date. Disable all message-translation facilities. 
This is usually not desirable, but it may bring you some slight performance improvement. -You should also use this option if @option{--with-included-gettext} -doesn't work on your system. @end table +As of version 3.1.5, the @option{--with-included-gettext} configuration +option is no longer available, since @command{gawk} expects the +GNU @code{gettext} library to be installed as an external library. + @node Configuration Philosophy @appendixsubsec The Configuration Process @@ -24171,7 +24210,7 @@ $ CPPFLAGS="-D__ST_MT_ERRNO__" $ export CPPFLAGS $ CFLAGS="-O2 -Zomf -Zmt" $ export CFLAGS -$ LDFLAGS="-s -Zcrtdll -Zlinker /exepack:2 -Zlinker /pm:vio -Zstack 0x8000" +$ LDFLAGS="-s -Zcrtdll -Zlinker /exepack:2 -Zlinker /pm:vio -Zstack 0x6000" $ export LDFLAGS $ RANLIB="echo" $ export RANLIB @@ -24186,11 +24225,13 @@ To get an FHS-compliant file hierarchy it is recommended to use the additional @command{configure} options @option{--infodir=c:/usr/share/info}, @option{--mandir=c:/usr/share/man} and @option{--libexecdir=c:/usr/lib}. +@ignore The internal @code{gettext} library tends to be problematic. It is therefore recommended to use either an external one (@option{--without-included-gettext}) or to disable NLS entirely (@option{--disable-nls}). +@end ignore -If you use GCC 2.95 or newer it is recommended to use also: +If you use GCC 2.95 it is recommended to use also: @example $ LIBS="-lgcc" @@ -24204,14 +24245,19 @@ $ CPPFLAGS="-D__ST_MT_ERRNO__" $ export CPPFLAGS $ CFLAGS="-O2 -Zmt" $ export CFLAGS -$ LDFLAGS="-s -Zstack 0x8000" +$ LDFLAGS="-s -Zstack 0x6000" $ LIBS="-lgcc" $ unset RANLIB -$ ./configure --prefix=c:/usr --without-included-gettext +@c $ ./configure --prefix=c:/usr --without-included-gettext +$ ./configure --prefix=c:/usr $ make @end example @quotation NOTE +Versions later than GCC 2.95, i.e., GCC 3.x using the Innotek libc were not tested. 
+@end quotation + +@quotation NOTE Even if the compiled @command{gawk.exe} (@code{a.out}) executable contains a DOS header, it does @emph{not} work under DOS. To compile an executable that runs under DOS, @code{"-DPIPES_SIMULATED"} must be added to @env{CPPFLAGS}. @@ -24220,8 +24266,9 @@ But then some nonstandard extensions of @command{gawk} (e.g., @samp{|&}) do not After compilation the internal tests can be performed. Enter @samp{make check CMP="diff -a"} at your command prompt. All tests -but the @code{pid} test are expected to work properly. The @code{pid} -test fails because child processes are not started by @code{fork()}. +except for the @code{pid} test are expected to work properly. +The @code{pid} test fails because child processes are not started by +@code{fork()}. @samp{make install} works as expected. @@ -24891,7 +24938,7 @@ Darrel Hankerson, @email{hankedr@@mail.auburn.edu}. Juan Grigera, @email{juan@@biophnet.unlp.edu.ar}. @item OS/2 -The Unix for OS/2 team, @email{gawk-maintainer@@unixos2.org}. +Andreas Buening, @email{andreas.buening@@nexgo.de}. @cindex Davies, Stephen @item Tandem @@ -25453,6 +25500,11 @@ are very much subject to change in a future @command{gawk} release. Be aware that you may have to re-do everything, perhaps from scratch, at some future time. +@strong{Caution:} If you have written your own dynamic extensions, +be sure to recompile them for each new @command{gawk} release. +There is no guarantee of binary compatibility between different +releases, no will there ever be such a guarantee. + @menu * Internals:: A brief look at some @command{gawk} internals. * Sample Library:: A example of new functions. @@ -25470,7 +25522,7 @@ brief and simplistic; would-be @command{gawk} hackers are encouraged to spend some time reading the source code before trying to write extensions based on the material presented here. Of particular note are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}. 
-Reading @file{awk.y} in order to see how the parse tree is built +Reading @file{awkgram.y} in order to see how the parse tree is built would also be of use. @cindex @code{awk.h} file (internal) @@ -25663,6 +25715,70 @@ This function is called from within a C extension function to set the value of @command{gawk}'s @code{ERRNO} variable, based on the current value of the C @code{errno} variable. It is provided as a convenience. + +@cindex @code{ERRNO} variable +@cindex @code{update_ERRNO_saved} internal function +@item void update_ERRNO_saved(int errno_saved) +This function is called from within a C extension function to set +the value of @command{gawk}'s @code{ERRNO} variable, based on the saved +value of the C @code{errno} variable provided as the argument. +It is provided as a convenience. + +@strong{Caution:} This function is new as of @command{gawk} 3.1.5. + +@cindex @code{ENVIRON} variable +@cindex @code{PROCINFO} variable +@cindex @code{register_deferred_variable} internal function +@item void register_deferred_variable(const char *name, NODE *(*load_func)(void)) +This function is called to register a function to be called when a +reference to an undefined variable with the given name is encountered. +The callback function will never be called if the variable exists already, +so, unless the calling code is running at program startup, it should first +check whether a variable of the given name already exists. +The argument function must return a pointer to a NODE containing the +newly created variable. This function is used to implement the builtin +@code{ENVIRON} and @code{PROCINFO} variables, so you can refer to them +for examples. + +@strong{Caution:} This function is new as of @command{gawk} 3.1.5. 
+ +@cindex @code{IOBUF} internal structure +@cindex @code{iop_alloc} internal function +@cindex @code{get_record} input method +@cindex @code{close_func} input method +@cindex XML +@cindex @code{register_open_hook} internal function +@item void register_open_hook(void *(*open_func)(IOBUF *)) +This function is called to register a function to be called whenever +a new data file is opened, leading to the creation of an @code{IOBUF} +structure in @code{iop_alloc}. After creating the new @code{IOBUF}, +@code{iop_alloc} will call (in reverse order of registration, so the last +function registered is called first) each open hook until one returns +non-NULL. If any hook returns a non-NULL value, that value is assigned +to the @code{IOBUF}'s @code{opaque} field (which will presumably point +to a structure containing additional state associated with the input +processing), and no further open hooks are called. + +The function called will most likely want to set the @code{IOBUF} +@code{get_record} method to indicate that future input records should +be retrieved by calling that method instead of using the standard +@command{gawk} input processing. + +And the function will also probably want to set the @code{IOBUF} +@code{close_func} method to be called when the file is closed to clean +up any state associated with the input. + +Finally, hook functions should be prepared to receive an @code{IOBUF} +structure where the @code{fd} field is set to @code{INVALID_HANDLE}, +meaning that @command{gawk} was not able to open the file itself. In +this case, the hook function must be able to successfully open the file +and place a valid file descriptor there. + +Currently, for example, the hook function facility is used to implement +the XML parser shared library extension. For more info, please look in +@file{awk.h} and in @file{io.c}. + +@strong{Caution:} This function is new as of @command{gawk} 3.1.5. 
@end table An argument that is supposed to be an array needs to be handled with @@ -26235,13 +26351,6 @@ in @command{gawk}. @item Databases It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array. -@item Large character sets -It would be nice if @command{gawk} could handle UTF-8 and other -character sets that are larger than eight bits. -(@command{gawk} currently has partial multi-byte support, but it -needs an expert to really think out the multi-byte issues and consult -with the maintainer on the appropriate changes.) - @item More @code{lint} warnings There are more things that could be checked for portability. @end table @@ -27516,7 +27625,7 @@ record or a string. @display Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc. -59 Temple Place, Suite 330, Boston, MA 02111, USA +51 Franklin Street, Fifth Floor, Boston, MA 02111, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. @@ -27867,7 +27976,7 @@ GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software -Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, USA. +Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. @end smallexample Also add information on how to contact you by electronic and paper mail. @@ -27921,7 +28030,7 @@ Public License instead of this License. @display Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc. -59 Temple Place, Suite 330, Boston, MA 02111-1307, USA +51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. |
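
Illustration for the `length` hunk above (String Functions): the added text notes that the element count says little about which indices exist. A minimal sketch of that point follows. It assumes the `awk` on your PATH accepts an array argument to `length` (gawk does once this change is in; other implementations may not). `count_after_delete` is a hypothetical helper name used only for this illustration.

```shell
# Hypothetical helper: build a three-element array, delete the middle
# element, then report the element count and whether indices 2 and 3 exist.
count_after_delete() {
    awk 'BEGIN {
        n = split("a b c", arr)      # arr[1]="a", arr[2]="b", arr[3]="c"
        delete arr[2]                # leaves indices 1 and 3 only
        # length(arr) is the array-argument form described in the patch
        print length(arr), (2 in arr), (3 in arr)
    }'
}

count_after_delete
```

With gawk this prints `2 0 1`: two elements remain, index 2 is gone, and index 3 is still present. A loop from 1 to `length(arr)` would therefore miss an element. As the patch states, `--lint` warns about the array argument and `--posix` makes it a fatal error.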