Diffstat (limited to 'doc')
-rw-r--r-- | doc/ChangeLog | 222
-rw-r--r-- | doc/Makefile.in | 60
-rw-r--r-- | doc/awkcard.in | 13
-rw-r--r-- | doc/gawk.1 | 26
-rw-r--r-- | doc/gawk.info | 4793
-rw-r--r-- | doc/gawk.texi | 4132
-rw-r--r-- | doc/gawkinet.info | 72
-rw-r--r-- | doc/gawkinet.texi | 6
-rw-r--r-- | doc/gawktexi.in | 4096
-rw-r--r-- | doc/texinfo.tex | 257
10 files changed, 9112 insertions, 4565 deletions
diff --git a/doc/ChangeLog b/doc/ChangeLog index c1754161..d4f6881b 100644 --- a/doc/ChangeLog +++ b/doc/ChangeLog @@ -1,3 +1,225 @@ +2014-04-08 Arnold D. Robbins <arnold@skeeve.com> + + * 4.1.1: Release tar ball made. + +2014-04-08 Arnold D. Robbins <arnold@skeeve.com> + + * texinfo.tex: Update to latest. + * awkcard.in: Update copyright, patchlevel in download. + * gawktexi.in: Update patchlevel, update month, spell check. + +2014-03-30 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Cleanups to docbook, finish math stuff. + +2014-03-28 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Minor cleanups to the indexing. + + Unrelated: + + * gawktexi.in: Merge in changes needed for creating valid + DocBook XML. Works with post-5.2 Texinfo and dblatex! + +2014-03-27 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Finish the massive indexing improvements such that + functions are indexed the way I want in TeX and the way Eli + wants in Info. + + Unrelated: + + * gawktexi.in: Add a note in extension chapter that lookup of + PROCINFO can fail. + +2014-03-27 Eli Zaretskii <eliz@gnu.org> + + * gawktexi.in: First round of massive indexing improvements. + +2014-03-27 Antonio Giovanni Colombo <azc100@gmail.com> + + * gawktexi.in: Redo all the examples using BBS-list to a different + file that doesn't use out-of-date concepts. + +2014-03-10 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Finish indexing improvements. (For now, anyway.) + + Unrelated: + + * gawk.1: Document the quote flag! (Better late than never.) + * awkcard.in: Update documentation of quote flag. + +2014-03-08 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Minor edits to the discussion of the memory allocation + functions. + +2014-03-08 Andrew J. Schorr <aschorr@telemetry-investments.com> + + * gawktexi.in: Document new extension API functions api_malloc, + api_calloc, api_realloc, and api_free. + +2014-03-07 Arnold D. 
Robbins <arnold@skeeve.com> + + * gawktexi.in: Indexing improvements. + +2014-03-02 John E. Malmberg <wb8tyw@qsl.net> + + * gawktexi.in: Remove paragraph about obsolete VMS + compilers. Update reference about building PCSI kit. + +2014-02-27 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Lots of small fixes throughout, update of + profiling output. Finished fixes needed before a release. + +2014-02-20 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Add a quote to the alarm clock program. + +2014-02-15 Arnold D. Robbins <arnold@skeeve.com> + + * texinfo.tex: Update to latest. + +2014-02-14 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Lots of small edits. + +2014-02-07 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: More minor fixes, update UPDATE_MONTH. + +2014-02-03 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: More minor fixes, in indexing. + +2014-02-03 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in, gawkinet.texi: Minor fixes, mostly in indexing. + * texinfo.tex: Update to latest. + +2014-01-31 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Add `()' to names of extension functions in indexing + commands and in one place in the text. Consistency, don'tcha know. + +2014-01-30 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Add a few missing STARTOFRANGE comments. + * gawk.1: Note that `(i, j) in array' doesn't work in for loops. + Update the copyright year. + +2014-01-28 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Update info for Anders Wallin. + +2014-01-25 Arnold D. Robbins <arnold@skeeve.com> + + * texinfo.tex: Updated to current version. + * gawktexi.in: Add magic stuff so that PDFs have "dark red" + links like before. + +2014-01-23 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (Feature History): New node. + (Common Extensions): Update features now in mawk, too. + +2014-12-14 John E. 
Malmberg <wb8tyw@qsl.net> + + * gawktexi.in: Add information on building VMS PCSI kit. + +2014-01-03 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (Full Line Fields): New node. + Update copyright year. + +2013-12-29 John E. Malmberg <wb8tyw@qsl.net> + + * gawktexi.in: VMS dynamic extensions. + +2013-12-26 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: More minor additions / fixes. + (Bugs): Add John Malmberg for VMS. Other minor edits. + +2013-12-25 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Minor additions / fixes. + +2013-12-23 John E. Malmberg <wb8tyw@qsl.net> + + * gawktexi.in: Document the VMS exit status encoding. + +2013-12-21 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (Additional Configuration Options): Document + the --disable-extensions option. + +2013-12-16 John E. Malmberg <wb8tyw@qsl.net> + + * gawktexi.in: Updates to VMS sections. + +2013-12-12 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Fix the presentation of asort() and asorti(). + Thanks to Andy Schorr for pointing out the problems. + +2013-11-28 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Update quotations to use @author, fix a few + placements of footnotes. + +2013-11-08 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Update the list of files included in the gawk + distribution and fix a few typos. + +2013-11-03 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Fix the section and subsection headings in + the Preface. Also change the short title page to just + "GNU Awk". + +2013-10-31 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in: Add @shorttitlepage command. + +2013-10-25 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (Contributors): Update with more info. + (Distributtion contents): Ditto. + General: Remove all hyphens when used with "multi" prefix. + +2013-10-22 Arnold D. 
Robbins <arnold@skeeve.com> + + * gawktexi.in (Other Environment Variables): Document GAWK_MSG_SRC + variable and fix documentation of *_CHAIN_MAX variables. + +2013-10-11 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (Conversion, Printf Ordering): Better wording for + descriptions of CONVFMT. Thanks to Hermann Peifer. + +2013-09-29 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (Other Versions): Updated info on MKS awk and + some other links. + +2013-09-24 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (Readfile function): New node. + +2013-09-22 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (FN, FFN, DF,DDF, PVERSION, CTL): Remove macros. + They have no alternate versions and are just in the way. + +2013-08-15 Arnold D. Robbins <arnold@skeeve.com> + + * gawk.1: Document that ENVIRON updates affect the environment. + * gawktexi.in: Ditto. + 2013-06-27 Arnold D. Robbins <arnold@skeeve.com> * texinfo.tex: Update from Karl, fixes a formating problem. diff --git a/doc/Makefile.in b/doc/Makefile.in index 41f65b0d..52e5f873 100644 --- a/doc/Makefile.in +++ b/doc/Makefile.in @@ -1,7 +1,7 @@ -# Makefile.in generated by automake 1.13.1 from Makefile.am. +# Makefile.in generated by automake 1.13.4 from Makefile.am. # @configure_input@ -# Copyright (C) 1994-2012 Free Software Foundation, Inc. +# Copyright (C) 1994-2013 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -38,23 +38,51 @@ # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA # VPATH = @srcdir@ -am__make_dryrun = \ - { \ - am__dry=no; \ +am__is_gnu_make = test -n '$(MAKEFILE_LIST)' && test -n '$(MAKELEVEL)' +am__make_running_with_option = \ + case $${target_option-} in \ + ?) 
;; \ + *) echo "am__make_running_with_option: internal error: invalid" \ + "target option '$${target_option-}' specified" >&2; \ + exit 1;; \ + esac; \ + has_opt=no; \ + sane_makeflags=$$MAKEFLAGS; \ + if $(am__is_gnu_make); then \ + sane_makeflags=$$MFLAGS; \ + else \ case $$MAKEFLAGS in \ *\\[\ \ ]*) \ - echo 'am--echo: ; @echo "AM" OK' | $(MAKE) -f - 2>/dev/null \ - | grep '^AM OK$$' >/dev/null || am__dry=yes;; \ - *) \ - for am__flg in $$MAKEFLAGS; do \ - case $$am__flg in \ - *=*|--*) ;; \ - *n*) am__dry=yes; break;; \ - esac; \ - done;; \ + bs=\\; \ + sane_makeflags=`printf '%s\n' "$$MAKEFLAGS" \ + | sed "s/$$bs$$bs[$$bs $$bs ]*//g"`;; \ esac; \ - test $$am__dry = yes; \ - } + fi; \ + skip_next=no; \ + strip_trailopt () \ + { \ + flg=`printf '%s\n' "$$flg" | sed "s/$$1.*$$//"`; \ + }; \ + for flg in $$sane_makeflags; do \ + test $$skip_next = yes && { skip_next=no; continue; }; \ + case $$flg in \ + *=*|--*) continue;; \ + -*I) strip_trailopt 'I'; skip_next=yes;; \ + -*I?*) strip_trailopt 'I';; \ + -*O) strip_trailopt 'O'; skip_next=yes;; \ + -*O?*) strip_trailopt 'O';; \ + -*l) strip_trailopt 'l'; skip_next=yes;; \ + -*l?*) strip_trailopt 'l';; \ + -[dEDm]) skip_next=yes;; \ + -[JT]) skip_next=yes;; \ + esac; \ + case $$flg in \ + *$$target_option*) has_opt=yes; break;; \ + esac; \ + done; \ + test $$has_opt = yes +am__make_dryrun = (target_option=n; $(am__make_running_with_option)) +am__make_keepgoing = (target_option=k; $(am__make_running_with_option)) pkgdatadir = $(datadir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ diff --git a/doc/awkcard.in b/doc/awkcard.in index 610032b7..ca28f0a7 100644 --- a/doc/awkcard.in +++ b/doc/awkcard.in @@ -1,7 +1,7 @@ .\" AWK Reference Card --- Arnold Robbins, arnold@skeeve.com .\" .\" Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, -.\" 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013 +.\" 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013, 2014 .\" Free Software 
Foundation, Inc. .\" .\" Permission is granted to make and distribute verbatim copies of @@ -100,7 +100,7 @@ Brian Kernighan and Michael Brennan who reviewed it. \*(CD .SL .nf -\*(FRCopyright \(co 1996\(en2005, 2007, 2009\(en2013 +\*(FRCopyright \(co 1996\(en2005, 2007, 2009\(en2014 Free Software Foundation, Inc. .nf .BT @@ -1493,7 +1493,8 @@ Only has an effect when the field width is wider than the value to be printed. T} \*(CB\*(FC'\*(FR T{ -Use the locale's thousands separator for \*(FC%d\fP, \*(FC%i\fP, and \*(FC%u\fP.\*(CD +Use the locale's thousands separator and decimal +point characters.\*(CD T} \*(FIwidth\fP T{ Pad the field to this width. The field is normally @@ -1938,7 +1939,7 @@ to use the current domain.\*(CB .ES .nf \*(CDHost: \*(FCftp.gnu.org\*(FR -File: \*(FC/gnu/gawk/gawk-4.1.0.tar.gz\fP +File: \*(FC/gnu/gawk/gawk-4.1.1.tar.gz\fP .in +.2i .fi GNU \*(AK (\*(GK). There may be a later version. @@ -1968,8 +1969,8 @@ maintains it.\*(CX .\" --- Copying Permissions .ES .fi -\*(CDCopyright \(co 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, -2007, 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc. +\*(CDCopyright \(co 1996\(en2005, +2007, 2009\(en2014 Free Software Foundation, Inc. .sp .5 Permission is granted to make and distribute verbatim copies of this reference card provided the copyright notice and this permission notice @@ -13,7 +13,7 @@ . if \w'\(rq' .ds rq "\(rq . \} .\} -.TH GAWK 1 "May 09 2013" "Free Software Foundation" "Utility Commands" +.TH GAWK 1 "Mar 08 2014" "Free Software Foundation" "Utility Commands" .SH NAME gawk \- pattern scanning and processing language .SH SYNOPSIS @@ -917,11 +917,17 @@ An array containing the values of the current environment. The array is indexed by the environment variables, each element being the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be \fB"/home/arnold"\fR). 
-Changing this array does not affect the environment seen by programs which +.sp +In POSIX mode, +changing this array does not affect the environment seen by programs which .I gawk spawns via redirection or the .B system() function. +Otherwise, +.I gawk +updates its real environment so that programs it spawns see +the changes. .TP .B ERRNO If a system error occurs either doing a redirection for @@ -1364,6 +1370,11 @@ The construct may also be used in a .B for loop to iterate over all the elements of an array. +However, the +.B "(i, j) in array" +construct only works in tests, not in +.B for +loops. .PP An element may be deleted from an array using the .B delete @@ -2443,6 +2454,15 @@ This applies only to the numeric output formats. This flag only has an effect when the field width is wider than the value to be printed. .TP +.B ' +A single quote character instructs +.I gawk +to insert the locale's thousands-separator character +into decimal numbers, and to also use the locale's +decimal point character with floating point formats. +This requires correct locale support in the C library +and in the definition of the current locale. +.TP .I width The field should be padded to this width. The field is normally padded with spaces. With the @@ -3962,7 +3982,7 @@ We thank him. .SH COPYING PERMISSIONS Copyright \(co 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005, 2007, 2009, -2010, 2011, 2012, 2013 +2010, 2011, 2012, 2013, 2014 Free Software Foundation, Inc. .PP Permission is granted to make and distribute verbatim copies of diff --git a/doc/gawk.info b/doc/gawk.info index 9072bf06..aad73f7a 100644 --- a/doc/gawk.info +++ b/doc/gawk.info @@ -9,13 +9,12 @@ START-INFO-DIR-ENTRY * awk: (gawk)Invoking gawk. Text scanning and processing. END-INFO-DIR-ENTRY - Copyright (C) 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, -2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013 Free -Software Foundation, Inc. 
+ Copyright (C) 1989, 1991, 1992, 1993, 1996-2005, 2007, 2009-2014 +Free Software Foundation, Inc. This is Edition 4.1 of `GAWK: Effective AWK Programming: A User's -Guide for GNU Awk', for the 4.1.0 (or later) version of the GNU +Guide for GNU Awk', for the 4.1.1 (or later) version of the GNU implementation of AWK. Permission is granted to copy, distribute and/or modify this document @@ -41,13 +40,12 @@ General Introduction This file documents `awk', a program that you can use to select particular records in a file and perform operations upon them. - Copyright (C) 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, -2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013 Free -Software Foundation, Inc. + Copyright (C) 1989, 1991, 1992, 1993, 1996-2005, 2007, 2009-2014 +Free Software Foundation, Inc. This is Edition 4.1 of `GAWK: Effective AWK Programming: A User's -Guide for GNU Awk', for the 4.1.0 (or later) version of the GNU +Guide for GNU Awk', for the 4.1.1 (or later) version of the GNU implementation of AWK. Permission is granted to copy, distribute and/or modify this document @@ -193,10 +191,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) field. * Command Line Field Separator:: Setting `FS' from the command-line. +* Full Line Fields:: Making the full line be a single + field. * Field Splitting Summary:: Some final points and a summary table. * Constant Size:: Reading constant width data. * Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. +* Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the `getline' function. @@ -347,9 +347,9 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) `awk'. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in +* Multidimensional:: Emulating multidimensional arrays in `awk'. 
-* Multi-scanning:: Scanning multidimensional arrays. +* Multiscanning:: Scanning multidimensional arrays. * Arrays of Arrays:: True multidimensional arrays. * Built-in:: Summarizes the built-in functions. * Calling Built-in:: How to call built-in functions. @@ -401,6 +401,8 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * Join Function:: A function to join an array into a string. * Getlocaltime Function:: A function to get formatted times. +* Readfile Function:: A function to read an entire file at + once. * Data File Management:: Functions for managing command-line data files. * Filetrans Function:: A function for handling data file @@ -518,6 +520,7 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * Extension API Functions Introduction:: Introduction to the API functions. * General Data Types:: The data types. * Requesting Values:: How to get a value. +* Memory Allocation Functions:: Functions for allocating memory. * Constructor Functions:: Functions for creating values. * Registration Functions:: Functions to register things with `gawk'. @@ -580,6 +583,8 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) version of `awk'. * POSIX/GNU:: The extensions in `gawk' not in POSIX `awk'. +* Feature History:: The history of the features in + `gawk'. * Common Extensions:: Common Extensions Summary. * Ranges and Locales:: How locales used to affect regexp ranges. @@ -612,9 +617,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * VMS Installation:: Installing `gawk' on VMS. * VMS Compilation:: How to compile `gawk' under VMS. +* VMS Dynamic Extensions:: Compiling `gawk' dynamic + extensions on VMS. * VMS Installation Details:: How to install `gawk' under VMS. * VMS Running:: How to run `gawk' under VMS. +* VMS GNV:: The VMS GNV Project. * VMS Old Gawk:: An old version comes with some VMS systems. * Bugs:: Reporting Problems and Bugs. 
@@ -1079,11 +1087,11 @@ by first pressing and holding the `CONTROL' key, next pressing the `d' key and finally releasing both keys. Dark Corners -............ +------------ Dark corners are basically fractal -- no matter how much you - illuminate, there's always a smaller but darker one. - Brian Kernighan + illuminate, there's always a smaller but darker one. -- Brian + Kernighan Until the POSIX standard (and `GAWK: Effective AWK Programming'), many features of `awk' were either poorly documented or not documented @@ -1262,11 +1270,11 @@ acknowledgements: Dr. Nelson Beebe, Andreas Buening, Dr. Manuel Collado, Antonio Colombo, Stephen Davies, Scott Deifik, Akim Demaille, Darrel Hankerson, Michal Jaegermann, Ju"rgen Kahrs, Stepan Kasal, John Malmberg, Dave -Pitts, Chet Ramey, Pat Rankin, Andrew Schorr, Corinna Vinschen, Anders -Wallin, and Eli Zaretskii (in alphabetical order) make up the current -`gawk' "crack portability team." Without their hard work and help, -`gawk' would not be nearly the fine program it is today. It has been -and continues to be a pleasure working with this team of fine people. +Pitts, Chet Ramey, Pat Rankin, Andrew Schorr, Corinna Vinschen, and Eli +Zaretskii (in alphabetical order) make up the current `gawk' "crack +portability team." Without their hard work and help, `gawk' would not +be nearly the fine program it is today. It has been and continues to +be a pleasure working with this team of fine people. Notable code and documentation contributions were made by a number of people. *Note Contributors::, for the full list. @@ -1758,30 +1766,30 @@ File: gawk.info, Node: Sample Data Files, Next: Very Simple, Prev: Running ga =============================== Many of the examples in this Info file take their input from two sample -data files. The first, `BBS-list', represents a list of computer -bulletin board systems together with information about those systems. +data files. 
The first, `mail-list', represents a list of peoples' names +together with their email addresses and information about those people. The second data file, called `inventory-shipped', contains information about monthly shipments. In both files, each line is considered to be one "record". - In the data file `BBS-list', each record contains the name of a -computer bulletin board, its phone number, the board's baud rate(s), -and a code for the number of hours it is operational. An `A' in the -last column means the board operates 24 hours a day. A `B' in the last -column means the board only operates on evening and weekend hours. A -`C' means the board operates only on weekends: - - aardvark 555-5553 1200/300 B - alpo-net 555-3412 2400/1200/300 A - barfly 555-7685 1200/300 A - bites 555-1675 2400/1200/300 A - camelot 555-0542 300 C - core 555-2912 1200/300 C - fooey 555-1234 2400/1200/300 B - foot 555-6699 1200/300 B - macfoo 555-6480 1200/300 A - sdace 555-3430 2400/1200/300 A - sabafoo 555-2127 1200/300 C + In the data file `mail-list', each record contains the name of a +person, his/her phone number, his/her email-address, and a code for +their relationship with the author of the list. An `A' in the last +column means that the person is an acquaintance. An `F' in the last +column means that the person is a friend. 
An `R' means that the person +is a relative: + + Amelia 555-5553 amelia.zodiacusque@gmail.com F + Anthony 555-3412 anthony.asserturo@hotmail.com A + Becky 555-7685 becky.algebrarum@gmail.com A + Bill 555-1675 bill.drowning@hotmail.com A + Broderick 555-0542 broderick.aliquotiens@yahoo.com R + Camilla 555-2912 camilla.infusarum@skynet.be R + Fabius 555-1234 fabius.undevicesimus@ucb.edu F + Julie 555-6699 julie.perscrutabor@skeeve.com F + Martin 555-6480 martin.codicibus@hotmail.com A + Samuel 555-3430 samuel.lanceolis@shu.edu A + Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R The data file `inventory-shipped' represents information about shipments during the year. Each record contains the month, the number @@ -1808,18 +1816,8 @@ and the first four months of the current year. Mar 24 75 70 495 Apr 21 70 74 514 - If you are reading this in GNU Emacs using Info, you can copy the -regions of text showing these sample files into your own test files. -This way you can try out the examples shown in the remainder of this -document. You do this by using the command `M-x write-region' to copy -text from the Info file into a file for use with `awk' (*Note -Miscellaneous File Operations: (emacs)Misc File Ops, for more -information). Using this information, create your own `BBS-list' and -`inventory-shipped' files and practice what you learn in this Info file. - - If you are using the stand-alone version of Info, see *note Extract -Program::, for an `awk' program that extracts these data files from -`gawk.texi', the Texinfo source file for this Info file. + The sample files are included in the `gawk' distribution, in the +directory `awklib/eg/data'. 
File: gawk.info, Node: Very Simple, Next: Two Rules, Prev: Sample Data Files, Up: Getting Started @@ -1828,32 +1826,32 @@ File: gawk.info, Node: Very Simple, Next: Two Rules, Prev: Sample Data Files, ======================== The following command runs a simple `awk' program that searches the -input file `BBS-list' for the character string `foo' (a grouping of +input file `mail-list' for the character string `li' (a grouping of characters is usually called a "string"; the term "string" is based on similar usage in English, such as "a string of pearls," or "a string of cars in a train"): - awk '/foo/ { print $0 }' BBS-list + awk '/li/ { print $0 }' mail-list -When lines containing `foo' are found, they are printed because +When lines containing `li' are found, they are printed because `print $0' means print the current line. (Just `print' by itself means the same thing, so we could have written that instead.) - You will notice that slashes (`/') surround the string `foo' in the -`awk' program. The slashes indicate that `foo' is the pattern to -search for. This type of pattern is called a "regular expression", -which is covered in more detail later (*note Regexp::). The pattern is -allowed to match parts of words. There are single quotes around the -`awk' program so that the shell won't interpret any of it as special -shell characters. + You will notice that slashes (`/') surround the string `li' in the +`awk' program. The slashes indicate that `li' is the pattern to search +for. This type of pattern is called a "regular expression", which is +covered in more detail later (*note Regexp::). The pattern is allowed +to match parts of words. There are single quotes around the `awk' +program so that the shell won't interpret any of it as special shell +characters. 
Here is what this program prints: - $ awk '/foo/ { print $0 }' BBS-list - -| fooey 555-1234 2400/1200/300 B - -| foot 555-6699 1200/300 B - -| macfoo 555-6480 1200/300 A - -| sabafoo 555-2127 1200/300 C + $ awk '/li/ { print $0 }' mail-list + -| Amelia 555-5553 amelia.zodiacusque@gmail.com F + -| Broderick 555-0542 broderick.aliquotiens@yahoo.com R + -| Julie 555-6699 julie.perscrutabor@skeeve.com F + -| Samuel 555-3430 samuel.lanceolis@shu.edu A In an `awk' rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed @@ -1862,7 +1860,7 @@ is to print all lines that match the pattern. Thus, we could leave out the action (the `print' statement and the curly braces) in the previous example and the result would be the same: -`awk' prints all lines matching the pattern `foo'. By comparison, +`awk' prints all lines matching the pattern `li'. By comparison, omitting the `print' statement but retaining the curly braces makes an empty action that does nothing (i.e., no lines are printed). @@ -1968,25 +1966,19 @@ the string `21'. If a line contains both strings, it is printed twice, once by each rule. 
This is what happens if we run this program on our two sample data -files, `BBS-list' and `inventory-shipped': +files, `mail-list' and `inventory-shipped': $ awk '/12/ { print $0 } - > /21/ { print $0 }' BBS-list inventory-shipped - -| aardvark 555-5553 1200/300 B - -| alpo-net 555-3412 2400/1200/300 A - -| barfly 555-7685 1200/300 A - -| bites 555-1675 2400/1200/300 A - -| core 555-2912 1200/300 C - -| fooey 555-1234 2400/1200/300 B - -| foot 555-6699 1200/300 B - -| macfoo 555-6480 1200/300 A - -| sdace 555-3430 2400/1200/300 A - -| sabafoo 555-2127 1200/300 C - -| sabafoo 555-2127 1200/300 C + > /21/ { print $0 }' mail-list inventory-shipped + -| Anthony 555-3412 anthony.asserturo@hotmail.com A + -| Camilla 555-2912 camilla.infusarum@skynet.be R + -| Fabius 555-1234 fabius.undevicesimus@ucb.edu F + -| Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R + -| Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R -| Jan 21 36 64 620 -| Apr 21 70 74 514 -Note how the line beginning with `sabafoo' in `BBS-list' was printed +Note how the line beginning with `Jean-Paul' in `mail-list' was printed twice, once for each rule. @@ -2062,7 +2054,7 @@ Most often, each line in an `awk' program is a separate statement or separate rule, like this: awk '/12/ { print $0 } - /21/ { print $0 }' BBS-list inventory-shipped + /21/ { print $0 }' mail-list inventory-shipped However, `gawk' ignores newlines after any of the following symbols and keywords: @@ -2230,7 +2222,7 @@ File: gawk.info, Node: Invoking Gawk, Next: Regexp, Prev: Getting Started, U 2 Running `awk' and `gawk' ************************** -This major node covers how to run awk, both POSIX-standard and +This major node covers how to run `awk', both POSIX-standard and `gawk'-specific command-line options, and what `awk' and `gawk' do with non-option arguments. 
It then proceeds to cover how `gawk' searches for source files, reading standard input along with other files, @@ -2306,22 +2298,8 @@ The following list describes options mandated by the POSIX standard: `--file SOURCE-FILE' Read `awk' program source from SOURCE-FILE instead of in the first non-option argument. This option may be given multiple times; the - `awk' program consists of the concatenation the contents of each - specified SOURCE-FILE. - -`-i SOURCE-FILE' -`--include SOURCE-FILE' - Read `awk' source library from SOURCE-FILE. This option is - completely equivalent to using the `@include' directive inside - your program. This option is very similar to the `-f' option, but - there are two important differences. First, when `-i' is used, - the program source will not be loaded if it has been previously - loaded, whereas the `-f' will always load the file. Second, - because this option is intended to be used with code libraries, - `gawk' does not recognize such files as constituting main program - input. Thus, after processing an `-i' argument, `gawk' still - expects to find the main source code via the `-f' option or on the - command-line. + `awk' program consists of the concatenation of the contents of + each specified SOURCE-FILE. `-v VAR=VAL' `--assign VAR=VAL' @@ -2450,6 +2428,20 @@ The following list describes options mandated by the POSIX standard: Print a "usage" message summarizing the short and long style options that `gawk' accepts and then exit. +`-i SOURCE-FILE' +`--include SOURCE-FILE' + Read `awk' source library from SOURCE-FILE. This option is + completely equivalent to using the `@include' directive inside + your program. This option is very similar to the `-f' option, but + there are two important differences. First, when `-i' is used, + the program source will not be loaded if it has been previously + loaded, whereas the `-f' will always load the file. 
Second, + because this option is intended to be used with code libraries, + `gawk' does not recognize such files as constituting main program + input. Thus, after processing an `-i' argument, `gawk' still + expects to find the main source code via the `-f' option or on the + command-line. + `-l LIB' `--load LIB' Load a shared library LIB. This searches for the library using the @@ -2482,7 +2474,7 @@ The following list describes options mandated by the POSIX standard: `--bignum' Force arbitrary precision arithmetic on numbers. This option has no effect if `gawk' is not compiled to use the GNU MPFR and MP - libraries (*note Arbitrary Precision Arithmetic::). + libraries (*note Gawk and MPFR::). `-n' `--non-decimal-data' @@ -2750,13 +2742,14 @@ on the command-line with the `-f' option. In most `awk' implementations, you must supply a precise path name for each program file, unless the file is in the current directory. But in `gawk', if the file name supplied to the `-f' or `-i' options does not contain a -`/', then `gawk' searches a list of directories (called the "search -path"), one by one, looking for a file with the specified name. +directory separator `/', then `gawk' searches a list of directories +(called the "search path"), one by one, looking for a file with the +specified name. The search path is a string consisting of directory names separated by -colons. `gawk' gets its search path from the `AWKPATH' environment +colons(1). `gawk' gets its search path from the `AWKPATH' environment variable. If that variable does not exist, `gawk' uses a default path, -`.:/usr/local/share/awk'.(1) +`.:/usr/local/share/awk'.(2) The search path feature is particularly useful for building libraries of useful `awk' functions. The library files can be placed in a @@ -2797,7 +2790,9 @@ found, and `gawk' no longer needs to use `AWKPATH'. ---------- Footnotes ---------- - (1) Your version of `gawk' may use a different directory; it will + (1) Semicolons on MS-Windows and MS-DOS. 
+ + (2) Your version of `gawk' may use a different directory; it will depend upon how `gawk' was built and installed. The actual directory is the value of `$(datadir)' generated when `gawk' was configured. You probably don't need to worry about this, though. @@ -2849,10 +2844,6 @@ used by regular users. the `gawk' developers for testing and tuning. They are subject to change. The variables are: -`AVG_CHAIN_MAX' - The average number of items `gawk' will maintain on a hash chain - for managing arrays. - `AWK_HASH' If this variable exists with a value of `gst', `gawk' will switch to using the hash function from GNU Smalltalk for managing arrays. @@ -2864,6 +2855,13 @@ change. The variables are: debugging problems on filesystems on non-POSIX operating systems where I/O is performed in records, not in blocks. +`GAWK_MSG_SRC' + If this variable exists, `gawk' includes the source file name and + line number from which warning and/or fatal messages are + generated. Its purpose is to help isolate the source of a + message, since there can be multiple places which produce the same + warning or error message. + `GAWK_NO_DFA' If this variable exists, `gawk' does not use the DFA regexp matcher for "does it match" kinds of tests. This can cause `gawk' to be @@ -2876,6 +2874,14 @@ change. The variables are: This specifies the amount by which `gawk' should grow its internal evaluation stack, when needed. +`INT_CHAIN_MAX' + The average number of items `gawk' will maintain on a hash chain + for managing arrays indexed by integers. + +`STR_CHAIN_MAX' + The average number of items `gawk' will maintain on a hash chain + for managing arrays indexed by strings. + `TIDYMEM' If this variable exists, `gawk' uses the `mtrace()' library calls from GNU LIBC to help track down possible memory leaks. @@ -3052,8 +3058,7 @@ File: gawk.info, Node: Undocumented, Prev: Obsolete, Up: Invoking Gawk 2.10 Undocumented Options and Features ====================================== - Use the Source, Luke! 
- Obi-Wan + Use the Source, Luke! -- Obi-Wan This minor node intentionally left blank. @@ -3097,14 +3102,14 @@ A regular expression can be used as a pattern by enclosing it in slashes. Then the regular expression is tested against the entire text of each record. (Normally, it only needs to match some part of the text in order to succeed.) For example, the following prints the -second field of each record that contains the string `foo' anywhere in +second field of each record that contains the string `li' anywhere in it: - $ awk '/foo/ { print $2 }' BBS-list - -| 555-1234 + $ awk '/li/ { print $2 }' mail-list + -| 555-5553 + -| 555-0542 -| 555-6699 - -| 555-6480 - -| 555-2127 + -| 555-3430 Regular expressions can also be used in matching expressions. These expressions allow you to specify the string to match against; it need @@ -3876,7 +3881,7 @@ have to be named on the `awk' command line (*note Getline::). * Field Separators:: The field separator and how to change it. * Constant Size:: Reading constant width data. * Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. +* Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the `getline' function. * Read Timeout:: Reading input with a timeout. @@ -3912,67 +3917,82 @@ processed, so that the very first record is read with the proper separator. To do this, use the special `BEGIN' pattern (*note BEGIN/END::). For example: - awk 'BEGIN { RS = "/" } - { print $0 }' BBS-list - -changes the value of `RS' to `"/"', before reading any input. This is -a string whose first character is a slash; as a result, records are -separated by slashes. Then the input file is read, and the second rule -in the `awk' program (the action with no pattern) prints each record. -Because each `print' statement adds a newline at the end of its output, -this `awk' program copies the input with each slash changed to a -newline. 
Here are the results of running the program on `BBS-list': - - $ awk 'BEGIN { RS = "/" } - > { print $0 }' BBS-list - -| aardvark 555-5553 1200 - -| 300 B - -| alpo-net 555-3412 2400 - -| 1200 - -| 300 A - -| barfly 555-7685 1200 - -| 300 A - -| bites 555-1675 2400 - -| 1200 - -| 300 A - -| camelot 555-0542 300 C - -| core 555-2912 1200 - -| 300 C - -| fooey 555-1234 2400 - -| 1200 - -| 300 B - -| foot 555-6699 1200 - -| 300 B - -| macfoo 555-6480 1200 - -| 300 A - -| sdace 555-3430 2400 - -| 1200 - -| 300 A - -| sabafoo 555-2127 1200 - -| 300 C + awk 'BEGIN { RS = "u" } + { print $0 }' mail-list + +changes the value of `RS' to `u', before reading any input. This is a +string whose first character is the letter "u;" as a result, records +are separated by the letter "u." Then the input file is read, and the +second rule in the `awk' program (the action with no pattern) prints +each record. Because each `print' statement adds a newline at the end +of its output, this `awk' program copies the input with each `u' +changed to a newline. Here are the results of running the program on +`mail-list': + + $ awk 'BEGIN { RS = "u" } + > { print $0 }' mail-list + -| Amelia 555-5553 amelia.zodiac + -| sq + -| e@gmail.com F + -| Anthony 555-3412 anthony.assert + -| ro@hotmail.com A + -| Becky 555-7685 becky.algebrar + -| m@gmail.com A + -| Bill 555-1675 bill.drowning@hotmail.com A + -| Broderick 555-0542 broderick.aliq + -| otiens@yahoo.com R + -| Camilla 555-2912 camilla.inf + -| sar + -| m@skynet.be R + -| Fabi + -| s 555-1234 fabi + -| s. + -| ndevicesim + -| s@ + -| cb.ed + -| F + -| J + -| lie 555-6699 j + -| lie.perscr + -| tabor@skeeve.com F + -| Martin 555-6480 martin.codicib + -| s@hotmail.com A + -| Sam + -| el 555-3430 sam + -| el.lanceolis@sh + -| .ed + -| A + -| Jean-Pa + -| l 555-2127 jeanpa + -| l.campanor + -| m@ny + -| .ed + -| R -| -Note that the entry for the `camelot' BBS is not split. 
In the -original data file (*note Sample Data Files::), the line looks like -this: +Note that the entry for the name `Bill' is not split. In the original +data file (*note Sample Data Files::), the line looks like this: - camelot 555-0542 300 C + Bill 555-1675 bill.drowning@hotmail.com A -It has one baud rate only, so there are no slashes in the record, -unlike the others which have two or more baud rates. In fact, this -record is treated as part of the record for the `core' BBS; the newline +It contains no `u' so there is no reason to split the record, unlike +the others which have one or more occurrences of the `u'. In fact, +this record is treated as part of the previous record; the newline separating them in the output is the original newline in the data file, not the one added by `awk' when it printed the record! Another way to change the record separator is on the command line, using the variable-assignment feature (*note Other Arguments::): - awk '{ print $0 }' RS="/" BBS-list + awk '{ print $0 }' RS="u" mail-list -This sets `RS' to `/' before processing `BBS-list'. +This sets `RS' to `u' before processing `mail-list'. - Using an unusual character such as `/' for the record separator -produces correct behavior in the vast majority of cases. + Using an alphabetic character such as `u' for the record separator +is highly likely to produce strange results. Using an unusual +character such as `/' is more likely to produce correct behavior in the +majority of cases, but there are no guarantees. The moral is: Know Your +Data. There is one unusual case, that occurs when `gawk' is being fully POSIX-compliant (*note Options::). Then, the following (extreme) @@ -4075,18 +4095,23 @@ use for `RS' in this case: BEGIN { RS = "\0" } # whole file becomes one record? `gawk' in fact accepts this, and uses the NUL character for the -record separator. However, this usage is _not_ portable to other `awk' -implementations. +record separator. 
However, this usage is _not_ portable to most other +`awk' implementations. - All other `awk' implementations(1) store strings internally as -C-style strings. C strings use the NUL character as the string + Almost all other `awk' implementations(1) store strings internally +as C-style strings. C strings use the NUL character as the string terminator. In effect, this means that `RS = "\0"' is the same as `RS = ""'. (d.c.) + It happens that recent versions of `mawk' can use the NUL character +as a record separator. However, this is a special case: `mawk' does not +allow embedded NUL characters in strings. + The best way to treat a whole file as a single record is to simply read the file in, one record at a time, concatenating each record onto the end of the previous ones. + ---------- Footnotes ---------- (1) At least that we know about. @@ -4136,26 +4161,24 @@ get the empty string. (If used in a numeric operation, you get zero.) field, is a special case: it represents the whole input record when you are not interested in specific fields. Here are some more examples: - $ awk '$1 ~ /foo/ { print $0 }' BBS-list - -| fooey 555-1234 2400/1200/300 B - -| foot 555-6699 1200/300 B - -| macfoo 555-6480 1200/300 A - -| sabafoo 555-2127 1200/300 C + $ awk '$1 ~ /li/ { print $0 }' mail-list + -| Amelia 555-5553 amelia.zodiacusque@gmail.com F + -| Julie 555-6699 julie.perscrutabor@skeeve.com F -This example prints each record in the file `BBS-list' whose first -field contains the string `foo'. The operator `~' is called a -"matching operator" (*note Regexp Usage::); it tests whether a string -(here, the field `$1') matches a given regular expression. +This example prints each record in the file `mail-list' whose first +field contains the string `li'. The operator `~' is called a "matching +operator" (*note Regexp Usage::); it tests whether a string (here, the +field `$1') matches a given regular expression. 
- By contrast, the following example looks for `foo' in _the entire + By contrast, the following example looks for `li' in _the entire record_ and prints the first field and the last field for each matching input record: - $ awk '/foo/ { print $1, $NF }' BBS-list - -| fooey B - -| foot B - -| macfoo A - -| sabafoo C + $ awk '/li/ { print $1, $NF }' mail-list + -| Amelia F + -| Broderick R + -| Julie F + -| Samuel A ---------- Footnotes ---------- @@ -4183,16 +4206,16 @@ For the twentieth record, field number 20 is printed; most likely, the record has fewer than 20 fields, so this prints a blank line. Here is another example of using expressions as field numbers: - awk '{ print $(2*2) }' BBS-list + awk '{ print $(2*2) }' mail-list `awk' evaluates the expression `(2*2)' and uses its value as the number of the field to print. The `*' sign represents multiplication, so the expression `2*2' evaluates to four. The parentheses are used so that the multiplication is done before the `$' operation; they are necessary whenever there is a binary operator in the field-number -expression. This example, then, prints the hours of operation (the -fourth field) for every line of the file `BBS-list'. (All of the `awk' -operators are listed, in order of decreasing precedence, in *note +expression. This example, then, prints the type of relationship (the +fourth field) for every line of the file `mail-list'. (All of the +`awk' operators are listed, in order of decreasing precedence, in *note Precedence::.) If the field number you compute is zero, you get the entire record. @@ -4369,6 +4392,7 @@ File: gawk.info, Node: Field Separators, Next: Constant Size, Prev: Changing * Regexp Field Splitting:: Using regexps as the field separator. * Single Character Fields:: Making each character a separate field. * Command Line Field Separator:: Setting `FS' from the command-line. +* Full Line Fields:: Making the full line be a single field. 
* Field Splitting Summary:: Some final points and a summary table. The "field separator", which is either a single character or a @@ -4550,7 +4574,7 @@ Options::), if `FS' is the null string, then `gawk' also behaves this way. -File: gawk.info, Node: Command Line Field Separator, Next: Field Splitting Summary, Prev: Single Character Fields, Up: Field Separators +File: gawk.info, Node: Command Line Field Separator, Next: Full Line Fields, Prev: Single Character Fields, Up: Field Separators 4.5.4 Setting `FS' from the Command Line ---------------------------------------- @@ -4588,58 +4612,69 @@ type `-F\t' at the shell, without any quotes, the `\' gets deleted, so TABs and not `t's. Use `-v FS="t"' or `-F"[t]"' on the command line if you really do want to separate your fields with `t's. - As an example, let's use an `awk' program file called `baud.awk' -that contains the pattern `/300/' and the action `print $1': + As an example, let's use an `awk' program file called `edu.awk' that +contains the pattern `/edu/' and the action `print $1': - /300/ { print $1 } + /edu/ { print $1 } Let's also set `FS' to be the `-' character and run the program on -the file `BBS-list'. The following command prints a list of the names -of the bulletin boards that operate at 300 baud and the first three +the file `mail-list'. The following command prints a list of the names +of the people that work at or attend a university, and the first three digits of their phone numbers: - $ awk -F- -f baud.awk BBS-list - -| aardvark 555 - -| alpo - -| barfly 555 - -| bites 555 - -| camelot 555 - -| core 555 - -| fooey 555 - -| foot 555 - -| macfoo 555 - -| sdace 555 - -| sabafoo 555 - -Note the second line of output. The second line in the original file + $ awk -F- -f edu.awk mail-list + -| Fabius 555 + -| Samuel 555 + -| Jean + +Note the third line of output. 
The third line in the original file looked like this: - alpo-net 555-3412 2400/1200/300 A + Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R - The `-' as part of the system's name was used as the field + The `-' as part of the person's name was used as the field separator, instead of the `-' in the phone number that was originally intended. This demonstrates why you have to be careful in choosing your field and record separators. Perhaps the most common use of a single character as the field separator occurs when processing the Unix system password file. On -many Unix systems, each user has a separate entry in the system password -file, one line per user. The information in these lines is separated -by colons. The first field is the user's login name and the second is -the user's (encrypted or shadow) password. A password file entry might -look like this: +many Unix systems, each user has a separate entry in the system +password file, one line per user. The information in these lines is +separated by colons. The first field is the user's login name and the +second is the user's encrypted or shadow password. (A shadow password +is indicated by the presence of a single `x' in the second field.) 
A +password file entry might look like this: - arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/bash + arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash The following program searches the system password file and prints -the entries for users who have no password: +the entries for users whose full name is not indicated: - awk -F: '$2 == ""' /etc/passwd + awk -F: '$5 == ""' /etc/passwd -File: gawk.info, Node: Field Splitting Summary, Prev: Command Line Field Separator, Up: Field Separators +File: gawk.info, Node: Full Line Fields, Next: Field Splitting Summary, Prev: Command Line Field Separator, Up: Field Separators + +4.5.5 Making The Full Line Be A Single Field +-------------------------------------------- + +Occasionally, it's useful to treat the whole input line as a single +field. This can be done easily and portably simply by setting `FS' to +`"\n"' (a newline).(1) + + awk -F'\n' 'PROGRAM' FILES ... -4.5.5 Field-Splitting Summary +When you do this, `$1' is the same as `$0'. + + ---------- Footnotes ---------- + + (1) Thanks to Andrew Schorr for this tip. + + +File: gawk.info, Node: Field Splitting Summary, Prev: Full Line Fields, Up: Field Separators + +4.5.6 Field-Splitting Summary ----------------------------- It is important to remember that when you assign a string constant as @@ -4728,14 +4763,15 @@ File: gawk.info, Node: Constant Size, Next: Splitting By Content, Prev: Field 4.6 Reading Fixed-Width Data ============================ -(This minor node discusses an advanced feature of `awk'. If you are a -novice `awk' user, you might want to skip it on the first reading.) + NOTE: This minor node discusses an advanced feature of `gawk'. If + you are a novice `awk' user, you might want to skip it on the + first reading. -`gawk' provides a facility for dealing with fixed-width fields with no -distinctive field separator. 
For example, data of this nature arises -in the input for old Fortran programs where numbers are run together, -or in the output of programs that did not anticipate the use of their -output as input for other programs. + `gawk' provides a facility for dealing with fixed-width fields with +no distinctive field separator. For example, data of this nature +arises in the input for old Fortran programs where numbers are run +together, or in the output of programs that did not anticipate the use +of their output as input for other programs. An example of the latter is a table where all the columns are lined up by the use of a variable number of spaces and _empty fields are just @@ -4834,10 +4870,11 @@ File: gawk.info, Node: Splitting By Content, Next: Multiple Line, Prev: Const 4.7 Defining Fields By Content ============================== -(This minor node discusses an advanced feature of `awk'. If you are a -novice `awk' user, you might want to skip it on the first reading.) + NOTE: This minor node discusses an advanced feature of `gawk'. If + you are a novice `awk' user, you might want to skip it on the + first reading. -Normally, when using `FS', `gawk' defines the fields as the parts of + Normally, when using `FS', `gawk' defines the fields as the parts of the record that occur in between each field separator. In other words, `FS' defines what a field _is not_, instead of what a field _is_. However, there are times when you really want to define the fields by @@ -5291,8 +5328,7 @@ File: gawk.info, Node: Getline/Pipe, Next: Getline/Variable/Pipe, Prev: Getli --------------------------------- Omniscience has much to recommend it. Failing that, attention to - details would be useful. - Brian Kernighan + details would be useful. -- Brian Kernighan The output of a command can also be piped into `getline', using `COMMAND | getline'. 
In this case, the string COMMAND is run as a @@ -5801,13 +5837,29 @@ prints the first and second fields of each input record, separated by a semicolon, with a blank line added after each newline: $ awk 'BEGIN { OFS = ";"; ORS = "\n\n" } - > { print $1, $2 }' BBS-list - -| aardvark;555-5553 + > { print $1, $2 }' mail-list + -| Amelia;555-5553 -| - -| alpo-net;555-3412 + -| Anthony;555-3412 + -| + -| Becky;555-7685 + -| + -| Bill;555-1675 + -| + -| Broderick;555-0542 + -| + -| Camilla;555-2912 + -| + -| Fabius;555-1234 + -| + -| Julie;555-6699 + -| + -| Martin;555-6480 + -| + -| Samuel;555-3430 + -| + -| Jean-Paul;555-2127 -| - -| barfly;555-7685 - ... If the value of `ORS' does not contain a newline, the program's output runs together on a single line. @@ -6176,25 +6228,25 @@ File: gawk.info, Node: Printf Examples, Prev: Format Modifiers, Up: Printf The following simple example shows how to use `printf' to make an aligned table: - awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list + awk '{ printf "%-10s %s\n", $1, $2 }' mail-list -This command prints the names of the bulletin boards (`$1') in the file -`BBS-list' as a string of 10 characters that are left-justified. It +This command prints the names of the people (`$1') in the file +`mail-list' as a string of 10 characters that are left-justified. It also prints the phone numbers (`$2') next on the line. 
This produces an aligned two-column table of names and phone numbers, as shown here: - $ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list - -| aardvark 555-5553 - -| alpo-net 555-3412 - -| barfly 555-7685 - -| bites 555-1675 - -| camelot 555-0542 - -| core 555-2912 - -| fooey 555-1234 - -| foot 555-6699 - -| macfoo 555-6480 - -| sdace 555-3430 - -| sabafoo 555-2127 + $ awk '{ printf "%-10s %s\n", $1, $2 }' mail-list + -| Amelia 555-5553 + -| Anthony 555-3412 + -| Becky 555-7685 + -| Bill 555-1675 + -| Broderick 555-0542 + -| Camilla 555-2912 + -| Fabius 555-1234 + -| Julie 555-6699 + -| Martin 555-6480 + -| Samuel 555-3430 + -| Jean-Paul 555-2127 In this case, the phone numbers had to be printed as strings because the numbers are separated by a dash. Printing the phone numbers as @@ -6212,14 +6264,14 @@ beginning of the `awk' program: awk 'BEGIN { print "Name Number" print "---- ------" } - { printf "%-10s %s\n", $1, $2 }' BBS-list + { printf "%-10s %s\n", $1, $2 }' mail-list The above example mixes `print' and `printf' statements in the same program. Using just `printf' statements can produce the same results: awk 'BEGIN { printf "%-10s %s\n", "Name", "Number" printf "%-10s %s\n", "----", "------" } - { printf "%-10s %s\n", $1, $2 }' BBS-list + { printf "%-10s %s\n", $1, $2 }' mail-list Printing each column heading with the same format specification used for the column elements ensures that the headings are aligned just like @@ -6231,7 +6283,7 @@ be emphasized by storing it in a variable, like this: awk 'BEGIN { format = "%-10s %s\n" printf format, "Name", "Number" printf format, "----", "------" } - { printf format, $1, $2 }' BBS-list + { printf format, $1, $2 }' mail-list At this point, it would be a worthwhile exercise to use the `printf' statement to line up the headings and table data for the @@ -6271,19 +6323,19 @@ work identically for `printf': the same OUTPUT-FILE do not erase OUTPUT-FILE, but append to it. 
(This is different from how you use redirections in shell scripts.) If OUTPUT-FILE does not exist, it is created. For example, here - is how an `awk' program can write a list of BBS names to one file - named `name-list', and a list of phone numbers to another file + is how an `awk' program can write a list of peoples' names to one + file named `name-list', and a list of phone numbers to another file named `phone-list': $ awk '{ print $2 > "phone-list" - > print $1 > "name-list" }' BBS-list + > print $1 > "name-list" }' mail-list $ cat phone-list -| 555-5553 -| 555-3412 ... $ cat name-list - -| aardvark - -| alpo-net + -| Amelia + -| Anthony ... Each output file contains one name or number per line. @@ -6304,12 +6356,12 @@ work identically for `printf': The redirection argument COMMAND is actually an `awk' expression. Its value is converted to a string whose contents give the shell command to be run. For example, the following produces two files, - one unsorted list of BBS names, and one list sorted in reverse + one unsorted list of peoples' names, and one list sorted in reverse alphabetical order: awk '{ print $1 > "names.unsorted" command = "sort -r > names.sorted" - print $1 | command }' BBS-list + print $1 | command }' mail-list The unsorted list is written with an ordinary redirection, while the sorted list is written by piping through the `sort' utility. @@ -7039,16 +7091,16 @@ assignment is performed at a time determined by its position among the input file arguments--after the processing of the preceding input file argument. For example: - awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list + awk '{ print $n }' n=4 inventory-shipped n=2 mail-list prints the value of field number `n' for all input records. Before the first file is read, the command line sets the variable `n' equal to four. This causes the fourth field to be printed in lines from `inventory-shipped'. 
After the first file has finished, but before the second file is started, `n' is set to two, so that the second field is -printed in lines from `BBS-list': +printed in lines from `mail-list': - $ awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list + $ awk '{ print $n }' n=4 inventory-shipped n=2 mail-list -| 15 -| 24 ... @@ -7095,7 +7147,7 @@ controlled by the `awk' built-in variable `CONVFMT' (*note Built-in Variables::). Numbers are converted using the `sprintf()' function with `CONVFMT' as the format specifier (*note String Functions::). - `CONVFMT''s default value is `"%.6g"', which prints a value with at + `CONVFMT''s default value is `"%.6g"', which creates a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, 17 digits is usually enough to capture a floating-point number's value @@ -7143,7 +7195,7 @@ decimal point when reading the `awk' program source code, and for command-line variable assignments (*note Other Arguments::). However, when interpreting input data, for `print' and `printf' output, and for number to string conversion, the local decimal point character is used. -(d.c.). Here are some examples indicating the difference in behavior, +(d.c.) Here are some examples indicating the difference in behavior, on a GNU/Linux system: $ export POSIXLY_CORRECT=1 Force POSIX behavior @@ -7302,25 +7354,24 @@ File: gawk.info, Node: Concatenation, Next: Assignment Ops, Prev: Arithmetic 6.2.2 String Concatenation -------------------------- - It seemed like a good idea at the time. - Brian Kernighan + It seemed like a good idea at the time. -- Brian Kernighan There is only one string operation: concatenation. It does not have a specific operator to represent it. Instead, concatenation is performed by writing expressions next to one another, with no operator. 
For example: - $ awk '{ print "Field number one: " $1 }' BBS-list - -| Field number one: aardvark - -| Field number one: alpo-net + $ awk '{ print "Field number one: " $1 }' mail-list + -| Field number one: Amelia + -| Field number one: Anthony ... Without the space in the string constant after the `:', the line runs together. For example: - $ awk '{ print "Field number one:" $1 }' BBS-list - -| Field number one:aardvark - -| Field number one:alpo-net + $ awk '{ print "Field number one:" $1 }' mail-list + -| Field number one:Amelia + -| Field number one:Anthony ... Because string concatenation does not have an explicit operator, it @@ -7609,8 +7660,7 @@ is a summary of increment and decrement expressions: Operator Evaluation Order Doctor, doctor! It hurts when I do this! - So don't do that! - Groucho Marx + So don't do that! -- Groucho Marx What happens for something like the following? @@ -7691,8 +7741,8 @@ File: gawk.info, Node: Typing and Comparison, Next: Boolean Ops, Prev: Truth 6.3.2 Variable Typing and Comparison Expressions ------------------------------------------------ - The Guide is definitive. Reality is frequently inaccurate. - The Hitchhiker's Guide to the Galaxy + The Guide is definitive. Reality is frequently inaccurate. -- The + Hitchhiker's Guide to the Galaxy Unlike other programming languages, `awk' variables do not have a fixed type. Instead, they can be either a number or a string, depending @@ -7872,7 +7922,6 @@ of error is very difficult to spot when scanning the source code. string comparison (true) `a = 2; b = " +2"' - `a == b' string comparison (false) @@ -7968,9 +8017,9 @@ Boolean operators are: `BOOLEAN1 && BOOLEAN2' True if both BOOLEAN1 and BOOLEAN2 are true. For example, the following statement prints the current input record if it contains - both `2400' and `foo': + both `edu' and `li': - if ($0 ~ /2400/ && $0 ~ /foo/) print + if ($0 ~ /edu/ && $0 ~ /li/) print The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is true. 
This can make a difference when BOOLEAN2 contains expressions that @@ -7981,9 +8030,9 @@ Boolean operators are: `BOOLEAN1 || BOOLEAN2' True if at least one of BOOLEAN1 or BOOLEAN2 is true. For example, the following statement prints all records in the input - that contain _either_ `2400' or `foo' or both: + that contain _either_ `edu' or `li' or both: - if ($0 ~ /2400/ || $0 ~ /foo/) print + if ($0 ~ /edu/ || $0 ~ /li/) print The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is false. This can make a difference when BOOLEAN2 contains expressions that @@ -8403,56 +8452,53 @@ operand is either a constant regular expression enclosed in slashes (`/REGEXP/'), or any expression whose string value is used as a dynamic regular expression (*note Computed Regexps::). The following example prints the second field of each input record whose first field is -precisely `foo': +precisely `li': - $ awk '$1 == "foo" { print $2 }' BBS-list + $ awk '$1 == "li" { print $2 }' mail-list -(There is no output, because there is no BBS site with the exact name -`foo'.) Contrast this with the following regular expression match, -which accepts any record with a first field that contains `foo': +(There is no output, because there is no person with the exact name +`li'.) Contrast this with the following regular expression match, which +accepts any record with a first field that contains `li': - $ awk '$1 ~ /foo/ { print $2 }' BBS-list - -| 555-1234 + $ awk '$1 ~ /li/ { print $2 }' mail-list + -| 555-5553 -| 555-6699 - -| 555-6480 - -| 555-2127 A regexp constant as a pattern is also a special case of an -expression pattern. The expression `/foo/' has the value one if `foo' -appears in the current input record. Thus, as a pattern, `/foo/' -matches any record containing `foo'. +expression pattern. The expression `/li/' has the value one if `li' +appears in the current input record. Thus, as a pattern, `/li/' matches +any record containing `li'.
Boolean expressions are also commonly used as patterns. Whether the pattern matches an input record depends on whether its subexpressions match. For example, the following command prints all the records in -`BBS-list' that contain both `2400' and `foo': - - $ awk '/2400/ && /foo/' BBS-list - -| fooey 555-1234 2400/1200/300 B - - The following command prints all records in `BBS-list' that contain -_either_ `2400' or `foo' (or both, of course): - - $ awk '/2400/ || /foo/' BBS-list - -| alpo-net 555-3412 2400/1200/300 A - -| bites 555-1675 2400/1200/300 A - -| fooey 555-1234 2400/1200/300 B - -| foot 555-6699 1200/300 B - -| macfoo 555-6480 1200/300 A - -| sdace 555-3430 2400/1200/300 A - -| sabafoo 555-2127 1200/300 C - - The following command prints all records in `BBS-list' that do _not_ -contain the string `foo': - - $ awk '! /foo/' BBS-list - -| aardvark 555-5553 1200/300 B - -| alpo-net 555-3412 2400/1200/300 A - -| barfly 555-7685 1200/300 A - -| bites 555-1675 2400/1200/300 A - -| camelot 555-0542 300 C - -| core 555-2912 1200/300 C - -| sdace 555-3430 2400/1200/300 A +`mail-list' that contain both `edu' and `li': + + $ awk '/edu/ && /li/' mail-list + -| Samuel 555-3430 samuel.lanceolis@shu.edu A + + The following command prints all records in `mail-list' that contain +_either_ `edu' or `li' (or both, of course): + + $ awk '/edu/ || /li/' mail-list + -| Amelia 555-5553 amelia.zodiacusque@gmail.com F + -| Broderick 555-0542 broderick.aliquotiens@yahoo.com R + -| Fabius 555-1234 fabius.undevicesimus@ucb.edu F + -| Julie 555-6699 julie.perscrutabor@skeeve.com F + -| Samuel 555-3430 samuel.lanceolis@shu.edu A + -| Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R + + The following command prints all records in `mail-list' that do +_not_ contain the string `li': + + $ awk '! 
/li/' mail-list + -| Anthony 555-3412 anthony.asserturo@hotmail.com A + -| Becky 555-7685 becky.algebrarum@gmail.com A + -| Bill 555-1675 bill.drowning@hotmail.com A + -| Camilla 555-2912 camilla.infusarum@skynet.be R + -| Fabius 555-1234 fabius.undevicesimus@ucb.edu F + -| Martin 555-6480 martin.codicibus@hotmail.com A + -| Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R The subexpressions of a Boolean operator in a pattern can be constant regular expressions, comparisons, or any other `awk' @@ -8529,6 +8575,10 @@ worked around; range patterns do not combine with other patterns: error--> gawk: cmd. line:1: (/1/,/2/) || /Yes/ error--> gawk: cmd. line:1: ^ syntax error + As a minor point of interest, although it is poor style, POSIX +allows you to put a newline after the comma in a range pattern. +(d.c.) + File: gawk.info, Node: BEGIN/END, Next: BEGINFILE/ENDFILE, Prev: Ranges, Up: Pattern Overview @@ -8559,19 +8609,19 @@ read. Likewise, an `END' rule is executed once only, after all the input is read. For example: $ awk ' - > BEGIN { print "Analysis of \"foo\"" } - > /foo/ { ++n } - > END { print "\"foo\" appears", n, "times." }' BBS-list - -| Analysis of "foo" - -| "foo" appears 4 times. - - This program finds the number of records in the input file `BBS-list' -that contain the string `foo'. The `BEGIN' rule prints a title for the -report. There is no need to use the `BEGIN' rule to initialize the -counter `n' to zero, since `awk' does this automatically (*note -Variables::). The second rule increments the variable `n' every time a -record containing the pattern `foo' is read. The `END' rule prints the -value of `n' at the end of the run. + > BEGIN { print "Analysis of \"li\"" } + > /li/ { ++n } + > END { print "\"li\" appears in", n, "records." }' mail-list + -| Analysis of "li" + -| "li" appears in 4 records. + + This program finds the number of records in the input file +`mail-list' that contain the string `li'. The `BEGIN' rule prints a +title for the report. 
There is no need to use the `BEGIN' rule to +initialize the counter `n' to zero, since `awk' does this automatically +(*note Variables::). The second rule increments the variable `n' every +time a record containing the pattern `li' is read. The `END' rule +prints the value of `n' at the end of the run. The special patterns `BEGIN' and `END' cannot be used in ranges or with Boolean operators (indeed, they cannot be used with any operators). @@ -8722,7 +8772,7 @@ File: gawk.info, Node: Empty, Prev: BEGINFILE/ENDFILE, Up: Pattern Overview An empty (i.e., nonexistent) pattern is considered to match _every_ input record. For example, the program: - awk '{ print $1 }' BBS-list + awk '{ print $1 }' mail-list prints the first field of every record. @@ -9086,9 +9136,11 @@ File: gawk.info, Node: Switch Statement, Next: Break Statement, Prev: For Sta 7.4.5 The `switch' Statement ---------------------------- -The `switch' statement allows the evaluation of an expression and the -execution of statements based on a `case' match. Case statements are -checked for a match in the order they are defined. If no suitable +This minor node describes a `gawk'-specific feature. + + The `switch' statement allows the evaluation of an expression and +the execution of statements based on a `case' match. Case statements +are checked for a match in the order they are defined. If no suitable `case' is found, the `default' section is executed, if supplied. Each `case' contains a single constant, be it numeric, string, or @@ -9347,12 +9399,12 @@ listed in `ARGV'. standard. See the Austin Group website (http://austingroupbugs.net/view.php?id=607). - The current version of the Brian Kernighan's `awk' (*note Other -Versions::) also supports `nextfile'. However, it doesn't allow the -`nextfile' statement inside function bodies (*note User-defined::). 
-`gawk' does; a `nextfile' inside a function body reads the next record -and starts processing it with the first rule in the program, just as -any other `nextfile' statement. + The current version of the Brian Kernighan's `awk', and `mawk' +(*note Other Versions::) also support `nextfile'. However, they don't +allow the `nextfile' statement inside function bodies (*note +User-defined::). `gawk' does; a `nextfile' inside a function body +reads the next record and starts processing it with the first rule in +the program, just as any other `nextfile' statement. File: gawk.info, Node: Exit Statement, Prev: Nextfile Statement, Up: Statements @@ -9495,7 +9547,7 @@ specific to `gawk' are marked with a pound sign (`#'). `FS' This is the input field separator (*note Field Separators::). The - value is a single-character string or a multi-character regular + value is a single-character string or a multicharacter regular expression that matches the separations between fields in an input record. If the value is the null string (`""'), then each character in the record becomes a separate field. (This behavior @@ -9597,7 +9649,7 @@ specific to `gawk' are marked with a pound sign (`#'). This is the subscript separator. It has the default value of `"\034"' and is used to separate the parts of the indices of a multidimensional array. Thus, the expression `foo["A", "B"]' - really accesses `foo["A\034B"]' (*note Multi-dimensional::). + really accesses `foo["A\034B"]' (*note Multidimensional::). `TEXTDOMAIN #' This variable is used for internationalization of programs at the @@ -9636,13 +9688,13 @@ with a pound sign (`#'). $ awk 'BEGIN { > for (i = 0; i < ARGC; i++) > print ARGV[i] - > }' inventory-shipped BBS-list + > }' inventory-shipped mail-list -| awk -| inventory-shipped - -| BBS-list + -| mail-list `ARGV[0]' contains `awk', `ARGV[1]' contains `inventory-shipped', - and `ARGV[2]' contains `BBS-list'. The value of `ARGC' is three, + and `ARGV[2]' contains `mail-list'. 
The value of `ARGC' is three, one more than the index of the last element in `ARGV', because the elements are numbered from zero. @@ -9679,9 +9731,18 @@ with a pound sign (`#'). An associative array containing the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For - example, `ENVIRON["HOME"]' might be `/home/arnold'. Changing this - array does not affect the environment passed on to any programs - that `awk' may spawn via redirection or the `system()' function. + example, `ENVIRON["HOME"]' might be `/home/arnold'. + + For POSIX `awk', changing this array does not affect the + environment passed on to any programs that `awk' may spawn via + redirection or the `system()' function. + + However, beginning with version 4.2, if not in POSIX compatibility + mode, `gawk' does update its own environment when `ENVIRON' is + changed, thus changing the environment seen by programs that it + creates. You should therefore be especially careful if you modify + `ENVIRON["PATH"]', which is the search path for finding + executable programs. Some operating systems may not have environment variables. On such systems, the `ENVIRON' array is empty (except for @@ -9823,8 +9884,8 @@ with a pound sign (`#'). The following additional elements in the array are available to provide information about the MPFR and GMP libraries if your - version of `gawk' supports arbitrary precision numbers (*note - Arbitrary Precision Arithmetic::): + version of `gawk' supports arbitrary precision numbers (*note Gawk + and MPFR::): `PROCINFO["mpfr_version"]' The version of the GNU MPFR library. @@ -9903,7 +9964,7 @@ with a pound sign (`#'). the `delete' statement with the `SYMTAB' array.
You may use an index for `SYMTAB' that is not a predefined - identifer: + identifier: SYMTAB["xxx"] = 5 print SYMTAB["xxx"] @@ -9969,13 +10030,13 @@ information contained in `ARGC' and `ARGV': $ awk 'BEGIN { > for (i = 0; i < ARGC; i++) > print ARGV[i] - > }' inventory-shipped BBS-list + > }' inventory-shipped mail-list -| awk -| inventory-shipped - -| BBS-list + -| mail-list In this example, `ARGV[0]' contains `awk', `ARGV[1]' contains -`inventory-shipped', and `ARGV[2]' contains `BBS-list'. Notice that +`inventory-shipped', and `ARGV[2]' contains `mail-list'. Notice that the `awk' program is not entered in `ARGV'. The other command-line options, with their arguments, are also not entered. This includes variable assignments done with the `-v' option (*note Options::). @@ -10088,7 +10149,7 @@ cannot have a variable and an array with the same name in the same * Numeric Array Subscripts:: How to use numbers as subscripts in `awk'. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in +* Multidimensional:: Emulating multidimensional arrays in `awk'. * Arrays of Arrays:: True multidimensional arrays. @@ -10120,8 +10181,7 @@ File: gawk.info, Node: Array Intro, Next: Reference to Elements, Up: Array Ba ---------------------------- Doing linear scans over an associative array is like trying to - club someone to death with a loaded Uzi. - Larry Wall + club someone to death with a loaded Uzi. -- Larry Wall The `awk' language provides one-dimensional arrays for storing groups of related strings or numbers. Every `awk' array must have a @@ -10431,42 +10491,45 @@ available: default `awk' behavior. `"@ind_str_asc"' - Order by indices compared as strings; this is the most basic sort. - (Internally, array indices are always strings, so with `a[2*5] = 1' - the index is `"10"' rather than numeric 10.) + Order by indices in ascending order compared as strings; this is + the most basic sort. 
(Internally, array indices are always + strings, so with `a[2*5] = 1' the index is `"10"' rather than + numeric 10.) `"@ind_num_asc"' - Order by indices but force them to be treated as numbers in the - process. Any index with a non-numeric value will end up - positioned as if it were zero. + Order by indices in ascending order but force them to be treated + as numbers in the process. Any index with a non-numeric value + will end up positioned as if it were zero. `"@val_type_asc"' - Order by element values rather than indices. Ordering is by the - type assigned to the element (*note Typing and Comparison::). All - numeric values come before all string values, which in turn come - before all subarrays. (Subarrays have not been described yet; - *note Arrays of Arrays::). + Order by element values in ascending order (rather than by + indices). Ordering is by the type assigned to the element (*note + Typing and Comparison::). All numeric values come before all + string values, which in turn come before all subarrays. + (Subarrays have not been described yet; *note Arrays of Arrays::.) `"@val_str_asc"' - Order by element values rather than by indices. Scalar values are - compared as strings. Subarrays, if present, come out last. + Order by element values in ascending order (rather than by + indices). Scalar values are compared as strings. Subarrays, if + present, come out last. `"@val_num_asc"' - Order by element values rather than by indices. Scalar values are - compared as numbers. Subarrays, if present, come out last. When - numeric values are equal, the string values are used to provide an - ordering: this guarantees consistent results across different - versions of the C `qsort()' function,(1) which `gawk' uses - internally to perform the sorting. + Order by element values in ascending order (rather than by + indices). Scalar values are compared as numbers. Subarrays, if + present, come out last. 
When numeric values are equal, the string + values are used to provide an ordering: this guarantees consistent + results across different versions of the C `qsort()' function,(1) + which `gawk' uses internally to perform the sorting. `"@ind_str_desc"' - Reverse order from the most basic sort. + String indices ordered from high to low. `"@ind_num_desc"' Numeric indices ordered from high to low. `"@val_type_desc"' - Element values, based on type, in descending order. + Element values, based on type, ordered from high to low. + Subarrays, if present, come out first. `"@val_str_desc"' Element values, treated as strings, ordered from high to low. @@ -10669,7 +10732,7 @@ knowledge of the actual rules since they can sometimes have a subtle effect on your programs. -File: gawk.info, Node: Uninitialized Subscripts, Next: Multi-dimensional, Prev: Numeric Array Subscripts, Up: Arrays +File: gawk.info, Node: Uninitialized Subscripts, Next: Multidimensional, Prev: Numeric Array Subscripts, Up: Arrays 8.4 Using Uninitialized Variables as Subscripts =============================================== @@ -10717,14 +10780,14 @@ string as a subscript if `--lint' is provided on the command line (*note Options::). -File: gawk.info, Node: Multi-dimensional, Next: Arrays of Arrays, Prev: Uninitialized Subscripts, Up: Arrays +File: gawk.info, Node: Multidimensional, Next: Arrays of Arrays, Prev: Uninitialized Subscripts, Up: Arrays 8.5 Multidimensional Arrays =========================== * Menu: -* Multi-scanning:: Scanning multidimensional arrays. +* Multiscanning:: Scanning multidimensional arrays. A multidimensional array is an array in which an element is identified by a sequence of indices instead of a single index. 
For @@ -10803,7 +10866,7 @@ the program produces the following output: 3 2 1 6 -File: gawk.info, Node: Multi-scanning, Up: Multi-dimensional +File: gawk.info, Node: Multiscanning, Up: Multidimensional 8.5.1 Scanning Multidimensional Arrays -------------------------------------- @@ -10843,7 +10906,7 @@ The result is to set `separate[1]' to `"1"' and `separate[2]' to recovered. -File: gawk.info, Node: Arrays of Arrays, Prev: Multi-dimensional, Up: Arrays +File: gawk.info, Node: Arrays of Arrays, Prev: Multidimensional, Up: Arrays 8.6 Arrays of Arrays ==================== @@ -11179,13 +11242,15 @@ File: gawk.info, Node: String Functions, Next: I/O Functions, Prev: Numeric F ----------------------------------- The functions in this minor node look at or change the text of one or -more strings. `gawk' understands locales (*note Locales::), and does -all string processing in terms of _characters_, not _bytes_. This -distinction is particularly important to understand for locales where -one character may be represented by multiple bytes. Thus, for example, -`length()' returns the number of characters in a string, and not the -number of bytes used to represent those characters, Similarly, -`index()' works with character indices, and not byte indices. +more strings. + + `gawk' understands locales (*note Locales::), and does all string +processing in terms of _characters_, not _bytes_. This distinction is +particularly important to understand for locales where one character +may be represented by multiple bytes. Thus, for example, `length()' +returns the number of characters in a string, and not the number of +bytes used to represent those characters. Similarly, `index()' works +with character indices, and not byte indices. In the following list, optional parameters are enclosed in square brackets ([ ]). Several functions perform string substitution; the @@ -11201,26 +11266,27 @@ pound sign (`#'): `gensub()'. 
`asort(SOURCE [, DEST [, HOW ] ]) #' - Return the number of elements in the array SOURCE. `gawk' sorts - the contents of SOURCE and replaces the indices of the sorted - values of SOURCE with sequential integers starting with one. If - the optional array DEST is specified, then SOURCE is duplicated - into DEST. DEST is then sorted, leaving the indices of SOURCE - unchanged. The optional third argument HOW is a string which - controls the rule for comparing values, and the sort direction. A - single space is required between the comparison mode, `string' or - `number', and the direction specification, `ascending' or - `descending'. You can omit direction and/or mode in which case it - will default to `ascending' and `string', respectively. An empty - string "" is the same as the default `"ascending string"' for the - value of HOW. If the `source' array contains subarrays as values, - they will come out last(first) in the `dest' array for - `ascending'(`descending') order specification. The value of - `IGNORECASE' affects the sorting. The third argument can also be - a user-defined function name in which case the value returned by - the function is used to order the array elements before - constructing the result array. *Note Array Sorting Functions::, - for more information. +`asorti(SOURCE [, DEST [, HOW ] ]) #' + These two functions are similar in behavior, so they are described + together. + + NOTE: The following description ignores the third argument, + HOW, since it requires understanding features that we have + not discussed yet. Thus, the discussion here is a deliberate + simplification. (We do provide all the details later on: + *Note Array Sorting Functions::, for the full story.) + + Both functions return the number of elements in the array SOURCE. + For `asort()', `gawk' sorts the values of SOURCE and replaces the + indices of the sorted values of SOURCE with sequential integers + starting with one. 
If the optional array DEST is specified, then + SOURCE is duplicated into DEST. DEST is then sorted, leaving the + indices of SOURCE unchanged. + + When comparing strings, `IGNORECASE' affects the sorting (*note + Array Sorting Functions::). If the SOURCE array contains + subarrays as values (*note Arrays of Arrays::), they will come + last, after all scalar values. For example, if the contents of `a' are as follows: @@ -11238,23 +11304,16 @@ pound sign (`#'): a[2] = "de" a[3] = "sac" - In order to reverse the direction of the sorted results in the - above example, `asort()' can be called with three arguments as - follows: + The `asorti()' function works similarly to `asort()', however, the + _indices_ are sorted, instead of the values. Thus, in the previous + example, starting with the same initial set of indices and values + in `a', calling `asorti(a)' would yield: - asort(a, a, "descending") + a[1] = "first" + a[2] = "last" + a[3] = "middle" - The `asort()' function is described in more detail in *note Array - Sorting Functions::. `asort()' is a `gawk' extension; it is not - available in compatibility mode (*note Options::). - -`asorti(SOURCE [, DEST [, HOW ] ]) #' - Return the number of elements in the array SOURCE. It works - similarly to `asort()', however, the _indices_ are sorted, instead - of the values. (Here too, `IGNORECASE' affects the sorting.) - - The `asorti()' function is described in more detail in *note Array - Sorting Functions::. `asorti()' is a `gawk' extension; it is not + `asort()' and `asorti()' are `gawk' extensions; they are not available in compatibility mode (*note Options::). `gensub(REGEXP, REPLACEMENT, HOW [, TARGET]) #' @@ -11266,7 +11325,7 @@ pound sign (`#'): `$0'. It returns the modified string as the result of the function and the original target string is _not_ changed. - `gensub()' is a general substitution function. It's purpose is to + `gensub()' is a general substitution function. 
Its purpose is to provide more features than the standard `sub()' and `gsub()' functions. @@ -11942,10 +12001,10 @@ parameters are enclosed in square brackets ([ ]): function--`gawk' also buffers its output and the `fflush()' function forces `gawk' to flush its buffers. - `fflush()' was added to Brian Kernighan's version of `awk' in 1994. - For over two decades, it was not part of the POSIX standard. As - of December, 2012, it was accepted for inclusion into the POSIX - standard. See the Austin Group website + `fflush()' was added to Brian Kernighan's version of `awk' in + April of 1992. For two decades, it was not part of the POSIX + standard. As of December, 2012, it was accepted for inclusion + into the POSIX standard. See the Austin Group website (http://austingroupbugs.net/view.php?id=634). POSIX standardizes `fflush()' as follows: If there is no argument, @@ -12148,7 +12207,8 @@ enclosed in square brackets ([ ]): Variables::). The default string value is `"%a %b %e %H:%M:%S %Z %Y"'. This format string produces output that is equivalent to that of the `date' utility. You can assign - a new value to `PROCINFO["strftime"]' to change the default format. + a new value to `PROCINFO["strftime"]' to change the default + format; see below for the various format directives. `systime()' Return the current time as the number of seconds since the system @@ -12417,7 +12477,7 @@ File: gawk.info, Node: Bitwise Functions, Next: Type Functions, Prev: Time Fu 9.1.6 Bit-Manipulation Functions -------------------------------- - I can explain it for you, but I can't understand it for you. + I can explain it for you, but I can't understand it for you. -- Anonymous Many languages provide the ability to perform "bitwise" operations @@ -12561,7 +12621,7 @@ of Arrays::). traversing a multidimensional array: you can test if an element is itself an array or not. 
The second is inside the body of a user-defined function (not discussed yet; *note User-defined::), to -test if a paramater is an array or not. +test if a parameter is an array or not. Note, however, that using `isarray()' at the global level to test variables makes no sense. Since you are the one writing the program, you @@ -12662,7 +12722,7 @@ a parameter with the same name as the function itself. In addition, according to the POSIX standard, function parameters cannot have the same name as one of the special built-in variables (*note Built-in Variables::. Not all versions of `awk' enforce this -restriction. +restriction.) The BODY-OF-FUNCTION consists of `awk' statements. It is the most important part of the definition, because it says what the function @@ -12705,8 +12765,8 @@ function. When this happens, we say the function is "recursive". The act of a function calling itself is called "recursion". All the built-in functions return a value to their caller. -User-defined functions can do also, using the `return' statement, which -is described in detail in *note Return Statement::. Many of the +User-defined functions can do so also, using the `return' statement, +which is described in detail in *note Return Statement::. Many of the subsequent examples in this minor node use the `return' statement. In many `awk' implementations, including `gawk', the keyword @@ -12776,7 +12836,8 @@ elements in an array and start over with a new list of elements (*note Delete::). Instead of having to repeat this loop everywhere that you need to clear out an array, your program can just call `delarray'. (This guarantees portability. The use of `delete ARRAY' to delete the -contents of an entire array is a nonstandard extension.) +contents of an entire array is a recent(1) addition to the POSIX +standard.) The following is an example of a recursive function. It takes a string as an input parameter and returns the string in backwards order. 
@@ -12816,13 +12877,19 @@ an `awk' version of `ctime()': return strftime(format, ts) } + ---------- Footnotes ---------- + + (1) Late in 2012. + File: gawk.info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined 9.2.3 Calling User-Defined Functions ------------------------------------ -This section describes how to call a user-defined function. +"Calling a function" means causing the function to run and do its job. +A function call is an expression and its value is the value returned by +the function. * Menu: @@ -12836,16 +12903,12 @@ File: gawk.info, Node: Calling A Function, Next: Variable Scope, Up: Function 9.2.3.1 Writing A Function Call ............................... -"Calling a function" means causing the function to run and do its job. -A function call is an expression and its value is the value returned by -the function. - - A function call consists of the function name followed by the -arguments in parentheses. `awk' expressions are what you write in the -call for the arguments. Each time the call is executed, these -expressions are evaluated, and the values become the actual arguments. -For example, here is a call to `foo()' with three arguments (the first -being a string concatenation): +A function call consists of the function name followed by the arguments +in parentheses. `awk' expressions are what you write in the call for +the arguments. Each time the call is executed, these expressions are +evaluated, and the values become the actual arguments. For example, +here is a call to `foo()' with three arguments (the first being a +string concatenation): foo(x y, "lose", 4 * z) @@ -13240,7 +13303,7 @@ and then a closing right parenthesis, with the addition of a leading `@' character: the_func = "sum" - result = @the_func() # calls the `sum' function + result = @the_func() # calls the sum() function Here is a full program that processes the previously shown data, using indirect function calls. 
@@ -13391,8 +13454,8 @@ order. Next comes a sorting function. It is parameterized with the starting and ending field numbers and the comparison function. It -builds an array with the data and calls `quicksort' appropriately, and -then formats the results as a single string: +builds an array with the data and calls `quicksort()' appropriately, +and then formats the results as a single string: # do_sort --- sort the data according to `compare' # and return it as a string @@ -13487,7 +13550,7 @@ algorithms and program tasks in a single place. It simplifies programming, making program development more manageable, and making programs more readable. - In their seminal 1976 book, `Software Tools'(1), Brian Kernighan and + In their seminal 1976 book, `Software Tools',(1) Brian Kernighan and P.J. Plauger wrote: Good Programming is not learned from generalities, but by seeing @@ -13592,7 +13655,7 @@ will be accidentally shared with the user's program. In addition, several of the library functions use a prefix that helps indicate what function or set of functions use the variables--for -example, `_pw_byname' in the user database routines (*note Passwd +example, `_pw_byname()' in the user database routines (*note Passwd Functions::). This convention is recommended, since it even further decreases the chance of inadvertent conflict among variable names. Note that this convention is used equally well for variable names and @@ -13663,6 +13726,7 @@ programming use. vice versa. * Join Function:: A function to join an array into a string. * Getlocaltime Function:: A function to get formatted times. +* Readfile Function:: A function to read an entire file at once. File: gawk.info, Node: Strtonum Function, Next: Assert Function, Up: General Functions @@ -13842,9 +13906,9 @@ File: gawk.info, Node: Round Function, Next: Cliff Random Function, Prev: Ass The way `printf' and `sprintf()' (*note Printf::) perform rounding often depends upon the system's C `sprintf()' subroutine. 
On many -machines, `sprintf()' rounding is "unbiased," which means it doesn't -always round a trailing `.5' up, contrary to naive expectations. In -unbiased rounding, `.5' rounds to even, rather than always up, so 1.5 +machines, `sprintf()' rounding is "unbiased", which means it doesn't +always round a trailing .5 up, contrary to naive expectations. In +unbiased rounding, .5 rounds to even, rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4. This means that if you are using a format that does rounding (e.g., `"%.0f"'), you should check what your system does. The following function does traditional rounding; it @@ -13878,7 +13942,7 @@ might be useful if your `awk''s `printf' does unbiased rounding: } # test harness - { print $0, round($0) } + # { print $0, round($0) } File: gawk.info, Node: Cliff Random Function, Next: Ordinal Functions, Prev: Round Function, Up: General Functions @@ -13954,8 +14018,8 @@ corresponding character. Both functions are written very nicely in } } - Some explanation of the numbers used by `chr' is worthwhile. The -most prominent character set in use today is ASCII.(1) Although an + Some explanation of the numbers used by `_ord_init()' is worthwhile. +The most prominent character set in use today is ASCII.(1) Although an 8-bit byte can hold 256 distinct values (from 0 to 255), ASCII only defines characters that use the values from 0 to 127.(2) In the now distant past, at least one minicomputer manufacturer used ASCII, but @@ -14005,7 +14069,7 @@ tests such as used here prohibitively expensive. (2) ASCII has been extended in many countries to use the values from 128 to 255 for country-specific characters. If your system uses these -extensions, you can simplify `_ord_init' to loop from 0 to 255. +extensions, you can simplify `_ord_init()' to loop from 0 to 255. File: gawk.info, Node: Join Function, Next: Getlocaltime Function, Prev: Ordinal Functions, Up: General Functions @@ -14055,7 +14119,7 @@ concatenation. 
The lack of an explicit operator for concatenation makes string operations more difficult than they really need to be. -File: gawk.info, Node: Getlocaltime Function, Prev: Join Function, Up: General Functions +File: gawk.info, Node: Getlocaltime Function, Next: Readfile Function, Prev: Join Function, Up: General Functions 10.2.7 Managing the Time of Day ------------------------------- @@ -14137,6 +14201,66 @@ the `getlocaltime()' function would have allowed the user to supply an optional timestamp value to use instead of the current time. +File: gawk.info, Node: Readfile Function, Prev: Getlocaltime Function, Up: General Functions + +10.2.8 Reading A Whole File At Once +----------------------------------- + +Often, it is convenient to have the entire contents of a file available +in memory as a single string. A straightforward but naive way to do +that might be as follows: + + function readfile(file, tmp, contents) + { + if ((getline tmp < file) < 0) + return + + contents = tmp + while ((getline tmp < file) > 0) + contents = contents RT tmp + + close(file) + return contents + } + + This function reads from `file' one record at a time, building up +the full contents of the file in the local variable `contents'. It +works, but is not necessarily efficient. + + The following function, based on a suggestion by Denis Shirokov, +reads the entire contents of the named file in one shot: + + # readfile.awk --- read an entire file at once + + function readfile(file, tmp, save_rs) + { + save_rs = RS + RS = "^$" + getline tmp < file + close(file) + RS = save_rs + + return tmp + } + + It works by setting `RS' to `^$', a regular expression that will +never match if the file has contents. `gawk' reads data from the file +into `tmp' attempting to match `RS'. The match fails after each read, +but fails quickly, such that `gawk' fills `tmp' with the entire +contents of the file. (*Note Records::, for information on `RT' and +`RS'.)
+ + In the case that `file' is empty, the return value is the null +string. Thus calling code may use something like: + + contents = readfile("/some/path") + if (length(contents) == 0) + # file was empty ... + + This tests the result to see if it is empty or not. An equivalent +test would be `contents == ""'. + + File: gawk.info, Node: Data File Management, Next: Getopt Function, Prev: General Functions, Up: Library Functions 10.3 Data File Management @@ -14386,7 +14510,7 @@ File: gawk.info, Node: Ignoring Assigns, Prev: Empty Files, Up: Data File Man Occasionally, you might not want `awk' to process command-line variable assignments (*note Assignment Options::). In particular, if you have a -file name that contain an `=' character, `awk' treats the file name as +file name that contains an `=' character, `awk' treats the file name as an assignment, and does not process it. Some users have suggested an additional command-line option for @@ -14543,7 +14667,7 @@ characters (*note String Functions::).(1) # <c> a character representing the current option # Private Data: - # _opti -- index in multi-flag option, e.g., -abc + # _opti -- index in multiflag option, e.g., -abc The function starts out with comments presenting a list of the global variables it uses, what the return values are, what they mean, @@ -14882,7 +15006,7 @@ later. The test can only be true for `gawk'. It is false if using `FS' or `FPAT', or on some other `awk' implementation. The code that checks for using `FPAT', using `using_fpat' and -`PROCINFO["FS"]' is similar. +`PROCINFO["FS"]', is similar. The main part of the function uses a loop to read database lines, split the line into fields, and then store the line into each array as @@ -14902,9 +15026,9 @@ create the element with the null string as its value: return _pw_byname[name] } - Similarly, the `getpwuid' function takes a user ID number argument. -If that user number is in the database, it returns the appropriate -line. 
Otherwise, it returns the null string: + Similarly, the `getpwuid()' function takes a user ID number +argument. If that user number is in the database, it returns the +appropriate line. Otherwise, it returns the null string: function getpwuid(uid) { @@ -15251,8 +15375,8 @@ index and value, use the indirect function call syntax (*note Indirect Calls::) on `process', passing it the index and the value. When calling `walk_array()', you would pass the name of a -user-defined function that expects to receive and index and a value, -and then processes the element. +user-defined function that expects to receive an index and a value, and +then processes the element. File: gawk.info, Node: Sample Programs, Next: Advanced Features, Prev: Library Functions, Up: Top @@ -15513,7 +15637,7 @@ fields to print are `$1', `$3', and `$5'. The intermediate fields are the fields to print, and `t' tracks the complete field list, including filler fields: - function set_charlist( field, i, j, f, g, t, + function set_charlist( field, i, j, f, g, n, m, t, filler, last, len) { field = 1 # count total fields @@ -15917,9 +16041,9 @@ groups in the `PROCINFO' array have the indices `"group1"' through However, we don't know in advance how many of these groups there are. This loop works by starting at one, concatenating the value with -`"group"', and then using `in' to see if that value is in the array. -Eventually, `i' is incremented past the last group in the array and the -loop exits. +`"group"', and then using `in' to see if that value is in the array +(*note Reference to Elements::). Eventually, `i' is incremented past +the last group in the array and the loop exits. 
The loop is also correct if there are _no_ supplementary groups; then the condition is false the first time it's tested, and the loop @@ -16566,8 +16690,10 @@ File: gawk.info, Node: Alarm Program, Next: Translate Program, Prev: Dupword 11.3.2 An Alarm Clock Program ----------------------------- - Nothing cures insomnia like a ringing alarm clock. - Arnold Robbins + Nothing cures insomnia like a ringing alarm clock. -- Arnold + Robbins + + Sleep is for web developers. -- Erik Quanstrom The following program is a simple "alarm clock" program. You give it a time of day and an optional message. At the specified time, it @@ -16811,10 +16937,10 @@ program. ---------- Footnotes ---------- - (1) On some older systems, `tr' may require that the lists be -written as range expressions enclosed in square brackets (`[a-z]') and -quoted, to prevent the shell from attempting a file name expansion. -This is not a feature. + (1) On some older systems, including Solaris, `tr' may require that +the lists be written as range expressions enclosed in square brackets +(`[a-z]') and quoted, to prevent the shell from attempting a file name +expansion. This is not a feature. (2) This program was written before `gawk' acquired the ability to split each character in a string into separate array elements. @@ -17124,11 +17250,11 @@ are simply removed. `extract.awk' uses the `join()' library function (*note Join Function::). The example programs in the online Texinfo source for `GAWK: -Effective AWK Programming' (`gawk.texi') have all been bracketed inside -`file' and `endfile' lines. The `gawk' distribution uses a copy of -`extract.awk' to extract the sample programs and install many of them -in a standard directory where `gawk' can find them. The Texinfo file -looks something like this: +Effective AWK Programming' (`gawktexi.in') have all been bracketed +inside `file' and `endfile' lines. 
The `gawk' distribution uses a copy +of `extract.awk' to extract the sample programs and install many of +them in a standard directory where `gawk' can find them. The Texinfo +file looks something like this: ... This program has a @code{BEGIN} rule, @@ -17880,8 +18006,8 @@ File: gawk.info, Node: Advanced Features, Next: Internationalization, Prev: S ****************************** Write documentation as if whoever reads it is a violent psychopath - who knows where you live. - Steve English, as quoted by Peter Langston + who knows where you live. -- Steve English, as quoted by Peter + Langston This major node discusses advanced features in `gawk'. It's a bit of a "grab bag" of items that are otherwise unrelated to each other. @@ -18157,7 +18283,7 @@ seemingly ordered data: function cmp_randomize(i1, v1, i2, v2) { - # random order + # random order (caution: this may never terminate!) return (2 - 4 * rand()) } @@ -18171,7 +18297,7 @@ elements with otherwise equal values is to include the indices in the comparison rules. Note that doing this may make the loop traversal less efficient, so consider it only if necessary. The following comparison functions force a deterministic order, and are based on the -fact that the indices of two elements are never equal: +fact that the (string) indices of two elements are never equal: function cmp_numeric(i1, v1, i2, v2) { @@ -18230,9 +18356,9 @@ functions (*note String Functions::) for sorting arrays. For example: After the call to `asort()', the array `data' is indexed from 1 to some number N, the total number of elements in `data'. (This count is `asort()''s return value.) `data[1]' <= `data[2]' <= `data[3]', and so -on. The comparison is based on the type of the elements (*note Typing -and Comparison::). All numeric values come before all string values, -which in turn come before all subarrays. +on. The default comparison is based on the type of the elements (*note +Typing and Comparison::). 
All numeric values come before all string +values, which in turn come before all subarrays. An important side effect of calling `asort()' is that _the array's original indices are irrevocably lost_. As this isn't always @@ -18247,21 +18373,11 @@ desirable, `asort()' accepts a second argument: and then sorts `dest', destroying its indices. However, the `source' array is not affected. - `asort()' accepts a third string argument to control comparison of -array elements. As with `PROCINFO["sorted_in"]', this argument may be -one of the predefined names that `gawk' provides (*note Controlling -Scanning::), or the name of a user-defined function (*note Controlling -Array Traversal::). - - NOTE: In all cases, the sorted element values consist of the - original array's element values. The ability to control - comparison merely affects the way in which they are sorted. - Often, what's needed is to sort on the values of the _indices_ instead of the values of the elements. To do that, use the `asorti()' -function. The interface is identical to that of `asort()', except that -the index values are used for sorting, and become the values of the -result array: +function. The interface and behavior are identical to that of +`asort()', except that the index values are used for sorting, and +become the values of the result array: { source[$0] = some_func($0) } @@ -18276,32 +18392,41 @@ result array: } } - Similar to `asort()', in all cases, the sorted element values -consist of the original array's indices. The ability to control -comparison merely affects the way in which they are sorted. - - Sorting the array by replacing the indices provides maximal -flexibility. To traverse the elements in decreasing order, use a loop -that goes from N down to 1, either over the elements or over the -indices.(1) - - Copying array indices and elements isn't expensive in terms of -memory. Internally, `gawk' maintains "reference counts" to data. 
For -example, when `asort()' copies the first array to the second one, there -is only one copy of the original array elements' data, even though both -arrays use the values. + So far, so good. Now it starts to get interesting. Both `asort()' +and `asorti()' accept a third string argument to control comparison of +array elements. In *note String Functions::, we ignored this third +argument; however, the time has now come to describe how this argument +affects these two functions. + + Basically, the third argument specifies how the array is to be +sorted. There are two possibilities. As with `PROCINFO["sorted_in"]', +this argument may be one of the predefined names that `gawk' provides +(*note Controlling Scanning::), or it may be the name of a user-defined +function (*note Controlling Array Traversal::). + + In the latter case, _the function can compare elements in any way it +chooses_, taking into account just the indices, just the values, or +both. This is extremely powerful. + + Once the array is sorted, `asort()' takes the _values_ in their +final order, and uses them to fill in the result array, whereas +`asorti()' takes the _indices_ in their final order, and uses them to +fill in the result array. + + NOTE: Copying array indices and elements isn't expensive in terms + of memory. Internally, `gawk' maintains "reference counts" to + data. For example, when `asort()' copies the first array to the + second one, there is only one copy of the original array elements' + data, even though both arrays use the values. Because `IGNORECASE' affects string comparisons, the value of `IGNORECASE' also affects sorting for both `asort()' and `asorti()'. Note also that the locale's sorting order does _not_ come into play; -comparisons are based on character values only.(2) Caveat Emptor. +comparisons are based on character values only.(1) Caveat Emptor. ---------- Footnotes ---------- - (1) You may also use one of the predefined sorting names that sorts -in decreasing order. 
- - (2) This is true because locale-based comparison occurs only when in + (1) This is true because locale-based comparison occurs only when in POSIX compatibility mode, and since `asort()' and `asorti()' are `gawk' extensions, they are not available in that case. @@ -18438,7 +18563,8 @@ regular pipes. ---------- Footnotes ---------- - (1) This is very different from the same operator in the C shell. + (1) This is very different from the same operator in the C shell and +in Bash. File: gawk.info, Node: TCP/IP Networking, Next: Profiling, Prev: Two-way I/O, Up: Advanced Features @@ -18576,56 +18702,64 @@ First, the `awk' program: junk Here is the `awkprof.out' that results from running the `gawk' -profiler on this program and data (this example also illustrates that -`awk' programmers sometimes have to work late): +profiler on this program and data. (This example also illustrates that +`awk' programmers sometimes get up very early in the morning to work.) - # gawk profile, created Sun Aug 13 00:00:15 2000 + # gawk profile, created Thu Feb 27 05:16:21 2014 - # BEGIN block(s) + # BEGIN block(s) - BEGIN { - 1 print "First BEGIN rule" - 1 print "Second BEGIN rule" - } + BEGIN { + 1 print "First BEGIN rule" + } - # Rule(s) + BEGIN { + 1 print "Second BEGIN rule" + } - 5 /foo/ { # 2 - 2 print "matched /foo/, gosh" - 6 for (i = 1; i <= 3; i++) { - 6 sing() - } - } + # Rule(s) - 5 { - 5 if (/foo/) { # 2 - 2 print "if is true" - 3 } else { - 3 print "else is true" - } - } + 5 /foo/ { # 2 + 2 print "matched /foo/, gosh" + 6 for (i = 1; i <= 3; i++) { + 6 sing() + } + } - # END block(s) + 5 { + 5 if (/foo/) { # 2 + 2 print "if is true" + 3 } else { + 3 print "else is true" + } + } - END { - 1 print "First END rule" - 1 print "Second END rule" - } + # END block(s) - # Functions, listed alphabetically + END { + 1 print "First END rule" + } + + END { + 1 print "Second END rule" + } - 6 function sing(dummy) - { - 6 print "I gotta be me!" 
- } + + # Functions, listed alphabetically + + 6 function sing(dummy) + { + 6 print "I gotta be me!" + } This example illustrates many of the basic features of profiling output. They are as follows: - * The program is printed in the order `BEGIN' rule, `BEGINFILE' rule, - pattern/action rules, `ENDFILE' rule, `END' rule and functions, - listed alphabetically. Multiple `BEGIN' and `END' rules are - merged together, as are multiple `BEGINFILE' and `ENDFILE' rules. + * The program is printed in the order `BEGIN' rules, `BEGINFILE' + rules, pattern/action rules, `ENDFILE' rules, `END' rules and + functions, listed alphabetically. Multiple `BEGIN' and `END' + rules retain their separate identities, as do multiple `BEGINFILE' + and `ENDFILE' rules. * Pattern-action rules have two counts. The first count, to the left of the rule, shows how many times the rule's pattern was @@ -18676,8 +18810,7 @@ you typed when you wrote it. This is because `gawk' creates the profiled version by "pretty printing" its internal representation of the program. The advantage to this is that `gawk' can produce a standard representation. The disadvantage is that all source-code -comments are lost, as are the distinctions among multiple `BEGIN', -`END', `BEGINFILE', and `ENDFILE' rules. Also, things such as: +comments are lost. Also, things such as: /foo/ @@ -18736,6 +18869,9 @@ by the `Ctrl-<\>' key. called this way, `gawk' "pretty prints" the program into `awkprof.out', without any execution counts. + NOTE: The `--pretty-print' option still runs your program. This + will change in the next major release. + File: gawk.info, Node: Internationalization, Next: Debugger, Prev: Advanced Features, Up: Top @@ -19029,9 +19165,9 @@ File: gawk.info, Node: Translator i18n, Next: I18N Example, Prev: Programmer =============================== Once a program's translatable strings have been marked, they must be -extracted to create the initial `.po' file. 
As part of translation, it -is often helpful to rearrange the order in which arguments to `printf' -are output. +extracted to create the initial `.pot' file. As part of translation, +it is often helpful to rearrange the order in which arguments to +`printf' are output. `gawk''s `--gen-pot' command-line option extracts the messages and is discussed next. After that, `printf''s ability to rearrange the @@ -19104,7 +19240,7 @@ second: $ gawk 'BEGIN { > string = "Dont Panic" - > printf _"%2$d characters live in \"%1$s\"\n", + > printf "%2$d characters live in \"%1$s\"\n", > string, length(string) > }' -| 10 characters live in "Dont Panic" @@ -19129,7 +19265,7 @@ precision capability: `gawk' does not allow you to mix regular format specifiers and those with positional specifiers in the same string: - $ gawk 'BEGIN { printf _"%d %3$s\n", 1, 2, "hi" }' + $ gawk 'BEGIN { printf "%d %3$s\n", 1, 2, "hi" }' error--> gawk: cmd. line:1: fatal: must use `count$' on all formats or none NOTE: There are some pathological cases that `gawk' may fail to @@ -19491,7 +19627,7 @@ File: gawk.info, Node: Debugger Invocation, Next: Finding The Bug, Up: Sample 14.2.1 How to Start the Debugger -------------------------------- -Starting the debugger is almost exactly like running `awk', except you +Starting the debugger is almost exactly like running `gawk', except you have to pass an additional option `--debug' or the corresponding short option `-D'. The file(s) containing the program and any supporting code are given on the command line as arguments to one or more `-f' @@ -19604,8 +19740,8 @@ our test input above. Let's look at `NR': -| NR = number (2) So we can see that `are_equal()' was only called for the second record -of the file. Of course, this is because our program contained a rule -for `NR == 1': +of the file. 
Of course, this is because our program contains a rule for +`NR == 1': NR == 1 { last = $0 @@ -20378,8 +20514,7 @@ File: gawk.info, Node: Arbitrary Precision Arithmetic, Next: Dynamic Extension authority; they tend to believe that all digits of a printed answer are significant. Disillusioned computer users have just the opposite approach; they are constantly afraid that their answers - are almost meaningless. - Donald Knuth(1) + are almost meaningless.(1) -- Donald Knuth This major node discusses issues that you may encounter when performing arithmetic. It begins by discussing some of the general @@ -20508,7 +20643,7 @@ automatic conversion (via `CONVFMT') and from printing (via `OFMT'). what the default string representations show. `CONVFMT''s default value is `"%.6g"', which yields a value with at -least six significant digits. For some applications, you might want to +most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, most of the time, 17 digits is enough to capture a floating-point number's value exactly.(1) @@ -21047,11 +21182,15 @@ need it. arbitrary precision arithmetic. The easiest way to find out is to look at the output of the following command: - $ gawk --version - -| GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) - -| Copyright (C) 1989, 1991-2013 Free Software Foundation. + $ ./gawk --version + -| GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) + -| Copyright (C) 1989, 1991-2014 Free Software Foundation. ... +(You may see different version numbers than what's shown here. That's +OK; what's important is to see that GNU MPFR and GNU MP are listed in +the output.) + `gawk' uses the GNU MPFR (http://www.mpfr.org) and GNU MP (http://gmplib.org) (GMP) libraries for arbitrary precision arithmetic on numbers. 
So if you do not see the names of these libraries in the @@ -21260,9 +21399,7 @@ File: gawk.info, Node: Changing Precision, Next: Exact Arithmetic, Prev: Floa them to full membership of the high-precision club, or do we treat them and all their associates as second-class citizens? Sometimes the first course is proper, sometimes the second, and it takes - careful analysis to tell which. - - Dirk Laurie(1) + careful analysis to tell which.(1) -- Dirk Laurie `gawk' does not implicitly modify the precision of any previously computed results when the working precision is changed with an @@ -21423,7 +21560,7 @@ floating-point value to begin with: gawk -M 'BEGIN { n = 13.0; print n % 2.0 }' - Note that for the particular example above, there is likely best to + Note that for the particular example above, it is likely best to just use the following: gawk -M 'BEGIN { n = 13; print n % 2 }' @@ -21627,6 +21764,7 @@ This (rather large) minor node describes the API in detail. * Extension API Functions Introduction:: Introduction to the API functions. * General Data Types:: The data types. * Requesting Values:: How to get a value. +* Memory Allocation Functions:: Functions for allocating memory. * Constructor Functions:: Functions for creating values. * Registration Functions:: Functions to register things with `gawk'. @@ -21675,6 +21813,8 @@ operations: * Symbol table access: retrieving a global variable, creating one, or changing one. + * Allocating, reallocating, and releasing memory. + * Creating and releasing cached values; this provides an efficient way to use values for multiple variables and can be a big performance win. 
@@ -21703,10 +21843,8 @@ operations: `EOF' `<stdio.h>' `FILE' `<stdio.h>' `NULL' `<stddef.h>' - `malloc()' `<stdlib.h>' `memcpy()' `<string.h>' `memset()' `<string.h>' - `realloc()' `<stdlib.h>' `size_t' `<sys/types.h>' `struct stat' `<sys/stat.h>' @@ -21732,7 +21870,9 @@ operations: * All pointers filled in by `gawk' are to memory managed by `gawk' and should be treated by the extension as read-only. Memory for _all_ strings passed into `gawk' from the extension _must_ come - from `malloc()' and is managed by `gawk' from then on. + from calling the API-provided function pointers `api_malloc()', + `api_calloc()' or `api_realloc()', and is managed by `gawk' from + then on. * The API defines several simple `struct's that map values as seen from `awk'. A value can be a `double', a string, or an array (as @@ -21770,12 +21910,11 @@ File: gawk.info, Node: General Data Types, Next: Requesting Values, Prev: Ext 16.4.2 General Purpose Data Types --------------------------------- - I have a true love/hate relationship with unions. - Arnold Robbins + I have a true love/hate relationship with unions. -- Arnold + Robbins That's the thing about unions: the compiler will arrange things so - they can accommodate both love and hate. - Chet Ramey + they can accommodate both love and hate. -- Chet Ramey The extension API defines a number of simple types and structures for general purpose use. Additional, more specialized, data structures @@ -21794,11 +21933,8 @@ that use them. allowing `gawk' to use them as it needs to. `typedef enum awk_bool {' - ` awk_false = 0,' - ` awk_true' - `} awk_bool_t;' A simple boolean type. @@ -21808,7 +21944,9 @@ that use them. `} awk_string_t;' This represents a mutable string. `gawk' owns the memory pointed to if it supplied the value. Otherwise, it takes ownership of the - memory pointed to. *Such memory must come from `malloc()'!* + memory pointed to. 
*Such memory must come from calling the + API-provided function pointers `api_malloc()', `api_calloc()', or + `api_realloc()'!* As mentioned earlier, strings are maintained using the current multibyte encoding. @@ -21913,7 +22051,7 @@ the value. See also the entry for "Cookie" in the *note Glossary::. -File: gawk.info, Node: Requesting Values, Next: Constructor Functions, Prev: General Data Types, Up: Extension API Description +File: gawk.info, Node: Requesting Values, Next: Memory Allocation Functions, Prev: General Data Types, Up: Extension API Description 16.4.3 Requesting Values ------------------------ @@ -21946,46 +22084,43 @@ Requested: Scalar Scalar Scalar false false Table 16.1: Value Types Returned -File: gawk.info, Node: Constructor Functions, Next: Registration Functions, Prev: Requesting Values, Up: Extension API Description +File: gawk.info, Node: Memory Allocation Functions, Next: Constructor Functions, Prev: Requesting Values, Up: Extension API Description -16.4.4 Constructor Functions and Convenience Macros ---------------------------------------------------- +16.4.4 Memory Allocation Functions and Convenience Macros +--------------------------------------------------------- -The API provides a number of "constructor" functions for creating -string and numeric values, as well as a number of convenience macros. -This node presents them all as function prototypes, in the way that -extension code would use them. +The API provides a number of "memory allocation" functions for +allocating memory that can be passed to `gawk', as well as a number of +convenience macros. -`static inline awk_value_t *' -`make_const_string(const char *string, size_t length, awk_value_t *result)' - This function creates a string value in the `awk_value_t' variable - pointed to by `result'. It expects `string' to be a C string - constant (or other string data), and automatically creates a - _copy_ of the data for storage in `result'. It returns `result'. 
+`void *gawk_malloc(size_t size);' + Call `gawk'-provided `api_malloc()' to allocate storage that may + be passed to `gawk'. -`static inline awk_value_t *' -`make_malloced_string(const char *string, size_t length, awk_value_t *result)' - This function creates a string value in the `awk_value_t' variable - pointed to by `result'. It expects `string' to be a `char *' value - pointing to data previously obtained from `malloc()'. The idea here - is that the data is passed directly to `gawk', which assumes - responsibility for it. It returns `result'. +`void *gawk_calloc(size_t nmemb, size_t size);' + Call `gawk'-provided `api_calloc()' to allocate storage that may + be passed to `gawk'. -`static inline awk_value_t *' -`make_null_string(awk_value_t *result)' - This specialized function creates a null string (the "undefined" - value) in the `awk_value_t' variable pointed to by `result'. It - returns `result'. +`void *gawk_realloc(void *ptr, size_t size);' + Call `gawk'-provided `api_realloc()' to allocate storage that may + be passed to `gawk'. -`static inline awk_value_t *' -`make_number(double num, awk_value_t *result)' - This function simply creates a numeric value in the `awk_value_t' - variable pointed to by `result'. +`void gawk_free(void *ptr);' + Call `gawk'-provided `api_free()' to release storage that was + allocated with `gawk_malloc()', `gawk_calloc()' or + `gawk_realloc()'. - Two convenience macros may be used for allocating storage from -`malloc()' and `realloc()'. If the allocation fails, they cause `gawk' -to exit with a fatal error message. They should be used as if they were -procedure calls that do not return a value. 
+ The API has to provide these functions because it is possible for an +extension to be compiled and linked against a different version of the +C library than was used for the `gawk' executable.(1) If `gawk' were to +use its version of `free()' when the memory came from an unrelated +version of `malloc()', unexpected behavior would likely result. + + Two convenience macros may be used for allocating storage from the +API-provided function pointers `api_malloc()' and `api_realloc()'. If +the allocation fails, they cause `gawk' to exit with a fatal error +message. They should be used as if they were procedure calls that do +not return a value. `#define emalloc(pointer, type, size, message) ...' The arguments to this macro are as follows: @@ -21994,7 +22129,7 @@ procedure calls that do not return a value. `type' The type of the pointer variable, used to create a cast for - the call to `malloc()'. + the call to `api_malloc()'. `size' The total number of bytes to be allocated. @@ -22014,14 +22149,57 @@ procedure calls that do not return a value. make_malloced_string(message, strlen(message), & result); `#define erealloc(pointer, type, size, message) ...' - This is like `emalloc()', but it calls `realloc()', instead of - `malloc()'. The arguments are the same as for the `emalloc()' + This is like `emalloc()', but it calls `api_realloc()', instead of + `api_malloc()'. The arguments are the same as for the `emalloc()' macro. + ---------- Footnotes ---------- + + (1) This is more common on MS-Windows systems, but can happen on +Unix-like systems as well. + + +File: gawk.info, Node: Constructor Functions, Next: Registration Functions, Prev: Memory Allocation Functions, Up: Extension API Description + +16.4.5 Constructor Functions +---------------------------- + +The API provides a number of "constructor" functions for creating +string and numeric values, as well as a number of convenience macros. 
+This node presents them all as function prototypes, in the way that +extension code would use them. + +`static inline awk_value_t *' +`make_const_string(const char *string, size_t length, awk_value_t *result)' + This function creates a string value in the `awk_value_t' variable + pointed to by `result'. It expects `string' to be a C string + constant (or other string data), and automatically creates a + _copy_ of the data for storage in `result'. It returns `result'. + +`static inline awk_value_t *' +`make_malloced_string(const char *string, size_t length, awk_value_t *result)' + This function creates a string value in the `awk_value_t' variable + pointed to by `result'. It expects `string' to be a `char *' value + pointing to data previously obtained from the api-provided + functions `api_malloc()', `api_calloc()' or `api_realloc()'. The + idea here is that the data is passed directly to `gawk', which + assumes responsibility for it. It returns `result'. + +`static inline awk_value_t *' +`make_null_string(awk_value_t *result)' + This specialized function creates a null string (the "undefined" + value) in the `awk_value_t' variable pointed to by `result'. It + returns `result'. + +`static inline awk_value_t *' +`make_number(double num, awk_value_t *result)' + This function simply creates a numeric value in the `awk_value_t' + variable pointed to by `result'. + File: gawk.info, Node: Registration Functions, Next: Printing Messages, Prev: Constructor Functions, Up: Extension API Description -16.4.5 Registration Functions +16.4.6 Registration Functions ----------------------------- This minor node describes the API functions for registering parts of @@ -22039,7 +22217,7 @@ your extension with `gawk'. File: gawk.info, Node: Extension Functions, Next: Exit Callback Functions, Up: Registration Functions -16.4.5.1 Registering An Extension Function +16.4.6.1 Registering An Extension Function .......................................... 
Extension functions are described by the following record: @@ -22064,8 +22242,10 @@ Extension functions are described by the following record: `awk_value_t *(*function)(int num_actual_args, awk_value_t *result);' This is a pointer to the C function that provides the desired functionality. The function must fill in the result with either a - number or a string. `awk' takes ownership of any string memory. - As mentioned earlier, string memory *must* come from `malloc()'. + number or a string. `gawk' takes ownership of any string memory. + As mentioned earlier, string memory *must* come from the + api-provided functions `api_malloc()', `api_calloc()' or + `api_realloc()'. The `num_actual_args' argument tells the C function how many actual parameters were passed from the calling `awk' code. @@ -22091,7 +22271,7 @@ register it with `gawk' using this API function: File: gawk.info, Node: Exit Callback Functions, Next: Extension Version String, Prev: Extension Functions, Up: Registration Functions -16.4.5.2 Registering An Exit Callback Function +16.4.6.2 Registering An Exit Callback Function .............................................. An "exit callback" function is a function that `gawk' calls before it @@ -22120,7 +22300,7 @@ order--that is, in the reverse order in which they are registered with File: gawk.info, Node: Extension Version String, Next: Input Parsers, Prev: Exit Callback Functions, Up: Registration Functions -16.4.5.3 Registering An Extension Version String +16.4.6.3 Registering An Extension Version String ................................................ You can register a version string which indicates the name and version @@ -22136,7 +22316,7 @@ invoked with the `--version' option. File: gawk.info, Node: Input Parsers, Next: Output Wrappers, Prev: Extension Version String, Up: Registration Functions -16.4.5.4 Customized Input Parsers +16.4.6.4 Customized Input Parsers ................................. By default, `gawk' reads text files as its input. 
It uses the value of @@ -22358,7 +22538,7 @@ whether or not to activate an input parser (*note BEGINFILE/ENDFILE::). File: gawk.info, Node: Output Wrappers, Next: Two-way processors, Prev: Input Parsers, Up: Registration Functions -16.4.5.5 Customized Output Wrappers +16.4.6.5 Customized Output Wrappers ................................... An "output wrapper" is the mirror image of an input parser. It allows @@ -22465,7 +22645,7 @@ normally. File: gawk.info, Node: Two-way processors, Prev: Output Wrappers, Up: Registration Functions -16.4.5.6 Customized Two-way Processors +16.4.6.6 Customized Two-way Processors ...................................... A "two-way processor" combines an input parser and an output wrapper for @@ -22518,7 +22698,7 @@ can take this" and "take over for this" functions, File: gawk.info, Node: Printing Messages, Next: Updating `ERRNO', Prev: Registration Functions, Up: Extension API Description -16.4.6 Printing Messages +16.4.7 Printing Messages ------------------------ You can print different kinds of warning messages from your extension, @@ -22549,7 +22729,7 @@ the pity. File: gawk.info, Node: Updating `ERRNO', Next: Accessing Parameters, Prev: Printing Messages, Up: Extension API Description -16.4.7 Updating `ERRNO' +16.4.8 Updating `ERRNO' ----------------------- The following functions allow you to update the `ERRNO' variable: @@ -22570,7 +22750,7 @@ The following functions allow you to update the `ERRNO' variable: File: gawk.info, Node: Accessing Parameters, Next: Symbol Table Access, Prev: Updating `ERRNO', Up: Extension API Description -16.4.8 Accessing and Updating Parameters +16.4.9 Accessing and Updating Parameters ---------------------------------------- Two functions give you access to the arguments (parameters) passed to @@ -22596,8 +22776,8 @@ your extension function. 
They are: File: gawk.info, Node: Symbol Table Access, Next: Array Manipulation, Prev: Accessing Parameters, Up: Extension API Description -16.4.9 Symbol Table Access --------------------------- +16.4.10 Symbol Table Access +--------------------------- Two sets of routines provide access to global variables, and one set allows you to create and release cached values. @@ -22611,8 +22791,8 @@ allows you to create and release cached values. File: gawk.info, Node: Symbol table by name, Next: Symbol table by cookie, Up: Symbol Table Access -16.4.9.1 Variable Access and Update by Name -........................................... +16.4.10.1 Variable Access and Update by Name +............................................ The following routines provide the ability to access and update global `awk'-level variables by name. In compiler terminology, identifiers of @@ -22644,11 +22824,16 @@ termed a "symbol table". However, with the exception of the `PROCINFO' array, an extension cannot change any of those variables. + NOTE: It is possible for the lookup of `PROCINFO' to fail. This + happens if the `awk' program being run does not reference + `PROCINFO'; in this case `gawk' doesn't bother to create the array + and populate it. + File: gawk.info, Node: Symbol table by cookie, Next: Cached values, Prev: Symbol table by name, Up: Symbol Table Access -16.4.9.2 Variable Access and Update by Cookie -............................................. +16.4.10.2 Variable Access and Update by Cookie +.............................................. A "scalar cookie" is an opaque handle that provides access to a global variable or array. It is an optimization that avoids looking up @@ -22760,8 +22945,8 @@ like this: File: gawk.info, Node: Cached values, Prev: Symbol table by cookie, Up: Symbol Table Access -16.4.9.3 Creating and Using Cached Values -......................................... +16.4.10.3 Creating and Using Cached Values +.......................................... 
The routines in this section allow you to create and release cached values. As with scalar cookies, in theory, cached values are not @@ -22771,8 +22956,9 @@ variables using `sym_update()' or `sym_update_scalar()', as you like. However, you can understand the point of cached values if you remember that _every_ string value's storage _must_ come from -`malloc()'. If you have 20 variables, all of which have the same -string value, you must create 20 identical copies of the string.(1) +`api_malloc()', `api_calloc()' or `api_realloc()'. If you have 20 +variables, all of which have the same string value, you must create 20 +identical copies of the string.(1) It is clearly more efficient, if possible, to create a value once, and then tell `gawk' to reuse the value for multiple variables. That is @@ -22855,7 +23041,7 @@ using `release_value()'. File: gawk.info, Node: Array Manipulation, Next: Extension API Variables, Prev: Symbol Table Access, Up: Extension API Description -16.4.10 Array Manipulation +16.4.11 Array Manipulation -------------------------- The primary data structure(1) in `awk' is the associative array (*note @@ -22882,7 +23068,7 @@ arrays of arrays (*note General Data Types::). File: gawk.info, Node: Array Data Types, Next: Array Functions, Up: Array Manipulation -16.4.10.1 Array Data Types +16.4.11.1 Array Data Types .......................... The data types associated with arrays are listed below. @@ -22949,7 +23135,7 @@ overuse this term. File: gawk.info, Node: Array Functions, Next: Flattening Arrays, Prev: Array Data Types, Up: Array Manipulation -16.4.10.2 Array Functions +16.4.11.2 Array Functions ......................... The following functions relate to individual array elements. @@ -22975,7 +23161,8 @@ The following functions relate to individual array elements. strings (*note Conversion::); thus using integral values is safest. 
As with _all_ strings passed into `gawk' from an extension, the - string value of `index' must come from `malloc()', and `gawk' + string value of `index' must come from the API-provided functions + `api_malloc()', `api_calloc()' or `api_realloc()' and `gawk' releases the storage. `awk_bool_t set_array_element(awk_array_t a_cookie,' @@ -23026,7 +23213,7 @@ The following functions relate to individual array elements. File: gawk.info, Node: Flattening Arrays, Next: Creating Arrays, Prev: Array Functions, Up: Array Manipulation -16.4.10.3 Working With All The Elements of an Array +16.4.11.3 Working With All The Elements of an Array ................................................... To "flatten" an array is create a structure that represents the full @@ -23200,7 +23387,7 @@ return value to success, and returns: File: gawk.info, Node: Creating Arrays, Prev: Flattening Arrays, Up: Array Manipulation -16.4.10.4 How To Create and Populate Arrays +16.4.11.4 How To Create and Populate Arrays ........................................... Besides working with arrays created by `awk' code, you can create @@ -23339,7 +23526,7 @@ environment variable.) File: gawk.info, Node: Extension API Variables, Next: Extension API Boilerplate, Prev: Array Manipulation, Up: Extension API Description -16.4.11 API Variables +16.4.12 API Variables --------------------- The API provides two sets of variables. The first provides information @@ -23356,7 +23543,7 @@ information about how `gawk' was invoked. File: gawk.info, Node: Extension Versioning, Next: Extension API Informational Variables, Up: Extension API Variables -16.4.11.1 API Version Constants and Variables +16.4.12.1 API Version Constants and Variables ............................................. The API provides both a "major" and a "minor" version number. The API @@ -23405,7 +23592,7 @@ Boilerplate::). 
File: gawk.info, Node: Extension API Informational Variables, Prev: Extension Versioning, Up: Extension API Variables -16.4.11.2 Informational Variables +16.4.12.2 Informational Variables ................................. The API provides access to several variables that describe whether the @@ -23441,7 +23628,7 @@ change during execution. File: gawk.info, Node: Extension API Boilerplate, Prev: Extension API Variables, Up: Extension API Description -16.4.12 Boilerplate Code +16.4.13 Boilerplate Code ------------------------ As mentioned earlier (*note Extension Mechanism Outline::), the function @@ -23558,8 +23745,7 @@ File: gawk.info, Node: Extension Example, Next: Extension Samples, Prev: Find 16.6 Example: Some File Functions ================================= - No matter where you go, there you are. - Buckaroo Bonzai + No matter where you go, there you are. -- Buckaroo Bonzai Two useful functions that are not in `awk' are `chdir()' (so that an `awk' program can change its directory) and `stat()' (so that an `awk' @@ -23960,7 +24146,7 @@ declarations and argument checking: awk_array_t array; int ret; struct stat sbuf; - /* default is stat() */ + /* default is lstat() */ int (*statfunc)(const char *path, struct stat *sbuf) = lstat; assert(result != NULL); @@ -24249,7 +24435,7 @@ follows: The usage is: The `fts()' function provides a hook to the C library `fts()' routines for traversing file hierarchies. Instead of returning data -about one file at a time in a stream, it fills in a multi-dimensional +about one file at a time in a stream, it fills in a multidimensional array with data about each file and directory encountered in the requested hierarchies. @@ -24344,7 +24530,7 @@ Otherwise it returns -1. lack of a comparison function, since `gawk' already provides powerful array sorting facilities. 
While an `fts_read()'-like interface could have been provided, this felt less natural than - simply creating a multi-dimensional array to represent the file + simply creating a multidimensional array to represent the file hierarchy and its information. See `test/fts.awk' in the `gawk' distribution for an example. @@ -24358,12 +24544,16 @@ File: gawk.info, Node: Extension Sample Fnmatch, Next: Extension Sample Fork, This extension provides an interface to the C library `fnmatch()' function. The usage is: - @load "fnmatch" +`@load "fnmatch"' + This is how you load the extension. - result = fnmatch(pattern, string, flags) +`result = fnmatch(pattern, string, flags)' + The return value is zero on success, `FNM_NOMATCH' if the string + did not match the pattern, or a different non-zero value if an + error occurred. - The `fnmatch' extension adds a single function named `fnmatch()', -one constant (`FNM_NOMATCH'), and an array of flag values named `FNM'. + Besides the `fnmatch()' function, the `fnmatch' extension adds one +constant (`FNM_NOMATCH'), and an array of flag values named `FNM'. The arguments to `fnmatch()' are: @@ -24377,10 +24567,6 @@ one constant (`FNM_NOMATCH'), and an array of flag values named `FNM'. Either zero, or the bitwise OR of one or more of the flags in the `FNM' array. - The return value is zero on success, `FNM_NOMATCH' if the string did -not match the pattern, or a different non-zero value if an error -occurred. - The flags are follows: `FNM["CASEFOLD"]' Corresponds to the `FNM_CASEFOLD' flag as defined in @@ -24416,14 +24602,14 @@ The `fork' extension adds three functions, as follows. This is how you load the extension. `pid = fork()' - This function creates a new process. The return value is the zero - in the child and the process-id number of the child in the parent, - or -1 upon error. In the latter case, `ERRNO' indicates the - problem. In the child, `PROCINFO["pid"]' and `PROCINFO["ppid"]' - are updated to reflect the correct values. 
+ This function creates a new process. The return value is zero in + the child and the process-ID number of the child in the parent, or + -1 upon error. In the latter case, `ERRNO' indicates the problem. + In the child, `PROCINFO["pid"]' and `PROCINFO["ppid"]' are updated + to reflect the correct values. `ret = waitpid(pid)' - This function takes a numeric argument, which is the process-id to + This function takes a numeric argument, which is the process-ID to wait for. The return value is that of the `waitpid()' system call. `ret = wait()' @@ -24656,7 +24842,8 @@ File: gawk.info, Node: Extension Sample Readfile, Next: Extension Sample API T 16.7.10 Reading An Entire File ------------------------------ -The `readfile' extension adds a single function named `readfile()': +The `readfile' extension adds a single function named `readfile()', and +an input parser: `@load "readfile"' This is how you load the extension. @@ -24666,6 +24853,12 @@ The `readfile' extension adds a single function named `readfile()': a string containing the entire contents of the requested file. Upon error, the function returns the empty string and sets `ERRNO'. +`BEGIN { PROCINFO["readfile"] = 1 }' + In addition, the extension adds an input parser that is activated + if `PROCINFO["readfile"]' exists. When activated, each input file + is returned in its entirety as `$0'. `RT' is set to the null + string. + Here is an example: @load "readfile" @@ -24731,11 +24924,13 @@ provides a number of `gawk' extensions, including one for processing XML files. This is the evolution of the original `xgawk' (XML `gawk') project. - As of this writing, there are four extensions: + As of this writing, there are five extensions: * XML parser extension, using the Expat (http://expat.sourceforge.net) XML parsing library. + * PDF extension. + * PostgreSQL extension. * GD graphics library extension. @@ -24815,6 +25010,7 @@ you can find more information. `awk'. * POSIX/GNU:: The extensions in `gawk' not in POSIX `awk'. 
+* Feature History:: The history of the features in `gawk'. * Common Extensions:: Common Extensions Summary. * Ranges and Locales:: How locales used to affect regexp ranges. * Contributors:: The major contributors to `gawk'. @@ -24882,7 +25078,7 @@ the changes, with cross-references to further details: * Multiple `BEGIN' and `END' rules (*note BEGIN/END::). - * Multidimensional arrays (*note Multi-dimensional::). + * Multidimensional arrays (*note Multidimensional::). File: gawk.info, Node: SVR4, Next: POSIX, Prev: V7/SVR3.1, Up: Language History @@ -24993,7 +25189,7 @@ in his version of `awk'. available in his `awk'. -File: gawk.info, Node: POSIX/GNU, Next: Common Extensions, Prev: BTL, Up: Language History +File: gawk.info, Node: POSIX/GNU, Next: Feature History, Prev: BTL, Up: Language History A.5 Extensions in `gawk' Not in POSIX `awk' =========================================== @@ -25145,12 +25341,397 @@ the current version of `gawk'. - Prestandard VAX C compiler for VAX/VMS + - GCC for VAX and Alpha has not been tested for a while. + -File: gawk.info, Node: Common Extensions, Next: Ranges and Locales, Prev: POSIX/GNU, Up: Language History +File: gawk.info, Node: Feature History, Next: Common Extensions, Prev: POSIX/GNU, Up: Language History + +A.6 History of `gawk' Features +============================== + +This minor node describes the features in `gawk' over and above those +in POSIX `awk', in the order they were added to `gawk'. + + Version 2.10 of `gawk' introduced the following features: + + * The `AWKPATH' environment variable for specifying a path search for + the `-f' command-line option (*note Options::). + + * The `IGNORECASE' variable and its effects (*note + Case-sensitivity::). + + * The `/dev/stdin', `/dev/stdout', `/dev/stderr' and `/dev/fd/N' + special file names (*note Special Files::). + + Version 2.13 of `gawk' introduced the following features: + + * The `FIELDWIDTHS' variable and its effects (*note Constant Size::). 
+ + * The `systime()' and `strftime()' built-in functions for obtaining + and printing timestamps (*note Time Functions::). + + * Additional command-line options (*note Options::): + + - The `-W lint' option to provide error and portability checking + for both the source code and at runtime. + + - The `-W compat' option to turn off the GNU extensions. + + - The `-W posix' option for full POSIX compliance. + + Version 2.14 of `gawk' introduced the following feature: + + * The `next file' statement for skipping to the next data file + (*note Nextfile Statement::). + + Version 2.15 of `gawk' introduced the following features: + + * New variables (*note Built-in Variables::): + + - `ARGIND', which tracks the movement of `FILENAME' through + `ARGV'. + + - `ERRNO', which contains the system error message when + `getline' returns -1 or `close()' fails. + + * The `/dev/pid', `/dev/ppid', `/dev/pgrpid', and `/dev/user' + special file names. These have since been removed. + + * The ability to delete all of an array at once with `delete ARRAY' + (*note Delete::). + + * Command line option changes (*note Options::): + + - The ability to use GNU-style long-named options that start + with `--'. + + - The `--source' option for mixing command-line and library-file + source code. + + Version 3.0 of `gawk' introduced the following features: + + * New or changed variables: + + - `IGNORECASE' changed, now applying to string comparison as + well as regexp operations (*note Case-sensitivity::). + + - `RT', which contains the input text that matched `RS' (*note + Records::). + + * Full support for both POSIX and GNU regexps (*note Regexp::). + + * The `gensub()' function for more powerful text manipulation (*note + String Functions::). + + * The `strftime()' function acquired a default time format, allowing + it to be called with no arguments (*note Time Functions::). 
+ + * The ability for `FS' and for the third argument to `split()' to be + null strings (*note Single Character Fields::). + + * The ability for `RS' to be a regexp (*note Records::). + + * The `next file' statement became `nextfile' (*note Nextfile + Statement::). + + * The `fflush()' function from the Bell Laboratories research + version of `awk' (*note I/O Functions::). + + * New command line options: + + - The `--lint-old' option to warn about constructs that are not + available in the original Version 7 Unix version of `awk' + (*note V7/SVR3.1::). + + - The `-m' option from the Bell Laboratories research version + of `awk' This was later removed. + + - The `--re-interval' option to provide interval expressions in + regexps (*note Regexp Operators::). + + - The `--traditional' option was added as a better name for + `--compat' (*note Options::). + + * The use of GNU Autoconf to control the configuration process + (*note Quick Installation::). + + * Amiga support. + + + Version 3.1 of `gawk' introduced the following features: + + * New variables (*note Built-in Variables::): + + - `BINMODE', for non-POSIX systems, which allows binary I/O for + input and/or output files (*note PC Using::). + + - `LINT', which dynamically controls lint warnings. + + - `PROCINFO', an array for providing process-related + information. + + - `TEXTDOMAIN', for setting an application's + internationalization text domain (*note + Internationalization::). + + * The ability to use octal and hexadecimal constants in `awk' + program source code (*note Nondecimal-numbers::). + + * The `|&' operator for two-way I/O to a coprocess (*note Two-way + I/O::). + + * The `/inet' special files for TCP/IP networking using `|&' (*note + TCP/IP Networking::). + + * The optional second argument to `close()' that allows closing one + end of a two-way pipe to a coprocess (*note Two-way I/O::). 
+ + * The optional third argument to the `match()' function for + capturing text-matching subexpressions within a regexp (*note + String Functions::). + + * Positional specifiers in `printf' formats for making translations + easier (*note Printf Ordering::). + + * A number of new built-in functions: + + - The `asort()' and `asorti()' functions for sorting arrays + (*note Array Sorting::). + + - The `bindtextdomain()', `dcgettext()' and `dcngettext()' + functions for internationalization (*note Programmer i18n::). + + - The `extension()' function and the ability to add new + built-in functions dynamically (*note Dynamic Extensions::). + + - The `mktime()' function for creating timestamps (*note Time + Functions::). + + - The `and()', `or()', `xor()', `compl()', `lshift()', + `rshift()', and `strtonum()' functions (*note Bitwise + Functions::). + + * The support for `next file' as two words was removed completely + (*note Nextfile Statement::). + + * Additional commnd line options (*note Options::): + + - The `--dump-variables' option to print a list of all global + variables. + + - The `--exec' option, for use in CGI scripts. + + - The `--gen-po' command-line option and the use of a leading + underscore to mark strings that should be translated (*note + String Extraction::). + + - The `--non-decimal-data' option to allow non-decimal input + data (*note Nondecimal Data::). + + - The `--profile' option and `pgawk', the profiling version of + `gawk', for producing execution profiles of `awk' programs + (*note Profiling::). + + - The `--use-lc-numeric' option to force `gawk' to use the + locale's decimal point for parsing input data (*note + Conversion::). + + * The use of GNU Automake to help in standardizing the configuration + process (*note Quick Installation::). + + * The use of GNU `gettext' for `gawk''s own message output (*note + Gawk I18N::). -A.6 Common Extensions Summary + * BeOS support. This was later removed. + + * Tandem support. This was later removed. 
+ + * The Atari port became officially unsupported. + + * The source code changed to use ISO C standard-style function + definitions. + + * POSIX compliance for `sub()' and `gsub()' (*note Gory Details::). + + * The `length()' function was extended to accept an array argument + and return the number of elements in the array (*note String + Functions::). + + * The `strftime()' function acquired a third argument to enable + printing times as UTC (*note Time Functions::). + + Version 4.0 of `gawk' introduced the following features: + + * Variable additions: + + - `FPAT', which allows you to specify a regexp that matches the + fields, instead of matching the field separator (*note + Splitting By Content::). + + - If `PROCINFO["sorted_in"]' exists, `for(iggy in foo)' loops + sort the indices before looping over them. The value of this + element provides control over how the indices are sorted + before the loop traversal starts (*note Controlling + Scanning::). + + - `PROCINFO["strftime"]', which holds the default format for + `strftime()' (*note Time Functions::). + + * The special files `/dev/pid', `/dev/ppid', `/dev/pgrpid' and + `/dev/user' were removed. + + * Support for IPv6 was added via the `/inet6' special file. + `/inet4' forces IPv4 and `/inet' chooses the system default, which + is probably IPv4 (*note TCP/IP Networking::). + + * The use of `\s' and `\S' escape sequences in regular expressions + (*note GNU Regexp Operators::). + + * Interval expressions became part of default regular expressions + (*note Regexp Operators::). + + * POSIX character classes work even with `--traditional' (*note + Regexp Operators::). + + * `break' and `continue' became invalid outside a loop, even with + `--traditional' (*note Break Statement::, and also see *note + Continue Statement::). + + * `fflush()', `nextfile', and `delete ARRAY' are allowed if + `--posix' or `--traditional', since they are all now part of POSIX. 
+ + * An optional third argument to `asort()' and `asorti()', specifying + how to sort (*note String Functions::). + + * The behavior of `fflush()' changed to match Brian Kernighan's `awk' + and for POSIX; now both `fflush()' and `fflush("")' flush all open + output redirections (*note I/O Functions::). + + * The `isarray()' function which distinguishes if an item is an array + or not, to make it possible to traverse multidimensional arrays + (*note Type Functions::). + + * The `patsplit()' function which gives the same capability as + `FPAT', for splitting (*note String Functions::). + + * An optional fourth argument to the `split()' function, which is an + array to hold the values of the separators (*note String + Functions::). + + * Arrays of arrays (*note Arrays of Arrays::). + + * The `BEGINFILE' and `ENDFILE' special patterns (*note + BEGINFILE/ENDFILE::). + + * Indirect function calls (*note Indirect Calls::). + + * `switch' / `case' are enabled by default (*note Switch + Statement::). + + * Command line option changes (*note Options::): + + - The `-b' and `--characters-as-bytes' options which prevent + `gawk' from treating input as a multibyte string. + + - The redundant `--compat', `--copyleft', and `--usage' long + options were removed. + + - The `--gen-po' option was finally renamed to the correct + `--gen-pot'. + + - The `--sandbox' option which disables certain features. + + - All long options acquired corresponding short options, for + use in `#!' scripts. + + * Directories named on the command line now produce a warning, not a + fatal error, unless `--posix' or `--traditional' are used (*note + Command line directories::). + + * The `gawk' internals were rewritten, bringing the `dgawk' debugger + and possibly improved performance (*note Debugger::). + + * Per the GNU Coding Standards, dynamic extensions must now define a + global symbol indicating that they are GPL-compatible (*note + Plugin License::). 
+ + * In POSIX mode, string comparisons use `strcoll()' / `wcscoll()' + (*note POSIX String Comparison::). + + * The option for raw sockets was removed, since it was never + implemented (*note TCP/IP Networking::). + + * Ranges of the form `[d-h]' are treated as if they were in the C + locale, no matter what kind of regexp is being used, and even if + `--posix' (*note Ranges and Locales::). + + * Support was removed for the following systems: + + - Atari + + - Amiga + + - BeOS + + - Cray + + - MIPS RiscOS + + - MS-DOS with Microsoft Compiler + + - MS-Windows with Microsoft Compiler + + - NeXT + + - SunOS 3.x, Sun 386 (Road Runner) + + - Tandem (non-POSIX) + + - Prestandard VAX C compiler for VAX/VMS + + Version 4.1 of `gawk' introduced the following features: + + * Three new arrays: `SYMTAB', `FUNCTAB', and + `PROCINFO["identifiers"]' (*note Auto-set::). + + * The three executables `gawk', `pgawk', and `dgawk', were merged + into one, named just `gawk'. As a result the command line options + changed. + + * Command line option changes (*note Options::): + + - The `-D' option invokes the debugger. + + - The `-i' and `--include' options load `awk' library files. + + - The `-l' and `--load' options load compiled dynamic + extensions. + + - The `-M' and `--bignum' options enable MPFR. + + - The `-o' only does pretty-printing. + + - The `-p' option is used for profiling. + + - The `-R' option was removed. + + * Support for high precision arithmetic with MPFR. (*note Gawk and + MPFR::). + + * The `and()', `or()' and `xor()' functions changed to allow any + number of arguments, with a minimum of two (*note Bitwise + Functions::). + + * The dynamic extension interface was completely redone (*note + Dynamic Extensions::). 
+ + + +File: gawk.info, Node: Common Extensions, Next: Ranges and Locales, Prev: Feature History, Up: Language History + +A.7 Common Extensions Summary ============================= This minor node summarizes the common extensions supported by `gawk', @@ -25160,18 +25741,18 @@ available versions of `awk' (*note Other Versions::). Feature BWK Awk Mawk GNU Awk -------------------------------------------------------- `\x' Escape sequence X X X -`RS' as regexp X X `FS' as null string X X X -`/dev/stdin' special file X X +`/dev/stdin' special file X X X `/dev/stdout' special file X X X `/dev/stderr' special file X X X -`**' and `**=' operators X X +`delete' without subscript X X X `fflush()' function X X X -`func' keyword X X +`length()' of an array X X X `nextfile' statement X X X -`delete' without subscript X X X -`length()' of an array X X +`**' and `**=' operators X X +`func' keyword X X `BINMODE' variable X X +`RS' as regexp X X Time related functions X X (Technically speaking, as of late 2012, `fflush()', `delete ARRAY', @@ -25181,7 +25762,7 @@ POSIX.) File: gawk.info, Node: Ranges and Locales, Next: Contributors, Prev: Common Extensions, Up: Language History -A.7 Regexp Ranges and Locales: A Long Sad Story +A.8 Regexp Ranges and Locales: A Long Sad Story =============================================== This minor node describes the confusing history of ranges within @@ -25204,7 +25785,7 @@ as working in this fashion, and in particular, would teach that the `[A-Z]' was the "correct" way to match uppercase letters. And indeed, this was true.(1) - The 1993 POSIX standard introduced the idea of locales (*note + The 1992 POSIX standard introduced the idea of locales (*note Locales::). 
Since many locales include other letters besides the plain twenty-six letters of the American English alphabet, the POSIX standard added character classes (*note Bracket Expressions::) as a way to match @@ -25267,17 +25848,17 @@ of range expressions was _undefined_.(3) By using this lovely technical term, the standard gives license to implementors to implement ranges in whatever way they choose. The `gawk' maintainer chose to apply the pre-POSIX meaning in all cases: -the default regexp matching; with `--traditional', and with `--posix'; +the default regexp matching; with `--traditional' and with `--posix'; in all cases, `gawk' remains POSIX compliant. ---------- Footnotes ---------- (1) And Life was good. - (2) And thus was born the Campain for Rational Range Interpretation -(or RRI). A number of GNU tools, such as `grep' and `sed', have either -implemented this change, or will soon. Thanks to Karl Berry for -coining the phrase "Rational Range Interpretation." + (2) And thus was born the Campaign for Rational Range Interpretation +(or RRI). A number of GNU tools have either implemented this change, or +will soon. Thanks to Karl Berry for coining the phrase "Rational Range +Interpretation." (3) See the standard (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05) @@ -25287,11 +25868,10 @@ and its rationale File: gawk.info, Node: Contributors, Prev: Ranges and Locales, Up: Language History -A.8 Major Contributors to `gawk' +A.9 Major Contributors to `gawk' ================================ - Always give credit where credit is due. - Anonymous + Always give credit where credit is due. -- Anonymous This minor node names the major contributors to `gawk' and/or this Info file, in approximate chronological order: @@ -25388,11 +25968,19 @@ Info file, in approximate chronological order: * Patrick T.J. McPhee contributed the code for dynamic loading in Windows32 environments. 
(This is no longer supported) + * Anders Wallin helped keep the VMS port going for several years. + + * Assaf Gordon contributed the code to implement the `--sandbox' + option. + * John Haque made the following contributions: - The modifications to convert `gawk' into a byte-code interpreter, including the debugger. + - The addition of true multidimensional arrays. *note Arrays + of Arrays::. + - The additional modifications for support of arbitrary precision arithmetic. @@ -25403,12 +25991,19 @@ Info file, in approximate chronological order: - Improved array internals for arrays indexed by integers. + - The improved array sorting features were driven by John + together with Pat Rankin. + * Efraim Yawitz contributed the original text for *note Debugger::. * The development of the extension API first released with `gawk' 4.1 was driven primarily by Arnold Robbins and Andrew Schorr, with notable contributions from the rest of the development team. + * Antonio Giovanni Colombo rewrote a number of examples in the early + chapters that were severely dated, for which I am incredibly + grateful. + * Arnold Robbins has been working on `gawk' since 1988, at first helping David Trueman, and as the primary maintainer since around 1994. @@ -25465,7 +26060,7 @@ There are three ways to get GNU software: supported. If you have the `wget' program, you can use a command like the following: - wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.0.tar.gz + wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.1.tar.gz The GNU software archive is mirrored around the world. The up-to-date list of mirror sites is available from the main FSF web site @@ -25484,26 +26079,26 @@ compression programs: `gzip', `bzip2', and `xz'. For simplicity, the rest of these instructions assume you are using the one compressed with the GNU Zip program, `gzip'. 
- Once you have the distribution (for example, `gawk-4.1.0.tar.gz'), + Once you have the distribution (for example, `gawk-4.1.1.tar.gz'), use `gzip' to expand the file and then use `tar' to extract it. You can use the following pipeline to produce the `gawk' distribution: # Under System V, add 'o' to the tar options - gzip -d -c gawk-4.1.0.tar.gz | tar -xvpf - + gzip -d -c gawk-4.1.1.tar.gz | tar -xvpf - On a system with GNU `tar', you can let `tar' do the decompression for you: - tar -xvpzf gawk-4.1.0.tar.gz + tar -xvpzf gawk-4.1.1.tar.gz -Extracting the archive creates a directory named `gawk-4.1.0' in the +Extracting the archive creates a directory named `gawk-4.1.1' in the current directory. The distribution file name is of the form `gawk-V.R.P.tar.gz'. The V represents the major version of `gawk', the R represents the current release of version V, and the P represents a "patch level", meaning that minor bugs have been fixed in the release. The current patch -level is 0, but when retrieving distributions, you should get the +level is 1, but when retrieving distributions, you should get the version with the highest version, release, and patch level. (Note, however, that patch levels greater than or equal to 70 denote "beta" or nonproduction software; you might not want to retrieve such a version @@ -25525,6 +26120,13 @@ to different non-Unix operating systems: Various `.c', `.y', and `.h' files The actual `gawk' source code. +`ABOUT-NLS' + Information about GNU `gettext' and translations. + +`AUTHORS' + A file with some information about the authorship of `gawk'. It + exists only to satisfy the pedants at the Free Software Foundation. + `README' `README_d/README.*' Descriptive files: `README' for `gawk' under Unix and the rest for @@ -25550,16 +26152,6 @@ Various `.c', `.y', and `.h' files `COPYING' The GNU General Public License. 
-`FUTURES' - A brief list of features and changes being contemplated for future - releases, with some indication of the time frame for the feature, - based on its difficulty. - -`LIMITATIONS' - A list of those factors that limit `gawk''s performance. Most of - these depend on the hardware or operating system software and are - not limits in `gawk' itself. - `POSIX.STD' A description of behaviors in the POSIX standard for `awk' which are left undefined, or where `gawk' may not comply fully, as well @@ -25591,11 +26183,18 @@ Various `.c', `.y', and `.h' files The `troff' source for a manual page describing `gawk'. This is distributed for the convenience of Unix users. -`doc/gawk.texi' +`doc/gawktexi.in' +`doc/sidebar.awk' The Texinfo source file for this Info file. It should be - processed with TeX (via `texi2dvi' or `texi2pdf') to produce a - printed document, and with `makeinfo' to produce an Info or HTML - file. + processed by `doc/sidebar.awk' before processing with `texi2dvi' + or `texi2pdf' to produce a printed document, and with `makeinfo' + to produce an Info or HTML file. The `Makefile' takes care of + this processing and produces printable output via `texi2dvi' or + `texi2pdf'. + +`doc/gawk.texi' + The file produced after processing `gawktexi.in' with + `sidebar.awk'. `doc/gawk.info' The generated Info file for this Info file. @@ -25625,15 +26224,21 @@ Various `.c', `.y', and `.h' files `Makefile.in' `aclocal.m4' +`bisonfix.awk' +`config.guess' `configh.in' `configure.ac' `configure' `custom.h' +`depcomp' +`install-sh' `missing_d/*' +`mkinstalldirs' `m4/*' - These files and subdirectories are used when configuring `gawk' - for various Unix systems. They are explained in *note Unix - Installation::. + These files and subdirectories are used when configuring and + compiling `gawk' for various Unix systems. Most of them are + explained in *note Unix Installation::. The rest are there to + support the main infrastructure. 
`po/*' The `po' library contains message translations. @@ -25654,6 +26259,11 @@ Various `.c', `.y', and `.h' files of the programs in this Info file are available in appropriate subdirectories of `awklib/eg'. +`extension/*' + The source code, manual pages, and infrastructure files for the + sample extensions included with `gawk'. *Note Dynamic + Extensions::, for more information. + `posix/*' Files needed for building `gawk' on POSIX-compliant systems. @@ -25698,7 +26308,7 @@ Unix-derived systems, GNU/Linux, BSD-based systems, and the Cygwin environment for MS-Windows. After you have extracted the `gawk' distribution, `cd' to -`gawk-4.1.0'. Like most GNU software, `gawk' is configured +`gawk-4.1.1'. Like most GNU software, `gawk' is configured automatically for your system by running the `configure' program. This program is a Bourne shell script that is generated automatically using GNU `autoconf'. (The `autoconf' software is described fully starting @@ -25737,8 +26347,8 @@ failure is not described there, please send in a bug report (*note Bugs::). Of course, once you've built `gawk', it is likely that you will wish -to install it. To do so, you need to run the command `make check', as -a user with the appropriate permissions. How to do this varies by +to install it. To do so, you need to run the command `make install', +as a user with the appropriate permissions. How to do this varies by system, but on many systems you can use the `sudo' command to do so. The command then becomes `sudo make install'. It is likely that you will be asked for your password, and you will have to have been set up @@ -25753,6 +26363,12 @@ B.2.2 Additional Configuration Options There are several additional options you may use on the `configure' command line when compiling `gawk' from scratch, including: +`--disable-extensions' + Disable configuring and building the sample extensions in the + `extension' directory. This is useful for cross-compiling. 
The + default action is to dynamically check if the extensions can be + configured and compiled. + `--disable-lint' Disable all lint checking within `gawk'. The `--lint' and `--lint-old' options (*note Options::) are accepted, but silently @@ -26015,10 +26631,9 @@ File: gawk.info, Node: PC Using, Next: Cygwin, Prev: PC Testing, Up: PC Inst B.3.1.4 Using `gawk' on PC Operating Systems ............................................ -With the exception of the Cygwin environment, the `|&' operator and -TCP/IP networking (*note TCP/IP Networking::) are not supported for -MS-DOS or MS-Windows. EMX (OS/2 only) does support at least the `|&' -operator. +Under MS-DOS and MS-Windows, the Cygwin and MinGW environments support +both the `|&' operator and TCP/IP networking (*note TCP/IP +Networking::). EMX (OS/2 only) supports at least the `|&' operator. The MS-DOS and MS-Windows versions of `gawk' search for program files as described in *note AWKPATH Variable::. However, semicolons @@ -26111,13 +26726,13 @@ B.3.1.5 Using `gawk' In The Cygwin Environment `gawk' can be built and used "out of the box" under MS-Windows if you are using the Cygwin environment (http://www.cygwin.com). This -environment provides an excellent simulation of Unix, using the GNU -tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make, and -other GNU programs. Compilation and installation for Cygwin is the +environment provides an excellent simulation of GNU/Linux, using the +GNU tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make, +and other GNU programs. Compilation and installation for Cygwin is the same as for a Unix system: - tar -xvpzf gawk-4.1.0.tar.gz - cd gawk-4.1.0 + tar -xvpzf gawk-4.1.1.tar.gz + cd gawk-4.1.1 ./configure make @@ -26125,10 +26740,6 @@ same as for a Unix system: on Cygwin takes considerably longer. However, it does finish, and then the `make' proceeds as usual. 
- NOTE: The `|&' operator and TCP/IP networking (*note TCP/IP - Networking::) are fully supported in the Cygwin environment. This - is not true for any other environment on MS-Windows. - File: gawk.info, Node: MSYS, Prev: Cygwin, Up: PC Installation @@ -26155,51 +26766,112 @@ older designation "VMS" is used throughout to refer to OpenVMS. * Menu: * VMS Compilation:: How to compile `gawk' under VMS. +* VMS Dynamic Extensions:: Compiling `gawk' dynamic extensions on + VMS. * VMS Installation Details:: How to install `gawk' under VMS. * VMS Running:: How to run `gawk' under VMS. +* VMS GNV:: The VMS GNV Project. * VMS Old Gawk:: An old version comes with some VMS systems. -File: gawk.info, Node: VMS Compilation, Next: VMS Installation Details, Up: VMS Installation +File: gawk.info, Node: VMS Compilation, Next: VMS Dynamic Extensions, Up: VMS Installation B.3.2.1 Compiling `gawk' on VMS ............................... To compile `gawk' under VMS, there is a `DCL' command procedure that issues all the necessary `CC' and `LINK' commands. There is also a -`Makefile' for use with the `MMS' utility. From the source directory, -use either: +`Makefile' for use with the `MMS' and `MMK' utilities. From the source +directory, use either: - $ @[.VMS]VMSBUILD.COM + $ @[.vms]vmsbuild.com or: - $ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK + $ MMS/DESCRIPTION=[.vms]descrip.mms gawk + +or: + + $ MMK/DESCRIPTION=[.vms]descrip.mms gawk + + `MMK' is an open source, free, near-clone of `MMS' and can better +handle `ODS-5' volumes with upper- and lowercase filenames. `MMK' is +available from `https://github.com/endlesssoftware/mmk'. + + With `ODS-5' volumes and extended parsing enabled, the case of the +target parameter may need to be exact. - Older versions of `gawk' could be built with VAX C or GNU C on -VAX/VMS, as well as with DEC C, but that is no longer supported. DEC C -(also briefly known as "Compaq C" and now known as "HP C," but referred -to here as "DEC C") is required. 
Both `VMSBUILD.COM' and `DESCRIP.MMS' -contain some obsolete support for the older compilers but are set up to -use DEC C by default. + `gawk' has been tested under VAX/VMS 7.3 and Alpha/VMS 7.3-1 using +Compaq C V6.4, and Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3. +The most recent builds used HP C V7.3 on Alpha VMS 8.3 and both Alpha +and IA64 VMS 8.4 used HP C 7.3.(1) - `gawk' has been tested under Alpha/VMS 7.3-1 using Compaq C V6.4, -and on Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3.(1) + The `[.vms]gawk_build_steps.txt' provides information on how to build +`gawk' into a PCSI kit that is compatible with the GNV product. ---------- Footnotes ---------- (1) The IA64 architecture is also known as "Itanium." -File: gawk.info, Node: VMS Installation Details, Next: VMS Running, Prev: VMS Compilation, Up: VMS Installation +File: gawk.info, Node: VMS Dynamic Extensions, Next: VMS Installation Details, Prev: VMS Compilation, Up: VMS Installation -B.3.2.2 Installing `gawk' on VMS +B.3.2.2 Compiling `gawk' Dynamic Extensions on VMS +.................................................. + +The extensions that have been ported to VMS can be built using one of +the following commands. + + $ MMS/DESCRIPTION=[.vms]descrip.mms extensions + +or: + + $ MMK/DESCRIPTION=[.vms]descrip.mms extensions + + `gawk' uses `AWKLIBPATH' as either an environment variable or a +logical name to find the dynamic extensions. + + Dynamic extensions need to be compiled with the same compiler +options for floating point, pointer size, and symbol name handling as +were used to compile `gawk' itself. Alpha and Itanium should use IEEE +floating point. The pointer size is 32 bits, and the symbol name +handling should be exact case with CRC shortening for symbols longer +than 32 bits. 
+ + For Alpha and Itanium: + + /name=(as_is,short) + /float=ieee/ieee_mode=denorm_results + + For VAX: + + /name=(as_is,short) + + Compile time macros need to be defined before the first VMS-supplied +header file is included. + + #if (__CRTL_VER >= 70200000) && !defined (__VAX) + #define _LARGEFILE 1 + #endif + + #ifndef __VAX + #ifdef __CRTL_VER + #if __CRTL_VER >= 80200000 + #define _USE_STD_STAT 1 + #endif + #endif + #endif + + +File: gawk.info, Node: VMS Installation Details, Next: VMS Running, Prev: VMS Dynamic Extensions, Up: VMS Installation + +B.3.2.3 Installing `gawk' on VMS ................................ -To install `gawk', all you need is a "foreign" command, which is a -`DCL' symbol whose value begins with a dollar sign. For example: +To use `gawk', all you need is a "foreign" command, which is a `DCL' +symbol whose value begins with a dollar sign. For example: - $ GAWK :== $disk1:[gnubin]GAWK + $ GAWK :== $disk1:[gnubin]gawk Substitute the actual location of `gawk.exe' for `$disk1:[gnubin]'. The symbol should be placed in the `login.com' of any user who wants to run @@ -26207,9 +26879,27 @@ symbol should be placed in the `login.com' of any user who wants to run Alternatively, the symbol may be placed in the system-wide `sylogin.com' procedure, which allows all users to run `gawk'. - Optionally, the help entry can be loaded into a VMS help library: + If your `gawk' was installed by a PCSI kit into the `GNV$GNU:' +directory tree, the program will be known as +`GNV$GNU:[bin]gnv$gawk.exe' and the help file will be +`GNV$GNU:[vms_help]gawk.hlp'. + + The PCSI kit also installs a `GNV$GNU:[vms_bin]gawk_verb.cld' file +which can be used to add `gawk' and `awk' as DCL commands. + + For just the current process you can use: + + $ set command gnv$gnu:[vms_bin]gawk_verb.cld - $ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP + Or the system manager can use `GNV$GNU:[vms_bin]gawk_verb.cld' to +add the `gawk' and `awk' to the system wide `DCLTABLES'. 
+ + The DCL syntax is documented in the `gawk.hlp' file. + + Optionally, the `gawk.hlp' entry can be loaded into a VMS help +library: + + $ LIBRARY/HELP sys$help:helplib [.vms]gawk.hlp (You may want to substitute a site-specific help library rather than the standard VMS library `HELPLIB'.) After loading the help text, the @@ -26231,9 +26921,9 @@ If `AWK_LIBRARY' has no definition, a default value of `SYS$LIBRARY:' is used for it. -File: gawk.info, Node: VMS Running, Next: VMS Old Gawk, Prev: VMS Installation Details, Up: VMS Installation +File: gawk.info, Node: VMS Running, Next: VMS GNV, Prev: VMS Installation Details, Up: VMS Installation -B.3.2.3 Running `gawk' on VMS +B.3.2.4 Running `gawk' on VMS ............................. Command-line parsing and quoting conventions are significantly different @@ -26259,6 +26949,35 @@ If any other dash-type options (or multiple parameters such as data files to process) are present, there is no ambiguity and `--' can be omitted. + The `exit' value is a Unix-style value and is encoded to a VMS exit +status value when the program exits. + + The VMS severity bits will be set based on the `exit' value. A +failure is indicated by 1 and VMS sets the `ERROR' status. A fatal +error is indicated by 2 and VMS will set the `FATAL' status. All other +values will have the `SUCCESS' status. The exit value is encoded to +comply with VMS coding standards and will have the `C_FACILITY_NO' of +`0x350000' with the constant `0xA000' added to the number shifted over +by 3 bits to make room for the severity codes. + + To extract the actual `gawk' exit code from the VMS status use: + + unix_status = (vms_status .and. &x7f8) / 8 + +A C program that uses `exec()' to call `gawk' will get the original +Unix-style exit value. + + Older versions of `gawk' treated a Unix exit code 0 as 1, a failure +as 2, a fatal error as 4, and passed all the other numbers through. +This violated the VMS exit status coding requirements. 
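The exit-status encoding described in the added text above can be sketched in C. This is only an illustration of the arithmetic as the text states it, not the code `gawk' itself uses. The function and macro names are hypothetical. The numeric severity values (1 for success, 2 for error, 4 for fatal) are the standard VMS severity codes and are an assumption here, since the text names the severities but not their values. Any other status bits a real VMS build might set are ignored.

```c
/* Sketch of the VMS exit-status encoding described in the text.
 * All names below are hypothetical and chosen for illustration. */

#define C_FACILITY_NO   0x350000u  /* facility number quoted in the text */
#define POSIX_EXIT_BASE 0xA000u    /* constant quoted in the text */
#define UNIX_STATUS_MASK 0x7F8u    /* mask from the "&x7f8" DCL expression */

/* Assumed standard VMS severity codes (low three bits of the status). */
enum { SEV_SUCCESS = 1, SEV_ERROR = 2, SEV_FATAL = 4 };

/* Encode a Unix-style exit value into a VMS status value. */
static unsigned int encode_exit(unsigned int unix_status)
{
    unsigned int severity;

    if (unix_status == 1)
        severity = SEV_ERROR;      /* failure -> ERROR */
    else if (unix_status == 2)
        severity = SEV_FATAL;      /* fatal error -> FATAL */
    else
        severity = SEV_SUCCESS;    /* everything else -> SUCCESS */

    /* Shift the exit value left by 3 bits to leave room for the severity. */
    return C_FACILITY_NO | POSIX_EXIT_BASE
           | ((unix_status & 0xFFu) << 3) | severity;
}

/* Recover the Unix-style exit value; same as (vms_status .and. &x7f8) / 8. */
static unsigned int decode_exit(unsigned int vms_status)
{
    return (vms_status & UNIX_STATUS_MASK) >> 3;
}
```

Under these assumptions, calling `decode_exit(encode_exit(n))` returns `n` for any exit value from 0 through 255, which is the round trip the DCL expression in the text performs.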
+ + VAX/VMS floating point uses unbiased rounding. *Note Round +Function::. + + VMS reports time values in GMT unless one of the `SYS$TIMEZONE_RULE' +or `TZ' logical names is set. Older versions of VMS, such as VAX/VMS +7.3 do not set these logical names. + The default search path, when looking for `awk' program files specified by the `-f' option, is `"SYS$DISK:[],AWK_LIBRARY:"'. The logical name `AWKPATH' can be used to override this default. The format @@ -26267,9 +26986,27 @@ When defining it, the value should be quoted so that it retains a single translation and not a multitranslation `RMS' searchlist. -File: gawk.info, Node: VMS Old Gawk, Prev: VMS Running, Up: VMS Installation +File: gawk.info, Node: VMS GNV, Next: VMS Old Gawk, Prev: VMS Running, Up: VMS Installation + +B.3.2.5 The VMS GNV Project +........................... + +The VMS GNV package provides a build environment similar to POSIX with +ports of a collection of open source tools. The `gawk' found in the GNV +base kit is an older port. Currently the GNV project is being +reorganized to supply individual PCSI packages for each component. See +`https://sourceforge.net/p/gnv/wiki/InstallingGNVPackages/'. -B.3.2.4 Some VMS Systems Have An Old Version of `gawk' + The normal build procedure for `gawk' produces a program that is +suitable for use with GNV. + + The `vms/gawk_build_steps.txt' in the source documents the procedure +for building a VMS PCSI kit that is compatible with GNV. + + +File: gawk.info, Node: VMS Old Gawk, Prev: VMS GNV, Up: VMS Installation + +B.3.2.6 Some VMS Systems Have An Old Version of `gawk' ...................................................... Some versions of VMS have an old version of `gawk'. To access it, @@ -26286,8 +27023,8 @@ File: gawk.info, Node: Bugs, Next: Other Versions, Prev: Non-Unix Installatio B.4 Reporting Problems and Bugs =============================== - There is nothing more dangerous than a bored archeologist. 
- The Hitchhiker's Guide to the Galaxy + There is nothing more dangerous than a bored archeologist. -- The + Hitchhiker's Guide to the Galaxy If you have problems with `gawk' or think that you have found a bug, please report it to the developers; we cannot promise to do anything @@ -26353,7 +27090,8 @@ considered authoritative if it conflicts with this Info file. MS-DOS with DJGPP Scott Deifik, <scottd.mail@sbcglobal.net>. MS-Windows with MINGW Eli Zaretskii, <eliz@gnu.org>. OS/2 Andreas Buening, <andreas.buening@nexgo.de>. -VMS Pat Rankin, <r.pat.rankin@gmail.com> +VMS Pat Rankin, <r.pat.rankin@gmail.com>, and John + Malmberg, <wb8tyw@qsl.net>. z/OS (OS/390) Dave Pitts, <dpitts@cozx.com>. If your bug is also reproducible under Unix, please send a copy of @@ -26366,8 +27104,8 @@ B.5 Other Freely Available `awk' Implementations ================================================ It's kind of fun to put comments like this in your awk code. - `// Do C++ comments work? answer: yes! of course' - Michael Brennan + `// Do C++ comments work? answer: yes! of course' -- Michael + Brennan There are a number of other freely available `awk' implementations. This minor node briefly describes where to get them: @@ -26458,12 +27196,18 @@ Busybox Awk The OpenSolaris POSIX `awk' The version of `awk' in `/usr/xpg4/bin' on Solaris is more-or-less POSIX-compliant. It is based on the `awk' from Mortice Kern - Systems for PCs. The source code can be downloaded from the - OpenSolaris web site (http://www.opensolaris.org). This author - was able to make it compile and work under GNU/Linux with 1-2 - hours of work. Making it more generally portable (using GNU - Autoconf and/or Automake) would take more work, and this has not - been done, at least to our knowledge. + Systems for PCs. This author was able to make it compile and work + under GNU/Linux with 1-2 hours of work. 
Making it more generally + portable (using GNU Autoconf and/or Automake) would take more + work, and this has not been done, at least to our knowledge. + + The source code used to be available from the OpenSolaris web site. + However, that project was ended and the web site shut down. + Fortunately, the Illumos project + (http://wiki.illumos.org/display/illumos/illumos+Home) makes this + implementation available. You can view the files one at a time + from + `https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/awk_xpg4'. `jawk' This is an interpreter for `awk' written in Java. It claims to be @@ -26493,6 +27237,11 @@ QSE Awk `http://www.quiktrim.org/QTawk.html' for more information, including the manual and a download link. +Other Versions + See also the Wikipedia article + (http://en.wikipedia.org/wiki/Awk_language#Versions_and_implementations), + for information on additional versions. + File: gawk.info, Node: Notes, Next: Basic Concepts, Prev: Installation, Up: Top @@ -26959,10 +27708,9 @@ C.3 Probable Future Extensions ============================== AWK is a language similar to PERL, only considerably more elegant. - Arnold Robbins + -- Arnold Robbins - Hey! - Larry Wall + Hey! -- Larry Wall The `TODO' file in the `gawk' Git repository lists possible future enhancements. Some of these relate to the source code, and others to @@ -27096,7 +27844,7 @@ Some goals for the new API were: fashion for C code. - The ability to create arrays (including `gawk''s true - multi-dimensional arrays). + multidimensional arrays). Some additional important goals were: @@ -27278,7 +28026,7 @@ D.1 What a Program Does ======================= At the most basic level, the job of a program is to process some input -data and produce results. See *note figure-general-flow::. +data and produce results. See *note figure-general-flow::. _______ +------+ / \ +---------+ @@ -27511,9 +28259,6 @@ Bash The GNU version of the standard shell (the Bourne-Again SHell). 
See also "Bourne Shell." -BBS - See "Bulletin Board System." - Bit Short for "Binary Digit." All values in computer memory ultimately reduce to binary digits: values that are either zero or @@ -27559,11 +28304,6 @@ Built-in Variable Braces See "Curly Braces." -Bulletin Board System - A computer system allowing users to log in and read and/or leave - messages for other users of the system, much like leaving paper - notes on a bulletin board. - C The system programming language that most GNU software is written in. The `awk' programming language has C-like syntax, and this @@ -27670,8 +28410,8 @@ Dynamic Regular Expression (*Note Computed Regexps::.) Environment - A collection of strings, of the form NAME`='VAL, that each program - has available to it. Users generally place values into the + A collection of strings, of the form NAME`='`val', that each + program has available to it. Users generally place values into the environment in order to provide information to various programs. Typical examples are the environment variables `HOME' and `PATH'. @@ -28086,7 +28826,6 @@ GNU General Public License ************************** Version 3, 29 June 2007 - Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/' Everyone is permitted to copy and distribute verbatim copies of this @@ -28809,7 +29548,6 @@ GNU Free Documentation License ****************************** Version 1.3, 3 November 2008 - Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. `http://fsf.org/' @@ -29313,17 +30051,17 @@ Index * ! (exclamation point), !~ operator <5>: Computed Regexps. (line 6) * ! (exclamation point), !~ operator <6>: Case-sensitivity. (line 26) * ! (exclamation point), !~ operator: Regexp Usage. (line 19) -* " (double quote) <1>: Quoting. (line 37) -* " (double quote): Read Terminal. (line 25) +* " (double quote) in shell commands: Read Terminal. (line 25) * " (double quote), in regexp constants: Computed Regexps. 
(line 28) +* " (double quote), in shell commands: Quoting. (line 37) * # (number sign), #! (executable scripts): Executable Scripts. (line 6) * # (number sign), commenting: Comments. (line 6) -* $ (dollar sign): Regexp Operators. (line 35) * $ (dollar sign), $ field operator <1>: Precedence. (line 43) * $ (dollar sign), $ field operator: Fields. (line 19) * $ (dollar sign), incrementing fields and arrays: Increment Ops. (line 30) +* $ (dollar sign), regexp operator: Regexp Operators. (line 35) * % (percent sign), % operator: Precedence. (line 55) * % (percent sign), %= operator <1>: Precedence. (line 95) * % (percent sign), %= operator: Assignment Ops. (line 129) @@ -29331,13 +30069,13 @@ Index * & (ampersand), && operator: Boolean Ops. (line 57) * & (ampersand), gsub()/gensub()/sub() functions and: Gory Details. (line 6) -* ' (single quote) <1>: Quoting. (line 31) -* ' (single quote) <2>: Long. (line 33) * ' (single quote): One-shot. (line 15) +* ' (single quote) in gawk command lines: Long. (line 33) +* ' (single quote), in shell commands: Quoting. (line 31) * ' (single quote), vs. apostrophe: Comments. (line 27) * ' (single quote), with double quotes: Quoting. (line 53) -* () (parentheses) <1>: Profiling. (line 138) -* () (parentheses): Regexp Operators. (line 79) +* () (parentheses), in a profile: Profiling. (line 146) +* () (parentheses), regexp operator: Regexp Operators. (line 79) * * (asterisk), * operator, as multiplication operator: Precedence. (line 55) * * (asterisk), * operator, as regexp operator: Regexp Operators. @@ -29350,40 +30088,41 @@ Index * * (asterisk), **= operator: Assignment Ops. (line 129) * * (asterisk), *= operator <1>: Precedence. (line 95) * * (asterisk), *= operator: Assignment Ops. (line 129) -* + (plus sign): Regexp Operators. (line 102) * + (plus sign), + operator: Precedence. (line 52) * + (plus sign), ++ operator <1>: Precedence. (line 46) * + (plus sign), ++ operator: Increment Ops. 
(line 11) * + (plus sign), += operator <1>: Precedence. (line 95) * + (plus sign), += operator: Assignment Ops. (line 82) +* + (plus sign), regexp operator: Regexp Operators. (line 102) * , (comma), in range patterns: Ranges. (line 6) * - (hyphen), - operator: Precedence. (line 52) * - (hyphen), -- operator <1>: Precedence. (line 46) * - (hyphen), -- operator: Increment Ops. (line 48) * - (hyphen), -= operator <1>: Precedence. (line 95) * - (hyphen), -= operator: Assignment Ops. (line 129) -* - (hyphen), filenames beginning with: Options. (line 73) +* - (hyphen), filenames beginning with: Options. (line 59) * - (hyphen), in bracket expressions: Bracket Expressions. (line 17) -* --assign option: Options. (line 46) +* --assign option: Options. (line 32) * --bignum option: Options. (line 201) -* --c option: Options. (line 95) -* --characters-as-bytes option: Options. (line 82) -* --copyright option: Options. (line 102) -* --debug option: Options. (line 122) -* --disable-lint configuration option: Additional Configuration Options. +* --characters-as-bytes option: Options. (line 68) +* --copyright option: Options. (line 88) +* --debug option: Options. (line 108) +* --disable-extensions configuration option: Additional Configuration Options. (line 9) +* --disable-lint configuration option: Additional Configuration Options. + (line 15) * --disable-nls configuration option: Additional Configuration Options. - (line 24) -* --dump-variables option <1>: Library Names. (line 45) -* --dump-variables option: Options. (line 107) -* --exec option: Options. (line 139) + (line 30) +* --dump-variables option: Options. (line 93) +* --dump-variables option, using for library functions: Library Names. + (line 45) +* --exec option: Options. (line 125) * --field-separator option: Options. (line 21) * --file option: Options. (line 25) * --gen-pot option <1>: String Extraction. (line 6) -* --gen-pot option: Options. (line 161) -* --help option: Options. 
(line 168) -* --include option: Options. (line 32) -* --L option: Options. (line 288) +* --gen-pot option: Options. (line 147) +* --help option: Options. (line 154) +* --include option: Options. (line 159) * --lint option <1>: Options. (line 182) * --lint option: Command Line. (line 20) * --lint-old option: Options. (line 288) @@ -29405,29 +30144,31 @@ Index * --sandbox option, input redirection with getline: Getline. (line 19) * --sandbox option, output redirection with print, printf: Redirection. (line 6) -* --source option: Options. (line 131) -* --traditional option: Options. (line 95) +* --source option: Options. (line 117) +* --traditional option: Options. (line 81) * --traditional option, --posix option and: Options. (line 266) * --use-lc-numeric option: Options. (line 215) * --version option: Options. (line 293) * --with-whiny-user-strftime configuration option: Additional Configuration Options. - (line 29) -* -b option: Options. (line 82) -* -C option: Options. (line 102) -* -D option: Options. (line 122) -* -d option: Options. (line 107) -* -E option: Options. (line 139) -* -e option: Options. (line 131) -* -F option: Command Line Field Separator. - (line 6) + (line 35) +* -b option: Options. (line 68) +* -C option: Options. (line 88) +* -c option: Options. (line 81) +* -D option: Options. (line 108) +* -d option: Options. (line 93) +* -E option: Options. (line 125) +* -e option: Options. (line 117) * -f option: Options. (line 25) * -F option: Options. (line 21) * -f option: Long. (line 12) * -F option, -Ft sets FS to TAB: Options. (line 301) +* -F option, command line: Command Line Field Separator. + (line 6) * -f option, multiple uses: Options. (line 306) -* -g option: Options. (line 161) -* -h option: Options. (line 168) -* -i option: Options. (line 32) +* -g option: Options. (line 147) +* -h option: Options. (line 154) +* -i option: Options. (line 159) +* -L option: Options. (line 288) * -l option: Options. (line 173) * -M option: Options. 
(line 201) * -N option: Options. (line 215) @@ -29440,9 +30181,9 @@ Index * -S option: Options. (line 279) * -v option: Assignment Options. (line 12) * -V option: Options. (line 293) -* -v option: Options. (line 46) -* -W option: Options. (line 60) -* . (period): Regexp Operators. (line 43) +* -v option: Options. (line 32) +* -W option: Options. (line 46) +* . (period), regexp operator: Regexp Operators. (line 43) * .gmo files: Explaining gettext. (line 41) * .gmo files, converting from .po: I18N Example. (line 62) * .gmo files, specifying directory of <1>: Programmer i18n. (line 47) @@ -29451,7 +30192,7 @@ Index * .po files: Explaining gettext. (line 36) * .po files, converting to .gmo: I18N Example. (line 62) * .pot files: Explaining gettext. (line 30) -* / (forward slash): Regexp. (line 10) +* / (forward slash) to enclose regular expressions: Regexp. (line 10) * / (forward slash), / operator: Precedence. (line 55) * / (forward slash), /= operator <1>: Precedence. (line 95) * / (forward slash), /= operator: Assignment Ops. (line 129) @@ -29459,17 +30200,18 @@ Index (line 147) * / (forward slash), patterns and: Expression Patterns. (line 24) * /= operator vs. /=.../ regexp constant: Assignment Ops. (line 147) -* /dev/... special files (gawk): Special FD. (line 46) -* /dev/fd/N special files: Special FD. (line 46) +* /dev/... special files: Special FD. (line 46) +* /dev/fd/N special files (gawk): Special FD. (line 46) * /inet/... special files (gawk): TCP/IP Networking. (line 6) * /inet4/... special files (gawk): TCP/IP Networking. (line 6) * /inet6/... special files (gawk): TCP/IP Networking. (line 6) -* ; (semicolon): Statements/Lines. (line 91) -* ; (semicolon), AWKPATH variable and: PC Using. (line 11) +* ; (semicolon), AWKPATH variable and: PC Using. (line 10) * ; (semicolon), separating statements in actions <1>: Statements. (line 10) -* ; (semicolon), separating statements in actions: Action Overview. 
+* ; (semicolon), separating statements in actions <2>: Action Overview. (line 19) +* ; (semicolon), separating statements in actions: Statements/Lines. + (line 91) * < (left angle bracket), < operator <1>: Precedence. (line 65) * < (left angle bracket), < operator: Comparison Operators. (line 11) @@ -29490,15 +30232,13 @@ Index (line 11) * > (right angle bracket), >> operator (I/O) <1>: Precedence. (line 65) * > (right angle bracket), >> operator (I/O): Redirection. (line 50) -* ? (question mark) regexp operator <1>: GNU Regexp Operators. - (line 59) -* ? (question mark) regexp operator: Regexp Operators. (line 111) * ? (question mark), ?: operator: Precedence. (line 92) -* [] (square brackets): Regexp Operators. (line 55) -* \ (backslash) <1>: Regexp Operators. (line 18) -* \ (backslash) <2>: Quoting. (line 31) -* \ (backslash) <3>: Comments. (line 50) -* \ (backslash): Read Terminal. (line 25) +* ? (question mark), regexp operator <1>: GNU Regexp Operators. + (line 59) +* ? (question mark), regexp operator: Regexp Operators. (line 111) +* [] (square brackets), regexp operator: Regexp Operators. (line 55) +* \ (backslash): Comments. (line 50) +* \ (backslash) in shell commands: Read Terminal. (line 25) * \ (backslash), \" escape sequence: Escape Sequences. (line 76) * \ (backslash), \' operator (gawk): GNU Regexp Operators. (line 56) @@ -29545,22 +30285,27 @@ Index * \ (backslash), in escape sequences, POSIX and: Escape Sequences. (line 112) * \ (backslash), in regexp constants: Computed Regexps. (line 28) -* ^ (caret): GNU Regexp Operators. - (line 59) +* \ (backslash), in shell commands: Quoting. (line 31) +* \ (backslash), regexp operator: Regexp Operators. (line 18) * ^ (caret), ^ operator: Precedence. (line 49) * ^ (caret), ^= operator <1>: Precedence. (line 95) * ^ (caret), ^= operator: Assignment Ops. (line 129) * ^ (caret), in bracket expressions: Bracket Expressions. (line 17) -* ^ (caret), regexp operator: Regexp Operators. 
(line 22) -* ^, in FS: Regexp Field Splitting. +* ^ (caret), in FS: Regexp Field Splitting. + (line 59) +* ^ (caret), regexp operator <1>: GNU Regexp Operators. (line 59) +* ^ (caret), regexp operator: Regexp Operators. (line 22) * _ (underscore), C macro: Explaining gettext. (line 70) * _ (underscore), in names of private variables: Library Names. (line 29) * _ (underscore), translatable string: Programmer i18n. (line 69) * _gr_init() user-defined function: Group Functions. (line 82) +* _ord_init() user-defined function: Ordinal Functions. (line 16) * _pw_init() user-defined function: Passwd Functions. (line 105) * accessing fields: Fields. (line 6) +* accessing global variables from extensions: Symbol Table Access. + (line 6) * account information <1>: Group Functions. (line 6) * account information: Passwd Functions. (line 16) * actions: Action Overview. (line 6) @@ -29570,21 +30315,21 @@ Index * Ada programming language: Glossary. (line 20) * adding, features to gawk: Adding Code. (line 6) * adding, fields: Changing Fields. (line 53) -* advanced features, fixed-width data: Constant Size. (line 9) +* advanced features, fixed-width data: Constant Size. (line 10) * advanced features, gawk: Advanced Features. (line 6) -* advanced features, network connections, See Also networks, connections: Advanced Features. - (line 6) * advanced features, network programming: TCP/IP Networking. (line 6) * advanced features, nondecimal input data: Nondecimal Data. (line 6) * advanced features, processes, communicating with: Two-way I/O. (line 23) * advanced features, specifying field content: Splitting By Content. - (line 9) -* Aho, Alfred <1>: Contributors. (line 12) + (line 10) +* Aho, Alfred <1>: Contributors. (line 11) * Aho, Alfred: History. (line 17) -* alarm clock example program: Alarm Program. (line 9) -* alarm.awk program: Alarm Program. (line 29) +* alarm clock example program: Alarm Program. (line 11) +* alarm.awk program: Alarm Program. 
(line 31) * algorithms: Basic High Level. (line 68) +* allocating memory for extensions: Memory Allocation Functions. + (line 6) * Alpha (DEC): Manual History. (line 28) * amazing awk assembler (aaa): Glossary. (line 12) * amazingly workable formatter (awf): Glossary. (line 25) @@ -29595,17 +30340,25 @@ Index * ampersand (&), gsub()/gensub()/sub() functions and: Gory Details. (line 6) * anagram.awk program: Anagram Program. (line 22) +* anagrams, finding: Anagram Program. (line 6) +* and: Bitwise Functions. (line 39) * AND bitwise operation: Bitwise Functions. (line 6) * and Boolean-logic operator: Boolean Ops. (line 6) -* and() function (gawk): Bitwise Functions. (line 39) * ANSI: Glossary. (line 35) +* API informational variables: Extension API Informational Variables. + (line 6) +* API version: Extension Versioning. + (line 6) * arbitrary precision: Arbitrary Precision Arithmetic. (line 6) +* arbitrary precision integers: Arbitrary Precision Integers. + (line 6) * archeologists: Bugs. (line 6) -* ARGC/ARGV variables <1>: ARGC and ARGV. (line 6) +* arctangent: Numeric Functions. (line 11) * ARGC/ARGV variables: Auto-set. (line 11) * ARGC/ARGV variables, command-line arguments: Other Arguments. (line 12) +* ARGC/ARGV variables, how to use: ARGC and ARGV. (line 6) * ARGC/ARGV variables, portability and: Executable Scripts. (line 42) * ARGIND variable: Auto-set. (line 40) * ARGIND variable, command-line arguments: Other Arguments. (line 12) @@ -29615,54 +30368,74 @@ Index * arguments, command-line, invoking awk: Command Line. (line 6) * arguments, in function calls: Function Calls. (line 16) * arguments, processing: Getopt Function. (line 6) +* ARGV array, indexing into: Other Arguments. (line 12) * arithmetic operators: Arithmetic Ops. (line 6) +* array manipulation in extensions: Array Manipulation. (line 6) +* array members: Reference to Elements. + (line 6) +* array scanning order, controlling: Controlling Scanning. 
+ (line 12) +* array, number of elements: String Functions. (line 194) * arrays: Arrays. (line 6) +* arrays of arrays: Arrays of Arrays. (line 6) +* arrays, an example of using: Array Example. (line 6) +* arrays, and IGNORECASE variable: Array Intro. (line 91) * arrays, as parameters to functions: Pass By Value/Reference. (line 47) -* arrays, associative: Array Intro. (line 50) +* arrays, associative: Array Intro. (line 49) * arrays, associative, library functions and: Library Names. (line 57) * arrays, deleting entire contents: Delete. (line 39) -* arrays, elements, assigning: Assigning Elements. (line 6) +* arrays, elements that don't exist: Reference to Elements. + (line 23) +* arrays, elements, assigning values: Assigning Elements. (line 6) * arrays, elements, deleting: Delete. (line 6) -* arrays, elements, order of: Scanning an Array. (line 48) -* arrays, elements, referencing: Reference to Elements. - (line 6) -* arrays, elements, retrieving number of: String Functions. (line 29) +* arrays, elements, order of access by in operator: Scanning an Array. + (line 48) +* arrays, elements, retrieving number of: String Functions. (line 32) * arrays, for statement and: Scanning an Array. (line 20) -* arrays, IGNORECASE variable and: Array Intro. (line 92) -* arrays, indexing: Array Intro. (line 50) +* arrays, indexing: Array Intro. (line 49) * arrays, merging into strings: Join Function. (line 6) -* arrays, multidimensional: Multi-dimensional. (line 10) -* arrays, multidimensional, scanning: Multi-scanning. (line 11) -* arrays, names of: Arrays. (line 18) +* arrays, multidimensional: Multidimensional. (line 10) +* arrays, multidimensional, scanning: Multiscanning. (line 11) +* arrays, names of, and names of functions/variables: Arrays. (line 18) +* arrays, numeric subscripts: Numeric Array Subscripts. + (line 6) +* arrays, referencing elements: Reference to Elements. + (line 6) * arrays, scanning: Scanning an Array. (line 6) * arrays, sorting: Array Sorting Functions. 
(line 6) -* arrays, sorting, IGNORECASE variable and: Array Sorting Functions. - (line 81) -* arrays, sparse: Array Intro. (line 71) -* arrays, subscripts: Numeric Array Subscripts. - (line 6) +* arrays, sorting, and IGNORECASE variable: Array Sorting Functions. + (line 83) +* arrays, sparse: Array Intro. (line 70) * arrays, subscripts, uninitialized variables as: Uninitialized Subscripts. (line 6) +* arrays, unassigned elements: Reference to Elements. + (line 18) * artificial intelligence, gawk and: Distribution contents. - (line 55) -* ASCII <1>: Glossary. (line 141) + (line 52) +* ASCII <1>: Glossary. (line 133) * ASCII: Ordinal Functions. (line 45) -* asort() function (gawk) <1>: Array Sorting Functions. +* asort <1>: Array Sorting Functions. (line 6) -* asort() function (gawk): String Functions. (line 29) +* asort: String Functions. (line 32) * asort() function (gawk), arrays, sorting: Array Sorting Functions. (line 6) -* asorti() function (gawk): String Functions. (line 77) +* asorti <1>: Array Sorting Functions. + (line 6) +* asorti: String Functions. (line 32) +* asorti() function (gawk), arrays, sorting: Array Sorting Functions. + (line 6) * assert() function (C library): Assert Function. (line 6) * assert() user-defined function: Assert Function. (line 28) * assertions: Assert Function. (line 6) +* assign values to variables, in debugger: Viewing And Changing Data. + (line 59) * assignment operators: Assignment Ops. (line 6) * assignment operators, evaluation order: Assignment Ops. (line 111) * assignment operators, lvalues/rvalues: Assignment Ops. (line 32) * assignments as filenames: Ignoring Assigns. (line 6) -* associative arrays: Array Intro. (line 50) +* associative arrays: Array Intro. (line 49) * asterisk (*), * operator, as multiplication operator: Precedence. (line 55) * asterisk (*), * operator, as regexp operator: Regexp Operators. @@ -29675,9 +30448,10 @@ Index * asterisk (*), **= operator: Assignment Ops. 
(line 129) * asterisk (*), *= operator <1>: Precedence. (line 95) * asterisk (*), *= operator: Assignment Ops. (line 129) -* atan2() function: Numeric Functions. (line 11) +* atan2: Numeric Functions. (line 11) +* automatic displays, in debugger: Debugger Info. (line 24) * awf (amazingly workable formatter) program: Glossary. (line 25) -* awk debugging, enabling: Options. (line 122) +* awk debugging, enabling: Options. (line 108) * awk language, POSIX version: Assignment Ops. (line 136) * awk profiling, enabling: Options. (line 235) * awk programs <1>: Two Rules. (line 6) @@ -29714,7 +30488,7 @@ Index * awk, POSIX and: Preface. (line 23) * awk, POSIX and, See Also POSIX awk: Preface. (line 23) * awk, regexp constants and: Comparison Operators. - (line 103) + (line 102) * awk, See Also gawk: Preface. (line 36) * awk, terms describing: This Manual. (line 6) * awk, uses for <1>: When. (line 6) @@ -29731,16 +30505,14 @@ Index * awk, versions of, See Also Brian Kernighan's awk: BTL. (line 6) * awka compiler for awk: Other Versions. (line 64) * AWKLIBPATH environment variable: AWKLIBPATH Variable. (line 6) -* AWKPATH environment variable <1>: PC Using. (line 11) +* AWKPATH environment variable <1>: PC Using. (line 10) * AWKPATH environment variable: AWKPATH Variable. (line 6) * awkprof.out file: Profiling. (line 6) * awksed.awk program: Simple Sed. (line 25) -* awkvars.out file: Options. (line 107) +* awkvars.out file: Options. (line 93) * b debugger command (alias for break): Breakpoint Control. (line 11) -* backslash (\) <1>: Regexp Operators. (line 18) -* backslash (\) <2>: Quoting. (line 31) -* backslash (\) <3>: Comments. (line 50) -* backslash (\): Read Terminal. (line 25) +* backslash (\): Comments. (line 50) +* backslash (\) in shell commands: Read Terminal. (line 25) * backslash (\), \" escape sequence: Escape Sequences. (line 76) * backslash (\), \' operator (gawk): GNU Regexp Operators. 
(line 56) @@ -29787,17 +30559,18 @@ Index * backslash (\), in escape sequences, POSIX and: Escape Sequences. (line 112) * backslash (\), in regexp constants: Computed Regexps. (line 28) +* backslash (\), in shell commands: Quoting. (line 31) +* backslash (\), regexp operator: Regexp Operators. (line 18) * backtrace debugger command: Execution Stack. (line 13) -* BBS-list file: Sample Data Files. (line 6) -* Beebe, Nelson <1>: Other Versions. (line 78) -* Beebe, Nelson: Acknowledgments. (line 60) -* BEGIN pattern <1>: Profiling. (line 62) +* Beebe, Nelson H.F. <1>: Other Versions. (line 78) +* Beebe, Nelson H.F.: Acknowledgments. (line 60) +* BEGIN pattern <1>: Using BEGIN/END. (line 6) * BEGIN pattern <2>: BEGIN/END. (line 6) -* BEGIN pattern <3>: Field Separators. (line 44) -* BEGIN pattern: Records. (line 29) +* BEGIN pattern: Field Separators. (line 45) +* BEGIN pattern, and profiling: Profiling. (line 62) * BEGIN pattern, assert() user-defined function and: Assert Function. (line 83) -* BEGIN pattern, Boolean patterns and: Expression Patterns. (line 73) +* BEGIN pattern, Boolean patterns and: Expression Patterns. (line 70) * BEGIN pattern, exit statement and: Exit Statement. (line 12) * BEGIN pattern, getline and: Getline Notes. (line 19) * BEGIN pattern, headings, adding: Print Examples. (line 43) @@ -29814,29 +30587,36 @@ Index * BEGIN pattern, TEXTDOMAIN variable and: Programmer i18n. (line 60) * BEGINFILE pattern: BEGINFILE/ENDFILE. (line 6) * BEGINFILE pattern, Boolean patterns and: Expression Patterns. - (line 73) + (line 70) * beginfile() user-defined function: Filetrans Function. (line 62) -* Benzinger, Michael: Contributors. (line 98) +* Bentley, Jon: Glossary. (line 143) +* Benzinger, Michael: Contributors. (line 97) +* Berry, Karl <1>: Ranges and Locales. (line 74) * Berry, Karl: Acknowledgments. (line 33) * binary input/output: User-modified. (line 10) +* bindtextdomain <1>: Programmer i18n. (line 47) +* bindtextdomain: I18N Functions. 
(line 12) * bindtextdomain() function (C library): Explaining gettext. (line 49) -* bindtextdomain() function (gawk) <1>: Programmer i18n. (line 47) -* bindtextdomain() function (gawk): I18N Functions. (line 12) * bindtextdomain() function (gawk), portability and: I18N Portability. (line 33) -* BINMODE variable <1>: PC Using. (line 34) +* BINMODE variable <1>: PC Using. (line 33) * BINMODE variable: User-modified. (line 10) +* bit-manipulation functions: Bitwise Functions. (line 6) * bits2str() user-defined function: Bitwise Functions. (line 70) +* bitwise AND: Bitwise Functions. (line 39) +* bitwise complement: Bitwise Functions. (line 43) +* bitwise OR: Bitwise Functions. (line 49) +* bitwise XOR: Bitwise Functions. (line 55) * bitwise, complement: Bitwise Functions. (line 25) * bitwise, operations: Bitwise Functions. (line 6) * bitwise, shift: Bitwise Functions. (line 32) * body, in actions: Statements. (line 10) * body, in loops: While Statement. (line 14) * Boolean expressions: Boolean Ops. (line 6) -* Boolean expressions, as patterns: Expression Patterns. (line 41) +* Boolean expressions, as patterns: Expression Patterns. (line 39) * Boolean operators, See Boolean expressions: Boolean Ops. (line 6) * Bourne shell, quoting rules for: Quoting. (line 18) -* braces ({}): Profiling. (line 134) +* braces ({}): Profiling. (line 142) * braces ({}), actions and: Action Overview. (line 19) * braces ({}), statements, grouping: Statements. (line 10) * bracket expressions <1>: Bracket Expressions. (line 6) @@ -29855,18 +30635,45 @@ Index (line 6) * break debugger command: Breakpoint Control. (line 11) * break statement: Break Statement. (line 6) +* breakpoint: Debugging Terms. (line 33) +* breakpoint at location, how to delete: Breakpoint Control. (line 36) +* breakpoint commands: Debugger Execution Control. + (line 10) +* breakpoint condition: Breakpoint Control. (line 54) +* breakpoint, delete by number: Breakpoint Control. 
(line 64) +* breakpoint, how to disable or enable: Breakpoint Control. (line 69) +* breakpoint, setting: Breakpoint Control. (line 11) * Brennan, Michael <1>: Other Versions. (line 6) * Brennan, Michael <2>: Two-way I/O. (line 6) * Brennan, Michael <3>: Simple Sed. (line 25) -* Brennan, Michael: Delete. (line 56) -* Brian Kernighan's awk: Other Versions. (line 13) +* Brennan, Michael <4>: Delete. (line 56) +* Brennan, Michael: Foreword. (line 83) +* Brian Kernighan's awk <1>: I/O Functions. (line 40) +* Brian Kernighan's awk <2>: Gory Details. (line 15) +* Brian Kernighan's awk <3>: String Functions. (line 490) +* Brian Kernighan's awk <4>: Delete. (line 48) +* Brian Kernighan's awk <5>: Nextfile Statement. (line 47) +* Brian Kernighan's awk <6>: Continue Statement. (line 43) +* Brian Kernighan's awk <7>: Break Statement. (line 51) +* Brian Kernighan's awk <8>: I/O And BEGIN/END. (line 16) +* Brian Kernighan's awk <9>: Concatenation. (line 36) +* Brian Kernighan's awk <10>: Getline/Pipe. (line 62) +* Brian Kernighan's awk <11>: Regexp Field Splitting. + (line 67) +* Brian Kernighan's awk <12>: GNU Regexp Operators. + (line 83) +* Brian Kernighan's awk <13>: Escape Sequences. (line 116) +* Brian Kernighan's awk <14>: When. (line 21) +* Brian Kernighan's awk: Preface. (line 15) * Brian Kernighan's awk, extensions: BTL. (line 6) -* Broder, Alan J.: Contributors. (line 89) -* Brown, Martin: Contributors. (line 83) -* BSD-based operating systems: Glossary. (line 624) +* Brian Kernighan's awk, source code: Other Versions. (line 13) +* Brini, Davide: Signature Program. (line 6) +* Broder, Alan J.: Contributors. (line 88) +* Brown, Martin: Contributors. (line 82) +* BSD-based operating systems: Glossary. (line 616) * bt debugger command (alias for backtrace): Execution Stack. (line 13) -* Buening, Andreas <1>: Bugs. (line 71) -* Buening, Andreas <2>: Contributors. (line 93) +* Buening, Andreas <1>: Bugs. (line 70) +* Buening, Andreas <2>: Contributors. 
(line 92) * Buening, Andreas: Acknowledgments. (line 60) * buffering, input/output <1>: Two-way I/O. (line 70) * buffering, input/output: I/O Functions. (line 137) @@ -29879,7 +30686,7 @@ Index * built-in functions: Functions. (line 6) * built-in functions, evaluation order: Calling Built-in. (line 30) * built-in variables: Built-in Variables. (line 6) -* built-in variables, -v option, setting with: Options. (line 54) +* built-in variables, -v option, setting with: Options. (line 40) * built-in variables, conveying information: Auto-set. (line 6) * built-in variables, user-modifiable: User-modified. (line 6) * Busybox Awk: Other Versions. (line 88) @@ -29887,24 +30694,29 @@ Index (line 47) * call by value: Pass By Value/Reference. (line 18) -* caret (^): GNU Regexp Operators. - (line 59) +* call stack, display in debugger: Execution Stack. (line 13) * caret (^), ^ operator: Precedence. (line 49) * caret (^), ^= operator <1>: Precedence. (line 95) * caret (^), ^= operator: Assignment Ops. (line 129) * caret (^), in bracket expressions: Bracket Expressions. (line 17) +* caret (^), regexp operator <1>: GNU Regexp Operators. + (line 59) * caret (^), regexp operator: Regexp Operators. (line 22) * case keyword: Switch Statement. (line 6) -* case sensitivity, array indices and: Array Intro. (line 92) -* case sensitivity, converting case: String Functions. (line 524) +* case sensitivity, and regexps: User-modified. (line 82) +* case sensitivity, and string comparisons: User-modified. (line 82) +* case sensitivity, array indices and: Array Intro. (line 91) +* case sensitivity, converting case: String Functions. (line 520) * case sensitivity, example programs: Library Functions. (line 53) * case sensitivity, gawk: Case-sensitivity. (line 26) -* case sensitivity, regexps and <1>: User-modified. (line 82) * case sensitivity, regexps and: Case-sensitivity. (line 6) -* case sensitivity, string comparisons and: User-modified. (line 82) -* CGI, awk scripts for: Options. 
(line 139) +* CGI, awk scripts for: Options. (line 125) +* changing precision of a number: Changing Precision. (line 6) +* character classes, See bracket expressions: Regexp Operators. + (line 55) +* character lists in regular expression: Bracket Expressions. (line 6) * character lists, See bracket expressions: Regexp Operators. (line 55) -* character sets (machine character encodings) <1>: Glossary. (line 141) +* character sets (machine character encodings) <1>: Glossary. (line 133) * character sets (machine character encodings): Ordinal Functions. (line 45) * character sets, See Also bracket expressions: Regexp Operators. @@ -29913,10 +30725,10 @@ Index * characters, transliterating: Translate Program. (line 6) * characters, values of as numbers: Ordinal Functions. (line 6) * Chassell, Robert J.: Acknowledgments. (line 33) -* chdir extension function: Extension Sample File Functions. +* chdir() extension function: Extension Sample File Functions. (line 12) -* chem utility: Glossary. (line 151) -* chr extension function: Extension Sample Ord. +* chem utility: Glossary. (line 143) +* chr() extension function: Extension Sample Ord. (line 15) * chr() user-defined function: Ordinal Functions. (line 16) * clear debugger command: Breakpoint Control. (line 36) @@ -29924,24 +30736,26 @@ Index (line 6) * cliff_rand() user-defined function: Cliff Random Function. (line 12) -* close() function <1>: I/O Functions. (line 10) -* close() function <2>: Close Files And Pipes. +* close <1>: I/O Functions. (line 10) +* close: Close Files And Pipes. (line 18) -* close() function <3>: Getline/Pipe. (line 28) -* close() function: Getline/Variable/File. - (line 30) +* close file or coprocess: I/O Functions. (line 10) +* close() function, portability: Close Files And Pipes. + (line 81) * close() function, return value: Close Files And Pipes. (line 130) * close() function, two-way pipes and: Two-way I/O. (line 77) -* Close, Diane <1>: Contributors. 
(line 21) +* Close, Diane <1>: Contributors. (line 20) * Close, Diane: Manual History. (line 41) * Collado, Manuel: Acknowledgments. (line 60) * collating elements: Bracket Expressions. (line 69) * collating symbols: Bracket Expressions. (line 76) +* Colombo, Antonio <1>: Contributors. (line 135) * Colombo, Antonio: Acknowledgments. (line 60) * columns, aligning: Print Examples. (line 70) * columns, cutting: Cut Program. (line 6) * comma (,), in range patterns: Ranges. (line 6) +* command completion, in debugger: Readline Support. (line 6) * command line, arguments <1>: ARGC and ARGV. (line 6) * command line, arguments <2>: Auto-set. (line 11) * command line, arguments: Other Arguments. (line 6) @@ -29951,16 +30765,16 @@ Index * command line, FS on, setting: Command Line Field Separator. (line 6) * command line, invoking awk from: Command Line. (line 6) -* command line, options <1>: Command Line Field Separator. - (line 6) -* command line, options <2>: Options. (line 6) -* command line, options: Long. (line 12) -* command line, options, end of: Options. (line 68) +* command line, option -f: Long. (line 12) +* command line, options: Options. (line 6) +* command line, options, end of: Options. (line 54) * command line, variables, assigning on: Assignment Options. (line 6) * command-line options, processing: Getopt Function. (line 6) * command-line options, string extraction: String Extraction. (line 6) * commands debugger command: Debugger Execution Control. (line 10) +* commands to execute at breakpoint: Debugger Execution Control. + (line 10) * commenting: Comments. (line 6) * commenting, backslash continuation and: Statements/Lines. (line 76) * common extensions, ** operator: Arithmetic Ops. (line 30) @@ -29969,12 +30783,12 @@ Index * common extensions, /dev/stdin special file: Special FD. (line 46) * common extensions, /dev/stdout special file: Special FD. (line 46) * common extensions, \x escape sequence: Escape Sequences. 
(line 61) -* common extensions, BINMODE variable: PC Using. (line 34) +* common extensions, BINMODE variable: PC Using. (line 33) * common extensions, delete to delete entire arrays: Delete. (line 39) * common extensions, func keyword: Definition Syntax. (line 83) * common extensions, length() applied to an array: String Functions. - (line 198) -* common extensions, RS as a regexp: Records. (line 120) + (line 194) +* common extensions, RS as a regexp: Records. (line 135) * common extensions, single character fields: Single Character Fields. (line 6) * comp.lang.awk newsgroup: Bugs. (line 38) @@ -29982,74 +30796,89 @@ Index (line 9) * comparison expressions, as patterns: Expression Patterns. (line 14) * comparison expressions, string vs. regexp: Comparison Operators. - (line 79) + (line 78) * compatibility mode (gawk), extensions: POSIX/GNU. (line 6) * compatibility mode (gawk), file names: Special Caveats. (line 9) * compatibility mode (gawk), hexadecimal numbers: Nondecimal-numbers. (line 60) * compatibility mode (gawk), octal numbers: Nondecimal-numbers. (line 60) -* compatibility mode (gawk), specifying: Options. (line 95) -* compiled programs <1>: Glossary. (line 165) +* compatibility mode (gawk), specifying: Options. (line 81) +* compiled programs <1>: Glossary. (line 157) * compiled programs: Basic High Level. (line 15) * compiling gawk for Cygwin: Cygwin. (line 6) * compiling gawk for MS-DOS and MS-Windows: PC Compiling. (line 13) * compiling gawk for VMS: VMS Compilation. (line 6) * compiling gawk with EMX for OS/2: PC Compiling. (line 28) -* compl() function (gawk): Bitwise Functions. (line 43) +* compl: Bitwise Functions. (line 43) * complement, bitwise: Bitwise Functions. (line 25) * compound statements, control statements and: Statements. (line 10) -* concatenating: Concatenation. (line 9) +* concatenating: Concatenation. (line 8) * condition debugger command: Breakpoint Control. (line 54) * conditional expressions: Conditional Exp. 
(line 6) -* configuration option, --disable-lint: Additional Configuration Options. +* configuration option, --disable-extensions: Additional Configuration Options. (line 9) +* configuration option, --disable-lint: Additional Configuration Options. + (line 15) * configuration option, --disable-nls: Additional Configuration Options. - (line 24) + (line 30) * configuration option, --with-whiny-user-strftime: Additional Configuration Options. - (line 29) + (line 35) * configuration options, gawk: Additional Configuration Options. (line 6) +* constant regexps: Regexp Usage. (line 57) * constants, floating-point: Floating-point Constants. (line 6) * constants, nondecimal: Nondecimal Data. (line 6) +* constants, numeric: Scalar Constants. (line 6) * constants, types of: Constants. (line 6) * context, floating-point: Floating-point Context. (line 6) +* continue program, in debugger: Debugger Execution Control. + (line 33) * continue statement: Continue Statement. (line 6) * control statements: Statements. (line 6) -* converting, case: String Functions. (line 524) -* converting, dates to timestamps: Time Functions. (line 75) -* converting, during subscripting: Numeric Array Subscripts. +* controlling array scanning order: Controlling Scanning. + (line 12) +* convert string to lower case: String Functions. (line 521) +* convert string to number: String Functions. (line 385) +* convert string to upper case: String Functions. (line 527) +* converting integer array subscripts: Numeric Array Subscripts. (line 31) +* converting, dates to timestamps: Time Functions. (line 76) * converting, numbers to strings <1>: Bitwise Functions. (line 109) * converting, numbers to strings: Conversion. (line 6) * converting, strings to numbers <1>: Bitwise Functions. (line 109) * converting, strings to numbers: Conversion. (line 6) * CONVFMT variable <1>: User-modified. (line 28) * CONVFMT variable: Conversion. (line 29) -* CONVFMT variable, array subscripts and: Numeric Array Subscripts. 
+* CONVFMT variable, and array subscripts: Numeric Array Subscripts. (line 6) -* cookie: Glossary. (line 157) +* cookie: Glossary. (line 149) * coprocesses <1>: Two-way I/O. (line 44) * coprocesses: Redirection. (line 102) * coprocesses, closing: Close Files And Pipes. (line 6) * coprocesses, getline from: Getline/Coprocess. (line 6) -* cos() function: Numeric Functions. (line 15) +* cos: Numeric Functions. (line 15) +* cosine: Numeric Functions. (line 15) * counting: Wc Program. (line 6) * csh utility: Statements/Lines. (line 44) * csh utility, POSIXLY_CORRECT environment variable: Options. (line 348) * csh utility, |& operator, comparison with: Two-way I/O. (line 44) -* ctime() user-defined function: Function Example. (line 72) +* ctime() user-defined function: Function Example. (line 73) * currency symbols, localization: Explaining gettext. (line 103) +* current system time: Time Functions. (line 66) * custom.h file: Configuration Philosophy. (line 30) +* customized input parser: Input Parsers. (line 6) +* customized output wrapper: Output Wrappers. (line 6) +* customized two-way processor: Two-way processors. (line 6) * cut utility: Cut Program. (line 6) * cut.awk program: Cut Program. (line 45) * d debugger command (alias for delete): Breakpoint Control. (line 64) * d.c., See dark corner: Conventions. (line 38) -* dark corner <1>: Glossary. (line 197) +* dark corner <1>: Glossary. (line 189) * dark corner: Conventions. (line 38) * dark corner, "0" is actually true: Truth Values. (line 24) * dark corner, /= operator vs. /=.../ regexp constant: Assignment Ops. @@ -30070,15 +30899,15 @@ Index * dark corner, exit statement: Exit Statement. (line 30) * dark corner, field separators: Field Splitting Summary. (line 46) -* dark corner, FILENAME variable <1>: Auto-set. (line 93) +* dark corner, FILENAME variable <1>: Auto-set. (line 102) * dark corner, FILENAME variable: Getline Notes. (line 19) -* dark corner, FNR/NR variables: Auto-set. 
(line 314) +* dark corner, FNR/NR variables: Auto-set. (line 323) * dark corner, format-control characters: Control Letters. (line 18) * dark corner, FS as null string: Single Character Fields. (line 20) -* dark corner, input files: Records. (line 103) +* dark corner, input files: Records. (line 118) * dark corner, invoking awk: Command Line. (line 16) -* dark corner, length() function: String Functions. (line 184) +* dark corner, length() function: String Functions. (line 180) * dark corner, locale's decimal point character: Conversion. (line 77) * dark corner, multiline records: Multiple Line. (line 35) * dark corner, NF variable, decrementing: Changing Fields. (line 107) @@ -30089,26 +30918,26 @@ Index (line 147) * dark corner, regexp constants, as arguments to user-defined functions: Using Constant Regexps. (line 43) -* dark corner, split() function: String Functions. (line 363) -* dark corner, strings, storing: Records. (line 195) +* dark corner, split() function: String Functions. (line 359) +* dark corner, strings, storing: Records. (line 210) * dark corner, value of ARGV[0]: Auto-set. (line 35) -* data, fixed-width: Constant Size. (line 9) +* data, fixed-width: Constant Size. (line 10) * data-driven languages: Basic High Level. (line 85) * database, group, reading: Group Functions. (line 6) * database, users, reading: Passwd Functions. (line 6) * date utility, GNU: Time Functions. (line 17) -* date utility, POSIX: Time Functions. (line 262) -* dates, converting to timestamps: Time Functions. (line 75) +* date utility, POSIX: Time Functions. (line 263) +* dates, converting to timestamps: Time Functions. (line 76) * dates, information related to, localization: Explaining gettext. (line 115) -* Davies, Stephen <1>: Contributors. (line 75) +* Davies, Stephen <1>: Contributors. (line 74) * Davies, Stephen: Acknowledgments. (line 60) -* dcgettext() function (gawk) <1>: Programmer i18n. (line 19) -* dcgettext() function (gawk): I18N Functions. 
(line 22) +* dcgettext <1>: Programmer i18n. (line 19) +* dcgettext: I18N Functions. (line 22) * dcgettext() function (gawk), portability and: I18N Portability. (line 33) -* dcngettext() function (gawk) <1>: Programmer i18n. (line 36) -* dcngettext() function (gawk): I18N Functions. (line 28) +* dcngettext <1>: Programmer i18n. (line 36) +* dcngettext: I18N Functions. (line 28) * dcngettext() function (gawk), portability and: I18N Portability. (line 33) * deadlocks: Two-way I/O. (line 70) @@ -30208,20 +31037,33 @@ Index (line 67) * debugger commands, watch: Viewing And Changing Data. (line 67) +* debugger default list amount: Debugger Info. (line 69) +* debugger history file: Debugger Info. (line 80) +* debugger history size: Debugger Info. (line 65) +* debugger options: Debugger Info. (line 57) +* debugger prompt: Debugger Info. (line 77) +* debugger, how to start: Debugger Invocation. (line 6) +* debugger, read commands from a file: Debugger Info. (line 96) * debugging awk programs: Debugger. (line 6) * debugging gawk, bug reports: Bugs. (line 9) * decimal point character, locale specific: Options. (line 263) * decrement operators: Increment Ops. (line 35) * default keyword: Switch Statement. (line 6) * Deifik, Scott <1>: Bugs. (line 70) -* Deifik, Scott <2>: Contributors. (line 54) +* Deifik, Scott <2>: Contributors. (line 53) * Deifik, Scott: Acknowledgments. (line 60) +* delete ARRAY: Delete. (line 39) +* delete breakpoint at location: Breakpoint Control. (line 36) +* delete breakpoint by number: Breakpoint Control. (line 64) * delete debugger command: Breakpoint Control. (line 64) * delete statement: Delete. (line 6) +* delete watchpoint: Viewing And Changing Data. + (line 84) * deleting elements in arrays: Delete. (line 6) * deleting entire arrays: Delete. (line 39) * Demaille, Akim: Acknowledgments. (line 60) -* differences between gawk and awk: String Functions. (line 198) +* describe call stack frame, in debugger: Debugger Info. 
(line 27) +* differences between gawk and awk: String Functions. (line 194) * differences in awk and gawk, ARGC/ARGV variables: ARGC and ARGV. (line 88) * differences in awk and gawk, ARGIND variable: Auto-set. (line 40) @@ -30236,17 +31078,19 @@ Index * differences in awk and gawk, BEGINFILE/ENDFILE patterns: BEGINFILE/ENDFILE. (line 6) * differences in awk and gawk, BINMODE variable <1>: PC Using. - (line 34) + (line 33) * differences in awk and gawk, BINMODE variable: User-modified. (line 23) * differences in awk and gawk, close() function: Close Files And Pipes. (line 81) -* differences in awk and gawk, ERRNO variable: Auto-set. (line 73) +* differences in awk and gawk, command line directories: Command line directories. + (line 6) +* differences in awk and gawk, ERRNO variable: Auto-set. (line 82) * differences in awk and gawk, error messages: Special FD. (line 16) * differences in awk and gawk, FIELDWIDTHS variable: User-modified. (line 35) * differences in awk and gawk, FPAT variable: User-modified. (line 45) -* differences in awk and gawk, FUNCTAB variable: Auto-set. (line 119) +* differences in awk and gawk, FUNCTAB variable: Auto-set. (line 128) * differences in awk and gawk, function arguments (gawk): Calling Built-in. (line 16) * differences in awk and gawk, getline command: Getline. (line 19) @@ -30266,83 +31110,95 @@ Index (line 34) * differences in awk and gawk, LINT variable: User-modified. (line 98) * differences in awk and gawk, match() function: String Functions. - (line 261) + (line 257) * differences in awk and gawk, print/printf statements: Format Modifiers. (line 13) -* differences in awk and gawk, PROCINFO array: Auto-set. (line 133) -* differences in awk and gawk, record separators: Records. (line 117) +* differences in awk and gawk, PROCINFO array: Auto-set. (line 142) +* differences in awk and gawk, record separators: Records. (line 132) * differences in awk and gawk, regexp constants: Using Constant Regexps. 
(line 43) * differences in awk and gawk, regular expressions: Case-sensitivity. (line 26) -* differences in awk and gawk, RS/RT variables: Records. (line 172) -* differences in awk and gawk, RT variable: Auto-set. (line 266) +* differences in awk and gawk, RS/RT variables: Records. (line 187) +* differences in awk and gawk, RT variable: Auto-set. (line 275) * differences in awk and gawk, single-character fields: Single Character Fields. (line 6) * differences in awk and gawk, split() function: String Functions. - (line 351) + (line 347) * differences in awk and gawk, strings: Scalar Constants. (line 20) -* differences in awk and gawk, strings, storing: Records. (line 191) -* differences in awk and gawk, strtonum() function (gawk): String Functions. - (line 406) -* differences in awk and gawk, SYMTAB variable: Auto-set. (line 274) +* differences in awk and gawk, strings, storing: Records. (line 206) +* differences in awk and gawk, SYMTAB variable: Auto-set. (line 283) * differences in awk and gawk, TEXTDOMAIN variable: User-modified. (line 162) * differences in awk and gawk, trunc-mod operation: Arithmetic Ops. (line 66) * directories, command line: Command line directories. (line 6) -* directories, searching <1>: Igawk Program. (line 368) -* directories, searching <2>: AWKLIBPATH Variable. (line 6) -* directories, searching: AWKPATH Variable. (line 6) +* directories, searching: Igawk Program. (line 368) +* directories, searching for shared libraries: AWKLIBPATH Variable. + (line 6) +* directories, searching for source files: AWKPATH Variable. (line 6) +* disable breakpoint: Breakpoint Control. (line 69) * disable debugger command: Breakpoint Control. (line 69) * display debugger command: Viewing And Changing Data. (line 8) +* display debugger options: Debugger Info. (line 57) * division: Arithmetic Ops. (line 44) -* do-while statement <1>: Do Statement. (line 6) -* do-while statement: Regexp Usage. (line 19) +* do-while statement: Do Statement. 
(line 6) +* do-while statement, use of regexps in: Regexp Usage. (line 19) * documentation, of awk programs: Library Names. (line 6) * documentation, online: Manual History. (line 11) * documents, searching: Dupword Program. (line 6) -* dollar sign ($): Regexp Operators. (line 35) * dollar sign ($), $ field operator <1>: Precedence. (line 43) * dollar sign ($), $ field operator: Fields. (line 19) * dollar sign ($), incrementing fields and arrays: Increment Ops. (line 30) +* dollar sign ($), regexp operator: Regexp Operators. (line 35) * double precision floating-point: General Arithmetic. (line 21) -* double quote (") <1>: Quoting. (line 37) -* double quote ("): Read Terminal. (line 25) +* double quote (") in shell commands: Read Terminal. (line 25) * double quote ("), in regexp constants: Computed Regexps. (line 28) +* double quote ("), in shell commands: Quoting. (line 37) * down debugger command: Execution Stack. (line 21) * Drepper, Ulrich: Acknowledgments. (line 52) +* dump all variables of a program: Options. (line 93) * dump debugger command: Miscellaneous Debugger Commands. (line 9) * dupword.awk program: Dupword Program. (line 31) +* dynamic profiling: Profiling. (line 179) +* dynamically loaded extensions: Dynamic Extensions. (line 6) * e debugger command (alias for enable): Breakpoint Control. (line 73) * EBCDIC: Ordinal Functions. (line 45) +* effective group ID of gawk user: Auto-set. (line 147) +* effective user ID of gawk user: Auto-set. (line 151) * egrep utility <1>: Egrep Program. (line 6) * egrep utility: Bracket Expressions. (line 24) * egrep.awk program: Egrep Program. (line 54) -* elements in arrays: Reference to Elements. - (line 6) -* elements in arrays, assigning: Assigning Elements. (line 6) +* elements in arrays, assigning values: Assigning Elements. (line 6) * elements in arrays, deleting: Delete. (line 6) -* elements in arrays, order of: Scanning an Array. 
(line 48) +* elements in arrays, order of access by in operator: Scanning an Array. + (line 48) * elements in arrays, scanning: Scanning an Array. (line 6) +* elements of arrays: Reference to Elements. + (line 6) * email address for bug reports, bug-gawk@gnu.org: Bugs. (line 30) * EMISTERED: TCP/IP Networking. (line 6) +* empty array elements: Reference to Elements. + (line 18) * empty pattern: Empty. (line 6) +* empty strings: Records. (line 122) * empty strings, See null strings: Regexp Field Splitting. (line 43) +* enable breakpoint: Breakpoint Control. (line 73) * enable debugger command: Breakpoint Control. (line 73) * end debugger command: Debugger Execution Control. (line 10) -* END pattern <1>: Profiling. (line 62) +* END pattern <1>: Using BEGIN/END. (line 6) * END pattern: BEGIN/END. (line 6) +* END pattern, and profiling: Profiling. (line 62) * END pattern, assert() user-defined function and: Assert Function. (line 75) * END pattern, backslash continuation and: Egrep Program. (line 220) -* END pattern, Boolean patterns and: Expression Patterns. (line 73) +* END pattern, Boolean patterns and: Expression Patterns. (line 70) * END pattern, exit statement and: Exit Statement. (line 12) * END pattern, next/nextfile statements and <1>: Next Statement. (line 45) @@ -30351,36 +31207,40 @@ Index * END pattern, operators and: Using BEGIN/END. (line 17) * END pattern, print statement and: I/O And BEGIN/END. (line 16) * ENDFILE pattern: BEGINFILE/ENDFILE. (line 6) -* ENDFILE pattern, Boolean patterns and: Expression Patterns. (line 73) +* ENDFILE pattern, Boolean patterns and: Expression Patterns. (line 70) * endfile() user-defined function: Filetrans Function. (line 62) * endgrent() function (C library): Group Functions. (line 215) * endgrent() user-defined function: Group Functions. (line 218) * endpwent() function (C library): Passwd Functions. (line 210) * endpwent() user-defined function: Passwd Functions. (line 213) * ENVIRON array: Auto-set. 
(line 60) -* environment variables: Auto-set. (line 60) -* epoch, definition of: Glossary. (line 243) +* environment variables used by gawk: Environment Variables. + (line 6) +* environment variables, in ENVIRON array: Auto-set. (line 60) +* epoch, definition of: Glossary. (line 235) * equals sign (=), = operator: Assignment Ops. (line 6) * equals sign (=), == operator <1>: Precedence. (line 65) * equals sign (=), == operator: Comparison Operators. (line 11) * EREs (Extended Regular Expressions): Bracket Expressions. (line 24) * ERRNO variable <1>: TCP/IP Networking. (line 54) -* ERRNO variable <2>: Auto-set. (line 73) -* ERRNO variable <3>: BEGINFILE/ENDFILE. (line 26) -* ERRNO variable <4>: Close Files And Pipes. +* ERRNO variable: Auto-set. (line 82) +* ERRNO variable, with BEGINFILE pattern: BEGINFILE/ENDFILE. (line 26) +* ERRNO variable, with close() function: Close Files And Pipes. (line 138) -* ERRNO variable: Getline. (line 19) +* ERRNO variable, with getline command: Getline. (line 19) * error handling: Special FD. (line 16) -* error handling, ERRNO variable and: Auto-set. (line 73) +* error handling, ERRNO variable and: Auto-set. (line 82) * error output: Special FD. (line 6) * escape processing, gsub()/gensub()/sub() functions: Gory Details. (line 6) -* escape sequences: Escape Sequences. (line 6) +* escape sequences, in strings: Escape Sequences. (line 6) * eval debugger command: Viewing And Changing Data. (line 23) +* evaluate expressions, in debugger: Viewing And Changing Data. + (line 23) * evaluation order: Increment Ops. (line 60) -* evaluation order, concatenation: Concatenation. (line 42) +* evaluation order, concatenation: Concatenation. (line 41) * evaluation order, functions: Calling Built-in. (line 30) * examining fields: Fields. (line 6) * exclamation point (!), ! operator <1>: Egrep Program. (line 170) @@ -30400,9 +31260,13 @@ Index * exclamation point (!), !~ operator: Regexp Usage. (line 19) * exit statement: Exit Statement. 
(line 6) * exit status, of gawk: Exit Status. (line 6) -* exp() function: Numeric Functions. (line 18) +* exit status, of VMS: VMS Running. (line 29) +* exit the debugger: Miscellaneous Debugger Commands. + (line 99) +* exp: Numeric Functions. (line 18) * expand utility: Very Simple. (line 69) -* Expat XML parser library: gawkextlib. (line 33) +* Expat XML parser library: gawkextlib. (line 35) +* exponent: Numeric Functions. (line 18) * expressions: Expressions. (line 6) * expressions, as patterns: Expression Patterns. (line 6) * expressions, assignment: Assignment Ops. (line 6) @@ -30414,6 +31278,20 @@ Index (line 9) * expressions, selecting: Conditional Exp. (line 6) * Extended Regular Expressions (EREs): Bracket Expressions. (line 24) +* extension API: Extension API Description. + (line 6) +* extension API informational variables: Extension API Informational Variables. + (line 6) +* extension API version: Extension Versioning. + (line 6) +* extension API, version number: Auto-set. (line 238) +* extension example: Extension Example. (line 6) +* extension registration: Registration Functions. + (line 6) +* extension search path: Finding Extensions. (line 6) +* extensions distributed with gawk: Extension Samples. (line 6) +* extensions, allocating memory: Memory Allocation Functions. + (line 6) * extensions, Brian Kernighan's awk <1>: Common Extensions. (line 6) * extensions, Brian Kernighan's awk: BTL. (line 6) * extensions, common, ** operator: Arithmetic Ops. (line 30) @@ -30422,47 +31300,49 @@ Index * extensions, common, /dev/stdin special file: Special FD. (line 46) * extensions, common, /dev/stdout special file: Special FD. (line 46) * extensions, common, \x escape sequence: Escape Sequences. (line 61) -* extensions, common, BINMODE variable: PC Using. (line 34) +* extensions, common, BINMODE variable: PC Using. (line 33) * extensions, common, delete to delete entire arrays: Delete. (line 39) +* extensions, common, fflush() function: I/O Functions. 
(line 40) * extensions, common, func keyword: Definition Syntax. (line 83) * extensions, common, length() applied to an array: String Functions. - (line 198) -* extensions, common, RS as a regexp: Records. (line 120) + (line 194) +* extensions, common, RS as a regexp: Records. (line 135) * extensions, common, single character fields: Single Character Fields. (line 6) * extensions, in gawk, not in POSIX awk: POSIX/GNU. (line 6) * extensions, mawk: Common Extensions. (line 6) +* extensions, where to find: gawkextlib. (line 6) * extract.awk program: Extract Program. (line 79) * extraction, of marked strings (internationalization): String Extraction. (line 6) * f debugger command (alias for frame): Execution Stack. (line 25) * false, logical: Truth Values. (line 6) * FDL (Free Documentation License): GNU Free Documentation License. - (line 6) + (line 7) * features, adding to gawk: Adding Code. (line 6) * features, advanced, See advanced features: Obsolete. (line 6) * features, deprecated: Obsolete. (line 6) * features, undocumented: Undocumented. (line 6) -* Fenlason, Jay <1>: Contributors. (line 19) +* Fenlason, Jay <1>: Contributors. (line 18) * Fenlason, Jay: History. (line 30) -* fflush() function: I/O Functions. (line 25) +* fflush: I/O Functions. (line 25) * field numbers: Nonconstant Fields. (line 6) * field operator $: Fields. (line 19) * field operators, dollar sign as: Fields. (line 19) +* field separator, in multiline records: Multiple Line. (line 41) +* field separator, on command line: Command Line Field Separator. + (line 6) +* field separator, POSIX and: Field Splitting Summary. + (line 40) * field separators <1>: User-modified. (line 56) -* field separators: Field Separators. (line 14) -* field separators, choice of: Field Separators. (line 50) +* field separators: Field Separators. (line 15) +* field separators, choice of: Field Separators. (line 51) * field separators, FIELDWIDTHS variable and: User-modified. 
(line 35) * field separators, FPAT variable and: User-modified. (line 45) -* field separators, in multiline records: Multiple Line. (line 41) -* field separators, on command line: Command Line Field Separator. - (line 6) -* field separators, POSIX and <1>: Field Splitting Summary. - (line 40) * field separators, POSIX and: Fields. (line 6) * field separators, regular expressions as <1>: Regexp Field Splitting. (line 6) -* field separators, regular expressions as: Field Separators. (line 50) +* field separators, regular expressions as: Field Separators. (line 51) * field separators, See Also OFS: Changing Fields. (line 64) * field separators, spaces as: Cut Program. (line 109) * fields <1>: Basic High Level. (line 73) @@ -30475,16 +31355,16 @@ Index * fields, number of: Fields. (line 33) * fields, numbers: Nonconstant Fields. (line 6) * fields, printing: Print Examples. (line 21) -* fields, separating: Field Separators. (line 14) +* fields, separating: Field Separators. (line 15) * fields, single-character: Single Character Fields. (line 6) * FIELDWIDTHS variable <1>: User-modified. (line 35) -* FIELDWIDTHS variable: Constant Size. (line 22) +* FIELDWIDTHS variable: Constant Size. (line 23) * file descriptors: Special FD. (line 6) * file names, distinguishing: Auto-set. (line 52) * file names, in compatibility mode: Special Caveats. (line 9) * file names, standard streams in gawk: Special FD. (line 46) -* FILENAME variable <1>: Auto-set. (line 93) +* FILENAME variable <1>: Auto-set. (line 102) * FILENAME variable: Reading Files. (line 6) * FILENAME variable, getline, setting with: Getline Notes. (line 19) * filenames, assignments as: Ignoring Assigns. (line 6) @@ -30500,10 +31380,9 @@ Index * files, /inet/... (gawk): TCP/IP Networking. (line 6) * files, /inet4/... (gawk): TCP/IP Networking. (line 6) * files, /inet6/... (gawk): TCP/IP Networking. (line 6) -* files, as single records: Records. (line 200) * files, awk programs in: Long. 
(line 6) * files, awkprof.out: Profiling. (line 6) -* files, awkvars.out: Options. (line 107) +* files, awkvars.out: Options. (line 93) * files, closing: I/O Functions. (line 10) * files, descriptors, See file descriptors: Special FD. (line 6) * files, group: Group Functions. (line 6) @@ -30530,7 +31409,7 @@ Index * files, portable object template: Explaining gettext. (line 30) * files, portable object, converting to message object files: I18N Example. (line 62) -* files, portable object, generating: Options. (line 161) +* files, portable object, generating: Options. (line 147) * files, processing, ARGIND variable and: Auto-set. (line 47) * files, reading: Rewind Function. (line 6) * files, reading, multiline records: Multiple Line. (line 6) @@ -30539,34 +31418,40 @@ Index * files, source, search path for: Igawk Program. (line 368) * files, splitting: Split Program. (line 6) * files, Texinfo, extracting programs from: Extract Program. (line 6) +* find substring in string: String Functions. (line 151) +* finding extensions: Finding Extensions. (line 6) * finish debugger command: Debugger Execution Control. (line 39) -* Fish, Fred: Contributors. (line 51) -* fixed-width data: Constant Size. (line 9) +* Fish, Fred: Contributors. (line 50) +* fixed-width data: Constant Size. (line 10) * flag variables <1>: Tee Program. (line 20) * flag variables: Boolean Ops. (line 67) -* floating-point numbers, arbitrary precision: Arbitrary Precision Arithmetic. - (line 6) * floating-point, numbers <1>: Unexpected Results. (line 6) * floating-point, numbers: General Arithmetic. (line 6) -* fnmatch extension function: Extension Sample Fnmatch. +* floating-point, numbers, arbitrary precision: Arbitrary Precision Arithmetic. (line 6) -* FNR variable <1>: Auto-set. (line 103) +* floating-point, VAX/VMS: VMS Running. (line 51) +* flush buffered output: I/O Functions. (line 25) +* fnmatch() extension function: Extension Sample Fnmatch. + (line 12) +* FNR variable <1>: Auto-set. 
(line 112) * FNR variable: Records. (line 6) -* FNR variable, changing: Auto-set. (line 314) +* FNR variable, changing: Auto-set. (line 323) * for statement: For Statement. (line 6) * for statement, looping over arrays: Scanning an Array. (line 20) -* fork extension function: Extension Sample Fork. +* fork() extension function: Extension Sample Fork. (line 11) +* format specifiers: Basic Printf. (line 15) * format specifiers, mixing regular with positional specifiers: Printf Ordering. (line 57) * format specifiers, printf statement: Control Letters. (line 6) * format specifiers, strftime() function (gawk): Time Functions. - (line 88) -* format strings: Basic Printf. (line 15) + (line 89) +* format time string: Time Functions. (line 48) * formats, numeric output: OFMT. (line 6) * formatting output: Printf. (line 6) -* forward slash (/): Regexp. (line 10) +* formatting strings: String Functions. (line 378) +* forward slash (/) to enclose regular expressions: Regexp. (line 10) * forward slash (/), / operator: Precedence. (line 55) * forward slash (/), /= operator <1>: Precedence. (line 95) * forward slash (/), /= operator: Assignment Ops. (line 129) @@ -30575,34 +31460,36 @@ Index * forward slash (/), patterns and: Expression Patterns. (line 24) * FPAT variable <1>: User-modified. (line 45) * FPAT variable: Splitting By Content. - (line 26) + (line 27) * frame debugger command: Execution Stack. (line 25) * Free Documentation License (FDL): GNU Free Documentation License. - (line 6) -* Free Software Foundation (FSF) <1>: Glossary. (line 305) + (line 7) +* Free Software Foundation (FSF) <1>: Glossary. (line 297) * Free Software Foundation (FSF) <2>: Getting. (line 10) * Free Software Foundation (FSF): Manual History. (line 6) -* FreeBSD: Glossary. (line 624) +* FreeBSD: Glossary. (line 616) * FS variable <1>: User-modified. (line 56) -* FS variable: Field Separators. (line 14) +* FS variable: Field Separators. 
(line 15) * FS variable, --field-separator option and: Options. (line 21) * FS variable, as null string: Single Character Fields. (line 20) * FS variable, as TAB character: Options. (line 259) -* FS variable, changing value of: Field Separators. (line 34) +* FS variable, changing value of: Field Separators. (line 35) * FS variable, running awk programs and: Cut Program. (line 68) * FS variable, setting from command line: Command Line Field Separator. (line 6) * FS, containing ^: Regexp Field Splitting. (line 59) -* FSF (Free Software Foundation) <1>: Glossary. (line 305) +* FS, in multiline records: Multiple Line. (line 41) +* FSF (Free Software Foundation) <1>: Glossary. (line 297) * FSF (Free Software Foundation) <2>: Getting. (line 10) * FSF (Free Software Foundation): Manual History. (line 6) -* fts extension function: Extension Sample File Functions. +* fts() extension function: Extension Sample File Functions. (line 77) -* FUNCTAB array: Auto-set. (line 119) +* FUNCTAB array: Auto-set. (line 128) * function calls: Function Calls. (line 6) * function calls, indirect: Indirect Calls. (line 6) +* function definition example: Function Example. (line 6) * function pointers: Indirect Calls. (line 6) * functions, arrays as parameters to: Pass By Value/Reference. (line 47) @@ -30639,16 +31526,17 @@ Index * functions, undefined: Pass By Value/Reference. (line 71) * functions, user-defined: User-defined. (line 6) -* functions, user-defined, calling: Calling A Function. (line 6) -* functions, user-defined, counts: Profiling. (line 129) +* functions, user-defined, calling: Function Caveats. (line 6) +* functions, user-defined, counts, in a profile: Profiling. (line 137) * functions, user-defined, library of: Library Functions. (line 6) * functions, user-defined, next/nextfile statements and <1>: Nextfile Statement. (line 47) * functions, user-defined, next/nextfile statements and: Next Statement. (line 45) * G-d: Acknowledgments. 
(line 78) -* Garfinkle, Scott: Contributors. (line 35) -* gawk program, dynamic profiling: Profiling. (line 172) +* Garfinkle, Scott: Contributors. (line 34) +* gawk program, dynamic profiling: Profiling. (line 179) +* gawk version: Auto-set. (line 213) * gawk, ARGIND variable in: Other Arguments. (line 12) * gawk, awk and <1>: This Manual. (line 14) * gawk, awk and: Preface. (line 23) @@ -30657,7 +31545,7 @@ Index * gawk, built-in variables and: Built-in Variables. (line 14) * gawk, character classes and: Bracket Expressions. (line 90) * gawk, coding style in: Adding Code. (line 38) -* gawk, command-line options: GNU Regexp Operators. +* gawk, command-line options, and regular expressions: GNU Regexp Operators. (line 70) * gawk, comparison operators and: Comparison Operators. (line 50) @@ -30669,7 +31557,7 @@ Index * gawk, distribution: Distribution contents. (line 6) * gawk, ERRNO variable in <1>: TCP/IP Networking. (line 54) -* gawk, ERRNO variable in <2>: Auto-set. (line 73) +* gawk, ERRNO variable in <2>: Auto-set. (line 82) * gawk, ERRNO variable in <3>: BEGINFILE/ENDFILE. (line 26) * gawk, ERRNO variable in <4>: Close Files And Pipes. (line 138) @@ -30680,19 +31568,19 @@ Index * gawk, features, advanced: Advanced Features. (line 6) * gawk, field separators and: User-modified. (line 77) * gawk, FIELDWIDTHS variable in <1>: User-modified. (line 35) -* gawk, FIELDWIDTHS variable in: Constant Size. (line 22) +* gawk, FIELDWIDTHS variable in: Constant Size. (line 23) * gawk, file names in: Special Files. (line 6) * gawk, format-control characters: Control Letters. (line 18) * gawk, FPAT variable in <1>: User-modified. (line 45) * gawk, FPAT variable in: Splitting By Content. - (line 26) -* gawk, FUNCTAB array in: Auto-set. (line 119) + (line 27) +* gawk, FUNCTAB array in: Auto-set. (line 128) * gawk, function arguments and: Calling Built-in. (line 16) * gawk, hexadecimal numbers and: Nondecimal-numbers. 
(line 42) * gawk, IGNORECASE variable in <1>: Array Sorting Functions. - (line 81) -* gawk, IGNORECASE variable in <2>: String Functions. (line 29) -* gawk, IGNORECASE variable in <3>: Array Intro. (line 92) + (line 83) +* gawk, IGNORECASE variable in <2>: String Functions. (line 48) +* gawk, IGNORECASE variable in <3>: Array Intro. (line 91) * gawk, IGNORECASE variable in <4>: User-modified. (line 82) * gawk, IGNORECASE variable in: Case-sensitivity. (line 26) * gawk, implementation issues: Notes. (line 6) @@ -30710,14 +31598,14 @@ Index * gawk, line continuation in: Conditional Exp. (line 34) * gawk, LINT variable in: User-modified. (line 98) * gawk, list of contributors to: Contributors. (line 6) -* gawk, MS-DOS version of: PC Using. (line 11) -* gawk, MS-Windows version of: PC Using. (line 11) +* gawk, MS-DOS version of: PC Using. (line 10) +* gawk, MS-Windows version of: PC Using. (line 10) * gawk, newlines in: Statements/Lines. (line 12) * gawk, octal numbers and: Nondecimal-numbers. (line 42) -* gawk, OS/2 version of: PC Using. (line 11) +* gawk, OS/2 version of: PC Using. (line 10) * gawk, PROCINFO array in <1>: Two-way I/O. (line 116) * gawk, PROCINFO array in <2>: Time Functions. (line 47) -* gawk, PROCINFO array in: Auto-set. (line 133) +* gawk, PROCINFO array in: Auto-set. (line 142) * gawk, regexp constants and: Using Constant Regexps. (line 28) * gawk, regular expressions, case sensitivity: Case-sensitivity. @@ -30725,16 +31613,14 @@ Index * gawk, regular expressions, operators: GNU Regexp Operators. (line 6) * gawk, regular expressions, precedence: Regexp Operators. (line 161) -* gawk, RT variable in <1>: Auto-set. (line 266) -* gawk, RT variable in <2>: Getline/Variable/File. - (line 10) -* gawk, RT variable in <3>: Multiple Line. (line 129) -* gawk, RT variable in: Records. (line 117) +* gawk, RT variable in <1>: Auto-set. (line 275) +* gawk, RT variable in <2>: Multiple Line. (line 129) +* gawk, RT variable in: Records. 
(line 132) * gawk, See Also awk: Preface. (line 36) * gawk, source code, obtaining: Getting. (line 6) -* gawk, splitting fields and: Constant Size. (line 87) +* gawk, splitting fields and: Constant Size. (line 88) * gawk, string-translation functions: I18N Functions. (line 6) -* gawk, SYMTAB array in: Auto-set. (line 274) +* gawk, SYMTAB array in: Auto-set. (line 283) * gawk, TEXTDOMAIN variable in: User-modified. (line 162) * gawk, timestamps: Time Functions. (line 6) * gawk, uses for: Preface. (line 36) @@ -30742,11 +31628,13 @@ Index * gawk, VMS version of: VMS Installation. (line 6) * gawk, word-boundary operator: GNU Regexp Operators. (line 63) +* gawkextlib: gawkextlib. (line 6) * gawkextlib project: gawkextlib. (line 6) -* General Public License (GPL): Glossary. (line 314) +* General Public License (GPL): Glossary. (line 306) * General Public License, See GPL: Manual History. (line 11) -* gensub() function (gawk) <1>: String Functions. (line 86) -* gensub() function (gawk): Using Constant Regexps. +* generate time values: Time Functions. (line 25) +* gensub <1>: String Functions. (line 82) +* gensub: Using Constant Regexps. (line 43) * gensub() function (gawk), escape processing: Gory Details. (line 6) * getaddrinfo() function (C library): TCP/IP Networking. (line 38) @@ -30771,6 +31659,8 @@ Index * getline command, FILENAME variable and: Getline Notes. (line 19) * getline command, return values: Getline. (line 19) * getline command, variants: Getline Summary. (line 6) +* getline from a file: Getline/File. (line 6) +* getline into a variable: Getline/Variable. (line 6) * getline statement, BEGINFILE/ENDFILE patterns and: BEGINFILE/ENDFILE. (line 54) * getlocaltime() user-defined function: Getlocaltime Function. @@ -30786,95 +31676,105 @@ Index * gettext library: Explaining gettext. (line 6) * gettext library, locale categories: Explaining gettext. (line 80) * gettext() function (C library): Explaining gettext. 
(line 62) -* gettimeofday extension function: Extension Sample Time. +* gettimeofday() extension function: Extension Sample Time. (line 13) -* GMP: Arbitrary Precision Arithmetic. - (line 6) +* git utility <1>: Adding Code. (line 111) +* git utility <2>: Accessing The Source. + (line 10) +* git utility <3>: Other Versions. (line 29) +* git utility: gawkextlib. (line 29) +* git, use of for gawk source code: Derived Files. (line 6) +* GMP: Gawk and MPFR. (line 6) * GNITS mailing list: Acknowledgments. (line 52) * GNU awk, See gawk: Preface. (line 49) * GNU Free Documentation License: GNU Free Documentation License. - (line 6) -* GNU General Public License: Glossary. (line 314) -* GNU Lesser General Public License: Glossary. (line 405) + (line 7) +* GNU General Public License: Glossary. (line 306) +* GNU Lesser General Public License: Glossary. (line 397) * GNU long options <1>: Options. (line 6) * GNU long options: Command Line. (line 13) -* GNU long options, printing list of: Options. (line 168) -* GNU Project <1>: Glossary. (line 323) +* GNU long options, printing list of: Options. (line 154) +* GNU Project <1>: Glossary. (line 315) * GNU Project: Manual History. (line 11) -* GNU/Linux <1>: Glossary. (line 624) +* GNU/Linux <1>: Glossary. (line 616) * GNU/Linux <2>: I18N Example. (line 55) * GNU/Linux: Manual History. (line 28) -* GPL (General Public License) <1>: Glossary. (line 314) +* Gordon, Assaf: Contributors. (line 105) +* GPL (General Public License) <1>: Glossary. (line 306) * GPL (General Public License): Manual History. (line 11) -* GPL (General Public License), printing: Options. (line 102) +* GPL (General Public License), printing: Options. (line 88) * grcat program: Group Functions. (line 16) -* Grigera, Juan: Contributors. (line 58) +* Grigera, Juan: Contributors. (line 57) * group database, reading: Group Functions. (line 6) * group file: Group Functions. (line 6) +* group ID of gawk user: Auto-set. 
(line 186) * groups, information about: Group Functions. (line 6) -* gsub() function <1>: String Functions. (line 139) -* gsub() function: Using Constant Regexps. +* gsub <1>: String Functions. (line 135) +* gsub: Using Constant Regexps. (line 43) -* gsub() function, arguments of: String Functions. (line 464) +* gsub() function, arguments of: String Functions. (line 460) * gsub() function, escape processing: Gory Details. (line 6) * h debugger command (alias for help): Miscellaneous Debugger Commands. (line 66) -* Hankerson, Darrel <1>: Contributors. (line 61) +* Hankerson, Darrel <1>: Contributors. (line 60) * Hankerson, Darrel: Acknowledgments. (line 60) -* Haque, John: Contributors. (line 104) +* Haque, John: Contributors. (line 108) * Hartholz, Elaine: Acknowledgments. (line 38) * Hartholz, Marshall: Acknowledgments. (line 38) -* Hasegawa, Isamu: Contributors. (line 95) +* Hasegawa, Isamu: Contributors. (line 94) * help debugger command: Miscellaneous Debugger Commands. (line 66) * hexadecimal numbers: Nondecimal-numbers. (line 6) * hexadecimal values, enabling interpretation of: Options. (line 207) +* history expansion, in debugger: Readline Support. (line 6) * histsort.awk program: History Sorting. (line 25) * Hughes, Phil: Acknowledgments. (line 43) -* HUP signal: Profiling. (line 204) +* HUP signal, for dynamic profiling: Profiling. (line 211) * hyphen (-), - operator: Precedence. (line 52) * hyphen (-), -- operator <1>: Precedence. (line 46) * hyphen (-), -- operator: Increment Ops. (line 48) * hyphen (-), -= operator <1>: Precedence. (line 95) * hyphen (-), -= operator: Assignment Ops. (line 129) -* hyphen (-), filenames beginning with: Options. (line 73) +* hyphen (-), filenames beginning with: Options. (line 59) * hyphen (-), in bracket expressions: Bracket Expressions. (line 17) * i debugger command (alias for info): Debugger Info. (line 13) * id utility: Id Program. (line 6) * id.awk program: Id Program. 
(line 30) * IEEE-754 format: Floating-point Representation. (line 6) -* if statement <1>: If Statement. (line 6) -* if statement: Regexp Usage. (line 19) +* if statement: If Statement. (line 6) * if statement, actions, changing: Ranges. (line 25) +* if statement, use of regexps in: Regexp Usage. (line 19) * igawk.sh program: Igawk Program. (line 124) +* ignore breakpoint: Breakpoint Control. (line 87) * ignore debugger command: Breakpoint Control. (line 87) -* IGNORECASE variable <1>: Array Sorting Functions. - (line 81) -* IGNORECASE variable <2>: String Functions. (line 29) -* IGNORECASE variable <3>: Array Intro. (line 92) -* IGNORECASE variable <4>: User-modified. (line 82) -* IGNORECASE variable: Case-sensitivity. (line 26) -* IGNORECASE variable, array sorting and: Array Sorting Functions. - (line 81) -* IGNORECASE variable, array subscripts and: Array Intro. (line 92) +* IGNORECASE variable: User-modified. (line 82) +* IGNORECASE variable, and array indices: Array Intro. (line 91) +* IGNORECASE variable, and array sorting functions: Array Sorting Functions. + (line 83) * IGNORECASE variable, in example programs: Library Functions. (line 53) +* IGNORECASE variable, with ~ and !~ operators: Case-sensitivity. + (line 26) +* Illumos: Other Versions. (line 104) +* Illumos, POSIX-compliant awk: Other Versions. (line 104) * implementation issues, gawk: Notes. (line 6) * implementation issues, gawk, debugging: Compatibility Mode. (line 6) * implementation issues, gawk, limits <1>: Redirection. (line 135) * implementation issues, gawk, limits: Getline Notes. (line 14) -* in operator <1>: Id Program. (line 93) -* in operator <2>: Scanning an Array. (line 17) -* in operator <3>: Reference to Elements. - (line 37) -* in operator <4>: For Statement. (line 75) -* in operator <5>: Precedence. (line 83) +* in operator <1>: For Statement. (line 75) +* in operator <2>: Precedence. (line 83) * in operator: Comparison Operators. 
(line 11) +* in operator, index existence in multidimensional arrays: Multidimensional. + (line 43) +* in operator, order of array access: Scanning an Array. (line 48) +* in operator, testing if array element exists: Reference to Elements. + (line 37) +* in operator, use in loops: Scanning an Array. (line 17) * increment operators: Increment Ops. (line 6) -* index() function: String Functions. (line 155) -* indexing arrays: Array Intro. (line 50) +* index: String Functions. (line 151) +* indexing arrays: Array Intro. (line 49) * indirect function calls: Indirect Calls. (line 6) * infinite precision: Arbitrary Precision Arithmetic. (line 6) @@ -30890,7 +31790,8 @@ Index * input files, reading: Reading Files. (line 6) * input files, running awk without: Read Terminal. (line 6) * input files, variable assignments and: Other Arguments. (line 19) -* input pipeline: Getline/Pipe. (line 10) +* input pipeline: Getline/Pipe. (line 9) +* input record, length of: String Functions. (line 171) * input redirection: Getline/File. (line 6) * input, data, nondecimal: Nondecimal Data. (line 6) * input, explicit: Getline. (line 6) @@ -30899,17 +31800,21 @@ Index * input, splitting into records: Records. (line 6) * input, standard <1>: Special FD. (line 6) * input, standard: Read Terminal. (line 6) +* input/output functions: I/O Functions. (line 6) * input/output, binary: User-modified. (line 10) * input/output, from BEGIN and END: I/O And BEGIN/END. (line 6) * input/output, two-way: Two-way I/O. (line 44) * insomnia, cure for: Alarm Program. (line 6) * installation, VMS: VMS Installation. (line 6) * installing gawk: Installation. (line 6) -* INT signal (MS-Windows): Profiling. (line 207) -* int() function: Numeric Functions. (line 23) -* integer, arbitrary precision: Arbitrary Precision Integers. - (line 6) +* instruction tracing, in debugger: Debugger Info. (line 89) +* int: Numeric Functions. (line 23) +* INT signal (MS-Windows): Profiling. 
(line 214) +* integer array indices: Numeric Array Subscripts. + (line 31) * integers: General Arithmetic. (line 6) +* integers, arbitrary precision: Arbitrary Precision Integers. + (line 6) * integers, unsigned: General Arithmetic. (line 15) * interacting with other programs: I/O Functions. (line 72) * internationalization <1>: I18N and L10N. (line 6) @@ -30928,39 +31833,43 @@ Index * internationalization, localization, portability and: I18N Portability. (line 6) * internationalizing a program: Explaining gettext. (line 6) -* interpreted programs <1>: Glossary. (line 365) +* interpreted programs <1>: Glossary. (line 357) * interpreted programs: Basic High Level. (line 15) -* interval expressions: Regexp Operators. (line 116) +* interval expressions, regexp operator: Regexp Operators. (line 116) * inventory-shipped file: Sample Data Files. (line 32) -* isarray() function (gawk): Type Functions. (line 11) -* ISO: Glossary. (line 376) -* ISO 8859-1: Glossary. (line 141) -* ISO Latin-1: Glossary. (line 141) +* invoke shell command: I/O Functions. (line 72) +* isarray: Type Functions. (line 11) +* ISO: Glossary. (line 368) +* ISO 8859-1: Glossary. (line 133) +* ISO Latin-1: Glossary. (line 133) * Jacobs, Andrew: Passwd Functions. (line 90) -* Jaegermann, Michal <1>: Contributors. (line 46) +* Jaegermann, Michal <1>: Contributors. (line 45) * Jaegermann, Michal: Acknowledgments. (line 60) -* Java implementation of awk: Other Versions. (line 106) -* Java programming language: Glossary. (line 388) -* jawk: Other Versions. (line 106) +* Java implementation of awk: Other Versions. (line 112) +* Java programming language: Glossary. (line 380) +* jawk: Other Versions. (line 112) * Jedi knights: Undocumented. (line 6) * join() user-defined function: Join Function. (line 18) -* Kahrs, Ju"rgen <1>: Contributors. (line 71) +* Kahrs, Ju"rgen <1>: Contributors. (line 70) * Kahrs, Ju"rgen: Acknowledgments. (line 60) * Kasal, Stepan: Acknowledgments. 
(line 60) * Kenobi, Obi-Wan: Undocumented. (line 6) -* Kernighan, Brian <1>: Basic Data Typing. (line 55) -* Kernighan, Brian <2>: Other Versions. (line 13) -* Kernighan, Brian <3>: Contributors. (line 12) -* Kernighan, Brian <4>: BTL. (line 6) -* Kernighan, Brian <5>: Concatenation. (line 6) -* Kernighan, Brian <6>: Acknowledgments. (line 72) -* Kernighan, Brian <7>: Conventions. (line 34) +* Kernighan, Brian <1>: Glossary. (line 143) +* Kernighan, Brian <2>: Basic Data Typing. (line 55) +* Kernighan, Brian <3>: Other Versions. (line 13) +* Kernighan, Brian <4>: Contributors. (line 11) +* Kernighan, Brian <5>: BTL. (line 6) +* Kernighan, Brian <6>: Library Functions. (line 12) +* Kernighan, Brian <7>: Concatenation. (line 6) +* Kernighan, Brian <8>: Getline/Pipe. (line 6) +* Kernighan, Brian <9>: Acknowledgments. (line 72) +* Kernighan, Brian <10>: Conventions. (line 34) * Kernighan, Brian: History. (line 17) -* kill command, dynamic profiling: Profiling. (line 181) +* kill command, dynamic profiling: Profiling. (line 188) * Knights, jedi: Undocumented. (line 6) * Knuth, Donald: Arbitrary Precision Arithmetic. (line 6) -* Kwok, Conrad: Contributors. (line 35) +* Kwok, Conrad: Contributors. (line 34) * l debugger command (alias for list): Miscellaneous Debugger Commands. (line 72) * labels.awk program: Labels Program. (line 51) @@ -30983,12 +31892,15 @@ Index * left angle bracket (<), <= operator <1>: Precedence. (line 65) * left angle bracket (<), <= operator: Comparison Operators. (line 11) +* left shift: Bitwise Functions. (line 46) * left shift, bitwise: Bitwise Functions. (line 32) * leftmost longest match: Multiple Line. (line 26) -* length() function: String Functions. (line 168) -* Lesser General Public License (LGPL): Glossary. (line 405) -* LGPL (Lesser General Public License): Glossary. (line 405) -* libmawk: Other Versions. (line 114) +* length: String Functions. (line 164) +* length of input record: String Functions. 
(line 171) +* length of string: String Functions. (line 164) +* Lesser General Public License (LGPL): Glossary. (line 397) +* LGPL (Lesser General Public License): Glossary. (line 397) +* libmawk: Other Versions. (line 120) * libraries of awk functions: Library Functions. (line 6) * libraries of awk functions, assertions: Assert Function. (line 6) * libraries of awk functions, associative arrays and: Library Names. @@ -31032,50 +31944,66 @@ Index * lint checking, undefined functions: Pass By Value/Reference. (line 88) * LINT variable: User-modified. (line 98) -* Linux <1>: Glossary. (line 624) +* Linux <1>: Glossary. (line 616) * Linux <2>: I18N Example. (line 55) * Linux: Manual History. (line 28) +* list all global variables, in debugger: Debugger Info. (line 48) * list debugger command: Miscellaneous Debugger Commands. (line 72) +* list function definitions, in debugger: Debugger Info. (line 30) * loading, library: Options. (line 173) -* local variables: Variable Scope. (line 6) +* local variables, in a function: Variable Scope. (line 6) * locale categories: Explaining gettext. (line 80) * locale decimal point character: Options. (line 263) * locale, definition of: Locales. (line 6) * localization: I18N and L10N. (line 6) * localization, See internationalization, localization: I18N and L10N. (line 6) +* log: Numeric Functions. (line 30) * log files, timestamps in: Time Functions. (line 6) -* log() function: Numeric Functions. (line 30) +* logarithm: Numeric Functions. (line 30) * logical false/true: Truth Values. (line 6) * logical operators, See Boolean expressions: Boolean Ops. (line 6) * login information: Passwd Functions. (line 16) * long options: Command Line. (line 13) * loops: While Statement. (line 6) +* loops, break statement and: Break Statement. (line 6) * loops, continue statements and: For Statement. (line 64) -* loops, count for header: Profiling. (line 123) +* loops, count for header, in a profile: Profiling. 
(line 131) +* loops, do-while: Do Statement. (line 6) * loops, exiting: Break Statement. (line 6) +* loops, for, array scanning: Scanning an Array. (line 6) +* loops, for, iterative: For Statement. (line 6) * loops, See Also while statement: While Statement. (line 6) +* loops, while: While Statement. (line 6) * ls utility: More Complex. (line 15) -* lshift() function (gawk): Bitwise Functions. (line 46) +* lshift: Bitwise Functions. (line 46) * lvalues/rvalues: Assignment Ops. (line 32) +* mail-list file: Sample Data Files. (line 6) * mailing labels, printing: Labels Program. (line 6) * mailing list, GNITS: Acknowledgments. (line 52) +* Malmberg, John <1>: Bugs. (line 70) * Malmberg, John: Acknowledgments. (line 60) * mark parity: Ordinal Functions. (line 45) * marked string extraction (internationalization): String Extraction. (line 6) * marked strings, extracting: String Extraction. (line 6) * Marx, Groucho: Increment Ops. (line 60) -* match() function: String Functions. (line 208) +* match: String Functions. (line 204) +* match regexp in string: String Functions. (line 204) * match() function, RSTART/RLENGTH variables: String Functions. - (line 225) + (line 221) * matching, expressions, See comparison expressions: Typing and Comparison. (line 9) * matching, leftmost longest: Multiple Line. (line 26) * matching, null strings: Gory Details. (line 164) -* mawk program: Other Versions. (line 44) -* McPhee, Patrick: Contributors. (line 101) +* mawk utility <1>: Other Versions. (line 44) +* mawk utility <2>: Nextfile Statement. (line 47) +* mawk utility <3>: Concatenation. (line 36) +* mawk utility <4>: Getline/Pipe. (line 62) +* mawk utility: Escape Sequences. (line 124) +* maximum precision supported by MPFR library: Auto-set. (line 227) +* McPhee, Patrick: Contributors. (line 100) * message object files: Explaining gettext. (line 41) * message object files, converting from portable object files: I18N Example. 
(line 62) @@ -31083,15 +32011,18 @@ Index (line 47) * message object files, specifying directory of: Explaining gettext. (line 53) +* messages from extensions: Printing Messages. (line 6) +* metacharacters in regular expressions: Regexp Operators. (line 6) * metacharacters, escape sequences for: Escape Sequences. (line 130) -* mktime() function (gawk): Time Functions. (line 25) +* minimum precision supported by MPFR library: Auto-set. (line 230) +* mktime: Time Functions. (line 25) * modifiers, in format specifiers: Format Modifiers. (line 6) * monetary information, localization: Explaining gettext. (line 103) -* MPFR: Arbitrary Precision Arithmetic. - (line 6) +* MPFR: Gawk and MPFR. (line 6) * msgfmt utility: I18N Example. (line 62) * multiple precision: Arbitrary Precision Arithmetic. (line 6) +* multiple-line records: Multiple Line. (line 6) * n debugger command (alias for next): Debugger Execution Control. (line 43) * names, arrays/variables <1>: Library Names. (line 6) @@ -31103,7 +32034,7 @@ Index * namespace issues, functions: Definition Syntax. (line 20) * nawk utility: Names. (line 17) * negative zero: Unexpected Results. (line 34) -* NetBSD: Glossary. (line 624) +* NetBSD: Glossary. (line 616) * networks, programming: TCP/IP Networking. (line 6) * networks, support for: Special Network. (line 6) * newlines <1>: Boolean Ops. (line 67) @@ -31120,6 +32051,7 @@ Index (line 19) * next debugger command: Debugger Execution Control. (line 43) +* next file statement: Feature History. (line 168) * next statement <1>: Next Statement. (line 6) * next statement: Boolean Ops. (line 85) * next statement, BEGIN/END patterns and: I/O And BEGIN/END. (line 37) @@ -31135,27 +32067,31 @@ Index (line 47) * nexti debugger command: Debugger Execution Control. (line 49) -* NF variable <1>: Auto-set. (line 108) +* NF variable <1>: Auto-set. (line 117) * NF variable: Fields. (line 33) * NF variable, decrementing: Changing Fields. 
(line 107) * ni debugger command (alias for nexti): Debugger Execution Control. (line 49) * noassign.awk program: Ignoring Assigns. (line 15) +* non-existent array elements: Reference to Elements. + (line 23) * not Boolean-logic operator: Boolean Ops. (line 6) -* NR variable <1>: Auto-set. (line 128) +* NR variable <1>: Auto-set. (line 137) * NR variable: Records. (line 6) -* NR variable, changing: Auto-set. (line 314) +* NR variable, changing: Auto-set. (line 323) * null strings <1>: Basic Data Typing. (line 26) * null strings <2>: Truth Values. (line 6) * null strings <3>: Regexp Field Splitting. (line 43) -* null strings: Records. (line 107) -* null strings, array elements and: Delete. (line 27) +* null strings: Records. (line 122) +* null strings in gawk arguments, quoting and: Quoting. (line 62) +* null strings, and deleting array elements: Delete. (line 27) * null strings, as array subscripts: Uninitialized Subscripts. (line 43) * null strings, converting numbers to strings: Conversion. (line 21) * null strings, matching: Gory Details. (line 164) -* null strings, quoting and: Quoting. (line 62) +* number as string of bits: Bitwise Functions. (line 109) +* number of array elements: String Functions. (line 194) * number sign (#), #! (executable scripts): Executable Scripts. (line 6) * number sign (#), commenting: Comments. (line 6) @@ -31170,9 +32106,9 @@ Index * numbers, floating-point: General Arithmetic. (line 6) * numbers, hexadecimal: Nondecimal-numbers. (line 6) * numbers, octal: Nondecimal-numbers. (line 6) -* numbers, random: Numeric Functions. (line 64) * numbers, rounding: Round Function. (line 6) -* numeric, constants: Scalar Constants. (line 6) +* numeric constants: Scalar Constants. (line 6) +* numeric functions: Numeric Functions. (line 6) * numeric, output format: OFMT. (line 6) * numeric, strings: Variable Typing. (line 6) * o debugger command (alias for option): Debugger Info. 
(line 57) @@ -31187,7 +32123,7 @@ Index * OFS variable <1>: User-modified. (line 124) * OFS variable <2>: Output Separators. (line 6) * OFS variable: Changing Fields. (line 64) -* OpenBSD: Glossary. (line 624) +* OpenBSD: Glossary. (line 616) * OpenSolaris: Other Versions. (line 96) * operating systems, BSD-based: Manual History. (line 28) * operating systems, PC, gawk on: PC Using. (line 6) @@ -31207,7 +32143,7 @@ Index * operators, input/output <1>: Precedence. (line 65) * operators, input/output <2>: Redirection. (line 22) * operators, input/output <3>: Getline/Coprocess. (line 6) -* operators, input/output <4>: Getline/Pipe. (line 10) +* operators, input/output <4>: Getline/Pipe. (line 9) * operators, input/output: Getline/File. (line 6) * operators, logical, See Boolean expressions: Boolean Ops. (line 6) * operators, precedence <1>: Precedence. (line 6) @@ -31215,37 +32151,35 @@ Index * operators, relational, See operators, comparison: Typing and Comparison. (line 9) * operators, short-circuit: Boolean Ops. (line 57) -* operators, string: Concatenation. (line 9) +* operators, string: Concatenation. (line 8) * operators, string-matching: Regexp Usage. (line 19) * operators, string-matching, for buffers: GNU Regexp Operators. (line 48) * operators, word-boundary (gawk): GNU Regexp Operators. (line 63) * option debugger command: Debugger Info. (line 57) -* options, command-line <1>: Command Line Field Separator. - (line 6) -* options, command-line <2>: Options. (line 6) -* options, command-line: Long. (line 12) -* options, command-line, end of: Options. (line 68) +* options, command-line: Options. (line 6) +* options, command-line, end of: Options. (line 54) * options, command-line, invoking awk: Command Line. (line 6) * options, command-line, processing: Getopt Function. (line 6) * options, deprecated: Obsolete. (line 6) * options, long <1>: Options. (line 6) * options, long: Command Line. (line 13) -* options, printing list of: Options. 
(line 168) +* options, printing list of: Options. (line 154) +* or: Bitwise Functions. (line 49) * OR bitwise operation: Bitwise Functions. (line 6) * or Boolean-logic operator: Boolean Ops. (line 6) -* or() function (gawk): Bitwise Functions. (line 49) -* ord extension function: Extension Sample Ord. +* ord() extension function: Extension Sample Ord. (line 12) * ord() user-defined function: Ordinal Functions. (line 16) -* order of evaluation, concatenation: Concatenation. (line 42) +* order of evaluation, concatenation: Concatenation. (line 41) * ORS variable <1>: User-modified. (line 129) * ORS variable: Output Separators. (line 20) * output field separator, See OFS variable: Changing Fields. (line 64) * output record separator, See ORS variable: Output Separators. (line 20) * output redirection: Redirection. (line 6) +* output wrapper: Output Wrappers. (line 6) * output, buffering: I/O Functions. (line 29) * output, duplicating into files: Tee Program. (line 6) * output, files, closing: Close Files And Pipes. @@ -31258,45 +32192,48 @@ Index * output, standard: Special FD. (line 6) * p debugger command (alias for print): Viewing And Changing Data. (line 36) -* P1003.1 POSIX standard: Glossary. (line 462) -* parentheses () <1>: Profiling. (line 138) -* parentheses (): Regexp Operators. (line 79) +* P1003.1 POSIX standard: Glossary. (line 454) +* parent process ID of gawk process: Auto-set. (line 195) +* parentheses (), in a profile: Profiling. (line 146) +* parentheses (), regexp operator: Regexp Operators. (line 79) * password file: Passwd Functions. (line 16) -* patsplit() function (gawk): String Functions. (line 295) +* patsplit: String Functions. (line 291) * patterns: Patterns and Actions. (line 6) * patterns, comparison expressions as: Expression Patterns. (line 14) -* patterns, counts: Profiling. (line 110) +* patterns, counts, in a profile: Profiling. (line 118) * patterns, default: Very Simple. (line 34) * patterns, empty: Empty. 
(line 6) * patterns, expressions as: Regexp Patterns. (line 6) * patterns, ranges in: Ranges. (line 6) -* patterns, regexp constants as: Expression Patterns. (line 36) +* patterns, regexp constants as: Expression Patterns. (line 34) * patterns, types of: Pattern Overview. (line 15) * pawk (profiling version of Brian Kernighan's awk): Other Versions. (line 78) -* pawk, awk-like facilities for Python: Other Versions. (line 118) +* pawk, awk-like facilities for Python: Other Versions. (line 124) * PC operating systems, gawk on: PC Using. (line 6) * PC operating systems, gawk on, installing: PC Installation. (line 6) * percent sign (%), % operator: Precedence. (line 55) * percent sign (%), %= operator <1>: Precedence. (line 95) * percent sign (%), %= operator: Assignment Ops. (line 129) -* period (.): Regexp Operators. (line 43) +* period (.), regexp operator: Regexp Operators. (line 43) * Perl: Future Extensions. (line 6) -* Peters, Arno: Contributors. (line 86) -* Peterson, Hal: Contributors. (line 40) -* pipes, closing: Close Files And Pipes. +* Peters, Arno: Contributors. (line 85) +* Peterson, Hal: Contributors. (line 39) +* pipe, closing: Close Files And Pipes. (line 6) -* pipes, input: Getline/Pipe. (line 10) -* pipes, output: Redirection. (line 57) -* Pitts, Dave <1>: Bugs. (line 73) +* pipe, input: Getline/Pipe. (line 9) +* pipe, output: Redirection. (line 57) +* Pitts, Dave <1>: Bugs. (line 70) * Pitts, Dave: Acknowledgments. (line 60) -* plus sign (+): Regexp Operators. (line 102) +* Plauger, P.J.: Library Functions. (line 12) +* plug-in: Extension Intro. (line 6) * plus sign (+), + operator: Precedence. (line 52) * plus sign (+), ++ operator <1>: Precedence. (line 46) * plus sign (+), ++ operator: Increment Ops. (line 11) * plus sign (+), += operator <1>: Precedence. (line 95) * plus sign (+), += operator: Assignment Ops. (line 82) +* plus sign (+), regexp operator: Regexp Operators. (line 102) * pointers to functions: Indirect Calls. 
(line 6) * portability: Escape Sequences. (line 94) * portability, #! (executable scripts): Executable Scripts. (line 33) @@ -31308,14 +32245,14 @@ Index (line 112) * portability, close() function and: Close Files And Pipes. (line 81) -* portability, data files as single record: Records. (line 179) +* portability, data files as single record: Records. (line 194) * portability, deleting array elements: Delete. (line 56) * portability, example programs: Library Functions. (line 42) * portability, functions, defining: Definition Syntax. (line 99) * portability, gawk: New Ports. (line 6) * portability, gettext library and: Explaining gettext. (line 10) * portability, internationalization and: I18N Portability. (line 6) -* portability, length() function: String Functions. (line 177) +* portability, length() function: String Functions. (line 173) * portability, new awk vs. old awk: Conversion. (line 55) * portability, next statement in user-defined functions: Pass By Value/Reference. (line 91) @@ -31323,12 +32260,12 @@ Index * portability, operators: Increment Ops. (line 60) * portability, operators, not in POSIX awk: Precedence. (line 98) * portability, POSIXLY_CORRECT environment variable: Options. (line 353) -* portability, substr() function: String Functions. (line 514) +* portability, substr() function: String Functions. (line 510) * portable object files <1>: Translator i18n. (line 6) * portable object files: Explaining gettext. (line 36) * portable object files, converting to message object files: I18N Example. (line 62) -* portable object files, generating: Options. (line 161) +* portable object files, generating: Options. (line 147) * portable object template files: Explaining gettext. (line 30) * porting gawk: New Ports. (line 6) * positional specifiers, printf statement <1>: Printf Ordering. @@ -31353,14 +32290,14 @@ Index * POSIX awk, changes in awk versions: POSIX. (line 6) * POSIX awk, continue statement and: Continue Statement. 
(line 43) * POSIX awk, CONVFMT variable and: User-modified. (line 28) -* POSIX awk, date utility and: Time Functions. (line 262) +* POSIX awk, date utility and: Time Functions. (line 263) * POSIX awk, field separators and <1>: Field Splitting Summary. (line 40) * POSIX awk, field separators and: Fields. (line 6) * POSIX awk, FS variable and: User-modified. (line 66) * POSIX awk, function keyword in: Definition Syntax. (line 83) * POSIX awk, functions and, gsub()/sub(): Gory Details. (line 54) -* POSIX awk, functions and, length(): String Functions. (line 177) +* POSIX awk, functions and, length(): String Functions. (line 173) * POSIX awk, GNU long options and: Options. (line 15) * POSIX awk, interval expressions in: Regexp Operators. (line 135) * POSIX awk, next/nextfile statements and: Next Statement. (line 45) @@ -31371,7 +32308,7 @@ Index * POSIX awk, printf format strings and: Format Modifiers. (line 159) * POSIX awk, regular expressions and: Regexp Operators. (line 161) * POSIX awk, timestamps and: Time Functions. (line 6) -* POSIX awk, | I/O operator and: Getline/Pipe. (line 56) +* POSIX awk, | I/O operator and: Getline/Pipe. (line 55) * POSIX mode: Options. (line 247) * POSIX, awk and: Preface. (line 23) * POSIX, gawk extensions not included in: POSIX/GNU. (line 6) @@ -31393,6 +32330,8 @@ Index * print statement, See Also redirection, of output: Redirection. (line 17) * print statement, sprintf() function and: Round Function. (line 6) +* print variables, in debugger: Viewing And Changing Data. + (line 36) * printf debugger command: Viewing And Changing Data. (line 54) * printf statement <1>: Printf. (line 6) @@ -31412,22 +32351,30 @@ Index * printf statement, sprintf() function and: Round Function. (line 6) * printf statement, syntax of: Basic Printf. (line 6) * printing: Printing. (line 6) -* printing, list of options: Options. (line 168) +* printing messages from extensions: Printing Messages. (line 6) +* printing, list of options: Options. 
(line 154) * printing, mailing labels: Labels Program. (line 6) * printing, unduplicated lines of text: Uniq Program. (line 6) * printing, user information: Id Program. (line 6) * private variables: Library Names. (line 11) +* process group idIDof gawk process: Auto-set. (line 189) +* process ID of gawk process: Auto-set. (line 192) * processes, two-way communications with: Two-way I/O. (line 23) * processing data: Basic High Level. (line 6) -* PROCINFO array <1>: Two-way I/O. (line 116) -* PROCINFO array <2>: Id Program. (line 15) -* PROCINFO array <3>: Group Functions. (line 6) -* PROCINFO array <4>: Passwd Functions. (line 6) -* PROCINFO array <5>: Time Functions. (line 47) -* PROCINFO array <6>: Auto-set. (line 133) -* PROCINFO array: Obsolete. (line 11) +* PROCINFO array <1>: Passwd Functions. (line 6) +* PROCINFO array <2>: Time Functions. (line 47) +* PROCINFO array: Auto-set. (line 142) +* PROCINFO array, and communications via ptys: Two-way I/O. (line 116) +* PROCINFO array, and group membership: Group Functions. (line 6) +* PROCINFO array, and user and group ID numbers: Id Program. (line 15) +* PROCINFO array, testing the field splitting: Passwd Functions. + (line 161) +* PROCINFO array, uses: Auto-set. (line 248) +* PROCINFO, values of sorted_in: Controlling Scanning. + (line 24) * profiling awk programs: Profiling. (line 6) -* profiling awk programs, dynamically: Profiling. (line 172) +* profiling awk programs, dynamically: Profiling. (line 179) +* program identifiers: Auto-set. (line 160) * program, definition of: Getting Started. (line 21) * programmers, attractiveness of: Two-way I/O. (line 6) * programming conventions, --non-decimal-data option: Nondecimal Data. @@ -31445,34 +32392,34 @@ Index * programming conventions, private variable names: Library Names. (line 23) * programming language, recipe for: History. (line 6) -* Programming languages, Ada: Glossary. (line 20) +* programming languages, Ada: Glossary. 
(line 20) * programming languages, data-driven vs. procedural: Getting Started. (line 12) -* Programming languages, Java: Glossary. (line 388) +* programming languages, Java: Glossary. (line 380) * programming, basic steps: Basic High Level. (line 20) * programming, concepts: Basic Concepts. (line 6) * pwcat program: Passwd Functions. (line 23) * q debugger command (alias for quit): Miscellaneous Debugger Commands. (line 99) -* QSE Awk: Other Versions. (line 124) -* question mark (?) regexp operator <1>: GNU Regexp Operators. - (line 59) -* question mark (?) regexp operator: Regexp Operators. (line 111) +* QSE Awk: Other Versions. (line 130) +* Quanstrom, Erik: Alarm Program. (line 8) * question mark (?), ?: operator: Precedence. (line 92) -* QuikTrim Awk: Other Versions. (line 128) +* question mark (?), regexp operator <1>: GNU Regexp Operators. + (line 59) +* question mark (?), regexp operator: Regexp Operators. (line 111) +* QuikTrim Awk: Other Versions. (line 134) * quit debugger command: Miscellaneous Debugger Commands. (line 99) -* QUIT signal (MS-Windows): Profiling. (line 207) -* quoting <1>: Comments. (line 27) -* quoting <2>: Long. (line 26) -* quoting: Read Terminal. (line 25) -* quoting, rules for: Quoting. (line 6) -* quoting, tricks for: Quoting. (line 71) +* QUIT signal (MS-Windows): Profiling. (line 214) +* quoting in gawk command lines: Long. (line 26) +* quoting in gawk command lines, tricks for: Quoting. (line 71) +* quoting, for small awk programs: Comments. (line 27) * r debugger command (alias for run): Debugger Execution Control. (line 62) * Rakitzis, Byron: History Sorting. (line 25) +* Ramey, Chet <1>: General Data Types. (line 6) * Ramey, Chet: Acknowledgments. (line 60) -* rand() function: Numeric Functions. (line 34) +* rand: Numeric Functions. (line 34) * random numbers, Cliff: Cliff Random Function. (line 6) * random numbers, rand()/srand() functions: Numeric Functions. 
@@ -31480,55 +32427,61 @@ Index * random numbers, seed of: Numeric Functions. (line 64) * range expressions (regexps): Bracket Expressions. (line 6) * range patterns: Ranges. (line 6) -* Rankin, Pat <1>: Bugs. (line 72) -* Rankin, Pat <2>: Contributors. (line 38) +* range patterns, line continuation and: Ranges. (line 65) +* Rankin, Pat <1>: Bugs. (line 70) +* Rankin, Pat <2>: Contributors. (line 37) * Rankin, Pat <3>: Assignment Ops. (line 100) * Rankin, Pat: Acknowledgments. (line 60) -* reada extension function: Extension Sample Read write array. +* reada() extension function: Extension Sample Read write array. (line 15) * readable data files, checking: File Checking. (line 6) * readable.awk program: File Checking. (line 11) * readdir extension: Extension Sample Readdir. (line 9) -* readfile extension function: Extension Sample Readfile. - (line 11) +* readfile() extension function: Extension Sample Readfile. + (line 12) +* readfile() user-defined function: Readfile Function. (line 30) +* reading input files: Reading Files. (line 6) * recipe for a programming language: History. (line 6) * record separators <1>: User-modified. (line 143) * record separators: Records. (line 14) -* record separators, changing: Records. (line 81) -* record separators, regular expressions as: Records. (line 117) +* record separators, changing: Records. (line 93) +* record separators, regular expressions as: Records. (line 132) * record separators, with multiline records: Multiple Line. (line 10) * records <1>: Basic High Level. (line 73) * records: Reading Files. (line 14) * records, multiline: Multiple Line. (line 6) * records, printing: Print. (line 22) * records, splitting input into: Records. (line 6) -* records, terminating: Records. (line 117) -* records, treating files as: Records. (line 200) +* records, terminating: Records. (line 132) +* records, treating files as: Records. (line 219) * recursive functions: Definition Syntax. 
(line 73) +* redirect gawk output, in debugger: Debugger Info. (line 72) * redirection of input: Getline/File. (line 6) * redirection of output: Redirection. (line 6) * reference counting, sorting arrays: Array Sorting Functions. - (line 75) + (line 77) +* regexp: Regexp. (line 6) * regexp constants <1>: Comparison Operators. - (line 103) + (line 102) * regexp constants <2>: Regexp Constants. (line 6) * regexp constants: Regexp Usage. (line 57) * regexp constants, /=.../, /= operator and: Assignment Ops. (line 147) -* regexp constants, as patterns: Expression Patterns. (line 36) +* regexp constants, as patterns: Expression Patterns. (line 34) * regexp constants, in gawk: Using Constant Regexps. (line 28) * regexp constants, slashes vs. quotes: Computed Regexps. (line 28) * regexp constants, vs. string constants: Computed Regexps. (line 38) -* regexp, See regular expressions: Regexp. (line 6) +* register extension: Registration Functions. + (line 6) * regular expressions: Regexp. (line 6) -* regular expressions as field separators: Field Separators. (line 50) +* regular expressions as field separators: Field Separators. (line 51) * regular expressions, anchors in: Regexp Operators. (line 22) * regular expressions, as field separators: Regexp Field Splitting. (line 6) * regular expressions, as patterns <1>: Regexp Patterns. (line 6) * regular expressions, as patterns: Regexp Usage. (line 6) -* regular expressions, as record separators: Records. (line 117) +* regular expressions, as record separators: Records. (line 132) * regular expressions, case sensitivity <1>: User-modified. (line 82) * regular expressions, case sensitivity: Case-sensitivity. (line 6) * regular expressions, computed: Computed Regexps. (line 6) @@ -31555,12 +32508,13 @@ Index * regular expressions, searching for: Egrep Program. (line 6) * relational operators, See comparison operators: Typing and Comparison. (line 9) +* replace in string: String Functions. 
(line 406) * return debugger command: Debugger Execution Control. (line 54) * return statement, user-defined functions: Return Statement. (line 6) * return value, close() function: Close Files And Pipes. (line 130) -* rev() user-defined function: Function Example. (line 52) +* rev() user-defined function: Function Example. (line 53) * revoutput extension: Extension Sample Revout. (line 11) * revtwoway extension: Extension Sample Rev2way. @@ -31575,25 +32529,28 @@ Index (line 11) * right angle bracket (>), >> operator (I/O) <1>: Precedence. (line 65) * right angle bracket (>), >> operator (I/O): Redirection. (line 50) +* right shift: Bitwise Functions. (line 52) * right shift, bitwise: Bitwise Functions. (line 32) * Ritchie, Dennis: Basic Data Typing. (line 55) -* RLENGTH variable: Auto-set. (line 253) -* RLENGTH variable, match() function and: String Functions. (line 225) +* RLENGTH variable: Auto-set. (line 262) +* RLENGTH variable, match() function and: String Functions. (line 221) * Robbins, Arnold <1>: Future Extensions. (line 6) * Robbins, Arnold <2>: Bugs. (line 32) -* Robbins, Arnold <3>: Contributors. (line 125) -* Robbins, Arnold <4>: Alarm Program. (line 6) -* Robbins, Arnold <5>: Passwd Functions. (line 90) -* Robbins, Arnold <6>: Getline/Pipe. (line 40) +* Robbins, Arnold <3>: Contributors. (line 139) +* Robbins, Arnold <4>: General Data Types. (line 6) +* Robbins, Arnold <5>: Alarm Program. (line 6) +* Robbins, Arnold <6>: Passwd Functions. (line 90) +* Robbins, Arnold <7>: Getline/Pipe. (line 39) * Robbins, Arnold: Command Line Field Separator. - (line 80) -* Robbins, Bill: Getline/Pipe. (line 40) + (line 73) +* Robbins, Bill: Getline/Pipe. (line 39) * Robbins, Harry: Acknowledgments. (line 78) * Robbins, Jean: Acknowledgments. (line 78) * Robbins, Miriam <1>: Passwd Functions. (line 90) -* Robbins, Miriam <2>: Getline/Pipe. (line 40) +* Robbins, Miriam <2>: Getline/Pipe. (line 39) * Robbins, Miriam: Acknowledgments. 
(line 78) -* Rommel, Kai Uwe: Contributors. (line 43) +* Rommel, Kai Uwe: Contributors. (line 42) +* round to nearest integer: Numeric Functions. (line 23) * round() user-defined function: Round Function. (line 16) * rounding mode, floating-point: Rounding Mode. (line 6) * rounding numbers: Round Function. (line 6) @@ -31603,15 +32560,13 @@ Index * RS variable <1>: User-modified. (line 143) * RS variable: Records. (line 20) * RS variable, multiline records and: Multiple Line. (line 17) -* rshift() function (gawk): Bitwise Functions. (line 52) -* RSTART variable: Auto-set. (line 259) -* RSTART variable, match() function and: String Functions. (line 225) -* RT variable <1>: Auto-set. (line 266) -* RT variable <2>: Getline/Variable/File. - (line 10) -* RT variable <3>: Multiple Line. (line 129) -* RT variable: Records. (line 117) -* Rubin, Paul <1>: Contributors. (line 16) +* rshift: Bitwise Functions. (line 52) +* RSTART variable: Auto-set. (line 268) +* RSTART variable, match() function and: String Functions. (line 221) +* RT variable <1>: Auto-set. (line 275) +* RT variable <2>: Multiple Line. (line 129) +* RT variable: Records. (line 132) +* Rubin, Paul <1>: Contributors. (line 15) * Rubin, Paul: History. (line 30) * rule, definition of: Getting Started. (line 21) * run debugger command: Debugger Execution Control. @@ -31619,59 +32574,84 @@ Index * rvalues/lvalues: Assignment Ops. (line 32) * s debugger command (alias for step): Debugger Execution Control. (line 68) +* sample debugging session: Sample Debugging Session. + (line 6) * sandbox mode: Options. (line 279) +* save debugger options: Debugger Info. (line 84) +* scalar or array: Type Functions. (line 11) * scalar values: Basic Data Typing. (line 13) -* Schorr, Andrew <1>: Contributors. (line 121) +* scanning arrays: Scanning an Array. (line 6) +* scanning multidimensional arrays: Multiscanning. (line 11) +* Schorr, Andrew <1>: Contributors. (line 131) * Schorr, Andrew: Acknowledgments. 
(line 60) * Schreiber, Bert: Acknowledgments. (line 38) * Schreiber, Rita: Acknowledgments. (line 38) -* search paths <1>: VMS Running. (line 29) -* search paths <2>: PC Using. (line 11) -* search paths <3>: Igawk Program. (line 368) -* search paths <4>: AWKLIBPATH Variable. (line 6) -* search paths: AWKPATH Variable. (line 6) +* search and replace in strings: String Functions. (line 82) +* search in string: String Functions. (line 151) +* search paths <1>: VMS Running. (line 58) +* search paths <2>: PC Using. (line 10) +* search paths: Igawk Program. (line 368) * search paths, for shared libraries: AWKLIBPATH Variable. (line 6) -* search paths, for source files <1>: VMS Running. (line 29) -* search paths, for source files <2>: PC Using. (line 11) +* search paths, for source files <1>: VMS Running. (line 58) +* search paths, for source files <2>: PC Using. (line 10) * search paths, for source files <3>: Igawk Program. (line 368) * search paths, for source files: AWKPATH Variable. (line 6) -* searching: String Functions. (line 155) * searching, files for regular expressions: Egrep Program. (line 6) * searching, for words: Dupword Program. (line 6) * sed utility <1>: Glossary. (line 12) * sed utility <2>: Simple Sed. (line 6) * sed utility: Field Splitting Summary. (line 46) -* semicolon (;): Statements/Lines. (line 91) -* semicolon (;), AWKPATH variable and: PC Using. (line 11) +* seeding random number generator: Numeric Functions. (line 64) +* semicolon (;), AWKPATH variable and: PC Using. (line 10) * semicolon (;), separating statements in actions <1>: Statements. (line 10) -* semicolon (;), separating statements in actions: Action Overview. +* semicolon (;), separating statements in actions <2>: Action Overview. (line 19) +* semicolon (;), separating statements in actions: Statements/Lines. + (line 91) * separators, field: User-modified. (line 56) * separators, field, FIELDWIDTHS variable and: User-modified. 
(line 35) * separators, field, FPAT variable and: User-modified. (line 45) * separators, field, POSIX and: Fields. (line 6) * separators, for records <1>: User-modified. (line 143) * separators, for records: Records. (line 14) -* separators, for records, regular expressions as: Records. (line 117) +* separators, for records, regular expressions as: Records. (line 132) * separators, for statements in actions: Action Overview. (line 19) * separators, subscript: User-modified. (line 156) +* set breakpoint: Breakpoint Control. (line 11) * set debugger command: Viewing And Changing Data. (line 59) +* set directory of message catalogs: I18N Functions. (line 12) +* set watchpoint: Viewing And Changing Data. + (line 67) +* setting rounding mode: Setting Rounding Mode. + (line 6) +* setting working precision: Setting Precision. (line 6) +* shadowing of variable values: Definition Syntax. (line 61) +* shell quoting, double quote: Read Terminal. (line 25) +* shell quoting, rules for: Quoting. (line 6) * shells, piping commands into: Redirection. (line 142) * shells, quoting: Using Shell Variables. (line 12) * shells, quoting, rules for: Quoting. (line 18) * shells, scripts: One-shot. (line 22) +* shells, sea: Undocumented. (line 8) * shells, variables: Using Shell Variables. (line 6) * shift, bitwise: Bitwise Functions. (line 32) * short-circuit operators: Boolean Ops. (line 57) +* show all source files, in debugger: Debugger Info. (line 45) +* show breakpoints: Debugger Info. (line 21) +* show function arguments, in debugger: Debugger Info. (line 18) +* show local variables, in debugger: Debugger Info. (line 34) +* show name of current source file, in debugger: Debugger Info. + (line 37) +* show watchpoints: Debugger Info. (line 51) * si debugger command (alias for stepi): Debugger Execution Control. (line 76) * side effects <1>: Increment Ops. (line 11) -* side effects: Concatenation. (line 42) +* side effects: Concatenation. 
(line 41) * side effects, array indexing: Reference to Elements. (line 42) * side effects, asort() function: Array Sorting Functions. @@ -31689,7 +32669,7 @@ Index (line 110) * sidebar, Changing FS Does Not Affect the Fields: Field Splitting Summary. (line 38) -* sidebar, Changing NR and FNR: Auto-set. (line 312) +* sidebar, Changing NR and FNR: Auto-set. (line 321) * sidebar, Controlling Output Buffering with system(): I/O Functions. (line 135) * sidebar, Escape Sequences for Metacharacters: Escape Sequences. @@ -31703,7 +32683,7 @@ Index * sidebar, Piping into sh: Redirection. (line 140) * sidebar, Portability Issues with #!: Executable Scripts. (line 31) * sidebar, Recipe For A Programming Language: History. (line 6) -* sidebar, RS = "\0" Is Not Portable: Records. (line 177) +* sidebar, RS = "\0" Is Not Portable: Records. (line 192) * sidebar, So Why Does gawk have BEGINFILE and ENDFILE?: Filetrans Function. (line 83) * sidebar, Syntactic Ambiguities Between /= and Regular Expressions: Assignment Ops. @@ -31713,30 +32693,36 @@ Index (line 56) * sidebar, Using close()'s Return Value: Close Files And Pipes. (line 128) -* SIGHUP signal: Profiling. (line 204) -* SIGINT signal (MS-Windows): Profiling. (line 207) -* signals, HUP/SIGHUP: Profiling. (line 204) -* signals, INT/SIGINT (MS-Windows): Profiling. (line 207) -* signals, QUIT/SIGQUIT (MS-Windows): Profiling. (line 207) -* signals, USR1/SIGUSR1: Profiling. (line 181) -* SIGQUIT signal (MS-Windows): Profiling. (line 207) -* SIGUSR1 signal: Profiling. (line 181) +* SIGHUP signal, for dynamic profiling: Profiling. (line 211) +* SIGINT signal (MS-Windows): Profiling. (line 214) +* signals, HUP/SIGHUP, for profiling: Profiling. (line 211) +* signals, INT/SIGINT (MS-Windows): Profiling. (line 214) +* signals, QUIT/SIGQUIT (MS-Windows): Profiling. (line 214) +* signals, USR1/SIGUSR1, for profiling: Profiling. (line 188) +* signature program: Signature Program. (line 6) +* SIGQUIT signal (MS-Windows): Profiling. 
(line 214) +* SIGUSR1 signal, for dynamic profiling: Profiling. (line 188) * silent debugger command: Debugger Execution Control. (line 10) -* sin() function: Numeric Functions. (line 75) +* sin: Numeric Functions. (line 75) +* sine: Numeric Functions. (line 75) * single precision floating-point: General Arithmetic. (line 21) -* single quote (') <1>: Quoting. (line 31) -* single quote (') <2>: Long. (line 33) * single quote ('): One-shot. (line 15) +* single quote (') in gawk command lines: Long. (line 33) +* single quote ('), in shell commands: Quoting. (line 31) * single quote ('), vs. apostrophe: Comments. (line 27) * single quote ('), with double quotes: Quoting. (line 53) * single-character fields: Single Character Fields. (line 6) +* single-step execution, in the debugger: Debugger Execution Control. + (line 43) * Skywalker, Luke: Undocumented. (line 6) -* sleep extension function: Extension Sample Time. +* sleep utility: Alarm Program. (line 111) +* sleep() extension function: Extension Sample Time. (line 23) -* sleep utility: Alarm Program. (line 109) * Solaris, POSIX-compliant awk: Other Versions. (line 96) +* sort array: String Functions. (line 32) +* sort array indices: String Functions. (line 32) * sort function, arrays, sorting: Array Sorting Functions. (line 6) * sort utility: Word Sorting. (line 50) @@ -31747,38 +32733,44 @@ Index * source code, Brian Kernighan's awk: Other Versions. (line 13) * source code, Busybox Awk: Other Versions. (line 88) * source code, gawk: Gawk Distribution. (line 6) -* source code, jawk: Other Versions. (line 106) -* source code, libmawk: Other Versions. (line 114) +* source code, Illumos awk: Other Versions. (line 104) +* source code, jawk: Other Versions. (line 112) +* source code, libmawk: Other Versions. (line 120) * source code, mawk: Other Versions. (line 44) -* source code, mixing: Options. (line 131) +* source code, mixing: Options. (line 117) * source code, pawk: Other Versions. 
(line 78) -* source code, QSE Awk: Other Versions. (line 124) -* source code, QuikTrim Awk: Other Versions. (line 128) +* source code, pawk (Python version): Other Versions. (line 124) +* source code, QSE Awk: Other Versions. (line 130) +* source code, QuikTrim Awk: Other Versions. (line 134) * source code, Solaris awk: Other Versions. (line 96) * source files, search path for: Igawk Program. (line 368) -* sparse arrays: Array Intro. (line 71) +* sparse arrays: Array Intro. (line 70) * Spencer, Henry: Glossary. (line 12) +* split: String Functions. (line 313) +* split string into array: String Functions. (line 291) * split utility: Split Program. (line 6) -* split() function: String Functions. (line 317) * split() function, array elements, deleting: Delete. (line 61) * split.awk program: Split Program. (line 30) -* sprintf() function <1>: String Functions. (line 382) -* sprintf() function: OFMT. (line 15) +* sprintf <1>: String Functions. (line 378) +* sprintf: OFMT. (line 15) * sprintf() function, OFMT variable and: User-modified. (line 124) * sprintf() function, print/printf statements and: Round Function. (line 6) -* sqrt() function: Numeric Functions. (line 78) -* square brackets ([]): Regexp Operators. (line 55) -* srand() function: Numeric Functions. (line 82) -* Stallman, Richard <1>: Glossary. (line 305) -* Stallman, Richard <2>: Contributors. (line 24) +* sqrt: Numeric Functions. (line 78) +* square brackets ([]), regexp operator: Regexp Operators. (line 55) +* square root: Numeric Functions. (line 78) +* srand: Numeric Functions. (line 82) +* stack frame: Debugging Terms. (line 10) +* Stallman, Richard <1>: Glossary. (line 297) +* Stallman, Richard <2>: Contributors. (line 23) * Stallman, Richard <3>: Acknowledgments. (line 18) * Stallman, Richard: Manual History. (line 6) * standard error: Special FD. (line 6) * standard input <1>: Special FD. (line 6) * standard input: Read Terminal. (line 6) * standard output: Special FD. 
(line 6) -* stat extension function: Extension Sample File Functions. +* starting the debugger: Debugger Invocation. (line 6) +* stat() extension function: Extension Sample File Functions. (line 18) * statements, compound, control statements and: Statements. (line 10) * statements, control, in actions: Statements. (line 6) @@ -31787,55 +32779,65 @@ Index (line 68) * stepi debugger command: Debugger Execution Control. (line 76) +* stop automatic display, in debugger: Viewing And Changing Data. + (line 80) * stream editors <1>: Simple Sed. (line 6) * stream editors: Field Splitting Summary. (line 46) -* strftime() function (gawk): Time Functions. (line 48) +* strftime: Time Functions. (line 48) * string constants: Scalar Constants. (line 15) * string constants, vs. regexp constants: Computed Regexps. (line 38) * string extraction (internationalization): String Extraction. (line 6) -* string operators: Concatenation. (line 9) +* string length: String Functions. (line 164) +* string operators: Concatenation. (line 8) +* string, regular expression match: String Functions. (line 204) +* string-manipulation functions: String Functions. (line 6) * string-matching operators: Regexp Usage. (line 19) +* string-translation functions: I18N Functions. (line 6) +* strings splitting, example: String Functions. (line 333) * strings, converting <1>: Bitwise Functions. (line 109) * strings, converting: Conversion. (line 6) +* strings, converting letter case: String Functions. (line 520) * strings, converting, numbers to: User-modified. (line 28) -* strings, empty, See null strings: Records. (line 107) +* strings, empty, See null strings: Records. (line 122) * strings, extracting: String Extraction. (line 6) * strings, for localization: Programmer i18n. (line 14) -* strings, length of: Scalar Constants. (line 20) +* strings, length limitations: Scalar Constants. (line 20) * strings, merging arrays into: Join Function. (line 6) * strings, null: Regexp Field Splitting. 
(line 43) * strings, numeric: Variable Typing. (line 6) -* strings, splitting: String Functions. (line 337) -* strtonum() function (gawk): String Functions. (line 389) +* strtonum: String Functions. (line 385) * strtonum() function (gawk), --non-decimal-data option and: Nondecimal Data. (line 36) -* sub() function <1>: String Functions. (line 410) -* sub() function: Using Constant Regexps. +* sub <1>: String Functions. (line 406) +* sub: Using Constant Regexps. (line 43) -* sub() function, arguments of: String Functions. (line 464) +* sub() function, arguments of: String Functions. (line 460) * sub() function, escape processing: Gory Details. (line 6) * subscript separators: User-modified. (line 156) -* subscripts in arrays, multidimensional: Multi-dimensional. (line 10) -* subscripts in arrays, multidimensional, scanning: Multi-scanning. +* subscripts in arrays, multidimensional: Multidimensional. (line 10) +* subscripts in arrays, multidimensional, scanning: Multiscanning. (line 11) * subscripts in arrays, numbers as: Numeric Array Subscripts. (line 6) * subscripts in arrays, uninitialized variables as: Uninitialized Subscripts. (line 6) * SUBSEP variable: User-modified. (line 156) -* SUBSEP variable, multidimensional arrays: Multi-dimensional. +* SUBSEP variable, and multidimensional arrays: Multidimensional. (line 16) -* substr() function: String Functions. (line 483) +* substitute in string: String Functions. (line 82) +* substr: String Functions. (line 479) +* substring: String Functions. (line 479) * Sumner, Andrew: Other Versions. (line 64) +* supplementary groups of gawk process: Auto-set. (line 243) * switch statement: Switch Statement. (line 6) -* SYMTAB array: Auto-set. (line 274) +* SYMTAB array: Auto-set. (line 283) * syntactic ambiguity: /= operator vs. /=.../ regexp constant: Assignment Ops. (line 147) -* system() function: I/O Functions. (line 72) -* systime() function (gawk): Time Functions. (line 65) +* system: I/O Functions. 
(line 72) +* systime: Time Functions. (line 66) * t debugger command (alias for tbreak): Breakpoint Control. (line 90) * tbreak debugger command: Breakpoint Control. (line 90) * Tcl: Library Names. (line 57) @@ -31843,17 +32845,17 @@ Index * TCP/IP, support for: Special Network. (line 6) * tee utility: Tee Program. (line 6) * tee.awk program: Tee Program. (line 26) -* terminating records: Records. (line 117) +* temporary breakpoint: Breakpoint Control. (line 90) +* terminating records: Records. (line 132) * testbits.awk program: Bitwise Functions. (line 70) * testext extension: Extension Sample API Tests. (line 6) * Texinfo <1>: Adding Code. (line 99) * Texinfo <2>: Distribution contents. - (line 80) + (line 77) * Texinfo <3>: Extract Program. (line 12) * Texinfo <4>: Dupword Program. (line 17) * Texinfo <5>: Library Functions. (line 33) -* Texinfo <6>: Sample Data Files. (line 66) * Texinfo: Conventions. (line 6) * Texinfo, chapter beginnings in files: Regexp Operators. (line 22) * Texinfo, extracting programs from source files: Extract Program. @@ -31873,31 +32875,35 @@ Index * tilde (~), ~ operator <5>: Computed Regexps. (line 6) * tilde (~), ~ operator <6>: Case-sensitivity. (line 26) * tilde (~), ~ operator: Regexp Usage. (line 19) -* time, alarm clock example program: Alarm Program. (line 9) +* time functions: Time Functions. (line 6) +* time, alarm clock example program: Alarm Program. (line 11) * time, localization and: Explaining gettext. (line 115) * time, managing: Getlocaltime Function. (line 6) * time, retrieving: Time Functions. (line 17) * timeout, reading input: Read Timeout. (line 6) * timestamps: Time Functions. (line 6) -* timestamps, converting dates to: Time Functions. (line 75) +* timestamps, converting dates to: Time Functions. (line 76) * timestamps, formatted: Getlocaltime Function. (line 6) -* tolower() function: String Functions. (line 525) -* toupper() function: String Functions. (line 531) +* tolower: String Functions. 
(line 521) +* toupper: String Functions. (line 527) * tr utility: Translate Program. (line 6) * trace debugger command: Miscellaneous Debugger Commands. (line 108) +* traceback, display in debugger: Execution Stack. (line 13) +* translate string: I18N Functions. (line 22) * translate.awk program: Translate Program. (line 55) +* treating files, as single records: Records. (line 219) * troubleshooting, --non-decimal-data option: Options. (line 207) * troubleshooting, == operator: Comparison Operators. (line 37) -* troubleshooting, awk uses FS not IFS: Field Separators. (line 29) +* troubleshooting, awk uses FS not IFS: Field Separators. (line 30) * troubleshooting, backslash before nonspecial character: Escape Sequences. (line 112) * troubleshooting, division: Arithmetic Ops. (line 44) * troubleshooting, fatal errors, field widths, specifying: Constant Size. - (line 22) + (line 23) * troubleshooting, fatal errors, printf format strings: Format Modifiers. (line 159) * troubleshooting, fflush() function: I/O Functions. (line 60) @@ -31907,9 +32913,9 @@ Index * troubleshooting, gawk, fatal errors, function arguments: Calling Built-in. (line 16) * troubleshooting, getline function: File Checking. (line 25) -* troubleshooting, gsub()/sub() functions: String Functions. (line 474) -* troubleshooting, match() function: String Functions. (line 290) -* troubleshooting, patsplit() function: String Functions. (line 313) +* troubleshooting, gsub()/sub() functions: String Functions. (line 470) +* troubleshooting, match() function: String Functions. (line 286) +* troubleshooting, patsplit() function: String Functions. (line 309) * troubleshooting, print statement, omitting commas: Print Examples. (line 31) * troubleshooting, printing: Redirection. (line 118) @@ -31917,13 +32923,13 @@ Index * troubleshooting, readable data files: File Checking. (line 6) * troubleshooting, regexp constants vs. string constants: Computed Regexps. 
(line 38) -* troubleshooting, string concatenation: Concatenation. (line 27) -* troubleshooting, substr() function: String Functions. (line 501) +* troubleshooting, string concatenation: Concatenation. (line 26) +* troubleshooting, substr() function: String Functions. (line 497) * troubleshooting, system() function: I/O Functions. (line 94) * troubleshooting, typographical errors, global variables: Options. - (line 112) + (line 98) * true, logical: Truth Values. (line 6) -* Trueman, David <1>: Contributors. (line 31) +* Trueman, David <1>: Contributors. (line 30) * Trueman, David <2>: Acknowledgments. (line 47) * Trueman, David: History. (line 30) * trunc-mod operation: Arithmetic Ops. (line 66) @@ -31931,6 +32937,8 @@ Index * type conversion: Conversion. (line 21) * u debugger command (alias for until): Debugger Execution Control. (line 83) +* unassigned array elements: Reference to Elements. + (line 18) * undefined functions: Pass By Value/Reference. (line 71) * underscore (_), C macro: Explaining gettext. (line 70) @@ -31940,20 +32948,22 @@ Index * undisplay debugger command: Viewing And Changing Data. (line 80) * undocumented features: Undocumented. (line 6) -* Unicode: Glossary. (line 141) +* Unicode <1>: Glossary. (line 133) +* Unicode <2>: Ranges and Locales. (line 61) +* Unicode: Ordinal Functions. (line 45) * uninitialized variables, as array subscripts: Uninitialized Subscripts. (line 6) * uniq utility: Uniq Program. (line 6) * uniq.awk program: Uniq Program. (line 65) -* Unix: Glossary. (line 624) +* Unix: Glossary. (line 616) * Unix awk, backslashes in escape sequences: Escape Sequences. (line 124) * Unix awk, close() function and: Close Files And Pipes. (line 130) * Unix awk, password files, field separators and: Command Line Field Separator. - (line 72) + (line 64) * Unix, awk scripts and: Executable Scripts. (line 6) -* UNIXROOT variable, on OS/2 systems: PC Using. (line 17) +* UNIXROOT variable, on OS/2 systems: PC Using. 
(line 16) * unsigned integers: General Arithmetic. (line 15) * until debugger command: Debugger Execution Control. (line 83) @@ -31961,15 +32971,16 @@ Index (line 84) * up debugger command: Execution Stack. (line 33) * user database, reading: Passwd Functions. (line 6) -* user-defined, functions: User-defined. (line 6) -* user-defined, functions, counts: Profiling. (line 129) +* user-defined functions: User-defined. (line 6) +* user-defined, functions, counts, in a profile: Profiling. (line 137) * user-defined, variables: Variables. (line 6) * user-modifiable variables: User-modified. (line 6) * users, information about, printing: Id Program. (line 6) * users, information about, retrieving: Passwd Functions. (line 16) -* USR1 signal: Profiling. (line 181) +* USR1 signal, for dynamic profiling: Profiling. (line 188) * values, numeric: Basic Data Typing. (line 13) * values, string: Basic Data Typing. (line 13) +* variable assignments and input files: Other Arguments. (line 19) * variable typing: Typing and Comparison. (line 9) * variables <1>: Basic Data Typing. (line 6) @@ -31977,7 +32988,7 @@ Index * variables, assigning on command line: Assignment Options. (line 6) * variables, built-in <1>: Built-in Variables. (line 6) * variables, built-in: Using Variables. (line 20) -* variables, built-in, -v option, setting with: Options. (line 54) +* variables, built-in, -v option, setting with: Options. (line 40) * variables, built-in, conveying information: Auto-set. (line 6) * variables, flag: Boolean Ops. (line 67) * variables, getline command into, using <1>: Getline/Variable/Coprocess. @@ -31988,12 +32999,12 @@ Index (line 6) * variables, getline command into, using: Getline/Variable. (line 6) * variables, global, for library functions: Library Names. (line 11) -* variables, global, printing list of: Options. (line 107) +* variables, global, printing list of: Options. (line 93) * variables, initializing: Using Variables. (line 20) -* variables, local: Variable Scope. 
(line 6) +* variables, local to a function: Variable Scope. (line 6) * variables, names of: Arrays. (line 18) * variables, private: Library Names. (line 11) -* variables, setting: Options. (line 46) +* variables, setting: Options. (line 32) * variables, shadowing: Definition Syntax. (line 61) * variables, types of: Assignment Ops. (line 40) * variables, types of, comparison expressions and: Typing and Comparison. @@ -32001,9 +33012,13 @@ Index * variables, uninitialized, as array subscripts: Uninitialized Subscripts. (line 6) * variables, user-defined: Variables. (line 6) +* version of gawk: Auto-set. (line 213) +* version of gawk extension API: Auto-set. (line 238) +* version of GNU MP library: Auto-set. (line 224) +* version of GNU MPFR library: Auto-set. (line 220) * vertical bar (|): Regexp Operators. (line 69) * vertical bar (|), | operator (I/O) <1>: Precedence. (line 65) -* vertical bar (|), | operator (I/O): Getline/Pipe. (line 10) +* vertical bar (|), | operator (I/O): Getline/Pipe. (line 9) * vertical bar (|), |& operator (I/O) <1>: Two-way I/O. (line 44) * vertical bar (|), |& operator (I/O) <2>: Precedence. (line 65) * vertical bar (|), |& operator (I/O): Getline/Coprocess. (line 6) @@ -32012,31 +33027,32 @@ Index * Vinschen, Corinna: Acknowledgments. (line 60) * w debugger command (alias for watch): Viewing And Changing Data. (line 67) -* w utility: Constant Size. (line 22) -* wait extension function: Extension Sample Fork. +* w utility: Constant Size. (line 23) +* wait() extension function: Extension Sample Fork. (line 22) -* waitpid extension function: Extension Sample Fork. +* waitpid() extension function: Extension Sample Fork. (line 18) * walk_array() user-defined function: Walking Arrays. (line 14) * Wall, Larry <1>: Future Extensions. (line 6) * Wall, Larry: Array Intro. (line 6) -* Wallin, Anders: Acknowledgments. (line 60) +* Wallin, Anders: Contributors. (line 103) * warnings, issuing: Options. 
(line 182) * watch debugger command: Viewing And Changing Data. (line 67) +* watchpoint: Debugging Terms. (line 42) * wc utility: Wc Program. (line 6) * wc.awk program: Wc Program. (line 46) -* Weinberger, Peter <1>: Contributors. (line 12) +* Weinberger, Peter <1>: Contributors. (line 11) * Weinberger, Peter: History. (line 17) -* while statement <1>: While Statement. (line 6) -* while statement: Regexp Usage. (line 19) +* while statement: While Statement. (line 6) +* while statement, use of regexps in: Regexp Usage. (line 19) * whitespace, as field separators: Default Field Splitting. (line 6) * whitespace, functions, calling: Calling Built-in. (line 10) * whitespace, newlines as: Options. (line 253) -* Williams, Kent: Contributors. (line 35) -* Woehlke, Matthew: Contributors. (line 80) -* Woods, John: Contributors. (line 28) +* Williams, Kent: Contributors. (line 34) +* Woehlke, Matthew: Contributors. (line 79) +* Woods, John: Contributors. (line 27) * word boundaries, matching: GNU Regexp Operators. (line 38) * word, regexp definition of: GNU Regexp Operators. @@ -32047,25 +33063,25 @@ Index * words, counting: Wc Program. (line 6) * words, duplicate, searching for: Dupword Program. (line 6) * words, usage counts, generating: Word Sorting. (line 6) -* writea extension function: Extension Sample Read write array. +* writea() extension function: Extension Sample Read write array. (line 9) * xgettext utility: String Extraction. (line 13) +* xor: Bitwise Functions. (line 55) * XOR bitwise operation: Bitwise Functions. (line 6) -* xor() function (gawk): Bitwise Functions. (line 55) -* Yawitz, Efraim: Contributors. (line 119) +* Yawitz, Efraim: Contributors. (line 129) * Zaretskii, Eli <1>: Bugs. (line 70) -* Zaretskii, Eli <2>: Contributors. (line 56) +* Zaretskii, Eli <2>: Contributors. (line 55) * Zaretskii, Eli: Acknowledgments. (line 60) * zero, negative vs. positive: Unexpected Results. (line 34) * zerofile.awk program: Empty Files. 
(line 21) -* Zoulas, Christos: Contributors. (line 67) -* {} (braces): Profiling. (line 134) +* Zoulas, Christos: Contributors. (line 66) +* {} (braces): Profiling. (line 142) * {} (braces), actions and: Action Overview. (line 19) * {} (braces), statements, grouping: Statements. (line 10) * | (vertical bar): Regexp Operators. (line 69) * | (vertical bar), | operator (I/O) <1>: Precedence. (line 65) * | (vertical bar), | operator (I/O) <2>: Redirection. (line 57) -* | (vertical bar), | operator (I/O): Getline/Pipe. (line 10) +* | (vertical bar), | operator (I/O): Getline/Pipe. (line 9) * | (vertical bar), |& operator (I/O) <1>: Two-way I/O. (line 44) * | (vertical bar), |& operator (I/O) <2>: Precedence. (line 65) * | (vertical bar), |& operator (I/O) <3>: Redirection. (line 102) @@ -32086,521 +33102,530 @@ Index Tag Table: -Node: Top1360 -Node: Foreword40338 -Node: Preface44683 -Ref: Preface-Footnote-147736 -Ref: Preface-Footnote-247832 -Node: History48064 -Node: Names50438 -Ref: Names-Footnote-151915 -Node: This Manual51987 -Ref: This Manual-Footnote-157761 -Node: Conventions57861 -Node: Manual History60013 -Ref: Manual History-Footnote-163461 -Ref: Manual History-Footnote-263502 -Node: How To Contribute63576 -Node: Acknowledgments64720 -Node: Getting Started68929 -Node: Running gawk71308 -Node: One-shot72494 -Node: Read Terminal73719 -Ref: Read Terminal-Footnote-175369 -Ref: Read Terminal-Footnote-275645 -Node: Long75816 -Node: Executable Scripts77192 -Ref: Executable Scripts-Footnote-179025 -Ref: Executable Scripts-Footnote-279127 -Node: Comments79674 -Node: Quoting82141 -Node: DOS Quoting86764 -Node: Sample Data Files87439 -Node: Very Simple90471 -Node: Two Rules95070 -Node: More Complex97217 -Ref: More Complex-Footnote-1100147 -Node: Statements/Lines100232 -Ref: Statements/Lines-Footnote-1104694 -Node: Other Features104959 -Node: When105887 -Node: Invoking Gawk108034 -Node: Command Line109495 -Node: Options110278 -Ref: Options-Footnote-1125670 -Node: Other 
Arguments125695 -Node: Naming Standard Input128353 -Node: Environment Variables129447 -Node: AWKPATH Variable130005 -Ref: AWKPATH Variable-Footnote-1132763 -Node: AWKLIBPATH Variable133023 -Node: Other Environment Variables133741 -Node: Exit Status136236 -Node: Include Files136911 -Node: Loading Shared Libraries140480 -Node: Obsolete141844 -Node: Undocumented142541 -Node: Regexp142784 -Node: Regexp Usage144173 -Node: Escape Sequences146199 -Node: Regexp Operators151868 -Ref: Regexp Operators-Footnote-1159248 -Ref: Regexp Operators-Footnote-2159395 -Node: Bracket Expressions159493 -Ref: table-char-classes161383 -Node: GNU Regexp Operators163906 -Node: Case-sensitivity167629 -Ref: Case-sensitivity-Footnote-1170597 -Ref: Case-sensitivity-Footnote-2170832 -Node: Leftmost Longest170940 -Node: Computed Regexps172141 -Node: Reading Files175478 -Node: Records177481 -Ref: Records-Footnote-1186370 -Node: Fields186407 -Ref: Fields-Footnote-1189440 -Node: Nonconstant Fields189526 -Node: Changing Fields191728 -Node: Field Separators197687 -Node: Default Field Splitting200316 -Node: Regexp Field Splitting201433 -Node: Single Character Fields204775 -Node: Command Line Field Separator205834 -Node: Field Splitting Summary209275 -Ref: Field Splitting Summary-Footnote-1212386 -Node: Constant Size212487 -Node: Splitting By Content217071 -Ref: Splitting By Content-Footnote-1220797 -Node: Multiple Line220837 -Ref: Multiple Line-Footnote-1226684 -Node: Getline226863 -Node: Plain Getline229079 -Node: Getline/Variable231174 -Node: Getline/File232321 -Node: Getline/Variable/File233662 -Ref: Getline/Variable/File-Footnote-1235261 -Node: Getline/Pipe235348 -Node: Getline/Variable/Pipe238048 -Node: Getline/Coprocess239155 -Node: Getline/Variable/Coprocess240407 -Node: Getline Notes241144 -Node: Getline Summary243931 -Ref: table-getline-variants244339 -Node: Read Timeout245251 -Ref: Read Timeout-Footnote-1248992 -Node: Command line directories249049 -Node: Printing249679 -Node: Print251310 
-Node: Print Examples252647 -Node: Output Separators255431 -Node: OFMT257191 -Node: Printf258549 -Node: Basic Printf259455 -Node: Control Letters260994 -Node: Format Modifiers264806 -Node: Printf Examples270815 -Node: Redirection273530 -Node: Special Files280495 -Node: Special FD281028 -Ref: Special FD-Footnote-1284653 -Node: Special Network284727 -Node: Special Caveats285577 -Node: Close Files And Pipes286373 -Ref: Close Files And Pipes-Footnote-1293356 -Ref: Close Files And Pipes-Footnote-2293504 -Node: Expressions293654 -Node: Values294786 -Node: Constants295462 -Node: Scalar Constants296142 -Ref: Scalar Constants-Footnote-1297001 -Node: Nondecimal-numbers297183 -Node: Regexp Constants300183 -Node: Using Constant Regexps300658 -Node: Variables303713 -Node: Using Variables304368 -Node: Assignment Options306092 -Node: Conversion307964 -Ref: table-locale-affects313464 -Ref: Conversion-Footnote-1314088 -Node: All Operators314197 -Node: Arithmetic Ops314827 -Node: Concatenation317332 -Ref: Concatenation-Footnote-1320125 -Node: Assignment Ops320245 -Ref: table-assign-ops325233 -Node: Increment Ops326564 -Node: Truth Values and Conditions329999 -Node: Truth Values331082 -Node: Typing and Comparison332131 -Node: Variable Typing332920 -Ref: Variable Typing-Footnote-1336817 -Node: Comparison Operators336939 -Ref: table-relational-ops337349 -Node: POSIX String Comparison340898 -Ref: POSIX String Comparison-Footnote-1341854 -Node: Boolean Ops341992 -Ref: Boolean Ops-Footnote-1346070 -Node: Conditional Exp346161 -Node: Function Calls347893 -Node: Precedence351487 -Node: Locales355156 -Node: Patterns and Actions356245 -Node: Pattern Overview357299 -Node: Regexp Patterns358968 -Node: Expression Patterns359511 -Node: Ranges363196 -Node: BEGIN/END366162 -Node: Using BEGIN/END366924 -Ref: Using BEGIN/END-Footnote-1369655 -Node: I/O And BEGIN/END369761 -Node: BEGINFILE/ENDFILE372043 -Node: Empty374957 -Node: Using Shell Variables375273 -Node: Action Overview377558 -Node: 
Statements379915 -Node: If Statement381769 -Node: While Statement383268 -Node: Do Statement385312 -Node: For Statement386468 -Node: Switch Statement389620 -Node: Break Statement391717 -Node: Continue Statement393707 -Node: Next Statement395500 -Node: Nextfile Statement397890 -Node: Exit Statement400533 -Node: Built-in Variables402949 -Node: User-modified404044 -Ref: User-modified-Footnote-1412404 -Node: Auto-set412466 -Ref: Auto-set-Footnote-1425544 -Ref: Auto-set-Footnote-2425749 -Node: ARGC and ARGV425805 -Node: Arrays429656 -Node: Array Basics431161 -Node: Array Intro431987 -Node: Reference to Elements436305 -Node: Assigning Elements438575 -Node: Array Example439066 -Node: Scanning an Array440798 -Node: Controlling Scanning443112 -Ref: Controlling Scanning-Footnote-1448035 -Node: Delete448351 -Ref: Delete-Footnote-1451116 -Node: Numeric Array Subscripts451173 -Node: Uninitialized Subscripts453356 -Node: Multi-dimensional454984 -Node: Multi-scanning458078 -Node: Arrays of Arrays459669 -Node: Functions464310 -Node: Built-in465129 -Node: Calling Built-in466207 -Node: Numeric Functions468195 -Ref: Numeric Functions-Footnote-1472027 -Ref: Numeric Functions-Footnote-2472384 -Ref: Numeric Functions-Footnote-3472432 -Node: String Functions472701 -Ref: String Functions-Footnote-1496259 -Ref: String Functions-Footnote-2496388 -Ref: String Functions-Footnote-3496636 -Node: Gory Details496723 -Ref: table-sub-escapes498402 -Ref: table-sub-posix-92499756 -Ref: table-sub-proposed501107 -Ref: table-posix-sub502461 -Ref: table-gensub-escapes504006 -Ref: Gory Details-Footnote-1505182 -Ref: Gory Details-Footnote-2505233 -Node: I/O Functions505384 -Ref: I/O Functions-Footnote-1512369 -Node: Time Functions512516 -Ref: Time Functions-Footnote-1523449 -Ref: Time Functions-Footnote-2523517 -Ref: Time Functions-Footnote-3523675 -Ref: Time Functions-Footnote-4523786 -Ref: Time Functions-Footnote-5523898 -Ref: Time Functions-Footnote-6524125 -Node: Bitwise Functions524391 -Ref: 
table-bitwise-ops524949 -Ref: Bitwise Functions-Footnote-1529170 -Node: Type Functions529354 -Node: I18N Functions530505 -Node: User-defined532132 -Node: Definition Syntax532936 -Ref: Definition Syntax-Footnote-1537846 -Node: Function Example537915 -Node: Function Caveats540509 -Node: Calling A Function540930 -Node: Variable Scope542045 -Node: Pass By Value/Reference545008 -Node: Return Statement548516 -Node: Dynamic Typing551497 -Node: Indirect Calls552428 -Node: Library Functions562113 -Ref: Library Functions-Footnote-1565626 -Ref: Library Functions-Footnote-2565769 -Node: Library Names565940 -Ref: Library Names-Footnote-1569411 -Ref: Library Names-Footnote-2569631 -Node: General Functions569717 -Node: Strtonum Function570670 -Node: Assert Function573600 -Node: Round Function576926 -Node: Cliff Random Function578469 -Node: Ordinal Functions579485 -Ref: Ordinal Functions-Footnote-1582555 -Ref: Ordinal Functions-Footnote-2582807 -Node: Join Function583016 -Ref: Join Function-Footnote-1584787 -Node: Getlocaltime Function584987 -Node: Data File Management588702 -Node: Filetrans Function589334 -Node: Rewind Function593403 -Node: File Checking594790 -Node: Empty Files595884 -Node: Ignoring Assigns598114 -Node: Getopt Function599667 -Ref: Getopt Function-Footnote-1610971 -Node: Passwd Functions611174 -Ref: Passwd Functions-Footnote-1620149 -Node: Group Functions620237 -Node: Walking Arrays628321 -Node: Sample Programs630458 -Node: Running Examples631132 -Node: Clones631860 -Node: Cut Program633084 -Node: Egrep Program642929 -Ref: Egrep Program-Footnote-1650702 -Node: Id Program650812 -Node: Split Program654428 -Ref: Split Program-Footnote-1657947 -Node: Tee Program658075 -Node: Uniq Program660878 -Node: Wc Program668307 -Ref: Wc Program-Footnote-1672573 -Ref: Wc Program-Footnote-2672773 -Node: Miscellaneous Programs672865 -Node: Dupword Program674053 -Node: Alarm Program676084 -Node: Translate Program680833 -Ref: Translate Program-Footnote-1685220 -Ref: Translate 
Program-Footnote-2685448 -Node: Labels Program685582 -Ref: Labels Program-Footnote-1688953 -Node: Word Sorting689037 -Node: History Sorting692921 -Node: Extract Program694760 -Ref: Extract Program-Footnote-1702261 -Node: Simple Sed702389 -Node: Igawk Program705451 -Ref: Igawk Program-Footnote-1720608 -Ref: Igawk Program-Footnote-2720809 -Node: Anagram Program720947 -Node: Signature Program724015 -Node: Advanced Features725115 -Node: Nondecimal Data726997 -Node: Array Sorting728580 -Node: Controlling Array Traversal729277 -Node: Array Sorting Functions737515 -Ref: Array Sorting Functions-Footnote-1741189 -Ref: Array Sorting Functions-Footnote-2741282 -Node: Two-way I/O741476 -Ref: Two-way I/O-Footnote-1746908 -Node: TCP/IP Networking746978 -Node: Profiling749822 -Node: Internationalization757319 -Node: I18N and L10N758744 -Node: Explaining gettext759430 -Ref: Explaining gettext-Footnote-1764498 -Ref: Explaining gettext-Footnote-2764682 -Node: Programmer i18n764847 -Node: Translator i18n769049 -Node: String Extraction769842 -Ref: String Extraction-Footnote-1770803 -Node: Printf Ordering770889 -Ref: Printf Ordering-Footnote-1773673 -Node: I18N Portability773737 -Ref: I18N Portability-Footnote-1776186 -Node: I18N Example776249 -Ref: I18N Example-Footnote-1778887 -Node: Gawk I18N778959 -Node: Debugger779580 -Node: Debugging780551 -Node: Debugging Concepts780984 -Node: Debugging Terms782840 -Node: Awk Debugging785437 -Node: Sample Debugging Session786329 -Node: Debugger Invocation786849 -Node: Finding The Bug788181 -Node: List of Debugger Commands794669 -Node: Breakpoint Control796003 -Node: Debugger Execution Control799667 -Node: Viewing And Changing Data803027 -Node: Execution Stack806383 -Node: Debugger Info807850 -Node: Miscellaneous Debugger Commands811832 -Node: Readline Support817008 -Node: Limitations817839 -Node: Arbitrary Precision Arithmetic820091 -Ref: Arbitrary Precision Arithmetic-Footnote-1821742 -Node: General Arithmetic821890 -Node: Floating Point 
Issues823610 -Node: String Conversion Precision824491 -Ref: String Conversion Precision-Footnote-1826197 -Node: Unexpected Results826306 -Node: POSIX Floating Point Problems828459 -Ref: POSIX Floating Point Problems-Footnote-1832284 -Node: Integer Programming832322 -Node: Floating-point Programming834061 -Ref: Floating-point Programming-Footnote-1840392 -Ref: Floating-point Programming-Footnote-2840662 -Node: Floating-point Representation840926 -Node: Floating-point Context842091 -Ref: table-ieee-formats842930 -Node: Rounding Mode844314 -Ref: table-rounding-modes844793 -Ref: Rounding Mode-Footnote-1847808 -Node: Gawk and MPFR847987 -Node: Arbitrary Precision Floats849242 -Ref: Arbitrary Precision Floats-Footnote-1851685 -Node: Setting Precision852001 -Ref: table-predefined-precision-strings852687 -Node: Setting Rounding Mode854832 -Ref: table-gawk-rounding-modes855236 -Node: Floating-point Constants856423 -Node: Changing Precision857852 -Ref: Changing Precision-Footnote-1859252 -Node: Exact Arithmetic859426 -Node: Arbitrary Precision Integers862564 -Ref: Arbitrary Precision Integers-Footnote-1865582 -Node: Dynamic Extensions865729 -Node: Extension Intro867187 -Node: Plugin License868452 -Node: Extension Mechanism Outline869137 -Ref: load-extension869554 -Ref: load-new-function871032 -Ref: call-new-function872027 -Node: Extension API Description874042 -Node: Extension API Functions Introduction875255 -Node: General Data Types880121 -Ref: General Data Types-Footnote-1885723 -Node: Requesting Values886022 -Ref: table-value-types-returned886753 -Node: Constructor Functions887707 -Node: Registration Functions890727 -Node: Extension Functions891412 -Node: Exit Callback Functions893637 -Node: Extension Version String894886 -Node: Input Parsers895536 -Node: Output Wrappers905293 -Node: Two-way processors909803 -Node: Printing Messages912011 -Ref: Printing Messages-Footnote-1913088 -Node: Updating `ERRNO'913240 -Node: Accessing Parameters913979 -Node: Symbol Table 
Access915209 -Node: Symbol table by name915721 -Node: Symbol table by cookie917468 -Ref: Symbol table by cookie-Footnote-1921598 -Node: Cached values921661 -Ref: Cached values-Footnote-1925110 -Node: Array Manipulation925201 -Ref: Array Manipulation-Footnote-1926299 -Node: Array Data Types926338 -Ref: Array Data Types-Footnote-1929041 -Node: Array Functions929133 -Node: Flattening Arrays932899 -Node: Creating Arrays939751 -Node: Extension API Variables944476 -Node: Extension Versioning945112 -Node: Extension API Informational Variables947013 -Node: Extension API Boilerplate948099 -Node: Finding Extensions951903 -Node: Extension Example952463 -Node: Internal File Description953194 -Node: Internal File Ops957285 -Ref: Internal File Ops-Footnote-1968793 -Node: Using Internal File Ops968933 -Ref: Using Internal File Ops-Footnote-1971286 -Node: Extension Samples971552 -Node: Extension Sample File Functions973076 -Node: Extension Sample Fnmatch981563 -Node: Extension Sample Fork983289 -Node: Extension Sample Inplace984507 -Node: Extension Sample Ord986285 -Node: Extension Sample Readdir987121 -Node: Extension Sample Revout988653 -Node: Extension Sample Rev2way989246 -Node: Extension Sample Read write array989936 -Node: Extension Sample Readfile991819 -Node: Extension Sample API Tests992637 -Node: Extension Sample Time993162 -Node: gawkextlib994526 -Node: Language History997286 -Node: V7/SVR3.1998808 -Node: SVR41001129 -Node: POSIX1002571 -Node: BTL1003957 -Node: POSIX/GNU1004691 -Node: Common Extensions1010226 -Node: Ranges and Locales1011532 -Ref: Ranges and Locales-Footnote-11016150 -Ref: Ranges and Locales-Footnote-21016177 -Ref: Ranges and Locales-Footnote-31016437 -Node: Contributors1016658 -Node: Installation1021537 -Node: Gawk Distribution1022431 -Node: Getting1022915 -Node: Extracting1023741 -Node: Distribution contents1025433 -Node: Unix Installation1030694 -Node: Quick Installation1031311 -Node: Additional Configuration Options1033755 -Node: Configuration 
Philosophy1035232 -Node: Non-Unix Installation1037586 -Node: PC Installation1038044 -Node: PC Binary Installation1039343 -Node: PC Compiling1041191 -Node: PC Testing1044135 -Node: PC Using1045311 -Node: Cygwin1049496 -Node: MSYS1050496 -Node: VMS Installation1051010 -Node: VMS Compilation1051613 -Ref: VMS Compilation-Footnote-11052620 -Node: VMS Installation Details1052678 -Node: VMS Running1054313 -Node: VMS Old Gawk1055920 -Node: Bugs1056394 -Node: Other Versions1060246 -Node: Notes1065847 -Node: Compatibility Mode1066647 -Node: Additions1067430 -Node: Accessing The Source1068357 -Node: Adding Code1069797 -Node: New Ports1075842 -Node: Derived Files1079977 -Ref: Derived Files-Footnote-11085298 -Ref: Derived Files-Footnote-21085332 -Ref: Derived Files-Footnote-31085932 -Node: Future Extensions1086030 -Node: Implementation Limitations1086611 -Node: Extension Design1087863 -Node: Old Extension Problems1089017 -Ref: Old Extension Problems-Footnote-11090525 -Node: Extension New Mechanism Goals1090582 -Ref: Extension New Mechanism Goals-Footnote-11093948 -Node: Extension Other Design Decisions1094134 -Node: Extension Future Growth1096240 -Node: Old Extension Mechanism1097076 -Node: Basic Concepts1098816 -Node: Basic High Level1099497 -Ref: figure-general-flow1099768 -Ref: figure-process-flow1100367 -Ref: Basic High Level-Footnote-11103596 -Node: Basic Data Typing1103781 -Node: Glossary1107136 -Node: Copying1132598 -Node: GNU Free Documentation License1170155 -Node: Index1195292 +Node: Top1292 +Node: Foreword40821 +Node: Preface45166 +Ref: Preface-Footnote-148219 +Ref: Preface-Footnote-248315 +Node: History48547 +Node: Names50921 +Ref: Names-Footnote-152398 +Node: This Manual52470 +Ref: This Manual-Footnote-158244 +Node: Conventions58344 +Node: Manual History60500 +Ref: Manual History-Footnote-163948 +Ref: Manual History-Footnote-263989 +Node: How To Contribute64063 +Node: Acknowledgments65207 +Node: Getting Started69401 +Node: Running gawk71780 +Node: One-shot72966 
+Node: Read Terminal74191 +Ref: Read Terminal-Footnote-175841 +Ref: Read Terminal-Footnote-276117 +Node: Long76288 +Node: Executable Scripts77664 +Ref: Executable Scripts-Footnote-179497 +Ref: Executable Scripts-Footnote-279599 +Node: Comments80146 +Node: Quoting82613 +Node: DOS Quoting87236 +Node: Sample Data Files87911 +Node: Very Simple90426 +Node: Two Rules95077 +Node: More Complex96975 +Ref: More Complex-Footnote-199905 +Node: Statements/Lines99990 +Ref: Statements/Lines-Footnote-1104453 +Node: Other Features104718 +Node: When105646 +Node: Invoking Gawk107793 +Node: Command Line109256 +Node: Options110039 +Ref: Options-Footnote-1125417 +Node: Other Arguments125442 +Node: Naming Standard Input128100 +Node: Environment Variables129194 +Node: AWKPATH Variable129752 +Ref: AWKPATH Variable-Footnote-1132533 +Ref: AWKPATH Variable-Footnote-2132578 +Node: AWKLIBPATH Variable132838 +Node: Other Environment Variables133556 +Node: Exit Status136519 +Node: Include Files137194 +Node: Loading Shared Libraries140763 +Node: Obsolete142127 +Node: Undocumented142824 +Node: Regexp143066 +Node: Regexp Usage144455 +Node: Escape Sequences146480 +Node: Regexp Operators152149 +Ref: Regexp Operators-Footnote-1159529 +Ref: Regexp Operators-Footnote-2159676 +Node: Bracket Expressions159774 +Ref: table-char-classes161664 +Node: GNU Regexp Operators164187 +Node: Case-sensitivity167910 +Ref: Case-sensitivity-Footnote-1170878 +Ref: Case-sensitivity-Footnote-2171113 +Node: Leftmost Longest171221 +Node: Computed Regexps172422 +Node: Reading Files175759 +Node: Records177761 +Ref: Records-Footnote-1187284 +Node: Fields187321 +Ref: Fields-Footnote-1190277 +Node: Nonconstant Fields190363 +Node: Changing Fields192569 +Node: Field Separators198528 +Node: Default Field Splitting201230 +Node: Regexp Field Splitting202347 +Node: Single Character Fields205689 +Node: Command Line Field Separator206748 +Node: Full Line Fields210090 +Ref: Full Line Fields-Footnote-1210598 +Node: Field Splitting 
Summary210644 +Ref: Field Splitting Summary-Footnote-1213743 +Node: Constant Size213844 +Node: Splitting By Content218451 +Ref: Splitting By Content-Footnote-1222200 +Node: Multiple Line222240 +Ref: Multiple Line-Footnote-1228087 +Node: Getline228266 +Node: Plain Getline230482 +Node: Getline/Variable232577 +Node: Getline/File233724 +Node: Getline/Variable/File235065 +Ref: Getline/Variable/File-Footnote-1236664 +Node: Getline/Pipe236751 +Node: Getline/Variable/Pipe239450 +Node: Getline/Coprocess240557 +Node: Getline/Variable/Coprocess241809 +Node: Getline Notes242546 +Node: Getline Summary245333 +Ref: table-getline-variants245741 +Node: Read Timeout246653 +Ref: Read Timeout-Footnote-1250394 +Node: Command line directories250451 +Node: Printing251081 +Node: Print252712 +Node: Print Examples254049 +Node: Output Separators256833 +Node: OFMT258849 +Node: Printf260207 +Node: Basic Printf261113 +Node: Control Letters262652 +Node: Format Modifiers266464 +Node: Printf Examples272473 +Node: Redirection275185 +Node: Special Files282159 +Node: Special FD282692 +Ref: Special FD-Footnote-1286317 +Node: Special Network286391 +Node: Special Caveats287241 +Node: Close Files And Pipes288037 +Ref: Close Files And Pipes-Footnote-1295020 +Ref: Close Files And Pipes-Footnote-2295168 +Node: Expressions295318 +Node: Values296450 +Node: Constants297126 +Node: Scalar Constants297806 +Ref: Scalar Constants-Footnote-1298665 +Node: Nondecimal-numbers298847 +Node: Regexp Constants301847 +Node: Using Constant Regexps302322 +Node: Variables305377 +Node: Using Variables306032 +Node: Assignment Options307756 +Node: Conversion309631 +Ref: table-locale-affects315131 +Ref: Conversion-Footnote-1315755 +Node: All Operators315864 +Node: Arithmetic Ops316494 +Node: Concatenation318999 +Ref: Concatenation-Footnote-1321787 +Node: Assignment Ops321907 +Ref: table-assign-ops326895 +Node: Increment Ops328226 +Node: Truth Values and Conditions331660 +Node: Truth Values332743 +Node: Typing and Comparison333792 
+Node: Variable Typing334585 +Ref: Variable Typing-Footnote-1338482 +Node: Comparison Operators338604 +Ref: table-relational-ops339014 +Node: POSIX String Comparison342562 +Ref: POSIX String Comparison-Footnote-1343518 +Node: Boolean Ops343656 +Ref: Boolean Ops-Footnote-1347726 +Node: Conditional Exp347817 +Node: Function Calls349549 +Node: Precedence353143 +Node: Locales356812 +Node: Patterns and Actions357901 +Node: Pattern Overview358955 +Node: Regexp Patterns360624 +Node: Expression Patterns361167 +Node: Ranges364948 +Node: BEGIN/END368052 +Node: Using BEGIN/END368814 +Ref: Using BEGIN/END-Footnote-1371550 +Node: I/O And BEGIN/END371656 +Node: BEGINFILE/ENDFILE373938 +Node: Empty376852 +Node: Using Shell Variables377169 +Node: Action Overview379454 +Node: Statements381811 +Node: If Statement383665 +Node: While Statement385164 +Node: Do Statement387208 +Node: For Statement388364 +Node: Switch Statement391516 +Node: Break Statement393670 +Node: Continue Statement395660 +Node: Next Statement397453 +Node: Nextfile Statement399843 +Node: Exit Statement402498 +Node: Built-in Variables404914 +Node: User-modified406009 +Ref: User-modified-Footnote-1414367 +Node: Auto-set414429 +Ref: Auto-set-Footnote-1427886 +Ref: Auto-set-Footnote-2428091 +Node: ARGC and ARGV428147 +Node: Arrays432001 +Node: Array Basics433506 +Node: Array Intro434332 +Node: Reference to Elements438649 +Node: Assigning Elements440919 +Node: Array Example441410 +Node: Scanning an Array443142 +Node: Controlling Scanning445456 +Ref: Controlling Scanning-Footnote-1450543 +Node: Delete450859 +Ref: Delete-Footnote-1453624 +Node: Numeric Array Subscripts453681 +Node: Uninitialized Subscripts455864 +Node: Multidimensional457491 +Node: Multiscanning460584 +Node: Arrays of Arrays462173 +Node: Functions466813 +Node: Built-in467632 +Node: Calling Built-in468710 +Node: Numeric Functions470698 +Ref: Numeric Functions-Footnote-1474530 +Ref: Numeric Functions-Footnote-2474887 +Ref: Numeric Functions-Footnote-3474935 
+Node: String Functions475204 +Ref: String Functions-Footnote-1498162 +Ref: String Functions-Footnote-2498291 +Ref: String Functions-Footnote-3498539 +Node: Gory Details498626 +Ref: table-sub-escapes500305 +Ref: table-sub-posix-92501659 +Ref: table-sub-proposed503010 +Ref: table-posix-sub504364 +Ref: table-gensub-escapes505909 +Ref: Gory Details-Footnote-1507085 +Ref: Gory Details-Footnote-2507136 +Node: I/O Functions507287 +Ref: I/O Functions-Footnote-1514277 +Node: Time Functions514424 +Ref: Time Functions-Footnote-1525407 +Ref: Time Functions-Footnote-2525475 +Ref: Time Functions-Footnote-3525633 +Ref: Time Functions-Footnote-4525744 +Ref: Time Functions-Footnote-5525856 +Ref: Time Functions-Footnote-6526083 +Node: Bitwise Functions526349 +Ref: table-bitwise-ops526911 +Ref: Bitwise Functions-Footnote-1531132 +Node: Type Functions531316 +Node: I18N Functions532467 +Node: User-defined534094 +Node: Definition Syntax534898 +Ref: Definition Syntax-Footnote-1539812 +Node: Function Example539881 +Ref: Function Example-Footnote-1542530 +Node: Function Caveats542552 +Node: Calling A Function543070 +Node: Variable Scope544025 +Node: Pass By Value/Reference546988 +Node: Return Statement550496 +Node: Dynamic Typing553477 +Node: Indirect Calls554408 +Node: Library Functions564095 +Ref: Library Functions-Footnote-1567608 +Ref: Library Functions-Footnote-2567751 +Node: Library Names567922 +Ref: Library Names-Footnote-1571395 +Ref: Library Names-Footnote-2571615 +Node: General Functions571701 +Node: Strtonum Function572729 +Node: Assert Function575659 +Node: Round Function578985 +Node: Cliff Random Function580526 +Node: Ordinal Functions581542 +Ref: Ordinal Functions-Footnote-1584619 +Ref: Ordinal Functions-Footnote-2584871 +Node: Join Function585082 +Ref: Join Function-Footnote-1586853 +Node: Getlocaltime Function587053 +Node: Readfile Function590794 +Node: Data File Management592633 +Node: Filetrans Function593265 +Node: Rewind Function597334 +Node: File Checking598721 +Node: 
Empty Files599815 +Node: Ignoring Assigns602045 +Node: Getopt Function603599 +Ref: Getopt Function-Footnote-1614902 +Node: Passwd Functions615105 +Ref: Passwd Functions-Footnote-1624083 +Node: Group Functions624171 +Node: Walking Arrays632255 +Node: Sample Programs634391 +Node: Running Examples635065 +Node: Clones635793 +Node: Cut Program637017 +Node: Egrep Program646868 +Ref: Egrep Program-Footnote-1654641 +Node: Id Program654751 +Node: Split Program658400 +Ref: Split Program-Footnote-1661919 +Node: Tee Program662047 +Node: Uniq Program664850 +Node: Wc Program672279 +Ref: Wc Program-Footnote-1676545 +Ref: Wc Program-Footnote-2676745 +Node: Miscellaneous Programs676837 +Node: Dupword Program678025 +Node: Alarm Program680056 +Node: Translate Program684863 +Ref: Translate Program-Footnote-1689250 +Ref: Translate Program-Footnote-2689498 +Node: Labels Program689632 +Ref: Labels Program-Footnote-1693003 +Node: Word Sorting693087 +Node: History Sorting696971 +Node: Extract Program698810 +Ref: Extract Program-Footnote-1706313 +Node: Simple Sed706441 +Node: Igawk Program709503 +Ref: Igawk Program-Footnote-1724660 +Ref: Igawk Program-Footnote-2724861 +Node: Anagram Program724999 +Node: Signature Program728067 +Node: Advanced Features729167 +Node: Nondecimal Data731053 +Node: Array Sorting732636 +Node: Controlling Array Traversal733333 +Node: Array Sorting Functions741617 +Ref: Array Sorting Functions-Footnote-1745486 +Node: Two-way I/O745680 +Ref: Two-way I/O-Footnote-1751112 +Node: TCP/IP Networking751194 +Node: Profiling754038 +Node: Internationalization761541 +Node: I18N and L10N762966 +Node: Explaining gettext763652 +Ref: Explaining gettext-Footnote-1768720 +Ref: Explaining gettext-Footnote-2768904 +Node: Programmer i18n769069 +Node: Translator i18n773271 +Node: String Extraction774065 +Ref: String Extraction-Footnote-1775026 +Node: Printf Ordering775112 +Ref: Printf Ordering-Footnote-1777894 +Node: I18N Portability777958 +Ref: I18N Portability-Footnote-1780407 +Node: 
I18N Example780470 +Ref: I18N Example-Footnote-1783108 +Node: Gawk I18N783180 +Node: Debugger783801 +Node: Debugging784772 +Node: Debugging Concepts785205 +Node: Debugging Terms787061 +Node: Awk Debugging789658 +Node: Sample Debugging Session790550 +Node: Debugger Invocation791070 +Node: Finding The Bug792403 +Node: List of Debugger Commands798890 +Node: Breakpoint Control800224 +Node: Debugger Execution Control803888 +Node: Viewing And Changing Data807248 +Node: Execution Stack810604 +Node: Debugger Info812071 +Node: Miscellaneous Debugger Commands816053 +Node: Readline Support821229 +Node: Limitations822060 +Node: Arbitrary Precision Arithmetic824312 +Ref: Arbitrary Precision Arithmetic-Footnote-1825961 +Node: General Arithmetic826109 +Node: Floating Point Issues827829 +Node: String Conversion Precision828710 +Ref: String Conversion Precision-Footnote-1830415 +Node: Unexpected Results830524 +Node: POSIX Floating Point Problems832677 +Ref: POSIX Floating Point Problems-Footnote-1836502 +Node: Integer Programming836540 +Node: Floating-point Programming838279 +Ref: Floating-point Programming-Footnote-1844610 +Ref: Floating-point Programming-Footnote-2844880 +Node: Floating-point Representation845144 +Node: Floating-point Context846309 +Ref: table-ieee-formats847148 +Node: Rounding Mode848532 +Ref: table-rounding-modes849011 +Ref: Rounding Mode-Footnote-1852026 +Node: Gawk and MPFR852205 +Node: Arbitrary Precision Floats853616 +Ref: Arbitrary Precision Floats-Footnote-1856059 +Node: Setting Precision856375 +Ref: table-predefined-precision-strings857061 +Node: Setting Rounding Mode859206 +Ref: table-gawk-rounding-modes859610 +Node: Floating-point Constants860797 +Node: Changing Precision862226 +Ref: Changing Precision-Footnote-1863623 +Node: Exact Arithmetic863797 +Node: Arbitrary Precision Integers866935 +Ref: Arbitrary Precision Integers-Footnote-1869950 +Node: Dynamic Extensions870097 +Node: Extension Intro871555 +Node: Plugin License872820 +Node: Extension 
Mechanism Outline873505 +Ref: load-extension873922 +Ref: load-new-function875400 +Ref: call-new-function876395 +Node: Extension API Description878410 +Node: Extension API Functions Introduction879697 +Node: General Data Types884624 +Ref: General Data Types-Footnote-1890319 +Node: Requesting Values890618 +Ref: table-value-types-returned891355 +Node: Memory Allocation Functions892309 +Ref: Memory Allocation Functions-Footnote-1895055 +Node: Constructor Functions895151 +Node: Registration Functions896909 +Node: Extension Functions897594 +Node: Exit Callback Functions899896 +Node: Extension Version String901145 +Node: Input Parsers901795 +Node: Output Wrappers911552 +Node: Two-way processors916062 +Node: Printing Messages918270 +Ref: Printing Messages-Footnote-1919347 +Node: Updating `ERRNO'919499 +Node: Accessing Parameters920238 +Node: Symbol Table Access921468 +Node: Symbol table by name921982 +Node: Symbol table by cookie923958 +Ref: Symbol table by cookie-Footnote-1928090 +Node: Cached values928153 +Ref: Cached values-Footnote-1931643 +Node: Array Manipulation931734 +Ref: Array Manipulation-Footnote-1932832 +Node: Array Data Types932871 +Ref: Array Data Types-Footnote-1935574 +Node: Array Functions935666 +Node: Flattening Arrays939502 +Node: Creating Arrays946354 +Node: Extension API Variables951079 +Node: Extension Versioning951715 +Node: Extension API Informational Variables953616 +Node: Extension API Boilerplate954702 +Node: Finding Extensions958506 +Node: Extension Example959066 +Node: Internal File Description959796 +Node: Internal File Ops963887 +Ref: Internal File Ops-Footnote-1975396 +Node: Using Internal File Ops975536 +Ref: Using Internal File Ops-Footnote-1977889 +Node: Extension Samples978155 +Node: Extension Sample File Functions979679 +Node: Extension Sample Fnmatch988164 +Node: Extension Sample Fork989933 +Node: Extension Sample Inplace991146 +Node: Extension Sample Ord992924 +Node: Extension Sample Readdir993760 +Node: Extension Sample Revout995292 
+Node: Extension Sample Rev2way995885 +Node: Extension Sample Read write array996575 +Node: Extension Sample Readfile998458 +Node: Extension Sample API Tests999558 +Node: Extension Sample Time1000083 +Node: gawkextlib1001447 +Node: Language History1004228 +Node: V7/SVR3.11005821 +Node: SVR41008141 +Node: POSIX1009583 +Node: BTL1010969 +Node: POSIX/GNU1011703 +Node: Feature History1017302 +Node: Common Extensions1030278 +Node: Ranges and Locales1031590 +Ref: Ranges and Locales-Footnote-11036207 +Ref: Ranges and Locales-Footnote-21036234 +Ref: Ranges and Locales-Footnote-31036468 +Node: Contributors1036689 +Node: Installation1042070 +Node: Gawk Distribution1042964 +Node: Getting1043448 +Node: Extracting1044274 +Node: Distribution contents1045966 +Node: Unix Installation1051671 +Node: Quick Installation1052288 +Node: Additional Configuration Options1054734 +Node: Configuration Philosophy1056470 +Node: Non-Unix Installation1058824 +Node: PC Installation1059282 +Node: PC Binary Installation1060581 +Node: PC Compiling1062429 +Node: PC Testing1065373 +Node: PC Using1066549 +Node: Cygwin1070717 +Node: MSYS1071526 +Node: VMS Installation1072040 +Node: VMS Compilation1072804 +Ref: VMS Compilation-Footnote-11074056 +Node: VMS Dynamic Extensions1074114 +Node: VMS Installation Details1075487 +Node: VMS Running1077738 +Node: VMS GNV1080572 +Node: VMS Old Gawk1081295 +Node: Bugs1081765 +Node: Other Versions1085683 +Node: Notes1091767 +Node: Compatibility Mode1092567 +Node: Additions1093350 +Node: Accessing The Source1094277 +Node: Adding Code1095717 +Node: New Ports1101762 +Node: Derived Files1105897 +Ref: Derived Files-Footnote-11111218 +Ref: Derived Files-Footnote-21111252 +Ref: Derived Files-Footnote-31111852 +Node: Future Extensions1111950 +Node: Implementation Limitations1112533 +Node: Extension Design1113785 +Node: Old Extension Problems1114939 +Ref: Old Extension Problems-Footnote-11116447 +Node: Extension New Mechanism Goals1116504 +Ref: Extension New Mechanism 
Goals-Footnote-11119869 +Node: Extension Other Design Decisions1120055 +Node: Extension Future Growth1122161 +Node: Old Extension Mechanism1122997 +Node: Basic Concepts1124737 +Node: Basic High Level1125418 +Ref: figure-general-flow1125690 +Ref: figure-process-flow1126289 +Ref: Basic High Level-Footnote-11129518 +Node: Basic Data Typing1129703 +Node: Glossary1133058 +Node: Copying1158289 +Node: GNU Free Documentation License1195845 +Node: Index1220981 End Tag Table diff --git a/doc/gawk.texi b/doc/gawk.texi index fa3e5871..539ea53d 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -19,6 +19,20 @@ * awk: (gawk)Invoking gawk. Text scanning and processing. @end direntry +@ifset FOR_PRINT +@tex +\gdef\xrefprintnodename#1{``#1''} +@end tex +@end ifset +@ifclear FOR_PRINT +@c With early 2014 texinfo.tex, restore PDF links and colors +@tex +\gdef\linkcolor{0.5 0.09 0.12} % Dark Red +\gdef\urlcolor{0.5 0.09 0.12} % Also +\global\urefurlonlylinktrue +@end tex +@end ifclear + @set xref-automatic-section-title @c The following information should be updated here only! @@ -26,9 +40,9 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH May, 2013 +@set UPDATE-MONTH April, 2014 @set VERSION 4.1 -@set PATCHLEVEL 0 +@set PATCHLEVEL 1 @set FSF @@ -102,11 +116,19 @@ @end ifnottex @ifnottex +@ifnotdocbook @macro ii{text} @i{\text\} @end macro +@end ifnotdocbook @end ifnottex +@ifdocbook +@macro ii{text} +@inlineraw{docbook,<lineannotation>\text\</lineannotation>} +@end macro +@end ifdocbook + @c For HTML, spell out email addresses, to avoid problems with @c address harvesters for spammers. 
@ifhtml @@ -120,19 +142,36 @@ @end macro @end ifnothtml -@set FN file name -@set FFN File Name -@set DF data file -@set DDF Data File -@set PVERSION version -@set CTL Ctrl +@c Indexing macros +@ifinfo + +@macro cindexawkfunc{name} +@cindex @code{\name\} +@end macro + +@macro cindexgawkfunc{name} +@cindex @code{\name\} +@end macro + +@end ifinfo + +@ifnotinfo + +@macro cindexawkfunc{name} +@cindex @code{\name\()} function +@end macro + +@macro cindexgawkfunc{name} +@cindex @code{\name\()} function (@command{gawk}) +@end macro +@end ifnotinfo @ignore Some comments on the layout for TeX. -1. Use at least texinfo.tex 2000-09-06.09 -2. I have done A LOT of work to make this look good. There are `@page' commands - and use of `@group ... @end group' in a number of places. If you muck - with anything, it's your responsibility not to break the layout. +1. Use at least texinfo.tex 2014-01-30.15 +2. When using @docbook, if the last line is part of a paragraph, end +it with a space and @c so that the lines won't run together. This is a +quirk of the language / makeinfo, and isn't going to change. @end ignore @c merge the function and variable indexes into the concept index @@ -148,6 +187,10 @@ Some comments on the layout for TeX. @syncodeindex fn cp @syncodeindex vr cp @end ifxml +@ifdocbook +@synindex fn cp +@synindex vr cp +@end ifdocbook @c If "finalout" is commented out, the printed output will show @c black boxes that mark lines that are too long. Thus, it is @@ -159,9 +202,26 @@ Some comments on the layout for TeX. 
@end iftex @copying -Copyright @copyright{} 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, -2000, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013 +@docbook +<para>Published by:</para> + +<literallayout class="normal">Free Software Foundation +51 Franklin Street, Fifth Floor +Boston, MA 02110-1301 USA +Phone: +1-617-542-5942 +Fax: +1-617-542-2652 +Email: <email>gnu@@gnu.org</email> +URL: <ulink url="http://www.gnu.org">http://www.gnu.org/</ulink></literallayout> + +<literallayout class="normal">Copyright © 1989, 1991, 1992, 1993, 1996–2005, 2007, 2009–2014 Free Software Foundation, Inc. +All Rights Reserved.</literallayout> +@end docbook + +@ifnotdocbook +Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2014 @* +Free Software Foundation, Inc. +@end ifnotdocbook @sp 2 This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, @@ -201,6 +261,7 @@ supports it in developing GNU and promoting software freedom.'' @c during editing and review. @setchapternewpage odd +@shorttitlepage GNU Awk @titlepage @title @value{TITLE} @subtitle @value{SUBTITLE} @@ -208,6 +269,7 @@ supports it in developing GNU and promoting software freedom.'' @subtitle @value{UPDATE-MONTH} @author Arnold D. Robbins +@ifnotdocbook @c Include the Distribution inside the titlepage environment so @c that headings are turned off. Headings on and off do not work. @@ -232,6 +294,7 @@ URL: @uref{http://www.gnu.org/} @* ISBN 1-882114-28-0 @* @sp 2 @insertcopying +@end ifnotdocbook @end titlepage @c Thanks to Bob Chassell for directions on doing dedications. 
@@ -256,6 +319,18 @@ ISBN 1-882114-28-0 @* @headings on @end iftex +@docbook +<dedication> +<simplelist> +<member>To Miriam, for making me complete.</member> +<member>To Chana, for the joy you bring us.</member> +<member>To Rivka, for the exponential increase.</member> +<member>To Nachum, for the added dimension.</member> +<member>To Malka, for the new beginning.</member> +</simplelist> +</dedication> +@end docbook + @iftex @headings off @evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| @@ -264,6 +339,7 @@ ISBN 1-882114-28-0 @* @ifnottex @ifnotxml +@ifnotdocbook @node Top @top General Introduction @c Preface node should come right after the Top @@ -275,6 +351,7 @@ particular records in a file and perform operations upon them. @insertcopying +@end ifnotdocbook @end ifnotxml @end ifnottex @@ -407,10 +484,12 @@ particular records in a file and perform operations upon them. field. * Command Line Field Separator:: Setting @code{FS} from the command-line. +* Full Line Fields:: Making the full line be a single + field. * Field Splitting Summary:: Some final points and a summary table. * Constant Size:: Reading constant width data. * Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. +* Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the @code{getline} function. @@ -561,9 +640,9 @@ particular records in a file and perform operations upon them. @command{awk}. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in +* Multidimensional:: Emulating multidimensional arrays in @command{awk}. -* Multi-scanning:: Scanning multidimensional arrays. +* Multiscanning:: Scanning multidimensional arrays. * Arrays of Arrays:: True multidimensional arrays. * Built-in:: Summarizes the built-in functions. * Calling Built-in:: How to call built-in functions. 
@@ -615,6 +694,8 @@ particular records in a file and perform operations upon them. * Join Function:: A function to join an array into a string. * Getlocaltime Function:: A function to get formatted times. +* Readfile Function:: A function to read an entire file at + once. * Data File Management:: Functions for managing command-line data files. * Filetrans Function:: A function for handling data file @@ -732,6 +813,7 @@ particular records in a file and perform operations upon them. * Extension API Functions Introduction:: Introduction to the API functions. * General Data Types:: The data types. * Requesting Values:: How to get a value. +* Memory Allocation Functions:: Functions for allocating memory. * Constructor Functions:: Functions for creating values. * Registration Functions:: Functions to register things with @command{gawk}. @@ -794,6 +876,8 @@ particular records in a file and perform operations upon them. version of @command{awk}. * POSIX/GNU:: The extensions in @command{gawk} not in POSIX @command{awk}. +* Feature History:: The history of the features in + @command{gawk}. * Common Extensions:: Common Extensions Summary. * Ranges and Locales:: How locales used to affect regexp ranges. @@ -826,9 +910,12 @@ particular records in a file and perform operations upon them. * VMS Installation:: Installing @command{gawk} on VMS. * VMS Compilation:: How to compile @command{gawk} under VMS. +* VMS Dynamic Extensions:: Compiling @command{gawk} dynamic + extensions on VMS. * VMS Installation Details:: How to install @command{gawk} under VMS. * VMS Running:: How to run @command{gawk} under VMS. +* VMS GNV:: The VMS GNV Project. * VMS Old Gawk:: An old version comes with some VMS systems. * Bugs:: Reporting Problems and Bugs. @@ -962,21 +1049,37 @@ and the AWK prototype becomes the product. The new @command{pgawk} (profiling @command{gawk}), produces program execution counts. 
I recently experimented with an algorithm that for -@math{n} lines of input, exhibited +@ifnotdocbook +@math{n} +@end ifnotdocbook +@ifdocbook +@i{n} +@end ifdocbook +lines of input, exhibited @tex $\sim\! Cn^2$ @end tex @ifnottex +@ifnotdocbook ~ C n^2 +@end ifnotdocbook @end ifnottex +@docbook +<emphasis>∼ Cn<superscript>2</superscript></emphasis> @c +@end docbook performance, while theory predicted @tex $\sim\! Cn\log n$ @end tex @ifnottex +@ifnotdocbook ~ C n log n +@end ifnotdocbook @end ifnottex +@docbook +<emphasis>∼ Cn log n</emphasis> @c +@end docbook behavior. A few minutes poring over the @file{awkprof.out} profile pinpointed the problem to a single line of code. @command{pgawk} is a welcome addition to @@ -986,6 +1089,7 @@ Arnold has distilled over a decade of experience writing and using AWK programs, and developing @command{gawk}, into this book. If you use AWK or want to learn how, then read this book. +@cindex Brennan, Michael @display Michael Brennan Author of @command{mawk} @@ -1010,6 +1114,7 @@ Such jobs are often easier with @command{awk}. The @command{awk} utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs. +@cindex Brian Kernighan's @command{awk} The GNU implementation of @command{awk} is called @command{gawk}; if you invoke it with the proper options or environment variables (@pxref{Options}), it is fully @@ -1188,17 +1293,17 @@ wrote the bulk of @cite{TCP/IP Internetworking with @command{gawk}} (a separate document, available as part of the @command{gawk} distribution). His code finally became part of the main @command{gawk} distribution -with @command{gawk} @value{PVERSION} 3.1. +with @command{gawk} version 3.1. John Haque rewrote the @command{gawk} internals, in the process providing an @command{awk}-level debugger. This version became available as -@command{gawk} @value{PVERSION} 4.0, in 2011. +@command{gawk} version 4.0, in 2011. 
@xref{Contributors}, for a complete list of those who made important contributions to @command{gawk}. @node Names -@section A Rose by Any Other Name +@unnumberedsec A Rose by Any Other Name @cindex @command{awk}, new vs.@: old The @command{awk} language has evolved over the years. Full details are @@ -1234,7 +1339,7 @@ we simply use the term @command{awk}. When referring to a feature that is specific to the GNU implementation, we use the term @command{gawk}. @node This Manual -@section Using This Book +@unnumberedsec Using This Book @cindex @command{awk}, terms describing The term @command{awk} refers to a particular program as well as to the language you @@ -1244,7 +1349,7 @@ and the program ``the @command{awk} utility.'' This @value{DOCUMENT} explains both how to write programs in the @command{awk} language and how to run the @command{awk} utility. -The term @dfn{@command{awk} program} refers to a program written by you in +The term ``@command{awk} program'' refers to a program written by you in the @command{awk} programming language. @cindex @command{gawk}, @command{awk} and @@ -1407,7 +1512,7 @@ present the licenses that cover the @command{gawk} source code and this @value{DOCUMENT}, respectively. @node Conventions -@section Typographical Conventions +@unnumberedsec Typographical Conventions @cindex Texinfo This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo}, @@ -1446,23 +1551,23 @@ emphasized @emph{like this}, and if a point needs to be made strongly, it is done @strong{like this}. The first occurrence of a new term is usually its @dfn{definition} and appears in the same font as the previous occurrence of ``definition'' in this sentence. -Finally, @value{FN}s are indicated like this: @file{/path/to/ourfile}. +Finally, file names are indicated like this: @file{/path/to/ourfile}. @end ifnotinfo Characters that you type at the keyboard look @kbd{like this}. 
In particular, there are special characters called ``control characters.'' These are characters that you type by holding down both the @kbd{CONTROL} key and -another key, at the same time. For example, a @kbd{@value{CTL}-d} is typed +another key, at the same time. For example, a @kbd{Ctrl-d} is typed by first pressing and holding the @kbd{CONTROL} key, next pressing the @kbd{d} key and finally releasing both keys. @c fakenode --- for prepinfo -@subsubheading Dark Corners +@unnumberedsubsec Dark Corners @cindex Kernighan, Brian @quotation @i{Dark corners are basically fractal --- no matter how much -you illuminate, there's always a smaller but darker one.}@* -Brian Kernighan +you illuminate, there's always a smaller but darker one.} +@author Brian Kernighan @end quotation @cindex d.c., See dark corner @@ -1597,7 +1702,7 @@ of @cite{GAWK: The GNU Awk User's Guide}. Edition @value{EDITION} maintains the basic structure of Edition 1.0, but with significant additional material, reflecting the host of new features -in @command{gawk} @value{PVERSION} @value{VERSION}. +in @command{gawk} version @value{VERSION}. Of particular note is @ref{Array Sorting}, @ref{Bitwise Functions}, @@ -1760,7 +1865,7 @@ significant editorial help for this @value{DOCUMENT} for the 3.1 release of @command{gawk}. @end quotation -@cindex Beebe, Nelson +@cindex Beebe, Nelson H.F.@: @cindex Buening, Andreas @cindex Collado, Manuel @cindex Colombo, Antonio @@ -1777,7 +1882,6 @@ significant editorial help for this @value{DOCUMENT} for the @cindex Rankin, Pat @cindex Schorr, Andrew @cindex Vinschen, Corinna -@cindex Wallin, Anders @cindex Zaretskii, Eli Dr.@: Nelson Beebe, @@ -1797,7 +1901,6 @@ Chet Ramey, Pat Rankin, Andrew Schorr, Corinna Vinschen, -Anders Wallin, and Eli Zaretskii (in alphabetical order) make up the current @@ -2033,9 +2136,9 @@ awk '@var{program}' @noindent @command{awk} applies the @var{program} to the @dfn{standard input}, which usually means whatever you type on the terminal. 
This continues -until you indicate end-of-file by typing @kbd{@value{CTL}-d}. +until you indicate end-of-file by typing @kbd{Ctrl-d}. (On other operating systems, the end-of-file character may be different. -For example, on OS/2, it is @kbd{@value{CTL}-z}.) +For example, on OS/2, it is @kbd{Ctrl-z}.) @cindex files, input, See input files @cindex input files, running @command{awk} without @@ -2055,11 +2158,11 @@ $ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"} @print{} Don't Panic! @end example -@cindex quoting -@cindex double quote (@code{"}) -@cindex @code{"} (double quote) -@cindex @code{\} (backslash) -@cindex backslash (@code{\}) +@cindex shell quoting, double quote +@cindex double quote (@code{"}) in shell commands +@cindex @code{"} (double quote) in shell commands +@cindex @code{\} (backslash) in shell commands +@cindex backslash (@code{\}) in shell commands This program does not read any input. The @samp{\} before each of the inner double quotes is necessary because of the shell's quoting rules---in particular because it mixes both single quotes and @@ -2081,7 +2184,7 @@ $ @kbd{awk '@{ print @}'} @print{} Four score and seven years ago, ... @kbd{What, me worry?} @print{} What, me worry? -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @node Long @@ -2098,11 +2201,10 @@ more convenient to put the program into a separate file. In order to tell awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{} @end example -@cindex @code{-f} option -@cindex command line, options -@cindex options, command-line +@cindex @option{-f} option +@cindex command line, option @option{-f} The @option{-f} instructs the @command{awk} utility to get the @command{awk} program -from the file @var{source-file}. Any @value{FN} can be used for +from the file @var{source-file}. Any file name can be used for @var{source-file}. 
For example, you could put the program: @example @@ -2123,22 +2225,22 @@ does the same thing as this one: awk "BEGIN @{ print \"Don't Panic!\" @}" @end example -@cindex quoting +@cindex quoting in @command{gawk} command lines @noindent This was explained earlier (@pxref{Read Terminal}). -Note that you don't usually need single quotes around the @value{FN} that you -specify with @option{-f}, because most @value{FN}s don't contain any of the shell's +Note that you don't usually need single quotes around the file name that you +specify with @option{-f}, because most file names don't contain any of the shell's special characters. Notice that in @file{advice}, the @command{awk} program did not have single quotes around it. The quotes are only needed for programs that are provided on the @command{awk} command line. @c STARTOFRANGE sq1x -@cindex single quote (@code{'}) +@cindex single quote (@code{'}) in @command{gawk} command lines @c STARTOFRANGE qs2x -@cindex @code{'} (single quote) +@cindex @code{'} (single quote) in @command{gawk} command lines If you want to clearly identify your @command{awk} program files as such, -you can add the extension @file{.awk} to the @value{FN}. This doesn't +you can add the extension @file{.awk} to the file name. This doesn't affect the execution of the @command{awk} program but it does make ``housekeeping'' easier. @@ -2165,13 +2267,13 @@ BEGIN @{ print "Don't Panic!" @} After making this file executable (with the @command{chmod} utility), simply type @samp{advice} at the shell and the system arranges to run @command{awk}@footnote{The -line beginning with @samp{#!} lists the full @value{FN} of an interpreter +line beginning with @samp{#!} lists the full file name of an interpreter to run and an optional initial command-line argument to pass to that interpreter. The operating system then runs the interpreter with the given argument and the full argument list of the executed program. 
The first argument -in the list is the full @value{FN} of the @command{awk} program. +in the list is the full file name of the @command{awk} program. The rest of the -argument list contains either options to @command{awk}, or @value{DF}s, +argument list contains either options to @command{awk}, or data files, or both. Note that on many systems @command{awk} may be found in @file{/usr/bin} instead of in @file{/bin}. Caveat Emptor.} as if you had typed @samp{awk -f advice}: @@ -2285,7 +2387,7 @@ programs, but this usually isn't very useful; the purpose of a comment is to help you or another person understand the program when reading it at a later time. -@cindex quoting +@cindex quoting, for small awk programs @cindex single quote (@code{'}), vs.@: apostrophe @cindex @code{'} (single quote), vs.@: apostrophe @quotation CAUTION @@ -2326,7 +2428,7 @@ The next @value{SUBSECTION} describes the shell's quoting rules. @node Quoting @subsection Shell-Quoting Issues -@cindex quoting, rules for +@cindex shell quoting, rules for @menu * DOS Quoting:: Quoting in Windows Batch Files. @@ -2361,10 +2463,10 @@ that character. The shell removes the backslash and passes the quoted character on to the command. @item -@cindex @code{\} (backslash) -@cindex backslash (@code{\}) -@cindex single quote (@code{'}) -@cindex @code{'} (single quote) +@cindex @code{\} (backslash), in shell commands +@cindex backslash (@code{\}), in shell commands +@cindex single quote (@code{'}), in shell commands +@cindex @code{'} (single quote), in shell commands Single quotes protect everything between the opening and closing quotes. The shell does no interpretation of the quoted text, passing it on verbatim to the command. @@ -2374,8 +2476,8 @@ Refer back to for an example of what happens if you try. 
@item -@cindex double quote (@code{"}) -@cindex @code{"} (double quote) +@cindex double quote (@code{"}), in shell commands +@cindex @code{"} (double quote), in shell commands Double quotes protect most things between the opening and closing quotes. The shell does at least variable and command substitution on the quoted text. Different shells may do additional kinds of processing on double-quoted text. @@ -2412,7 +2514,7 @@ awk -F "" '@var{program}' @var{files} # correct @end example @noindent -@cindex null strings, quoting and +@cindex null strings in @command{gawk} arguments, quoting and Don't use this: @example @@ -2421,11 +2523,11 @@ awk -F"" '@var{program}' @var{files} # wrong! @noindent In the second case, @command{awk} will attempt to use the text of the program -as the value of @code{FS}, and the first @value{FN} as the text of the program! +as the value of @code{FS}, and the first file name as the text of the program! This results in syntax errors at best, and confusing behavior at worst. @end itemize -@cindex quoting, tricks for +@cindex quoting in @command{gawk} command lines, tricks for Mixing single and double quotes is difficult. You have to resort to shell quoting tricks, like this: @@ -2536,49 +2638,48 @@ gawk "@{ print \"\042\" $0 \"\042\" @}" @var{file} @node Sample Data Files -@section @value{DDF}s for the Examples +@section Data Files for the Examples @c For gawk >= 4.0, update these data files. No-one has such slow modems! @cindex input files, examples -@cindex @code{BBS-list} file +@cindex @code{mail-list} file Many of the examples in this @value{DOCUMENT} take their input from two sample -@value{DF}s. The first, @file{BBS-list}, represents a list of -computer bulletin board systems together with information about those systems. -The second @value{DF}, called @file{inventory-shipped}, contains +data files. The first, @file{mail-list}, represents a list of peoples' names +together with their email addresses and information about those people. 
+The second data file, called @file{inventory-shipped}, contains information about monthly shipments. In both files, each line is considered to be one @dfn{record}. -In the @value{DF} @file{BBS-list}, each record contains the name of a computer -bulletin board, its phone number, the board's baud rate(s), and a code for -the number of hours it is operational. An @samp{A} in the last column -means the board operates 24 hours a day. A @samp{B} in the last -column means the board only operates on evening and weekend hours. -A @samp{C} means the board operates only on weekends: +In the data file @file{mail-list}, each record contains the name of a person, +his/her phone number, his/her email-address, and a code for their relationship +with the author of the list. An @samp{A} in the last column +means that the person is an acquaintance. An @samp{F} in the last +column means that the person is a friend. +An @samp{R} means that the person is a relative: -@c 2e: Update the baud rates to reflect today's faster modems @example @c system if test ! -d eg ; then mkdir eg ; fi @c system if test ! -d eg/lib ; then mkdir eg/lib ; fi @c system if test ! -d eg/data ; then mkdir eg/data ; fi @c system if test ! -d eg/prog ; then mkdir eg/prog ; fi @c system if test ! 
-d eg/misc ; then mkdir eg/misc ; fi -@c file eg/data/BBS-list -aardvark 555-5553 1200/300 B -alpo-net 555-3412 2400/1200/300 A -barfly 555-7685 1200/300 A -bites 555-1675 2400/1200/300 A -camelot 555-0542 300 C -core 555-2912 1200/300 C -fooey 555-1234 2400/1200/300 B -foot 555-6699 1200/300 B -macfoo 555-6480 1200/300 A -sdace 555-3430 2400/1200/300 A -sabafoo 555-2127 1200/300 C +@c file eg/data/mail-list +Amelia 555-5553 amelia.zodiacusque@@gmail.com F +Anthony 555-3412 anthony.asserturo@@hotmail.com A +Becky 555-7685 becky.algebrarum@@gmail.com A +Bill 555-1675 bill.drowning@@hotmail.com A +Broderick 555-0542 broderick.aliquotiens@@yahoo.com R +Camilla 555-2912 camilla.infusarum@@skynet.be R +Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +Julie 555-6699 julie.perscrutabor@@skeeve.com F +Martin 555-6480 martin.codicibus@@hotmail.com A +Samuel 555-3430 samuel.lanceolis@@shu.edu A +Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @c endfile @end example @cindex @code{inventory-shipped} file -The @value{DF} @file{inventory-shipped} represents +The data file @file{inventory-shipped} represents information about shipments during the year. Each record contains the month, the number of green crates shipped, the number of red boxes shipped, the number of @@ -2608,45 +2709,30 @@ Apr 21 70 74 514 @c endfile @end example -@ifinfo -If you are reading this in GNU Emacs using Info, you can copy the regions -of text showing these sample files into your own test files. This way you -can try out the examples shown in the remainder of this document. You do -this by using the command @kbd{M-x write-region} to copy text from the Info -file into a file for use with @command{awk} -(@xref{Misc File Ops, , Miscellaneous File Operations, emacs, GNU Emacs Manual}, -for more information). Using this information, create your own -@file{BBS-list} and @file{inventory-shipped} files and practice what you -learn in this @value{DOCUMENT}. 
- -@cindex Texinfo -If you are using the stand-alone version of Info, -see @ref{Extract Program}, -for an @command{awk} program that extracts these @value{DF}s from -@file{gawk.texi}, the Texinfo source file for this Info file. -@end ifinfo +The sample files are included in the @command{gawk} distribution, +in the directory @file{awklib/eg/data}. @node Very Simple @section Some Simple Examples The following command runs a simple @command{awk} program that searches the -input file @file{BBS-list} for the character string @samp{foo} (a +input file @file{mail-list} for the character string @samp{li} (a grouping of characters is usually called a @dfn{string}; the term @dfn{string} is based on similar usage in English, such as ``a string of pearls,'' or ``a string of cars in a train''): @example -awk '/foo/ @{ print $0 @}' BBS-list +awk '/li/ @{ print $0 @}' mail-list @end example @noindent -When lines containing @samp{foo} are found, they are printed because +When lines containing @samp{li} are found, they are printed because @w{@samp{print $0}} means print the current line. (Just @samp{print} by itself means the same thing, so we could have written that instead.) -You will notice that slashes (@samp{/}) surround the string @samp{foo} -in the @command{awk} program. The slashes indicate that @samp{foo} +You will notice that slashes (@samp{/}) surround the string @samp{li} +in the @command{awk} program. The slashes indicate that @samp{li} is the pattern to search for. This type of pattern is called a @dfn{regular expression}, which is covered in more detail later (@pxref{Regexp}). @@ -2658,11 +2744,11 @@ interpret any of it as special shell characters. 
Here is what this program prints: @example -$ @kbd{awk '/foo/ @{ print $0 @}' BBS-list} -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sabafoo 555-2127 1200/300 C +$ @kbd{awk '/li/ @{ print $0 @}' mail-list} +@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F +@print{} Broderick 555-0542 broderick.aliquotiens@@yahoo.com R +@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F +@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A @end example @cindex actions, default @@ -2675,7 +2761,7 @@ action is to print all lines that match the pattern. @cindex actions, empty Thus, we could leave out the action (the @code{print} statement and the curly braces) in the previous example and the result would be the same: -@command{awk} prints all lines matching the pattern @samp{foo}. By comparison, +@command{awk} prints all lines matching the pattern @samp{li}. By comparison, omitting the @code{print} statement but retaining the curly braces makes an empty action that does nothing (i.e., no lines are printed). @@ -2685,9 +2771,9 @@ collection of useful, short programs to get you started. Some of these programs contain constructs that haven't been covered yet. (The description of the program will give you a good idea of what is going on, but please read the rest of the @value{DOCUMENT} to become an @command{awk} expert!) -Most of the examples use a @value{DF} named @file{data}. This is just a +Most of the examples use a data file named @file{data}. This is just a placeholder; if you use these programs yourself, substitute -your own @value{FN}s for @file{data}. +your own file names for @file{data}. For future reference, note that there is often more than one way to do things in @command{awk}. 
At some point, you may want to look back at these examples and see if @@ -2777,7 +2863,7 @@ awk 'END @{ print NR @}' data @end example @item -Print the even-numbered lines in the @value{DF}: +Print the even-numbered lines in the data file: @example awk 'NR % 2 == 0' data @@ -2819,30 +2905,24 @@ This program prints every line that contains the string @samp{12} @emph{or} the string @samp{21}. If a line contains both strings, it is printed twice, once by each rule. -This is what happens if we run this program on our two sample @value{DF}s, -@file{BBS-list} and @file{inventory-shipped}: +This is what happens if we run this program on our two sample data files, +@file{mail-list} and @file{inventory-shipped}: @example $ @kbd{awk '/12/ @{ print $0 @}} -> @kbd{/21/ @{ print $0 @}' BBS-list inventory-shipped} -@print{} aardvark 555-5553 1200/300 B -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} barfly 555-7685 1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} core 555-2912 1200/300 C -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sdace 555-3430 2400/1200/300 A -@print{} sabafoo 555-2127 1200/300 C -@print{} sabafoo 555-2127 1200/300 C +> @kbd{/21/ @{ print $0 @}' mail-list inventory-shipped} +@print{} Anthony 555-3412 anthony.asserturo@@hotmail.com A +@print{} Camilla 555-2912 camilla.infusarum@@skynet.be R +@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R +@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @print{} Jan 21 36 64 620 @print{} Apr 21 70 74 514 @end example @noindent -Note how the line beginning with @samp{sabafoo} -in @file{BBS-list} was printed twice, once for each rule. +Note how the line beginning with @samp{Jean-Paul} +in @file{mail-list} was printed twice, once for each rule. @node More Complex @section A More Complex Example @@ -2885,7 +2965,7 @@ the file. 
The fourth field identifies the group of the file. The fifth field contains the size of the file in bytes. The sixth, seventh, and eighth fields contain the month, day, and time, respectively, that the file was last modified. Finally, the ninth field -contains the @value{FN}.@footnote{The @samp{LC_ALL=C} is +contains the file name.@footnote{The @samp{LC_ALL=C} is needed to produce this traditional-style output from @command{ls}.} @c @cindex automatic initialization @@ -2921,7 +3001,7 @@ separate rule, like this: @example awk '/12/ @{ print $0 @} - /21/ @{ print $0 @}' BBS-list inventory-shipped + /21/ @{ print $0 @}' mail-list inventory-shipped @end example @cindex @command{gawk}, newlines in @@ -3036,8 +3116,8 @@ noticed because it is ``hidden'' inside the comment. Thus, the @code{BEGIN} is noted as a syntax error. @cindex statements, multiple -@cindex @code{;} (semicolon) -@cindex semicolon (@code{;}) +@cindex @code{;} (semicolon), separating statements in actions +@cindex semicolon (@code{;}), separating statements in actions When @command{awk} statements within one rule are short, you might want to put more than one of them on a line. This is accomplished by separating the statements with a semicolon (@samp{;}). @@ -3097,6 +3177,7 @@ used once, and thrown away. Because @command{awk} programs are interpreted, you can avoid the (usually lengthy) compilation part of the typical edit-compile-test-debug cycle of software development. +@cindex Brian Kernighan's @command{awk} Complex programs have been written in @command{awk}, including a complete retargetable assembler for eight-bit microprocessors (@pxref{Glossary}, for more information), and a microcode assembler for a special-purpose Prolog @@ -3121,7 +3202,7 @@ easier to maintain and usually run more efficiently. 
@node Invoking Gawk @chapter Running @command{awk} and @command{gawk} -This @value{CHAPTER} covers how to run awk, both POSIX-standard +This @value{CHAPTER} covers how to run @command{awk}, both POSIX-standard and @command{gawk}-specific command-line options, and what @command{awk} and @command{gawk} do with non-option arguments. @@ -3159,10 +3240,19 @@ There are two ways to run @command{awk}---with an explicit program or with one or more program files. Here are templates for both of them; items enclosed in [@dots{}] in these templates are optional: +@ifnotdocbook @example awk @r{[@var{options}]} -f progfile @r{[@code{--}]} @var{file} @dots{} awk @r{[@var{options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{} @end example +@end ifnotdocbook + +@c FIXME - find a better way to mark this up in docbook +@docbook +<screen>awk [<replaceable>options</replaceable>] -f progfile [<literal>--</literal>] <replaceable>file</replaceable> … +awk [<replaceable>options</replaceable>] [<literal>--</literal>] '<replaceable>program</replaceable>' <replaceable>file</replaceable> … +</screen> +@end docbook @cindex GNU long options @cindex long options @@ -3178,7 +3268,7 @@ It is possible to invoke @command{awk} with an empty program: awk '' datafile1 datafile2 @end example -@cindex @code{--lint} option +@cindex @option{--lint} option @noindent Doing so makes little sense, though; @command{awk} exits silently when given an empty program. @@ -3218,43 +3308,27 @@ The following list describes options mandated by the POSIX standard: @table @code @item -F @var{fs} @itemx --field-separator @var{fs} -@cindex @code{-F} option -@cindex @code{--field-separator} option +@cindex @option{-F} option +@cindex @option{--field-separator} option @cindex @code{FS} variable, @code{--field-separator} option and Set the @code{FS} variable to @var{fs} (@pxref{Field Separators}). 
@item -f @var{source-file} @itemx --file @var{source-file} -@cindex @code{-f} option -@cindex @code{--file} option +@cindex @option{-f} option +@cindex @option{--file} option @cindex @command{awk} programs, location of Read @command{awk} program source from @var{source-file} instead of in the first non-option argument. This option may be given multiple times; the @command{awk} -program consists of the concatenation the contents of +program consists of the concatenation of the contents of each specified @var{source-file}. -@item -i @var{source-file} -@itemx --include @var{source-file} -@cindex @code{-i} option -@cindex @code{--include} option -@cindex @command{awk} programs, location of -Read @command{awk} source library from @var{source-file}. This option is -completely equivalent to using the @samp{@@include} directive inside -your program. This option is very -similar to the @option{-f} option, but there are two important differences. -First, when @option{-i} is used, the program source will not be loaded if it has -been previously loaded, whereas the @option{-f} will always load the file. -Second, because this option is intended to be used with code libraries, -@command{gawk} does not recognize such files as constituting main program -input. Thus, after processing an @option{-i} argument, @command{gawk} still expects to -find the main source code via the @option{-f} option or on the command-line. - @item -v @var{var}=@var{val} @itemx --assign @var{var}=@var{val} -@cindex @code{-v} option -@cindex @code{--assign} option +@cindex @option{-v} option +@cindex @option{--assign} option @cindex variables, setting Set the variable @var{var} to the value @var{val} @emph{before} execution of the program begins. Such variable values are available @@ -3275,7 +3349,7 @@ predefined value you may have given. @end quotation @item -W @var{gawk-opt} -@cindex @code{-W} option +@cindex @option{-W} option Provide an implementation-specific option. 
This is the POSIX convention for providing implementation-specific options. These options @@ -3294,8 +3368,8 @@ conventions. @cindex @code{-} (hyphen), filenames beginning with @cindex hyphen (@code{-}), filenames beginning with -This is useful if you have @value{FN}s that start with @samp{-}, -or in shell scripts, if you have @value{FN}s that will be specified +This is useful if you have file names that start with @samp{-}, +or in shell scripts, if you have file names that will be specified by the user that could start with @samp{-}. It is also useful for passing options on to the @command{awk} program; see @ref{Getopt Function}. @@ -3308,8 +3382,8 @@ The following list describes @command{gawk}-specific options: @table @code @item -b @itemx --characters-as-bytes -@cindex @code{-b} option -@cindex @code{--characters-as-bytes} option +@cindex @option{-b} option +@cindex @option{--characters-as-bytes} option Cause @command{gawk} to treat all input data as single-byte characters. In addition, all output written with @code{print} or @code{printf} are treated as single-byte characters. @@ -3323,8 +3397,8 @@ multibyte characters. This option is an easy way to tell @command{gawk}: @item -c @itemx --traditional -@cindex @code{--c} option -@cindex @code{--traditional} option +@cindex @option{-c} option +@cindex @option{--traditional} option @cindex compatibility mode (@command{gawk}), specifying Specify @dfn{compatibility mode}, in which the GNU extensions to the @command{awk} language are disabled, so that @command{gawk} behaves just @@ -3335,17 +3409,18 @@ which summarizes the extensions. Also see @item -C @itemx --copyright -@cindex @code{-C} option -@cindex @code{--copyright} option +@cindex @option{-C} option +@cindex @option{--copyright} option @cindex GPL (General Public License), printing Print the short version of the General Public License and then exit. 
@item -d@r{[}@var{file}@r{]} @itemx --dump-variables@r{[}=@var{file}@r{]} -@cindex @code{-d} option -@cindex @code{--dump-variables} option -@cindex @code{awkvars.out} file -@cindex files, @code{awkvars.out} +@cindex @option{-d} option +@cindex @option{--dump-variables} option +@cindex dump all variables of a program +@cindex @file{awkvars.out} file +@cindex files, @file{awkvars.out} @cindex variables, global, printing list of Print a sorted list of global variables, their types, and final values to @var{file}. If no @var{file} is provided, print this @@ -3364,8 +3439,8 @@ names like @code{i}, @code{j}, etc.) @item -D@r{[}@var{file}@r{]} @itemx --debug=@r{[}@var{file}@r{]} -@cindex @code{-D} option -@cindex @code{--debug} option +@cindex @option{-D} option +@cindex @option{--debug} option @cindex @command{awk} debugging, enabling Enable debugging of @command{awk} programs (@pxref{Debugging}). @@ -3377,8 +3452,8 @@ No space is allowed between the @option{-D} and @var{file}, if @item -e @var{program-text} @itemx --source @var{program-text} -@cindex @code{-e} option -@cindex @code{--source} option +@cindex @option{-e} option +@cindex @option{--source} option @cindex source code, mixing Provide program source code in the @var{program-text}. This option allows you to mix source code in files with source @@ -3389,8 +3464,8 @@ programs (@pxref{AWKPATH Variable}). @item -E @var{file} @itemx --exec @var{file} -@cindex @code{-E} option -@cindex @code{--exec} option +@cindex @option{-E} option +@cindex @option{--exec} option @cindex @command{awk} programs, location of @cindex CGI, @command{awk} scripts for Similar to @option{-f}, read @command{awk} program text from @var{file}. 
@@ -3420,8 +3495,8 @@ with @samp{#!} scripts (@pxref{Executable Scripts}), like so: @item -g @itemx --gen-pot -@cindex @code{-g} option -@cindex @code{--gen-pot} option +@cindex @option{-g} option +@cindex @option{--gen-pot} option @cindex portable object files, generating @cindex files, portable object, generating Analyze the source program and @@ -3432,18 +3507,34 @@ for information about this option. @item -h @itemx --help -@cindex @code{-h} option -@cindex @code{--help} option +@cindex @option{-h} option +@cindex @option{--help} option @cindex GNU long options, printing list of @cindex options, printing list of @cindex printing, list of options Print a ``usage'' message summarizing the short and long style options that @command{gawk} accepts and then exit. +@item -i @var{source-file} +@itemx --include @var{source-file} +@cindex @option{-i} option +@cindex @option{--include} option +@cindex @command{awk} programs, location of +Read @command{awk} source library from @var{source-file}. This option is +completely equivalent to using the @samp{@@include} directive inside +your program. This option is very +similar to the @option{-f} option, but there are two important differences. +First, when @option{-i} is used, the program source will not be loaded if it has +been previously loaded, whereas the @option{-f} will always load the file. +Second, because this option is intended to be used with code libraries, +@command{gawk} does not recognize such files as constituting main program +input. Thus, after processing an @option{-i} argument, @command{gawk} still expects to +find the main source code via the @option{-f} option or on the command-line. + @item -l @var{lib} @itemx --load @var{lib} -@cindex @code{-l} option -@cindex @code{--load} option +@cindex @option{-l} option +@cindex @option{--load} option @cindex loading, library Load a shared library @var{lib}. This searches for the library using the @env{AWKLIBPATH} environment variable. 
The correct library suffix for your platform will be @@ -3454,8 +3545,8 @@ a shared library. @item -L @r{[}value@r{]} @itemx --lint@r{[}=value@r{]} -@cindex @code{-l} option -@cindex @code{--lint} option +@cindex @option{-l} option +@cindex @option{--lint} option @cindex lint checking, issuing warnings @cindex warnings, issuing Warn about constructs that are dubious or nonportable to @@ -3477,16 +3568,16 @@ care to search for all occurrences of each inappropriate construct. As @item -M @itemx --bignum -@cindex @code{-M} option -@cindex @code{--bignum} option +@cindex @option{-M} option +@cindex @option{--bignum} option Force arbitrary precision arithmetic on numbers. This option has no effect if @command{gawk} is not compiled to use the GNU MPFR and MP libraries -(@pxref{Arbitrary Precision Arithmetic}). +(@pxref{Gawk and MPFR}). @item -n @itemx --non-decimal-data -@cindex @code{-n} option -@cindex @code{--non-decimal-data} option +@cindex @option{-n} option +@cindex @option{--non-decimal-data} option @cindex hexadecimal values@comma{} enabling interpretation of @cindex octal values@comma{} enabling interpretation of @cindex troubleshooting, @code{--non-decimal-data} option @@ -3501,40 +3592,40 @@ Use with care. @item -N @itemx --use-lc-numeric -@cindex @code{-N} option -@cindex @code{--use-lc-numeric} option +@cindex @option{-N} option +@cindex @option{--use-lc-numeric} option Force the use of the locale's decimal point character when parsing numeric input data (@pxref{Locales}). @item -o@r{[}@var{file}@r{]} @itemx --pretty-print@r{[}=@var{file}@r{]} -@cindex @code{-o} option -@cindex @code{--pretty-print} option +@cindex @option{-o} option +@cindex @option{--pretty-print} option Enable pretty-printing of @command{awk} programs. By default, output program is created in a file named @file{awkprof.out}. The optional @var{file} argument allows you to specify a different -@value{FN} for the output. +file name for the output. 
No space is allowed between the @option{-o} and @var{file}, if @var{file} is supplied. @item -O @itemx --optimize -@cindex @code{--optimize} option -@cindex @code{-O} option +@cindex @option{--optimize} option +@cindex @option{-O} option Enable some optimizations on the internal representation of the program. At the moment this includes just simple constant folding. The @command{gawk} maintainer hopes to add more optimizations over time. @item -p@r{[}@var{file}@r{]} @itemx --profile@r{[}=@var{file}@r{]} -@cindex @code{-p} option -@cindex @code{--profile} option +@cindex @option{-p} option +@cindex @option{--profile} option @cindex @command{awk} profiling, enabling Enable profiling of @command{awk} programs (@pxref{Profiling}). By default, profiles are created in a file named @file{awkprof.out}. The optional @var{file} argument allows you to specify a different -@value{FN} for the profile file. +file name for the profile file. No space is allowed between the @option{-p} and @var{file}, if @var{file} is supplied. @@ -3543,8 +3634,8 @@ in the left margin, and function call counts for each function. @item -P @itemx --posix -@cindex @code{-P} option -@cindex @code{--posix} option +@cindex @option{-P} option +@cindex @option{--posix} option @cindex POSIX mode @cindex @command{gawk}, extensions@comma{} disabling Operate in strict POSIX mode. This disables all @command{gawk} @@ -3585,16 +3676,16 @@ data (@pxref{Locales}). @c @cindex automatic warnings @c @cindex warnings, automatic -@cindex @code{--traditional} option, @code{--posix} option and -@cindex @code{--posix} option, @code{--traditional} option and +@cindex @option{--traditional} option, @code{--posix} option and +@cindex @option{--posix} option, @code{--traditional} option and If you supply both @option{--traditional} and @option{--posix} on the command line, @option{--posix} takes precedence. @command{gawk} also issues a warning if both options are supplied. 
@item -r @itemx --re-interval -@cindex @code{-r} option -@cindex @code{--re-interval} option +@cindex @option{-r} option +@cindex @option{--re-interval} option @cindex regular expressions, interval expressions and Allow interval expressions (@pxref{Regexp Operators}) @@ -3605,8 +3696,8 @@ and for use in combination with the @option{--traditional} option. @item -S @itemx --sandbox -@cindex @code{-S} option -@cindex @code{--sandbox} option +@cindex @option{-S} option +@cindex @option{--sandbox} option @cindex sandbox mode Disable the @code{system()} function, input redirections with @code{getline}, @@ -3618,16 +3709,16 @@ can't access your system (other than the specified input data file). @item -t @itemx --lint-old -@cindex @code{--L} option -@cindex @code{--lint-old} option +@cindex @option{-L} option +@cindex @option{--lint-old} option Warn about constructs that are not available in the original version of @command{awk} from Version 7 Unix (@pxref{V7/SVR3.1}). @item -V @itemx --version -@cindex @code{-V} option -@cindex @code{--version} option +@cindex @option{-V} option +@cindex @option{--version} option @cindex @command{gawk}, versions of, information about@comma{} printing Print version information for this particular copy of @command{gawk}. This allows you to determine if your copy of @command{gawk} is up to date @@ -3641,14 +3732,14 @@ As long as program text has been supplied, any other options are flagged as invalid with a warning message but are otherwise ignored. -@cindex @code{-F} option, @code{-Ft} sets @code{FS} to TAB +@cindex @option{-F} option, @option{-Ft} sets @code{FS} to TAB In compatibility mode, as a special case, if the value of @var{fs} supplied to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB character (@code{"\t"}). This is true only for @option{--traditional} and not for @option{--posix} (@pxref{Field Separators}). 
-@cindex @code{-f} option, multiple uses +@cindex @option{-f} option, multiple uses The @option{-f} option may be used more than once on the command line. If it is, @command{awk} reads its program source from all of the named files, as if they had been concatenated together into one big file. This is @@ -3662,7 +3753,7 @@ function names must be unique.) With standard @command{awk}, library functions can still be used, even if the program is entered at the terminal, by specifying @samp{-f /dev/tty}. After typing your program, -type @kbd{@value{CTL}-d} (the end-of-file character) to terminate it. +type @kbd{Ctrl-d} (the end-of-file character) to terminate it. (You may also use @samp{-f -} to read program source from the standard input but then you will not be able to also use the standard input as a source of data.) @@ -3675,7 +3766,7 @@ and library source code (@pxref{AWKPATH Variable}). The @option{--source} option may also be used multiple times on the command line. -@cindex @code{--source} option +@cindex @option{--source} option If no @option{-f} or @option{--source} option is specified, then @command{gawk} uses the first non-option command-line argument as the text of the program source code. @@ -3734,6 +3825,7 @@ file at all. @cindex @command{gawk}, @code{ARGIND} variable in @cindex @code{ARGIND} variable, command-line arguments +@cindex @code{ARGV} array, indexing into @cindex @code{ARGC}/@code{ARGV} variables, command-line arguments All these arguments are made available to your @command{awk} program in the @code{ARGV} array (@pxref{Built-in Variables}). Command-line options @@ -3744,9 +3836,10 @@ sets the variable @code{ARGIND} to the index in @code{ARGV} of the current element. 
@cindex input files, variable assignments and -The distinction between @value{FN} arguments and variable-assignment +@cindex variable assignments and input files +The distinction between file name arguments and variable-assignment arguments is made when @command{awk} is about to open the next input file. -At that point in execution, it checks the @value{FN} to see whether +At that point in execution, it checks the file name to see whether it is really a variable assignment; if so, @command{awk} sets the variable instead of reading a file. @@ -3763,7 +3856,7 @@ sequences (@pxref{Escape Sequences}). @value{DARKCORNER} In some earlier implementations of @command{awk}, when a variable assignment -occurred before any @value{FN}s, the assignment would happen @emph{before} +occurred before any file names, the assignment would happen @emph{before} the @code{BEGIN} rule was executed. @command{awk}'s behavior was thus inconsistent; some command-line assignments were available inside the @code{BEGIN} rule, while others were not. Unfortunately, @@ -3774,8 +3867,8 @@ upon the old behavior. The variable assignment feature is most useful for assigning to variables such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and -output formats before scanning the @value{DF}s. It is also useful for -controlling state if multiple passes are needed over a @value{DF}. For +output formats before scanning the data files. It is also useful for +controlling state if multiple passes are needed over a data file. For example: @cindex files, multiple passes over @@ -3811,16 +3904,17 @@ You may also use @code{"-"} to name standard input when reading files with @code{getline} (@pxref{Getline/File}). In addition, @command{gawk} allows you to specify the special -@value{FN} @file{/dev/stdin}, both on the command line and +file name @file{/dev/stdin}, both on the command line and with @code{getline}. Some other versions of @command{awk} also support this, but it is not standard. 
(Some operating systems provide a @file{/dev/stdin} file in the file system, however, @command{gawk} always processes -this @value{FN} itself.) +this file name itself.) @node Environment Variables @section The Environment Variables @command{gawk} Uses +@cindex environment variables used by @command{gawk} A number of environment variables influence how @command{gawk} behaves. @@ -3836,8 +3930,7 @@ behaves. @node AWKPATH Variable @subsection The @env{AWKPATH} Environment Variable @cindex @env{AWKPATH} environment variable -@cindex directories, searching -@cindex search paths +@cindex directories, searching for source files @cindex search paths, for source files @cindex differences in @command{awk} and @command{gawk}, @code{AWKPATH} environment variable @ifinfo @@ -3847,14 +3940,14 @@ on the command-line with the @option{-f} option. In most @command{awk} implementations, you must supply a precise path name for each program file, unless the file is in the current directory. -But in @command{gawk}, if the @value{FN} supplied to the @option{-f} +But in @command{gawk}, if the file name supplied to the @option{-f} or @option{-i} options -does not contain a @samp{/}, then @command{gawk} searches a list of +does not contain a directory separator @samp{/}, then @command{gawk} searches a list of directories (called the @dfn{search path}), one by one, looking for a file with the specified name. The search path is a string consisting of directory names -separated by colons. @command{gawk} gets its search path from the +separated by colons@footnote{Semicolons on MS-Windows and MS-DOS.}. @command{gawk} gets its search path from the @env{AWKPATH} environment variable. If that variable does not exist, @command{gawk} uses a default path, @samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk} @@ -3867,7 +3960,7 @@ though.} The search path feature is particularly useful for building libraries of useful @command{awk} functions. 
The library files can be placed in a standard directory in the default path and then specified on -the command line with a short @value{FN}. Otherwise, the full @value{FN} +the command line with a short file name. Otherwise, the full file name would have to be typed for each file. By using the @option{-i} option, or the @option{--source} and @option{-f} options, your command-line @@ -3912,8 +4005,7 @@ found, and @command{gawk} no longer needs to use @env{AWKPATH}. @node AWKLIBPATH Variable @subsection The @env{AWKLIBPATH} Environment Variable @cindex @env{AWKLIBPATH} environment variable -@cindex directories, searching -@cindex search paths +@cindex directories, searching for shared libraries @cindex search paths, for shared libraries @cindex differences in @command{awk} and @command{gawk}, @code{AWKLIBPATH} environment variable @@ -3961,10 +4053,6 @@ for use by the @command{gawk} developers for testing and tuning. They are subject to change. The variables are: @table @env -@item AVG_CHAIN_MAX -The average number of items @command{gawk} will maintain on a -hash chain for managing arrays. - @item AWK_HASH If this variable exists with a value of @samp{gst}, @command{gawk} will switch to using the hash function from GNU Smalltalk for @@ -3977,6 +4065,13 @@ files one line at a time, instead of reading in blocks. This exists for debugging problems on filesystems on non-POSIX operating systems where I/O is performed in records, not in blocks. +@item GAWK_MSG_SRC +If this variable exists, @command{gawk} includes the source file +name and line number from which warning and/or fatal messages +are generated. Its purpose is to help isolate the source of a +message, since there can be multiple places which produce the +same warning or error message. + @item GAWK_NO_DFA If this variable exists, @command{gawk} does not use the DFA regexp matcher for ``does it match'' kinds of tests. This can cause @command{gawk} @@ -3989,6 +4084,14 @@ coordinate with each other.) 
This specifies the amount by which @command{gawk} should grow its internal evaluation stack, when needed. +@item INT_CHAIN_MAX +The average number of items @command{gawk} will maintain on a +hash chain for managing arrays indexed by integers. + +@item STR_CHAIN_MAX +The average number of items @command{gawk} will maintain on a +hash chain for managing arrays indexed by strings. + @item TIDYMEM If this variable exists, @command{gawk} uses the @code{mtrace()} library calls from GNU LIBC to help track down possible memory leaks. @@ -4067,7 +4170,7 @@ use @samp{@@include} followed by the name of the file to be included, enclosed in double quotes. @quotation NOTE -Keep in mind that this is a language construct and the @value{FN} cannot +Keep in mind that this is a language construct and the file name cannot be a string variable, but rather just a literal string in double quotes. @end quotation @@ -4092,7 +4195,7 @@ $ @kbd{gawk -f test3} @print{} This is file test3. @end example -The @value{FN} can, of course, be a pathname. For example: +The file name can, of course, be a pathname. For example: @example @@include "../io_funcs" @@ -4187,10 +4290,9 @@ they will @emph{not} be in the next release). @c update this section for each release! -@cindex @code{PROCINFO} array The process-related special files @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and @file{/dev/user} were deprecated in @command{gawk} -3.1, but still worked. As of @value{PVERSION} 4.0, they are no longer +3.1, but still worked. As of version 4.0, they are no longer interpreted specially by @command{gawk}. (Use @code{PROCINFO} instead; see @ref{Auto-set}.) @@ -4209,10 +4311,11 @@ in case some option becomes obsolete in a future version of @command{gawk}. @cindex Jedi knights @cindex Knights, jedi @quotation -@i{Use the Source, Luke!}@* -Obi-Wan +@i{Use the Source, Luke!} +@author Obi-Wan @end quotation +@cindex shells, sea This @value{SECTION} intentionally left blank. @@ -4225,7 +4328,7 @@ blank. 
@table @code @item -W nostalgia @itemx --nostalgia -Print the message @code{"awk: bailing out near line 1"} and dump core. +Print the message @samp{awk: bailing out near line 1} and dump core. This option was inspired by the common behavior of very early versions of Unix @command{awk} and by a t--shirt. The message is @emph{not} subject to translation in non-English locales. @@ -4271,7 +4374,7 @@ long-undocumented ``feature'' of Unix @code{awk}. @node Regexp @chapter Regular Expressions -@cindex regexp, See regular expressions +@cindex regexp @c STARTOFRANGE regexp @cindex regular expressions @@ -4280,8 +4383,8 @@ set of strings. Because regular expressions are such a fundamental part of @command{awk} programming, their format and use deserve a separate @value{CHAPTER}. -@cindex forward slash (@code{/}) -@cindex @code{/} (forward slash) +@cindex forward slash (@code{/}) to enclose regular expressions +@cindex @code{/} (forward slash) to enclose regular expressions A regular expression enclosed in slashes (@samp{/}) is an @command{awk} pattern that matches every input record whose text belongs to that set. @@ -4318,14 +4421,14 @@ slashes. Then the regular expression is tested against the entire text of each record. (Normally, it only needs to match some part of the text in order to succeed.) 
For example, the following prints the second field of each record that contains the string -@samp{foo} anywhere in it: +@samp{li} anywhere in it: @example -$ @kbd{awk '/foo/ @{ print $2 @}' BBS-list} -@print{} 555-1234 +$ @kbd{awk '/li/ @{ print $2 @}' mail-list} +@print{} 555-5553 +@print{} 555-0542 @print{} 555-6699 -@print{} 555-6480 -@print{} 555-2127 +@print{} 555-3430 @end example @cindex regular expressions, operators @@ -4337,9 +4440,9 @@ $ @kbd{awk '/foo/ @{ print $2 @}' BBS-list} @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator @c @cindex operators, @code{!~} -@cindex @code{if} statement -@cindex @code{while} statement -@cindex @code{do}-@code{while} statement +@cindex @code{if} statement, use of regexps in +@cindex @code{while} statement, use of regexps in +@cindex @code{do}-@code{while} statement, use of regexps in @c @cindex statements, @code{if} @c @cindex statements, @code{while} @c @cindex statements, @code{do} @@ -4398,6 +4501,7 @@ $ @kbd{awk '$1 !~ /J/' inventory-shipped} @end example @cindex regexp constants +@cindex constant regexps @cindex regular expressions, constants, See regexp constants When a regexp is enclosed in slashes, such as @code{/foo/}, we call it a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and @@ -4406,7 +4510,7 @@ a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and @node Escape Sequences @section Escape Sequences -@cindex escape sequences +@cindex escape sequences, in strings @cindex backslash (@code{\}), in escape sequences @cindex @code{\} (backslash), in escape sequences Some characters cannot be included literally in string constants @@ -4446,39 +4550,39 @@ A literal backslash, @samp{\}. @cindex @code{\} (backslash), @code{\a} escape sequence @cindex backslash (@code{\}), @code{\a} escape sequence @item \a -The ``alert'' character, @kbd{@value{CTL}-g}, ASCII code 7 (BEL). 
+The ``alert'' character, @kbd{Ctrl-g}, ASCII code 7 (BEL). (This usually makes some sort of audible noise.) @cindex @code{\} (backslash), @code{\b} escape sequence @cindex backslash (@code{\}), @code{\b} escape sequence @item \b -Backspace, @kbd{@value{CTL}-h}, ASCII code 8 (BS). +Backspace, @kbd{Ctrl-h}, ASCII code 8 (BS). @cindex @code{\} (backslash), @code{\f} escape sequence @cindex backslash (@code{\}), @code{\f} escape sequence @item \f -Formfeed, @kbd{@value{CTL}-l}, ASCII code 12 (FF). +Formfeed, @kbd{Ctrl-l}, ASCII code 12 (FF). @cindex @code{\} (backslash), @code{\n} escape sequence @cindex backslash (@code{\}), @code{\n} escape sequence @item \n -Newline, @kbd{@value{CTL}-j}, ASCII code 10 (LF). +Newline, @kbd{Ctrl-j}, ASCII code 10 (LF). @cindex @code{\} (backslash), @code{\r} escape sequence @cindex backslash (@code{\}), @code{\r} escape sequence @item \r -Carriage return, @kbd{@value{CTL}-m}, ASCII code 13 (CR). +Carriage return, @kbd{Ctrl-m}, ASCII code 13 (CR). @cindex @code{\} (backslash), @code{\t} escape sequence @cindex backslash (@code{\}), @code{\t} escape sequence @item \t -Horizontal TAB, @kbd{@value{CTL}-i}, ASCII code 9 (HT). +Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT). @c @cindex @command{awk} language, V.4 version @cindex @code{\} (backslash), @code{\v} escape sequence @cindex backslash (@code{\}), @code{\v} escape sequence @item \v -Vertical tab, @kbd{@value{CTL}-k}, ASCII code 11 (VT). +Vertical tab, @kbd{Ctrl-k}, ASCII code 11 (VT). @cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence @cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence @@ -4576,6 +4680,7 @@ leaves what happens as undefined. There are two choices: @c @cindex automatic warnings @c @cindex warnings, automatic +@cindex Brian Kernighan's @command{awk} @table @asis @item Strip the backslash out This is what Brian Kernighan's @command{awk} and @command{gawk} both do. 
@@ -4589,6 +4694,7 @@ two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.) @cindex @command{gawk}, escape sequences @cindex Unix @command{awk}, backslashes in escape sequences +@cindex @command{mawk} utility @item Leave the backslash alone Some other @command{awk} implementations do this. In such implementations, typing @code{"a\qc"} is the same as typing @@ -4617,6 +4723,7 @@ leaves what happens as undefined. There are two choices: @c @cindex automatic warnings @c @cindex warnings, automatic +@cindex Brian Kernighan's @command{awk} @table @asis @item Strip the backslash out This is what Brian Kernighan's @command{awk} and @command{gawk} both do. @@ -4630,6 +4737,7 @@ two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.) @cindex @command{gawk}, escape sequences @cindex Unix @command{awk}, backslashes in escape sequences +@cindex @command{mawk} utility @item Leave the backslash alone Some other @command{awk} implementations do this. In such implementations, typing @code{"a\qc"} is the same as typing @@ -4696,6 +4804,7 @@ escape sequences literally when used in regexp constants. Thus, @section Regular Expression Operators @c STARTOFRANGE regexpo @cindex regular expressions, operators +@cindex metacharacters in regular expressions You can combine regular expressions with special characters, called @dfn{regular expression operators} or @dfn{metacharacters}, to @@ -4714,8 +4823,8 @@ Here is a list of metacharacters. All characters that are not escape sequences and that are not listed in the table stand for themselves: @table @code -@cindex backslash (@code{\}) -@cindex @code{\} (backslash) +@cindex backslash (@code{\}), regexp operator +@cindex @code{\} (backslash), regexp operator @item \ This is used to suppress the special meaning of a character when matching. 
For example, @samp{\$} @@ -4740,8 +4849,8 @@ The condition is not true in the following example: if ("line1\nLINE 2" ~ /^L/) @dots{} @end example -@cindex @code{$} (dollar sign) -@cindex dollar sign (@code{$}) +@cindex @code{$} (dollar sign), regexp operator +@cindex dollar sign (@code{$}), regexp operator @item $ This is similar to @samp{^}, but it matches only at the end of a string. For example, @samp{p$} @@ -4753,8 +4862,8 @@ The condition in the following example is not true: if ("line1\nLINE 2" ~ /1$/) @dots{} @end example -@cindex @code{.} (period) -@cindex period (@code{.}) +@cindex @code{.} (period), regexp operator +@cindex period (@code{.}), regexp operator @item . @r{(period)} This matches any single character, @emph{including} the newline character. For example, @samp{.P} @@ -4770,11 +4879,12 @@ character, which is a character with all bits equal to zero. Otherwise, @sc{nul} is just another character. Other versions of @command{awk} may not be able to match the @sc{nul} character. -@cindex @code{[]} (square brackets) -@cindex square brackets (@code{[]}) +@cindex @code{[]} (square brackets), regexp operator +@cindex square brackets (@code{[]}), regexp operator @cindex bracket expressions @cindex character sets, See Also bracket expressions @cindex character lists, See bracket expressions +@cindex character classes, See bracket expressions @item [@dots{}] This is called a @dfn{bracket expression}.@footnote{In other literature, you may see a bracket expression referred to as either a @@ -4807,8 +4917,8 @@ means it matches any string that starts with @samp{P} or contains a digit. The alternation applies to the largest possible regexps on either side. -@cindex @code{()} (parentheses) -@cindex parentheses @code{()} +@cindex @code{()} (parentheses), regexp operator +@cindex parentheses @code{()}, regexp operator @item (@dots{}) Parentheses are used for grouping in regular expressions, as in arithmetic. 
They can be used to concatenate regular expressions @@ -4836,8 +4946,8 @@ prints every record in @file{sample} containing a string of the form Notice the escaping of the parentheses by preceding them with backslashes. -@cindex @code{+} (plus sign) -@cindex plus sign (@code{+}) +@cindex @code{+} (plus sign), regexp operator +@cindex plus sign (@code{+}), regexp operator @item + This symbol is similar to @samp{*}, except that the preceding expression must be matched at least once. This means that @samp{wh+y} @@ -4850,14 +4960,14 @@ way of writing the last @samp{*} example: awk '/\(c[ad]+r x\)/ @{ print @}' sample @end example -@cindex @code{?} (question mark) regexp operator -@cindex question mark (@code{?}) regexp operator +@cindex @code{?} (question mark), regexp operator +@cindex question mark (@code{?}), regexp operator @item ? This symbol is similar to @samp{*}, except that the preceding expression can be matched either once or not at all. For example, @samp{fe?d} matches @samp{fed} and @samp{fd}, but nothing else. -@cindex interval expressions +@cindex interval expressions, regexp operator @item @{@var{n}@} @itemx @{@var{n},@} @itemx @{@var{n},@var{m}@} @@ -4891,7 +5001,7 @@ constants, @command{gawk} did @emph{not} match interval expressions in regexps. -However, beginning with @value{PVERSION} 4.0, +However, beginning with version 4.0, @command{gawk} does match interval expressions by default. This is because compatibility with POSIX has become more important to most @command{gawk} users than compatibility with @@ -4934,6 +5044,7 @@ expressions are not available in regular expressions. @cindex bracket expressions @cindex bracket expressions, range expressions @cindex range expressions (regexps) +@cindex character lists in regular expression As mentioned earlier, a bracket expression matches any character amongst those listed between the opening and closing square brackets. 
@@ -5035,8 +5146,8 @@ These sequences are: @item Collating symbols Multicharacter collating elements enclosed between @samp{[.} and @samp{.]}. For example, if @samp{ch} is a collating element, -then @code{[[.ch.]]} is a regexp that matches this collating element, whereas -@code{[ch]} is a regexp that matches either @samp{c} or @samp{h}. +then @samp{[[.ch.]]} is a regexp that matches this collating element, whereas +@samp{[ch]} is a regexp that matches either @samp{c} or @samp{h}. @cindex bracket expressions, equivalence classes @item Equivalence classes @@ -5044,7 +5155,7 @@ Locale-specific names for a list of characters that are equal. The name is enclosed between @samp{[=} and @samp{=]}. For example, the name @samp{e} might be used to represent all of -``e,'' ``@`e,'' and ``@'e.'' In this case, @code{[[=e=]]} is a regexp +``e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp that matches any of @samp{e}, @samp{@'e}, or @samp{@`e}. @end table @@ -5088,7 +5199,7 @@ or underscores (@samp{_}): @item \s Matches any whitespace character. Think of it as shorthand for -@w{@code{[[:space:]]}}. +@w{@samp{[[:space:]]}}. @c @cindex operators, @code{\S} (@command{gawk}) @cindex backslash (@code{\}), @code{\S} operator (@command{gawk}) @@ -5096,7 +5207,7 @@ Think of it as shorthand for @item \S Matches any character that is not whitespace. Think of it as shorthand for -@w{@code{[^[:space:]]}}. +@w{@samp{[^[:space:]]}}. @c @cindex operators, @code{\w} (@command{gawk}) @cindex backslash (@code{\}), @code{\w} operator (@command{gawk}) @@ -5104,7 +5215,7 @@ Think of it as shorthand for @item \w Matches any word-constituent character---that is, it matches any letter, digit, or underscore. Think of it as shorthand for -@w{@code{[[:alnum:]_]}}. +@w{@samp{[[:alnum:]_]}}. @c @cindex operators, @code{\W} (@command{gawk}) @cindex backslash (@code{\}), @code{\W} operator (@command{gawk}) @@ -5112,7 +5223,7 @@ letter, digit, or underscore. 
Think of it as shorthand for @item \W Matches any character that is not word-constituent. Think of it as shorthand for -@w{@code{[^[:alnum:]_]}}. +@w{@samp{[^[:alnum:]_]}}. @c @cindex operators, @code{\<} (@command{gawk}) @cindex backslash (@code{\}), @code{\<} operator (@command{gawk}) @@ -5173,10 +5284,10 @@ Matches the empty string at the end of a buffer (string). @end table -@cindex @code{^} (caret) -@cindex caret (@code{^}) -@cindex @code{?} (question mark) regexp operator -@cindex question mark (@code{?}) regexp operator +@cindex @code{^} (caret), regexp operator +@cindex caret (@code{^}), regexp operator +@cindex @code{?} (question mark), regexp operator +@cindex question mark (@code{?}), regexp operator Because @samp{^} and @samp{$} always work in terms of the beginning and end of strings, these operators don't add any new capabilities for @command{awk}. They are provided for compatibility with other @@ -5197,7 +5308,7 @@ lesser of two evils. @c @c Should really do this with file inclusion. @cindex regular expressions, @command{gawk}, command-line options -@cindex @command{gawk}, command-line options +@cindex @command{gawk}, command-line options, and regular expressions The various command-line options (@pxref{Options}) control how @command{gawk} interprets characters in regexps: @@ -5220,10 +5331,11 @@ Only POSIX regexps are supported; the GNU operators are not special (e.g., @samp{\w} matches a literal @samp{w}). Interval expressions are allowed. +@cindex Brian Kernighan's @command{awk} @item @code{--traditional} Traditional Unix @command{awk} regexps are matched. The GNU operators are not special, and interval expressions are not available. -The POSIX character classes (@code{[[:alnum:]]}, etc.) are supported, +The POSIX character classes (@samp{[[:alnum:]]}, etc.) are supported, as Brian Kernighan's @command{awk} does support them. 
Characters described by octal and hexadecimal escape sequences are treated literally, even if they represent regexp metacharacters. @@ -5275,7 +5387,7 @@ This works in any POSIX-compliant @command{awk}. @cindex tilde (@code{~}), @code{~} operator @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator -@cindex @code{IGNORECASE} variable +@cindex @code{IGNORECASE} variable, with @code{~} and @code{!~} operators @cindex @command{gawk}, @code{IGNORECASE} variable in @c @cindex variables, @code{IGNORECASE} Another method, specific to @command{gawk}, is to set the variable @@ -5487,7 +5599,7 @@ But a newline in a regexp constant works with no problem: $ @kbd{awk '$0 ~ /[ \t\n]/'} @kbd{here is a sample line} @print{} here is a sample line -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @command{gawk} does not have this problem, and it isn't likely to @@ -5525,7 +5637,7 @@ But a newline in a regexp constant works with no problem: $ @kbd{awk '$0 ~ /[ \t\n]/'} @kbd{here is a sample line} @print{} here is a sample line -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @command{gawk} does not have this problem, and it isn't likely to @@ -5540,6 +5652,7 @@ occur often in practice, but it's worth noting for future reference. @chapter Reading Input Files @c STARTOFRANGE infir +@cindex reading input files @cindex input files, reading @cindex input files @cindex @code{FILENAME} variable @@ -5576,7 +5689,7 @@ used with it do not have to be named on the @command{awk} command line * Field Separators:: The field separator and how to change it. * Constant Size:: Reading constant width data. * Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. +* Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the @code{getline} function. * Read Timeout:: Reading input with a timeout. @@ -5601,7 +5714,7 @@ so far from the current input file. 
This value is stored in a built-in variable called @code{FNR}. It is reset to zero when a new file is started. Another built-in variable, @code{NR}, records the total -number of input records read so far from all @value{DF}s. It starts at zero, +number of input records read so far from all data files. It starts at zero, but is never automatically reset to zero. @cindex separators, for records @@ -5626,69 +5739,80 @@ To do this, use the special @code{BEGIN} pattern (@pxref{BEGIN/END}). For example: -@cindex @code{BEGIN} pattern @example -awk 'BEGIN @{ RS = "/" @} - @{ print $0 @}' BBS-list +awk 'BEGIN @{ RS = "u" @} + @{ print $0 @}' mail-list @end example @noindent -changes the value of @code{RS} to @code{"/"}, before reading any input. -This is a string whose first character is a slash; as a result, records -are separated by slashes. Then the input file is read, and the second +changes the value of @code{RS} to @samp{u}, before reading any input. +This is a string whose first character is the letter ``u;'' as a result, records +are separated by the letter ``u.'' Then the input file is read, and the second rule in the @command{awk} program (the action with no pattern) prints each record. Because each @code{print} statement adds a newline at the end of its output, this @command{awk} program copies the input -with each slash changed to a newline. 
Here are the results of running -the program on @file{BBS-list}: - -@example -$ @kbd{awk 'BEGIN @{ RS = "/" @}} -> @kbd{@{ print $0 @}' BBS-list} -@print{} aardvark 555-5553 1200 -@print{} 300 B -@print{} alpo-net 555-3412 2400 -@print{} 1200 -@print{} 300 A -@print{} barfly 555-7685 1200 -@print{} 300 A -@print{} bites 555-1675 2400 -@print{} 1200 -@print{} 300 A -@print{} camelot 555-0542 300 C -@print{} core 555-2912 1200 -@print{} 300 C -@print{} fooey 555-1234 2400 -@print{} 1200 -@print{} 300 B -@print{} foot 555-6699 1200 -@print{} 300 B -@print{} macfoo 555-6480 1200 -@print{} 300 A -@print{} sdace 555-3430 2400 -@print{} 1200 -@print{} 300 A -@print{} sabafoo 555-2127 1200 -@print{} 300 C -@print{} +with each @samp{u} changed to a newline. Here are the results of running +the program on @file{mail-list}: + +@example +$ @kbd{awk 'BEGIN @{ RS = "u" @}} +> @kbd{@{ print $0 @}' mail-list} +@print{} Amelia 555-5553 amelia.zodiac +@print{} sq +@print{} e@@gmail.com F +@print{} Anthony 555-3412 anthony.assert +@print{} ro@@hotmail.com A +@print{} Becky 555-7685 becky.algebrar +@print{} m@@gmail.com A +@print{} Bill 555-1675 bill.drowning@@hotmail.com A +@print{} Broderick 555-0542 broderick.aliq +@print{} otiens@@yahoo.com R +@print{} Camilla 555-2912 camilla.inf +@print{} sar +@print{} m@@skynet.be R +@print{} Fabi +@print{} s 555-1234 fabi +@print{} s. +@print{} ndevicesim +@print{} s@@ +@print{} cb.ed +@print{} F +@print{} J +@print{} lie 555-6699 j +@print{} lie.perscr +@print{} tabor@@skeeve.com F +@print{} Martin 555-6480 martin.codicib +@print{} s@@hotmail.com A +@print{} Sam +@print{} el 555-3430 sam +@print{} el.lanceolis@@sh +@print{} .ed +@print{} A +@print{} Jean-Pa +@print{} l 555-2127 jeanpa +@print{} l.campanor +@print{} m@@ny +@print{} .ed +@print{} R +@print{} @end example @noindent -Note that the entry for the @samp{camelot} BBS is not split. -In the original @value{DF} +Note that the entry for the name @samp{Bill} is not split. 
+In the original data file (@pxref{Sample Data Files}), the line looks like this: @example -camelot 555-0542 300 C +Bill 555-1675 bill.drowning@@hotmail.com A @end example @noindent -It has one baud rate only, so there are no slashes in the record, -unlike the others which have two or more baud rates. -In fact, this record is treated as part of the record -for the @samp{core} BBS; the newline separating them in the output -is the original newline in the @value{DF}, not the one added by +It contains no @samp{u} so there is no reason to split the record, +unlike the others which have one or more occurrences of the @samp{u}. +In fact, this record is treated as part of the previous record; +the newline separating them in the output +is the original newline in the data file, not the one added by @command{awk} when it printed the record! @cindex record separators, changing @@ -5698,14 +5822,17 @@ using the variable-assignment feature (@pxref{Other Arguments}): @example -awk '@{ print $0 @}' RS="/" BBS-list +awk '@{ print $0 @}' RS="u" mail-list @end example @noindent -This sets @code{RS} to @samp{/} before processing @file{BBS-list}. +This sets @code{RS} to @samp{u} before processing @file{mail-list}. -Using an unusual character such as @samp{/} for the record separator -produces correct behavior in the vast majority of cases. +Using an alphabetic character such as @samp{u} for the record separator +is highly likely to produce strange results. +Using an unusual character such as @samp{/} is more likely to +produce correct behavior in the majority of cases, but there +are no guarantees. The moral is: Know Your Data. There is one unusual case, that occurs when @command{gawk} is being fully POSIX-compliant (@pxref{Options}). @@ -5727,6 +5854,7 @@ Reaching the end of an input file terminates the current input record, even if the last character in the file is not the character in @code{RS}. 
@value{DARKCORNER} +@cindex empty strings @cindex null strings @cindex strings, empty, See null strings The empty string @code{""} (a string without any characters) @@ -5829,8 +5957,8 @@ In compatibility mode, only the first character of the value of <sidebar><title>@code{RS = "\0"} Is Not Portable</title> @end docbook -@cindex portability, @value{DF}s as single record -There are times when you might want to treat an entire @value{DF} as a +@cindex portability, data files as single record +There are times when you might want to treat an entire data file as a single record. The only way to make this happen is to give @code{RS} a value that you know doesn't occur in the input file. This is hard to do in a general way, such that a program always works for arbitrary @@ -5849,21 +5977,27 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record? @command{gawk} in fact accepts this, and uses the @sc{nul} character for the record separator. However, this usage is @emph{not} portable -to other @command{awk} implementations. +to most other @command{awk} implementations. @cindex dark corner, strings, storing -All other @command{awk} implementations@footnote{At least that we know +Almost all other @command{awk} implementations@footnote{At least that we know about.} store strings internally as C-style strings. C strings use the @sc{nul} character as the string terminator. In effect, this means that @samp{RS = "\0"} is the same as @samp{RS = ""}. @value{DARKCORNER} +It happens that recent versions of @command{mawk} can use the @sc{nul} +character as a record separator. However, this is a special case: +@command{mawk} does not allow embedded @sc{nul} characters in strings. + @cindex records, treating files as -@cindex files, as single records +@cindex treating files, as single records The best way to treat a whole file as a single record is to simply read the file in, one record at a time, concatenating each record onto the end of the previous ones. 
+@c @strong{FIXME}: Using @sc{nul} is good for @file{/proc/environ} etc. + @docbook </sidebar> @end docbook @@ -5874,8 +6008,8 @@ record onto the end of the previous ones. @center @b{@code{RS = "\0"} Is Not Portable} -@cindex portability, @value{DF}s as single record -There are times when you might want to treat an entire @value{DF} as a +@cindex portability, data files as single record +There are times when you might want to treat an entire data file as a single record. The only way to make this happen is to give @code{RS} a value that you know doesn't occur in the input file. This is hard to do in a general way, such that a program always works for arbitrary @@ -5894,20 +6028,26 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record? @command{gawk} in fact accepts this, and uses the @sc{nul} character for the record separator. However, this usage is @emph{not} portable -to other @command{awk} implementations. +to most other @command{awk} implementations. @cindex dark corner, strings, storing -All other @command{awk} implementations@footnote{At least that we know +Almost all other @command{awk} implementations@footnote{At least that we know about.} store strings internally as C-style strings. C strings use the @sc{nul} character as the string terminator. In effect, this means that @samp{RS = "\0"} is the same as @samp{RS = ""}. @value{DARKCORNER} +It happens that recent versions of @command{mawk} can use the @sc{nul} +character as a record separator. However, this is a special case: +@command{mawk} does not allow embedded @sc{nul} characters in strings. + @cindex records, treating files as -@cindex files, as single records +@cindex treating files, as single records The best way to treat a whole file as a single record is to simply read the file in, one record at a time, concatenating each record onto the end of the previous ones. + +@c @strong{FIXME}: Using @sc{nul} is good for @file{/proc/environ} etc. 
@end cartouche @end ifnotdocbook @c ENDOFRANGE inspl @@ -5980,31 +6120,29 @@ when you are not interested in specific fields. Here are some more examples: @example -$ @kbd{awk '$1 ~ /foo/ @{ print $0 @}' BBS-list} -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sabafoo 555-2127 1200/300 C +$ @kbd{awk '$1 ~ /li/ @{ print $0 @}' mail-list} +@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F +@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F @end example @noindent -This example prints each record in the file @file{BBS-list} whose first -field contains the string @samp{foo}. The operator @samp{~} is called a +This example prints each record in the file @file{mail-list} whose first +field contains the string @samp{li}. The operator @samp{~} is called a @dfn{matching operator} (@pxref{Regexp Usage}); it tests whether a string (here, the field @code{$1}) matches a given regular expression. By contrast, the following example -looks for @samp{foo} in @emph{the entire record} and prints the first +looks for @samp{li} in @emph{the entire record} and prints the first field and the last field for each matching input record: @example -$ @kbd{awk '/foo/ @{ print $1, $NF @}' BBS-list} -@print{} fooey B -@print{} foot B -@print{} macfoo A -@print{} sabafoo C +$ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list} +@print{} Amelia F +@print{} Broderick R +@print{} Julie F +@print{} Samuel A @end example @c ENDOFRANGE fiex @@ -6032,7 +6170,7 @@ the record has fewer than 20 fields, so this prints a blank line. Here is another example of using expressions as field numbers: @example -awk '@{ print $(2*2) @}' BBS-list +awk '@{ print $(2*2) @}' mail-list @end example @command{awk} evaluates the expression @samp{(2*2)} and uses @@ -6041,8 +6179,8 @@ represents multiplication, so the expression @samp{2*2} evaluates to four. 
The parentheses are used so that the multiplication is done before the @samp{$} operation; they are necessary whenever there is a binary operator in the field-number expression. This example, then, prints the -hours of operation (the fourth field) for every line of the file -@file{BBS-list}. (All of the @command{awk} operators are listed, in +type of relationship (the fourth field) for every line of the file +@file{mail-list}. (All of the @command{awk} operators are listed, in order of decreasing precedence, in @ref{Precedence}.) @@ -6296,6 +6434,7 @@ with a statement such as @samp{$1 = $1}, as described earlier. * Regexp Field Splitting:: Using regexps as the field separator. * Single Character Fields:: Making each character a separate field. * Command Line Field Separator:: Setting @code{FS} from the command-line. +* Full Line Fields:: Making the full line be a single field. * Field Splitting Summary:: Some final points and a summary table. @end menu @@ -6483,7 +6622,7 @@ was ignored when finding @code{$1}, it is not part of the new @code{$0}. Finally, the last @code{print} statement prints the new @code{$0}. @cindex @code{FS}, containing @code{^} -@cindex @code{^}, in @code{FS} +@cindex @code{^} (caret), in @code{FS} @cindex dark corner, @code{^}, in @code{FS} There is an additional subtlety to be aware of when using regular expressions for field splitting. @@ -6494,6 +6633,7 @@ different @command{awk} versions answer this question differently, and you should not rely on any specific behavior in your programs. @value{DARKCORNER} +@cindex Brian Kernighan's @command{awk} As a point of information, Brian Kernighan's @command{awk} allows @samp{^} to match only at the beginning of the record. @command{gawk} also works this way. 
For example: @@ -6537,7 +6677,7 @@ $ @kbd{echo a b | gawk 'BEGIN @{ FS = "" @}} @end example @cindex dark corner, @code{FS} as null string -@cindex FS variable, as null string +@cindex @code{FS} variable, as null string Traditionally, the behavior of @code{FS} equal to @code{""} was not defined. In this case, most versions of Unix @command{awk} simply treat the entire record as only having one field. @@ -6549,10 +6689,8 @@ behaves this way. @node Command Line Field Separator @subsection Setting @code{FS} from the Command Line -@cindex @code{-F} option -@cindex options, command-line -@cindex command line, options -@cindex field separators, on command line +@cindex @option{-F} option, command line +@cindex field separator, on command line @cindex command line, @code{FS} on@comma{} setting @cindex @code{FS} variable, setting from command line @@ -6602,68 +6740,76 @@ figures that you really want your fields to be separated with TABs and not @samp{t}s. Use @samp{-v FS="t"} or @samp{-F"[t]"} on the command line if you really do want to separate your fields with @samp{t}s. -As an example, let's use an @command{awk} program file called @file{baud.awk} -that contains the pattern @code{/300/} and the action @samp{print $1}: +As an example, let's use an @command{awk} program file called @file{edu.awk} +that contains the pattern @code{/edu/} and the action @samp{print $1}: @example -/300/ @{ print $1 @} +/edu/ @{ print $1 @} @end example Let's also set @code{FS} to be the @samp{-} character and run the -program on the file @file{BBS-list}. The following command prints a -list of the names of the bulletin boards that operate at 300 baud and +program on the file @file{mail-list}. 
The following command prints a +list of the names of the people that work at or attend a university, and the first three digits of their phone numbers: @c tweaked to make the tex output look better in @smallbook @example -$ @kbd{awk -F- -f baud.awk BBS-list} -@print{} aardvark 555 -@print{} alpo -@print{} barfly 555 -@print{} bites 555 -@print{} camelot 555 -@print{} core 555 -@print{} fooey 555 -@print{} foot 555 -@print{} macfoo 555 -@print{} sdace 555 -@print{} sabafoo 555 +$ @kbd{awk -F- -f edu.awk mail-list} +@print{} Fabius 555 +@print{} Samuel 555 +@print{} Jean @end example @noindent -Note the second line of output. The second line +Note the third line of output. The third line in the original file looked like this: @example -alpo-net 555-3412 2400/1200/300 A +Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @end example -The @samp{-} as part of the system's name was used as the field +The @samp{-} as part of the person's name was used as the field separator, instead of the @samp{-} in the phone number that was originally intended. This demonstrates why you have to be careful in choosing your field and record separators. @cindex Unix @command{awk}, password files@comma{} field separators and -Perhaps the most common use of a single character as the field -separator occurs when processing the Unix system password file. -On many Unix systems, each user has a separate entry in the system password -file, one line per user. The information in these lines is separated -by colons. The first field is the user's login name and the second is -the user's (encrypted or shadow) password. A password file entry might look -like this: +Perhaps the most common use of a single character as the field separator +occurs when processing the Unix system password file. On many Unix +systems, each user has a separate entry in the system password file, one +line per user. The information in these lines is separated by colons. 
+The first field is the user's login name and the second is the user's +encrypted or shadow password. (A shadow password is indicated by the +presence of a single @samp{x} in the second field.) A password file +entry might look like this: @cindex Robbins, Arnold @example -arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/bash +arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash @end example The following program searches the system password file and prints -the entries for users who have no password: +the entries for users whose full name is not indicated: + +@example +awk -F: '$5 == ""' /etc/passwd +@end example + +@node Full Line Fields +@subsection Making The Full Line Be A Single Field + +Occasionally, it's useful to treat the whole input line as a +single field. This can be done easily and portably simply by +setting @code{FS} to @code{"\n"} (a newline).@footnote{Thanks to +Andrew Schorr for this tip.} @example -awk -F: '$2 == ""' /etc/passwd +awk -F'\n' '@var{program}' @var{files @dots{}} @end example +@noindent +When you do this, @code{$1} is the same as @code{$0}. + @node Field Splitting Summary @subsection Field-Splitting Summary @@ -6709,7 +6855,7 @@ POSIX standard.) @cindex POSIX @command{awk}, field separators and -@cindex field separators, POSIX and +@cindex field separator, POSIX and According to the POSIX standard, @command{awk} is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of @code{FS} @@ -6762,7 +6908,7 @@ root:nSijPlPhZZwgE:0:0:Root:/: @cindex POSIX @command{awk}, field separators and -@cindex field separators, POSIX and +@cindex field separator, POSIX and According to the POSIX standard, @command{awk} is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of @code{FS} @@ -6869,19 +7015,11 @@ will take effect. 
@node Constant Size @section Reading Fixed-Width Data -@ifnotinfo @quotation NOTE This @value{SECTION} discusses an advanced feature of @command{gawk}. If you are a novice @command{awk} user, you might want to skip it on the first reading. @end quotation -@end ifnotinfo - -@ifinfo -(This @value{SECTION} discusses an advanced feature of @command{awk}. -If you are a novice @command{awk} user, you might want to skip it on -the first reading.) -@end ifinfo @cindex data, fixed-width @cindex fixed-width data @@ -7011,19 +7149,11 @@ for an example of such a function). @node Splitting By Content @section Defining Fields By Content -@ifnotinfo @quotation NOTE This @value{SECTION} discusses an advanced feature of @command{gawk}. If you are a novice @command{awk} user, you might want to skip it on the first reading. @end quotation -@end ifnotinfo - -@ifinfo -(This @value{SECTION} discusses an advanced feature of @command{awk}. -If you are a novice @command{awk} user, you might want to skip it on -the first reading.) -@end ifinfo @cindex advanced features, specifying field content Normally, when using @code{FS}, @command{gawk} defines the fields as the @@ -7138,6 +7268,7 @@ available for splitting regular strings (@pxref{String Functions}). @node Multiple Line @section Multiple-Line Records +@cindex multiple-line records @c STARTOFRANGE recm @cindex records, multiline @c STARTOFRANGE imr @@ -7184,12 +7315,13 @@ appear in a row, they are considered one record separator. @cindex dark corner, multiline records There is an important difference between @samp{RS = ""} and @samp{RS = "\n\n+"}. In the first case, leading newlines in the input -@value{DF} are ignored, and if a file ends without extra blank lines +data file are ignored, and if a file ends without extra blank lines after the last record, the final newline is removed from the record. In the second case, this special processing is not done. 
@value{DARKCORNER} -@cindex field separators, in multiline records +@cindex field separator, in multiline records +@cindex @code{FS}, in multiline records Now that the input is separated into records, the second step is to separate the fields in the record. One way to do this is to divide each of the lines into fields in the normal manner. This happens by default @@ -7219,7 +7351,7 @@ Another way to separate fields is to put each field on a separate line: to do this, just set the variable @code{FS} to the string @code{"\n"}. (This single character separator matches a single newline.) -A practical example of a @value{DF} organized this way might be a mailing +A practical example of a data file organized this way might be a mailing list, where each entry is separated by blank lines. Consider a mailing list in a file named @file{addresses}, which looks like this: @@ -7284,7 +7416,7 @@ value of @table @code @item RS == "\n" Records are separated by the newline character (@samp{\n}). In effect, -every line in the @value{DF} is a separate record, including blank lines. +every line in the data file is a separate record, including blank lines. This is the default. @item RS == @var{any single character} @@ -7320,6 +7452,7 @@ then @command{gawk} sets @code{RT} to the null string. @c STARTOFRANGE getl @cindex @code{getline} command, explicit input with +@c STARTOFRANGE inex @cindex input, explicit So far we have been getting our input data from @command{awk}'s main input stream---either the standard input (usually your terminal, sometimes @@ -7336,10 +7469,10 @@ and study the @code{getline} command @emph{after} you have reviewed the rest of this @value{DOCUMENT} and have a good knowledge of how @command{awk} works. 
@cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable +@cindex @code{ERRNO} variable, with @command{getline} command @cindex differences in @command{awk} and @command{gawk}, @code{getline} command @cindex @code{getline} command, return values -@cindex @code{--sandbox} option, input redirection with @command{getline} +@cindex @option{--sandbox} option, input redirection with @code{getline} The @code{getline} command returns one if it finds a record and zero if it encounters the end of the file. If there is some error in getting @@ -7432,6 +7565,7 @@ rule in the program. @xref{Next Statement}. @node Getline/Variable @subsection Using @code{getline} into a Variable +@cindex @code{getline} into a variable @cindex variables, @code{getline} command into@comma{} using You can use @samp{getline @var{var}} to read the next record from @@ -7483,6 +7617,7 @@ the value of @code{NF} do not change. @node Getline/File @subsection Using @code{getline} from a File +@cindex @code{getline} from a file @cindex input redirection @cindex redirection of input @cindex @code{<} (left angle bracket), @code{<} operator (I/O) @@ -7490,7 +7625,7 @@ the value of @code{NF} do not change. @cindex operators, input/output Use @samp{getline < @var{file}} to read the next record from @var{file}. Here @var{file} is a string-valued expression that -specifies the @value{FN}. @samp{< @var{file}} is called a @dfn{redirection} +specifies the file name. @samp{< @var{file}} is called a @dfn{redirection} because it directs input to come from a different place. For example, the following program reads its input record from the file @file{secondary.input} when it @@ -7531,8 +7666,6 @@ from the file @var{file}, and put it in the variable @var{var}. As above, @var{file} is a string-valued expression that specifies the file from which to read. 
-@cindex @command{gawk}, @code{RT} variable in -@cindex @code{RT} variable In this version of @code{getline}, none of the built-in variables are changed and the record is not split into fields. The only variable changed is @var{var}.@footnote{This is not quite true. @code{RT} could @@ -7557,7 +7690,6 @@ Note here how the name of the extra input file is not built into the program; it is taken directly from the data, specifically from the second field on the @samp{@@include} line. -@cindex @code{close()} function The @code{close()} function is called to ensure that if two identical @samp{@@include} lines appear in the input, the entire specified file is included twice. @@ -7574,16 +7706,17 @@ that does handle nested @samp{@@include} statements. @subsection Using @code{getline} from a Pipe @c From private email, dated October 2, 1988. Used by permission, March 2013. +@cindex Kernighan, Brian @quotation @i{Omniscience has much to recommend it. -Failing that, attention to details would be useful.}@* -Brian Kernighan +Failing that, attention to details would be useful.} +@author Brian Kernighan @end quotation @cindex @code{|} (vertical bar), @code{|} operator (I/O) @cindex vertical bar (@code{|}), @code{|} operator (I/O) @cindex input pipeline -@cindex pipes, input +@cindex pipe, input @cindex operators, input/output The output of a command can also be piped into @code{getline}, using @samp{@var{command} | getline}. In @@ -7607,7 +7740,6 @@ produced by running the rest of the line as a shell command: @end example @noindent -@cindex @code{close()} function The @code{close()} function is called to ensure that if two identical @samp{@@execute} lines appear in the input, the command is run for each one. @@ -7661,6 +7793,8 @@ because the concatenation operator is not parenthesized. You should write it as @samp{(@w{"echo "} "date") | getline} if you want your program to be portable to all @command{awk} implementations. 
+@cindex Brian Kernighan's @command{awk} +@cindex @command{mawk} utility @quotation NOTE Unfortunately, @command{gawk} has not been consistent in its treatment of a construct like @samp{@w{"echo "} "date" | getline}. @@ -7797,10 +7931,10 @@ system permits. @item An interesting side effect occurs if you use @code{getline} without a redirection inside a @code{BEGIN} rule. Because an unredirected @code{getline} -reads from the command-line @value{DF}s, the first @code{getline} command +reads from the command-line data files, the first @code{getline} command causes @command{awk} to set the value of @code{FILENAME}. Normally, @code{FILENAME} does not have a value inside @code{BEGIN} rules, because you -have not yet started to process the command-line @value{DF}s. +have not yet started to process the command-line data files. @value{DARKCORNER} (@xref{BEGIN/END}, also @pxref{Auto-set}.) @@ -7985,6 +8119,7 @@ indefinitely until some other process opens it for writing. @node Command line directories @section Directories On The Command Line +@cindex differences in @command{awk} and @command{gawk}, command line directories @cindex directories, command line @cindex command line, directories on @@ -8022,7 +8157,7 @@ For printing with specifications, you need the @code{printf} statement @cindex @code{printf} statement Besides basic and formatted printing, this @value{CHAPTER} also covers I/O redirections to files and pipes, introduces -the special @value{FN}s that @command{gawk} processes internally, +the special file names that @command{gawk} processes internally, and discusses the @code{close()} built-in function. @menu @@ -8228,13 +8363,29 @@ program by using a new value of @code{OFS}. 
@example $ @kbd{awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}} -> @kbd{@{ print $1, $2 @}' BBS-list} -@print{} aardvark;555-5553 -@print{} -@print{} alpo-net;555-3412 -@print{} -@print{} barfly;555-7685 -@dots{} +> @kbd{@{ print $1, $2 @}' mail-list} +@print{} Amelia;555-5553 +@print{} +@print{} Anthony;555-3412 +@print{} +@print{} Becky;555-7685 +@print{} +@print{} Bill;555-1675 +@print{} +@print{} Broderick;555-0542 +@print{} +@print{} Camilla;555-2912 +@print{} +@print{} Fabius;555-1234 +@print{} +@print{} Julie;555-6699 +@print{} +@print{} Martin;555-6480 +@print{} +@print{} Samuel;555-3430 +@print{} +@print{} Jean-Paul;555-2127 +@print{} @end example If the value of @code{ORS} does not contain a newline, the program's output @@ -8256,7 +8407,7 @@ numbers can be formatted. The different format specifications are discussed more fully in @ref{Control Letters}. -@cindex @code{sprintf()} function +@cindexawkfunc{sprintf} @cindex @code{OFMT} variable @cindex output, format specifier@comma{} @code{OFMT} The built-in variable @code{OFMT} contains the default format specification @@ -8322,7 +8473,7 @@ parentheses are necessary if any of the item expressions use the @samp{>} relational operator; otherwise, it can be confused with an output redirection (@pxref{Redirection}). -@cindex format strings +@cindex format specifiers The difference between @code{printf} and @code{print} is the @var{format} argument. This is an expression whose value is taken as a string; it specifies how to output each of the other arguments. It is called the @@ -8708,30 +8859,30 @@ The following simple example shows how to use @code{printf} to make an aligned table: @example -awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list +awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list @end example @noindent This command -prints the names of the bulletin boards (@code{$1}) in the file -@file{BBS-list} as a string of 10 characters that are left-justified. 
It also +prints the names of the people (@code{$1}) in the file +@file{mail-list} as a string of 10 characters that are left-justified. It also prints the phone numbers (@code{$2}) next on the line. This produces an aligned two-column table of names and phone numbers, as shown here: @example -$ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list} -@print{} aardvark 555-5553 -@print{} alpo-net 555-3412 -@print{} barfly 555-7685 -@print{} bites 555-1675 -@print{} camelot 555-0542 -@print{} core 555-2912 -@print{} fooey 555-1234 -@print{} foot 555-6699 -@print{} macfoo 555-6480 -@print{} sdace 555-3430 -@print{} sabafoo 555-2127 +$ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list} +@print{} Amelia 555-5553 +@print{} Anthony 555-3412 +@print{} Becky 555-7685 +@print{} Bill 555-1675 +@print{} Broderick 555-0542 +@print{} Camilla 555-2912 +@print{} Fabius 555-1234 +@print{} Julie 555-6699 +@print{} Martin 555-6480 +@print{} Samuel 555-3430 +@print{} Jean-Paul 555-2127 @end example In this case, the phone numbers had to be printed as strings because @@ -8752,7 +8903,7 @@ the @command{awk} program: @example awk 'BEGIN @{ print "Name Number" print "---- ------" @} - @{ printf "%-10s %s\n", $1, $2 @}' BBS-list + @{ printf "%-10s %s\n", $1, $2 @}' mail-list @end example The above example mixes @code{print} and @code{printf} statements in @@ -8762,7 +8913,7 @@ same results: @example awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number" printf "%-10s %s\n", "----", "------" @} - @{ printf "%-10s %s\n", $1, $2 @}' BBS-list + @{ printf "%-10s %s\n", $1, $2 @}' mail-list @end example @noindent @@ -8777,7 +8928,7 @@ emphasized by storing it in a variable, like this: awk 'BEGIN @{ format = "%-10s %s\n" printf format, "Name", "Number" printf format, "----", "------" @} - @{ printf format, $1, $2 @}' BBS-list + @{ printf format, $1, $2 @}' mail-list @end example @c !!! 
exercise @@ -8791,9 +8942,11 @@ on the @code{print} statement @node Redirection @section Redirecting Output of @code{print} and @code{printf} +@c STARTOFRANGE outre @cindex output redirection +@c STARTOFRANGE reout @cindex redirection of output -@cindex @code{--sandbox} option, output redirection with @code{print}, @code{printf} +@cindex @option{--sandbox} option, output redirection with @code{print}, @code{printf} So far, the output from @code{print} and @code{printf} has gone to the standard output, usually the screen. Both @code{print} and @code{printf} can @@ -8810,8 +8963,8 @@ Redirections in @command{awk} are written just like redirections in shell commands, except that they are written inside the @command{awk} program. @c the commas here are part of the see also -@cindex @code{print} statement, See Also redirection, of output -@cindex @code{printf} statement, See Also redirection, of output +@cindex @code{print} statement, See Also redirection@comma{} of output +@cindex @code{printf} statement, See Also redirection@comma{} of output There are four forms of output redirection: output to a file, output appended to a file, output through a pipe to another command, and output to a coprocess. They are all shown for the @code{print} statement, @@ -8823,29 +8976,29 @@ but they work identically for @code{printf}: @cindex operators, input/output @item print @var{items} > @var{output-file} This redirection prints the items into the output file named -@var{output-file}. The @value{FN} @var{output-file} can be any +@var{output-file}. The file name @var{output-file} can be any expression. Its value is changed to a string and then used as a -@value{FN} (@pxref{Expressions}). +file name (@pxref{Expressions}). When this type of redirection is used, the @var{output-file} is erased before the first output is written to it. Subsequent writes to the same @var{output-file} do not erase @var{output-file}, but append to it. 
(This is different from how you use redirections in shell scripts.) If @var{output-file} does not exist, it is created. For example, here -is how an @command{awk} program can write a list of BBS names to one +is how an @command{awk} program can write a list of peoples' names to one file named @file{name-list}, and a list of phone numbers to another file named @file{phone-list}: @example $ @kbd{awk '@{ print $2 > "phone-list"} -> @kbd{print $1 > "name-list" @}' BBS-list} +> @kbd{print $1 > "name-list" @}' mail-list} $ @kbd{cat phone-list} @print{} 555-5553 @print{} 555-3412 @dots{} $ @kbd{cat name-list} -@print{} aardvark -@print{} alpo-net +@print{} Amelia +@print{} Anthony @dots{} @end example @@ -8863,7 +9016,7 @@ appended to the file. If @var{output-file} does not exist, then it is created. @cindex @code{|} (vertical bar), @code{|} operator (I/O) -@cindex pipes, output +@cindex pipe, output @cindex output, pipes @item print @var{items} | @var{command} It is possible to send output to another program through a pipe @@ -8874,7 +9027,7 @@ to another process created to execute @var{command}. The redirection argument @var{command} is actually an @command{awk} expression. Its value is converted to a string whose contents give the shell command to be run. For example, the following produces two -files, one unsorted list of BBS names, and one list sorted in reverse +files, one unsorted list of peoples' names, and one list sorted in reverse alphabetical order: @ignore @@ -8887,7 +9040,7 @@ alone for now and let's hope no-one notices. @example awk '@{ print $1 > "names.unsorted" command = "sort -r > names.sorted" - print $1 | command @}' BBS-list + print $1 | command @}' mail-list @end example The unsorted list is written with an ordinary redirection, while @@ -8996,7 +9149,7 @@ open as many pipelines as the underlying operating system permits. A particularly powerful way to use redirection is to build command lines and pipe them into the shell, @command{sh}. 
For example, suppose you -have a list of files brought over from a system where all the @value{FN}s +have a list of files brought over from a system where all the file names are stored in uppercase, and you wish to rename them to have names in all lowercase. The following program is both simple and efficient: @@ -9028,7 +9181,7 @@ It then sends the list to the shell for execution. A particularly powerful way to use redirection is to build command lines and pipe them into the shell, @command{sh}. For example, suppose you -have a list of files brought over from a system where all the @value{FN}s +have a list of files brought over from a system where all the file names are stored in uppercase, and you wish to rename them to have names in all lowercase. The following program is both simple and efficient: @@ -9051,12 +9204,12 @@ It then sends the list to the shell for execution. @c ENDOFRANGE reout @node Special Files -@section Special @value{FFN}s in @command{gawk} +@section Special File Names in @command{gawk} @c STARTOFRANGE gfn -@cindex @command{gawk}, @value{FN}s in +@cindex @command{gawk}, file names in -@command{gawk} provides a number of special @value{FN}s that it interprets -internally. These @value{FN}s provide access to standard file descriptors +@command{gawk} provides a number of special file names that it interprets +internally. These file names provide access to standard file descriptors and TCP/IP networking. @menu @@ -9120,12 +9273,12 @@ that happens, writing to the screen is not correct. In fact, if terminal at all. Then opening @file{/dev/tty} fails. -@command{gawk} provides special @value{FN}s for accessing the three standard +@command{gawk} provides special file names for accessing the three standard streams. @value{COMMONEXT}. It also provides syntax for accessing -any other inherited open files. If the @value{FN} matches +any other inherited open files. 
If the file name matches one of these special names when @command{gawk} redirects input or output, -then it directly uses the stream that the @value{FN} stands for. -These special @value{FN}s work for all operating systems that @command{gawk} +then it directly uses the stream that the file name stands for. +These special file names work for all operating systems that @command{gawk} has been ported to, not just those that are POSIX-compliant: @cindex common extensions, @code{/dev/stdin} special file @@ -9134,10 +9287,10 @@ has been ported to, not just those that are POSIX-compliant: @cindex extensions, common@comma{} @code{/dev/stdin} special file @cindex extensions, common@comma{} @code{/dev/stdout} special file @cindex extensions, common@comma{} @code{/dev/stderr} special file -@cindex @value{FN}s, standard streams in @command{gawk} -@cindex @code{/dev/@dots{}} special files (@command{gawk}) +@cindex file names, standard streams in @command{gawk} +@cindex @code{/dev/@dots{}} special files @cindex files, @code{/dev/@dots{}} special files -@cindex @code{/dev/fd/@var{N}} special files +@cindex @code{/dev/fd/@var{N}} special files (@command{gawk}) @table @file @item /dev/stdin The standard input (file descriptor 0). @@ -9155,7 +9308,7 @@ the shell). Unless special pains are taken in the shell from which @command{gawk} is invoked, only descriptors 0, 1, and 2 are available. @end table -The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} +The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2}, respectively. However, they are more self-explanatory. The proper way to write an error message in a @command{gawk} program @@ -9165,14 +9318,14 @@ is to use @file{/dev/stderr}, like this: print "Serious error detected!" > "/dev/stderr" @end example -@cindex troubleshooting, quotes with @value{FN}s -Note the use of quotes around the @value{FN}. 
+@cindex troubleshooting, quotes with file names +Note the use of quotes around the file name. Like any other redirection, the value must be a string. It is a common error to omit the quotes, which leads to confusing results. @c Exercise: What does it do? :-) -Finally, using the @code{close()} function on a @value{FN} of the +Finally, using the @code{close()} function on a file name of the form @code{"/dev/fd/@var{N}"}, for file descriptor numbers above two, does actually close the given file descriptor. @@ -9188,7 +9341,7 @@ versions of @command{awk}. @command{gawk} programs can open a two-way TCP/IP connection, acting as either a client or a server. -This is done using a special @value{FN} of the form: +This is done using a special file name of the form: @example @file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}} @@ -9198,7 +9351,7 @@ The @var{net-type} is one of @samp{inet}, @samp{inet4} or @samp{inet6}. The @var{protocol} is one of @samp{tcp} or @samp{udp}, and the other fields represent the other essential pieces of information for making a networking connection. -These @value{FN}s are used with the @samp{|&} operator for communicating +These file names are used with the @samp{|&} operator for communicating with a coprocess (@pxref{Two-way I/O}). This is an advanced feature, mentioned here only for completeness. @@ -9206,21 +9359,21 @@ Full discussion is delayed until @ref{TCP/IP Networking}. 
@node Special Caveats -@subsection Special @value{FFN} Caveats +@subsection Special File Name Caveats Here is a list of things to bear in mind when using the -special @value{FN}s that @command{gawk} provides: +special file names that @command{gawk} provides: @itemize @bullet -@cindex compatibility mode (@command{gawk}), @value{FN}s -@cindex @value{FN}s, in compatibility mode +@cindex compatibility mode (@command{gawk}), file names +@cindex file names, in compatibility mode @item -Recognition of these special @value{FN}s is disabled if @command{gawk} is in +Recognition of these special file names is disabled if @command{gawk} is in compatibility mode (@pxref{Options}). @item @command{gawk} @emph{always} -interprets these special @value{FN}s. +interprets these special file names. For example, using @samp{/dev/fd/4} for output actually writes on file descriptor 4, and not on a new file descriptor that is @code{dup()}'ed from file descriptor 4. Most of @@ -9238,12 +9391,12 @@ Doing so results in unpredictable behavior. @c STARTOFRANGE ofc @cindex output, files@comma{} closing @c STARTOFRANGE pc -@cindex pipes, closing +@cindex pipe, closing @c STARTOFRANGE cc @cindex coprocesses, closing @cindex @code{getline} command, coprocesses@comma{} using from -If the same @value{FN} or the same shell command is used with @code{getline} +If the same file name or the same shell command is used with @code{getline} more than once during the execution of an @command{awk} program (@pxref{Getline}), the file is opened (or the command is executed) the first time only. @@ -9252,11 +9405,11 @@ The next time the same file or command is used with @code{getline}, another record is read from it, and so on. Similarly, when a file or pipe is opened for output, @command{awk} remembers -the @value{FN} or command associated with it, and subsequent +the file name or command associated with it, and subsequent writes to the same file or command are appended to the previous writes. 
The file or pipe stays open until @command{awk} exits. -@cindex @code{close()} function +@cindexawkfunc{close} This implies that special steps are necessary in order to read the same file again from the beginning, or to rerun a shell command (rather than reading more output from the same command). The @code{close()} function @@ -9294,7 +9447,7 @@ file or command, or the next @code{print} or @code{printf} to that file or command, reopens the file or reruns the command. Because the expression that you use to close a file or pipeline must exactly match the expression used to open the file or run the command, -it is good practice to use a variable to store the @value{FN} or command. +it is good practice to use a variable to store the file name or command. The previous example becomes the following: @example @@ -9341,9 +9494,10 @@ a separate message. @cindex differences in @command{awk} and @command{gawk}, @code{close()} function @cindex portability, @code{close()} function and +@cindex @code{close()} function, portability If you use more files than the system allows you to have open, @command{gawk} attempts to multiplex the available open files among -your @value{DF}s. @command{gawk}'s ability to do this depends upon the +your data files. @command{gawk}'s ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use @code{close()} on your files when you are done with them. @@ -9425,7 +9579,7 @@ retval = close(command) # syntax error in many Unix awks @end example @cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable +@cindex @code{ERRNO} variable, with @command{close()} function @command{gawk} treats @code{close()} as a function. 
The return value is @minus{}1 if the argument names something that was never opened with a redirection, or if there is @@ -9481,7 +9635,7 @@ retval = close(command) # syntax error in many Unix awks @end example @cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable +@cindex @code{ERRNO} variable, with @command{close()} function @command{gawk} treats @code{close()} as a function. The return value is @minus{}1 if the argument names something that was never opened with a redirection, or if there is @@ -9560,6 +9714,8 @@ which provide the values used in expressions. @node Constants @subsection Constant Expressions + +@c STARTOFRANGE cnst @cindex constants, types of The simplest type of expression is the @dfn{constant}, which always has @@ -9579,7 +9735,8 @@ have different forms, but are stored identically internally. @node Scalar Constants @subsubsection Numeric and String Constants -@cindex numeric, constants +@cindex constants, numeric +@cindex numeric constants A @dfn{numeric constant} stands for a number. This number can be an integer, a decimal fraction, or a number in scientific (exponential) notation.@footnote{The internal representation of all numbers, @@ -9605,7 +9762,7 @@ double-quotation marks. For example: @noindent @cindex differences in @command{awk} and @command{gawk}, strings -@cindex strings, length of +@cindex strings, length limitations represents the string whose contents are @samp{parrot}. Strings in @command{gawk} can be of any length, and they can contain any of the possible eight-bit ASCII characters including ASCII @sc{nul} (character code zero). @@ -9821,9 +9978,9 @@ upon the contents of the current input record. 
@cindex differences in @command{awk} and @command{gawk}, regexp constants @cindex dark corner, regexp constants, as arguments to user-defined functions -@cindex @code{gensub()} function (@command{gawk}) -@cindex @code{sub()} function -@cindex @code{gsub()} function +@cindexgawkfunc{gensub} +@cindexawkfunc{sub} +@cindexawkfunc{gsub} Constant regular expressions are also used as the first argument for the @code{gensub()}, @code{sub()}, and @code{gsub()} functions, as the second argument of the @code{match()} function, @@ -9934,7 +10091,7 @@ Such an assignment has the following form: @var{variable}=@var{text} @end example -@cindex @code{-v} option +@cindex @option{-v} option @noindent With it, a variable is set either at the beginning of the @command{awk} run or in between input files. @@ -9948,7 +10105,7 @@ as in the following: @noindent the variable is set at the very beginning, even before the @code{BEGIN} rules execute. The @option{-v} option and its assignment -must precede all the @value{FN} arguments, as well as the program text. +must precede all the file name arguments, as well as the program text. (@xref{Options}, for more information about the @option{-v} option.) Otherwise, the variable assignment is performed at a time determined by @@ -9956,7 +10113,7 @@ its position among the input file arguments---after the processing of the preceding input file argument. For example: @example -awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list +awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list @end example @noindent @@ -9965,10 +10122,10 @@ the first file is read, the command line sets the variable @code{n} equal to four. This causes the fourth field to be printed in lines from @file{inventory-shipped}. 
After the first file has finished, but before the second file is started, @code{n} is set to two, so that the -second field is printed in lines from @file{BBS-list}: +second field is printed in lines from @file{mail-list}: @example -$ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list} +$ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list} @print{} 15 @print{} 24 @dots{} @@ -10030,7 +10187,7 @@ with @code{CONVFMT} as the format specifier (@pxref{String Functions}). -@code{CONVFMT}'s default value is @code{"%.6g"}, which prints a value with +@code{CONVFMT}'s default value is @code{"%.6g"}, which creates a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, @@ -10092,7 +10249,7 @@ point when reading the @command{awk} program source code, and for command-line variable assignments (@pxref{Other Arguments}). However, when interpreting input data, for @code{print} and @code{printf} output, and for number to string conversion, the local decimal point character is used. -@value{DARKCORNER}. +@value{DARKCORNER} Here are some examples indicating the difference in behavior, on a GNU/Linux system: @@ -10279,8 +10436,8 @@ For maximum portability, do not use the @samp{**} operator. @subsection String Concatenation @cindex Kernighan, Brian @quotation -@i{It seemed like a good idea at the time.}@* -Brian Kernighan +@i{It seemed like a good idea at the time.} +@author Brian Kernighan @end quotation @cindex string operators @@ -10291,9 +10448,9 @@ specific operator to represent it. Instead, concatenation is performed by writing expressions next to one another, with no operator. 
For example: @example -$ @kbd{awk '@{ print "Field number one: " $1 @}' BBS-list} -@print{} Field number one: aardvark -@print{} Field number one: alpo-net +$ @kbd{awk '@{ print "Field number one: " $1 @}' mail-list} +@print{} Field number one: Amelia +@print{} Field number one: Anthony @dots{} @end example @@ -10301,9 +10458,9 @@ Without the space in the string constant after the @samp{:}, the line runs together. For example: @example -$ @kbd{awk '@{ print "Field number one:" $1 @}' BBS-list} -@print{} Field number one:aardvark -@print{} Field number one:alpo-net +$ @kbd{awk '@{ print "Field number one:" $1 @}' mail-list} +@print{} Field number one:Amelia +@print{} Field number one:Anthony @dots{} @end example @@ -10320,6 +10477,8 @@ name = "name" print "something meaningful" > file name @end example +@cindex Brian Kernighan's @command{awk} +@cindex @command{mawk} utility @noindent This produces a syntax error with some versions of Unix @command{awk}.@footnote{It happens that Brian Kernighan's @@ -10765,6 +10924,7 @@ just like variables. (Use @samp{$(i++)} when you want to do a field reference and a variable increment at the same time. The parentheses are necessary because of the precedence of the field reference operator @samp{$}.) +@c STARTOFRANGE deop @cindex decrement operators The decrement operator @samp{--} works just like @samp{++}, except that it subtracts one instead of adding it. As with @samp{++}, it can be used before @@ -10810,8 +10970,8 @@ like @samp{@var{lvalue}++}, but instead of adding, it subtracts.) @cindex Marx, Groucho @quotation @i{Doctor, doctor! It hurts when I do this!@* -So don't do that!}@* -Groucho Marx +So don't do that!} +@author Groucho Marx @end quotation @noindent @@ -10862,8 +11022,8 @@ You should avoid such things in your own programs. @cindex Marx, Groucho @quotation @i{Doctor, doctor! 
It hurts when I do this!@* -So don't do that!}@* -Groucho Marx +So don't do that!} +@author Groucho Marx @end quotation @noindent @@ -10961,8 +11121,8 @@ the string constant @code{"0"} is actually true, because it is non-null. @node Typing and Comparison @subsection Variable Typing and Comparison Expressions @quotation -@i{The Guide is definitive. Reality is frequently inaccurate.}@* -The Hitchhiker's Guide to the Galaxy +@i{The Guide is definitive. Reality is frequently inaccurate.} +@author The Hitchhiker's Guide to the Galaxy @end quotation @c STARTOFRANGE comex @@ -11241,7 +11401,7 @@ string comparison (true) string comparison (true) @item a = 2; b = " +2" -@item a == b +@itemx a == b string comparison (false) @end table @@ -11375,10 +11535,10 @@ The Boolean operators are: @item @var{boolean1} && @var{boolean2} True if both @var{boolean1} and @var{boolean2} are true. For example, the following statement prints the current input record if it contains -both @samp{2400} and @samp{foo}: +both @samp{edu} and @samp{li}: @example -if ($0 ~ /2400/ && $0 ~ /foo/) print +if ($0 ~ /edu/ && $0 ~ /li/) print @end example @cindex side effects, Boolean operators @@ -11391,11 +11551,11 @@ no substring @samp{foo} in the record. @item @var{boolean1} || @var{boolean2} True if at least one of @var{boolean1} or @var{boolean2} is true. 
For example, the following statement prints all records in the input -that contain @emph{either} @samp{2400} or -@samp{foo} or both: +that contain @emph{either} @samp{edu} or +@samp{li} or both: @example -if ($0 ~ /2400/ || $0 ~ /foo/) print +if ($0 ~ /edu/ || $0 ~ /li/) print @end example The subexpression @var{boolean2} is evaluated only if @var{boolean1} @@ -11620,7 +11780,7 @@ $ @kbd{awk '@{ print "The square root of", $1, "is", sqrt($1) @}'} @print{} The square root of 3 is 1.73205 @kbd{5} @print{} The square root of 5 is 2.23607 -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example A function can also have side effects, such as assigning @@ -11990,7 +12150,7 @@ slashes (@code{/@var{regexp}/}), or any expression whose string value is used as a dynamic regular expression (@pxref{Computed Regexps}). The following example prints the second field of each input record -whose first field is precisely @samp{foo}: +whose first field is precisely @samp{li}: @cindex @code{/} (forward slash), patterns and @cindex forward slash (@code{/}), patterns and @@ -11999,68 +12159,65 @@ whose first field is precisely @samp{foo}: @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator @example -$ @kbd{awk '$1 == "foo" @{ print $2 @}' BBS-list} +$ @kbd{awk '$1 == "li" @{ print $2 @}' mail-list} @end example @noindent -(There is no output, because there is no BBS site with the exact name @samp{foo}.) +(There is no output, because there is no person with the exact name @samp{li}.) 
Contrast this with the following regular expression match, which -accepts any record with a first field that contains @samp{foo}: +accepts any record with a first field that contains @samp{li}: @example -$ @kbd{awk '$1 ~ /foo/ @{ print $2 @}' BBS-list} -@print{} 555-1234 +$ @kbd{awk '$1 ~ /li/ @{ print $2 @}' mail-list} +@print{} 555-5553 @print{} 555-6699 -@print{} 555-6480 -@print{} 555-2127 @end example @cindex regexp constants, as patterns @cindex patterns, regexp constants as A regexp constant as a pattern is also a special case of an expression -pattern. The expression @code{/foo/} has the value one if @samp{foo} -appears in the current input record. Thus, as a pattern, @code{/foo/} -matches any record containing @samp{foo}. +pattern. The expression @code{/li/} has the value one if @samp{li} +appears in the current input record. Thus, as a pattern, @code{/li/} +matches any record containing @samp{li}. @cindex Boolean expressions, as patterns Boolean expressions are also commonly used as patterns. Whether the pattern matches an input record depends on whether its subexpressions match.
For example, the following command prints all the records in -@file{BBS-list} that contain both @samp{2400} and @samp{foo}: +@file{mail-list} that contain both @samp{edu} and @samp{li}: @example -$ @kbd{awk '/2400/ && /foo/' BBS-list} -@print{} fooey 555-1234 2400/1200/300 B +$ @kbd{awk '/edu/ && /li/' mail-list} +@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A @end example The following command prints all records in -@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo} +@file{mail-list} that contain @emph{either} @samp{edu} or @samp{li} (or both, of course): @example -$ @kbd{awk '/2400/ || /foo/' BBS-list} -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sdace 555-3430 2400/1200/300 A -@print{} sabafoo 555-2127 1200/300 C +$ @kbd{awk '/edu/ || /li/' mail-list} +@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F +@print{} Broderick 555-0542 broderick.aliquotiens@@yahoo.com R +@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F +@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A +@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @end example The following command prints all records in -@file{BBS-list} that do @emph{not} contain the string @samp{foo}: +@file{mail-list} that do @emph{not} contain the string @samp{li}: @example -$ @kbd{awk '! /foo/' BBS-list} -@print{} aardvark 555-5553 1200/300 B -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} barfly 555-7685 1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} camelot 555-0542 300 C -@print{} core 555-2912 1200/300 C -@print{} sdace 555-3430 2400/1200/300 A +$ @kbd{awk '! 
/li/' mail-list} +@print{} Anthony 555-3412 anthony.asserturo@@hotmail.com A +@print{} Becky 555-7685 becky.algebrarum@@gmail.com A +@print{} Bill 555-1675 bill.drowning@@hotmail.com A +@print{} Camilla 555-2912 camilla.infusarum@@skynet.be R +@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +@print{} Martin 555-6480 martin.codicibus@@hotmail.com A +@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @end example @cindex @code{BEGIN} pattern, Boolean patterns and @@ -12164,6 +12321,11 @@ $ @kbd{echo Yes | gawk '(/1/,/2/) || /Yes/'} @error{} gawk: cmd. line:1: ^ syntax error @end example +@cindex range patterns, line continuation and +As a minor point of interest, although it is poor style, +POSIX allows you to put a newline after the comma in +a range pattern. @value{DARKCORNER} + @node BEGIN/END @subsection The @code{BEGIN} and @code{END} Special Patterns @@ -12188,28 +12350,30 @@ programmers. @node Using BEGIN/END @subsubsection Startup and Cleanup Actions +@cindex @code{BEGIN} pattern +@cindex @code{END} pattern A @code{BEGIN} rule is executed once only, before the first input record is read. Likewise, an @code{END} rule is executed once only, after all the input is read. For example: @example $ @kbd{awk '} -> @kbd{BEGIN @{ print "Analysis of \"foo\"" @}} -> @kbd{/foo/ @{ ++n @}} -> @kbd{END @{ print "\"foo\" appears", n, "times." @}' BBS-list} -@print{} Analysis of "foo" -@print{} "foo" appears 4 times. +> @kbd{BEGIN @{ print "Analysis of \"li\"" @}} +> @kbd{/li/ @{ ++n @}} +> @kbd{END @{ print "\"li\" appears in", n, "records." @}' mail-list} +@print{} Analysis of "li" +@print{} "li" appears in 4 records. @end example @cindex @code{BEGIN} pattern, operators and @cindex @code{END} pattern, operators and -This program finds the number of records in the input file @file{BBS-list} -that contain the string @samp{foo}. 
The @code{BEGIN} rule prints a title +This program finds the number of records in the input file @file{mail-list} +that contain the string @samp{li}. The @code{BEGIN} rule prints a title for the report. There is no need to use the @code{BEGIN} rule to initialize the counter @code{n} to zero, since @command{awk} does this automatically (@pxref{Variables}). The second rule increments the variable @code{n} every time a -record containing the pattern @samp{foo} is read. The @code{END} rule +record containing the pattern @samp{li} is read. The @code{END} rule prints the value of @code{n} at the end of the run. The special patterns @code{BEGIN} and @code{END} cannot be used in ranges @@ -12262,6 +12426,7 @@ to give @code{$0} a real value is to execute a @code{getline} command without a variable (@pxref{Getline}). Another way is simply to assign a value to @code{$0}. +@cindex Brian Kernighan's @command{awk} @cindex differences in @command{awk} and @command{gawk}, @code{BEGIN}/@code{END} patterns @cindex POSIX @command{awk}, @code{BEGIN}/@code{END} patterns @cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and @@ -12330,7 +12495,7 @@ you can bypass the fatal error and move on to the next file on the command line. @cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable +@cindex @code{ERRNO} variable, with @code{BEGINFILE} pattern @cindex @code{nextfile} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and You do this by checking if the @code{ERRNO} variable is not the empty string; if so, then @command{gawk} was not able to open the file. In @@ -12372,7 +12537,7 @@ both @code{BEGINFILE} and @code{ENDFILE}. Only the @samp{getline In most other @command{awk} implementations, or if @command{gawk} is in compatibility mode (@pxref{Options}), they are not special. -@c FIXME: For 4.1 maybe deal with this? +@c FIXME: For 4.2 maybe deal with this? 
@ignore Date: Tue, 17 May 2011 02:06:10 PDT From: rankin@pactechdata.com (Pat Rankin) @@ -12403,7 +12568,7 @@ An empty (i.e., nonexistent) pattern is considered to match @emph{every} input record. For example, the program: @example -awk '@{ print $1 @}' BBS-list +awk '@{ print $1 @}' mail-list @end example @noindent @@ -12656,6 +12821,7 @@ the first thing on its line. @subsection The @code{while} Statement @cindex @code{while} statement @cindex loops +@cindex loops, @code{while} @cindex loops, See Also @code{while} statement In programming, a @dfn{loop} is a part of a program that can @@ -12716,6 +12882,7 @@ program is harder to read without it. @node Do Statement @subsection The @code{do}-@code{while} Statement @cindex @code{do}-@code{while} statement +@cindex loops, @code{do}-@code{while} The @code{do} loop is a variation of the @code{while} looping statement. The @code{do} loop executes the @var{body} once and then repeats the @@ -12761,6 +12928,7 @@ occasionally is there a real use for a @code{do} statement. @node For Statement @subsection The @code{for} Statement @cindex @code{for} statement +@cindex loops, @code{for}, iterative The @code{for} statement makes it more convenient to count iterations of a loop. The general form of the @code{for} statement looks like this: @@ -12867,6 +13035,8 @@ for more information on this version of the @code{for} loop. @cindex @code{case} keyword @cindex @code{default} keyword +This @value{SECTION} describes a @command{gawk}-specific feature. + The @code{switch} statement allows the evaluation of an expression and the execution of statements based on a @code{case} match. Case statements are checked for a match in the order they are defined. If no suitable @@ -12931,6 +13101,7 @@ it is not available. 
@subsection The @code{break} Statement @cindex @code{break} statement @cindex loops, exiting +@cindex loops, @code{break} statement and The @code{break} statement jumps out of the innermost @code{for}, @code{while}, or @code{do} loop that encloses it. The following example @@ -12990,6 +13161,7 @@ This is discussed in @ref{Switch Statement}. @cindex POSIX @command{awk}, @code{break} statement and @cindex dark corner, @code{break} statement @cindex @command{gawk}, @code{break} statement in +@cindex Brian Kernighan's @command{awk} The @code{break} statement has no meaning when used outside the body of a loop or @code{switch}. However, although it was never documented, @@ -13054,6 +13226,7 @@ This program loops forever once @code{x} reaches 5. @cindex POSIX @command{awk}, @code{continue} statement and @cindex dark corner, @code{continue} statement @cindex @command{gawk}, @code{continue} statement in +@cindex Brian Kernighan's @command{awk} The @code{continue} statement has no special meaning with respect to the @code{switch} statement, nor does it have any meaning when used outside the body of a loop. Historical versions of @command{awk} treated a @code{continue} @@ -13142,11 +13315,11 @@ The @code{nextfile} statement is similar to the @code{next} statement. However, instead of abandoning processing of the current record, the @code{nextfile} statement instructs @command{awk} to stop processing the -current @value{DF}. +current data file. Upon execution of the @code{nextfile} statement, @code{FILENAME} is -updated to the name of the next @value{DF} listed on the command line, +updated to the name of the next data file listed on the command line, @code{FNR} is reset to one, and processing starts over with the first rule in the program. @@ -13155,10 +13328,10 @@ then the code in any @code{END} rules is executed. 
An exception to this is when @code{nextfile} is invoked during execution of any statement in an @code{END} rule; In this case, it causes the program to stop immediately. @xref{BEGIN/END}. -The @code{nextfile} statement is useful when there are many @value{DF}s +The @code{nextfile} statement is useful when there are many data files to process but it isn't necessary to process every record in every file. Without @code{nextfile}, -in order to move on to the next @value{DF}, a program +in order to move on to the next data file, a program would have to continue scanning the unwanted records. The @code{nextfile} statement accomplishes this much more efficiently. @@ -13191,8 +13364,10 @@ See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}. @cindex functions, user-defined, @code{next}/@code{nextfile} statements and @cindex @code{nextfile} statement, user-defined functions and -The current version of the Brian Kernighan's @command{awk} (@pxref{Other -Versions}) also supports @code{nextfile}. However, it doesn't allow the +@cindex Brian Kernighan's @command{awk} +@cindex @command{mawk} utility +The current version of the Brian Kernighan's @command{awk}, and @command{mawk} (@pxref{Other +Versions}) also support @code{nextfile}. However, they don't allow the @code{nextfile} statement inside function bodies (@pxref{User-defined}). @command{gawk} does; a @code{nextfile} inside a function body reads the next record and starts processing it with the first rule in the program, @@ -13396,7 +13571,7 @@ exclusively on the value of @code{FS}. @item FS This is the input field separator (@pxref{Field Separators}). -The value is a single-character string or a multi-character regular +The value is a single-character string or a multicharacter regular expression that matches the separations between fields in an input record. If the value is the null string (@code{""}), then each character in the record becomes a separate field. 
@@ -13429,8 +13604,8 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment. @cindex @command{gawk}, @code{IGNORECASE} variable in @cindex @code{IGNORECASE} variable @cindex differences in @command{awk} and @command{gawk}, @code{IGNORECASE} variable -@cindex case sensitivity, string comparisons and -@cindex case sensitivity, regexps and +@cindex case sensitivity, and string comparisons +@cindex case sensitivity, and regexps @cindex regular expressions, case sensitivity @item IGNORECASE # If @code{IGNORECASE} is nonzero or non-null, then all string comparisons @@ -13542,7 +13717,7 @@ This is the subscript separator. It has the default value of @code{"\034"} and is used to separate the parts of the indices of a multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}} really accesses @code{foo["A\034B"]} -(@pxref{Multi-dimensional}). +(@pxref{Multidimensional}). @cindex @command{gawk}, @code{TEXTDOMAIN} variable in @cindex @code{TEXTDOMAIN} variable @@ -13595,16 +13770,16 @@ In the following example: $ @kbd{awk 'BEGIN @{} > @kbd{for (i = 0; i < ARGC; i++)} > @kbd{print ARGV[i]} -> @kbd{@}' inventory-shipped BBS-list} +> @kbd{@}' inventory-shipped mail-list} @print{} awk @print{} inventory-shipped -@print{} BBS-list +@print{} mail-list @end example @noindent @code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]} contains @samp{inventory-shipped}, and @code{ARGV[2]} contains -@samp{BBS-list}. The value of @code{ARGC} is three, one more than the +@samp{mail-list}. The value of @code{ARGC} is three, one more than the index of the last element in @code{ARGV}, because the elements are numbered from zero. @@ -13625,17 +13800,17 @@ about how @command{awk} uses these variables. @cindex differences in @command{awk} and @command{gawk}, @code{ARGIND} variable @item ARGIND # The index in @code{ARGV} of the current file being processed. 
-Every time @command{gawk} opens a new @value{DF} for processing, it sets -@code{ARGIND} to the index in @code{ARGV} of the @value{FN}. +Every time @command{gawk} opens a new data file for processing, it sets +@code{ARGIND} to the index in @code{ARGV} of the file name. When @command{gawk} is processing the input files, @samp{FILENAME == ARGV[ARGIND]} is always true. @cindex files, processing@comma{} @code{ARGIND} variable and This variable is useful in file processing; it allows you to tell how far -along you are in the list of @value{DF}s as well as to distinguish between -successive instances of the same @value{FN} on the command line. +along you are in the list of data files as well as to distinguish between +successive instances of the same file name on the command line. -@cindex @value{FN}s, distinguishing +@cindex file names, distinguishing While you can change the value of @code{ARGIND} within your @command{awk} program, @command{gawk} automatically sets it to a new value when the next file is opened. @@ -13647,15 +13822,23 @@ or if @command{gawk} is in compatibility mode it is not special. @cindex @code{ENVIRON} array -@cindex environment variables +@cindex environment variables, in @code{ENVIRON} array @item ENVIRON An associative array containing the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For example, -@code{ENVIRON["HOME"]} might be @file{/home/arnold}. Changing this array -does not affect the environment passed on to any programs that -@command{awk} may spawn via redirection or the @code{system()} function. -@c (In a future version of @command{gawk}, it may do so.) +@code{ENVIRON["HOME"]} might be @file{/home/arnold}. + +For POSIX @command{awk}, changing this array does not affect the +environment passed on to any programs that @command{awk} may spawn via +redirection or the @code{system()} function. 
+ +However, beginning with version 4.2, if not in POSIX +compatibility mode, @command{gawk} does update its own environment when +@code{ENVIRON} is changed, thus changing the environment seen by programs +that it creates. You should therefore be especially careful if you +modify @code{ENVIRON["PATH"]}, which is the search path for finding +executable programs. Some operating systems may not have environment variables. On such systems, the @code{ENVIRON} array is empty (except for @@ -13697,14 +13880,14 @@ it is not special. @cindex dark corner, @code{FILENAME} variable @item FILENAME The name of the file that @command{awk} is currently reading. -When no @value{DF}s are listed on the command line, @command{awk} reads +When no data files are listed on the command line, @command{awk} reads from the standard input and @code{FILENAME} is set to @code{"-"}. @code{FILENAME} is changed each time a new file is read (@pxref{Reading Files}). Inside a @code{BEGIN} rule, the value of @code{FILENAME} is @code{""}, since there are no input files being processed yet.@footnote{Some early implementations of Unix @command{awk} initialized -@code{FILENAME} to @code{"-"}, even if there were @value{DF}s to be +@code{FILENAME} to @code{"-"}, even if there were data files to be processed. This behavior was incorrect and should not be relied upon in your programs.} @value{DARKCORNER} @@ -13726,13 +13909,7 @@ The number of fields in the current input record. @code{NF} is set each time a new record is read, when a new field is created or when @code{$0} changes (@pxref{Fields}). -Unlike most of the variables described in this -@ifnotinfo -section, -@end ifnotinfo -@ifinfo -node, -@end ifinfo +Unlike most of the variables described in this @value{SUBSECTION}, assigning a value to @code{NF} has the potential to affect @command{awk}'s internal workings.
In particular, assignments to @code{NF} can be used to create or remove fields from the @@ -13768,10 +13945,12 @@ The following elements (listed alphabetically) are guaranteed to be available: @table @code +@cindex effective group ID of @command{gawk} user @item PROCINFO["egid"] The value of the @code{getegid()} system call. @item PROCINFO["euid"] +@cindex effective user ID of @command{gawk} user The value of the @code{geteuid()} system call. @item PROCINFO["FS"] @@ -13781,6 +13960,7 @@ This is or @code{"FPAT"} if field matching with @code{FPAT} is in effect. @item PROCINFO["identifiers"] +@cindex program identifiers A subarray, indexed by the names of all identifiers used in the text of the AWK program. For each identifier, the value of the element is one of the following: @@ -13809,15 +13989,19 @@ after it has finished parsing the program; they are @emph{not} updated while the program runs. @item PROCINFO["gid"] +@cindex group ID of @command{gawk} user The value of the @code{getgid()} system call. @item PROCINFO["pgrpid"] +@cindex process group idIDof @command{gawk} process The process group ID of the current process. @item PROCINFO["pid"] +@cindex process ID of @command{gawk} process The process ID of the current process. @item PROCINFO["ppid"] +@cindex parent process ID of @command{gawk} process The parent process ID of the current process. @item PROCINFO["sorted_in"] @@ -13837,25 +14021,31 @@ Assigning a new value to this element changes the default. The value of the @code{getuid()} system call. @item PROCINFO["version"] +@cindex version of @command{gawk} +@cindex @command{gawk} version The version of @command{gawk}. 
@end table The following additional elements in the array are available to provide information about the MPFR and GMP libraries if your version of @command{gawk} supports arbitrary precision numbers -(@pxref{Arbitrary Precision Arithmetic}): +(@pxref{Gawk and MPFR}): @table @code +@cindex version of GNU MPFR library @item PROCINFO["mpfr_version"] The version of the GNU MPFR library. @item PROCINFO["gmp_version"] +@cindex version of GNU MP library The version of the GNU MP library. @item PROCINFO["prec_max"] +@cindex maximum precision supported by MPFR library The maximum precision supported by MPFR. @item PROCINFO["prec_min"] +@cindex minimum precision supported by MPFR library The minimum precision required by MPFR. @end table @@ -13866,12 +14056,15 @@ of @command{gawk} supports dynamic loading of extension functions @table @code @item PROCINFO["api_major"] +@cindex version of @command{gawk} extension API +@cindex extension API, version number The major version of the extension API. @item PROCINFO["api_minor"] The minor version of the extension API. @end table +@cindex supplementary groups of @command{gawk} process On some systems, there may be elements in the array, @code{"group1"} through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of supplementary groups that the process has. Use the @code{in} operator @@ -13879,7 +14072,7 @@ to test for these elements (@pxref{Reference to Elements}). @cindex @command{gawk}, @code{PROCINFO} array in -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, uses The @code{PROCINFO} array has the following additional uses: @itemize @bullet @@ -13951,7 +14144,7 @@ if an element in @code{SYMTAB} is an array. Also, you may not use the @code{delete} statement with the @code{SYMTAB} array. -You may use an index for @code{SYMTAB} that is not a predefined identifer: +You may use an index for @code{SYMTAB} that is not a predefined identifier: @example SYMTAB["xxx"] = 5 @@ -14065,7 +14258,7 @@ changed. 
@node ARGC and ARGV @subsection Using @code{ARGC} and @code{ARGV} -@cindex @code{ARGC}/@code{ARGV} variables +@cindex @code{ARGC}/@code{ARGV} variables, how to use @cindex arguments, command-line @cindex command line, arguments @@ -14077,16 +14270,16 @@ and @code{ARGV}: $ @kbd{awk 'BEGIN @{} > @kbd{for (i = 0; i < ARGC; i++)} > @kbd{print ARGV[i]} -> @kbd{@}' inventory-shipped BBS-list} +> @kbd{@}' inventory-shipped mail-list} @print{} awk @print{} inventory-shipped -@print{} BBS-list +@print{} mail-list @end example @noindent In this example, @code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]} contains @samp{inventory-shipped}, and @code{ARGV[2]} contains -@samp{BBS-list}. +@samp{mail-list}. Notice that the @command{awk} program is not entered in @code{ARGV}. The other command-line options, with their arguments, are also not entered. This includes variable assignments done with the @option{-v} @@ -14127,11 +14320,11 @@ additional files to be read. If the value of @code{ARGC} is decreased, that eliminates input files from the end of the list. By recording the old value of @code{ARGC} elsewhere, a program can treat the eliminated arguments as -something other than @value{FN}s. +something other than file names. To eliminate a file from the middle of the list, store the null string (@code{""}) into @code{ARGV} in place of the file's name. As a -special feature, @command{awk} ignores @value{FN}s that have been +special feature, @command{awk} ignores file names that have been replaced with the null string. Another option is to use the @code{delete} statement to remove elements from @@ -14210,7 +14403,7 @@ ability to support true multidimensional arrays. @cindex variables, names of @cindex functions, names of -@cindex arrays, names of +@cindex arrays, names of, and names of functions/variables @cindex names, arrays/variables @cindex namespace issues @command{awk} maintains a single set @@ -14226,7 +14419,7 @@ same @command{awk} program. 
* Numeric Array Subscripts:: How to use numbers as subscripts in @command{awk}. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in +* Multidimensional:: Emulating multidimensional arrays in @command{awk}. * Arrays of Arrays:: True multidimensional arrays. @end menu @@ -14256,8 +14449,8 @@ an array. @cindex Wall, Larry @quotation @i{Doing linear scans over an associative array is like trying to club someone -to death with a loaded Uzi.}@* -Larry Wall +to death with a loaded Uzi.} +@author Larry Wall @end quotation The @command{awk} language provides one-dimensional arrays @@ -14386,10 +14579,9 @@ Here, the number @code{1} isn't double-quoted, since @command{awk} automatically converts it to a string. @cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex @code{IGNORECASE} variable @cindex case sensitivity, array indices and -@cindex arrays, @code{IGNORECASE} variable and -@cindex @code{IGNORECASE} variable, array subscripts and +@cindex arrays, and @code{IGNORECASE} variable +@cindex @code{IGNORECASE} variable, and array indices The value of @code{IGNORECASE} has no effect upon array subscripting. The identical string value used to store an array element must be used to retrieve it. @@ -14405,8 +14597,9 @@ is independent of the number of elements in the array. @node Reference to Elements @subsection Referring to an Array Element -@cindex arrays, elements, referencing -@cindex elements in arrays +@cindex arrays, referencing elements +@cindex array members +@cindex elements of arrays The principal way to use an array is to refer to one of its elements. An array reference is an expression as follows: @@ -14423,11 +14616,16 @@ The value of the array reference is the current value of that array element. For example, @code{foo[4.3]} is an expression for the element of array @code{foo} at index @samp{4.3}. 
+@cindex arrays, unassigned elements +@cindex unassigned array elements +@cindex empty array elements A reference to an array element that has no recorded value yields a value of @code{""}, the null string. This includes elements that have not been assigned any value as well as elements that have been deleted (@pxref{Delete}). +@cindex non-existent array elements +@cindex arrays, elements that don't exist @quotation NOTE A reference to an element that does not exist @emph{automatically} creates that array element, with the null string as its value. (In some cases, @@ -14447,7 +14645,7 @@ if it didn't exist before! @end quotation @c @cindex arrays, @code{in} operator and -@cindex @code{in} operator +@cindex @code{in} operator, testing if array element exists To determine whether an element exists in an array at a certain index, use the following expression: @@ -14482,8 +14680,8 @@ if (frequencies[2] != "") @node Assigning Elements @subsection Assigning Array Elements -@cindex arrays, elements, assigning -@cindex elements in arrays, assigning +@cindex arrays, elements, assigning values +@cindex elements in arrays, assigning values Array elements can be assigned values just like @command{awk} variables: @@ -14500,6 +14698,7 @@ assign to that element of the array. @node Array Example @subsection Basic Array Example +@cindex arrays, an example of using The following program takes a list of lines, each beginning with a line number, and prints them out in order of line number. The line numbers @@ -14569,7 +14768,9 @@ END @{ @node Scanning an Array @subsection Scanning All Elements of an Array @cindex elements in arrays, scanning +@cindex scanning arrays @cindex arrays, scanning +@cindex loops, @code{for}, array scanning In programs that use arrays, it is often necessary to use a loop that executes once for each element of an array. 
In other languages, where @@ -14586,7 +14787,7 @@ for (@var{var} in @var{array}) @end example @noindent -@cindex @code{in} operator +@cindex @code{in} operator, use in loops This loop executes @var{body} once for each index in @var{array} that the program has previously used, with the variable @var{var} set to that index. @@ -14625,8 +14826,9 @@ END @{ @xref{Word Sorting}, for a more detailed example of this type. -@cindex arrays, elements, order of -@cindex elements in arrays, order of +@cindex arrays, elements, order of access by @code{in} operator +@cindex elements in arrays, order of access by @code{in} operator +@cindex @code{in} operator, order of array access The order in which elements of the array are accessed by this statement is determined by the internal arrangement of the array elements within @command{awk} and normally cannot be controlled or changed. This can lead to @@ -14644,6 +14846,8 @@ determines the order in which the array is traversed. This order is usually based on the internal implementation of arrays and will vary from one version of @command{awk} to the next. +@cindex array scanning order, controlling +@cindex controlling array scanning order Often, though, you may wish to do something simple, such as ``traverse the array by comparing the indices in ascending order,'' or ``traverse the array by comparing the values in descending order.'' @@ -14660,6 +14864,7 @@ to use for comparison of array elements. This advanced feature is described later, in @ref{Array Sorting}. @end itemize +@cindex @code{PROCINFO}, values of @code{sorted_in} The following special values for @code{PROCINFO["sorted_in"]} are available: @table @code @@ -14668,29 +14873,29 @@ Array elements are processed in arbitrary order, which is the default @command{awk} behavior. @item "@@ind_str_asc" -Order by indices compared as strings; this is the most basic sort. +Order by indices in ascending order compared as strings; this is the most basic sort. 
(Internally, array indices are always strings, so with @samp{a[2*5] = 1} the index is @code{"10"} rather than numeric 10.) @item "@@ind_num_asc" -Order by indices but force them to be treated as numbers in the process. +Order by indices in ascending order but force them to be treated as numbers in the process. Any index with a non-numeric value will end up positioned as if it were zero. @item "@@val_type_asc" -Order by element values rather than indices. +Order by element values in ascending order (rather than by indices). Ordering is by the type assigned to the element (@pxref{Typing and Comparison}). All numeric values come before all string values, which in turn come before all subarrays. (Subarrays have not been described yet; -@pxref{Arrays of Arrays}). +@pxref{Arrays of Arrays}.) @item "@@val_str_asc" -Order by element values rather than by indices. Scalar values are +Order by element values in ascending order (rather than by indices). Scalar values are compared as strings. Subarrays, if present, come out last. @item "@@val_num_asc" -Order by element values rather than by indices. Scalar values are +Order by element values in ascending order (rather than by indices). Scalar values are compared as numbers. Subarrays, if present, come out last. When numeric values are equal, the string values are used to provide an ordering: this guarantees consistent results across different @@ -14703,13 +14908,14 @@ across different environments.} which @command{gawk} uses internally to perform the sorting. @item "@@ind_str_desc" -Reverse order from the most basic sort. +String indices ordered from high to low. @item "@@ind_num_desc" Numeric indices ordered from high to low. @item "@@val_type_desc" -Element values, based on type, in descending order. +Element values, based on type, ordered from high to low. +Subarrays, if present, come out first. @item "@@val_str_desc" Element values, treated as strings, ordered from high to low. 
@@ -14819,7 +15025,7 @@ if (4 in foo) print "This will never be printed" @end example -@cindex null strings, array elements and +@cindex null strings, and deleting array elements It is important to note that deleting an element is @emph{not} the same as assigning it a null value (the empty string, @code{""}). For example: @@ -14841,6 +15047,7 @@ is not in the array is deleted. @cindex extensions, common@comma{} @code{delete} to delete entire arrays @cindex arrays, deleting entire contents @cindex deleting entire arrays +@cindex @code{delete} @var{array} @cindex differences in @command{awk} and @command{gawk}, array elements, deleting All the elements of an array may be deleted with a single statement by leaving off the subscript in the @code{delete} statement, @@ -14855,6 +15062,7 @@ Using this version of the @code{delete} statement is about three times more efficient than the equivalent loop that deletes each element one at a time. +@cindex Brian Kernighan's @command{awk} @quotation NOTE For many years, using @code{delete} without a subscript was a @command{gawk} extension. @@ -14897,9 +15105,9 @@ a = 3 @section Using Numbers to Subscript Arrays @cindex numbers, as array subscripts -@cindex arrays, subscripts +@cindex arrays, numeric subscripts @cindex subscripts in arrays, numbers as -@cindex @code{CONVFMT} variable, array subscripts and +@cindex @code{CONVFMT} variable, and array subscripts An important aspect to remember about arrays is that @emph{array subscripts are always strings}. When a numeric value is used as a subscript, it is converted to a string value before being used for subscripting @@ -14929,7 +15137,8 @@ string value from @code{xyz}---this time @code{"12.15"}---because the value of @code{CONVFMT} only allows two significant digits. This test fails, since @code{"12.15"} is different from @code{"12.153"}. 
-@cindex converting, during subscripting +@cindex converting integer array subscripts +@cindex integer array indices According to the rules for conversions (@pxref{Conversion}), integer values are always converted to strings as integers, no matter what the @@ -15019,11 +15228,11 @@ Even though it is somewhat unusual, the null string if @option{--lint} is provided on the command line (@pxref{Options}). -@node Multi-dimensional +@node Multidimensional @section Multidimensional Arrays @menu -* Multi-scanning:: Scanning multidimensional arrays. +* Multiscanning:: Scanning multidimensional arrays. @end menu @cindex subscripts in arrays, multidimensional @@ -15035,7 +15244,7 @@ languages, including @command{awk}) to refer to an element of a two-dimensional array named @code{grid} is with @code{grid[@var{x},@var{y}]}. -@cindex @code{SUBSEP} variable, multidimensional arrays +@cindex @code{SUBSEP} variable, and multidimensional arrays Multidimensional arrays are supported in @command{awk} through concatenation of indices into one string. @command{awk} converts the indices into strings @@ -15067,6 +15276,7 @@ combined strings that are ambiguous. Suppose that @code{SUBSEP} is "b@@c"]}} are indistinguishable because both are actually stored as @samp{foo["a@@b@@c"]}. +@cindex @code{in} operator, index existence in multidimensional arrays To test whether a particular index sequence exists in a multidimensional array, use the same operator (@code{in}) that is used for single dimensional arrays. Write the whole sequence of indices @@ -15121,7 +15331,7 @@ the program produces the following output: 3 2 1 6 @end example -@node Multi-scanning +@node Multiscanning @subsection Scanning Multidimensional Arrays There is no special @code{for} statement for scanning a @@ -15132,6 +15342,7 @@ multidimensional @emph{way of accessing} an array. 
@cindex subscripts in arrays, multidimensional, scanning @cindex arrays, multidimensional, scanning +@cindex scanning multidimensional arrays However, if your program has an array that is always accessed as multidimensional, you can get the effect of scanning it by combining the scanning @code{for} statement @@ -15173,6 +15384,7 @@ separate indices is recovered. @node Arrays of Arrays @section Arrays of Arrays +@cindex arrays of arrays @command{gawk} goes beyond standard @command{awk}'s multidimensional array access and provides true arrays of @@ -15432,6 +15644,7 @@ two arguments 11 and 10. @node Numeric Functions @subsection Numeric Functions +@cindex numeric functions The following list describes all of the built-in functions that work with numbers. @@ -15439,22 +15652,26 @@ Optional parameters are enclosed in square brackets@w{ ([ ]):} @table @code @item atan2(@var{y}, @var{x}) -@cindex @code{atan2()} function +@cindexawkfunc{atan2} +@cindex arctangent Return the arctangent of @code{@var{y} / @var{x}} in radians. You can use @samp{pi = atan2(0, -1)} to retrieve the value of @value{PI}. @item cos(@var{x}) -@cindex @code{cos()} function +@cindexawkfunc{cos} +@cindex cosine Return the cosine of @var{x}, with @var{x} in radians. @item exp(@var{x}) -@cindex @code{exp()} function +@cindexawkfunc{exp} +@cindex exponent Return the exponential of @var{x} (@code{e ^ @var{x}}) or report an error if @var{x} is out of range. The range of values @var{x} can have depends on your machine's floating-point representation. @item int(@var{x}) -@cindex @code{int()} function +@cindexawkfunc{int} +@cindex round to nearest integer Return the nearest integer to @var{x}, located between @var{x} and zero and truncated toward zero. @@ -15462,12 +15679,13 @@ For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)} is @minus{}3, and @code{int(-3)} is @minus{}3 as well. 
@item log(@var{x}) -@cindex @code{log()} function +@cindexawkfunc{log} +@cindex logarithm Return the natural logarithm of @var{x}, if @var{x} is positive; otherwise, report an error. @item rand() -@cindex @code{rand()} function +@cindexawkfunc{rand} @cindex random numbers, @code{rand()}/@code{srand()} functions Return a random number. The values of @code{rand()} are uniformly distributed between zero and one. @@ -15509,7 +15727,7 @@ function roll(n) @{ return 1 + int(rand() * n) @} @} @end example -@cindex numbers, random +@cindex seeding random number generator @cindex random numbers, seed of @quotation CAUTION In most @command{awk} implementations, including @command{gawk}, @@ -15525,17 +15743,19 @@ use @code{srand()}. @end quotation @item sin(@var{x}) -@cindex @code{sin()} function +@cindexawkfunc{sin} +@cindex sine Return the sine of @var{x}, with @var{x} in radians. @item sqrt(@var{x}) -@cindex @code{sqrt()} function +@cindexawkfunc{sqrt} +@cindex square root Return the positive square root of @var{x}. @command{gawk} prints a warning message if @var{x} is negative. Thus, @code{sqrt(4)} is 2. @item srand(@r{[}@var{x}@r{]}) -@cindex @code{srand()} function +@cindexawkfunc{srand} Set the starting point, or seed, for generating random numbers to the value @var{x}. @@ -15565,16 +15785,18 @@ sequences of random numbers. @node String Functions @subsection String-Manipulation Functions +@cindex string-manipulation functions -The functions in this @value{SECTION} look at or change the text of one or more -strings. -@code{gawk} understands locales (@pxref{Locales}), and does all string processing in terms of -@emph{characters}, not @emph{bytes}. This distinction is particularly important -to understand for locales where one character -may be represented by multiple bytes. 
Thus, for example, @code{length()} -returns the number of characters in a string, and not the number of bytes -used to represent those characters, Similarly, @code{index()} works with -character indices, and not byte indices. +The functions in this @value{SECTION} look at or change the text of one +or more strings. + +@code{gawk} understands locales (@pxref{Locales}), and does all +string processing in terms of @emph{characters}, not @emph{bytes}. +This distinction is particularly important to understand for locales +where one character may be represented by multiple bytes. Thus, for +example, @code{length()} returns the number of characters in a string, +and not the number of bytes used to represent those characters. Similarly, +@code{index()} works with character indices, and not byte indices. In the following list, optional parameters are enclosed in square brackets@w{ ([ ]).} Several functions perform string substitution; the full discussion is @@ -15591,30 +15813,34 @@ pound sign@w{ (@samp{#}):} @table @code @item asort(@var{source} @r{[}, @var{dest} @r{[}, @var{how} @r{]} @r{]}) # +@itemx asorti(@var{source} @r{[}, @var{dest} @r{[}, @var{how} @r{]} @r{]}) # +@cindexgawkfunc{asorti} +@cindex sort array @cindex arrays, elements, retrieving number of -@cindex @code{asort()} function (@command{gawk}) +@cindexgawkfunc{asort} +@cindex sort array indices +These two functions are similar in behavior, so they are described +together. + +@quotation NOTE +The following description ignores the third argument, @var{how}, since it +requires understanding features that we have not discussed yet. Thus, +the discussion here is a deliberate simplification. (We do provide all +the details later on: @xref{Array Sorting Functions}, for the full story.) +@end quotation + +Both functions return the number of elements in the array @var{source}. 
+For @command{asort()}, @command{gawk} sorts the values of @var{source} +and replaces the indices of the sorted values of @var{source} with +sequential integers starting with one. If the optional array @var{dest} +is specified, then @var{source} is duplicated into @var{dest}. @var{dest} +is then sorted, leaving the indices of @var{source} unchanged. + @cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex @code{IGNORECASE} variable -Return the number of elements in the array @var{source}. -@command{gawk} sorts the contents of @var{source} -and replaces the indices -of the sorted values of @var{source} with sequential -integers starting with one. If the optional array @var{dest} is specified, -then @var{source} is duplicated into @var{dest}. @var{dest} is then -sorted, leaving the indices of @var{source} unchanged. The optional third -argument @var{how} is a string which controls the rule for comparing values, -and the sort direction. A single space is required between the -comparison mode, @samp{string} or @samp{number}, and the direction specification, -@samp{ascending} or @samp{descending}. You can omit direction and/or mode -in which case it will default to @samp{ascending} and @samp{string}, respectively. -An empty string "" is the same as the default @code{"ascending string"} -for the value of @var{how}. If the @samp{source} array contains subarrays as values, -they will come out last(first) in the @samp{dest} array for @samp{ascending}(@samp{descending}) -order specification. The value of @code{IGNORECASE} affects the sorting. -The third argument can also be a user-defined function name in which case -the value returned by the function is used to order the array elements -before constructing the result array. -@xref{Array Sorting Functions}, for more information. +When comparing strings, @code{IGNORECASE} affects the sorting +(@pxref{Array Sorting Functions}). 
If the +@var{source} array contains subarrays as values (@pxref{Arrays of +Arrays}), they will come last, after all scalar values. For example, if the contents of @code{a} are as follows: @@ -15640,32 +15866,24 @@ a[2] = "de" a[3] = "sac" @end example -In order to reverse the direction of the sorted results in the above example, -@code{asort()} can be called with three arguments as follows: +The @code{asorti()} function works similarly to @code{asort()}, however, +the @emph{indices} are sorted, instead of the values. Thus, in the +previous example, starting with the same initial set of indices and +values in @code{a}, calling @samp{asorti(a)} would yield: @example -asort(a, a, "descending") +a[1] = "first" +a[2] = "last" +a[3] = "middle" @end example -The @code{asort()} function is described in more detail in -@ref{Array Sorting Functions}. -@code{asort()} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options}). - -@item asorti(@var{source} @r{[}, @var{dest} @r{[}, @var{how} @r{]} @r{]}) # -@cindex @code{asorti()} function (@command{gawk}) -Return the number of elements in the array @var{source}. -It works similarly to @code{asort()}, however, the @emph{indices} -are sorted, instead of the values. (Here too, -@code{IGNORECASE} affects the sorting.) - -The @code{asorti()} function is described in more detail in -@ref{Array Sorting Functions}. -@code{asorti()} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options}). +@code{asort()} and @code{asorti()} are @command{gawk} extensions; they +are not available in compatibility mode (@pxref{Options}). @item gensub(@var{regexp}, @var{replacement}, @var{how} @r{[}, @var{target}@r{]}) # -@cindex @code{gensub()} function (@command{gawk}) +@cindexgawkfunc{gensub} +@cindex search and replace in strings +@cindex substitute in string Search the target string @var{target} for matches of the regular expression @var{regexp}. 
If @var{how} is a string beginning with @samp{g} or @samp{G} (short for ``global''), then replace all matches of @var{regexp} with @@ -15674,7 +15892,7 @@ which match of @var{regexp} to replace. If no @var{target} is supplied, use @code{$0}. It returns the modified string as the result of the function and the original target string is @emph{not} changed. -@code{gensub()} is a general substitution function. It's purpose is +@code{gensub()} is a general substitution function. Its purpose is to provide more features than the standard @code{sub()} and @code{gsub()} functions. @@ -15728,7 +15946,7 @@ is the original unchanged value of @var{target}. in compatibility mode (@pxref{Options}). @item gsub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) -@cindex @code{gsub()} function +@cindexawkfunc{gsub} Search @var{target} for @emph{all} of the longest, leftmost, @emph{nonoverlapping} matching substrings it can find and replace them with @var{replacement}. @@ -15750,8 +15968,9 @@ As in @code{sub()}, the characters @samp{&} and @samp{\} are special, and the third argument must be assignable. @item index(@var{in}, @var{find}) -@cindex @code{index()} function -@cindex searching +@cindexawkfunc{index} +@cindex search in string +@cindex find substring in string Search the string @var{in} for the first occurrence of the string @var{find}, and return the position in characters where that occurrence begins in the string @var{in}. Consider the following example: @@ -15768,7 +15987,9 @@ If @var{find} is not found, @code{index()} returns zero. It is a fatal error to use a regexp constant for @var{find}. @item length(@r{[}@var{string}@r{]}) -@cindex @code{length()} function +@cindexawkfunc{length} +@cindex string length +@cindex length of string Return the number of characters in @var{string}. If @var{string} is a number, the length of the digit string representing that number is returned. For example, @code{length("abcde")} is five. 
By @@ -15776,6 +15997,8 @@ contrast, @code{length(15 * 35)} works out to three. In this example, 15 * 35 = 525, and 525 is then converted to the string @code{"525"}, which has three characters. +@cindex length of input record +@cindex input record, length of If no argument is supplied, @code{length()} returns the length of @code{$0}. @c @cindex historical features @@ -15814,6 +16037,8 @@ warning about this. @cindex common extensions, @code{length()} applied to an array @cindex extensions, common@comma{} @code{length()} applied to an array @cindex differences between @command{gawk} and @command{awk} +@cindex number of array elements +@cindex array, number of elements With @command{gawk} and several other @command{awk} implementations, when given an array argument, the @code{length()} function returns the number of elements in the array. @value{COMMONEXT} @@ -15827,7 +16052,9 @@ If @option{--posix} is supplied, using an array argument is a fatal error (@pxref{Arrays}). @item match(@var{string}, @var{regexp} @r{[}, @var{array}@r{]}) -@cindex @code{match()} function +@cindexawkfunc{match} +@cindex string, regular expression match +@cindex match regexp in string Search @var{string} for the longest, leftmost substring matched by the regular expression, @var{regexp} and return the character position, or @dfn{index}, @@ -15942,7 +16169,8 @@ The @var{array} argument to @code{match()} is a using a third argument is a fatal error. @item patsplit(@var{string}, @var{array} @r{[}, @var{fieldpat} @r{[}, @var{seps} @r{]} @r{]}) # -@cindex @code{patsplit()} function (@command{gawk}) +@cindexgawkfunc{patsplit} +@cindex split string into array Divide @var{string} into pieces defined by @var{fieldpat} and store the pieces in @var{array} and the separator strings in the @@ -15973,7 +16201,7 @@ The @code{patsplit()} function is a it is not available. 
@item split(@var{string}, @var{array} @r{[}, @var{fieldsep} @r{[}, @var{seps} @r{]} @r{]}) -@cindex @code{split()} function +@cindexawkfunc{split} Divide @var{string} into pieces separated by @var{fieldsep} and store the pieces in @var{array} and the separator strings in the @var{seps} array. The first piece is stored in @@ -16002,7 +16230,7 @@ split("cul-de-sac", a, "-", seps) @end example @noindent -@cindex strings, splitting +@cindex strings splitting, example splits the string @samp{cul-de-sac} into three fields using @samp{-} as the separator. It sets the contents of the array @code{a} as follows: @@ -16058,7 +16286,8 @@ If @var{string} does not match @var{fieldsep} at all (but is not null), @var{string}. @item sprintf(@var{format}, @var{expression1}, @dots{}) -@cindex @code{sprintf()} function +@cindexawkfunc{sprintf} +@cindex formatting strings Return (without printing) the string that @code{printf} would have printed out with the same arguments (@pxref{Printf}). @@ -16071,7 +16300,8 @@ pival = sprintf("pi = %.2f (approx.)", 22/7) @noindent assigns the string @w{@samp{pi = 3.14 (approx.)}} to the variable @code{pival}. -@cindex @code{strtonum()} function (@command{gawk}) +@cindexgawkfunc{strtonum} +@cindex convert string to number @item strtonum(@var{str}) # Examine @var{str} and return its numeric value. If @var{str} begins with a leading @samp{0}, @code{strtonum()} assumes that @var{str} @@ -16094,12 +16324,12 @@ you use the @option{--non-decimal-data} option, which isn't recommended. Note also that @code{strtonum()} uses the current locale's decimal point for recognizing numbers (@pxref{Locales}). -@cindex differences in @command{awk} and @command{gawk}, @code{strtonum()} function (@command{gawk}) @code{strtonum()} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). 
@item sub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) -@cindex @code{sub()} function +@cindexawkfunc{sub} +@cindex replace in string Search @var{target}, which is treated as a string, for the leftmost, longest substring matched by the regular expression @var{regexp}. Modify the entire string @@ -16199,7 +16429,8 @@ Finally, if the @var{regexp} is not a regexp constant, it is converted into a string, and then the value of that string is treated as the regexp to match. @item substr(@var{string}, @var{start} @r{[}, @var{length}@r{]}) -@cindex @code{substr()} function +@cindexawkfunc{substr} +@cindex substring Return a @var{length}-character-long substring of @var{string}, starting at character number @var{start}. The first character of a string is character number one.@footnote{This is different from @@ -16213,6 +16444,7 @@ suffix is also returned if @var{length} is greater than the number of characters remaining in the string, counting from character @var{start}. +@cindex Brian Kernighan's @command{awk} If @var{start} is less than one, @code{substr()} treats it as if it was one. (POSIX doesn't specify what to do in this case: Brian Kernighan's @command{awk} acts this way, and therefore @command{gawk} @@ -16255,16 +16487,18 @@ string = substr(string, 1, 2) "CDE" substr(string, 6) @end example @cindex case sensitivity, converting case -@cindex converting, case +@cindex strings, converting letter case @item tolower(@var{string}) -@cindex @code{tolower()} function +@cindexawkfunc{tolower} +@cindex convert string to lower case Return a copy of @var{string}, with each uppercase character in the string replaced with its corresponding lowercase character. Nonalphabetic characters are left unchanged. For example, @code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}. 
@item toupper(@var{string}) -@cindex @code{toupper()} function +@cindexawkfunc{toupper} +@cindex convert string to upper case Return a copy of @var{string}, with each lowercase character in the string replaced with its corresponding uppercase character. Nonalphabetic characters are left unchanged. For example, @@ -16292,6 +16526,7 @@ and builds an internal copy of it that can be executed. Then there is the runtime level, which is when @command{awk} actually scans the replacement string to determine what to generate. +@cindex Brian Kernighan's @command{awk} At both levels, @command{awk} looks for a defined set of characters that can come after a backslash. At the lexical level, it looks for the escape sequences listed in @ref{Escape Sequences}. @@ -16561,17 +16796,17 @@ _bigskip} The only case where the difference is noticeable is the last one: @samp{\\\\} is seen as @samp{\\} and produces @samp{\} instead of @samp{\\}. -Starting with @value{PVERSION} 3.1.4, @command{gawk} followed the POSIX rules +Starting with version 3.1.4, @command{gawk} followed the POSIX rules when @option{--posix} is specified (@pxref{Options}). Otherwise, it continued to follow the 1996 proposed rules, since that had been its behavior for many years. -When @value{PVERSION} 4.0.0 was released, the @command{gawk} maintainer +When version 4.0.0 was released, the @command{gawk} maintainer made the POSIX rules the default, breaking well over a decade's worth of backwards compatibility.@footnote{This was rather naive of him, despite there being a note in this section indicating that the next major version would move to the POSIX rules.} Needless to say, this was a bad idea, -and as of @value{PVERSION} 4.0.1, @command{gawk} resumed its historical +and as of version 4.0.1, @command{gawk} resumed its historical behavior, and only follows the POSIX rules when @option{--posix} is given. The rules for @code{gensub()} are considerably simpler. 
At the runtime @@ -16689,14 +16924,16 @@ Although this makes a certain amount of sense, it can be surprising. @node I/O Functions @subsection Input/Output Functions +@cindex input/output functions The following functions relate to input/output (I/O). Optional parameters are enclosed in square brackets ([ ]): @table @code @item close(@var{filename} @r{[}, @var{how}@r{]}) -@cindex @code{close()} function +@cindexawkfunc{close} @cindex files, closing +@cindex close file or coprocess Close the file @var{filename} for input or output. Alternatively, the argument may be a shell command that was used for creating a coprocess, or for redirecting to or from a pipe; then the coprocess or pipe is closed. @@ -16713,7 +16950,8 @@ not matter. which discusses this feature in more detail and gives an example. @item fflush(@r{[}@var{filename}@r{]}) -@cindex @code{fflush()} function +@cindexawkfunc{fflush} +@cindex flush buffered output Flush any buffered output associated with @var{filename}, which is either a file opened for writing or a shell command for redirecting output to a pipe or coprocess. @@ -16731,11 +16969,12 @@ This is the purpose of the @code{fflush()} function---@command{gawk} also buffers its output and the @code{fflush()} function forces @command{gawk} to flush its buffers. -@code{fflush()} was added to Brian Kernighan's -version of @command{awk} in 1994. -For over two decades, it was not part of the POSIX standard. -As of December, 2012, it was accepted for -inclusion into the POSIX standard. +@cindex extensions, common@comma{} @code{fflush()} function +@cindex Brian Kernighan's @command{awk} +@code{fflush()} was added to Brian Kernighan's version of @command{awk} in +April of 1992. For two decades, it was not part of the POSIX standard. +As of December, 2012, it was accepted for inclusion into the POSIX +standard. See @uref{http://austingroupbugs.net/view.php?id=634, the Austin Group website}. 
POSIX standardizes @code{fflush()} as follows: If there @@ -16771,7 +17010,8 @@ or if @var{filename} is not an open file, pipe, or coprocess. In such a case, @code{fflush()} returns @minus{}1, as well. @item system(@var{command}) -@cindex @code{system()} function +@cindexawkfunc{system} +@cindex invoke shell command @cindex interacting with other programs Execute the operating-system command @var{command} and then return to the @command{awk} program. @@ -16802,7 +17042,7 @@ close("/bin/sh") @noindent @cindex troubleshooting, @code{system()} function -@cindex @code{--sandbox} option, disabling @code{system()} function +@cindex @option{--sandbox} option, disabling @code{system()} function However, if your @command{awk} program is interactive, @code{system()} is useful for running large self-contained programs, such as a shell or an editor. @@ -16843,7 +17083,7 @@ $ @kbd{awk '@{ print $1 + $2 @}'} @print{} 2 @kbd{2 3} @print{} 5 -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @noindent @@ -16854,13 +17094,13 @@ with this example: $ @kbd{awk '@{ print $1 + $2 @}' | cat} @kbd{1 1} @kbd{2 3} -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @print{} 2 @print{} 5 @end example @noindent -Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because +Here, no output is printed until after the @kbd{Ctrl-d} is typed, because it is all buffered and sent down the pipe to @command{cat} in one shot. @docbook @@ -16894,7 +17134,7 @@ $ @kbd{awk '@{ print $1 + $2 @}'} @print{} 2 @kbd{2 3} @print{} 5 -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @noindent @@ -16905,13 +17145,13 @@ with this example: $ @kbd{awk '@{ print $1 + $2 @}' | cat} @kbd{1 1} @kbd{2 3} -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @print{} 2 @print{} 5 @end example @noindent -Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because +Here, no output is printed until after the @kbd{Ctrl-d} is typed, because it is all buffered and sent down the pipe to @command{cat} in one shot. 
@end cartouche @end ifnotdocbook @@ -17046,6 +17286,7 @@ you would see the latter (undesirable) output. @node Time Functions @subsection Time Functions +@cindex time functions @c STARTOFRANGE tst @cindex timestamps @@ -17065,7 +17306,18 @@ it is the number of seconds since 1970-01-01 00:00:00 UTC, not counting leap seconds.@footnote{@xref{Glossary}, especially the entries ``Epoch'' and ``UTC.''} All known POSIX-compliant systems support timestamps from 0 through -@math{2^{31} - 1}, which is sufficient to represent times through +@iftex +@math{2^{31} - 1}, +@end iftex +@ifnottex +@ifnotdocbook +2^31 - 1, +@end ifnotdocbook +@end ifnottex +@docbook +2<superscript>31</superscript> − 1, @c +@end docbook +which is sufficient to represent times through 2038-01-19 03:14:07 UTC. Many systems support a wider range of timestamps, including negative timestamps that represent times before the epoch. @@ -17084,7 +17336,8 @@ Optional parameters are enclosed in square brackets ([ ]): @table @code @item mktime(@var{datespec}) -@cindex @code{mktime()} function (@command{gawk}) +@cindexgawkfunc{mktime} +@cindex generate time values Turn @var{datespec} into a timestamp in the same form as is returned by @code{systime()}. It is similar to the function of the same name in ISO C. The argument, @var{datespec}, is a string of the form @@ -17114,7 +17367,8 @@ is out of range, @code{mktime()} returns @minus{}1. @cindex @code{PROCINFO} array @item strftime(@r{[}@var{format} @r{[}, @var{timestamp} @r{[}, @var{utc-flag}@r{]]]}) @c STARTOFRANGE strf -@cindex @code{strftime()} function (@command{gawk}) +@cindexgawkfunc{strftime} +@cindex format time string Format the time specified by @var{timestamp} based on the contents of the @var{format} string and return the result. It is similar to the function of the same name in ISO C. @@ -17131,11 +17385,12 @@ The default string value is @code{@w{"%a %b %e %H:%M:%S %Z %Y"}}. 
This format string produces output that is equivalent to that of the @command{date} utility. You can assign a new value to @code{PROCINFO["strftime"]} to -change the default format. +change the default format; see below for the various format directives. @item systime() -@cindex @code{systime()} function (@command{gawk}) +@cindexgawkfunc{systime} @cindex timestamps +@cindex current system time Return the current time as the number of seconds since the system epoch. On POSIX systems, this is the number of seconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. @@ -17429,6 +17684,7 @@ gawk 'BEGIN @{ @node Bitwise Functions @subsection Bit-Manipulation Functions +@cindex bit-manipulation functions @c STARTOFRANGE bit @cindex bitwise, operations @c STARTOFRANGE and @@ -17440,8 +17696,8 @@ gawk 'BEGIN @{ @c STARTOFRANGE opbit @cindex operations, bitwise @quotation -@i{I can explain it for you, but I can't understand it for you.}@* -Anonymous +@i{I can explain it for you, but I can't understand it for you.} +@author Anonymous @end quotation Many languages provide the ability to perform @dfn{bitwise} operations @@ -17591,27 +17847,33 @@ bitwise operations just described. They are: @cindex @command{gawk}, bitwise operations in @table @code -@cindex @code{and()} function (@command{gawk}) +@cindexgawkfunc{and} +@cindex bitwise AND @item and(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) Return the bitwise AND of the arguments. There must be at least two. -@cindex @code{compl()} function (@command{gawk}) +@cindexgawkfunc{compl} +@cindex bitwise complement @item compl(@var{val}) Return the bitwise complement of @var{val}. -@cindex @code{lshift()} function (@command{gawk}) +@cindexgawkfunc{lshift} +@cindex left shift @item lshift(@var{val}, @var{count}) Return the value of @var{val}, shifted left by @var{count} bits. 
-@cindex @code{or()} function (@command{gawk}) +@cindexgawkfunc{or} +@cindex bitwise OR @item or(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) Return the bitwise OR of the arguments. There must be at least two. -@cindex @code{rshift()} function (@command{gawk}) +@cindexgawkfunc{rshift} +@cindex right shift @item rshift(@var{val}, @var{count}) Return the value of @var{val}, shifted right by @var{count} bits. -@cindex @code{xor()} function (@command{gawk}) +@cindexgawkfunc{xor} +@cindex bitwise XOR @item xor(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) Return the bitwise XOR of the arguments. There must be at least two. @end table @@ -17703,6 +17965,7 @@ $ @kbd{gawk -f testbits.awk} @cindex strings, converting @cindex numbers, converting @cindex converting, numbers to strings +@cindex number as string of bits The @code{bits2str()} function turns a binary number into a string. The number @code{1} represents a binary value where the rightmost bit is set to 1. Using this mask, @@ -17738,7 +18001,8 @@ that traverses every element of a true multidimensional array (@pxref{Arrays of Arrays}). @table @code -@cindex @code{isarray()} function (@command{gawk}) +@cindexgawkfunc{isarray} +@cindex scalar or array @item isarray(@var{x}) Return a true value if @var{x} is an array. Otherwise return false. @end table @@ -17746,7 +18010,7 @@ Return a true value if @var{x} is an array. Otherwise return false. @code{isarray()} is meant for use in two circumstances. The first is when traversing a multidimensional array: you can test if an element is itself an array or not. The second is inside the body of a user-defined function -(not discussed yet; @pxref{User-defined}), to test if a paramater is an +(not discussed yet; @pxref{User-defined}), to test if a parameter is an array or not. Note, however, that using @code{isarray()} at the global level to test @@ -17760,6 +18024,7 @@ will end up turning it into a scalar. 
@subsection String-Translation Functions @cindex @command{gawk}, string-translation functions @cindex functions, string-translation +@cindex string-translation functions @cindex internationalization @cindex @command{awk} programs, internationalizing @@ -17771,7 +18036,8 @@ for the full story. Optional parameters are enclosed in square brackets ([ ]): @table @code -@cindex @code{bindtextdomain()} function (@command{gawk}) +@cindexgawkfunc{bindtextdomain} +@cindex set directory of message catalogs @item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) Set the directory in which @command{gawk} will look for message translation files, in case they @@ -17784,14 +18050,15 @@ If @var{directory} is the null string (@code{""}), then @code{bindtextdomain()} returns the current binding for the given @var{domain}. -@cindex @code{dcgettext()} function (@command{gawk}) +@cindexgawkfunc{dcgettext} +@cindex translate string @item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) Return the translation of @var{string} in text domain @var{domain} for locale category @var{category}. The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. The default value for @var{category} is @code{"LC_MESSAGES"}. -@cindex @code{dcngettext()} function (@command{gawk}) +@cindexgawkfunc{dcngettext} @item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) Return the plural form used for @var{number} of the translation of @var{string1} and @var{string2} in text domain @@ -17808,7 +18075,7 @@ The default value for @var{category} is @code{"LC_MESSAGES"}. @section User-Defined Functions @c STARTOFRANGE udfunc -@cindex user-defined, functions +@cindex user-defined functions @c STARTOFRANGE funcud @cindex functions, user-defined Complicated @command{awk} programs can often be simplified by defining @@ -17867,7 +18134,7 @@ have a parameter with the same name as the function itself. 
In addition, according to the POSIX standard, function parameters cannot have the same name as one of the special built-in variables (@pxref{Built-in Variables}. Not all versions of @command{awk} -enforce this restriction. +enforce this restriction.) The @var{body-of-function} consists of @command{awk} statements. It is the most important part of the definition, because it says what the function @@ -17894,6 +18161,7 @@ conventional to place some extra space between the arguments and the local variables, in order to document how your function is supposed to be used. @cindex variables, shadowing +@cindex shadowing of variable values During execution of the function body, the arguments and local variable values hide, or @dfn{shadow}, any variables of the same names used in the rest of the program. The shadowed variables are not accessible in the @@ -17914,7 +18182,7 @@ function. When this happens, we say the function is @dfn{recursive}. The act of a function calling itself is called @dfn{recursion}. All the built-in functions return a value to their caller. -User-defined functions can do also, using the @code{return} statement, +User-defined functions can do so also, using the @code{return} statement, which is described in detail in @ref{Return Statement}. Many of the subsequent examples in this @value{SECTION} use the @code{return} statement. @@ -17952,6 +18220,7 @@ keyword @code{function} when defining a function. @node Function Example @subsection Function Definition Examples +@cindex function definition example Here is an example of a user-defined function, called @code{myprint()}, that takes a number and prints it in a specific format: @@ -18006,7 +18275,8 @@ Instead of having to repeat this loop everywhere that you need to clear out an array, your program can just call @code{delarray}. (This guarantees portability. The use of @samp{delete @var{array}} to delete -the contents of an entire array is a nonstandard extension.) 
+the contents of an entire array is a recent@footnote{Late in 2012.} +addition to the POSIX standard.) The following is an example of a recursive function. It takes a string as an input parameter and returns the string in backwards order. @@ -18062,7 +18332,10 @@ function ctime(ts, format) @subsection Calling User-Defined Functions @c STARTOFRANGE fudc -This section describes how to call a user-defined function. +@cindex functions, user-defined, calling +@dfn{Calling a function} means causing the function to run and do its job. +A function call is an expression and its value is the value returned by +the function. @menu * Calling A Function:: Don't use spaces. @@ -18073,11 +18346,6 @@ This section describes how to call a user-defined function. @node Calling A Function @subsubsection Writing A Function Call -@cindex functions, user-defined, calling -@dfn{Calling a function} means causing the function to run and do its job. -A function call is an expression and its value is the value returned by -the function. - A function call consists of the function name followed by the arguments in parentheses. @command{awk} expressions are what you write in the call for the arguments. Each time the call is executed, these @@ -18101,8 +18369,8 @@ an error. @node Variable Scope @subsubsection Controlling Variable Scope -@cindex local variables -@cindex variables, local +@cindex local variables, in a function +@cindex variables, local to a function There is no way to make a variable local to a @code{@{ @dots{} @}} block in @command{awk}, but you can make a variable local to a function. 
It is good practice to do so whenever a variable is needed only in that @@ -18547,7 +18815,7 @@ character: @example the_func = "sum" -result = @@the_func() # calls the `sum' function +result = @@the_func() # calls the sum() function @end example Here is a full program that processes the previously shown data, @@ -18668,8 +18936,9 @@ We can do something similar using @command{gawk}, like this: @ignore @c file eg/lib/quicksort.awk # -# Arnold Robbins, arnold@skeeve.com, Public Domain +# Arnold Robbins, arnold@@skeeve.com, Public Domain # January 2009 + @c endfile @end ignore @@ -18742,7 +19011,7 @@ or equal to), which yields data sorted in descending order. Next comes a sorting function. It is parameterized with the starting and ending field numbers and the comparison function. It builds an array with -the data and calls @code{quicksort} appropriately, and then formats the +the data and calls @code{quicksort()} appropriately, and then formats the results as a single string: @example @@ -18880,9 +19149,11 @@ it allows you to encapsulate algorithms and program tasks in a single place. It simplifies programming, making program development more manageable, and making programs more readable. -In their seminal 1976 book, @cite{Software Tools}@footnote{Sadly, over 35 +@cindex Kernighan, Brian +@cindex Plauger, P.J.@: +In their seminal 1976 book, @cite{Software Tools},@footnote{Sadly, over 35 years later, many of the lessons taught by this book have yet to be -learned by a vast number of practicing programmers.}, Brian Kernighan +learned by a vast number of practicing programmers.} Brian Kernighan and P.J.@: Plauger wrote: @quotation @@ -19009,7 +19280,7 @@ with the user's program. 
@cindex underscore (@code{_}), in names of private variables In addition, several of the library functions use a prefix that helps indicate what function or set of functions use the variables---for example, -@code{_pw_byname} in the user database routines +@code{_pw_byname()} in the user database routines (@pxref{Passwd Functions}). This convention is recommended, since it even further decreases the chance of inadvertent conflict among variable names. Note that this @@ -19028,7 +19299,7 @@ The leading capital letter indicates that it is global, while the fact that the variable name is not all capital letters indicates that the variable is not one of @command{awk}'s built-in variables, such as @code{FS}. -@cindex @code{--dump-variables} option +@cindex @option{--dump-variables} option, using for library functions It is also important that @emph{all} variables in library functions that do not need to save state are, in fact, declared local.@footnote{@command{gawk}'s @option{--dump-variables} command-line @@ -19081,6 +19352,7 @@ programming use. vice versa. * Join Function:: A function to join an array into a string. * Getlocaltime Function:: A function to get formatted times. +* Readfile Function:: A function to read an entire file at once. @end menu @node Strtonum Function @@ -19296,7 +19568,7 @@ An @code{END} rule is automatically added to the program calling @code{assert()}. Normally, if a program consists of just a @code{BEGIN} rule, the input files and/or standard input are not read. However, now that the program has an @code{END} rule, @command{awk} -attempts to read the input @value{DF}s or standard input +attempts to read the input data files or standard input (@pxref{Using BEGIN/END}), most likely causing the program to hang as it waits for input. @@ -19322,9 +19594,9 @@ with an @code{exit} statement. The way @code{printf} and @code{sprintf()} (@pxref{Printf}) perform rounding often depends upon the system's C @code{sprintf()} -subroutine. 
On many machines, @code{sprintf()} rounding is ``unbiased,'' -which means it doesn't always round a trailing @samp{.5} up, contrary -to naive expectations. In unbiased rounding, @samp{.5} rounds to even, +subroutine. On many machines, @code{sprintf()} rounding is @dfn{unbiased}, +which means it doesn't always round a trailing .5 up, contrary +to naive expectations. In unbiased rounding, .5 rounds to even, rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4. This means that if you are using a format that does rounding (e.g., @code{"%.0f"}), you should check what your system does. The following function does @@ -19373,7 +19645,7 @@ function round(x, ival, aval, fraction) @c don't include test harness in the file that gets installed # test harness -@{ print $0, round($0) @} +# @{ print $0, round($0) @} @end example @node Cliff Random Function @@ -19440,6 +19712,7 @@ reason to build them into the @command{awk} interpreter: @cindex @code{ord()} user-defined function @cindex @code{chr()} user-defined function +@cindex @code{_ord_init()} user-defined function @example @c file eg/lib/ord.awk # ord.awk --- do ord and chr @@ -19486,8 +19759,9 @@ function _ord_init( low, high, i, t) @cindex character sets (machine character encodings) @cindex ASCII @cindex EBCDIC +@cindex Unicode @cindex mark parity -Some explanation of the numbers used by @code{chr} is worthwhile. +Some explanation of the numbers used by @code{_ord_init()} is worthwhile. The most prominent character set in use today is ASCII.@footnote{This is changing; many systems use Unicode, a very large character set that includes ASCII as a subset. On systems with full Unicode support, @@ -19498,7 +19772,7 @@ Although an defines characters that use the values from 0 to 127.@footnote{ASCII has been extended in many countries to use the values from 128 to 255 for country-specific characters. 
If your system uses these extensions, -you can simplify @code{_ord_init} to loop from 0 to 255.} +you can simplify @code{_ord_init()} to loop from 0 to 255.} In the now distant past, at least one minicomputer manufacturer @c Pr1me, blech @@ -19705,17 +19979,92 @@ A more general design for the @code{getlocaltime()} function would have allowed the user to supply an optional timestamp value to use instead of the current time. +@node Readfile Function +@subsection Reading A Whole File At Once + +Often, it is convenient to have the entire contents of a file available +in memory as a single string. A straightforward but naive way to +do that might be as follows: + +@example +function readfile(file, tmp, contents) +@{ + if ((getline tmp < file) < 0) + return + + contents = tmp + while (getline tmp < file) > 0) + contents = contents RT tmp + + close(file) + return contents +@} +@end example + +This function reads from @code{file} one record at a time, building +up the full contents of the file in the local variable @code{contents}. +It works, but is not necessarily efficient. + +The following function, based on a suggestion by Denis Shirokov, +reads the entire contents of the named file in one shot: + +@cindex @code{readfile()} user-defined function +@example +@c file eg/lib/readfile.awk +# readfile.awk --- read an entire file at once +@c endfile +@ignore +@c file eg/lib/readfile.awk +# +# Original idea by Denis Shirokov, cosmogen@@gmail.com, April 2013 +# +@c endfile +@end ignore +@c file eg/lib/readfile.awk + +function readfile(file, tmp, save_rs) +@{ + save_rs = RS + RS = "^$" + getline tmp < file + close(file) + RS = save_rs + + return tmp +@} +@c endfile +@end example + +It works by setting @code{RS} to @samp{^$}, a regular expression that +will never match if the file has contents. @command{gawk} reads data from +the file into @code{tmp} attempting to match @code{RS}. 
The match fails +after each read, but fails quickly, such that @command{gawk} fills +@code{tmp} with the entire contents of the file. +(@xref{Records}, for information on @code{RT} and @code{RS}.) + +In the case that @code{file} is empty, the return value is the null +string. Thus calling code may use something like: + +@example +contents = readfile("/some/path") +if (length(contents) == 0) + # file was empty @dots{} +@end example + +This tests the result to see if it is empty or not. An equivalent +test would be @samp{contents == ""}. + @node Data File Management -@section @value{DDF} Management +@section Data File Management @c STARTOFRANGE dataf @cindex files, managing @c STARTOFRANGE libfdataf -@cindex libraries of @command{awk} functions, managing, @value{DF}s +@cindex libraries of @command{awk} functions, managing, data files @c STARTOFRANGE flibdataf -@cindex functions, library, managing @value{DF}s +@cindex functions, library, managing data files This @value{SECTION} presents functions that are useful for managing -command-line @value{DF}s. +command-line data files. @menu * Filetrans Function:: A function for handling data file transitions. @@ -19726,16 +20075,16 @@ command-line @value{DF}s. @end menu @node Filetrans Function -@subsection Noting @value{DDF} Boundaries +@subsection Noting Data File Boundaries -@cindex files, managing, @value{DF} boundaries +@cindex files, managing, data file boundaries @cindex files, initialization and cleanup The @code{BEGIN} and @code{END} rules are each executed exactly once at the beginning and end of your @command{awk} program, respectively (@pxref{BEGIN/END}). We (the @command{gawk} authors) once had a user who mistakenly thought that the -@code{BEGIN} rule is executed at the beginning of each @value{DF} and the -@code{END} rule is executed at the end of each @value{DF}. +@code{BEGIN} rule is executed at the beginning of each data file and the +@code{END} rule is executed at the end of each data file. 
When informed that this was not the case, the user requested that we add new special @@ -19746,7 +20095,7 @@ Adding these special patterns to @command{gawk} wasn't necessary; the job can be done cleanly in @command{awk} itself, as illustrated by the following library program. It arranges to call two user-supplied functions, @code{beginfile()} and -@code{endfile()}, at the beginning and end of each @value{DF}. +@code{endfile()}, at the beginning and end of each data file. Besides solving the problem in only nine(!) lines of code, it does so @emph{portably}; this works with any implementation of @command{awk}: @@ -19777,17 +20126,17 @@ This file must be loaded before the user's ``main'' program, so that the rule it supplies is executed first. This rule relies on @command{awk}'s @code{FILENAME} variable that -automatically changes for each new @value{DF}. The current @value{FN} is +automatically changes for each new data file. The current file name is saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does -not equal @code{_oldfilename}, then a new @value{DF} is being processed and +not equal @code{_oldfilename}, then a new data file is being processed and it is necessary to call @code{endfile()} for the old file. Because @code{endfile()} should only be called if a file has been processed, the program first checks to make sure that @code{_oldfilename} is not the null -string. The program then assigns the current @value{FN} to +string. The program then assigns the current file name to @code{_oldfilename} and calls @code{beginfile()} for the file. Because, like all @command{awk} variables, @code{_oldfilename} is initialized to the null string, this rule executes correctly even for the -first @value{DF}. +first data file. The program also supplies an @code{END} rule to do the final processing for the last file. 
Because this @code{END} rule comes before any @code{END} rules @@ -19796,7 +20145,7 @@ again the value of multiple @code{BEGIN} and @code{END} rules should be clear. @cindex @code{beginfile()} user-defined function @cindex @code{endfile()} user-defined function -If the same @value{DF} occurs twice in a row on the command line, then +If the same data file occurs twice in a row on the command line, then @code{endfile()} and @code{beginfile()} are not executed at the end of the first pass and at the beginning of the second pass. The following version solves the problem: @@ -19940,12 +20289,12 @@ The @code{rewind()} function also relies on the @code{nextfile} keyword (@pxref{Nextfile Statement}). @node File Checking -@subsection Checking for Readable @value{DDF}s +@subsection Checking for Readable Data Files -@cindex troubleshooting, readable @value{DF}s -@cindex readable @value{DF}s@comma{} checking +@cindex troubleshooting, readable data files +@cindex readable data files@comma{} checking @cindex files, skipping -Normally, if you give @command{awk} a @value{DF} that isn't readable, +Normally, if you give @command{awk} a data file that isn't readable, it stops with a fatal error. There are times when you might want to just ignore such files and keep going. You can do this by prepending the following program to your @command{awk} @@ -19994,15 +20343,15 @@ This is a by-product of @command{awk}'s implicit read-a-record-and-match-against-the-rules loop: when @command{awk} tries to read a record from an empty file, it immediately receives an end of file indication, closes the file, and proceeds on to the next -command-line @value{DF}, @emph{without} executing any user-level +command-line data file, @emph{without} executing any user-level @command{awk} program code. Using @command{gawk}'s @code{ARGIND} variable (@pxref{Built-in Variables}), it is possible to detect when an empty -@value{DF} has been skipped. Similar to the library file presented +data file has been skipped. 
Similar to the library file presented in @ref{Filetrans Function}, the following library file calls a function named @code{zerofile()} that the user must provide. The arguments passed are -the @value{FN} and the position in @code{ARGV} where it was found: +the file name and the position in @code{ARGV} where it was found: @cindex @code{zerofile.awk} program @example @@ -20090,15 +20439,15 @@ END @{ @end ignore @node Ignoring Assigns -@subsection Treating Assignments as @value{FFN}s +@subsection Treating Assignments as File Names @cindex assignments as filenames @cindex filenames, assignments as Occasionally, you might not want @command{awk} to process command-line variable assignments (@pxref{Assignment Options}). -In particular, if you have a @value{FN} that contain an @samp{=} character, -@command{awk} treats the @value{FN} as an assignment, and does not process it. +In particular, if you have a file name that contains an @samp{=} character, +@command{awk} treats the file name as an assignment, and does not process it. Some users have suggested an additional command-line option for @command{gawk} to disable command-line assignments. However, some simple programming with @@ -20142,7 +20491,7 @@ awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk * The function works by looping through the arguments. It prepends @samp{./} to any argument that matches the form -of a variable assignment, turning that argument into a @value{FN}. +of a variable assignment, turning that argument into a file name. The use of @code{No_command_assign} allows you to disable command-line assignments at invocation time, by giving the variable a true value. 
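The looping technique described in the hunk above can be sketched as follows. This is a minimal illustration, not necessarily the text of @file{noassign.awk} itself: the shell wrapper name noassign_demo and the printing loop in the BEGIN rule are assumptions added so the effect is visible, and the regular expression is one plausible spelling of "the form of a variable assignment".

```shell
# noassign_demo is a hypothetical wrapper used only for this illustration.
noassign_demo() {
    awk -v No_command_assign=1 '
    # Prepend "./" to every argument that looks like var=value,
    # so that awk later treats it as a file name instead.
    function disable_assigns(argc, argv,    i)
    {
        for (i = 1; i < argc; i++)
            if (argv[i] ~ /^[a-zA-Z_][a-zA-Z0-9_]*=.*/)
                argv[i] = ("./" argv[i])
    }

    BEGIN {
        if (No_command_assign)
            disable_assigns(ARGC, ARGV)
        # Illustration only: show the rewritten arguments, then stop
        # before awk tries to open any of them.
        for (i = 1; i < ARGC; i++)
            print ARGV[i]
        exit
    }' "$@"
}

noassign_demo a=1 data.txt
```

Here the argument a=1 comes out as ./a=1, while data.txt is left alone. Leaving No_command_assign unset (or false) keeps the usual assignment behavior.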
@@ -20309,7 +20658,7 @@ The discussion that follows walks through the code a bit at a time: # <c> a character representing the current option # Private Data: -# _opti -- index in multi-flag option, e.g., -abc +# _opti -- index in multiflag option, e.g., -abc @c endfile @end example @@ -20501,7 +20850,7 @@ After @code{getopt()} is through, it is the responsibility of the user level code to clear out all the elements of @code{ARGV} from 1 to @code{Optind}, so that @command{awk} does not try to process the command-line options -as @value{FN}s. +as file names. @end quotation Several of the sample programs presented in @@ -20518,7 +20867,7 @@ use @code{getopt()} to process their arguments. @c STARTOFRANGE libfudata @cindex libraries of @command{awk} functions, user database, reading @c STARTOFRANGE flibudata -@cindex functions, library, user database, reading +@cindex functions, library, user database@comma{} reading @c STARTOFRANGE udatar @cindex user database@comma{} reading @c STARTOFRANGE dataur @@ -20767,7 +21116,7 @@ from anywhere within a user's program, and the user may have his or her own way of splitting records and fields. -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, testing the field splitting The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which is @code{"FIELDWIDTHS"} if field splitting is being done with @code{FIELDWIDTHS}. This makes it possible to restore the correct @@ -20776,7 +21125,7 @@ field-splitting mechanism later. The test can only be true for or on some other @command{awk} implementation. The code that checks for using @code{FPAT}, using @code{using_fpat} -and @code{PROCINFO["FS"]} is similar. +and @code{PROCINFO["FS"]}, is similar. The main part of the function uses a loop to read database lines, split the line into fields, and then store the line into each array as necessary. 
@@ -20806,10 +21155,9 @@ function getpwnam(name) @end example @cindex @code{getpwuid()} function (C library) -Similarly, -the @code{getpwuid} function takes a user ID number argument. If that -user number is in the database, it returns the appropriate line. Otherwise, it -returns the null string: +Similarly, the @code{getpwuid()} function takes a user ID number +argument. If that user number is in the database, it returns the +appropriate line. Otherwise, it returns the null string: @cindex @code{getpwuid()} user-defined function @example @@ -20886,12 +21234,12 @@ uses these functions. @c STARTOFRANGE libfgdata @cindex libraries of @command{awk} functions, group database, reading @c STARTOFRANGE flibgdata -@cindex functions, library, group database, reading +@cindex functions, library, group database@comma{} reading @c STARTOFRANGE gdatar @cindex group database, reading @c STARTOFRANGE datagr @cindex database, group, reading -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, and group membership @cindex @code{getgrent()} function (C library) @cindex @code{getgrent()} user-defined function @cindex groups@comma{} information about @@ -21313,7 +21661,7 @@ index and value, use the indirect function call syntax and the value. When calling @code{walk_array()}, you would pass the name of a user-defined -function that expects to receive and index and a value, and then processes +function that expects to receive an index and a value, and then processes the element. @@ -21375,7 +21723,7 @@ awk -f @var{program} -- @var{options} @var{files} @noindent Here, @var{program} is the name of the @command{awk} program (such as @file{cut.awk}), @var{options} are any command-line options for the -program that start with a @samp{-}, and @var{files} are the actual @value{DF}s. +program that start with a @samp{-}, and @var{files} are the actual data files. 
If your system supports the @samp{#!} executable interpreter mechanism (@pxref{Executable Scripts}), @@ -21580,7 +21928,7 @@ spaces. Also remember that after @code{getopt()} is through we have to clear out all the elements of @code{ARGV} from 1 to @code{Optind}, so that @command{awk} does not try to process the command-line options -as @value{FN}s. +as file names. After dealing with the command-line options, the program verifies that the options make sense. Only one or the other of @option{-c} and @option{-f} @@ -21667,7 +22015,7 @@ complete field list, including filler fields: @example @c file eg/prog/cut.awk -function set_charlist( field, i, j, f, g, t, +function set_charlist( field, i, j, f, g, n, m, t, filler, last, len) @{ field = 1 # count total fields @@ -21764,6 +22112,7 @@ of picking the input line apart by characters. @cindex searching, files for regular expressions @c STARTOFRANGE fsregexp @cindex files, searching for regular expressions +@c STARTOFRANGE egrep @cindex @command{egrep} utility The @command{egrep} utility searches files for patterns. It uses regular expressions that are almost identical to those available in @command{awk} @@ -21776,8 +22125,8 @@ egrep @r{[} @var{options} @r{]} '@var{pattern}' @var{files} @dots{} The @var{pattern} is a regular expression. In typical usage, the regular expression is quoted to prevent the shell from expanding any of the -special characters as @value{FN} wildcards. Normally, @command{egrep} -prints the lines that matched. If multiple @value{FN}s are provided on +special characters as file name wildcards. Normally, @command{egrep} +prints the lines that matched. If multiple file names are provided on the command line, each output line is preceded by the name of the file and a colon. @@ -21868,7 +22217,7 @@ pattern is supplied with @option{-e}, the first nonoption on the command line is used. 
The @command{awk} command-line arguments up to @code{ARGV[Optind]} are cleared, so that @command{awk} won't try to process them as files. If no files are specified, the standard input is used, and if multiple files are -specified, we make sure to note this so that the @value{FN}s can precede the +specified, we make sure to note this so that the file names can precede the matched lines in the output: @example @@ -21966,9 +22315,9 @@ A number of additional tests are made, but they are only done if we are not counting lines. First, if the user only wants exit status (@code{no_print} is true), then it is enough to know that @emph{one} line in this file matched, and we can skip on to the next file with -@code{nextfile}. Similarly, if we are only printing @value{FN}s, we can -print the @value{FN}, and then skip to the next file with @code{nextfile}. -Finally, each line is printed, with a leading @value{FN} and colon +@code{nextfile}. Similarly, if we are only printing file names, we can +print the file name, and then skip to the next file with @code{nextfile}. +Finally, each line is printed, with a leading file name and colon if necessary: @cindex @code{!} (exclamation point), @code{!} operator @@ -22049,12 +22398,14 @@ or not. @c ENDOFRANGE regexps @c ENDOFRANGE sfregexp @c ENDOFRANGE fsregexp +@c ENDOFRANGE egrep @node Id Program @subsection Printing out User Information @cindex printing, user information @cindex users, information about, printing +@c STARTOFRANGE id @cindex @command{id} utility The @command{id} utility lists a user's real and effective user ID numbers, real and effective group ID numbers, and the user's group set, if any. @@ -22067,7 +22418,7 @@ $ @kbd{id} @print{} uid=500(arnold) gid=500(arnold) groups=6(disk),7(lp),19(floppy) @end example -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, and user and group ID numbers This information is part of what is provided by @command{gawk}'s @code{PROCINFO} array (@pxref{Built-in Variables}). 
However, the @command{id} utility provides a more palatable output than just @@ -22168,7 +22519,6 @@ BEGIN \ @c endfile @end example -@cindex @code{in} operator The test in the @code{for} loop is worth noting. Any supplementary groups in the @code{PROCINFO} array have the indices @code{"group1"} through @code{"group@var{N}"} for some @@ -22178,7 +22528,7 @@ there are. This loop works by starting at one, concatenating the value with @code{"group"}, and then using @code{in} to see if that value is -in the array. Eventually, @code{i} is incremented past +in the array (@pxref{Reference to Elements}). Eventually, @code{i} is incremented past the last group in the array and the loop exits. The loop is also correct if there are @emph{no} supplementary @@ -22191,6 +22541,7 @@ The POSIX version of @command{id} takes arguments that control which information is printed. Modify this version to accept the same arguments and perform in the same way. @end ignore +@c ENDOFRANGE id @node Split Program @subsection Splitting a Large File into Pieces @@ -22199,6 +22550,7 @@ arguments and perform in the same way. @c STARTOFRANGE filspl @cindex files, splitting +@c STARTOFRANGE split @cindex @code{split} utility The @command{split} program splits large text files into smaller pieces. Usage is as follows:@footnote{This is the traditional usage. The @@ -22216,7 +22568,7 @@ number of lines in each file, supply a number on the command line preceded with a minus; e.g., @samp{-500} for files with 500 lines in them instead of 1000. To change the name of the output files to something like @file{myfileaa}, @file{myfileab}, and so on, supply an additional -argument that specifies the @value{FN} prefix. +argument that specifies the file name prefix. Here is a version of @command{split} in @command{awk}. It uses the @code{ord()} and @code{chr()} functions presented in @@ -22226,8 +22578,8 @@ The program first sets its defaults, and then tests to make sure there are not too many arguments. 
It then looks at each argument in turn. The first argument could be a minus sign followed by a number. If it is, this happens to look like a negative number, so it is made positive, and that is the -count of lines. The data @value{FN} is skipped over and the final argument -is used as the prefix for the output @value{FN}s: +count of lines. The data file name is skipped over and the final argument +is used as the prefix for the output file names: @cindex @code{split.awk} program @example @@ -22276,7 +22628,7 @@ BEGIN @{ The next rule does most of the work. @code{tcount} (temporary count) tracks how many lines have been printed to the output file so far. If it is greater than @code{count}, it is time to close the current file and start a new one. -@code{s1} and @code{s2} track the current suffixes for the @value{FN}. If +@code{s1} and @code{s2} track the current suffixes for the file name. If they are both @samp{z}, the file is just too big. Otherwise, @code{s1} moves to the next letter in the alphabet and @code{s2} starts over again at @samp{a}: @@ -22342,12 +22694,14 @@ which isn't true for EBCDIC systems. @c Exercise: Fix these problems. @c BFD... @c ENDOFRANGE filspl +@c ENDOFRANGE split @node Tee Program @subsection Duplicating Output into Multiple Files @cindex files, multiple@comma{} duplicating output into @cindex output, duplicating into files +@c STARTOFRANGE tee @cindex @code{tee} utility The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies its standard input to its standard output and also duplicates it to the @@ -22364,13 +22718,13 @@ The @code{BEGIN} rule first makes a copy of all the command-line arguments into an array named @code{copy}. @code{ARGV[0]} is not copied, since it is not needed. @code{tee} cannot use @code{ARGV} directly, since @command{awk} attempts to -process each @value{FN} in @code{ARGV} as input data. +process each file name in @code{ARGV} as input data. 
@cindex flag variables If the first argument is @option{-a}, then the flag variable @code{append} is set to true, and both @code{ARGV[1]} and @code{copy[1]} are deleted. If @code{ARGC} is less than two, then no -@value{FN}s were supplied and @code{tee} prints a usage message and exits. +file names were supplied and @code{tee} prints a usage message and exits. Finally, @command{awk} is forced to read the standard input by setting @code{ARGV[1]} to @code{"-"} and @code{ARGC} to two: @@ -22462,6 +22816,7 @@ END \ @} @c endfile @end example +@c ENDOFRANGE tee @node Uniq Program @subsection Printing Nonduplicated Lines of Text @@ -22472,6 +22827,7 @@ END \ @cindex printing, unduplicated lines of text @c STARTOFRANGE tpul @cindex text@comma{} printing, unduplicated lines of +@c STARTOFRANGE uniq @cindex @command{uniq} utility The @command{uniq} utility reads sorted lines of data on its standard input, and by default removes duplicate lines. In other words, it only @@ -22723,6 +23079,7 @@ END @{ @end example @c ENDOFRANGE prunt @c ENDOFRANGE tpul +@c ENDOFRANGE uniq @node Wc Program @subsection Counting Things @@ -22739,6 +23096,7 @@ END @{ @cindex characters, counting @c STARTOFRANGE lico @cindex lines, counting +@c STARTOFRANGE wc @cindex @command{wc} utility The @command{wc} (word count) utility counts lines, words, and characters in one or more input files. Its usage is as follows: @@ -22832,7 +23190,7 @@ BEGIN @{ @end example The @code{beginfile()} function is simple; it just resets the counts of lines, -words, and characters to zero, and saves the current @value{FN} in +words, and characters to zero, and saves the current file name in @code{fname}: @example @@ -22854,7 +23212,7 @@ you will see that @code{FNR} has already been reset by the time @code{endfile()} is called.} It then prints out those numbers for the file that was just read. 
It relies on @code{beginfile()} to reset the -numbers for the following @value{DF}: +numbers for the following data file: @c FIXME: ONE DAY: make the above footnote an exercise, @c instead of giving away the answer. @@ -22921,6 +23279,7 @@ END @{ @c ENDOFRANGE lico @c ENDOFRANGE woco @c ENDOFRANGE chco +@c ENDOFRANGE wc @c ENDOFRANGE posimawk @node Miscellaneous Programs @@ -23022,8 +23381,34 @@ word, comparing it to the previous one: @cindex insomnia, cure for @cindex Robbins, Arnold @quotation -@i{Nothing cures insomnia like a ringing alarm clock.}@* -Arnold Robbins +@i{Nothing cures insomnia like a ringing alarm clock.} +@author Arnold Robbins +@end quotation +@cindex Quanstrom, Erik +@ignore +Date: Sat, 15 Feb 2014 16:47:09 -0500 +Subject: Re: 9atom install question +Message-ID: <l2jcvx6j6mey60xnrkb0hhob.1392500829294@email.android.com> +From: Erik Quanstrom <quanstro@quanstro.net> +To: Aharon Robbins <arnold@skeeve.com> + +yes. + +- erik + +Aharon Robbins <arnold@skeeve.com> wrote: + +>> sleep is for web developers. +> +>Can I quote you, in the gawk manual? +> +>Thanks, +> +>Arnold +@end ignore +@quotation +@i{Sleep is for web developers.} +@author Erik Quanstrom @end quotation @c STARTOFRANGE tialarm @@ -23189,6 +23574,7 @@ seconds are necessary: @c STARTOFRANGE chtra @cindex characters, transliterating +@c STARTOFRANGE tr @cindex @command{tr} utility The system @command{tr} utility transliterates characters. For example, it is often used to map uppercase letters into lowercase for further processing: @@ -23199,12 +23585,10 @@ often used to map uppercase letters into lowercase for further processing: @command{tr} requires two lists of characters.@footnote{On some older systems, -@ifset ORA including Solaris, -@end ifset @command{tr} may require that the lists be written as range expressions enclosed in square brackets (@samp{[a-z]}) and quoted, -to prevent the shell from attempting a @value{FN} expansion. 
This is +to prevent the shell from attempting a file name expansion. This is not a feature.} When processing the input, the first character in the first list is replaced with the first character in the second list, the second character in the first list is replaced with the second @@ -23339,6 +23723,7 @@ An obvious improvement to this program would be to set up the assumes that the ``from'' and ``to'' lists will never change throughout the lifetime of the program. @c ENDOFRANGE chtra +@c ENDOFRANGE tr @node Labels Program @subsection Printing Mailing Labels @@ -23398,6 +23783,7 @@ that there are two blank lines at the top and two blank lines at the bottom. The @code{END} rule arranges to flush the final page of labels; there may not have been an even multiple of 20 labels in the data: +@c STARTOFRANGE labels @cindex @code{labels.awk} program @example @c file eg/prog/labels.awk @@ -23465,6 +23851,7 @@ END \ @end example @c ENDOFRANGE prml @c ENDOFRANGE mlprint +@c ENDOFRANGE labels @node Word Sorting @subsection Generating Word-Usage Counts @@ -23531,6 +23918,7 @@ to remove punctuation characters. Finally, we solve the third problem by using the system @command{sort} utility to process the output of the @command{awk} script. Here is the new version of the program: +@c STARTOFRANGE wordfreq @cindex @code{wordfreq.awk} program @example @c file eg/prog/wordfreq.awk @@ -23592,6 +23980,7 @@ have true pipes at the command-line (or batch-file) level. See the general operating system documentation for more information on how to use the @command{sort} program. @c ENDOFRANGE worus +@c ENDOFRANGE wordfreq @node History Sorting @subsection Removing Duplicates from Unsorted Text @@ -23602,7 +23991,7 @@ The @command{uniq} program (@pxref{Uniq Program}), removes duplicate lines from @emph{sorted} data. 
-Suppose, however, you need to remove duplicate lines from a @value{DF} but +Suppose, however, you need to remove duplicate lines from a data file but that you want to preserve the order the lines are in. A good example of this might be a shell history file. The history file keeps a copy of all the commands you have entered, and it is not unusual to repeat a command @@ -23621,6 +24010,7 @@ Each element of @code{lines} is a unique command, and the indices of The @code{END} rule simply prints out the lines, in order: @cindex Rakitzis, Byron +@c STARTOFRANGE histsort @cindex @code{histsort.awk} program @example @c file eg/prog/histsort.awk @@ -23663,6 +24053,7 @@ print data[lines[i]], lines[i] This works because @code{data[$0]} is incremented each time a line is seen. @c ENDOFRANGE lidu +@c ENDOFRANGE histsort @node Extract Program @subsection Extracting Programs from Texinfo Source Files @@ -23694,7 +24085,8 @@ printed and online documentation. @ifnotinfo Texinfo is fully documented in the book @cite{Texinfo---The GNU Documentation Format}, -available from the Free Software Foundation. +available from the Free Software Foundation, +and also available @uref{http://www.gnu.org/software/texinfo/manual/texinfo/, online}. @end ifnotinfo @ifinfo The Texinfo language is described fully, starting with @@ -23738,7 +24130,7 @@ Lines containing @samp{@@group} and @samp{@@end group} are simply removed. (@pxref{Join Function}). The example programs in the online Texinfo source for @cite{@value{TITLE}} -(@file{gawk.texi}) have all been bracketed inside @samp{file} and +(@file{gawktexi.in}) have all been bracketed inside @samp{file} and @samp{endfile} lines. The @command{gawk} distribution uses a copy of @file{extract.awk} to extract the sample programs and install many of them in a standard directory where @command{gawk} can find them. 
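To make the @samp{file} and @samp{endfile} bracketing concrete, here is a deliberately reduced sketch of the idea. It is not @file{extract.awk}: it only labels the bracketed lines on standard output instead of writing files, it does not handle @samp{@@} escapes or @samp{@@group} lines, and the wrapper name extract_demo and the sample input are assumptions.

```shell
# extract_demo is a hypothetical wrapper used only for this illustration.
extract_demo() {
    awk '
    # "@c file NAME" opens a bracketed program; remember its name.
    $1 == "@c" && $2 == "file"    { curfile = $3; next }
    # "@c endfile" closes it.
    $1 == "@c" && $2 == "endfile" { curfile = "";  next }
    # Everything in between belongs to the current program.
    curfile != ""                 { print curfile ": " $0 }
    '
}

printf '%s\n' 'some prose' '@c file eg/demo.awk' 'BEGIN { print 1 }' \
    '@c endfile' 'more prose' | extract_demo
```

Only the line between the two markers is printed, prefixed with the name taken from the @samp{file} line.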
@@ -23772,6 +24164,7 @@ The first rule handles calling @code{system()}, checking that a command is given (@code{NF} is at least three) and also checking that the command exits with a zero exit status, signifying OK: +@c STARTOFRANGE extract @cindex @code{extract.awk} program @example @c file eg/prog/extract.awk @@ -23821,7 +24214,7 @@ screen. @end ifnottex The second rule handles moving data into files. It verifies that a -@value{FN} is given in the directive. If the file named is not the +file name is given in the directive. If the file named is not the current file, then the current file is closed. Keeping the current file open until a new file is encountered allows the use of the @samp{>} redirection for printing the contents, keeping open file management @@ -23903,7 +24296,7 @@ subsequent output is appended to the file (@pxref{Redirection}). This makes it easy to mix program text and explanatory prose for the same sample source file (as has been done here!) without any hassle. The file is -only closed when a new data @value{FN} is encountered or at the end of the +only closed when a new data file name is encountered or at the end of the input file. Finally, the function @code{@w{unexpected_eof()}} prints an appropriate @@ -23930,6 +24323,7 @@ END @{ @end example @c ENDOFRANGE texse @c ENDOFRANGE fitex +@c ENDOFRANGE extract @node Simple Sed @subsection A Simple Stream Editor @@ -23955,10 +24349,11 @@ Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp The following program, @file{awksed.awk}, accepts at least two command-line arguments: the pattern to look for and the text to replace it with. Any -additional arguments are treated as data @value{FN}s to process. If none +additional arguments are treated as data file names to process. 
If none are provided, the standard input is used: @cindex Brennan, Michael +@c STARTOFRANGE awksed @cindex @command{awksed.awk} program @c @cindex simple stream editor @c @cindex stream editor, simple @@ -24028,7 +24423,7 @@ The @code{BEGIN} rule handles the setup, checking for the right number of arguments and calling @code{usage()} if there is a problem. Then it sets @code{RS} and @code{ORS} from the command-line arguments and sets @code{ARGV[1]} and @code{ARGV[2]} to the null string, so that they are -not treated as @value{FN}s +not treated as file names (@pxref{ARGC and ARGV}). The @code{usage()} function prints an error message and exits. @@ -24055,6 +24450,7 @@ Exercise: what are the advantages and disadvantages of this version versus sed? Others? @end ignore +@c ENDOFRANGE awksed @node Igawk Program @subsection An Easy Way to Use Library Functions @@ -24126,7 +24522,7 @@ Literal text, provided with @option{--source} or @option{--source=}. This text is just appended directly. @item -Source @value{FN}s, provided with @option{-f}. We use a neat trick and append +Source file names, provided with @option{-f}. We use a neat trick and append @samp{@@include @var{filename}} to the shell variable's contents. Since the file-inclusion program works the way @command{gawk} does, this gets the text of the file included into the program at the correct point. @@ -24139,7 +24535,7 @@ shell variable. @item Run the expanded program with @command{gawk} and any other original command-line -arguments that the user supplied (such as the data @value{FN}s). +arguments that the user supplied (such as the data file names). @end enumerate This program uses shell variables extensively: for storing command-line arguments, @@ -24170,7 +24566,7 @@ programming trick. Don't worry about it if you are not familiar with These are saved and passed on to @command{gawk}. 
@item -f@r{,} --file@r{,} --file=@r{,} -Wfile= -The @value{FN} is appended to the shell variable @code{program} with an +The file name is appended to the shell variable @code{program} with an @samp{@@include} statement. The @command{expr} utility is used to remove the leading option part of the argument (e.g., @samp{--file=}). @@ -24198,6 +24594,7 @@ program. The program is as follows: +@c STARTOFRANGE igawk @cindex @code{igawk.sh} program @example @c file eg/prog/igawk.sh @@ -24294,10 +24691,10 @@ is stored in the shell variable @code{expand_prog}. Doing this keeps the shell script readable. The @command{awk} program reads through the user's program, one line at a time, using @code{getline} (@pxref{Getline}). The input -@value{FN}s and @samp{@@include} statements are managed using a stack. -As each @samp{@@include} is encountered, the current @value{FN} is +file names and @samp{@@include} statements are managed using a stack. +As each @samp{@@include} is encountered, the current file name is ``pushed'' onto the stack and the file named in the @samp{@@include} -directive becomes the current @value{FN}. As each file is finished, +directive becomes the current file name. As each file is finished, the stack is ``popped,'' and the previous input file becomes the current input file again. The process is started by making the original file the first one on the stack. @@ -24306,16 +24703,16 @@ The @code{pathto()} function does the work of finding the full path to a file. It simulates @command{gawk}'s behavior when searching the @env{AWKPATH} environment variable (@pxref{AWKPATH Variable}). -If a @value{FN} has a @samp{/} in it, no path search is done. -Similarly, if the @value{FN} is @code{"-"}, then that string is +If a file name has a @samp{/} in it, no path search is done. +Similarly, if the file name is @code{"-"}, then that string is used as-is. 
Otherwise, -the @value{FN} is concatenated with the name of each directory in -the path, and an attempt is made to open the generated @value{FN}. +the file name is concatenated with the name of each directory in +the path, and an attempt is made to open the generated file name. The only way to test if a file can be read in @command{awk} is to go ahead and try to read it with @code{getline}; this is what @code{pathto()} does.@footnote{On some very old versions of @command{awk}, the test @samp{getline junk < t} can loop forever if the file exists but is empty. -Caveat emptor.} If the file can be read, it is closed and the @value{FN} +Caveat emptor.} If the file can be read, it is closed and the file name is returned: @ignore @@ -24370,17 +24767,17 @@ BEGIN @{ @c endfile @end example -The stack is initialized with @code{ARGV[1]}, which will be @file{/dev/stdin}. +The stack is initialized with @code{ARGV[1]}, which will be @samp{/dev/stdin}. The main loop comes next. Input lines are read in succession. Lines that do not start with @samp{@@include} are printed verbatim. -If the line does start with @samp{@@include}, the @value{FN} is in @code{$2}. +If the line does start with @samp{@@include}, the file name is in @code{$2}. @code{pathto()} is called to generate the full path. If it cannot, then the program prints an error message and continues. The next thing to check is if the file is included already. The -@code{processed} array is indexed by the full @value{FN} of each included +@code{processed} array is indexed by the full file name of each included file and it tracks this information for us. If the file is -seen again, a warning message is printed. Otherwise, the new @value{FN} is +seen again, a warning message is printed. Otherwise, the new file name is pushed onto the stack and processing continues. Finally, when @code{getline} encounters the end of the input file, the file @@ -24458,10 +24855,10 @@ options and command-line arguments that the user supplied. 
@c this causes more problems than it solves, so leave it out. @ignore -The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk} +The special file @file{/dev/null} is passed as a data file to @command{gawk} to handle an interesting case. Suppose that the user's program only has -a @code{BEGIN} rule and there are no @value{DF}s to read. -The program should exit without reading any @value{DF}s. +a @code{BEGIN} rule and there are no data files to read. +The program should exit without reading any data files. However, suppose that an included library file defines an @code{END} rule of its own. In this case, @command{gawk} will hang, reading standard input. In order to avoid this, @file{/dev/null} is explicitly added to the @@ -24557,10 +24954,12 @@ statements for the desired library functions. @c ENDOFRANGE libfex @c ENDOFRANGE flibex @c ENDOFRANGE awkpex +@c ENDOFRANGE igawk @node Anagram Program @subsection Finding Anagrams From A Dictionary +@cindex anagrams, finding An interesting programming challenge is to search for @dfn{anagrams} in a word list (such as @@ -24580,6 +24979,7 @@ The following program uses arrays of arrays to bring together words with the same signature and array sorting to print the words in sorted order. +@c STARTOFRANGE anagram @cindex @code{anagram.awk} program @example @c file eg/prog/anagram.awk @@ -24687,10 +25087,13 @@ babels beslab babery yabber @dots{} @end example +@c ENDOFRANGE anagram @node Signature Program @subsection And Now For Something Completely Different +@cindex signature program +@cindex Brini, Davide The following program was written by Davide Brini @c (@email{dave_br@@gmx.com}) and is published on @uref{http://backreference.org/2011/02/03/obfuscated-awk/, @@ -24822,12 +25225,15 @@ It contains the following chapters: @item @ref{Dynamic Extensions}. 
+@end itemize @end ifdocbook @end ignore @node Advanced Features @chapter Advanced Features of @command{gawk} -@cindex advanced features, network connections, See Also networks, connections +@ifset WITH_NETWORK_CHAPTER +@cindex advanced features, network connections, See Also networks@comma{} connections +@end ifset @c STARTOFRANGE gawadv @cindex @command{gawk}, features, advanced @c STARTOFRANGE advgaw @@ -24842,8 +25248,8 @@ who knows where you live." @end ignore @quotation @i{Write documentation as if whoever reads it is -a violent psychopath who knows where you live.}@* -Steve English, as quoted by Peter Langston +a violent psychopath who knows where you live.} +@author Steve English, as quoted by Peter Langston @end quotation This @value{CHAPTER} discusses advanced features in @command{gawk}. @@ -24893,7 +25299,7 @@ discusses the ability to dynamically add new built-in functions to @node Nondecimal Data @section Allowing Nondecimal Input Data -@cindex @code{--non-decimal-data} option +@cindex @option{--non-decimal-data} option @cindex advanced features, nondecimal input data @cindex input, data@comma{} nondecimal @cindex constants, nondecimal @@ -24937,7 +25343,7 @@ using this facility could lead to surprising results, the default is to leave it disabled. If you want it, you must explicitly request it. @cindex programming conventions, @code{--non-decimal-data} option -@cindex @code{--non-decimal-data} option, @code{strtonum()} function and +@cindex @option{--non-decimal-data} option, @code{strtonum()} function and @cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and @quotation CAUTION @emph{Use of this option is not recommended.} @@ -25162,7 +25568,7 @@ ordered data: @example function cmp_randomize(i1, v1, i2, v2) @{ - # random order + # random order (caution: this may never terminate!) 
return (2 - 4 * rand()) @} @end example @@ -25177,7 +25583,7 @@ with otherwise equal values is to include the indices in the comparison rules. Note that doing this may make the loop traversal less efficient, so consider it only if necessary. The following comparison functions force a deterministic order, and are based on the fact that the -indices of two elements are never equal: +(string) indices of two elements are never equal: @example function cmp_numeric(i1, v1, i2, v2) @@ -25234,17 +25640,16 @@ sorted array traversal is not the default. @subsection Sorting Array Values and Indices with @command{gawk} @cindex arrays, sorting -@cindex @code{asort()} function (@command{gawk}) +@cindexgawkfunc{asort} @cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting +@cindexgawkfunc{asorti} +@cindex @code{asorti()} function (@command{gawk}), arrays@comma{} sorting @cindex sort function, arrays, sorting -In most @command{awk} implementations, sorting an array requires -writing a @code{sort()} function. -While this can be educational for exploring different sorting algorithms, -usually that's not the point of the program. -@command{gawk} provides the built-in @code{asort()} -and @code{asorti()} functions -(@pxref{String Functions}) -for sorting arrays. For example: +In most @command{awk} implementations, sorting an array requires writing +a @code{sort()} function. While this can be educational for exploring +different sorting algorithms, usually that's not the point of the program. +@command{gawk} provides the built-in @code{asort()} and @code{asorti()} +functions (@pxref{String Functions}) for sorting arrays. For example: @example @var{populate the array} data @@ -25257,7 +25662,7 @@ After the call to @code{asort()}, the array @code{data} is indexed from 1 to some number @var{n}, the total number of elements in @code{data}. (This count is @code{asort()}'s return value.) @code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on. 
-The comparison is based on the type of the elements +The default comparison is based on the type of the elements (@pxref{Typing and Comparison}). All numeric values come before all string values, which in turn come before all subarrays. @@ -25279,24 +25684,11 @@ In this case, @command{gawk} copies the @code{source} array into the @code{dest} array and then sorts @code{dest}, destroying its indices. However, the @code{source} array is not affected. -@code{asort()} accepts a third string argument to control comparison of -array elements. As with @code{PROCINFO["sorted_in"]}, this argument -may be one of the predefined names that @command{gawk} provides -(@pxref{Controlling Scanning}), or the name of a user-defined function -(@pxref{Controlling Array Traversal}). - -@quotation NOTE -In all cases, the sorted element values consist of the original -array's element values. The ability to control comparison merely -affects the way in which they are sorted. -@end quotation - Often, what's needed is to sort on the values of the @emph{indices} -instead of the values of the elements. -To do that, use the -@code{asorti()} function. The interface is identical to that of -@code{asort()}, except that the index values are used for sorting, and -become the values of the result array: +instead of the values of the elements. To do that, use the +@code{asorti()} function. The interface and behavior are identical to +that of @code{asort()}, except that the index values are used for sorting, +and become the values of the result array: @example @{ source[$0] = some_func($0) @} @@ -25313,29 +25705,40 @@ END @{ @} @end example -Similar to @code{asort()}, -in all cases, the sorted element values consist of the original -array's indices. The ability to control comparison merely -affects the way in which they are sorted. +So far, so good. Now it starts to get interesting. Both @code{asort()} +and @code{asorti()} accept a third string argument to control comparison +of array elements. 
In @ref{String Functions}, we ignored this third +argument; however, the time has now come to describe how this argument +affects these two functions. + +Basically, the third argument specifies how the array is to be sorted. +There are two possibilities. As with @code{PROCINFO["sorted_in"]}, +this argument may be one of the predefined names that @command{gawk} +provides (@pxref{Controlling Scanning}), or it may be the name of a +user-defined function (@pxref{Controlling Array Traversal}). -Sorting the array by replacing the indices provides maximal flexibility. -To traverse the elements in decreasing order, use a loop that goes from -@var{n} down to 1, either over the elements or over the indices.@footnote{You -may also use one of the predefined sorting names that sorts in -decreasing order.} +In the latter case, @emph{the function can compare elements in any way +it chooses}, taking into account just the indices, just the values, +or both. This is extremely powerful. + +Once the array is sorted, @code{asort()} takes the @emph{values} in +their final order, and uses them to fill in the result array, whereas +@code{asorti()} takes the @emph{indices} in their final order, and uses +them to fill in the result array. @cindex reference counting, sorting arrays +@quotation NOTE Copying array indices and elements isn't expensive in terms of memory. Internally, @command{gawk} maintains @dfn{reference counts} to data. For example, when @code{asort()} copies the first array to the second one, there is only one copy of the original array elements' data, even though both arrays use the values. +@end quotation @c Document It And Call It A Feature. Sigh. 
@cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex @code{IGNORECASE} variable -@cindex arrays, sorting, @code{IGNORECASE} variable and -@cindex @code{IGNORECASE} variable, array sorting and +@cindex arrays, sorting, and @code{IGNORECASE} variable +@cindex @code{IGNORECASE} variable, and array sorting functions Because @code{IGNORECASE} affects string comparisons, the value of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}. Note also that the locale's sorting order does @emph{not} @@ -25414,7 +25817,7 @@ open a @emph{two-way} pipe to another process. The second process is termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}. The two-way connection is created using the @samp{|&} operator (borrowed from the Korn shell, @command{ksh}):@footnote{This is very -different from the same operator in the C shell.} +different from the same operator in the C shell and in Bash.} @example do @{ @@ -25504,7 +25907,7 @@ As a side note, the assignment @samp{LC_ALL=C} in the @command{sort} command ensures traditional Unix (ASCII) sorting from @command{sort}. @cindex @command{gawk}, @code{PROCINFO} array in -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, and communications via ptys You may also use pseudo-ttys (ptys) for two-way communication instead of pipes, if your system supports them. This is done on a per-command basis, by setting a special element @@ -25555,10 +25958,10 @@ another process on another system across an IP network connection. You can think of this as just a @emph{very long} two-way pipeline to a coprocess. The way @command{gawk} decides that you want to use TCP/IP networking is -by recognizing special @value{FN}s that begin with one of @samp{/inet/}, +by recognizing special file names that begin with one of @samp{/inet/}, @samp{/inet4/} or @samp{/inet6}. 
-The full syntax of the special @value{FN} is +The full syntax of the special file name is @file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}. The components are: @@ -25647,7 +26050,7 @@ When @command{gawk} has finished running, it creates a profile of your program i named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than @command{gawk} normally does. -@cindex @code{--profile} option +@cindex @option{--profile} option As shown in the following example, the @option{--profile} option can be used to change the name of the file where @command{gawk} will write the profile: @@ -25702,52 +26105,60 @@ foo junk @end example -Here is the @file{awkprof.out} that results from running the @command{gawk} -profiler on this program and data (this example also illustrates that @command{awk} -programmers sometimes have to work late): +Here is the @file{awkprof.out} that results from running the +@command{gawk} profiler on this program and data. (This example also +illustrates that @command{awk} programmers sometimes get up very early +in the morning to work.) 
-@cindex @code{BEGIN} pattern -@cindex @code{END} pattern +@cindex @code{BEGIN} pattern, and profiling +@cindex @code{END} pattern, and profiling @example - # gawk profile, created Sun Aug 13 00:00:15 2000 + # gawk profile, created Thu Feb 27 05:16:21 2014 - # BEGIN block(s) + # BEGIN block(s) - BEGIN @{ - 1 print "First BEGIN rule" - 1 print "Second BEGIN rule" - @} + BEGIN @{ + 1 print "First BEGIN rule" + @} + + BEGIN @{ + 1 print "Second BEGIN rule" + @} - # Rule(s) + # Rule(s) - 5 /foo/ @{ # 2 - 2 print "matched /foo/, gosh" - 6 for (i = 1; i <= 3; i++) @{ - 6 sing() - @} - @} + 5 /foo/ @{ # 2 + 2 print "matched /foo/, gosh" + 6 for (i = 1; i <= 3; i++) @{ + 6 sing() + @} + @} - 5 @{ - 5 if (/foo/) @{ # 2 - 2 print "if is true" - 3 @} else @{ - 3 print "else is true" - @} - @} + 5 @{ + 5 if (/foo/) @{ # 2 + 2 print "if is true" + 3 @} else @{ + 3 print "else is true" + @} + @} - # END block(s) + # END block(s) - END @{ - 1 print "First END rule" - 1 print "Second END rule" - @} + END @{ + 1 print "First END rule" + @} - # Functions, listed alphabetically + END @{ + 1 print "Second END rule" + @} - 6 function sing(dummy) - @{ - 6 print "I gotta be me!" - @} + + # Functions, listed alphabetically + + 6 function sing(dummy) + @{ + 6 print "I gotta be me!" + @} @end example This example illustrates many of the basic features of profiling output. @@ -25755,15 +26166,16 @@ They are as follows: @itemize @bullet @item -The program is printed in the order @code{BEGIN} rule, -@code{BEGINFILE} rule, +The program is printed in the order @code{BEGIN} rules, +@code{BEGINFILE} rules, pattern/action rules, -@code{ENDFILE} rule, @code{END} rule and functions, listed +@code{ENDFILE} rules, @code{END} rules and functions, listed alphabetically. -Multiple @code{BEGIN} and @code{END} rules are merged together, -as are multiple @code{BEGINFILE} and @code{ENDFILE} rules. 
+Multiple @code{BEGIN} and @code{END} rules retain their +separate identities, as do +multiple @code{BEGINFILE} and @code{ENDFILE} rules. -@cindex patterns, counts +@cindex patterns, counts, in a profile @item Pattern-action rules have two counts. The first count, to the left of the rule, shows how many times @@ -25783,7 +26195,7 @@ is a count showing how many times the condition was true. The count for the @code{else} indicates how many times the test failed. -@cindex loops, count for header +@cindex loops, count for header, in a profile @item The count for a loop header (such as @code{for} or @code{while}) shows how many times the loop test was executed. @@ -25791,8 +26203,8 @@ or @code{while}) shows how many times the loop test was executed. statement in a rule to determine how many times the rule was executed. If the first statement is a loop, the count is misleading.) -@cindex functions, user-defined, counts -@cindex user-defined, functions, counts +@cindex functions, user-defined, counts, in a profile +@cindex user-defined, functions, counts, in a profile @item For user-defined functions, the count next to the @code{function} keyword indicates how many times the function was called. @@ -25806,8 +26218,8 @@ The layout uses ``K&R'' style with TABs. Braces are used everywhere, even when the body of an @code{if}, @code{else}, or loop is only a single statement. -@cindex @code{()} (parentheses) -@cindex parentheses @code{()} +@cindex @code{()} (parentheses), in a profile +@cindex parentheses @code{()}, in a profile @item Parentheses are used only where needed, as indicated by the structure of the program and the precedence rules. @@ -25842,8 +26254,8 @@ typed when you wrote it. This is because @command{gawk} creates the profiled version by ``pretty printing'' its internal representation of the program. The advantage to this is that @command{gawk} can produce a standard representation. 
The disadvantage is that all source-code -comments are lost, as are the distinctions among multiple @code{BEGIN}, -@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as: +comments are lost. +Also, things such as: @example /foo/ @@ -25863,6 +26275,7 @@ which is correct, but possibly surprising. @cindex profiling @command{awk} programs, dynamically @cindex @command{gawk} program, dynamic profiling +@cindex dynamic profiling Besides creating profiles when a program has completed, @command{gawk} can produce a profile while it is running. This is useful if your @command{awk} program goes into an @@ -25876,9 +26289,9 @@ $ @kbd{gawk --profile -f myprog &} @end example @cindex @command{kill} command@comma{} dynamic profiling -@cindex @code{USR1} signal -@cindex @code{SIGUSR1} signal -@cindex signals, @code{USR1}/@code{SIGUSR1} +@cindex @code{USR1} signal, for dynamic profiling +@cindex @code{SIGUSR1} signal, for dynamic profiling +@cindex signals, @code{USR1}/@code{SIGUSR1}, for profiling @noindent The shell prints a job number and process ID number; in this case, 13992. Use the @command{kill} command to send the @code{USR1} signal @@ -25909,9 +26322,9 @@ You may send @command{gawk} the @code{USR1} signal as many times as you like. Each time, the profile and function call trace are appended to the output profile file. -@cindex @code{HUP} signal -@cindex @code{SIGHUP} signal -@cindex signals, @code{HUP}/@code{SIGHUP} +@cindex @code{HUP} signal, for dynamic profiling +@cindex @code{SIGHUP} signal, for dynamic profiling +@cindex signals, @code{HUP}/@code{SIGHUP}, for profiling If you use the @code{HUP} signal instead of the @code{USR1} signal, @command{gawk} produces the profile and the function call trace and then exits. @@ -25927,12 +26340,17 @@ the case of the @code{INT} signal, @command{gawk} exits. 
This is because these systems don't support the @command{kill} command, so the only signals you can deliver to a program are those generated by the keyboard. The @code{INT} signal is generated by the -@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the -@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key. +@kbd{Ctrl-@key{C}} or @kbd{Ctrl-@key{BREAK}} key, while the +@code{QUIT} signal is generated by the @kbd{Ctrl-@key{\}} key. Finally, @command{gawk} also accepts another option, @option{--pretty-print}. When called this way, @command{gawk} ``pretty prints'' the program into @file{awkprof.out}, without any execution counts. + +@quotation NOTE +The @option{--pretty-print} option still runs your program. +This will change in the next major release. +@end quotation @c ENDOFRANGE advgaw @c ENDOFRANGE gawadv @c ENDOFRANGE awkp @@ -26044,6 +26462,7 @@ lookup of the translations. @cindex @code{.po} files @cindex files, @code{.po} +@c STARTOFRANGE portobfi @cindex portable object files @cindex files, portable object @item @@ -26055,6 +26474,7 @@ For example, there might be a @file{fr.po} for a French translation. @cindex @code{.gmo} files @cindex files, @code{.gmo} @cindex message object files +@c STARTOFRANGE portmsgfi @cindex files, message object @item Each language's @file{.po} file is converted into a binary @@ -26202,7 +26622,7 @@ String constants marked with a leading underscore are candidates for translation at runtime. String constants without a leading underscore are not translated. -@cindex @code{dcgettext()} function (@command{gawk}) +@cindexgawkfunc{dcgettext} @item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) Return the translation of @var{string} in text domain @var{domain} for locale category @var{category}. @@ -26228,7 +26648,7 @@ chosen to be simple and to allow for reasonable @command{awk}-style default arguments. 
@end quotation -@cindex @code{dcngettext()} function (@command{gawk}) +@cindexgawkfunc{dcngettext} @item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) Return the plural form used for @var{number} of the translation of @var{string1} and @var{string2} in text domain @@ -26244,7 +26664,7 @@ The same remarks about argument order as for the @code{dcgettext()} function app @cindex files, @code{.gmo}, specifying directory of @cindex message object files, specifying directory of @cindex files, message object, specifying directory of -@cindex @code{bindtextdomain()} function (@command{gawk}) +@cindexgawkfunc{bindtextdomain} @item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) Change the directory in which @code{gettext} looks for @file{.gmo} files, in case they @@ -26346,7 +26766,7 @@ and use translations from @command{awk}. @cindex portable object files @cindex files, portable object Once a program's translatable strings have been marked, they must -be extracted to create the initial @file{.po} file. +be extracted to create the initial @file{.pot} file. As part of translation, it is often helpful to rearrange the order in which arguments to @code{printf} are output. @@ -26366,13 +26786,13 @@ is covered. @subsection Extracting Marked Strings @cindex strings, extracting @cindex marked strings@comma{} extracting -@cindex @code{--gen-pot} option +@cindex @option{--gen-pot} option @cindex command-line options, string extraction @cindex string extraction (internationalization) @cindex marked string extraction (internationalization) @cindex extraction, of marked strings (internationalization) -@cindex @code{--gen-pot} option +@cindex @option{--gen-pot} option Once your @command{awk} program is working, and all the strings have been marked and you've set (and perhaps bound) the text domain, it is time to produce translations. 
@@ -26395,6 +26815,8 @@ second argument to @code{dcngettext()}.@footnote{The @xref{I18N Example}, for the full list of steps to go through to create and test translations for @command{guide}. +@c ENDOFRANGE portobfi +@c ENDOFRANGE portmsgfi @node Printf Ordering @subsection Rearranging @code{printf} Arguments @@ -26441,7 +26863,7 @@ example, @samp{string} is the first argument and @samp{length(string)} is the se @example $ @kbd{gawk 'BEGIN @{} > @kbd{string = "Dont Panic"} -> @kbd{printf _"%2$d characters live in \"%1$s\"\n",} +> @kbd{printf "%2$d characters live in \"%1$s\"\n",} > @kbd{string, length(string)} > @kbd{@}'} @print{} 10 characters live in "Dont Panic" @@ -26475,7 +26897,7 @@ This is somewhat counterintuitive. and those with positional specifiers in the same string: @example -$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'} +$ @kbd{gawk 'BEGIN @{ printf "%d %3$s\n", 1, 2, "hi" @}'} @error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none @end example @@ -26720,7 +27142,7 @@ complete detail in @cite{GNU gettext tools}.) @end ifnotinfo As of this writing, the latest version of GNU @code{gettext} is -@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.2.1.tar.gz, @value{PVERSION} 0.18.2.1}. +@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.2.1.tar.gz, version 0.18.2.1}. If a translation of @command{gawk}'s messages exists, then @command{gawk} produces usage messages, warnings, @@ -26816,6 +27238,7 @@ The following list defines terms used throughout the rest of this @value{CHAPTER}. @table @dfn +@cindex stack frame @item Stack Frame Programs generally call functions during the course of their execution. One function can call another, or a function can call itself (recursion). @@ -26837,6 +27260,7 @@ invoked. Commands that print the call stack print information about each stack frame (as detailed later on). 
@item Breakpoint +@cindex breakpoint During debugging, you often wish to let the program run until it reaches a certain point, and then continue execution from there one statement (or instruction) at a time. The way to do this is to set @@ -26846,6 +27270,7 @@ take over control of the program's execution. You can add and remove as many breakpoints as you like. @item Watchpoint +@cindex watchpoint A watchpoint is similar to a breakpoint. The difference is that breakpoints are oriented around the code: stop when a certain point in the code is reached. A watchpoint, however, specifies that program execution @@ -26877,6 +27302,7 @@ by the higher-level @command{awk} commands. @node Sample Debugging Session @section Sample Debugging Session +@cindex sample debugging session In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample debugging session. We will use the @command{awk} implementation of the @@ -26890,13 +27316,16 @@ as our example. @node Debugger Invocation @subsection How to Start the Debugger +@cindex starting the debugger +@cindex debugger, how to start -Starting the debugger is almost exactly like running @command{awk}, except you have to -pass an additional option @option{--debug} or the corresponding short option @option{-D}. -The file(s) containing the program and any supporting code are given on the command -line as arguments to one or more @option{-f} options. (@command{gawk} is not designed -to debug command-line programs, only programs contained in files.) In our case, -we invoke the debugger like this: +Starting the debugger is almost exactly like running @command{gawk}, +except you have to pass an additional option @option{--debug} or the +corresponding short option @option{-D}. The file(s) containing the +program and any supporting code are given on the command line as arguments +to one or more @option{-f} options. (@command{gawk} is not designed +to debug command-line programs, only programs contained in files.) 
+In our case, we invoke the debugger like this: @example $ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile} @@ -27029,7 +27458,7 @@ gawk> @kbd{p NR} @noindent So we can see that @code{are_equal()} was only called for the second record -of the file. Of course, this is because our program contained a rule for +of the file. Of course, this is because our program contains a rule for @samp{NR == 1}: @example @@ -27229,21 +27658,24 @@ controlling breakpoints are: @cindex debugger commands, @code{break} @cindex @code{break} debugger command @cindex @code{b} debugger command (alias for @code{break}) +@cindex set breakpoint +@cindex breakpoint, setting @item @code{break} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}] @itemx @code{b} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}] Without any argument, set a breakpoint at the next instruction to be executed in the selected stack frame. Arguments can be one of the following: +@c @asis for docbook @c nested table -@table @var -@item n +@table @asis +@item @var{n} Set a breakpoint at line number @var{n} in the current source file. -@item filename@code{:}n +@item @var{filename}@code{:}@var{n} Set a breakpoint at line number @var{n} in source file @var{filename}. -@item function +@item @var{function} Set a breakpoint at entry to (the first instruction of) function @var{function}. @end table @@ -27259,6 +27691,8 @@ it continues executing the program. @cindex debugger commands, @code{clear} @cindex @code{clear} debugger command +@cindex delete breakpoint at location +@cindex breakpoint at location, how to delete @item @code{clear} [[@var{filename}@code{:}]@var{n} | @var{function}] Without any argument, delete any breakpoint at the next instruction to be executed in the selected stack frame. If the program stops at @@ -27266,19 +27700,20 @@ a breakpoint, this deletes that breakpoint so that the program does not stop at that location again. 
Arguments can be one of the following: @c nested table -@table @var -@item n +@table @asis +@item @var{n} Delete breakpoint(s) set at line number @var{n} in the current source file. -@item filename@code{:}n +@item @var{filename}@code{:}@var{n} Delete breakpoint(s) set at line number @var{n} in source file @var{filename}. -@item function +@item @var{function} Delete breakpoint(s) set at entry to function @var{function}. @end table @cindex debugger commands, @code{condition} @cindex @code{condition} debugger command +@cindex breakpoint condition @item @code{condition} @var{n} @code{"@var{expression}"} Add a condition to existing breakpoint or watchpoint @var{n}. The condition is an @command{awk} expression that the debugger evaluates @@ -27292,6 +27727,8 @@ watchpoint is made unconditional. @cindex debugger commands, @code{delete} @cindex @code{delete} debugger command @cindex @code{d} debugger command (alias for @code{delete}) +@cindex delete breakpoint by number +@cindex breakpoint, delete by number @item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] Delete specified breakpoints or a range of breakpoints. Deletes @@ -27299,6 +27736,8 @@ all defined breakpoints if no argument is supplied. @cindex debugger commands, @code{disable} @cindex @code{disable} debugger command +@cindex disable breakpoint +@cindex breakpoint, how to disable or enable @item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}] Disable specified breakpoints or a range of breakpoints. Without any argument, disables all breakpoints. @@ -27307,6 +27746,7 @@ any argument, disables all breakpoints. 
@cindex debugger commands, @code{enable} @cindex @code{enable} debugger command @cindex @code{e} debugger command (alias for @code{enable}) +@cindex enable breakpoint @item @code{enable} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{e} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] Enable specified breakpoints or a range of breakpoints. Without @@ -27326,6 +27766,7 @@ the program stops at the breakpoint. @cindex debugger commands, @code{ignore} @cindex @code{ignore} debugger command +@cindex ignore breakpoint @item @code{ignore} @var{n} @var{count} Ignore breakpoint number @var{n} the next @var{count} times it is hit. @@ -27334,6 +27775,7 @@ hit. @cindex debugger commands, @code{tbreak} @cindex @code{tbreak} debugger command @cindex @code{t} debugger command (alias for @code{tbreak}) +@cindex temporary breakpoint @item @code{tbreak} [[@var{filename}@code{:}]@var{n} | @var{function}] @itemx @code{t} [[@var{filename}@code{:}]@var{n} | @var{function}] Set a temporary breakpoint (enabled for only one stop). @@ -27354,6 +27796,8 @@ execution of the program than we saw in our earlier example: @cindex @code{silent} debugger command @cindex debugger commands, @code{end} @cindex @code{end} debugger command +@cindex breakpoint commands +@cindex commands to execute at breakpoint @item @code{commands} [@var{n}] @itemx @code{silent} @itemx @dots{} @@ -27381,6 +27825,7 @@ gawk> @cindex debugger commands, @code{c} (@code{continue}) @cindex debugger commands, @code{continue} +@cindex continue program, in debugger @item @code{continue} [@var{count}] @itemx @code{c} [@var{count}] Resume program execution. If continued from a breakpoint and @var{count} is @@ -27397,6 +27842,7 @@ Print the returned value. 
@cindex debugger commands, @code{next} @cindex @code{next} debugger command @cindex @code{n} debugger command (alias for @code{next}) +@cindex single-step execution, in the debugger @item @code{next} [@var{count}] @itemx @code{n} [@var{count}] Continue execution to the next source line, stepping over function calls. @@ -27491,6 +27937,7 @@ items on the list. @cindex debugger commands, @code{eval} @cindex @code{eval} debugger command +@cindex evaluate expressions, in debugger @item @code{eval "@var{awk statements}"} Evaluate @var{awk statements} in the context of the running program. You can do anything that an @command{awk} program would do: assign @@ -27508,6 +27955,7 @@ parameters defined by the program. @cindex debugger commands, @code{print} @cindex @code{print} debugger command @cindex @code{p} debugger command (alias for @code{print}) +@cindex print variables, in debugger @item @code{print} @var{var1}[@code{,} @var{var2} @dots{}] @itemx @code{p} @var{var1}[@code{,} @var{var2} @dots{}] Print the value of a @command{gawk} variable or field. @@ -27541,6 +27989,7 @@ No newline is printed unless one is specified. @cindex debugger commands, @code{set} @cindex @code{set} debugger command +@cindex assign values to variables, in debugger @item @code{set} @var{var}@code{=}@var{value} Assign a constant (number or string) value to an @command{awk} variable or field. @@ -27553,6 +28002,7 @@ You can also set special @command{awk} variables, such as @code{FS}, @cindex debugger commands, @code{watch} @cindex @code{watch} debugger command @cindex @code{w} debugger command (alias for @code{watch}) +@cindex set watchpoint @item @code{watch} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] @itemx @code{w} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] Add variable @var{var} (or field @code{$@var{n}}) to the watch list. @@ -27569,12 +28019,14 @@ then the debugger stops execution and prompts for a command. 
Otherwise, @cindex debugger commands, @code{undisplay} @cindex @code{undisplay} debugger command +@cindex stop automatic display, in debugger @item @code{undisplay} [@var{n}] Remove item number @var{n} (or all items, if no argument) from the automatic display list. @cindex debugger commands, @code{unwatch} @cindex @code{unwatch} debugger command +@cindex delete watchpoint @item @code{unwatch} [@var{n}] Remove item number @var{n} (or all items, if no argument) from the watch list. @@ -27595,12 +28047,14 @@ functions which called the one you are in. The commands for doing this are: @cindex debugger commands, @code{backtrace} @cindex @code{backtrace} debugger command @cindex @code{bt} debugger command (alias for @code{backtrace}) +@cindex call stack, display in debugger +@cindex traceback, display in debugger @item @code{backtrace} [@var{count}] @itemx @code{bt} [@var{count}] Print a backtrace of all function calls (stack frames), or innermost @var{count} frames if @var{count} > 0. Print the outermost @var{count} frames if @var{count} < 0. The backtrace displays the name and arguments to each -function, the source @value{FN}, and the line number. +function, the source file name, and the line number. @cindex debugger commands, @code{down} @cindex @code{down} debugger command @@ -27648,25 +28102,32 @@ The value for @var{what} should be one of the following: @c nested table @table @code @item args +@cindex show function arguments, in debugger Arguments of the selected frame. @item break +@cindex show breakpoints List all currently set breakpoints. @item display +@cindex automatic displays, in debugger List all items in the automatic display list. @item frame +@cindex describe call stack frame, in debugger Description of the selected stack frame. @item functions +@cindex list function definitions, in debugger List all function definitions including source file names and line numbers. 
@item locals +@cindex show local variables, in debugger Local variables of the selected frame. @item source +@cindex show name of current source file, in debugger The name of the current source file. Each time the program stops, the current source file is the file containing the current instruction. When the debugger first starts, the current source file is the first file @@ -27675,12 +28136,15 @@ included via the @option{-f} option. The be used at any time to change the current source. @item sources +@cindex show all source files, in debugger List all program sources. @item variables +@cindex list all global variables, in debugger List all global variables. @item watch +@cindex show watchpoints List all items in the watch list. @end table @end table @@ -27694,6 +28158,8 @@ from a file. The commands are: @cindex debugger commands, @code{option} @cindex @code{option} debugger command @cindex @code{o} debugger command (alias for @code{option}) +@cindex display debugger options +@cindex debugger options @item @code{option} [@var{name}[@code{=}@var{value}]] @itemx @code{o} [@var{name}[@code{=}@var{value}]] Without an argument, display the available debugger options @@ -27705,38 +28171,46 @@ The available options are: @c nested table @table @code @item history_size +@cindex debugger history size The maximum number of lines to keep in the history file @file{./.gawk_history}. The default is 100. @item listsize +@cindex debugger default list amount The number of lines that @code{list} prints. The default is 15. @item outfile +@cindex redirect @command{gawk} output, in debugger Send @command{gawk} output to a file; debugger output still goes to standard output. An empty string (@code{""}) resets output to standard output. @item prompt +@cindex debugger prompt The debugger prompt. The default is @samp{@w{gawk> }}. @item save_history @r{[}on @r{|} off@r{]} +@cindex debugger history file Save command history to file @file{./.gawk_history}. The default is @code{on}. 
@item save_options @r{[}on @r{|} off@r{]} +@cindex save debugger options Save current options to file @file{./.gawkrc} upon exit. The default is @code{on}. Options are read back in to the next session upon startup. @item trace @r{[}on @r{|} off@r{]} +@cindex instruction tracing, in debugger Turn instruction tracing on or off. The default is @code{off}. @end table @item @code{save} @var{filename} -Save the commands from the current session to the given @value{FN}, +Save the commands from the current session to the given file name, so that they can be replayed using the @command{source} command. @item @code{source} @var{filename} +@cindex debugger, read commands from a file Run command(s) from a file; an error in any command does not terminate execution of subsequent commands. Comments (lines starting with @samp{#}) are allowed in a command file. @@ -27835,8 +28309,8 @@ about the command @var{command}. @cindex debugger commands, @code{list} @cindex @code{list} debugger command @cindex @code{l} debugger command (alias for @code{list}) -@item @code{list} [@code{-} | @code{+} | @var{n} | @var{filename@code{:}n} | @var{n}--@var{m} | @var{function}] -@itemx @code{l} [@code{-} | @code{+} | @var{n} | @var{filename@code{:}n} | @var{n}--@var{m} | @var{function}] +@item @code{list} [@code{-} | @code{+} | @var{n} | @var{filename}@code{:}@var{n} | @var{n}--@var{m} | @var{function}] +@itemx @code{l} [@code{-} | @code{+} | @var{n} | @var{filename}@code{:}@var{n} | @var{n}--@var{m} | @var{function}] Print the specified lines (default 15) from the current source file or the file named @var{filename}. The possible arguments to @code{list} are as follows: @@ -27856,7 +28330,7 @@ Print lines centered around line number @var{n}. @item @var{n}--@var{m} Print lines from @var{n} to @var{m}. -@item @var{filename@code{:}n} +@item @var{filename}@code{:}@var{n} Print lines centered around line number @var{n} in source file @var{filename}. This command may change the current source file. 
@@ -27869,6 +28343,7 @@ function @var{function}. This command may change the current source file. @cindex debugger commands, @code{quit} @cindex @code{quit} debugger command @cindex @code{q} debugger command (alias for @code{quit}) +@cindex exit the debugger @item @code{quit} @itemx @code{q} Exit the debugger. Debugging is great fun, but sometimes we all have @@ -27892,6 +28367,8 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while @node Readline Support @section Readline Support +@cindex command completion, in debugger +@cindex history expansion, in debugger If @command{gawk} is compiled with the @code{readline} library, you can take advantage of that library's command completion and history expansion @@ -27901,8 +28378,8 @@ features. The following types of completion are available: @item Command completion Command names. -@item Source @value{FN} completion -Source @value{FN}s. Relevant commands are +@item Source file name completion +Source file names. Relevant commands are @code{break}, @code{clear}, @code{list}, @@ -27979,9 +28456,7 @@ be added, and of course feel free to try to add them yourself! @cindex arbitrary precision @cindex multiple precision @cindex infinite precision -@cindex floating-point numbers, arbitrary precision -@cindex MPFR -@cindex GMP +@cindex floating-point, numbers@comma{} arbitrary precision @cindex Knuth, Donald @quotation @@ -27990,11 +28465,11 @@ to believe. Novice computer users solve this problem by implicitly trusting in the computer as an infallible authority; they tend to believe that all digits of a printed answer are significant. Disillusioned computer users have just the opposite approach; they are constantly afraid that their answers -are almost meaningless.}@* -Donald Knuth@footnote{Donald E.@: Knuth. +are almost meaningless.}@footnote{Donald E.@: Knuth. @cite{The Art of Computer Programming}. 
Volume 2, @cite{Seminumerical Algorithms}, third edition, 1998, ISBN 0-201-89683-4, p.@: 229.} +@author Donald Knuth @end quotation This @value{CHAPTER} discusses issues that you may encounter @@ -28132,7 +28607,7 @@ This makes it clear that the full numeric value is different from what the default string representations show. @code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with -at least six significant digits. For some applications, you might want to +at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, most of the time, 17 digits is enough to capture a floating-point number's @@ -28161,7 +28636,7 @@ $ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'} @print{} 0000051580 515.82 @print{} 0000051582 -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @noindent @@ -28325,23 +28800,38 @@ then the answer is @math{2^{53}}. @end iftex @ifnottex +@ifnotdocbook 2^53. +@end ifnotdocbook @end ifnottex +@docbook +2<superscript>53</superscript>. @c +@end docbook The next representable number is the even number @iftex @math{2^{53} + 2}, @end iftex @ifnottex +@ifnotdocbook 2^53 + 2, +@end ifnotdocbook @end ifnottex +@docbook +2<superscript>53</superscript> + 2, @c +@end docbook meaning it is unlikely that you will be able to make @command{gawk} print @iftex @math{2^{53} + 1} @end iftex @ifnottex +@ifnotdocbook 2^53 + 1 +@end ifnotdocbook @end ifnottex +@docbook +2<superscript>53</superscript> + 1 @c +@end docbook in integer format. The range of integers exactly representable by a 64-bit double is @@ -28349,8 +28839,13 @@ is @math{[-2^{53}, 2^{53}]}. @end iftex @ifnottex +@ifnotdocbook [@minus{}2^53, 2^53]. +@end ifnotdocbook @end ifnottex +@docbook +[−2<superscript>53</superscript>, 2<superscript>53</superscript>]. 
@c +@end docbook If you ever see an integer outside this range in @command{awk} using 64-bit doubles, you have reason to be very suspicious about the accuracy of the output. Here is a simple program with erroneous output: @@ -28574,8 +29069,13 @@ number is then @math{s @cdot 2^e}. @end iftex @ifnottex +@ifnotdocbook @var{s * 2^e}. +@end ifnotdocbook @end ifnottex +@docbook +<emphasis>s ⋅ 2<superscript>e</superscript></emphasis>. @c +@end docbook The first bit of a non-zero binary significand is always one, so the significand in an IEEE-754 format only includes the fractional part, leaving the leading one implicit. @@ -28745,6 +29245,8 @@ when you change the rounding mode. @node Gawk and MPFR @section @command{gawk} + MPFR = Powerful Arithmetic +@cindex MPFR +@cindex GMP The rest of this @value{CHAPTER} describes how to use the arbitrary precision (also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric @@ -28757,12 +29259,17 @@ The easiest way to find out is to look at the output of the following command: @example -$ @kbd{gawk --version} -@print{} GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) -@print{} Copyright (C) 1989, 1991-2013 Free Software Foundation. +$ @kbd{./gawk --version} +@print{} GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) +@print{} Copyright (C) 1989, 1991-2014 Free Software Foundation. @dots{} @end example +@noindent +(You may see different version numbers than what's shown here. That's OK; +what's important is to see that GNU MPFR and GNU MP are listed in +the output.) 
+ @command{gawk} uses the @uref{http://www.mpfr.org, GNU MPFR} and @@ -28816,8 +29323,13 @@ numbers are not implemented.} (@math{emax = 2^{30} - 1, emin = -emax}) @end iftex @ifnottex +@ifnotdocbook (@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) +@end ifnotdocbook @end ifnottex +@docbook +(<emphasis>emax</emphasis> = 2<superscript>30</superscript> − 1, <emphasis>emin</emphasis> = −<emphasis>emax</emphasis>) @c +@end docbook for all floating-point contexts. There is no explicit mechanism to adjust the exponent range. MPFR does not implement subnormal numbers by default, @@ -28849,6 +29361,7 @@ your program. @node Setting Precision @subsection Setting the Working Precision @cindex @code{PREC} variable +@cindex setting working precision @command{gawk} uses a global working precision; it does not keep track of the precision or accuracy of individual numbers. Performing an arithmetic @@ -28888,8 +29401,15 @@ formula: @math{prec = 3.322 @cdot dps} @end iftex @ifnottex +@ifnotdocbook @var{prec} = 3.322 * @var{dps} +@end ifnotdocbook @end ifnottex +@docbook +<para> +<emphasis>prec</emphasis> = 3.322 ⋅ <emphasis>dps</emphasis> @c +</para> +@end docbook @noindent Here, @var{prec} denotes the binary precision @@ -28924,6 +29444,7 @@ issues that occur because numbers are stored internally in binary. @node Setting Rounding Mode @subsection Setting the Rounding Mode @cindex @code{ROUNDMODE} variable +@cindex setting rounding mode The @code{ROUNDMODE} variable provides program level control over the rounding mode. @@ -28991,6 +29512,7 @@ In the first case, the number is stored with the default precision of 53 bits. @node Changing Precision @subsection Changing the Precision of a Number +@cindex changing precision of a number @cindex Laurie, Dirk @quotation @@ -29001,11 +29523,10 @@ floating-point format to a precision lower than working precision. 
Do we promote them to full membership of the high-precision club, or do we treat them and all their associates as second-class citizens? Sometimes the first course is proper, sometimes the second, and it takes -careful analysis to tell which.} - -Dirk Laurie@footnote{Dirk Laurie. +careful analysis to tell which.}@footnote{Dirk Laurie. @cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}. Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} +@author Dirk Laurie @end quotation @command{gawk} does not implicitly modify the precision of any previously @@ -29109,7 +29630,8 @@ the problem at hand is often the correct approach in such situations. @node Arbitrary Precision Integers @section Arbitrary Precision Integer Arithmetic with @command{gawk} -@cindex integer, arbitrary precision +@cindex integers, arbitrary precision +@cindex arbitrary precision integers If one of the options @option{--bignum} or @option{-M} is specified, @command{gawk} performs all @@ -29123,8 +29645,13 @@ For example, the following computes @math{5^{4^{3^{2}}}}, @end iftex @ifnottex +@ifnotdocbook 5^4^3^2, +@end ifnotdocbook @end ifnottex +@docbook +5<superscript>4<superscript>3<superscript>2</superscript></superscript></superscript>, @c +@end docbook the result of which is beyond the limits of ordinary @command{gawk} numbers: @@ -29146,9 +29673,16 @@ floating-point values instead, the precision needed for correct output would be @math{3.322 @cdot 183231}, @end iftex @ifnottex +@ifnotdocbook @samp{prec = 3.322 * dps}), would be 3.322 x 183231, +@end ifnotdocbook @end ifnottex +@docbook +<emphasis>prec</emphasis> = 3.322 ⋅ <emphasis>dps</emphasis>), +would be +<emphasis>prec</emphasis> = 3.322 ⋅ 183231, @c +@end docbook or 608693. 
The result from an arithmetic operation with an integer and a floating-point value @@ -29197,7 +29731,7 @@ to begin with: gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' @end example -Note that for the particular example above, there is likely best +Note that for the particular example above, it is likely best to just use the following: @example @@ -29206,6 +29740,7 @@ gawk -M 'BEGIN @{ n = 13; print n % 2 @}' @node Dynamic Extensions @chapter Writing Extensions for @command{gawk} +@cindex dynamically loaded extensions It is possible to add new functions written in C or C++ to @command{gawk} using dynamically loaded libraries. This facility is available on systems @@ -29240,6 +29775,7 @@ When @option{--sandbox} is specified, extensions are disabled @node Extension Intro @section Introduction +@cindex plug-in An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of external compiled code that @command{gawk} can load at runtime to provide additional functionality, over and above the built-in capabilities @@ -29285,8 +29821,14 @@ Communication between @command{gawk} and an extension is two-way. First, when an extension is loaded, it is passed a pointer to a @code{struct} whose fields are function pointers. +@ifnotdocbook This is shown in @ref{load-extension}. +@end ifnotdocbook +@ifdocbook +This is shown in @inlineraw{docbook, <xref linkend="load-extension"/>}. +@end ifdocbook +@ifnotdocbook @float Figure,load-extension @caption{Loading The Extension} @c FIXME: One day, it should not be necessary to have two cases, @@ -29299,13 +29841,27 @@ This is shown in @ref{load-extension}. 
@center @image{api-figure1, , , Loading the extension} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="load-extension"> +<title>Loading the extension</title> +<graphic fileref="api-figure1.eps"/> +</figure> +@end docbook The extension can call functions inside @command{gawk} through these function pointers, at runtime, without needing (link-time) access to @command{gawk}'s symbols. One of these function pointers is to a function for ``registering'' new built-in functions. +@ifnotdocbook This is shown in @ref{load-new-function}. +@end ifnotdocbook +@ifdocbook +This is shown in @inlineraw{docbook, <xref linkend="load-new-function"/>}. +@end ifdocbook +@ifnotdocbook @float Figure,load-new-function @caption{Loading The New Function} @ifinfo @@ -29315,14 +29871,28 @@ This is shown in @ref{load-new-function}. @center @image{api-figure2, , , Loading the new function} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="load-new-function"> +<title>Loading the new function</title> +<graphic fileref="api-figure2.eps"/> +</figure> +@end docbook In the other direction, the extension registers its new functions with @command{gawk} by passing function pointers to the functions that provide the new feature (@code{do_chdir()}, for example). @command{gawk} associates the function pointer with a name and can then call it, using a defined calling convention. +@ifnotdocbook This is shown in @ref{call-new-function}. +@end ifnotdocbook +@ifdocbook +This is shown in @inlineraw{docbook, <xref linkend="call-new-function"/>}. +@end ifdocbook +@ifnotdocbook @float Figure,call-new-function @caption{Calling The New Function} @ifinfo @@ -29332,6 +29902,14 @@ This is shown in @ref{call-new-function}. 
@center @image{api-figure3, , , Calling the new function} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="call-new-function"> +<title>Calling The New Function</title> +<graphic fileref="api-figure3.eps"/> +</figure> +@end docbook The @code{do_@var{xxx}()} function, in turn, then uses the function pointers in the API @code{struct} to do its work, such as updating @@ -29368,6 +29946,7 @@ happen, but we all know how @emph{that} goes.) @node Extension API Description @section API Description +@cindex extension API This (rather large) @value{SECTION} describes the API in detail. @@ -29375,6 +29954,7 @@ This (rather large) @value{SECTION} describes the API in detail. * Extension API Functions Introduction:: Introduction to the API functions. * General Data Types:: The data types. * Requesting Values:: How to get a value. +* Memory Allocation Functions:: Functions for allocating memory. * Constructor Functions:: Functions for creating values. * Registration Functions:: Functions to register things with @command{gawk}. @@ -29430,6 +30010,9 @@ Symbol table access: retrieving a global variable, creating one, or changing one. @item +Allocating, reallocating, and releasing memory. + +@item Creating and releasing cached values; this provides an efficient way to use values for multiple variables and can be a big performance win. 
@@ -29468,10 +30051,8 @@ corresponding standard header file @emph{before} including @file{gawkapi.h}: @item @code{EOF} @tab @code{<stdio.h>} @item @code{FILE} @tab @code{<stdio.h>} @item @code{NULL} @tab @code{<stddef.h>} -@item @code{malloc()} @tab @code{<stdlib.h>} @item @code{memcpy()} @tab @code{<string.h>} @item @code{memset()} @tab @code{<string.h>} -@item @code{realloc()} @tab @code{<stdlib.h>} @item @code{size_t} @tab @code{<sys/types.h>} @item @code{struct stat} @tab @code{<sys/stat.h>} @end multitable @@ -29501,8 +30082,9 @@ does not support this keyword, you should either place All pointers filled in by @command{gawk} are to memory managed by @command{gawk} and should be treated by the extension as read-only. Memory for @emph{all} strings passed into @command{gawk} -from the extension @emph{must} come from @code{malloc()} and is managed -by @command{gawk} from then on. +from the extension @emph{must} come from calling the API-provided function +pointers @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}, +and is managed by @command{gawk} from then on. @item The API defines several simple @code{struct}s that map values as seen @@ -29542,13 +30124,17 @@ the macros as if they were functions. @node General Data Types @subsection General Purpose Data Types +@cindex Robbins, Arnold +@cindex Ramey, Chet @quotation -@i{I have a true love/hate relationship with unions.}@* -Arnold Robbins +@i{I have a true love/hate relationship with unions.} +@author Arnold Robbins +@end quotation +@quotation @i{That's the thing about unions: the compiler will arrange things so they -can accommodate both love and hate.}@* -Chet Ramey +can accommodate both love and hate.} +@author Chet Ramey @end quotation The extension API defines a number of simple types and structures for general @@ -29568,9 +30154,9 @@ certain fields in the API data structures unwritable from extension code, while allowing @command{gawk} to use them as it needs to. 
@item typedef enum awk_bool @{ -@item @ @ @ @ awk_false = 0, -@item @ @ @ @ awk_true -@item @} awk_bool_t; +@itemx @ @ @ @ awk_false = 0, +@itemx @ @ @ @ awk_true +@itemx @} awk_bool_t; A simple boolean type. @item typedef struct awk_string @{ @@ -29580,7 +30166,8 @@ A simple boolean type. This represents a mutable string. @command{gawk} owns the memory pointed to if it supplied the value. Otherwise, it takes ownership of the memory pointed to. -@strong{Such memory must come from @code{malloc()}!} +@strong{Such memory must come from calling the API-provided function +pointers @code{api_malloc()}, @code{api_calloc()}, or @code{api_realloc()}!} As mentioned earlier, strings are maintained using the current multibyte encoding. @@ -29696,7 +30283,94 @@ print an error message, or reissue the request for the actual value type, as appropriate. This behavior is summarized in @ref{table-value-types-returned}. +@c FIXME: Try to do this with spans... +@ifdocbook +@anchor{table-value-types-returned} +@end ifdocbook +@docbook +<informaltable> +<tgroup cols="2"> + <colspec colwidth="50*"/><colspec colwidth="50*"/> + <thead> + <row><entry></entry><entry><para>Type of Actual Value:</para></entry></row> + </thead> + <tbody> + <row><entry></entry><entry></entry></row> + </tbody> +</tgroup> +<tgroup cols="6"> + <colspec colwidth="16.6*"/> + <colspec colwidth="16.6*"/> + <colspec colwidth="19.8*"/> + <colspec colwidth="15*"/> + <colspec colwidth="15*"/> + <colspec colwidth="16.6*"/> + <thead> + <row> + <entry></entry> + <entry></entry> + <entry><para>String</para></entry> + <entry><para>Number</para></entry> + <entry><para>Array</para></entry> + <entry><para>Undefined</para></entry> + </row> + </thead> + <tbody> + <row> + <entry></entry> + <entry><para><emphasis role="bold">String</emphasis></para></entry> + <entry><para>String</para></entry> + <entry><para>String</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry></entry> 
+ <entry><para><emphasis role="bold">Number</emphasis></para></entry> + <entry><para>Number if can be converted, else false</para></entry> + <entry><para>Number</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry><para><emphasis role="bold">Type</emphasis></para></entry> + <entry><para><emphasis role="bold">Array</emphasis></para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>Array</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry><para><emphasis role="bold">Requested:</emphasis></para></entry> + <entry><para><emphasis role="bold">Scalar</emphasis></para></entry> + <entry><para>Scalar</para></entry> + <entry><para>Scalar</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry></entry> + <entry><para><emphasis role="bold">Undefined</emphasis></para></entry> + <entry><para>String</para></entry> + <entry><para>Number</para></entry> + <entry><para>Array</para></entry> + <entry><para>Undefined</para></entry> + </row> + <row> + <entry></entry> + <entry><para><emphasis role="bold">Value Cookie</emphasis></para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para> + </entry><entry><para>false</para></entry> + </row> + </tbody> +</tgroup> +</informaltable> +@end docbook + @ifnotplaintext +@ifnotdocbook @float Table,table-value-types-returned @caption{Value Types Returned} @multitable @columnfractions .50 .50 @@ -29712,6 +30386,7 @@ value type, as appropriate. This behavior is summarized in @item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false @end multitable @end float +@end ifnotdocbook @end ifnotplaintext @ifplaintext @float Table,table-value-types-returned @@ -29742,45 +30417,46 @@ value type, as appropriate. 
This behavior is summarized in @end float @end ifplaintext -@node Constructor Functions -@subsection Constructor Functions and Convenience Macros +@node Memory Allocation Functions +@subsection Memory Allocation Functions and Convenience Macros +@cindex allocating memory for extensions +@cindex extensions, allocating memory -The API provides a number of @dfn{constructor} functions for creating -string and numeric values, as well as a number of convenience macros. -This @value{SUBSECTION} presents them all as function prototypes, in -the way that extension code would use them. +The API provides a number of @dfn{memory allocation} functions for +allocating memory that can be passed to @command{gawk}, as well as a number of +convenience macros. @table @code -@item static inline awk_value_t * -@itemx make_const_string(const char *string, size_t length, awk_value_t *result) -This function creates a string value in the @code{awk_value_t} variable -pointed to by @code{result}. It expects @code{string} to be a C string constant -(or other string data), and automatically creates a @emph{copy} of the data -for storage in @code{result}. It returns @code{result}. +@item void *gawk_malloc(size_t size); +Call @command{gawk}-provided @code{api_malloc()} to allocate storage that may +be passed to @command{gawk}. -@item static inline awk_value_t * -@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) -This function creates a string value in the @code{awk_value_t} variable -pointed to by @code{result}. It expects @code{string} to be a @samp{char *} -value pointing to data previously obtained from @code{malloc()}. The idea here -is that the data is passed directly to @command{gawk}, which assumes -responsibility for it. It returns @code{result}. +@item void *gawk_calloc(size_t nmemb, size_t size); +Call @command{gawk}-provided @code{api_calloc()} to allocate storage that may +be passed to @command{gawk}. 
-@item static inline awk_value_t * -@itemx make_null_string(awk_value_t *result) -This specialized function creates a null string (the ``undefined'' value) -in the @code{awk_value_t} variable pointed to by @code{result}. -It returns @code{result}. +@item void *gawk_realloc(void *ptr, size_t size); +Call @command{gawk}-provided @code{api_realloc()} to allocate storage that may +be passed to @command{gawk}. -@item static inline awk_value_t * -@itemx make_number(double num, awk_value_t *result) -This function simply creates a numeric value in the @code{awk_value_t} variable -pointed to by @code{result}. +@item void gawk_free(void *ptr); +Call @command{gawk}-provided @code{api_free()} to release storage that was +allocated with @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}. @end table -Two convenience macros may be used for allocating storage from @code{malloc()} -and @code{realloc()}. If the allocation fails, they cause @command{gawk} to -exit with a fatal error message. They should be used as if they were +The API has to provide these functions because it is possible +for an extension to be compiled and linked against a different +version of the C library than was used for the @command{gawk} +executable.@footnote{This is more common on MS-Windows systems, but +can happen on Unix-like systems as well.} If @command{gawk} were +to use its version of @code{free()} when the memory came from an +unrelated version of @code{malloc()}, unexpected behavior would +likely result. + +Two convenience macros may be used for allocating storage +from the API-provided function pointers @code{api_malloc()} and +@code{api_realloc()}. If the allocation fails, they cause @command{gawk} +to exit with a fatal error message. They should be used as if they were procedure calls that do not return a value. @table @code @@ -29792,7 +30468,7 @@ The arguments to this macro are as follows: The pointer variable to point at the allocated storage. 
@item type -The type of the pointer variable, used to create a cast for the call to @code{malloc()}. +The type of the pointer variable, used to create a cast for the call to @code{api_malloc()}. @item size The total number of bytes to be allocated. @@ -29816,13 +30492,51 @@ make_malloced_string(message, strlen(message), & result); @end example @item #define erealloc(pointer, type, size, message) @dots{} -This is like @code{emalloc()}, but it calls @code{realloc()}, -instead of @code{malloc()}. +This is like @code{emalloc()}, but it calls @code{api_realloc()}, +instead of @code{api_malloc()}. The arguments are the same as for the @code{emalloc()} macro. @end table +@node Constructor Functions +@subsection Constructor Functions + +The API provides a number of @dfn{constructor} functions for creating +string and numeric values, as well as a number of convenience macros. +This @value{SUBSECTION} presents them all as function prototypes, in +the way that extension code would use them. + +@table @code +@item static inline awk_value_t * +@itemx make_const_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a C string constant +(or other string data), and automatically creates a @emph{copy} of the data +for storage in @code{result}. It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a @samp{char *} +value pointing to data previously obtained from the api-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}. The idea here +is that the data is passed directly to @command{gawk}, which assumes +responsibility for it. It returns @code{result}. 
+ +@item static inline awk_value_t * +@itemx make_null_string(awk_value_t *result) +This specialized function creates a null string (the ``undefined'' value) +in the @code{awk_value_t} variable pointed to by @code{result}. +It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_number(double num, awk_value_t *result) +This function simply creates a numeric value in the @code{awk_value_t} variable +pointed to by @code{result}. +@end table + @node Registration Functions @subsection Registration Functions +@cindex register extension +@cindex extension registration This @value{SECTION} describes the API functions for registering parts of your extension with @command{gawk}. @@ -29867,8 +30581,8 @@ Letter case in function names is significant. This is a pointer to the C function that provides the desired functionality. The function must fill in the result with either a number -or a string. @command{awk} takes ownership of any string memory. -As mentioned earlier, string memory @strong{must} come from @code{malloc()}. +or a string. @command{gawk} takes ownership of any string memory. +As mentioned earlier, string memory @strong{must} come from the api-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}. The @code{num_actual_args} argument tells the C function how many actual parameters were passed from the calling @command{awk} code. @@ -29944,6 +30658,7 @@ is invoked with the @option{--version} option. @node Input Parsers @subsubsection Customized Input Parsers +@cindex customized input parser By default, @command{gawk} reads text files as its input. It uses the value of @code{RS} to find the end of the record, and then uses @code{FS} @@ -30191,7 +30906,9 @@ Register the input parser pointed to by @code{input_parser} with @node Output Wrappers @subsubsection Customized Output Wrappers +@cindex customized output wrapper +@cindex output wrapper An @dfn{output wrapper} is the mirror image of an input parser. 
It allows an extension to take over the output to a file opened with the @samp{>} or @samp{>>} I/O redirection operators (@pxref{Redirection}). @@ -30305,6 +31022,7 @@ Register the output wrapper pointed to by @code{output_wrapper} with @node Two-way processors @subsubsection Customized Two-way Processors +@cindex customized two-way processor A @dfn{two-way processor} combines an input parser and an output wrapper for two-way I/O with the @samp{|&} operator (@pxref{Redirection}). It makes identical @@ -30362,6 +31080,8 @@ Register the two-way processor pointed to by @code{two_way_processor} with @node Printing Messages @subsection Printing Messages +@cindex printing messages from extensions +@cindex messages from extensions You can print different kinds of warning messages from your extension, as described below. Note that for these functions, @@ -30435,6 +31155,7 @@ for more information on creating arrays. @node Symbol Table Access @subsection Symbol Table Access +@cindex accessing global variables from extensions Two sets of routines provide access to global variables, and one set allows you to create and release cached values. @@ -30480,6 +31201,13 @@ An extension can look up the value of @command{gawk}'s special variables. However, with the exception of the @code{PROCINFO} array, an extension cannot change any of those variables. +@quotation NOTE +It is possible for the lookup of @code{PROCINFO} to fail. This happens if +the @command{awk} program being run does not reference @code{PROCINFO}; +in this case @command{gawk} doesn't bother to create the array and +populate it. +@end quotation + @node Symbol table by cookie @subsubsection Variable Access and Update by Cookie @@ -30606,7 +31334,7 @@ assign those values to variables using @code{sym_update()} or @code{sym_update_scalar()}, as you like. However, you can understand the point of cached values if you remember that -@emph{every} string value's storage @emph{must} come from @code{malloc()}. 
+@emph{every} string value's storage @emph{must} come from @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}. If you have 20 variables, all of which have the same string value, you must create 20 identical copies of the string.@footnote{Numeric values are clearly less problematic, requiring only a C @code{double} to store.} @@ -30692,6 +31420,7 @@ you should release any cached values that you created, using @node Array Manipulation @subsection Array Manipulation +@cindex array manipulation in extensions The primary data structure@footnote{Okay, the only data structure.} in @command{awk} is the associative array (@pxref{Arrays}). @@ -30803,7 +31532,7 @@ requires that you understand how such values are converted to strings (@pxref{Conversion}); thus using integral values is safest. As with @emph{all} strings passed into @code{gawk} from an extension, -the string value of @code{index} must come from @code{malloc()}, and +the string value of @code{index} must come from the API-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()} and @command{gawk} releases the storage. @item awk_bool_t set_array_element(awk_array_t a_cookie, @@ -31271,6 +32000,8 @@ information about how @command{gawk} was invoked. @node Extension Versioning @subsubsection API Version Constants and Variables +@cindex API version +@cindex extension API version The API provides both a ``major'' and a ``minor'' version number. The API versions are available at compile time as constants: @@ -31324,6 +32055,8 @@ provided in @file{gawkapi.h} (discussed later, in @node Extension API Informational Variables @subsubsection Informational Variables +@cindex API informational variables +@cindex extension API informational variables The API provides access to several variables that describe whether the corresponding command-line options were enabled when @@ -31469,6 +32202,8 @@ the version string with @command{gawk}. 
@node Finding Extensions @section How @command{gawk} Finds Extensions +@cindex extension search path +@cindex finding extensions Compiled extensions have to be installed in a directory where @command{gawk} can find them. If @command{gawk} is configured and @@ -31479,10 +32214,11 @@ path with a list of directories to search for compiled extensions. @node Extension Example @section Example: Some File Functions +@cindex extension example @quotation -@i{No matter where you go, there you are.} @* -Buckaroo Bonzai +@i{No matter where you go, there you are.} +@author Buckaroo Bonzai @end quotation @c It's enough to show chdir and stat, no need for fts @@ -31937,7 +32673,7 @@ do_stat(int nargs, awk_value_t *result) awk_array_t array; int ret; struct stat sbuf; - /* default is stat() */ + /* default is lstat() */ int (*statfunc)(const char *path, struct stat *sbuf) = lstat; assert(result != NULL); @@ -32123,6 +32859,7 @@ $ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk} @node Extension Samples @section The Sample Extensions In The @command{gawk} Distribution +@cindex extensions distributed with @command{gawk} This @value{SECTION} provides brief overviews of the sample extensions that come in the @command{gawk} distribution. Some of them are intended @@ -32157,15 +32894,15 @@ The usage is: @item @@load "filefuncs" This is how you load the extension. -@cindex @code{chdir} extension function +@cindex @code{chdir()} extension function @item result = chdir("/some/directory") The @code{chdir()} function is a direct hook to the @code{chdir()} system call to change the current directory. It returns zero upon success or less than zero upon error. In the latter case it updates @code{ERRNO}. -@cindex @code{stat} extension function -@item result = stat("/some/path", statdata [, follow]) +@cindex @code{stat()} extension function +@item result = stat("/some/path", statdata @r{[}, follow@r{]}) The @code{stat()} function provides a hook into the @code{stat()} system call. 
It returns zero upon success or less than zero upon error. @@ -32254,7 +32991,7 @@ or Not all systems support all file types. @end multitable -@cindex @code{fts} extension function +@cindex @code{fts()} extension function @item flags = or(FTS_PHYSICAL, ...) @itemx result = fts(pathlist, flags, filedata) Walk the file trees provided in @code{pathlist} and fill in the @@ -32265,7 +33002,7 @@ Return zero if there were no errors, otherwise return @minus{}1. The @code{fts()} function provides a hook to the C library @code{fts()} routines for traversing file hierarchies. Instead of returning data -about one file at a time in a stream, it fills in a multi-dimensional +about one file at a time in a stream, it fills in a multidimensional array with data about each file and directory encountered in the requested hierarchies. @@ -32366,7 +33103,7 @@ be more comfortable to use from an @command{awk} program. This includes the lack of a comparison function, since @command{gawk} already provides powerful array sorting facilities. While an @code{fts_read()}-like interface could have been provided, this felt less natural than simply -creating a multi-dimensional array to represent the file hierarchy and +creating a multidimensional array to represent the file hierarchy and its information. @end quotation @@ -32375,19 +33112,23 @@ See @file{test/fts.awk} in the @command{gawk} distribution for an example. @node Extension Sample Fnmatch @subsection Interface To @code{fnmatch()} -@cindex @code{fnmatch} extension function This extension provides an interface to the C library @code{fnmatch()} function. The usage is: -@example -@@load "fnmatch" +@table @code +@item @@load "fnmatch" +This is how you load the extension. 
-result = fnmatch(pattern, string, flags) -@end example +@cindex @code{fnmatch()} extension function +@item result = fnmatch(pattern, string, flags) +The return value is zero on success, @code{FNM_NOMATCH} +if the string did not match the pattern, or +a different non-zero value if an error occurred. +@end table -The @code{fnmatch} extension adds a single function named -@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of -flag values named @code{FNM}. +Besides the @code{fnmatch()} function, the @code{fnmatch} extension +adds one constant (@code{FNM_NOMATCH}), and an array of flag values +named @code{FNM}. The arguments to @code{fnmatch()} are: @@ -32403,10 +33144,6 @@ Either zero, or the bitwise OR of one or more of the flags in the @code{FNM} array. @end table -The return value is zero on success, @code{FNM_NOMATCH} -if the string did not match the pattern, or -a different non-zero value if an error occurred. - The flags are follows: @multitable @columnfractions .25 .75 @@ -32448,21 +33185,21 @@ The @code{fork} extension adds three functions, as follows. @item @@load "fork" This is how you load the extension. -@cindex @code{fork} extension function +@cindex @code{fork()} extension function @item pid = fork() -This function creates a new process. The return value is the zero in the -child and the process-id number of the child in the parent, or @minus{}1 +This function creates a new process. The return value is zero in the +child and the process-ID number of the child in the parent, or @minus{}1 upon error. In the latter case, @code{ERRNO} indicates the problem. In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are updated to reflect the correct values. -@cindex @code{waitpid} extension function +@cindex @code{waitpid()} extension function @item ret = waitpid(pid) -This function takes a numeric argument, which is the process-id to +This function takes a numeric argument, which is the process-ID to wait for. 
The return value is that of the @code{waitpid()} system call. -@cindex @code{wait} extension function +@cindex @code{wait()} extension function @item ret = wait() This function waits for the first child to die. The return value is that of the @@ -32549,11 +33286,11 @@ The @code{ordchr} extension adds two functions, named @item @@load "ordchr" This is how you load the extension. -@cindex @code{ord} extension function +@cindex @code{ord()} extension function @item number = ord(string) Return the numeric value of the first character in @code{string}. -@cindex @code{chr} extension function +@cindex @code{chr()} extension function @item char = chr(number) Return a string whose first character is that represented by @code{number}. @end table @@ -32670,14 +33407,14 @@ The @code{rwarray} extension adds two functions, named @code{writea()} and @code{reada()}, as follows: @table @code -@cindex @code{writea} extension function +@cindex @code{writea()} extension function @item ret = writea(file, array) This function takes a string argument, which is the name of the file to which dump the array, and the array itself as the second argument. @code{writea()} understands multidimensional arrays. It returns one on success, or zero upon failure. -@cindex @code{reada} extension function +@cindex @code{reada()} extension function @item ret = reada(file, array) @code{reada()} is the inverse of @code{writea()}; it reads the file named as its first argument, filling in @@ -32714,17 +33451,23 @@ ret = reada("arraydump.bin", array) @subsection Reading An Entire File The @code{readfile} extension adds a single function -named @code{readfile()}: +named @code{readfile()}, and an input parser: @table @code @item @@load "readfile" This is how you load the extension. -@cindex @code{readfile} extension function +@cindex @code{readfile()} extension function @item result = readfile("/some/path") The argument is the name of the file to read. 
The return value is a string containing the entire contents of the requested file. Upon error, the function returns the empty string and sets @code{ERRNO}. + +@item BEGIN @{ PROCINFO["readfile"] = 1 @} +In addition, the extension adds an input parser that is activated if +@code{PROCINFO["readfile"]} exists. +When activated, each input file is returned in its entirety as @code{$0}. +@code{RT} is set to the null string. @end table Here is an example: @@ -32761,7 +33504,7 @@ inserting @samp{@@load "time"} in your script. @item @@load "time" This is how you load the extension. -@cindex @code{gettimeofday} extension function +@cindex @code{gettimeofday()} extension function @item the_time = gettimeofday() Return the time in seconds that has elapsed since 1970-01-01 UTC as a floating point value. If the time is unavailable on this platform, return @@ -32771,7 +33514,7 @@ If the standard C @code{gettimeofday()} system call is available on this platform, then it simply returns the value. Otherwise, if on Windows, it tries to use @code{GetSystemTimeAsFileTime()}. -@cindex @code{sleep} extension function +@cindex @code{sleep()} extension function @item result = sleep(@var{seconds}) Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative, or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}. @@ -32783,6 +33526,8 @@ tries to use @code{nanosleep()} or @code{select()} to implement the delay. @node gawkextlib @section The @code{gawkextlib} Project +@cindex @code{gawkextlib} +@cindex extensions, where to find @cindex @code{gawkextlib} project The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}} @@ -32790,7 +33535,7 @@ project provides a number of @command{gawk} extensions, including one for processing XML files. This is the evolution of the original @command{xgawk} (XML @command{gawk}) project. 
-As of this writing, there are four extensions: +As of this writing, there are five extensions: @itemize @bullet @item @@ -32798,6 +33543,9 @@ XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} XML parsing library. @item +PDF extension. + +@item PostgreSQL extension. @item @@ -32813,6 +33561,7 @@ The @code{time} extension described earlier (@pxref{Extension Sample Time}) was originally from this project but has been moved in to the main @command{gawk} distribution. +@cindex @command{git} utility You can check out the code for the @code{gawkextlib} project using the @uref{http://git-scm.com, GIT} distributed source code control system. The command is as follows: @@ -32927,6 +33676,7 @@ of the @value{DOCUMENT} where you can find more information. @command{awk}. * POSIX/GNU:: The extensions in @command{gawk} not in POSIX @command{awk}. +* Feature History:: The history of the features in @command{gawk}. * Common Extensions:: Common Extensions Summary. * Ranges and Locales:: How locales used to affect regexp ranges. * Contributors:: The major contributors to @command{gawk}. @@ -33024,7 +33774,7 @@ Multiple @code{BEGIN} and @code{END} rules @item Multidimensional arrays -(@pxref{Multi-dimensional}). +(@pxref{Multidimensional}). @end itemize @c ENDOFRANGE gawkv1 @@ -33231,7 +33981,7 @@ Special files in I/O redirections: @itemize @minus{} @item The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and -@file{/dev/fd/@var{N}} special @value{FN}s +@file{/dev/fd/@var{N}} special file names (@pxref{Special Files}). @item @@ -33455,7 +34205,7 @@ long options @item Support for the following obsolete systems was removed from the code -and the documentation for @command{gawk} @value{PVERSION} 4.0: +and the documentation for @command{gawk} version 4.0: @c nested table @itemize @minus @@ -33492,6 +34242,9 @@ Tandem (non-POSIX) @item Prestandard VAX C compiler for VAX/VMS +@item +GCC for VAX and Alpha has not been tested for a while. 
+ @end itemize @end itemize @@ -33502,6 +34255,612 @@ Prestandard VAX C compiler for VAX/VMS @c ENDOFRANGE exgnot @c ENDOFRANGE posnot +@node Feature History +@appendixsec History of @command{gawk} Features + +@ignore +See the thread: +https://groups.google.com/forum/#!topic/comp.lang.awk/SAUiRuff30c +This motivated me to add this section. +@end ignore + +@ignore +I've tried to follow this general order, esp.@: for the 3.0 and 3.1 sections: + variables + special files + language changes (e.g., hex constants) + differences in standard awk functions + new gawk functions + new keywords + new command-line options + behavioral changes + new ports +Within each category, be alphabetical. +@end ignore + +This @value{SECTION} describes the features in @command{gawk} +over and above those in POSIX @command{awk}, +in the order they were added to @command{gawk}. + +Version 2.10 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +The @env{AWKPATH} environment variable for specifying a path search for +the @option{-f} command-line option +(@pxref{Options}). + +@item +The @code{IGNORECASE} variable and its effects +(@pxref{Case-sensitivity}). + +@item +The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and +@file{/dev/fd/@var{N}} special file names +(@pxref{Special Files}). +@end itemize + +Version 2.13 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +The @code{FIELDWIDTHS} variable and its effects +(@pxref{Constant Size}). + +@item +The @code{systime()} and @code{strftime()} built-in functions for obtaining +and printing timestamps +(@pxref{Time Functions}). + +@item +Additional command-line options +(@pxref{Options}): + +@itemize @minus +@item +The @option{-W lint} option to provide error and portability checking +for both the source code and at runtime. + +@item +The @option{-W compat} option to turn off the GNU extensions. + +@item +The @option{-W posix} option for full POSIX compliance. 
+@end itemize +@end itemize + +Version 2.14 of @command{gawk} introduced the following feature: + +@itemize @bullet +@item +The @code{next file} statement for skipping to the next data file +(@pxref{Nextfile Statement}). +@end itemize + +Version 2.15 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +New variables (@pxref{Built-in Variables}): + +@itemize @minus +@item +@code{ARGIND}, which tracks the movement of @code{FILENAME} +through @code{ARGV}. + +@item +@code{ERRNO}, which contains the system error message when +@code{getline} returns @minus{}1 or @code{close()} fails. +@end itemize + +@item +The @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and +@file{/dev/user} special file names. These have since been removed. + +@item +The ability to delete all of an array at once with @samp{delete @var{array}} +(@pxref{Delete}). + +@item +Command line option changes +(@pxref{Options}): + +@itemize @minus +@item +The ability to use GNU-style long-named options that start with @option{--}. + +@item +The @option{--source} option for mixing command-line and library-file +source code. +@end itemize +@end itemize + +Version 3.0 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +New or changed variables: + +@itemize @minus +@item +@code{IGNORECASE} changed, now applying to string comparison as well +as regexp operations +(@pxref{Case-sensitivity}). + +@item +@code{RT}, which contains the input text that matched @code{RS} +(@pxref{Records}). +@end itemize + +@item +Full support for both POSIX and GNU regexps +(@pxref{Regexp}). + +@item +The @code{gensub()} function for more powerful text manipulation +(@pxref{String Functions}). + +@item +The @code{strftime()} function acquired a default time format, +allowing it to be called with no arguments +(@pxref{Time Functions}). + +@item +The ability for @code{FS} and for the third +argument to @code{split()} to be null strings +(@pxref{Single Character Fields}). 
+ +@item +The ability for @code{RS} to be a regexp +(@pxref{Records}). + +@item +The @code{next file} statement became @code{nextfile} +(@pxref{Nextfile Statement}). + +@item +The @code{fflush()} function from the +Bell Laboratories research version of @command{awk} +(@pxref{I/O Functions}). + +@item +New command line options: + +@itemize @minus +@item +The @option{--lint-old} option to +warn about constructs that are not available in +the original Version 7 Unix version of @command{awk} +(@pxref{V7/SVR3.1}). + +@item +The @option{-m} option from the +Bell Laboratories research version of @command{awk}. +This was later removed. + +@item +The @option{--re-interval} option to provide interval expressions in regexps +(@pxref{Regexp Operators}). + +@item +The @option{--traditional} option was added as a better name for +@option{--compat} (@pxref{Options}). +@end itemize + +@item +The use of GNU Autoconf to control the configuration process +(@pxref{Quick Installation}). + +@item +Amiga support. + +@end itemize + +Version 3.1 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +New variables +(@pxref{Built-in Variables}): + +@itemize @minus +@item +@code{BINMODE}, for non-POSIX systems, +which allows binary I/O for input and/or output files +(@pxref{PC Using}). + +@item +@code{LINT}, which dynamically controls lint warnings. + +@item +@code{PROCINFO}, an array for providing process-related information. + +@item +@code{TEXTDOMAIN}, for setting an application's internationalization text domain +(@pxref{Internationalization}). +@end itemize + +@item +The ability to use octal and hexadecimal constants in @command{awk} +program source code +(@pxref{Nondecimal-numbers}). + +@item +The @samp{|&} operator for two-way I/O to a coprocess +(@pxref{Two-way I/O}). + +@item +The @file{/inet} special files for TCP/IP networking using @samp{|&} +(@pxref{TCP/IP Networking}). 
+ +@item +The optional second argument to @code{close()} that allows closing one end +of a two-way pipe to a coprocess +(@pxref{Two-way I/O}). + +@item +The optional third argument to the @code{match()} function +for capturing text-matching subexpressions within a regexp +(@pxref{String Functions}). + +@item +Positional specifiers in @code{printf} formats for +making translations easier +(@pxref{Printf Ordering}). + +@item +A number of new built-in functions: + +@itemize @minus +@item +The @code{asort()} and @code{asorti()} functions for sorting arrays +(@pxref{Array Sorting}). + +@item +The @code{bindtextdomain()}, @code{dcgettext()} and @code{dcngettext()} functions +for internationalization +(@pxref{Programmer i18n}). + +@item +The @code{extension()} function and the ability to add +new built-in functions dynamically +(@pxref{Dynamic Extensions}). + +@item +The @code{mktime()} function for creating timestamps +(@pxref{Time Functions}). + +@item +The @code{and()}, @code{or()}, @code{xor()}, @code{compl()}, +@code{lshift()}, @code{rshift()}, and @code{strtonum()} functions +(@pxref{Bitwise Functions}). +@end itemize + +@item +@cindex @code{next file} statement +The support for @samp{next file} as two words was removed completely +(@pxref{Nextfile Statement}). + +@item +Additional command line options +(@pxref{Options}): + +@itemize @minus +@item +The @option{--dump-variables} option to print a list of all global variables. + +@item +The @option{--exec} option, for use in CGI scripts. + +@item +The @option{--gen-po} command-line option and the use of a leading +underscore to mark strings that should be translated +(@pxref{String Extraction}). + +@item +The @option{--non-decimal-data} option to allow non-decimal +input data +(@pxref{Nondecimal Data}). + +@item +The @option{--profile} option and @command{pgawk}, the +profiling version of @command{gawk}, for producing execution +profiles of @command{awk} programs +(@pxref{Profiling}). 
+ +@item +The @option{--use-lc-numeric} option to force @command{gawk} +to use the locale's decimal point for parsing input data +(@pxref{Conversion}). +@end itemize + +@item +The use of GNU Automake to help in standardizing the configuration process +(@pxref{Quick Installation}). + +@item +The use of GNU @code{gettext} for @command{gawk}'s own message output +(@pxref{Gawk I18N}). + +@item +BeOS support. This was later removed. + +@item +Tandem support. This was later removed. + +@item +The Atari port became officially unsupported. + +@item +The source code changed to use ISO C standard-style function definitions. + +@item +POSIX compliance for @code{sub()} and @code{gsub()} +(@pxref{Gory Details}). + +@item +The @code{length()} function was extended to accept an array argument +and return the number of elements in the array +(@pxref{String Functions}). + +@item +The @code{strftime()} function acquired a third argument to +enable printing times as UTC +(@pxref{Time Functions}). +@end itemize + +Version 4.0 of @command{gawk} introduced the following features: + +@itemize @bullet + +@item +Variable additions: + +@itemize @minus +@item +@code{FPAT}, which allows you to specify a regexp that matches +the fields, instead of matching the field separator +(@pxref{Splitting By Content}). + +@item +If @code{PROCINFO["sorted_in"]} exists, @samp{for(iggy in foo)} loops sort the +indices before looping over them. The value of this element +provides control over how the indices are sorted before the loop +traversal starts +(@pxref{Controlling Scanning}). + +@item +@code{PROCINFO["strftime"]}, which holds +the default format for @code{strftime()} +(@pxref{Time Functions}). +@end itemize + +@item +The special files @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid} +and @file{/dev/user} were removed. + +@item +Support for IPv6 was added via the @file{/inet6} special file. 
+@file{/inet4} forces IPv4 and @file{/inet} chooses the system +default, which is probably IPv4 +(@pxref{TCP/IP Networking}). + +@item +The use of @samp{\s} and @samp{\S} escape sequences in regular expressions +(@pxref{GNU Regexp Operators}). + +@item +Interval expressions became part of default regular expressions +(@pxref{Regexp Operators}). + +@item +POSIX character classes work even with @option{--traditional} +(@pxref{Regexp Operators}). + +@item +@code{break} and @code{continue} became invalid outside a loop, +even with @option{--traditional} +(@pxref{Break Statement}, and also see +@ref{Continue Statement}). + +@item +@code{fflush()}, @code{nextfile}, and @samp{delete @var{array}} +are allowed if @option{--posix} or @option{--traditional}, since they +are all now part of POSIX. + +@item +An optional third argument to +@code{asort()} and @code{asorti()}, specifying how to sort +(@pxref{String Functions}). + +@item +The behavior of @code{fflush()} changed to match Brian Kernighan's @command{awk} +and for POSIX; now both @samp{fflush()} and @samp{fflush("")} +flush all open output redirections +(@pxref{I/O Functions}). + +@item +The @code{isarray()} +function which distinguishes if an item is an array +or not, to make it possible to traverse multidimensional arrays +(@pxref{Type Functions}). + +@item +The @code{patsplit()} +function which gives the same capability as @code{FPAT}, for splitting +(@pxref{String Functions}). + +@item +An optional fourth argument to the @code{split()} function, +which is an array to hold the values of the separators +(@pxref{String Functions}). + +@item +Arrays of arrays +(@pxref{Arrays of Arrays}). + +@item +The @code{BEGINFILE} and @code{ENDFILE} special patterns +(@pxref{BEGINFILE/ENDFILE}). + +@item +Indirect function calls +(@pxref{Indirect Calls}). + +@item +@code{switch} / @code{case} are enabled by default +(@pxref{Switch Statement}). 
+ +@item +Command line option changes +(@pxref{Options}): + +@itemize @minus +@item +The @option{-b} and @option{--characters-as-bytes} options +which prevent @command{gawk} from treating input as a multibyte string. + +@item +The redundant @option{--compat}, @option{--copyleft}, and @option{--usage} +long options were removed. + +@item +The @option{--gen-po} option was finally renamed to the correct @option{--gen-pot}. + +@item +The @option{--sandbox} option which disables certain features. + +@item +All long options acquired corresponding short options, for use in @samp{#!} scripts. +@end itemize + +@item +Directories named on the command line now produce a warning, not a fatal +error, unless @option{--posix} or @option{--traditional} are used +(@pxref{Command line directories}). + +@item +The @command{gawk} internals were rewritten, bringing the @command{dgawk} +debugger and possibly improved performance +(@pxref{Debugger}). + +@item +Per the GNU Coding Standards, dynamic extensions must now define +a global symbol indicating that they are GPL-compatible +(@pxref{Plugin License}). + +@item +In POSIX mode, string comparisons use @code{strcoll()} / @code{wcscoll()} +(@pxref{POSIX String Comparison}). + +@item +The option for raw sockets was removed, since it was never implemented +(@pxref{TCP/IP Networking}). + +@item +Ranges of the form @samp{[d-h]} are treated as if they were in the +C locale, no matter what kind of regexp is being used, and even if +@option{--posix} +(@pxref{Ranges and Locales}). 
+ +@item +Support was removed for the following systems: + +@itemize @minus +@item +Atari + +@item +Amiga + +@item +BeOS + +@item +Cray + +@item +MIPS RiscOS + +@item +MS-DOS with Microsoft Compiler + +@item +MS-Windows with Microsoft Compiler + +@item +NeXT + +@item +SunOS 3.x, Sun 386 (Road Runner) + +@item +Tandem (non-POSIX) + +@item +Prestandard VAX C compiler for VAX/VMS +@end itemize +@end itemize + +Version 4.1 of @command{gawk} introduced the following features: + +@itemize @bullet + +@item +Three new arrays: +@code{SYMTAB}, @code{FUNCTAB}, and @code{PROCINFO["identifiers"]} +(@pxref{Auto-set}). + +@item +The three executables @command{gawk}, @command{pgawk}, and @command{dgawk} were merged into +one, named just @command{gawk}. As a result the command line options changed. + +@item +Command line option changes +(@pxref{Options}): + +@itemize @minus +@item +The @option{-D} option invokes the debugger. + +@item +The @option{-i} and @option{--include} options +load @command{awk} library files. + +@item +The @option{-l} and @option{--load} options load compiled dynamic extensions. + +@item +The @option{-M} and @option{--bignum} options enable MPFR. + +@item +The @option{-o} option only does pretty-printing. + +@item +The @option{-p} option is used for profiling. + +@item +The @option{-R} option was removed. +@end itemize + +@item +Support for high precision arithmetic with MPFR +(@pxref{Gawk and MPFR}). + +@item +The @code{and()}, @code{or()} and @code{xor()} functions +changed to allow any number of arguments, +with a minimum of two +(@pxref{Bitwise Functions}). + +@item +The dynamic extension interface was completely redone +(@pxref{Dynamic Extensions}). 
+ +@end itemize + +@c XXX ADD MORE STUFF HERE + @node Common Extensions @appendixsec Common Extensions Summary @@ -33515,18 +34874,18 @@ the three most widely-used freely available versions of @command{awk} @multitable {@file{/dev/stderr} special file} {BWK Awk} {Mawk} {GNU Awk} @headitem Feature @tab BWK Awk @tab Mawk @tab GNU Awk @item @samp{\x} Escape sequence @tab X @tab X @tab X -@item @code{RS} as regexp @tab @tab X @tab X @item @code{FS} as null string @tab X @tab X @tab X -@item @file{/dev/stdin} special file @tab X @tab @tab X +@item @file{/dev/stdin} special file @tab X @tab X @tab X @item @file{/dev/stdout} special file @tab X @tab X @tab X @item @file{/dev/stderr} special file @tab X @tab X @tab X -@item @code{**} and @code{**=} operators @tab X @tab @tab X +@item @code{delete} without subscript @tab X @tab X @tab X @item @code{fflush()} function @tab X @tab X @tab X -@item @code{func} keyword @tab X @tab @tab X +@item @code{length()} of an array @tab X @tab X @tab X @item @code{nextfile} statement @tab X @tab X @tab X -@item @code{delete} without subscript @tab X @tab X @tab X -@item @code{length()} of an array @tab X @tab @tab X +@item @code{**} and @code{**=} operators @tab X @tab @tab X +@item @code{func} keyword @tab X @tab @tab X @item @code{BINMODE} variable @tab @tab X @tab X +@item @code{RS} as regexp @tab @tab X @tab X @item Time related functions @tab @tab X @tab X @end multitable @@ -33546,7 +34905,7 @@ character ranges (such as @samp{[a-z]}) to match any character between the first character in the range and the last character in the range, inclusive. Ordering was based on the numeric value of each character in the machine's native character set. Thus, on ASCII-based systems, -@code{[a-z]} matched all the lowercase letters, and only the lowercase +@samp{[a-z]} matched all the lowercase letters, and only the lowercase letters, since the numeric values for the letters from @samp{a} through @samp{z} were contiguous. 
(On an EBCDIC system, the range @samp{[a-z]} includes additional, non-alphabetic characters as well.) @@ -33557,7 +34916,7 @@ as working in this fashion, and in particular, would teach that the that @samp{[A-Z]} was the ``correct'' way to match uppercase letters. And indeed, this was true.@footnote{And Life was good.} -The 1993 POSIX standard introduced the idea of locales (@pxref{Locales}). +The 1992 POSIX standard introduced the idea of locales (@pxref{Locales}). Since many locales include other letters besides the plain twenty-six letters of the American English alphabet, the POSIX standard added character classes (@pxref{Bracket Expressions}) as a way to match @@ -33596,6 +34955,7 @@ This output is unexpected, since the @samp{bc} at the end of This result is due to the locale setting (and thus you may not see it on your system). +@cindex Unicode Similar considerations apply to other ranges. For example, @samp{["-/]} is perfectly valid in ASCII, but is not valid in many Unicode locales, such as @samp{en_US.UTF-8}. @@ -33607,18 +34967,19 @@ When @command{gawk} switched to using locale-aware regexp matchers, the problems began; especially as both GNU/Linux and commercial Unix vendors started implementing non-ASCII locales, @emph{and making them the default}. Perhaps the most frequently asked question became something -like ``why does @code{[A-Z]} match lowercase letters?!?'' +like ``why does @samp{[A-Z]} match lowercase letters?!?'' +@cindex Berry, Karl This situation existed for close to 10 years, if not more, and the @command{gawk} maintainer grew weary of trying to explain that @command{gawk} was being nicely standards-compliant, and that the issue was in the user's locale. During the development of version 4.0, he modified @command{gawk} to always treat ranges in the original, pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).@footnote{And -thus was born the Campain for Rational Range Interpretation (or RRI). 
A number -of GNU tools, such as @command{grep} and @command{sed}, have either -implemented this change, or will soon. Thanks to Karl Berry for coining the phrase -``Rational Range Interpretation.''} +thus was born the Campaign for Rational Range Interpretation (or +RRI). A number of GNU tools have either implemented this change, +or will soon. Thanks to Karl Berry for coining the phrase ``Rational +Range Interpretation.''} Fortunately, shortly before the final release of @command{gawk} 4.0, the maintainer learned that the 2008 standard had changed the @@ -33631,15 +34992,15 @@ and By using this lovely technical term, the standard gives license to implementors to implement ranges in whatever way they choose. The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all -cases: the default regexp matching; with @option{--traditional}, and with +cases: the default regexp matching; with @option{--traditional} and with @option{--posix}; in all cases, @command{gawk} remains POSIX compliant. @node Contributors @appendixsec Major Contributors to @command{gawk} @cindex @command{gawk}, list of contributors to @quotation -@i{Always give credit where credit is due.}@* -Anonymous +@i{Always give credit where credit is due.} +@author Anonymous @end quotation This @value{SECTION} names the major contributors to @command{gawk} @@ -33827,6 +35188,15 @@ environments. (This is no longer supported) @item +@cindex Wallin, Anders +Anders Wallin helped keep the VMS port going for several years. + +@item +@cindex Gordon, Assaf +Assaf Gordon contributed the code to implement the +@option{--sandbox} option. + +@item @cindex Haque, John John Haque made the following contributions: @@ -33836,6 +35206,10 @@ The modifications to convert @command{gawk} into a byte-code interpreter, including the debugger. @item +The addition of true multidimensional arrays. +@ref{Arrays of Arrays}. + +@item The additional modifications for support of arbitrary precision arithmetic. 
@item @@ -33848,6 +35222,10 @@ into one, for the 4.1 release. @item Improved array internals for arrays indexed by integers. + +@item +The improved array sorting features were driven by John together +with Pat Rankin. @end itemize @item @@ -33862,6 +35240,11 @@ Arnold Robbins and Andrew Schorr, with notable contributions from the rest of the development team. @item +@cindex Colombo, Antonio +Antonio Giovanni Colombo rewrote a number of examples in the early +chapters that were severely dated, for which I am incredibly grateful. + +@item @cindex Robbins, Arnold Arnold Robbins has been working on @command{gawk} since 1988, at first @@ -33872,7 +35255,7 @@ helping David Trueman, and as the primary maintainer since around 1994. @appendix Installing @command{gawk} @c last two commas are part of see also -@cindex operating systems, See Also GNU/Linux, PC operating systems, Unix +@cindex operating systems, See Also GNU/Linux@comma{} PC operating systems@comma{} Unix @c STARTOFRANGE gligawk @cindex @command{gawk}, installing @c STARTOFRANGE ingawk @@ -33969,7 +35352,7 @@ Extracting the archive creates a directory named @file{gawk-@value{VERSION}.@value{PATCHLEVEL}} in the current directory. -The distribution @value{FN} is of the form +The distribution file name is of the form @file{gawk-@var{V}.@var{R}.@var{P}.tar.gz}. The @var{V} represents the major version of @command{gawk}, the @var{R} represents the current release of version @var{V}, and @@ -34001,6 +35384,13 @@ The actual @command{gawk} source code. @end table @table @file +@item ABOUT-NLS +Information about GNU @command{gettext} and translations. + +@item AUTHORS +A file with some information about the authorship of @command{gawk}. +It exists only to satisfy the pedants at the Free Software Foundation. + @item README @itemx README_d/README.* Descriptive files: @file{README} for @command{gawk} under Unix and the @@ -34024,16 +35414,6 @@ An older list of changes to @command{gawk}. 
@item COPYING The GNU General Public License. -@item FUTURES -A brief list of features and changes being contemplated for future -releases, with some indication of the time frame for the feature, based -on its difficulty. - -@item LIMITATIONS -A list of those factors that limit @command{gawk}'s performance. -Most of these depend on the hardware or operating system software and -are not limits in @command{gawk} itself. - @item POSIX.STD A description of behaviors in the POSIX standard for @command{awk} which are left undefined, or where @command{gawk} may not comply fully, as well @@ -34066,12 +35446,19 @@ The @command{troff} source for a manual page describing @command{gawk}. This is distributed for the convenience of Unix users. @cindex Texinfo -@item doc/gawk.texi +@item doc/gawktexi.in +@itemx doc/sidebar.awk The Texinfo source file for this @value{DOCUMENT}. -It should be processed with @TeX{} -(via @command{texi2dvi} or @command{texi2pdf}) +It should be processed by @file{doc/sidebar.awk} +before processing with @command{texi2dvi} or @command{texi2pdf} to produce a printed document, and with @command{makeinfo} to produce an Info or HTML file. +The @file{Makefile} takes care of this processing and produces +printable output via @command{texi2dvi} or @command{texi2pdf}. + +@item doc/gawk.texi +The file produced after processing @file{gawktexi.in} +with @file{sidebar.awk}. @item doc/gawk.info The generated Info file for this @value{DOCUMENT}. @@ -34110,15 +35497,21 @@ the @file{Makefile.in} files used by @command{autoconf} and @item Makefile.in @itemx aclocal.m4 +@itemx bisonfix.awk +@itemx config.guess @itemx configh.in @itemx configure.ac @itemx configure @itemx custom.h +@itemx depcomp +@itemx install-sh @itemx missing_d/* +@itemx mkinstalldirs @itemx m4/* -These files and subdirectories are used when configuring @command{gawk} -for various Unix systems. They are explained in -@ref{Unix Installation}. 
+These files and subdirectories are used when configuring and compiling +@command{gawk} for various Unix systems. Most of them are explained +in @ref{Unix Installation}. The rest are there to support the main +infrastructure. @item po/* The @file{po} library contains message translations. @@ -34142,6 +35535,11 @@ They are installed as part of the installation process. The rest of the programs in this @value{DOCUMENT} are available in appropriate subdirectories of @file{awklib/eg}. +@item extension/* +The source code, manual pages, and infrastructure files for +the sample extensions included with @command{gawk}. +@xref{Dynamic Extensions}, for more information. + @item posix/* Files needed for building @command{gawk} on POSIX-compliant systems. @@ -34245,7 +35643,7 @@ please send in a bug report (@pxref{Bugs}). Of course, once you've built @command{gawk}, it is likely that you will wish to install it. To do so, you need to run the command @samp{make -check}, as a user with the appropriate permissions. How to do this +install}, as a user with the appropriate permissions. How to do this varies by system, but on many systems you can use the @command{sudo} command to do so. The command then becomes @samp{sudo make install}. It is likely that you will be asked for your password, and you will have @@ -34262,7 +35660,15 @@ command line when compiling @command{gawk} from scratch, including: @table @code -@cindex @code{--disable-lint} configuration option +@cindex @option{--disable-extensions} configuration option +@cindex configuration option, @code{--disable-extensions} +@item --disable-extensions +Disable configuring and building the sample extensions in the +@file{extension} directory. This is useful for cross-compiling. +The default action is to dynamically check if the extensions +can be configured and compiled. 
+ +@cindex @option{--disable-lint} configuration option @cindex configuration option, @code{--disable-lint} @item --disable-lint Disable all lint checking within @code{gawk}. The @@ -34282,14 +35688,14 @@ Using this option may bring you some slight performance improvement. Using this option will cause some of the tests in the test suite to fail. This option may be removed at a later date. -@cindex @code{--disable-nls} configuration option +@cindex @option{--disable-nls} configuration option @cindex configuration option, @code{--disable-nls} @item --disable-nls Disable all message-translation facilities. This is usually not desirable, but it may bring you some slight performance improvement. -@cindex @code{--with-whiny-user-strftime} configuration option +@cindex @option{--with-whiny-user-strftime} configuration option @cindex configuration option, @code{--with-whiny-user-strftime} @item --with-whiny-user-strftime Force use of the included version of the @code{strftime()} @@ -34563,11 +35969,10 @@ multibyte functionality is not available. @c STARTOFRANGE pcgawon @cindex PC operating systems, @command{gawk} on -With the exception of the Cygwin environment, -the @samp{|&} operator and TCP/IP networking -(@pxref{TCP/IP Networking}) -are not supported for MS-DOS or MS-Windows. EMX (OS/2 only) does support -at least the @samp{|&} operator. +Under MS-DOS and MS-Windows, the Cygwin and MinGW environments support +both the @samp{|&} operator and TCP/IP networking +(@pxref{TCP/IP Networking}). +EMX (OS/2 only) supports at least the @samp{|&} operator. @cindex search paths @cindex search paths, for source files @@ -34697,7 +36102,7 @@ moved into the @code{BEGIN} rule. @command{gawk} can be built and used ``out of the box'' under MS-Windows if you are using the @uref{http://www.cygwin.com, Cygwin environment}. 
-This environment provides an excellent simulation of Unix, using the +This environment provides an excellent simulation of GNU/Linux, using the GNU tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make, and other GNU programs. Compilation and installation for Cygwin is the same as for a Unix system: @@ -34713,13 +36118,6 @@ When compared to GNU/Linux on the same system, the @samp{configure} step on Cygwin takes considerably longer. However, it does finish, and then the @samp{make} proceeds as usual. -@quotation NOTE -The @samp{|&} operator and TCP/IP networking -(@pxref{TCP/IP Networking}) -are fully supported in the Cygwin environment. This is not true -for any other environment on MS-Windows. -@end quotation - @node MSYS @appendixsubsubsec Using @command{gawk} In The MSYS Environment @@ -34745,8 +36143,11 @@ The older designation ``VMS'' is used throughout to refer to OpenVMS. @menu * VMS Compilation:: How to compile @command{gawk} under VMS. +* VMS Dynamic Extensions:: Compiling @command{gawk} dynamic extensions on + VMS. * VMS Installation Details:: How to install @command{gawk} under VMS. * VMS Running:: How to run @command{gawk} under VMS. +* VMS GNV:: The VMS GNV Project. * VMS Old Gawk:: An old version comes with some VMS systems. @end menu @@ -34754,41 +36155,110 @@ The older designation ``VMS'' is used throughout to refer to OpenVMS. @appendixsubsubsec Compiling @command{gawk} on VMS @cindex compiling @command{gawk} for VMS -To compile @command{gawk} under VMS, there is a @code{DCL} command procedure that -issues all the necessary @code{CC} and @code{LINK} commands. There is -also a @file{Makefile} for use with the @code{MMS} utility. From the source -directory, use either: +To compile @command{gawk} under VMS, there is a @code{DCL} command procedure +that issues all the necessary @code{CC} and @code{LINK} commands. There is +also a @file{Makefile} for use with the @code{MMS} and @code{MMK} utilities. 
+From the source directory, use either: + +@example +$ @kbd{@@[.vms]vmsbuild.com} +@end example + +@noindent +or: @example -$ @kbd{@@[.VMS]VMSBUILD.COM} +$ @kbd{MMS/DESCRIPTION=[.vms]descrip.mms gawk} @end example @noindent or: @example -$ @kbd{MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK} +$ @kbd{MMK/DESCRIPTION=[.vms]descrip.mms gawk} @end example -Older versions of @command{gawk} could be built with VAX C or -GNU C on VAX/VMS, as well as with DEC C, but that is no longer -supported. DEC C (also briefly known as ``Compaq C'' and now known -as ``HP C,'' but referred to here as ``DEC C'') is required. Both -@code{VMSBUILD.COM} and @code{DESCRIP.MMS} contain some obsolete support -for the older compilers but are set up to use DEC C by default. +@code{MMK} is an open source, free, near-clone of @code{MMS} and +can better handle @code{ODS-5} volumes with upper- and lowercase filenames. +@code{MMK} is available from @uref{https://github.com/endlesssoftware/mmk}. + +With @code{ODS-5} volumes and extended parsing enabled, the case of the target +parameter may need to be exact. + +@command{gawk} has been tested under VAX/VMS 7.3 and Alpha/VMS 7.3-1 +using Compaq C V6.4, and Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3. +The most recent builds used HP C V7.3 on Alpha VMS 8.3 and both +Alpha and IA64 VMS 8.4 used HP C 7.3.@footnote{The IA64 architecture +is also known as ``Itanium.''} + +The @file{[.vms]gawk_build_steps.txt} provides information on how to build +@command{gawk} into a PCSI kit that is compatible with the GNV product. + +@node VMS Dynamic Extensions +@appendixsubsubsec Compiling @command{gawk} Dynamic Extensions on VMS + +The extensions that have been ported to VMS can be built using one of +the following commands. 
+ +@example +$ @kbd{MMS/DESCRIPTION=[.vms]descrip.mms extensions} +@end example + +@noindent +or: + +@example +$ @kbd{MMK/DESCRIPTION=[.vms]descrip.mms extensions} +@end example -@command{gawk} has been tested under Alpha/VMS 7.3-1 using Compaq C V6.4, -and on Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3.@footnote{The IA64 -architecture is also known as ``Itanium.''} +@command{gawk} uses @code{AWKLIBPATH} as either an environment variable +or a logical name to find the dynamic extensions. + +Dynamic extensions need to be compiled with the same compiler options for +floating point, pointer size, and symbol name handling as were used +to compile @command{gawk} itself. +Alpha and Itanium should use IEEE floating point. The pointer size is 32 bits, +and the symbol name handling should be exact case with CRC shortening for +symbols longer than 32 bits. + +For Alpha and Itanium: + +@example +/name=(as_is,short) +/float=ieee/ieee_mode=denorm_results +@end example + +For VAX: + +@example +/name=(as_is,short) +@end example + +Compile time macros need to be defined before the first VMS-supplied +header file is included. + +@example +#if (__CRTL_VER >= 70200000) && !defined (__VAX) +#define _LARGEFILE 1 +#endif + +#ifndef __VAX +#ifdef __CRTL_VER +#if __CRTL_VER >= 80200000 +#define _USE_STD_STAT 1 +#endif +#endif +#endif +@end example @node VMS Installation Details @appendixsubsubsec Installing @command{gawk} on VMS -To install @command{gawk}, all you need is a ``foreign'' command, which is -a @code{DCL} symbol whose value begins with a dollar sign. For example: +To use @command{gawk}, all you need is a ``foreign'' command, which is a +@code{DCL} symbol whose value begins with a dollar sign. For example: @example -$ @kbd{GAWK :== $disk1:[gnubin]GAWK} +$ @kbd{GAWK :== $disk1:[gnubin]gawk} @end example @noindent @@ -34800,10 +36270,29 @@ Alternatively, the symbol may be placed in the system-wide @file{sylogin.com} procedure, which allows all users to run @command{gawk}. 
-Optionally, the help entry can be loaded into a VMS help library: +If your @command{gawk} was installed by a PCSI kit into the +@file{GNV$GNU:} directory tree, the program will be known as +@file{GNV$GNU:[bin]gnv$gawk.exe} and the help file will be +@file{GNV$GNU:[vms_help]gawk.hlp}. + +The PCSI kit also installs a @file{GNV$GNU:[vms_bin]gawk_verb.cld} file +which can be used to add @command{gawk} and @command{awk} as DCL commands. + +For just the current process you can use: + +@example +$ @kbd{set command gnv$gnu:[vms_bin]gawk_verb.cld} +@end example + +Or the system manager can use @file{GNV$GNU:[vms_bin]gawk_verb.cld} to +add the @command{gawk} and @command{awk} to the system wide @samp{DCLTABLES}. + +The DCL syntax is documented in the @file{gawk.hlp} file. + +Optionally, the @file{gawk.hlp} entry can be loaded into a VMS help library: @example -$ @kbd{LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP} +$ @kbd{LIBRARY/HELP sys$help:helplib [.vms]gawk.hlp} @end example @noindent @@ -34821,7 +36310,7 @@ provides information about both the @command{gawk} implementation and the The logical name @samp{AWK_LIBRARY} can designate a default location for @command{awk} program files. For the @option{-f} option, if the specified -@value{FN} has no device or directory path information in it, @command{gawk} +file name has no device or directory path information in it, @command{gawk} looks in the current directory first, then in the directory specified by the translation of @samp{AWK_LIBRARY} if the file is not found. If, after searching in both directories, the file still is not found, @@ -34854,9 +36343,42 @@ One side effect of dual command-line parsing is that if there is only a single parameter (as in the quoted string program above), the command becomes ambiguous. To work around this, the normally optional @option{--} flag is required to force Unix-style parsing rather than @code{DCL} parsing. 
If any -other dash-type options (or multiple parameters such as @value{DF}s to +other dash-type options (or multiple parameters such as data files to process) are present, there is no ambiguity and @option{--} can be omitted. +@cindex exit status, of VMS +The @code{exit} value is a Unix-style value and is encoded to a VMS exit +status value when the program exits. + +The VMS severity bits will be set based on the @code{exit} value. +A failure is indicated by 1 and VMS sets the @code{ERROR} status. +A fatal error is indicated by 2 and VMS will set the @code{FATAL} status. +All other values will have the @code{SUCCESS} status. The exit value is +encoded to comply with VMS coding standards and will have the +@code{C_FACILITY_NO} of @code{0x350000} with the constant @code{0xA000} +added to the number shifted over by 3 bits to make room for the severity codes. + +To extract the actual @command{gawk} exit code from the VMS status use: + +@example +unix_status = (vms_status .and. &x7f8) / 8 +@end example + +@noindent +A C program that uses @code{exec()} to call @command{gawk} will get the original +Unix-style exit value. + +Older versions of @command{gawk} treated a Unix exit code 0 as 1, a failure +as 2, a fatal error as 4, and passed all the other numbers through. +This violated the VMS exit status coding requirements. + +@cindex floating-point, VAX/VMS +VAX/VMS floating point uses unbiased rounding. @xref{Round Function}. + +VMS reports time values in GMT unless one of the @code{SYS$TIMEZONE_RULE} +or @code{TZ} logical names is set. Older versions of VMS, such as VAX/VMS +7.3 do not set these logical names. + @c @cindex directory search @c @cindex path, search @cindex search paths @@ -34868,6 +36390,21 @@ of @env{AWKPATH} is a comma-separated list of directory specifications. When defining it, the value should be quoted so that it retains a single translation and not a multitranslation @code{RMS} searchlist. 
+@node VMS GNV +@appendixsubsubsec The VMS GNV Project + +The VMS GNV package provides a build environment similar to POSIX with ports +of a collection of open source tools. The @command{gawk} found in the GNV +base kit is an older port. Currently the GNV project is being reorganized +to supply individual PCSI packages for each component. +See @uref{https://sourceforge.net/p/gnv/wiki/InstallingGNVPackages/}. + +The normal build procedure for @command{gawk} produces a program that +is suitable for use with GNV. + +The @file{vms/gawk_build_steps.txt} in the source documents the procedure +for building a VMS PCSI kit that is compatible with GNV. + @ignore @c The VMS POSIX product, also known as POSIX for OpenVMS, is long defunct @c and building gawk for it has not been tested in many years, but these @@ -34915,7 +36452,7 @@ define a symbol, as follows: $ @kbd{gawk :== $sys$common:[syshlp.examples.tcpip.snmp]gawk.exe} @end example -This is apparently @value{PVERSION} 2.15.6, which is extremely old. We +This is apparently version 2.15.6, which is extremely old. We recommend compiling and using the current version. @c ENDOFRANGE opgawx @@ -34925,8 +36462,8 @@ recommend compiling and using the current version. @appendixsec Reporting Problems and Bugs @cindex archeologists @quotation -@i{There is nothing more dangerous than a bored archeologist.}@* -The Hitchhiker's Guide to the Galaxy +@i{There is nothing more dangerous than a bored archeologist.} +@author The Hitchhiker's Guide to the Galaxy @end quotation @c the radio show, not the book. :-) @@ -34944,8 +36481,8 @@ what you're trying to do. If it's not clear whether you should be able to do something or not, report that too; it's a bug in the documentation! Before reporting a bug or trying to fix it yourself, try to isolate it -to the smallest possible @command{awk} program and input @value{DF} that -reproduces the problem. 
Then send us the program and @value{DF}, +to the smallest possible @command{awk} program and input data file that +reproduces the problem. Then send us the program and data file, some idea of what kind of Unix system you're using, the compiler you used to compile @command{gawk}, and the exact results @command{gawk} gave you. Also say what you expected to occur; this helps @@ -34999,32 +36536,37 @@ mail at the Internet address noted previously. If you find bugs in one of the non-Unix ports of @command{gawk}, please send an electronic mail message to the person who maintains that port. They -are named in the following list, as well as in the @file{README} file in the @command{gawk} -distribution. Information in the @file{README} file should be considered -authoritative if it conflicts with this @value{DOCUMENT}. +are named in the following list, as well as in the @file{README} file +in the @command{gawk} distribution. Information in the @file{README} +file should be considered authoritative if it conflicts with this +@value{DOCUMENT}. The people maintaining the non-Unix ports of @command{gawk} are as follows: -@multitable {MS-Windows with MINGW} {123456789012345678901234567890123456789001234567890} +@c put the index entries outside the table, for docbook @cindex Deifik, Scott +@cindex Zaretskii, Eli +@cindex Buening, Andreas +@cindex Rankin, Pat +@cindex Malmberg, John +@cindex Pitts, Dave +@multitable {MS-Windows with MINGW} {123456789012345678901234567890123456789001234567890} @item MS-DOS with DJGPP @tab Scott Deifik, @EMAIL{scottd.mail@@sbcglobal.net,scottd dot mail at sbcglobal dot net}. -@cindex Zaretskii, Eli @item MS-Windows with MINGW @tab Eli Zaretskii, @EMAIL{eliz@@gnu.org,eliz at gnu dot org}. -@cindex Buening, Andreas @item OS/2 @tab Andreas Buening, @EMAIL{andreas.buening@@nexgo.de,andreas dot buening at nexgo dot de}. 
-@cindex Rankin, Pat -@item VMS @tab Pat Rankin, @EMAIL{r.pat.rankin@@gmail.com,r.pat.rankin at gmail.com} +@item VMS @tab Pat Rankin, @EMAIL{r.pat.rankin@@gmail.com,r.pat.rankin at gmail.com}, and +John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net}. -@cindex Pitts, Dave @item z/OS (OS/390) @tab Dave Pitts, @EMAIL{dpitts@@cozx.com,dpitts at cozx dot com}. @end multitable If your bug is also reproducible under Unix, please send a copy of your -report to the @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org} email list as well. +report to the @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org} email +list as well. @c ENDOFRANGE dbugg @c ENDOFRANGE tblgawb @@ -35042,8 +36584,8 @@ Date: Wed, 4 Sep 1996 08:11:48 -0700 (PDT) @cindex Brennan, Michael @quotation @i{It's kind of fun to put comments like this in your awk code.}@* -@ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course}@* -Michael Brennan +@ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course} +@author Michael Brennan @end quotation There are a number of other freely available @command{awk} implementations. @@ -35053,7 +36595,7 @@ This @value{SECTION} briefly describes where to get them: @cindex Kernighan, Brian @cindex source code, Brian Kernighan's @command{awk} @cindex @command{awk}, versions of, See Also Brian Kernighan's @command{awk} -@cindex Brian Kernighan's @command{awk} +@cindex Brian Kernighan's @command{awk}, source code @item Unix @command{awk} Brian Kernighan, one of the original designers of Unix @command{awk}, has made his implementation of @@ -35073,6 +36615,7 @@ It is available in several archive formats: @uref{http://www.cs.princeton.edu/~bwk/btl.mirror/awk.zip} @end table +@cindex @command{git} utility You can also retrieve it from Git Hub: @example @@ -35085,16 +36628,14 @@ repository in a directory named @file{bwkawk}. If you leave that argument off the @command{git} command line, the repository copy is created in a directory named @file{awk}. 
-This version requires an ISO C (1990 standard) compiler; -the C compiler from -GCC (the GNU Compiler Collection) -works quite nicely. +This version requires an ISO C (1990 standard) compiler; the C compiler +from GCC (the GNU Compiler Collection) works quite nicely. @xref{Common Extensions}, for a list of extensions in this @command{awk} that are not in POSIX @command{awk}. @cindex Brennan, Michael -@cindex @command{mawk} program +@cindex @command{mawk} utility @cindex source code, @command{mawk} @item @command{mawk} Michael Brennan wrote an independent implementation of @command{awk}, @@ -35140,7 +36681,7 @@ To get @command{awka}, go to @url{http://sourceforge.net/projects/awka}. The project seems to be frozen; no new code changes have been made since approximately 2003. -@cindex Beebe, Nelson +@cindex Beebe, Nelson H.F.@: @cindex @command{pawk} (profiling version of Brian Kernighan's @command{awk}) @cindex source code, @command{pawk} @item @command{pawk} @@ -35169,15 +36710,22 @@ information, see the @uref{http://busybox.net, project's home page}. @cindex source code, Solaris @command{awk} @item The OpenSolaris POSIX @command{awk} The version of @command{awk} in @file{/usr/xpg4/bin} on Solaris is -more-or-less -POSIX-compliant. It is based on the @command{awk} from Mortice Kern -Systems for PCs. The source code can be downloaded from -the @uref{http://www.opensolaris.org, OpenSolaris web site}. +more-or-less POSIX-compliant. It is based on the @command{awk} from +Mortice Kern Systems for PCs. This author was able to make it compile and work under GNU/Linux with 1--2 hours of work. Making it more generally portable (using GNU Autoconf and/or Automake) would take more work, and this has not been done, at least to our knowledge. +@cindex Illumos +@cindex Illumos, POSIX-compliant @command{awk} +@cindex source code, Illumos @command{awk} +The source code used to be available from the OpenSolaris web site. +However, that project was ended and the web site shut down. 
Fortunately, the +@uref{http://wiki.illumos.org/display/illumos/illumos+Home, Illumos project} +makes this implementation available. You can view the files one at a time from +@uref{https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/awk_xpg4}. + @cindex @command{jawk} @cindex Java implementation of @command{awk} @cindex source code, @command{jawk} @@ -35196,6 +36744,7 @@ This is an embeddable @command{awk} interpreter derived from @uref{http://repo.hu/projects/libmawk/}. @item @code{pawk} +@cindex source code, @command{pawk} (Python version) @cindex @code{pawk}, @command{awk}-like facilities for Python This is a Python module that claims to bring @command{awk}-like features to Python. See @uref{https://github.com/alecthomas/pawk} @@ -35218,6 +36767,10 @@ under the GPL. It has a large number of extensions over standard See @uref{http://www.quiktrim.org/QTawk.html} for more information, including the manual and a download link. +@item Other Versions +See also the @uref{http://en.wikipedia.org/wiki/Awk_language#Versions_and_implementations, +Wikipedia article}, for information on additional versions. + @end table @c ENDOFRANGE gligawk @c ENDOFRANGE ingawk @@ -35297,6 +36850,7 @@ As @command{gawk} is Free Software, the source code is always available. @ref{Gawk Distribution}, describes how to get and build the formal, released versions of @command{gawk}. +@cindex @command{git} utility However, if you want to modify @command{gawk} and contribute back your changes, you will probably wish to work with the development version. To do so, you will need to access the @command{gawk} source code @@ -35368,7 +36922,7 @@ for information on getting the latest version of @command{gawk}.) @item @ifnotinfo -Follow the @cite{GNU Coding Standards}. +Follow the @uref{http://www.gnu.org/prep/standards/, @cite{GNU Coding Standards}}. @end ifnotinfo @ifinfo See @inforef{Top, , Version, standards, GNU Coding Standards}. 
@@ -35472,6 +37026,7 @@ If possible, please update the @command{man} page as well. You will also have to sign paperwork for your documentation changes. +@cindex @command{git} utility @item Submit changes as unified diffs. Use @samp{diff -u -r -N} to compare @@ -35527,11 +37082,9 @@ Be prepared to sign the appropriate paperwork. In order for the FSF to distribute your code, you must either place your code in the public domain and submit a signed statement to that effect, or assign the copyright in your code to the FSF. -@ifinfo Both of these actions are easy to do and @emph{many} people have done so already. If you have questions, please contact me, or @email{gnu@@gnu.org}. -@end ifinfo @item When doing a port, bear in mind that your code must coexist peacefully @@ -35607,6 +37160,8 @@ coding style and brace layout that suits your taste. @node Derived Files @appendixsubsec Why Generated Files Are Kept In @command{git} +@c STARTOFRANGE gawkgit +@cindex @command{git}, use of for @command{gawk} source code @c From emails written March 22, 2012, to the gawk developers list. If you look at the @command{gawk} source in the @command{git} @@ -35786,7 +37341,7 @@ wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.ta @noindent to retrieve a snapshot of the given branch. - +@c ENDOFRANGE gawkgit @node Future Extensions @appendixsec Probable Future Extensions @@ -35828,11 +37383,13 @@ Larry @cindex Wall, Larry @cindex Robbins, Arnold @quotation -@i{AWK is a language similar to PERL, only considerably more elegant.}@* -Arnold Robbins +@i{AWK is a language similar to PERL, only considerably more elegant.} +@author Arnold Robbins +@end quotation -@i{Hey!}@* -Larry Wall +@quotation +@i{Hey!} +@author Larry Wall @end quotation The @file{TODO} file in the @command{gawk} Git repository lists possible @@ -35964,7 +37521,7 @@ in order to loop over all the element in an easy fashion for C code. 
@item The ability to create arrays (including @command{gawk}'s true -multi-dimensional arrays). +multidimensional arrays). @end itemize @end itemize @@ -36097,11 +37654,11 @@ to any of the above. @ref{Dynamic Extensions}, describes the supported API and mechanisms for writing extensions for @command{gawk}. This API was introduced -in @value{PVERSION} 4.1. However, for many years @command{gawk} +in version 4.1. However, for many years @command{gawk} provided an extension mechanism that required knowledge of @command{gawk} internals and that was not as well designed. -In order to provide a transition period, @command{gawk} @value{PVERSION} +In order to provide a transition period, @command{gawk} version 4.1 continues to support the original extension mechanism. This will be true for the life of exactly one major release. This support will be withdrawn, and removed from the source code, at the next major @@ -36155,8 +37712,15 @@ other introductory texts that you should refer to instead.) @cindex processing data At the most basic level, the job of a program is to process -some input data and produce results. See @ref{figure-general-flow}. +some input data and produce results. +@ifnotdocbook +See @ref{figure-general-flow}. +@end ifnotdocbook +@ifdocbook +See @inlineraw{docbook, <xref linkend="figure-general-flow"/>}. +@end ifdocbook +@ifnotdocbook @float Figure,figure-general-flow @caption{General Program Flow} @ifinfo @@ -36166,6 +37730,14 @@ some input data and produce results. See @ref{figure-general-flow}. @center @image{general-program, , , General program flow} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="figure-general-flow"> +<title>General Program Flow</title> +<graphic fileref="general-program.eps"/> +</figure> +@end docbook @cindex compiled programs @cindex interpreted programs @@ -36181,9 +37753,15 @@ instructions in your program to process the data. 
@cindex programming, basic steps When you write a program, it usually consists -of the following, very basic set of steps, as shown -in @ref{figure-process-flow}: +of the following, very basic set of steps, +@ifnotdocbook +as shown in @ref{figure-process-flow}: +@end ifnotdocbook +@ifdocbook +as shown in @inlineraw{docbook <xref linkend="figure-process-flow"/>}: +@end ifdocbook +@ifnotdocbook @float Figure,figure-process-flow @caption{Basic Program Steps} @ifinfo @@ -36193,6 +37771,14 @@ in @ref{figure-process-flow}: @center @image{process-flow, , , Basic Program Stages} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="figure-process-flow"> +<title>Basic Program Stages</title> +<graphic fileref="process-flow.eps"/> +</figure> +@end docbook @table @asis @item Initialization @@ -36363,7 +37949,7 @@ better written in another language. You can get it from @uref{http://awk.info/?awk100/aaa}. @cindex Ada programming language -@cindex Programming languages, Ada +@cindex programming languages, Ada @item Ada A programming language originally defined by the U.S.@: Department of Defense for embedded programming. It was designed to enforce good @@ -36431,9 +38017,6 @@ The GNU version of the standard shell @end ifinfo See also ``Bourne Shell.'' -@item BBS -See ``Bulletin Board System.'' - @item Bit Short for ``Binary Digit.'' All values in computer memory ultimately reduce to binary digits: values @@ -36508,11 +38091,6 @@ Changing some of them affects @command{awk}'s running environment. @item Braces See ``Curly Braces.'' -@item Bulletin Board System -A computer system allowing users to log in and read and/or leave messages -for other users of the system, much like leaving paper notes on a bulletin -board. - @item C The system programming language that most GNU software is written in. 
The @command{awk} programming language has C-like syntax, and this @value{DOCUMENT} @@ -36539,6 +38117,8 @@ The @uref{http://www.unicode.org, Unicode character set} is becoming increasingly popular and standard, and is particularly widely used on GNU/Linux systems. +@cindex Kernighan, Brian +@cindex Bentley, Jon @cindex @command{chem} utility @item CHEM A preprocessor for @command{pic} that reads descriptions of molecules @@ -36675,7 +38255,7 @@ ordinary expression. It could be a string constant, such as (@xref{Computed Regexps}.) @item Environment -A collection of strings, of the form @var{name@code{=}val}, that each +A collection of strings, of the form @var{name}@code{=}@code{val}, that each program has available to it. Users generally place values into the environment in order to provide information to various programs. Typical examples are the environment variables @env{HOME} and @env{PATH}. @@ -36844,7 +38424,7 @@ information about the name of the organization and its language-independent three-letter acronym. @cindex Java programming language -@cindex Programming languages, Java +@cindex programming languages, Java @item Java A modern programming language originally developed by Sun Microsystems (now Oracle) supporting Object-Oriented programming. Although usually @@ -37069,7 +38649,7 @@ numeric values. It is the C type @code{float}. The character generated by hitting the space bar on the keyboard. @item Special File -A @value{FN} interpreted internally by @command{gawk}, instead of being handed +A file name interpreted internally by @command{gawk}, instead of being handed directly to the underlying operating system---for example, @file{/dev/stderr}. (@xref{Special Files}.) @@ -37131,7 +38711,12 @@ record or a string. @c The GNU General Public License. 
@node Copying @unnumbered GNU General Public License +@ifnotdocbook @center Version 3, 29 June 2007 +@end ifnotdocbook +@docbook +<subtitle>Version 3, 29 June 2007</subtitle> +@end docbook @c This file is intended to be included within another document, @c hence no sectioning command or @node. @@ -37856,10 +39441,17 @@ first, please read @url{http://www.gnu.org/philosophy/why-not-lgpl.html}. @c The GNU Free Documentation License. @node GNU Free Documentation License @unnumbered GNU Free Documentation License +@ifnotdocbook +@center Version 1.3, 3 November 2008 +@end ifnotdocbook + +@docbook +<subtitle>Version 1.3, 3 November 2008</subtitle> +@end docbook + @cindex FDL (Free Documentation License) @cindex Free Documentation License (FDL) @cindex GNU Free Documentation License -@center Version 1.3, 3 November 2008 @c This file is intended to be included within another document, @c hence no sectioning command or @node. @@ -38364,8 +39956,10 @@ to permit their use in free software. @c ispell-local-pdict: "ispell-dict" @c End: +@ifnotdocbook @node Index @unnumbered Index +@end ifnotdocbook @printindex cp @bye @@ -38450,6 +40044,7 @@ Consistency issues: Use MS-Windows not MS Windows Use MS-DOS not MS-DOS Use an empty set of parentheses after built-in and awk function names. + Use "multiFOO" without a hyphen. Date: Wed, 13 Apr 94 15:20:52 -0400 From: rms@gnu.org (Richard Stallman) @@ -38475,8 +40070,6 @@ Suggestions: % Next edition: % 1. Standardize the error messages from the functions and programs % in the two sample code chapters. -% 2. Nuke the BBS stuff and use something that won't be obsolete -% 3. Turn the advanced notes into sidebars by using @cartouche Better sidebars can almost sort of be done with: @@ -38508,4 +40101,3 @@ But to use it you have to say } which sorta sucks. 
- diff --git a/doc/gawkinet.info b/doc/gawkinet.info index c8ce6b8d..0a0d69d8 100644 --- a/doc/gawkinet.info +++ b/doc/gawkinet.info @@ -613,7 +613,7 @@ tcp, udp x 0 x Invalid tcp, udp 0 0 0 Invalid tcp, udp 0 x 0 Invalid -Table 2.1: /inet Special File Components +Table 2.1: /inet Special File Components In general, TCP is the preferred mechanism to use. It is the simplest protocol to understand and to use. Use UDP only if @@ -4358,40 +4358,40 @@ Node: Using Networking17966 Node: Gawk Special Files20284 Node: Special File Fields22094 Ref: table-inet-components25967 -Node: Comparing Protocols27290 -Node: File /inet/tcp27823 -Node: File /inet/udp28849 -Node: TCP Connecting29947 -Node: Troubleshooting32285 -Ref: Troubleshooting-Footnote-135337 -Node: Interacting35906 -Node: Setting Up38636 -Node: Email42130 -Node: Web page44456 -Ref: Web page-Footnote-147261 -Node: Primitive Service47458 -Node: Interacting Service50192 -Ref: Interacting Service-Footnote-159321 -Node: CGI Lib59353 -Node: Simple Server66314 -Ref: Simple Server-Footnote-174037 -Node: Caveats74138 -Node: Challenges75281 -Node: Some Applications and Techniques83960 -Node: PANIC86417 -Node: GETURL88135 -Node: REMCONF90758 -Node: URLCHK96234 -Node: WEBGRAB100069 -Node: STATIST104519 -Ref: STATIST-Footnote-1116227 -Node: MAZE116672 -Node: MOBAGWHO122856 -Ref: MOBAGWHO-Footnote-1136800 -Node: STOXPRED136855 -Node: PROTBASE151110 -Node: Links164191 -Node: GNU Free Documentation License167625 -Node: Index192764 +Node: Comparing Protocols27287 +Node: File /inet/tcp27820 +Node: File /inet/udp28846 +Node: TCP Connecting29944 +Node: Troubleshooting32282 +Ref: Troubleshooting-Footnote-135334 +Node: Interacting35903 +Node: Setting Up38633 +Node: Email42127 +Node: Web page44453 +Ref: Web page-Footnote-147258 +Node: Primitive Service47455 +Node: Interacting Service50189 +Ref: Interacting Service-Footnote-159318 +Node: CGI Lib59350 +Node: Simple Server66311 +Ref: Simple Server-Footnote-174034 +Node: Caveats74135 +Node: 
Challenges75278 +Node: Some Applications and Techniques83957 +Node: PANIC86414 +Node: GETURL88132 +Node: REMCONF90755 +Node: URLCHK96231 +Node: WEBGRAB100066 +Node: STATIST104516 +Ref: STATIST-Footnote-1116224 +Node: MAZE116669 +Node: MOBAGWHO122853 +Ref: MOBAGWHO-Footnote-1136797 +Node: STOXPRED136852 +Node: PROTBASE151107 +Node: Links164188 +Node: GNU Free Documentation License167622 +Node: Index192761 End Tag Table diff --git a/doc/gawkinet.texi b/doc/gawkinet.texi index eb0f2d81..40198e1d 100644 --- a/doc/gawkinet.texi +++ b/doc/gawkinet.texi @@ -597,7 +597,7 @@ is started, @command{gawk} creates the appropriate network connection, and then two-way I/O proceeds as usual. @c last comma is part of see-also -@cindex input/output, two-way, See Also @command{gawk}, networking +@cindex input/output, two-way, See Also @command{gawk}@comma{} networking @cindex TCP/IP, sockets and At the C, C++, and Perl level, networking is accomplished via @dfn{sockets}, an Application Programming Interface (API) originally @@ -1144,9 +1144,9 @@ or the application cannot tolerate virtual circuit overhead. @node Setting Up, Email, Interacting, Using Networking @section Setting Up a Service @c last comma is part of tertiary -@cindex networks, @command{gawk} and, service, establishing +@cindex networks, @command{gawk} and, service@comma{} establishing @c last comma is part of tertiary -@cindex @command{gawk}, networking, service, establishing +@cindex @command{gawk}, networking, service@comma{} establishing The preceding programs behaved as clients that connect to a server somewhere on the Internet and request a particular service. Now we set up such a service to mimic the behavior of the @samp{daytime} service. diff --git a/doc/gawktexi.in b/doc/gawktexi.in index 59ee1a69..791f787f 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -14,6 +14,20 @@ * awk: (gawk)Invoking gawk. Text scanning and processing. 
@end direntry +@ifset FOR_PRINT +@tex +\gdef\xrefprintnodename#1{``#1''} +@end tex +@end ifset +@ifclear FOR_PRINT +@c With early 2014 texinfo.tex, restore PDF links and colors +@tex +\gdef\linkcolor{0.5 0.09 0.12} % Dark Red +\gdef\urlcolor{0.5 0.09 0.12} % Also +\global\urefurlonlylinktrue +@end tex +@end ifclear + @set xref-automatic-section-title @c The following information should be updated here only! @@ -21,9 +35,9 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH May, 2013 +@set UPDATE-MONTH April, 2014 @set VERSION 4.1 -@set PATCHLEVEL 0 +@set PATCHLEVEL 1 @set FSF @@ -97,11 +111,19 @@ @end ifnottex @ifnottex +@ifnotdocbook @macro ii{text} @i{\text\} @end macro +@end ifnotdocbook @end ifnottex +@ifdocbook +@macro ii{text} +@inlineraw{docbook,<lineannotation>\text\</lineannotation>} +@end macro +@end ifdocbook + @c For HTML, spell out email addresses, to avoid problems with @c address harvesters for spammers. @ifhtml @@ -115,19 +137,36 @@ @end macro @end ifnothtml -@set FN file name -@set FFN File Name -@set DF data file -@set DDF Data File -@set PVERSION version -@set CTL Ctrl +@c Indexing macros +@ifinfo + +@macro cindexawkfunc{name} +@cindex @code{\name\} +@end macro + +@macro cindexgawkfunc{name} +@cindex @code{\name\} +@end macro + +@end ifinfo + +@ifnotinfo + +@macro cindexawkfunc{name} +@cindex @code{\name\()} function +@end macro + +@macro cindexgawkfunc{name} +@cindex @code{\name\()} function (@command{gawk}) +@end macro +@end ifnotinfo @ignore Some comments on the layout for TeX. -1. Use at least texinfo.tex 2000-09-06.09 -2. I have done A LOT of work to make this look good. There are `@page' commands - and use of `@group ... @end group' in a number of places. If you muck - with anything, it's your responsibility not to break the layout. +1. Use at least texinfo.tex 2014-01-30.15 +2. 
When using @docbook, if the last line is part of a paragraph, end +it with a space and @c so that the lines won't run together. This is a +quirk of the language / makeinfo, and isn't going to change. @end ignore @c merge the function and variable indexes into the concept index @@ -143,6 +182,10 @@ Some comments on the layout for TeX. @syncodeindex fn cp @syncodeindex vr cp @end ifxml +@ifdocbook +@synindex fn cp +@synindex vr cp +@end ifdocbook @c If "finalout" is commented out, the printed output will show @c black boxes that mark lines that are too long. Thus, it is @@ -154,9 +197,26 @@ Some comments on the layout for TeX. @end iftex @copying -Copyright @copyright{} 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, -2000, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013 +@docbook +<para>Published by:</para> + +<literallayout class="normal">Free Software Foundation +51 Franklin Street, Fifth Floor +Boston, MA 02110-1301 USA +Phone: +1-617-542-5942 +Fax: +1-617-542-2652 +Email: <email>gnu@@gnu.org</email> +URL: <ulink url="http://www.gnu.org">http://www.gnu.org/</ulink></literallayout> + +<literallayout class="normal">Copyright © 1989, 1991, 1992, 1993, 1996–2005, 2007, 2009–2014 Free Software Foundation, Inc. +All Rights Reserved.</literallayout> +@end docbook + +@ifnotdocbook +Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2014 @* +Free Software Foundation, Inc. +@end ifnotdocbook @sp 2 This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, @@ -196,6 +256,7 @@ supports it in developing GNU and promoting software freedom.'' @c during editing and review. @setchapternewpage odd +@shorttitlepage GNU Awk @titlepage @title @value{TITLE} @subtitle @value{SUBTITLE} @@ -203,6 +264,7 @@ supports it in developing GNU and promoting software freedom.'' @subtitle @value{UPDATE-MONTH} @author Arnold D. Robbins +@ifnotdocbook @c Include the Distribution inside the titlepage environment so @c that headings are turned off. 
Headings on and off do not work. @@ -227,6 +289,7 @@ URL: @uref{http://www.gnu.org/} @* ISBN 1-882114-28-0 @* @sp 2 @insertcopying +@end ifnotdocbook @end titlepage @c Thanks to Bob Chassell for directions on doing dedications. @@ -251,6 +314,18 @@ ISBN 1-882114-28-0 @* @headings on @end iftex +@docbook +<dedication> +<simplelist> +<member>To Miriam, for making me complete.</member> +<member>To Chana, for the joy you bring us.</member> +<member>To Rivka, for the exponential increase.</member> +<member>To Nachum, for the added dimension.</member> +<member>To Malka, for the new beginning.</member> +</simplelist> +</dedication> +@end docbook + @iftex @headings off @evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| @@ -259,6 +334,7 @@ ISBN 1-882114-28-0 @* @ifnottex @ifnotxml +@ifnotdocbook @node Top @top General Introduction @c Preface node should come right after the Top @@ -270,6 +346,7 @@ particular records in a file and perform operations upon them. @insertcopying +@end ifnotdocbook @end ifnotxml @end ifnottex @@ -402,10 +479,12 @@ particular records in a file and perform operations upon them. field. * Command Line Field Separator:: Setting @code{FS} from the command-line. +* Full Line Fields:: Making the full line be a single + field. * Field Splitting Summary:: Some final points and a summary table. * Constant Size:: Reading constant width data. * Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. +* Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the @code{getline} function. @@ -556,9 +635,9 @@ particular records in a file and perform operations upon them. @command{awk}. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in +* Multidimensional:: Emulating multidimensional arrays in @command{awk}. -* Multi-scanning:: Scanning multidimensional arrays. 
+* Multiscanning:: Scanning multidimensional arrays. * Arrays of Arrays:: True multidimensional arrays. * Built-in:: Summarizes the built-in functions. * Calling Built-in:: How to call built-in functions. @@ -610,6 +689,8 @@ particular records in a file and perform operations upon them. * Join Function:: A function to join an array into a string. * Getlocaltime Function:: A function to get formatted times. +* Readfile Function:: A function to read an entire file at + once. * Data File Management:: Functions for managing command-line data files. * Filetrans Function:: A function for handling data file @@ -727,6 +808,7 @@ particular records in a file and perform operations upon them. * Extension API Functions Introduction:: Introduction to the API functions. * General Data Types:: The data types. * Requesting Values:: How to get a value. +* Memory Allocation Functions:: Functions for allocating memory. * Constructor Functions:: Functions for creating values. * Registration Functions:: Functions to register things with @command{gawk}. @@ -789,6 +871,8 @@ particular records in a file and perform operations upon them. version of @command{awk}. * POSIX/GNU:: The extensions in @command{gawk} not in POSIX @command{awk}. +* Feature History:: The history of the features in + @command{gawk}. * Common Extensions:: Common Extensions Summary. * Ranges and Locales:: How locales used to affect regexp ranges. @@ -821,9 +905,12 @@ particular records in a file and perform operations upon them. * VMS Installation:: Installing @command{gawk} on VMS. * VMS Compilation:: How to compile @command{gawk} under VMS. +* VMS Dynamic Extensions:: Compiling @command{gawk} dynamic + extensions on VMS. * VMS Installation Details:: How to install @command{gawk} under VMS. * VMS Running:: How to run @command{gawk} under VMS. +* VMS GNV:: The VMS GNV Project. * VMS Old Gawk:: An old version comes with some VMS systems. * Bugs:: Reporting Problems and Bugs. 
@@ -957,21 +1044,37 @@ and the AWK prototype becomes the product. The new @command{pgawk} (profiling @command{gawk}), produces program execution counts. I recently experimented with an algorithm that for -@math{n} lines of input, exhibited +@ifnotdocbook +@math{n} +@end ifnotdocbook +@ifdocbook +@i{n} +@end ifdocbook +lines of input, exhibited @tex $\sim\! Cn^2$ @end tex @ifnottex +@ifnotdocbook ~ C n^2 +@end ifnotdocbook @end ifnottex +@docbook +<emphasis>∼ Cn<superscript>2</superscript></emphasis> @c +@end docbook performance, while theory predicted @tex $\sim\! Cn\log n$ @end tex @ifnottex +@ifnotdocbook ~ C n log n +@end ifnotdocbook @end ifnottex +@docbook +<emphasis>∼ Cn log n</emphasis> @c +@end docbook behavior. A few minutes poring over the @file{awkprof.out} profile pinpointed the problem to a single line of code. @command{pgawk} is a welcome addition to @@ -981,6 +1084,7 @@ Arnold has distilled over a decade of experience writing and using AWK programs, and developing @command{gawk}, into this book. If you use AWK or want to learn how, then read this book. +@cindex Brennan, Michael @display Michael Brennan Author of @command{mawk} @@ -1005,6 +1109,7 @@ Such jobs are often easier with @command{awk}. The @command{awk} utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs. +@cindex Brian Kernighan's @command{awk} The GNU implementation of @command{awk} is called @command{gawk}; if you invoke it with the proper options or environment variables (@pxref{Options}), it is fully @@ -1155,17 +1260,17 @@ wrote the bulk of @cite{TCP/IP Internetworking with @command{gawk}} (a separate document, available as part of the @command{gawk} distribution). His code finally became part of the main @command{gawk} distribution -with @command{gawk} @value{PVERSION} 3.1. +with @command{gawk} version 3.1. John Haque rewrote the @command{gawk} internals, in the process providing an @command{awk}-level debugger. 
This version became available as -@command{gawk} @value{PVERSION} 4.0, in 2011. +@command{gawk} version 4.0, in 2011. @xref{Contributors}, for a complete list of those who made important contributions to @command{gawk}. @node Names -@section A Rose by Any Other Name +@unnumberedsec A Rose by Any Other Name @cindex @command{awk}, new vs.@: old The @command{awk} language has evolved over the years. Full details are @@ -1201,7 +1306,7 @@ we simply use the term @command{awk}. When referring to a feature that is specific to the GNU implementation, we use the term @command{gawk}. @node This Manual -@section Using This Book +@unnumberedsec Using This Book @cindex @command{awk}, terms describing The term @command{awk} refers to a particular program as well as to the language you @@ -1211,7 +1316,7 @@ and the program ``the @command{awk} utility.'' This @value{DOCUMENT} explains both how to write programs in the @command{awk} language and how to run the @command{awk} utility. -The term @dfn{@command{awk} program} refers to a program written by you in +The term ``@command{awk} program'' refers to a program written by you in the @command{awk} programming language. @cindex @command{gawk}, @command{awk} and @@ -1374,7 +1479,7 @@ present the licenses that cover the @command{gawk} source code and this @value{DOCUMENT}, respectively. @node Conventions -@section Typographical Conventions +@unnumberedsec Typographical Conventions @cindex Texinfo This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo}, @@ -1413,23 +1518,23 @@ emphasized @emph{like this}, and if a point needs to be made strongly, it is done @strong{like this}. The first occurrence of a new term is usually its @dfn{definition} and appears in the same font as the previous occurrence of ``definition'' in this sentence. -Finally, @value{FN}s are indicated like this: @file{/path/to/ourfile}. +Finally, file names are indicated like this: @file{/path/to/ourfile}. 
@end ifnotinfo Characters that you type at the keyboard look @kbd{like this}. In particular, there are special characters called ``control characters.'' These are characters that you type by holding down both the @kbd{CONTROL} key and -another key, at the same time. For example, a @kbd{@value{CTL}-d} is typed +another key, at the same time. For example, a @kbd{Ctrl-d} is typed by first pressing and holding the @kbd{CONTROL} key, next pressing the @kbd{d} key and finally releasing both keys. @c fakenode --- for prepinfo -@subsubheading Dark Corners +@unnumberedsubsec Dark Corners @cindex Kernighan, Brian @quotation @i{Dark corners are basically fractal --- no matter how much -you illuminate, there's always a smaller but darker one.}@* -Brian Kernighan +you illuminate, there's always a smaller but darker one.} +@author Brian Kernighan @end quotation @cindex d.c., See dark corner @@ -1564,7 +1669,7 @@ of @cite{GAWK: The GNU Awk User's Guide}. Edition @value{EDITION} maintains the basic structure of Edition 1.0, but with significant additional material, reflecting the host of new features -in @command{gawk} @value{PVERSION} @value{VERSION}. +in @command{gawk} version @value{VERSION}. Of particular note is @ref{Array Sorting}, @ref{Bitwise Functions}, @@ -1727,7 +1832,7 @@ significant editorial help for this @value{DOCUMENT} for the 3.1 release of @command{gawk}. 
@end quotation -@cindex Beebe, Nelson +@cindex Beebe, Nelson H.F.@: @cindex Buening, Andreas @cindex Collado, Manuel @cindex Colombo, Antonio @@ -1744,7 +1849,6 @@ significant editorial help for this @value{DOCUMENT} for the @cindex Rankin, Pat @cindex Schorr, Andrew @cindex Vinschen, Corinna -@cindex Wallin, Anders @cindex Zaretskii, Eli Dr.@: Nelson Beebe, @@ -1764,7 +1868,6 @@ Chet Ramey, Pat Rankin, Andrew Schorr, Corinna Vinschen, -Anders Wallin, and Eli Zaretskii (in alphabetical order) make up the current @@ -2000,9 +2103,9 @@ awk '@var{program}' @noindent @command{awk} applies the @var{program} to the @dfn{standard input}, which usually means whatever you type on the terminal. This continues -until you indicate end-of-file by typing @kbd{@value{CTL}-d}. +until you indicate end-of-file by typing @kbd{Ctrl-d}. (On other operating systems, the end-of-file character may be different. -For example, on OS/2, it is @kbd{@value{CTL}-z}.) +For example, on OS/2, it is @kbd{Ctrl-z}.) @cindex files, input, See input files @cindex input files, running @command{awk} without @@ -2022,11 +2125,11 @@ $ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"} @print{} Don't Panic! @end example -@cindex quoting -@cindex double quote (@code{"}) -@cindex @code{"} (double quote) -@cindex @code{\} (backslash) -@cindex backslash (@code{\}) +@cindex shell quoting, double quote +@cindex double quote (@code{"}) in shell commands +@cindex @code{"} (double quote) in shell commands +@cindex @code{\} (backslash) in shell commands +@cindex backslash (@code{\}) in shell commands This program does not read any input. The @samp{\} before each of the inner double quotes is necessary because of the shell's quoting rules---in particular because it mixes both single quotes and @@ -2048,7 +2151,7 @@ $ @kbd{awk '@{ print @}'} @print{} Four score and seven years ago, ... @kbd{What, me worry?} @print{} What, me worry? 
-@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @node Long @@ -2065,11 +2168,10 @@ more convenient to put the program into a separate file. In order to tell awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{} @end example -@cindex @code{-f} option -@cindex command line, options -@cindex options, command-line +@cindex @option{-f} option +@cindex command line, option @option{-f} The @option{-f} instructs the @command{awk} utility to get the @command{awk} program -from the file @var{source-file}. Any @value{FN} can be used for +from the file @var{source-file}. Any file name can be used for @var{source-file}. For example, you could put the program: @example @@ -2090,22 +2192,22 @@ does the same thing as this one: awk "BEGIN @{ print \"Don't Panic!\" @}" @end example -@cindex quoting +@cindex quoting in @command{gawk} command lines @noindent This was explained earlier (@pxref{Read Terminal}). -Note that you don't usually need single quotes around the @value{FN} that you -specify with @option{-f}, because most @value{FN}s don't contain any of the shell's +Note that you don't usually need single quotes around the file name that you +specify with @option{-f}, because most file names don't contain any of the shell's special characters. Notice that in @file{advice}, the @command{awk} program did not have single quotes around it. The quotes are only needed for programs that are provided on the @command{awk} command line. @c STARTOFRANGE sq1x -@cindex single quote (@code{'}) +@cindex single quote (@code{'}) in @command{gawk} command lines @c STARTOFRANGE qs2x -@cindex @code{'} (single quote) +@cindex @code{'} (single quote) in @command{gawk} command lines If you want to clearly identify your @command{awk} program files as such, -you can add the extension @file{.awk} to the @value{FN}. This doesn't +you can add the extension @file{.awk} to the file name. This doesn't affect the execution of the @command{awk} program but it does make ``housekeeping'' easier. 
@@ -2132,13 +2234,13 @@ BEGIN @{ print "Don't Panic!" @} After making this file executable (with the @command{chmod} utility), simply type @samp{advice} at the shell and the system arranges to run @command{awk}@footnote{The -line beginning with @samp{#!} lists the full @value{FN} of an interpreter +line beginning with @samp{#!} lists the full file name of an interpreter to run and an optional initial command-line argument to pass to that interpreter. The operating system then runs the interpreter with the given argument and the full argument list of the executed program. The first argument -in the list is the full @value{FN} of the @command{awk} program. +in the list is the full file name of the @command{awk} program. The rest of the -argument list contains either options to @command{awk}, or @value{DF}s, +argument list contains either options to @command{awk}, or data files, or both. Note that on many systems @command{awk} may be found in @file{/usr/bin} instead of in @file{/bin}. Caveat Emptor.} as if you had typed @samp{awk -f advice}: @@ -2213,7 +2315,7 @@ programs, but this usually isn't very useful; the purpose of a comment is to help you or another person understand the program when reading it at a later time. -@cindex quoting +@cindex quoting, for small awk programs @cindex single quote (@code{'}), vs.@: apostrophe @cindex @code{'} (single quote), vs.@: apostrophe @quotation CAUTION @@ -2254,7 +2356,7 @@ The next @value{SUBSECTION} describes the shell's quoting rules. @node Quoting @subsection Shell-Quoting Issues -@cindex quoting, rules for +@cindex shell quoting, rules for @menu * DOS Quoting:: Quoting in Windows Batch Files. @@ -2289,10 +2391,10 @@ that character. The shell removes the backslash and passes the quoted character on to the command. 
@item -@cindex @code{\} (backslash) -@cindex backslash (@code{\}) -@cindex single quote (@code{'}) -@cindex @code{'} (single quote) +@cindex @code{\} (backslash), in shell commands +@cindex backslash (@code{\}), in shell commands +@cindex single quote (@code{'}), in shell commands +@cindex @code{'} (single quote), in shell commands Single quotes protect everything between the opening and closing quotes. The shell does no interpretation of the quoted text, passing it on verbatim to the command. @@ -2302,8 +2404,8 @@ Refer back to for an example of what happens if you try. @item -@cindex double quote (@code{"}) -@cindex @code{"} (double quote) +@cindex double quote (@code{"}), in shell commands +@cindex @code{"} (double quote), in shell commands Double quotes protect most things between the opening and closing quotes. The shell does at least variable and command substitution on the quoted text. Different shells may do additional kinds of processing on double-quoted text. @@ -2340,7 +2442,7 @@ awk -F "" '@var{program}' @var{files} # correct @end example @noindent -@cindex null strings, quoting and +@cindex null strings in @command{gawk} arguments, quoting and Don't use this: @example @@ -2349,11 +2451,11 @@ awk -F"" '@var{program}' @var{files} # wrong! @noindent In the second case, @command{awk} will attempt to use the text of the program -as the value of @code{FS}, and the first @value{FN} as the text of the program! +as the value of @code{FS}, and the first file name as the text of the program! This results in syntax errors at best, and confusing behavior at worst. @end itemize -@cindex quoting, tricks for +@cindex quoting in @command{gawk} command lines, tricks for Mixing single and double quotes is difficult. 
You have to resort to shell quoting tricks, like this: @@ -2464,49 +2566,48 @@ gawk "@{ print \"\042\" $0 \"\042\" @}" @var{file} @node Sample Data Files -@section @value{DDF}s for the Examples +@section Data Files for the Examples @c For gawk >= 4.0, update these data files. No-one has such slow modems! @cindex input files, examples -@cindex @code{BBS-list} file +@cindex @code{mail-list} file Many of the examples in this @value{DOCUMENT} take their input from two sample -@value{DF}s. The first, @file{BBS-list}, represents a list of -computer bulletin board systems together with information about those systems. -The second @value{DF}, called @file{inventory-shipped}, contains +data files. The first, @file{mail-list}, represents a list of peoples' names +together with their email addresses and information about those people. +The second data file, called @file{inventory-shipped}, contains information about monthly shipments. In both files, each line is considered to be one @dfn{record}. -In the @value{DF} @file{BBS-list}, each record contains the name of a computer -bulletin board, its phone number, the board's baud rate(s), and a code for -the number of hours it is operational. An @samp{A} in the last column -means the board operates 24 hours a day. A @samp{B} in the last -column means the board only operates on evening and weekend hours. -A @samp{C} means the board operates only on weekends: +In the data file @file{mail-list}, each record contains the name of a person, +his/her phone number, his/her email-address, and a code for their relationship +with the author of the list. An @samp{A} in the last column +means that the person is an acquaintance. An @samp{F} in the last +column means that the person is a friend. +An @samp{R} means that the person is a relative: -@c 2e: Update the baud rates to reflect today's faster modems @example @c system if test ! -d eg ; then mkdir eg ; fi @c system if test ! -d eg/lib ; then mkdir eg/lib ; fi @c system if test ! 
-d eg/data ; then mkdir eg/data ; fi @c system if test ! -d eg/prog ; then mkdir eg/prog ; fi @c system if test ! -d eg/misc ; then mkdir eg/misc ; fi -@c file eg/data/BBS-list -aardvark 555-5553 1200/300 B -alpo-net 555-3412 2400/1200/300 A -barfly 555-7685 1200/300 A -bites 555-1675 2400/1200/300 A -camelot 555-0542 300 C -core 555-2912 1200/300 C -fooey 555-1234 2400/1200/300 B -foot 555-6699 1200/300 B -macfoo 555-6480 1200/300 A -sdace 555-3430 2400/1200/300 A -sabafoo 555-2127 1200/300 C +@c file eg/data/mail-list +Amelia 555-5553 amelia.zodiacusque@@gmail.com F +Anthony 555-3412 anthony.asserturo@@hotmail.com A +Becky 555-7685 becky.algebrarum@@gmail.com A +Bill 555-1675 bill.drowning@@hotmail.com A +Broderick 555-0542 broderick.aliquotiens@@yahoo.com R +Camilla 555-2912 camilla.infusarum@@skynet.be R +Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +Julie 555-6699 julie.perscrutabor@@skeeve.com F +Martin 555-6480 martin.codicibus@@hotmail.com A +Samuel 555-3430 samuel.lanceolis@@shu.edu A +Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @c endfile @end example @cindex @code{inventory-shipped} file -The @value{DF} @file{inventory-shipped} represents +The data file @file{inventory-shipped} represents information about shipments during the year. Each record contains the month, the number of green crates shipped, the number of red boxes shipped, the number of @@ -2536,45 +2637,30 @@ Apr 21 70 74 514 @c endfile @end example -@ifinfo -If you are reading this in GNU Emacs using Info, you can copy the regions -of text showing these sample files into your own test files. This way you -can try out the examples shown in the remainder of this document. You do -this by using the command @kbd{M-x write-region} to copy text from the Info -file into a file for use with @command{awk} -(@xref{Misc File Ops, , Miscellaneous File Operations, emacs, GNU Emacs Manual}, -for more information). 
Using this information, create your own -@file{BBS-list} and @file{inventory-shipped} files and practice what you -learn in this @value{DOCUMENT}. - -@cindex Texinfo -If you are using the stand-alone version of Info, -see @ref{Extract Program}, -for an @command{awk} program that extracts these @value{DF}s from -@file{gawk.texi}, the Texinfo source file for this Info file. -@end ifinfo +The sample files are included in the @command{gawk} distribution, +in the directory @file{awklib/eg/data}. @node Very Simple @section Some Simple Examples The following command runs a simple @command{awk} program that searches the -input file @file{BBS-list} for the character string @samp{foo} (a +input file @file{mail-list} for the character string @samp{li} (a grouping of characters is usually called a @dfn{string}; the term @dfn{string} is based on similar usage in English, such as ``a string of pearls,'' or ``a string of cars in a train''): @example -awk '/foo/ @{ print $0 @}' BBS-list +awk '/li/ @{ print $0 @}' mail-list @end example @noindent -When lines containing @samp{foo} are found, they are printed because +When lines containing @samp{li} are found, they are printed because @w{@samp{print $0}} means print the current line. (Just @samp{print} by itself means the same thing, so we could have written that instead.) -You will notice that slashes (@samp{/}) surround the string @samp{foo} -in the @command{awk} program. The slashes indicate that @samp{foo} +You will notice that slashes (@samp{/}) surround the string @samp{li} +in the @command{awk} program. The slashes indicate that @samp{li} is the pattern to search for. This type of pattern is called a @dfn{regular expression}, which is covered in more detail later (@pxref{Regexp}). @@ -2586,11 +2672,11 @@ interpret any of it as special shell characters. 
Here is what this program prints: @example -$ @kbd{awk '/foo/ @{ print $0 @}' BBS-list} -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sabafoo 555-2127 1200/300 C +$ @kbd{awk '/li/ @{ print $0 @}' mail-list} +@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F +@print{} Broderick 555-0542 broderick.aliquotiens@@yahoo.com R +@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F +@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A @end example @cindex actions, default @@ -2603,7 +2689,7 @@ action is to print all lines that match the pattern. @cindex actions, empty Thus, we could leave out the action (the @code{print} statement and the curly braces) in the previous example and the result would be the same: -@command{awk} prints all lines matching the pattern @samp{foo}. By comparison, +@command{awk} prints all lines matching the pattern @samp{li}. By comparison, omitting the @code{print} statement but retaining the curly braces makes an empty action that does nothing (i.e., no lines are printed). @@ -2613,9 +2699,9 @@ collection of useful, short programs to get you started. Some of these programs contain constructs that haven't been covered yet. (The description of the program will give you a good idea of what is going on, but please read the rest of the @value{DOCUMENT} to become an @command{awk} expert!) -Most of the examples use a @value{DF} named @file{data}. This is just a +Most of the examples use a data file named @file{data}. This is just a placeholder; if you use these programs yourself, substitute -your own @value{FN}s for @file{data}. +your own file names for @file{data}. For future reference, note that there is often more than one way to do things in @command{awk}. 
At some point, you may want to look back at these examples and see if @@ -2705,7 +2791,7 @@ awk 'END @{ print NR @}' data @end example @item -Print the even-numbered lines in the @value{DF}: +Print the even-numbered lines in the data file: @example awk 'NR % 2 == 0' data @@ -2747,30 +2833,24 @@ This program prints every line that contains the string @samp{12} @emph{or} the string @samp{21}. If a line contains both strings, it is printed twice, once by each rule. -This is what happens if we run this program on our two sample @value{DF}s, -@file{BBS-list} and @file{inventory-shipped}: +This is what happens if we run this program on our two sample data files, +@file{mail-list} and @file{inventory-shipped}: @example $ @kbd{awk '/12/ @{ print $0 @}} -> @kbd{/21/ @{ print $0 @}' BBS-list inventory-shipped} -@print{} aardvark 555-5553 1200/300 B -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} barfly 555-7685 1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} core 555-2912 1200/300 C -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sdace 555-3430 2400/1200/300 A -@print{} sabafoo 555-2127 1200/300 C -@print{} sabafoo 555-2127 1200/300 C +> @kbd{/21/ @{ print $0 @}' mail-list inventory-shipped} +@print{} Anthony 555-3412 anthony.asserturo@@hotmail.com A +@print{} Camilla 555-2912 camilla.infusarum@@skynet.be R +@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R +@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @print{} Jan 21 36 64 620 @print{} Apr 21 70 74 514 @end example @noindent -Note how the line beginning with @samp{sabafoo} -in @file{BBS-list} was printed twice, once for each rule. +Note how the line beginning with @samp{Jean-Paul} +in @file{mail-list} was printed twice, once for each rule. @node More Complex @section A More Complex Example @@ -2813,7 +2893,7 @@ the file. 
The fourth field identifies the group of the file. The fifth field contains the size of the file in bytes. The sixth, seventh, and eighth fields contain the month, day, and time, respectively, that the file was last modified. Finally, the ninth field -contains the @value{FN}.@footnote{The @samp{LC_ALL=C} is +contains the file name.@footnote{The @samp{LC_ALL=C} is needed to produce this traditional-style output from @command{ls}.} @c @cindex automatic initialization @@ -2849,7 +2929,7 @@ separate rule, like this: @example awk '/12/ @{ print $0 @} - /21/ @{ print $0 @}' BBS-list inventory-shipped + /21/ @{ print $0 @}' mail-list inventory-shipped @end example @cindex @command{gawk}, newlines in @@ -2964,8 +3044,8 @@ noticed because it is ``hidden'' inside the comment. Thus, the @code{BEGIN} is noted as a syntax error. @cindex statements, multiple -@cindex @code{;} (semicolon) -@cindex semicolon (@code{;}) +@cindex @code{;} (semicolon), separating statements in actions +@cindex semicolon (@code{;}), separating statements in actions When @command{awk} statements within one rule are short, you might want to put more than one of them on a line. This is accomplished by separating the statements with a semicolon (@samp{;}). @@ -3025,6 +3105,7 @@ used once, and thrown away. Because @command{awk} programs are interpreted, you can avoid the (usually lengthy) compilation part of the typical edit-compile-test-debug cycle of software development. +@cindex Brian Kernighan's @command{awk} Complex programs have been written in @command{awk}, including a complete retargetable assembler for eight-bit microprocessors (@pxref{Glossary}, for more information), and a microcode assembler for a special-purpose Prolog @@ -3049,7 +3130,7 @@ easier to maintain and usually run more efficiently. 
@node Invoking Gawk @chapter Running @command{awk} and @command{gawk} -This @value{CHAPTER} covers how to run awk, both POSIX-standard +This @value{CHAPTER} covers how to run @command{awk}, both POSIX-standard and @command{gawk}-specific command-line options, and what @command{awk} and @command{gawk} do with non-option arguments. @@ -3087,10 +3168,19 @@ There are two ways to run @command{awk}---with an explicit program or with one or more program files. Here are templates for both of them; items enclosed in [@dots{}] in these templates are optional: +@ifnotdocbook @example awk @r{[@var{options}]} -f progfile @r{[@code{--}]} @var{file} @dots{} awk @r{[@var{options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{} @end example +@end ifnotdocbook + +@c FIXME - find a better way to mark this up in docbook +@docbook +<screen>awk [<replaceable>options</replaceable>] -f progfile [<literal>--</literal>] <replaceable>file</replaceable> … +awk [<replaceable>options</replaceable>] [<literal>--</literal>] '<replaceable>program</replaceable>' <replaceable>file</replaceable> … +</screen> +@end docbook @cindex GNU long options @cindex long options @@ -3106,7 +3196,7 @@ It is possible to invoke @command{awk} with an empty program: awk '' datafile1 datafile2 @end example -@cindex @code{--lint} option +@cindex @option{--lint} option @noindent Doing so makes little sense, though; @command{awk} exits silently when given an empty program. @@ -3146,43 +3236,27 @@ The following list describes options mandated by the POSIX standard: @table @code @item -F @var{fs} @itemx --field-separator @var{fs} -@cindex @code{-F} option -@cindex @code{--field-separator} option +@cindex @option{-F} option +@cindex @option{--field-separator} option @cindex @code{FS} variable, @code{--field-separator} option and Set the @code{FS} variable to @var{fs} (@pxref{Field Separators}). 
@item -f @var{source-file} @itemx --file @var{source-file} -@cindex @code{-f} option -@cindex @code{--file} option +@cindex @option{-f} option +@cindex @option{--file} option @cindex @command{awk} programs, location of Read @command{awk} program source from @var{source-file} instead of in the first non-option argument. This option may be given multiple times; the @command{awk} -program consists of the concatenation the contents of +program consists of the concatenation of the contents of each specified @var{source-file}. -@item -i @var{source-file} -@itemx --include @var{source-file} -@cindex @code{-i} option -@cindex @code{--include} option -@cindex @command{awk} programs, location of -Read @command{awk} source library from @var{source-file}. This option is -completely equivalent to using the @samp{@@include} directive inside -your program. This option is very -similar to the @option{-f} option, but there are two important differences. -First, when @option{-i} is used, the program source will not be loaded if it has -been previously loaded, whereas the @option{-f} will always load the file. -Second, because this option is intended to be used with code libraries, -@command{gawk} does not recognize such files as constituting main program -input. Thus, after processing an @option{-i} argument, @command{gawk} still expects to -find the main source code via the @option{-f} option or on the command-line. - @item -v @var{var}=@var{val} @itemx --assign @var{var}=@var{val} -@cindex @code{-v} option -@cindex @code{--assign} option +@cindex @option{-v} option +@cindex @option{--assign} option @cindex variables, setting Set the variable @var{var} to the value @var{val} @emph{before} execution of the program begins. Such variable values are available @@ -3203,7 +3277,7 @@ predefined value you may have given. @end quotation @item -W @var{gawk-opt} -@cindex @code{-W} option +@cindex @option{-W} option Provide an implementation-specific option. 
This is the POSIX convention for providing implementation-specific options. These options @@ -3222,8 +3296,8 @@ conventions. @cindex @code{-} (hyphen), filenames beginning with @cindex hyphen (@code{-}), filenames beginning with -This is useful if you have @value{FN}s that start with @samp{-}, -or in shell scripts, if you have @value{FN}s that will be specified +This is useful if you have file names that start with @samp{-}, +or in shell scripts, if you have file names that will be specified by the user that could start with @samp{-}. It is also useful for passing options on to the @command{awk} program; see @ref{Getopt Function}. @@ -3236,8 +3310,8 @@ The following list describes @command{gawk}-specific options: @table @code @item -b @itemx --characters-as-bytes -@cindex @code{-b} option -@cindex @code{--characters-as-bytes} option +@cindex @option{-b} option +@cindex @option{--characters-as-bytes} option Cause @command{gawk} to treat all input data as single-byte characters. In addition, all output written with @code{print} or @code{printf} are treated as single-byte characters. @@ -3251,8 +3325,8 @@ multibyte characters. This option is an easy way to tell @command{gawk}: @item -c @itemx --traditional -@cindex @code{--c} option -@cindex @code{--traditional} option +@cindex @option{-c} option +@cindex @option{--traditional} option @cindex compatibility mode (@command{gawk}), specifying Specify @dfn{compatibility mode}, in which the GNU extensions to the @command{awk} language are disabled, so that @command{gawk} behaves just @@ -3263,17 +3337,18 @@ which summarizes the extensions. Also see @item -C @itemx --copyright -@cindex @code{-C} option -@cindex @code{--copyright} option +@cindex @option{-C} option +@cindex @option{--copyright} option @cindex GPL (General Public License), printing Print the short version of the General Public License and then exit. 
@item -d@r{[}@var{file}@r{]} @itemx --dump-variables@r{[}=@var{file}@r{]} -@cindex @code{-d} option -@cindex @code{--dump-variables} option -@cindex @code{awkvars.out} file -@cindex files, @code{awkvars.out} +@cindex @option{-d} option +@cindex @option{--dump-variables} option +@cindex dump all variables of a program +@cindex @file{awkvars.out} file +@cindex files, @file{awkvars.out} @cindex variables, global, printing list of Print a sorted list of global variables, their types, and final values to @var{file}. If no @var{file} is provided, print this @@ -3292,8 +3367,8 @@ names like @code{i}, @code{j}, etc.) @item -D@r{[}@var{file}@r{]} @itemx --debug=@r{[}@var{file}@r{]} -@cindex @code{-D} option -@cindex @code{--debug} option +@cindex @option{-D} option +@cindex @option{--debug} option @cindex @command{awk} debugging, enabling Enable debugging of @command{awk} programs (@pxref{Debugging}). @@ -3305,8 +3380,8 @@ No space is allowed between the @option{-D} and @var{file}, if @item -e @var{program-text} @itemx --source @var{program-text} -@cindex @code{-e} option -@cindex @code{--source} option +@cindex @option{-e} option +@cindex @option{--source} option @cindex source code, mixing Provide program source code in the @var{program-text}. This option allows you to mix source code in files with source @@ -3317,8 +3392,8 @@ programs (@pxref{AWKPATH Variable}). @item -E @var{file} @itemx --exec @var{file} -@cindex @code{-E} option -@cindex @code{--exec} option +@cindex @option{-E} option +@cindex @option{--exec} option @cindex @command{awk} programs, location of @cindex CGI, @command{awk} scripts for Similar to @option{-f}, read @command{awk} program text from @var{file}. 
@@ -3348,8 +3423,8 @@ with @samp{#!} scripts (@pxref{Executable Scripts}), like so: @item -g @itemx --gen-pot -@cindex @code{-g} option -@cindex @code{--gen-pot} option +@cindex @option{-g} option +@cindex @option{--gen-pot} option @cindex portable object files, generating @cindex files, portable object, generating Analyze the source program and @@ -3360,18 +3435,34 @@ for information about this option. @item -h @itemx --help -@cindex @code{-h} option -@cindex @code{--help} option +@cindex @option{-h} option +@cindex @option{--help} option @cindex GNU long options, printing list of @cindex options, printing list of @cindex printing, list of options Print a ``usage'' message summarizing the short and long style options that @command{gawk} accepts and then exit. +@item -i @var{source-file} +@itemx --include @var{source-file} +@cindex @option{-i} option +@cindex @option{--include} option +@cindex @command{awk} programs, location of +Read @command{awk} source library from @var{source-file}. This option is +completely equivalent to using the @samp{@@include} directive inside +your program. This option is very +similar to the @option{-f} option, but there are two important differences. +First, when @option{-i} is used, the program source will not be loaded if it has +been previously loaded, whereas the @option{-f} will always load the file. +Second, because this option is intended to be used with code libraries, +@command{gawk} does not recognize such files as constituting main program +input. Thus, after processing an @option{-i} argument, @command{gawk} still expects to +find the main source code via the @option{-f} option or on the command-line. + @item -l @var{lib} @itemx --load @var{lib} -@cindex @code{-l} option -@cindex @code{--load} option +@cindex @option{-l} option +@cindex @option{--load} option @cindex loading, library Load a shared library @var{lib}. This searches for the library using the @env{AWKLIBPATH} environment variable. 
The correct library suffix for your platform will be @@ -3382,8 +3473,8 @@ a shared library. @item -L @r{[}value@r{]} @itemx --lint@r{[}=value@r{]} -@cindex @code{-l} option -@cindex @code{--lint} option +@cindex @option{-l} option +@cindex @option{--lint} option @cindex lint checking, issuing warnings @cindex warnings, issuing Warn about constructs that are dubious or nonportable to @@ -3405,16 +3496,16 @@ care to search for all occurrences of each inappropriate construct. As @item -M @itemx --bignum -@cindex @code{-M} option -@cindex @code{--bignum} option +@cindex @option{-M} option +@cindex @option{--bignum} option Force arbitrary precision arithmetic on numbers. This option has no effect if @command{gawk} is not compiled to use the GNU MPFR and MP libraries -(@pxref{Arbitrary Precision Arithmetic}). +(@pxref{Gawk and MPFR}). @item -n @itemx --non-decimal-data -@cindex @code{-n} option -@cindex @code{--non-decimal-data} option +@cindex @option{-n} option +@cindex @option{--non-decimal-data} option @cindex hexadecimal values@comma{} enabling interpretation of @cindex octal values@comma{} enabling interpretation of @cindex troubleshooting, @code{--non-decimal-data} option @@ -3429,40 +3520,40 @@ Use with care. @item -N @itemx --use-lc-numeric -@cindex @code{-N} option -@cindex @code{--use-lc-numeric} option +@cindex @option{-N} option +@cindex @option{--use-lc-numeric} option Force the use of the locale's decimal point character when parsing numeric input data (@pxref{Locales}). @item -o@r{[}@var{file}@r{]} @itemx --pretty-print@r{[}=@var{file}@r{]} -@cindex @code{-o} option -@cindex @code{--pretty-print} option +@cindex @option{-o} option +@cindex @option{--pretty-print} option Enable pretty-printing of @command{awk} programs. By default, output program is created in a file named @file{awkprof.out}. The optional @var{file} argument allows you to specify a different -@value{FN} for the output. +file name for the output. 
No space is allowed between the @option{-o} and @var{file}, if @var{file} is supplied. @item -O @itemx --optimize -@cindex @code{--optimize} option -@cindex @code{-O} option +@cindex @option{--optimize} option +@cindex @option{-O} option Enable some optimizations on the internal representation of the program. At the moment this includes just simple constant folding. The @command{gawk} maintainer hopes to add more optimizations over time. @item -p@r{[}@var{file}@r{]} @itemx --profile@r{[}=@var{file}@r{]} -@cindex @code{-p} option -@cindex @code{--profile} option +@cindex @option{-p} option +@cindex @option{--profile} option @cindex @command{awk} profiling, enabling Enable profiling of @command{awk} programs (@pxref{Profiling}). By default, profiles are created in a file named @file{awkprof.out}. The optional @var{file} argument allows you to specify a different -@value{FN} for the profile file. +file name for the profile file. No space is allowed between the @option{-p} and @var{file}, if @var{file} is supplied. @@ -3471,8 +3562,8 @@ in the left margin, and function call counts for each function. @item -P @itemx --posix -@cindex @code{-P} option -@cindex @code{--posix} option +@cindex @option{-P} option +@cindex @option{--posix} option @cindex POSIX mode @cindex @command{gawk}, extensions@comma{} disabling Operate in strict POSIX mode. This disables all @command{gawk} @@ -3513,16 +3604,16 @@ data (@pxref{Locales}). @c @cindex automatic warnings @c @cindex warnings, automatic -@cindex @code{--traditional} option, @code{--posix} option and -@cindex @code{--posix} option, @code{--traditional} option and +@cindex @option{--traditional} option, @code{--posix} option and +@cindex @option{--posix} option, @code{--traditional} option and If you supply both @option{--traditional} and @option{--posix} on the command line, @option{--posix} takes precedence. @command{gawk} also issues a warning if both options are supplied. 
@item -r @itemx --re-interval -@cindex @code{-r} option -@cindex @code{--re-interval} option +@cindex @option{-r} option +@cindex @option{--re-interval} option @cindex regular expressions, interval expressions and Allow interval expressions (@pxref{Regexp Operators}) @@ -3533,8 +3624,8 @@ and for use in combination with the @option{--traditional} option. @item -S @itemx --sandbox -@cindex @code{-S} option -@cindex @code{--sandbox} option +@cindex @option{-S} option +@cindex @option{--sandbox} option @cindex sandbox mode Disable the @code{system()} function, input redirections with @code{getline}, @@ -3546,16 +3637,16 @@ can't access your system (other than the specified input data file). @item -t @itemx --lint-old -@cindex @code{--L} option -@cindex @code{--lint-old} option +@cindex @option{-L} option +@cindex @option{--lint-old} option Warn about constructs that are not available in the original version of @command{awk} from Version 7 Unix (@pxref{V7/SVR3.1}). @item -V @itemx --version -@cindex @code{-V} option -@cindex @code{--version} option +@cindex @option{-V} option +@cindex @option{--version} option @cindex @command{gawk}, versions of, information about@comma{} printing Print version information for this particular copy of @command{gawk}. This allows you to determine if your copy of @command{gawk} is up to date @@ -3569,14 +3660,14 @@ As long as program text has been supplied, any other options are flagged as invalid with a warning message but are otherwise ignored. -@cindex @code{-F} option, @code{-Ft} sets @code{FS} to TAB +@cindex @option{-F} option, @option{-Ft} sets @code{FS} to TAB In compatibility mode, as a special case, if the value of @var{fs} supplied to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB character (@code{"\t"}). This is true only for @option{--traditional} and not for @option{--posix} (@pxref{Field Separators}). 
-@cindex @code{-f} option, multiple uses +@cindex @option{-f} option, multiple uses The @option{-f} option may be used more than once on the command line. If it is, @command{awk} reads its program source from all of the named files, as if they had been concatenated together into one big file. This is @@ -3590,7 +3681,7 @@ function names must be unique.) With standard @command{awk}, library functions can still be used, even if the program is entered at the terminal, by specifying @samp{-f /dev/tty}. After typing your program, -type @kbd{@value{CTL}-d} (the end-of-file character) to terminate it. +type @kbd{Ctrl-d} (the end-of-file character) to terminate it. (You may also use @samp{-f -} to read program source from the standard input but then you will not be able to also use the standard input as a source of data.) @@ -3603,7 +3694,7 @@ and library source code (@pxref{AWKPATH Variable}). The @option{--source} option may also be used multiple times on the command line. -@cindex @code{--source} option +@cindex @option{--source} option If no @option{-f} or @option{--source} option is specified, then @command{gawk} uses the first non-option command-line argument as the text of the program source code. @@ -3662,6 +3753,7 @@ file at all. @cindex @command{gawk}, @code{ARGIND} variable in @cindex @code{ARGIND} variable, command-line arguments +@cindex @code{ARGV} array, indexing into @cindex @code{ARGC}/@code{ARGV} variables, command-line arguments All these arguments are made available to your @command{awk} program in the @code{ARGV} array (@pxref{Built-in Variables}). Command-line options @@ -3672,9 +3764,10 @@ sets the variable @code{ARGIND} to the index in @code{ARGV} of the current element. 
@cindex input files, variable assignments and -The distinction between @value{FN} arguments and variable-assignment +@cindex variable assignments and input files +The distinction between file name arguments and variable-assignment arguments is made when @command{awk} is about to open the next input file. -At that point in execution, it checks the @value{FN} to see whether +At that point in execution, it checks the file name to see whether it is really a variable assignment; if so, @command{awk} sets the variable instead of reading a file. @@ -3691,7 +3784,7 @@ sequences (@pxref{Escape Sequences}). @value{DARKCORNER} In some earlier implementations of @command{awk}, when a variable assignment -occurred before any @value{FN}s, the assignment would happen @emph{before} +occurred before any file names, the assignment would happen @emph{before} the @code{BEGIN} rule was executed. @command{awk}'s behavior was thus inconsistent; some command-line assignments were available inside the @code{BEGIN} rule, while others were not. Unfortunately, @@ -3702,8 +3795,8 @@ upon the old behavior. The variable assignment feature is most useful for assigning to variables such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and -output formats before scanning the @value{DF}s. It is also useful for -controlling state if multiple passes are needed over a @value{DF}. For +output formats before scanning the data files. It is also useful for +controlling state if multiple passes are needed over a data file. For example: @cindex files, multiple passes over @@ -3739,16 +3832,17 @@ You may also use @code{"-"} to name standard input when reading files with @code{getline} (@pxref{Getline/File}). In addition, @command{gawk} allows you to specify the special -@value{FN} @file{/dev/stdin}, both on the command line and +file name @file{/dev/stdin}, both on the command line and with @code{getline}. Some other versions of @command{awk} also support this, but it is not standard. 
(Some operating systems provide a @file{/dev/stdin} file in the file system, however, @command{gawk} always processes -this @value{FN} itself.) +this file name itself.) @node Environment Variables @section The Environment Variables @command{gawk} Uses +@cindex environment variables used by @command{gawk} A number of environment variables influence how @command{gawk} behaves. @@ -3764,8 +3858,7 @@ behaves. @node AWKPATH Variable @subsection The @env{AWKPATH} Environment Variable @cindex @env{AWKPATH} environment variable -@cindex directories, searching -@cindex search paths +@cindex directories, searching for source files @cindex search paths, for source files @cindex differences in @command{awk} and @command{gawk}, @code{AWKPATH} environment variable @ifinfo @@ -3775,14 +3868,14 @@ on the command-line with the @option{-f} option. In most @command{awk} implementations, you must supply a precise path name for each program file, unless the file is in the current directory. -But in @command{gawk}, if the @value{FN} supplied to the @option{-f} +But in @command{gawk}, if the file name supplied to the @option{-f} or @option{-i} options -does not contain a @samp{/}, then @command{gawk} searches a list of +does not contain a directory separator @samp{/}, then @command{gawk} searches a list of directories (called the @dfn{search path}), one by one, looking for a file with the specified name. The search path is a string consisting of directory names -separated by colons. @command{gawk} gets its search path from the +separated by colons@footnote{Semicolons on MS-Windows and MS-DOS.}. @command{gawk} gets its search path from the @env{AWKPATH} environment variable. If that variable does not exist, @command{gawk} uses a default path, @samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk} @@ -3795,7 +3888,7 @@ though.} The search path feature is particularly useful for building libraries of useful @command{awk} functions. 
The library files can be placed in a standard directory in the default path and then specified on -the command line with a short @value{FN}. Otherwise, the full @value{FN} +the command line with a short file name. Otherwise, the full file name would have to be typed for each file. By using the @option{-i} option, or the @option{--source} and @option{-f} options, your command-line @@ -3840,8 +3933,7 @@ found, and @command{gawk} no longer needs to use @env{AWKPATH}. @node AWKLIBPATH Variable @subsection The @env{AWKLIBPATH} Environment Variable @cindex @env{AWKLIBPATH} environment variable -@cindex directories, searching -@cindex search paths +@cindex directories, searching for shared libraries @cindex search paths, for shared libraries @cindex differences in @command{awk} and @command{gawk}, @code{AWKLIBPATH} environment variable @@ -3889,10 +3981,6 @@ for use by the @command{gawk} developers for testing and tuning. They are subject to change. The variables are: @table @env -@item AVG_CHAIN_MAX -The average number of items @command{gawk} will maintain on a -hash chain for managing arrays. - @item AWK_HASH If this variable exists with a value of @samp{gst}, @command{gawk} will switch to using the hash function from GNU Smalltalk for @@ -3905,6 +3993,13 @@ files one line at a time, instead of reading in blocks. This exists for debugging problems on filesystems on non-POSIX operating systems where I/O is performed in records, not in blocks. +@item GAWK_MSG_SRC +If this variable exists, @command{gawk} includes the source file +name and line number from which warning and/or fatal messages +are generated. Its purpose is to help isolate the source of a +message, since there can be multiple places which produce the +same warning or error message. + @item GAWK_NO_DFA If this variable exists, @command{gawk} does not use the DFA regexp matcher for ``does it match'' kinds of tests. This can cause @command{gawk} @@ -3917,6 +4012,14 @@ coordinate with each other.) 
This specifies the amount by which @command{gawk} should grow its internal evaluation stack, when needed. +@item INT_CHAIN_MAX +The average number of items @command{gawk} will maintain on a +hash chain for managing arrays indexed by integers. + +@item STR_CHAIN_MAX +The average number of items @command{gawk} will maintain on a +hash chain for managing arrays indexed by strings. + @item TIDYMEM If this variable exists, @command{gawk} uses the @code{mtrace()} library calls from GNU LIBC to help track down possible memory leaks. @@ -3995,7 +4098,7 @@ use @samp{@@include} followed by the name of the file to be included, enclosed in double quotes. @quotation NOTE -Keep in mind that this is a language construct and the @value{FN} cannot +Keep in mind that this is a language construct and the file name cannot be a string variable, but rather just a literal string in double quotes. @end quotation @@ -4020,7 +4123,7 @@ $ @kbd{gawk -f test3} @print{} This is file test3. @end example -The @value{FN} can, of course, be a pathname. For example: +The file name can, of course, be a pathname. For example: @example @@include "../io_funcs" @@ -4115,10 +4218,9 @@ they will @emph{not} be in the next release). @c update this section for each release! -@cindex @code{PROCINFO} array The process-related special files @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and @file{/dev/user} were deprecated in @command{gawk} -3.1, but still worked. As of @value{PVERSION} 4.0, they are no longer +3.1, but still worked. As of version 4.0, they are no longer interpreted specially by @command{gawk}. (Use @code{PROCINFO} instead; see @ref{Auto-set}.) @@ -4137,10 +4239,11 @@ in case some option becomes obsolete in a future version of @command{gawk}. @cindex Jedi knights @cindex Knights, jedi @quotation -@i{Use the Source, Luke!}@* -Obi-Wan +@i{Use the Source, Luke!} +@author Obi-Wan @end quotation +@cindex shells, sea This @value{SECTION} intentionally left blank. @@ -4153,7 +4256,7 @@ blank. 
@table @code @item -W nostalgia @itemx --nostalgia -Print the message @code{"awk: bailing out near line 1"} and dump core. +Print the message @samp{awk: bailing out near line 1} and dump core. This option was inspired by the common behavior of very early versions of Unix @command{awk} and by a t--shirt. The message is @emph{not} subject to translation in non-English locales. @@ -4199,7 +4302,7 @@ long-undocumented ``feature'' of Unix @code{awk}. @node Regexp @chapter Regular Expressions -@cindex regexp, See regular expressions +@cindex regexp @c STARTOFRANGE regexp @cindex regular expressions @@ -4208,8 +4311,8 @@ set of strings. Because regular expressions are such a fundamental part of @command{awk} programming, their format and use deserve a separate @value{CHAPTER}. -@cindex forward slash (@code{/}) -@cindex @code{/} (forward slash) +@cindex forward slash (@code{/}) to enclose regular expressions +@cindex @code{/} (forward slash) to enclose regular expressions A regular expression enclosed in slashes (@samp{/}) is an @command{awk} pattern that matches every input record whose text belongs to that set. @@ -4246,14 +4349,14 @@ slashes. Then the regular expression is tested against the entire text of each record. (Normally, it only needs to match some part of the text in order to succeed.) 
For example, the following prints the second field of each record that contains the string -@samp{foo} anywhere in it: +@samp{li} anywhere in it: @example -$ @kbd{awk '/foo/ @{ print $2 @}' BBS-list} -@print{} 555-1234 +$ @kbd{awk '/li/ @{ print $2 @}' mail-list} +@print{} 555-5553 +@print{} 555-0542 @print{} 555-6699 -@print{} 555-6480 -@print{} 555-2127 +@print{} 555-3430 @end example @cindex regular expressions, operators @@ -4265,9 +4368,9 @@ $ @kbd{awk '/foo/ @{ print $2 @}' BBS-list} @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator @c @cindex operators, @code{!~} -@cindex @code{if} statement -@cindex @code{while} statement -@cindex @code{do}-@code{while} statement +@cindex @code{if} statement, use of regexps in +@cindex @code{while} statement, use of regexps in +@cindex @code{do}-@code{while} statement, use of regexps in @c @cindex statements, @code{if} @c @cindex statements, @code{while} @c @cindex statements, @code{do} @@ -4326,6 +4429,7 @@ $ @kbd{awk '$1 !~ /J/' inventory-shipped} @end example @cindex regexp constants +@cindex constant regexps @cindex regular expressions, constants, See regexp constants When a regexp is enclosed in slashes, such as @code{/foo/}, we call it a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and @@ -4334,7 +4438,7 @@ a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and @node Escape Sequences @section Escape Sequences -@cindex escape sequences +@cindex escape sequences, in strings @cindex backslash (@code{\}), in escape sequences @cindex @code{\} (backslash), in escape sequences Some characters cannot be included literally in string constants @@ -4374,39 +4478,39 @@ A literal backslash, @samp{\}. @cindex @code{\} (backslash), @code{\a} escape sequence @cindex backslash (@code{\}), @code{\a} escape sequence @item \a -The ``alert'' character, @kbd{@value{CTL}-g}, ASCII code 7 (BEL). 
+The ``alert'' character, @kbd{Ctrl-g}, ASCII code 7 (BEL). (This usually makes some sort of audible noise.) @cindex @code{\} (backslash), @code{\b} escape sequence @cindex backslash (@code{\}), @code{\b} escape sequence @item \b -Backspace, @kbd{@value{CTL}-h}, ASCII code 8 (BS). +Backspace, @kbd{Ctrl-h}, ASCII code 8 (BS). @cindex @code{\} (backslash), @code{\f} escape sequence @cindex backslash (@code{\}), @code{\f} escape sequence @item \f -Formfeed, @kbd{@value{CTL}-l}, ASCII code 12 (FF). +Formfeed, @kbd{Ctrl-l}, ASCII code 12 (FF). @cindex @code{\} (backslash), @code{\n} escape sequence @cindex backslash (@code{\}), @code{\n} escape sequence @item \n -Newline, @kbd{@value{CTL}-j}, ASCII code 10 (LF). +Newline, @kbd{Ctrl-j}, ASCII code 10 (LF). @cindex @code{\} (backslash), @code{\r} escape sequence @cindex backslash (@code{\}), @code{\r} escape sequence @item \r -Carriage return, @kbd{@value{CTL}-m}, ASCII code 13 (CR). +Carriage return, @kbd{Ctrl-m}, ASCII code 13 (CR). @cindex @code{\} (backslash), @code{\t} escape sequence @cindex backslash (@code{\}), @code{\t} escape sequence @item \t -Horizontal TAB, @kbd{@value{CTL}-i}, ASCII code 9 (HT). +Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT). @c @cindex @command{awk} language, V.4 version @cindex @code{\} (backslash), @code{\v} escape sequence @cindex backslash (@code{\}), @code{\v} escape sequence @item \v -Vertical tab, @kbd{@value{CTL}-k}, ASCII code 11 (VT). +Vertical tab, @kbd{Ctrl-k}, ASCII code 11 (VT). @cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence @cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence @@ -4499,6 +4603,7 @@ leaves what happens as undefined. There are two choices: @c @cindex automatic warnings @c @cindex warnings, automatic +@cindex Brian Kernighan's @command{awk} @table @asis @item Strip the backslash out This is what Brian Kernighan's @command{awk} and @command{gawk} both do. 
@@ -4512,6 +4617,7 @@ two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.) @cindex @command{gawk}, escape sequences @cindex Unix @command{awk}, backslashes in escape sequences +@cindex @command{mawk} utility @item Leave the backslash alone Some other @command{awk} implementations do this. In such implementations, typing @code{"a\qc"} is the same as typing @@ -4543,6 +4649,7 @@ escape sequences literally when used in regexp constants. Thus, @section Regular Expression Operators @c STARTOFRANGE regexpo @cindex regular expressions, operators +@cindex metacharacters in regular expressions You can combine regular expressions with special characters, called @dfn{regular expression operators} or @dfn{metacharacters}, to @@ -4561,8 +4668,8 @@ Here is a list of metacharacters. All characters that are not escape sequences and that are not listed in the table stand for themselves: @table @code -@cindex backslash (@code{\}) -@cindex @code{\} (backslash) +@cindex backslash (@code{\}), regexp operator +@cindex @code{\} (backslash), regexp operator @item \ This is used to suppress the special meaning of a character when matching. For example, @samp{\$} @@ -4587,8 +4694,8 @@ The condition is not true in the following example: if ("line1\nLINE 2" ~ /^L/) @dots{} @end example -@cindex @code{$} (dollar sign) -@cindex dollar sign (@code{$}) +@cindex @code{$} (dollar sign), regexp operator +@cindex dollar sign (@code{$}), regexp operator @item $ This is similar to @samp{^}, but it matches only at the end of a string. For example, @samp{p$} @@ -4600,8 +4707,8 @@ The condition in the following example is not true: if ("line1\nLINE 2" ~ /1$/) @dots{} @end example -@cindex @code{.} (period) -@cindex period (@code{.}) +@cindex @code{.} (period), regexp operator +@cindex period (@code{.}), regexp operator @item . @r{(period)} This matches any single character, @emph{including} the newline character. 
For example, @samp{.P} @@ -4617,11 +4724,12 @@ character, which is a character with all bits equal to zero. Otherwise, @sc{nul} is just another character. Other versions of @command{awk} may not be able to match the @sc{nul} character. -@cindex @code{[]} (square brackets) -@cindex square brackets (@code{[]}) +@cindex @code{[]} (square brackets), regexp operator +@cindex square brackets (@code{[]}), regexp operator @cindex bracket expressions @cindex character sets, See Also bracket expressions @cindex character lists, See bracket expressions +@cindex character classes, See bracket expressions @item [@dots{}] This is called a @dfn{bracket expression}.@footnote{In other literature, you may see a bracket expression referred to as either a @@ -4654,8 +4762,8 @@ means it matches any string that starts with @samp{P} or contains a digit. The alternation applies to the largest possible regexps on either side. -@cindex @code{()} (parentheses) -@cindex parentheses @code{()} +@cindex @code{()} (parentheses), regexp operator +@cindex parentheses @code{()}, regexp operator @item (@dots{}) Parentheses are used for grouping in regular expressions, as in arithmetic. They can be used to concatenate regular expressions @@ -4683,8 +4791,8 @@ prints every record in @file{sample} containing a string of the form Notice the escaping of the parentheses by preceding them with backslashes. -@cindex @code{+} (plus sign) -@cindex plus sign (@code{+}) +@cindex @code{+} (plus sign), regexp operator +@cindex plus sign (@code{+}), regexp operator @item + This symbol is similar to @samp{*}, except that the preceding expression must be matched at least once. 
This means that @samp{wh+y} @@ -4697,14 +4805,14 @@ way of writing the last @samp{*} example: awk '/\(c[ad]+r x\)/ @{ print @}' sample @end example -@cindex @code{?} (question mark) regexp operator -@cindex question mark (@code{?}) regexp operator +@cindex @code{?} (question mark), regexp operator +@cindex question mark (@code{?}), regexp operator @item ? This symbol is similar to @samp{*}, except that the preceding expression can be matched either once or not at all. For example, @samp{fe?d} matches @samp{fed} and @samp{fd}, but nothing else. -@cindex interval expressions +@cindex interval expressions, regexp operator @item @{@var{n}@} @itemx @{@var{n},@} @itemx @{@var{n},@var{m}@} @@ -4738,7 +4846,7 @@ constants, @command{gawk} did @emph{not} match interval expressions in regexps. -However, beginning with @value{PVERSION} 4.0, +However, beginning with version 4.0, @command{gawk} does match interval expressions by default. This is because compatibility with POSIX has become more important to most @command{gawk} users than compatibility with @@ -4781,6 +4889,7 @@ expressions are not available in regular expressions. @cindex bracket expressions @cindex bracket expressions, range expressions @cindex range expressions (regexps) +@cindex character lists in regular expression As mentioned earlier, a bracket expression matches any character amongst those listed between the opening and closing square brackets. @@ -4882,8 +4991,8 @@ These sequences are: @item Collating symbols Multicharacter collating elements enclosed between @samp{[.} and @samp{.]}. For example, if @samp{ch} is a collating element, -then @code{[[.ch.]]} is a regexp that matches this collating element, whereas -@code{[ch]} is a regexp that matches either @samp{c} or @samp{h}. +then @samp{[[.ch.]]} is a regexp that matches this collating element, whereas +@samp{[ch]} is a regexp that matches either @samp{c} or @samp{h}. 
@cindex bracket expressions, equivalence classes @item Equivalence classes @@ -4891,7 +5000,7 @@ Locale-specific names for a list of characters that are equal. The name is enclosed between @samp{[=} and @samp{=]}. For example, the name @samp{e} might be used to represent all of -``e,'' ``@`e,'' and ``@'e.'' In this case, @code{[[=e=]]} is a regexp +``e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp that matches any of @samp{e}, @samp{@'e}, or @samp{@`e}. @end table @@ -4935,7 +5044,7 @@ or underscores (@samp{_}): @item \s Matches any whitespace character. Think of it as shorthand for -@w{@code{[[:space:]]}}. +@w{@samp{[[:space:]]}}. @c @cindex operators, @code{\S} (@command{gawk}) @cindex backslash (@code{\}), @code{\S} operator (@command{gawk}) @@ -4943,7 +5052,7 @@ Think of it as shorthand for @item \S Matches any character that is not whitespace. Think of it as shorthand for -@w{@code{[^[:space:]]}}. +@w{@samp{[^[:space:]]}}. @c @cindex operators, @code{\w} (@command{gawk}) @cindex backslash (@code{\}), @code{\w} operator (@command{gawk}) @@ -4951,7 +5060,7 @@ Think of it as shorthand for @item \w Matches any word-constituent character---that is, it matches any letter, digit, or underscore. Think of it as shorthand for -@w{@code{[[:alnum:]_]}}. +@w{@samp{[[:alnum:]_]}}. @c @cindex operators, @code{\W} (@command{gawk}) @cindex backslash (@code{\}), @code{\W} operator (@command{gawk}) @@ -4959,7 +5068,7 @@ letter, digit, or underscore. Think of it as shorthand for @item \W Matches any character that is not word-constituent. Think of it as shorthand for -@w{@code{[^[:alnum:]_]}}. +@w{@samp{[^[:alnum:]_]}}. @c @cindex operators, @code{\<} (@command{gawk}) @cindex backslash (@code{\}), @code{\<} operator (@command{gawk}) @@ -5020,10 +5129,10 @@ Matches the empty string at the end of a buffer (string). 
@end table -@cindex @code{^} (caret) -@cindex caret (@code{^}) -@cindex @code{?} (question mark) regexp operator -@cindex question mark (@code{?}) regexp operator +@cindex @code{^} (caret), regexp operator +@cindex caret (@code{^}), regexp operator +@cindex @code{?} (question mark), regexp operator +@cindex question mark (@code{?}), regexp operator Because @samp{^} and @samp{$} always work in terms of the beginning and end of strings, these operators don't add any new capabilities for @command{awk}. They are provided for compatibility with other @@ -5044,7 +5153,7 @@ lesser of two evils. @c @c Should really do this with file inclusion. @cindex regular expressions, @command{gawk}, command-line options -@cindex @command{gawk}, command-line options +@cindex @command{gawk}, command-line options, and regular expressions The various command-line options (@pxref{Options}) control how @command{gawk} interprets characters in regexps: @@ -5067,10 +5176,11 @@ Only POSIX regexps are supported; the GNU operators are not special (e.g., @samp{\w} matches a literal @samp{w}). Interval expressions are allowed. +@cindex Brian Kernighan's @command{awk} @item @code{--traditional} Traditional Unix @command{awk} regexps are matched. The GNU operators are not special, and interval expressions are not available. -The POSIX character classes (@code{[[:alnum:]]}, etc.) are supported, +The POSIX character classes (@samp{[[:alnum:]]}, etc.) are supported, as Brian Kernighan's @command{awk} does support them. Characters described by octal and hexadecimal escape sequences are treated literally, even if they represent regexp metacharacters. @@ -5122,7 +5232,7 @@ This works in any POSIX-compliant @command{awk}. 
@cindex tilde (@code{~}), @code{~} operator @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator -@cindex @code{IGNORECASE} variable +@cindex @code{IGNORECASE} variable, with @code{~} and @code{!~} operators @cindex @command{gawk}, @code{IGNORECASE} variable in @c @cindex variables, @code{IGNORECASE} Another method, specific to @command{gawk}, is to set the variable @@ -5329,7 +5439,7 @@ But a newline in a regexp constant works with no problem: $ @kbd{awk '$0 ~ /[ \t\n]/'} @kbd{here is a sample line} @print{} here is a sample line -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @command{gawk} does not have this problem, and it isn't likely to @@ -5343,6 +5453,7 @@ occur often in practice, but it's worth noting for future reference. @chapter Reading Input Files @c STARTOFRANGE infir +@cindex reading input files @cindex input files, reading @cindex input files @cindex @code{FILENAME} variable @@ -5379,7 +5490,7 @@ used with it do not have to be named on the @command{awk} command line * Field Separators:: The field separator and how to change it. * Constant Size:: Reading constant width data. * Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. +* Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the @code{getline} function. * Read Timeout:: Reading input with a timeout. @@ -5404,7 +5515,7 @@ so far from the current input file. This value is stored in a built-in variable called @code{FNR}. It is reset to zero when a new file is started. Another built-in variable, @code{NR}, records the total -number of input records read so far from all @value{DF}s. It starts at zero, +number of input records read so far from all data files. It starts at zero, but is never automatically reset to zero. @cindex separators, for records @@ -5429,69 +5540,80 @@ To do this, use the special @code{BEGIN} pattern (@pxref{BEGIN/END}). 
For example: -@cindex @code{BEGIN} pattern @example -awk 'BEGIN @{ RS = "/" @} - @{ print $0 @}' BBS-list +awk 'BEGIN @{ RS = "u" @} + @{ print $0 @}' mail-list @end example @noindent -changes the value of @code{RS} to @code{"/"}, before reading any input. -This is a string whose first character is a slash; as a result, records -are separated by slashes. Then the input file is read, and the second +changes the value of @code{RS} to @samp{u}, before reading any input. +This is a string whose first character is the letter ``u;'' as a result, records +are separated by the letter ``u.'' Then the input file is read, and the second rule in the @command{awk} program (the action with no pattern) prints each record. Because each @code{print} statement adds a newline at the end of its output, this @command{awk} program copies the input -with each slash changed to a newline. Here are the results of running -the program on @file{BBS-list}: - -@example -$ @kbd{awk 'BEGIN @{ RS = "/" @}} -> @kbd{@{ print $0 @}' BBS-list} -@print{} aardvark 555-5553 1200 -@print{} 300 B -@print{} alpo-net 555-3412 2400 -@print{} 1200 -@print{} 300 A -@print{} barfly 555-7685 1200 -@print{} 300 A -@print{} bites 555-1675 2400 -@print{} 1200 -@print{} 300 A -@print{} camelot 555-0542 300 C -@print{} core 555-2912 1200 -@print{} 300 C -@print{} fooey 555-1234 2400 -@print{} 1200 -@print{} 300 B -@print{} foot 555-6699 1200 -@print{} 300 B -@print{} macfoo 555-6480 1200 -@print{} 300 A -@print{} sdace 555-3430 2400 -@print{} 1200 -@print{} 300 A -@print{} sabafoo 555-2127 1200 -@print{} 300 C -@print{} +with each @samp{u} changed to a newline. 
Here are the results of running +the program on @file{mail-list}: + +@example +$ @kbd{awk 'BEGIN @{ RS = "u" @}} +> @kbd{@{ print $0 @}' mail-list} +@print{} Amelia 555-5553 amelia.zodiac +@print{} sq +@print{} e@@gmail.com F +@print{} Anthony 555-3412 anthony.assert +@print{} ro@@hotmail.com A +@print{} Becky 555-7685 becky.algebrar +@print{} m@@gmail.com A +@print{} Bill 555-1675 bill.drowning@@hotmail.com A +@print{} Broderick 555-0542 broderick.aliq +@print{} otiens@@yahoo.com R +@print{} Camilla 555-2912 camilla.inf +@print{} sar +@print{} m@@skynet.be R +@print{} Fabi +@print{} s 555-1234 fabi +@print{} s. +@print{} ndevicesim +@print{} s@@ +@print{} cb.ed +@print{} F +@print{} J +@print{} lie 555-6699 j +@print{} lie.perscr +@print{} tabor@@skeeve.com F +@print{} Martin 555-6480 martin.codicib +@print{} s@@hotmail.com A +@print{} Sam +@print{} el 555-3430 sam +@print{} el.lanceolis@@sh +@print{} .ed +@print{} A +@print{} Jean-Pa +@print{} l 555-2127 jeanpa +@print{} l.campanor +@print{} m@@ny +@print{} .ed +@print{} R +@print{} @end example @noindent -Note that the entry for the @samp{camelot} BBS is not split. -In the original @value{DF} +Note that the entry for the name @samp{Bill} is not split. +In the original data file (@pxref{Sample Data Files}), the line looks like this: @example -camelot 555-0542 300 C +Bill 555-1675 bill.drowning@@hotmail.com A @end example @noindent -It has one baud rate only, so there are no slashes in the record, -unlike the others which have two or more baud rates. -In fact, this record is treated as part of the record -for the @samp{core} BBS; the newline separating them in the output -is the original newline in the @value{DF}, not the one added by +It contains no @samp{u} so there is no reason to split the record, +unlike the others which have one or more occurrences of the @samp{u}. 
+In fact, this record is treated as part of the previous record; +the newline separating them in the output +is the original newline in the data file, not the one added by @command{awk} when it printed the record! @cindex record separators, changing @@ -5501,14 +5623,17 @@ using the variable-assignment feature (@pxref{Other Arguments}): @example -awk '@{ print $0 @}' RS="/" BBS-list +awk '@{ print $0 @}' RS="u" mail-list @end example @noindent -This sets @code{RS} to @samp{/} before processing @file{BBS-list}. +This sets @code{RS} to @samp{u} before processing @file{mail-list}. -Using an unusual character such as @samp{/} for the record separator -produces correct behavior in the vast majority of cases. +Using an alphabetic character such as @samp{u} for the record separator +is highly likely to produce strange results. +Using an unusual character such as @samp{/} is more likely to +produce correct behavior in the majority of cases, but there +are no guarantees. The moral is: Know Your Data. There is one unusual case, that occurs when @command{gawk} is being fully POSIX-compliant (@pxref{Options}). @@ -5530,6 +5655,7 @@ Reaching the end of an input file terminates the current input record, even if the last character in the file is not the character in @code{RS}. @value{DARKCORNER} +@cindex empty strings @cindex null strings @cindex strings, empty, See null strings The empty string @code{""} (a string without any characters) @@ -5627,8 +5753,8 @@ In compatibility mode, only the first character of the value of @code{RS} is used to determine the end of the record. @sidebar @code{RS = "\0"} Is Not Portable -@cindex portability, @value{DF}s as single record -There are times when you might want to treat an entire @value{DF} as a +@cindex portability, data files as single record +There are times when you might want to treat an entire data file as a single record. The only way to make this happen is to give @code{RS} a value that you know doesn't occur in the input file. 
This is hard to do in a general way, such that a program always works for arbitrary @@ -5647,20 +5773,26 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record? @command{gawk} in fact accepts this, and uses the @sc{nul} character for the record separator. However, this usage is @emph{not} portable -to other @command{awk} implementations. +to most other @command{awk} implementations. @cindex dark corner, strings, storing -All other @command{awk} implementations@footnote{At least that we know +Almost all other @command{awk} implementations@footnote{At least that we know about.} store strings internally as C-style strings. C strings use the @sc{nul} character as the string terminator. In effect, this means that @samp{RS = "\0"} is the same as @samp{RS = ""}. @value{DARKCORNER} +It happens that recent versions of @command{mawk} can use the @sc{nul} +character as a record separator. However, this is a special case: +@command{mawk} does not allow embedded @sc{nul} characters in strings. + @cindex records, treating files as -@cindex files, as single records +@cindex treating files, as single records The best way to treat a whole file as a single record is to simply read the file in, one record at a time, concatenating each record onto the end of the previous ones. + +@c @strong{FIXME}: Using @sc{nul} is good for @file{/proc/environ} etc. @end sidebar @c ENDOFRANGE inspl @c ENDOFRANGE recspl @@ -5732,31 +5864,29 @@ when you are not interested in specific fields. 
Here are some more examples: @example -$ @kbd{awk '$1 ~ /foo/ @{ print $0 @}' BBS-list} -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sabafoo 555-2127 1200/300 C +$ @kbd{awk '$1 ~ /li/ @{ print $0 @}' mail-list} +@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F +@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F @end example @noindent -This example prints each record in the file @file{BBS-list} whose first -field contains the string @samp{foo}. The operator @samp{~} is called a +This example prints each record in the file @file{mail-list} whose first +field contains the string @samp{li}. The operator @samp{~} is called a @dfn{matching operator} (@pxref{Regexp Usage}); it tests whether a string (here, the field @code{$1}) matches a given regular expression. By contrast, the following example -looks for @samp{foo} in @emph{the entire record} and prints the first +looks for @samp{li} in @emph{the entire record} and prints the first field and the last field for each matching input record: @example -$ @kbd{awk '/foo/ @{ print $1, $NF @}' BBS-list} -@print{} fooey B -@print{} foot B -@print{} macfoo A -@print{} sabafoo C +$ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list} +@print{} Amelia F +@print{} Broderick R +@print{} Julie F +@print{} Samuel A @end example @c ENDOFRANGE fiex @@ -5784,7 +5914,7 @@ the record has fewer than 20 fields, so this prints a blank line. Here is another example of using expressions as field numbers: @example -awk '@{ print $(2*2) @}' BBS-list +awk '@{ print $(2*2) @}' mail-list @end example @command{awk} evaluates the expression @samp{(2*2)} and uses @@ -5793,8 +5923,8 @@ represents multiplication, so the expression @samp{2*2} evaluates to four. The parentheses are used so that the multiplication is done before the @samp{$} operation; they are necessary whenever there is a binary operator in the field-number expression. 
This example, then, prints the -hours of operation (the fourth field) for every line of the file -@file{BBS-list}. (All of the @command{awk} operators are listed, in +type of relationship (the fourth field) for every line of the file +@file{mail-list}. (All of the @command{awk} operators are listed, in order of decreasing precedence, in @ref{Precedence}.) @@ -6017,6 +6147,7 @@ with a statement such as @samp{$1 = $1}, as described earlier. * Regexp Field Splitting:: Using regexps as the field separator. * Single Character Fields:: Making each character a separate field. * Command Line Field Separator:: Setting @code{FS} from the command-line. +* Full Line Fields:: Making the full line be a single field. * Field Splitting Summary:: Some final points and a summary table. @end menu @@ -6204,7 +6335,7 @@ was ignored when finding @code{$1}, it is not part of the new @code{$0}. Finally, the last @code{print} statement prints the new @code{$0}. @cindex @code{FS}, containing @code{^} -@cindex @code{^}, in @code{FS} +@cindex @code{^} (caret), in @code{FS} @cindex dark corner, @code{^}, in @code{FS} There is an additional subtlety to be aware of when using regular expressions for field splitting. @@ -6215,6 +6346,7 @@ different @command{awk} versions answer this question differently, and you should not rely on any specific behavior in your programs. @value{DARKCORNER} +@cindex Brian Kernighan's @command{awk} As a point of information, Brian Kernighan's @command{awk} allows @samp{^} to match only at the beginning of the record. @command{gawk} also works this way. For example: @@ -6258,7 +6390,7 @@ $ @kbd{echo a b | gawk 'BEGIN @{ FS = "" @}} @end example @cindex dark corner, @code{FS} as null string -@cindex FS variable, as null string +@cindex @code{FS} variable, as null string Traditionally, the behavior of @code{FS} equal to @code{""} was not defined. In this case, most versions of Unix @command{awk} simply treat the entire record as only having one field. 
@@ -6270,10 +6402,8 @@ behaves this way. @node Command Line Field Separator @subsection Setting @code{FS} from the Command Line -@cindex @code{-F} option -@cindex options, command-line -@cindex command line, options -@cindex field separators, on command line +@cindex @option{-F} option, command line +@cindex field separator, on command line @cindex command line, @code{FS} on@comma{} setting @cindex @code{FS} variable, setting from command line @@ -6323,68 +6453,76 @@ figures that you really want your fields to be separated with TABs and not @samp{t}s. Use @samp{-v FS="t"} or @samp{-F"[t]"} on the command line if you really do want to separate your fields with @samp{t}s. -As an example, let's use an @command{awk} program file called @file{baud.awk} -that contains the pattern @code{/300/} and the action @samp{print $1}: +As an example, let's use an @command{awk} program file called @file{edu.awk} +that contains the pattern @code{/edu/} and the action @samp{print $1}: @example -/300/ @{ print $1 @} +/edu/ @{ print $1 @} @end example Let's also set @code{FS} to be the @samp{-} character and run the -program on the file @file{BBS-list}. The following command prints a -list of the names of the bulletin boards that operate at 300 baud and +program on the file @file{mail-list}. The following command prints a +list of the names of the people that work at or attend a university, and the first three digits of their phone numbers: @c tweaked to make the tex output look better in @smallbook @example -$ @kbd{awk -F- -f baud.awk BBS-list} -@print{} aardvark 555 -@print{} alpo -@print{} barfly 555 -@print{} bites 555 -@print{} camelot 555 -@print{} core 555 -@print{} fooey 555 -@print{} foot 555 -@print{} macfoo 555 -@print{} sdace 555 -@print{} sabafoo 555 +$ @kbd{awk -F- -f edu.awk mail-list} +@print{} Fabius 555 +@print{} Samuel 555 +@print{} Jean @end example @noindent -Note the second line of output. The second line +Note the third line of output. 
The third line in the original file looked like this: @example -alpo-net 555-3412 2400/1200/300 A +Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @end example -The @samp{-} as part of the system's name was used as the field +The @samp{-} as part of the person's name was used as the field separator, instead of the @samp{-} in the phone number that was originally intended. This demonstrates why you have to be careful in choosing your field and record separators. @cindex Unix @command{awk}, password files@comma{} field separators and -Perhaps the most common use of a single character as the field -separator occurs when processing the Unix system password file. -On many Unix systems, each user has a separate entry in the system password -file, one line per user. The information in these lines is separated -by colons. The first field is the user's login name and the second is -the user's (encrypted or shadow) password. A password file entry might look -like this: +Perhaps the most common use of a single character as the field separator +occurs when processing the Unix system password file. On many Unix +systems, each user has a separate entry in the system password file, one +line per user. The information in these lines is separated by colons. +The first field is the user's login name and the second is the user's +encrypted or shadow password. (A shadow password is indicated by the +presence of a single @samp{x} in the second field.) 
A password file +entry might look like this: @cindex Robbins, Arnold @example -arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/bash +arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash @end example The following program searches the system password file and prints -the entries for users who have no password: +the entries for users whose full name is not indicated: + +@example +awk -F: '$5 == ""' /etc/passwd +@end example + +@node Full Line Fields +@subsection Making The Full Line Be A Single Field + +Occasionally, it's useful to treat the whole input line as a +single field. This can be done easily and portably simply by +setting @code{FS} to @code{"\n"} (a newline).@footnote{Thanks to +Andrew Schorr for this tip.} @example -awk -F: '$2 == ""' /etc/passwd +awk -F'\n' '@var{program}' @var{files @dots{}} @end example +@noindent +When you do this, @code{$1} is the same as @code{$0}. + @node Field Splitting Summary @subsection Field-Splitting Summary @@ -6425,7 +6563,7 @@ POSIX standard.) @sidebar Changing @code{FS} Does Not Affect the Fields @cindex POSIX @command{awk}, field separators and -@cindex field separators, POSIX and +@cindex field separator, POSIX and According to the POSIX standard, @command{awk} is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of @code{FS} @@ -6495,19 +6633,11 @@ will take effect. @node Constant Size @section Reading Fixed-Width Data -@ifnotinfo @quotation NOTE This @value{SECTION} discusses an advanced feature of @command{gawk}. If you are a novice @command{awk} user, you might want to skip it on the first reading. @end quotation -@end ifnotinfo - -@ifinfo -(This @value{SECTION} discusses an advanced feature of @command{awk}. -If you are a novice @command{awk} user, you might want to skip it on -the first reading.) -@end ifinfo @cindex data, fixed-width @cindex fixed-width data @@ -6637,19 +6767,11 @@ for an example of such a function). 
@node Splitting By Content @section Defining Fields By Content -@ifnotinfo @quotation NOTE This @value{SECTION} discusses an advanced feature of @command{gawk}. If you are a novice @command{awk} user, you might want to skip it on the first reading. @end quotation -@end ifnotinfo - -@ifinfo -(This @value{SECTION} discusses an advanced feature of @command{awk}. -If you are a novice @command{awk} user, you might want to skip it on -the first reading.) -@end ifinfo @cindex advanced features, specifying field content Normally, when using @code{FS}, @command{gawk} defines the fields as the @@ -6764,6 +6886,7 @@ available for splitting regular strings (@pxref{String Functions}). @node Multiple Line @section Multiple-Line Records +@cindex multiple-line records @c STARTOFRANGE recm @cindex records, multiline @c STARTOFRANGE imr @@ -6810,12 +6933,13 @@ appear in a row, they are considered one record separator. @cindex dark corner, multiline records There is an important difference between @samp{RS = ""} and @samp{RS = "\n\n+"}. In the first case, leading newlines in the input -@value{DF} are ignored, and if a file ends without extra blank lines +data file are ignored, and if a file ends without extra blank lines after the last record, the final newline is removed from the record. In the second case, this special processing is not done. @value{DARKCORNER} -@cindex field separators, in multiline records +@cindex field separator, in multiline records +@cindex @code{FS}, in multiline records Now that the input is separated into records, the second step is to separate the fields in the record. One way to do this is to divide each of the lines into fields in the normal manner. This happens by default @@ -6845,7 +6969,7 @@ Another way to separate fields is to put each field on a separate line: to do this, just set the variable @code{FS} to the string @code{"\n"}. (This single character separator matches a single newline.) 
-A practical example of a @value{DF} organized this way might be a mailing +A practical example of a data file organized this way might be a mailing list, where each entry is separated by blank lines. Consider a mailing list in a file named @file{addresses}, which looks like this: @@ -6910,7 +7034,7 @@ value of @table @code @item RS == "\n" Records are separated by the newline character (@samp{\n}). In effect, -every line in the @value{DF} is a separate record, including blank lines. +every line in the data file is a separate record, including blank lines. This is the default. @item RS == @var{any single character} @@ -6946,6 +7070,7 @@ then @command{gawk} sets @code{RT} to the null string. @c STARTOFRANGE getl @cindex @code{getline} command, explicit input with +@c STARTOFRANGE inex @cindex input, explicit So far we have been getting our input data from @command{awk}'s main input stream---either the standard input (usually your terminal, sometimes @@ -6962,10 +7087,10 @@ and study the @code{getline} command @emph{after} you have reviewed the rest of this @value{DOCUMENT} and have a good knowledge of how @command{awk} works. @cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable +@cindex @code{ERRNO} variable, with @command{getline} command @cindex differences in @command{awk} and @command{gawk}, @code{getline} command @cindex @code{getline} command, return values -@cindex @code{--sandbox} option, input redirection with @command{getline} +@cindex @option{--sandbox} option, input redirection with @code{getline} The @code{getline} command returns one if it finds a record and zero if it encounters the end of the file. If there is some error in getting @@ -7058,6 +7183,7 @@ rule in the program. @xref{Next Statement}. 
@node Getline/Variable @subsection Using @code{getline} into a Variable +@cindex @code{getline} into a variable @cindex variables, @code{getline} command into@comma{} using You can use @samp{getline @var{var}} to read the next record from @@ -7109,6 +7235,7 @@ the value of @code{NF} do not change. @node Getline/File @subsection Using @code{getline} from a File +@cindex @code{getline} from a file @cindex input redirection @cindex redirection of input @cindex @code{<} (left angle bracket), @code{<} operator (I/O) @@ -7116,7 +7243,7 @@ the value of @code{NF} do not change. @cindex operators, input/output Use @samp{getline < @var{file}} to read the next record from @var{file}. Here @var{file} is a string-valued expression that -specifies the @value{FN}. @samp{< @var{file}} is called a @dfn{redirection} +specifies the file name. @samp{< @var{file}} is called a @dfn{redirection} because it directs input to come from a different place. For example, the following program reads its input record from the file @file{secondary.input} when it @@ -7157,8 +7284,6 @@ from the file @var{file}, and put it in the variable @var{var}. As above, @var{file} is a string-valued expression that specifies the file from which to read. -@cindex @command{gawk}, @code{RT} variable in -@cindex @code{RT} variable In this version of @code{getline}, none of the built-in variables are changed and the record is not split into fields. The only variable changed is @var{var}.@footnote{This is not quite true. @code{RT} could @@ -7183,7 +7308,6 @@ Note here how the name of the extra input file is not built into the program; it is taken directly from the data, specifically from the second field on the @samp{@@include} line. -@cindex @code{close()} function The @code{close()} function is called to ensure that if two identical @samp{@@include} lines appear in the input, the entire specified file is included twice. @@ -7200,16 +7324,17 @@ that does handle nested @samp{@@include} statements. 
@subsection Using @code{getline} from a Pipe @c From private email, dated October 2, 1988. Used by permission, March 2013. +@cindex Kernighan, Brian @quotation @i{Omniscience has much to recommend it. -Failing that, attention to details would be useful.}@* -Brian Kernighan +Failing that, attention to details would be useful.} +@author Brian Kernighan @end quotation @cindex @code{|} (vertical bar), @code{|} operator (I/O) @cindex vertical bar (@code{|}), @code{|} operator (I/O) @cindex input pipeline -@cindex pipes, input +@cindex pipe, input @cindex operators, input/output The output of a command can also be piped into @code{getline}, using @samp{@var{command} | getline}. In @@ -7233,7 +7358,6 @@ produced by running the rest of the line as a shell command: @end example @noindent -@cindex @code{close()} function The @code{close()} function is called to ensure that if two identical @samp{@@execute} lines appear in the input, the command is run for each one. @@ -7287,6 +7411,8 @@ because the concatenation operator is not parenthesized. You should write it as @samp{(@w{"echo "} "date") | getline} if you want your program to be portable to all @command{awk} implementations. +@cindex Brian Kernighan's @command{awk} +@cindex @command{mawk} utility @quotation NOTE Unfortunately, @command{gawk} has not been consistent in its treatment of a construct like @samp{@w{"echo "} "date" | getline}. @@ -7423,10 +7549,10 @@ system permits. @item An interesting side effect occurs if you use @code{getline} without a redirection inside a @code{BEGIN} rule. Because an unredirected @code{getline} -reads from the command-line @value{DF}s, the first @code{getline} command +reads from the command-line data files, the first @code{getline} command causes @command{awk} to set the value of @code{FILENAME}. Normally, @code{FILENAME} does not have a value inside @code{BEGIN} rules, because you -have not yet started to process the command-line @value{DF}s. 
+have not yet started to process the command-line data files. @value{DARKCORNER} (@xref{BEGIN/END}, also @pxref{Auto-set}.) @@ -7611,6 +7737,7 @@ indefinitely until some other process opens it for writing. @node Command line directories @section Directories On The Command Line +@cindex differences in @command{awk} and @command{gawk}, command line directories @cindex directories, command line @cindex command line, directories on @@ -7648,7 +7775,7 @@ For printing with specifications, you need the @code{printf} statement @cindex @code{printf} statement Besides basic and formatted printing, this @value{CHAPTER} also covers I/O redirections to files and pipes, introduces -the special @value{FN}s that @command{gawk} processes internally, +the special file names that @command{gawk} processes internally, and discusses the @code{close()} built-in function. @menu @@ -7854,13 +7981,29 @@ program by using a new value of @code{OFS}. @example $ @kbd{awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}} -> @kbd{@{ print $1, $2 @}' BBS-list} -@print{} aardvark;555-5553 -@print{} -@print{} alpo-net;555-3412 -@print{} -@print{} barfly;555-7685 -@dots{} +> @kbd{@{ print $1, $2 @}' mail-list} +@print{} Amelia;555-5553 +@print{} +@print{} Anthony;555-3412 +@print{} +@print{} Becky;555-7685 +@print{} +@print{} Bill;555-1675 +@print{} +@print{} Broderick;555-0542 +@print{} +@print{} Camilla;555-2912 +@print{} +@print{} Fabius;555-1234 +@print{} +@print{} Julie;555-6699 +@print{} +@print{} Martin;555-6480 +@print{} +@print{} Samuel;555-3430 +@print{} +@print{} Jean-Paul;555-2127 +@print{} @end example If the value of @code{ORS} does not contain a newline, the program's output @@ -7882,7 +8025,7 @@ numbers can be formatted. The different format specifications are discussed more fully in @ref{Control Letters}. 
-@cindex @code{sprintf()} function +@cindexawkfunc{sprintf} @cindex @code{OFMT} variable @cindex output, format specifier@comma{} @code{OFMT} The built-in variable @code{OFMT} contains the default format specification @@ -7948,7 +8091,7 @@ parentheses are necessary if any of the item expressions use the @samp{>} relational operator; otherwise, it can be confused with an output redirection (@pxref{Redirection}). -@cindex format strings +@cindex format specifiers The difference between @code{printf} and @code{print} is the @var{format} argument. This is an expression whose value is taken as a string; it specifies how to output each of the other arguments. It is called the @@ -8334,30 +8477,30 @@ The following simple example shows how to use @code{printf} to make an aligned table: @example -awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list +awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list @end example @noindent This command -prints the names of the bulletin boards (@code{$1}) in the file -@file{BBS-list} as a string of 10 characters that are left-justified. It also +prints the names of the people (@code{$1}) in the file +@file{mail-list} as a string of 10 characters that are left-justified. It also prints the phone numbers (@code{$2}) next on the line. 
This produces an aligned two-column table of names and phone numbers, as shown here: @example -$ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list} -@print{} aardvark 555-5553 -@print{} alpo-net 555-3412 -@print{} barfly 555-7685 -@print{} bites 555-1675 -@print{} camelot 555-0542 -@print{} core 555-2912 -@print{} fooey 555-1234 -@print{} foot 555-6699 -@print{} macfoo 555-6480 -@print{} sdace 555-3430 -@print{} sabafoo 555-2127 +$ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list} +@print{} Amelia 555-5553 +@print{} Anthony 555-3412 +@print{} Becky 555-7685 +@print{} Bill 555-1675 +@print{} Broderick 555-0542 +@print{} Camilla 555-2912 +@print{} Fabius 555-1234 +@print{} Julie 555-6699 +@print{} Martin 555-6480 +@print{} Samuel 555-3430 +@print{} Jean-Paul 555-2127 @end example In this case, the phone numbers had to be printed as strings because @@ -8378,7 +8521,7 @@ the @command{awk} program: @example awk 'BEGIN @{ print "Name Number" print "---- ------" @} - @{ printf "%-10s %s\n", $1, $2 @}' BBS-list + @{ printf "%-10s %s\n", $1, $2 @}' mail-list @end example The above example mixes @code{print} and @code{printf} statements in @@ -8388,7 +8531,7 @@ same results: @example awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number" printf "%-10s %s\n", "----", "------" @} - @{ printf "%-10s %s\n", $1, $2 @}' BBS-list + @{ printf "%-10s %s\n", $1, $2 @}' mail-list @end example @noindent @@ -8403,7 +8546,7 @@ emphasized by storing it in a variable, like this: awk 'BEGIN @{ format = "%-10s %s\n" printf format, "Name", "Number" printf format, "----", "------" @} - @{ printf format, $1, $2 @}' BBS-list + @{ printf format, $1, $2 @}' mail-list @end example @c !!! 
exercise @@ -8417,9 +8560,11 @@ on the @code{print} statement @node Redirection @section Redirecting Output of @code{print} and @code{printf} +@c STARTOFRANGE outre @cindex output redirection +@c STARTOFRANGE reout @cindex redirection of output -@cindex @code{--sandbox} option, output redirection with @code{print}, @code{printf} +@cindex @option{--sandbox} option, output redirection with @code{print}, @code{printf} So far, the output from @code{print} and @code{printf} has gone to the standard output, usually the screen. Both @code{print} and @code{printf} can @@ -8436,8 +8581,8 @@ Redirections in @command{awk} are written just like redirections in shell commands, except that they are written inside the @command{awk} program. @c the commas here are part of the see also -@cindex @code{print} statement, See Also redirection, of output -@cindex @code{printf} statement, See Also redirection, of output +@cindex @code{print} statement, See Also redirection@comma{} of output +@cindex @code{printf} statement, See Also redirection@comma{} of output There are four forms of output redirection: output to a file, output appended to a file, output through a pipe to another command, and output to a coprocess. They are all shown for the @code{print} statement, @@ -8449,29 +8594,29 @@ but they work identically for @code{printf}: @cindex operators, input/output @item print @var{items} > @var{output-file} This redirection prints the items into the output file named -@var{output-file}. The @value{FN} @var{output-file} can be any +@var{output-file}. The file name @var{output-file} can be any expression. Its value is changed to a string and then used as a -@value{FN} (@pxref{Expressions}). +file name (@pxref{Expressions}). When this type of redirection is used, the @var{output-file} is erased before the first output is written to it. Subsequent writes to the same @var{output-file} do not erase @var{output-file}, but append to it. 
(This is different from how you use redirections in shell scripts.) If @var{output-file} does not exist, it is created. For example, here -is how an @command{awk} program can write a list of BBS names to one +is how an @command{awk} program can write a list of peoples' names to one file named @file{name-list}, and a list of phone numbers to another file named @file{phone-list}: @example $ @kbd{awk '@{ print $2 > "phone-list"} -> @kbd{print $1 > "name-list" @}' BBS-list} +> @kbd{print $1 > "name-list" @}' mail-list} $ @kbd{cat phone-list} @print{} 555-5553 @print{} 555-3412 @dots{} $ @kbd{cat name-list} -@print{} aardvark -@print{} alpo-net +@print{} Amelia +@print{} Anthony @dots{} @end example @@ -8489,7 +8634,7 @@ appended to the file. If @var{output-file} does not exist, then it is created. @cindex @code{|} (vertical bar), @code{|} operator (I/O) -@cindex pipes, output +@cindex pipe, output @cindex output, pipes @item print @var{items} | @var{command} It is possible to send output to another program through a pipe @@ -8500,7 +8645,7 @@ to another process created to execute @var{command}. The redirection argument @var{command} is actually an @command{awk} expression. Its value is converted to a string whose contents give the shell command to be run. For example, the following produces two -files, one unsorted list of BBS names, and one list sorted in reverse +files, one unsorted list of peoples' names, and one list sorted in reverse alphabetical order: @ignore @@ -8513,7 +8658,7 @@ alone for now and let's hope no-one notices. @example awk '@{ print $1 > "names.unsorted" command = "sort -r > names.sorted" - print $1 | command @}' BBS-list + print $1 | command @}' mail-list @end example The unsorted list is written with an ordinary redirection, while @@ -8617,7 +8762,7 @@ open as many pipelines as the underlying operating system permits. A particularly powerful way to use redirection is to build command lines and pipe them into the shell, @command{sh}. 
For example, suppose you -have a list of files brought over from a system where all the @value{FN}s +have a list of files brought over from a system where all the file names are stored in uppercase, and you wish to rename them to have names in all lowercase. The following program is both simple and efficient: @@ -8639,12 +8784,12 @@ It then sends the list to the shell for execution. @c ENDOFRANGE reout @node Special Files -@section Special @value{FFN}s in @command{gawk} +@section Special File Names in @command{gawk} @c STARTOFRANGE gfn -@cindex @command{gawk}, @value{FN}s in +@cindex @command{gawk}, file names in -@command{gawk} provides a number of special @value{FN}s that it interprets -internally. These @value{FN}s provide access to standard file descriptors +@command{gawk} provides a number of special file names that it interprets +internally. These file names provide access to standard file descriptors and TCP/IP networking. @menu @@ -8708,12 +8853,12 @@ that happens, writing to the screen is not correct. In fact, if terminal at all. Then opening @file{/dev/tty} fails. -@command{gawk} provides special @value{FN}s for accessing the three standard +@command{gawk} provides special file names for accessing the three standard streams. @value{COMMONEXT}. It also provides syntax for accessing -any other inherited open files. If the @value{FN} matches +any other inherited open files. If the file name matches one of these special names when @command{gawk} redirects input or output, -then it directly uses the stream that the @value{FN} stands for. -These special @value{FN}s work for all operating systems that @command{gawk} +then it directly uses the stream that the file name stands for. 
+These special file names work for all operating systems that @command{gawk} has been ported to, not just those that are POSIX-compliant: @cindex common extensions, @code{/dev/stdin} special file @@ -8722,10 +8867,10 @@ has been ported to, not just those that are POSIX-compliant: @cindex extensions, common@comma{} @code{/dev/stdin} special file @cindex extensions, common@comma{} @code{/dev/stdout} special file @cindex extensions, common@comma{} @code{/dev/stderr} special file -@cindex @value{FN}s, standard streams in @command{gawk} -@cindex @code{/dev/@dots{}} special files (@command{gawk}) +@cindex file names, standard streams in @command{gawk} +@cindex @code{/dev/@dots{}} special files @cindex files, @code{/dev/@dots{}} special files -@cindex @code{/dev/fd/@var{N}} special files +@cindex @code{/dev/fd/@var{N}} special files (@command{gawk}) @table @file @item /dev/stdin The standard input (file descriptor 0). @@ -8743,7 +8888,7 @@ the shell). Unless special pains are taken in the shell from which @command{gawk} is invoked, only descriptors 0, 1, and 2 are available. @end table -The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} +The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2}, respectively. However, they are more self-explanatory. The proper way to write an error message in a @command{gawk} program @@ -8753,14 +8898,14 @@ is to use @file{/dev/stderr}, like this: print "Serious error detected!" > "/dev/stderr" @end example -@cindex troubleshooting, quotes with @value{FN}s -Note the use of quotes around the @value{FN}. +@cindex troubleshooting, quotes with file names +Note the use of quotes around the file name. Like any other redirection, the value must be a string. It is a common error to omit the quotes, which leads to confusing results. @c Exercise: What does it do? 
:-) -Finally, using the @code{close()} function on a @value{FN} of the +Finally, using the @code{close()} function on a file name of the form @code{"/dev/fd/@var{N}"}, for file descriptor numbers above two, does actually close the given file descriptor. @@ -8776,7 +8921,7 @@ versions of @command{awk}. @command{gawk} programs can open a two-way TCP/IP connection, acting as either a client or a server. -This is done using a special @value{FN} of the form: +This is done using a special file name of the form: @example @file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}} @@ -8786,7 +8931,7 @@ The @var{net-type} is one of @samp{inet}, @samp{inet4} or @samp{inet6}. The @var{protocol} is one of @samp{tcp} or @samp{udp}, and the other fields represent the other essential pieces of information for making a networking connection. -These @value{FN}s are used with the @samp{|&} operator for communicating +These file names are used with the @samp{|&} operator for communicating with a coprocess (@pxref{Two-way I/O}). This is an advanced feature, mentioned here only for completeness. @@ -8794,21 +8939,21 @@ Full discussion is delayed until @ref{TCP/IP Networking}. @node Special Caveats -@subsection Special @value{FFN} Caveats +@subsection Special File Name Caveats Here is a list of things to bear in mind when using the -special @value{FN}s that @command{gawk} provides: +special file names that @command{gawk} provides: @itemize @bullet -@cindex compatibility mode (@command{gawk}), @value{FN}s -@cindex @value{FN}s, in compatibility mode +@cindex compatibility mode (@command{gawk}), file names +@cindex file names, in compatibility mode @item -Recognition of these special @value{FN}s is disabled if @command{gawk} is in +Recognition of these special file names is disabled if @command{gawk} is in compatibility mode (@pxref{Options}). @item @command{gawk} @emph{always} -interprets these special @value{FN}s. +interprets these special file names. 
For example, using @samp{/dev/fd/4} for output actually writes on file descriptor 4, and not on a new file descriptor that is @code{dup()}'ed from file descriptor 4. Most of @@ -8826,12 +8971,12 @@ Doing so results in unpredictable behavior. @c STARTOFRANGE ofc @cindex output, files@comma{} closing @c STARTOFRANGE pc -@cindex pipes, closing +@cindex pipe, closing @c STARTOFRANGE cc @cindex coprocesses, closing @cindex @code{getline} command, coprocesses@comma{} using from -If the same @value{FN} or the same shell command is used with @code{getline} +If the same file name or the same shell command is used with @code{getline} more than once during the execution of an @command{awk} program (@pxref{Getline}), the file is opened (or the command is executed) the first time only. @@ -8840,11 +8985,11 @@ The next time the same file or command is used with @code{getline}, another record is read from it, and so on. Similarly, when a file or pipe is opened for output, @command{awk} remembers -the @value{FN} or command associated with it, and subsequent +the file name or command associated with it, and subsequent writes to the same file or command are appended to the previous writes. The file or pipe stays open until @command{awk} exits. -@cindex @code{close()} function +@cindexawkfunc{close} This implies that special steps are necessary in order to read the same file again from the beginning, or to rerun a shell command (rather than reading more output from the same command). The @code{close()} function @@ -8882,7 +9027,7 @@ file or command, or the next @code{print} or @code{printf} to that file or command, reopens the file or reruns the command. Because the expression that you use to close a file or pipeline must exactly match the expression used to open the file or run the command, -it is good practice to use a variable to store the @value{FN} or command. +it is good practice to use a variable to store the file name or command. 
The previous example becomes the following: @example @@ -8929,9 +9074,10 @@ a separate message. @cindex differences in @command{awk} and @command{gawk}, @code{close()} function @cindex portability, @code{close()} function and +@cindex @code{close()} function, portability If you use more files than the system allows you to have open, @command{gawk} attempts to multiplex the available open files among -your @value{DF}s. @command{gawk}'s ability to do this depends upon the +your data files. @command{gawk}'s ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use @code{close()} on your files when you are done with them. @@ -9008,7 +9154,7 @@ retval = close(command) # syntax error in many Unix awks @end example @cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable +@cindex @code{ERRNO} variable, with @command{close()} function @command{gawk} treats @code{close()} as a function. The return value is @minus{}1 if the argument names something that was never opened with a redirection, or if there is @@ -9086,6 +9232,8 @@ which provide the values used in expressions. @node Constants @subsection Constant Expressions + +@c STARTOFRANGE cnst @cindex constants, types of The simplest type of expression is the @dfn{constant}, which always has @@ -9105,7 +9253,8 @@ have different forms, but are stored identically internally. @node Scalar Constants @subsubsection Numeric and String Constants -@cindex numeric, constants +@cindex constants, numeric +@cindex numeric constants A @dfn{numeric constant} stands for a number. This number can be an integer, a decimal fraction, or a number in scientific (exponential) notation.@footnote{The internal representation of all numbers, @@ -9131,7 +9280,7 @@ double-quotation marks. 
For example: @noindent @cindex differences in @command{awk} and @command{gawk}, strings -@cindex strings, length of +@cindex strings, length limitations represents the string whose contents are @samp{parrot}. Strings in @command{gawk} can be of any length, and they can contain any of the possible eight-bit ASCII characters including ASCII @sc{nul} (character code zero). @@ -9318,9 +9467,9 @@ upon the contents of the current input record. @cindex differences in @command{awk} and @command{gawk}, regexp constants @cindex dark corner, regexp constants, as arguments to user-defined functions -@cindex @code{gensub()} function (@command{gawk}) -@cindex @code{sub()} function -@cindex @code{gsub()} function +@cindexgawkfunc{gensub} +@cindexawkfunc{sub} +@cindexawkfunc{gsub} Constant regular expressions are also used as the first argument for the @code{gensub()}, @code{sub()}, and @code{gsub()} functions, as the second argument of the @code{match()} function, @@ -9431,7 +9580,7 @@ Such an assignment has the following form: @var{variable}=@var{text} @end example -@cindex @code{-v} option +@cindex @option{-v} option @noindent With it, a variable is set either at the beginning of the @command{awk} run or in between input files. @@ -9445,7 +9594,7 @@ as in the following: @noindent the variable is set at the very beginning, even before the @code{BEGIN} rules execute. The @option{-v} option and its assignment -must precede all the @value{FN} arguments, as well as the program text. +must precede all the file name arguments, as well as the program text. (@xref{Options}, for more information about the @option{-v} option.) Otherwise, the variable assignment is performed at a time determined by @@ -9453,7 +9602,7 @@ its position among the input file arguments---after the processing of the preceding input file argument. 
For example: @example -awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list +awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list @end example @noindent @@ -9462,10 +9611,10 @@ the first file is read, the command line sets the variable @code{n} equal to four. This causes the fourth field to be printed in lines from @file{inventory-shipped}. After the first file has finished, but before the second file is started, @code{n} is set to two, so that the -second field is printed in lines from @file{BBS-list}: +second field is printed in lines from @file{mail-list}: @example -$ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list} +$ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list} @print{} 15 @print{} 24 @dots{} @@ -9527,7 +9676,7 @@ with @code{CONVFMT} as the format specifier (@pxref{String Functions}). -@code{CONVFMT}'s default value is @code{"%.6g"}, which prints a value with +@code{CONVFMT}'s default value is @code{"%.6g"}, which creates a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, @@ -9589,7 +9738,7 @@ point when reading the @command{awk} program source code, and for command-line variable assignments (@pxref{Other Arguments}). However, when interpreting input data, for @code{print} and @code{printf} output, and for number to string conversion, the local decimal point character is used. -@value{DARKCORNER}. +@value{DARKCORNER} Here are some examples indicating the difference in behavior, on a GNU/Linux system: @@ -9776,8 +9925,8 @@ For maximum portability, do not use the @samp{**} operator. @subsection String Concatenation @cindex Kernighan, Brian @quotation -@i{It seemed like a good idea at the time.}@* -Brian Kernighan +@i{It seemed like a good idea at the time.} +@author Brian Kernighan @end quotation @cindex string operators @@ -9788,9 +9937,9 @@ specific operator to represent it. 
Instead, concatenation is performed by writing expressions next to one another, with no operator. For example: @example -$ @kbd{awk '@{ print "Field number one: " $1 @}' BBS-list} -@print{} Field number one: aardvark -@print{} Field number one: alpo-net +$ @kbd{awk '@{ print "Field number one: " $1 @}' mail-list} +@print{} Field number one: Amelia +@print{} Field number one: Anthony @dots{} @end example @@ -9798,9 +9947,9 @@ Without the space in the string constant after the @samp{:}, the line runs together. For example: @example -$ @kbd{awk '@{ print "Field number one:" $1 @}' BBS-list} -@print{} Field number one:aardvark -@print{} Field number one:alpo-net +$ @kbd{awk '@{ print "Field number one:" $1 @}' mail-list} +@print{} Field number one:Amelia +@print{} Field number one:Anthony @dots{} @end example @@ -9817,6 +9966,8 @@ name = "name" print "something meaningful" > file name @end example +@cindex Brian Kernighan's @command{awk} +@cindex @command{mawk} utility @noindent This produces a syntax error with some versions of Unix @command{awk}.@footnote{It happens that Brian Kernighan's @@ -10208,6 +10359,7 @@ just like variables. (Use @samp{$(i++)} when you want to do a field reference and a variable increment at the same time. The parentheses are necessary because of the precedence of the field reference operator @samp{$}.) +@c STARTOFRANGE deop @cindex decrement operators The decrement operator @samp{--} works just like @samp{++}, except that it subtracts one instead of adding it. As with @samp{++}, it can be used before @@ -10248,8 +10400,8 @@ like @samp{@var{lvalue}++}, but instead of adding, it subtracts.) @cindex Marx, Groucho @quotation @i{Doctor, doctor! It hurts when I do this!@* -So don't do that!}@* -Groucho Marx +So don't do that!} +@author Groucho Marx @end quotation @noindent @@ -10346,8 +10498,8 @@ the string constant @code{"0"} is actually true, because it is non-null. 
@node Typing and Comparison @subsection Variable Typing and Comparison Expressions @quotation -@i{The Guide is definitive. Reality is frequently inaccurate.}@* -The Hitchhiker's Guide to the Galaxy +@i{The Guide is definitive. Reality is frequently inaccurate.} +@author The Hitchhiker's Guide to the Galaxy @end quotation @c STARTOFRANGE comex @@ -10626,7 +10778,7 @@ string comparison (true) string comparison (true) @item a = 2; b = " +2" -@item a == b +@itemx a == b string comparison (false) @end table @@ -10760,10 +10912,10 @@ The Boolean operators are: @item @var{boolean1} && @var{boolean2} True if both @var{boolean1} and @var{boolean2} are true. For example, the following statement prints the current input record if it contains -both @samp{2400} and @samp{foo}: +both @samp{edu} and @samp{li}: @example -if ($0 ~ /2400/ && $0 ~ /foo/) print +if ($0 ~ /edu/ && $0 ~ /li/) print @end example @cindex side effects, Boolean operators @@ -10776,11 +10928,11 @@ no substring @samp{foo} in the record. @item @var{boolean1} || @var{boolean2} True if at least one of @var{boolean1} or @var{boolean2} is true. For example, the following statement prints all records in the input -that contain @emph{either} @samp{2400} or -@samp{foo} or both: +that contain @emph{either} @samp{edu} or +@samp{li} or both: @example -if ($0 ~ /2400/ || $0 ~ /foo/) print +if ($0 ~ /edu/ || $0 ~ /li/) print @end example The subexpression @var{boolean2} is evaluated only if @var{boolean1} @@ -11005,7 +11157,7 @@ $ @kbd{awk '@{ print "The square root of", $1, "is", sqrt($1) @}'} @print{} The square root of 3 is 1.73205 @kbd{5} @print{} The square root of 5 is 2.23607 -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example A function can also have side effects, such as assigning @@ -11375,7 +11527,7 @@ slashes (@code{/@var{regexp}/}), or any expression whose string value is used as a dynamic regular expression (@pxref{Computed Regexps}). 
The following example prints the second field of each input record -whose first field is precisely @samp{foo}: +whose first field is precisely @samp{li}: @cindex @code{/} (forward slash), patterns and @cindex forward slash (@code{/}), patterns and @@ -11384,68 +11536,65 @@ whose first field is precisely @samp{foo}: @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator @example -$ @kbd{awk '$1 == "foo" @{ print $2 @}' BBS-list} +$ @kbd{awk '$1 == "li" @{ print $2 @}' mail-list} @end example @noindent -(There is no output, because there is no BBS site with the exact name @samp{foo}.) +(There is no output, because there is no person with the exact name @samp{li}.) Contrast this with the following regular expression match, which -accepts any record with a first field that contains @samp{foo}: +accepts any record with a first field that contains @samp{li}: @example -$ @kbd{awk '$1 ~ /foo/ @{ print $2 @}' BBS-list} -@print{} 555-1234 +$ @kbd{awk '$1 ~ /foo/ @{ print $2 @}' mail-list} +@print{} 555-5553 @print{} 555-6699 -@print{} 555-6480 -@print{} 555-2127 @end example @cindex regexp constants, as patterns @cindex patterns, regexp constants as A regexp constant as a pattern is also a special case of an expression -pattern. The expression @code{/foo/} has the value one if @samp{foo} -appears in the current input record. Thus, as a pattern, @code{/foo/} -matches any record containing @samp{foo}. +pattern. The expression @code{/li/} has the value one if @samp{li} +appears in the current input record. Thus, as a pattern, @code{/li/} +matches any record containing @samp{li}. @cindex Boolean expressions, as patterns Boolean expressions are also commonly used as patterns. Whether the pattern matches an input record depends on whether its subexpressions match. 
For example, the following command prints all the records in -@file{BBS-list} that contain both @samp{2400} and @samp{foo}: +@file{mail-list} that contain both @samp{edu} and @samp{li}: @example -$ @kbd{awk '/2400/ && /foo/' BBS-list} -@print{} fooey 555-1234 2400/1200/300 B +$ @kbd{awk '/edu/ && /li/' mail-list} +@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A @end example The following command prints all records in -@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo} +@file{mail-list} that contain @emph{either} @samp{edu} or @samp{li} (or both, of course): @example -$ @kbd{awk '/2400/ || /foo/' BBS-list} -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sdace 555-3430 2400/1200/300 A -@print{} sabafoo 555-2127 1200/300 C +$ @kbd{awk '/edu/ || /li/' mail-list} +@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F +@print{} Broderick 555-0542 broderick.aliquotiens@@yahoo.com R +@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F +@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A +@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @end example The following command prints all records in -@file{BBS-list} that do @emph{not} contain the string @samp{foo}: +@file{mail-list} that do @emph{not} contain the string @samp{li}: @example -$ @kbd{awk '! /foo/' BBS-list} -@print{} aardvark 555-5553 1200/300 B -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} barfly 555-7685 1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} camelot 555-0542 300 C -@print{} core 555-2912 1200/300 C -@print{} sdace 555-3430 2400/1200/300 A +$ @kbd{awk '! 
/li/' mail-list} +@print{} Anthony 555-3412 anthony.asserturo@@hotmail.com A +@print{} Becky 555-7685 becky.algebrarum@@gmail.com A +@print{} Bill 555-1675 bill.drowning@@hotmail.com A +@print{} Camilla 555-2912 camilla.infusarum@@skynet.be R +@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +@print{} Martin 555-6480 martin.codicibus@@hotmail.com A +@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R @end example @cindex @code{BEGIN} pattern, Boolean patterns and @@ -11549,6 +11698,11 @@ $ @kbd{echo Yes | gawk '(/1/,/2/) || /Yes/'} @error{} gawk: cmd. line:1: ^ syntax error @end example +@cindex range patterns, line continuation and +As a minor point of interest, although it is poor style, +POSIX allows you to put a newline after the comma in +a range pattern. @value{DARKCORNER} + @node BEGIN/END @subsection The @code{BEGIN} and @code{END} Special Patterns @@ -11573,28 +11727,30 @@ programmers. @node Using BEGIN/END @subsubsection Startup and Cleanup Actions +@cindex @code{BEGIN} pattern +@cindex @code{END} pattern A @code{BEGIN} rule is executed once only, before the first input record is read. Likewise, an @code{END} rule is executed once only, after all the input is read. For example: @example $ @kbd{awk '} -> @kbd{BEGIN @{ print "Analysis of \"foo\"" @}} -> @kbd{/foo/ @{ ++n @}} -> @kbd{END @{ print "\"foo\" appears", n, "times." @}' BBS-list} -@print{} Analysis of "foo" -@print{} "foo" appears 4 times. +> @kbd{BEGIN @{ print "Analysis of \"li\"" @}} +> @kbd{/li/ @{ ++n @}} +> @kbd{END @{ print "\"li\" appears in", n, "records." @}' mail-list} +@print{} Analysis of "li" +@print{} "li" appears in 4 records. @end example @cindex @code{BEGIN} pattern, operators and @cindex @code{END} pattern, operators and -This program finds the number of records in the input file @file{BBS-list} -that contain the string @samp{foo}. 
The @code{BEGIN} rule prints a title +This program finds the number of records in the input file @file{mail-list} +that contain the string @samp{li}. The @code{BEGIN} rule prints a title for the report. There is no need to use the @code{BEGIN} rule to initialize the counter @code{n} to zero, since @command{awk} does this automatically (@pxref{Variables}). The second rule increments the variable @code{n} every time a -record containing the pattern @samp{foo} is read. The @code{END} rule +record containing the pattern @samp{li} is read. The @code{END} rule prints the value of @code{n} at the end of the run. The special patterns @code{BEGIN} and @code{END} cannot be used in ranges @@ -11647,6 +11803,7 @@ to give @code{$0} a real value is to execute a @code{getline} command without a variable (@pxref{Getline}). Another way is simply to assign a value to @code{$0}. +@cindex Brian Kernighan's @command{awk} @cindex differences in @command{awk} and @command{gawk}, @code{BEGIN}/@code{END} patterns @cindex POSIX @command{awk}, @code{BEGIN}/@code{END} patterns @cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and @@ -11715,7 +11872,7 @@ you can bypass the fatal error and move on to the next file on the command line. @cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable +@cindex @code{ERRNO} variable, with @code{BEGINFILE} pattern @cindex @code{nextfile} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and You do this by checking if the @code{ERRNO} variable is not the empty string; if so, then @command{gawk} was not able to open the file. In @@ -11757,7 +11914,7 @@ both @code{BEGINFILE} and @code{ENDFILE}. Only the @samp{getline In most other @command{awk} implementations, or if @command{gawk} is in compatibility mode (@pxref{Options}), they are not special. -@c FIXME: For 4.1 maybe deal with this? +@c FIXME: For 4.2 maybe deal with this? 
@ignore Date: Tue, 17 May 2011 02:06:10 PDT From: rankin@pactechdata.com (Pat Rankin) @@ -11788,7 +11945,7 @@ An empty (i.e., nonexistent) pattern is considered to match @emph{every} input record. For example, the program: @example -awk '@{ print $1 @}' BBS-list +awk '@{ print $1 @}' mail-list @end example @noindent @@ -12041,6 +12198,7 @@ the first thing on its line. @subsection The @code{while} Statement @cindex @code{while} statement @cindex loops +@cindex loops, @code{while} @cindex loops, See Also @code{while} statement In programming, a @dfn{loop} is a part of a program that can @@ -12101,6 +12259,7 @@ program is harder to read without it. @node Do Statement @subsection The @code{do}-@code{while} Statement @cindex @code{do}-@code{while} statement +@cindex loops, @code{do}-@code{while} The @code{do} loop is a variation of the @code{while} looping statement. The @code{do} loop executes the @var{body} once and then repeats the @@ -12146,6 +12305,7 @@ occasionally is there a real use for a @code{do} statement. @node For Statement @subsection The @code{for} Statement @cindex @code{for} statement +@cindex loops, @code{for}, iterative The @code{for} statement makes it more convenient to count iterations of a loop. The general form of the @code{for} statement looks like this: @@ -12252,6 +12412,8 @@ for more information on this version of the @code{for} loop. @cindex @code{case} keyword @cindex @code{default} keyword +This @value{SECTION} describes a @command{gawk}-specific feature. + The @code{switch} statement allows the evaluation of an expression and the execution of statements based on a @code{case} match. Case statements are checked for a match in the order they are defined. If no suitable @@ -12316,6 +12478,7 @@ it is not available. 
@subsection The @code{break} Statement @cindex @code{break} statement @cindex loops, exiting +@cindex loops, @code{break} statement and The @code{break} statement jumps out of the innermost @code{for}, @code{while}, or @code{do} loop that encloses it. The following example @@ -12375,6 +12538,7 @@ This is discussed in @ref{Switch Statement}. @cindex POSIX @command{awk}, @code{break} statement and @cindex dark corner, @code{break} statement @cindex @command{gawk}, @code{break} statement in +@cindex Brian Kernighan's @command{awk} The @code{break} statement has no meaning when used outside the body of a loop or @code{switch}. However, although it was never documented, @@ -12439,6 +12603,7 @@ This program loops forever once @code{x} reaches 5. @cindex POSIX @command{awk}, @code{continue} statement and @cindex dark corner, @code{continue} statement @cindex @command{gawk}, @code{continue} statement in +@cindex Brian Kernighan's @command{awk} The @code{continue} statement has no special meaning with respect to the @code{switch} statement, nor does it have any meaning when used outside the body of a loop. Historical versions of @command{awk} treated a @code{continue} @@ -12527,11 +12692,11 @@ The @code{nextfile} statement is similar to the @code{next} statement. However, instead of abandoning processing of the current record, the @code{nextfile} statement instructs @command{awk} to stop processing the -current @value{DF}. +current data file. Upon execution of the @code{nextfile} statement, @code{FILENAME} is -updated to the name of the next @value{DF} listed on the command line, +updated to the name of the next data file listed on the command line, @code{FNR} is reset to one, and processing starts over with the first rule in the program. @@ -12540,10 +12705,10 @@ then the code in any @code{END} rules is executed. 
An exception to this is when @code{nextfile} is invoked during execution of any statement in an @code{END} rule; In this case, it causes the program to stop immediately. @xref{BEGIN/END}. -The @code{nextfile} statement is useful when there are many @value{DF}s +The @code{nextfile} statement is useful when there are many data files to process but it isn't necessary to process every record in every file. Without @code{nextfile}, -in order to move on to the next @value{DF}, a program +in order to move on to the next data file, a program would have to continue scanning the unwanted records. The @code{nextfile} statement accomplishes this much more efficiently. @@ -12576,8 +12741,10 @@ See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}. @cindex functions, user-defined, @code{next}/@code{nextfile} statements and @cindex @code{nextfile} statement, user-defined functions and -The current version of the Brian Kernighan's @command{awk} (@pxref{Other -Versions}) also supports @code{nextfile}. However, it doesn't allow the +@cindex Brian Kernighan's @command{awk} +@cindex @command{mawk} utility +The current version of the Brian Kernighan's @command{awk}, and @command{mawk} (@pxref{Other +Versions}) also support @code{nextfile}. However, they don't allow the @code{nextfile} statement inside function bodies (@pxref{User-defined}). @command{gawk} does; a @code{nextfile} inside a function body reads the next record and starts processing it with the first rule in the program, @@ -12781,7 +12948,7 @@ exclusively on the value of @code{FS}. @item FS This is the input field separator (@pxref{Field Separators}). -The value is a single-character string or a multi-character regular +The value is a single-character string or a multicharacter regular expression that matches the separations between fields in an input record. If the value is the null string (@code{""}), then each character in the record becomes a separate field. 
@@ -12814,8 +12981,8 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment. @cindex @command{gawk}, @code{IGNORECASE} variable in @cindex @code{IGNORECASE} variable @cindex differences in @command{awk} and @command{gawk}, @code{IGNORECASE} variable -@cindex case sensitivity, string comparisons and -@cindex case sensitivity, regexps and +@cindex case sensitivity, and string comparisons +@cindex case sensitivity, and regexps @cindex regular expressions, case sensitivity @item IGNORECASE # If @code{IGNORECASE} is nonzero or non-null, then all string comparisons @@ -12927,7 +13094,7 @@ This is the subscript separator. It has the default value of @code{"\034"} and is used to separate the parts of the indices of a multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}} really accesses @code{foo["A\034B"]} -(@pxref{Multi-dimensional}). +(@pxref{Multidimensional}). @cindex @command{gawk}, @code{TEXTDOMAIN} variable in @cindex @code{TEXTDOMAIN} variable @@ -12980,16 +13147,16 @@ In the following example: $ @kbd{awk 'BEGIN @{} > @kbd{for (i = 0; i < ARGC; i++)} > @kbd{print ARGV[i]} -> @kbd{@}' inventory-shipped BBS-list} +> @kbd{@}' inventory-shipped mail-list} @print{} awk @print{} inventory-shipped -@print{} BBS-list +@print{} mail-list @end example @noindent @code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]} contains @samp{inventory-shipped}, and @code{ARGV[2]} contains -@samp{BBS-list}. The value of @code{ARGC} is three, one more than the +@samp{mail-list}. The value of @code{ARGC} is three, one more than the index of the last element in @code{ARGV}, because the elements are numbered from zero. @@ -13010,17 +13177,17 @@ about how @command{awk} uses these variables. @cindex differences in @command{awk} and @command{gawk}, @code{ARGIND} variable @item ARGIND # The index in @code{ARGV} of the current file being processed. 
-Every time @command{gawk} opens a new @value{DF} for processing, it sets -@code{ARGIND} to the index in @code{ARGV} of the @value{FN}. +Every time @command{gawk} opens a new data file for processing, it sets +@code{ARGIND} to the index in @code{ARGV} of the file name. When @command{gawk} is processing the input files, @samp{FILENAME == ARGV[ARGIND]} is always true. @cindex files, processing@comma{} @code{ARGIND} variable and This variable is useful in file processing; it allows you to tell how far -along you are in the list of @value{DF}s as well as to distinguish between -successive instances of the same @value{FN} on the command line. +along you are in the list of data files as well as to distinguish between +successive instances of the same file name on the command line. -@cindex @value{FN}s, distinguishing +@cindex file names, distinguishing While you can change the value of @code{ARGIND} within your @command{awk} program, @command{gawk} automatically sets it to a new value when the next file is opened. @@ -13032,15 +13199,23 @@ or if @command{gawk} is in compatibility mode it is not special. @cindex @code{ENVIRON} array -@cindex environment variables +@cindex environment variables, in @code{ENVIRON} array @item ENVIRON An associative array containing the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For example, -@code{ENVIRON["HOME"]} might be @file{/home/arnold}. Changing this array -does not affect the environment passed on to any programs that -@command{awk} may spawn via redirection or the @code{system()} function. -@c (In a future version of @command{gawk}, it may do so.) +@code{ENVIRON["HOME"]} might be @file{/home/arnold}. + +For POSIX @command{awk}, changing this array does not affect the +environment passed on to any programs that @command{awk} may spawn via +redirection or the @code{system()} function. 
+ +However, beginning with version 4.2, if not in POSIX +compatibility mode, @command{gawk} does update its own environment when +@code{ENVIRON} is changed, thus changing the environment seen by programs +that it creates. You should therefore be especially careful if you +modify @code{ENVIRON["PATH"]"}, which is the search path for finding +executable programs. Some operating systems may not have environment variables. On such systems, the @code{ENVIRON} array is empty (except for @@ -13082,14 +13257,14 @@ it is not special. @cindex dark corner, @code{FILENAME} variable @item FILENAME The name of the file that @command{awk} is currently reading. -When no @value{DF}s are listed on the command line, @command{awk} reads +When no data files are listed on the command line, @command{awk} reads from the standard input and @code{FILENAME} is set to @code{"-"}. @code{FILENAME} is changed each time a new file is read (@pxref{Reading Files}). Inside a @code{BEGIN} rule, the value of @code{FILENAME} is @code{""}, since there are no input files being processed yet.@footnote{Some early implementations of Unix @command{awk} initialized -@code{FILENAME} to @code{"-"}, even if there were @value{DF}s to be +@code{FILENAME} to @code{"-"}, even if there were data files to be processed. This behavior was incorrect and should not be relied upon in your programs.} @value{DARKCORNER} @@ -13111,13 +13286,7 @@ The number of fields in the current input record. @code{NF} is set each time a new record is read, when a new field is created or when @code{$0} changes (@pxref{Fields}). -Unlike most of the variables described in this -@ifnotinfo -section, -@end ifnotinfo -@ifinfo -node, -@end ifinfo +Unlike most of the variables described in this @value{SUBSECTION}, assigning a value to @code{NF} has the potential to affect @command{awk}'s internal workings. 
In particular, assignments to @code{NF} can be used to create or remove fields from the @@ -13153,10 +13322,12 @@ The following elements (listed alphabetically) are guaranteed to be available: @table @code +@cindex effective group ID of @command{gawk} user @item PROCINFO["egid"] The value of the @code{getegid()} system call. @item PROCINFO["euid"] +@cindex effective user ID of @command{gawk} user The value of the @code{geteuid()} system call. @item PROCINFO["FS"] @@ -13166,6 +13337,7 @@ This is or @code{"FPAT"} if field matching with @code{FPAT} is in effect. @item PROCINFO["identifiers"] +@cindex program identifiers A subarray, indexed by the names of all identifiers used in the text of the AWK program. For each identifier, the value of the element is one of the following: @@ -13194,15 +13366,19 @@ after it has finished parsing the program; they are @emph{not} updated while the program runs. @item PROCINFO["gid"] +@cindex group ID of @command{gawk} user The value of the @code{getgid()} system call. @item PROCINFO["pgrpid"] +@cindex process group idIDof @command{gawk} process The process group ID of the current process. @item PROCINFO["pid"] +@cindex process ID of @command{gawk} process The process ID of the current process. @item PROCINFO["ppid"] +@cindex parent process ID of @command{gawk} process The parent process ID of the current process. @item PROCINFO["sorted_in"] @@ -13222,25 +13398,31 @@ Assigning a new value to this element changes the default. The value of the @code{getuid()} system call. @item PROCINFO["version"] +@cindex version of @command{gawk} +@cindex @command{gawk} version The version of @command{gawk}. 
@end table The following additional elements in the array are available to provide information about the MPFR and GMP libraries if your version of @command{gawk} supports arbitrary precision numbers -(@pxref{Arbitrary Precision Arithmetic}): +(@pxref{Gawk and MPFR}): @table @code +@cindex version of GNU MPFR library @item PROCINFO["mpfr_version"] The version of the GNU MPFR library. @item PROCINFO["gmp_version"] +@cindex version of GNU MP library The version of the GNU MP library. @item PROCINFO["prec_max"] +@cindex maximum precision supported by MPFR library The maximum precision supported by MPFR. @item PROCINFO["prec_min"] +@cindex minimum precision supported by MPFR library The minimum precision required by MPFR. @end table @@ -13251,12 +13433,15 @@ of @command{gawk} supports dynamic loading of extension functions @table @code @item PROCINFO["api_major"] +@cindex version of @command{gawk} extension API +@cindex extension API, version number The major version of the extension API. @item PROCINFO["api_minor"] The minor version of the extension API. @end table +@cindex supplementary groups of @command{gawk} process On some systems, there may be elements in the array, @code{"group1"} through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of supplementary groups that the process has. Use the @code{in} operator @@ -13264,7 +13449,7 @@ to test for these elements (@pxref{Reference to Elements}). @cindex @command{gawk}, @code{PROCINFO} array in -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, uses The @code{PROCINFO} array has the following additional uses: @itemize @bullet @@ -13336,7 +13521,7 @@ if an element in @code{SYMTAB} is an array. Also, you may not use the @code{delete} statement with the @code{SYMTAB} array. -You may use an index for @code{SYMTAB} that is not a predefined identifer: +You may use an index for @code{SYMTAB} that is not a predefined identifier: @example SYMTAB["xxx"] = 5 @@ -13404,7 +13589,7 @@ changed. 
@node ARGC and ARGV @subsection Using @code{ARGC} and @code{ARGV} -@cindex @code{ARGC}/@code{ARGV} variables +@cindex @code{ARGC}/@code{ARGV} variables, how to use @cindex arguments, command-line @cindex command line, arguments @@ -13416,16 +13601,16 @@ and @code{ARGV}: $ @kbd{awk 'BEGIN @{} > @kbd{for (i = 0; i < ARGC; i++)} > @kbd{print ARGV[i]} -> @kbd{@}' inventory-shipped BBS-list} +> @kbd{@}' inventory-shipped mail-list} @print{} awk @print{} inventory-shipped -@print{} BBS-list +@print{} mail-list @end example @noindent In this example, @code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]} contains @samp{inventory-shipped}, and @code{ARGV[2]} contains -@samp{BBS-list}. +@samp{mail-list}. Notice that the @command{awk} program is not entered in @code{ARGV}. The other command-line options, with their arguments, are also not entered. This includes variable assignments done with the @option{-v} @@ -13466,11 +13651,11 @@ additional files to be read. If the value of @code{ARGC} is decreased, that eliminates input files from the end of the list. By recording the old value of @code{ARGC} elsewhere, a program can treat the eliminated arguments as -something other than @value{FN}s. +something other than file names. To eliminate a file from the middle of the list, store the null string (@code{""}) into @code{ARGV} in place of the file's name. As a -special feature, @command{awk} ignores @value{FN}s that have been +special feature, @command{awk} ignores file names that have been replaced with the null string. Another option is to use the @code{delete} statement to remove elements from @@ -13549,7 +13734,7 @@ ability to support true multidimensional arrays. @cindex variables, names of @cindex functions, names of -@cindex arrays, names of +@cindex arrays, names of, and names of functions/variables @cindex names, arrays/variables @cindex namespace issues @command{awk} maintains a single set @@ -13565,7 +13750,7 @@ same @command{awk} program. 
* Numeric Array Subscripts:: How to use numbers as subscripts in @command{awk}. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in +* Multidimensional:: Emulating multidimensional arrays in @command{awk}. * Arrays of Arrays:: True multidimensional arrays. @end menu @@ -13595,8 +13780,8 @@ an array. @cindex Wall, Larry @quotation @i{Doing linear scans over an associative array is like trying to club someone -to death with a loaded Uzi.}@* -Larry Wall +to death with a loaded Uzi.} +@author Larry Wall @end quotation The @command{awk} language provides one-dimensional arrays @@ -13725,10 +13910,9 @@ Here, the number @code{1} isn't double-quoted, since @command{awk} automatically converts it to a string. @cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex @code{IGNORECASE} variable @cindex case sensitivity, array indices and -@cindex arrays, @code{IGNORECASE} variable and -@cindex @code{IGNORECASE} variable, array subscripts and +@cindex arrays, and @code{IGNORECASE} variable +@cindex @code{IGNORECASE} variable, and array indices The value of @code{IGNORECASE} has no effect upon array subscripting. The identical string value used to store an array element must be used to retrieve it. @@ -13744,8 +13928,9 @@ is independent of the number of elements in the array. @node Reference to Elements @subsection Referring to an Array Element -@cindex arrays, elements, referencing -@cindex elements in arrays +@cindex arrays, referencing elements +@cindex array members +@cindex elements of arrays The principal way to use an array is to refer to one of its elements. An array reference is an expression as follows: @@ -13762,11 +13947,16 @@ The value of the array reference is the current value of that array element. For example, @code{foo[4.3]} is an expression for the element of array @code{foo} at index @samp{4.3}. 
+@cindex arrays, unassigned elements +@cindex unassigned array elements +@cindex empty array elements A reference to an array element that has no recorded value yields a value of @code{""}, the null string. This includes elements that have not been assigned any value as well as elements that have been deleted (@pxref{Delete}). +@cindex non-existent array elements +@cindex arrays, elements that don't exist @quotation NOTE A reference to an element that does not exist @emph{automatically} creates that array element, with the null string as its value. (In some cases, @@ -13786,7 +13976,7 @@ if it didn't exist before! @end quotation @c @cindex arrays, @code{in} operator and -@cindex @code{in} operator +@cindex @code{in} operator, testing if array element exists To determine whether an element exists in an array at a certain index, use the following expression: @@ -13821,8 +14011,8 @@ if (frequencies[2] != "") @node Assigning Elements @subsection Assigning Array Elements -@cindex arrays, elements, assigning -@cindex elements in arrays, assigning +@cindex arrays, elements, assigning values +@cindex elements in arrays, assigning values Array elements can be assigned values just like @command{awk} variables: @@ -13839,6 +14029,7 @@ assign to that element of the array. @node Array Example @subsection Basic Array Example +@cindex arrays, an example of using The following program takes a list of lines, each beginning with a line number, and prints them out in order of line number. The line numbers @@ -13908,7 +14099,9 @@ END @{ @node Scanning an Array @subsection Scanning All Elements of an Array @cindex elements in arrays, scanning +@cindex scanning arrays @cindex arrays, scanning +@cindex loops, @code{for}, array scanning In programs that use arrays, it is often necessary to use a loop that executes once for each element of an array. 
In other languages, where @@ -13925,7 +14118,7 @@ for (@var{var} in @var{array}) @end example @noindent -@cindex @code{in} operator +@cindex @code{in} operator, use in loops This loop executes @var{body} once for each index in @var{array} that the program has previously used, with the variable @var{var} set to that index. @@ -13964,8 +14157,9 @@ END @{ @xref{Word Sorting}, for a more detailed example of this type. -@cindex arrays, elements, order of -@cindex elements in arrays, order of +@cindex arrays, elements, order of access by @code{in} operator +@cindex elements in arrays, order of access by @code{in} operator +@cindex @code{in} operator, order of array access The order in which elements of the array are accessed by this statement is determined by the internal arrangement of the array elements within @command{awk} and normally cannot be controlled or changed. This can lead to @@ -13983,6 +14177,8 @@ determines the order in which the array is traversed. This order is usually based on the internal implementation of arrays and will vary from one version of @command{awk} to the next. +@cindex array scanning order, controlling +@cindex controlling array scanning order Often, though, you may wish to do something simple, such as ``traverse the array by comparing the indices in ascending order,'' or ``traverse the array by comparing the values in descending order.'' @@ -13999,6 +14195,7 @@ to use for comparison of array elements. This advanced feature is described later, in @ref{Array Sorting}. @end itemize +@cindex @code{PROCINFO}, values of @code{sorted_in} The following special values for @code{PROCINFO["sorted_in"]} are available: @table @code @@ -14007,29 +14204,29 @@ Array elements are processed in arbitrary order, which is the default @command{awk} behavior. @item "@@ind_str_asc" -Order by indices compared as strings; this is the most basic sort. +Order by indices in ascending order compared as strings; this is the most basic sort. 
(Internally, array indices are always strings, so with @samp{a[2*5] = 1} the index is @code{"10"} rather than numeric 10.) @item "@@ind_num_asc" -Order by indices but force them to be treated as numbers in the process. +Order by indices in ascending order but force them to be treated as numbers in the process. Any index with a non-numeric value will end up positioned as if it were zero. @item "@@val_type_asc" -Order by element values rather than indices. +Order by element values in ascending order (rather than by indices). Ordering is by the type assigned to the element (@pxref{Typing and Comparison}). All numeric values come before all string values, which in turn come before all subarrays. (Subarrays have not been described yet; -@pxref{Arrays of Arrays}). +@pxref{Arrays of Arrays}.) @item "@@val_str_asc" -Order by element values rather than by indices. Scalar values are +Order by element values in ascending order (rather than by indices). Scalar values are compared as strings. Subarrays, if present, come out last. @item "@@val_num_asc" -Order by element values rather than by indices. Scalar values are +Order by element values in ascending order (rather than by indices). Scalar values are compared as numbers. Subarrays, if present, come out last. When numeric values are equal, the string values are used to provide an ordering: this guarantees consistent results across different @@ -14042,13 +14239,14 @@ across different environments.} which @command{gawk} uses internally to perform the sorting. @item "@@ind_str_desc" -Reverse order from the most basic sort. +String indices ordered from high to low. @item "@@ind_num_desc" Numeric indices ordered from high to low. @item "@@val_type_desc" -Element values, based on type, in descending order. +Element values, based on type, ordered from high to low. +Subarrays, if present, come out first. @item "@@val_str_desc" Element values, treated as strings, ordered from high to low. 
@@ -14158,7 +14356,7 @@ if (4 in foo) print "This will never be printed" @end example -@cindex null strings, array elements and +@cindex null strings, and deleting array elements It is important to note that deleting an element is @emph{not} the same as assigning it a null value (the empty string, @code{""}). For example: @@ -14180,6 +14378,7 @@ is not in the array is deleted. @cindex extensions, common@comma{} @code{delete} to delete entire arrays @cindex arrays, deleting entire contents @cindex deleting entire arrays +@cindex @code{delete} @var{array} @cindex differences in @command{awk} and @command{gawk}, array elements, deleting All the elements of an array may be deleted with a single statement by leaving off the subscript in the @code{delete} statement, @@ -14194,6 +14393,7 @@ Using this version of the @code{delete} statement is about three times more efficient than the equivalent loop that deletes each element one at a time. +@cindex Brian Kernighan's @command{awk} @quotation NOTE For many years, using @code{delete} without a subscript was a @command{gawk} extension. @@ -14236,9 +14436,9 @@ a = 3 @section Using Numbers to Subscript Arrays @cindex numbers, as array subscripts -@cindex arrays, subscripts +@cindex arrays, numeric subscripts @cindex subscripts in arrays, numbers as -@cindex @code{CONVFMT} variable, array subscripts and +@cindex @code{CONVFMT} variable, and array subscripts An important aspect to remember about arrays is that @emph{array subscripts are always strings}. When a numeric value is used as a subscript, it is converted to a string value before being used for subscripting @@ -14268,7 +14468,8 @@ string value from @code{xyz}---this time @code{"12.15"}---because the value of @code{CONVFMT} only allows two significant digits. This test fails, since @code{"12.15"} is different from @code{"12.153"}. 
-@cindex converting, during subscripting +@cindex converting integer array subscripts +@cindex integer array indices According to the rules for conversions (@pxref{Conversion}), integer values are always converted to strings as integers, no matter what the @@ -14358,11 +14559,11 @@ Even though it is somewhat unusual, the null string if @option{--lint} is provided on the command line (@pxref{Options}). -@node Multi-dimensional +@node Multidimensional @section Multidimensional Arrays @menu -* Multi-scanning:: Scanning multidimensional arrays. +* Multiscanning:: Scanning multidimensional arrays. @end menu @cindex subscripts in arrays, multidimensional @@ -14374,7 +14575,7 @@ languages, including @command{awk}) to refer to an element of a two-dimensional array named @code{grid} is with @code{grid[@var{x},@var{y}]}. -@cindex @code{SUBSEP} variable, multidimensional arrays +@cindex @code{SUBSEP} variable, and multidimensional arrays Multidimensional arrays are supported in @command{awk} through concatenation of indices into one string. @command{awk} converts the indices into strings @@ -14406,6 +14607,7 @@ combined strings that are ambiguous. Suppose that @code{SUBSEP} is "b@@c"]}} are indistinguishable because both are actually stored as @samp{foo["a@@b@@c"]}. +@cindex @code{in} operator, index existence in multidimensional arrays To test whether a particular index sequence exists in a multidimensional array, use the same operator (@code{in}) that is used for single dimensional arrays. Write the whole sequence of indices @@ -14460,7 +14662,7 @@ the program produces the following output: 3 2 1 6 @end example -@node Multi-scanning +@node Multiscanning @subsection Scanning Multidimensional Arrays There is no special @code{for} statement for scanning a @@ -14471,6 +14673,7 @@ multidimensional @emph{way of accessing} an array. 
@cindex subscripts in arrays, multidimensional, scanning @cindex arrays, multidimensional, scanning +@cindex scanning multidimensional arrays However, if your program has an array that is always accessed as multidimensional, you can get the effect of scanning it by combining the scanning @code{for} statement @@ -14512,6 +14715,7 @@ separate indices is recovered. @node Arrays of Arrays @section Arrays of Arrays +@cindex arrays of arrays @command{gawk} goes beyond standard @command{awk}'s multidimensional array access and provides true arrays of @@ -14771,6 +14975,7 @@ two arguments 11 and 10. @node Numeric Functions @subsection Numeric Functions +@cindex numeric functions The following list describes all of the built-in functions that work with numbers. @@ -14778,22 +14983,26 @@ Optional parameters are enclosed in square brackets@w{ ([ ]):} @table @code @item atan2(@var{y}, @var{x}) -@cindex @code{atan2()} function +@cindexawkfunc{atan2} +@cindex arctangent Return the arctangent of @code{@var{y} / @var{x}} in radians. You can use @samp{pi = atan2(0, -1)} to retrieve the value of @value{PI}. @item cos(@var{x}) -@cindex @code{cos()} function +@cindexawkfunc{cos} +@cindex cosine Return the cosine of @var{x}, with @var{x} in radians. @item exp(@var{x}) -@cindex @code{exp()} function +@cindexawkfunc{exp} +@cindex exponent Return the exponential of @var{x} (@code{e ^ @var{x}}) or report an error if @var{x} is out of range. The range of values @var{x} can have depends on your machine's floating-point representation. @item int(@var{x}) -@cindex @code{int()} function +@cindexawkfunc{int} +@cindex round to nearest integer Return the nearest integer to @var{x}, located between @var{x} and zero and truncated toward zero. @@ -14801,12 +15010,13 @@ For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)} is @minus{}3, and @code{int(-3)} is @minus{}3 as well. 
@item log(@var{x}) -@cindex @code{log()} function +@cindexawkfunc{log} +@cindex logarithm Return the natural logarithm of @var{x}, if @var{x} is positive; otherwise, report an error. @item rand() -@cindex @code{rand()} function +@cindexawkfunc{rand} @cindex random numbers, @code{rand()}/@code{srand()} functions Return a random number. The values of @code{rand()} are uniformly distributed between zero and one. @@ -14848,7 +15058,7 @@ function roll(n) @{ return 1 + int(rand() * n) @} @} @end example -@cindex numbers, random +@cindex seeding random number generator @cindex random numbers, seed of @quotation CAUTION In most @command{awk} implementations, including @command{gawk}, @@ -14864,17 +15074,19 @@ use @code{srand()}. @end quotation @item sin(@var{x}) -@cindex @code{sin()} function +@cindexawkfunc{sin} +@cindex sine Return the sine of @var{x}, with @var{x} in radians. @item sqrt(@var{x}) -@cindex @code{sqrt()} function +@cindexawkfunc{sqrt} +@cindex square root Return the positive square root of @var{x}. @command{gawk} prints a warning message if @var{x} is negative. Thus, @code{sqrt(4)} is 2. @item srand(@r{[}@var{x}@r{]}) -@cindex @code{srand()} function +@cindexawkfunc{srand} Set the starting point, or seed, for generating random numbers to the value @var{x}. @@ -14904,16 +15116,18 @@ sequences of random numbers. @node String Functions @subsection String-Manipulation Functions +@cindex string-manipulation functions -The functions in this @value{SECTION} look at or change the text of one or more -strings. -@code{gawk} understands locales (@pxref{Locales}), and does all string processing in terms of -@emph{characters}, not @emph{bytes}. This distinction is particularly important -to understand for locales where one character -may be represented by multiple bytes. 
Thus, for example, @code{length()} -returns the number of characters in a string, and not the number of bytes -used to represent those characters, Similarly, @code{index()} works with -character indices, and not byte indices. +The functions in this @value{SECTION} look at or change the text of one +or more strings. + +@code{gawk} understands locales (@pxref{Locales}), and does all +string processing in terms of @emph{characters}, not @emph{bytes}. +This distinction is particularly important to understand for locales +where one character may be represented by multiple bytes. Thus, for +example, @code{length()} returns the number of characters in a string, +and not the number of bytes used to represent those characters. Similarly, +@code{index()} works with character indices, and not byte indices. In the following list, optional parameters are enclosed in square brackets@w{ ([ ]).} Several functions perform string substitution; the full discussion is @@ -14930,30 +15144,34 @@ pound sign@w{ (@samp{#}):} @table @code @item asort(@var{source} @r{[}, @var{dest} @r{[}, @var{how} @r{]} @r{]}) # +@itemx asorti(@var{source} @r{[}, @var{dest} @r{[}, @var{how} @r{]} @r{]}) # +@cindexgawkfunc{asorti} +@cindex sort array @cindex arrays, elements, retrieving number of -@cindex @code{asort()} function (@command{gawk}) +@cindexgawkfunc{asort} +@cindex sort array indices +These two functions are similar in behavior, so they are described +together. + +@quotation NOTE +The following description ignores the third argument, @var{how}, since it +requires understanding features that we have not discussed yet. Thus, +the discussion here is a deliberate simplification. (We do provide all +the details later on: @xref{Array Sorting Functions}, for the full story.) +@end quotation + +Both functions return the number of elements in the array @var{source}. 
+For @command{asort()}, @command{gawk} sorts the values of @var{source} +and replaces the indices of the sorted values of @var{source} with +sequential integers starting with one. If the optional array @var{dest} +is specified, then @var{source} is duplicated into @var{dest}. @var{dest} +is then sorted, leaving the indices of @var{source} unchanged. + @cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex @code{IGNORECASE} variable -Return the number of elements in the array @var{source}. -@command{gawk} sorts the contents of @var{source} -and replaces the indices -of the sorted values of @var{source} with sequential -integers starting with one. If the optional array @var{dest} is specified, -then @var{source} is duplicated into @var{dest}. @var{dest} is then -sorted, leaving the indices of @var{source} unchanged. The optional third -argument @var{how} is a string which controls the rule for comparing values, -and the sort direction. A single space is required between the -comparison mode, @samp{string} or @samp{number}, and the direction specification, -@samp{ascending} or @samp{descending}. You can omit direction and/or mode -in which case it will default to @samp{ascending} and @samp{string}, respectively. -An empty string "" is the same as the default @code{"ascending string"} -for the value of @var{how}. If the @samp{source} array contains subarrays as values, -they will come out last(first) in the @samp{dest} array for @samp{ascending}(@samp{descending}) -order specification. The value of @code{IGNORECASE} affects the sorting. -The third argument can also be a user-defined function name in which case -the value returned by the function is used to order the array elements -before constructing the result array. -@xref{Array Sorting Functions}, for more information. +When comparing strings, @code{IGNORECASE} affects the sorting +(@pxref{Array Sorting Functions}). 
If the +@var{source} array contains subarrays as values (@pxref{Arrays of +Arrays}), they will come last, after all scalar values. For example, if the contents of @code{a} are as follows: @@ -14979,32 +15197,24 @@ a[2] = "de" a[3] = "sac" @end example -In order to reverse the direction of the sorted results in the above example, -@code{asort()} can be called with three arguments as follows: +The @code{asorti()} function works similarly to @code{asort()}, however, +the @emph{indices} are sorted, instead of the values. Thus, in the +previous example, starting with the same initial set of indices and +values in @code{a}, calling @samp{asorti(a)} would yield: @example -asort(a, a, "descending") +a[1] = "first" +a[2] = "last" +a[3] = "middle" @end example -The @code{asort()} function is described in more detail in -@ref{Array Sorting Functions}. -@code{asort()} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options}). - -@item asorti(@var{source} @r{[}, @var{dest} @r{[}, @var{how} @r{]} @r{]}) # -@cindex @code{asorti()} function (@command{gawk}) -Return the number of elements in the array @var{source}. -It works similarly to @code{asort()}, however, the @emph{indices} -are sorted, instead of the values. (Here too, -@code{IGNORECASE} affects the sorting.) - -The @code{asorti()} function is described in more detail in -@ref{Array Sorting Functions}. -@code{asorti()} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options}). +@code{asort()} and @code{asorti()} are @command{gawk} extensions; they +are not available in compatibility mode (@pxref{Options}). @item gensub(@var{regexp}, @var{replacement}, @var{how} @r{[}, @var{target}@r{]}) # -@cindex @code{gensub()} function (@command{gawk}) +@cindexgawkfunc{gensub} +@cindex search and replace in strings +@cindex substitute in string Search the target string @var{target} for matches of the regular expression @var{regexp}. 
If @var{how} is a string beginning with @samp{g} or @samp{G} (short for ``global''), then replace all matches of @var{regexp} with @@ -15013,7 +15223,7 @@ which match of @var{regexp} to replace. If no @var{target} is supplied, use @code{$0}. It returns the modified string as the result of the function and the original target string is @emph{not} changed. -@code{gensub()} is a general substitution function. It's purpose is +@code{gensub()} is a general substitution function. Its purpose is to provide more features than the standard @code{sub()} and @code{gsub()} functions. @@ -15067,7 +15277,7 @@ is the original unchanged value of @var{target}. in compatibility mode (@pxref{Options}). @item gsub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) -@cindex @code{gsub()} function +@cindexawkfunc{gsub} Search @var{target} for @emph{all} of the longest, leftmost, @emph{nonoverlapping} matching substrings it can find and replace them with @var{replacement}. @@ -15089,8 +15299,9 @@ As in @code{sub()}, the characters @samp{&} and @samp{\} are special, and the third argument must be assignable. @item index(@var{in}, @var{find}) -@cindex @code{index()} function -@cindex searching +@cindexawkfunc{index} +@cindex search in string +@cindex find substring in string Search the string @var{in} for the first occurrence of the string @var{find}, and return the position in characters where that occurrence begins in the string @var{in}. Consider the following example: @@ -15107,7 +15318,9 @@ If @var{find} is not found, @code{index()} returns zero. It is a fatal error to use a regexp constant for @var{find}. @item length(@r{[}@var{string}@r{]}) -@cindex @code{length()} function +@cindexawkfunc{length} +@cindex string length +@cindex length of string Return the number of characters in @var{string}. If @var{string} is a number, the length of the digit string representing that number is returned. For example, @code{length("abcde")} is five. 
By @@ -15115,6 +15328,8 @@ contrast, @code{length(15 * 35)} works out to three. In this example, 15 * 35 = 525, and 525 is then converted to the string @code{"525"}, which has three characters. +@cindex length of input record +@cindex input record, length of If no argument is supplied, @code{length()} returns the length of @code{$0}. @c @cindex historical features @@ -15153,6 +15368,8 @@ warning about this. @cindex common extensions, @code{length()} applied to an array @cindex extensions, common@comma{} @code{length()} applied to an array @cindex differences between @command{gawk} and @command{awk} +@cindex number of array elements +@cindex array, number of elements With @command{gawk} and several other @command{awk} implementations, when given an array argument, the @code{length()} function returns the number of elements in the array. @value{COMMONEXT} @@ -15166,7 +15383,9 @@ If @option{--posix} is supplied, using an array argument is a fatal error (@pxref{Arrays}). @item match(@var{string}, @var{regexp} @r{[}, @var{array}@r{]}) -@cindex @code{match()} function +@cindexawkfunc{match} +@cindex string, regular expression match +@cindex match regexp in string Search @var{string} for the longest, leftmost substring matched by the regular expression, @var{regexp} and return the character position, or @dfn{index}, @@ -15281,7 +15500,8 @@ The @var{array} argument to @code{match()} is a using a third argument is a fatal error. @item patsplit(@var{string}, @var{array} @r{[}, @var{fieldpat} @r{[}, @var{seps} @r{]} @r{]}) # -@cindex @code{patsplit()} function (@command{gawk}) +@cindexgawkfunc{patsplit} +@cindex split string into array Divide @var{string} into pieces defined by @var{fieldpat} and store the pieces in @var{array} and the separator strings in the @@ -15312,7 +15532,7 @@ The @code{patsplit()} function is a it is not available. 
@item split(@var{string}, @var{array} @r{[}, @var{fieldsep} @r{[}, @var{seps} @r{]} @r{]}) -@cindex @code{split()} function +@cindexawkfunc{split} Divide @var{string} into pieces separated by @var{fieldsep} and store the pieces in @var{array} and the separator strings in the @var{seps} array. The first piece is stored in @@ -15341,7 +15561,7 @@ split("cul-de-sac", a, "-", seps) @end example @noindent -@cindex strings, splitting +@cindex strings splitting, example splits the string @samp{cul-de-sac} into three fields using @samp{-} as the separator. It sets the contents of the array @code{a} as follows: @@ -15397,7 +15617,8 @@ If @var{string} does not match @var{fieldsep} at all (but is not null), @var{string}. @item sprintf(@var{format}, @var{expression1}, @dots{}) -@cindex @code{sprintf()} function +@cindexawkfunc{sprintf} +@cindex formatting strings Return (without printing) the string that @code{printf} would have printed out with the same arguments (@pxref{Printf}). @@ -15410,7 +15631,8 @@ pival = sprintf("pi = %.2f (approx.)", 22/7) @noindent assigns the string @w{@samp{pi = 3.14 (approx.)}} to the variable @code{pival}. -@cindex @code{strtonum()} function (@command{gawk}) +@cindexgawkfunc{strtonum} +@cindex convert string to number @item strtonum(@var{str}) # Examine @var{str} and return its numeric value. If @var{str} begins with a leading @samp{0}, @code{strtonum()} assumes that @var{str} @@ -15433,12 +15655,12 @@ you use the @option{--non-decimal-data} option, which isn't recommended. Note also that @code{strtonum()} uses the current locale's decimal point for recognizing numbers (@pxref{Locales}). -@cindex differences in @command{awk} and @command{gawk}, @code{strtonum()} function (@command{gawk}) @code{strtonum()} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). 
@item sub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) -@cindex @code{sub()} function +@cindexawkfunc{sub} +@cindex replace in string Search @var{target}, which is treated as a string, for the leftmost, longest substring matched by the regular expression @var{regexp}. Modify the entire string @@ -15538,7 +15760,8 @@ Finally, if the @var{regexp} is not a regexp constant, it is converted into a string, and then the value of that string is treated as the regexp to match. @item substr(@var{string}, @var{start} @r{[}, @var{length}@r{]}) -@cindex @code{substr()} function +@cindexawkfunc{substr} +@cindex substring Return a @var{length}-character-long substring of @var{string}, starting at character number @var{start}. The first character of a string is character number one.@footnote{This is different from @@ -15552,6 +15775,7 @@ suffix is also returned if @var{length} is greater than the number of characters remaining in the string, counting from character @var{start}. +@cindex Brian Kernighan's @command{awk} If @var{start} is less than one, @code{substr()} treats it as if it was one. (POSIX doesn't specify what to do in this case: Brian Kernighan's @command{awk} acts this way, and therefore @command{gawk} @@ -15594,16 +15818,18 @@ string = substr(string, 1, 2) "CDE" substr(string, 6) @end example @cindex case sensitivity, converting case -@cindex converting, case +@cindex strings, converting letter case @item tolower(@var{string}) -@cindex @code{tolower()} function +@cindexawkfunc{tolower} +@cindex convert string to lower case Return a copy of @var{string}, with each uppercase character in the string replaced with its corresponding lowercase character. Nonalphabetic characters are left unchanged. For example, @code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}. 
@item toupper(@var{string}) -@cindex @code{toupper()} function +@cindexawkfunc{toupper} +@cindex convert string to upper case Return a copy of @var{string}, with each lowercase character in the string replaced with its corresponding uppercase character. Nonalphabetic characters are left unchanged. For example, @@ -15631,6 +15857,7 @@ and builds an internal copy of it that can be executed. Then there is the runtime level, which is when @command{awk} actually scans the replacement string to determine what to generate. +@cindex Brian Kernighan's @command{awk} At both levels, @command{awk} looks for a defined set of characters that can come after a backslash. At the lexical level, it looks for the escape sequences listed in @ref{Escape Sequences}. @@ -15900,17 +16127,17 @@ _bigskip} The only case where the difference is noticeable is the last one: @samp{\\\\} is seen as @samp{\\} and produces @samp{\} instead of @samp{\\}. -Starting with @value{PVERSION} 3.1.4, @command{gawk} followed the POSIX rules +Starting with version 3.1.4, @command{gawk} followed the POSIX rules when @option{--posix} is specified (@pxref{Options}). Otherwise, it continued to follow the 1996 proposed rules, since that had been its behavior for many years. -When @value{PVERSION} 4.0.0 was released, the @command{gawk} maintainer +When version 4.0.0 was released, the @command{gawk} maintainer made the POSIX rules the default, breaking well over a decade's worth of backwards compatibility.@footnote{This was rather naive of him, despite there being a note in this section indicating that the next major version would move to the POSIX rules.} Needless to say, this was a bad idea, -and as of @value{PVERSION} 4.0.1, @command{gawk} resumed its historical +and as of version 4.0.1, @command{gawk} resumed its historical behavior, and only follows the POSIX rules when @option{--posix} is given. The rules for @code{gensub()} are considerably simpler. 
At the runtime @@ -15995,14 +16222,16 @@ Although this makes a certain amount of sense, it can be surprising. @node I/O Functions @subsection Input/Output Functions +@cindex input/output functions The following functions relate to input/output (I/O). Optional parameters are enclosed in square brackets ([ ]): @table @code @item close(@var{filename} @r{[}, @var{how}@r{]}) -@cindex @code{close()} function +@cindexawkfunc{close} @cindex files, closing +@cindex close file or coprocess Close the file @var{filename} for input or output. Alternatively, the argument may be a shell command that was used for creating a coprocess, or for redirecting to or from a pipe; then the coprocess or pipe is closed. @@ -16019,7 +16248,8 @@ not matter. which discusses this feature in more detail and gives an example. @item fflush(@r{[}@var{filename}@r{]}) -@cindex @code{fflush()} function +@cindexawkfunc{fflush} +@cindex flush buffered output Flush any buffered output associated with @var{filename}, which is either a file opened for writing or a shell command for redirecting output to a pipe or coprocess. @@ -16037,11 +16267,12 @@ This is the purpose of the @code{fflush()} function---@command{gawk} also buffers its output and the @code{fflush()} function forces @command{gawk} to flush its buffers. -@code{fflush()} was added to Brian Kernighan's -version of @command{awk} in 1994. -For over two decades, it was not part of the POSIX standard. -As of December, 2012, it was accepted for -inclusion into the POSIX standard. +@cindex extensions, common@comma{} @code{fflush()} function +@cindex Brian Kernighan's @command{awk} +@code{fflush()} was added to Brian Kernighan's version of @command{awk} in +April of 1992. For two decades, it was not part of the POSIX standard. +As of December, 2012, it was accepted for inclusion into the POSIX +standard. See @uref{http://austingroupbugs.net/view.php?id=634, the Austin Group website}. 
POSIX standardizes @code{fflush()} as follows: If there @@ -16077,7 +16308,8 @@ or if @var{filename} is not an open file, pipe, or coprocess. In such a case, @code{fflush()} returns @minus{}1, as well. @item system(@var{command}) -@cindex @code{system()} function +@cindexawkfunc{system} +@cindex invoke shell command @cindex interacting with other programs Execute the operating-system command @var{command} and then return to the @command{awk} program. @@ -16108,7 +16340,7 @@ close("/bin/sh") @noindent @cindex troubleshooting, @code{system()} function -@cindex @code{--sandbox} option, disabling @code{system()} function +@cindex @option{--sandbox} option, disabling @code{system()} function However, if your @command{awk} program is interactive, @code{system()} is useful for running large self-contained programs, such as a shell or an editor. @@ -16144,7 +16376,7 @@ $ @kbd{awk '@{ print $1 + $2 @}'} @print{} 2 @kbd{2 3} @print{} 5 -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @noindent @@ -16155,13 +16387,13 @@ with this example: $ @kbd{awk '@{ print $1 + $2 @}' | cat} @kbd{1 1} @kbd{2 3} -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @print{} 2 @print{} 5 @end example @noindent -Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because +Here, no output is printed until after the @kbd{Ctrl-d} is typed, because it is all buffered and sent down the pipe to @command{cat} in one shot. @end sidebar @@ -16224,6 +16456,7 @@ you would see the latter (undesirable) output. 
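The interaction between buffered output and system() discussed in this part of the manual can be illustrated as follows. This is only a sketch, assuming an awk that accepts fflush() with no argument (the form standardized by POSIX in 2012); the shell variable out is hypothetical and exists only to capture the result.

```shell
# Sketch: awk output goes to a pipe here, so it is buffered. Calling
# fflush() before system() keeps the lines in program order.
out=$(awk 'BEGIN {
    print "first print"
    fflush()                      # flush buffered output before the command runs
    system("echo system echo")
    print "second print"
}')
printf '%s\n' "$out"
```

Without the fflush() call, an implementation that does not flush before running the command may emit the output of echo ahead of "first print".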
@node Time Functions @subsection Time Functions +@cindex time functions @c STARTOFRANGE tst @cindex timestamps @@ -16243,7 +16476,18 @@ it is the number of seconds since 1970-01-01 00:00:00 UTC, not counting leap seconds.@footnote{@xref{Glossary}, especially the entries ``Epoch'' and ``UTC.''} All known POSIX-compliant systems support timestamps from 0 through -@math{2^{31} - 1}, which is sufficient to represent times through +@iftex +@math{2^{31} - 1}, +@end iftex +@ifnottex +@ifnotdocbook +2^31 - 1, +@end ifnotdocbook +@end ifnottex +@docbook +2<superscript>31</superscript> − 1, @c +@end docbook +which is sufficient to represent times through 2038-01-19 03:14:07 UTC. Many systems support a wider range of timestamps, including negative timestamps that represent times before the epoch. @@ -16262,7 +16506,8 @@ Optional parameters are enclosed in square brackets ([ ]): @table @code @item mktime(@var{datespec}) -@cindex @code{mktime()} function (@command{gawk}) +@cindexgawkfunc{mktime} +@cindex generate time values Turn @var{datespec} into a timestamp in the same form as is returned by @code{systime()}. It is similar to the function of the same name in ISO C. The argument, @var{datespec}, is a string of the form @@ -16292,7 +16537,8 @@ is out of range, @code{mktime()} returns @minus{}1. @cindex @code{PROCINFO} array @item strftime(@r{[}@var{format} @r{[}, @var{timestamp} @r{[}, @var{utc-flag}@r{]]]}) @c STARTOFRANGE strf -@cindex @code{strftime()} function (@command{gawk}) +@cindexgawkfunc{strftime} +@cindex format time string Format the time specified by @var{timestamp} based on the contents of the @var{format} string and return the result. It is similar to the function of the same name in ISO C. @@ -16309,11 +16555,12 @@ The default string value is @code{@w{"%a %b %e %H:%M:%S %Z %Y"}}. This format string produces output that is equivalent to that of the @command{date} utility. You can assign a new value to @code{PROCINFO["strftime"]} to -change the default format. 
+change the default format; see below for the various format directives. @item systime() -@cindex @code{systime()} function (@command{gawk}) +@cindexgawkfunc{systime} @cindex timestamps +@cindex current system time Return the current time as the number of seconds since the system epoch. On POSIX systems, this is the number of seconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. @@ -16607,6 +16854,7 @@ gawk 'BEGIN @{ @node Bitwise Functions @subsection Bit-Manipulation Functions +@cindex bit-manipulation functions @c STARTOFRANGE bit @cindex bitwise, operations @c STARTOFRANGE and @@ -16618,8 +16866,8 @@ gawk 'BEGIN @{ @c STARTOFRANGE opbit @cindex operations, bitwise @quotation -@i{I can explain it for you, but I can't understand it for you.}@* -Anonymous +@i{I can explain it for you, but I can't understand it for you.} +@author Anonymous @end quotation Many languages provide the ability to perform @dfn{bitwise} operations @@ -16769,27 +17017,33 @@ bitwise operations just described. They are: @cindex @command{gawk}, bitwise operations in @table @code -@cindex @code{and()} function (@command{gawk}) +@cindexgawkfunc{and} +@cindex bitwise AND @item and(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) Return the bitwise AND of the arguments. There must be at least two. -@cindex @code{compl()} function (@command{gawk}) +@cindexgawkfunc{compl} +@cindex bitwise complement @item compl(@var{val}) Return the bitwise complement of @var{val}. -@cindex @code{lshift()} function (@command{gawk}) +@cindexgawkfunc{lshift} +@cindex left shift @item lshift(@var{val}, @var{count}) Return the value of @var{val}, shifted left by @var{count} bits. -@cindex @code{or()} function (@command{gawk}) +@cindexgawkfunc{or} +@cindex bitwise OR @item or(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) Return the bitwise OR of the arguments. There must be at least two. 
-@cindex @code{rshift()} function (@command{gawk}) +@cindexgawkfunc{rshift} +@cindex right shift @item rshift(@var{val}, @var{count}) Return the value of @var{val}, shifted right by @var{count} bits. -@cindex @code{xor()} function (@command{gawk}) +@cindexgawkfunc{xor} +@cindex bitwise XOR @item xor(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) Return the bitwise XOR of the arguments. There must be at least two. @end table @@ -16881,6 +17135,7 @@ $ @kbd{gawk -f testbits.awk} @cindex strings, converting @cindex numbers, converting @cindex converting, numbers to strings +@cindex number as string of bits The @code{bits2str()} function turns a binary number into a string. The number @code{1} represents a binary value where the rightmost bit is set to 1. Using this mask, @@ -16916,7 +17171,8 @@ that traverses every element of a true multidimensional array (@pxref{Arrays of Arrays}). @table @code -@cindex @code{isarray()} function (@command{gawk}) +@cindexgawkfunc{isarray} +@cindex scalar or array @item isarray(@var{x}) Return a true value if @var{x} is an array. Otherwise return false. @end table @@ -16924,7 +17180,7 @@ Return a true value if @var{x} is an array. Otherwise return false. @code{isarray()} is meant for use in two circumstances. The first is when traversing a multidimensional array: you can test if an element is itself an array or not. The second is inside the body of a user-defined function -(not discussed yet; @pxref{User-defined}), to test if a paramater is an +(not discussed yet; @pxref{User-defined}), to test if a parameter is an array or not. Note, however, that using @code{isarray()} at the global level to test @@ -16938,6 +17194,7 @@ will end up turning it into a scalar. 
@subsection String-Translation Functions @cindex @command{gawk}, string-translation functions @cindex functions, string-translation +@cindex string-translation functions @cindex internationalization @cindex @command{awk} programs, internationalizing @@ -16949,7 +17206,8 @@ for the full story. Optional parameters are enclosed in square brackets ([ ]): @table @code -@cindex @code{bindtextdomain()} function (@command{gawk}) +@cindexgawkfunc{bindtextdomain} +@cindex set directory of message catalogs @item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) Set the directory in which @command{gawk} will look for message translation files, in case they @@ -16962,14 +17220,15 @@ If @var{directory} is the null string (@code{""}), then @code{bindtextdomain()} returns the current binding for the given @var{domain}. -@cindex @code{dcgettext()} function (@command{gawk}) +@cindexgawkfunc{dcgettext} +@cindex translate string @item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) Return the translation of @var{string} in text domain @var{domain} for locale category @var{category}. The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. The default value for @var{category} is @code{"LC_MESSAGES"}. -@cindex @code{dcngettext()} function (@command{gawk}) +@cindexgawkfunc{dcngettext} @item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) Return the plural form used for @var{number} of the translation of @var{string1} and @var{string2} in text domain @@ -16986,7 +17245,7 @@ The default value for @var{category} is @code{"LC_MESSAGES"}. @section User-Defined Functions @c STARTOFRANGE udfunc -@cindex user-defined, functions +@cindex user-defined functions @c STARTOFRANGE funcud @cindex functions, user-defined Complicated @command{awk} programs can often be simplified by defining @@ -17045,7 +17304,7 @@ have a parameter with the same name as the function itself. 
In addition, according to the POSIX standard, function parameters cannot have the same name as one of the special built-in variables (@pxref{Built-in Variables}. Not all versions of @command{awk} -enforce this restriction. +enforce this restriction.) The @var{body-of-function} consists of @command{awk} statements. It is the most important part of the definition, because it says what the function @@ -17072,6 +17331,7 @@ conventional to place some extra space between the arguments and the local variables, in order to document how your function is supposed to be used. @cindex variables, shadowing +@cindex shadowing of variable values During execution of the function body, the arguments and local variable values hide, or @dfn{shadow}, any variables of the same names used in the rest of the program. The shadowed variables are not accessible in the @@ -17092,7 +17352,7 @@ function. When this happens, we say the function is @dfn{recursive}. The act of a function calling itself is called @dfn{recursion}. All the built-in functions return a value to their caller. -User-defined functions can do also, using the @code{return} statement, +User-defined functions can do so also, using the @code{return} statement, which is described in detail in @ref{Return Statement}. Many of the subsequent examples in this @value{SECTION} use the @code{return} statement. @@ -17130,6 +17390,7 @@ keyword @code{function} when defining a function. @node Function Example @subsection Function Definition Examples +@cindex function definition example Here is an example of a user-defined function, called @code{myprint()}, that takes a number and prints it in a specific format: @@ -17184,7 +17445,8 @@ Instead of having to repeat this loop everywhere that you need to clear out an array, your program can just call @code{delarray}. (This guarantees portability. The use of @samp{delete @var{array}} to delete -the contents of an entire array is a nonstandard extension.) 
+the contents of an entire array is a recent@footnote{Late in 2012.} +addition to the POSIX standard.) The following is an example of a recursive function. It takes a string as an input parameter and returns the string in backwards order. @@ -17240,7 +17502,10 @@ function ctime(ts, format) @subsection Calling User-Defined Functions @c STARTOFRANGE fudc -This section describes how to call a user-defined function. +@cindex functions, user-defined, calling +@dfn{Calling a function} means causing the function to run and do its job. +A function call is an expression and its value is the value returned by +the function. @menu * Calling A Function:: Don't use spaces. @@ -17251,11 +17516,6 @@ This section describes how to call a user-defined function. @node Calling A Function @subsubsection Writing A Function Call -@cindex functions, user-defined, calling -@dfn{Calling a function} means causing the function to run and do its job. -A function call is an expression and its value is the value returned by -the function. - A function call consists of the function name followed by the arguments in parentheses. @command{awk} expressions are what you write in the call for the arguments. Each time the call is executed, these @@ -17279,8 +17539,8 @@ an error. @node Variable Scope @subsubsection Controlling Variable Scope -@cindex local variables -@cindex variables, local +@cindex local variables, in a function +@cindex variables, local to a function There is no way to make a variable local to a @code{@{ @dots{} @}} block in @command{awk}, but you can make a variable local to a function. 
It is good practice to do so whenever a variable is needed only in that @@ -17725,7 +17985,7 @@ character: @example the_func = "sum" -result = @@the_func() # calls the `sum' function +result = @@the_func() # calls the sum() function @end example Here is a full program that processes the previously shown data, @@ -17846,8 +18106,9 @@ We can do something similar using @command{gawk}, like this: @ignore @c file eg/lib/quicksort.awk # -# Arnold Robbins, arnold@skeeve.com, Public Domain +# Arnold Robbins, arnold@@skeeve.com, Public Domain # January 2009 + @c endfile @end ignore @@ -17920,7 +18181,7 @@ or equal to), which yields data sorted in descending order. Next comes a sorting function. It is parameterized with the starting and ending field numbers and the comparison function. It builds an array with -the data and calls @code{quicksort} appropriately, and then formats the +the data and calls @code{quicksort()} appropriately, and then formats the results as a single string: @example @@ -18058,9 +18319,11 @@ it allows you to encapsulate algorithms and program tasks in a single place. It simplifies programming, making program development more manageable, and making programs more readable. -In their seminal 1976 book, @cite{Software Tools}@footnote{Sadly, over 35 +@cindex Kernighan, Brian +@cindex Plauger, P.J.@: +In their seminal 1976 book, @cite{Software Tools},@footnote{Sadly, over 35 years later, many of the lessons taught by this book have yet to be -learned by a vast number of practicing programmers.}, Brian Kernighan +learned by a vast number of practicing programmers.} Brian Kernighan and P.J.@: Plauger wrote: @quotation @@ -18187,7 +18450,7 @@ with the user's program. 
@cindex underscore (@code{_}), in names of private variables In addition, several of the library functions use a prefix that helps indicate what function or set of functions use the variables---for example, -@code{_pw_byname} in the user database routines +@code{_pw_byname()} in the user database routines (@pxref{Passwd Functions}). This convention is recommended, since it even further decreases the chance of inadvertent conflict among variable names. Note that this @@ -18206,7 +18469,7 @@ The leading capital letter indicates that it is global, while the fact that the variable name is not all capital letters indicates that the variable is not one of @command{awk}'s built-in variables, such as @code{FS}. -@cindex @code{--dump-variables} option +@cindex @option{--dump-variables} option, using for library functions It is also important that @emph{all} variables in library functions that do not need to save state are, in fact, declared local.@footnote{@command{gawk}'s @option{--dump-variables} command-line @@ -18259,6 +18522,7 @@ programming use. vice versa. * Join Function:: A function to join an array into a string. * Getlocaltime Function:: A function to get formatted times. +* Readfile Function:: A function to read an entire file at once. @end menu @node Strtonum Function @@ -18474,7 +18738,7 @@ An @code{END} rule is automatically added to the program calling @code{assert()}. Normally, if a program consists of just a @code{BEGIN} rule, the input files and/or standard input are not read. However, now that the program has an @code{END} rule, @command{awk} -attempts to read the input @value{DF}s or standard input +attempts to read the input data files or standard input (@pxref{Using BEGIN/END}), most likely causing the program to hang as it waits for input. @@ -18500,9 +18764,9 @@ with an @code{exit} statement. The way @code{printf} and @code{sprintf()} (@pxref{Printf}) perform rounding often depends upon the system's C @code{sprintf()} -subroutine. 
On many machines, @code{sprintf()} rounding is ``unbiased,'' -which means it doesn't always round a trailing @samp{.5} up, contrary -to naive expectations. In unbiased rounding, @samp{.5} rounds to even, +subroutine. On many machines, @code{sprintf()} rounding is @dfn{unbiased}, +which means it doesn't always round a trailing .5 up, contrary +to naive expectations. In unbiased rounding, .5 rounds to even, rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4. This means that if you are using a format that does rounding (e.g., @code{"%.0f"}), you should check what your system does. The following function does @@ -18551,7 +18815,7 @@ function round(x, ival, aval, fraction) @c don't include test harness in the file that gets installed # test harness -@{ print $0, round($0) @} +# @{ print $0, round($0) @} @end example @node Cliff Random Function @@ -18618,6 +18882,7 @@ reason to build them into the @command{awk} interpreter: @cindex @code{ord()} user-defined function @cindex @code{chr()} user-defined function +@cindex @code{_ord_init()} user-defined function @example @c file eg/lib/ord.awk # ord.awk --- do ord and chr @@ -18664,8 +18929,9 @@ function _ord_init( low, high, i, t) @cindex character sets (machine character encodings) @cindex ASCII @cindex EBCDIC +@cindex Unicode @cindex mark parity -Some explanation of the numbers used by @code{chr} is worthwhile. +Some explanation of the numbers used by @code{_ord_init()} is worthwhile. The most prominent character set in use today is ASCII.@footnote{This is changing; many systems use Unicode, a very large character set that includes ASCII as a subset. On systems with full Unicode support, @@ -18676,7 +18942,7 @@ Although an defines characters that use the values from 0 to 127.@footnote{ASCII has been extended in many countries to use the values from 128 to 255 for country-specific characters. 
If your system uses these extensions, -you can simplify @code{_ord_init} to loop from 0 to 255.} +you can simplify @code{_ord_init()} to loop from 0 to 255.} In the now distant past, at least one minicomputer manufacturer @c Pr1me, blech @@ -18883,17 +19149,92 @@ A more general design for the @code{getlocaltime()} function would have allowed the user to supply an optional timestamp value to use instead of the current time. +@node Readfile Function +@subsection Reading A Whole File At Once + +Often, it is convenient to have the entire contents of a file available +in memory as a single string. A straightforward but naive way to +do that might be as follows: + +@example +function readfile(file, tmp, contents) +@{ + if ((getline tmp < file) < 0) + return + + contents = tmp + while (getline tmp < file) > 0) + contents = contents RT tmp + + close(file) + return contents +@} +@end example + +This function reads from @code{file} one record at a time, building +up the full contents of the file in the local variable @code{contents}. +It works, but is not necessarily efficient. + +The following function, based on a suggestion by Denis Shirokov, +reads the entire contents of the named file in one shot: + +@cindex @code{readfile()} user-defined function +@example +@c file eg/lib/readfile.awk +# readfile.awk --- read an entire file at once +@c endfile +@ignore +@c file eg/lib/readfile.awk +# +# Original idea by Denis Shirokov, cosmogen@@gmail.com, April 2013 +# +@c endfile +@end ignore +@c file eg/lib/readfile.awk + +function readfile(file, tmp, save_rs) +@{ + save_rs = RS + RS = "^$" + getline tmp < file + close(file) + RS = save_rs + + return tmp +@} +@c endfile +@end example + +It works by setting @code{RS} to @samp{^$}, a regular expression that +will never match if the file has contents. @command{gawk} reads data from +the file into @code{tmp} attempting to match @code{RS}. 
The match fails +after each read, but fails quickly, such that @command{gawk} fills +@code{tmp} with the entire contents of the file. +(@xref{Records}, for information on @code{RT} and @code{RS}.) + +In the case that @code{file} is empty, the return value is the null +string. Thus calling code may use something like: + +@example +contents = readfile("/some/path") +if (length(contents) == 0) + # file was empty @dots{} +@end example + +This tests the result to see if it is empty or not. An equivalent +test would be @samp{contents == ""}. + @node Data File Management -@section @value{DDF} Management +@section Data File Management @c STARTOFRANGE dataf @cindex files, managing @c STARTOFRANGE libfdataf -@cindex libraries of @command{awk} functions, managing, @value{DF}s +@cindex libraries of @command{awk} functions, managing, data files @c STARTOFRANGE flibdataf -@cindex functions, library, managing @value{DF}s +@cindex functions, library, managing data files This @value{SECTION} presents functions that are useful for managing -command-line @value{DF}s. +command-line data files. @menu * Filetrans Function:: A function for handling data file transitions. @@ -18904,16 +19245,16 @@ command-line @value{DF}s. @end menu @node Filetrans Function -@subsection Noting @value{DDF} Boundaries +@subsection Noting Data File Boundaries -@cindex files, managing, @value{DF} boundaries +@cindex files, managing, data file boundaries @cindex files, initialization and cleanup The @code{BEGIN} and @code{END} rules are each executed exactly once at the beginning and end of your @command{awk} program, respectively (@pxref{BEGIN/END}). We (the @command{gawk} authors) once had a user who mistakenly thought that the -@code{BEGIN} rule is executed at the beginning of each @value{DF} and the -@code{END} rule is executed at the end of each @value{DF}. +@code{BEGIN} rule is executed at the beginning of each data file and the +@code{END} rule is executed at the end of each data file. 
When informed that this was not the case, the user requested that we add new special @@ -18924,7 +19265,7 @@ Adding these special patterns to @command{gawk} wasn't necessary; the job can be done cleanly in @command{awk} itself, as illustrated by the following library program. It arranges to call two user-supplied functions, @code{beginfile()} and -@code{endfile()}, at the beginning and end of each @value{DF}. +@code{endfile()}, at the beginning and end of each data file. Besides solving the problem in only nine(!) lines of code, it does so @emph{portably}; this works with any implementation of @command{awk}: @@ -18955,17 +19296,17 @@ This file must be loaded before the user's ``main'' program, so that the rule it supplies is executed first. This rule relies on @command{awk}'s @code{FILENAME} variable that -automatically changes for each new @value{DF}. The current @value{FN} is +automatically changes for each new data file. The current file name is saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does -not equal @code{_oldfilename}, then a new @value{DF} is being processed and +not equal @code{_oldfilename}, then a new data file is being processed and it is necessary to call @code{endfile()} for the old file. Because @code{endfile()} should only be called if a file has been processed, the program first checks to make sure that @code{_oldfilename} is not the null -string. The program then assigns the current @value{FN} to +string. The program then assigns the current file name to @code{_oldfilename} and calls @code{beginfile()} for the file. Because, like all @command{awk} variables, @code{_oldfilename} is initialized to the null string, this rule executes correctly even for the -first @value{DF}. +first data file. The program also supplies an @code{END} rule to do the final processing for the last file. 
Because this @code{END} rule comes before any @code{END} rules @@ -18974,7 +19315,7 @@ again the value of multiple @code{BEGIN} and @code{END} rules should be clear. @cindex @code{beginfile()} user-defined function @cindex @code{endfile()} user-defined function -If the same @value{DF} occurs twice in a row on the command line, then +If the same data file occurs twice in a row on the command line, then @code{endfile()} and @code{beginfile()} are not executed at the end of the first pass and at the beginning of the second pass. The following version solves the problem: @@ -19089,12 +19430,12 @@ The @code{rewind()} function also relies on the @code{nextfile} keyword (@pxref{Nextfile Statement}). @node File Checking -@subsection Checking for Readable @value{DDF}s +@subsection Checking for Readable Data Files -@cindex troubleshooting, readable @value{DF}s -@cindex readable @value{DF}s@comma{} checking +@cindex troubleshooting, readable data files +@cindex readable data files@comma{} checking @cindex files, skipping -Normally, if you give @command{awk} a @value{DF} that isn't readable, +Normally, if you give @command{awk} a data file that isn't readable, it stops with a fatal error. There are times when you might want to just ignore such files and keep going. You can do this by prepending the following program to your @command{awk} @@ -19143,15 +19484,15 @@ This is a by-product of @command{awk}'s implicit read-a-record-and-match-against-the-rules loop: when @command{awk} tries to read a record from an empty file, it immediately receives an end of file indication, closes the file, and proceeds on to the next -command-line @value{DF}, @emph{without} executing any user-level +command-line data file, @emph{without} executing any user-level @command{awk} program code. Using @command{gawk}'s @code{ARGIND} variable (@pxref{Built-in Variables}), it is possible to detect when an empty -@value{DF} has been skipped. Similar to the library file presented +data file has been skipped. 
Similar to the library file presented in @ref{Filetrans Function}, the following library file calls a function named @code{zerofile()} that the user must provide. The arguments passed are -the @value{FN} and the position in @code{ARGV} where it was found: +the file name and the position in @code{ARGV} where it was found: @cindex @code{zerofile.awk} program @example @@ -19239,15 +19580,15 @@ END @{ @end ignore @node Ignoring Assigns -@subsection Treating Assignments as @value{FFN}s +@subsection Treating Assignments as File Names @cindex assignments as filenames @cindex filenames, assignments as Occasionally, you might not want @command{awk} to process command-line variable assignments (@pxref{Assignment Options}). -In particular, if you have a @value{FN} that contain an @samp{=} character, -@command{awk} treats the @value{FN} as an assignment, and does not process it. +In particular, if you have a file name that contains an @samp{=} character, +@command{awk} treats the file name as an assignment, and does not process it. Some users have suggested an additional command-line option for @command{gawk} to disable command-line assignments. However, some simple programming with @@ -19291,7 +19632,7 @@ awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk * The function works by looping through the arguments. It prepends @samp{./} to any argument that matches the form -of a variable assignment, turning that argument into a @value{FN}. +of a variable assignment, turning that argument into a file name. The use of @code{No_command_assign} allows you to disable command-line assignments at invocation time, by giving the variable a true value. 
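The technique described in this hunk, rewriting assignment-like arguments so that @command{awk} opens them as files, can be tried in isolation. The following is only a minimal sketch and not the manual's @file{noassign.awk}: the function name @code{noassign_demo}, the file @file{x=1.txt}, and the temporary directory are hypothetical, and the rewrite is applied unconditionally rather than under control of @code{No_command_assign}.

```shell
# Hypothetical demo: a data file whose name looks like a variable assignment.
noassign_demo() {
    dir=$(mktemp -d) || return 1
    printf 'hello\n' > "$dir/x=1.txt"
    (
        cd "$dir" || exit 1
        # The BEGIN rule runs before awk examines its operands, so it can
        # turn "x=1.txt" into "./x=1.txt", which no longer has the form
        # of an assignment and is therefore opened as a data file.
        awk '
        BEGIN {
            for (i = 1; i < ARGC; i++)
                if (ARGV[i] ~ /^[a-zA-Z_][a-zA-Z0-9_]*=/)
                    ARGV[i] = "./" ARGV[i]
        }
        { print }
        ' x=1.txt < /dev/null
    )
    rm -rf "$dir"
}

noassign_demo    # prints: hello
```

Without the @code{BEGIN} rule, the same command prints nothing: @samp{x=1.txt} is consumed as an assignment, no file operands remain, and @command{awk} reads the (empty) standard input instead.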
@@ -19458,7 +19799,7 @@ The discussion that follows walks through the code a bit at a time: # <c> a character representing the current option # Private Data: -# _opti -- index in multi-flag option, e.g., -abc +# _opti -- index in multiflag option, e.g., -abc @c endfile @end example @@ -19650,7 +19991,7 @@ After @code{getopt()} is through, it is the responsibility of the user level code to clear out all the elements of @code{ARGV} from 1 to @code{Optind}, so that @command{awk} does not try to process the command-line options -as @value{FN}s. +as file names. @end quotation Several of the sample programs presented in @@ -19667,7 +20008,7 @@ use @code{getopt()} to process their arguments. @c STARTOFRANGE libfudata @cindex libraries of @command{awk} functions, user database, reading @c STARTOFRANGE flibudata -@cindex functions, library, user database, reading +@cindex functions, library, user database@comma{} reading @c STARTOFRANGE udatar @cindex user database@comma{} reading @c STARTOFRANGE dataur @@ -19916,7 +20257,7 @@ from anywhere within a user's program, and the user may have his or her own way of splitting records and fields. -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, testing the field splitting The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which is @code{"FIELDWIDTHS"} if field splitting is being done with @code{FIELDWIDTHS}. This makes it possible to restore the correct @@ -19925,7 +20266,7 @@ field-splitting mechanism later. The test can only be true for or on some other @command{awk} implementation. The code that checks for using @code{FPAT}, using @code{using_fpat} -and @code{PROCINFO["FS"]} is similar. +and @code{PROCINFO["FS"]}, is similar. The main part of the function uses a loop to read database lines, split the line into fields, and then store the line into each array as necessary. 
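The read-split-store loop mentioned at the end of this hunk can be illustrated with a much smaller program. This is a sketch under stated assumptions, not the manual's initialization function: the name @code{passwd_lookup_demo}, the arrays @code{byname} and @code{byuid}, and the two sample records are made up for illustration, and the records come from @code{printf} rather than from the real user database.

```shell
# Hypothetical sketch: index password-style records by login name and by UID.
passwd_lookup_demo() {
    printf '%s\n' \
        'root:x:0:0:Root:/root:/bin/sh' \
        'arnold:x:500:500:Arnold:/home/arnold:/bin/sh' |
    awk -F: '
    {
        byname[$1] = $0     # whole record, keyed by login name
        byuid[$3]  = $0     # whole record, keyed by numeric user ID
    }
    END {
        # A lookup returns the stored line, or a marker if the key is absent.
        print byuid[500]
        answer = ("nobody" in byname) ? byname["nobody"] : "(not found)"
        print answer
    }'
}

passwd_lookup_demo
```

Using @code{in} for the second lookup avoids creating an empty @code{byname["nobody"]} element as a side effect of merely referencing it.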
@@ -19955,10 +20296,9 @@ function getpwnam(name) @end example @cindex @code{getpwuid()} function (C library) -Similarly, -the @code{getpwuid} function takes a user ID number argument. If that -user number is in the database, it returns the appropriate line. Otherwise, it -returns the null string: +Similarly, the @code{getpwuid()} function takes a user ID number +argument. If that user number is in the database, it returns the +appropriate line. Otherwise, it returns the null string: @cindex @code{getpwuid()} user-defined function @example @@ -20035,12 +20375,12 @@ uses these functions. @c STARTOFRANGE libfgdata @cindex libraries of @command{awk} functions, group database, reading @c STARTOFRANGE flibgdata -@cindex functions, library, group database, reading +@cindex functions, library, group database@comma{} reading @c STARTOFRANGE gdatar @cindex group database, reading @c STARTOFRANGE datagr @cindex database, group, reading -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, and group membership @cindex @code{getgrent()} function (C library) @cindex @code{getgrent()} user-defined function @cindex groups@comma{} information about @@ -20462,7 +20802,7 @@ index and value, use the indirect function call syntax and the value. When calling @code{walk_array()}, you would pass the name of a user-defined -function that expects to receive and index and a value, and then processes +function that expects to receive an index and a value, and then processes the element. @@ -20524,7 +20864,7 @@ awk -f @var{program} -- @var{options} @var{files} @noindent Here, @var{program} is the name of the @command{awk} program (such as @file{cut.awk}), @var{options} are any command-line options for the -program that start with a @samp{-}, and @var{files} are the actual @value{DF}s. +program that start with a @samp{-}, and @var{files} are the actual data files. 
If your system supports the @samp{#!} executable interpreter mechanism (@pxref{Executable Scripts}), @@ -20729,7 +21069,7 @@ spaces. Also remember that after @code{getopt()} is through we have to clear out all the elements of @code{ARGV} from 1 to @code{Optind}, so that @command{awk} does not try to process the command-line options -as @value{FN}s. +as file names. After dealing with the command-line options, the program verifies that the options make sense. Only one or the other of @option{-c} and @option{-f} @@ -20816,7 +21156,7 @@ complete field list, including filler fields: @example @c file eg/prog/cut.awk -function set_charlist( field, i, j, f, g, t, +function set_charlist( field, i, j, f, g, n, m, t, filler, last, len) @{ field = 1 # count total fields @@ -20913,6 +21253,7 @@ of picking the input line apart by characters. @cindex searching, files for regular expressions @c STARTOFRANGE fsregexp @cindex files, searching for regular expressions +@c STARTOFRANGE egrep @cindex @command{egrep} utility The @command{egrep} utility searches files for patterns. It uses regular expressions that are almost identical to those available in @command{awk} @@ -20925,8 +21266,8 @@ egrep @r{[} @var{options} @r{]} '@var{pattern}' @var{files} @dots{} The @var{pattern} is a regular expression. In typical usage, the regular expression is quoted to prevent the shell from expanding any of the -special characters as @value{FN} wildcards. Normally, @command{egrep} -prints the lines that matched. If multiple @value{FN}s are provided on +special characters as file name wildcards. Normally, @command{egrep} +prints the lines that matched. If multiple file names are provided on the command line, each output line is preceded by the name of the file and a colon. @@ -21017,7 +21358,7 @@ pattern is supplied with @option{-e}, the first nonoption on the command line is used. 
The @command{awk} command-line arguments up to @code{ARGV[Optind]} are cleared, so that @command{awk} won't try to process them as files. If no files are specified, the standard input is used, and if multiple files are -specified, we make sure to note this so that the @value{FN}s can precede the +specified, we make sure to note this so that the file names can precede the matched lines in the output: @example @@ -21115,9 +21456,9 @@ A number of additional tests are made, but they are only done if we are not counting lines. First, if the user only wants exit status (@code{no_print} is true), then it is enough to know that @emph{one} line in this file matched, and we can skip on to the next file with -@code{nextfile}. Similarly, if we are only printing @value{FN}s, we can -print the @value{FN}, and then skip to the next file with @code{nextfile}. -Finally, each line is printed, with a leading @value{FN} and colon +@code{nextfile}. Similarly, if we are only printing file names, we can +print the file name, and then skip to the next file with @code{nextfile}. +Finally, each line is printed, with a leading file name and colon if necessary: @cindex @code{!} (exclamation point), @code{!} operator @@ -21198,12 +21539,14 @@ or not. @c ENDOFRANGE regexps @c ENDOFRANGE sfregexp @c ENDOFRANGE fsregexp +@c ENDOFRANGE egrep @node Id Program @subsection Printing out User Information @cindex printing, user information @cindex users, information about, printing +@c STARTOFRANGE id @cindex @command{id} utility The @command{id} utility lists a user's real and effective user ID numbers, real and effective group ID numbers, and the user's group set, if any. @@ -21216,7 +21559,7 @@ $ @kbd{id} @print{} uid=500(arnold) gid=500(arnold) groups=6(disk),7(lp),19(floppy) @end example -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, and user and group ID numbers This information is part of what is provided by @command{gawk}'s @code{PROCINFO} array (@pxref{Built-in Variables}). 
However, the @command{id} utility provides a more palatable output than just @@ -21317,7 +21660,6 @@ BEGIN \ @c endfile @end example -@cindex @code{in} operator The test in the @code{for} loop is worth noting. Any supplementary groups in the @code{PROCINFO} array have the indices @code{"group1"} through @code{"group@var{N}"} for some @@ -21327,7 +21669,7 @@ there are. This loop works by starting at one, concatenating the value with @code{"group"}, and then using @code{in} to see if that value is -in the array. Eventually, @code{i} is incremented past +in the array (@pxref{Reference to Elements}). Eventually, @code{i} is incremented past the last group in the array and the loop exits. The loop is also correct if there are @emph{no} supplementary @@ -21340,6 +21682,7 @@ The POSIX version of @command{id} takes arguments that control which information is printed. Modify this version to accept the same arguments and perform in the same way. @end ignore +@c ENDOFRANGE id @node Split Program @subsection Splitting a Large File into Pieces @@ -21348,6 +21691,7 @@ arguments and perform in the same way. @c STARTOFRANGE filspl @cindex files, splitting +@c STARTOFRANGE split @cindex @code{split} utility The @command{split} program splits large text files into smaller pieces. Usage is as follows:@footnote{This is the traditional usage. The @@ -21365,7 +21709,7 @@ number of lines in each file, supply a number on the command line preceded with a minus; e.g., @samp{-500} for files with 500 lines in them instead of 1000. To change the name of the output files to something like @file{myfileaa}, @file{myfileab}, and so on, supply an additional -argument that specifies the @value{FN} prefix. +argument that specifies the file name prefix. Here is a version of @command{split} in @command{awk}. It uses the @code{ord()} and @code{chr()} functions presented in @@ -21375,8 +21719,8 @@ The program first sets its defaults, and then tests to make sure there are not too many arguments. 
It then looks at each argument in turn. The first argument could be a minus sign followed by a number. If it is, this happens to look like a negative number, so it is made positive, and that is the -count of lines. The data @value{FN} is skipped over and the final argument -is used as the prefix for the output @value{FN}s: +count of lines. The data file name is skipped over and the final argument +is used as the prefix for the output file names: @cindex @code{split.awk} program @example @@ -21425,7 +21769,7 @@ BEGIN @{ The next rule does most of the work. @code{tcount} (temporary count) tracks how many lines have been printed to the output file so far. If it is greater than @code{count}, it is time to close the current file and start a new one. -@code{s1} and @code{s2} track the current suffixes for the @value{FN}. If +@code{s1} and @code{s2} track the current suffixes for the file name. If they are both @samp{z}, the file is just too big. Otherwise, @code{s1} moves to the next letter in the alphabet and @code{s2} starts over again at @samp{a}: @@ -21491,12 +21835,14 @@ which isn't true for EBCDIC systems. @c Exercise: Fix these problems. @c BFD... @c ENDOFRANGE filspl +@c ENDOFRANGE split @node Tee Program @subsection Duplicating Output into Multiple Files @cindex files, multiple@comma{} duplicating output into @cindex output, duplicating into files +@c STARTOFRANGE tee @cindex @code{tee} utility The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies its standard input to its standard output and also duplicates it to the @@ -21513,13 +21859,13 @@ The @code{BEGIN} rule first makes a copy of all the command-line arguments into an array named @code{copy}. @code{ARGV[0]} is not copied, since it is not needed. @code{tee} cannot use @code{ARGV} directly, since @command{awk} attempts to -process each @value{FN} in @code{ARGV} as input data. +process each file name in @code{ARGV} as input data. 
@cindex flag variables If the first argument is @option{-a}, then the flag variable @code{append} is set to true, and both @code{ARGV[1]} and @code{copy[1]} are deleted. If @code{ARGC} is less than two, then no -@value{FN}s were supplied and @code{tee} prints a usage message and exits. +file names were supplied and @code{tee} prints a usage message and exits. Finally, @command{awk} is forced to read the standard input by setting @code{ARGV[1]} to @code{"-"} and @code{ARGC} to two: @@ -21611,6 +21957,7 @@ END \ @} @c endfile @end example +@c ENDOFRANGE tee @node Uniq Program @subsection Printing Nonduplicated Lines of Text @@ -21621,6 +21968,7 @@ END \ @cindex printing, unduplicated lines of text @c STARTOFRANGE tpul @cindex text@comma{} printing, unduplicated lines of +@c STARTOFRANGE uniq @cindex @command{uniq} utility The @command{uniq} utility reads sorted lines of data on its standard input, and by default removes duplicate lines. In other words, it only @@ -21872,6 +22220,7 @@ END @{ @end example @c ENDOFRANGE prunt @c ENDOFRANGE tpul +@c ENDOFRANGE uniq @node Wc Program @subsection Counting Things @@ -21888,6 +22237,7 @@ END @{ @cindex characters, counting @c STARTOFRANGE lico @cindex lines, counting +@c STARTOFRANGE wc @cindex @command{wc} utility The @command{wc} (word count) utility counts lines, words, and characters in one or more input files. Its usage is as follows: @@ -21981,7 +22331,7 @@ BEGIN @{ @end example The @code{beginfile()} function is simple; it just resets the counts of lines, -words, and characters to zero, and saves the current @value{FN} in +words, and characters to zero, and saves the current file name in @code{fname}: @example @@ -22003,7 +22353,7 @@ you will see that @code{FNR} has already been reset by the time @code{endfile()} is called.} It then prints out those numbers for the file that was just read. 
It relies on @code{beginfile()} to reset the -numbers for the following @value{DF}: +numbers for the following data file: @c FIXME: ONE DAY: make the above footnote an exercise, @c instead of giving away the answer. @@ -22070,6 +22420,7 @@ END @{ @c ENDOFRANGE lico @c ENDOFRANGE woco @c ENDOFRANGE chco +@c ENDOFRANGE wc @c ENDOFRANGE posimawk @node Miscellaneous Programs @@ -22171,8 +22522,34 @@ word, comparing it to the previous one: @cindex insomnia, cure for @cindex Robbins, Arnold @quotation -@i{Nothing cures insomnia like a ringing alarm clock.}@* -Arnold Robbins +@i{Nothing cures insomnia like a ringing alarm clock.} +@author Arnold Robbins +@end quotation +@cindex Quanstrom, Erik +@ignore +Date: Sat, 15 Feb 2014 16:47:09 -0500 +Subject: Re: 9atom install question +Message-ID: <l2jcvx6j6mey60xnrkb0hhob.1392500829294@email.android.com> +From: Erik Quanstrom <quanstro@quanstro.net> +To: Aharon Robbins <arnold@skeeve.com> + +yes. + +- erik + +Aharon Robbins <arnold@skeeve.com> wrote: + +>> sleep is for web developers. +> +>Can I quote you, in the gawk manual? +> +>Thanks, +> +>Arnold +@end ignore +@quotation +@i{Sleep is for web developers.} +@author Erik Quanstrom @end quotation @c STARTOFRANGE tialarm @@ -22338,6 +22715,7 @@ seconds are necessary: @c STARTOFRANGE chtra @cindex characters, transliterating +@c STARTOFRANGE tr @cindex @command{tr} utility The system @command{tr} utility transliterates characters. For example, it is often used to map uppercase letters into lowercase for further processing: @@ -22348,12 +22726,10 @@ often used to map uppercase letters into lowercase for further processing: @command{tr} requires two lists of characters.@footnote{On some older systems, -@ifset ORA including Solaris, -@end ifset @command{tr} may require that the lists be written as range expressions enclosed in square brackets (@samp{[a-z]}) and quoted, -to prevent the shell from attempting a @value{FN} expansion. 
This is +to prevent the shell from attempting a file name expansion. This is not a feature.} When processing the input, the first character in the first list is replaced with the first character in the second list, the second character in the first list is replaced with the second @@ -22488,6 +22864,7 @@ An obvious improvement to this program would be to set up the assumes that the ``from'' and ``to'' lists will never change throughout the lifetime of the program. @c ENDOFRANGE chtra +@c ENDOFRANGE tr @node Labels Program @subsection Printing Mailing Labels @@ -22547,6 +22924,7 @@ that there are two blank lines at the top and two blank lines at the bottom. The @code{END} rule arranges to flush the final page of labels; there may not have been an even multiple of 20 labels in the data: +@c STARTOFRANGE labels @cindex @code{labels.awk} program @example @c file eg/prog/labels.awk @@ -22614,6 +22992,7 @@ END \ @end example @c ENDOFRANGE prml @c ENDOFRANGE mlprint +@c ENDOFRANGE labels @node Word Sorting @subsection Generating Word-Usage Counts @@ -22680,6 +23059,7 @@ to remove punctuation characters. Finally, we solve the third problem by using the system @command{sort} utility to process the output of the @command{awk} script. Here is the new version of the program: +@c STARTOFRANGE wordfreq @cindex @code{wordfreq.awk} program @example @c file eg/prog/wordfreq.awk @@ -22741,6 +23121,7 @@ have true pipes at the command-line (or batch-file) level. See the general operating system documentation for more information on how to use the @command{sort} program. @c ENDOFRANGE worus +@c ENDOFRANGE wordfreq @node History Sorting @subsection Removing Duplicates from Unsorted Text @@ -22751,7 +23132,7 @@ The @command{uniq} program (@pxref{Uniq Program}), removes duplicate lines from @emph{sorted} data. 
-Suppose, however, you need to remove duplicate lines from a @value{DF} but +Suppose, however, you need to remove duplicate lines from a data file but that you want to preserve the order the lines are in. A good example of this might be a shell history file. The history file keeps a copy of all the commands you have entered, and it is not unusual to repeat a command @@ -22770,6 +23151,7 @@ Each element of @code{lines} is a unique command, and the indices of The @code{END} rule simply prints out the lines, in order: @cindex Rakitzis, Byron +@c STARTOFRANGE histsort @cindex @code{histsort.awk} program @example @c file eg/prog/histsort.awk @@ -22812,6 +23194,7 @@ print data[lines[i]], lines[i] This works because @code{data[$0]} is incremented each time a line is seen. @c ENDOFRANGE lidu +@c ENDOFRANGE histsort @node Extract Program @subsection Extracting Programs from Texinfo Source Files @@ -22843,7 +23226,8 @@ printed and online documentation. @ifnotinfo Texinfo is fully documented in the book @cite{Texinfo---The GNU Documentation Format}, -available from the Free Software Foundation. +available from the Free Software Foundation, +and also available @uref{http://www.gnu.org/software/texinfo/manual/texinfo/, online}. @end ifnotinfo @ifinfo The Texinfo language is described fully, starting with @@ -22887,7 +23271,7 @@ Lines containing @samp{@@group} and @samp{@@end group} are simply removed. (@pxref{Join Function}). The example programs in the online Texinfo source for @cite{@value{TITLE}} -(@file{gawk.texi}) have all been bracketed inside @samp{file} and +(@file{gawktexi.in}) have all been bracketed inside @samp{file} and @samp{endfile} lines. The @command{gawk} distribution uses a copy of @file{extract.awk} to extract the sample programs and install many of them in a standard directory where @command{gawk} can find them. 
@@ -22921,6 +23305,7 @@ The first rule handles calling @code{system()}, checking that a command is given (@code{NF} is at least three) and also checking that the command exits with a zero exit status, signifying OK: +@c STARTOFRANGE extract @cindex @code{extract.awk} program @example @c file eg/prog/extract.awk @@ -22970,7 +23355,7 @@ screen. @end ifnottex The second rule handles moving data into files. It verifies that a -@value{FN} is given in the directive. If the file named is not the +file name is given in the directive. If the file named is not the current file, then the current file is closed. Keeping the current file open until a new file is encountered allows the use of the @samp{>} redirection for printing the contents, keeping open file management @@ -23052,7 +23437,7 @@ subsequent output is appended to the file (@pxref{Redirection}). This makes it easy to mix program text and explanatory prose for the same sample source file (as has been done here!) without any hassle. The file is -only closed when a new data @value{FN} is encountered or at the end of the +only closed when a new data file name is encountered or at the end of the input file. Finally, the function @code{@w{unexpected_eof()}} prints an appropriate @@ -23079,6 +23464,7 @@ END @{ @end example @c ENDOFRANGE texse @c ENDOFRANGE fitex +@c ENDOFRANGE extract @node Simple Sed @subsection A Simple Stream Editor @@ -23104,10 +23490,11 @@ Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp The following program, @file{awksed.awk}, accepts at least two command-line arguments: the pattern to look for and the text to replace it with. Any -additional arguments are treated as data @value{FN}s to process. If none +additional arguments are treated as data file names to process. 
If none are provided, the standard input is used: @cindex Brennan, Michael +@c STARTOFRANGE awksed @cindex @command{awksed.awk} program @c @cindex simple stream editor @c @cindex stream editor, simple @@ -23177,7 +23564,7 @@ The @code{BEGIN} rule handles the setup, checking for the right number of arguments and calling @code{usage()} if there is a problem. Then it sets @code{RS} and @code{ORS} from the command-line arguments and sets @code{ARGV[1]} and @code{ARGV[2]} to the null string, so that they are -not treated as @value{FN}s +not treated as file names (@pxref{ARGC and ARGV}). The @code{usage()} function prints an error message and exits. @@ -23204,6 +23591,7 @@ Exercise: what are the advantages and disadvantages of this version versus sed? Others? @end ignore +@c ENDOFRANGE awksed @node Igawk Program @subsection An Easy Way to Use Library Functions @@ -23275,7 +23663,7 @@ Literal text, provided with @option{--source} or @option{--source=}. This text is just appended directly. @item -Source @value{FN}s, provided with @option{-f}. We use a neat trick and append +Source file names, provided with @option{-f}. We use a neat trick and append @samp{@@include @var{filename}} to the shell variable's contents. Since the file-inclusion program works the way @command{gawk} does, this gets the text of the file included into the program at the correct point. @@ -23288,7 +23676,7 @@ shell variable. @item Run the expanded program with @command{gawk} and any other original command-line -arguments that the user supplied (such as the data @value{FN}s). +arguments that the user supplied (such as the data file names). @end enumerate This program uses shell variables extensively: for storing command-line arguments, @@ -23319,7 +23707,7 @@ programming trick. Don't worry about it if you are not familiar with These are saved and passed on to @command{gawk}. 
@item -f@r{,} --file@r{,} --file=@r{,} -Wfile= -The @value{FN} is appended to the shell variable @code{program} with an +The file name is appended to the shell variable @code{program} with an @samp{@@include} statement. The @command{expr} utility is used to remove the leading option part of the argument (e.g., @samp{--file=}). @@ -23347,6 +23735,7 @@ program. The program is as follows: +@c STARTOFRANGE igawk @cindex @code{igawk.sh} program @example @c file eg/prog/igawk.sh @@ -23443,10 +23832,10 @@ is stored in the shell variable @code{expand_prog}. Doing this keeps the shell script readable. The @command{awk} program reads through the user's program, one line at a time, using @code{getline} (@pxref{Getline}). The input -@value{FN}s and @samp{@@include} statements are managed using a stack. -As each @samp{@@include} is encountered, the current @value{FN} is +file names and @samp{@@include} statements are managed using a stack. +As each @samp{@@include} is encountered, the current file name is ``pushed'' onto the stack and the file named in the @samp{@@include} -directive becomes the current @value{FN}. As each file is finished, +directive becomes the current file name. As each file is finished, the stack is ``popped,'' and the previous input file becomes the current input file again. The process is started by making the original file the first one on the stack. @@ -23455,16 +23844,16 @@ The @code{pathto()} function does the work of finding the full path to a file. It simulates @command{gawk}'s behavior when searching the @env{AWKPATH} environment variable (@pxref{AWKPATH Variable}). -If a @value{FN} has a @samp{/} in it, no path search is done. -Similarly, if the @value{FN} is @code{"-"}, then that string is +If a file name has a @samp{/} in it, no path search is done. +Similarly, if the file name is @code{"-"}, then that string is used as-is. 
Otherwise, -the @value{FN} is concatenated with the name of each directory in -the path, and an attempt is made to open the generated @value{FN}. +the file name is concatenated with the name of each directory in +the path, and an attempt is made to open the generated file name. The only way to test if a file can be read in @command{awk} is to go ahead and try to read it with @code{getline}; this is what @code{pathto()} does.@footnote{On some very old versions of @command{awk}, the test @samp{getline junk < t} can loop forever if the file exists but is empty. -Caveat emptor.} If the file can be read, it is closed and the @value{FN} +Caveat emptor.} If the file can be read, it is closed and the file name is returned: @ignore @@ -23519,17 +23908,17 @@ BEGIN @{ @c endfile @end example -The stack is initialized with @code{ARGV[1]}, which will be @file{/dev/stdin}. +The stack is initialized with @code{ARGV[1]}, which will be @samp{/dev/stdin}. The main loop comes next. Input lines are read in succession. Lines that do not start with @samp{@@include} are printed verbatim. -If the line does start with @samp{@@include}, the @value{FN} is in @code{$2}. +If the line does start with @samp{@@include}, the file name is in @code{$2}. @code{pathto()} is called to generate the full path. If it cannot, then the program prints an error message and continues. The next thing to check is if the file is included already. The -@code{processed} array is indexed by the full @value{FN} of each included +@code{processed} array is indexed by the full file name of each included file and it tracks this information for us. If the file is -seen again, a warning message is printed. Otherwise, the new @value{FN} is +seen again, a warning message is printed. Otherwise, the new file name is pushed onto the stack and processing continues. Finally, when @code{getline} encounters the end of the input file, the file @@ -23607,10 +23996,10 @@ options and command-line arguments that the user supplied. 
@c this causes more problems than it solves, so leave it out. @ignore -The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk} +The special file @file{/dev/null} is passed as a data file to @command{gawk} to handle an interesting case. Suppose that the user's program only has -a @code{BEGIN} rule and there are no @value{DF}s to read. -The program should exit without reading any @value{DF}s. +a @code{BEGIN} rule and there are no data files to read. +The program should exit without reading any data files. However, suppose that an included library file defines an @code{END} rule of its own. In this case, @command{gawk} will hang, reading standard input. In order to avoid this, @file{/dev/null} is explicitly added to the @@ -23706,10 +24095,12 @@ statements for the desired library functions. @c ENDOFRANGE libfex @c ENDOFRANGE flibex @c ENDOFRANGE awkpex +@c ENDOFRANGE igawk @node Anagram Program @subsection Finding Anagrams From A Dictionary +@cindex anagrams, finding An interesting programming challenge is to search for @dfn{anagrams} in a word list (such as @@ -23729,6 +24120,7 @@ The following program uses arrays of arrays to bring together words with the same signature and array sorting to print the words in sorted order. +@c STARTOFRANGE anagram @cindex @code{anagram.awk} program @example @c file eg/prog/anagram.awk @@ -23836,10 +24228,13 @@ babels beslab babery yabber @dots{} @end example +@c ENDOFRANGE anagram @node Signature Program @subsection And Now For Something Completely Different +@cindex signature program +@cindex Brini, Davide The following program was written by Davide Brini @c (@email{dave_br@@gmx.com}) and is published on @uref{http://backreference.org/2011/02/03/obfuscated-awk/, @@ -23971,12 +24366,15 @@ It contains the following chapters: @item @ref{Dynamic Extensions}. 
+@end itemize @end ifdocbook @end ignore @node Advanced Features @chapter Advanced Features of @command{gawk} -@cindex advanced features, network connections, See Also networks, connections +@ifset WITH_NETWORK_CHAPTER +@cindex advanced features, network connections, See Also networks@comma{} connections +@end ifset @c STARTOFRANGE gawadv @cindex @command{gawk}, features, advanced @c STARTOFRANGE advgaw @@ -23991,8 +24389,8 @@ who knows where you live." @end ignore @quotation @i{Write documentation as if whoever reads it is -a violent psychopath who knows where you live.}@* -Steve English, as quoted by Peter Langston +a violent psychopath who knows where you live.} +@author Steve English, as quoted by Peter Langston @end quotation This @value{CHAPTER} discusses advanced features in @command{gawk}. @@ -24042,7 +24440,7 @@ discusses the ability to dynamically add new built-in functions to @node Nondecimal Data @section Allowing Nondecimal Input Data -@cindex @code{--non-decimal-data} option +@cindex @option{--non-decimal-data} option @cindex advanced features, nondecimal input data @cindex input, data@comma{} nondecimal @cindex constants, nondecimal @@ -24086,7 +24484,7 @@ using this facility could lead to surprising results, the default is to leave it disabled. If you want it, you must explicitly request it. @cindex programming conventions, @code{--non-decimal-data} option -@cindex @code{--non-decimal-data} option, @code{strtonum()} function and +@cindex @option{--non-decimal-data} option, @code{strtonum()} function and @cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and @quotation CAUTION @emph{Use of this option is not recommended.} @@ -24311,7 +24709,7 @@ ordered data: @example function cmp_randomize(i1, v1, i2, v2) @{ - # random order + # random order (caution: this may never terminate!) 
return (2 - 4 * rand()) @} @end example @@ -24326,7 +24724,7 @@ with otherwise equal values is to include the indices in the comparison rules. Note that doing this may make the loop traversal less efficient, so consider it only if necessary. The following comparison functions force a deterministic order, and are based on the fact that the -indices of two elements are never equal: +(string) indices of two elements are never equal: @example function cmp_numeric(i1, v1, i2, v2) @@ -24383,17 +24781,16 @@ sorted array traversal is not the default. @subsection Sorting Array Values and Indices with @command{gawk} @cindex arrays, sorting -@cindex @code{asort()} function (@command{gawk}) +@cindexgawkfunc{asort} @cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting +@cindexgawkfunc{asorti} +@cindex @code{asorti()} function (@command{gawk}), arrays@comma{} sorting @cindex sort function, arrays, sorting -In most @command{awk} implementations, sorting an array requires -writing a @code{sort()} function. -While this can be educational for exploring different sorting algorithms, -usually that's not the point of the program. -@command{gawk} provides the built-in @code{asort()} -and @code{asorti()} functions -(@pxref{String Functions}) -for sorting arrays. For example: +In most @command{awk} implementations, sorting an array requires writing +a @code{sort()} function. While this can be educational for exploring +different sorting algorithms, usually that's not the point of the program. +@command{gawk} provides the built-in @code{asort()} and @code{asorti()} +functions (@pxref{String Functions}) for sorting arrays. For example: @example @var{populate the array} data @@ -24406,7 +24803,7 @@ After the call to @code{asort()}, the array @code{data} is indexed from 1 to some number @var{n}, the total number of elements in @code{data}. (This count is @code{asort()}'s return value.) @code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on. 
-The comparison is based on the type of the elements +The default comparison is based on the type of the elements (@pxref{Typing and Comparison}). All numeric values come before all string values, which in turn come before all subarrays. @@ -24428,24 +24825,11 @@ In this case, @command{gawk} copies the @code{source} array into the @code{dest} array and then sorts @code{dest}, destroying its indices. However, the @code{source} array is not affected. -@code{asort()} accepts a third string argument to control comparison of -array elements. As with @code{PROCINFO["sorted_in"]}, this argument -may be one of the predefined names that @command{gawk} provides -(@pxref{Controlling Scanning}), or the name of a user-defined function -(@pxref{Controlling Array Traversal}). - -@quotation NOTE -In all cases, the sorted element values consist of the original -array's element values. The ability to control comparison merely -affects the way in which they are sorted. -@end quotation - Often, what's needed is to sort on the values of the @emph{indices} -instead of the values of the elements. -To do that, use the -@code{asorti()} function. The interface is identical to that of -@code{asort()}, except that the index values are used for sorting, and -become the values of the result array: +instead of the values of the elements. To do that, use the +@code{asorti()} function. The interface and behavior are identical to +that of @code{asort()}, except that the index values are used for sorting, +and become the values of the result array: @example @{ source[$0] = some_func($0) @} @@ -24462,29 +24846,40 @@ END @{ @} @end example -Similar to @code{asort()}, -in all cases, the sorted element values consist of the original -array's indices. The ability to control comparison merely -affects the way in which they are sorted. +So far, so good. Now it starts to get interesting. Both @code{asort()} +and @code{asorti()} accept a third string argument to control comparison +of array elements. 
In @ref{String Functions}, we ignored this third +argument; however, the time has now come to describe how this argument +affects these two functions. + +Basically, the third argument specifies how the array is to be sorted. +There are two possibilities. As with @code{PROCINFO["sorted_in"]}, +this argument may be one of the predefined names that @command{gawk} +provides (@pxref{Controlling Scanning}), or it may be the name of a +user-defined function (@pxref{Controlling Array Traversal}). -Sorting the array by replacing the indices provides maximal flexibility. -To traverse the elements in decreasing order, use a loop that goes from -@var{n} down to 1, either over the elements or over the indices.@footnote{You -may also use one of the predefined sorting names that sorts in -decreasing order.} +In the latter case, @emph{the function can compare elements in any way +it chooses}, taking into account just the indices, just the values, +or both. This is extremely powerful. + +Once the array is sorted, @code{asort()} takes the @emph{values} in +their final order, and uses them to fill in the result array, whereas +@code{asorti()} takes the @emph{indices} in their final order, and uses +them to fill in the result array. @cindex reference counting, sorting arrays +@quotation NOTE Copying array indices and elements isn't expensive in terms of memory. Internally, @command{gawk} maintains @dfn{reference counts} to data. For example, when @code{asort()} copies the first array to the second one, there is only one copy of the original array elements' data, even though both arrays use the values. +@end quotation @c Document It And Call It A Feature. Sigh. 
@cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex @code{IGNORECASE} variable -@cindex arrays, sorting, @code{IGNORECASE} variable and -@cindex @code{IGNORECASE} variable, array sorting and +@cindex arrays, sorting, and @code{IGNORECASE} variable +@cindex @code{IGNORECASE} variable, and array sorting functions Because @code{IGNORECASE} affects string comparisons, the value of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}. Note also that the locale's sorting order does @emph{not} @@ -24563,7 +24958,7 @@ open a @emph{two-way} pipe to another process. The second process is termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}. The two-way connection is created using the @samp{|&} operator (borrowed from the Korn shell, @command{ksh}):@footnote{This is very -different from the same operator in the C shell.} +different from the same operator in the C shell and in Bash.} @example do @{ @@ -24653,7 +25048,7 @@ As a side note, the assignment @samp{LC_ALL=C} in the @command{sort} command ensures traditional Unix (ASCII) sorting from @command{sort}. @cindex @command{gawk}, @code{PROCINFO} array in -@cindex @code{PROCINFO} array +@cindex @code{PROCINFO} array, and communications via ptys You may also use pseudo-ttys (ptys) for two-way communication instead of pipes, if your system supports them. This is done on a per-command basis, by setting a special element @@ -24704,10 +25099,10 @@ another process on another system across an IP network connection. You can think of this as just a @emph{very long} two-way pipeline to a coprocess. The way @command{gawk} decides that you want to use TCP/IP networking is -by recognizing special @value{FN}s that begin with one of @samp{/inet/}, +by recognizing special file names that begin with one of @samp{/inet/}, @samp{/inet4/} or @samp{/inet6}. 
-The full syntax of the special @value{FN} is +The full syntax of the special file name is @file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}. The components are: @@ -24796,7 +25191,7 @@ When @command{gawk} has finished running, it creates a profile of your program i named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than @command{gawk} normally does. -@cindex @code{--profile} option +@cindex @option{--profile} option As shown in the following example, the @option{--profile} option can be used to change the name of the file where @command{gawk} will write the profile: @@ -24851,52 +25246,60 @@ foo junk @end example -Here is the @file{awkprof.out} that results from running the @command{gawk} -profiler on this program and data (this example also illustrates that @command{awk} -programmers sometimes have to work late): +Here is the @file{awkprof.out} that results from running the +@command{gawk} profiler on this program and data. (This example also +illustrates that @command{awk} programmers sometimes get up very early +in the morning to work.) 
-@cindex @code{BEGIN} pattern -@cindex @code{END} pattern +@cindex @code{BEGIN} pattern, and profiling +@cindex @code{END} pattern, and profiling @example - # gawk profile, created Sun Aug 13 00:00:15 2000 + # gawk profile, created Thu Feb 27 05:16:21 2014 - # BEGIN block(s) + # BEGIN block(s) - BEGIN @{ - 1 print "First BEGIN rule" - 1 print "Second BEGIN rule" - @} + BEGIN @{ + 1 print "First BEGIN rule" + @} - # Rule(s) + BEGIN @{ + 1 print "Second BEGIN rule" + @} - 5 /foo/ @{ # 2 - 2 print "matched /foo/, gosh" - 6 for (i = 1; i <= 3; i++) @{ - 6 sing() - @} - @} + # Rule(s) - 5 @{ - 5 if (/foo/) @{ # 2 - 2 print "if is true" - 3 @} else @{ - 3 print "else is true" - @} - @} + 5 /foo/ @{ # 2 + 2 print "matched /foo/, gosh" + 6 for (i = 1; i <= 3; i++) @{ + 6 sing() + @} + @} - # END block(s) + 5 @{ + 5 if (/foo/) @{ # 2 + 2 print "if is true" + 3 @} else @{ + 3 print "else is true" + @} + @} - END @{ - 1 print "First END rule" - 1 print "Second END rule" - @} + # END block(s) + + END @{ + 1 print "First END rule" + @} + + END @{ + 1 print "Second END rule" + @} - # Functions, listed alphabetically - 6 function sing(dummy) - @{ - 6 print "I gotta be me!" - @} + # Functions, listed alphabetically + + 6 function sing(dummy) + @{ + 6 print "I gotta be me!" + @} @end example This example illustrates many of the basic features of profiling output. @@ -24904,15 +25307,16 @@ They are as follows: @itemize @bullet @item -The program is printed in the order @code{BEGIN} rule, -@code{BEGINFILE} rule, +The program is printed in the order @code{BEGIN} rules, +@code{BEGINFILE} rules, pattern/action rules, -@code{ENDFILE} rule, @code{END} rule and functions, listed +@code{ENDFILE} rules, @code{END} rules and functions, listed alphabetically. -Multiple @code{BEGIN} and @code{END} rules are merged together, -as are multiple @code{BEGINFILE} and @code{ENDFILE} rules. 
+Multiple @code{BEGIN} and @code{END} rules retain their +separate identities, as do +multiple @code{BEGINFILE} and @code{ENDFILE} rules. -@cindex patterns, counts +@cindex patterns, counts, in a profile @item Pattern-action rules have two counts. The first count, to the left of the rule, shows how many times @@ -24932,7 +25336,7 @@ is a count showing how many times the condition was true. The count for the @code{else} indicates how many times the test failed. -@cindex loops, count for header +@cindex loops, count for header, in a profile @item The count for a loop header (such as @code{for} or @code{while}) shows how many times the loop test was executed. @@ -24940,8 +25344,8 @@ or @code{while}) shows how many times the loop test was executed. statement in a rule to determine how many times the rule was executed. If the first statement is a loop, the count is misleading.) -@cindex functions, user-defined, counts -@cindex user-defined, functions, counts +@cindex functions, user-defined, counts, in a profile +@cindex user-defined, functions, counts, in a profile @item For user-defined functions, the count next to the @code{function} keyword indicates how many times the function was called. @@ -24955,8 +25359,8 @@ The layout uses ``K&R'' style with TABs. Braces are used everywhere, even when the body of an @code{if}, @code{else}, or loop is only a single statement. -@cindex @code{()} (parentheses) -@cindex parentheses @code{()} +@cindex @code{()} (parentheses), in a profile +@cindex parentheses @code{()}, in a profile @item Parentheses are used only where needed, as indicated by the structure of the program and the precedence rules. @@ -24991,8 +25395,8 @@ typed when you wrote it. This is because @command{gawk} creates the profiled version by ``pretty printing'' its internal representation of the program. The advantage to this is that @command{gawk} can produce a standard representation. 
The disadvantage is that all source-code -comments are lost, as are the distinctions among multiple @code{BEGIN}, -@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as: +comments are lost. +Also, things such as: @example /foo/ @@ -25012,6 +25416,7 @@ which is correct, but possibly surprising. @cindex profiling @command{awk} programs, dynamically @cindex @command{gawk} program, dynamic profiling +@cindex dynamic profiling Besides creating profiles when a program has completed, @command{gawk} can produce a profile while it is running. This is useful if your @command{awk} program goes into an @@ -25025,9 +25430,9 @@ $ @kbd{gawk --profile -f myprog &} @end example @cindex @command{kill} command@comma{} dynamic profiling -@cindex @code{USR1} signal -@cindex @code{SIGUSR1} signal -@cindex signals, @code{USR1}/@code{SIGUSR1} +@cindex @code{USR1} signal, for dynamic profiling +@cindex @code{SIGUSR1} signal, for dynamic profiling +@cindex signals, @code{USR1}/@code{SIGUSR1}, for profiling @noindent The shell prints a job number and process ID number; in this case, 13992. Use the @command{kill} command to send the @code{USR1} signal @@ -25058,9 +25463,9 @@ You may send @command{gawk} the @code{USR1} signal as many times as you like. Each time, the profile and function call trace are appended to the output profile file. -@cindex @code{HUP} signal -@cindex @code{SIGHUP} signal -@cindex signals, @code{HUP}/@code{SIGHUP} +@cindex @code{HUP} signal, for dynamic profiling +@cindex @code{SIGHUP} signal, for dynamic profiling +@cindex signals, @code{HUP}/@code{SIGHUP}, for profiling If you use the @code{HUP} signal instead of the @code{USR1} signal, @command{gawk} produces the profile and the function call trace and then exits. @@ -25076,12 +25481,17 @@ the case of the @code{INT} signal, @command{gawk} exits. 
This is because these systems don't support the @command{kill} command, so the only signals you can deliver to a program are those generated by the keyboard. The @code{INT} signal is generated by the -@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the -@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key. +@kbd{Ctrl-@key{C}} or @kbd{Ctrl-@key{BREAK}} key, while the +@code{QUIT} signal is generated by the @kbd{Ctrl-@key{\}} key. Finally, @command{gawk} also accepts another option, @option{--pretty-print}. When called this way, @command{gawk} ``pretty prints'' the program into @file{awkprof.out}, without any execution counts. + +@quotation NOTE +The @option{--pretty-print} option still runs your program. +This will change in the next major release. +@end quotation @c ENDOFRANGE advgaw @c ENDOFRANGE gawadv @c ENDOFRANGE awkp @@ -25193,6 +25603,7 @@ lookup of the translations. @cindex @code{.po} files @cindex files, @code{.po} +@c STARTOFRANGE portobfi @cindex portable object files @cindex files, portable object @item @@ -25204,6 +25615,7 @@ For example, there might be a @file{fr.po} for a French translation. @cindex @code{.gmo} files @cindex files, @code{.gmo} @cindex message object files +@c STARTOFRANGE portmsgfi @cindex files, message object @item Each language's @file{.po} file is converted into a binary @@ -25351,7 +25763,7 @@ String constants marked with a leading underscore are candidates for translation at runtime. String constants without a leading underscore are not translated. -@cindex @code{dcgettext()} function (@command{gawk}) +@cindexgawkfunc{dcgettext} @item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) Return the translation of @var{string} in text domain @var{domain} for locale category @var{category}. @@ -25377,7 +25789,7 @@ chosen to be simple and to allow for reasonable @command{awk}-style default arguments. 
@end quotation -@cindex @code{dcngettext()} function (@command{gawk}) +@cindexgawkfunc{dcngettext} @item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) Return the plural form used for @var{number} of the translation of @var{string1} and @var{string2} in text domain @@ -25393,7 +25805,7 @@ The same remarks about argument order as for the @code{dcgettext()} function app @cindex files, @code{.gmo}, specifying directory of @cindex message object files, specifying directory of @cindex files, message object, specifying directory of -@cindex @code{bindtextdomain()} function (@command{gawk}) +@cindexgawkfunc{bindtextdomain} @item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) Change the directory in which @code{gettext} looks for @file{.gmo} files, in case they @@ -25495,7 +25907,7 @@ and use translations from @command{awk}. @cindex portable object files @cindex files, portable object Once a program's translatable strings have been marked, they must -be extracted to create the initial @file{.po} file. +be extracted to create the initial @file{.pot} file. As part of translation, it is often helpful to rearrange the order in which arguments to @code{printf} are output. @@ -25515,13 +25927,13 @@ is covered. @subsection Extracting Marked Strings @cindex strings, extracting @cindex marked strings@comma{} extracting -@cindex @code{--gen-pot} option +@cindex @option{--gen-pot} option @cindex command-line options, string extraction @cindex string extraction (internationalization) @cindex marked string extraction (internationalization) @cindex extraction, of marked strings (internationalization) -@cindex @code{--gen-pot} option +@cindex @option{--gen-pot} option Once your @command{awk} program is working, and all the strings have been marked and you've set (and perhaps bound) the text domain, it is time to produce translations. 
@@ -25544,6 +25956,8 @@ second argument to @code{dcngettext()}.@footnote{The @xref{I18N Example}, for the full list of steps to go through to create and test translations for @command{guide}. +@c ENDOFRANGE portobfi +@c ENDOFRANGE portmsgfi @node Printf Ordering @subsection Rearranging @code{printf} Arguments @@ -25590,7 +26004,7 @@ example, @samp{string} is the first argument and @samp{length(string)} is the se @example $ @kbd{gawk 'BEGIN @{} > @kbd{string = "Dont Panic"} -> @kbd{printf _"%2$d characters live in \"%1$s\"\n",} +> @kbd{printf "%2$d characters live in \"%1$s\"\n",} > @kbd{string, length(string)} > @kbd{@}'} @print{} 10 characters live in "Dont Panic" @@ -25624,7 +26038,7 @@ This is somewhat counterintuitive. and those with positional specifiers in the same string: @example -$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'} +$ @kbd{gawk 'BEGIN @{ printf "%d %3$s\n", 1, 2, "hi" @}'} @error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none @end example @@ -25869,7 +26283,7 @@ complete detail in @cite{GNU gettext tools}.) @end ifnotinfo As of this writing, the latest version of GNU @code{gettext} is -@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.2.1.tar.gz, @value{PVERSION} 0.18.2.1}. +@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.2.1.tar.gz, version 0.18.2.1}. If a translation of @command{gawk}'s messages exists, then @command{gawk} produces usage messages, warnings, @@ -25965,6 +26379,7 @@ The following list defines terms used throughout the rest of this @value{CHAPTER}. @table @dfn +@cindex stack frame @item Stack Frame Programs generally call functions during the course of their execution. One function can call another, or a function can call itself (recursion). @@ -25986,6 +26401,7 @@ invoked. Commands that print the call stack print information about each stack frame (as detailed later on). 
@item Breakpoint +@cindex breakpoint During debugging, you often wish to let the program run until it reaches a certain point, and then continue execution from there one statement (or instruction) at a time. The way to do this is to set @@ -25995,6 +26411,7 @@ take over control of the program's execution. You can add and remove as many breakpoints as you like. @item Watchpoint +@cindex watchpoint A watchpoint is similar to a breakpoint. The difference is that breakpoints are oriented around the code: stop when a certain point in the code is reached. A watchpoint, however, specifies that program execution @@ -26026,6 +26443,7 @@ by the higher-level @command{awk} commands. @node Sample Debugging Session @section Sample Debugging Session +@cindex sample debugging session In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample debugging session. We will use the @command{awk} implementation of the @@ -26039,13 +26457,16 @@ as our example. @node Debugger Invocation @subsection How to Start the Debugger +@cindex starting the debugger +@cindex debugger, how to start -Starting the debugger is almost exactly like running @command{awk}, except you have to -pass an additional option @option{--debug} or the corresponding short option @option{-D}. -The file(s) containing the program and any supporting code are given on the command -line as arguments to one or more @option{-f} options. (@command{gawk} is not designed -to debug command-line programs, only programs contained in files.) In our case, -we invoke the debugger like this: +Starting the debugger is almost exactly like running @command{gawk}, +except you have to pass an additional option @option{--debug} or the +corresponding short option @option{-D}. The file(s) containing the +program and any supporting code are given on the command line as arguments +to one or more @option{-f} options. (@command{gawk} is not designed +to debug command-line programs, only programs contained in files.) 
+In our case, we invoke the debugger like this: @example $ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile} @@ -26178,7 +26599,7 @@ gawk> @kbd{p NR} @noindent So we can see that @code{are_equal()} was only called for the second record -of the file. Of course, this is because our program contained a rule for +of the file. Of course, this is because our program contains a rule for @samp{NR == 1}: @example @@ -26378,21 +26799,24 @@ controlling breakpoints are: @cindex debugger commands, @code{break} @cindex @code{break} debugger command @cindex @code{b} debugger command (alias for @code{break}) +@cindex set breakpoint +@cindex breakpoint, setting @item @code{break} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}] @itemx @code{b} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}] Without any argument, set a breakpoint at the next instruction to be executed in the selected stack frame. Arguments can be one of the following: +@c @asis for docbook @c nested table -@table @var -@item n +@table @asis +@item @var{n} Set a breakpoint at line number @var{n} in the current source file. -@item filename@code{:}n +@item @var{filename}@code{:}@var{n} Set a breakpoint at line number @var{n} in source file @var{filename}. -@item function +@item @var{function} Set a breakpoint at entry to (the first instruction of) function @var{function}. @end table @@ -26408,6 +26832,8 @@ it continues executing the program. @cindex debugger commands, @code{clear} @cindex @code{clear} debugger command +@cindex delete breakpoint at location +@cindex breakpoint at location, how to delete @item @code{clear} [[@var{filename}@code{:}]@var{n} | @var{function}] Without any argument, delete any breakpoint at the next instruction to be executed in the selected stack frame. If the program stops at @@ -26415,19 +26841,20 @@ a breakpoint, this deletes that breakpoint so that the program does not stop at that location again. 
Arguments can be one of the following: @c nested table -@table @var -@item n +@table @asis +@item @var{n} Delete breakpoint(s) set at line number @var{n} in the current source file. -@item filename@code{:}n +@item @var{filename}@code{:}@var{n} Delete breakpoint(s) set at line number @var{n} in source file @var{filename}. -@item function +@item @var{function} Delete breakpoint(s) set at entry to function @var{function}. @end table @cindex debugger commands, @code{condition} @cindex @code{condition} debugger command +@cindex breakpoint condition @item @code{condition} @var{n} @code{"@var{expression}"} Add a condition to existing breakpoint or watchpoint @var{n}. The condition is an @command{awk} expression that the debugger evaluates @@ -26441,6 +26868,8 @@ watchpoint is made unconditional. @cindex debugger commands, @code{delete} @cindex @code{delete} debugger command @cindex @code{d} debugger command (alias for @code{delete}) +@cindex delete breakpoint by number +@cindex breakpoint, delete by number @item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] Delete specified breakpoints or a range of breakpoints. Deletes @@ -26448,6 +26877,8 @@ all defined breakpoints if no argument is supplied. @cindex debugger commands, @code{disable} @cindex @code{disable} debugger command +@cindex disable breakpoint +@cindex breakpoint, how to disable or enable @item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}] Disable specified breakpoints or a range of breakpoints. Without any argument, disables all breakpoints. @@ -26456,6 +26887,7 @@ any argument, disables all breakpoints. 
@cindex debugger commands, @code{enable} @cindex @code{enable} debugger command @cindex @code{e} debugger command (alias for @code{enable}) +@cindex enable breakpoint @item @code{enable} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{e} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] Enable specified breakpoints or a range of breakpoints. Without @@ -26475,6 +26907,7 @@ the program stops at the breakpoint. @cindex debugger commands, @code{ignore} @cindex @code{ignore} debugger command +@cindex ignore breakpoint @item @code{ignore} @var{n} @var{count} Ignore breakpoint number @var{n} the next @var{count} times it is hit. @@ -26483,6 +26916,7 @@ hit. @cindex debugger commands, @code{tbreak} @cindex @code{tbreak} debugger command @cindex @code{t} debugger command (alias for @code{tbreak}) +@cindex temporary breakpoint @item @code{tbreak} [[@var{filename}@code{:}]@var{n} | @var{function}] @itemx @code{t} [[@var{filename}@code{:}]@var{n} | @var{function}] Set a temporary breakpoint (enabled for only one stop). @@ -26503,6 +26937,8 @@ execution of the program than we saw in our earlier example: @cindex @code{silent} debugger command @cindex debugger commands, @code{end} @cindex @code{end} debugger command +@cindex breakpoint commands +@cindex commands to execute at breakpoint @item @code{commands} [@var{n}] @itemx @code{silent} @itemx @dots{} @@ -26530,6 +26966,7 @@ gawk> @cindex debugger commands, @code{c} (@code{continue}) @cindex debugger commands, @code{continue} +@cindex continue program, in debugger @item @code{continue} [@var{count}] @itemx @code{c} [@var{count}] Resume program execution. If continued from a breakpoint and @var{count} is @@ -26546,6 +26983,7 @@ Print the returned value. 
@cindex debugger commands, @code{next} @cindex @code{next} debugger command @cindex @code{n} debugger command (alias for @code{next}) +@cindex single-step execution, in the debugger @item @code{next} [@var{count}] @itemx @code{n} [@var{count}] Continue execution to the next source line, stepping over function calls. @@ -26640,6 +27078,7 @@ items on the list. @cindex debugger commands, @code{eval} @cindex @code{eval} debugger command +@cindex evaluate expressions, in debugger @item @code{eval "@var{awk statements}"} Evaluate @var{awk statements} in the context of the running program. You can do anything that an @command{awk} program would do: assign @@ -26657,6 +27096,7 @@ parameters defined by the program. @cindex debugger commands, @code{print} @cindex @code{print} debugger command @cindex @code{p} debugger command (alias for @code{print}) +@cindex print variables, in debugger @item @code{print} @var{var1}[@code{,} @var{var2} @dots{}] @itemx @code{p} @var{var1}[@code{,} @var{var2} @dots{}] Print the value of a @command{gawk} variable or field. @@ -26690,6 +27130,7 @@ No newline is printed unless one is specified. @cindex debugger commands, @code{set} @cindex @code{set} debugger command +@cindex assign values to variables, in debugger @item @code{set} @var{var}@code{=}@var{value} Assign a constant (number or string) value to an @command{awk} variable or field. @@ -26702,6 +27143,7 @@ You can also set special @command{awk} variables, such as @code{FS}, @cindex debugger commands, @code{watch} @cindex @code{watch} debugger command @cindex @code{w} debugger command (alias for @code{watch}) +@cindex set watchpoint @item @code{watch} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] @itemx @code{w} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] Add variable @var{var} (or field @code{$@var{n}}) to the watch list. @@ -26718,12 +27160,14 @@ then the debugger stops execution and prompts for a command. 
Otherwise, @cindex debugger commands, @code{undisplay} @cindex @code{undisplay} debugger command +@cindex stop automatic display, in debugger @item @code{undisplay} [@var{n}] Remove item number @var{n} (or all items, if no argument) from the automatic display list. @cindex debugger commands, @code{unwatch} @cindex @code{unwatch} debugger command +@cindex delete watchpoint @item @code{unwatch} [@var{n}] Remove item number @var{n} (or all items, if no argument) from the watch list. @@ -26744,12 +27188,14 @@ functions which called the one you are in. The commands for doing this are: @cindex debugger commands, @code{backtrace} @cindex @code{backtrace} debugger command @cindex @code{bt} debugger command (alias for @code{backtrace}) +@cindex call stack, display in debugger +@cindex traceback, display in debugger @item @code{backtrace} [@var{count}] @itemx @code{bt} [@var{count}] Print a backtrace of all function calls (stack frames), or innermost @var{count} frames if @var{count} > 0. Print the outermost @var{count} frames if @var{count} < 0. The backtrace displays the name and arguments to each -function, the source @value{FN}, and the line number. +function, the source file name, and the line number. @cindex debugger commands, @code{down} @cindex @code{down} debugger command @@ -26797,25 +27243,32 @@ The value for @var{what} should be one of the following: @c nested table @table @code @item args +@cindex show function arguments, in debugger Arguments of the selected frame. @item break +@cindex show breakpoints List all currently set breakpoints. @item display +@cindex automatic displays, in debugger List all items in the automatic display list. @item frame +@cindex describe call stack frame, in debugger Description of the selected stack frame. @item functions +@cindex list function definitions, in debugger List all function definitions including source file names and line numbers. 
@item locals +@cindex show local variables, in debugger Local variables of the selected frame. @item source +@cindex show name of current source file, in debugger The name of the current source file. Each time the program stops, the current source file is the file containing the current instruction. When the debugger first starts, the current source file is the first file @@ -26824,12 +27277,15 @@ included via the @option{-f} option. The be used at any time to change the current source. @item sources +@cindex show all source files, in debugger List all program sources. @item variables +@cindex list all global variables, in debugger List all global variables. @item watch +@cindex show watchpoints List all items in the watch list. @end table @end table @@ -26843,6 +27299,8 @@ from a file. The commands are: @cindex debugger commands, @code{option} @cindex @code{option} debugger command @cindex @code{o} debugger command (alias for @code{option}) +@cindex display debugger options +@cindex debugger options @item @code{option} [@var{name}[@code{=}@var{value}]] @itemx @code{o} [@var{name}[@code{=}@var{value}]] Without an argument, display the available debugger options @@ -26854,38 +27312,46 @@ The available options are: @c nested table @table @code @item history_size +@cindex debugger history size The maximum number of lines to keep in the history file @file{./.gawk_history}. The default is 100. @item listsize +@cindex debugger default list amount The number of lines that @code{list} prints. The default is 15. @item outfile +@cindex redirect @command{gawk} output, in debugger Send @command{gawk} output to a file; debugger output still goes to standard output. An empty string (@code{""}) resets output to standard output. @item prompt +@cindex debugger prompt The debugger prompt. The default is @samp{@w{gawk> }}. @item save_history @r{[}on @r{|} off@r{]} +@cindex debugger history file Save command history to file @file{./.gawk_history}. The default is @code{on}. 
@item save_options @r{[}on @r{|} off@r{]} +@cindex save debugger options Save current options to file @file{./.gawkrc} upon exit. The default is @code{on}. Options are read back in to the next session upon startup. @item trace @r{[}on @r{|} off@r{]} +@cindex instruction tracing, in debugger Turn instruction tracing on or off. The default is @code{off}. @end table @item @code{save} @var{filename} -Save the commands from the current session to the given @value{FN}, +Save the commands from the current session to the given file name, so that they can be replayed using the @command{source} command. @item @code{source} @var{filename} +@cindex debugger, read commands from a file Run command(s) from a file; an error in any command does not terminate execution of subsequent commands. Comments (lines starting with @samp{#}) are allowed in a command file. @@ -26984,8 +27450,8 @@ about the command @var{command}. @cindex debugger commands, @code{list} @cindex @code{list} debugger command @cindex @code{l} debugger command (alias for @code{list}) -@item @code{list} [@code{-} | @code{+} | @var{n} | @var{filename@code{:}n} | @var{n}--@var{m} | @var{function}] -@itemx @code{l} [@code{-} | @code{+} | @var{n} | @var{filename@code{:}n} | @var{n}--@var{m} | @var{function}] +@item @code{list} [@code{-} | @code{+} | @var{n} | @var{filename}@code{:}@var{n} | @var{n}--@var{m} | @var{function}] +@itemx @code{l} [@code{-} | @code{+} | @var{n} | @var{filename}@code{:}@var{n} | @var{n}--@var{m} | @var{function}] Print the specified lines (default 15) from the current source file or the file named @var{filename}. The possible arguments to @code{list} are as follows: @@ -27005,7 +27471,7 @@ Print lines centered around line number @var{n}. @item @var{n}--@var{m} Print lines from @var{n} to @var{m}. -@item @var{filename@code{:}n} +@item @var{filename}@code{:}@var{n} Print lines centered around line number @var{n} in source file @var{filename}. This command may change the current source file. 
@@ -27018,6 +27484,7 @@ function @var{function}. This command may change the current source file. @cindex debugger commands, @code{quit} @cindex @code{quit} debugger command @cindex @code{q} debugger command (alias for @code{quit}) +@cindex exit the debugger @item @code{quit} @itemx @code{q} Exit the debugger. Debugging is great fun, but sometimes we all have @@ -27041,6 +27508,8 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while @node Readline Support @section Readline Support +@cindex command completion, in debugger +@cindex history expansion, in debugger If @command{gawk} is compiled with the @code{readline} library, you can take advantage of that library's command completion and history expansion @@ -27050,8 +27519,8 @@ features. The following types of completion are available: @item Command completion Command names. -@item Source @value{FN} completion -Source @value{FN}s. Relevant commands are +@item Source file name completion +Source file names. Relevant commands are @code{break}, @code{clear}, @code{list}, @@ -27128,9 +27597,7 @@ be added, and of course feel free to try to add them yourself! @cindex arbitrary precision @cindex multiple precision @cindex infinite precision -@cindex floating-point numbers, arbitrary precision -@cindex MPFR -@cindex GMP +@cindex floating-point, numbers@comma{} arbitrary precision @cindex Knuth, Donald @quotation @@ -27139,11 +27606,11 @@ to believe. Novice computer users solve this problem by implicitly trusting in the computer as an infallible authority; they tend to believe that all digits of a printed answer are significant. Disillusioned computer users have just the opposite approach; they are constantly afraid that their answers -are almost meaningless.}@* -Donald Knuth@footnote{Donald E.@: Knuth. +are almost meaningless.}@footnote{Donald E.@: Knuth. @cite{The Art of Computer Programming}. 
Volume 2, @cite{Seminumerical Algorithms}, third edition, 1998, ISBN 0-201-89683-4, p.@: 229.} +@author Donald Knuth @end quotation This @value{CHAPTER} discusses issues that you may encounter @@ -27281,7 +27748,7 @@ This makes it clear that the full numeric value is different from what the default string representations show. @code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with -at least six significant digits. For some applications, you might want to +at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, most of the time, 17 digits is enough to capture a floating-point number's @@ -27310,7 +27777,7 @@ $ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'} @print{} 0000051580 515.82 @print{} 0000051582 -@kbd{@value{CTL}-d} +@kbd{Ctrl-d} @end example @noindent @@ -27474,23 +27941,38 @@ then the answer is @math{2^{53}}. @end iftex @ifnottex +@ifnotdocbook 2^53. +@end ifnotdocbook @end ifnottex +@docbook +2<superscript>53</superscript>. @c +@end docbook The next representable number is the even number @iftex @math{2^{53} + 2}, @end iftex @ifnottex +@ifnotdocbook 2^53 + 2, +@end ifnotdocbook @end ifnottex +@docbook +2<superscript>53</superscript> + 2, @c +@end docbook meaning it is unlikely that you will be able to make @command{gawk} print @iftex @math{2^{53} + 1} @end iftex @ifnottex +@ifnotdocbook 2^53 + 1 +@end ifnotdocbook @end ifnottex +@docbook +2<superscript>53</superscript> + 1 @c +@end docbook in integer format. The range of integers exactly representable by a 64-bit double is @@ -27498,8 +27980,13 @@ is @math{[-2^{53}, 2^{53}]}. @end iftex @ifnottex +@ifnotdocbook [@minus{}2^53, 2^53]. +@end ifnotdocbook @end ifnottex +@docbook +[−2<superscript>53</superscript>, 2<superscript>53</superscript>]. 
@c +@end docbook If you ever see an integer outside this range in @command{awk} using 64-bit doubles, you have reason to be very suspicious about the accuracy of the output. Here is a simple program with erroneous output: @@ -27723,8 +28210,13 @@ number is then @math{s @cdot 2^e}. @end iftex @ifnottex +@ifnotdocbook @var{s * 2^e}. +@end ifnotdocbook @end ifnottex +@docbook +<emphasis>s ⋅ 2<superscript>e</superscript></emphasis>. @c +@end docbook The first bit of a non-zero binary significand is always one, so the significand in an IEEE-754 format only includes the fractional part, leaving the leading one implicit. @@ -27894,6 +28386,8 @@ when you change the rounding mode. @node Gawk and MPFR @section @command{gawk} + MPFR = Powerful Arithmetic +@cindex MPFR +@cindex GMP The rest of this @value{CHAPTER} describes how to use the arbitrary precision (also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric @@ -27906,12 +28400,17 @@ The easiest way to find out is to look at the output of the following command: @example -$ @kbd{gawk --version} -@print{} GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) -@print{} Copyright (C) 1989, 1991-2013 Free Software Foundation. +$ @kbd{./gawk --version} +@print{} GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) +@print{} Copyright (C) 1989, 1991-2014 Free Software Foundation. @dots{} @end example +@noindent +(You may see different version numbers than what's shown here. That's OK; +what's important is to see that GNU MPFR and GNU MP are listed in +the output.) 
+ @command{gawk} uses the @uref{http://www.mpfr.org, GNU MPFR} and @@ -27965,8 +28464,13 @@ numbers are not implemented.} (@math{emax = 2^{30} - 1, emin = -emax}) @end iftex @ifnottex +@ifnotdocbook (@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) +@end ifnotdocbook @end ifnottex +@docbook +(<emphasis>emax</emphasis> = 2<superscript>30</superscript> − 1, <emphasis>emin</emphasis> = −<emphasis>emax</emphasis>) @c +@end docbook for all floating-point contexts. There is no explicit mechanism to adjust the exponent range. MPFR does not implement subnormal numbers by default, @@ -27998,6 +28502,7 @@ your program. @node Setting Precision @subsection Setting the Working Precision @cindex @code{PREC} variable +@cindex setting working precision @command{gawk} uses a global working precision; it does not keep track of the precision or accuracy of individual numbers. Performing an arithmetic @@ -28037,8 +28542,15 @@ formula: @math{prec = 3.322 @cdot dps} @end iftex @ifnottex +@ifnotdocbook @var{prec} = 3.322 * @var{dps} +@end ifnotdocbook @end ifnottex +@docbook +<para> +<emphasis>prec</emphasis> = 3.322 ⋅ <emphasis>dps</emphasis> @c +</para> +@end docbook @noindent Here, @var{prec} denotes the binary precision @@ -28073,6 +28585,7 @@ issues that occur because numbers are stored internally in binary. @node Setting Rounding Mode @subsection Setting the Rounding Mode @cindex @code{ROUNDMODE} variable +@cindex setting rounding mode The @code{ROUNDMODE} variable provides program level control over the rounding mode. @@ -28140,6 +28653,7 @@ In the first case, the number is stored with the default precision of 53 bits. @node Changing Precision @subsection Changing the Precision of a Number +@cindex changing precision of a number @cindex Laurie, Dirk @quotation @@ -28150,11 +28664,10 @@ floating-point format to a precision lower than working precision. 
Do we promote them to full membership of the high-precision club, or do we treat them and all their associates as second-class citizens? Sometimes the first course is proper, sometimes the second, and it takes -careful analysis to tell which.} - -Dirk Laurie@footnote{Dirk Laurie. +careful analysis to tell which.}@footnote{Dirk Laurie. @cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}. Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} +@author Dirk Laurie @end quotation @command{gawk} does not implicitly modify the precision of any previously @@ -28258,7 +28771,8 @@ the problem at hand is often the correct approach in such situations. @node Arbitrary Precision Integers @section Arbitrary Precision Integer Arithmetic with @command{gawk} -@cindex integer, arbitrary precision +@cindex integers, arbitrary precision +@cindex arbitrary precision integers If one of the options @option{--bignum} or @option{-M} is specified, @command{gawk} performs all @@ -28272,8 +28786,13 @@ For example, the following computes @math{5^{4^{3^{2}}}}, @end iftex @ifnottex +@ifnotdocbook 5^4^3^2, +@end ifnotdocbook @end ifnottex +@docbook +5<superscript>4<superscript>3<superscript>2</superscript></superscript></superscript>, @c +@end docbook the result of which is beyond the limits of ordinary @command{gawk} numbers: @@ -28295,9 +28814,16 @@ floating-point values instead, the precision needed for correct output would be @math{3.322 @cdot 183231}, @end iftex @ifnottex +@ifnotdocbook @samp{prec = 3.322 * dps}), would be 3.322 x 183231, +@end ifnotdocbook @end ifnottex +@docbook +<emphasis>prec</emphasis> = 3.322 ⋅ <emphasis>dps</emphasis>), +would be +<emphasis>prec</emphasis> = 3.322 ⋅ 183231, @c +@end docbook or 608693. 
The result from an arithmetic operation with an integer and a floating-point value @@ -28346,7 +28872,7 @@ to begin with: gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' @end example -Note that for the particular example above, there is likely best +Note that for the particular example above, it is likely best to just use the following: @example @@ -28355,6 +28881,7 @@ gawk -M 'BEGIN @{ n = 13; print n % 2 @}' @node Dynamic Extensions @chapter Writing Extensions for @command{gawk} +@cindex dynamically loaded extensions It is possible to add new functions written in C or C++ to @command{gawk} using dynamically loaded libraries. This facility is available on systems @@ -28389,6 +28916,7 @@ When @option{--sandbox} is specified, extensions are disabled @node Extension Intro @section Introduction +@cindex plug-in An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of external compiled code that @command{gawk} can load at runtime to provide additional functionality, over and above the built-in capabilities @@ -28434,8 +28962,14 @@ Communication between @command{gawk} and an extension is two-way. First, when an extension is loaded, it is passed a pointer to a @code{struct} whose fields are function pointers. +@ifnotdocbook This is shown in @ref{load-extension}. +@end ifnotdocbook +@ifdocbook +This is shown in @inlineraw{docbook, <xref linkend="load-extension"/>}. +@end ifdocbook +@ifnotdocbook @float Figure,load-extension @caption{Loading The Extension} @c FIXME: One day, it should not be necessary to have two cases, @@ -28448,13 +28982,27 @@ This is shown in @ref{load-extension}. 
@center @image{api-figure1, , , Loading the extension} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="load-extension"> +<title>Loading the extension</title> +<graphic fileref="api-figure1.eps"/> +</figure> +@end docbook The extension can call functions inside @command{gawk} through these function pointers, at runtime, without needing (link-time) access to @command{gawk}'s symbols. One of these function pointers is to a function for ``registering'' new built-in functions. +@ifnotdocbook This is shown in @ref{load-new-function}. +@end ifnotdocbook +@ifdocbook +This is shown in @inlineraw{docbook, <xref linkend="load-new-function"/>}. +@end ifdocbook +@ifnotdocbook @float Figure,load-new-function @caption{Loading The New Function} @ifinfo @@ -28464,14 +29012,28 @@ This is shown in @ref{load-new-function}. @center @image{api-figure2, , , Loading the new function} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="load-new-function"> +<title>Loading the new function</title> +<graphic fileref="api-figure2.eps"/> +</figure> +@end docbook In the other direction, the extension registers its new functions with @command{gawk} by passing function pointers to the functions that provide the new feature (@code{do_chdir()}, for example). @command{gawk} associates the function pointer with a name and can then call it, using a defined calling convention. +@ifnotdocbook This is shown in @ref{call-new-function}. +@end ifnotdocbook +@ifdocbook +This is shown in @inlineraw{docbook, <xref linkend="call-new-function"/>}. +@end ifdocbook +@ifnotdocbook @float Figure,call-new-function @caption{Calling The New Function} @ifinfo @@ -28481,6 +29043,14 @@ This is shown in @ref{call-new-function}. 
@center @image{api-figure3, , , Calling the new function} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="call-new-function"> +<title>Calling The New Function</title> +<graphic fileref="api-figure3.eps"/> +</figure> +@end docbook The @code{do_@var{xxx}()} function, in turn, then uses the function pointers in the API @code{struct} to do its work, such as updating @@ -28517,6 +29087,7 @@ happen, but we all know how @emph{that} goes.) @node Extension API Description @section API Description +@cindex extension API This (rather large) @value{SECTION} describes the API in detail. @@ -28524,6 +29095,7 @@ This (rather large) @value{SECTION} describes the API in detail. * Extension API Functions Introduction:: Introduction to the API functions. * General Data Types:: The data types. * Requesting Values:: How to get a value. +* Memory Allocation Functions:: Functions for allocating memory. * Constructor Functions:: Functions for creating values. * Registration Functions:: Functions to register things with @command{gawk}. @@ -28579,6 +29151,9 @@ Symbol table access: retrieving a global variable, creating one, or changing one. @item +Allocating, reallocating, and releasing memory. + +@item Creating and releasing cached values; this provides an efficient way to use values for multiple variables and can be a big performance win. 
@@ -28617,10 +29192,8 @@ corresponding standard header file @emph{before} including @file{gawkapi.h}: @item @code{EOF} @tab @code{<stdio.h>} @item @code{FILE} @tab @code{<stdio.h>} @item @code{NULL} @tab @code{<stddef.h>} -@item @code{malloc()} @tab @code{<stdlib.h>} @item @code{memcpy()} @tab @code{<string.h>} @item @code{memset()} @tab @code{<string.h>} -@item @code{realloc()} @tab @code{<stdlib.h>} @item @code{size_t} @tab @code{<sys/types.h>} @item @code{struct stat} @tab @code{<sys/stat.h>} @end multitable @@ -28650,8 +29223,9 @@ does not support this keyword, you should either place All pointers filled in by @command{gawk} are to memory managed by @command{gawk} and should be treated by the extension as read-only. Memory for @emph{all} strings passed into @command{gawk} -from the extension @emph{must} come from @code{malloc()} and is managed -by @command{gawk} from then on. +from the extension @emph{must} come from calling the API-provided function +pointers @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}, +and is managed by @command{gawk} from then on. @item The API defines several simple @code{struct}s that map values as seen @@ -28691,13 +29265,17 @@ the macros as if they were functions. @node General Data Types @subsection General Purpose Data Types +@cindex Robbins, Arnold +@cindex Ramey, Chet @quotation -@i{I have a true love/hate relationship with unions.}@* -Arnold Robbins +@i{I have a true love/hate relationship with unions.} +@author Arnold Robbins +@end quotation +@quotation @i{That's the thing about unions: the compiler will arrange things so they -can accommodate both love and hate.}@* -Chet Ramey +can accommodate both love and hate.} +@author Chet Ramey @end quotation The extension API defines a number of simple types and structures for general @@ -28717,9 +29295,9 @@ certain fields in the API data structures unwritable from extension code, while allowing @command{gawk} to use them as it needs to. 
@item typedef enum awk_bool @{ -@item @ @ @ @ awk_false = 0, -@item @ @ @ @ awk_true -@item @} awk_bool_t; +@itemx @ @ @ @ awk_false = 0, +@itemx @ @ @ @ awk_true +@itemx @} awk_bool_t; A simple boolean type. @item typedef struct awk_string @{ @@ -28729,7 +29307,8 @@ A simple boolean type. This represents a mutable string. @command{gawk} owns the memory pointed to if it supplied the value. Otherwise, it takes ownership of the memory pointed to. -@strong{Such memory must come from @code{malloc()}!} +@strong{Such memory must come from calling the API-provided function +pointers @code{api_malloc()}, @code{api_calloc()}, or @code{api_realloc()}!} As mentioned earlier, strings are maintained using the current multibyte encoding. @@ -28845,7 +29424,94 @@ print an error message, or reissue the request for the actual value type, as appropriate. This behavior is summarized in @ref{table-value-types-returned}. +@c FIXME: Try to do this with spans... +@ifdocbook +@anchor{table-value-types-returned} +@end ifdocbook +@docbook +<informaltable> +<tgroup cols="2"> + <colspec colwidth="50*"/><colspec colwidth="50*"/> + <thead> + <row><entry></entry><entry><para>Type of Actual Value:</para></entry></row> + </thead> + <tbody> + <row><entry></entry><entry></entry></row> + </tbody> +</tgroup> +<tgroup cols="6"> + <colspec colwidth="16.6*"/> + <colspec colwidth="16.6*"/> + <colspec colwidth="19.8*"/> + <colspec colwidth="15*"/> + <colspec colwidth="15*"/> + <colspec colwidth="16.6*"/> + <thead> + <row> + <entry></entry> + <entry></entry> + <entry><para>String</para></entry> + <entry><para>Number</para></entry> + <entry><para>Array</para></entry> + <entry><para>Undefined</para></entry> + </row> + </thead> + <tbody> + <row> + <entry></entry> + <entry><para><emphasis role="bold">String</emphasis></para></entry> + <entry><para>String</para></entry> + <entry><para>String</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry></entry> 
+ <entry><para><emphasis role="bold">Number</emphasis></para></entry> + <entry><para>Number if can be converted, else false</para></entry> + <entry><para>Number</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry><para><emphasis role="bold">Type</emphasis></para></entry> + <entry><para><emphasis role="bold">Array</emphasis></para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>Array</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry><para><emphasis role="bold">Requested:</emphasis></para></entry> + <entry><para><emphasis role="bold">Scalar</emphasis></para></entry> + <entry><para>Scalar</para></entry> + <entry><para>Scalar</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + </row> + <row> + <entry></entry> + <entry><para><emphasis role="bold">Undefined</emphasis></para></entry> + <entry><para>String</para></entry> + <entry><para>Number</para></entry> + <entry><para>Array</para></entry> + <entry><para>Undefined</para></entry> + </row> + <row> + <entry></entry> + <entry><para><emphasis role="bold">Value Cookie</emphasis></para></entry> + <entry><para>false</para></entry> + <entry><para>false</para></entry> + <entry><para>false</para> + </entry><entry><para>false</para></entry> + </row> + </tbody> +</tgroup> +</informaltable> +@end docbook + @ifnotplaintext +@ifnotdocbook @float Table,table-value-types-returned @caption{Value Types Returned} @multitable @columnfractions .50 .50 @@ -28861,6 +29527,7 @@ value type, as appropriate. This behavior is summarized in @item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false @end multitable @end float +@end ifnotdocbook @end ifnotplaintext @ifplaintext @float Table,table-value-types-returned @@ -28891,45 +29558,46 @@ value type, as appropriate. 
This behavior is summarized in @end float @end ifplaintext -@node Constructor Functions -@subsection Constructor Functions and Convenience Macros +@node Memory Allocation Functions +@subsection Memory Allocation Functions and Convenience Macros +@cindex allocating memory for extensions +@cindex extensions, allocating memory -The API provides a number of @dfn{constructor} functions for creating -string and numeric values, as well as a number of convenience macros. -This @value{SUBSECTION} presents them all as function prototypes, in -the way that extension code would use them. +The API provides a number of @dfn{memory allocation} functions for +allocating memory that can be passed to @command{gawk}, as well as a number of +convenience macros. @table @code -@item static inline awk_value_t * -@itemx make_const_string(const char *string, size_t length, awk_value_t *result) -This function creates a string value in the @code{awk_value_t} variable -pointed to by @code{result}. It expects @code{string} to be a C string constant -(or other string data), and automatically creates a @emph{copy} of the data -for storage in @code{result}. It returns @code{result}. +@item void *gawk_malloc(size_t size); +Call @command{gawk}-provided @code{api_malloc()} to allocate storage that may +be passed to @command{gawk}. -@item static inline awk_value_t * -@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) -This function creates a string value in the @code{awk_value_t} variable -pointed to by @code{result}. It expects @code{string} to be a @samp{char *} -value pointing to data previously obtained from @code{malloc()}. The idea here -is that the data is passed directly to @command{gawk}, which assumes -responsibility for it. It returns @code{result}. +@item void *gawk_calloc(size_t nmemb, size_t size); +Call @command{gawk}-provided @code{api_calloc()} to allocate storage that may +be passed to @command{gawk}. 
-@item static inline awk_value_t * -@itemx make_null_string(awk_value_t *result) -This specialized function creates a null string (the ``undefined'' value) -in the @code{awk_value_t} variable pointed to by @code{result}. -It returns @code{result}. +@item void *gawk_realloc(void *ptr, size_t size); +Call @command{gawk}-provided @code{api_realloc()} to allocate storage that may +be passed to @command{gawk}. -@item static inline awk_value_t * -@itemx make_number(double num, awk_value_t *result) -This function simply creates a numeric value in the @code{awk_value_t} variable -pointed to by @code{result}. +@item void gawk_free(void *ptr); +Call @command{gawk}-provided @code{api_free()} to release storage that was +allocated with @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}. @end table -Two convenience macros may be used for allocating storage from @code{malloc()} -and @code{realloc()}. If the allocation fails, they cause @command{gawk} to -exit with a fatal error message. They should be used as if they were +The API has to provide these functions because it is possible +for an extension to be compiled and linked against a different +version of the C library than was used for the @command{gawk} +executable.@footnote{This is more common on MS-Windows systems, but +can happen on Unix-like systems as well.} If @command{gawk} were +to use its version of @code{free()} when the memory came from an +unrelated version of @code{malloc()}, unexpected behavior would +likely result. + +Two convenience macros may be used for allocating storage +from the API-provided function pointers @code{api_malloc()} and +@code{api_realloc()}. If the allocation fails, they cause @command{gawk} +to exit with a fatal error message. They should be used as if they were procedure calls that do not return a value. @table @code @@ -28941,7 +29609,7 @@ The arguments to this macro are as follows: The pointer variable to point at the allocated storage. 
@item type -The type of the pointer variable, used to create a cast for the call to @code{malloc()}. +The type of the pointer variable, used to create a cast for the call to @code{api_malloc()}. @item size The total number of bytes to be allocated. @@ -28965,13 +29633,51 @@ make_malloced_string(message, strlen(message), & result); @end example @item #define erealloc(pointer, type, size, message) @dots{} -This is like @code{emalloc()}, but it calls @code{realloc()}, -instead of @code{malloc()}. +This is like @code{emalloc()}, but it calls @code{api_realloc()}, +instead of @code{api_malloc()}. The arguments are the same as for the @code{emalloc()} macro. @end table +@node Constructor Functions +@subsection Constructor Functions + +The API provides a number of @dfn{constructor} functions for creating +string and numeric values, as well as a number of convenience macros. +This @value{SUBSECTION} presents them all as function prototypes, in +the way that extension code would use them. + +@table @code +@item static inline awk_value_t * +@itemx make_const_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a C string constant +(or other string data), and automatically creates a @emph{copy} of the data +for storage in @code{result}. It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a @samp{char *} +value pointing to data previously obtained from the api-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}. The idea here +is that the data is passed directly to @command{gawk}, which assumes +responsibility for it. It returns @code{result}. 
+ +@item static inline awk_value_t * +@itemx make_null_string(awk_value_t *result) +This specialized function creates a null string (the ``undefined'' value) +in the @code{awk_value_t} variable pointed to by @code{result}. +It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_number(double num, awk_value_t *result) +This function simply creates a numeric value in the @code{awk_value_t} variable +pointed to by @code{result}. +@end table + @node Registration Functions @subsection Registration Functions +@cindex register extension +@cindex extension registration This @value{SECTION} describes the API functions for registering parts of your extension with @command{gawk}. @@ -29016,8 +29722,8 @@ Letter case in function names is significant. This is a pointer to the C function that provides the desired functionality. The function must fill in the result with either a number -or a string. @command{awk} takes ownership of any string memory. -As mentioned earlier, string memory @strong{must} come from @code{malloc()}. +or a string. @command{gawk} takes ownership of any string memory. +As mentioned earlier, string memory @strong{must} come from the api-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}. The @code{num_actual_args} argument tells the C function how many actual parameters were passed from the calling @command{awk} code. @@ -29093,6 +29799,7 @@ is invoked with the @option{--version} option. @node Input Parsers @subsubsection Customized Input Parsers +@cindex customized input parser By default, @command{gawk} reads text files as its input. It uses the value of @code{RS} to find the end of the record, and then uses @code{FS} @@ -29340,7 +30047,9 @@ Register the input parser pointed to by @code{input_parser} with @node Output Wrappers @subsubsection Customized Output Wrappers +@cindex customized output wrapper +@cindex output wrapper An @dfn{output wrapper} is the mirror image of an input parser. 
It allows an extension to take over the output to a file opened with the @samp{>} or @samp{>>} I/O redirection operators (@pxref{Redirection}). @@ -29454,6 +30163,7 @@ Register the output wrapper pointed to by @code{output_wrapper} with @node Two-way processors @subsubsection Customized Two-way Processors +@cindex customized two-way processor A @dfn{two-way processor} combines an input parser and an output wrapper for two-way I/O with the @samp{|&} operator (@pxref{Redirection}). It makes identical @@ -29511,6 +30221,8 @@ Register the two-way processor pointed to by @code{two_way_processor} with @node Printing Messages @subsection Printing Messages +@cindex printing messages from extensions +@cindex messages from extensions You can print different kinds of warning messages from your extension, as described below. Note that for these functions, @@ -29584,6 +30296,7 @@ for more information on creating arrays. @node Symbol Table Access @subsection Symbol Table Access +@cindex accessing global variables from extensions Two sets of routines provide access to global variables, and one set allows you to create and release cached values. @@ -29629,6 +30342,13 @@ An extension can look up the value of @command{gawk}'s special variables. However, with the exception of the @code{PROCINFO} array, an extension cannot change any of those variables. +@quotation NOTE +It is possible for the lookup of @code{PROCINFO} to fail. This happens if +the @command{awk} program being run does not reference @code{PROCINFO}; +in this case @command{gawk} doesn't bother to create the array and +populate it. +@end quotation + @node Symbol table by cookie @subsubsection Variable Access and Update by Cookie @@ -29755,7 +30475,7 @@ assign those values to variables using @code{sym_update()} or @code{sym_update_scalar()}, as you like. However, you can understand the point of cached values if you remember that -@emph{every} string value's storage @emph{must} come from @code{malloc()}. 
+@emph{every} string value's storage @emph{must} come from @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}. If you have 20 variables, all of which have the same string value, you must create 20 identical copies of the string.@footnote{Numeric values are clearly less problematic, requiring only a C @code{double} to store.} @@ -29841,6 +30561,7 @@ you should release any cached values that you created, using @node Array Manipulation @subsection Array Manipulation +@cindex array manipulation in extensions The primary data structure@footnote{Okay, the only data structure.} in @command{awk} is the associative array (@pxref{Arrays}). @@ -29952,7 +30673,7 @@ requires that you understand how such values are converted to strings (@pxref{Conversion}); thus using integral values is safest. As with @emph{all} strings passed into @code{gawk} from an extension, -the string value of @code{index} must come from @code{malloc()}, and +the string value of @code{index} must come from the API-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()} and @command{gawk} releases the storage. @item awk_bool_t set_array_element(awk_array_t a_cookie, @@ -30420,6 +31141,8 @@ information about how @command{gawk} was invoked. @node Extension Versioning @subsubsection API Version Constants and Variables +@cindex API version +@cindex extension API version The API provides both a ``major'' and a ``minor'' version number. The API versions are available at compile time as constants: @@ -30473,6 +31196,8 @@ provided in @file{gawkapi.h} (discussed later, in @node Extension API Informational Variables @subsubsection Informational Variables +@cindex API informational variables +@cindex extension API informational variables The API provides access to several variables that describe whether the corresponding command-line options were enabled when @@ -30618,6 +31343,8 @@ the version string with @command{gawk}. 
@node Finding Extensions @section How @command{gawk} Finds Extensions +@cindex extension search path +@cindex finding extensions Compiled extensions have to be installed in a directory where @command{gawk} can find them. If @command{gawk} is configured and @@ -30628,10 +31355,11 @@ path with a list of directories to search for compiled extensions. @node Extension Example @section Example: Some File Functions +@cindex extension example @quotation -@i{No matter where you go, there you are.} @* -Buckaroo Bonzai +@i{No matter where you go, there you are.} +@author Buckaroo Bonzai @end quotation @c It's enough to show chdir and stat, no need for fts @@ -31086,7 +31814,7 @@ do_stat(int nargs, awk_value_t *result) awk_array_t array; int ret; struct stat sbuf; - /* default is stat() */ + /* default is lstat() */ int (*statfunc)(const char *path, struct stat *sbuf) = lstat; assert(result != NULL); @@ -31272,6 +32000,7 @@ $ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk} @node Extension Samples @section The Sample Extensions In The @command{gawk} Distribution +@cindex extensions distributed with @command{gawk} This @value{SECTION} provides brief overviews of the sample extensions that come in the @command{gawk} distribution. Some of them are intended @@ -31306,15 +32035,15 @@ The usage is: @item @@load "filefuncs" This is how you load the extension. -@cindex @code{chdir} extension function +@cindex @code{chdir()} extension function @item result = chdir("/some/directory") The @code{chdir()} function is a direct hook to the @code{chdir()} system call to change the current directory. It returns zero upon success or less than zero upon error. In the latter case it updates @code{ERRNO}. -@cindex @code{stat} extension function -@item result = stat("/some/path", statdata [, follow]) +@cindex @code{stat()} extension function +@item result = stat("/some/path", statdata @r{[}, follow@r{]}) The @code{stat()} function provides a hook into the @code{stat()} system call. 
It returns zero upon success or less than zero upon error. @@ -31403,7 +32132,7 @@ or Not all systems support all file types. @end multitable -@cindex @code{fts} extension function +@cindex @code{fts()} extension function @item flags = or(FTS_PHYSICAL, ...) @itemx result = fts(pathlist, flags, filedata) Walk the file trees provided in @code{pathlist} and fill in the @@ -31414,7 +32143,7 @@ Return zero if there were no errors, otherwise return @minus{}1. The @code{fts()} function provides a hook to the C library @code{fts()} routines for traversing file hierarchies. Instead of returning data -about one file at a time in a stream, it fills in a multi-dimensional +about one file at a time in a stream, it fills in a multidimensional array with data about each file and directory encountered in the requested hierarchies. @@ -31515,7 +32244,7 @@ be more comfortable to use from an @command{awk} program. This includes the lack of a comparison function, since @command{gawk} already provides powerful array sorting facilities. While an @code{fts_read()}-like interface could have been provided, this felt less natural than simply -creating a multi-dimensional array to represent the file hierarchy and +creating a multidimensional array to represent the file hierarchy and its information. @end quotation @@ -31524,19 +32253,23 @@ See @file{test/fts.awk} in the @command{gawk} distribution for an example. @node Extension Sample Fnmatch @subsection Interface To @code{fnmatch()} -@cindex @code{fnmatch} extension function This extension provides an interface to the C library @code{fnmatch()} function. The usage is: -@example -@@load "fnmatch" +@table @code +@item @@load "fnmatch" +This is how you load the extension. 
-result = fnmatch(pattern, string, flags) -@end example +@cindex @code{fnmatch()} extension function +@item result = fnmatch(pattern, string, flags) +The return value is zero on success, @code{FNM_NOMATCH} +if the string did not match the pattern, or +a different non-zero value if an error occurred. +@end table -The @code{fnmatch} extension adds a single function named -@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of -flag values named @code{FNM}. +Besides the @code{fnmatch()} function, the @code{fnmatch} extension +adds one constant (@code{FNM_NOMATCH}), and an array of flag values +named @code{FNM}. The arguments to @code{fnmatch()} are: @@ -31552,10 +32285,6 @@ Either zero, or the bitwise OR of one or more of the flags in the @code{FNM} array. @end table -The return value is zero on success, @code{FNM_NOMATCH} -if the string did not match the pattern, or -a different non-zero value if an error occurred. - The flags are as follows: @multitable @columnfractions .25 .75 @@ -31597,21 +32326,21 @@ The @code{fork} extension adds three functions, as follows. @item @@load "fork" This is how you load the extension. -@cindex @code{fork} extension function +@cindex @code{fork()} extension function @item pid = fork() -This function creates a new process. The return value is the zero in the -child and the process-id number of the child in the parent, or @minus{}1 +This function creates a new process. The return value is zero in the +child and the process-ID number of the child in the parent, or @minus{}1 upon error. In the latter case, @code{ERRNO} indicates the problem. In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are updated to reflect the correct values. -@cindex @code{waitpid} extension function +@cindex @code{waitpid()} extension function @item ret = waitpid(pid) -This function takes a numeric argument, which is the process-id to +This function takes a numeric argument, which is the process-ID to wait for. 
The return value is that of the @code{waitpid()} system call. -@cindex @code{wait} extension function +@cindex @code{wait()} extension function @item ret = wait() This function waits for the first child to die. The return value is that of the @@ -31698,11 +32427,11 @@ The @code{ordchr} extension adds two functions, named @item @@load "ordchr" This is how you load the extension. -@cindex @code{ord} extension function +@cindex @code{ord()} extension function @item number = ord(string) Return the numeric value of the first character in @code{string}. -@cindex @code{chr} extension function +@cindex @code{chr()} extension function @item char = chr(number) Return a string whose first character is that represented by @code{number}. @end table @@ -31819,14 +32548,14 @@ The @code{rwarray} extension adds two functions, named @code{writea()} and @code{reada()}, as follows: @table @code -@cindex @code{writea} extension function +@cindex @code{writea()} extension function @item ret = writea(file, array) This function takes a string argument, which is the name of the file to which dump the array, and the array itself as the second argument. @code{writea()} understands multidimensional arrays. It returns one on success, or zero upon failure. -@cindex @code{reada} extension function +@cindex @code{reada()} extension function @item ret = reada(file, array) @code{reada()} is the inverse of @code{writea()}; it reads the file named as its first argument, filling in @@ -31863,17 +32592,23 @@ ret = reada("arraydump.bin", array) @subsection Reading An Entire File The @code{readfile} extension adds a single function -named @code{readfile()}: +named @code{readfile()}, and an input parser: @table @code @item @@load "readfile" This is how you load the extension. -@cindex @code{readfile} extension function +@cindex @code{readfile()} extension function @item result = readfile("/some/path") The argument is the name of the file to read. 
The return value is a string containing the entire contents of the requested file. Upon error, the function returns the empty string and sets @code{ERRNO}. + +@item BEGIN @{ PROCINFO["readfile"] = 1 @} +In addition, the extension adds an input parser that is activated if +@code{PROCINFO["readfile"]} exists. +When activated, each input file is returned in its entirety as @code{$0}. +@code{RT} is set to the null string. @end table Here is an example: @@ -31910,7 +32645,7 @@ inserting @samp{@@load "time"} in your script. @item @@load "time" This is how you load the extension. -@cindex @code{gettimeofday} extension function +@cindex @code{gettimeofday()} extension function @item the_time = gettimeofday() Return the time in seconds that has elapsed since 1970-01-01 UTC as a floating point value. If the time is unavailable on this platform, return @@ -31920,7 +32655,7 @@ If the standard C @code{gettimeofday()} system call is available on this platform, then it simply returns the value. Otherwise, if on Windows, it tries to use @code{GetSystemTimeAsFileTime()}. -@cindex @code{sleep} extension function +@cindex @code{sleep()} extension function @item result = sleep(@var{seconds}) Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative, or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}. @@ -31932,6 +32667,8 @@ tries to use @code{nanosleep()} or @code{select()} to implement the delay. @node gawkextlib @section The @code{gawkextlib} Project +@cindex @code{gawkextlib} +@cindex extensions, where to find @cindex @code{gawkextlib} project The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}} @@ -31939,7 +32676,7 @@ project provides a number of @command{gawk} extensions, including one for processing XML files. This is the evolution of the original @command{xgawk} (XML @command{gawk}) project. 
-As of this writing, there are four extensions: +As of this writing, there are five extensions: @itemize @bullet @item @@ -31947,6 +32684,9 @@ XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} XML parsing library. @item +PDF extension. + +@item PostgreSQL extension. @item @@ -31962,6 +32702,7 @@ The @code{time} extension described earlier (@pxref{Extension Sample Time}) was originally from this project but has been moved in to the main @command{gawk} distribution. +@cindex @command{git} utility You can check out the code for the @code{gawkextlib} project using the @uref{http://git-scm.com, GIT} distributed source code control system. The command is as follows: @@ -32076,6 +32817,7 @@ of the @value{DOCUMENT} where you can find more information. @command{awk}. * POSIX/GNU:: The extensions in @command{gawk} not in POSIX @command{awk}. +* Feature History:: The history of the features in @command{gawk}. * Common Extensions:: Common Extensions Summary. * Ranges and Locales:: How locales used to affect regexp ranges. * Contributors:: The major contributors to @command{gawk}. @@ -32173,7 +32915,7 @@ Multiple @code{BEGIN} and @code{END} rules @item Multidimensional arrays -(@pxref{Multi-dimensional}). +(@pxref{Multidimensional}). @end itemize @c ENDOFRANGE gawkv1 @@ -32380,7 +33122,7 @@ Special files in I/O redirections: @itemize @minus{} @item The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and -@file{/dev/fd/@var{N}} special @value{FN}s +@file{/dev/fd/@var{N}} special file names (@pxref{Special Files}). @item @@ -32604,7 +33346,7 @@ long options @item Support for the following obsolete systems was removed from the code -and the documentation for @command{gawk} @value{PVERSION} 4.0: +and the documentation for @command{gawk} version 4.0: @c nested table @itemize @minus @@ -32641,6 +33383,9 @@ Tandem (non-POSIX) @item Prestandard VAX C compiler for VAX/VMS +@item +GCC for VAX and Alpha has not been tested for a while. 
+ @end itemize @end itemize @@ -32651,6 +33396,612 @@ Prestandard VAX C compiler for VAX/VMS @c ENDOFRANGE exgnot @c ENDOFRANGE posnot +@node Feature History +@appendixsec History of @command{gawk} Features + +@ignore +See the thread: +https://groups.google.com/forum/#!topic/comp.lang.awk/SAUiRuff30c +This motivated me to add this section. +@end ignore + +@ignore +I've tried to follow this general order, esp.@: for the 3.0 and 3.1 sections: + variables + special files + language changes (e.g., hex constants) + differences in standard awk functions + new gawk functions + new keywords + new command-line options + behavioral changes + new ports +Within each category, be alphabetical. +@end ignore + +This @value{SECTION} describes the features in @command{gawk} +over and above those in POSIX @command{awk}, +in the order they were added to @command{gawk}. + +Version 2.10 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +The @env{AWKPATH} environment variable for specifying a path search for +the @option{-f} command-line option +(@pxref{Options}). + +@item +The @code{IGNORECASE} variable and its effects +(@pxref{Case-sensitivity}). + +@item +The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and +@file{/dev/fd/@var{N}} special file names +(@pxref{Special Files}). +@end itemize + +Version 2.13 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +The @code{FIELDWIDTHS} variable and its effects +(@pxref{Constant Size}). + +@item +The @code{systime()} and @code{strftime()} built-in functions for obtaining +and printing timestamps +(@pxref{Time Functions}). + +@item +Additional command-line options +(@pxref{Options}): + +@itemize @minus +@item +The @option{-W lint} option to provide error and portability checking +for both the source code and at runtime. + +@item +The @option{-W compat} option to turn off the GNU extensions. + +@item +The @option{-W posix} option for full POSIX compliance. 
+@end itemize +@end itemize + +Version 2.14 of @command{gawk} introduced the following feature: + +@itemize @bullet +@item +The @code{next file} statement for skipping to the next data file +(@pxref{Nextfile Statement}). +@end itemize + +Version 2.15 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +New variables (@pxref{Built-in Variables}): + +@itemize @minus +@item +@code{ARGIND}, which tracks the movement of @code{FILENAME} +through @code{ARGV}. + +@item +@code{ERRNO}, which contains the system error message when +@code{getline} returns @minus{}1 or @code{close()} fails. +@end itemize + +@item +The @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and +@file{/dev/user} special file names. These have since been removed. + +@item +The ability to delete all of an array at once with @samp{delete @var{array}} +(@pxref{Delete}). + +@item +Command line option changes +(@pxref{Options}): + +@itemize @minus +@item +The ability to use GNU-style long-named options that start with @option{--}. + +@item +The @option{--source} option for mixing command-line and library-file +source code. +@end itemize +@end itemize + +Version 3.0 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +New or changed variables: + +@itemize @minus +@item +@code{IGNORECASE} changed, now applying to string comparison as well +as regexp operations +(@pxref{Case-sensitivity}). + +@item +@code{RT}, which contains the input text that matched @code{RS} +(@pxref{Records}). +@end itemize + +@item +Full support for both POSIX and GNU regexps +(@pxref{Regexp}). + +@item +The @code{gensub()} function for more powerful text manipulation +(@pxref{String Functions}). + +@item +The @code{strftime()} function acquired a default time format, +allowing it to be called with no arguments +(@pxref{Time Functions}). + +@item +The ability for @code{FS} and for the third +argument to @code{split()} to be null strings +(@pxref{Single Character Fields}). 
+ +@item +The ability for @code{RS} to be a regexp +(@pxref{Records}). + +@item +The @code{next file} statement became @code{nextfile} +(@pxref{Nextfile Statement}). + +@item +The @code{fflush()} function from the +Bell Laboratories research version of @command{awk} +(@pxref{I/O Functions}). + +@item +New command line options: + +@itemize @minus +@item +The @option{--lint-old} option to +warn about constructs that are not available in +the original Version 7 Unix version of @command{awk} +(@pxref{V7/SVR3.1}). + +@item +The @option{-m} option from the +Bell Laboratories research version of @command{awk}. +This was later removed. + +@item +The @option{--re-interval} option to provide interval expressions in regexps +(@pxref{Regexp Operators}). + +@item +The @option{--traditional} option was added as a better name for +@option{--compat} (@pxref{Options}). +@end itemize + +@item +The use of GNU Autoconf to control the configuration process +(@pxref{Quick Installation}). + +@item +Amiga support. + +@end itemize + +Version 3.1 of @command{gawk} introduced the following features: + +@itemize @bullet +@item +New variables +(@pxref{Built-in Variables}): + +@itemize @minus +@item +@code{BINMODE}, for non-POSIX systems, +which allows binary I/O for input and/or output files +(@pxref{PC Using}). + +@item +@code{LINT}, which dynamically controls lint warnings. + +@item +@code{PROCINFO}, an array for providing process-related information. + +@item +@code{TEXTDOMAIN}, for setting an application's internationalization text domain +(@pxref{Internationalization}). +@end itemize + +@item +The ability to use octal and hexadecimal constants in @command{awk} +program source code +(@pxref{Nondecimal-numbers}). + +@item +The @samp{|&} operator for two-way I/O to a coprocess +(@pxref{Two-way I/O}). + +@item +The @file{/inet} special files for TCP/IP networking using @samp{|&} +(@pxref{TCP/IP Networking}). 
+ +@item +The optional second argument to @code{close()} that allows closing one end +of a two-way pipe to a coprocess +(@pxref{Two-way I/O}). + +@item +The optional third argument to the @code{match()} function +for capturing text-matching subexpressions within a regexp +(@pxref{String Functions}). + +@item +Positional specifiers in @code{printf} formats for +making translations easier +(@pxref{Printf Ordering}). + +@item +A number of new built-in functions: + +@itemize @minus +@item +The @code{asort()} and @code{asorti()} functions for sorting arrays +(@pxref{Array Sorting}). + +@item +The @code{bindtextdomain()}, @code{dcgettext()} and @code{dcngettext()} functions +for internationalization +(@pxref{Programmer i18n}). + +@item +The @code{extension()} function and the ability to add +new built-in functions dynamically +(@pxref{Dynamic Extensions}). + +@item +The @code{mktime()} function for creating timestamps +(@pxref{Time Functions}). + +@item +The @code{and()}, @code{or()}, @code{xor()}, @code{compl()}, +@code{lshift()}, @code{rshift()}, and @code{strtonum()} functions +(@pxref{Bitwise Functions}). +@end itemize + +@item +@cindex @code{next file} statement +The support for @samp{next file} as two words was removed completely +(@pxref{Nextfile Statement}). + +@item +Additional command line options +(@pxref{Options}): + +@itemize @minus +@item +The @option{--dump-variables} option to print a list of all global variables. + +@item +The @option{--exec} option, for use in CGI scripts. + +@item +The @option{--gen-po} command-line option and the use of a leading +underscore to mark strings that should be translated +(@pxref{String Extraction}). + +@item +The @option{--non-decimal-data} option to allow non-decimal +input data +(@pxref{Nondecimal Data}). + +@item +The @option{--profile} option and @command{pgawk}, the +profiling version of @command{gawk}, for producing execution +profiles of @command{awk} programs +(@pxref{Profiling}). 
+ +@item +The @option{--use-lc-numeric} option to force @command{gawk} +to use the locale's decimal point for parsing input data +(@pxref{Conversion}). +@end itemize + +@item +The use of GNU Automake to help in standardizing the configuration process +(@pxref{Quick Installation}). + +@item +The use of GNU @code{gettext} for @command{gawk}'s own message output +(@pxref{Gawk I18N}). + +@item +BeOS support. This was later removed. + +@item +Tandem support. This was later removed. + +@item +The Atari port became officially unsupported. + +@item +The source code changed to use ISO C standard-style function definitions. + +@item +POSIX compliance for @code{sub()} and @code{gsub()} +(@pxref{Gory Details}). + +@item +The @code{length()} function was extended to accept an array argument +and return the number of elements in the array +(@pxref{String Functions}). + +@item +The @code{strftime()} function acquired a third argument to +enable printing times as UTC +(@pxref{Time Functions}). +@end itemize + +Version 4.0 of @command{gawk} introduced the following features: + +@itemize @bullet + +@item +Variable additions: + +@itemize @minus +@item +@code{FPAT}, which allows you to specify a regexp that matches +the fields, instead of matching the field separator +(@pxref{Splitting By Content}). + +@item +If @code{PROCINFO["sorted_in"]} exists, @samp{for(iggy in foo)} loops sort the +indices before looping over them. The value of this element +provides control over how the indices are sorted before the loop +traversal starts +(@pxref{Controlling Scanning}). + +@item +@code{PROCINFO["strftime"]}, which holds +the default format for @code{strftime()} +(@pxref{Time Functions}). +@end itemize + +@item +The special files @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid} +and @file{/dev/user} were removed. + +@item +Support for IPv6 was added via the @file{/inet6} special file. 
+@file{/inet4} forces IPv4 and @file{/inet} chooses the system +default, which is probably IPv4 +(@pxref{TCP/IP Networking}). + +@item +The use of @samp{\s} and @samp{\S} escape sequences in regular expressions +(@pxref{GNU Regexp Operators}). + +@item +Interval expressions became part of default regular expressions +(@pxref{Regexp Operators}). + +@item +POSIX character classes work even with @option{--traditional} +(@pxref{Regexp Operators}). + +@item +@code{break} and @code{continue} became invalid outside a loop, +even with @option{--traditional} +(@pxref{Break Statement}, and also see +@ref{Continue Statement}). + +@item +@code{fflush()}, @code{nextfile}, and @samp{delete @var{array}} +are allowed if @option{--posix} or @option{--traditional}, since they +are all now part of POSIX. + +@item +An optional third argument to +@code{asort()} and @code{asorti()}, specifying how to sort +(@pxref{String Functions}). + +@item +The behavior of @code{fflush()} changed to match Brian Kernighan's @command{awk} +and for POSIX; now both @samp{fflush()} and @samp{fflush("")} +flush all open output redirections +(@pxref{I/O Functions}). + +@item +The @code{isarray()} +function which distinguishes if an item is an array +or not, to make it possible to traverse multidimensional arrays +(@pxref{Type Functions}). + +@item +The @code{patsplit()} +function which gives the same capability as @code{FPAT}, for splitting +(@pxref{String Functions}). + +@item +An optional fourth argument to the @code{split()} function, +which is an array to hold the values of the separators +(@pxref{String Functions}). + +@item +Arrays of arrays +(@pxref{Arrays of Arrays}). + +@item +The @code{BEGINFILE} and @code{ENDFILE} special patterns +(@pxref{BEGINFILE/ENDFILE}). + +@item +Indirect function calls +(@pxref{Indirect Calls}). + +@item +@code{switch} / @code{case} are enabled by default +(@pxref{Switch Statement}). 
+ +@item +Command line option changes +(@pxref{Options}): + +@itemize @minus +@item +The @option{-b} and @option{--characters-as-bytes} options +which prevent @command{gawk} from treating input as a multibyte string. + +@item +The redundant @option{--compat}, @option{--copyleft}, and @option{--usage} +long options were removed. + +@item +The @option{--gen-po} option was finally renamed to the correct @option{--gen-pot}. + +@item +The @option{--sandbox} option which disables certain features. + +@item +All long options acquired corresponding short options, for use in @samp{#!} scripts. +@end itemize + +@item +Directories named on the command line now produce a warning, not a fatal +error, unless @option{--posix} or @option{--traditional} are used +(@pxref{Command line directories}). + +@item +The @command{gawk} internals were rewritten, bringing the @command{dgawk} +debugger and possibly improved performance +(@pxref{Debugger}). + +@item +Per the GNU Coding Standards, dynamic extensions must now define +a global symbol indicating that they are GPL-compatible +(@pxref{Plugin License}). + +@item +In POSIX mode, string comparisons use @code{strcoll()} / @code{wcscoll()} +(@pxref{POSIX String Comparison}). + +@item +The option for raw sockets was removed, since it was never implemented +(@pxref{TCP/IP Networking}). + +@item +Ranges of the form @samp{[d-h]} are treated as if they were in the +C locale, no matter what kind of regexp is being used, and even if +@option{--posix} +(@pxref{Ranges and Locales}). 
+ +@item +Support was removed for the following systems: + +@itemize @minus +@item +Atari + +@item +Amiga + +@item +BeOS + +@item +Cray + +@item +MIPS RiscOS + +@item +MS-DOS with Microsoft Compiler + +@item +MS-Windows with Microsoft Compiler + +@item +NeXT + +@item +SunOS 3.x, Sun 386 (Road Runner) + +@item +Tandem (non-POSIX) + +@item +Prestandard VAX C compiler for VAX/VMS +@end itemize +@end itemize + +Version 4.1 of @command{gawk} introduced the following features: + +@itemize @bullet + +@item +Three new arrays: +@code{SYMTAB}, @code{FUNCTAB}, and @code{PROCINFO["identifiers"]} +(@pxref{Auto-set}). + +@item +The three executables @command{gawk}, @command{pgawk}, and @command{dgawk}, were merged into +one, named just @command{gawk}. As a result the command line options changed. + +@item +Command line option changes +(@pxref{Options}): + +@itemize @minus +@item +The @option{-D} option invokes the debugger. + +@item +The @option{-i} and @option{--include} options +load @command{awk} library files. + +@item +The @option{-l} and @option{--load} options load compiled dynamic extensions. + +@item +The @option{-M} and @option{--bignum} options enable MPFR. + +@item +The @option{-o} only does pretty-printing. + +@item +The @option{-p} option is used for profiling. + +@item +The @option{-R} option was removed. +@end itemize + +@item +Support for high precision arithmetic with MPFR. +(@pxref{Gawk and MPFR}). + +@item +The @code{and()}, @code{or()} and @code{xor()} functions +changed to allow any number of arguments, +with a minimum of two +(@pxref{Bitwise Functions}). + +@item +The dynamic extension interface was completely redone +(@pxref{Dynamic Extensions}). 
+ +@end itemize + +@c XXX ADD MORE STUFF HERE + @node Common Extensions @appendixsec Common Extensions Summary @@ -32664,18 +34015,18 @@ the three most widely-used freely available versions of @command{awk} @multitable {@file{/dev/stderr} special file} {BWK Awk} {Mawk} {GNU Awk} @headitem Feature @tab BWK Awk @tab Mawk @tab GNU Awk @item @samp{\x} Escape sequence @tab X @tab X @tab X -@item @code{RS} as regexp @tab @tab X @tab X @item @code{FS} as null string @tab X @tab X @tab X -@item @file{/dev/stdin} special file @tab X @tab @tab X +@item @file{/dev/stdin} special file @tab X @tab X @tab X @item @file{/dev/stdout} special file @tab X @tab X @tab X @item @file{/dev/stderr} special file @tab X @tab X @tab X -@item @code{**} and @code{**=} operators @tab X @tab @tab X +@item @code{delete} without subscript @tab X @tab X @tab X @item @code{fflush()} function @tab X @tab X @tab X -@item @code{func} keyword @tab X @tab @tab X +@item @code{length()} of an array @tab X @tab X @tab X @item @code{nextfile} statement @tab X @tab X @tab X -@item @code{delete} without subscript @tab X @tab X @tab X -@item @code{length()} of an array @tab X @tab @tab X +@item @code{**} and @code{**=} operators @tab X @tab @tab X +@item @code{func} keyword @tab X @tab @tab X @item @code{BINMODE} variable @tab @tab X @tab X +@item @code{RS} as regexp @tab @tab X @tab X @item Time related functions @tab @tab X @tab X @end multitable @@ -32695,7 +34046,7 @@ character ranges (such as @samp{[a-z]}) to match any character between the first character in the range and the last character in the range, inclusive. Ordering was based on the numeric value of each character in the machine's native character set. Thus, on ASCII-based systems, -@code{[a-z]} matched all the lowercase letters, and only the lowercase +@samp{[a-z]} matched all the lowercase letters, and only the lowercase letters, since the numeric values for the letters from @samp{a} through @samp{z} were contiguous. 
(On an EBCDIC system, the range @samp{[a-z]} includes additional, non-alphabetic characters as well.) @@ -32706,7 +34057,7 @@ as working in this fashion, and in particular, would teach that the that @samp{[A-Z]} was the ``correct'' way to match uppercase letters. And indeed, this was true.@footnote{And Life was good.} -The 1993 POSIX standard introduced the idea of locales (@pxref{Locales}). +The 1992 POSIX standard introduced the idea of locales (@pxref{Locales}). Since many locales include other letters besides the plain twenty-six letters of the American English alphabet, the POSIX standard added character classes (@pxref{Bracket Expressions}) as a way to match @@ -32745,6 +34096,7 @@ This output is unexpected, since the @samp{bc} at the end of This result is due to the locale setting (and thus you may not see it on your system). +@cindex Unicode Similar considerations apply to other ranges. For example, @samp{["-/]} is perfectly valid in ASCII, but is not valid in many Unicode locales, such as @samp{en_US.UTF-8}. @@ -32756,18 +34108,19 @@ When @command{gawk} switched to using locale-aware regexp matchers, the problems began; especially as both GNU/Linux and commercial Unix vendors started implementing non-ASCII locales, @emph{and making them the default}. Perhaps the most frequently asked question became something -like ``why does @code{[A-Z]} match lowercase letters?!?'' +like ``why does @samp{[A-Z]} match lowercase letters?!?'' +@cindex Berry, Karl This situation existed for close to 10 years, if not more, and the @command{gawk} maintainer grew weary of trying to explain that @command{gawk} was being nicely standards-compliant, and that the issue was in the user's locale. During the development of version 4.0, he modified @command{gawk} to always treat ranges in the original, pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).@footnote{And -thus was born the Campain for Rational Range Interpretation (or RRI). 
A number -of GNU tools, such as @command{grep} and @command{sed}, have either -implemented this change, or will soon. Thanks to Karl Berry for coining the phrase -``Rational Range Interpretation.''} +thus was born the Campaign for Rational Range Interpretation (or +RRI). A number of GNU tools have either implemented this change, +or will soon. Thanks to Karl Berry for coining the phrase ``Rational +Range Interpretation.''} Fortunately, shortly before the final release of @command{gawk} 4.0, the maintainer learned that the 2008 standard had changed the @@ -32780,15 +34133,15 @@ and By using this lovely technical term, the standard gives license to implementors to implement ranges in whatever way they choose. The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all -cases: the default regexp matching; with @option{--traditional}, and with +cases: the default regexp matching; with @option{--traditional} and with @option{--posix}; in all cases, @command{gawk} remains POSIX compliant. @node Contributors @appendixsec Major Contributors to @command{gawk} @cindex @command{gawk}, list of contributors to @quotation -@i{Always give credit where credit is due.}@* -Anonymous +@i{Always give credit where credit is due.} +@author Anonymous @end quotation This @value{SECTION} names the major contributors to @command{gawk} @@ -32976,6 +34329,15 @@ environments. (This is no longer supported) @item +@cindex Wallin, Anders +Anders Wallin helped keep the VMS port going for several years. + +@item +@cindex Gordon, Assaf +Assaf Gordon contributed the code to implement the +@option{--sandbox} option. + +@item @cindex Haque, John John Haque made the following contributions: @@ -32985,6 +34347,10 @@ The modifications to convert @command{gawk} into a byte-code interpreter, including the debugger. @item +The addition of true multidimensional arrays. +@ref{Arrays of Arrays}. + +@item The additional modifications for support of arbitrary precision arithmetic. 
@item @@ -32997,6 +34363,10 @@ into one, for the 4.1 release. @item Improved array internals for arrays indexed by integers. + +@item +The improved array sorting features were driven by John together +with Pat Rankin. @end itemize @item @@ -33011,6 +34381,11 @@ Arnold Robbins and Andrew Schorr, with notable contributions from the rest of the development team. @item +@cindex Colombo, Antonio +Antonio Giovanni Colombo rewrote a number of examples in the early +chapters that were severely dated, for which I am incredibly grateful. + +@item @cindex Robbins, Arnold Arnold Robbins has been working on @command{gawk} since 1988, at first @@ -33021,7 +34396,7 @@ helping David Trueman, and as the primary maintainer since around 1994. @appendix Installing @command{gawk} @c last two commas are part of see also -@cindex operating systems, See Also GNU/Linux, PC operating systems, Unix +@cindex operating systems, See Also GNU/Linux@comma{} PC operating systems@comma{} Unix @c STARTOFRANGE gligawk @cindex @command{gawk}, installing @c STARTOFRANGE ingawk @@ -33118,7 +34493,7 @@ Extracting the archive creates a directory named @file{gawk-@value{VERSION}.@value{PATCHLEVEL}} in the current directory. -The distribution @value{FN} is of the form +The distribution file name is of the form @file{gawk-@var{V}.@var{R}.@var{P}.tar.gz}. The @var{V} represents the major version of @command{gawk}, the @var{R} represents the current release of version @var{V}, and @@ -33150,6 +34525,13 @@ The actual @command{gawk} source code. @end table @table @file +@item ABOUT-NLS +Information about GNU @command{gettext} and translations. + +@item AUTHORS +A file with some information about the authorship of @command{gawk}. +It exists only to satisfy the pedants at the Free Software Foundation. + @item README @itemx README_d/README.* Descriptive files: @file{README} for @command{gawk} under Unix and the @@ -33173,16 +34555,6 @@ An older list of changes to @command{gawk}. 
@item COPYING The GNU General Public License. -@item FUTURES -A brief list of features and changes being contemplated for future -releases, with some indication of the time frame for the feature, based -on its difficulty. - -@item LIMITATIONS -A list of those factors that limit @command{gawk}'s performance. -Most of these depend on the hardware or operating system software and -are not limits in @command{gawk} itself. - @item POSIX.STD A description of behaviors in the POSIX standard for @command{awk} which are left undefined, or where @command{gawk} may not comply fully, as well @@ -33215,12 +34587,19 @@ The @command{troff} source for a manual page describing @command{gawk}. This is distributed for the convenience of Unix users. @cindex Texinfo -@item doc/gawk.texi +@item doc/gawktexi.in +@itemx doc/sidebar.awk The Texinfo source file for this @value{DOCUMENT}. -It should be processed with @TeX{} -(via @command{texi2dvi} or @command{texi2pdf}) +It should be processed by @file{doc/sidebar.awk} +before processing with @command{texi2dvi} or @command{texi2pdf} to produce a printed document, and with @command{makeinfo} to produce an Info or HTML file. +The @file{Makefile} takes care of this processing and produces +printable output via @command{texi2dvi} or @command{texi2pdf}. + +@item doc/gawk.texi +The file produced after processing @file{gawktexi.in} +with @file{sidebar.awk}. @item doc/gawk.info The generated Info file for this @value{DOCUMENT}. @@ -33259,15 +34638,21 @@ the @file{Makefile.in} files used by @command{autoconf} and @item Makefile.in @itemx aclocal.m4 +@itemx bisonfix.awk +@itemx config.guess @itemx configh.in @itemx configure.ac @itemx configure @itemx custom.h +@itemx depcomp +@itemx install-sh @itemx missing_d/* +@itemx mkinstalldirs @itemx m4/* -These files and subdirectories are used when configuring @command{gawk} -for various Unix systems. They are explained in -@ref{Unix Installation}. 
+These files and subdirectories are used when configuring and compiling +@command{gawk} for various Unix systems. Most of them are explained +in @ref{Unix Installation}. The rest are there to support the main +infrastructure. @item po/* The @file{po} library contains message translations. @@ -33291,6 +34676,11 @@ They are installed as part of the installation process. The rest of the programs in this @value{DOCUMENT} are available in appropriate subdirectories of @file{awklib/eg}. +@item extension/* +The source code, manual pages, and infrastructure files for +the sample extensions included with @command{gawk}. +@xref{Dynamic Extensions}, for more information. + @item posix/* Files needed for building @command{gawk} on POSIX-compliant systems. @@ -33394,7 +34784,7 @@ please send in a bug report (@pxref{Bugs}). Of course, once you've built @command{gawk}, it is likely that you will wish to install it. To do so, you need to run the command @samp{make -check}, as a user with the appropriate permissions. How to do this +install}, as a user with the appropriate permissions. How to do this varies by system, but on many systems you can use the @command{sudo} command to do so. The command then becomes @samp{sudo make install}. It is likely that you will be asked for your password, and you will have @@ -33411,7 +34801,15 @@ command line when compiling @command{gawk} from scratch, including: @table @code -@cindex @code{--disable-lint} configuration option +@cindex @option{--disable-extensions} configuration option +@cindex configuration option, @code{--disable-extensions} +@item --disable-extensions +Disable configuring and building the sample extensions in the +@file{extension} directory. This is useful for cross-compiling. +The default action is to dynamically check if the extensions +can be configured and compiled. 
+ +@cindex @option{--disable-lint} configuration option @cindex configuration option, @code{--disable-lint} @item --disable-lint Disable all lint checking within @code{gawk}. The @@ -33431,14 +34829,14 @@ Using this option may bring you some slight performance improvement. Using this option will cause some of the tests in the test suite to fail. This option may be removed at a later date. -@cindex @code{--disable-nls} configuration option +@cindex @option{--disable-nls} configuration option @cindex configuration option, @code{--disable-nls} @item --disable-nls Disable all message-translation facilities. This is usually not desirable, but it may bring you some slight performance improvement. -@cindex @code{--with-whiny-user-strftime} configuration option +@cindex @option{--with-whiny-user-strftime} configuration option @cindex configuration option, @code{--with-whiny-user-strftime} @item --with-whiny-user-strftime Force use of the included version of the @code{strftime()} @@ -33712,11 +35110,10 @@ multibyte functionality is not available. @c STARTOFRANGE pcgawon @cindex PC operating systems, @command{gawk} on -With the exception of the Cygwin environment, -the @samp{|&} operator and TCP/IP networking -(@pxref{TCP/IP Networking}) -are not supported for MS-DOS or MS-Windows. EMX (OS/2 only) does support -at least the @samp{|&} operator. +Under MS-DOS and MS-Windows, the Cygwin and MinGW environments support +both the @samp{|&} operator and TCP/IP networking +(@pxref{TCP/IP Networking}). +EMX (OS/2 only) supports at least the @samp{|&} operator. @cindex search paths @cindex search paths, for source files @@ -33846,7 +35243,7 @@ moved into the @code{BEGIN} rule. @command{gawk} can be built and used ``out of the box'' under MS-Windows if you are using the @uref{http://www.cygwin.com, Cygwin environment}. 
-This environment provides an excellent simulation of Unix, using the +This environment provides an excellent simulation of GNU/Linux, using the GNU tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make, and other GNU programs. Compilation and installation for Cygwin is the same as for a Unix system: @@ -33862,13 +35259,6 @@ When compared to GNU/Linux on the same system, the @samp{configure} step on Cygwin takes considerably longer. However, it does finish, and then the @samp{make} proceeds as usual. -@quotation NOTE -The @samp{|&} operator and TCP/IP networking -(@pxref{TCP/IP Networking}) -are fully supported in the Cygwin environment. This is not true -for any other environment on MS-Windows. -@end quotation - @node MSYS @appendixsubsubsec Using @command{gawk} In The MSYS Environment @@ -33894,8 +35284,11 @@ The older designation ``VMS'' is used throughout to refer to OpenVMS. @menu * VMS Compilation:: How to compile @command{gawk} under VMS. +* VMS Dynamic Extensions:: Compiling @command{gawk} dynamic extensions on + VMS. * VMS Installation Details:: How to install @command{gawk} under VMS. * VMS Running:: How to run @command{gawk} under VMS. +* VMS GNV:: The VMS GNV Project. * VMS Old Gawk:: An old version comes with some VMS systems. @end menu @@ -33903,41 +35296,110 @@ The older designation ``VMS'' is used throughout to refer to OpenVMS. @appendixsubsubsec Compiling @command{gawk} on VMS @cindex compiling @command{gawk} for VMS -To compile @command{gawk} under VMS, there is a @code{DCL} command procedure that -issues all the necessary @code{CC} and @code{LINK} commands. There is -also a @file{Makefile} for use with the @code{MMS} utility. From the source -directory, use either: +To compile @command{gawk} under VMS, there is a @code{DCL} command procedure +that issues all the necessary @code{CC} and @code{LINK} commands. There is +also a @file{Makefile} for use with the @code{MMS} and @code{MMK} utilities. 
+From the source directory, use either: + +@example +$ @kbd{@@[.vms]vmsbuild.com} +@end example + +@noindent +or: + +@example +$ @kbd{MMS/DESCRIPTION=[.vms]descrip.mms gawk} +@end example + +@noindent +or: + +@example +$ @kbd{MMK/DESCRIPTION=[.vms]descrip.mms gawk} +@end example + +@code{MMK} is an open source, free, near-clone of @code{MMS} and +can better handle @code{ODS-5} volumes with upper- and lowercase filenames. +@code{MMK} is available from @uref{https://github.com/endlesssoftware/mmk}. + +With @code{ODS-5} volumes and extended parsing enabled, the case of the target +parameter may need to be exact. + +@command{gawk} has been tested under VAX/VMS 7.3 and Alpha/VMS 7.3-1 +using Compaq C V6.4, and Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3. +The most recent builds used HP C V7.3 on Alpha VMS 8.3 and both +Alpha and IA64 VMS 8.4 used HP C 7.3.@footnote{The IA64 architecture +is also known as ``Itanium.''} + +The @file{[.vms]gawk_build_steps.txt} provides information on how to build +@command{gawk} into a PCSI kit that is compatible with the GNV product. + +@node VMS Dynamic Extensions +@appendixsubsubsec Compiling @command{gawk} Dynamic Extensions on VMS + +The extensions that have been ported to VMS can be built using one of +the following commands. @example -$ @kbd{@@[.VMS]VMSBUILD.COM} +$ @kbd{MMS/DESCRIPTION=[.vms]descrip.mms extensions} @end example @noindent or: @example -$ @kbd{MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK} +$ @kbd{MMK/DESCRIPTION=[.vms]descrip.mms extensions} @end example -Older versions of @command{gawk} could be built with VAX C or -GNU C on VAX/VMS, as well as with DEC C, but that is no longer -supported. DEC C (also briefly known as ``Compaq C'' and now known -as ``HP C,'' but referred to here as ``DEC C'') is required. Both -@code{VMSBUILD.COM} and @code{DESCRIP.MMS} contain some obsolete support -for the older compilers but are set up to use DEC C by default. 
+@command{gawk} uses @code{AWKLIBPATH} as either an environment variable +or a logical name to find the dynamic extensions. + +Dynamic extensions need to be compiled with the same compiler options for +floating point, pointer size, and symbol name handling as were used +to compile @command{gawk} itself. +Alpha and Itanium should use IEEE floating point. The pointer size is 32 bits, +and the symbol name handling should be exact case with CRC shortening for +symbols longer than 31 characters. + +For Alpha and Itanium: -@command{gawk} has been tested under Alpha/VMS 7.3-1 using Compaq C V6.4, -and on Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3.@footnote{The IA64 -architecture is also known as ``Itanium.''} +@example +/name=(as_is,short) +/float=ieee/ieee_mode=denorm_results +@end example + +For VAX: + +@example +/name=(as_is,short) +@end example + +Compile time macros need to be defined before the first VMS-supplied +header file is included. + +@example +#if (__CRTL_VER >= 70200000) && !defined (__VAX) +#define _LARGEFILE 1 +#endif + +#ifndef __VAX +#ifdef __CRTL_VER +#if __CRTL_VER >= 80200000 +#define _USE_STD_STAT 1 +#endif +#endif +#endif +@end example @node VMS Installation Details @appendixsubsubsec Installing @command{gawk} on VMS -To install @command{gawk}, all you need is a ``foreign'' command, which is -a @code{DCL} symbol whose value begins with a dollar sign. For example: +To use @command{gawk}, all you need is a ``foreign'' command, which is a +@code{DCL} symbol whose value begins with a dollar sign. For example: @example -$ @kbd{GAWK :== $disk1:[gnubin]GAWK} +$ @kbd{GAWK :== $disk1:[gnubin]gawk} @end example @noindent @@ -33949,10 +35411,29 @@ Alternatively, the symbol may be placed in the system-wide @file{sylogin.com} procedure, which allows all users to run @command{gawk}.
-Optionally, the help entry can be loaded into a VMS help library: +If your @command{gawk} was installed by a PCSI kit into the +@file{GNV$GNU:} directory tree, the program will be known as +@file{GNV$GNU:[bin]gnv$gawk.exe} and the help file will be +@file{GNV$GNU:[vms_help]gawk.hlp}. + +The PCSI kit also installs a @file{GNV$GNU:[vms_bin]gawk_verb.cld} file +which can be used to add @command{gawk} and @command{awk} as DCL commands. + +For just the current process you can use: + +@example +$ @kbd{set command gnv$gnu:[vms_bin]gawk_verb.cld} +@end example + +Or the system manager can use @file{GNV$GNU:[vms_bin]gawk_verb.cld} to +add the @command{gawk} and @command{awk} commands to the system-wide @samp{DCLTABLES}. + +The DCL syntax is documented in the @file{gawk.hlp} file. + +Optionally, the @file{gawk.hlp} entry can be loaded into a VMS help library: @example -$ @kbd{LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP} +$ @kbd{LIBRARY/HELP sys$help:helplib [.vms]gawk.hlp} @end example @noindent @@ -33970,7 +35451,7 @@ provides information about both the @command{gawk} implementation and the The logical name @samp{AWK_LIBRARY} can designate a default location for @command{awk} program files. For the @option{-f} option, if the specified -@value{FN} has no device or directory path information in it, @command{gawk} +file name has no device or directory path information in it, @command{gawk} looks in the current directory first, then in the directory specified by the translation of @samp{AWK_LIBRARY} if the file is not found. If, after searching in both directories, the file still is not found, @@ -34003,9 +35484,42 @@ One side effect of dual command-line parsing is that if there is only a single parameter (as in the quoted string program above), the command becomes ambiguous. To work around this, the normally optional @option{--} flag is required to force Unix-style parsing rather than @code{DCL} parsing.
If any -other dash-type options (or multiple parameters such as @value{DF}s to +other dash-type options (or multiple parameters such as data files to process) are present, there is no ambiguity and @option{--} can be omitted. +@cindex exit status, of VMS +The @code{exit} value is a Unix-style value and is encoded to a VMS exit +status value when the program exits. + +The VMS severity bits will be set based on the @code{exit} value. +A failure is indicated by 1 and VMS sets the @code{ERROR} status. +A fatal error is indicated by 2 and VMS will set the @code{FATAL} status. +All other values will have the @code{SUCCESS} status. The exit value is +encoded to comply with VMS coding standards and will have the +@code{C_FACILITY_NO} of @code{0x350000} with the constant @code{0xA000} +added to the number shifted over by 3 bits to make room for the severity codes. + +To extract the actual @command{gawk} exit code from the VMS status use: + +@example +unix_status = (vms_status .and. &x7f8) / 8 +@end example + +@noindent +A C program that uses @code{exec()} to call @command{gawk} will get the original +Unix-style exit value. + +Older versions of @command{gawk} treated a Unix exit code 0 as 1, a failure +as 2, a fatal error as 4, and passed all the other numbers through. +This violated the VMS exit status coding requirements. + +@cindex floating-point, VAX/VMS +VAX/VMS floating point uses unbiased rounding. @xref{Round Function}. + +VMS reports time values in GMT unless one of the @code{SYS$TIMEZONE_RULE} +or @code{TZ} logical names is set. Older versions of VMS, such as VAX/VMS +7.3 do not set these logical names. + @c @cindex directory search @c @cindex path, search @cindex search paths @@ -34017,6 +35531,21 @@ of @env{AWKPATH} is a comma-separated list of directory specifications. When defining it, the value should be quoted so that it retains a single translation and not a multitranslation @code{RMS} searchlist. 
+@node VMS GNV +@appendixsubsubsec The VMS GNV Project + +The VMS GNV package provides a build environment similar to POSIX with ports +of a collection of open source tools. The @command{gawk} found in the GNV +base kit is an older port. Currently the GNV project is being reorganized +to supply individual PCSI packages for each component. +See @uref{https://sourceforge.net/p/gnv/wiki/InstallingGNVPackages/}. + +The normal build procedure for @command{gawk} produces a program that +is suitable for use with GNV. + +The @file{vms/gawk_build_steps.txt} in the source documents the procedure +for building a VMS PCSI kit that is compatible with GNV. + @ignore @c The VMS POSIX product, also known as POSIX for OpenVMS, is long defunct @c and building gawk for it has not been tested in many years, but these @@ -34064,7 +35593,7 @@ define a symbol, as follows: $ @kbd{gawk :== $sys$common:[syshlp.examples.tcpip.snmp]gawk.exe} @end example -This is apparently @value{PVERSION} 2.15.6, which is extremely old. We +This is apparently version 2.15.6, which is extremely old. We recommend compiling and using the current version. @c ENDOFRANGE opgawx @@ -34074,8 +35603,8 @@ recommend compiling and using the current version. @appendixsec Reporting Problems and Bugs @cindex archeologists @quotation -@i{There is nothing more dangerous than a bored archeologist.}@* -The Hitchhiker's Guide to the Galaxy +@i{There is nothing more dangerous than a bored archeologist.} +@author The Hitchhiker's Guide to the Galaxy @end quotation @c the radio show, not the book. :-) @@ -34093,8 +35622,8 @@ what you're trying to do. If it's not clear whether you should be able to do something or not, report that too; it's a bug in the documentation! Before reporting a bug or trying to fix it yourself, try to isolate it -to the smallest possible @command{awk} program and input @value{DF} that -reproduces the problem. 
Then send us the program and @value{DF}, +to the smallest possible @command{awk} program and input data file that +reproduces the problem. Then send us the program and data file, some idea of what kind of Unix system you're using, the compiler you used to compile @command{gawk}, and the exact results @command{gawk} gave you. Also say what you expected to occur; this helps @@ -34148,32 +35677,37 @@ mail at the Internet address noted previously. If you find bugs in one of the non-Unix ports of @command{gawk}, please send an electronic mail message to the person who maintains that port. They -are named in the following list, as well as in the @file{README} file in the @command{gawk} -distribution. Information in the @file{README} file should be considered -authoritative if it conflicts with this @value{DOCUMENT}. +are named in the following list, as well as in the @file{README} file +in the @command{gawk} distribution. Information in the @file{README} +file should be considered authoritative if it conflicts with this +@value{DOCUMENT}. The people maintaining the non-Unix ports of @command{gawk} are as follows: -@multitable {MS-Windows with MINGW} {123456789012345678901234567890123456789001234567890} +@c put the index entries outside the table, for docbook @cindex Deifik, Scott +@cindex Zaretskii, Eli +@cindex Buening, Andreas +@cindex Rankin, Pat +@cindex Malmberg, John +@cindex Pitts, Dave +@multitable {MS-Windows with MINGW} {123456789012345678901234567890123456789001234567890} @item MS-DOS with DJGPP @tab Scott Deifik, @EMAIL{scottd.mail@@sbcglobal.net,scottd dot mail at sbcglobal dot net}. -@cindex Zaretskii, Eli @item MS-Windows with MINGW @tab Eli Zaretskii, @EMAIL{eliz@@gnu.org,eliz at gnu dot org}. -@cindex Buening, Andreas @item OS/2 @tab Andreas Buening, @EMAIL{andreas.buening@@nexgo.de,andreas dot buening at nexgo dot de}. 
-@cindex Rankin, Pat -@item VMS @tab Pat Rankin, @EMAIL{r.pat.rankin@@gmail.com,r.pat.rankin at gmail.com} +@item VMS @tab Pat Rankin, @EMAIL{r.pat.rankin@@gmail.com,r.pat.rankin at gmail.com}, and +John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net}. -@cindex Pitts, Dave @item z/OS (OS/390) @tab Dave Pitts, @EMAIL{dpitts@@cozx.com,dpitts at cozx dot com}. @end multitable If your bug is also reproducible under Unix, please send a copy of your -report to the @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org} email list as well. +report to the @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org} email +list as well. @c ENDOFRANGE dbugg @c ENDOFRANGE tblgawb @@ -34191,8 +35725,8 @@ Date: Wed, 4 Sep 1996 08:11:48 -0700 (PDT) @cindex Brennan, Michael @quotation @i{It's kind of fun to put comments like this in your awk code.}@* -@ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course}@* -Michael Brennan +@ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course} +@author Michael Brennan @end quotation There are a number of other freely available @command{awk} implementations. @@ -34202,7 +35736,7 @@ This @value{SECTION} briefly describes where to get them: @cindex Kernighan, Brian @cindex source code, Brian Kernighan's @command{awk} @cindex @command{awk}, versions of, See Also Brian Kernighan's @command{awk} -@cindex Brian Kernighan's @command{awk} +@cindex Brian Kernighan's @command{awk}, source code @item Unix @command{awk} Brian Kernighan, one of the original designers of Unix @command{awk}, has made his implementation of @@ -34222,6 +35756,7 @@ It is available in several archive formats: @uref{http://www.cs.princeton.edu/~bwk/btl.mirror/awk.zip} @end table +@cindex @command{git} utility You can also retrieve it from Git Hub: @example @@ -34234,16 +35769,14 @@ repository in a directory named @file{bwkawk}. If you leave that argument off the @command{git} command line, the repository copy is created in a directory named @file{awk}. 
-This version requires an ISO C (1990 standard) compiler; -the C compiler from -GCC (the GNU Compiler Collection) -works quite nicely. +This version requires an ISO C (1990 standard) compiler; the C compiler +from GCC (the GNU Compiler Collection) works quite nicely. @xref{Common Extensions}, for a list of extensions in this @command{awk} that are not in POSIX @command{awk}. @cindex Brennan, Michael -@cindex @command{mawk} program +@cindex @command{mawk} utility @cindex source code, @command{mawk} @item @command{mawk} Michael Brennan wrote an independent implementation of @command{awk}, @@ -34289,7 +35822,7 @@ To get @command{awka}, go to @url{http://sourceforge.net/projects/awka}. The project seems to be frozen; no new code changes have been made since approximately 2003. -@cindex Beebe, Nelson +@cindex Beebe, Nelson H.F.@: @cindex @command{pawk} (profiling version of Brian Kernighan's @command{awk}) @cindex source code, @command{pawk} @item @command{pawk} @@ -34318,15 +35851,22 @@ information, see the @uref{http://busybox.net, project's home page}. @cindex source code, Solaris @command{awk} @item The OpenSolaris POSIX @command{awk} The version of @command{awk} in @file{/usr/xpg4/bin} on Solaris is -more-or-less -POSIX-compliant. It is based on the @command{awk} from Mortice Kern -Systems for PCs. The source code can be downloaded from -the @uref{http://www.opensolaris.org, OpenSolaris web site}. +more-or-less POSIX-compliant. It is based on the @command{awk} from +Mortice Kern Systems for PCs. This author was able to make it compile and work under GNU/Linux with 1--2 hours of work. Making it more generally portable (using GNU Autoconf and/or Automake) would take more work, and this has not been done, at least to our knowledge. +@cindex Illumos +@cindex Illumos, POSIX-compliant @command{awk} +@cindex source code, Illumos @command{awk} +The source code used to be available from the OpenSolaris web site. +However, that project was ended and the web site shut down. 
Fortunately, the +@uref{http://wiki.illumos.org/display/illumos/illumos+Home, Illumos project} +makes this implementation available. You can view the files one at a time from +@uref{https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/awk_xpg4}. + @cindex @command{jawk} @cindex Java implementation of @command{awk} @cindex source code, @command{jawk} @@ -34345,6 +35885,7 @@ This is an embeddable @command{awk} interpreter derived from @uref{http://repo.hu/projects/libmawk/}. @item @code{pawk} +@cindex source code, @command{pawk} (Python version) @cindex @code{pawk}, @command{awk}-like facilities for Python This is a Python module that claims to bring @command{awk}-like features to Python. See @uref{https://github.com/alecthomas/pawk} @@ -34367,6 +35908,10 @@ under the GPL. It has a large number of extensions over standard See @uref{http://www.quiktrim.org/QTawk.html} for more information, including the manual and a download link. +@item Other Versions +See also the @uref{http://en.wikipedia.org/wiki/Awk_language#Versions_and_implementations, +Wikipedia article}, for information on additional versions. + @end table @c ENDOFRANGE gligawk @c ENDOFRANGE ingawk @@ -34446,6 +35991,7 @@ As @command{gawk} is Free Software, the source code is always available. @ref{Gawk Distribution}, describes how to get and build the formal, released versions of @command{gawk}. +@cindex @command{git} utility However, if you want to modify @command{gawk} and contribute back your changes, you will probably wish to work with the development version. To do so, you will need to access the @command{gawk} source code @@ -34517,7 +36063,7 @@ for information on getting the latest version of @command{gawk}.) @item @ifnotinfo -Follow the @cite{GNU Coding Standards}. +Follow the @uref{http://www.gnu.org/prep/standards/, @cite{GNU Coding Standards}}. @end ifnotinfo @ifinfo See @inforef{Top, , Version, standards, GNU Coding Standards}. 
@@ -34621,6 +36167,7 @@ If possible, please update the @command{man} page as well. You will also have to sign paperwork for your documentation changes. +@cindex @command{git} utility @item Submit changes as unified diffs. Use @samp{diff -u -r -N} to compare @@ -34676,11 +36223,9 @@ Be prepared to sign the appropriate paperwork. In order for the FSF to distribute your code, you must either place your code in the public domain and submit a signed statement to that effect, or assign the copyright in your code to the FSF. -@ifinfo Both of these actions are easy to do and @emph{many} people have done so already. If you have questions, please contact me, or @email{gnu@@gnu.org}. -@end ifinfo @item When doing a port, bear in mind that your code must coexist peacefully @@ -34756,6 +36301,8 @@ coding style and brace layout that suits your taste. @node Derived Files @appendixsubsec Why Generated Files Are Kept In @command{git} +@c STARTOFRANGE gawkgit +@cindex @command{git}, use of for @command{gawk} source code @c From emails written March 22, 2012, to the gawk developers list. If you look at the @command{gawk} source in the @command{git} @@ -34935,7 +36482,7 @@ wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.ta @noindent to retrieve a snapshot of the given branch. - +@c ENDOFRANGE gawkgit @node Future Extensions @appendixsec Probable Future Extensions @@ -34977,11 +36524,13 @@ Larry @cindex Wall, Larry @cindex Robbins, Arnold @quotation -@i{AWK is a language similar to PERL, only considerably more elegant.}@* -Arnold Robbins +@i{AWK is a language similar to PERL, only considerably more elegant.} +@author Arnold Robbins +@end quotation -@i{Hey!}@* -Larry Wall +@quotation +@i{Hey!} +@author Larry Wall @end quotation The @file{TODO} file in the @command{gawk} Git repository lists possible @@ -35113,7 +36662,7 @@ in order to loop over all the element in an easy fashion for C code. 
@item The ability to create arrays (including @command{gawk}'s true -multi-dimensional arrays). +multidimensional arrays). @end itemize @end itemize @@ -35246,11 +36795,11 @@ to any of the above. @ref{Dynamic Extensions}, describes the supported API and mechanisms for writing extensions for @command{gawk}. This API was introduced -in @value{PVERSION} 4.1. However, for many years @command{gawk} +in version 4.1. However, for many years @command{gawk} provided an extension mechanism that required knowledge of @command{gawk} internals and that was not as well designed. -In order to provide a transition period, @command{gawk} @value{PVERSION} +In order to provide a transition period, @command{gawk} version 4.1 continues to support the original extension mechanism. This will be true for the life of exactly one major release. This support will be withdrawn, and removed from the source code, at the next major @@ -35304,8 +36853,15 @@ other introductory texts that you should refer to instead.) @cindex processing data At the most basic level, the job of a program is to process -some input data and produce results. See @ref{figure-general-flow}. +some input data and produce results. +@ifnotdocbook +See @ref{figure-general-flow}. +@end ifnotdocbook +@ifdocbook +See @inlineraw{docbook, <xref linkend="figure-general-flow"/>}. +@end ifdocbook +@ifnotdocbook @float Figure,figure-general-flow @caption{General Program Flow} @ifinfo @@ -35315,6 +36871,14 @@ some input data and produce results. See @ref{figure-general-flow}. @center @image{general-program, , , General program flow} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="figure-general-flow"> +<title>General Program Flow</title> +<graphic fileref="general-program.eps"/> +</figure> +@end docbook @cindex compiled programs @cindex interpreted programs @@ -35330,9 +36894,15 @@ instructions in your program to process the data. 
@cindex programming, basic steps When you write a program, it usually consists -of the following, very basic set of steps, as shown -in @ref{figure-process-flow}: +of the following, very basic set of steps, +@ifnotdocbook +as shown in @ref{figure-process-flow}: +@end ifnotdocbook +@ifdocbook +as shown in @inlineraw{docbook <xref linkend="figure-process-flow"/>}: +@end ifdocbook +@ifnotdocbook @float Figure,figure-process-flow @caption{Basic Program Steps} @ifinfo @@ -35342,6 +36912,14 @@ in @ref{figure-process-flow}: @center @image{process-flow, , , Basic Program Stages} @end ifnotinfo @end float +@end ifnotdocbook + +@docbook +<figure id="figure-process-flow"> +<title>Basic Program Stages</title> +<graphic fileref="process-flow.eps"/> +</figure> +@end docbook @table @asis @item Initialization @@ -35512,7 +37090,7 @@ better written in another language. You can get it from @uref{http://awk.info/?awk100/aaa}. @cindex Ada programming language -@cindex Programming languages, Ada +@cindex programming languages, Ada @item Ada A programming language originally defined by the U.S.@: Department of Defense for embedded programming. It was designed to enforce good @@ -35580,9 +37158,6 @@ The GNU version of the standard shell @end ifinfo See also ``Bourne Shell.'' -@item BBS -See ``Bulletin Board System.'' - @item Bit Short for ``Binary Digit.'' All values in computer memory ultimately reduce to binary digits: values @@ -35657,11 +37232,6 @@ Changing some of them affects @command{awk}'s running environment. @item Braces See ``Curly Braces.'' -@item Bulletin Board System -A computer system allowing users to log in and read and/or leave messages -for other users of the system, much like leaving paper notes on a bulletin -board. - @item C The system programming language that most GNU software is written in. 
The @command{awk} programming language has C-like syntax, and this @value{DOCUMENT} @@ -35688,6 +37258,8 @@ The @uref{http://www.unicode.org, Unicode character set} is becoming increasingly popular and standard, and is particularly widely used on GNU/Linux systems. +@cindex Kernighan, Brian +@cindex Bentley, Jon @cindex @command{chem} utility @item CHEM A preprocessor for @command{pic} that reads descriptions of molecules @@ -35824,7 +37396,7 @@ ordinary expression. It could be a string constant, such as (@xref{Computed Regexps}.) @item Environment -A collection of strings, of the form @var{name@code{=}val}, that each +A collection of strings, of the form @var{name}@code{=}@code{val}, that each program has available to it. Users generally place values into the environment in order to provide information to various programs. Typical examples are the environment variables @env{HOME} and @env{PATH}. @@ -35993,7 +37565,7 @@ information about the name of the organization and its language-independent three-letter acronym. @cindex Java programming language -@cindex Programming languages, Java +@cindex programming languages, Java @item Java A modern programming language originally developed by Sun Microsystems (now Oracle) supporting Object-Oriented programming. Although usually @@ -36218,7 +37790,7 @@ numeric values. It is the C type @code{float}. The character generated by hitting the space bar on the keyboard. @item Special File -A @value{FN} interpreted internally by @command{gawk}, instead of being handed +A file name interpreted internally by @command{gawk}, instead of being handed directly to the underlying operating system---for example, @file{/dev/stderr}. (@xref{Special Files}.) @@ -36280,7 +37852,12 @@ record or a string. @c The GNU General Public License. 
@node Copying @unnumbered GNU General Public License +@ifnotdocbook @center Version 3, 29 June 2007 +@end ifnotdocbook +@docbook +<subtitle>Version 3, 29 June 2007</subtitle> +@end docbook @c This file is intended to be included within another document, @c hence no sectioning command or @node. @@ -37005,10 +38582,17 @@ first, please read @url{http://www.gnu.org/philosophy/why-not-lgpl.html}. @c The GNU Free Documentation License. @node GNU Free Documentation License @unnumbered GNU Free Documentation License +@ifnotdocbook +@center Version 1.3, 3 November 2008 +@end ifnotdocbook + +@docbook +<subtitle>Version 1.3, 3 November 2008</subtitle> +@end docbook + @cindex FDL (Free Documentation License) @cindex Free Documentation License (FDL) @cindex GNU Free Documentation License -@center Version 1.3, 3 November 2008 @c This file is intended to be included within another document, @c hence no sectioning command or @node. @@ -37513,8 +39097,10 @@ to permit their use in free software. @c ispell-local-pdict: "ispell-dict" @c End: +@ifnotdocbook @node Index @unnumbered Index +@end ifnotdocbook @printindex cp @bye @@ -37599,6 +39185,7 @@ Consistency issues: Use MS-Windows not MS Windows Use MS-DOS not MS-DOS Use an empty set of parentheses after built-in and awk function names. + Use "multiFOO" without a hyphen. Date: Wed, 13 Apr 94 15:20:52 -0400 From: rms@gnu.org (Richard Stallman) @@ -37624,8 +39211,6 @@ Suggestions: % Next edition: % 1. Standardize the error messages from the functions and programs % in the two sample code chapters. -% 2. Nuke the BBS stuff and use something that won't be obsolete -% 3. Turn the advanced notes into sidebars by using @cartouche Better sidebars can almost sort of be done with: @@ -37657,4 +39242,3 @@ But to use it you have to say } which sorta sucks. 
- diff --git a/doc/texinfo.tex b/doc/texinfo.tex index 9cf29f2e..7506dffb 100644 --- a/doc/texinfo.tex +++ b/doc/texinfo.tex @@ -3,11 +3,11 @@ % Load plain if necessary, i.e., if running under initex. \expandafter\ifx\csname fmtname\endcsname\relax\input plain\fi % -\def\texinfoversion{2013-06-21.17} +\def\texinfoversion{2014-03-18.17} % % Copyright 1985, 1986, 1988, 1990, 1991, 1992, 1993, 1994, 1995, % 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, -% 2007, 2008, 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc. +% 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014 Free Software Foundation, Inc. % % This texinfo.tex file is free software: you can redistribute it and/or % modify it under the terms of the GNU General Public License as @@ -281,9 +281,9 @@ \toks6=\expandafter{\prevsectiondefs}% \toks8=\expandafter{\lastcolordefs}% \mark{% - \the\toks0 \the\toks2 - \noexpand\or \the\toks4 \the\toks6 - \noexpand\else \the\toks8 + \the\toks0 \the\toks2 % 0: top marks (\last...) + \noexpand\or \the\toks4 \the\toks6 % 1: bottom marks (default, \prev...) + \noexpand\else \the\toks8 % 2: color marks }% } % \topmark doesn't work for the very first chapter (after the title @@ -322,10 +322,13 @@ % % Do this outside of the \shipout so @code etc. will be expanded in % the headline as they should be, not taken literally (outputting ''code). 
+ \def\commmonheadfootline{\let\hsize=\pagewidth \texinfochars} + % \ifodd\pageno \getoddheadingmarks \else \getevenheadingmarks \fi - \setbox\headlinebox = \vbox{\let\hsize=\pagewidth \makeheadline}% + \global\setbox\headlinebox = \vbox{\commmonheadfootline \makeheadline}% + % \ifodd\pageno \getoddfootingmarks \else \getevenfootingmarks \fi - \setbox\footlinebox = \vbox{\let\hsize=\pagewidth \makefootline}% + \global\setbox\footlinebox = \vbox{\commmonheadfootline \makefootline}% % {% % Have to do this stuff outside the \shipout because we want it to @@ -1135,10 +1138,12 @@ output) for that.)} \ifpdf % - % Color manipulation macros based on pdfcolor.tex, + % Color manipulation macros using ideas from pdfcolor.tex, % except using rgb instead of cmyk; the latter is said to render as a % very dark gray on-screen and a very dark halftone in print, instead - % of actual black. + % of actual black. The dark red here is dark enough to print on paper as + % nearly black, but still distinguishable for online viewing. We use + % black by default, though. \def\rgbDarkRed{0.50 0.09 0.12} \def\rgbBlack{0 0 0} % @@ -1248,10 +1253,9 @@ output) for that.)} % used to mark target names; must be expandable. \def\pdfmkpgn#1{#1} % - % by default, use a color that is dark enough to print on paper as - % nearly black, but still distinguishable for online viewing. - \def\urlcolor{\rgbDarkRed} - \def\linkcolor{\rgbDarkRed} + % by default, use black for everything. 
+ \def\urlcolor{\rgbBlack} + \def\linkcolor{\rgbBlack} \def\endlink{\setcolor{\maincolor}\pdfendlink} % % Adding outlines to PDF; macros for calculating structure of outlines @@ -2377,8 +2381,10 @@ end \ifx\next,% \else\ifx\next-% \else\ifx\next.% + \else\ifx\next\.% + \else\ifx\next\comma% \else\ptexslash - \fi\fi\fi + \fi\fi\fi\fi\fi \aftersmartic } @@ -2519,7 +2525,9 @@ end \ifx\codedashprev\codedash \else \discretionary{}{}{}\fi \fi - \global\let\codedashprev=\next + % we need the space after the = for the case when \next itself is a + % space token; it would get swallowed otherwise. As in @code{- a}. + \global\let\codedashprev= \next } } \def\normaldash{-} @@ -2567,37 +2575,21 @@ end \let\file=\code \let\option=\code -% @uref (abbreviation for `urlref') takes an optional (comma-separated) -% second argument specifying the text to display and an optional third -% arg as text to display instead of (rather than in addition to) the url -% itself. First (mandatory) arg is the url. -% (This \urefnobreak definition isn't used now, leaving it for a while -% for comparison.) -\def\urefnobreak#1{\dourefnobreak #1,,,\finish} -\def\dourefnobreak#1,#2,#3,#4\finish{\begingroup - \unsepspaces - \pdfurl{#1}% - \setbox0 = \hbox{\ignorespaces #3}% - \ifdim\wd0 > 0pt - \unhbox0 % third arg given, show only that - \else - \setbox0 = \hbox{\ignorespaces #2}% - \ifdim\wd0 > 0pt - \ifpdf - \unhbox0 % PDF: 2nd arg given, show only it - \else - \unhbox0\ (\code{#1})% DVI: 2nd arg given, show both it and url - \fi - \else - \code{#1}% only url given, so show it - \fi - \fi - \endlink -\endgroup} +% @uref (abbreviation for `urlref') aka @url takes an optional +% (comma-separated) second argument specifying the text to display and +% an optional third arg as text to display instead of (rather than in +% addition to) the url itself. First (mandatory) arg is the url. 
+ +% TeX-only option to allow changing PDF output to show only the second +% arg (if given), and not the url (which is then just the link target). +\newif\ifurefurlonlylink -% This \urefbreak definition is the active one. +% The main macro is \urefbreak, which allows breaking at expected +% places within the url. (There used to be another version, which +% didn't support automatic breaking.) \def\urefbreak{\begingroup \urefcatcodes \dourefbreak} \let\uref=\urefbreak +% \def\dourefbreak#1{\urefbreakfinish #1,,,\finish} \def\urefbreakfinish#1,#2,#3,#4\finish{% doesn't work in @example \unsepspaces @@ -2606,12 +2598,19 @@ end \ifdim\wd0 > 0pt \unhbox0 % third arg given, show only that \else - \setbox0 = \hbox{\ignorespaces #2}% + \setbox0 = \hbox{\ignorespaces #2}% look for second arg \ifdim\wd0 > 0pt \ifpdf - \unhbox0 % PDF: 2nd arg given, show only it + \ifurefurlonlylink + % PDF plus option to not display url, show just arg + \unhbox0 + \else + % PDF, normally display both arg and url for consistency, + % visibility, if the pdf is eventually used to print, etc. + \unhbox0\ (\urefcode{#1})% + \fi \else - \unhbox0\ (\urefcode{#1})% DVI: 2nd arg given, show both it and url + \unhbox0\ (\urefcode{#1})% DVI, always show arg and url \fi \else \urefcode{#1}% only url given, so show it @@ -2651,8 +2650,10 @@ end % we put a little stretch before and after the breakable chars, to help % line breaking of long url's. The unequal skips make look better in % cmtt at least, especially for dots. 
-\def\urefprestretch{\urefprebreak \hskip0pt plus.13em } -\def\urefpoststretch{\urefpostbreak \hskip0pt plus.1em } +\def\urefprestretchamount{.13em} +\def\urefpoststretchamount{.1em} +\def\urefprestretch{\urefprebreak \hskip0pt plus\urefprestretchamount\relax} +\def\urefpoststretch{\urefpostbreak \hskip0pt plus\urefprestretchamount\relax} % \def\urefcodeamp{\urefprestretch \&\urefpoststretch} \def\urefcodedot{\urefprestretch .\urefpoststretch} @@ -2887,6 +2888,15 @@ end \def\inlinefmtname{#1}% \ifx\inlinefmtname\outfmtnametex \ignorespaces #2\fi } +% +% @inlinefmtifelse{FMTNAME,THEN-TEXT,ELSE-TEXT} expands THEN-TEXT if +% FMTNAME is tex, else ELSE-TEXT. +\long\def\inlinefmtifelse#1{\doinlinefmtifelse #1,,,\finish} +\long\def\doinlinefmtifelse#1,#2,#3,#4,\finish{% + \def\inlinefmtname{#1}% + \ifx\inlinefmtname\outfmtnametex \ignorespaces #2\else \ignorespaces #3\fi +} +% % For raw, must switch into @tex before parsing the argument, to avoid % setting catcodes prematurely. Doing it this way means that, for % example, @inlineraw{html, foo{bar} gets a parse error instead of being @@ -2903,6 +2913,23 @@ end \endgroup % close group opened by \tex. } +% @inlineifset{VAR, TEXT} expands TEXT if VAR is @set. +% +\long\def\inlineifset#1{\doinlineifset #1,\finish} +\long\def\doinlineifset#1,#2,\finish{% + \def\inlinevarname{#1}% + \expandafter\ifx\csname SET\inlinevarname\endcsname\relax + \else\ignorespaces#2\fi +} + +% @inlineifclear{VAR, TEXT} expands TEXT if VAR is not @set. +% +\long\def\inlineifclear#1{\doinlineifclear #1,\finish} +\long\def\doinlineifclear#1,#2,\finish{% + \def\inlinevarname{#1}% + \expandafter\ifx\csname SET\inlinevarname\endcsname\relax \ignorespaces#2\fi +} + \message{glyphs,} % and logos. 
@@ -3658,7 +3685,7 @@ end \parskip=\smallskipamount \ifdim\parskip=0pt \parskip=2pt \fi % - % Try typesetting the item mark that if the document erroneously says + % Try typesetting the item mark so that if the document erroneously says % something like @itemize @samp (intending @table), there's an error % right away at the @itemize. It's not the best error message in the % world, but it's better than leaving it to the @item. This means if @@ -3908,19 +3935,23 @@ end } % multitable-only commands. -% -% @headitem starts a heading row, which we typeset in bold. -% Assignments have to be global since we are inside the implicit group -% of an alignment entry. \everycr resets \everytab so we don't have to +% +% @headitem starts a heading row, which we typeset in bold. Assignments +% have to be global since we are inside the implicit group of an +% alignment entry. \everycr below resets \everytab so we don't have to % undo it ourselves. \def\headitemfont{\b}% for people to use in the template row; not changeable \def\headitem{% \checkenv\multitable \crcr + \gdef\headitemcrhook{\nobreak}% attempt to avoid page break after headings \global\everytab={\bf}% can't use \headitemfont since the parsing differs \the\everytab % for the first item }% % +% default for tables with no headings. +\let\headitemcrhook=\relax +% % A \tab used to include \hskip1sp. But then the space in a template % line is not enough. That is bad. So let's go back to just `&' until % we again encounter the problem the 1sp was intended to solve. @@ -3951,15 +3982,15 @@ end % \everycr = {% \noalign{% - \global\everytab={}% + \global\everytab={}% Reset from possible headitem. \global\colcount=0 % Reset the column counter. - % Check for saved footnotes, etc. + % + % Check for saved footnotes, etc.: \checkinserts - % Keeps underfull box messages off when table breaks over pages. - %\filbreak - % Maybe so, but it also creates really weird page breaks when the - % table breaks over pages. 
Wouldn't \vfil be better? Wait until the - % problem manifests itself, so it can be fixed for real --karl. + % + % Perhaps a \nobreak, then reset: + \headitemcrhook + \global\let\headitemcrhook=\relax }% }% % @@ -4198,7 +4229,7 @@ end \def\value{\begingroup\makevalueexpandable\valuexxx} \def\valuexxx#1{\expandablevalue{#1}\endgroup} { - \catcode`\- = \active \catcode`\_ = \active + \catcode`\-=\active \catcode`\_=\active % \gdef\makevalueexpandable{% \let\value = \expandablevalue @@ -4218,7 +4249,12 @@ end % variable's value contains other Texinfo commands, it's almost certain % it will fail (although perhaps we could fix that with sufficient work % to do a one-level expansion on the result, instead of complete). -% +% +% Unfortunately, this has the consequence that when _ is in the *value* +% of an @set, it does not print properly in the roman fonts (get the cmr +% dot accent at position 126 instead). No fix comes to mind, and it's +% been this way since 2003 or earlier, so just ignore it. +% \def\expandablevalue#1{% \expandafter\ifx\csname SET#1\endcsname\relax {[No value for ``#1'']}% @@ -4396,7 +4432,7 @@ end % complicated, when \tex is in effect and \{ is a \delimiter again. % We can't use \lbracecmd and \rbracecmd because texindex assumes % braces and backslashes are used only as delimiters. Perhaps we - % should define @lbrace and @rbrace commands a la @comma. + % should use @lbracechar and @rbracechar? \def\{{{\tt\char123}}% \def\}{{\tt\char125}}% % @@ -4417,8 +4453,7 @@ end % @end macro % ... % @funindex commtest - % - % The above is not enough to reproduce the bug, but it gives the flavor. + % This is not enough to reproduce the bug, but it gives the flavor. 
% % Sample whatsit resulting: % .@write3{\entry{xyz}{@folio }{@code {xyz@endinput }}} @@ -4619,8 +4654,21 @@ end \definedummyword\verb \definedummyword\w \definedummyword\xref + % + % Consider: + % @macro mkind{arg1,arg2} + % @cindex \arg2\ + % @end macro + % @mkind{foo, bar} + % The space after the comma will end up in the temporary definition + % that we make for arg2 (see \parsemargdef ff.). We want all this to be + % expanded for the sake of the index, so we end up just seeing "bar". + \let\xeatspaces = \eatspaces } +% For testing: output @{ and @} in index sort strings as \{ and \}. +\newif\ifusebracesinindexes + % \indexnofonts is used when outputting the strings to sort the index % by, and when constructing control sequence names. It eliminates all % control sequences and just writes whatever the best ASCII sort string @@ -4649,11 +4697,16 @@ end % Unfortunately, texindex is not prepared to handle braces in the % content at all. So for index sorting, we map @{ and @} to strings % starting with |, since that ASCII character is between ASCII { and }. - \def\{{|a}% - \def\lbracechar{|a}% + \ifusebracesinindexes + \def\lbracechar{\lbracecmd}% + \def\rbracechar{\rbracecmd}% + \else + \def\lbracechar{|a}% + \def\rbracechar{|b}% + \fi + \let\{=\lbracechar + \let\}=\rbracechar % - \def\}{|b}% - \def\rbracechar{|b}% % % Non-English letters. \def\AA{AA}% @@ -5905,7 +5958,7 @@ end % % Now the second mark, after the heading break. No break points % between here and the heading. - \let\prevsectiondefs=\lastsectiondefs + \global\let\prevsectiondefs=\lastsectiondefs \domark % % Only insert the space after the number if we have a section number. @@ -6272,8 +6325,8 @@ end \catcode `\|=\other \catcode `\<=\other \catcode `\>=\other - \catcode`\`=\other - \catcode`\'=\other + \catcode `\`=\other + \catcode `\'=\other \escapechar=`\\ % % ' is active in math mode (mathcode"8000). 
So reset it, and all our @@ -6297,7 +6350,7 @@ end \let\/=\ptexslash \let\*=\ptexstar \let\t=\ptext - \expandafter \let\csname top\endcsname=\ptextop % outer + \expandafter \let\csname top\endcsname=\ptextop % we've made it outer \let\frenchspacing=\plainfrenchspacing % \def\endldots{\mathinner{\ldots\ldots\ldots\ldots}}% @@ -6381,8 +6434,6 @@ end % side, and for 6pt waste from % each corner char, and rule thickness \normbskip=\baselineskip \normpskip=\parskip \normlskip=\lineskip - % Flag to tell @lisp, etc., not to narrow margin. - \let\nonarrowing = t% % % If this cartouche directly follows a sectioning command, we need the % \parskip glue (backspaced over by default) or the cartouche can @@ -6549,9 +6600,13 @@ end % @raggedright does more-or-less normal line breaking but no right -% justification. From plain.tex. +% justification. From plain.tex. Don't stretch around special +% characters in urls in this environment, since the stretch at the right +% should be enough. \envdef\raggedright{% - \rightskip0pt plus2em \spaceskip.3333em \xspaceskip.5em\relax + \rightskip0pt plus2.4em \spaceskip.3333em \xspaceskip.5em\relax + \def\urefprestretchamount{0pt}% + \def\urefpoststretchamount{0pt}% } \let\Eraggedright\par @@ -7444,7 +7499,7 @@ end % Parse the optional {params} list. Set up \paramno and \paramlist % so \defmacro knows what to do. Define \macarg.BLAH for each BLAH -% in the params list to some hook where the argument si to be expanded. If +% in the params list to some hook where the argument is to be expanded. If % there are less than 10 arguments that hook is to be replaced by ##N where N % is the position in that list, that is to say the macro arguments are to be % defined `a la TeX in the macro body. 
@@ -8306,6 +8361,7 @@ end \gdef\footnote{% \let\indent=\ptexindent \let\noindent=\ptexnoindent + % \global\advance\footnoteno by \@ne \edef\thisfootno{$^{\the\footnoteno}$}% % @@ -8329,6 +8385,11 @@ end % \gdef\dofootnote{% \insert\footins\bgroup + % + % Nested footnotes are not supported in TeX, that would take a lot + % more work. (\startsavinginserts does not suffice.) + \let\footnote=\errfootnote + % % We want to typeset this text as a normal paragraph, even if the % footnote reference occurs in (for example) a display environment. % So reset some parameters. @@ -8366,13 +8427,19 @@ end } }%end \catcode `\@=11 +\def\errfootnote{% + \errhelp=\EMsimple + \errmessage{Nested footnotes not supported in texinfo.tex, + even though they work in makeinfo; sorry} +} + % In case a @footnote appears in a vbox, save the footnote text and create % the real \insert just after the vbox finished. Otherwise, the insertion % would be lost. % Similarly, if a @footnote appears inside an alignment, save the footnote % text to a box and make the \insert when a row of the table is finished. % And the same can be done for other insert classes. --kasal, 16nov03. - +% % Replace the \insert primitive by a cheating macro. % Deeper inside, just make sure that the saved insertions are not spilled % out prematurely. 
@@ -9940,11 +10007,9 @@ directory should work if nowhere else does.} \catcode`\"=\active \def\activedoublequote{{\tt\char34}} \let"=\activedoublequote -\catcode`\~=\active -\def~{{\tt\char126}} +\catcode`\~=\active \def\activetilde{{\tt\char126}} \let~ = \activetilde \chardef\hat=`\^ -\catcode`\^=\active -\def^{{\tt \hat}} +\catcode`\^=\active \def\activehat{{\tt \hat}} \let^ = \activehat \catcode`\_=\active \def_{\ifusingtt\normalunderscore\_} @@ -9954,16 +10019,26 @@ directory should work if nowhere else does.} \catcode`\|=\active \def|{{\tt\char124}} + \chardef \less=`\< -\catcode`\<=\active -\def<{{\tt \less}} +\catcode`\<=\active \def\activeless{{\tt \less}}\let< = \activeless \chardef \gtr=`\> -\catcode`\>=\active -\def>{{\tt \gtr}} -\catcode`\+=\active -\def+{{\tt \char 43}} -\catcode`\$=\active -\def${\ifusingit{{\sl\$}}\normaldollar}%$ font-lock fix +\catcode`\>=\active \def\activegtr{{\tt \gtr}}\let> = \activegtr +\catcode`\+=\active \def+{{\tt \char 43}} +\catcode`\$=\active \def${\ifusingit{{\sl\$}}\normaldollar}%$ font-lock fix + +% used for headline/footline in the output routine, in case the page +% breaks in the middle of an @tex block. +\def\texinfochars{% + \let< = \activeless + \let> = \activegtr + \let~ = \activetilde + \let^ = \activehat + \markupsetuplqdefault \markupsetuprqdefault + \let\b = \strong + \let\i = \smartitalic + % in principle, all other definitions in \tex have to be undone too. +} % If a .fmt file is being used, characters that might appear in a file % name cannot be active until we have parsed the command line.