Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 3589 |
1 files changed, 2515 insertions, 1074 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 329718e7..47d2ba7a 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -1,4 +1,9 @@ \input texinfo @c -*-texinfo-*- +@ignore +TODO: + Globally add () after built in and awk function names. +DONE: +@end ignore @c %**start of header (This is for running Texinfo on a region.) @setfilename gawk.info @settitle The GNU Awk User's Guide @@ -28,7 +33,7 @@ @set TITLE GAWK: Effective AWK Programming @set SUBTITLE A User's Guide for GNU Awk -@set EDITION 3 +@set EDITION 4 @iftex @set DOCUMENT book @@ -37,6 +42,7 @@ @set SECTION section @set SUBSECTION subsection @set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}} +@set COMMONEXT (c.e.) @end iftex @ifinfo @set DOCUMENT Info file @@ -45,6 +51,7 @@ @set SECTION minor node @set SUBSECTION node @set DARKCORNER (d.c.) +@set COMMONEXT (c.e.) @end ifinfo @ifhtml @set DOCUMENT Web page @@ -53,6 +60,7 @@ @set SECTION section @set SUBSECTION subsection @set DARKCORNER (d.c.) +@set COMMONEXT (c.e.) @end ifhtml @ifdocbook @set DOCUMENT book @@ -61,6 +69,7 @@ @set SECTION section @set SUBSECTION subsection @set DARKCORNER (d.c.) +@set COMMONEXT (c.e.) @end ifdocbook @c some special symbols @@ -164,25 +173,6 @@ supports it in developing GNU and promoting software freedom.'' @page @vskip 0pt plus 1filll -@ignore -The programs and applications presented in this book have been -included for their instructional value. They have been tested with care -but are not guaranteed for any particular purpose. The publisher does not -offer any warranties or representations, nor does it accept any -liabilities with respect to the programs or applications. -So there. -@sp 2 -UNIX is a registered trademark of The Open Group in the United States and other countries. @* -Linux is a registered trademark of Linus Torvalds in the United States and other countries. 
@* -Microsoft, MS and MS-DOS are registered trademarks, and Windows is a -trademark of Microsoft Corporation in the United States and other -countries. @* -Atari, 520ST, 1040ST, TT, STE, Mega and Falcon are registered trademarks -or trademarks of Atari Corporation. @* -Once upon a time, -DEC, Digital, OpenVMS, ULTRIX and VMS were trademarks of Digital Equipment -Corporation. Now they belong to Hewlett-Packard Corporation. @* -@end ignore ``To boldly go where no man has gone before'' is a Registered Trademark of Paramount Pictures Corporation. @* @c sorry, i couldn't resist @@ -202,8 +192,6 @@ URL: @uref{http://www.gnu.org/} @* ISBN 1-882114-28-0 @* @sp 2 @insertcopying -@sp 2 -Cover art by Etienne Suvasa. @end titlepage @c Thanks to Bob Chassell for directions on doing dedications. @@ -279,6 +267,7 @@ particular records in a file and perform operations upon them. * Library Functions:: A Library of @command{awk} Functions. * Sample Programs:: Many @command{awk} programs with complete explanations. +* Debugger:: The @code{dgawk} debugger. * Language History:: The evolution of the @command{awk} language. * Installation:: Installing @command{gawk} under various @@ -318,7 +307,7 @@ particular records in a file and perform operations upon them. * Comments:: Adding documentation to @command{gawk} programs. * Quoting:: More discussion of shell quoting issues. -* DOS Quoting:: Quoting in MS-DOS Batch Files. +* DOS Quoting:: Quoting in Windows Batch Files. * Sample Data Files:: Sample data files for use in the @command{awk} programs illustrated in this @value{DOCUMENT}. @@ -345,6 +334,7 @@ particular records in a file and perform operations upon them. * Nonconstant Fields:: Nonconstant Field Numbers. * Changing Fields:: Changing the Contents of a Field. * Field Separators:: The field separator and how to change it. +* Default Field Splitting:: How fields are normally separated. * Regexp Field Splitting:: Using regexps as the field separator. 
* Single Character Fields:: Making each character a separate field. * Command Line Field Separator:: Setting @code{FS} from the command-line. @@ -484,16 +474,18 @@ particular records in a file and perform operations upon them. @command{awk}. * Multi-scanning:: Scanning multidimensional arrays. * Array Sorting:: Sorting array values and indices. +* Arrays of Arrays:: True multidimensional arrays. * Built-in:: Summarizes the built-in functions. * Calling Built-in:: How to call built-in functions. * Numeric Functions:: Functions that work with numbers, including - @code{int}, @code{sin} and @code{rand}. + @code{int()}, @code{sin()} and + @code{rand()}. * String Functions:: Functions for string manipulation, such as - @code{split}, @code{match} and - @code{sprintf}. + @code{split()}, @code{match()} and + @code{sprintf()}. * Gory Details:: More than you want to know about @samp{\} - and @samp{&} with @code{sub}, @code{gsub}, - and @code{gensub}. + and @samp{&} with @code{sub()}, + @code{gsub()}, and @code{gensub()}. * I/O Functions:: Functions for files and shell commands. * Time Functions:: Functions for dealing with timestamps. * Bitwise Functions:: Functions for bitwise operations. @@ -521,7 +513,6 @@ particular records in a file and perform operations upon them. process. * TCP/IP Networking:: Using @command{gawk} for network programming. -* Portal Files:: Using @command{gawk} with BSD portals. * Profiling:: Profiling your @command{awk} programs. * Command Line:: How to run @command{awk}. * Options:: Command-line options and their meanings. @@ -529,6 +520,7 @@ particular records in a file and perform operations upon them. * AWKPATH Variable:: Searching directories for @command{awk} programs. * Exit Status:: @command{gawk}'s exit status. +* Include Files:: Including other files into your program. * Obsolete:: Obsolete Options and/or features. * Undocumented:: Undocumented Options and Features. * Known Bugs:: Known Bugs in @command{gawk}. 
@@ -538,10 +530,10 @@ particular records in a file and perform operations upon them. * Nextfile Function:: Two implementations of a @code{nextfile} function. * Strtonum Function:: A replacement for the built-in - @code{strtonum} function. + @code{strtonum()} function. * Assert Function:: A function for assertions in @command{awk} programs. -* Round Function:: A function for rounding if @code{sprintf} +* Round Function:: A function for rounding if @code{sprintf()} does not do it correctly. * Cliff Random Function:: The Cliff Random Number Generator. * Ordinal Functions:: Functions for using characters as numbers @@ -585,6 +577,23 @@ particular records in a file and perform operations upon them. files. * Signature Program:: People do amazing things with too much time on their hands. +* Debugging:: Introduction to @command{dgawk}. +* Debugging Concepts:: Debugging In General. +* Debugging Terms:: Additional Debugging Concepts. +* Awk Debugging:: Awk Debugging. +* Sample dgawk session:: Sample @command{dgawk} session. +* dgawk invocation:: @command{dgawk} Invocation. +* Finding The Bug:: Finding The Bug. +* List of Debugger Commands:: Main @command{dgawk} Commands. +* Breakpoint Control:: Control of breakpoints. +* Dgawk Execution Control:: Control of execution. +* Viewing And Changing Data:: Viewing and changing data. +* Dgawk Stack:: Dealing with the stack. +* Dgawk Info:: Obtaining information about the program and + the debugger state. +* Miscellaneous Dgawk Commands:: Miscellaneous Commands. +* Readline Support:: Readline Support. +* Dgawk Limitations:: Limitations and future plans. * V7/SVR3.1:: The major changes between V7 and System V Release 3.1. * SVR4:: Minor changes between System V Releases 3.1 @@ -683,7 +692,9 @@ particular records in a file and perform operations upon them. @node Foreword @unnumbered Foreword -Arnold Robbins and I are good friends. We were introduced 11 years ago +Arnold Robbins and I are good friends. 
We were introduced +@c 11 years ago +in 1990 by circumstances---and our favorite programming language, AWK. The circumstances started a couple of years earlier. I was working at a new job and noticed an unplugged @@ -806,16 +817,20 @@ when working with text files. You might want to extract certain lines and discard the rest. Or you may need to make changes wherever certain patterns appear, but leave the rest of the file alone. -Writing single-use programs for these tasks in languages such as C, C++, or Pascal -is time-consuming and inconvenient. +Writing single-use programs for these tasks in languages such as C, C++, +or Java is time-consuming and inconvenient. Such jobs are often easier with @command{awk}. The @command{awk} utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs. The GNU implementation of @command{awk} is called @command{gawk}; it is fully -compatible with the System V Release 4 version of -@command{awk}. @command{gawk} is also compatible with the POSIX -specification of the @command{awk} language. This means that all +compatible with +the POSIX@footnote{The 2008 POSIX standard can be found online at +@url{http://www.opengroup.org/onlinepubs/9699919799/}.} +specification of the @command{awk} language +and with the Unix version of @command{awk} maintained +by Brian Kernighan. +This means that all properly written @command{awk} programs should work with @command{gawk}. Thus, we usually don't distinguish between @command{gawk} and other @command{awk} implementations. @@ -878,11 +893,14 @@ different computing environments. This @value{DOCUMENT}, while describing the @command{awk} language in general, also describes the particular implementation of @command{awk} called @command{gawk} (which stands for ``GNU awk''). 
@command{gawk} runs on a broad range of Unix systems, -ranging from 80386 PC-based computers up through large-scale systems, +ranging from Intel@registeredsymbol{}-architecture PC-based computers +up through large-scale systems, such as Crays. @command{gawk} has also been ported to Mac OS X, -MS-DOS, Microsoft Windows (all versions) and OS/2 PCs, Atari -@c and Amiga -microcomputers, BeOS, Tandem D20, and VMS. +Microsoft Windows (all versions) and OS/2 PCs, +and VMS. +(Other systems to which @command{gawk} was once ported +are no longer supported and the code for those systems +has been removed.) @menu * History:: The history of @command{gawk} and @@ -927,8 +945,8 @@ In 1985, a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions. This new version became widely available with Unix System V -Release 3.1 (SVR3.1). -The version in SVR4 added some new features and cleaned +Release 3.1 (1987) +The version in System V Release 4 (1989) added some new features and cleaned up the behavior in some of the ``dark corners'' of the language. The specification for @command{awk} in the POSIX Command Language and Utilities standard further clarified the language. @@ -943,7 +961,7 @@ Jay Fenlason completed it, with advice from Richard Stallman. John Woods contributed parts of the code as well. In 1988 and 1989, David Trueman, with help from me, thoroughly reworked @command{gawk} for compatibility with the newer @command{awk}. -Circa 1995, I became the primary maintainer. +Circa 1994, I became the primary maintainer. Current development focuses on bug fixes, performance improvements, standards compliance, and occasionally, new features. @@ -969,12 +987,11 @@ The language described in this @value{DOCUMENT} is often referred to as ``new @command{awk}'' (@command{nawk}). 
@cindex @command{awk}, versions of -Because of this, many systems have multiple +Because of this, there are systems with multiple versions of @command{awk}. Some systems have an @command{awk} utility that implements the original version of the @command{awk} language and a @command{nawk} utility -for the new -version. +for the new version. Others have an @command{oawk} version for the ``old @command{awk}'' language and plain @command{awk} for the new one. Still others only have one version, which is usually the new one.@footnote{Often, these systems @@ -984,7 +1001,7 @@ use @command{gawk} for their @command{awk} implementation!} @cindex @command{oawk} utility All in all, this makes it difficult for you to know which version of @command{awk} you should run when writing your programs. The best advice -I can give here is to check your local documentation. Look for @command{awk}, +we can give here is to check your local documentation. Look for @command{awk}, @command{oawk}, and @command{nawk}, as well as for @command{gawk}. It is likely that you already have some version of new @command{awk} on your system, which is what @@ -1005,14 +1022,14 @@ use to tell this program what to do. When we need to be careful, we call the language ``the @command{awk} language,'' and the program ``the @command{awk} utility.'' This @value{DOCUMENT} explains -both the @command{awk} language and how to run the @command{awk} utility. +both how to write program in the @command{awk} language and how to run the @command{awk} utility. The term @dfn{@command{awk} program} refers to a program written by you in the @command{awk} programming language. @cindex @command{gawk}, @command{awk} and @cindex @command{awk}, @command{gawk} and @cindex POSIX @command{awk} -Primarily, this @value{DOCUMENT} explains the features of @command{awk}, +Primarily, this @value{DOCUMENT} explains the features of @command{awk} as defined in the POSIX standard. It does so in the context of the @command{gawk} implementation. 
While doing so, it also attempts to describe important differences between @command{gawk} @@ -1026,7 +1043,7 @@ the POSIX standard for @command{awk} are noted. This @value{DOCUMENT} has the difficult task of being both a tutorial and a reference. If you are a novice, feel free to skip over details that seem too complex. You should also ignore the many cross-references; they are for the -expert user and for the online Info version of the document. +expert user and for the online Info and HTML versions of the document. @end ifnotinfo There are @@ -1062,6 +1079,7 @@ describes how @command{awk} reads your data. It introduces the concepts of records and fields, as well as the @code{getline} command. I/O redirection is first described here. +Network I/O is also briefly introduced here. @ref{Printing}, describes how @command{awk} programs can produce output with @@ -1108,17 +1126,21 @@ provide many sample @command{awk} programs. Reading them allows you to see @command{awk} solving real problems. +@ref{Debugger}, describes the @command{awk} debugger, +@command{dgawk}. + @ref{Language History}, describes how the @command{awk} language has evolved since -first release to present. It also describes how @command{gawk} +its first release to present. It also describes how @command{gawk} has acquired features over time. @ref{Installation}, describes how to get @command{gawk}, how to compile it -under Unix, and how to compile and use it on different -non-Unix systems. It also describes how to report bugs -in @command{gawk} and where to get three other freely -available implementations of @command{awk}. +on POSIX-compatible systems, +and how to compile and use it on different +non-POSIX systems. It also describes how to report bugs +in @command{gawk} and where to get other freely +available @command{awk} implementations. 
@ref{Notes}, describes how to disable @command{gawk}'s extensions, as @@ -1161,21 +1183,24 @@ This @value{SECTION} briefly documents the typographical conventions used in Tex Examples you would type at the command-line are preceded by the common shell primary and secondary prompts, @samp{$} and @samp{>}. +Input that you type is shown @kbd{like this}. Output from the command is preceded by the glyph ``@print{}''. This typically represents the command's standard output. Error messages, and other output on the command's standard error, are preceded by the glyph ``@error{}''. For example: @example -$ echo hi on stdout +$ @kbd{echo hi on stdout} @print{} hi on stdout -$ echo hello on stderr 1>&2 +$ @kbd{echo hello on stderr 1>&2} @error{} hello on stderr @end example @ifnotinfo In the text, command names appear in @code{this font}, while code segments -appear in the same font and quoted, @samp{like this}. Some things are +appear in the same font and quoted, @samp{like this}. +Options look like this: @option{-f}. +Some things are emphasized @emph{like this}, and if a point needs to be made strongly, it is done @strong{like this}. The first occurrence of a new term is usually its @dfn{definition} and appears in the same @@ -1201,7 +1226,7 @@ Brian Kernighan @cindex d.c., See dark corner @cindex dark corner -Until the POSIX standard (and @cite{The Gawk Manual}), +Until the POSIX standard (and @cite{@value{TITLE}}), many features of @command{awk} were either poorly documented or not documented at all. Descriptions of such features (often called ``dark corners'') are noted in this @value{DOCUMENT} with @@ -1216,7 +1241,11 @@ They also appear in the index under the heading ``dark corner.'' As noted by the opening quote, though, any coverage of dark corners -is, by definition, something that is incomplete. +is, by definition, incomplete. 
+ +Extensions to the standard @command{awk} language are marked +``@value{COMMONEXT},'' and listed in the index under ``common extensions'' +and ``extensions, common.'' @node Manual History @unnumberedsec The GNU Project and This Book @@ -1256,7 +1285,7 @@ A shell, an editor (Emacs), highly portable optimizing C, C++, and Objective-C compilers, a symbolic debugger and dozens of large and small utilities (such as @command{gawk}), have all been completed and are freely available. The GNU operating -system kernel (the HURD), has been released but is still in an early +system kernel (the HURD), has been released but remains in an early stage of development. @cindex Linux @@ -1265,18 +1294,20 @@ stage of development. @cindex Alpha (DEC) Until the GNU operating system is more fully developed, you should consider using GNU/Linux, a freely distributable, Unix-like operating -system for Intel 80386, DEC Alpha, Sun SPARC, IBM S/390, and other +system for Intel@registeredsymbol{}, +Power Architecture, +Sun SPARC, IBM S/390, and other systems.@footnote{The terminology ``GNU/Linux'' is explained in the @ref{Glossary}.} -There are -many books on GNU/Linux. One that is freely available is @cite{Linux -Installation and Getting Started}, by Matt Welsh. -Many GNU/Linux distributions are often available in computer stores or -bundled on CD-ROMs with books about Linux. -(There are three other freely available, Unix-like operating systems for -80386 and other systems: NetBSD, FreeBSD, and OpenBSD. All are based on the -4.4-Lite Berkeley Software Distribution, and they use recent versions -of @command{gawk} for their versions of @command{awk}.) +Many GNU/Linux distributions are +available for download from the Internet. + +(There are numerous other freely available, Unix-like operating systems +based on the +Berkeley Software Distribution, and they use recent versions +of @command{gawk} for their versions of @command{awk}. 
+NetBSD, FreeBSD and OpenBSD are three of the most popular ones, but there +are others.) @ifnotinfo The @value{DOCUMENT} you are reading is actually free---at least, the @@ -1285,11 +1316,6 @@ source code for the @value{DOCUMENT} comes with @command{gawk}; anyone may take this @value{DOCUMENT} to a copying machine and make as many copies as they like. (Take a moment to check the Free Documentation License in @ref{GNU Free Documentation License}.) - -Although you could just print it out yourself, bound books are much -easier to read and use. Furthermore, -the proceeds from sales of this book go back to the FSF -to help fund development of more free software. @end ifnotinfo @ignore @@ -1351,25 +1377,18 @@ In 1996, Edition 1.0 was released with @command{gawk} 3.0.0. The FSF published the first two editions under the title @cite{The GNU Awk User's Guide}. -This edition maintains the basic structure of Edition 1.0, -but with significant additional material, reflecting the host of new features -in @command{gawk} @value{PVERSION} @value{VERSION}. -Of particular note is -@ref{Array Sorting}, -as well as -@ref{Bitwise Functions}, -@ref{Internationalization}, -and also -@ref{Advanced Features}, -and -@ref{Dynamic Extensions}. +This edition maintains the basic structure of Edition 1.0. +For Edition 4.0, the content has been thoroughly reviewed +and updated. All references to versions prior to 4.0 have been +removed. +Of significant note for this edition is @ref{Debugger}. @cite{@value{TITLE}} will undoubtedly continue to evolve. An electronic version comes with the @command{gawk} distribution from the FSF. If you find an error in this @value{DOCUMENT}, please report it! @xref{Bugs}, for information on submitting -problem reports electronically, or write to me in care of the publisher. +problem reports electronically. 
@node How To Contribute @unnumberedsec How to Contribute @@ -1416,9 +1435,12 @@ I would like to acknowledge Richard M.@: Stallman, for his vision of a better world and for his courage in founding the FSF and starting the GNU Project. +Earlier editins of this @value{DOCUMENT} had the following acknowledgements: + +@quotation The following people (in alphabetical order) provided helpful comments on various -versions of this book, up to and including this edition. +versions of this book, Rick Adams, Nelson H.F. Beebe, Karl Berry, @@ -1477,23 +1499,23 @@ The intrepid members of the GNITS mailing list, and most notably Ulrich Drepper, provided invaluable help and feedback for the design of the internationalization features. -@c @cindex Brown, Martin -@c @cindex Hasegawa, Isamu -@c @cindex Rommel, Kai Uwe -@c Martin Brown, -@c Isamu Hasegawa, -@c Kai Uwe Rommel, +Chuck Toporek, Mary Sheehan, and Claire Coutier of O'Reilly & Associates contributed +significant editorial help for this @value{DOCUMENT} for the +3.1 release of @command{gawk}. +@end quotation @cindex Beebe, Nelson @cindex Buening, Andreas @cindex Colombo, Antonio +@cindex Davies, Stephen @cindex Deifik, Scott @cindex DuBois, John @cindex Hankerson, Darrel +@cindex Haque, John @cindex Jaegermann, Michal @cindex Kahrs, J@"urgen @cindex Kasal, Stepan -@cindex Pitts, Davi +@cindex Pitts, Dave @cindex Rankin, Pat @cindex Schorr, Andrew @cindex Vinschen, Corinna @@ -1502,6 +1524,7 @@ internationalization features. Nelson Beebe, Andreas Buening, Antonio Colombo, +Stephen Davies, Scott Deifik, John H. DuBois III, Darrel Hankerson, @@ -1521,17 +1544,19 @@ help, @command{gawk} would not be nearly the fine program it is today. It has been and continues to be a pleasure working with this team of fine people. +John Haque contributed the modifications to convert @command{gawk} +into a byte-code interpreter, including the debugger. 
Stephen Davies +contributed to the effort to bring the byte-code changes into the mainstream +code base. + @cindex Kernighan, Brian -David and I would like to thank Brian Kernighan of Bell Laboratories for +I would like to thank Brian Kernighan of Bell Laboratories for invaluable assistance during the testing and debugging of @command{gawk}, and for +ongoing help in clarifying numerous points about the language. We could not have done nearly as good a job on either @command{gawk} or its documentation without his help. -Chuck Toporek, Mary Sheehan, and Claire Coutier of O'Reilly & Associates contributed -significant editorial help for this @value{DOCUMENT} for the -3.1 release of @command{gawk}. - @cindex Robbins, Miriam @cindex Robbins, Jean @cindex Robbins, Harry @@ -1549,7 +1574,7 @@ take advantage of those opportunities. Arnold Robbins @* Nof Ayalon @* ISRAEL @* -February, 2010 +December, 2010 @ignore @c Try this @@ -1741,23 +1766,6 @@ later in this @value{CHAPTER}, presents several short, self-contained programs. -@c Removed for gawk 3.1, doesn't really add anything here. -@ignore -As an interesting side point, the command - -@example -awk '/foo/' @var{files} @dots{} -@end example - -@noindent -is essentially the same as - -@cindex @command{egrep} utility -@example -egrep foo @var{files} @dots{} -@end example -@end ignore - @node Read Terminal @subsection Running @command{awk} Without Input Files @@ -1776,7 +1784,7 @@ awk '@var{program}' which usually means whatever you type on the terminal. This continues until you indicate end-of-file by typing @kbd{@value{CTL}-d}. (On other operating systems, the end-of-file character may be different. -For example, on OS/2 and MS-DOS, it is @kbd{@value{CTL}-z}.) +For example, on OS/2, it is @kbd{@value{CTL}-z}.) @cindex files, input, See input files @cindex input files, running @command{awk} without @@ -1784,7 +1792,7 @@ For example, on OS/2 and MS-DOS, it is @kbd{@value{CTL}-z}.) 
As an example, the following program prints a friendly piece of advice (from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}), to keep you from worrying about the complexities of computer -programming@footnote{If you use @command{bash} as your shell, you should execute +programming@footnote{If you use Bash as your shell, you should execute the command @samp{set +H} before running this program interactively, to disable the @command{csh}-style command history, which treats @samp{!} as a special character. We recommend putting this command into @@ -1792,7 +1800,7 @@ your personal startup file.} (@code{BEGIN} is a feature we haven't discussed yet): @example -$ awk "BEGIN @{ print \"Don't Panic!\" @}" +$ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"} @print{} Don't Panic! @end example @@ -1813,14 +1821,14 @@ emulates the @command{cat} utility; it copies whatever you type on the keyboard to its standard output (why this works is explained shortly). @example -$ awk '@{ print @}' -Now is the time for all good men +$ @kbd{awk '@{ print @}'} +@kbd{Now is the time for all good men} @print{} Now is the time for all good men -to come to the aid of their country. +@kbd{to come to the aid of their country.} @print{} to come to the aid of their country. -Four score and seven years ago, ... +@kbd{Four score and seven years ago, ...} @print{} Four score and seven years ago, ... -What, me worry? +@kbd{What, me worry?} @print{} What, me worry? @kbd{@value{CTL}-d} @end example @@ -1894,10 +1902,8 @@ affect the execution of the @command{awk} program but it does make Once you have learned @command{awk}, you may want to write self-contained @command{awk} scripts, using the @samp{#!} script mechanism. You can do -this on many Unix systems@footnote{The @samp{#!} mechanism works on -Linux systems, -systems derived from the 4.4-Lite Berkeley Software Distribution, -and most commercial Unix systems.} as well as on the GNU system. 
+this on many systems.@footnote{The @samp{#!} mechanism works on +GNU/Linux systems, BSD-based systems and commercial Unix systems.} For example, you could update the file @file{advice} to look like this: @example @@ -1920,8 +1926,8 @@ or both.} as if you had typed @samp{awk -f advice}: @example -$ chmod +x advice -$ advice +$ @kbd{chmod +x advice} +$ @kbd{advice} @print{} Don't Panic! @end example @@ -2003,7 +2009,7 @@ runs, it will probably print strange messages about syntax errors. For example, look at the following: @example -$ awk '@{ print "hello" @} # let's be cute' +$ @kbd{awk '@{ print "hello" @} # let's be cute'} > @end example @@ -2013,8 +2019,8 @@ It therefore prompts with the secondary prompt, waiting for more input. With Unix @command{awk}, closing the quoted string produces this result: @example -$ awk '@{ print "hello" @} # let's be cute' -> ' +$ @kbd{awk '@{ print "hello" @} # let's be cute'} +> @kbd{'} @error{} awk: can't open file be @error{} source line number 1 @end example @@ -2030,7 +2036,7 @@ The next @value{SUBSECTION} describes the shell's quoting rules. @cindex quoting, rules for @menu -* DOS Quoting:: Quoting in MS-DOS Batch Files. +* DOS Quoting:: Quoting in Windows Batch Files. @end menu For short to medium length @command{awk} programs, it is most convenient @@ -2047,7 +2053,7 @@ awk '@var{program text}' @var{input-file1} @var{input-file2} @dots{} @cindex Bourne shell, quoting rules for Once you are working with the shell, it is helpful to have a basic knowledge of shell quoting rules. The following rules apply only to -POSIX-compliant, Bourne-style shells (such as @command{bash}, the GNU Bourne-Again +POSIX-compliant, Bourne-style shells (such as Bash, the GNU Bourne-Again Shell). If you use @command{csh}, you're on your own. @itemize @bullet @@ -2131,7 +2137,7 @@ Mixing single and double quotes is difficult. 
You have to resort to shell quoting tricks, like this: @example -$ awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}' +$ @kbd{awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}'} @print{} Here is a single quote <'> @end example @@ -2142,7 +2148,7 @@ third are single-quoted, the second is double-quoted. This can be ``simplified'' to: @example -$ awk 'BEGIN @{ print "Here is a single quote <'\''>" @}' +$ @kbd{awk 'BEGIN @{ print "Here is a single quote <'\''>" @}'} @print{} Here is a single quote <'> @end example @@ -2153,7 +2159,7 @@ Another option is to use double quotes, escaping the embedded, @command{awk}-lev double quotes: @example -$ awk "BEGIN @{ print \"Here is a single quote <'>\" @}" +$ @kbd{awk "BEGIN @{ print \"Here is a single quote <'>\" @}"} @print{} Here is a single quote <'> @end example @@ -2161,15 +2167,17 @@ $ awk "BEGIN @{ print \"Here is a single quote <'>\" @}" @c ENDOFRANGE sq1x @c ENDOFRANGE qs2x This option is also painful, because double quotes, backslashes, and dollar signs -are very common in @command{awk} programs. +are very common in more advanced @command{awk} programs. -A third option is to use the octal escape sequence equivalents for the +A third option is to use the octal escape sequence equivalents +(@pxref{Escape Sequences}) +for the single- and double-quote characters, like so: @example -$ awk 'BEGIN @{ print "Here is a single quote <\47>" @}' +$ @kbd{awk 'BEGIN @{ print "Here is a single quote <\47>" @}'} @print{} Here is a single quote <'> -$ awk 'BEGIN @{ print "Here is a double quote <\42>" @}' +$ @kbd{awk 'BEGIN @{ print "Here is a double quote <\42>" @}'} @print{} Here is a double quote <"> @end example @@ -2189,7 +2197,7 @@ program, it is probably best to move it into a separate file, where the shell won't be part of the picture, and you can say what you mean. 
@node DOS Quoting -@subsubsection Quoting in MS-DOS Batch Files +@subsubsection Quoting in Windows Batch Files @ignore Date: Wed, 21 May 2008 09:58:43 +0200 (CEST) @@ -2223,7 +2231,7 @@ Although this @value{DOCUMENT} generally only worries about POSIX systems and th POSIX shell, the following issue arises often enough for many users that it is worth addressing. -Systems providing an MS-DOS compatible ``shell'' use the double-quote +The ``shell'' on Microsoft Windows systems use the double-quote character for quoting, and make it difficult or impossible to include an escaped double-quote character in a command-line script. The following example, courtesy of Jeroen Brink, shows @@ -2454,9 +2462,11 @@ ls -l @var{files} | awk '@{ x += $5 @} Print the total number of kilobytes used by @var{files}: @c Don't use \ continuation, not discussed yet +@c Remember that awk does floating point division, +@c no need for (x+1023) / 1024 @example ls -l @var{files} | awk '@{ x += $5 @} - END @{ print "total K-bytes: " (x + 1023)/1024 @}' + END @{ print "total K-bytes:", x /1024 @}' @end example @item @@ -2552,24 +2562,13 @@ features that haven't been covered yet, so don't worry if you don't understand all the details: @example -ls -l | awk '$6 == "Nov" @{ sum += $5 @} - END @{ print sum @}' +LC_ALL=C ls -l | awk '$6 == "Nov" @{ sum += $5 @} + END @{ print sum @}' @end example -@cindex @command{csh} utility, backslash continuation and @cindex @command{ls} utility -@cindex backslash (@code{\}), continuing lines and, in @command{csh} -@cindex @code{\} (backslash), continuing lines and, in @command{csh} This command prints the total number of bytes in all the files in the current directory that were last modified in November (of any year). -@footnote{In the C shell (@command{csh}), you need to type -a semicolon and then a backslash at the end of the first line; see -@ref{Statements/Lines}, for an -explanation. 
In a POSIX-compliant shell, such as the Bourne -shell or @command{bash}, you can type the example as shown. If the command -@samp{echo $path} produces an empty output line, you are most likely -using a POSIX-compliant shell. Otherwise, you are probably using the -C shell or a shell derived from it.} The @w{@samp{ls -l}} part of this example is a system command that gives you a listing of the files in a directory, including each file's size and the date the file was last modified. Its output looks like this: @@ -2593,8 +2592,8 @@ the file. The fourth field identifies the group of the file. The fifth field contains the size of the file in bytes. The sixth, seventh, and eighth fields contain the month, day, and time, respectively, that the file was last modified. Finally, the ninth field -contains the name of the file.@footnote{On some -very old systems, you may need to use @samp{ls -lg} to get this output.} +contains the @value{FN}.@footnote{The @samp{LC_ALL=C} is +needed to produce traditional-style output from @command{ls}.} @c @cindex automatic initialization @cindex initialization, automatic @@ -2665,8 +2664,8 @@ awk '/This regular expression is too long, so continue it\ @noindent @cindex portability, backslash continuation and -We have generally not used backslash continuation in the sample programs -in this @value{DOCUMENT}. In @command{gawk}, there is no limit on the +We have generally not used backslash continuation in our sample programs. +@command{gawk} places no limit on the length of a line, so backslash continuation is never strictly necessary; it just makes programs more readable. For this same reason, as well as for clarity, we have kept most statements short in the sample programs @@ -2687,16 +2686,16 @@ lines in the middle of a regular expression or a string. 
@strong{Caution:} @emph{Backslash continuation does not work as described with the C shell.} It works for @command{awk} programs in files and for one-shot programs, @emph{provided} you are using a POSIX-compliant -shell, such as the Unix Bourne shell or @command{bash}. But the C shell behaves +shell, such as the Unix Bourne shell or Bash. But the C shell behaves differently! There, you must use two backslashes in a row, followed by a newline. Note also that when using the C shell, @emph{every} newline in your awk program must be escaped with a backslash. To illustrate: @example -% awk 'BEGIN @{ \ -? print \\ -? "hello, world" \ -? @}' +% @kbd{awk 'BEGIN @{ \} +? @kbd{ print \\} +? @kbd{ "hello, world" \} +? @kbd{@}'} @print{} hello, world @end example @@ -2774,10 +2773,11 @@ as well to control how @command{awk} processes your data. In addition, @command{awk} provides a number of built-in functions for doing common computational and string-related operations. @command{gawk} provides built-in functions for working with timestamps, -performing bit manipulation, and for runtime string translation. +performing bit manipulation, for runtime string translation, +and array sorting. As we develop our presentation of the @command{awk} language, we introduce -most of the variables and many of the functions. They are defined +most of the variables and many of the functions. They are described systematically in @ref{Built-in Variables}, and @ref{Built-in}. @@ -2796,7 +2796,7 @@ from the output of other utility programs like @command{ls}. Programs written with @command{awk} are usually much smaller than they would be in other languages. This makes @command{awk} programs easy to compose and -use. Often, @command{awk} programs can be quickly composed at your terminal, +use. Often, @command{awk} programs can be quickly composed at your keyboard, used once, and thrown away. 
Because @command{awk} programs are interpreted, you can avoid the (usually lengthy) compilation part of the typical edit-compile-test-debug cycle of software development. @@ -2804,8 +2804,9 @@ edit-compile-test-debug cycle of software development. Complex programs have been written in @command{awk}, including a complete retargetable assembler for eight-bit microprocessors (@pxref{Glossary}, for more information), and a microcode assembler for a special-purpose Prolog -computer. More recently, @command{gawk} was used for writing a -@uref{http://www.awk-scripting.de/cgi-bin/wiki.cgi/yawk/, a Wiki clone}. +computer. +@c More recently, @command{gawk} was used for writing a +@c @uref{http://www.awk-scripting.de/cgi-bin/wiki.cgi/yawk/, a Wiki clone}. While the original @command{awk}'s capabilities were strained by tasks of such complexity, modern versions are more capable. Even the Bell Labs version of @command{awk} has fewer predefined limits, and those @@ -2849,7 +2850,7 @@ kinds of regexps let you specify more complicated classes of strings. @ifnotinfo Initially, the examples in this @value{CHAPTER} are simple. As we explain more about how -regular expressions work, we will present more complicated instances. +regular expressions work, we present more complicated instances. 
@end ifnotinfo @menu @@ -2876,7 +2877,7 @@ following prints the second field of each record that contains the string @samp{foo} anywhere in it: @example -$ awk '/foo/ @{ print $2 @}' BBS-list +$ @kbd{awk '/foo/ @{ print $2 @}' BBS-list} @print{} 555-1234 @print{} 555-6699 @print{} 555-6480 @@ -2887,7 +2888,7 @@ $ awk '/foo/ @{ print $2 @}' BBS-list @cindex operators, string-matching @c @cindex operators, @code{~} @cindex string-matching operators -@code{~} (tilde), @code{~} operator +@cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator @@ -2918,7 +2919,7 @@ all input records with the uppercase letter @samp{J} somewhere in the first field: @example -$ awk '$1 ~ /J/' inventory-shipped +$ @kbd{awk '$1 ~ /J/' inventory-shipped} @print{} Jan 13 25 15 115 @print{} Jun 31 42 75 492 @print{} Jul 24 34 67 436 @@ -2974,7 +2975,7 @@ must use @samp{\"} to represent an actual double-quote character as a part of the string. For example: @example -$ awk 'BEGIN @{ print "He said \"hi!\" to her." @}' +$ @kbd{awk 'BEGIN @{ print "He said \"hi!\" to her." @}'} @print{} He said "hi!" to her. @end example @@ -3046,12 +3047,15 @@ between @samp{0} and @samp{7}. For example, the code for the ASCII ESC @c @cindex @command{awk} language, POSIX version @cindex @code{\} (backslash), @code{\x} escape sequence @cindex backslash (@code{\}), @code{\x} escape sequence +@cindex common extensions, @code{\x} escape sequence +@cindex extensions, common@comma{} @code{\x} escape sequence @item \x@var{hh}@dots{} The hexadecimal value @var{hh}, where @var{hh} stands for a sequence of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F} or @samp{a}--@samp{f}). Like the same construct in ISO C, the escape sequence continues until the first nonhexadecimal -digit is seen. However, using more than two hexadecimal digits produces +digit is seen. 
@value{COMMONEXT} +However, using more than two hexadecimal digits produces undefined results. (The @samp{\x} escape sequence is not allowed in POSIX @command{awk}.) @@ -3059,7 +3063,7 @@ POSIX @command{awk}.) @cindex backslash (@code{\}), @code{\/} escape sequence @item \/ A literal slash (necessary for regexp constants only). -This expression is used when you want to write a regexp +This sequence is used when you want to write a regexp constant that contains a slash. Because the regexp is delimited by slashes, you need to escape the slash that is part of the pattern, in order to tell @command{awk} to keep processing the rest of the regexp. @@ -3068,7 +3072,7 @@ in order to tell @command{awk} to keep processing the rest of the regexp. @cindex backslash (@code{\}), @code{\"} escape sequence @item \" A literal double quote (necessary for string constants only). -This expression is used when you want to write a string +This sequence is used when you want to write a string constant that contains a double quote. Because the string is delimited by double quotes, you need to escape the quote that is part of the string, in order to tell @command{awk} to keep processing the rest of the string. @@ -3132,7 +3136,7 @@ For example, @code{"a\qc"} is the same as @code{"aqc"}. @command{gawk} warns you about it.) Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars surrounded by whitespace as the field separator. There should be -two backslashes in the string @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.) +two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.) @c I did this! This is why I added the warning. @cindex @command{gawk}, escape sequences @@ -3226,7 +3230,7 @@ if ("line1\nLINE 2" ~ /1$/) @dots{} @cindex @code{.} (period) @cindex period (@code{.}) -@item . +@item . @asis{(period)} This matches any single character, @emph{including} the newline character. For example, @samp{.P} matches any single character followed by a @samp{P} in a string. 
Using @@ -3362,8 +3366,7 @@ constants, @command{gawk} did @emph{not} match interval expressions in regexps. -However, -beginning with version 3.2 @strong{(FIXME: version)} +However, beginning with @value{PVERSION} 4.0, @command{gawk} does match interval expressions by default. This is because compatibility with POSIX has become more important to most @command{gawk} users than compatibility with @@ -3393,7 +3396,7 @@ For example, @samp{/+/} matches a literal plus sign. However, many other versio If @command{gawk} is in compatibility mode (@pxref{Options}), -POSIX character classes and interval expressions are not available in +interval expressions are not available in regular expressions. @c ENDOFRANGE regexpo @@ -3407,14 +3410,12 @@ regular expressions. Within a character list, a @dfn{range expression} consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, using the locale's -collating sequence and character set. For example, in the default C -locale, @samp{[a-dx-z]} is equivalent to @samp{[abcdxyz]}. Many locales -sort characters in dictionary order, and in these locales, -@samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]}; instead it -might be equivalent to @samp{[aBbCcDdxXyYz]}, for example. To obtain -the traditional interpretation of bracket expressions, you can use the C -locale by setting the @env{LC_ALL} environment variable to the value -@samp{C}. +collating sequence and character set. +For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}. + +Unfortunately, providing simple character ranges such as @samp{[a-z]} +usually does not work like you might expect, due to locale-related issues. +This is discussed more fully, in @ref{Locales}. @cindex @code{\} (backslash), in character lists @cindex backslash (@code{\}), in character lists @@ -3446,7 +3447,7 @@ traditional @command{egrep} utility. 
@cindex character lists, character classes @cindex POSIX @command{awk}, character lists and, character classes -@dfn{Character classes} are a new feature introduced in the POSIX standard. +@dfn{Character classes} are a feature introduced in the POSIX standard. A character class is a special notation for describing lists of characters that have a specific attribute, but the actual characters can vary from country to country and/or @@ -3691,11 +3692,11 @@ are allowed. @item @code{--traditional} Traditional Unix @command{awk} regexps are matched. The GNU operators -are not special, interval expressions are not available, nor -are the POSIX character classes (@code{[[:alnum:]]}, etc.). +are not special, and interval expressions are not available. +The POSIX character classes (@code{[[:alnum:]]}, etc.) are supported, +as modern Unix @command{awk} does support them. Characters described by octal and hexadecimal escape sequences are treated literally, even if they represent regexp metacharacters. -Also, @command{gawk} silently skips directories named on the command line. @item @code{--re-interval} Allow interval expressions in regexps, if @option{--traditional} @@ -3724,7 +3725,7 @@ to read. There are two alternatives that you might prefer. One way to perform a case-insensitive match at a particular point in the program is to convert the data to a single case, using the -@code{tolower} or @code{toupper} built-in string functions (which we +@code{tolower()} or @code{toupper()} built-in string functions (which we haven't discussed yet; @pxref{String Functions}). For example: @@ -3772,7 +3773,7 @@ that it is possible, using something like and @samp{IGNORECASE = 0 || /foobar/ @{ @dots{} @}}. However, this is somewhat obscure and we don't recommend it.} -To do this, use either character lists or @code{tolower}. However, one +To do this, use either character lists or @code{tolower()}. 
However, one thing you can do with @code{IGNORECASE} only is dynamically turn case-sensitivity on or off for all the rules at once. @@ -3782,24 +3783,22 @@ case-sensitivity on or off for all the rules at once. Setting @code{IGNORECASE} from the command line is a way to make a program case-insensitive without having to edit it. -Prior to @command{gawk} 3.0, the value of @code{IGNORECASE} -affected regexp operations only. It did not affect string comparison -with @samp{==}, @samp{!=}, and so on. -Beginning with @value{PVERSION} 3.0, both regexp and string comparison -operations are also affected by @code{IGNORECASE}. +Both regexp and string comparison +operations are affected by @code{IGNORECASE}. @c @cindex ISO 8859-1 @c @cindex ISO Latin-1 -Beginning with @command{gawk} 3.0, +In multibyte locales, the equivalences between upper- -and lowercase characters are based on the ISO-8859-1 (ISO Latin-1) +and lowercase characters are tested based on the wide-character values of +the locale's character set. +Otherwise, the characters are tested based +on the ISO-8859-1 (ISO Latin-1) character set. This character set is a superset of the traditional 128 ASCII characters, which also provides a number of characters suitable -for use with European languages. - -As of @command{gawk} 3.1.4, the case equivalences are fully -locale-aware. They are based on the C @code{<ctype.h>} facilities, -such as @code{isalpha()} and @code{toupper()}. +for use with European languages.@footnote{If you don't understand this, +don't worry about it; it just means that @command{gawk} does +the right thing.} The value of @code{IGNORECASE} has no effect if @command{gawk} is in compatibility mode (@pxref{Options}). 
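As a hedged illustration of the portable alternative mentioned above, the following sketch lowercases a copy of each record with `tolower()` before matching, so it does not rely on gawk's `IGNORECASE`. The function name `match_foo` and the sample input are assumptions made for this example only.

```shell
# Hypothetical helper (not from the manual): print records that contain
# "foo" in any mix of upper and lower case, using only POSIX awk features.
match_foo() {
    awk 'tolower($0) ~ /foo/ { print }'
}

printf 'Foo\nbar\nFOOBAR\n' | match_foo
```

Only the value returned by `tolower()` is lowercased, so matching records are printed unchanged.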
@@ -3818,7 +3817,7 @@ Consider the following: echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}' @end example -This example uses the @code{sub} function (which we haven't discussed yet; +This example uses the @code{sub()} function (which we haven't discussed yet; @pxref{String Functions}) to make a change to the input record. Here, the regexp @code{/a+/} indicates ``one or more @samp{a} characters,'' and the replacement @@ -3831,13 +3830,13 @@ match. Thus, all four @samp{a} characters are replaced with @samp{<A>} in this example: @example -$ echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}' +$ @kbd{echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'} @print{} <A>bcd @end example For simple match/no-match tests, this is not so important. But when doing -text matching and substitutions with the @code{match}, @code{sub}, @code{gsub}, -and @code{gensub} functions, it is very important. +text matching and substitutions with the @code{match()}, @code{sub()}, @code{gsub()}, +and @code{gensub()} functions, it is very important. @ifinfo @xref{String Functions}, for more information on these functions. @@ -3875,7 +3874,8 @@ $0 ~ digits_regexp @{ print @} This sets @code{digits_regexp} to a regexp that describes one or more digits, and tests whether the input record matches this regexp. -@strong{Caution:} When using the @samp{~} and @samp{!~} +@quotation NOTE +When using the @samp{~} and @samp{!~} operators, there is a difference between a regexp constant enclosed in slashes and a string constant enclosed in double quotes. If you are going to use a string constant, you have to understand that @@ -3884,6 +3884,7 @@ the string is, in essence, scanned @emph{twice}: the first time when match the string on the lefthand side of the operator with the pattern on the right. This is true of any string-valued expression (such as @code{digits_regexp}, shown previously), not just string constants. 
+@end quotation @cindex regexp constants, slashes vs.@: quotes @cindex @code{\} (backslash), regexp constants @@ -3936,7 +3937,7 @@ Some commercial versions of @command{awk} do not allow the newline character to be used inside a character list for a dynamic regexp: @example -$ awk '$0 ~ "[ \t\n]"' +$ @kbd{awk '$0 ~ "[ \t\n]"'} @error{} awk: newline in character class [ @error{} ]... @error{} source line number 1 @@ -3948,8 +3949,8 @@ $ awk '$0 ~ "[ \t\n]"' But a newline in a regexp constant works with no problem: @example -$ awk '$0 ~ /[ \t\n]/' -here is a sample line +$ @kbd{awk '$0 ~ /[ \t\n]/'} +@kbd{here is a sample line} @print{} here is a sample line @kbd{@value{CTL}-d} @end example @@ -3967,30 +3968,43 @@ occur often in practice, but it's worth noting for future reference. Modern systems support the notion of @dfn{locales}: a way to tell the system about the local character set and language. The current locale setting can affect the way regexp matching works, often -in surprising ways. In particular, many locales do case-insensitive -matching, even when you may have specified characters of only -one particular case. +in surprising ways. -The following example uses the @code{sub} function, which -does text replacement -(@pxref{String Functions}). -Here, the intent is to remove trailing uppercase characters: +For example, in the default C locale, @samp{[a-dx-z]} is equivalent to +@samp{[abcdxyz]}. Many locales sort characters in dictionary order, +and in these locales, @samp{[a-dx-z]} is typically not equivalent to +@samp{[abcdxyz]}; instead it might be equivalent to @samp{[aBbCcdXxYyz]}, +for example. + +This point needs to be emphasized: Much literature teaches that one should +use @samp{[a-z]} to match a lower case character. But on systems with +non-ASCII locales, this also matches all of the upper case characters +except @samp{Z}! This is a continuous cause of confusion, even well +into the twenty-first century. 
+ +To obtain the traditional interpretation of bracket expressions, you can +use the C locale by setting the @env{LC_ALL} environment variable to the +value @samp{C}. However, it is best to just use POSIX character classes, +such as @samp{[[:lower:]]} to match specific classes of characters. + +To demonstrate these issues, the following example uses the @code{sub()} +function, which does text replacement (@pxref{String Functions}). Here, +the intent is to remove trailing uppercase characters: @example -$ echo something1234abc | gawk '@{ sub("[A-Z]*$", ""); print @}' +$ @kbd{echo something1234abc | gawk '@{ sub("[A-Z]*$", ""); print @}'} @print{} something1234 @end example @noindent -This output is unexpected, since the @samp{abc} at the end of @samp{something1234abc} -should not normally match @samp{[A-Z]*}. This result is due to the -locale setting (and thus you may not see it on your system). -There are two fixes. The first is to use the POSIX character -class @samp{[[:upper:]]}, instead of @samp{[A-Z]}. +This output is unexpected, since the @samp{abc} at the end of +@samp{something1234abc} should not normally match @samp{[A-Z]*}. +This result is due to the locale setting (and thus you may not see +it on your system). There are two fixes. The first is to use the +POSIX character class @samp{[[:upper:]]}, instead of @samp{[A-Z]}. (This is preferred, since then your program will work everywhere.) -The second is to change the locale setting in the environment, -before running @command{gawk}, -by using the shell statements: +The second is to change the locale setting in the environment, before +running @command{gawk}, by using the shell statements: @example LANG=C LC_ALL=C @@ -4008,6 +4022,7 @@ Unicode locales, such as @samp{en_US.UTF-8}. (In general, such ranges should be avoided; either list the characters individually, or use a POSIX character class such as @samp{[[:punct:]]}.) +An additional factor relates to splitting records. 
For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant. For other single-character record separators, using @samp{LC_ALL=C} will give you much better performance when reading records. Otherwise, @@ -4025,7 +4040,8 @@ detail in @ref{Conversion}. @cindex input files, reading @cindex input files @cindex @code{FILENAME} variable -In the typical @command{awk} program, all input is read either from the +In the typical @command{awk} program, +@command{awk} reads all input either from the standard input (by default, this is the keyboard, but often it is a pipe from another command) or from files whose names you specify on the @command{awk} command line. If you specify input files, @command{awk} reads them @@ -4081,7 +4097,7 @@ been read so far from the current input file. This value is stored in a built-in variable called @code{FNR}. It is reset to zero when a new -file is started. Another built-in variable, @code{NR}, is the total +file is started. Another built-in variable, @code{NR}, records the total number of input records read so far from all @value{DF}s. It starts at zero, but is never automatically reset to zero. @@ -4124,8 +4140,8 @@ with each slash changed to a newline. Here are the results of running the program on @file{BBS-list}: @example -$ awk 'BEGIN @{ RS = "/" @} -> @{ print $0 @}' BBS-list +$ @kbd{awk 'BEGIN @{ RS = "/" @}} +> @kbd{@{ print $0 @}' BBS-list} @print{} aardvark 555-5553 1200 @print{} 300 B @print{} alpo-net 555-3412 2400 @@ -4225,6 +4241,7 @@ affected. After the end of the record has been determined, @command{gawk} sets the variable @code{RT} to the text in the input that matched @code{RS}. + When using @command{gawk}, the value of @code{RS} is not limited to a one-character string. It can be any regular expression @@ -4344,11 +4361,11 @@ record onto the end of the previous ones. 
@cindex field separators, POSIX and @cindex separators, field, POSIX and When @command{awk} reads an input record, the record is -automatically @dfn{parsed} or separated by the interpreter into chunks +automatically @dfn{parsed} or separated by the @command{awk} utility into chunks called @dfn{fields}. By default, fields are separated by @dfn{whitespace}, like words in a line. Whitespace in @command{awk} means any string of one or more spaces, -tabs, or newlines;@footnote{In POSIX @command{awk}, newlines are not +TABs, or newlines;@footnote{In POSIX @command{awk}, newlines are not considered whitespace for separating fields.} other characters, such as formfeed, vertical tab, etc.@: that are considered whitespace by other languages, are @emph{not} considered @@ -4400,7 +4417,7 @@ when you are not interested in specific fields. Here are some more examples: @example -$ awk '$1 ~ /foo/ @{ print $0 @}' BBS-list +$ @kbd{awk '$1 ~ /foo/ @{ print $0 @}' BBS-list} @print{} fooey 555-1234 2400/1200/300 B @print{} foot 555-6699 1200/300 B @print{} macfoo 555-6480 1200/300 A @@ -4420,7 +4437,7 @@ looks for @samp{foo} in @emph{the entire record} and prints the first field and the last field for each matching input record: @example -$ awk '/foo/ @{ print $1, $NF @}' BBS-list +$ @kbd{awk '/foo/ @{ print $1, $NF @}' BBS-list} @print{} fooey B @print{} foot B @print{} macfoo A @@ -4492,8 +4509,8 @@ modifies the input file.) 
Consider the following example and its output: @example -$ awk '@{ nboxes = $3 ; $3 = $3 - 10 -> print nboxes, $3 @}' inventory-shipped +$ @kbd{awk '@{ nboxes = $3 ; $3 = $3 - 10} +> @kbd{print nboxes, $3 @}' inventory-shipped} @print{} 25 15 @print{} 32 22 @print{} 24 14 @@ -4525,7 +4542,7 @@ prints a copy of the input file, with 10 subtracted from the second field of each line: @example -$ awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped +$ @kbd{awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped} @print{} Jan 3 25 15 115 @print{} Feb 5 32 24 226 @print{} Mar 5 24 34 228 @@ -4594,8 +4611,8 @@ value of @code{$0} but does not change the value of @code{NF}, even when you assign the empty string to a field. For example: @example -$ echo a b c d | awk '@{ OFS = ":"; $2 = "" -> print $0; print NF @}' +$ @kbd{echo a b c d | awk '@{ OFS = ":"; $2 = ""} +> @kbd{print $0; print NF @}'} @print{} a::c:d @print{} 4 @end example @@ -4606,8 +4623,8 @@ the two colons between @samp{a} and @samp{c}. This example shows what happens if you create a new field: @example -$ echo a b c d | awk '@{ OFS = ":"; $2 = ""; $6 = "new" -> print $0; print NF @}' +$ @kbd{echo a b c d | awk '@{ OFS = ":"; $2 = ""; $6 = "new"} +> @kbd{print $0; print NF @}'} @print{} a::c:d::new @print{} 6 @end example @@ -4617,7 +4634,6 @@ The intervening field, @code{$5}, is created with an empty value (indicated by the second pair of adjacent colons), and @code{NF} is updated with the value six. -@strong{FIXME:} Verify that this is in POSIX. @cindex dark corner, @code{NF} variable, decrementing @cindex @code{NF} variable, decrementing Decrementing @code{NF} throws away the values of the fields @@ -4654,7 +4670,7 @@ There is a flip side to the relationship between @code{$0} and the fields. Any assignment to @code{$0} causes the record to be reparsed into fields using the @emph{current} value of @code{FS}. 
This also applies to any built-in function that updates @code{$0}, -such as @code{sub} and @code{gsub} +such as @code{sub()} and @code{gsub()} (@pxref{String Functions}). @c ENDOFRANGE ficon @@ -4662,6 +4678,7 @@ such as @code{sub} and @code{gsub} @section Specifying How Fields Are Separated @menu +* Default Field Splitting:: How fields are normally separated. * Regexp Field Splitting:: Using regexps as the field separator. * Single Character Fields:: Making each character a separate field. * Command Line Field Separator:: Setting @code{FS} from the command-line. @@ -4696,7 +4713,7 @@ Note the leading spaces in the values of the second and third fields. The field separator is represented by the built-in variable @code{FS}. Shell programmers take note: @command{awk} does @emph{not} use the name @code{IFS} that is used by the POSIX-compliant shells (such as -the Unix Bourne shell, @command{sh}, or @command{bash}). +the Unix Bourne shell, @command{sh}, or Bash). @cindex @code{FS} variable, changing value of The value of @code{FS} can be changed in the @command{awk} program with the @@ -4746,6 +4763,10 @@ separator characters carefully to prevent such problems. (If the data is not in a form that is easy to process, perhaps you can massage it first with a separate @command{awk} program.) + +@node Default Field Splitting +@subsection Whitespace Normally Separates Fields + @cindex newlines, as field separators @cindex whitespace, as field separators Fields are normally separated by whitespace sequences @@ -4810,7 +4831,7 @@ the record and then decides where the fields are. 
For example, the following pipeline prints @samp{b}: @example -$ echo ' a b c d ' | awk '@{ print $2 @}' +$ @kbd{echo ' a b c d ' | awk '@{ print $2 @}'} @print{} b @end example @@ -4819,8 +4840,8 @@ However, this pipeline prints @samp{a} (note the extra spaces around each letter): @example -$ echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t\n]+" @} -> @{ print $2 @}' +$ @kbd{echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t\n]+" @}} +> @kbd{@{ print $2 @}'} @print{} a @end example @@ -4834,7 +4855,7 @@ The stripping of leading and trailing whitespace also comes into play whenever @code{$0} is recomputed. For instance, study this pipeline: @example -$ echo ' a b c d' | awk '@{ print; $2 = $2; print @}' +$ @kbd{echo ' a b c d' | awk '@{ print; $2 = $2; print @}'} @print{} a b c d @print{} a b c d @end example @@ -4860,28 +4881,19 @@ should not rely on any specific behavior in your programs. @value{DARKCORNER} As a point of information, the Bell Labs @command{awk} allows @samp{^} -to match only at the beginning of the record. Versions of @command{gawk} -after 3.1.6 also work this way. For example: +to match only at the beginning of the record. @command{gawk} +also works this way. For example: @example -$ echo 'xxAA xxBxx C' | -> nawk -F '(^x+)|( +)' '@{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i @}' -@print{} --><-- -@print{} -->AA<-- -@print{} -->xxBxx<-- -@print{} -->C<-- - -$ echo 'xxAA xxBxx C' | -> gawk-3.1.6 -F '(^x+)|( +)' '@{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i @}' +$ @kbd{echo 'xxAA xxBxx C' |} +> @kbd{gawk -F '(^x+)|( +)' '@{ for (i = 1; i <= NF; i++)} +> @kbd{printf "-->%s<--\n", $i @}'} @print{} --><-- @print{} -->AA<-- @print{} --><-- @print{} -->Bxx<-- @print{} -->C<-- @end example - -@noindent -As mentioned, @command{gawk} now behaves like the Bell Labs @command{awk}. @c ENDOFRANGE regexpfs @c ENDOFRANGE fsregexp @@ -4898,11 +4910,11 @@ each individual character in the record becomes a separate field. 
For example: @example -$ echo a b | gawk 'BEGIN @{ FS = "" @} -> @{ -> for (i = 1; i <= NF; i = i + 1) -> print "Field", i, "is", $i -> @}' +$ @kbd{echo a b | gawk 'BEGIN @{ FS = "" @}} +> @kbd{@{} +> @kbd{for (i = 1; i <= NF; i = i + 1)} +> @kbd{print "Field", i, "is", $i} +> @kbd{@}'} @print{} Field 1 is a @print{} Field 2 is @print{} Field 3 is b @@ -4988,7 +5000,7 @@ the first three digits of their phone numbers: @c tweaked to make the tex output look better in @smallbook @example -$ awk -F- -f baud.awk BBS-list +$ @kbd{awk -F- -f baud.awk BBS-list} @print{} aardvark 555 @print{} alpo @print{} barfly 555 @@ -5163,7 +5175,7 @@ the first reading.) @cindex data, fixed-width @cindex fixed-width data @cindex advanced features, fixed-width data -@command{gawk} @value{PVERSION} 2.13 introduced a facility for dealing with +@command{gawk} provides a facility for dealing with fixed-width fields with no distinctive field separator. For example, data of this nature arises in the input for old Fortran programs where numbers are run together, or in the output of programs that did not @@ -5173,7 +5185,7 @@ An example of the latter is a table where all the columns are lined up by the use of a variable number of spaces and @emph{empty fields are just spaces}. Clearly, @command{awk}'s normal field splitting based on @code{FS} does not work well in this case. Although a portable @command{awk} program -can use a series of @code{substr} calls on @code{$0} +can use a series of @code{substr()} calls on @code{$0} (@pxref{String Functions}), this is awkward and inefficient for a large number of fields. @@ -5302,13 +5314,13 @@ the first reading.) @cindex advanced features, specifying field content Normally, when using @code{FS}, @command{gawk} defines the fields as the parts of the record that occur in between each field separator. 
In other -words, @code{FS} defines what a field @emph{is not}, and not what a field +words, @code{FS} defines what a field @emph{is not}, instead of what a field @emph{is}. However, there are times when you really want to define the fields by what they are, and not by what they are not. The most notorious such case -is so-called Comma-Separated-Value (CSV) data. Many spreadsheet programs, +is so-called @dfn{comma separated value} (CSV) data. Many spreadsheet programs, for example, can export their data into text files, where each record is terminated with a newline, and fields are separated by commas. If only commas separated the data, there wouldn't be an issue. The problem comes when @@ -5393,6 +5405,17 @@ the @code{FPAT} mechanism provides an elegant solution for the majority of cases, and the @command{gawk} maintainer is satisfied with that. @end quotation +As written, the regexp used for @code{FPAT} requires that each field +have at least one character. A straightforward modification +(changing the first @samp{+} to @samp{*}) allows fields to be empty: + +@example +FPAT = "([^,]*)|(\"[^\"]+\")" +@end example + +Finally, the @code{patsplit()} function makes the same functionality +available for splitting regular strings (@pxref{String Functions}). + @node Multiple Line @section Multiple-Line Records @@ -5464,7 +5487,7 @@ The original motivation for this special exception was probably to provide useful behavior in the default case (i.e., @code{FS} is equal to @w{@code{" "}}). This feature can be a problem if you really don't want the newline character to separate fields, because there is no way to -prevent it. However, you can work around this by using the @code{split} +prevent it. However, you can work around this by using the @code{split()} function to break up the record manually (@pxref{String Functions}). 
If you have a single character field separator, you can work around @@ -5605,7 +5628,8 @@ In the following examples, @var{command} stands for a string value that represents a shell command. @quotation NOTE -When @option{--sandbox} is specified, reading lines from files, pipes and coprocesses is disabled. +When @option{--sandbox} is specified (@pxref{Options}), +reading lines from files, pipes and coprocesses is disabled. @end quotation @menu @@ -5650,7 +5674,7 @@ processing on the next record @emph{right now}. For example: u = index($0, "*/") offset = 0 @} - # substr expression will be "" if */ + # substr() expression will be "" if */ # occurred at end of line $0 = tmp substr($0, offset + u + 2) @} @@ -5771,7 +5795,7 @@ According to POSIX, @samp{getline < @var{expression}} is ambiguous if @samp{$}; for example, @samp{getline < dir "/" file} is ambiguous because the concatenation operator is not parenthesized. You should write it as @samp{getline < (dir "/" file)} if you want your program -to be portable to other @command{awk} implementations. +to be portable to all @command{awk} implementations. @node Getline/Variable/File @subsection Using @code{getline} into a Variable from a File @@ -5806,8 +5830,8 @@ Note here how the name of the extra input file is not built into the program; it is taken directly from the data, specifically from the second field on the @samp{@@include} line. -@cindex @code{close} function -The @code{close} function is called to ensure that if two identical +@cindex @code{close()} function +The @code{close()} function is called to ensure that if two identical @samp{@@include} lines appear in the input, the entire specified file is included twice. @xref{Close Files And Pipes}. 
@@ -5839,7 +5863,7 @@ produced by running the rest of the line as a shell command: @example @{ if ($1 == "@@execute") @{ - tmp = substr($0, 10) + tmp = substr($0, 10) # Remove "@@execute" while ((tmp | getline) > 0) print close(tmp) @@ -5849,8 +5873,8 @@ produced by running the rest of the line as a shell command: @end example @noindent -@cindex @code{close} function -The @code{close} function is called to ensure that if two identical +@cindex @code{close()} function +The @code{close()} function is called to ensure that if two identical @samp{@@execute} lines appear in the input, the command is run for each one. @ifnottex @@ -5900,18 +5924,17 @@ According to POSIX, @samp{@var{expression} | getline} is ambiguous if @samp{$}---for example, @samp{@w{"echo "} "date" | getline} is ambiguous because the concatenation operator is not parenthesized. You should write it as @samp{(@w{"echo "} "date") | getline} if you want your program -to be portable to other @command{awk} implementations. +to be portable to all @command{awk} implementations. @quotation NOTE Unfortunately, @command{gawk} has not been consistent in its treatment -of a construct like @samp{@w{"echo "} "date" | getline}. Up to and including -@value{PVERSION} 3.1.1 of @command{gawk}, it was treated as +of a construct like @samp{@w{"echo "} "date" | getline}. +Most versions, including the current version, treat it as @samp{@w{("echo "} "date") | getline}. (This is how Unix @command{awk} behaves.) -From 3.1.2 through 3.1.5, it was treated as +Some versions changed and treated it as @samp{@w{"echo "} ("date" | getline)}. (This is how @command{mawk} behaves.) -Starting with @value{PVERSION} 3.1.6, the earlier behavior was reinstated. In short, @emph{always} use explicit parentheses, and then you won't have to worry. @end quotation @@ -6055,6 +6078,13 @@ current input file. However, by not using a variable, @code{$0} and @code{NR} are still updated.
If you're doing this, it's probably by accident, and you should reconsider what it is you're trying to accomplish. + +@item +@ref{Getline Summary}, presents a table summarizing the +@code{getline} variants and which variables they can affect. +It is worth noting that those variants which do not use redirection +can cause @code{FILENAME} to be updated if they cause +@command{awk} to start reading a new input file. @end itemize @node Getline Summary @@ -6067,16 +6097,16 @@ listing which built-in variables are set by each one. @float Table,table-getline-variants @caption{getline Variants and What They Set} -@multitable @columnfractions .35 .65 -@headitem Variant @tab Effect -@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, and @code{NR} -@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, and @code{NR} -@item @code{getline <} @var{file} @tab Sets @code{$0} and @code{NF} -@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} -@item @var{command} @code{| getline} @tab Sets @code{$0} and @code{NF} -@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} -@item @var{command} @code{|& getline} @tab Sets @code{$0} and @code{NF}. This is a @command{gawk} extension -@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var}. 
This is a @command{gawk} extension +@multitable @columnfractions .33 .43 .22 +@headitem Variant @tab Effect @tab Standard / Extension +@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, and @code{NR} @tab Standard +@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, and @code{NR} @tab Standard +@item @code{getline <} @var{file} @tab Sets @code{$0} and @code{NF} @tab Standard +@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} @tab Standard +@item @var{command} @code{| getline} @tab Sets @code{$0} and @code{NF} @tab Standard +@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} @tab Standard +@item @var{command} @code{|& getline} @tab Sets @code{$0} and @code{NF} @tab Extension +@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} @tab Extension @end multitable @end float @c ENDOFRANGE getl @@ -6088,39 +6118,36 @@ listing which built-in variables are set by each one. @cindex @code{BEGINFILE} special pattern @cindex @code{ENDFILE} special pattern -@strong{FIXME:} Get the version right. @quotation NOTE -This @value{SECTION} describes a @command{gawk}-specific feature -added in @command{gawk} 3.X. +This @value{SECTION} describes a @command{gawk}-specific feature. @end quotation -Two special kinds of rule, @code{BEGINFILE} and @code{ENDFILE}, give you ``hooks'' -into @command{gawk}'s command-line file processing loop. As with the @code{BEGIN} -and @code{END} rules (@pxref{BEGIN/END}), -all @code{BEGINFILE} rules in a program are merged, -in the order they are read by @command{gawk}, and all @code{ENDFILE} rules are -merged as well. +Two special kinds of rule, @code{BEGINFILE} and @code{ENDFILE}, give +you ``hooks'' into @command{gawk}'s command-line file processing loop. +As with the @code{BEGIN} and @code{END} rules (@pxref{BEGIN/END}), all +@code{BEGINFILE} rules in a program are merged, in the order they are +read by @command{gawk}, and all @code{ENDFILE} rules are merged as well.
-The body of the @code{BEGINFILE} rules is executed just before @command{gawk} -reads the first record from a file. @code{FILENAME} is set to the name of the current file, -and @code{FNR} is set to zero. +The body of the @code{BEGINFILE} rules is executed just before +@command{gawk} reads the first record from a file. @code{FILENAME} +is set to the name of the current file, and @code{FNR} is set to zero. -The @code{BEGINFILE} rule provides you the opportunity for two -tasks that would otherwise be difficult or impossible to perform: +The @code{BEGINFILE} rule provides you the opportunity for two tasks +that would otherwise be difficult or impossible to perform: @enumerate 1 @item -You can test if the file is readable. -Normally, it is a fatal error if a file named on the command line cannot be -opened for reading. However, you can -bypass the fatal error and move on to the next file on the command line. +You can test if the file is readable. Normally, it is a fatal error if a +file named on the command line cannot be opened for reading. However, +you can bypass the fatal error and move on to the next file on the +command line. -You do this by checking if -the @code{ERRNO} variable is not -the empty string; if so, then @command{gawk} was not able to open the file. In -this case, your program can execute the @code{nextfile} statement (@pxref{Nextfile Statement}). -This casuses @command{gawk} to skip the file entirely. -Otherwise, @command{gawk} will exit with the usual fatal error. +You do this by checking if the @code{ERRNO} variable is not the empty +string; if so, then @command{gawk} was not able to open the file. In +this case, your program can execute the @code{nextfile} statement +(@pxref{Nextfile Statement}). This causes @command{gawk} to skip +the file entirely. Otherwise, @command{gawk} exits with the usual +fatal error.
@item If you have written extensions that modify the record handling (by inserting @@ -6130,42 +6157,42 @@ currently used only by the @uref{http://xgawk.sourceforge.net, XMLgawk project}. @end enumerate The @code{ENDFILE} rule is called when @command{gawk} has finished processing -the last record in an input file. It will be called before any @code{END} rules. +the last record in an input file. For the last input file, +it will be called before any @code{END} rules. -Normally, when an error occurs when reading input in the normal input processing -loop, the error is fatal. However, if an @code{ENDFILE} rule is present, the -error becomes non-fatal, and instead @code{ERRNO} is set. This makes it possible -to catch and process I/O errors at the level of the @command{awk} program. +Normally, when an error occurs when reading input in the normal input +processing loop, the error is fatal. However, if an @code{ENDFILE} +rule is present, the error becomes non-fatal, and instead @code{ERRNO} +is set. This makes it possible to catch and process I/O errors at the +level of the @command{awk} program. -The @code{next} statement is not allowed inside either a @code{BEGINFILE} or -and @code{ENDFILE} rule. The @code{nextfile} statement is allowed only inside -a @code{BEGINFILE} rule, but not inside an @code{ENDFILE} rule. +The @code{next} statement (@pxref{Next Statement}) is not allowed inside +either a @code{BEGINFILE} or an @code{ENDFILE} rule. The @code{nextfile} +statement (@pxref{Nextfile Statement}) is allowed only inside a +@code{BEGINFILE} rule, but not inside an @code{ENDFILE} rule. -The @code{getline} statement (@pxref{Getline}) is restricted inside both @code{BEGINFILE} -and @code{ENDFILE}. Only the @samp{getline @var{variable} < @var{file}} form is -allowed. +The @code{getline} statement (@pxref{Getline}) is restricted inside +both @code{BEGINFILE} and @code{ENDFILE}. Only the @samp{getline +@var{variable} < @var{file}} form is allowed.
@code{BEGINFILE} and @code{ENDFILE} are @command{gawk} extensions. -In most other @command{awk} implementations, -or if @command{gawk} is in compatibility mode -(@pxref{Options}), -they are not special. - +In most other @command{awk} implementations, or if @command{gawk} is in +compatibility mode (@pxref{Options}), they are not special. @node Command line directories @section Directories On The Command Line @cindex directories, command line @cindex command line, directories on -According to POSIX, files named on the @command{awk} command line must be -text files. The behavior is ``undefined'' if they are not. Most versions -of @command{awk} treat a directory on the command line as a fatal error. +According to the POSIX standard, files named on the @command{awk} +command line must be text files. It is a fatal error if they are not. +Most versions of @command{awk} treat a directory on the command line as +a fatal error. -@strong{FIXME:} Get the version right. -Starting with version 3.x of @command{gawk}, a directory on the command line -produces a warning, but is otherwise skipped. If either of the @option{--posix} -or @option{--traditional} options is given, then @command{gawk} reverts to -treating directories on the command line as a fatal error. +By default, @command{gawk} produces a warning for a directory on the +command line, but otherwise ignores it. If either of the @option{--posix} +or @option{--traditional} options is given, then @command{gawk} reverts +to treating a directory on the command line as a fatal error. @node Printing @chapter Printing Output @@ -6192,7 +6219,7 @@ For printing with specifications, you need the @code{printf} statement Besides basic and formatted printing, this @value{CHAPTER} also covers I/O redirections to files and pipes, introduces the special @value{FN}s that @command{gawk} processes internally, -and discusses the @code{close} built-in function. +and discusses the @code{close()} built-in function. 
@menu * Print:: The @code{print} statement. @@ -6211,7 +6238,7 @@ and discusses the @code{close} built-in function. @node Print @section The @code{print} Statement -The @code{print} statement is used to produce output with simple, standardized +The @code{print} statement is used for producing output with simple, standardized formatting. Specify only the strings or numbers to print, in a list separated by commas. They are output, separated by single spaces, followed by a newline. The statement looks like this: @@ -6223,7 +6250,7 @@ print @var{item1}, @var{item2}, @dots{} @noindent The entire list of items may be optionally enclosed in parentheses. The parentheses are necessary if any of the item expressions uses the @samp{>} -relational operator; otherwise it could be confused with a redirection +relational operator; otherwise it could be confused with an output redirection (@pxref{Redirection}). The items to print can be constant strings or numbers, fields of the @@ -6243,10 +6270,10 @@ expression, and you will probably get an error. Keep in mind that a space is printed between any two items. @node Print Examples -@section Examples of @code{print} Statements +@section @code{print} Statement Examples Each @code{print} statement makes at least one line of output. However, it -isn't limited to only one line. If an item value is a string that contains a +isn't limited to only one line. If an item value is a string containing a newline, the newline is output along with the rest of the string. A single @code{print} statement can make any number of lines this way. 
@@ -6256,7 +6283,7 @@ The following is an example of printing a string that contains embedded newlines character; @pxref{Escape Sequences}): @example -$ awk 'BEGIN @{ print "line one\nline two\nline three" @}' +$ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'} @print{} line one @print{} line two @print{} line three @@ -6268,7 +6295,7 @@ prints the first two fields of each input record, with a space between them: @example -$ awk '@{ print $1, $2 @}' inventory-shipped +$ @kbd{awk '@{ print $1, $2 @}' inventory-shipped} @print{} Jan 13 @print{} Feb 15 @print{} Mar 15 @@ -6284,7 +6311,7 @@ juxtaposing two string expressions in @command{awk} means to concatenate them. Here is the same program, without the comma: @example -$ awk '@{ print $1 $2 @}' inventory-shipped +$ @kbd{awk '@{ print $1 $2 @}' inventory-shipped} @print{} Jan13 @print{} Feb15 @print{} Mar15 @@ -6396,8 +6423,8 @@ program by using a new value of @code{OFS}. @end ignore @example -$ awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @} -> @{ print $1, $2 @}' BBS-list +$ @kbd{awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}} +> @kbd{@{ print $1, $2 @}' BBS-list} @print{} aardvark;555-5553 @print{} @print{} alpo-net;555-3412 @@ -6407,29 +6434,29 @@ $ awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @} @end example If the value of @code{ORS} does not contain a newline, the program's output -is run together on a single line. +runs together on a single line. @node OFMT @section Controlling Numeric Output with @code{print} @cindex numeric, output format @cindex formats@comma{} numeric output -When the @code{print} statement is used to print numeric values, +When printing numeric values with the @code{print} statement, @command{awk} internally converts the number to a string of characters -and prints that string. @command{awk} uses the @code{sprintf} function +and prints that string. @command{awk} uses the @code{sprintf()} function to do this conversion (@pxref{String Functions}). 
-For now, it suffices to say that the @code{sprintf} +For now, it suffices to say that the @code{sprintf()} function accepts a @dfn{format specification} that tells it how to format numbers (or strings), and that there are a number of different ways in which numbers can be formatted. The different format specifications are discussed more fully in @ref{Control Letters}. -@cindex @code{sprintf} function +@cindex @code{sprintf()} function @cindex @code{OFMT} variable @cindex output, format specifier@comma{} @code{OFMT} The built-in variable @code{OFMT} contains the default format specification -that @code{print} uses with @code{sprintf} when it wants to convert a +that @code{print} uses with @code{sprintf()} when it wants to convert a number to a string for printing. The default value of @code{OFMT} is @code{"%.6g"}. The way @code{print} prints numbers can be changed @@ -6437,9 +6464,9 @@ by supplying different format specifications as the value of @code{OFMT}, as shown in the following example: @example -$ awk 'BEGIN @{ -> OFMT = "%.0f" # print numbers as integers (rounds) -> print 17.23, 17.54 @}' +$ @kbd{awk 'BEGIN @{} +> @kbd{OFMT = "%.0f" # print numbers as integers (rounds)} +> @kbd{print 17.23, 17.54 @}'} @print{} 17 18 @end example @@ -6459,12 +6486,12 @@ if @code{OFMT} contains anything but a floating-point conversion specification. @cindex output, formatted @cindex formatting output For more precise control over the output format than what is -normally provided by @code{print}, use @code{printf}. -@code{printf} can be used to +provided by @code{print}, use @code{printf}. +With @code{printf} you can specify the width to use for each item, as well as various formatting choices for numbers (such as what output base to use, whether to print an exponent, whether to print a sign, and how many digits to print -after the decimal point). This is done by supplying a string, called +after the decimal point). 
You do this by supplying a string, called the @dfn{format string}, that controls how and where to print the other arguments. @@ -6488,7 +6515,7 @@ printf @var{format}, @var{item1}, @var{item2}, @dots{} @noindent The entire list of arguments may optionally be enclosed in parentheses. The parentheses are necessary if any of the item expressions use the @samp{>} -relational operator; otherwise, it can be confused with a redirection +relational operator; otherwise, it can be confused with an output redirection (@pxref{Redirection}). @cindex format strings @@ -6510,17 +6537,17 @@ The output separator variables @code{OFS} and @code{ORS} have no effect on @code{printf} statements. For example: @example -$ awk 'BEGIN @{ -> ORS = "\nOUCH!\n"; OFS = "+" -> msg = "Dont Panic!" -> printf "%s\n", msg -> @}' +$ @kbd{awk 'BEGIN @{} +> @kbd{ORS = "\nOUCH!\n"; OFS = "+"} +> @kbd{msg = "Dont Panic!"} +> @kbd{printf "%s\n", msg} +> @kbd{@}'} @print{} Dont Panic! @end example @noindent -Here, neither the @samp{+} nor the @samp{OUCH} appear when -the message is printed. +Here, neither the @samp{+} nor the @samp{OUCH} appear in +the output message. @node Control Letters @subsection Format-Control Letters @@ -6536,13 +6563,14 @@ the field width. Here is a list of the format-control letters: @table @code @item %c -This prints a number as an ASCII character; thus, @samp{printf "%c", -65} outputs the letter @samp{A}. (The output for a string value is -the first character of the string.) +Print a number as an ASCII character; thus, @samp{printf "%c", +65} outputs the letter @samp{A}. The output for a string value is +the first character of the string. @cindex dark corner, format-control characters @cindex @command{gawk}, format-control characters @quotation NOTE +@ignore The @samp{%c} format does @emph{not} handle values outside the range 0--255. On most systems, values from 0--127 are within the range of ASCII and will yield an ASCII character. 
Values in the range 128--255 @@ -6551,15 +6579,28 @@ System 390 (IBM architecture mainframe) systems use 8-bit characters, and thus values from 0--255 yield the corresponding EBCDIC character. Any value above 255 is treated as modulo 255; i.e., the lowest eight bits of the value are used. The locale and character set are always ignored. +@end ignore +The POSIX standard says the first character of a string is printed. +In locales with multibyte characters, @command{gawk} attempts to +convert the leading bytes of the string into a valid wide character +and then to print the multibyte encoding of that character. +Similarly, when printing a numeric value, @command{gawk} allows the +value to be within the numeric range of values that can be held +in a wide character. + +Other @command{awk} versions generally restrict themselves to printing +the first byte of a string or to numeric values within the range of +a single byte (0--255). @end quotation @item %d@r{,} %i -These are equivalent; they both print a decimal integer. +Print a decimal integer. +The two control letters are equivalent. (The @samp{%i} specification is for compatibility with ISO C.) @item %e@r{,} %E -These print a number in scientific (exponential) notation; +Print a number in scientific (exponential) notation; for example: @example @@ -6574,7 +6615,7 @@ discussed in the next @value{SUBSECTION}.) @samp{%E} uses @samp{E} instead of @samp{e} in the output. @item %f -This prints a number in floating-point notation. +Print a number in floating-point notation. For example: @example @@ -6603,29 +6644,29 @@ The @code{%F} format is a POSIX extension to ISO C; not all systems support it. On those that don't, @command{gawk} uses @code{%f} instead. 
@item %g@r{,} %G -These print a number in either scientific notation or in floating-point +Print a number in either scientific notation or in floating-point notation, whichever uses fewer characters; if the result is printed in scientific notation, @samp{%G} uses @samp{E} instead of @samp{e}. @item %o -This prints an unsigned octal integer. +Print an unsigned octal integer. @item %s -This prints a string. +Print a string. @item %u -This prints an unsigned decimal integer. +Print an unsigned decimal integer. (This format is of marginal use, because all numbers in @command{awk} are floating-point; it is provided primarily for compatibility with C.) @item %x@r{,} %X -These print an unsigned hexadecimal integer; +Print an unsigned hexadecimal integer; @samp{%X} uses the letters @samp{A} through @samp{F} instead of @samp{a} through @samp{f}. @item %% -This isn't a format-control letter, but it does have meaning---the -sequence @samp{%%} outputs one @samp{%}; it does not consume an +Print a single @samp{%}. +This does not consume an argument and it ignores any modifiers. @end table @@ -6712,8 +6753,8 @@ Use an ``alternate form'' for certain control letters. For @samp{%o}, supply a leading zero. For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for a nonzero result. -For @samp{%e}, @samp{%E}, and @samp{%f}, the result always contains a -decimal point. +For @samp{%e}, @samp{%E}, @samp{%f}, and @samp{%F}, the result always +contains a decimal point. For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result. @item 0 @@ -6782,15 +6823,15 @@ specifies the precision to use when printing. The meaning of the precision varies by control letter: @table @asis -@item @code{%e}, @code{%E}, @code{%f} +@item @code{%d}, @code{%i}, @code{%o}, @code{%u}, @code{%x}, @code{%X} +Minimum number of digits to print. + +@item @code{%e}, @code{%E}, @code{%f}, @code{%F} Number of digits to the right of the decimal point. 
@item @code{%g}, @code{%G} Maximum number of significant digits. -@item @code{%d}, @code{%i}, @code{%o}, @code{%u}, @code{%x}, @code{%X} -Minimum number of digits to print. - @item @code{%s} Maximum number of characters from the string that should print. @end table @@ -6847,7 +6888,7 @@ This is not particularly easy to read but it does work. C programmers may be used to supplying additional @samp{l}, @samp{L}, and @samp{h} modifiers in @code{printf} format strings. These are not valid in @command{awk}. -Most @command{awk} implementations silently ignore these modifiers. +Most @command{awk} implementations silently ignore them. If @option{--lint} is provided on the command line (@pxref{Options}), @command{gawk} warns about their use. If @option{--posix} is supplied, @@ -6857,7 +6898,7 @@ their use is a fatal error. @node Printf Examples @subsection Examples Using @code{printf} -The following is a simple example of +The following simple example shows how to use @code{printf} to make an aligned table: @example @@ -6873,7 +6914,7 @@ produces an aligned two-column table of names and phone numbers, as shown here: @example -$ awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list +$ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list} @print{} aardvark 555-5553 @print{} alpo-net 555-3412 @print{} barfly 555-7685 @@ -6908,7 +6949,7 @@ awk 'BEGIN @{ print "Name Number" @{ printf "%-10s %s\n", $1, $2 @}' BBS-list @end example -The above example mixed @code{print} and @code{printf} statements in +The above example mixes @code{print} and @code{printf} statements in the same program. 
Using just @code{printf} statements can produce the same results: @@ -6946,15 +6987,16 @@ on the @code{print} statement @cindex output redirection @cindex redirection of output -@cindex @code{--sandbox} option, output redirection with @command{print}, @command{printf} +@cindex @code{--sandbox} option, output redirection with @code{print}, @code{printf} So far, the output from @code{print} and @code{printf} has gone to the standard -output, usually the terminal. Both @code{print} and @code{printf} can +output, usually the screen. Both @code{print} and @code{printf} can also send their output to other places. This is called @dfn{redirection}. @quotation NOTE -When @option{--sandbox} is specified, redirecting output to files and pipes is disabled. +When @option{--sandbox} is specified (@pxref{Options}), +redirecting output to files and pipes is disabled. @end quotation A redirection appears after the @code{print} or @code{printf} statement. @@ -6974,7 +7016,7 @@ but they work identically for @code{printf}: @cindex right angle bracket (@code{>}), @code{>} operator (I/O) @cindex operators, input/output @item print @var{items} > @var{output-file} -This type of redirection prints the items into the output file named +This redirection prints the items into the output file named @var{output-file}. The @value{FN} @var{output-file} can be any expression. Its value is changed to a string and then used as a @value{FN} (@pxref{Expressions}). @@ -6989,13 +7031,13 @@ file named @file{name-list}, and a list of phone numbers to another file named @file{phone-list}: @example -$ awk '@{ print $2 > "phone-list" -> print $1 > "name-list" @}' BBS-list -$ cat phone-list +$ @kbd{awk '@{ print $2 > "phone-list"} +> @kbd{print $1 > "name-list" @}' BBS-list} +$ @kbd{cat phone-list} @print{} 555-5553 @print{} 555-3412 @dots{} -$ cat name-list +$ @kbd{cat name-list} @print{} aardvark @print{} alpo-net @dots{} @@ -7007,7 +7049,7 @@ Each output file contains one name or number per line. 
@cindex @code{>} (right angle bracket), @code{>>} operator (I/O) @cindex right angle bracket (@code{>}), @code{>>} operator (I/O) @item print @var{items} >> @var{output-file} -This type of redirection prints the items into the pre-existing output file +This redirection prints the items into the pre-existing output file named @var{output-file}. The difference between this and the single-@samp{>} redirection is that the old contents (if any) of @var{output-file} are not erased. Instead, the @command{awk} output is @@ -7018,8 +7060,8 @@ If @var{output-file} does not exist, then it is created. @cindex pipes, output @cindex output, pipes @item print @var{items} | @var{command} -It is also possible to send output to another program through a pipe -instead of into a file. This type of redirection opens a pipe to +It is possible to send output to another program through a pipe +instead of into a file. This redirection opens a pipe to @var{command}, and writes the values of @var{items} through this pipe to another process created to execute @var{command}. @@ -7062,7 +7104,7 @@ The message is built using string concatenation and saved in the variable (The parentheses group the items to concatenate---see @ref{Concatenation}.) -The @code{close} function is called here because it's a good idea to close +The @code{close()} function is called here because it's a good idea to close the pipe as soon as all the intended output has been sent to it. @xref{Close Files And Pipes}, for more information. @@ -7079,7 +7121,7 @@ every time. @cindex operators, input/output @cindex differences in @command{awk} and @command{gawk}, input/output operators @item print @var{items} |& @var{command} -This type of redirection prints the items to the input of @var{command}. +This redirection prints the items to the input of @var{command}. The difference between this and the single-@samp{|} redirection is that the output from @var{command} can be read with @code{getline}. 
@@ -7155,7 +7197,7 @@ all lowercase. The following program is both simple and efficient: END @{ close("sh") @} @end example -The @code{tolower} function returns its argument string with all +The @code{tolower()} function returns its argument string with all uppercase characters converted to lowercase (@pxref{String Functions}). The program builds up a list of command lines, @@ -7170,8 +7212,8 @@ It then sends the list to the shell for execution. @cindex @command{gawk}, @value{FN}s in @command{gawk} provides a number of special @value{FN}s that it interprets -internally. These @value{FN}s provide access to standard file descriptors, -process-related information, and TCP/IP networking. +internally. These @value{FN}s provide access to standard file descriptors +and TCP/IP networking. @menu * Special FD:: Special files for I/O. @@ -7192,7 +7234,7 @@ process-related information, and TCP/IP networking. Running programs conventionally have three input and output streams already available to them for reading and writing. These are known as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error -output}. These streams are, by default, connected to your terminal, but +output}. These streams are, by default, connected to your screen, but they are often redirected with the shell, via the @samp{<}, @samp{<<}, @samp{>}, @samp{>>}, @samp{>&}, and @samp{|} operators. Standard error is typically used for writing error messages; the reason there are two separate @@ -7214,17 +7256,23 @@ standard error stream that it inherits from the @command{awk} process. This is far from elegant, and it is also inefficient, because it requires a separate process. So people writing @command{awk} programs often don't do this. Instead, they send the error messages to the -terminal, like this: +screen, like this: @example print "Serious error detected!" 
> "/dev/tty" @end example @noindent +(@file{/dev/tty} is a special file supplied by the operating system +that is connected to your keyboard and screen. It represents the +``terminal,''@footnote{The ``tty'' in @file{/dev/tty} stands for +``Teletype,'' a serial terminal.} which on modern systems is a keyboard +and screen, not a serial console.) This usually has the same effect but not always: although the -standard error stream is usually the terminal, it can be redirected; when -that happens, writing to the terminal is not correct. In fact, if -@command{awk} is run from a background job, it may not have a terminal at all. +standard error stream is usually the screen, it can be redirected; when +that happens, writing to the screen is not correct. In fact, if +@command{awk} is run from a background job, it may not have a +terminal at all. Then opening @file{/dev/tty} fails. @command{gawk} provides special @value{FN}s for accessing the three standard @@ -7275,10 +7323,14 @@ It is a common error to omit the quotes, which leads to confusing results. @c Exercise: What does it do? :-) -Finally, usng the @code{close} function on a @value{FN} of the +Finally, usng the @code{close()} function on a @value{FN} of the form @code{"/dev/fd/@var{N}"}, for file descriptor numbers above two, will actually close the given file descriptor. +The @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} +special files are also recognized internally by several other +versions of @command{awk}. + @node Special Network @subsection Special Files for Network Communications @cindex networks, support for @@ -7317,35 +7369,9 @@ special @value{FN}s that @command{gawk} provides: Recognition of these special @value{FN}s is disabled if @command{gawk} is in compatibility mode (@pxref{Options}). 
-@c @cindex automatic warnings -@c @cindex warnings, automatic -@cindex @code{PROCINFO} array -@item -@ifnottex -The -@end ifnottex -@ifnotinfo -As mentioned earlier, the -@end ifnotinfo -special files that provide process-related information are now considered -obsolete and will disappear entirely -in the next release of @command{gawk}. -@command{gawk} prints a warning message every time you use one of -these files. -@ifnottex -To obtain process-related information, use the @code{PROCINFO} array. -@xref{Built-in Variables}. -@end ifnottex - @item -Starting with @value{PVERSION} 3.1, @command{gawk} @emph{always} -interprets these special @value{FN}s.@footnote{Older versions of -@command{gawk} would interpret these names internally only if the system -did not actually have a @file{/dev/fd} directory or any of the other -special files listed earlier. Usually this didn't make a difference, -but sometimes it did; thus, it was decided to make @command{gawk}'s -behavior consistent on all systems and to have it always interpret -the special @value{FN}s itself.} +@command{gawk} @emph{always} +interprets these special @value{FN}s. For example, using @samp{/dev/fd/4} for output actually writes on file descriptor 4, and not on a new file descriptor that is @code{dup}'ed from file descriptor 4. Most of @@ -7376,15 +7402,15 @@ At that time, the first record of input is read from that file or command. The next time the same file or command is used with @code{getline}, another record is read from it, and so on. -Similarly, when a file or pipe is opened for output, the @value{FN} or -command associated with it is remembered by @command{awk}, and subsequent +Similarly, when a file or pipe is opened for output, @command{awk} remembers +the @value{FN} or command associated with it, and subsequent writes to the same file or command are appended to the previous writes. The file or pipe stays open until @command{awk} exits. 
-@cindex @code{close} function +@cindex @code{close()} function This implies that special steps are necessary in order to read the same file again from the beginning, or to rerun a shell command (rather than -reading more output from the same command). The @code{close} function +reading more output from the same command). The @code{close()} function makes these things possible: @example @@ -7464,14 +7490,14 @@ program closes the pipe after each line of output, then each line makes a separate message. @end itemize -@cindex differences in @command{awk} and @command{gawk}, @code{close} function -@cindex portability, @code{close} function and +@cindex differences in @command{awk} and @command{gawk}, @code{close()} function +@cindex portability, @code{close()} function and If you use more files than the system allows you to have open, @command{gawk} attempts to multiplex the available open files among your @value{DF}s. @command{gawk}'s ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always -use @code{close} on your files when you are done with them. +use @code{close()} on your files when you are done with them. In fact, if you are using a lot of pipes, it is essential that you close commands when done. For example, consider something like this: @@ -7487,7 +7513,7 @@ you close commands when done. For example, consider something like this: @end example This example creates a new pipeline based on data in @emph{each} record. -Without the call to @code{close} indicated in the comment, @command{awk} +Without the call to @code{close()} indicated in the comment, @command{awk} creates child processes to run the commands, until it eventually runs out of file descriptors for more pipelines. 
@@ -7498,10 +7524,10 @@ The finished child is called a ``zombie,'' and cleaning up after it is referred to as ``reaping.''} @c Good old UNIX: give the marketing guys fits, that's the ticket more importantly, the file descriptor for the pipe -is not closed and released until @code{close} is called or +is not closed and released until @code{close()} is called or @command{awk} exits. -@code{close} will silently do nothing if given an argument that +@code{close()} will silently do nothing if given an argument that does not represent a file, pipe or coprocess that was opened with a redirection. @@ -7515,8 +7541,8 @@ does nothing. When using the @samp{|&} operator to communicate with a coprocess, it is occasionally useful to be able to close one end of the two-way pipe without closing the other. -This is done by supplying a second argument to @code{close}. -As in any other call to @code{close}, +This is done by supplying a second argument to @code{close()}. +As in any other call to @code{close()}, the first argument is the name of the command or special file used to start the coprocess. The second argument should be a string, with either of the values @@ -7527,26 +7553,26 @@ delayed until which discusses it in more detail and gives an example. 
@c fakenode --- for prepinfo -@subheading Advanced Notes: Using @code{close}'s Return Value -@cindex advanced features, @code{close} function -@cindex dark corner, @code{close} function -@cindex @code{close} function, return values -@cindex return values@comma{} @code{close} function -@cindex differences in @command{awk} and @command{gawk}, @code{close} function -@cindex Unix @command{awk}, @code{close} function and - -In many versions of Unix @command{awk}, the @code{close} function +@subheading Advanced Notes: Using @code{close()}'s Return Value +@cindex advanced features, @code{close()} function +@cindex dark corner, @code{close()} function +@cindex @code{close()} function, return values +@cindex return values@comma{} @code{close()} function +@cindex differences in @command{awk} and @command{gawk}, @code{close()} function +@cindex Unix @command{awk}, @code{close()} function and + +In many versions of Unix @command{awk}, the @code{close()} function is actually a statement. It is a syntax error to try and use the return -value from @code{close}: +value from @code{close()}: @value{DARKCORNER} @example command = "@dots{}" command | getline info -retval = close(command) # syntax error in most Unix awks +retval = close(command) # syntax error in many Unix awks @end example -@command{gawk} treats @code{close} as a function. +@command{gawk} treats @code{close()} as a function. The return value is @minus{}1 if the argument names something that was never opened with a redirection, or if there is a system problem closing the file or process. @@ -7556,16 +7582,16 @@ In these cases, @command{gawk} sets the built-in variable In @command{gawk}, when closing a pipe or coprocess (input or output), the return value is the exit status of the command.@footnote{ -This is a full 16-bit value as returned by the @code{wait} +This is a full 16-bit value as returned by the @code{wait()} system call. 
See the system manual pages for information on how to decode this value.} -Otherwise, it is the return value from the system's @code{close} or -@code{fclose} C functions when closing input or output +Otherwise, it is the return value from the system's @code{close()} or +@code{fclose()} C functions when closing input or output files, respectively. This value is zero if the close succeeds, or @minus{}1 if it fails. -The POSIX standard is very vague; it says that @code{close} +The POSIX standard is very vague; it says that @code{close()} returns zero on success and non-zero otherwise. In general, different implementations vary in what they report when closing pipes; thus the return value cannot be used portably. @@ -7731,7 +7757,7 @@ programs. command-line option; @pxref{Nondecimal Data}.) If you have octal or hexadecimal data, -you can use the @code{strtonum} function +you can use the @code{strtonum()} function (@pxref{String Functions}) to convert the data into a number. Most of the time, you will want to use octal or hexadecimal constants @@ -7854,15 +7880,15 @@ POSIX specification. @cindex differences in @command{awk} and @command{gawk}, regexp constants @cindex dark corner, regexp constants, as arguments to user-defined functions -@cindex @code{gensub} function (@command{gawk}) -@cindex @code{sub} function -@cindex @code{gsub} function +@cindex @code{gensub()} function (@command{gawk}) +@cindex @code{sub()} function +@cindex @code{gsub()} function Constant regular expressions are also used as the first argument for -the @code{gensub}, @code{sub}, and @code{gsub} functions, and as the -second argument of the @code{match} function +the @code{gensub()}, @code{sub()}, and @code{gsub()} functions, and as the +second argument of the @code{match()} function (@pxref{String Functions}). 
Modern implementations of @command{awk}, including @command{gawk}, allow -the third argument of @code{split} to be a regexp constant, but some +the third argument of @code{split()} to be a regexp constant, but some older implementations do not. @value{DARKCORNER} This can lead to confusion when attempting to use regexp constants @@ -7892,7 +7918,7 @@ function mysub(pat, repl, str, global) @c @cindex warnings, automatic In this example, the programmer wants to pass a regexp constant to the user-defined function @code{mysub}, which in turn passes it on to -either @code{sub} or @code{gsub}. However, what really happens is that +either @code{sub()} or @code{gsub()}. However, what really happens is that the @code{pat} parameter is either one or zero, depending upon whether or not @code{$0} matches @code{/hi/}. @command{gawk} issues a warning when it sees a regexp constant used as @@ -8054,7 +8080,7 @@ Strings that can't be interpreted as valid numbers convert to zero. @cindex @code{CONVFMT} variable The exact manner in which numbers are converted into strings is controlled by the @command{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}). -Numbers are converted using the @code{sprintf} function +Numbers are converted using the @code{sprintf()} function with @code{CONVFMT} as the format specifier (@pxref{String Functions}). @@ -8070,7 +8096,7 @@ most of the time.@footnote{Pathological cases can require up to @cindex dark corner, @code{CONVFMT} variable Strange results can occur if you set @code{CONVFMT} to a string that doesn't -tell @code{sprintf} how to format floating-point numbers in a useful way. +tell @code{sprintf()} how to format floating-point numbers in a useful way. For example, if you forget the @samp{%} in the format, @command{awk} converts all numbers to the same constant string. As a special case, if a number is an integer, then the result of converting @@ -8162,7 +8188,7 @@ features have not been described yet. 
@item @samp{%'g} @tab Use locale @tab Use locale @item @samp{%g} @tab Use period @tab Use locale @item Input @tab Use period @tab Use locale -@item @code{strtonum} @tab Use period @tab Use locale +@item @code{strtonum()} @tab Use period @tab Use locale @end multitable @end float @@ -8589,8 +8615,8 @@ BEGIN @{ @cindex assignment operators, evaluation order @noindent The indices of @code{bar} are practically guaranteed to be different, because -@code{rand} returns different values each time it is called. -(Arrays and the @code{rand} function haven't been covered yet. +@code{rand()} returns different values each time it is called. +(Arrays and the @code{rand()} function haven't been covered yet. @xref{Arrays}, and see @ref{Numeric Functions}, for more information). This example illustrates an important fact about assignment @@ -8924,7 +8950,7 @@ attribute. @item Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements, @code{ENVIRON} elements, and the -elements of an array created by @code{split} and @code{match} that are numeric strings +elements of an array created by @code{split()} and @code{match()} that are numeric strings have the @var{strnum} attribute. Otherwise, they have the @var{string} attribute. Uninitialized variables also have the @var{strnum} attribute. @@ -9425,11 +9451,11 @@ If @option{--posix} is specified A @dfn{function} is a name for a particular calculation. This enables you to ask for it by name at any point in the program. For -example, the function @code{sqrt} computes the square root of a number. +example, the function @code{sqrt()} computes the square root of a number. @cindex functions, built-in A fixed set of functions are @dfn{built-in}, which means they are -available in every @command{awk} program. The @code{sqrt} function is one +available in every @command{awk} program. The @code{sqrt()} function is one of these. @xref{Built-in}, for a list of built-in functions and their descriptions. 
In addition, you can define functions for use in your program. @@ -9461,7 +9487,7 @@ a variable with an expression inside parentheses. With built-in functions, space before the parenthesis is harmless, but it is best not to get into the habit of using space to avoid mistakes with user-defined functions. Each function expects a particular number -of arguments. For example, the @code{sqrt} function must be called with +of arguments. For example, the @code{sqrt()} function must be called with a single argument, the number of which to take the square root: @example @@ -10078,7 +10104,7 @@ using library functions. @xref{Library Functions}, for a number of useful library functions. -If an @command{awk} program has only a @code{BEGIN} rule and no +If an @command{awk} program has only @code{BEGIN} rules and no other rules, then the program exits after the @code{BEGIN} rule is run.@footnote{The original version of @command{awk} used to keep reading and ignoring input until the end of the file was seen.} However, if an @@ -10733,14 +10759,8 @@ used outside the body of a loop. However, although it was never documented, historical implementations of @command{awk} treated the @code{break} statement outside of a loop as if it were a @code{next} statement (@pxref{Next Statement}). -Recent versions of Unix @command{awk} no longer allow this usage. -@command{gawk} supports this use of @code{break} only -if @option{--traditional} -has been specified on the command line -(@pxref{Options}). -Otherwise, it is treated as an error, since the POSIX standard -specifies that @code{break} should only be used inside the body of a -loop. +Recent versions of Unix @command{awk} no longer allow this usage, +nor does @command{gawk}. @value{DARKCORNER} @node Continue Statement @@ -10803,11 +10823,8 @@ statement outside a loop the same way they treated a @code{break} statement outside a loop: as if it were a @code{next} statement (@pxref{Next Statement}). 
-Recent versions of Unix @command{awk} no longer work this way, and -@command{gawk} allows it only if @option{--traditional} is specified on -the command line (@pxref{Options}). Just like the -@code{break} statement, the POSIX standard specifies that @code{continue} -should only be used inside the body of a loop. +Recent versions of Unix @command{awk} no longer work this way, nor +does @command{gawk}. @value{DARKCORNER} @node Next Statement @@ -10908,7 +10925,7 @@ has to continue scanning the unwanted records. The @code{nextfile} statement accomplishes this much more efficiently. While one might think that @samp{close(FILENAME)} would accomplish -the same as @code{nextfile}, this isn't true. @code{close} is +the same as @code{nextfile}, this isn't true. @code{close()} is reserved for closing files, pipes, and coprocesses that are opened with redirections. It is not related to the main processing that @command{awk} does with the files listed in @code{ARGV}. @@ -11092,7 +11109,7 @@ it is not special. This string controls conversion of numbers to strings (@pxref{Conversion}). It works by being passed, in effect, as the first argument to the -@code{sprintf} function +@code{sprintf()} function (@pxref{String Functions}). Its default value is @code{"%.6g"}. @code{CONVFMT} was introduced by the POSIX standard. @@ -11173,8 +11190,8 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment. @item IGNORECASE # If @code{IGNORECASE} is nonzero or non-null, then all string comparisons and all regular expression matching are case independent. 
Thus, regexp -matching with @samp{~} and @samp{!~}, as well as the @code{gensub}, -@code{gsub}, @code{index}, @code{match}, @code{split}, and @code{sub} +matching with @samp{~} and @samp{!~}, as well as the @code{gensub()}, +@code{gsub()}, @code{index()}, @code{match()}, @code{split()}, and @code{sub()} functions, record termination with @code{RS}, and field splitting with @code{FS}, all ignore case when doing their particular regexp operations. However, the value of @code{IGNORECASE} does @emph{not} affect array subscripting @@ -11218,13 +11235,13 @@ of @command{awk} being executed. This string controls conversion of numbers to strings (@pxref{Conversion}) for printing with the @code{print} statement. It works by being passed -as the first argument to the @code{sprintf} function +as the first argument to the @code{sprintf()} function (@pxref{String Functions}). Its default value is @code{"%.6g"}. Earlier versions of @command{awk} also used @code{OFMT} to specify the format for converting numbers to strings in general expressions; this is now done by @code{CONVFMT}. -@cindex @code{sprintf} function, @code{OFMT} variable and +@cindex @code{sprintf()} function, @code{OFMT} variable and @cindex @code{print} statement, @code{OFMT} variable and @cindex @code{OFS} variable @cindex separators, field @@ -11277,7 +11294,7 @@ really accesses @code{foo["A\034B"]} This variable is used for internationalization of programs at the @command{awk} level. It sets the default text domain for specially marked string constants in the source text, as well as for the -@code{dcgettext}, @code{dcngettext} and @code{bindtextdomain} functions +@code{dcgettext()}, @code{dcngettext()} and @code{bindtextdomain()} functions (@pxref{Internationalization}). The default value of @code{TEXTDOMAIN} is @code{"messages"}. @@ -11377,7 +11394,7 @@ indices are the environment variable names; the elements are the values of the particular environment variables. 
For example, @code{ENVIRON["HOME"]} might be @file{/home/arnold}. Changing this array does not affect the environment passed on to any programs that -@command{awk} may spawn via redirection or the @code{system} function. +@command{awk} may spawn via redirection or the @code{system()} function. @c (In a future version of @command{gawk}, it may do so.) Some operating systems may not have environment variables. @@ -11390,11 +11407,10 @@ On such systems, the @code{ENVIRON} array is empty (except for @cindex error handling, @code{ERRNO} variable and @item ERRNO # If a system error occurs during a redirection for @code{getline}, -during a read for @code{getline}, or during a @code{close} operation, +during a read for @code{getline}, or during a @code{close()} operation, then @code{ERRNO} contains a string describing the error. -@strong{FIXME:} Get the version right. -Starting with @value{PVERSION} 3.X, @command{gawk} clears @code{ERRNO} +Starting with @value{PVERSION} 4.0, @command{gawk} clears @code{ERRNO} before opening each command line input file. This enables checking if the file is readable inside a @code{BEGINFILE} pattern (@pxref{BEGINFILE/ENDFILE}). @@ -11504,7 +11520,7 @@ The value of the @code{getuid} system call. @item PROCINFO["version"] The version of @command{gawk}. This is available from -version 3.1.4 and later. +@value{PVERSION} 3.1.4 and later. @end table On some systems, there may be elements in the array, @code{"group1"} @@ -11522,17 +11538,17 @@ it is not special. @cindex @code{RLENGTH} variable @item RLENGTH The length of the substring matched by the -@code{match} function +@code{match()} function (@pxref{String Functions}). -@code{RLENGTH} is set by invoking the @code{match} function. Its value +@code{RLENGTH} is set by invoking the @code{match()} function. Its value is the length of the matched string, or @minus{}1 if no match is found. 
@cindex @code{RSTART} variable @item RSTART The start-index in characters of the substring that is matched by the -@code{match} function +@code{match()} function (@pxref{String Functions}). -@code{RSTART} is set by invoking the @code{match} function. Its value +@code{RSTART} is set by invoking the @code{match()} function. Its value is the position of the string where the matched substring starts, or zero if no match was found. @@ -11742,6 +11758,7 @@ same @command{awk} program. * Multi-dimensional:: Emulating multidimensional arrays in @command{awk}. * Array Sorting:: Sorting array values and indices. +* Arrays of Arrays:: True multidimensional arrays. @end menu @node Array Basics @@ -11766,7 +11783,7 @@ an array. @cindex Wall, Larry @quotation -@i{Doing linear scans over an associateive array is like trying to club someone +@i{Doing linear scans over an associative array is like trying to club someone to death with a loaded Uzi.}@* Larry Wall @end quotation @@ -11910,7 +11927,7 @@ automatically converts it to a string. The value of @code{IGNORECASE} has no effect upon array subscripting. The identical string value used to store an array element must be used to retrieve it. -When @command{awk} creates an array (e.g., with the @code{split} +When @command{awk} creates an array (e.g., with the @code{split()} built-in function), that array's indices are consecutive integers starting at one. (@xref{String Functions}.) @@ -12102,7 +12119,7 @@ find all the distinct words that appear in the input. It prints each word that is more than 10 characters long and also prints the number of such words. @xref{String Functions}, -for more information on the built-in function @code{length}. +for more information on the built-in function @code{length()}.
@example # Record a 1 for each word that is used at least once @@ -12218,8 +12235,8 @@ out an array:@footnote{Thanks to Michael Brennan for pointing this out.} split("", array) @end example -@cindex @code{split} function, array elements@comma{} deleting -The @code{split} function +@cindex @code{split()} function, array elements@comma{} deleting +The @code{split()} function (@pxref{String Functions}) clears out the target array first. This call asks it to split apart the null string. Because there is no data to split out, the @@ -12474,7 +12491,7 @@ However, if your program has an array that is always accessed as multidimensional, you can get the effect of scanning it by combining the scanning @code{for} statement (@pxref{Scanning an Array}) with the -built-in @code{split} function +built-in @code{split()} function (@pxref{String Functions}). It works in the following manner: @@ -12497,7 +12514,7 @@ an element with index @code{"1\034foo"} exists in @code{array}. (Recall that the default value of @code{SUBSEP} is the character with code 034.) Sooner or later, the @code{for} statement finds that index and does an iteration with the variable @code{combined} set to @code{"1\034foo"}. -Then the @code{split} function is called as follows: +Then the @code{split()} function is called as follows: @example split("1\034foo", separate, "\034") @@ -12512,8 +12529,8 @@ separate indices is recovered. @section Sorting Array Values and Indices with @command{gawk} @cindex arrays, sorting -@cindex @code{asort} function (@command{gawk}) -@cindex @code{asort} function (@command{gawk}), arrays@comma{} sorting +@cindex @code{asort()} function (@command{gawk}) +@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting @cindex sort function, arrays, sorting The order in which an array is scanned with a @samp{for (i in array)} loop is essentially arbitrary. @@ -12521,8 +12538,8 @@ In most @command{awk} implementations, sorting an array requires writing a @code{sort} function. 
While this can be educational for exploring different sorting algorithms, usually that's not the point of the program. -@command{gawk} provides the built-in @code{asort} -and @code{asorti} functions +@command{gawk} provides the built-in @code{asort()} +and @code{asorti()} functions (@pxref{String Functions}) for sorting arrays. For example: @@ -12533,18 +12550,18 @@ for (i = 1; i <= n; i++) @var{do something with} data[i] @end example -After the call to @code{asort}, the array @code{data} is indexed from 1 +After the call to @code{asort()}, the array @code{data} is indexed from 1 to some number @var{n}, the total number of elements in @code{data}. -(This count is @code{asort}'s return value.) +(This count is @code{asort()}'s return value.) @code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on. The comparison of array elements is done using @command{gawk}'s usual comparison rules (@pxref{Typing and Comparison}). -@cindex side effects, @code{asort} function -An important side effect of calling @code{asort} is that +@cindex side effects, @code{asort()} function +An important side effect of calling @code{asort()} is that @emph{the array's original indices are irrevocably lost}. -As this isn't always desirable, @code{asort} accepts a +As this isn't always desirable, @code{asort()} accepts a second argument: @example @@ -12561,8 +12578,8 @@ However, the @code{source} array is not affected. Often, what's needed is to sort on the values of the @emph{indices} instead of the values of the elements. To do that, starting with @command{gawk} 3.1.2, use the -@code{asorti} function. The interface is identical to that of -@code{asort}, except that the index values are used for sorting, and +@code{asorti()} function. 
The interface is identical to that of +@code{asort()}, except that the index values are used for sorting, and become the values of the result array: @example @@ -12579,7 +12596,7 @@ END @{ @end example If your version of @command{gawk} is 3.1.0 or 3.1.1, you don't -have @code{asorti}. Instead, use a helper array +have @code{asorti()}. Instead, use a helper array to hold the sorted index values, and then access the original array's elements. It works in the following way: @@ -12606,7 +12623,7 @@ To traverse the elements in decreasing order, use a loop that goes from @cindex reference counting, sorting arrays Copying array indices and elements isn't expensive in terms of memory. Internally, @command{gawk} maintains @dfn{reference counts} to data. -For example, when @code{asort} copies the first array to the second one, +For example, when @code{asort()} copies the first array to the second one, there is only one copy of the original array elements' data, even though both arrays use the values. Similarly, when copying the indices from @code{data} to @code{ind}, there is only one copy of the actual index @@ -12618,10 +12635,144 @@ strings. We said previously that comparisons are done using @command{gawk}'s ``usual comparison rules.'' Because @code{IGNORECASE} affects string comparisons, the value of @code{IGNORECASE} also -affects sorting for both @code{asort} and @code{asorti}. +affects sorting for both @code{asort()} and @code{asorti()}. Note also that the locale's sorting order does @emph{not} come into play; comparisons are based on character values only. Caveat Emptor. + +@node Arrays of Arrays +@section Arrays of Arrays + +@command{gawk} supports arrays of +arrays. Elements of a subarray are referred to by their own indices +enclosed in square brackets, just like the elements of the main array. 
+For example, the following creates a two-element subarray at index @samp{1} +of the main array @code{a}: + +@example +a[1][1] = 1 +a[1][2] = 2 +@end example + +This simulates a true two-dimensional array. Each subarray element can +contain another subarray as a value, which in turn can hold other arrays +as well. In this way, you can create arrays of three or more dimensions. +The indices can be any @command{awk} expression, including scalars +separated by commas (that is, a regular @command{awk} simulated +multidimensional subscript). So the following is valid in +@command{gawk}: + +@example +a[1][3][1, "name"] = "barney" +@end example + +Each subarray and the main array can be of different lengths. In fact, the +elements of an array or its subarray do not all have to have the same +type. This means that the main array and any of its subarrays can be +non-rectangular, or jagged in structure. One can assign a scalar value to +the index @samp{4} of the main array @code{a}: + +@example +a[4] = "An element in a jagged array" +@end example + +The terms @dfn{dimension}, @dfn{row} and @dfn{column} are +meaningless when applied +to such an array, but we will use ``dimension'' henceforth to imply the +maximum number of indices needed to refer to an existing element. The +type of any element that has already been assigned cannot be changed +by assigning a value of a different type. You have to first delete the +current element, which effectively makes @command{gawk} forget about +the element at that index: + +@example +delete a[4] +a[4][5][6][7] = "An element in a four-dimensional array" +@end example + +@noindent +This removes the scalar value from index @samp{4} and then inserts a +subarray of subarray of subarray containing a scalar. You can also +delete an entire subarray or subarray of subarrays: + +@example +delete a[4][5] +a[4][5] = "An element in subarray a[4]" +@end example + +But recall that you cannot delete the main array @code{a} and then use it +as a scalar.
+ +The built-in functions which take array arguments can also be used +with subarrays. For example, the following code fragment uses @code{length()} +to determine the number of elements in the main array @code{a} and +its subarrays: + +@example +print length(a), length(a[1]), length(a[1][3]) +@end example + +@noindent +This results in the following output for our main array @code{a}: + +@example +2 3 1 +@end example + +@noindent +The @samp{@var{subscript} in @var{array}} expression +(@pxref{Reference to Elements}) works similarly for both +regular @command{awk}-style +arrays and arrays of arrays. For example, the tests @samp{1 in a}, +@samp{3 in a[1]}, and @samp{(1, "name") in a[1][3]} all evaluate to +one (true) for our array @code{a}. + +The @samp{for (item in array)} statement (@pxref{Scanning an Array}) +can be nested to scan all the +elements of an array of arrays if it is rectangular in structure. In order +to print the contents (scalar values) of a two-dimensional array of arrays +with each subarray having the same length, you could use the following +code: + +@example +for (i in array) + for (j in array[i]) + print array[i][j] +@end example + +If the structure of a jagged array of arrays is known in advance, +you can often devise workarounds using control statements. For example, +the following code prints the elements of our main array @code{a}: + +@example +for (i in a) @{ + for (j in a[i]) @{ + if (j == 3) @{ + for (k in a[i][j]) + print a[i][j][k] + @} else + print a[i][j] + @} +@} +@end example + +Recall that a reference to an uninitialized array element yields a value +of @code{""}, the null string. This has one important implication when you +intend to use a subarray as an argument to a function, as illustrated by +the following example: + +@example +$ @kbd{gawk 'BEGIN @{ split("a b c d", b[1]); print b[1][1] @}'} +@error{} gawk: cmd.
line:1: fatal: split: second argument is not an array +@end example + +The way to work around this is to first force @code{b[1]} to be an array by +creating an arbitrary index: + +@example +$ @kbd{gawk 'BEGIN @{ b[1][1] = ""; split("a b c d", b[1]); print b[1][1] @}'} +@print{} a +@end example @c ENDOFRANGE arrs @node Functions @@ -12661,9 +12812,10 @@ but are summarized here for your convenience. @menu * Calling Built-in:: How to call built-in functions. * Numeric Functions:: Functions that work with numbers, including - @code{int}, @code{sin} and @code{rand}. + @code{int()}, @code{sin()} and @code{rand()}. * String Functions:: Functions for string manipulation, such as - @code{split}, @code{match} and @code{sprintf}. + @code{split()}, @code{match()} and + @code{sprintf()}. * I/O Functions:: Functions for files and shell commands. * Time Functions:: Functions for dealing with timestamps. * Bitwise Functions:: Functions for bitwise operations. @@ -12676,7 +12828,7 @@ but are summarized here for your convenience. To call one of @command{awk}'s built-in functions, write the name of the function followed by arguments in parentheses. For example, @samp{atan2(y + z, 1)} -is a call to the function @code{atan2} and has two arguments. +is a call to the function @code{atan2()} and has two arguments. @cindex programming conventions, functions, calling @cindex whitespace, functions@comma{} calling @@ -12709,7 +12861,7 @@ j = sqrt(i++) @cindex functions, built-in, evaluation order @cindex built-in functions, evaluation order @noindent -the variable @code{i} is incremented to the value five before @code{sqrt} +the variable @code{i} is incremented to the value five before @code{sqrt()} is called with a value of four for its actual parameter. The order of evaluation of the expressions used for the function's parameters is undefined.
Thus, avoid writing programs that @@ -12722,9 +12874,9 @@ j = atan2(i++, i *= 2) @end example If the order of evaluation is left to right, then @code{i} first becomes -6, and then 12, and @code{atan2} is called with the two arguments 6 +6, and then 12, and @code{atan2()} is called with the two arguments 6 and 12. But if the order of evaluation is right to left, @code{i} -first becomes 10, then 11, and @code{atan2} is called with the +first becomes 10, then 11, and @code{atan2()} is called with the two arguments 11 and 10. @node Numeric Functions @@ -12736,7 +12888,7 @@ Optional parameters are enclosed in square brackets@w{ ([ ]):} @table @code @item int(@var{x}) -@cindex @code{int} function +@cindex @code{int()} function This returns the nearest integer to @var{x}, located between @var{x} and zero and truncated toward zero. @@ -12744,45 +12896,45 @@ For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)} is @minus{}3, and @code{int(-3)} is @minus{}3 as well. @item sqrt(@var{x}) -@cindex @code{sqrt} function +@cindex @code{sqrt()} function This returns the positive square root of @var{x}. @command{gawk} reports an error if @var{x} is negative. Thus, @code{sqrt(4)} is 2. @item exp(@var{x}) -@cindex @code{exp} function +@cindex @code{exp()} function This returns the exponential of @var{x} (@code{e ^ @var{x}}) or reports an error if @var{x} is out of range. The range of values @var{x} can have depends on your machine's floating-point representation. @item log(@var{x}) -@cindex @code{log} function +@cindex @code{log()} function This returns the natural logarithm of @var{x}, if @var{x} is positive; otherwise, it reports an error. @item sin(@var{x}) -@cindex @code{sin} function +@cindex @code{sin()} function This returns the sine of @var{x}, with @var{x} in radians. @item cos(@var{x}) -@cindex @code{cos} function +@cindex @code{cos()} function This returns the cosine of @var{x}, with @var{x} in radians. 
@item atan2(@var{y}, @var{x}) -@cindex @code{atan2} function +@cindex @code{atan2()} function This returns the arctangent of @code{@var{y} / @var{x}} in radians. @item rand() -@cindex @code{rand} function -@cindex random numbers, @code{rand}/@code{srand} functions -This returns a random number. The values of @code{rand} are +@cindex @code{rand()} function +@cindex random numbers, @code{rand()}/@code{srand()} functions +This returns a random number. The values of @code{rand()} are uniformly distributed between zero and one. -The value could be zero but is never one.@footnote{The C version of @code{rand} +The value could be zero but is never one.@footnote{The C version of @code{rand()} is known to produce fairly poor sequences of random numbers. However, nothing requires that an @command{awk} implementation use the C -@code{rand} to implement the @command{awk} version of @code{rand}. +@code{rand()} to implement the @command{awk} version of @code{rand()}. In fact, @command{gawk} uses the BSD @code{random} function, which is -considerably better than @code{rand}, to produce random numbers.} +considerably better than @code{rand()}, to produce random numbers.} Often random integers are needed instead. Following is a user-defined function that can be used to obtain a random non-negative integer less than @var{n}: @@ -12795,7 +12947,7 @@ function randint(n) @{ @noindent The multiplication produces a random number greater than zero and less -than @code{n}. Using @code{int}, this result is made into +than @code{n}. Using @code{int()}, this result is made into an integer between zero and @code{n} @minus{} 1, inclusive. The following example uses a similar function to produce random integers @@ -12818,18 +12970,18 @@ function roll(n) @{ return 1 + int(rand() * n) @} @cindex random numbers, seed of @c MAWK uses a different seed each time. 
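The behavior of @code{randint()} and of an explicit seed can be checked with a small shell sketch. This is an illustration only, not part of the patch: it assumes a POSIX @command{awk} is available as @code{awk}, and the seed 42, the bound 6, and the shell variable names are arbitrary choices.

```shell
# Sketch only: "awk" is assumed to be any POSIX awk; 42 is an arbitrary seed.
# randint(n) mirrors the user-defined function described in the manual text.
prog='
function randint(n) { return int(n * rand()) }
BEGIN {
    srand(42)
    ok = 1
    for (i = 0; i < 1000; i++) {
        r = randint(6)
        # Every result must be an integer from 0 to 5 inclusive.
        if (r < 0 || r > 5 || r != int(r))
            ok = 0
    }
    print (ok ? "in range" : "out of range")
}'
range_check=$(awk "$prog")

# The same explicit seed should give the same sequence on every run.
first=$(awk 'BEGIN { srand(42); print rand(), rand(), rand() }')
second=$(awk 'BEGIN { srand(42); print rand(), rand(), rand() }')

echo "$range_check"
[ "$first" = "$second" ] && echo "same seed, same sequence"
```

Calling @code{srand()} with no argument instead seeds from the current date and time, so the two sequences would normally differ between runs.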
@strong{Caution:} In most @command{awk} implementations, including @command{gawk}, -@code{rand} starts generating numbers from the same +@code{rand()} starts generating numbers from the same starting number, or @dfn{seed}, each time you run @command{awk}. Thus, a program generates the same results each time you run it. The numbers are random within one @command{awk} run but predictable from run to run. This is convenient for debugging, but if you want a program to do different things each time it is used, you must change the seed to a value that is different in each run. To do this, -use @code{srand}. +use @code{srand()}. @item srand(@r{[}@var{x}@r{]}) -@cindex @code{srand} function -The function @code{srand} sets the starting point, or seed, +@cindex @code{srand()} function +The function @code{srand()} sets the starting point, or seed, for generating random numbers to the value @var{x}. Each seed value leads to a particular sequence of random @@ -12849,7 +13001,7 @@ If the argument @var{x} is omitted, as in @samp{srand()}, then the current date and time of day are used for a seed. This is the way to get random numbers that are truly unpredictable. -The return value of @code{srand} is the previous seed. This makes it +The return value of @code{srand()} is the previous seed. This makes it easy to keep track of the seeds in case you need to consistently reproduce sequences of random numbers. @end table @@ -12865,15 +13017,15 @@ specific to @command{gawk} are marked with a pound sign@w{ (@samp{#}):} @menu * Gory Details:: More than you want to know about @samp{\} and - @samp{&} with @code{sub}, @code{gsub}, and - @code{gensub}. + @samp{&} with @code{sub()}, @code{gsub()}, and + @code{gensub()}. 
@end menu @table @code @item asort(@var{source} @r{[}, @var{dest}@r{]}) # @cindex arrays, elements, retrieving number of -@cindex @code{asort} function (@command{gawk}) -@code{asort} is a @command{gawk}-specific extension, returning the number of +@cindex @code{asort()} function (@command{gawk}) +@code{asort()} is a @command{gawk}-specific extension, returning the number of elements in the array @var{source}. The contents of @var{source} are sorted using @command{gawk}'s normal rules for comparing values (in particular, @code{IGNORECASE} affects the sorting) @@ -12891,7 +13043,7 @@ a["middle"] = "cul" @end example @noindent -A call to @code{asort}: +A call to @code{asort()}: @example asort(a) @@ -12906,28 +13058,28 @@ a[2] = "de" a[3] = "sac" @end example -The @code{asort} function is described in more detail in +The @code{asort()} function is described in more detail in @ref{Array Sorting}. -@code{asort} is a @command{gawk} extension; it is not available +@code{asort()} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). @item asorti(@var{source} @r{[}, @var{dest}@r{]}) # -@cindex @code{asorti} function (@command{gawk}) -@code{asorti} is a @command{gawk}-specific extension, returning the number of +@cindex @code{asorti()} function (@command{gawk}) +@code{asorti()} is a @command{gawk}-specific extension, returning the number of elements in the array @var{source}. -It works similarly to @code{asort}, however, the @emph{indices} +It works similarly to @code{asort()}, however, the @emph{indices} are sorted, instead of the values. As array indices are always strings, the comparison performed is always a string comparison. (Here too, @code{IGNORECASE} affects the sorting.) -The @code{asorti} function is described in more detail in +The @code{asorti()} function is described in more detail in @ref{Array Sorting}. It was added in @command{gawk} 3.1.2. 
-@code{asorti} is a @command{gawk} extension; it is not available +@code{asorti()} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). @item index(@var{in}, @var{find}) -@cindex @code{index} function +@cindex @code{index()} function @cindex searching This searches the string @var{in} for the first occurrence of the string @var{find}, and returns the position in characters where that occurrence @@ -12939,11 +13091,11 @@ $ awk 'BEGIN @{ print index("peanut", "an") @}' @end example @noindent -If @var{find} is not found, @code{index} returns zero. +If @var{find} is not found, @code{index()} returns zero. (Remember that string indices in @command{awk} start at one.) @item length(@r{[}@var{string}@r{]}) -@cindex @code{length} function +@cindex @code{length()} function This returns the number of characters in @var{string}. If @var{string} is a number, the length of the digit string representing that number is returned. For example, @code{length("abcde")} is 5. By @@ -12951,23 +13103,22 @@ contrast, @code{length(15 * 35)} works out to 3. In this example, 15 * 35 = 525, and 525 is then converted to the string @code{"525"}, which has three characters. -If no argument is supplied, @code{length} returns the length of @code{$0}. +If no argument is supplied, @code{length()} returns the length of @code{$0}. @c @cindex historical features -@cindex portability, @code{length} function -@cindex POSIX @command{awk}, functions and, @code{length} +@cindex portability, @code{length()} function +@cindex POSIX @command{awk}, functions and, @code{length()} @quotation NOTE -In older versions of @command{awk}, the @code{length} function could +In older versions of @command{awk}, the @code{length()} function could be called -without any parentheses. Doing so is marked as ``deprecated'' in the -POSIX standard. This means that while a program can do this, -it is a feature that can eventually be removed from a future -version of the standard. 
Therefore, for programs to be maximally portable, +without any parentheses. Doing so is considered poor practice, +although the 2008 POSIX standard explicitly allows it, to +support historical practice. For programs to be maximally portable, always supply the parentheses. @end quotation -@cindex dark corner, @code{length} function -If @code{length} is called with a variable that has not been used, +@cindex dark corner, @code{length()} function +If @code{length()} is called with a variable that has not been used, @command{gawk} forces the variable to be a scalar. Other implementations of @command{awk} leave the variable without a type. @value{DARKCORNER} @@ -12990,7 +13141,7 @@ warning about this. @cindex differences between @command{gawk} and @command{awk} Beginning with @command{gawk} @value{PVERSION} 3.1.5, when supplied an -array argument, the @code{length} function returns the number of elements +array argument, the @code{length()} function returns the number of elements in the array. This is less useful than it might seem at first, as the array is not guaranteed to be indexed from one to the number of elements in it. @@ -13001,8 +13152,8 @@ If @option{--posix} is supplied, using an array argument is a fatal error (@pxref{Arrays}). @item match(@var{string}, @var{regexp} @r{[}, @var{array}@r{]}) -@cindex @code{match} function -The @code{match} function searches @var{string} for the +@cindex @code{match()} function +The @code{match()} function searches @var{string} for the longest, leftmost substring matched by the regular expression, @var{regexp}. It returns the character position, or @dfn{index}, at which that substring begins (one, if it starts at the beginning of @@ -13017,14 +13168,14 @@ implications for writing your program correctly. The order of the first two arguments is backwards from most other string functions that work with regular expressions, such as -@code{sub} and @code{gsub}. 
It might help to remember that -for @code{match}, the order is the same as for the @samp{~} operator: +@code{sub()} and @code{gsub()}. It might help to remember that +for @code{match()}, the order is the same as for the @samp{~} operator: @samp{@var{string} ~ @var{regexp}}. -@cindex @code{RSTART} variable, @code{match} function and -@cindex @code{RLENGTH} variable, @code{match} function and -@cindex @code{match} function, @code{RSTART}/@code{RLENGTH} variables -The @code{match} function sets the built-in variable @code{RSTART} to +@cindex @code{RSTART} variable, @code{match()} function and +@cindex @code{RLENGTH} variable, @code{match()} function and +@cindex @code{match()} function, @code{RSTART}/@code{RLENGTH} variables +The @code{match()} function sets the built-in variable @code{RSTART} to the index. It also sets the built-in variable @code{RLENGTH} to the length in characters of the matched substring. If no match is found, @code{RSTART} is set to zero, and @code{RLENGTH} to @minus{}1. @@ -13072,7 +13223,7 @@ Match of ru+n found at 12 in My program runs Match of Melvin found at 1 in Melvin was here. @end example -@cindex differences in @command{awk} and @command{gawk}, @code{match} function +@cindex differences in @command{awk} and @command{gawk}, @code{match()} function If @var{array} is present, it is cleared, and then the 0th element of @var{array} is set to the entire portion of @var{string} matched by @var{regexp}. If @var{regexp} contains parentheses, @@ -13110,14 +13261,14 @@ subexpressions, since they may not all have matched text; thus they should be tested for with the @code{in} operator (@pxref{Reference to Elements}). -@cindex troubleshooting, @code{match} function -The @var{array} argument to @code{match} is a +@cindex troubleshooting, @code{match()} function +The @var{array} argument to @code{match()} is a @command{gawk} extension. In compatibility mode (@pxref{Options}), using a third argument is a fatal error. 
@item patsplit(@var{string}, @var{array} @r{[}, @var{fieldpat} @r{[}, @var{seps} @r{]} @r{]}) -@cindex @code{patsplit} function +@cindex @code{patsplit()} function This function divides @var{string} into pieces defined by @var{fieldpat} and stores the pieces in @var{array} and the separator strings in the @var{seps} array. The first piece is stored in @@ -13126,17 +13277,17 @@ forth. The string value of the third argument, @var{fieldpat}, is a regexp describing the fields in @var{string} (just as @code{FPAT} is a regexp describing the fields in input records). If @var{fieldpat} is omitted, the value of @code{FPAT} is used. -@code{patsplit} returns the number of elements created. +@code{patsplit()} returns the number of elements created. @code{@var{seps}[@var{i}]} is the separator string between @code{@var{array}[@var{i}]} and @code{@var{array}[@var{i}+1]}. Any leading separator will be in @code{@var{seps}[0]}. -The @code{patsplit} function splits strings into pieces in a +The @code{patsplit()} function splits strings into pieces in a manner similar to the way input lines are split into fields using @code{FPAT}. @item split(@var{string}, @var{array} @r{[}, @var{fieldsep} @r{[}, @var{seps} @r{]} @r{]}) -@cindex @code{split} function +@cindex @code{split()} function This function divides @var{string} into pieces separated by @var{fieldsep} and stores the pieces in @var{array} and the separator strings in the @var{seps} array. The first piece is stored in @@ -13145,7 +13296,7 @@ forth. The string value of the third argument, @var{fieldsep}, is a regexp describing where to split @var{string} (much as @code{FS} can be a regexp describing where to split input records). If @var{fieldsep} is omitted, the value of @code{FS} is used. -@code{split} returns the number of elements created. +@code{split()} returns the number of elements created. 
@var{seps} is a @command{gawk} extension with @code{@var{seps}[@var{i}]} being the separator string between @code{@var{array}[@var{i}]} and @code{@var{array}[@var{i}+1]}. @@ -13156,7 +13307,7 @@ whitespace goes into @code{@var{seps}[@var{n}]} where @var{n} is the return value of @code{split()} (that is, the number of elements in @var{array}). -The @code{split} function splits strings into pieces in a +The @code{split()} function splits strings into pieces in a manner similar to the way input lines are split into fields. For example: @example @@ -13182,9 +13333,9 @@ seps[2] = "-" @end example @noindent -The value returned by this call to @code{split} is three. +The value returned by this call to @code{split()} is three. -@cindex differences in @command{awk} and @command{gawk}, @code{split} function +@cindex differences in @command{awk} and @command{gawk}, @code{split()} function As with input field-splitting, when the value of @var{fieldsep} is @w{@code{" "}}, leading and trailing whitespace is ignored in @var{array} but not in @var{seps}, and the elements @@ -13193,11 +13344,11 @@ Also as with input field-splitting, if @var{fieldsep} is the null string, each individual character in the string is split into its own array element. (This is a @command{gawk}-specific extension.) -Note, however, that @code{RS} has no effect on the way @code{split} +Note, however, that @code{RS} has no effect on the way @code{split()} works. Even though @samp{RS = ""} causes newline to also be an input -field separator, this does not affect how @code{split} splits strings. +field separator, this does not affect how @code{split()} splits strings. -@cindex dark corner, @code{split} function +@cindex dark corner, @code{split()} function Modern implementations of @command{awk}, including @command{gawk}, allow the third argument to be a regexp constant (@code{/abc/}) as well as a string. @@ -13207,7 +13358,7 @@ The POSIX standard allows this as well. 
discussion of the difference between using a string constant or a regexp constant, and the implications for writing your program correctly. -Before splitting the string, @code{split} deletes any previously existing +Before splitting the string, @code{split()} deletes any previously existing elements in the arrays @var{array} and @var{seps}. If @var{string} is null, the array has no elements. (So this is a portable @@ -13219,7 +13370,7 @@ If @var{string} does not match @var{fieldsep} at all (but is not null), @var{string}. @item sprintf(@var{format}, @var{expression1}, @dots{}) -@cindex @code{sprintf} function +@cindex @code{sprintf()} function This returns (without printing) the string that @code{printf} would have printed out with the same arguments (@pxref{Printf}). @@ -13232,13 +13383,13 @@ pival = sprintf("pi = %.2f (approx.)", 22/7) @noindent assigns the string @w{@code{"pi = 3.14 (approx.)"}} to the variable @code{pival}. -@cindex differences in @command{awk} and @command{gawk}, @code{strtonum} function (@command{gawk}) -@cindex @code{strtonum} function (@command{gawk}) +@cindex differences in @command{awk} and @command{gawk}, @code{strtonum()} function (@command{gawk}) +@cindex @code{strtonum()} function (@command{gawk}) @item strtonum(@var{str}) # Examines @var{str} and returns its numeric value. If @var{str} -begins with a leading @samp{0}, @code{strtonum} assumes that @var{str} +begins with a leading @samp{0}, @code{strtonum()} assumes that @var{str} is an octal number. If @var{str} begins with a leading @samp{0x} or -@samp{0X}, @code{strtonum} assumes that @var{str} is a hexadecimal number. +@samp{0X}, @code{strtonum()} assumes that @var{str} is a hexadecimal number. 
For example: @example @@ -13247,22 +13398,22 @@ $ echo 0x11 | @print{} 17 @end example -Using the @code{strtonum} function is @emph{not} the same as adding zero +Using the @code{strtonum()} function is @emph{not} the same as adding zero to a string value; the automatic coercion of strings to numbers works only for decimal data, not for octal or hexadecimal.@footnote{Unless you use the @option{--non-decimal-data} option, which isn't recommended. @xref{Nondecimal Data}, for more information.} -Note also that @code{strtonum} uses the current locale's decimal point +Note also that @code{strtonum()} uses the current locale's decimal point for recognizing numbers. -@cindex differences in @command{awk} and @command{gawk}, @code{strtonum} function (@command{gawk}) -@code{strtonum} is a @command{gawk} extension; it is not available +@cindex differences in @command{awk} and @command{gawk}, @code{strtonum()} function (@command{gawk}) +@code{strtonum()} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). @item sub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) -@cindex @code{sub} function -The @code{sub} function alters the value of @var{target}. +@cindex @code{sub()} function +The @code{sub()} function alters the value of @var{target}. It searches this value, which is treated as a string, for the leftmost, longest substring matched by the regular expression @var{regexp}. Then the entire string is @@ -13278,7 +13429,7 @@ implications for writing your program correctly. This function is peculiar because @var{target} is not simply used to compute a value, and not just any expression will do---it -must be a variable, field, or array element so that @code{sub} can +must be a variable, field, or array element so that @code{sub()} can store a modified value there. 
If this argument is omitted, then the default is to use and alter @code{$0}.@footnote{Note that this means that the record will first be regenerated using the value of @code{OFS} if @@ -13296,7 +13447,7 @@ sub(/at/, "ith", str) sets @code{str} to @w{@code{"wither, water, everywhere"}}, by replacing the leftmost longest occurrence of @samp{at} with @samp{ith}. -The @code{sub} function returns the number of substitutions made (either +The @code{sub()} function returns the number of substitutions made (either one or zero). If the special character @samp{&} appears in @var{replacement}, it @@ -13338,12 +13489,12 @@ an @samp{&}: @{ sub(/\|/, "\\&"); print @} @end example -@cindex @code{sub} function, arguments of -@cindex @code{gsub} function, arguments of -As mentioned, the third argument to @code{sub} must +@cindex @code{sub()} function, arguments of +@cindex @code{gsub()} function, arguments of +As mentioned, the third argument to @code{sub()} must be a variable, field or array reference. Some versions of @command{awk} allow the third argument to -be an expression that is not an lvalue. In such a case, @code{sub} +be an expression that is not an lvalue. In such a case, @code{sub()} still searches for the pattern and returns zero or one, but the result of the substitution (if any) is thrown away because there is no place to put it. Such versions of @command{awk} accept expressions @@ -13354,7 +13505,7 @@ sub(/USA/, "United States", "the USA and Canada") @end example @noindent -@cindex troubleshooting, @code{gsub}/@code{sub} functions +@cindex troubleshooting, @code{gsub()}/@code{sub()} functions For historical compatibility, @command{gawk} accepts erroneous code, such as in the previous example. 
However, using any other nonchangeable object as the third parameter causes a fatal error and your program @@ -13364,10 +13515,10 @@ Finally, if the @var{regexp} is not a regexp constant, it is converted into a string, and then the value of that string is treated as the regexp to match. @item gsub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) -@cindex @code{gsub} function -This is similar to the @code{sub} function, except @code{gsub} replaces +@cindex @code{gsub()} function +This is similar to the @code{sub()} function, except @code{gsub()} replaces @emph{all} of the longest, leftmost, @emph{nonoverlapping} matching -substrings it can find. The @samp{g} in @code{gsub} stands for +substrings it can find. The @samp{g} in @code{gsub()} stands for ``global,'' which means replace everywhere. For example: @example @@ -13378,17 +13529,17 @@ substrings it can find. The @samp{g} in @code{gsub} stands for replaces all occurrences of the string @samp{Britain} with @samp{United Kingdom} for all input records. -The @code{gsub} function returns the number of substitutions made. If +The @code{gsub()} function returns the number of substitutions made. If the variable to search and alter (@var{target}) is omitted, then the entire input record (@code{$0}) is used. -As in @code{sub}, the characters @samp{&} and @samp{\} are special, +As in @code{sub()}, the characters @samp{&} and @samp{\} are special, and the third argument must be assignable. @item gensub(@var{regexp}, @var{replacement}, @var{how} @r{[}, @var{target}@r{]}) # -@cindex @code{gensub} function (@command{gawk}) -@code{gensub} is a general substitution function. Like @code{sub} and -@code{gsub}, it searches the target string @var{target} for matches of -the regular expression @var{regexp}. Unlike @code{sub} and @code{gsub}, +@cindex @code{gensub()} function (@command{gawk}) +@code{gensub()} is a general substitution function. 
Like @code{sub()} and +@code{gsub()}, it searches the target string @var{target} for matches of +the regular expression @var{regexp}. Unlike @code{sub()} and @code{gsub()}, the modified string is returned as the result of the function and the original target string is @emph{not} changed. If @var{how} is a string beginning with @samp{g} or @samp{G}, then it replaces all matches of @@ -13396,8 +13547,8 @@ beginning with @samp{g} or @samp{G}, then it replaces all matches of as a number that indicates which match of @var{regexp} to replace. If no @var{target} is supplied, @code{$0} is used. -@code{gensub} provides an additional feature that is not available -in @code{sub} or @code{gsub}: the ability to specify components of a +@code{gensub()} provides an additional feature that is not available +in @code{sub()} or @code{gsub()}: the ability to specify components of a regexp in the replacement text. This is done by using parentheses in the regexp to mark the components and then specifying @samp{\@var{N}} in the replacement text, where @var{N} is a digit from 1 to 9. @@ -13414,7 +13565,7 @@ $ gawk ' @end example @noindent -As with @code{sub}, you must type two backslashes in order +As with @code{sub()}, you must type two backslashes in order to get one into the string. In the replacement text, the sequence @samp{\0} represents the entire matched text, as does the character @samp{&}. @@ -13429,7 +13580,7 @@ $ echo a b c a b c | @end example In this case, @code{$0} is used as the default target string. -@code{gensub} returns the new string as its result, which is +@code{gensub()} returns the new string as its result, which is passed directly to @code{print} for printing. @c @cindex automatic warnings @@ -13439,14 +13590,14 @@ If the @var{how} argument is a string that does not begin with @samp{g} or substitution is performed. If @var{how} is zero, @command{gawk} issues a warning message. 
-If @var{regexp} does not match @var{target}, @code{gensub}'s return value +If @var{regexp} does not match @var{target}, @code{gensub()}'s return value is the original unchanged value of @var{target}. -@code{gensub} is a @command{gawk} extension; it is not available +@code{gensub()} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). @item substr(@var{string}, @var{start} @r{[}, @var{length}@r{]}) -@cindex @code{substr} function +@cindex @code{substr()} function This returns a @var{length}-character-long substring of @var{string}, starting at character number @var{start}. The first character of a string is character number one.@footnote{This is different from @@ -13460,17 +13611,17 @@ suffix is also returned if @var{length} is greater than the number of characters remaining in the string, counting from character @var{start}. -If @var{start} is less than one, @code{substr} treats it as +If @var{start} is less than one, @code{substr()} treats it as if it was one. (POSIX doesn't specify what to do in this case: Unix @command{awk} acts this way, and therefore @command{gawk} does too.) If @var{start} is greater than the number of characters -in the string, @code{substr} returns the null string. +in the string, @code{substr()} returns the null string. Similarly, if @var{length} is present but less than or equal to zero, the null string is returned. -@cindex troubleshooting, @code{substr} function -The string returned by @code{substr} @emph{cannot} be +@cindex troubleshooting, @code{substr()} function +The string returned by @code{substr()} @emph{cannot} be assigned. 
Thus, it is a mistake to attempt to change a portion of a string, as shown in the following example: @@ -13481,18 +13632,18 @@ substr(string, 3, 3) = "CDE" @end example @noindent -It is also a mistake to use @code{substr} as the third argument -of @code{sub} or @code{gsub}: +It is also a mistake to use @code{substr()} as the third argument +of @code{sub()} or @code{gsub()}: @example gsub(/xyz/, "pdq", substr($0, 5, 20)) # WRONG @end example -@cindex portability, @code{substr} function +@cindex portability, @code{substr()} function (Some commercial versions of @command{awk} do in fact let you use -@code{substr} this way, but doing so is not portable.) +@code{substr()} this way, but doing so is not portable.) -If you need to replace bits and pieces of a string, combine @code{substr} +If you need to replace bits and pieces of a string, combine @code{substr()} with string concatenation, in the following manner: @example @@ -13504,14 +13655,14 @@ string = substr(string, 1, 2) "CDE" substr(string, 6) @cindex case sensitivity, converting case @cindex converting, case @item tolower(@var{string}) -@cindex @code{tolower} function +@cindex @code{tolower()} function This returns a copy of @var{string}, with each uppercase character in the string replaced with its corresponding lowercase character. Nonalphabetic characters are left unchanged. For example, @code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}. @item toupper(@var{string}) -@cindex @code{toupper} function +@cindex @code{toupper()} function This returns a copy of @var{string}, with each lowercase character in the string replaced with its corresponding uppercase character. Nonalphabetic characters are left unchanged. For example, @@ -13519,17 +13670,17 @@ Nonalphabetic characters are left unchanged. 
For example, @end table @node Gory Details -@subsubsection More About @samp{\} and @samp{&} with @code{sub}, @code{gsub}, and @code{gensub} - -@cindex escape processing, @code{gsub}/@code{gensub}/@code{sub} functions -@cindex @code{sub} function, escape processing -@cindex @code{gsub} function, escape processing -@cindex @code{gensub} function (@command{gawk}), escape processing -@cindex @code{\} (backslash), @code{gsub}/@code{gensub}/@code{sub} functions and -@cindex backslash (@code{\}), @code{gsub}/@code{gensub}/@code{sub} functions and -@cindex @code{&} (ampersand), @code{gsub}/@code{gensub}/@code{sub} functions and -@cindex ampersand (@code{&}), @code{gsub}/@code{gensub}/@code{sub} functions and -When using @code{sub}, @code{gsub}, or @code{gensub}, and trying to get literal +@subsubsection More About @samp{\} and @samp{&} with @code{sub()}, @code{gsub()}, and @code{gensub()} + +@cindex escape processing, @code{gsub()}/@code{gensub()}/@code{sub()} functions +@cindex @code{sub()} function, escape processing +@cindex @code{gsub()} function, escape processing +@cindex @code{gensub()} function (@command{gawk}), escape processing +@cindex @code{\} (backslash), @code{gsub()}/@code{gensub()}/@code{sub()} functions and +@cindex backslash (@code{\}), @code{gsub()}/@code{gensub()}/@code{sub()} functions and +@cindex @code{&} (ampersand), @code{gsub()}/@code{gensub()}/@code{sub()} functions and +@cindex ampersand (@code{&}), @code{gsub()}/@code{gensub()}/@code{sub()} functions and +When using @code{sub()}, @code{gsub()}, or @code{gensub()}, and trying to get literal backslashes and ampersands into the replacement text, you need to remember that there are several levels of @dfn{escape processing} going on. @@ -13551,7 +13702,7 @@ example, @code{"a\qb"} is treated as @code{"aqb"}. At the runtime level, the various functions handle sequences of @samp{\} and @samp{&} differently. The situation is (sadly) somewhat complex. 
-Historically, the @code{sub} and @code{gsub} functions treated the two +Historically, the @code{sub()} and @code{gsub()} functions treated the two character sequence @samp{\&} specially; this sequence was replaced in the generated text with a single @samp{&}. Any other @samp{\} within the @var{replacement} string that did not precede an @samp{&} was passed @@ -13567,7 +13718,7 @@ through unchanged. This is illustrated in @ref{table-sub-escapes}. % But then we need character for escape and tab. @catcode`! = 4 @halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{sub} sees!@code{sub} generates@cr + You type!@code{sub()} sees!@code{sub()} generates@cr @hrulefill!@hrulefill!@hrulefill@cr @code{\&}! @code{&}!the matched text@cr @code{\\&}! @code{\&}!a literal @samp{&}@cr @@ -13581,7 +13732,7 @@ through unchanged. This is illustrated in @ref{table-sub-escapes}. @end tex @ifdocbook @multitable @columnfractions .20 .20 .60 -@headitem You type @tab @code{sub} sees @tab @code{sub} generates +@headitem You type @tab @code{sub()} sees @tab @code{sub()} generates @item @code{\&} @tab @code{&} @tab the matched text @item @code{\\&} @tab @code{\&} @tab a literal @samp{&} @item @code{\\\&} @tab @code{\&} @tab a literal @samp{&} @@ -13594,7 +13745,7 @@ through unchanged. This is illustrated in @ref{table-sub-escapes}. @ifnottex @ifnotdocbook @display - You type @code{sub} sees @code{sub} generates + You type @code{sub()} sees @code{sub()} generates -------- ---------- --------------- @code{\&} @code{&} the matched text @code{\\&} @code{\&} a literal @samp{&} @@ -13611,7 +13762,7 @@ through unchanged. This is illustrated in @ref{table-sub-escapes}. @noindent This table shows both the lexical-level processing, where an odd number of backslashes becomes an even number at the runtime level, -as well as the runtime processing done by @code{sub}. +as well as the runtime processing done by @code{sub()}. 
(For the sake of simplicity, the rest of the following tables only show the case of even numbers of backslashes entered at the lexical level.) @@ -13619,9 +13770,9 @@ The problem with the historical approach is that there is no way to get a literal @samp{\} followed by the matched text. @c @cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk}, functions and, @code{gsub}/@code{sub} +@cindex POSIX @command{awk}, functions and, @code{gsub()}/@code{sub()} The 1992 POSIX standard attempted to fix this problem. That standard -says that @code{sub} and @code{gsub} look for either a @samp{\} or an @samp{&} +says that @code{sub()} and @code{gsub()} look for either a @samp{\} or an @samp{&} after the @samp{\}. If either one follows a @samp{\}, that character is output literally. The interpretation of @samp{\} and @samp{&} then becomes as shown in @ref{table-sub-posix-92}. @@ -13636,7 +13787,7 @@ as shown in @ref{table-sub-posix-92}. % But then we need character for escape and tab. @catcode`! = 4 @halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{sub} sees!@code{sub} generates@cr + You type!@code{sub()} sees!@code{sub()} generates@cr @hrulefill!@hrulefill!@hrulefill@cr @code{&}! @code{&}!the matched text@cr @code{\\&}! @code{\&}!a literal @samp{&}@cr @@ -13647,7 +13798,7 @@ as shown in @ref{table-sub-posix-92}. @end tex @ifdocbook @multitable @columnfractions .20 .20 .60 -@headitem You type @tab @code{sub} sees @tab @code{sub} generates +@headitem You type @tab @code{sub()} sees @tab @code{sub()} generates @item @code{&} @tab @code{&} @tab the matched text @item @code{\\&} @tab @code{\&} @tab a literal @samp{&} @item @code{\\\\&} @tab @code{\\&} @tab a literal @samp{\}, then the matched text @@ -13657,7 +13808,7 @@ as shown in @ref{table-sub-posix-92}. 
@ifnottex @ifnotdocbook @display - You type @code{sub} sees @code{sub} generates + You type @code{sub()} sees @code{sub()} generates -------- ---------- --------------- @code{&} @code{&} the matched text @code{\\&} @code{\&} a literal @samp{&} @@ -13704,7 +13855,7 @@ to produce a @samp{\} preceding the matched text. This is shown in % But then we need character for escape and tab. @catcode`! = 4 @halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{sub} sees!@code{sub} generates@cr + You type!@code{sub()} sees!@code{sub()} generates@cr @hrulefill!@hrulefill!@hrulefill@cr @code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr @code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text@cr @@ -13716,7 +13867,7 @@ to produce a @samp{\} preceding the matched text. This is shown in @end tex @ifdocbook @multitable @columnfractions .20 .20 .60 -@headitem You type @tab @code{sub} sees @tab @code{sub} generates +@headitem You type @tab @code{sub()} sees @tab @code{sub()} generates @item @code{\\\\\\&} @tab @code{\\\&} @tab a literal @samp{\&} @item @code{\\\\&} @tab @code{\\&} @tab a literal @samp{\}, followed by the matched text @item @code{\\&} @tab @code{\&} @tab a literal @samp{&} @@ -13727,7 +13878,7 @@ to produce a @samp{\} preceding the matched text. This is shown in @ifnottex @ifnotdocbook @display - You type @code{sub} sees @code{sub} generates + You type @code{sub()} sees @code{sub()} generates -------- ---------- --------------- @code{\\\\\\&} @code{\\\&} a literal @samp{\&} @code{\\\\&} @code{\\&} a literal @samp{\}, followed by the matched text @@ -13745,8 +13896,8 @@ there was only one. However, as in the historical case, any @samp{\} that is not part of one of these three sequences is not special and appears in the output literally. -@command{gawk} 3.0 and 3.1 follow these proposed POSIX rules for @code{sub} and -@code{gsub}. +@command{gawk} 3.0 and 3.1 follow these proposed POSIX rules for @code{sub()} and +@code{gsub()}. 
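As a quick check of the one case on which all of the rule sets above agree, the following sketch shows @samp{&} standing for the matched text and @samp{\\&} producing a literal ampersand. It assumes a POSIX-style shell and any @command{awk} on the search path; the input string @samp{cat} is an arbitrary illustration.

```shell
# Single quotes keep the shell from touching the backslashes,
# so awk's lexer sees exactly what is typed here.

# "[&]": gsub() sees & and substitutes the matched text.
echo cat | awk '{ gsub(/a/, "[&]"); print }'      # prints c[a]t

# "[\\&]": the lexer reduces \\ to \, gsub() sees \& and emits a literal &.
echo cat | awk '{ gsub(/a/, "[\\&]"); print }'    # prints c[&]t
```

The longer backslash sequences in the tables are where the rule sets diverge, so results for those may differ between @command{awk} implementations and versions.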
@c As much as we think it's a lousy idea. You win some, you lose some. Sigh. The POSIX standard took much longer to be revised than was expected in 1996. The 2001 standard does not follow the above rules. Instead, the rules @@ -13766,7 +13917,7 @@ These rules are presented in @ref{table-posix-2001-sub}. % But then we need character for escape and tab. @catcode`! = 4 @halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{sub} sees!@code{sub} generates@cr + You type!@code{sub()} sees!@code{sub()} generates@cr @hrulefill!@hrulefill!@hrulefill@cr @code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr @code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text@cr @@ -13778,7 +13929,7 @@ These rules are presented in @ref{table-posix-2001-sub}. @end tex @ifdocbook @multitable @columnfractions .20 .20 .60 -@headitem You type @tab @code{sub} sees @tab @code{sub} generates +@headitem You type @tab @code{sub()} sees @tab @code{sub()} generates @item @code{\\\\\\&} @tab @code{\\\&} @tab a literal @samp{\&} @item @code{\\\\&} @tab @code{\\&} @tab a literal @samp{\}, followed by the matched text @item @code{\\&} @tab @code{\&} @tab a literal @samp{&} @@ -13789,7 +13940,7 @@ These rules are presented in @ref{table-posix-2001-sub}. @ifnottex @ifnotdocbook @display - You type @code{sub} sees @code{sub} generates + You type @code{sub()} sees @code{sub()} generates -------- ---------- --------------- @code{\\\\\\&} @code{\\\&} a literal @samp{\&} @code{\\\\&} @code{\\&} a literal @samp{\}, followed by the matched text @@ -13804,14 +13955,14 @@ These rules are presented in @ref{table-posix-2001-sub}. The only case where the difference is noticeable is the last one: @samp{\\\\} is seen as @samp{\\} and produces @samp{\} instead of @samp{\\}. -Starting with version 3.1.4, @command{gawk} followed the POSIX rules +Starting with @value{PVERSION} 3.1.4, @command{gawk} followed the POSIX rules when @option{--posix} is specified (@pxref{Options}). 
Otherwise, it continued to follow the 1996 proposed rules, since that had been its behavior for many seven years. -As of version 3.2, @command{gawk} uses the POSIX 2001 rules. +As of @value{PVERSION} 4.0, @command{gawk} uses the POSIX 2001 rules. -The rules for @code{gensub} are considerably simpler. At the runtime +The rules for @code{gensub()} are considerably simpler. At the runtime level, whenever @command{gawk} sees a @samp{\}, if the following character is a digit, then the text that matched the corresponding parenthesized subexpression is placed in the generated output. Otherwise, @@ -13828,7 +13979,7 @@ as shown in @ref{table-gensub-escapes}. % But then we need character for escape and tab. @catcode`! = 4 @halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{gensub} sees!@code{gensub} generates@cr + You type!@code{gensub()} sees!@code{gensub()} generates@cr @hrulefill!@hrulefill!@hrulefill@cr @code{&}! @code{&}!the matched text@cr @code{\\&}! @code{\&}!a literal @samp{&}@cr @@ -13841,7 +13992,7 @@ as shown in @ref{table-gensub-escapes}. @end tex @ifdocbook @multitable @columnfractions .20 .20 .60 -@headitem You type @tab @code{gensub} sees @tab @code{gensub} generates +@headitem You type @tab @code{gensub()} sees @tab @code{gensub()} generates @item @code{&} @tab @code{&} @tab the matched text @item @code{\\&} @tab @code{\&} @tab a literal @samp{&} @item @code{\\\\} @tab @code{\\} @tab a literal @samp{\} @@ -13853,7 +14004,7 @@ as shown in @ref{table-gensub-escapes}. @ifnottex @ifnotdocbook @display - You type @code{gensub} sees @code{gensub} generates + You type @code{gensub()} sees @code{gensub()} generates -------- ------------- ------------------ @code{&} @code{&} the matched text @code{\\&} @code{\&} a literal @samp{&} @@ -13867,8 +14018,8 @@ as shown in @ref{table-gensub-escapes}. 
@end float Because of the complexity of the lexical and runtime level processing -and the special cases for @code{sub} and @code{gsub}, -we recommend the use of @command{gawk} and @code{gensub} when you have +and the special cases for @code{sub()} and @code{gsub()}, +we recommend the use of @command{gawk} and @code{gensub()} when you have to do substitutions. @c fakenode --- for prepinfo @@ -13880,8 +14031,8 @@ to do substitutions. @cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching In @command{awk}, the @samp{*} operator can match the null string. -This is particularly important for the @code{sub}, @code{gsub}, -and @code{gensub} functions. For example: +This is particularly important for the @code{sub()}, @code{gsub()}, +and @code{gensub()} functions. For example: @example $ echo abc | awk '@{ gsub(/m*/, "X"); print @}' @@ -13899,7 +14050,7 @@ Optional parameters are enclosed in square brackets ([ ]): @table @code @item close(@var{filename} @r{[}, @var{how}@r{]}) -@cindex @code{close} function +@cindex @code{close()} function @cindex files, closing Close the file @var{filename} for input or output. Alternatively, the argument may be a shell command that was used for creating a coprocess, or @@ -13909,7 +14060,7 @@ for more information. When closing a coprocess, it is occasionally useful to first close one end of the two-way pipe and then to close the other. This is done -by providing a second argument to @code{close}. This second argument +by providing a second argument to @code{close()}. This second argument should be one of the two string values @code{"to"} or @code{"from"}, indicating which end of the pipe to close. Case in the string does not matter. @@ -13917,12 +14068,12 @@ not matter. which discusses this feature in more detail and gives an example. 
@item fflush(@r{[}@var{filename}@r{]}) -@cindex @code{fflush} function +@cindex @code{fflush()} function Flush any buffered output associated with @var{filename}, which is either a file opened for writing or a shell command for redirecting output to a pipe or coprocess. -@cindex portability, @code{fflush} function and +@cindex portability, @code{fflush()} function and @cindex buffers, flushing @cindex output, buffering Many utility programs @dfn{buffer} their output; i.e., they save information @@ -13932,17 +14083,17 @@ This is often more efficient than writing every little bit of information as soon as it is ready. However, sometimes it is necessary to force a program to @dfn{flush} its buffers; that is, write the information to its destination, even if a buffer is not full. -This is the purpose of the @code{fflush} function---@command{gawk} also -buffers its output and the @code{fflush} function forces +This is the purpose of the @code{fflush()} function---@command{gawk} also +buffers its output and the @code{fflush()} function forces @command{gawk} to flush its buffers. -@code{fflush} was added to the Bell Laboratories research +@code{fflush()} was added to the Bell Laboratories research version of @command{awk} in 1994; it is not part of the POSIX standard and is not available if @option{--posix} has been specified on the command line (@pxref{Options}). -@cindex @command{gawk}, @code{fflush} function in -@command{gawk} extends the @code{fflush} function in two ways. The first +@cindex @command{gawk}, @code{fflush()} function in +@command{gawk} extends the @code{fflush()} function in two ways. The first is to allow no argument at all. In this case, the buffer for the standard output is flushed. The second is to allow the null string (@w{@code{""}}) as the argument. In this case, the buffers for @@ -13953,8 +14104,8 @@ support these extensions. 
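To see why an explicit flush can matter, the sketch below sends output into a pipe, where it is normally buffered, and runs a child command between two prints. It assumes an @command{awk} that accepts @code{fflush()} with no argument (the first extension just described) and a POSIX-style shell; @command{cat} is there only so that standard output is not a terminal.

```shell
awk 'BEGIN {
    print "first"
    fflush()               # push "first" out before the child process writes
    system("echo second")  # the child writes directly to the same pipe
    print "third"
}' | cat
```

With the flush in place, the three lines should arrive in program order. Implementations that already flush before running a command give the same order without the explicit call.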
@c @cindex automatic warnings @c @cindex warnings, automatic -@cindex troubleshooting, @code{fflush} function -@code{fflush} returns zero if the buffer is successfully flushed; +@cindex troubleshooting, @code{fflush()} function +@code{fflush()} returns zero if the buffer is successfully flushed; otherwise, it returns @minus{}1. In the case where all buffers are flushed, the return value is zero only if all buffers were flushed successfully. Otherwise, it is @@ -13963,13 +14114,13 @@ only if all buffers were flushed successfully. Otherwise, it is @command{gawk} also issues a warning message if you attempt to flush a file or pipe that was opened for reading (such as with @code{getline}), or if @var{filename} is not an open file, pipe, or coprocess. -In such a case, @code{fflush} returns @minus{}1, as well. +In such a case, @code{fflush()} returns @minus{}1, as well. @item system(@var{command}) -@cindex @code{system} function +@cindex @code{system()} function @cindex interacting with other programs Executes operating-system -commands and then returns to the @command{awk} program. The @code{system} +commands and then returns to the @command{awk} program. The @code{system()} function executes the command given by the string @var{command}. It returns the status returned by the command that was executed as its value. @@ -13998,16 +14149,16 @@ close("/bin/sh") @end example @noindent -@cindex troubleshooting, @code{system} function +@cindex troubleshooting, @code{system()} function @cindex @code{--sandbox} option, disabling @command{system} function However, if your @command{awk} -program is interactive, @code{system} is useful for cranking up large +program is interactive, @code{system()} is useful for cranking up large self-contained programs, such as a shell or an editor. -Some operating systems cannot implement the @code{system} function. -@code{system} causes a fatal error if it is not supported. +Some operating systems cannot implement the @code{system()} function. 
+@code{system()} causes a fatal error if it is not supported. @quotation NOTE -When @option{--sandbox} is specified, the @code{system} function is disabled. +When @option{--sandbox} is specified, the @code{system()} function is disabled. @end quotation @end table @@ -14057,23 +14208,23 @@ Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because it is all buffered and sent down the pipe to @command{cat} in one shot. @c fakenode --- for prepinfo -@subheading Advanced Notes: Controlling Output Buffering with @code{system} +@subheading Advanced Notes: Controlling Output Buffering with @code{system()} @cindex advanced features, buffering @cindex buffers, flushing @cindex buffering, input/output @cindex output, buffering -The @code{fflush} function provides explicit control over output buffering for +The @code{fflush()} function provides explicit control over output buffering for individual files and pipes. However, its use is not portable to many other @command{awk} implementations. An alternative method to flush output -buffers is to call @code{system} with a null string as its argument: +buffers is to call @code{system()} with a null string as its argument: @example system("") # flush output @end example @noindent -@command{gawk} treats this use of the @code{system} function as a special +@command{gawk} treats this use of the @code{system()} function as a special case and is smart enough not to run a shell (or other command interpreter) with the empty command. Therefore, with @command{gawk}, this idiom is not only useful, it is also efficient. While this method should work @@ -14083,7 +14234,7 @@ flush the buffer associated with the standard output and not necessarily all buffered output.) If you think about what a programmer expects, it makes sense that -@code{system} should flush any pending output. The following program: +@code{system()} should flush any pending output. 
The following program: @example BEGIN @{ @@ -14111,7 +14262,7 @@ first print second print @end example -If @command{awk} did not flush its buffers before calling @code{system}, +If @command{awk} did not flush its buffers before calling @code{system()}, you would see the latter (undesirable) output. @node Time Functions @@ -14153,7 +14304,7 @@ Optional parameters are enclosed in square brackets ([ ]): @table @code @item systime() -@cindex @code{systime} function (@command{gawk}) +@cindex @code{systime()} function (@command{gawk}) @cindex timestamps This function returns the current time as the number of seconds since the system epoch. On POSIX systems, this is the number of seconds @@ -14162,9 +14313,9 @@ It may be a different number on other systems. @item mktime(@var{datespec}) -@cindex @code{mktime} function (@command{gawk}) +@cindex @code{mktime()} function (@command{gawk}) This function turns @var{datespec} into a timestamp in the same form -as is returned by @code{systime}. It is similar to the function of the +as is returned by @code{systime()}. It is similar to the function of the same name in ISO C. The argument, @var{datespec}, is a string of the form @w{@code{"@var{YYYY} @var{MM} @var{DD} @var{HH} @var{MM} @var{SS} [@var{DST}]"}}. The string consists of six or seven numbers representing, respectively, @@ -14182,15 +14333,15 @@ year 1 and year @minus{}1 preceding year 0. The time is assumed to be in the local timezone. If the daylight-savings flag is positive, the time is assumed to be daylight savings time; if zero, the time is assumed to be standard -time; and if negative (the default), @code{mktime} attempts to determine +time; and if negative (the default), @code{mktime()} attempts to determine whether daylight savings time is in effect for the specified time. If @var{datespec} does not contain enough elements or if the resulting time -is out of range, @code{mktime} returns @minus{}1. +is out of range, @code{mktime()} returns @minus{}1. 
@item strftime(@r{[}@var{format} @r{[}, @var{timestamp} @r{[}, @var{utc-flag}@r{]]]}) @c STARTOFRANGE strf -@cindex @code{strftime} function (@command{gawk}) +@cindex @code{strftime()} function (@command{gawk}) This function returns a string. It is similar to the function of the same name in ISO C. The time specified by @var{timestamp} is used to produce a string, based on the contents of the @var{format} string. @@ -14198,15 +14349,15 @@ If @var{utc-flag} is present and is either non-zero or non-null, the value is formatted as UTC (Coordinated Universal Time, formerly GMT or Greenwich Mean Time). Otherwise, the value is formatted for the local time zone. The @var{timestamp} is in the same format as the value returned by the -@code{systime} function. If no @var{timestamp} argument is supplied, +@code{systime()} function. If no @var{timestamp} argument is supplied, @command{gawk} uses the current time of day as the timestamp. -If no @var{format} argument is supplied, @code{strftime} uses +If no @var{format} argument is supplied, @code{strftime()} uses @code{@w{"%a %b %d %H:%M:%S %Z %Y"}}. This format string produces output that is (almost) equivalent to that of the @command{date} utility. (Versions of @command{gawk} prior to 3.0 require the @var{format} argument.) @end table -The @code{systime} function allows you to compare a timestamp from a +The @code{systime()} function allows you to compare a timestamp from a log file with the current time of day. In particular, it is easy to determine how long ago a particular record was logged. It also allows you to produce log records using the ``seconds since the epoch'' format. @@ -14214,22 +14365,22 @@ you to produce log records using the ``seconds since the epoch'' format. 
@cindex converting, dates to timestamps @cindex dates, converting to timestamps @cindex timestamps, converting dates to -The @code{mktime} function allows you to convert a textual representation +The @code{mktime()} function allows you to convert a textual representation of a date and time into a timestamp. This makes it easy to do before/after comparisons of dates and times, particularly when dealing with date and time data coming from an external source, such as a log file. -The @code{strftime} function allows you to easily turn a timestamp -into human-readable information. It is similar in nature to the @code{sprintf} +The @code{strftime()} function allows you to easily turn a timestamp +into human-readable information. It is similar in nature to the @code{sprintf()} function (@pxref{String Functions}), in that it copies nonformat specification characters verbatim to the returned string, while substituting date and time values for format specifications in the @var{format} string. -@cindex format specifiers, @code{strftime} function (@command{gawk}) -@code{strftime} is guaranteed by the 1999 ISO C standard@footnote{As this -is a recent standard, not every system's @code{strftime} necessarily +@cindex format specifiers, @code{strftime()} function (@command{gawk}) +@code{strftime()} is guaranteed by the 1999 ISO C standard@footnote{As this +is a recent standard, not every system's @code{strftime()} necessarily supports all of the conversions listed here.} to support the following date format specifications: @@ -14382,8 +14533,8 @@ A literal @samp{%}. If a conversion specifier is not one of the above, the behavior is undefined.@footnote{This is because ISO C leaves the -behavior of the C version of @code{strftime} undefined and @command{gawk} -uses the system's version of @code{strftime} if it's there. +behavior of the C version of @code{strftime()} undefined and @command{gawk} +uses the system's version of @code{strftime()} if it's there. 
Typically, the conversion specifier either does not appear in the returned string or appears literally.} @@ -14400,7 +14551,7 @@ are used to. For systems that are not yet fully standards-compliant, @command{gawk} supplies a copy of -@code{strftime} from the GNU C Library. +@code{strftime()} from the GNU C Library. It supports all of the just listed format specifications. If that version is used to compile @command{gawk} (@pxref{Installation}), @@ -14651,57 +14802,29 @@ with @samp{11001000}. @command{gawk} provides built-in functions that implement the bitwise operations just described. They are: -@ignore -@table @code -@cindex @code{and} function (@command{gawk}) -@item and(@var{v1}, @var{v2}) -Return the bitwise AND of the values provided by @var{v1} and @var{v2}. - -@cindex @code{or} function (@command{gawk}) -@item or(@var{v1}, @var{v2}) -Return the bitwise OR of the values provided by @var{v1} and @var{v2}. - -@cindex @code{xor} function (@command{gawk}) -@item xor(@var{v1}, @var{v2}) -Return the bitwise XOR of the values provided by @var{v1} and @var{v2}. - -@cindex @code{compl} function (@command{gawk}) -@item compl(@var{val}) -Return the bitwise complement of @var{val}. - -@cindex @code{lshift} function (@command{gawk}) -@item lshift(@var{val}, @var{count}) -Return the value of @var{val}, shifted left by @var{count} bits. - -@cindex @code{rshift} function (@command{gawk}) -@item rshift(@var{val}, @var{count}) -Return the value of @var{val}, shifted right by @var{count} bits. -@end table -@end ignore - @cindex @command{gawk}, bitwise operations in @multitable {@code{rshift(@var{val}, @var{count})}} {Return the value of @var{val}, shifted right by @var{count} bits.} -@cindex @code{and} function (@command{gawk}) +@cindex @code{and()} function (@command{gawk}) @item @code{and(@var{v1}, @var{v2})} @tab Returns the bitwise AND of the values provided by @var{v1} and @var{v2}. 
-@cindex @code{or} function (@command{gawk}) +@cindex @code{or()} function (@command{gawk}) @item @code{or(@var{v1}, @var{v2})} @tab Returns the bitwise OR of the values provided by @var{v1} and @var{v2}. -@cindex @code{xor} function (@command{gawk}) +@cindex @code{xor()} function (@command{gawk}) @item @code{xor(@var{v1}, @var{v2})} @tab Returns the bitwise XOR of the values provided by @var{v1} and @var{v2}. -@cindex @code{compl} function (@command{gawk}) +@cindex @code{compl()} function (@command{gawk}) @item @code{compl(@var{val})} @tab Returns the bitwise complement of @var{val}. -@cindex @code{lshift} function (@command{gawk}) +@cindex @code{lshift()} function (@command{gawk}) @item @code{lshift(@var{val}, @var{count})} @tab Returns the value of @var{val}, shifted left by @var{count} bits. -@cindex @code{rshift} function (@command{gawk}) +@cindex @code{rshift()} function (@command{gawk}) @item @code{rshift(@var{val}, @var{count})} @tab Returns the value of @var{val}, shifted right by @var{count} bits. @end multitable @@ -14812,7 +14935,7 @@ The main code in the @code{BEGIN} rule shows the difference between the decimal and octal values for the same numbers (@pxref{Nondecimal-numbers}), and then demonstrates the -results of the @code{compl}, @code{lshift}, and @code{rshift} functions. +results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions. @c ENDOFRANGE bit @c ENDOFRANGE and @c ENDOFRANGE oro @@ -14834,14 +14957,14 @@ for the full story. Optional parameters are enclosed in square brackets ([ ]): @table @code -@cindex @code{dcgettext} function (@command{gawk}) +@cindex @code{dcgettext()} function (@command{gawk}) @item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) This function returns the translation of @var{string} in text domain @var{domain} for locale category @var{category}. The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. 
The default value for @var{category} is @code{"LC_MESSAGES"}. -@cindex @code{dcngettext} function (@command{gawk}) +@cindex @code{dcngettext()} function (@command{gawk}) @item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) This function returns the plural form used for @var{number} of the translation of @var{string1} and @var{string2} in text domain @@ -14851,7 +14974,7 @@ variant of the same message. The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. The default value for @var{category} is @code{"LC_MESSAGES"}. -@cindex @code{bindtextdomain} function (@command{gawk}) +@cindex @code{bindtextdomain()} function (@command{gawk}) @item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) This function allows you to specify the directory in which @command{gawk} will look for message translation files, in case they @@ -14861,7 +14984,7 @@ It returns the directory in which @var{domain} is ``bound.'' The default @var{domain} is the value of @code{TEXTDOMAIN}. If @var{directory} is the null string (@code{""}), then -@code{bindtextdomain} returns the current binding for the +@code{bindtextdomain()} returns the current binding for the given @var{domain}. @end table @c ENDOFRANGE funcbi @@ -15091,7 +15214,7 @@ $ echo "Don't Panic!" | The C @code{ctime} function takes a timestamp and returns it in a string, formatted in a well-known fashion. -The following example uses the built-in @code{strftime} function +The following example uses the built-in @code{strftime()} function (@pxref{Time Functions}) to create an @command{awk} version of @code{ctime}: @@ -15778,7 +15901,7 @@ monetary values are printed and read. 
@cindex @code{gettext} library The facilities in GNU @code{gettext} focus on messages; strings printed by a program, either directly or via formatting with @code{printf} or -@code{sprintf}.@footnote{For some operating systems, the @command{gawk} +@code{sprintf()}.@footnote{For some operating systems, the @command{gawk} port doesn't support GNU @code{gettext}. This applies most notably to the PC operating systems. As such, these features are not available if you are using one of those operating systems. Sorry.} @@ -15842,11 +15965,11 @@ at runtime. When @command{guide} is built and installed, the binary translation files are installed in a standard place. -@cindex @code{bindtextdomain} function (C library) +@cindex @code{bindtextdomain()} function (C library) @item For testing and development, it is possible to tell @code{gettext} to use @file{.mo} files in a different directory than the standard -one by using the @code{bindtextdomain} function. +one by using the @code{bindtextdomain()} function. @cindex @code{.mo} files, specifying directory of @cindex files, @code{.mo}, specifying directory of @@ -15976,7 +16099,7 @@ String constants marked with a leading underscore are candidates for translation at runtime. String constants without a leading underscore are not translated. -@cindex @code{dcgettext} function (@command{gawk}) +@cindex @code{dcgettext()} function (@command{gawk}) @item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) This built-in function returns the translation of @var{string} in text domain @var{domain} for locale category @var{category}. @@ -15995,12 +16118,12 @@ You must also supply a text domain. Use @code{TEXTDOMAIN} if you want to use the current domain. @strong{Caution:} The order of arguments to the @command{awk} version -of the @code{dcgettext} function is purposely different from the order for +of the @code{dcgettext()} function is purposely different from the order for the C version. 
The @command{awk} version's order was chosen to be simple and to allow for reasonable @command{awk}-style default arguments. -@cindex @code{dcngettext} function (@command{gawk}) +@cindex @code{dcngettext()} function (@command{gawk}) @item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) This built-in function returns the plural form used for @var{number} of the translation of @var{string1} and @var{string2} in text domain @@ -16010,13 +16133,13 @@ variant of the same message. The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. The default value for @var{category} is @code{"LC_MESSAGES"}. -The same remarks as for the @code{dcgettext} function apply. +The same remarks as for the @code{dcgettext()} function apply. @cindex @code{.mo} files, specifying directory of @cindex files, @code{.mo}, specifying directory of @cindex message object files, specifying directory of @cindex files, message object, specifying directory of -@cindex @code{bindtextdomain} function (@command{gawk}) +@cindex @code{bindtextdomain()} function (@command{gawk}) @item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) This built-in function allows you to specify the directory in which @code{gettext} looks for @file{.mo} files, in case they @@ -16026,7 +16149,7 @@ It returns the directory in which @var{domain} is ``bound.'' The default @var{domain} is the value of @code{TEXTDOMAIN}. If @var{directory} is the null string (@code{""}), then -@code{bindtextdomain} returns the current binding for the +@code{bindtextdomain()} returns the current binding for the given @var{domain}. 
@end table @@ -16072,7 +16195,7 @@ printf(_"Number of users is %d\n", nusers) @item If you are creating strings dynamically, you can -still translate them, using the @code{dcgettext} +still translate them, using the @code{dcgettext()} built-in function: @example @@ -16081,15 +16204,15 @@ message = dcgettext(message, "adminprog") print message @end example -Here, the call to @code{dcgettext} supplies a different +Here, the call to @code{dcgettext()} supplies a different text domain (@code{"adminprog"}) in which to find the message, but it uses the default @code{"LC_MESSAGES"} category. -@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain} function (@command{gawk}) +@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk}) @item During development, you might want to put the @file{.mo} file in a private directory for testing. This is done -with the @code{bindtextdomain} built-in function: +with the @code{bindtextdomain()} built-in function: @example BEGIN @{ @@ -16160,8 +16283,8 @@ When run with @option{--gen-pot}, @command{gawk} does not execute your program. Instead, it parses it as usual and prints all marked strings to standard output in the format of a GNU @code{gettext} Portable Object file. Also included in the output are any constant strings that -appear as the first argument to @code{dcgettext} or as the first and -second argument to @code{dcngettext}.@footnote{Starting with @code{gettext} +appear as the first argument to @code{dcgettext()} or as the first and +second argument to @code{dcngettext()}.@footnote{Starting with @code{gettext} version 0.11.5, the @command{xgettext} utility that comes with GNU @code{gettext} can handle @file{.awk} files.} @xref{I18N Example}, @@ -16173,7 +16296,7 @@ translations for @command{guide}. 
@cindex @code{printf} statement, positional specifiers @cindex positional specifiers@comma{} @code{printf} statement -Format strings for @code{printf} and @code{sprintf} +Format strings for @code{printf} and @code{sprintf()} (@pxref{Printf}) present a special problem for translation. Consider the following:@footnote{This example is borrowed @@ -16302,14 +16425,14 @@ the null string (@code{""}) as its value, leaving the original string constant a the result. @item -By defining ``dummy'' functions to replace @code{dcgettext}, @code{dcngettext} -and @code{bindtextdomain}, the @command{awk} program can be made to run, but +By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()} +and @code{bindtextdomain()}, the @command{awk} program can be made to run, but all the messages are output in the original language. For example: -@cindex @code{bindtextdomain} function (@command{gawk}), portability and -@cindex @code{dcgettext} function (@command{gawk}), portability and -@cindex @code{dcngettext} function (@command{gawk}), portability and +@cindex @code{bindtextdomain()} function (@command{gawk}), portability and +@cindex @code{dcgettext()} function (@command{gawk}), portability and +@cindex @code{dcngettext()} function (@command{gawk}), portability and @example @c file eg/lib/libintl.awk function bindtextdomain(dir, domain) @@ -16331,12 +16454,12 @@ function dcngettext(string1, string2, number, domain, category) @item The use of positional specifications in @code{printf} or -@code{sprintf} is @emph{not} portable. +@code{sprintf()} is @emph{not} portable. To support @code{gettext} at the C level, many systems' C versions of -@code{sprintf} do support positional specifiers. But it works only if +@code{sprintf()} do support positional specifiers. But it works only if enough arguments are supplied in the function call. 
Many versions of @command{awk} pass @code{printf} formats and arguments unchanged to the -underlying C library version of @code{sprintf}, but only one format and +underlying C library version of @code{sprintf()}, but only one format and argument at a time. What happens if a positional specification is used is anybody's guess. However, since the positional specifications are primarily for use in @@ -16465,8 +16588,8 @@ $ gawk -f guide.awk @print{} Pardon me, Zaphod who? @end example -If the three replacement functions for @code{dcgettext}, @code{dcngettext} -and @code{bindtextdomain} +If the three replacement functions for @code{dcgettext()}, @code{dcngettext()} +and @code{bindtextdomain()} (@pxref{I18N Portability}) are in a file named @file{libintl.awk}, then we can run @file{guide.awk} unchanged as follows: @@ -16529,7 +16652,7 @@ First, a command-line option allows @command{gawk} to recognize nondecimal numbers in input data, not just in @command{awk} programs. Next, two-way I/O, discussed briefly in earlier parts of this @value{DOCUMENT}, is described in full detail, along with the basics -of TCP/IP networking and BSD portal files. Finally, @command{gawk} +of TCP/IP networking. Finally, @command{gawk} can @dfn{profile} an @command{awk} program, making it possible to tune it for performance. @@ -16542,7 +16665,6 @@ its description is relegated to an appendix. * Nondecimal Data:: Allowing nondecimal input data. * Two-way I/O:: Two-way communications with another process. * TCP/IP Networking:: Using @command{gawk} for network programming. -* Portal Files:: Using @command{gawk} with BSD portals. * Profiling:: Profiling your @command{awk} programs. @end menu @@ -16592,12 +16714,12 @@ using it could lead to surprising results, the default is to leave this facility disabled. If you want it, you must explicitly request it. 
@cindex programming conventions, @code{--non-decimal-data} option -@cindex @code{--non-decimal-data} option, @code{strtonum} function and -@cindex @code{strtonum} function (@command{gawk}), @code{--non-decimal-data} option and +@cindex @code{--non-decimal-data} option, @code{strtonum()} function and +@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and @strong{Caution:} @emph{Use of this option is not recommended.} It can break old programs very badly. -Instead, use the @code{strtonum} function to convert your data +Instead, use the @code{strtonum()} function to convert your data (@pxref{Nondecimal-numbers}). This makes your programs easier to write and easier to read, and leads to less surprising results. @@ -16713,9 +16835,9 @@ known as @dfn{deadlock}, where each process is waiting for the other one to do something. @end itemize -@cindex @code{close} function, two-way pipes and +@cindex @code{close()} function, two-way pipes and It is possible to close just one end of the two-way pipe to -a coprocess, by supplying a second argument to the @code{close} +a coprocess, by supplying a second argument to the @code{close()} function of either @code{"to"} or @code{"from"} (@pxref{Close Files And Pipes}). These strings tell @command{gawk} to close the end of the pipe @@ -16883,29 +17005,6 @@ which comes as part of the @command{gawk} distribution, for a much more complete introduction and discussion, as well as extensive examples. 
-@node Portal Files -@section Using @command{gawk} with BSD Portals -@cindex advanced features, @command{gawk}, BSD portals -@cindex portal files -@cindex files, portal -@cindex BSD portals -@cindex @code{/p} files (@command{gawk}) -@cindex files, @code{/p} (@command{gawk}) -@cindex @code{--enable-portals} configuration option -@cindex operating systems, BSD-based - -Similar to the @file{/inet} special files, if @command{gawk} -is configured with the @option{--enable-portals} option -(@pxref{Quick Installation}), -then @command{gawk} treats -files whose pathnames begin with @code{/p} as 4.4 BSD-style portals. - -@cindex @code{|} (vertical bar), @code{|&} operator (I/O), two-way communications -@cindex vertical bar (@code{|}), @code{|&} operator (I/O), two-way communications -When used with the @samp{|&} operator, @command{gawk} opens the file -for two-way communications. The operating system's portal mechanism -then manages creating the process associated with the portal and -the corresponding communications with the portal's process. @c ENDOFRANGE tcpip @node Profiling @@ -17239,6 +17338,7 @@ full details. * AWKPATH Variable:: Searching directories for @command{awk} programs. * Exit Status:: @command{gawk}'s exit status. +* Include Files:: Including other files into your program. * Obsolete:: Obsolete Options and/or features. * Undocumented:: Undocumented Options and Features. * Known Bugs:: Known Bugs in @command{gawk}. @@ -17407,10 +17507,8 @@ multi-byte characters. This option is an easy way to tell @command{gawk}: ``hands off my data!''. @item -c -@itemx --compat @itemx --traditional @cindex @code{--c} option -@cindex @code{--compat} option @cindex @code{--traditional} option @cindex compatibility mode (@command{gawk}), specifying Specifies @dfn{compatibility mode}, in which the GNU extensions to @@ -17423,10 +17521,8 @@ which summarizes the extensions. 
Also see @item -C @itemx --copyright -@itemx --copyleft @cindex @code{-C} option @cindex @code{--copyright} option -@cindex @code{--copyleft} option @cindex GPL (General Public License), printing Print the short version of the General Public License and then exit. @@ -17500,17 +17596,15 @@ for information about this option. @item -h @itemx --help -@itemx --usage @cindex @code{-h} option @cindex @code{--help} option -@cindex @code{--usage} option @cindex GNU long options, printing list of @cindex options, printing list of @cindex printing, list of options Prints a ``usage'' message summarizing the short and long style options that @command{gawk} accepts and then exit. -@item -l @r{[}value@r{]} +@item -L @r{[}value@r{]} @itemx --lint@r{[}=value@r{]} @cindex @code{-l} option @cindex @code{--lint} option @@ -17524,22 +17618,14 @@ With an optional argument of @samp{fatal}, lint warnings become fatal errors. This may be drastic, but its use will certainly encourage the development of cleaner @command{awk} programs. -With an optional argument of @samp{invalid}, only warnings about things that are -actually invalid are issued. (This is not fully implemented yet.) +With an optional argument of @samp{invalid}, only warnings about things +that are actually invalid are issued. (This is not fully implemented yet.) -Some warnings are only printed once, even if the dubious constructs they warn -about occur multiple times in your @command{awk} program. Thus, when eliminating -problems pointed out by @option{--lint}, you should take care to search for all -occurrences of each inappropriate construct. As @command{awk} programs are -usually short, doing so is not burdensome. - -@item -L -@itemx --lint-old -@cindex @code{--L} option -@cindex @code{--lint-old} option -Warns about constructs that are not available in the original version of -@command{awk} from Version 7 Unix -(@pxref{V7/SVR3.1}). 
+Some warnings are only printed once, even if the dubious constructs they +warn about occur multiple times in your @command{awk} program. Thus, +when eliminating problems pointed out by @option{--lint}, you should take +care to search for all occurrences of each inappropriate construct. As +@command{awk} programs are usually short, doing so is not burdensome. @item -n @itemx --non-decimal-data @@ -17644,9 +17730,9 @@ of @code{FS} to be a single TAB character The locale's decimal point character is used for parsing input data (@pxref{Locales}). -@cindex @code{fflush} function@comma{} unsupported +@cindex @code{fflush()} function@comma{} unsupported @item -The @code{fflush} built-in function is not supported +The @code{fflush()} built-in function is not supported (@pxref{I/O Functions}). @end itemize @@ -17675,14 +17761,22 @@ and for use in combination with the @option{--traditional} option. @cindex @code{-S} option @cindex @code{--sandbox} option @cindex sandbox mode -In sandbox mode, the @command{system} function, -input redirections with @command{getline}, -output redirections with @command{print} and @command{printf} +In sandbox mode, the @code{system()} function, +input redirections with @code{getline}, +output redirections with @code{print} and @code{printf} and dynamic extensions are disabled. This is particularly useful when you want to run @command{awk} scripts from questionable sources and need to make sure the scripts can't access your system (other than the specified input data file). +@item -t +@itemx --lint-old +@cindex @code{--L} option +@cindex @code{--lint-old} option +Warns about constructs that are not available in the original version of +@command{awk} from Version 7 Unix +(@pxref{V7/SVR3.1}). + @item -V @itemx --version @cindex @code{-V} option @@ -17749,7 +17843,7 @@ and @command{gawk} turns on POSIX mode because of @env{POSIXLY_CORRECT}, then it issues a warning message indicating that POSIX mode is in effect. 
You would typically set this variable in your shell's startup file. -For a Bourne-compatible shell (such as @command{bash}), you would add these +For a Bourne-compatible shell (such as Bash), you would add these lines to the @file{.profile} file in your home directory: @example @@ -17928,6 +18022,124 @@ If @command{gawk} exits because of a fatal error, the exit status is 2. On non-POSIX systems, this value may be mapped to @code{EXIT_FAILURE}. +@node Include Files +@section Including Other Files Into Your Program + +@c Panos Papadopoulos <panos1962@gmail.com> contributed the original +@c text for this section. + +@strong{FIXME:} This section still needs some editing. + +Beginning with version @strong{FIXME:} 3.1.8-bc of @command{gawk}, the +@samp{@@include} keyword can be used to read external @command{awk} source +files. This makes it possible to split large @command{awk} source files +into smaller, more manageable files and also to reuse common @command{awk} +code from various @command{awk} scripts. In other words, you can group +together @command{awk} functions that carry out related tasks +in external files. These files can be used just like function libraries, +using the @samp{@@include} keyword in conjunction with the @code{AWKPATH} +environment variable. + +Let's see an example to demonstrate file inclusion in @command{gawk}. +To do so, we'll use two (trivial) @command{awk} scripts, namely +@file{test1} and @file{test2}. Here is the +@file{test1} script: + +@example +BEGIN @{ + print "This is script test1." +@} +@end example + +@noindent +and here is the @file{test2} script: + +@example +@@include "test1" +BEGIN @{ + print "This is script test2." +@} +@end example + +Running @command{gawk} with the @file{test2} +script produces the following result: + +@example +$ @kbd{gawk -f test2} +@print{} This is script test1. +@print{} This is script test2. 
+@end example + +@command{gawk} runs the @file{test2} script, where @file{test1} has been +included in the source of @file{test2} by means of the @samp{@@include} +keyword. So, to include external @command{awk} source files, you just +use @samp{@@include} followed by the name of the file to be included in +double quotes. + +@quotation NOTE +Keep in mind that this is a language construct and the @value{FN} cannot +be a string variable, but rather just a literal string in double quotes. +@end quotation + +The files to be included may be nested; e.g., given a third +script, namely @file{test3}: + +@example +@@include "test2" +BEGIN @{ + print "This is script test3." +@} +@end example + +@noindent +running @command{gawk} with the @file{test3} script produces the +following result: + +@example +$ @kbd{gawk -f test3} +@print{} This is script test1. +@print{} This is script test2. +@print{} This is script test3. +@end example + +The @value{FN} can, of course, be a pathname; e.g., + +@example +@@include "../io_funcs" +@end example + +@noindent +or + +@example +@@include "/usr/awklib/network" +@end example + +@noindent +are valid. The @code{AWKPATH} environment variable can be of great +value in @samp{@@include} constructs. The same rules that govern the use +of the @code{AWKPATH} variable in command-line file searches apply to +@samp{@@include} constructs too. This can prove very helpful in +constructing @command{gawk} function libraries. You can edit large +scripts containing useful @command{gawk} libraries and put those +files in a special directory. You can then include those ``libraries'' +either by using the full pathnames of the files or by setting +the @code{AWKPATH} environment variable accordingly and then using @samp{@@include} +with just the name part of the full file pathname. Of course, you can +have more than one directory to keep library files; the more complex +the working environment is, the more directories you need to organize +the files to be included. 
+ +File inclusion can, of course, also be carried out on the +command line, using as many @option{-f} options as required with the +files to be included as arguments, but the @samp{@@include} keyword +can help you in constructing self-contained @command{gawk} programs, +thus reducing the need to write complex and tedious command lines. + +@code{AWKPATH} is also used by the @samp{@@include} mechanism; that is, +the files to be included are searched for in the directories it specifies. +Keep in mind, however, that the current directory is searched first, +whether or not it is listed in the @code{AWKPATH} string. @node Obsolete @section Obsolete Options and/or Features @@ -18290,12 +18502,12 @@ programming use. @menu * Nextfile Function:: Two implementations of a @code{nextfile} function. -* Strtonum Function:: A replacement for the built-in @code{strtonum} - function. +* Strtonum Function:: A replacement for the built-in + @code{strtonum()} function. * Assert Function:: A function for assertions in @command{awk} programs. -* Round Function:: A function for rounding if @code{sprintf} does - not do it correctly. +* Round Function:: A function for rounding if @code{sprintf()} + does not do it correctly. * Cliff Random Function:: The Cliff Random Number Generator. * Ordinal Functions:: Functions for using characters as numbers and vice versa. @@ -18438,7 +18650,7 @@ computations). @node Strtonum Function @subsection Converting Strings To Numbers -The @code{strtonum} function (@pxref{String Functions}) +The @code{strtonum()} function (@pxref{String Functions}) is a @command{gawk} extension. The following function provides an implementation for other versions of @command{awk}: @@ -18518,7 +18730,7 @@ adjusts @code{k} so it can be used in computing the return value. Similar logic applies to the code that checks for and converts a hexadecimal value, which starts with @samp{0x} or @samp{0X}. 
-The use of @code{tolower} simplifies the computation for finding +The use of @code{tolower()} simplifies the computation for finding the correct numeric value for each hexadecimal digit. Finally, if the string matches the (rather complicated) regex for a @@ -18528,7 +18740,7 @@ number. A commented-out test program is included, so that the function can be tested with @command{gawk} and the results compared to the built-in -@code{strtonum} function. +@code{strtonum()} function. @node Assert Function @subsection Assertions @@ -18668,13 +18880,13 @@ with an @code{exit} statement. @cindex numbers, rounding @cindex libraries of @command{awk} functions, rounding numbers @cindex functions, library, rounding numbers -@cindex @code{print} statement, @code{sprintf} function and -@cindex @code{printf} statement, @code{sprintf} function and -@cindex @code{sprintf} function, @code{print}/@code{printf} statements and -The way @code{printf} and @code{sprintf} +@cindex @code{print} statement, @code{sprintf()} function and +@cindex @code{printf} statement, @code{sprintf()} function and +@cindex @code{sprintf()} function, @code{print}/@code{printf} statements and +The way @code{printf} and @code{sprintf()} (@pxref{Printf}) -perform rounding often depends upon the system's C @code{sprintf} -subroutine. On many machines, @code{sprintf} rounding is ``unbiased,'' +perform rounding often depends upon the system's C @code{sprintf()} +subroutine. On many machines, @code{sprintf()} rounding is ``unbiased,'' which means it doesn't always round a trailing @samp{.5} up, contrary to naive expectations. In unbiased rounding, @samp{.5} rounds to even, rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4. This means @@ -18769,7 +18981,7 @@ function cliff_rand() This algorithm requires an initial ``seed'' of 0.1. Each new value uses the current seed as input for the calculation. 
-If the built-in @code{rand} function +If the built-in @code{rand()} function (@pxref{Numeric Functions}) isn't random enough, you might try using this function instead. @@ -18910,7 +19122,7 @@ Good function design is important; this function needs to be general but it should also have a reasonable default behavior. It is called with an array as well as the beginning and ending indices of the elements in the array to be merged. This assumes that the array indices are numeric---a reasonable -assumption since the array was likely created with @code{split} +assumption since the array was likely created with @code{split()} (@pxref{String Functions}): @cindex @code{join} user-defined function @@ -18960,10 +19172,10 @@ more difficult than they really need to be.} @cindex functions, library, managing time @cindex timestamps, formatted @cindex time, managing -The @code{systime} and @code{strftime} functions described in +The @code{systime()} and @code{strftime()} functions described in @ref{Time Functions}, provide the minimum functionality necessary for dealing with the time of day -in human readable form. While @code{strftime} is extensive, the control +in human readable form. While @code{strftime()} is extensive, the control formats are not necessarily easy to remember or intuitively obvious when reading a program. @@ -19046,7 +19258,7 @@ function gettimeofday(time, ret, now, i) @end example The string indices are easier to use and read than the various formats -required by @code{strftime}. The @code{alarm} program presented in +required by @code{strftime()}. The @code{alarm} program presented in @ref{Alarm Program}, uses this function. A more general design for the @code{gettimeofday} function would have @@ -19576,12 +19788,12 @@ The abstraction provided by @code{getopt} is very useful and is quite handy in @command{awk} programs as well. Following is an @command{awk} version of @code{getopt}. 
This function highlights one of the greatest weaknesses in @command{awk}, which is that it is very poor at -manipulating single characters. Repeated calls to @code{substr} are +manipulating single characters. Repeated calls to @code{substr()} are necessary for accessing individual characters (@pxref{String Functions}).@footnote{This function was written before @command{gawk} acquired the ability to split strings into single characters using @code{""} as the separator. -We have left it alone, since using @code{substr} is more portable.} +We have left it alone, since using @code{substr()} is more portable.} The discussion that follows walks through the code a bit at a time: @@ -19689,7 +19901,7 @@ to return them to the user one at a time. If @code{_opti} is equal to zero, it is set to two, which is the index in the string of the next character to look at (we skip the @samp{-}, which is at position one). The variable @code{thisopt} holds the character, -obtained with @code{substr}. It is saved in @code{Optopt} for the main +obtained with @code{substr()}. It is saved in @code{Optopt} for the main program to use. If @code{thisopt} is not in the @code{options} string, then it is an @@ -19954,12 +20166,12 @@ The user's encrypted password. This may not be available on some systems. @item User-ID The user's numeric user ID number. -(On some systems it's a C @code{long}, and not an @code{int}. Thus +(On some systems it's a C @code{long}, and not an @code{int()}. Thus we cast it to @code{long} for all cases.) @item Group-ID The user's numeric group ID number. -(Similar comments about @code{long} vs.@: @code{int} apply here.) +(Similar comments about @code{long} vs.@: @code{int()} apply here.) @item Full name The user's full name, and perhaps other information associated with the @@ -19971,7 +20183,7 @@ The user's login (or ``home'') directory (familiar to shell programmers as @item Login shell The program that is run when the user logs in. 
This is usually a -shell, such as @command{bash}. +shell, such as Bash. @end table @end ignore @@ -19991,7 +20203,7 @@ user. @code{$HOME}). @item Login shell @tab The program that is run when the user logs in. This is usually a -shell, such as @command{bash}. +shell, such as Bash. @end multitable A few lines representative of @command{pwcat}'s output are as follows: @@ -20337,7 +20549,7 @@ usually empty or set to @samp{*}. @item Group ID Number The numeric group ID number. This number is unique within the file. -(On some systems it's a C @code{long}, and not an @code{int}. Thus +(On some systems it's a C @code{long}, and not an @code{int()}. Thus we cast it to @code{long} for all cases.) @item Group Member List @@ -21015,7 +21227,7 @@ written out between the fields: This version of @command{cut} relies on @command{gawk}'s @code{FIELDWIDTHS} variable to do the character-based cutting. While it is possible in -other @command{awk} implementations to use @code{substr} +other @command{awk} implementations to use @code{substr()} (@pxref{String Functions}), it is also extremely painful. The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem @@ -21467,8 +21679,8 @@ arguments and perform in the same way. @c STARTOFRANGE filspl @cindex files, splitting -@cindex @code{split} utility -The @code{split} program splits large text files into smaller pieces. +@cindex @code{split()} utility +The @code{split()} program splits large text files into smaller pieces. Usage is as follows: @example @@ -21484,7 +21696,7 @@ instead of 1000. To change the name of the output files to something like @file{myfileaa}, @file{myfileab}, and so on, supply an additional argument that specifies the @value{FN} prefix. -Here is a version of @code{split} in @command{awk}. It uses the @code{ord} and +Here is a version of @code{split()} in @command{awk}. It uses the @code{ord} and @code{chr} functions presented in @ref{Ordinal Functions}. 
@@ -21746,7 +21958,7 @@ The options for @command{uniq} are: @table @code @item -d -Pnly print only repeated lines. +Print only repeated lines. @item -u Print only nonrepeated lines. @@ -21883,13 +22095,13 @@ simply returns one or zero depending upon the result of a simple string comparison of @code{last} and @code{$0}. Otherwise, things get more complicated. If fields have to be skipped, each line is broken into an array using -@code{split} +@code{split()} (@pxref{String Functions}); the desired fields are then joined back into a line using @code{join}. The joined lines are stored in @code{clast} and @code{cline}. If no fields are skipped, @code{clast} and @code{cline} are set to @code{last} and @code{$0}, respectively. -Finally, if characters are skipped, @code{substr} is used to strip off the +Finally, if characters are skipped, @code{substr()} is used to strip off the leading @code{charcount} characters in @code{clast} and @code{cline}. The two strings are then compared and @code{are_equal} returns the result: @@ -22393,7 +22605,7 @@ is how long to wait before setting off the alarm: @end example @cindex @command{sleep} utility -Finally, the program uses the @code{system} function +Finally, the program uses the @code{system()} function (@pxref{I/O Functions}) to call the @command{sleep} utility. The @command{sleep} utility simply pauses for the given number of seconds. If the exit status is not zero, @@ -22465,8 +22677,8 @@ but it does most of the job. 
The @command{translate} program demonstrates one of the few weaknesses of standard @command{awk}: dealing with individual characters is very -painful, requiring repeated use of the @code{substr}, @code{index}, -and @code{gsub} built-in functions +painful, requiring repeated use of the @code{substr()}, @code{index()}, +and @code{gsub()} built-in functions (@pxref{String Functions}).@footnote{This program was written before @command{gawk} acquired the ability to split each character in a string into separate array elements.} @@ -22567,7 +22779,7 @@ While it is possible to do character transliteration in a user-level function, it is not necessarily efficient, and we (the @command{gawk} authors) started to consider adding a built-in function. However, shortly after writing this program, we learned that the System V Release 4 -@command{awk} had added the @code{toupper} and @code{tolower} functions +@command{awk} had added the @code{toupper()} and @code{tolower()} functions (@pxref{String Functions}). These functions handle the vast majority of the cases where character transliteration is necessary, and so we chose to @@ -22769,8 +22981,8 @@ table of how frequently each word occurs. @cindex @command{sort} utility The way to solve these problems is to use some of @command{awk}'s more advanced -features. First, we use @code{tolower} to remove -case distinctions. Next, we use @code{gsub} to remove punctuation +features. First, we use @code{tolower()} to remove +case distinctions. Next, we use @code{gsub()} to remove punctuation characters. Finally, we use the system @command{sort} utility to process the output of the @command{awk} script. Here is the new version of the program: @@ -22967,7 +23179,7 @@ The following program, @file{extract.awk}, reads through a Texinfo source file and does two things, based on the special comments. 
Upon seeing @samp{@w{@@c system @dots{}}}, it runs a command, by extracting the command text from the -control line and passing it on to the @code{system} function +control line and passing it on to the @code{system()} function (@pxref{I/O Functions}). Upon seeing @samp{@@c file @var{filename}}, each subsequent line is sent to the file @var{filename}, until @samp{@@c endfile} is encountered. @@ -23008,7 +23220,7 @@ END @@@{ print "Always avoid bored archeologists!" @@@} @file{extract.awk} begins by setting @code{IGNORECASE} to one, so that mixed upper- and lowercase letters in the directives won't matter. -The first rule handles calling @code{system}, checking that a command is +The first rule handles calling @code{system()}, checking that a command is given (@code{NF} is at least three) and also checking that the command exits with a zero exit status, signifying OK: @@ -23080,7 +23292,7 @@ Most of the work is in the following few lines. If the line has no @samp{@@} symbols, the program can print it directly. Otherwise, each leading @samp{@@} must be stripped off. To remove the @samp{@@} symbols, the line is split into separate elements of -the array @code{a}, using the @code{split} function +the array @code{a}, using the @code{split()} function (@pxref{String Functions}). The @samp{@@} symbol is used as the separator character. Each element of @code{a} that is empty indicates two successive @samp{@@} @@ -23187,7 +23399,7 @@ command1 < orig.data | sed 's/old/new/g' | command2 > result Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp @samp{old} on each input line and globally replace it with the text @samp{new}, i.e., all the occurrences on a line. This is similar to -@command{awk}'s @code{gsub} function +@command{awk}'s @code{gsub()} function (@pxref{String Functions}). 
The following program, @file{awksed.awk}, accepts at least two command-line @@ -23805,6 +24017,1242 @@ O+X*(o*(o+O)+O),+x+O+X*o,x*(x-o),(o+X+x)*o*o-(x-O-O),O+(X-x)*(X+O),x-O@}' We leave it to you to determine what the program does. +@c The original text for this chapter was contributed by Efraim Yawitz. +@c FIXME: Add more indexing. + +@node Debugger +@chapter @command{dgawk}: The @command{awk} Debugger +@cindex @command{dgawk} + +It would be nice if computer programs worked perfectly the first time they +were run, but in real life, this rarely happens for programs of +any complexity. Thus, most programming languages have facilities available +for ``debugging'' programs, and now @command{awk} is no exception. + +The @command{dgawk} debugger is purposely modeled after the GNU Debugger +(GDB) command-line debugger. If you are familiar with GDB, learning +@command{dgawk} is easy. + +@menu +* Debugging:: Introduction to @command{dgawk}. +* Sample dgawk session:: Sample @command{dgawk} session. +* List of Debugger Commands:: Main @command{dgawk} Commands. +* Readline Support:: Readline Support. +* Dgawk Limitations:: Limitations and future plans. +@end menu + +@node Debugging +@section Introduction to @command{dgawk} + +@menu +* Debugging Concepts:: Debugging In General. +* Debugging Terms:: Additional Debugging Concepts. +* Awk Debugging:: Awk Debugging. +@end menu + +@node Debugging Concepts +@subsection Debugging In General + +(If you have used debuggers in other languages, you may want to skip +ahead to the next section on the specific features of the @command{awk} +debugger.) + +Of course, a debugging program cannot remove bugs for you, since it has +no way of knowing what you or your users consider a ``bug'' and what is a +``feature.'' (Sometimes, we humans have a hard time with this ourselves.) +In that case, what can you expect from such a tool? 
The answer to that +depends on the language being debugged, but in general, it includes at +least the following: + +@itemize @bullet +@item +The ability to watch a program execute its instructions one by one, +giving you, the programmer, the opportunity to think about what is happening +on a time scale of seconds, minutes, or hours, rather than the nanosecond +time scale at which the code usually runs. + +@item +The opportunity to not only passively observe the operation of your +program, but to control it and try different paths of execution, without +having to change your source files. + +@item +The chance to see the values of data in the program at any point in +execution, and also to change that data on the fly, to see how that +affects what happens afterwards. (This often includes the ability +to look at internal data structures besides the variables you actually +defined in your code.) + +@item +The ability to obtain additional information about your program's state +or even its internal structure. +@end itemize + +All of these tools provide a great amount of help in using your own +skills and understanding of the goals of your program to find where it +is going wrong (or, for that matter, to better comprehend a perfectly +functional program that you or someone else wrote). + +@node Debugging Terms +@subsection Additional Debugging Concepts + +Before diving into the details, we need to introduce a few more +important concepts that apply to just about all debuggers, including +@command{dgawk}. + +@table @dfn +@item Stack Frame +Programs generally call functions during the course of their execution. +One function can call another, or a function can call itself (recursion). +You can view the chain of called functions (main program calls A, which +calls B, which calls C), as a stack of executing functions: the currently +running function is the topmost one on the stack, and when it finishes +(returns), the next one down then becomes the active function. 
+Such a stack is termed a @dfn{call stack}. + +For each function on the call stack, the system maintains a data area +that contains the function's parameters, local variables, and return value, +as well as any other ``bookkeeping'' information needed to manage the +call stack. This data area is termed a @dfn{stack frame}. + +@command{gawk} also follows this model, and @command{dgawk} gives you +access to the call stack and to each stack frame. You can see the +call stack, as well as from where each function on the stack was +invoked. Commands that print the call stack print information about +each stack frame (as detailed later on). + +@item Breakpoint +During debugging, you often wish to let the program run until it +reaches a certain point, and then continue execution from there one +statement (or instruction) at a time. The way to do this is to set +a @dfn{breakpoint} within the program. A breakpoint is where the +execution of the program should break off (stop), so that you can +take over control of the program's execution. You can add and remove +as many breakpoints as you like. + +@item Watchpoint +A watchpoint is similar to a breakpoint. The difference is that +breakpoints are oriented around the code: stop when a certain point in the +code is reached. A watchpoint, however, specifies that program execution +should stop when a @emph{data value} is changed. This is useful, since +sometimes it happens that a variable receives an erroneous value, and it's +hard to track down where this happens just by looking at the code. +By using a watchpoint, you can stop whenever a variable is assigned to, +and usually find the errant code quite quickly. +@end table + +@node Awk Debugging +@subsection Awk Debugging + +Debugging an @command{awk} program has some specific aspects that are +not shared with other programming languages. 
+ +First of all, the fact that @command{awk} programs usually take input +line-by-line from a file or files and operate on those lines using specific +rules makes it especially useful to organize viewing the execution of +the program in terms of these rules. As we will see, each @command{awk} +rule is treated almost like a function call, with its own specific block +of instructions. + +In addition, since @command{awk} is by design a very concise language, +it is easy to lose sight of everything that is going on ``inside'' +each line of @command{awk} code. The debugger provides the opportunity +to look at the individual primitive instructions carried out +by the higher-level @command{awk} commands. + +@node Sample dgawk session +@section Sample @command{dgawk} session + +In order to illustrate the use of @command{dgawk}, let's look at a sample +debugging session. We will use the @command{awk} implementation of the +POSIX @command{uniq} command described earlier (@pxref{Uniq Program}) +as our example. + +@menu +* dgawk invocation:: @command{dgawk} Invocation. +* Finding The Bug:: Finding The Bug. +@end menu + +@node dgawk invocation +@subsection @command{dgawk} Invocation + +Starting @command{dgawk} is exactly like running @command{awk}. The +file(s) containing the program and any supporting code are given on the +command line as arguments to one or more @option{-f} options. +(@command{dgawk} is not designed to debug command-line +programs, only programs contained in files.) In our case, +we call @command{dgawk} like this: + +@example +$ @kbd{dgawk -f getopt.awk -f join.awk -f uniq.awk inputfile} +@end example + +@noindent +where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}. +(Experienced users of @command{gdb} or similar debuggers should note that +this syntax is slightly different from what they are used to. 
+With @command{dgawk}, the arguments for running the program are given +in the command line to the debugger rather than as part of the @code{run} +command at the debugger prompt.) + +Instead of immediately running the program on @file{inputfile}, as +@command{gawk} would ordinarily do, @command{dgawk} merely loads all +the program source files, compiles them internally, and then gives +us a prompt: + +@example +dgawk> +@end example + +@noindent +from which we can issue commands to the debugger. At this point, no +code has been executed. + +@node Finding The Bug +@subsection Finding The Bug + +Let's say that we are having a problem using (a faulty version of) +@file{uniq.awk} in the ``field-skipping'' mode, and it doesn't seem to be +catching lines which should be identical when skipping the first field, +such as: + +@example +awk is a wonderful program! +gawk is a wonderful program! +@end example + +This could happen if we were thinking (C-like) of the fields in a record +as being numbered in a zero-based fashion, so instead of the lines: + +@example +clast = join(alast, fcount+1, n) +cline = join(aline, fcount+1, m) +@end example + +@noindent +we wrote: + +@example +clast = join(alast, fcount, n) +cline = join(aline, fcount, m) +@end example + +The first thing we usually want to do when trying to investigate a +problem like this is to put a breakpoint in the program so that we can +watch it at work and catch what it is doing wrong. A reasonable spot for +a breakpoint in @file{uniq.awk} is at the beginning of the function +@code{are_equal}, which compares the current line with the previous one. To set +the breakpoint, use the @code{b} (breakpoint) command: + +@example +dgawk> @kbd{b are_equal} +@print{} Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 64 +@end example + +The debugger tells us the file and line number where the breakpoint is. 
+Now type @code{r} or @code{run} and the program runs until it hits +the breakpoint the first time: + +@example +dgawk> @kbd{r} +@print{} Starting program: +@print{} Stopping in Rule ... +@print{} Breakpoint 1, are_equal(n, m, clast, cline, alast, aline) at `awklib/eg/prog/uniq.awk':64 +@print{} 64 if (fcount == 0 && charcount == 0) +dgawk> +@end example + +Now we can look at what's going on inside our program. First of all, +let's see how we got to where we are. At the prompt, we type @code{bt} +(short for ``backtrace''), and @command{dgawk} responds with a +listing of the current stack frames: + +@example +dgawk> @kbd{bt} +@print{} #0 are_equal(n, m, clast, cline, alast, aline) at `awklib/eg/prog/uniq.awk':69 +@print{} #1 in main() at `awklib/eg/prog/uniq.awk':89 +@end example + +This tells us that @code{are_equal} was called by the main program at +line 89 of @file{uniq.awk}. (This is not a big surprise, since this +is the only call to @code{are_equal} in the program, but in more complex +programs, knowing who called a function and with what parameters can be +the key to finding the source of the problem.) + +Now that we're in @code{are_equal}, we can start looking at the values +of some variables. Let's say we type @samp{p n} +(@code{p} is short for ``print''). We would expect to see the value of +@code{n}, a parameter to @code{are_equal}. Actually, @command{dgawk} +gives us: + +@example +dgawk> @kbd{p n} +@print{} n = untyped variable +@end example + +@noindent +In this case, @code{n} is an uninitialized local variable, since the +function was called without arguments (@pxref{Function Calls}). + +A more useful variable to display might be the current record: + +@example +dgawk> @kbd{p $0} +@print{} $0 = string ("gawk is a wonderful program!") +@end example + +@noindent +This might be a bit puzzling at first since this is the second line of +our test input above. 
Let's look at @code{NR}: + +@example +dgawk> @kbd{p NR} +@print{} NR = number (2) +@end example + +@noindent +So we can see that @code{are_equal} was only called for the second record +of the file. Of course, this is because our program contained a rule for +@samp{NR == 1}: + +@example +NR == 1 @{ + last = $0 + next +@} +@end example + +OK, let's just check that that rule worked correctly: + +@example +dgawk> @kbd{p last} +@print{} last = string ("awk is a wonderful program!") +@end example + +Everything we have done so far has verified that the program has worked as +planned, up to and including the call to @code{are_equal}, so the problem must +be inside this function. To investigate further, we have to begin +``stepping through'' the lines of @code{are_equal}. We start by typing +@code{n} (for ``next''): + +@example +dgawk> @kbd{n} +@print{} 67 if (fcount > 0) @{ +@end example + +This tells us that @command{gawk} is now ready to execute line 67, which +decides whether to give the lines the special ``field skipping'' treatment +indicated by the @option{-f} command-line option. (Notice that we skipped +from where we were before at line 64 to here, since the condition in line 64 + +@example +if (fcount == 0 && charcount == 0) +@end example + +@noindent +was false.) + +Continuing to step, we now get to the splitting of the current and +last records: + +@example +dgawk> @kbd{n} +@print{} 68 n = split(last, alast) +dgawk> @kbd{n} +@print{} 69 m = split($0, aline) +@end example + +At this point, we should be curious to see what our records were split +into, so we try to look: + +@example +dgawk> @kbd{p n m alast aline} +@print{} n = number (5) +@print{} m = number (5) +@print{} alast = array, 5 elements +@print{} aline = array, 5 elements +@end example + +@noindent +(The @code{p} command can take more than one argument, similar to +@command{awk}'s @code{print} statement.) + +This is kind of disappointing, though. 
All we found out is that there +are five elements in each of our arrays. Useful enough (we now know that +none of the words were accidentally left out), but what if we want to see +inside the array? + +The first choice would be to use subscripts: + +@example +dgawk> @kbd{p alast[0]} +@print{} "0" not in array `alast' +@end example + +@noindent +Oops! + +@example +dgawk> @kbd{p alast[1]} +@print{} alast["1"] = string ("awk") +@end example + +This would be kind of slow for a 100-member array, though, so +@command{dgawk} provides a shortcut (reminiscent of another language +not to be mentioned): + +@example +dgawk> @kbd{p @@alast} +@print{} alast["4"] = string ("wonderful") +@print{} alast["5"] = string ("program!") +@print{} alast["1"] = string ("awk") +@print{} alast["2"] = string ("is") +@print{} alast["3"] = string ("a") +@end example + +Ignoring the ordering of the elements for now (a @command{dgawk} internals +issue), it looks like we got this far OK. Let's take another step +or two: + +@example +dgawk> @kbd{n} +@print{} 70 clast = join(alast, fcount, n) +dgawk> @kbd{n} +@print{} 71 cline = join(aline, fcount, m) +@end example + +Well, here we are at our error (sorry to spoil the suspense). What we +had in mind was to join the fields starting from the second one to make +the virtual record to compare, and if the first field were numbered zero, +this would work. Let's look at what we've got: + +@example +dgawk> @kbd{p cline clast} +@print{} cline = string ("gawk is a wonderful program!") +@print{} clast = string ("awk is a wonderful program!") +@end example + +Hey, those look pretty familiar! They're just our original, unaltered, +input records. A little thinking (the human brain is still the best +debugging tool), and we realize that we were off by one! + +We get out of @command{dgawk}: + +@example +dgawk> @kbd{q} +@print{} The program is running. Exit anyway (y/n)?
@kbd{y} +@end example + +@noindent +Then we get into an editor: + +@example +clast = join(alast, fcount+1, n) +cline = join(aline, fcount+1, m) +@end example + +@noindent +and problem solved! + +@node List of Debugger Commands +@section Main @command{dgawk} Commands + +The @command{dgawk} command set can be divided into the +following categories: + +@itemize @bullet{} + +@item +Breakpoint control + +@item +Execution control + +@item +Viewing and changing data + +@item +Working with the stack + +@item +Getting information + +@item +Miscellaneous +@end itemize + +Each of these is discussed in the following subsections. +In the following descriptions, commands which may be abbreviated +show the abbreviation on a second description line. +A @command{dgawk} command name may also be truncated if that partial +name is unambiguous. @command{dgawk} also has a built-in capability to +automatically repeat the previous command when you just press @key{Enter}. +This works for the commands @code{list}, @code{next}, @code{nexti}, @code{step}, @code{stepi} +and @code{continue} executed without any argument. + +@menu +* Breakpoint Control:: Control of breakpoints. +* Dgawk Execution Control:: Control of execution. +* Viewing And Changing Data:: Viewing and changing data. +* Dgawk Stack:: Dealing with the stack. +* Dgawk Info:: Obtaining information about the program and + the debugger state. +* Miscellaneous Dgawk Commands:: Miscellaneous Commands. +@end menu + +@node Breakpoint Control +@subsection Control Of Breakpoints + +As we saw above, the first thing you probably want to do in a debugging +session is to get your breakpoints set up, since otherwise your program +will just run as if it were not under the debugger.
The commands for +controlling breakpoints are: + +@table @asis +@cindex debugger commands, @code{b} (@code{break}) +@cindex debugger commands, @code{break} +@cindex @code{break} debugger command +@cindex @code{b} debugger command (alias for @code{break}) +@item @code{break} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}] +@itemx @code{b} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}] +Without any argument, set a breakpoint at the next instruction +to be executed in the selected stack frame. +Arguments can be one of the following: + +@c nested table +@table @var +@item n +Set a breakpoint at line number @var{n} in the current source file. + +@item filename@code{:}n +Set a breakpoint at line number @var{n} in source file @var{filename}. + +@item function +Set a breakpoint at entry to (the first instruction of) +function @var{function}. +@end table + +With a breakpoint, you may also supply a condition. This is an +@command{awk} expression that @command{dgawk} evaluates whenever +the breakpoint is reached. If the condition is true, then @command{dgawk} +stops execution and prompts for a command. Otherwise, @command{dgawk} +continues executing the program. + +@cindex debugger commands, @code{clear} +@cindex @code{clear} debugger command +@item @code{clear} [[@var{filename}@code{:}]@var{n} | @var{function}] +Without any argument, delete any breakpoint at the next instruction +to be executed in the selected stack frame. If the program stops at +a breakpoint, this deletes that breakpoint so that the program +does not stop at that location again. + +@c nested table +@table @var +@item n +Delete breakpoint(s) set at line number @var{n} in the current source file. + +@item filename@code{:}n +Delete breakpoint(s) set at line number @var{n} in source file @var{filename}. + +@item function +Delete breakpoint(s) set at entry to function @var{function}. 
+@end table + +@cindex debugger commands, @code{condition} +@cindex @code{condition} debugger command +@item @code{condition} @var{n} @code{"@var{expression}"} +Add a condition to an existing breakpoint or watchpoint @var{n}. The +condition is an @command{awk} expression that @command{dgawk} evaluates +whenever the breakpoint or watchpoint is reached. If the condition is true, then +@command{dgawk} stops execution and prompts for a command. Otherwise, +@command{dgawk} continues executing the program. + +@cindex debugger commands, @code{d} (@code{delete}) +@cindex debugger commands, @code{delete} +@cindex @code{delete} debugger command +@cindex @code{d} debugger command (alias for @code{delete}) +@item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] +@itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] +Delete specified breakpoints or a range of breakpoints. Deletes +all defined breakpoints if no argument is supplied. + +@cindex debugger commands, @code{disable} +@cindex @code{disable} debugger command +@item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}] +Disable specified breakpoints or a range of breakpoints. Without +any argument, disables all breakpoints. + +@cindex debugger commands, @code{e} (@code{enable}) +@cindex debugger commands, @code{enable} +@cindex @code{enable} debugger command +@cindex @code{e} debugger command (alias for @code{enable}) +@item @code{enable} [@code{once} | @code{del}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] +@itemx @code{e} [@code{once} | @code{del}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] +Enable specified breakpoints or a range of breakpoints. Without +any argument, enables all breakpoints. +Optionally, you can specify how to enable the breakpoint: + +@c nested table +@table @code +@item del +Enable breakpoint(s) temporarily, then delete each one when +the program stops at it. + +@item once +Enable breakpoint(s) temporarily, then disable each one when +the program stops at it.
+@end table + +@cindex debugger commands, @code{ignore} +@cindex @code{ignore} debugger command +@item @code{ignore} @var{n} @var{count} +Ignore breakpoint number @var{n} the next @var{count} times it is +hit. + +@cindex debugger commands, @code{t} (@code{tbreak}) +@cindex debugger commands, @code{tbreak} +@cindex @code{tbreak} debugger command +@cindex @code{t} debugger command (alias for @code{tbreak}) +@item @code{tbreak} [[@var{filename}@code{:}]@var{n} | @var{function}] +@itemx @code{t} [[@var{filename}@code{:}]@var{n} | @var{function}] +Set a temporary breakpoint (enabled for only one stop). +@end table + +@node Dgawk Execution Control +@subsection Control of Execution + +Now that your breakpoints are ready, you can start running the program +and observing its behavior. There are more commands for controlling +execution of the program than we saw in our earlier example: + +@table @asis +@cindex debugger commands, @code{commands} +@cindex @code{commands} debugger command +@cindex debugger commands, @code{silent} +@cindex @code{silent} debugger command +@cindex debugger commands, @code{end} +@cindex @code{end} debugger command +@item @code{commands} [@var{n}] +@itemx @code{silent} +@itemx @dots{} +@itemx @code{end} +Set a list of commands to be executed upon stopping at +a breakpoint or watchpoint. @var{n} is the breakpoint or watchpoint number. +Without a number, the last one set is used. The actual commands follow +starting on the next line and are terminated by the @code{end} command. +If the command @code{silent} is in the list, the usual messages about +stopping at a breakpoint and the source line are not printed. Any command +in the list that resumes execution (e.g., @code{continue}) terminates the list +(an implicit @code{end}), and subsequent commands are ignored.
+For example: + +@example +dgawk> @kbd{commands} +> @kbd{silent} +> @kbd{printf "A silent breakpoint; i = %d\n", i} +> @kbd{info locals} +> @kbd{set i = 10} +> @kbd{continue} +> @kbd{end} +dgawk> +@end example + +@cindex debugger commands, @code{c} (@code{continue}) +@cindex debugger commands, @code{continue} +@item @code{continue} [@var{count}] +@itemx @code{c} [@var{count}] +Resume program execution. If continued from a breakpoint and @var{count} is +specified, ignores the breakpoint at that location the next @var{count} times +before stopping. + +@cindex debugger commands, @code{finish} +@cindex @code{finish} debugger command +@item @code{finish} +Execute until the selected stack frame returns. +Prints the returned value. + +@cindex debugger commands, @code{n} (@code{next}) +@cindex debugger commands, @code{next} +@cindex @code{next} debugger command +@cindex @code{n} debugger command (alias for @code{next}) +@item @code{next} [@var{count}] +@itemx @code{n} [@var{count}] +Continue execution to the next source line, stepping over function calls. +The argument @var{count} controls how many times to repeat the action, as +in @code{step}. + +@cindex debugger commands, @code{ni} (@code{nexti}) +@cindex debugger commands, @code{nexti} +@cindex @code{nexti} debugger command +@cindex @code{ni} debugger command (alias for @code{nexti}) +@item @code{nexti} [@var{count}] +@itemx @code{ni} [@var{count}] +Execute one (or @var{count}) instruction(s), stepping over function calls. + +@cindex debugger commands, @code{return} +@cindex @code{return} debugger command +@item @code{return} [@var{value}] +Cancel execution of a function call. If @var{value} (either a string or a +number) is specified, it is used as the function's return value. If used in a +frame other than the innermost one (the currently executing function, i.e., +frame number 0), discard all inner frames in addition to the selected one, +and the caller of that frame becomes the innermost frame. 
+ +@cindex debugger commands, @code{r} (@code{run}) +@cindex debugger commands, @code{run} +@cindex @code{run} debugger command +@cindex @code{r} debugger command (alias for @code{run}) +@item @code{run} +@itemx @code{r} +Start/restart execution of the program. When restarting, @command{dgawk} +retains the current breakpoints, watchpoints, command history, +automatic display variables, and debugger options. + +@cindex debugger commands, @code{s} (@code{step}) +@cindex debugger commands, @code{step} +@cindex @code{step} debugger command +@cindex @code{s} debugger command (alias for @code{step}) +@item @code{step} [@var{count}] +@itemx @code{s} [@var{count}] +Continue execution until control reaches a different source line in the +current stack frame. @code{step} steps inside any function called within +the line. If the argument @var{count} is supplied, steps that many times before +stopping, unless it encounters a breakpoint or watchpoint. + +@cindex debugger commands, @code{si} (@code{stepi}) +@cindex debugger commands, @code{stepi} +@cindex @code{stepi} debugger command +@cindex @code{si} debugger command (alias for @code{stepi}) +@item @code{stepi} [@var{count}] +@itemx @code{si} [@var{count}] +Execute one (or @var{count}) instruction(s), stepping inside function calls. +(For illustration of what is meant by an ``instruction'' in @command{gawk}, +see the output shown under @code{dump} in @ref{Miscellaneous Dgawk Commands}). + +@cindex debugger commands, @code{u} (@code{until}) +@cindex debugger commands, @code{until} +@cindex @code{until} debugger command +@cindex @code{u} debugger command (alias for @code{until}) +@item @code{until} [[@var{filename}@code{:}]@var{n} | @var{function}] +@itemx @code{u} [[@var{filename}@code{:}]@var{n} | @var{function}] +Without any argument, continues execution until a line past the current +line in current stack frame is reached. 
With argument, +continues execution until the specified location is reached, or the current +stack frame returns. +@end table + +@node Viewing And Changing Data +@subsection Viewing and Changing Data + +The commands for viewing and changing variables inside of @command{gawk} are: + +@table @asis +@cindex debugger commands, @code{display} +@cindex @code{display} debugger command +@item @code{display} [@var{var} | @code{$}@var{n}] +Add variable @var{var} (or field @code{$@var{n}}) to the display list. +The value of the variable or field is displayed each time the program stops. +Each variable added to the list is identified by a unique number: + +@example +dgawk> @kbd{display x} +@print{} 10: x = 1 +@end example + +@noindent +displays the assigned item number, the variable name and its current value. +If the display variable refers to a function parameter, it is silently +deleted from the list as soon as the execution reaches a context where +no such variable of the given name exists. +Without argument, @code{display} displays the current values of +items on the list. + +@cindex debugger commands, @code{eval} +@cindex @code{eval} debugger command +@item @code{eval "@var{awk statements}"} +Evaluate @var{awk statements} in the context of the running program. +You can do anything that an @command{awk} program would do: assign +values to variables, call functions, and so on. + +@item @code{eval} @var{param}, @dots{} +@itemx @var{awk statements} +@itemx @code{end} +This form of @code{eval} is similar, but it allows you to define +``local variables'' that exist in the context of the +@var{awk statements}, instead of using variables or function +parameters defined by the program. 
+ +@cindex debugger commands, @code{p} (@code{print}) +@cindex debugger commands, @code{print} +@cindex @code{print} debugger command +@cindex @code{p} debugger command (alias for @code{print}) +@item @code{print} @var{var1}[@code{,} @var{var2} @dots{}] +@itemx @code{p} @var{var1}[@code{,} @var{var2} @dots{}] +Print the value of a @command{gawk} variable or field. +Fields must be referenced by constants: + +@example +dgawk> @kbd{print $3} +@end example + +@noindent +prints the third field in the input record (if the specified field does not +exist, it prints @samp{Null field}). A variable can be an array element, with +the subscripts being constant values. To print the contents of an array, +prefix the name of the array with the @samp{@@} symbol: + +@example +dgawk> @kbd{print @@a} +@end example + +@noindent +prints the index and the corresponding value for all elements in +the array @code{a}. + +@cindex debugger commands, @code{printf} +@cindex @code{printf} debugger command +@item @code{printf} @var{format} [@code{,} @var{arg} @dots{}] +Print formatted text. The @var{format} may include escape sequences, +such as @samp{\n} +(@pxref{Escape Sequences}). +No newline is printed unless one is specified. + +@cindex debugger commands, @code{set} +@cindex @code{set} debugger command +@item @code{set} @var{var}@code{=}@var{value} +Assign a constant (number or string) value to an @command{awk} variable +or field. +String values must be enclosed between double quotes (@code{"@dots{}"}). + +You can also set special @command{awk} variables, such as @code{FS}, +@code{NF}, @code{NR}, etc.
+ +@cindex debugger commands, @code{w} (@code{watch}) +@cindex debugger commands, @code{watch} +@cindex @code{watch} debugger command +@cindex @code{w} debugger command (alias for @code{watch}) +@item @code{watch} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] +@itemx @code{w} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] +Add variable @var{var} (or field @code{$@var{n}}) to the watch list. +@command{dgawk} then stops whenever +the value of the variable or field changes. Each watched item is assigned a +number which can be used to delete it from the watch list using the +@code{unwatch} command. + +With a watchpoint, you may also supply a condition. This is an +@command{awk} expression that @command{dgawk} evaluates whenever +the watchpoint is reached. If the condition is true, then @command{dgawk} +stops execution and prompts for a command. Otherwise, @command{dgawk} +continues executing the program. + +@cindex debugger commands, @code{undisplay} +@cindex @code{undisplay} debugger command +@item @code{undisplay} [@var{n}] +Remove item number @var{n} (or all items, if no argument) from the +automatic display list. + +@cindex debugger commands, @code{unwatch} +@cindex @code{unwatch} debugger command +@item @code{unwatch} [@var{n}] +Remove item number @var{n} (or all items, if no argument) from the +watch list. + +@end table + +@node Dgawk Stack +@subsection Dealing With The Stack + +Whenever you run a program which contains any function calls, +@command{gawk} maintains a stack which has all of the functions leading up +to where the program is right now. You can see how you got to where you are, +and also move around in the stack to see what the state of things was in the +functions which called the one you are in. 
The commands for doing this are: + +@table @asis +@cindex debugger commands, @code{bt} (@code{backtrace}) +@cindex debugger commands, @code{backtrace} +@cindex @code{backtrace} debugger command +@cindex @code{bt} debugger command (alias for @code{backtrace}) +@item @code{backtrace} [@var{count}] +@itemx @code{bt} [@var{count}] +Print a backtrace of all function calls (stack frames), or the innermost @var{count} +frames if @var{count} > 0. Print the outermost @var{count} frames if +@var{count} < 0. The backtrace displays the name and arguments to each +function, the source @value{FN}, and the line number. + +@cindex debugger commands, @code{down} +@cindex @code{down} debugger command +@item @code{down} [@var{count}] +Move @var{count} (default 1) frames down the stack toward the innermost frame. +Then select and print the frame. + +@cindex debugger commands, @code{f} (@code{frame}) +@cindex debugger commands, @code{frame} +@cindex @code{frame} debugger command +@cindex @code{f} debugger command (alias for @code{frame}) +@item @code{frame} [@var{n}] +@itemx @code{f} [@var{n}] +Select and print (frame number, function and argument names, source file, +and the source line) stack frame @var{n}. Frame 0 is the currently executing, +or @dfn{innermost}, frame (function call), frame 1 is the frame that called the +innermost one. The highest numbered frame is the one for the main program. + +@cindex debugger commands, @code{up} +@cindex @code{up} debugger command +@item @code{up} [@var{count}] +Move @var{count} (default 1) frames up the stack toward the outermost frame. +Then select and print the frame. +@end table + +@node Dgawk Info +@subsection Obtaining Information About The Program and The Debugger State + +Besides looking at the values of variables, there is often a need to get +other sorts of information about the state of your program and of the +debugging environment itself.
@command{dgawk} has one command which +provides this information, appropriately called @code{info}. @code{info} +is used with one of a number of arguments which tell it exactly what +you want to know: + +@table @asis +@cindex debugger commands, @code{i} (@code{info}) +@cindex debugger commands, @code{info} +@cindex @code{info} debugger command +@cindex @code{i} debugger command (alias for @code{info}) +@item @code{info} @var{what} +@itemx @code{i} @var{what} +The value for @var{what} should be one of the following: + +@c nested table +@table @code +@item args +Arguments of the selected frame. + +@item break +List all currently set breakpoints. + +@item display +List of all items in the automatic display list. + +@item frame +Description of the selected stack frame. + +@item functions +List all function definitions including source file names and +line numbers. + +@item locals +Local variables of the selected frame. + +@item source +The name of the current source file. Each time the program stops, the +current source file is the file containing the current instruction. +When @command{dgawk} first starts, the current source file is the first file +included via the @option{-f} option. The +@samp{list @var{filename}:@var{lineno}} command can +be used at any time to change the current source. + +@item sources +List all program sources. + +@item variables +List all global variables. + +@item watch +List of all items in the watch list. +@end table +@end table + +Additional commands give you control over the debugger, the ability to +save the debugger's state, and the ability to run debugger commands +from a file. 
The commands are: + +@table @asis +@cindex debugger commands, @code{o} (@code{option}) +@cindex debugger commands, @code{option} +@cindex @code{option} debugger command +@cindex @code{o} debugger command (alias for @code{option}) +@item @code{option} [@var{name}[@code{=}@var{value}]] +@itemx @code{o} [@var{name}[@code{=}@var{value}]] +Without an argument, display the available debugger options +and their current values. @samp{option @var{name}} shows the current +value of the named option. @samp{option @var{name}=@var{value}} assigns +a new value to the named option. +The available options are: + +@c nested table +@table @code +@item history_size +The maximum number of lines to keep in the history file @file{./.dgawk_history}. +The default is 100. + +@item listsize +The number of lines that @code{list} prints. The default is 15. + +@item outfile +Sends @command{gawk} output to a file; debugger output still goes +to standard output. An empty string (@code{""}) resets output to +standard output. + +@item prompt +The debugger prompt. The default is @samp{dgawk>}. + +@item save_history @r{[}on @r{|} off@r{]} +Save command history to file @file{./.dgawk_history}. +The default is @code{on}. + +@item save_options @r{[}on @r{|} off@r{]} +Save current options to file @file{./.dgawkrc} upon exit. +The default is @code{on}. +Options are read back in to the next session upon startup. + +@item trace @r{[}on @r{|} off@r{]} +Turn instruction tracing on or off. The default is @code{off}. +@end table + +@item @code{save} @var{filename} +Save the commands from the current session to the given @value{FN}, +so that they can be replayed using the @command{source} command. + +@item @code{source} @var{filename} +Run command(s) from a file; an error in any command does not +terminate execution of subsequent commands. Comments (lines starting +with @samp{#}) are allowed in a command file. +Empty lines are ignored; they do @emph{not} +repeat the last command. 
+You can't restart the program by having more than one @code{run} +command in the file. Also, the list of commands may include additional +@code{source} commands; however, @command{dgawk} will not source the +same file more than once in order to avoid infinite recursion. + +In addition to, or instead of the @code{source} command, you can use +the @option{-R @var{file}} or @option{--command=@var{file}} command-line +options to execute commands from a file non-interactively. +@end table + +@node Miscellaneous Dgawk Commands +@subsection Miscellaneous Commands + +There are a few more commands which do not fit into the +previous categories, as follows: + +@table @asis +@cindex debugger commands, @code{dump} +@cindex @code{dump} debugger command +@item @code{dump} [@var{filename}] +Dump bytecode of the program to standard output or to the file +named in @var{filename}. This prints a representation of the internal +instructions which @command{gawk} executes to implement the @command{awk} +commands in a program. 
This can be very enlightening, as the following +partial dump of Davide Brini's obfuscated code +(@pxref{Signature Program}) demonstrates: + +@smallexample +dgawk> @kbd{dump} +@print{} # BEGIN +@print{} +@print{} [ 2:0x1d4355f0] Op_rule : [in_rule = BEGIN] [source_file = brini.awk] +@print{} [ 3:0x1d435710] Op_push_i : "~" [MALLOC|PERM|STRING|STRCUR] +@print{} [ 3:0x1d4357c0] Op_push_i : "~" [MALLOC|PERM|STRING|STRCUR] +@print{} [ 3:0x1d435790] Op_match : +@print{} [ 3:0x1d435680] Op_push_lhs : O [do_reference = FALSE] +@print{} [ 3:0x1d4356b0] Op_assign : +@print{} [ :0x1d4356e0] Op_pop : +@print{} [ 4:0x1d4358c0] Op_push_i : "==" [MALLOC|PERM|STRING|STRCUR] +@print{} [ 4:0x1d435970] Op_push_i : "==" [MALLOC|PERM|STRING|STRCUR] +@print{} [ 4:0x1d435940] Op_equal : +@print{} [ 4:0x1d435810] Op_push_lhs : o [do_reference = FALSE] +@print{} [ 4:0x1d435860] Op_assign : +@print{} [ :0x1d435890] Op_pop : +@print{} [ 5:0x1d435a70] Op_push : o +@print{} [ 5:0x1d435a40] Op_plus_i : 0 [MALLOC|NUMCUR|NUMBER] +@print{} [ 5:0x1d4359c0] Op_push_lhs : o [do_reference = TRUE] +@print{} [ 5:0x1d435910] Op_assign_plus : +@print{} [ :0x1d435a10] Op_pop : +@print{} [ 6:0x1d435b50] Op_push : O +@print{} [ 6:0x1d435b80] Op_push_i : "" [MALLOC|PERM|STRING|STRCUR] +@print{} [ :0x1d435c60] Op_no_op : +@print{} [ 6:0x1d435c30] Op_push : O +@print{} [ :0x1d435c90] Op_concat : [expr_count = 3] +@print{} [ 6:0x1d435ad0] Op_push_lhs : x [do_reference = FALSE] +@print{} [ 6:0x1d435aa0] Op_assign : +@print{} [ :0x1d435b00] Op_pop : +@print{} [ 7:0x1d435c00] Op_push_loop : [target_continue = 0x1d435bd0] [target_break = 0x1d435fc0] +@print{} [ 7:0x1d435bd0] Op_push_lhs : X [do_reference = TRUE] +@print{} [ 7:0x1d435cc0] Op_postincrement : +@print{} [ 7:0x1d435d70] Op_push : x +@print{} [ 7:0x1d435e00] Op_push : o +@print{} [ 7:0x1d435da0] Op_plus : +@print{} [ 7:0x1d435e60] Op_push : o +@print{} [ 7:0x1d435e30] Op_plus : +@print{} [ 7:0x1d435d20] Op_leq : +@print{} [ :0x1d435cf0] Op_jmp_false : 
[target_jmp = 0x1d435fc0] +@print{} [ 8:0x1d435f40] Op_push_i : "%c" [MALLOC|PERM|STRING|STRCUR] +@print{} [ :0x1d435ff0] Op_no_op : +@print{} [ 8:0x1d435dd0] Op_push_lhs : c [do_reference = FALSE] +@print{} [ 8:0x1d435e90] Op_assign_concat : +@print{} [ :0x1d435ec0] Op_pop : +@print{} [ :0x1d435f90] Op_jmp : [target_jmp = 0x1d435bd0] +@print{} [ :0x1d435fc0] Op_pop_loop : +@print{} +@print{} @dots{} +@print{} +@print{} [ 9:0x1d435f10] Op_K_printf : [expr_count = 17] [redir_type = Op_illegal] +@print{} [ :0x1d435180] Op_no_op : +@print{} [ :0x1d435240] Op_exit : [exit_value = 0] +dgawk> +@end smallexample + +@cindex debugger commands, @code{h} (@code{help}) +@cindex debugger commands, @code{help} +@cindex @code{help} debugger command +@cindex @code{h} debugger command (alias for @code{help}) +@item @code{help} +@itemx @code{h} +Print a list of all of the @command{dgawk} commands with a short +summary of their usage. @samp{help @var{command}} prints the information +about the command @var{command}. + +@cindex debugger commands, @code{l} (@code{list}) +@cindex debugger commands, @code{list} +@cindex @code{list} debugger command +@cindex @code{l} debugger command (alias for @code{list}) +@item @code{list} [@code{-} | @code{+} | @var{n} | @var{filename@code{:}n} | @var{n}---@var{m} | @var{function}] +@itemx @code{l} [@code{-} | @code{+} | @var{n} | @var{filename@code{:}n} | @var{n}---@var{m} | @var{function}] +Print the specified lines (default 15) from the current source file +or the file named @var{filename}. The possible arguments to @code{list} +are as follows: + +@c nested table +@table @asis +@item @code{-} +Print lines before the lines last printed. + +@item @code{+} +Print lines after the lines last printed. +@code{list} without any argument does the same thing. + +@item @var{n} +Print lines centered around line number @var{n}. + +@item @var{n}---@var{m} +Print lines from @var{n} to @var{m}. 
+ +@item @var{filename@code{:}n} +Print lines centered around line number @var{n} in +source file @var{filename}. This command may change the current source file. + +@item @var{function} +Print lines centered around beginning of the +function @var{function}. This command may change the current source file. +@end table + +@cindex debugger commands, @code{q} (@code{quit}) +@cindex debugger commands, @code{quit} +@cindex @code{quit} debugger command +@cindex @code{q} debugger command (alias for @code{quit}) +@item @code{quit} +@itemx @code{q} +Exit the debugger. Debugging is great fun, but sometimes we all have +to tend to other obligations in life (and sometimes we find the bug, +and are free to go on to the next one!). As we saw above, if you are +running a program, @command{dgawk} warns you if you accidentally type +@code{q} or @code{quit}, to make sure you really want to quit. + +@cindex debugger commands, @code{trace} +@cindex @code{trace} debugger command +@item @code{trace} @code{on} | @code{off} +Turn on or off a continuous printing of instructions which are about to +be executed, along with printing the @command{awk} line which they +implement. The default is @code{off}. + +It is to be hoped that most of the ``opcodes'' in these instructions are +fairly self-explanatory, and using @code{stepi} and @code{nexti} while +@code{trace} is on will make them into familiar friends. + +@end table + +@node Readline Support +@section Readline Support + +If compiled with the @code{readline} library, you can take advantage of +its command completion and history expansion features. The following types +of completion are available: + +@table @asis +@item Command completion +Command names. + +@item Source @value{FN} completion +Source @value{FN}s. Relevant commands are +@code{break}, +@code{clear}, +@code{list}, +@code{tbreak}, +and +@code{until}. + +@item Argument completion +Non-numeric arguments to a command. +Relevant commands are @code{info} and @code{enable}. 
+ +@item Variable name completion +Global variable names, and function arguments in the current context +if the program is running. Relevant commands are +@code{display}, +@code{print}, +@code{set}, +and +@code{watch}. + +@end table + +@node Dgawk Limitations +@section Limitations and Future Plans + +We hope you find @command{dgawk} useful and enjoyable to work with, +but as with any program, especially in its early releases, it still has +some limitations. A few which are worth being aware of are: + +@itemize @bullet{} +@item +At this point, @command{dgawk} does not give a detailed explanation of +what you did wrong when you type in something it doesn't like. Rather, it just +responds @samp{syntax error}. When you do figure out what your mistake was, +though, you'll feel like a real guru. + +@item +If you perused the dump of opcodes in @ref{Miscellaneous Dgawk Commands}, +(or you are already familiar with @command{gawk} internals), +you will realize that much of the internal manipulation of data +in @command{gawk}, as in many interpreters, is done on a stack. +@code{Op_push}, @code{Op_pop}, etc., are the ``bread and butter'' of +most @command{gawk} code. Unfortunately, as of now, @command{dgawk} +does not provide the capability of examining the stack's contents. + +That is, the intermediate results of expression evaluation are on the +stack, but cannot be printed. Rather, only variables which are defined +in the program can actually be printed. Of course, a workaround for +this is to use more explicit variables at the debugging stage and then +change back to obscure, perhaps more optimal code later. + +@item +There is no way right now to look ``inside'' the process of compiling +regular expressions to see if you got it right. As an @command{awk} +programmer, you are expected to know what @code{/[^[:alnum:][:blank:]]/} +means. 
+ +@item +@command{dgawk} is designed to be used by running a program (with all its +parameters) on the command line, as described in @ref{dgawk invocation}. +There is no way (as of now) to attach or ``break in'' to a running program. +This seems reasonable for a language which is used mainly for quickly +executing, short programs. +@end itemize + +Look forward to a future release when these and other missing features may +be added, and of course feel free to try to add them yourself if you want. + @ignore @c Try this @iftex @@ -23901,15 +25349,15 @@ The @code{do}-@code{while} statement (@pxref{Do Statement}). @item -The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand}, and -@code{srand} (@pxref{Numeric Functions}). +The built-in functions @code{atan2()}, @code{cos()}, @code{sin()}, @code{rand()}, and +@code{srand()} (@pxref{Numeric Functions}). @item -The built-in functions @code{gsub}, @code{sub}, and @code{match} +The built-in functions @code{gsub()}, @code{sub()}, and @code{match()} (@pxref{String Functions}). @item -The built-in functions @code{close} and @code{system} +The built-in functions @code{close()} and @code{system()} (@pxref{I/O Functions}). @item @@ -23939,7 +25387,7 @@ programs (@pxref{Precedence}). @item Regexps as the value of @code{FS} (@pxref{Field Separators}) and as the -third argument to the @code{split} function +third argument to the @code{split()} function (@pxref{String Functions}), rather than using only the first character of @code{FS}. @@ -23999,11 +25447,11 @@ The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences @c GNU, for ANSI C compat @item -A defined return value for the @code{srand} built-in function +A defined return value for the @code{srand()} built-in function (@pxref{Numeric Functions}). @item -The @code{toupper} and @code{tolower} built-in string functions +The @code{toupper()} and @code{tolower()} built-in string functions for case translation (@pxref{String Functions}). 
@@ -24091,9 +25539,13 @@ The locale's decimal point character is used for parsing input data (@pxref{Locales}). @item -The @code{fflush} built-in function is not supported +The @code{fflush()} built-in function is not supported (@pxref{I/O Functions}). @end itemize + +The 2008 POSIX standard can be found online at +@url{http://www.opengroup.org/onlinepubs/9699919799/}. + @c ENDOFRANGE gawkv @node BTL @@ -24119,7 +25571,7 @@ As a side note, his @command{awk} no longer needs these options; it continues to accept them to avoid breaking old programs. @item -The @code{fflush} built-in function for flushing buffered output +The @code{fflush()} built-in function for flushing buffered output (@pxref{I/O Functions}). @item @@ -24156,7 +25608,7 @@ special files @item The ability for @code{FS} and for the third -argument to @code{split} to be null strings +argument to @code{split()} to be null strings (@pxref{Single Character Fields}). @item @@ -24165,12 +25617,12 @@ The @code{nextfile} statement @item The ability to delete all of an array at once with @samp{delete @var{array}} -(@pxref{String Functions}). +(@pxref{Delete}). @item -The ability for the @code{length} function to accept an array argument and +The ability for the @code{length()} function to accept an array argument and return the number of elements in the array. -(@pxref{Delete}). +(@pxref{String Functions}). @end itemize @node POSIX/GNU @@ -24227,7 +25679,7 @@ The @code{FIELDWIDTHS} variable and its effects (@pxref{Constant Size}). @item -The @code{systime} and @code{strftime} built-in functions for obtaining +The @code{systime()} and @code{strftime()} built-in functions for obtaining and printing timestamps (@pxref{Time Functions}). @@ -24262,7 +25714,7 @@ through @code{ARGV} (@pxref{Built-in Variables}). 
@item The @code{ERRNO} variable, which contains the system error message when -@code{getline} returns @minus{}1 or @code{close} fails +@code{getline} returns @minus{}1 or @code{close()} fails (@pxref{Built-in Variables}). @item @@ -24302,17 +25754,17 @@ Full support for both POSIX and GNU regexps (@pxref{Regexp}). @item -The @code{gensub} function for more powerful text manipulation +The @code{gensub()} function for more powerful text manipulation (@pxref{String Functions}). @item -The @code{strftime} function acquired a default time format, +The @code{strftime()} function acquired a default time format, allowing it to be called with no arguments (@pxref{Time Functions}). @item The ability for @code{FS} and for the third -argument to @code{split} to be null strings +argument to @code{split()} to be null strings (@pxref{Single Character Fields}). @item @@ -24330,7 +25782,7 @@ the original Version 7 Unix version of @command{awk} (@pxref{V7/SVR3.1}). @item -The @option{-m} option and the @code{fflush} function from the +The @option{-m} option and the @code{fflush()} function from the Bell Laboratories research version of @command{awk} (@pxref{Options}; also @pxref{I/O Functions}). @@ -24392,12 +25844,12 @@ The @file{/inet} special files for TCP/IP networking using @samp{|&} (@pxref{TCP/IP Networking}). @item -The optional second argument to @code{close} that allows closing one end +The optional second argument to @code{close()} that allows closing one end of a two-way pipe to a coprocess (@pxref{Two-way I/O}). @item -The optional third argument to the @code{match} function +The optional third argument to the @code{match()} function for capturing text-matching subexpressions within a regexp (@pxref{String Functions}). @@ -24407,33 +25859,33 @@ making translations easier (@pxref{Printf Ordering}). @item -The @code{asort} and @code{asorti} functions for sorting arrays +The @code{asort()} and @code{asorti()} functions for sorting arrays (@pxref{Array Sorting}). 
@item -The @code{bindtextdomain}, @code{dcgettext} and @code{dcngettext} functions +The @code{bindtextdomain()}, @code{dcgettext()} and @code{dcngettext()} functions for internationalization (@pxref{Programmer i18n}). @item -The @code{extension} built-in function and the ability to add +The @code{extension()} built-in function and the ability to add new built-in functions dynamically (@pxref{Dynamic Extensions}). @item -The @code{mktime} built-in function for creating timestamps +The @code{mktime()} built-in function for creating timestamps (@pxref{Time Functions}). @item The -@code{and}, -@code{or}, -@code{xor}, -@code{compl}, -@code{lshift}, -@code{rshift}, +@code{and()}, +@code{or()}, +@code{xor()}, +@code{compl()}, +@code{lshift()}, +@code{rshift()}, and -@code{strtonum} built-in +@code{strtonum()} built-in functions (@pxref{Bitwise Functions}). @@ -24468,9 +25920,10 @@ to use the locale's decimal point for parsing input data (@pxref{Conversion}). @item -The @option{--enable-portals} configuration option to enable special treatment of -pathnames that begin with @file{/p} as BSD portals -(@pxref{Portal Files}). +The @option{--enable-portals} configuration option to enable special +treatment of pathnames that begin with @file{/p} as BSD portals. (This +option is no longer available; the related code was removed since it +was never used.) @item The use of GNU Automake to help in standardizing the configuration process @@ -24503,13 +25956,13 @@ at compile time @item The @option{--with-whiny-user-strftime} configuration option to force the use -of the included version of the @code{strftime} +of the included version of the @code{strftime()} function for deficient systems (@pxref{Additional Configuration Options}). @item -POSIX compliance for @code{sub} and @code{gsub} +POSIX compliance for @code{sub()} and @code{gsub()} (@pxref{Gory Details}). @item @@ -24517,12 +25970,12 @@ The @option{--exec} option, for use in CGI scripts (@pxref{Options}). 
@item -The @code{length} function was extended to accept an array argument +The @code{length()} function was extended to accept an array argument and return the number of elements in the array (@pxref{String Functions}). @item -The @code{strftime} function acquired a third argument to +The @code{strftime()} function acquired a third argument to enable printing times as UTC (@pxref{Time Functions}). @end itemize @@ -24578,7 +26031,7 @@ The @code{FPAT} variable and its effects (@pxref{Splitting By Content}). @item -The @code{patsplit} function +The @code{patsplit()} function (@pxref{String Functions}). @item @@ -24586,6 +26039,19 @@ The @file{/inet4} and @samp{/inet6} special files for TCP/IP networking using @samp{|&} to specify which version of the IP protocol to use. (@pxref{TCP/IP Networking}). +@item +The @option{--compat}, @option{--copyleft} and @option{--usage} +options were removed. + +@item +The @code{break} and @code{continue} statements may no longer +be used outside a loop, even with @option{--traditional} +(@pxref{Break Statement}, and +@pxref{Continue Statement}). + +@item +The @option{--enable-portals} configuration option was removed. + @end itemize @c XXX ADD MORE STUFF HERE @@ -24714,7 +26180,7 @@ the various PC platforms. @item @cindex Zoulas, Christos Christos Zoulas -provided the @code{extension} +provided the @code{extension()} built-in function for dynamically adding new modules. @item @@ -24748,9 +26214,9 @@ GNU Automake and @code{gettext}. @item @cindex Broder, Alan J.@: Alan J.@: Broder -provided the initial version of the @code{asort} function +provided the initial version of the @code{asort()} function as well as the code for the new optional third argument to the -@code{match} function. +@code{match()} function. @item @cindex Buening, Andreas @@ -24829,6 +26295,8 @@ There are three ways to get GNU software: @item Copy it from someone else who already has it. 
+@cindex FSF (Free Software Foundation) +@cindex Free Software Foundation (FSF) @item Retrieve @command{gawk} from the Internet host @@ -25040,7 +26508,7 @@ Files needed for building @command{gawk} on a Tandem Files needed for building @command{gawk} on POSIX-compliant systems. @item pc/* -Files needed for building @command{gawk} under MS-DOS, MS Windows and OS/2 +Files needed for building @command{gawk} under MS Windows and OS/2 (@pxref{PC Installation}, for details). @item vms/* @@ -25143,18 +26611,10 @@ command line when compiling @command{gawk} from scratch, including: @table @code -@cindex @code{--enable-portals} configuration option -@cindex configuration option, @code{--enable-portals} -@item --enable-portals -Treat pathnames that begin -with @file{/p} as BSD portal files when doing two-way I/O with -the @samp{|&} operator -(@pxref{Portal Files}). - @cindex @code{--with-whiny-user-strftime} configuration option @cindex configuration option, @code{--with-whiny-user-strftime} @item --with-whiny-user-strftime -Force use of the included version of the @code{strftime} +Force use of the included version of the @code{strftime()} function for deficient systems @cindex @code{--disable-lint} configuration option @@ -25185,11 +26645,11 @@ This is usually not desirable, but it may bring you some slight performance improvement. @end table -As of version 3.1.5, the @option{--with-included-gettext} configuration +As of @value{PVERSION} 3.1.5, the @option{--with-included-gettext} configuration option is no longer available, since @command{gawk} expects the GNU @code{gettext} library to be installed as an external library. -As of version 3.1.8, the @option{--disable-libsigsegv} configuration +As of @value{PVERSION} 3.1.8, the @option{--disable-libsigsegv} configuration option is no longer available, since @command{gawk} expects the GNU @code{libsigsegv} library to be installed as an external library. 
@@ -25468,7 +26928,7 @@ To build some of the example extension libraries, @command{cd} to the extension directory and copy @file{Makefile.pc} to @file{Makefile}. You can then build using the same two targets. To run the example @command{awk} scripts, you'll need to either change the call to -the @code{extension} function to match the name of the library (for +the @code{extension()} function to match the name of the library (for instance, change @code{"./ordchr.so"} to @code{"ordchr.dll"} or simply @code{"ordchr"}), or rename the library to match the call (for instance, rename @file{ordchr.dll} to @file{ordchr.so}). @@ -25542,7 +27002,7 @@ E.g., if @env{UNIXROOT} is set to @file{e:} the complete default search path is An @command{sh}-like shell (as opposed to @command{command.com} under MS-DOS or @command{cmd.exe} under OS/2) may be useful for @command{awk} programming. Ian Stewartson has written an excellent shell for MS-DOS and OS/2, -Daisuke Aoyama has ported GNU @command{bash} to MS-DOS using the DJGPP tools, +Daisuke Aoyama has ported GNU Bash to MS-DOS using the DJGPP tools, and several shells are available for OS/2, including @command{ksh}. The file @file{README_d/README.pc} in the @command{gawk} distribution contains information on these shells. Users of Stewartson's shell on DOS should @@ -25643,7 +27103,7 @@ moved into the @code{BEGIN} rule. @command{gawk} can be used ``out of the box'' under Windows if you are using the @uref{http://www.cygwin.com, Cygwin environment}. This environment provides an excellent simulation of Unix, using the -GNU tools, such as @command{bash}, the GNU Compiler Collection (GCC), +GNU tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make, and other GNU tools. Compilation and installation for Cygwin is the same as for a Unix system: @@ -25662,7 +27122,7 @@ and then the @samp{make} proceeds as usual. The @samp{|&} operator and TCP/IP networking (@pxref{TCP/IP Networking}) are fully supported in the Cygwin environment. 
This is not true -for any other environment for MS-DOS or MS-Windows. +for any other environment for MS-Windows. @end quotation @node MSYS @@ -25921,7 +27381,7 @@ from other environments. Pipes are nice to have but not vital. A proper compilation of @command{gawk} sources when @code{sizeof(int)} differs from @code{sizeof(void *)} requires an ISO C compiler. An initial port was done with @command{gcc}. You may actually prefer executables -where @code{int}s are four bytes wide but the other variant works as well. +where @code{int}s are four bytes wide but the other variant works as well. You may need quite a bit of memory when trying to recompile the @command{gawk} sources, as some source files (@file{regex.c} in particular) are quite @@ -25930,7 +27390,7 @@ optimization level for this particular file, which may help. @cindex Linux @cindex GNU/Linux -With a reasonable shell (@command{bash} will do), you have a pretty good chance +With a reasonable shell (Bash will do), you have a pretty good chance that the @command{configure} utility will succeed, and in particular if you run GNU/Linux, MiNT or a similar operating system. Otherwise sample versions of @file{config.h} and @file{Makefile.st} are given in the @@ -25945,13 +27405,13 @@ Modify these sections as appropriate if they are not right for your environment. Also see the remarks about @env{AWKPATH} and @code{envsep} in @ref{Atari Using}. -As shipped, the sample @file{config.h} claims that the @code{system} +As shipped, the sample @file{config.h} claims that the @code{system()} function is missing from the libraries, which is not true, and an alternative implementation of this function is provided in @file{unsupported/atari/system.c}. Depending upon your particular combination of shell and operating system, you might want to change the file to indicate -that @code{system} is available. +that @code{system()} is available. 
@node Atari Using @appendixsubsubsec Running @command{gawk} on the Atari ST @@ -25999,7 +27459,7 @@ When @command{gawk} is compiled with the ST version of @command{gcc} and its usual libraries, it accepts both @samp{/} and @samp{\} as path separators. While this is convenient, it should be remembered that this removes one technically valid character (@samp{/}) from your @value{FN}. -It may also create problems for external programs called via the @code{system} +It may also create problems for external programs called via the @code{system()} function, which may not support this convention. Whenever it is possible that a file created by @command{gawk} will be used by some other program, use only backslashes. Also remember that in @command{awk}, backslashes in @@ -26037,7 +27497,7 @@ $ make $ make install @end example -BeOS uses @command{bash} as its shell; thus, you use @command{gawk} the same way you would +BeOS uses Bash as its shell; thus, you use @command{gawk} the same way you would under Unix. If these steps do not work, please send in a bug report (@pxref{Bugs}). @@ -26198,7 +27658,7 @@ as follows: @item Tandem @tab Stephen Davies, @email{scldad@@sdc.com.au}. @cindex Woehlke, Matthew -@item Tandem (POSIX-compliant) @tab Matthew Woehlke @tab @email{mw_triad@@users.sourceforge.net} +@item Tandem (POSIX-compliant) @tab Matthew Woehlke, @email{mw_triad@@users.sourceforge.net} @end ignore @cindex Rankin, Pat @@ -26298,7 +27758,7 @@ is similar to @command{gawk}'s @itemize @bullet @item -The @code{fflush} built-in function for flushing buffered output +The @code{fflush()} built-in function for flushing buffered output (@pxref{I/O Functions}). @item @@ -26323,7 +27783,7 @@ Use @code{"-"} instead of @code{"/dev/stdin"} with @command{mawk}. @item The ability for @code{FS} and for the third -argument to @code{split} to be null strings +argument to @code{split()} to be null strings (@pxref{Single Character Fields}). 
@item @@ -26447,7 +27907,7 @@ If @command{gawk} is compiled for debugging with @samp{-DDEBUG}, then there is one more option available on the command line: @table @code -@item -W parsedebug +@item -Y @itemx --parsedebug Prints out the parse stack information as the program is being parsed. @end table @@ -26539,7 +27999,7 @@ Use ANSI/ISO style (prototype) function headers when defining functions. Put the name of the function at the beginning of its own line. @item -Put the return type of the function, even if it is @code{int}, on the +Put the return type of the function, even if it is @code{int}, on the line above the line with the name and arguments of the function. @item @@ -26851,15 +28311,15 @@ It also guarantees that the string is zero-terminated. This function returns the actual number of parameters passed to the current function. Inside the code of an extension this can be used to determine the maximum index which is -safe to use with @code{stack_ptr}. If this value is -greater than @code{tree->param_cnt}, the function was +safe to use with @code{get_actual_argument}. If this value is +greater than @code{nargs}, the function was called incorrectly from the @command{awk} program. @strong{Caution:} This function is new as of @command{gawk} 3.1.4. @cindex parameters@comma{} number of -@cindex @code{param_cnt} internal variable -@item n->param_cnt +@cindex @code{nargs} internal variable +@item nargs Inside an extension function, this is the maximum number of expected parameters, as set by the @code{make_builtin} function. @@ -26895,7 +28355,7 @@ Make sure that @samp{n->type == Node_var_array} first. @item NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference) Finds, and installs if necessary, array elements. @code{symbol} is the array, @code{subs} is the subscript. -This is usually a value created with @code{tmp_string} (see below). +This is usually a value created with @code{make_string} (see below). 
@code{reference} should be @code{TRUE} if it is an error to use the value before it is created. Typically, @code{FALSE} is the correct value to use from extension functions. @@ -26914,17 +28374,6 @@ Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that can be stored appropriately. This is permanent storage; understanding of @command{gawk} memory management is helpful. -@cindex @code{tmp_string} internal function -@item NODE *tmp_string(char *s, size_t len); -Take a C string and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is temporary storage; understanding -of @command{gawk} memory management is helpful. - -@cindex @code{tmp_number} internal function -@item NODE *tmp_number(AWKNUM val) -Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is temporary storage; -understanding of @command{gawk} memory management is helpful. @cindex nodes@comma{} duplicating @cindex @code{dupnode} internal function @@ -26934,10 +28383,10 @@ reference count instead of actually duplicating the entire @code{NODE}; understanding of @command{gawk} memory management is helpful. @cindex memory, releasing -@cindex @code{free_temp} internal macro -@item void free_temp(NODE *n) +@cindex @code{unref} internal function +@item void unref(NODE *n) This macro releases the memory associated with a @code{NODE} -allocated with @code{tmp_string} or @code{tmp_number}. +allocated with @code{make_string} or @code{make_number}. Understanding of @command{gawk} memory management is helpful. 
@cindex @code{make_builtin} internal function @@ -26951,7 +28400,7 @@ The function should be written in the following manner: /* do_xxx --- do xxx function for gawk */ NODE * -do_xxx(NODE *tree) +do_xxx(int nargs) @{ @dots{} @} @@ -26959,13 +28408,13 @@ do_xxx(NODE *tree) @cindex arguments, retrieving @cindex @code{get_argument} internal function -@item NODE *get_argument(NODE *tree, int i) +@item NODE *get_argument(int i) This function is called from within a C extension function to get the @code{i}-th argument from the function call. The first argument is argument zero. @cindex @code{get_actual_argument} internal function -@item NODE *get_actual_argument(NODE *tree, unsigned int i, +@item NODE *get_actual_argument(int i, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int@ optional,@ int@ wantarray); This function retrieves a particular argument @code{i}. @code{wantarray} is @code{TRUE} if the argument should be an array, @code{FALSE} otherwise. If @code{optional} is @@ -26976,24 +28425,18 @@ the argument was not provided. @strong{Caution:} This function is new as of @command{gawk} 3.1.4. @cindex @code{get_scalar_argument} internal macro -@item get_scalar_argument(t, i, opt) +@item get_scalar_argument(i, opt) This is a convenience macro that calls @code{get_actual_argument}. @strong{Caution:} This macro is new as of @command{gawk} 3.1.4. @cindex @code{get_array_argument} internal macro -@item get_array_argument(t, i, opt) +@item get_array_argument(i, opt) This is a convenience macro that calls @code{get_actual_argument}. @strong{Caution:} This macro is new as of @command{gawk} 3.1.4. @cindex functions, return values@comma{} setting -@cindex @code{set_value} internal function -@item void set_value(NODE *tree) -This function is called from within a C extension function to set -the return value from the extension function. This value is -what the @command{awk} program sees as the return value from the -new @command{awk} function. 
 @cindex @code{ERRNO} variable
 @cindex @code{update_ERRNO} internal function
@@ -27113,7 +28556,7 @@ the_arg = get_array(the_arg);
 assoc_clear(the_arg);
 @end smallexample
-As of version 3.1.4, the internals improved again, and became
+In @value{PVERSION} 3.1.4, the internals improved again, and became
 even simpler:
 @smallexample
@@ -27122,6 +28565,14 @@ NODE *the_arg;
 the_arg = get_array_argument(tree, 2, FALSE); /* assume need 3rd arg, 0-based */
 @end smallexample
+As of @value{PVERSION} 4.0, the internals changed again:
+
+@smallexample
+NODE *the_arg;
+
+the_arg = get_array_argument(2, FALSE); /* assume need 3rd arg, 0-based */
+@end smallexample
+
 Again, you should spend time studying the @command{gawk} internals;
 don't just blindly copy this code.
 @c ENDOFRANGE gawint
@@ -27229,7 +28680,7 @@ be a function of the file's size if the file has holes.
 @itemx "ctime"
 The file's last access, modification, and inode update times,
 respectively. These are numeric timestamps, suitable for formatting
-with @code{strftime}
+with @code{strftime()}
 (@pxref{Built-in}).
 @item "pmode"
@@ -27311,8 +28762,7 @@ slightly for presentation. The complete version can be found in
    chdir() builtin for gawk */
 static NODE *
-do_chdir(tree)
-NODE *tree;
+do_chdir(int nargs)
 @{
     NODE *newdir;
     int ret = -1;
@@ -27320,7 +28770,7 @@ NODE *tree;
     if (do_lint && get_curfunc_arg_count() != 1)
         lintwarn("chdir: called with incorrect number of arguments");
-    newdir = get_scalar_argument(tree, 0);
+    newdir = get_scalar_argument(0, FALSE);
 @end example
 The file includes the @code{"awk.h"} header file for definitions
@@ -27330,36 +28780,29 @@ for access to the @code{major} and @code{minor} macros.
 @cindex programming conventions, @command{gawk} internals
 By convention, for an @command{awk} function @code{foo}, the function that
 implements it is called @samp{do_foo}. The function should take
-a @samp{NODE *} argument, usually called @code{tree}, that
-represents the argument list to the function. The @code{newdir}
+a @samp{int} argument, usually called @code{nargs}, that
+represents the number of defined arguments for the function. The @code{newdir}
 variable represents the new directory to change to, retrieved
-with @code{get_argument}. Note that the first argument is
+with @code{get_scalar_argument}. Note that the first argument is
 numbered zero.
 This code actually accomplishes the @code{chdir}. It first forces
 the argument to be a string and passes the string value to the
 @code{chdir} system call. If the @code{chdir} fails, @code{ERRNO}
 is updated.
-The result of @code{force_string} has to be freed with @code{free_temp}:
 @example
     (void) force_string(newdir);
     ret = chdir(newdir->stptr);
     if (ret < 0)
         update_ERRNO();
-    free_temp(newdir);
 @end example
-Finally, the function returns the return value to the @command{awk} level,
-using @code{set_value}. Then it must return a value from the call to
-the new built-in (this value ignored by the interpreter):
+Finally, the function returns the return value to the @command{awk} level:
 @example
     /* Set the return value */
-    set_value(tmp_number((AWKNUM) ret));
-
-    /* Just to make the interpreter happy */
-    return tmp_number((AWKNUM) 0);
+    return make_number((AWKNUM) ret);
 @}
 @end example
@@ -27391,10 +28834,9 @@ Changed message for page breaking. Used to be:
 /* do_stat --- provide a stat() function for gawk */
 static NODE *
-do_stat(tree)
-NODE *tree;
+do_stat(int nargs)
 @{
-    NODE *file, *array;
+    NODE *file, *array, *tmp;
     struct stat sbuf;
     int ret;
     NODE **aptr;
@@ -27414,8 +28856,8 @@ If there's an error, we set @code{ERRNO} and return:
 @c comment made multiline for page breaking
 @example
     /* directory is first arg, array to hold results is second */
-    file = get_scalar_argument(tree, 0, FALSE);
-    array = get_array_argument(tree, 1, FALSE);
+    file = get_scalar_argument(0, FALSE);
+    array = get_array_argument(1, FALSE);
     /* empty out the array */
     assoc_clear(array);
@@ -27425,11 +28867,7 @@ If there's an error, we set @code{ERRNO} and return:
     ret = lstat(file->stptr, & sbuf);
     if (ret < 0) @{
         update_ERRNO();
-
-        set_value(tmp_number((AWKNUM) ret));
-
-        free_temp(file);
-        return tmp_number((AWKNUM) 0);
+        return make_number((AWKNUM) ret);
     @}
 @end example
@@ -27438,28 +28876,26 @@ calls are shown here, since they all follow the same pattern:
 @example
     /* fill in the array */
-    aptr = assoc_lookup(array, tmp_string("name", 4), FALSE);
+    aptr = assoc_lookup(array, tmp = make_string("name", 4), FALSE);
     *aptr = dupnode(file);
+    unref(tmp);
-    aptr = assoc_lookup(array, tmp_string("mode", 4), FALSE);
+    aptr = assoc_lookup(array, tmp = make_string("mode", 4), FALSE);
     *aptr = make_number((AWKNUM) sbuf.st_mode);
+    unref(tmp);
-    aptr = assoc_lookup(array, tmp_string("pmode", 5), FALSE);
+    aptr = assoc_lookup(array, tmp = make_string("pmode", 5), FALSE);
     pmode = format_mode(sbuf.st_mode);
     *aptr = make_string(pmode, strlen(pmode));
+    unref(tmp);
 @end example
-When done, we free the temporary value containing the @value{FN},
-set the return value, and return:
+When done, return the @code{lstat} return value:
 @example
-    free_temp(file);
     /* Set the return value */
-    set_value(tmp_number((AWKNUM) ret));
-
-    /* Just to make the interpreter happy */
-    return tmp_number((AWKNUM) 0);
+    return make_number((AWKNUM) ret);
 @}
 @end example
@@ -27478,7 +28914,7 @@ void *dl;
 @{
     make_builtin("chdir", do_chdir, 1);
     make_builtin("stat", do_stat, 2);
-    return tmp_number((AWKNUM) 0);
+    return make_number((AWKNUM) 0);
 @}
 @end example
@@ -27502,8 +28938,8 @@ $ gcc -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c
 $ ld -o filefuncs.so -shared filefuncs.o
 @end example
-@cindex @code{extension} function (@command{gawk})
-Once the library exists, it is loaded by calling the @code{extension}
+@cindex @code{extension()} function (@command{gawk})
+Once the library exists, it is loaded by calling the @code{extension()}
 built-in function.
 This function takes two arguments: the name of the
 library to load and the name of a function to call when the library
@@ -28237,8 +29673,8 @@ definition of the language and the original POSIX standards specified that
 @command{awk} only understands decimal numbers (base 10), and not octal
 (base 8) or hexadecimal numbers (base 16).
-As of this writing (February, 2007), changes in the language of the
-current POSIX standard can be interpreted to imply that @command{awk}
+Changes in the language of the
+2001 and 2004 POSIX standard can be interpreted to imply that @command{awk}
 should support additional features. These features are:
 @itemize @bullet
@@ -28275,10 +29711,15 @@ interpretation of the standard, which requires a certain amount of
 by the standard developers, either. In other words, ``we see how you got
 where you are, but we don't think that that's where you want to be.''
-Nevertheless, on systems that support IEEE floating point, it seems
+The 2008 POSIX standard added explicit wording to allow, but not require,
+that @command{awk} support hexadecimal floating point values and
+special values for ``Not A Number'' and infinity.
+
+Although the @command{gawk} maintainer continues to feel that
+providing those features is inadvisable,
+nevertheless, on systems that support IEEE floating point, it seems
 reasonable to provide @emph{some} way to support NaN and Infinity values.
-The solution implemented in @command{gawk}, as of version 3.1.6, is
-as follows:
+The solution implemented in @command{gawk} is as follows:
 @enumerate 1
 @item
@@ -28433,13 +29874,13 @@ Named after the English mathematician Boole. See also ``Logical Expression.''
 @item Bourne Shell
 The standard shell (@file{/bin/sh}) on Unix and Unix-like systems,
 originally written by Steven R.@: Bourne.
-Many shells (@command{bash}, @command{ksh}, @command{pdksh}, @command{zsh}) are
+Many shells (Bash, @command{ksh}, @command{pdksh}, @command{zsh}) are
 generally upwardly compatible with the Bourne shell.
 @item Built-in Function
 The @command{awk} language provides built-in functions that perform various
 numerical, I/O-related, and string computations. Examples are
-@code{sqrt} (for the square root of a number) and @code{substr} (for a
+@code{sqrt()} (for the square root of a number) and @code{substr()} (for a
 substring of a string). @command{gawk} provides functions for timestamp
 management, bit manipulation, and runtime string translation.
@@ -28648,7 +30089,7 @@ See also ``Double-Precision'' and ``Single-Precision.''
 @item Format
 Format strings are used to control the appearance of output in the
-@code{strftime} and @code{sprintf} functions, and are used in the
+@code{strftime()} and @code{sprintf()} functions, and are used in the
 @code{printf} statement as well. Also, data conversions from numbers to strings
 are controlled by the format string contained in the built-in variable
 @code{CONVFMT}. (@xref{Control Letters}.)
@@ -28992,7 +30433,7 @@ into the local language.
 @item Timestamp
 A value in the ``seconds since the epoch'' format used by Unix
 and POSIX systems. Used for the @command{gawk} functions
-@code{mktime}, @code{strftime}, and @code{systime}.
+@code{mktime()}, @code{strftime()}, and @code{systime()}.
 See also ``Epoch'' and ``UTC.''
 @cindex Linux