aboutsummaryrefslogtreecommitdiffstats
path: root/doc/gawk.info
diff options
context:
space:
mode:
Diffstat (limited to 'doc/gawk.info')
-rw-r--r--doc/gawk.info18232
1 files changed, 18232 insertions, 0 deletions
diff --git a/doc/gawk.info b/doc/gawk.info
new file mode 100644
index 00000000..680fbab3
--- /dev/null
+++ b/doc/gawk.info
@@ -0,0 +1,18232 @@
+This is gawk.info, produced by makeinfo version 4.0 from gawk.texi.
+
+INFO-DIR-SECTION Programming Languages
+START-INFO-DIR-ENTRY
+* Gawk: (gawk.info). A Text Scanning and Processing Language.
+END-INFO-DIR-ENTRY
+
+ This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them.
+
+ This is Edition 1.0.1 of `The GNU Awk User's Guide', for the
+3.0.1 version of the GNU implementation of AWK.
+
+ Copyright (C) 1989, 1991, 92, 93, 96 Free Software Foundation, Inc.
+
+ Permission is granted to make and distribute verbatim copies of this
+manual provided the copyright notice and this permission notice are
+preserved on all copies.
+
+ Permission is granted to copy and distribute modified versions of
+this manual under the conditions for verbatim copying, provided that
+the entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+
+ Permission is granted to copy and distribute translations of this
+manual into another language, under the above conditions for modified
+versions, except that this permission notice may be stated in a
+translation approved by the Foundation.
+
+
+File: gawk.info, Node: Top, Next: Preface, Prev: (dir), Up: (dir)
+
+General Introduction
+********************
+
+ This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them.
+
+ This is Edition 1.0.1 of `The GNU Awk User's Guide',
+for the 3.0.1 version of the GNU implementation
+of AWK.
+
+* Menu:
+
+* Preface:: What this Info file is about; brief
+ history and acknowledgements.
+* What Is Awk:: What is the `awk' language; using this
+ Info file.
+* Getting Started:: A basic introduction to using `awk'. How
+ to run an `awk' program. Command line
+ syntax.
+* One-liners:: Short, sample `awk' programs.
+* Regexp:: All about matching things using regular
+ expressions.
+* Reading Files:: How to read files and manipulate fields.
+* Printing:: How to print using `awk'. Describes the
+ `print' and `printf' statements.
+ Also describes redirection of output.
+* Expressions:: Expressions are the basic building blocks of
+ statements.
+* Patterns and Actions:: Overviews of patterns and actions.
+* Statements:: The various control statements are described
+ in detail.
+* Built-in Variables:: Built-in Variables
+* Arrays:: The description and use of arrays. Also
+ includes array-oriented control statements.
+* Built-in:: The built-in functions are summarized here.
+* User-defined:: User-defined functions are described in
+ detail.
+* Invoking Gawk:: How to run `gawk'.
+* Library Functions:: A Library of `awk' Functions.
+* Sample Programs:: Many `awk' programs with complete
+ explanations.
+* Language History:: The evolution of the `awk' language.
+* Gawk Summary:: `gawk' Options and Language Summary.
+* Installation:: Installing `gawk' under various operating
+ systems.
+* Notes:: Something about the implementation of
+ `gawk'.
+* Glossary:: An explanation of some unfamiliar terms.
+* Copying:: Your right to copy and distribute `gawk'.
+* Index:: Concept and Variable Index.
+
+* History:: The history of `gawk' and `awk'.
+* Manual History:: Brief history of the GNU project and this
+ Info file.
+* Acknowledgements:: Acknowledgements.
+* This Manual:: Using this Info file. Includes sample
+ input files that you can use.
+* Conventions:: Typographical Conventions.
+* Sample Data Files:: Sample data files for use in the `awk'
+ programs illustrated in this Info file.
+* Names:: What name to use to find `awk'.
+* Running gawk:: How to run `gawk' programs; includes
+ command line syntax.
+* One-shot:: Running a short throw-away `awk' program.
+* Read Terminal:: Using no input files (input from terminal
+ instead).
+* Long:: Putting permanent `awk' programs in
+ files.
+* Executable Scripts:: Making self-contained `awk' programs.
+* Comments:: Adding documentation to `gawk' programs.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example with two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements into
+ lines.
+* Other Features:: Other Features of `awk'.
+* When:: When to use `gawk' and when to use other
+ things.
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write non-printing characters.
+* Regexp Operators:: Regular Expression Operators.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Non-Constant Fields:: Non-constant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change it.
+* Basic Field Splitting:: How fields are split with single characters or
+ simple strings.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate field.
+* Command Line Field Separator:: Setting `FS' from the command line.
+* Field Splitting Summary:: Some final points and a summary table.
+* Constant Size:: Reading constant width data.
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program control
+ using the `getline' function.
+* Getline Intro:: Introduction to the `getline' function.
+* Plain Getline:: Using `getline' with no arguments.
+* Getline/Variable:: Using `getline' into a variable.
+* Getline/File:: Using `getline' from a file.
+* Getline/Variable/File:: Using `getline' into a variable from a
+ file.
+* Getline/Pipe:: Using `getline' from a pipe.
+* Getline/Variable/Pipe:: Using `getline' into a variable from a
+ pipe.
+* Getline Summary:: Summary Of `getline' Variants.
+* Print:: The `print' statement.
+* Print Examples:: Simple examples of `print' statements.
+* Output Separators:: The output separators and how to change them.
+* OFMT:: Controlling Numeric Output With `print'.
+* Printf:: The `printf' statement.
+* Basic Printf:: Syntax of the `printf' statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+* Redirection:: How to redirect output to multiple files and
+ pipes.
+* Special Files:: File name interpretation in `gawk'.
+ `gawk' allows access to inherited file
+ descriptors.
+* Close Files And Pipes:: Closing Input and Output Files and Pipes.
+* Constants:: String, numeric, and regexp constants.
+* Scalar Constants:: Numeric and string constants.
+* Regexp Constants:: Regular Expression constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for later use.
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command line and a
+ summary of command line syntax. This is an
+ advanced method of input.
+* Conversion:: The conversion of strings to numbers and vice
+ versa.
+* Arithmetic Ops:: Arithmetic operations (`+', `-',
+ etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a field.
+* Increment Ops:: Incrementing the numeric value of a variable.
+* Truth Values:: What is ``true'' and what is ``false''.
+* Typing and Comparison:: How variables acquire types, and how this
+ affects comparison of numbers and strings with
+ `<', etc.
+* Boolean Ops:: Combining comparison expressions using boolean
+ operators `||' (``or''), `&&'
+ (``and'') and `!' (``not'').
+* Conditional Exp:: Conditional expressions select between two
+ subexpressions under control of a third
+ subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+* Pattern Overview:: What goes into a pattern.
+* Kinds of Patterns:: A list of all kinds of patterns.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a pattern.
+* Ranges:: Pairs of patterns specify record ranges.
+* BEGIN/END:: Specifying initialization and cleanup rules.
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+* Empty:: The empty pattern, which matches every record.
+* Action Overview:: What goes into an action.
+* If Statement:: Conditionally execute some `awk'
+ statements.
+* While Statement:: Loop until some condition is satisfied.
+* Do Statement:: Do specified action while looping until some
+ condition is satisfied.
+* For Statement:: Another looping statement, that provides
+ initialization and increment clauses.
+* Break Statement:: Immediately exit the innermost enclosing loop.
+* Continue Statement:: Skip to the end of the innermost enclosing
+ loop.
+* Next Statement:: Stop processing the current input record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of `awk'.
+* User-modified:: Built-in variables that you change to control
+ `awk'.
+* Auto-set:: Built-in variables where `awk' gives you
+ information.
+* ARGC and ARGV:: Ways to use `ARGC' and `ARGV'.
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the `for' statement. It
+ loops through the indices of an array's
+ existing elements.
+* Delete:: The `delete' statement removes an element
+ from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ `awk'.
+* Uninitialized Subscripts:: Using Uninitialized variables as subscripts.
+* Multi-dimensional:: Emulating multi-dimensional arrays in
+ `awk'.
+* Multi-scanning:: Scanning multi-dimensional arrays.
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers, including
+ `int', `sin' and `rand'.
+* String Functions:: Functions for string manipulation, such as
+ `split', `match', and
+ `sprintf'.
+* I/O Functions:: Functions for files and shell commands.
+* Time Functions:: Functions for dealing with time stamps.
+* Definition Syntax:: How to write definitions and what they mean.
+* Function Example:: An example function definition and what it
+ does.
+* Function Caveats:: Things to watch out for.
+* Return Statement:: Specifying the value a function returns.
+* Options:: Command line options and their meanings.
+* Other Arguments:: Input file names and variable assignments.
+* AWKPATH Variable:: Searching directories for `awk' programs.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Known Bugs:: Known Bugs in `gawk'.
+* Portability Notes:: What to do if you don't have `gawk'.
+* Nextfile Function:: Two implementations of a `nextfile'
+ function.
+* Assert Function:: A function for assertions in `awk'
+ programs.
+* Round Function:: A function for rounding if `sprintf' does
+ not do it correctly.
+* Ordinal Functions:: Functions for using characters as numbers and
+ vice versa.
+* Join Function:: A function to join an array into a string.
+* Mktime Function:: A function to turn a date into a timestamp.
+* Gettimeofday Function:: A function to get formatted times.
+* Filetrans Function:: A function for handling data file transitions.
+* Getopt Function:: A function for processing command line
+ arguments.
+* Passwd Functions:: Functions for getting user information.
+* Group Functions:: Functions for getting group information.
+* Library Names:: How to best name private global variables in
+ library functions.
+* Clones:: Clones of common utilities.
+* Cut Program:: The `cut' utility.
+* Egrep Program:: The `egrep' utility.
+* Id Program:: The `id' utility.
+* Split Program:: The `split' utility.
+* Tee Program:: The `tee' utility.
+* Uniq Program:: The `uniq' utility.
+* Wc Program:: The `wc' utility.
+* Miscellaneous Programs:: Some interesting `awk' programs.
+* Dupword Program:: Finding duplicated words in a document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the `tr' utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage count.
+* History Sorting:: Eliminating duplicate entries from a history
+ file.
+* Extract Program:: Pulling out programs from Texinfo source
+ files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for `awk' that includes files.
+* V7/SVR3.1:: The major changes between V7 and System V
+ Release 3.1.
+* SVR4:: Minor changes between System V Releases 3.1
+ and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from the Bell Laboratories
+ version of `awk'.
+* POSIX/GNU:: The extensions in `gawk' not in POSIX
+ `awk'.
+* Command Line Summary:: Recapitulation of the command line.
+* Language Summary:: A terse review of the language.
+* Variables/Fields:: Variables, fields, and arrays.
+* Fields Summary:: Input field splitting.
+* Built-in Summary:: `awk''s built-in variables.
+* Arrays Summary:: Using arrays.
+* Data Type Summary:: Values in `awk' are numbers or strings.
+* Rules Summary:: Patterns and Actions, and their component
+ parts.
+* Pattern Summary:: Quick overview of patterns.
+* Regexp Summary:: Quick overview of regular expressions.
+* Actions Summary:: Quick overview of actions.
+* Operator Summary:: `awk' operators.
+* Control Flow Summary:: The control statements.
+* I/O Summary:: The I/O statements.
+* Printf Summary:: A summary of `printf'.
+* Special File Summary:: Special file names interpreted internally.
+* Built-in Functions Summary:: Built-in numeric and string functions.
+* Time Functions Summary:: Built-in time functions.
+* String Constants Summary:: Escape sequences in strings.
+* Functions Summary:: Defining and calling functions.
+* Historical Features:: Some undocumented but supported ``features''.
+* Gawk Distribution:: What is in the `gawk' distribution.
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+* Unix Installation:: Installing `gawk' under various versions
+ of Unix.
+* Quick Installation:: Compiling `gawk' under Unix.
+* Configuration Philosophy:: How it's all supposed to work.
+* VMS Installation:: Installing `gawk' on VMS.
+* VMS Compilation:: How to compile `gawk' under VMS.
+* VMS Installation Details:: How to install `gawk' under VMS.
+* VMS Running:: How to run `gawk' under VMS.
+* VMS POSIX:: Alternate instructions for VMS POSIX.
+* PC Installation:: Installing and Compiling `gawk' on MS-DOS
+ and OS/2
+* Atari Installation:: Installing `gawk' on the Atari ST.
+* Atari Compiling:: Compiling `gawk' on Atari
+* Atari Using:: Running `gawk' on Atari
+* Amiga Installation:: Installing `gawk' on an Amiga.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available `awk'
+ implementations.
+* Compatibility Mode:: How to disable certain `gawk' extensions.
+* Additions:: Making Additions To `gawk'.
+* Adding Code:: Adding code to the main body of `gawk'.
+* New Ports:: Porting `gawk' to a new operating system.
+* Future Extensions:: New features that may be implemented one day.
+* Improvements:: Suggestions for improvements by volunteers.
+
+ To Miriam, for making me complete.
+
+
+ To Chana, for the joy you bring us.
+
+
+ To Rivka, for the exponential increase.
+
+
+ To Nachum, for the added dimension.
+
+
+File: gawk.info, Node: Preface, Next: What Is Awk, Prev: Top, Up: Top
+
+Preface
+*******
+
+ This Info file teaches you about the `awk' language and how you can
+use it effectively. You should already be familiar with basic system
+commands, such as `cat' and `ls',(1) and basic shell facilities, such
+as Input/Output (I/O) redirection and pipes.
+
+ Implementations of the `awk' language are available for many
+different computing environments. This Info file, while describing the
+`awk' language in general, also describes a particular implementation
+of `awk' called `gawk' (which stands for "GNU Awk"). `gawk' runs on a
+broad range of Unix systems, ranging from 80386 PC-based computers, up
+through large scale systems, such as Crays. `gawk' has also been ported
+to MS-DOS and OS/2 PC's, Atari and Amiga micro-computers, and VMS.
+
+* Menu:
+
+* History:: The history of `gawk' and `awk'.
+* Manual History:: Brief history of the GNU project and this
+ Info file.
+* Acknowledgements:: Acknowledgements.
+
+ ---------- Footnotes ----------
+
+ (1) These commands are available on POSIX compliant systems, as well
+as on traditional Unix based systems. If you are using some other
+operating system, you still need to be familiar with the ideas of I/O
+redirection and pipes.
+
+
+File: gawk.info, Node: History, Next: Manual History, Prev: Preface, Up: Preface
+
+History of `awk' and `gawk'
+===========================
+
+ The name `awk' comes from the initials of its designers: Alfred V.
+Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version
+of `awk' was written in 1977 at AT&T Bell Laboratories. In 1985 a new
+version made the programming language more powerful, introducing
+user-defined functions, multiple input streams, and computed regular
+expressions. This new version became generally available with Unix
+System V Release 3.1. The version in System V Release 4 added some new
+features and also cleaned up the behavior in some of the "dark corners"
+of the language. The specification for `awk' in the POSIX Command
+Language and Utilities standard further clarified the language based on
+feedback from both the `gawk' designers, and the original Bell Labs
+`awk' designers.
+
+ The GNU implementation, `gawk', was written in 1986 by Paul Rubin
+and Jay Fenlason, with advice from Richard Stallman. John Woods
+contributed parts of the code as well. In 1988 and 1989, David
+Trueman, with help from Arnold Robbins, thoroughly reworked `gawk' for
+compatibility with the newer `awk'. Current development focuses on bug
+fixes, performance improvements, standards compliance, and
+occasionally, new features.
+
+
+File: gawk.info, Node: Manual History, Next: Acknowledgements, Prev: History, Up: Preface
+
+The GNU Project and This Book
+=============================
+
+ The Free Software Foundation (FSF) is a non-profit organization
+dedicated to the production and distribution of freely distributable
+software. It was founded by Richard M. Stallman, the author of the
+original Emacs editor. GNU Emacs is the most widely used version of
+Emacs today.
+
+ The GNU project is an on-going effort on the part of the Free
+Software Foundation to create a complete, freely distributable, POSIX
+compliant computing environment. (GNU stands for "GNU's not Unix".)
+The FSF uses the "GNU General Public License" (or GPL) to ensure that
+source code for their software is always available to the end user. A
+copy of the GPL is included for your reference (*note GNU GENERAL
+PUBLIC LICENSE: Copying.). The GPL applies to the C language source
+code for `gawk'.
+
+ As of this writing (1995), the only major component of the GNU
+environment still uncompleted is the operating system kernel, and work
+proceeds apace on that. A shell, an editor (Emacs), highly portable
+optimizing C, C++, and Objective-C compilers, a symbolic debugger, and
+dozens of large and small utilities (such as `gawk'), have all been
+completed and are freely available.
+
+ Until the GNU operating system is released, the FSF recommends the
+use of Linux, a freely distributable, Unix-like operating system for
+80386 and other systems. There are many books on Linux. One freely
+available one is `Linux Installation and Getting Started', by Matt
+Welsh. Many Linux distributions are available, often in computer
+stores or bundled on CD-ROM with books about Linux. Also, the FSF
+provides a Linux distribution ("Debian"); contact them for more
+information. *Note Getting the `gawk' Distribution: Getting, for the
+FSF's contact information. (There are two other freely available,
+Unix-like operating systems for 80386 and other systems, NetBSD and
+FreeBSD. Both are based on the 4.4-Lite Berkeley Software Distribution,
+and both use recent versions of `gawk' for their versions of `awk'.)
+
+ This Info file itself has gone through several previous, preliminary
+editions. I started working on a preliminary draft of `The GAWK
+Manual', by Diane Close, Paul Rubin, and Richard Stallman in the fall
+of 1988. It was around 90 pages long, and barely described the
+original, "old" version of `awk'. After substantial revision, the first
+version of the `The GAWK Manual' to be released was Edition 0.11 Beta in
+October of 1989. The manual then underwent more substantial revision
+for Edition 0.13 of December 1991. David Trueman, Pat Rankin, and
+Michal Jaegermann contributed sections of the manual for Edition 0.13.
+That edition was published by the FSF as a bound book early in 1992.
+Since then there have been several minor revisions, notably Edition
+0.14 of November 1992 that was published by the FSF in January of 1993,
+and Edition 0.16 of August 1993.
+
+ Edition 1.0 of `The GNU Awk User's Guide' represents a significant
+re-working of `The GAWK Manual', with much additional material. The
+FSF and I agree that I am now the primary author. I also felt that it
+needed a more descriptive title.
+
+ `The GNU Awk User's Guide' will undoubtedly continue to evolve. An
+electronic version comes with the `gawk' distribution from the FSF. If
+you find an error in this Info file, please report it! *Note Reporting
+Problems and Bugs: Bugs, for information on submitting problem reports
+electronically, or write to me in care of the FSF.
+
+
+File: gawk.info, Node: Acknowledgements, Prev: Manual History, Up: Preface
+
+Acknowledgements
+================
+
+ I would like to acknowledge Richard M. Stallman, for his vision of a
+better world, and for his courage in founding the FSF and starting the
+GNU project.
+
+ The initial draft of `The GAWK Manual' had the following
+acknowledgements:
+
+ Many people need to be thanked for their assistance in producing
+ this manual. Jay Fenlason contributed many ideas and sample
+ programs. Richard Mlynarik and Robert Chassell gave helpful
+ comments on drafts of this manual. The paper `A Supplemental
+ Document for `awk'' by John W. Pierce of the Chemistry Department
+ at UC San Diego, pinpointed several issues relevant both to `awk'
+ implementation and to this manual, that would otherwise have
+ escaped us.
+
+ The following people provided many helpful comments on Edition 0.13
+of `The GAWK Manual': Rick Adams, Michael Brennan, Rich Burridge, Diane
+Close, Christopher ("Topher") Eliot, Michael Lijewski, Pat Rankin,
+Miriam Robbins, and Michal Jaegermann.
+
+ The following people provided many helpful comments for Edition 1.0
+of `The GNU Awk User's Guide': Karl Berry, Michael Brennan, Darrel
+Hankerson, Michal Jaegermann, Michael Lijewski, and Miriam Robbins.
+Pat Rankin, Michal Jaegermann, Darrel Hankerson and Scott Deifik
+updated their respective sections for Edition 1.0.
+
+ Robert J. Chassell provided much valuable advice on the use of
+Texinfo. He also deserves special thanks for convincing me _not_ to
+title this Info file `How To Gawk Politely'. Karl Berry helped
+significantly with the TeX part of Texinfo.
+
+ David Trueman deserves special credit; he has done a yeoman job of
+evolving `gawk' so that it performs well, and without bugs. Although
+he is no longer involved with `gawk', working with him on this project
+was a significant pleasure.
+
+ Scott Deifik, Darrel Hankerson, Kai Uwe Rommel, Pat Rankin, and
+Michal Jaegermann (in no particular order) are long time members of the
+`gawk' "crack portability team." Without their hard work and help,
+`gawk' would not be nearly the fine program it is today. It has been
+and continues to be a pleasure working with this team of fine people.
+
+ Jeffrey Friedl provided invaluable help in tracking down a number of
+last minute problems with regular expressions in `gawk' 3.0.
+
+ David and I would like to thank Brian Kernighan of Bell Labs for
+invaluable assistance during the testing and debugging of `gawk', and
+for help in clarifying numerous points about the language. We could
+not have done nearly as good a job on either `gawk' or its
+documentation without his help.
+
+ I would like to thank Marshall and Elaine Hartholz of Seattle, and
+Dr. Bert and Rita Schreiber of Detroit for large amounts of quiet
+vacation time in their homes, which allowed me to make significant
+progress on this Info file and on `gawk' itself. Phil Hughes of SSC
+contributed in a very important way by loaning me his laptop Linux
+system, not once, but twice, allowing me to do a lot of work while away
+from home.
+
+ Finally, I must thank my wonderful wife, Miriam, for her patience
+through the many versions of this project, for her proof-reading, and
+for sharing me with the computer. I would like to thank my parents for
+their love, and for the grace with which they raised and educated me.
+I also must acknowledge my gratitude to G-d, for the many opportunities
+He has sent my way, as well as for the gifts He has given me with which
+to take advantage of those opportunities.
+
+
+
+Arnold Robbins
+Atlanta, Georgia
+January, 1996
+
+
+File: gawk.info, Node: What Is Awk, Next: Getting Started, Prev: Preface, Up: Top
+
+Introduction
+************
+
+ If you are like many computer users, you would frequently like to
+make changes in various text files wherever certain patterns appear, or
+extract data from parts of certain lines while discarding the rest. To
+write a program to do this in a language such as C or Pascal is a
+time-consuming inconvenience that may take many lines of code. The job
+may be easier with `awk'.
+
+ The `awk' utility interprets a special-purpose programming language
+that makes it possible to handle simple data-reformatting jobs with
+just a few lines of code.
+
+ The GNU implementation of `awk' is called `gawk'; it is fully upward
+compatible with the System V Release 4 version of `awk'. `gawk' is
+also upward compatible with the POSIX specification of the `awk'
+language. This means that all properly written `awk' programs should
+work with `gawk'. Thus, we usually don't distinguish between `gawk'
+and other `awk' implementations.
+
+ Using `awk' you can:
+
+ * manage small, personal databases
+
+ * generate reports
+
+ * validate data
+
+ * produce indexes, and perform other document preparation tasks
+
+ * even experiment with algorithms that can be adapted later to other
+ computer languages
+
+* Menu:
+
+* This Manual:: Using this Info file. Includes sample
+ input files that you can use.
+* Conventions:: Typographical Conventions.
+* Sample Data Files:: Sample data files for use in the `awk'
+ programs illustrated in this Info file.
+
+
+File: gawk.info, Node: This Manual, Next: Conventions, Prev: What Is Awk, Up: What Is Awk
+
+Using This Book
+===============
+
+ The term `awk' refers to a particular program, and to the language
+you use to tell this program what to do. When we need to be careful,
+we call the program "the `awk' utility" and the language "the `awk'
+language." The term `gawk' refers to a version of `awk' developed as
+part the GNU project. The purpose of this Info file is to explain both
+the `awk' language and how to run the `awk' utility.
+
+ The main purpose of the Info file is to explain the features of
+`awk', as defined in the POSIX standard. It does so in the context of
+one particular implementation, `gawk'. While doing so, it will also
+attempt to describe important differences between `gawk' and other
+`awk' implementations. Finally, any `gawk' features that are not in
+the POSIX standard for `awk' will be noted.
+
+ The term "`awk' program" refers to a program written by you in the
+`awk' programming language.
+
+ *Note Getting Started with `awk': Getting Started, for the bare
+essentials you need to know to start using `awk'.
+
+ Some useful "one-liners" are included to give you a feel for the
+`awk' language (*note Useful One Line Programs: One-liners.).
+
+ Many sample `awk' programs have been provided for you (*note A
+Library of `awk' Functions: Library Functions.; also *note Practical
+`awk' Programs: Sample Programs.).
+
+ The entire `awk' language is summarized for quick reference in *Note
+`gawk' Summary: Gawk Summary. Look there if you just need to refresh
+your memory about a particular feature.
+
+ If you find terms that you aren't familiar with, try looking them up
+in the glossary (*note Glossary::).
+
+ Most of the time complete `awk' programs are used as examples, but in
+some of the more advanced sections, only the part of the `awk' program
+that illustrates the concept being described is shown.
+
+ While this Info file is aimed principally at people who have not been
+exposed to `awk', there is a lot of information here that even the `awk'
+expert should find useful. In particular, the description of POSIX
+`awk', and the example programs in *Note A Library of `awk' Functions:
+Library Functions, and *Note Practical `awk' Programs: Sample Programs,
+should be of interest.
+
+Dark Corners
+------------
+
+ Until the POSIX standard (and `The Gawk Manual'), many features of
+`awk' were either poorly documented, or not documented at all.
+Descriptions of such features (often called "dark corners") are noted
+in this Info file with "(d.c.)". They also appear in the index under
+the heading "dark corner."
+
+
+File: gawk.info, Node: Conventions, Next: Sample Data Files, Prev: This Manual, Up: What Is Awk
+
+Typographical Conventions
+=========================
+
+ This Info file is written using Texinfo, the GNU documentation
+formatting language. A single Texinfo source file is used to produce
+both the printed and on-line versions of the documentation. This
+section briefly documents the typographical conventions used in Texinfo.
+
+ Examples you would type at the command line are preceded by the
+common shell primary and secondary prompts, `$' and `>'. Output from
+the command is preceded by the glyph "-|". This typically represents
+the command's standard output. Error messages, and other output on the
+command's standard error, are preceded by the glyph "error-->". For
+example:
+
+ $ echo hi on stdout
+ -| hi on stdout
+ $ echo hello on stderr 1>&2
+ error--> hello on stderr
+
+ Characters that you type at the keyboard look `like this'. In
+particular, there are special characters called "control characters."
+These are characters that you type by holding down both the `CONTROL'
+key and another key, at the same time. For example, a `Control-d' is
+typed by first pressing and holding the `CONTROL' key, next pressing
+the `d' key, and finally releasing both keys.
+
+
+File: gawk.info, Node: Sample Data Files, Prev: Conventions, Up: What Is Awk
+
+Data Files for the Examples
+===========================
+
+ Many of the examples in this Info file take their input from two
+sample data files. The first, called `BBS-list', represents a list of
+computer bulletin board systems together with information about those
+systems. The second data file, called `inventory-shipped', contains
+information about shipments on a monthly basis. In both files, each
+line is considered to be one "record".
+
+ In the file `BBS-list', each record contains the name of a computer
+bulletin board, its phone number, the board's baud rate(s), and a code
+for the number of hours it is operational. An `A' in the last column
+means the board operates 24 hours a day. A `B' in the last column
+means the board operates evening and weekend hours, only. A `C' means
+the board operates only on weekends.
+
+ aardvark 555-5553 1200/300 B
+ alpo-net 555-3412 2400/1200/300 A
+ barfly 555-7685 1200/300 A
+ bites 555-1675 2400/1200/300 A
+ camelot 555-0542 300 C
+ core 555-2912 1200/300 C
+ fooey 555-1234 2400/1200/300 B
+ foot 555-6699 1200/300 B
+ macfoo 555-6480 1200/300 A
+ sdace 555-3430 2400/1200/300 A
+ sabafoo 555-2127 1200/300 C
+
+ The second data file, called `inventory-shipped', represents
+information about shipments during the year. Each record contains the
+month of the year, the number of green crates shipped, the number of
+red boxes shipped, the number of orange bags shipped, and the number of
+blue packages shipped, respectively. There are 16 entries, covering
+the 12 months of one year and four months of the next year.
+
+ Jan 13 25 15 115
+ Feb 15 32 24 226
+ Mar 15 24 34 228
+ Apr 31 52 63 420
+ May 16 34 29 208
+ Jun 31 42 75 492
+ Jul 24 34 67 436
+ Aug 15 34 47 316
+ Sep 13 55 37 277
+ Oct 29 54 68 525
+ Nov 20 87 82 577
+ Dec 17 35 61 401
+
+ Jan 21 36 64 620
+ Feb 26 58 80 652
+ Mar 24 75 70 495
+ Apr 21 70 74 514
+
+ If you are reading this in GNU Emacs using Info, you can copy the
+regions of text showing these sample files into your own test files.
+This way you can try out the examples shown in the remainder of this
+document. You do this by using the command `M-x write-region' to copy
+text from the Info file into a file for use with `awk' (*Note
+Miscellaneous File Operations: (emacs)Misc File Ops, for more
+information). Using this information, create your own `BBS-list' and
+`inventory-shipped' files, and practice what you learn in this Info
+file.
+
+ If you are using the stand-alone version of Info, see *Note
+Extracting Programs from Texinfo Source Files: Extract Program, for an
+`awk' program that will extract these data files from `gawk.texi', the
+Texinfo source file for this Info file.
+
+
+File: gawk.info, Node: Getting Started, Next: One-liners, Prev: What Is Awk, Up: Top
+
+Getting Started with `awk'
+**************************
+
+ The basic function of `awk' is to search files for lines (or other
+units of text) that contain certain patterns. When a line matches one
+of the patterns, `awk' performs specified actions on that line. `awk'
+keeps processing input lines in this way until the end of the input
+files are reached.
+
+ Programs in `awk' are different from programs in most other
+languages, because `awk' programs are "data-driven"; that is, you
+describe the data you wish to work with, and then what to do when you
+find it. Most other languages are "procedural"; you have to describe,
+in great detail, every step the program is to take. When working with
+procedural languages, it is usually much harder to clearly describe the
+data your program will process. For this reason, `awk' programs are
+often refreshingly easy to both write and read.
+
+ When you run `awk', you specify an `awk' "program" that tells `awk'
+what to do. The program consists of a series of "rules". (It may also
+contain "function definitions", an advanced feature which we will
+ignore for now. *Note User-defined Functions: User-defined.) Each
+rule specifies one pattern to search for, and one action to perform
+when that pattern is found.
+
+ Syntactically, a rule consists of a pattern followed by an action.
+The action is enclosed in curly braces to separate it from the pattern.
+Rules are usually separated by newlines. Therefore, an `awk' program
+looks like this:
+
+ PATTERN { ACTION }
+ PATTERN { ACTION }
+ ...
+
+* Menu:
+
+* Names:: What name to use to find `awk'.
+* Running gawk:: How to run `gawk' programs; includes
+ command line syntax.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example with two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements into
+ lines.
+* Other Features:: Other Features of `awk'.
+* When:: When to use `gawk' and when to use other
+ things.
+
+
+File: gawk.info, Node: Names, Next: Running gawk, Prev: Getting Started, Up: Getting Started
+
+A Rose By Any Other Name
+========================
+
+ The `awk' language has evolved over the years. Full details are
+provided in *Note The Evolution of the `awk' Language: Language History.
+The language described in this Info file is often referred to as "new
+`awk'."
+
+ Because of this, many systems have multiple versions of `awk'. Some
+systems have an `awk' utility that implements the original version of
+the `awk' language, and a `nawk' utility for the new version. Others
+have an `oawk' for the "old `awk'" language, and plain `awk' for the
+new one. Still others only have one version, usually the new one.(1)
+
+ All in all, this makes it difficult for you to know which version of
+`awk' you should run when writing your programs. The best advice we
+can give here is to check your local documentation. Look for `awk',
+`oawk', and `nawk', as well as for `gawk'. Chances are, you will have
+some version of new `awk' on your system, and that is what you should
+use when running your programs. (Of course, if you're reading this
+Info file, chances are good that you have `gawk'!)
+
+ Throughout this Info file, whenever we refer to a language feature
+that should be available in any complete implementation of POSIX `awk',
+we simply use the term `awk'. When referring to a feature that is
+specific to the GNU implementation, we use the term `gawk'.
+
+ ---------- Footnotes ----------
+
+ (1) Often, these systems use `gawk' for their `awk' implementation!
+
+
+File: gawk.info, Node: Running gawk, Next: Very Simple, Prev: Names, Up: Getting Started
+
+How to Run `awk' Programs
+=========================
+
+ There are several ways to run an `awk' program. If the program is
+short, it is easiest to include it in the command that runs `awk', like
+this:
+
+ awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
+
+where PROGRAM consists of a series of patterns and actions, as
+described earlier. (The reason for the single quotes is described
+below, in *Note One-shot Throw-away `awk' Programs: One-shot.)
+
+ When the program is long, it is usually more convenient to put it in
+a file and run it with a command like this:
+
+ awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ...
+
+* Menu:
+
+* One-shot:: Running a short throw-away `awk' program.
+* Read Terminal:: Using no input files (input from terminal
+ instead).
+* Long:: Putting permanent `awk' programs in
+ files.
+* Executable Scripts:: Making self-contained `awk' programs.
+* Comments:: Adding documentation to `gawk' programs.
+
+
+File: gawk.info, Node: One-shot, Next: Read Terminal, Prev: Running gawk, Up: Running gawk
+
+One-shot Throw-away `awk' Programs
+----------------------------------
+
+ Once you are familiar with `awk', you will often type in simple
+programs the moment you want to use them. Then you can write the
+program as the first argument of the `awk' command, like this:
+
+ awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
+
+where PROGRAM consists of a series of PATTERNS and ACTIONS, as
+described earlier.
+
+ This command format instructs the "shell", or command interpreter,
+to start `awk' and use the PROGRAM to process records in the input
+file(s). There are single quotes around PROGRAM so that the shell
+doesn't interpret any `awk' characters as special shell characters.
+They also cause the shell to treat all of PROGRAM as a single argument
+for `awk' and allow PROGRAM to be more than one line long.
+
+ This format is also useful for running short or medium-sized `awk'
+programs from shell scripts, because it avoids the need for a separate
+file for the `awk' program. A self-contained shell script is more
+reliable since there are no other files to misplace.
+
+ *Note Useful One Line Programs: One-liners, presents several short,
+self-contained programs.
+
+ As an interesting side point, the command
+
+ awk '/foo/' FILES ...
+
+is essentially the same as
+
+ egrep foo FILES ...
+
+
+File: gawk.info, Node: Read Terminal, Next: Long, Prev: One-shot, Up: Running gawk
+
+Running `awk' without Input Files
+---------------------------------
+
+ You can also run `awk' without any input files. If you type the
+command line:
+
+ awk 'PROGRAM'
+
+then `awk' applies the PROGRAM to the "standard input", which usually
+means whatever you type on the terminal. This continues until you
+indicate end-of-file by typing `Control-d'. (On other operating
+systems, the end-of-file character may be different. For example, on
+OS/2 and MS-DOS, it is `Control-z'.)
+
+ For example, the following program prints a friendly piece of advice
+(from Douglas Adams' `The Hitchhiker's Guide to the Galaxy'), to keep
+you from worrying about the complexities of computer programming
+(`BEGIN' is a feature we haven't discussed yet).
+
+ $ awk "BEGIN { print \"Don't Panic!\" }"
+ -| Don't Panic!
+
+ This program does not read any input. The `\' before each of the
+inner double quotes is necessary because of the shell's quoting rules,
+in particular because it mixes both single quotes and double quotes.
+
+ This next simple `awk' program emulates the `cat' utility; it copies
+whatever you type at the keyboard to its standard output. (Why this
+works is explained shortly.)
+
+ $ awk '{ print }'
+ Now is the time for all good men
+ -| Now is the time for all good men
+ to come to the aid of their country.
+ -| to come to the aid of their country.
+ Four score and seven years ago, ...
+ -| Four score and seven years ago, ...
+ What, me worry?
+ -| What, me worry?
+ Control-d
+
+
+File: gawk.info, Node: Long, Next: Executable Scripts, Prev: Read Terminal, Up: Running gawk
+
+Running Long Programs
+---------------------
+
+ Sometimes your `awk' programs can be very long. In this case it is
+more convenient to put the program into a separate file. To tell `awk'
+to use that file for its program, you type:
+
+ awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ...
+
+ The `-f' instructs the `awk' utility to get the `awk' program from
+the file SOURCE-FILE. Any file name can be used for SOURCE-FILE. For
+example, you could put the program:
+
+ BEGIN { print "Don't Panic!" }
+
+into the file `advice'. Then this command:
+
+ awk -f advice
+
+does the same thing as this one:
+
+ awk "BEGIN { print \"Don't Panic!\" }"
+
+which was explained earlier (*note Running `awk' without Input Files:
+Read Terminal.). Note that you don't usually need single quotes around
+the file name that you specify with `-f', because most file names don't
+contain any of the shell's special characters. Notice that in
+`advice', the `awk' program did not have single quotes around it. The
+quotes are only needed for programs that are provided on the `awk'
+command line.
+
+ If you want to identify your `awk' program files clearly as such,
+you can add the extension `.awk' to the file name. This doesn't affect
+the execution of the `awk' program, but it does make "housekeeping"
+easier.
+
+
+File: gawk.info, Node: Executable Scripts, Next: Comments, Prev: Long, Up: Running gawk
+
+Executable `awk' Programs
+-------------------------
+
+ Once you have learned `awk', you may want to write self-contained
+`awk' scripts, using the `#!' script mechanism. You can do this on
+many Unix systems(1) (and someday on the GNU system).
+
+ For example, you could update the file `advice' to look like this:
+
+ #! /bin/awk -f
+
+ BEGIN { print "Don't Panic!" }
+
+After making this file executable (with the `chmod' utility), you can
+simply type `advice' at the shell, and the system will arrange to run
+`awk'(2) as if you had typed `awk -f advice'.
+
+ $ advice
+ -| Don't Panic!
+
+Self-contained `awk' scripts are useful when you want to write a
+program which users can invoke without their having to know that the
+program is written in `awk'.
+
+ Some older systems do not support the `#!' mechanism. You can get a
+similar effect using a regular shell script. It would look something
+like this:
+
+ : The colon ensures execution by the standard shell.
+ awk 'PROGRAM' "$@"
+
+ Using this technique, it is _vital_ to enclose the PROGRAM in single
+quotes to protect it from interpretation by the shell. If you omit the
+quotes, only a shell wizard can predict the results.
+
+ The `"$@"' causes the shell to forward all the command line
+arguments to the `awk' program, without interpretation. The first
+line, which starts with a colon, is used so that this shell script will
+work even if invoked by a user who uses the C shell. (Not all older
+systems obey this convention, but many do.)
+
+ ---------- Footnotes ----------
+
+ (1) The `#!' mechanism works on Linux systems, Unix systems derived
+from Berkeley Unix, System V Release 4, and some System V Release 3
+systems.
+
+ (2) The line beginning with `#!' lists the full file name of an
+interpreter to be run, and an optional initial command line argument to
+pass to that interpreter. The operating system then runs the
+interpreter with the given argument and the full argument list of the
+executed program. The first argument in the list is the full file name
+of the `awk' program. The rest of the argument list will either be
+options to `awk', or data files, or both.
+
+
+File: gawk.info, Node: Comments, Prev: Executable Scripts, Up: Running gawk
+
+Comments in `awk' Programs
+--------------------------
+
+ A "comment" is some text that is included in a program for the sake
+of human readers; it is not really part of the program. Comments can
+explain what the program does, and how it works. Nearly all
+programming languages have provisions for comments, because programs are
+typically hard to understand without their extra help.
+
+ In the `awk' language, a comment starts with the sharp sign
+character, `#', and continues to the end of the line. The `#' does not
+have to be the first character on the line. The `awk' language ignores
+the rest of a line following a sharp sign. For example, we could have
+put the following into `advice':
+
+ # This program prints a nice friendly message. It helps
+ # keep novice users from being afraid of the computer.
+ BEGIN { print "Don't Panic!" }
+
+ You can put comment lines into keyboard-composed throw-away `awk'
+programs also, but this usually isn't very useful; the purpose of a
+comment is to help you or another person understand the program at a
+later time.
+
+
+File: gawk.info, Node: Very Simple, Next: Two Rules, Prev: Running gawk, Up: Getting Started
+
+A Very Simple Example
+=====================
+
+ The following command runs a simple `awk' program that searches the
+input file `BBS-list' for the string of characters: `foo'. (A string
+of characters is usually called a "string". The term "string" is
+perhaps based on similar usage in English, such as "a string of
+pearls," or, "a string of cars in a train.")
+
+ awk '/foo/ { print $0 }' BBS-list
+
+When lines containing `foo' are found, they are printed, because
+`print $0' means print the current line. (Just `print' by itself means
+the same thing, so we could have written that instead.)
+
+ You will notice that slashes, `/', surround the string `foo' in the
+`awk' program. The slashes indicate that `foo' is a pattern to search
+for. This type of pattern is called a "regular expression", and is
+covered in more detail later (*note Regular Expressions: Regexp.). The
+pattern is allowed to match parts of words. There are single-quotes
+around the `awk' program so that the shell won't interpret any of it as
+special shell characters.
+
+ Here is what this program prints:
+
+ $ awk '/foo/ { print $0 }' BBS-list
+ -| fooey 555-1234 2400/1200/300 B
+ -| foot 555-6699 1200/300 B
+ -| macfoo 555-6480 1200/300 A
+ -| sabafoo 555-2127 1200/300 C
+
+ In an `awk' rule, either the pattern or the action can be omitted,
+but not both. If the pattern is omitted, then the action is performed
+for _every_ input line. If the action is omitted, the default action
+is to print all lines that match the pattern.
+
+ Thus, we could leave out the action (the `print' statement and the
+curly braces) in the above example, and the result would be the same:
+all lines matching the pattern `foo' would be printed. By comparison,
+omitting the `print' statement but retaining the curly braces makes an
+empty action that does nothing; then no lines would be printed.
+
+
+File: gawk.info, Node: Two Rules, Next: More Complex, Prev: Very Simple, Up: Getting Started
+
+An Example with Two Rules
+=========================
+
+ The `awk' utility reads the input files one line at a time. For
+each line, `awk' tries the patterns of each of the rules. If several
+patterns match then several actions are run, in the order in which they
+appear in the `awk' program. If no patterns match, then no actions are
+run.
+
+ After processing all the rules (perhaps none) that match the line,
+`awk' reads the next line (however, *note The `next' Statement: Next
+Statement., and also *note The `nextfile' Statement: Nextfile
+Statement.). This continues until the end of the file is reached.
+
+ For example, the `awk' program:
+
+ /12/ { print $0 }
+ /21/ { print $0 }
+
+contains two rules. The first rule has the string `12' as the pattern
+and `print $0' as the action. The second rule has the string `21' as
+the pattern and also has `print $0' as the action. Each rule's action
+is enclosed in its own pair of braces.
+
+ This `awk' program prints every line that contains the string `12'
+_or_ the string `21'. If a line contains both strings, it is printed
+twice, once by each rule.
+
+ This is what happens if we run this program on our two sample data
+files, `BBS-list' and `inventory-shipped', as shown here:
+
+ $ awk '/12/ { print $0 }
+ > /21/ { print $0 }' BBS-list inventory-shipped
+ -| aardvark 555-5553 1200/300 B
+ -| alpo-net 555-3412 2400/1200/300 A
+ -| barfly 555-7685 1200/300 A
+ -| bites 555-1675 2400/1200/300 A
+ -| core 555-2912 1200/300 C
+ -| fooey 555-1234 2400/1200/300 B
+ -| foot 555-6699 1200/300 B
+ -| macfoo 555-6480 1200/300 A
+ -| sdace 555-3430 2400/1200/300 A
+ -| sabafoo 555-2127 1200/300 C
+ -| sabafoo 555-2127 1200/300 C
+ -| Jan 21 36 64 620
+ -| Apr 21 70 74 514
+
+Note how the line in `BBS-list' beginning with `sabafoo' was printed
+twice, once for each rule.
+
+
+File: gawk.info, Node: More Complex, Next: Statements/Lines, Prev: Two Rules, Up: Getting Started
+
+A More Complex Example
+======================
+
+ Here is an example to give you an idea of what typical `awk'
+programs do. This example shows how `awk' can be used to summarize,
+select, and rearrange the output of another utility. It uses features
+that haven't been covered yet, so don't worry if you don't understand
+all the details.
+
+ ls -lg | awk '$6 == "Nov" { sum += $5 }
+ END { print sum }'
+
+ This command prints the total number of bytes in all the files in the
+current directory that were last modified in November (of any year).
+(In the C shell you would need to type a semicolon and then a backslash
+at the end of the first line; in a POSIX-compliant shell, such as the
+Bourne shell or Bash, the GNU Bourne-Again shell, you can type the
+example as shown.)
+
+ The `ls -lg' part of this example is a system command that gives you
+a listing of the files in a directory, including file size and the date
+the file was last modified. Its output looks like this:
+
+ -rw-r--r-- 1 arnold user 1933 Nov 7 13:05 Makefile
+ -rw-r--r-- 1 arnold user 10809 Nov 7 13:03 gawk.h
+ -rw-r--r-- 1 arnold user 983 Apr 13 12:14 gawk.tab.h
+ -rw-r--r-- 1 arnold user 31869 Jun 15 12:20 gawk.y
+ -rw-r--r-- 1 arnold user 22414 Nov 7 13:03 gawk1.c
+ -rw-r--r-- 1 arnold user 37455 Nov 7 13:03 gawk2.c
+ -rw-r--r-- 1 arnold user 27511 Dec 9 13:07 gawk3.c
+ -rw-r--r-- 1 arnold user 7989 Nov 7 13:03 gawk4.c
+
+The first field contains read-write permissions, the second field
+contains the number of links to the file, and the third field
+identifies the owner of the file. The fourth field identifies the group
+of the file. The fifth field contains the size of the file in bytes.
+The sixth, seventh and eighth fields contain the month, day, and time,
+respectively, that the file was last modified. Finally, the ninth field
+contains the name of the file.
+
+ The `$6 == "Nov"' in our `awk' program is an expression that tests
+whether the sixth field of the output from `ls -lg' matches the string
+`Nov'. Each time a line has the string `Nov' for its sixth field, the
+action `sum += $5' is performed. This adds the fifth field (the file
+size) to the variable `sum'. As a result, when `awk' has finished
+reading all the input lines, `sum' is the sum of the sizes of files
+whose lines matched the pattern. (This works because `awk' variables
+are automatically initialized to zero.)
+
+ After the last line of output from `ls' has been processed, the
+`END' rule is executed, and the value of `sum' is printed. In this
+example, the value of `sum' would be 80600.
+
+ These more advanced `awk' techniques are covered in later sections
+(*note Overview of Actions: Action Overview.). Before you can move on
+to more advanced `awk' programming, you have to know how `awk'
+interprets your input and displays your output. By manipulating fields
+and using `print' statements, you can produce some very useful and
+impressive looking reports.
+
+
+File: gawk.info, Node: Statements/Lines, Next: Other Features, Prev: More Complex, Up: Getting Started
+
+`awk' Statements Versus Lines
+=============================
+
+ Most often, each line in an `awk' program is a separate statement or
+separate rule, like this:
+
+ awk '/12/ { print $0 }
+ /21/ { print $0 }' BBS-list inventory-shipped
+
+ However, `gawk' will ignore newlines after any of the following:
+
+ , { ? : || && do else
+
+A newline at any other point is considered the end of the statement.
+(Splitting lines after `?' and `:' is a minor `gawk' extension. The
+`?' and `:' referred to here is the three operand conditional
+expression described in *Note Conditional Expressions: Conditional Exp.)
+
+ If you would like to split a single statement into two lines at a
+point where a newline would terminate it, you can "continue" it by
+ending the first line with a backslash character, `\'. The backslash
+must be the final character on the line to be recognized as a
+continuation character. This is allowed absolutely anywhere in the
+statement, even in the middle of a string or regular expression. For
+example:
+
+ awk '/This regular expression is too long, so continue it\
+ on the next line/ { print $1 }'
+
+We have generally not used backslash continuation in the sample programs
+in this Info file. Since in `gawk' there is no limit on the length of
+a line, it is never strictly necessary; it just makes programs more
+readable. For this same reason, as well as for clarity, we have kept
+most statements short in the sample programs presented throughout the
+Info file. Backslash continuation is most useful when your `awk'
+program is in a separate source file, instead of typed in on the
+command line. You should also note that many `awk' implementations are
+more particular about where you may use backslash continuation. For
+example, they may not allow you to split a string constant using
+backslash continuation. Thus, for maximal portability of your `awk'
+programs, it is best not to split your lines in the middle of a regular
+expression or a string.
+
+ *Caution: backslash continuation does not work as described above
+with the C shell.* Continuation with backslash works for `awk'
+programs in files, and also for one-shot programs _provided_ you are
+using a POSIX-compliant shell, such as the Bourne shell or Bash, the
+GNU Bourne-Again shell. But the C shell (`csh') behaves differently!
+There, you must use two backslashes in a row, followed by a newline.
+Note also that when using the C shell, _every_ newline in your awk
+program must be escaped with a backslash. To illustrate:
+
+ % awk 'BEGIN { \
+ ? print \\
+ ? "hello, world" \
+ ? }'
+ -| hello, world
+
+Here, the `%' and `?' are the C shell's primary and secondary prompts,
+analogous to the standard shell's `$' and `>'.
+
+ `awk' is a line-oriented language. Each rule's action has to begin
+on the same line as the pattern. To have the pattern and action on
+separate lines, you _must_ use backslash continuation--there is no
+other way.
+
+ Note that backslash continuation and comments do not mix. As soon as
+`awk' sees the `#' that starts a comment, it ignores _everything_ on
+the rest of the line. For example:
+
+ $ gawk 'BEGIN { print "dont panic" # a friendly \
+ > BEGIN rule
+ > }'
+ error--> gawk: cmd. line:2: BEGIN rule
+ error--> gawk: cmd. line:2: ^ parse error
+
+Here, it looks like the backslash would continue the comment onto the
+next line. However, the backslash-newline combination is never even
+noticed, since it is "hidden" inside the comment. Thus, the `BEGIN' is
+noted as a syntax error.
+
+ When `awk' statements within one rule are short, you might want to
+put more than one of them on a line. You do this by separating the
+statements with a semicolon, `;'.
+
+ This also applies to the rules themselves. Thus, the previous
+program could have been written:
+
+ /12/ { print $0 } ; /21/ { print $0 }
+
+*Note:* the requirement that rules on the same line must be separated
+with a semicolon was not in the original `awk' language; it was added
+for consistency with the treatment of statements within an action.
+
+
+File: gawk.info, Node: Other Features, Next: When, Prev: Statements/Lines, Up: Getting Started
+
+Other Features of `awk'
+=======================
+
+ The `awk' language provides a number of predefined, or built-in
+variables, which your programs can use to get information from `awk'.
+There are other variables your program can set to control how `awk'
+processes your data.
+
+ In addition, `awk' provides a number of built-in functions for doing
+common computational and string related operations.
+
+ As we develop our presentation of the `awk' language, we introduce
+most of the variables and many of the functions. They are defined
+systematically in *Note Built-in Variables::, and *Note Built-in
+Functions: Built-in.
+
+
+File: gawk.info, Node: When, Prev: Other Features, Up: Getting Started
+
+When to Use `awk'
+=================
+
+ You might wonder how `awk' might be useful for you. Using utility
+programs, advanced patterns, field separators, arithmetic statements,
+and other selection criteria, you can produce much more complex output.
+The `awk' language is very useful for producing reports from large
+amounts of raw data, such as summarizing information from the output of
+other utility programs like `ls'. (*Note A More Complex Example: More
+Complex.)
+
+ Programs written with `awk' are usually much smaller than they would
+be in other languages. This makes `awk' programs easy to compose and
+use. Often, `awk' programs can be quickly composed at your terminal,
+used once, and thrown away. Since `awk' programs are interpreted, you
+can avoid the (usually lengthy) compilation part of the typical
+edit-compile-test-debug cycle of software development.
+
+ Complex programs have been written in `awk', including a complete
+retargetable assembler for eight-bit microprocessors (*note Glossary::,
+for more information) and a microcode assembler for a special purpose
+Prolog computer. However, `awk''s capabilities are strained by tasks of
+such complexity.
+
+ If you find yourself writing `awk' scripts of more than, say, a few
+hundred lines, you might consider using a different programming
+language. Emacs Lisp is a good choice if you need sophisticated string
+or pattern matching capabilities. The shell is also good at string and
+pattern matching; in addition, it allows powerful use of the system
+utilities. More conventional languages, such as C, C++, and Lisp, offer
+better facilities for system programming and for managing the complexity
+of large programs. Programs in these languages may require more lines
+of source code than the equivalent `awk' programs, but they are easier
+to maintain and usually run more efficiently.
+
+
+File: gawk.info, Node: One-liners, Next: Regexp, Prev: Getting Started, Up: Top
+
+Useful One Line Programs
+************************
+
+ Many useful `awk' programs are short, just a line or two. Here is a
+collection of useful, short programs to get you started. Some of these
+programs contain constructs that haven't been covered yet. The
+description of the program will give you a good idea of what is going
+on, but please read the rest of the Info file to become an `awk' expert!
+
+ Most of the examples use a data file named `data'. This is just a
+placeholder; if you were to use these programs yourself, you would
+substitute your own file names for `data'.
+
+ Since you are reading this in Info, each line of the example code is
+enclosed in quotes, to represent text that you would type literally.
+The examples themselves represent shell commands that use single quotes
+to keep the shell from interpreting the contents of the program. When
+reading the examples, focus on the text between the open and close
+quotes.
+
+`awk '{ if (length($0) > max) max = length($0) }'
+` END { print max }' data'
+ This program prints the length of the longest input line.
+
+`awk 'length($0) > 80' data'
+ This program prints every line that is longer than 80 characters.
+ The sole rule has a relational expression as its pattern, and has
+ no action (so the default action, printing the record, is used).
+
+`expand data | awk '{ if (x < length()) x = length() }'
+` END { print "maximum line length is " x }''
+ This program prints the length of the longest line in `data'. The
+ input is processed by the `expand' program to change tabs into
+ spaces, so the widths compared are actually the right-margin
+ columns.
+
+`awk 'NF > 0' data'
+ This program prints every line that has at least one field. This
+ is an easy way to delete blank lines from a file (or rather, to
+ create a new file similar to the old file but from which the blank
+ lines have been deleted).
+
+`awk 'BEGIN { for (i = 1; i <= 7; i++)'
+` print int(101 * rand()) }''
+ This program prints seven random numbers from zero to 100,
+ inclusive.
+
+`ls -lg FILES | awk '{ x += $5 } ; END { print "total bytes: " x }''
+ This program prints the total number of bytes used by FILES.
+
+`ls -lg FILES | awk '{ x += $5 }'
+` END { print "total K-bytes: " (x + 1023)/1024 }''
+ This program prints the total number of kilobytes used by FILES.
+
+`awk -F: '{ print $1 }' /etc/passwd | sort'
+ This program prints a sorted list of the login names of all users.
+
+`awk 'END { print NR }' data'
+ This program counts lines in a file.
+
+`awk 'NR % 2 == 0' data'
+ This program prints the even numbered lines in the data file. If
+ you were to use the expression `NR % 2 == 1' instead, it would
+ print the odd numbered lines.
+
+
+File: gawk.info, Node: Regexp, Next: Reading Files, Prev: One-liners, Up: Top
+
+Regular Expressions
+*******************
+
+ A "regular expression", or "regexp", is a way of describing a set of
+strings. Because regular expressions are such a fundamental part of
+`awk' programming, their format and use deserve a separate chapter.
+
+ A regular expression enclosed in slashes (`/') is an `awk' pattern
+that matches every input record whose text belongs to that set.
+
+ The simplest regular expression is a sequence of letters, numbers, or
+both. Such a regexp matches any string that contains that sequence.
+Thus, the regexp `foo' matches any string containing `foo'. Therefore,
+the pattern `/foo/' matches any input record containing the three
+characters `foo', _anywhere_ in the record. Other kinds of regexps let
+you specify more complicated classes of strings.
+
+* Menu:
+
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write non-printing characters.
+* Regexp Operators:: Regular Expression Operators.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+
+
+File: gawk.info, Node: Regexp Usage, Next: Escape Sequences, Prev: Regexp, Up: Regexp
+
+How to Use Regular Expressions
+==============================
+
+ A regular expression can be used as a pattern by enclosing it in
+slashes. Then the regular expression is tested against the entire text
+of each record. (Normally, it only needs to match some part of the
+text in order to succeed.) For example, this prints the second field
+of each record that contains the three characters `foo' anywhere in it:
+
+ $ awk '/foo/ { print $2 }' BBS-list
+ -| 555-1234
+ -| 555-6699
+ -| 555-6480
+ -| 555-2127
+
+ Regular expressions can also be used in matching expressions. These
+expressions allow you to specify the string to match against; it need
+not be the entire current input record. The two operators, `~' and
+`!~', perform regular expression comparisons. Expressions using these
+operators can be used as patterns or in `if', `while', `for', and `do'
+statements. (*Note Control Statements in Actions: Statements.)
+
+`EXP ~ /REGEXP/'
+ This is true if the expression EXP (taken as a string) is matched
+ by REGEXP. The following example matches, or selects, all input
+ records with the upper-case letter `J' somewhere in the first
+ field:
+
+ $ awk '$1 ~ /J/' inventory-shipped
+ -| Jan 13 25 15 115
+ -| Jun 31 42 75 492
+ -| Jul 24 34 67 436
+ -| Jan 21 36 64 620
+
+ So does this:
+
+ awk '{ if ($1 ~ /J/) print }' inventory-shipped
+
+`EXP !~ /REGEXP/'
+ This is true if the expression EXP (taken as a character string)
+ is _not_ matched by REGEXP. The following example matches, or
+ selects, all input records whose first field _does not_ contain
+ the upper-case letter `J':
+
+ $ awk '$1 !~ /J/' inventory-shipped
+ -| Feb 15 32 24 226
+ -| Mar 15 24 34 228
+ -| Apr 31 52 63 420
+ -| May 16 34 29 208
+ ...
+
+ When a regexp is written enclosed in slashes, like `/foo/', we call
+it a "regexp constant", much like `5.27' is a numeric constant, and
+`"foo"' is a string constant.
+
+
+File: gawk.info, Node: Escape Sequences, Next: Regexp Operators, Prev: Regexp Usage, Up: Regexp
+
+Escape Sequences
+================
+
+ Some characters cannot be included literally in string constants
+(`"foo"') or regexp constants (`/foo/'). You represent them instead
+with "escape sequences", which are character sequences beginning with a
+backslash (`\').
+
+ One use of an escape sequence is to include a double-quote character
+in a string constant. Since a plain double-quote would end the string,
+you must use `\"' to represent an actual double-quote character as a
+part of the string. For example:
+
+ $ awk 'BEGIN { print "He said \"hi!\" to her." }'
+ -| He said "hi!" to her.
+
+ The backslash character itself is another character that cannot be
+included normally; you write `\\' to put one backslash in the string or
+regexp. Thus, the string whose contents are the two characters `"' and
+`\' must be written `"\"\\"'.
+
+ Another use of backslash is to represent unprintable characters such
+as tab or newline. While there is nothing to stop you from entering
+most unprintable characters directly in a string constant or regexp
+constant, they may look ugly.
+
+ Here is a table of all the escape sequences used in `awk', and what
+they represent. Unless noted otherwise, all of these escape sequences
+apply to both string constants and regexp constants.
+
+`\\'
+ A literal backslash, `\'.
+
+`\a'
+ The "alert" character, `Control-g', ASCII code 7 (BEL).
+
+`\b'
+ Backspace, `Control-h', ASCII code 8 (BS).
+
+`\f'
+ Formfeed, `Control-l', ASCII code 12 (FF).
+
+`\n'
+ Newline, `Control-j', ASCII code 10 (LF).
+
+`\r'
+ Carriage return, `Control-m', ASCII code 13 (CR).
+
+`\t'
+ Horizontal tab, `Control-i', ASCII code 9 (HT).
+
+`\v'
+ Vertical tab, `Control-k', ASCII code 11 (VT).
+
+`\NNN'
+ The octal value NNN, where NNN are one to three digits between `0'
+ and `7'. For example, the code for the ASCII ESC (escape)
+ character is `\033'.
+
+`\xHH...'
+ The hexadecimal value HH, where HH are hexadecimal digits (`0'
+ through `9' and either `A' through `F' or `a' through `f'). Like
+ the same construct in ANSI C, the escape sequence continues until
+ the first non-hexadecimal digit is seen. However, using more than
+ two hexadecimal digits produces undefined results. (The `\x'
+ escape sequence is not allowed in POSIX `awk'.)
+
+`\/'
+ A literal slash (necessary for regexp constants only). You use
+ this when you wish to write a regexp constant that contains a
+ slash. Since the regexp is delimited by slashes, you need to
+ escape the slash that is part of the pattern, in order to tell
+ `awk' to keep processing the rest of the regexp.
+
+`\"'
+ A literal double-quote (necessary for string constants only). You
+ use this when you wish to write a string constant that contains a
+ double-quote. Since the string is delimited by double-quotes, you
+ need to escape the quote that is part of the string, in order to
+ tell `awk' to keep processing the rest of the string.
+
+ In `gawk', there are additional two character sequences that begin
+with backslash that have special meaning in regexps. *Note Additional
+Regexp Operators Only in `gawk': GNU Regexp Operators.
+
+ In a string constant, what happens if you place a backslash before
+something that is not one of the characters listed above? POSIX `awk'
+purposely leaves this case undefined. There are two choices.
+
+ * Strip the backslash out. This is what Unix `awk' and `gawk' both
+ do. For example, `"a\qc"' is the same as `"aqc"'.
+
+ * Leave the backslash alone. Some other `awk' implementations do
+ this. In such implementations, `"a\qc"' is the same as if you had
+ typed `"a\\qc"'.
+
+ In a regexp, a backslash before any character that is not in the
+above table, and not listed in *Note Additional Regexp Operators Only
+in `gawk': GNU Regexp Operators, means that the next character should
+be taken literally, even if it would normally be a regexp operator.
+E.g., `/a\+b/' matches the three characters `a+b'.
+
+ For complete portability, do not use a backslash before any
+character not listed in the table above.
+
+ Another interesting question arises. Suppose you use an octal or
+hexadecimal escape to represent a regexp metacharacter (*note Regular
+Expression Operators: Regexp Operators.). Does `awk' treat the
+character as literal character, or as a regexp operator?
+
+ It turns out that historically, such characters were taken literally
+(d.c.). However, the POSIX standard indicates that they should be
+treated as real metacharacters, and this is what `gawk' does. However,
+in compatibility mode (*note Command Line Options: Options.), `gawk'
+treats the characters represented by octal and hexadecimal escape
+sequences literally when used in regexp constants. Thus, `/a\52b/' is
+equivalent to `/a\*b/'.
+
+ To summarize:
+
+ 1. The escape sequences in the table above are always processed first,
+ for both string constants and regexp constants. This happens very
+ early, as soon as `awk' reads your program.
+
+ 2. `gawk' processes both regexp constants and dynamic regexps (*note
+ Using Dynamic Regexps: Computed Regexps.), for the special
+ operators listed in *Note Additional Regexp Operators Only in
+ `gawk': GNU Regexp Operators.
+
+ 3. A backslash before any other character means to treat that
+ character literally.
+
+
+File: gawk.info, Node: Regexp Operators, Next: GNU Regexp Operators, Prev: Escape Sequences, Up: Regexp
+
+Regular Expression Operators
+============================
+
+ You can combine regular expressions with the following characters,
+called "regular expression operators", or "metacharacters", to increase
+the power and versatility of regular expressions.
+
+ The escape sequences described in *Note Escape Sequences::, are
+valid inside a regexp. They are introduced by a `\'. They are
+recognized and converted into the corresponding real characters as the
+very first step in processing regexps.
+
+ Here is a table of metacharacters. All characters that are not
+escape sequences and that are not listed in the table stand for
+themselves.
+
+`\'
+ This is used to suppress the special meaning of a character when
+ matching. For example:
+
+ \$
+
+ matches the character `$'.
+
+`^'
+ This matches the beginning of a string. For example:
+
+ ^@chapter
+
+ matches the `@chapter' at the beginning of a string, and can be
+ used to identify chapter beginnings in Texinfo source files. The
+ `^' is known as an "anchor", since it anchors the pattern to
+ matching only at the beginning of the string.
+
+ It is important to realize that `^' does not match the beginning of
+ a line embedded in a string. In this example the condition is not
+ true:
+
+ if ("line1\nLINE 2" ~ /^L/) ...
+
+`$'
+ This is similar to `^', but it matches only at the end of a string.
+ For example:
+
+ p$
+
+ matches a record that ends with a `p'. The `$' is also an anchor,
+ and also does not match the end of a line embedded in a string.
+ In this example the condition is not true:
+
+ if ("line1\nLINE 2" ~ /1$/) ...
+
+`.'
+ The period, or dot, matches any single character, _including_ the
+ newline character. For example:
+
+ .P
+
+ matches any single character followed by a `P' in a string. Using
+ concatenation we can make a regular expression like `U.A', which
+ matches any three-character sequence that begins with `U' and ends
+ with `A'.
+
+ In strict POSIX mode (*note Command Line Options: Options.), `.'
+ does not match the NUL character, which is a character with all
+ bits equal to zero. Otherwise, NUL is just another character.
+ Other versions of `awk' may not be able to match the NUL character.
+
+`[...]'
+ This is called a "character list". It matches any _one_ of the
+ characters that are enclosed in the square brackets. For example:
+
+ [MVX]
+
+ matches any one of the characters `M', `V', or `X' in a string.
+
+ Ranges of characters are indicated by using a hyphen between the
+ beginning and ending characters, and enclosing the whole thing in
+ brackets. For example:
+
+ [0-9]
+
+ matches any digit. Multiple ranges are allowed. E.g., the list
+ `[A-Za-z0-9]' is a common way to express the idea of "all
+ alphanumeric characters."
+
+ To include one of the characters `\', `]', `-' or `^' in a
+ character list, put a `\' in front of it. For example:
+
+ [d\]]
+
+ matches either `d', or `]'.
+
+ This treatment of `\' in character lists is compatible with other
+ `awk' implementations, and is also mandated by POSIX. The regular
+ expressions in `awk' are a superset of the POSIX specification for
+ Extended Regular Expressions (EREs). POSIX EREs are based on the
+ regular expressions accepted by the traditional `egrep' utility.
+
+ "Character classes" are a new feature introduced in the POSIX
+ standard. A character class is a special notation for describing
+ lists of characters that have a specific attribute, but where the
+ actual characters themselves can vary from country to country
+ and/or from character set to character set. For example, the
+ notion of what is an alphabetic character differs in the USA and
+ in France.
+
+ A character class is only valid in a regexp _inside_ the brackets
+ of a character list. Character classes consist of `[:', a keyword
+ denoting the class, and `:]'. Here are the character classes
+ defined by the POSIX standard.
+
+ `[:alnum:]'
+ Alphanumeric characters.
+
+ `[:alpha:]'
+ Alphabetic characters.
+
+ `[:blank:]'
+ Space and tab characters.
+
+ `[:cntrl:]'
+ Control characters.
+
+ `[:digit:]'
+ Numeric characters.
+
+ `[:graph:]'
+ Characters that are printable and are also visible. (A space
+ is printable, but not visible, while an `a' is both.)
+
+ `[:lower:]'
+ Lower-case alphabetic characters.
+
+ `[:print:]'
+ Printable characters (characters that are not control
+ characters.)
+
+ `[:punct:]'
+ Punctuation characters (characters that are not letter,
+ digits, control characters, or space characters).
+
+ `[:space:]'
+ Space characters (such as space, tab, and formfeed, to name a
+ few).
+
+ `[:upper:]'
+ Upper-case alphabetic characters.
+
+ `[:xdigit:]'
+ Characters that are hexadecimal digits.
+
+ For example, before the POSIX standard, to match alphanumeric
+ characters, you had to write `/[A-Za-z0-9]/'. If your character
+ set had other alphabetic characters in it, this would not match
+ them. With the POSIX character classes, you can write
+ `/[[:alnum:]]/', and this will match _all_ the alphabetic and
+ numeric characters in your character set.
+
+ Two additional special sequences can appear in character lists.
+ These apply to non-ASCII character sets, which can have single
+ symbols (called "collating elements") that are represented with
+ more than one character, as well as several characters that are
+ equivalent for "collating", or sorting, purposes. (E.g., in
+ French, a plain "e" and a grave-accented "e`" are equivalent.)
+
+ Collating Symbols
+ A "collating symbol" is a multi-character collating element
+ enclosed in `[.' and `.]'. For example, if `ch' is a
+ collating element, then `[[.ch.]]' is a regexp that matches
+ this collating element, while `[ch]' is a regexp that matches
+ either `c' or `h'.
+
+ Equivalence Classes
+ An "equivalence class" is a locale-specific name for a list of
+ characters that are equivalent. The name is enclosed in `[='
+ and `=]'. For example, the name `e' might be used to
+ represent all of "e," "e`," and "e'." In this case, `[[=e]]'
+ is a regexp that matches any of `e', `e'', or `e`'.
+
+ These features are very valuable in non-English speaking locales.
+
+ *Caution:* The library functions that `gawk' uses for regular
+ expression matching currently only recognize POSIX character
+ classes; they do not recognize collating symbols or equivalence
+ classes.
+
+`[^ ...]'
+ This is a "complemented character list". The first character after
+ the `[' _must_ be a `^'. It matches any characters _except_ those
+ in the square brackets. For example:
+
+ [^0-9]
+
+ matches any character that is not a digit.
+
+`|'
+ This is the "alternation operator", and it is used to specify
+ alternatives. For example:
+
+ ^P|[0-9]
+
+ matches any string that matches either `^P' or `[0-9]'. This
+ means it matches any string that starts with `P' or contains a
+ digit.
+
+ The alternation applies to the largest possible regexps on either
+ side. In other words, `|' has the lowest precedence of all the
+ regular expression operators.
+
+`(...)'
+ Parentheses are used for grouping in regular expressions as in
+ arithmetic. They can be used to concatenate regular expressions
+ containing the alternation operator, `|'. For example,
+ `@(samp|code)\{[^}]+\}' matches both `@code{foo}' and
+ `@samp{bar}'. (These are Texinfo formatting control sequences.)
+
+`*'
+ This symbol means that the preceding regular expression is to be
+ repeated as many times as necessary to find a match. For example:
+
+ ph*
+
+ applies the `*' symbol to the preceding `h' and looks for matches
+ of one `p' followed by any number of `h's. This will also match
+ just `p' if no `h's are present.
+
+ The `*' repeats the _smallest_ possible preceding expression.
+ (Use parentheses if you wish to repeat a larger expression.) It
+ finds as many repetitions as possible. For example:
+
+ awk '/\(c[ad][ad]*r x\)/ { print }' sample
+
+ prints every record in `sample' containing a string of the form
+ `(car x)', `(cdr x)', `(cadr x)', and so on. Notice the escaping
+ of the parentheses by preceding them with backslashes.
+
+`+'
+ This symbol is similar to `*', but the preceding expression must be
+ matched at least once. This means that:
+
+ wh+y
+
+ would match `why' and `whhy' but not `wy', whereas `wh*y' would
+ match all three of these strings. This is a simpler way of
+ writing the last `*' example:
+
+ awk '/\(c[ad]+r x\)/ { print }' sample
+
+`?'
+ This symbol is similar to `*', but the preceding expression can be
+ matched either once or not at all. For example:
+
+ fe?d
+
+ will match `fed' and `fd', but nothing else.
+
+`{N}'
+`{N,}'
+`{N,M}'
+ One or two numbers inside braces denote an "interval expression".
+ If there is one number in the braces, the preceding regexp is
+ repeated N times. If there are two numbers separated by a comma,
+ the preceding regexp is repeated N to M times. If there is one
+ number followed by a comma, then the preceding regexp is repeated
+ at least N times.
+
+ `wh{3}y'
+ matches `whhhy' but not `why' or `whhhhy'.
+
+ `wh{3,5}y'
+ matches `whhhy' or `whhhhy' or `whhhhhy', only.
+
+ `wh{2,}y'
+ matches `whhy' or `whhhy', and so on.
+
+ Interval expressions were not traditionally available in `awk'.
+ As part of the POSIX standard they were added, to make `awk' and
+ `egrep' consistent with each other.
+
+ However, since old programs may use `{' and `}' in regexp
+ constants, by default `gawk' does _not_ match interval expressions
+ in regexps. If either `--posix' or `--re-interval' are specified
+ (*note Command Line Options: Options.), then interval expressions
+ are allowed in regexps.
+
+ In regular expressions, the `*', `+', and `?' operators, as well as
+the braces `{' and `}', have the highest precedence, followed by
+concatenation, and finally by `|'. As in arithmetic, parentheses can
+change how operators are grouped.
+
+ If `gawk' is in compatibility mode (*note Command Line Options:
+Options.), character classes and interval expressions are not available
+in regular expressions.
+
+ The next node discusses the GNU-specific regexp operators, and
+provides more detail concerning how command line options affect the way
+`gawk' interprets the characters in regular expressions.
+
+
+File: gawk.info, Node: GNU Regexp Operators, Next: Case-sensitivity, Prev: Regexp Operators, Up: Regexp
+
+Additional Regexp Operators Only in `gawk'
+==========================================
+
+ GNU software that deals with regular expressions provides a number of
+additional regexp operators. These operators are described in this
+section, and are specific to `gawk'; they are not available in other
+`awk' implementations.
+
+ Most of the additional operators are for dealing with word matching.
+For our purposes, a "word" is a sequence of one or more letters, digits,
+or underscores (`_').
+
+`\w'
+ This operator matches any word-constituent character, i.e. any
+ letter, digit, or underscore. Think of it as a short-hand for
+ `[[:alnum:]_]'.
+
+`\W'
+ This operator matches any character that is not word-constituent.
+ Think of it as a short-hand for `[^[:alnum:]_]'.
+
+`\<'
+ This operator matches the empty string at the beginning of a word.
+ For example, `/\<away/' matches `away', but not `stowaway'.
+
+`\>'
+ This operator matches the empty string at the end of a word. For
+ example, `/stow\>/' matches `stow', but not `stowaway'.
+
+`\y'
+ This operator matches the empty string at either the beginning or
+ the end of a word (the word boundar*y*). For example, `\yballs?\y'
+ matches either `ball' or `balls' as a separate word.
+
+`\B'
+ This operator matches the empty string within a word. In other
+ words, `\B' matches the empty string that occurs between two
+ word-constituent characters. For example, `/\Brat\B/' matches
+ `crate', but it does not match `dirty rat'. `\B' is essentially
+ the opposite of `\y'.
+
+ There are two other operators that work on buffers. In Emacs, a
+"buffer" is, naturally, an Emacs buffer. For other programs, the
+regexp library routines that `gawk' uses consider the entire string to
+be matched as the buffer.
+
+ For `awk', since `^' and `$' always work in terms of the beginning
+and end of strings, these operators don't add any new capabilities.
+They are provided for compatibility with other GNU software.
+
+`\`'
+ This operator matches the empty string at the beginning of the
+ buffer.
+
+`\''
+ This operator matches the empty string at the end of the buffer.
+
+ In other GNU software, the word boundary operator is `\b'. However,
+that conflicts with the `awk' language's definition of `\b' as
+backspace, so `gawk' uses a different letter.
+
+ An alternative method would have been to require two backslashes in
+the GNU operators, but this was deemed to be too confusing, and the
+current method of using `\y' for the GNU `\b' appears to be the lesser
+of two evils.
+
+ The various command line options (*note Command Line Options:
+Options.) control how `gawk' interprets characters in regexps.
+
+No options
+ In the default case, `gawk' provide all the facilities of POSIX
+ regexps and the GNU regexp operators described in *Note Regular
+ Expression Operators: Regexp Operators. However, interval
+ expressions are not supported.
+
+`--posix'
+ Only POSIX regexps are supported, the GNU operators are not special
+ (e.g., `\w' matches a literal `w'). Interval expressions are
+ allowed.
+
+`--traditional'
+ Traditional Unix `awk' regexps are matched. The GNU operators are
+ not special, interval expressions are not available, and neither
+ are the POSIX character classes (`[[:alnum:]]' and so on).
+ Characters described by octal and hexadecimal escape sequences are
+ treated literally, even if they represent regexp metacharacters.
+
+`--re-interval'
+ Allow interval expressions in regexps, even if `--traditional' has
+ been provided.
+
+
+File: gawk.info, Node: Case-sensitivity, Next: Leftmost Longest, Prev: GNU Regexp Operators, Up: Regexp
+
+Case-sensitivity in Matching
+============================
+
+ Case is normally significant in regular expressions, both when
+matching ordinary characters (i.e. not metacharacters), and inside
+character sets. Thus a `w' in a regular expression matches only a
+lower-case `w' and not an upper-case `W'.
+
+ The simplest way to do a case-independent match is to use a character
+list: `[Ww]'. However, this can be cumbersome if you need to use it
+often; and it can make the regular expressions harder to read. There
+are two alternatives that you might prefer.
+
+ One way to do a case-insensitive match at a particular point in the
+program is to convert the data to a single case, using the `tolower' or
+`toupper' built-in string functions (which we haven't discussed yet;
+*note Built-in Functions for String Manipulation: String Functions.).
+For example:
+
+ tolower($1) ~ /foo/ { ... }
+
+converts the first field to lower-case before matching against it.
+This will work in any POSIX-compliant implementation of `awk'.
+
+ Another method, specific to `gawk', is to set the variable
+`IGNORECASE' to a non-zero value (*note Built-in Variables::). When
+`IGNORECASE' is not zero, _all_ regexp and string operations ignore
+case. Changing the value of `IGNORECASE' dynamically controls the case
+sensitivity of your program as it runs. Case is significant by default
+because `IGNORECASE' (like most variables) is initialized to zero.
+
+ x = "aB"
+ if (x ~ /ab/) ... # this test will fail
+
+ IGNORECASE = 1
+ if (x ~ /ab/) ... # now it will succeed
+
+ In general, you cannot use `IGNORECASE' to make certain rules
+case-insensitive and other rules case-sensitive, because there is no way
+to set `IGNORECASE' just for the pattern of a particular rule. To do
+this, you must use character lists or `tolower'. However, one thing
+you can do only with `IGNORECASE' is turn case-sensitivity on or off
+dynamically for all the rules at once.
+
+ `IGNORECASE' can be set on the command line, or in a `BEGIN' rule
+(*note Other Command Line Arguments: Other Arguments.; also *note
+Startup and Cleanup Actions: Using BEGIN/END.). Setting `IGNORECASE'
+from the command line is a way to make a program case-insensitive
+without having to edit it.
+
+ Prior to version 3.0 of `gawk', the value of `IGNORECASE' only
+affected regexp operations. It did not affect string comparison with
+`==', `!=', and so on. Beginning with version 3.0, both regexp and
+string comparison operations are affected by `IGNORECASE'.
+
+ Beginning with version 3.0 of `gawk', the equivalences between
+upper-case and lower-case characters are based on the ISO-8859-1 (ISO
+Latin-1) character set. This character set is a superset of the
+traditional 128 ASCII characters, that also provides a number of
+characters suitable for use with European languages.
+
+ The value of `IGNORECASE' has no effect if `gawk' is in
+compatibility mode (*note Command Line Options: Options.). Case is
+always significant in compatibility mode.
+
+
+File: gawk.info, Node: Leftmost Longest, Next: Computed Regexps, Prev: Case-sensitivity, Up: Regexp
+
+How Much Text Matches?
+======================
+
+ Consider the following example:
+
+ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
+
+ This example uses the `sub' function (which we haven't discussed yet,
+*note Built-in Functions for String Manipulation: String Functions.)
+to make a change to the input record. Here, the regexp `/a+/' indicates
+"one or more `a' characters," and the replacement text is `<A>'.
+
+ The input contains four `a' characters. What will the output be?
+In other words, how many is "one or more"--will `awk' match two, three,
+or all four `a' characters?
+
+ The answer is, `awk' (and POSIX) regular expressions always match
+the leftmost, _longest_ sequence of input characters that can match.
+Thus, in this example, all four `a' characters are replaced with `<A>'.
+
+ $ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
+ -| <A>bcd
+
+ For simple match/no-match tests, this is not so important. But when
+doing regexp-based field and record splitting, and text matching and
+substitutions with the `match', `sub', `gsub', and `gensub' functions,
+it is very important. *Note Built-in Functions for String
+Manipulation: String Functions, for more information on these functions.
+Understanding this principle is also important for regexp-based record
+and field splitting (*note How Input is Split into Records: Records.,
+and also *note Specifying How Fields are Separated: Field Separators.).
+
+
+File: gawk.info, Node: Computed Regexps, Prev: Leftmost Longest, Up: Regexp
+
+Using Dynamic Regexps
+=====================
+
+ The right hand side of a `~' or `!~' operator need not be a regexp
+constant (i.e. a string of characters between slashes). It may be any
+expression. The expression is evaluated, and converted if necessary to
+a string; the contents of the string are used as the regexp. A regexp
+that is computed in this way is called a "dynamic regexp". For example:
+
+ BEGIN { identifier_regexp = "[A-Za-z_][A-Za-z_0-9]+" }
+ $0 ~ identifier_regexp { print }
+
+sets `identifier_regexp' to a regexp that describes `awk' variable
+names, and tests if the input record matches this regexp.
+
+ *Caution:* When using the `~' and `!~' operators, there is a
+difference between a regexp constant enclosed in slashes, and a string
+constant enclosed in double quotes. If you are going to use a string
+constant, you have to understand that the string is in essence scanned
+_twice_; the first time when `awk' reads your program, and the second
+time when it goes to match the string on the left-hand side of the
+operator with the pattern on the right. This is true of any string
+valued expression (such as `identifier_regexp' above), not just string
+constants.
+
+ What difference does it make if the string is scanned twice? The
+answer has to do with escape sequences, and particularly with
+backslashes. To get a backslash into a regular expression inside a
+string, you have to type two backslashes.
+
+ For example, `/\*/' is a regexp constant for a literal `*'. Only
+one backslash is needed. To do the same thing with a string, you would
+have to type `"\\*"'. The first backslash escapes the second one, so
+that the string actually contains the two characters `\' and `*'.
+
+ Given that you can use both regexp and string constants to describe
+regular expressions, which should you use? The answer is "regexp
+constants," for several reasons.
+
+ 1. String constants are more complicated to write, and more difficult
+ to read. Using regexp constants makes your programs less
+ error-prone. Not understanding the difference between the two
+ kinds of constants is a common source of errors.
+
+ 2. It is also more efficient to use regexp constants: `awk' can note
+ that you have supplied a regexp and store it internally in a form
+ that makes pattern matching more efficient. When using a string
+ constant, `awk' must first convert the string into this internal
+ form, and then perform the pattern matching.
+
+ 3. Using regexp constants is better style; it shows clearly that you
+ intend a regexp match.
+
+
+File: gawk.info, Node: Reading Files, Next: Printing, Prev: Regexp, Up: Top
+
+Reading Input Files
+*******************
+
+ In the typical `awk' program, all input is read either from the
+standard input (by default the keyboard, but often a pipe from another
+command) or from files whose names you specify on the `awk' command
+line. If you specify input files, `awk' reads them in order, reading
+all the data from one before going on to the next. The name of the
+current input file can be found in the built-in variable `FILENAME'
+(*note Built-in Variables::).
+
+ The input is read in units called "records", and processed by the
+rules of your program one record at a time. By default, each record is
+one line. Each record is automatically split into chunks called
+"fields". This makes it more convenient for programs to work on the
+parts of a record.
+
+ On rare occasions you will need to use the `getline' command. The
+`getline' command is valuable, both because it can do explicit input
+from any number of files, and because the files used with it do not
+have to be named on the `awk' command line (*note Explicit Input with
+`getline': Getline.).
+
+* Menu:
+
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Non-Constant Fields:: Non-constant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change it.
+* Constant Size:: Reading constant width data.
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program control
+ using the `getline' function.
+
+
+File: gawk.info, Node: Records, Next: Fields, Prev: Reading Files, Up: Reading Files
+
+How Input is Split into Records
+===============================
+
+ The `awk' utility divides the input for your `awk' program into
+records and fields. Records are separated by a character called the
+"record separator". By default, the record separator is the newline
+character. This is why records are, by default, single lines. You can
+use a different character for the record separator by assigning the
+character to the built-in variable `RS'.
+
+ You can change the value of `RS' in the `awk' program, like any
+other variable, with the assignment operator, `=' (*note Assignment
+Expressions: Assignment Ops.). The new record-separator character
+should be enclosed in quotation marks, which indicate a string
+constant. Often the right time to do this is at the beginning of
+execution, before any input has been processed, so that the very first
+record will be read with the proper separator. To do this, use the
+special `BEGIN' pattern (*note The `BEGIN' and `END' Special Patterns:
+BEGIN/END.). For example:
+
+ awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
+
+changes the value of `RS' to `"/"', before reading any input. This is
+a string whose first character is a slash; as a result, records are
+separated by slashes. Then the input file is read, and the second rule
+in the `awk' program (the action with no pattern) prints each record.
+Since each `print' statement adds a newline at the end of its output,
+the effect of this `awk' program is to copy the input with each slash
+changed to a newline. Here are the results of running the program on
+`BBS-list':
+
+ $ awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
+ -| aardvark 555-5553 1200
+ -| 300 B
+ -| alpo-net 555-3412 2400
+ -| 1200
+ -| 300 A
+ -| barfly 555-7685 1200
+ -| 300 A
+ -| bites 555-1675 2400
+ -| 1200
+ -| 300 A
+ -| camelot 555-0542 300 C
+ -| core 555-2912 1200
+ -| 300 C
+ -| fooey 555-1234 2400
+ -| 1200
+ -| 300 B
+ -| foot 555-6699 1200
+ -| 300 B
+ -| macfoo 555-6480 1200
+ -| 300 A
+ -| sdace 555-3430 2400
+ -| 1200
+ -| 300 A
+ -| sabafoo 555-2127 1200
+ -| 300 C
+ -|
+
+Note that the entry for the `camelot' BBS is not split. In the
+original data file (*note Data Files for the Examples: Sample Data
+Files.), the line looks like this:
+
+ camelot 555-0542 300 C
+
+It only has one baud rate; there are no slashes in the record.
+
+ Another way to change the record separator is on the command line,
+using the variable-assignment feature (*note Other Command Line
+Arguments: Other Arguments.).
+
+ awk '{ print $0 }' RS="/" BBS-list
+
+This sets `RS' to `/' before processing `BBS-list'.
+
+ Using an unusual character such as `/' for the record separator
+produces correct behavior in the vast majority of cases. However, the
+following (extreme) pipeline prints a surprising `1'. There is one
+field, consisting of a newline. The value of the built-in variable
+`NF' is the number of fields in the current record.
+
+ $ echo | awk 'BEGIN { RS = "a" } ; { print NF }'
+ -| 1
+
+Reaching the end of an input file terminates the current input record,
+even if the last character in the file is not the character in `RS'
+(d.c.).
+
+ The empty string, `""' (a string of no characters), has a special
+meaning as the value of `RS': it means that records are separated by
+one or more blank lines, and nothing else. *Note Multiple-Line
+Records: Multiple Line, for more details.
+
+ If you change the value of `RS' in the middle of an `awk' run, the
+new value is used to delimit subsequent records, but the record
+currently being processed (and records already processed) are not
+affected.
+
+ After the end of the record has been determined, `gawk' sets the
+variable `RT' to the text in the input that matched `RS'.
+
+ The value of `RS' is in fact not limited to a one-character string.
+It can be any regular expression (*note Regular Expressions: Regexp.).
+In general, each record ends at the next string that matches the
+regular expression; the next record starts at the end of the matching
+string. This general rule is actually at work in the usual case, where
+`RS' contains just a newline: a record ends at the beginning of the
+next matching string (the next newline in the input) and the following
+record starts just after the end of this string (at the first character
+of the following line). The newline, since it matches `RS', is not
+part of either record.
+
+ When `RS' is a single character, `RT' will contain the same single
+character. However, when `RS' is a regular expression, then `RT'
+becomes more useful; it contains the actual input text that matched the
+regular expression.
+
+ The following example illustrates both of these features. It sets
+`RS' equal to a regular expression that matches either a newline, or a
+series of one or more upper-case letters with optional leading and/or
+trailing white space (*note Regular Expressions: Regexp.).
+
+ $ echo record 1 AAAA record 2 BBBB record 3 |
+ > gawk 'BEGIN { RS = "\n|( *[[:upper:]]+ *)" }
+ > { print "Record =", $0, "and RT =", RT }'
+ -| Record = record 1 and RT = AAAA
+ -| Record = record 2 and RT = BBBB
+ -| Record = record 3 and RT =
+ -|
+
+The final line of output has an extra blank line. This is because the
+value of `RT' is a newline, and then the `print' statement supplies its
+own terminating newline.
+
+ *Note A Simple Stream Editor: Simple Sed, for a more useful example
+of `RS' as a regexp and `RT'.
+
+ The use of `RS' as a regular expression and the `RT' variable are
+`gawk' extensions; they are not available in compatibility mode (*note
+Command Line Options: Options.). In compatibility mode, only the first
+character of the value of `RS' is used to determine the end of the
+record.
+
+ The `awk' utility keeps track of the number of records that have
+been read so far from the current input file. This value is stored in a
+built-in variable called `FNR'. It is reset to zero when a new file is
+started. Another built-in variable, `NR', is the total number of input
+records read so far from all data files. It starts at zero but is
+never automatically reset to zero.
+
+
+File: gawk.info, Node: Fields, Next: Non-Constant Fields, Prev: Records, Up: Reading Files
+
+Examining Fields
+================
+
+ When `awk' reads an input record, the record is automatically
+separated or "parsed" by the interpreter into chunks called "fields".
+By default, fields are separated by whitespace, like words in a line.
+Whitespace in `awk' means any string of one or more spaces, tabs or
+newlines;(1) other characters such as formfeed, and so on, that are
+considered whitespace by other languages are _not_ considered
+whitespace by `awk'.
+
+ The purpose of fields is to make it more convenient for you to refer
+to these pieces of the record. You don't have to use them--you can
+operate on the whole record if you wish--but fields are what make
+simple `awk' programs so powerful.
+
+ To refer to a field in an `awk' program, you use a dollar-sign, `$',
+followed by the number of the field you want. Thus, `$1' refers to the
+first field, `$2' to the second, and so on. For example, suppose the
+following is a line of input:
+
+ This seems like a pretty nice example.
+
+Here the first field, or `$1', is `This'; the second field, or `$2', is
+`seems'; and so on. Note that the last field, `$7', is `example.'.
+Because there is no space between the `e' and the `.', the period is
+considered part of the seventh field.
+
+ `NF' is a built-in variable whose value is the number of fields in
+the current record. `awk' updates the value of `NF' automatically,
+each time a record is read.
+
+ No matter how many fields there are, the last field in a record can
+be represented by `$NF'. So, in the example above, `$NF' would be the
+same as `$7', which is `example.'. Why this works is explained below
+(*note Non-constant Field Numbers: Non-Constant Fields.). If you try
+to reference a field beyond the last one, such as `$8' when the record
+has only seven fields, you get the empty string.
+
+ `$0', which looks like a reference to the "zeroth" field, is a
+special case: it represents the whole input record. `$0' is used when
+you are not interested in fields.
+
+ Here are some more examples:
+
+ $ awk '$1 ~ /foo/ { print $0 }' BBS-list
+ -| fooey 555-1234 2400/1200/300 B
+ -| foot 555-6699 1200/300 B
+ -| macfoo 555-6480 1200/300 A
+ -| sabafoo 555-2127 1200/300 C
+
+This example prints each record in the file `BBS-list' whose first
+field contains the string `foo'. The operator `~' is called a
+"matching operator" (*note How to Use Regular Expressions: Regexp
+Usage.); it tests whether a string (here, the field `$1') matches a
+given regular expression.
+
+ By contrast, the following example looks for `foo' in _the entire
+record_ and prints the first field and the last field for each input
+record containing a match.
+
+ $ awk '/foo/ { print $1, $NF }' BBS-list
+ -| fooey B
+ -| foot B
+ -| macfoo A
+ -| sabafoo C
+
+ ---------- Footnotes ----------
+
+ (1) In POSIX `awk', newlines are not considered whitespace for
+separating fields.
+
+
+File: gawk.info, Node: Non-Constant Fields, Next: Changing Fields, Prev: Fields, Up: Reading Files
+
+Non-constant Field Numbers
+==========================
+
+ The number of a field does not need to be a constant. Any
+expression in the `awk' language can be used after a `$' to refer to a
+field. The value of the expression specifies the field number. If the
+value is a string, rather than a number, it is converted to a number.
+Consider this example:
+
+ awk '{ print $NR }'
+
+Recall that `NR' is the number of records read so far: one in the first
+record, two in the second, etc. So this example prints the first field
+of the first record, the second field of the second record, and so on.
+For the twentieth record, field number 20 is printed; most likely, the
+record has fewer than 20 fields, so this prints a blank line.
+
+ Here is another example of using expressions as field numbers:
+
+ awk '{ print $(2*2) }' BBS-list
+
+ `awk' must evaluate the expression `(2*2)' and use its value as the
+number of the field to print. The `*' sign represents multiplication,
+so the expression `2*2' evaluates to four. The parentheses are used so
+that the multiplication is done before the `$' operation; they are
+necessary whenever there is a binary operator in the field-number
+expression. This example, then, prints the hours of operation (the
+fourth field) for every line of the file `BBS-list'. (All of the `awk'
+operators are listed, in order of decreasing precedence, in *Note
+Operator Precedence (How Operators Nest): Precedence.)
+
+ If the field number you compute is zero, you get the entire record.
+Thus, `$(2-2)' has the same value as `$0'. Negative field numbers are
+not allowed; trying to reference one will usually terminate your
+running `awk' program. (The POSIX standard does not define what
+happens when you reference a negative field number. `gawk' will notice
+this and terminate your program. Other `awk' implementations may
+behave differently.)
+
+ As mentioned in *Note Examining Fields: Fields, the number of fields
+in the current record is stored in the built-in variable `NF' (also
+*note Built-in Variables::). The expression `$NF' is not a special
+feature: it is the direct consequence of evaluating `NF' and using its
+value as a field number.
+
+
+File: gawk.info, Node: Changing Fields, Next: Field Separators, Prev: Non-Constant Fields, Up: Reading Files
+
+Changing the Contents of a Field
+================================
+
+ You can change the contents of a field as seen by `awk' within an
+`awk' program; this changes what `awk' perceives as the current input
+record. (The actual input is untouched; `awk' _never_ modifies the
+input file.)
+
+ Consider this example and its output:
+
+ $ awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped
+ -| 13 3
+ -| 15 5
+ -| 15 5
+ ...
+
+The `-' sign represents subtraction, so this program reassigns field
+three, `$3', to be the value of field two minus ten, `$2 - 10'. (*Note
+Arithmetic Operators: Arithmetic Ops.) Then field two, and the new
+value for field three, are printed.
+
+ In order for this to work, the text in field `$2' must make sense as
+a number; the string of characters must be converted to a number in
+order for the computer to do arithmetic on it. The number resulting
+from the subtraction is converted back to a string of characters which
+then becomes field three. *Note Conversion of Strings and Numbers:
+Conversion.
+
+ When you change the value of a field (as perceived by `awk'), the
+text of the input record is recalculated to contain the new field where
+the old one was. Therefore, `$0' changes to reflect the altered field.
+Thus, this program prints a copy of the input file, with 10 subtracted
+from the second field of each line.
+
+ $ awk '{ $2 = $2 - 10; print $0 }' inventory-shipped
+ -| Jan 3 25 15 115
+ -| Feb 5 32 24 226
+ -| Mar 5 24 34 228
+ ...
+
+ You can also assign contents to fields that are out of range. For
+example:
+
+ $ awk '{ $6 = ($5 + $4 + $3 + $2)
+ > print $6 }' inventory-shipped
+ -| 168
+ -| 297
+ -| 301
+ ...
+
+We've just created `$6', whose value is the sum of fields `$2', `$3',
+`$4', and `$5'. The `+' sign represents addition. For the file
+`inventory-shipped', `$6' represents the total number of parcels
+shipped for a particular month.
+
+ Creating a new field changes `awk''s internal copy of the current
+input record--the value of `$0'. Thus, if you do `print $0' after
+adding a field, the record printed includes the new field, with the
+appropriate number of field separators between it and the previously
+existing fields.
+
+ This recomputation affects and is affected by `NF' (the number of
+fields; *note Examining Fields: Fields.), and by a feature that has not
+been discussed yet, the "output field separator", `OFS', which is used
+to separate the fields (*note Output Separators::). For example, the
+value of `NF' is set to the number of the highest field you create.
+
+ Note, however, that merely _referencing_ an out-of-range field does
+_not_ change the value of either `$0' or `NF'. Referencing an
+out-of-range field only produces an empty string. For example:
+
+ if ($(NF+1) != "")
+ print "can't happen"
+ else
+ print "everything is normal"
+
+should print `everything is normal', because `NF+1' is certain to be
+out of range. (*Note The `if'-`else' Statement: If Statement, for more
+information about `awk''s `if-else' statements. *Note Variable Typing
+and Comparison Expressions: Typing and Comparison, for more information
+about the `!=' operator.)
+
+ It is important to note that making an assignment to an existing
+field will change the value of `$0', but will not change the value of
+`NF', even when you assign the empty string to a field. For example:
+
+ $ echo a b c d | awk '{ OFS = ":"; $2 = ""
+ > print $0; print NF }'
+ -| a::c:d
+ -| 4
+
+The field is still there; it just has an empty value. You can tell
+because there are two colons in a row.
+
+ This example shows what happens if you create a new field.
+
+ $ echo a b c d | awk '{ OFS = ":"; $2 = ""; $6 = "new"
+ > print $0; print NF }'
+ -| a::c:d::new
+ -| 6
+
+The intervening field, `$5' is created with an empty value (indicated
+by the second pair of adjacent colons), and `NF' is updated with the
+value six.
+
+ Finally, decrementing `NF' will lose the values of the fields after
+the new value of `NF', and `$0' will be recomputed. Here is an example:
+
+ $ echo a b c d e f | ../gawk '{ print "NF =", NF;
+ > NF = 3; print $0 }'
+ -| NF = 6
+ -| a b c
+
+
+File: gawk.info, Node: Field Separators, Next: Constant Size, Prev: Changing Fields, Up: Reading Files
+
+Specifying How Fields are Separated
+===================================
+
+ This section is rather long; it describes one of the most fundamental
+operations in `awk'.
+
+* Menu:
+
+* Basic Field Splitting:: How fields are split with single characters
+ or simple strings.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate field.
+* Command Line Field Separator:: Setting `FS' from the command line.
+* Field Splitting Summary:: Some final points and a summary table.
+
+
+File: gawk.info, Node: Basic Field Splitting, Next: Regexp Field Splitting, Prev: Field Separators, Up: Field Separators
+
+The Basics of Field Separating
+------------------------------
+
+ The "field separator", which is either a single character or a
+regular expression, controls the way `awk' splits an input record into
+fields. `awk' scans the input record for character sequences that
+match the separator; the fields themselves are the text between the
+matches.
+
+ In the examples below, we use the bullet symbol "*" to represent
+spaces in the output.
+
+ If the field separator is `oo', then the following line:
+
+ moo goo gai pan
+
+would be split into three fields: `m', `*g' and `*gai*pan'. Note the
+leading spaces in the values of the second and third fields.
+
+ The field separator is represented by the built-in variable `FS'.
+Shell programmers take note! `awk' does _not_ use the name `IFS' which
+is used by the POSIX compatible shells (such as the Bourne shell, `sh',
+or the GNU Bourne-Again Shell, Bash).
+
+ You can change the value of `FS' in the `awk' program with the
+assignment operator, `=' (*note Assignment Expressions: Assignment
+Ops.). Often the right time to do this is at the beginning of
+execution, before any input has been processed, so that the very first
+record will be read with the proper separator. To do this, use the
+special `BEGIN' pattern (*note The `BEGIN' and `END' Special Patterns:
+BEGIN/END.). For example, here we set the value of `FS' to the string
+`","':
+
+ awk 'BEGIN { FS = "," } ; { print $2 }'
+
+Given the input line,
+
+ John Q. Smith, 29 Oak St., Walamazoo, MI 42139
+
+this `awk' program extracts and prints the string `*29*Oak*St.'.
+
+ Sometimes your input data will contain separator characters that
+don't separate fields the way you thought they would. For instance, the
+person's name in the example we just used might have a title or suffix
+attached, such as `John Q. Smith, LXIX'. From input containing such a
+name:
+
+ John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
+
+the above program would extract `*LXIX', instead of `*29*Oak*St.'. If
+you were expecting the program to print the address, you would be
+surprised. The moral is: choose your data layout and separator
+characters carefully to prevent such problems.
+
+ Normally, fields are separated by whitespace sequences (spaces, tabs
+and newlines), not by single spaces: two spaces in a row do not delimit
+an empty field. The default value of the field separator `FS' is a
+string containing a single space, `" "'. If this value were
+interpreted in the usual way, each space character would separate
+fields, so two spaces in a row would make an empty field between them.
+The reason this does not happen is that a single space as the value of
+`FS' is a special case: it is taken to specify the default manner of
+delimiting fields.
+
+ If `FS' is any other single character, such as `","', then each
+occurrence of that character separates two fields. Two consecutive
+occurrences delimit an empty field. If the character occurs at the
+beginning or the end of the line, that too delimits an empty field. The
+space character is the only single character which does not follow these
+rules.
+
+
+File: gawk.info, Node: Regexp Field Splitting, Next: Single Character Fields, Prev: Basic Field Splitting, Up: Field Separators
+
+Using Regular Expressions to Separate Fields
+--------------------------------------------
+
+ The previous node discussed the use of single characters or simple
+strings as the value of `FS'. More generally, the value of `FS' may be
+a string containing any regular expression. In this case, each match
+in the record for the regular expression separates fields. For
+example, the assignment:
+
+ FS = ", \t"
+
+makes every area of an input line that consists of a comma followed by a
+space and a tab, into a field separator. (`\t' is an "escape sequence"
+that stands for a tab; *note Escape Sequences::, for the complete list
+of similar escape sequences.)
+
+ For a less trivial example of a regular expression, suppose you want
+single spaces to separate fields the way single commas were used above.
+You can set `FS' to `"[ ]"' (left bracket, space, right bracket). This
+regular expression matches a single space and nothing else (*note
+Regular Expressions: Regexp.).
+
+ There is an important difference between the two cases of `FS = " "'
+(a single space) and `FS = "[ \t\n]+"' (left bracket, space, backslash,
+"t", backslash, "n", right bracket, which is a regular expression
+matching one or more spaces, tabs, or newlines). For both values of
+`FS', fields are separated by runs of spaces, tabs and/or newlines.
+However, when the value of `FS' is `" "', `awk' will first strip
+leading and trailing whitespace from the record, and then decide where
+the fields are.
+
+ For example, the following pipeline prints `b':
+
+ $ echo ' a b c d ' | awk '{ print $2 }'
+ -| b
+
+However, this pipeline prints `a' (note the extra spaces around each
+letter):
+
+ $ echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" }
+ > { print $2 }'
+ -| a
+
+In this case, the first field is "null", or empty.
+
+ The stripping of leading and trailing whitespace also comes into
+play whenever `$0' is recomputed. For instance, study this pipeline:
+
+ $ echo ' a b c d' | awk '{ print; $2 = $2; print }'
+ -| a b c d
+ -| a b c d
+
+The first `print' statement prints the record as it was read, with
+leading whitespace intact. The assignment to `$2' rebuilds `$0' by
+concatenating `$1' through `$NF' together, separated by the value of
+`OFS'. Since the leading whitespace was ignored when finding `$1', it
+is not part of the new `$0'. Finally, the last `print' statement
+prints the new `$0'.
+
+
+File: gawk.info, Node: Single Character Fields, Next: Command Line Field Separator, Prev: Regexp Field Splitting, Up: Field Separators
+
+Making Each Character a Separate Field
+--------------------------------------
+
+ There are times when you may want to examine each character of a
+record separately. In `gawk', this is easy to do, you simply assign
+the null string (`""') to `FS'. In this case, each individual character
+in the record will become a separate field. Here is an example:
+
+ echo a b | gawk 'BEGIN { FS = "" }
+ {
+ for (i = 1; i <= NF; i = i + 1)
+ print "Field", i, "is", $i
+ }'
+
+The output from this is:
+
+ Field 1 is a
+ Field 2 is
+ Field 3 is b
+
+ Traditionally, the behavior for `FS' equal to `""' was not defined.
+In this case, Unix `awk' would simply treat the entire record as only
+having one field (d.c.). In compatibility mode (*note Command Line
+Options: Options.), if `FS' is the null string, then `gawk' will also
+behave this way.
+
+
+File: gawk.info, Node: Command Line Field Separator, Next: Field Splitting Summary, Prev: Single Character Fields, Up: Field Separators
+
+Setting `FS' from the Command Line
+----------------------------------
+
+ `FS' can be set on the command line. You use the `-F' option to do
+so. For example:
+
+ awk -F, 'PROGRAM' INPUT-FILES
+
+sets `FS' to be the `,' character. Notice that the option uses a
+capital `F'. Contrast this with `-f', which specifies a file
+containing an `awk' program. Case is significant in command line
+options: the `-F' and `-f' options have nothing to do with each other.
+You can use both options at the same time to set the `FS' variable
+_and_ get an `awk' program from a file.
+
+ The value used for the argument to `-F' is processed in exactly the
+same way as assignments to the built-in variable `FS'. This means that
+if the field separator contains special characters, they must be escaped
+appropriately. For example, to use a `\' as the field separator, you
+would have to type:
+
+ # same as FS = "\\"
+ awk -F\\\\ '...' files ...
+
+Since `\' is used for quoting in the shell, `awk' will see `-F\\'.
+Then `awk' processes the `\\' for escape characters (*note Escape
+Sequences::), finally yielding a single `\' to be used for the field
+separator.
+
+ As a special case, in compatibility mode (*note Command Line
+Options: Options.), if the argument to `-F' is `t', then `FS' is set to
+the tab character. This is because if you type `-F\t' at the shell,
+without any quotes, the `\' gets deleted, so `awk' figures that you
+really want your fields to be separated with tabs, and not `t's. Use
+`-v FS="t"' on the command line if you really do want to separate your
+fields with `t's (*note Command Line Options: Options.).
+
+ For example, let's use an `awk' program file called `baud.awk' that
+contains the pattern `/300/', and the action `print $1'. Here is the
+program:
+
+ /300/ { print $1 }
+
+ Let's also set `FS' to be the `-' character, and run the program on
+the file `BBS-list'. The following command prints a list of the names
+of the bulletin boards that operate at 300 baud and the first three
+digits of their phone numbers:
+
+ $ awk -F- -f baud.awk BBS-list
+ -| aardvark 555
+ -| alpo
+ -| barfly 555
+ ...
+
+Note the second line of output. In the original file (*note Data Files
+for the Examples: Sample Data Files.), the second line looked like this:
+
+ alpo-net 555-3412 2400/1200/300 A
+
+ The `-' as part of the system's name was used as the field
+separator, instead of the `-' in the phone number that was originally
+intended. This demonstrates why you have to be careful in choosing
+your field and record separators.
+
+ On many Unix systems, each user has a separate entry in the system
+password file, one line per user. The information in these lines is
+separated by colons. The first field is the user's logon name, and the
+second is the user's encrypted password. A password file entry might
+look like this:
+
+ arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
+
+ The following program searches the system password file, and prints
+the entries for users who have no password:
+
+ awk -F: '$2 == ""' /etc/passwd
+
+
+File: gawk.info, Node: Field Splitting Summary, Prev: Command Line Field Separator, Up: Field Separators
+
+Field Splitting Summary
+-----------------------
+
+ According to the POSIX standard, `awk' is supposed to behave as if
+each record is split into fields at the time that it is read. In
+particular, this means that you can change the value of `FS' after a
+record is read, and the value of the fields (i.e. how they were split)
+should reflect the old value of `FS', not the new one.
+
+ However, many implementations of `awk' do not work this way.
+Instead, they defer splitting the fields until a field is actually
+referenced. The fields will be split using the _current_ value of
+`FS'! (d.c.) This behavior can be difficult to diagnose. The following
+example illustrates the difference between the two methods. (The
+`sed'(1) command prints just the first line of `/etc/passwd'.)
+
+ sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'
+
+will usually print
+
+ root
+
+on an incorrect implementation of `awk', while `gawk' will print
+something like
+
+ root:nSijPlPhZZwgE:0:0:Root:/:
+
+ The following table summarizes how fields are split, based on the
+value of `FS'. (`==' means "is equal to.")
+
+`FS == " "'
+ Fields are separated by runs of whitespace. Leading and trailing
+ whitespace are ignored. This is the default.
+
+`FS == ANY OTHER SINGLE CHARACTER'
+ Fields are separated by each occurrence of the character. Multiple
+ successive occurrences delimit empty fields, as do leading and
+ trailing occurrences. The character can even be a regexp
+ metacharacter; it does not need to be escaped.
+
+`FS == REGEXP'
+ Fields are separated by occurrences of characters that match
+ REGEXP. Leading and trailing matches of REGEXP delimit empty
+ fields.
+
+`FS == ""'
+ Each individual character in the record becomes a separate field.
+
+ ---------- Footnotes ----------
+
+ (1) The `sed' utility is a "stream editor." Its behavior is also
+defined by the POSIX standard.
+
+
+File: gawk.info, Node: Constant Size, Next: Multiple Line, Prev: Field Separators, Up: Reading Files
+
+Reading Fixed-width Data
+========================
+
+ (This section discusses an advanced, experimental feature. If you
+are a novice `awk' user, you may wish to skip it on the first reading.)
+
+ `gawk' version 2.13 introduced a new facility for dealing with
+fixed-width fields with no distinctive field separator. Data of this
+nature arises, for example, in the input for old FORTRAN programs where
+numbers are run together; or in the output of programs that did not
+anticipate the use of their output as input for other programs.
+
+ An example of the latter is a table where all the columns are lined
+up by the use of a variable number of spaces and _empty fields are just
+spaces_. Clearly, `awk''s normal field splitting based on `FS' will
+not work well in this case. Although a portable `awk' program can use
+a series of `substr' calls on `$0' (*note Built-in Functions for String
+Manipulation: String Functions.), this is awkward and inefficient for a
+large number of fields.
+
+ The splitting of an input record into fixed-width fields is
+specified by assigning a string containing space-separated numbers to
+the built-in variable `FIELDWIDTHS'. Each number specifies the width
+of the field _including_ columns between fields. If you want to ignore
+the columns between fields, you can specify the width as a separate
+field that is subsequently ignored.
+
+ The following data is the output of the Unix `w' utility. It is
+useful to illustrate the use of `FIELDWIDTHS'.
+
+ 10:06pm up 21 days, 14:04, 23 users
+ User tty login idle JCPU PCPU what
+ hzuo ttyV0 8:58pm 9 5 vi p24.tex
+ hzang ttyV3 6:37pm 50 -csh
+ eklye ttyV5 9:53pm 7 1 em thes.tex
+ dportein ttyV6 8:17pm 1:47 -csh
+ gierd ttyD3 10:00pm 1 elm
+ dave ttyD4 9:47pm 4 4 w
+ brent ttyp0 26Jun91 4:46 26:46 4:41 bash
+ dave ttyq4 26Jun9115days 46 46 wnewmail
+
+ The following program takes the above input, converts the idle time
+to number of seconds and prints out the first two fields and the
+calculated idle time. (This program uses a number of `awk' features
+that haven't been introduced yet.)
+
+ BEGIN { FIELDWIDTHS = "9 6 10 6 7 7 35" }
+ NR > 2 {
+ idle = $4
+ sub(/^ */, "", idle) # strip leading spaces
+ if (idle == "")
+ idle = 0
+ if (idle ~ /:/) {
+ split(idle, t, ":")
+ idle = t[1] * 60 + t[2]
+ }
+ if (idle ~ /days/)
+ idle *= 24 * 60 * 60
+
+ print $1, $2, idle
+ }
+
+ Here is the result of running the program on the data:
+
+ hzuo ttyV0 0
+ hzang ttyV3 50
+ eklye ttyV5 0
+ dportein ttyV6 107
+ gierd ttyD3 1
+ dave ttyD4 0
+ brent ttyp0 286
+ dave ttyq4 1296000
+
+ Another (possibly more practical) example of fixed-width input data
+would be the input from a deck of balloting cards. In some parts of
+the United States, voters mark their choices by punching holes in
+computer cards. These cards are then processed to count the votes for
+any particular candidate or on any particular issue. Since a voter may
+choose not to vote on some issue, any column on the card may be empty.
+An `awk' program for processing such data could use the `FIELDWIDTHS'
+feature to simplify reading the data. (Of course, getting `gawk' to
+run on a system with card readers is another story!)
+
+ Assigning a value to `FS' causes `gawk' to return to using `FS' for
+field splitting. Use `FS = FS' to make this happen, without having to
+know the current value of `FS'.
+
+ This feature is still experimental, and may evolve over time. Note
+that in particular, `gawk' does not attempt to verify the sanity of the
+values used in the value of `FIELDWIDTHS'.
+
+
+File: gawk.info, Node: Multiple Line, Next: Getline, Prev: Constant Size, Up: Reading Files
+
+Multiple-Line Records
+=====================
+
+ In some data bases, a single line cannot conveniently hold all the
+information in one entry. In such cases, you can use multi-line
+records.
+
+ The first step in doing this is to choose your data format: when
+records are not defined as single lines, how do you want to define them?
+What should separate records?
+
+ One technique is to use an unusual character or string to separate
+records. For example, you could use the formfeed character (written
+`\f' in `awk', as in C) to separate them, making each record a page of
+the file. To do this, just set the variable `RS' to `"\f"' (a string
+containing the formfeed character). Any other character could equally
+well be used, as long as it won't be part of the data in a record.
+
+ Another technique is to have blank lines separate records. By a
+special dispensation, an empty string as the value of `RS' indicates
+that records are separated by one or more blank lines. If you set `RS'
+to the empty string, a record always ends at the first blank line
+encountered. And the next record doesn't start until the first
+non-blank line that follows--no matter how many blank lines appear in a
+row, they are considered one record-separator.
+
+ You can achieve the same effect as `RS = ""' by assigning the string
+`"\n\n+"' to `RS'. This regexp matches the newline at the end of the
+record, and one or more blank lines after the record. In addition, a
+regular expression always matches the longest possible sequence when
+there is a choice (*note How Much Text Matches?: Leftmost Longest.) So
+the next record doesn't start until the first non-blank line that
+follows--no matter how many blank lines appear in a row, they are
+considered one record-separator.
+
+ There is an important difference between `RS = ""' and `RS =
+"\n\n+"'. In the first case, leading newlines in the input data file
+are ignored, and if a file ends without extra blank lines after the
+last record, the final newline is removed from the record. In the
+second case, this special processing is not done (d.c.).
+
+ Now that the input is separated into records, the second step is to
+separate the fields in the record. One way to do this is to divide each
+of the lines into fields in the normal manner. This happens by default
+as the result of a special feature: when `RS' is set to the empty
+string, the newline character _always_ acts as a field separator. This
+is in addition to whatever field separations result from `FS'.
+
+ The original motivation for this special exception was probably to
+provide useful behavior in the default case (i.e. `FS' is equal to
+`" "'). This feature can be a problem if you really don't want the
+newline character to separate fields, since there is no way to prevent
+it. However, you can work around this by using the `split' function to
+break up the record manually (*note Built-in Functions for String
+Manipulation: String Functions.).
+
+ Another way to separate fields is to put each field on a separate
+line: to do this, just set the variable `FS' to the string `"\n"'.
+(This simple regular expression matches a single newline.)
+
+ A practical example of a data file organized this way might be a
+mailing list, where each entry is separated by blank lines. If we have
+a mailing list in a file named `addresses', that looks like this:
+
+ Jane Doe
+ 123 Main Street
+ Anywhere, SE 12345-6789
+
+ John Smith
+ 456 Tree-lined Avenue
+ Smallville, MW 98765-4321
+
+ ...
+
+A simple program to process this file would look like this:
+
+ # addrs.awk --- simple mailing list program
+
+ # Records are separated by blank lines.
+ # Each line is one field.
+ BEGIN { RS = "" ; FS = "\n" }
+
+ {
+ print "Name is:", $1
+ print "Address is:", $2
+ print "City and State are:", $3
+ print ""
+ }
+
+ Running the program produces the following output:
+
+ $ awk -f addrs.awk addresses
+ -| Name is: Jane Doe
+ -| Address is: 123 Main Street
+ -| City and State are: Anywhere, SE 12345-6789
+ -|
+ -| Name is: John Smith
+ -| Address is: 456 Tree-lined Avenue
+ -| City and State are: Smallville, MW 98765-4321
+ -|
+ ...
+
+ *Note Printing Mailing Labels: Labels Program, for a more realistic
+program that deals with address lists.
+
+ The following table summarizes how records are split, based on the
+value of `RS'. (`==' means "is equal to.")
+
+`RS == "\n"'
+ Records are separated by the newline character (`\n'). In effect,
+ every line in the data file is a separate record, including blank
+ lines. This is the default.
+
+`RS == ANY SINGLE CHARACTER'
+ Records are separated by each occurrence of the character.
+ Multiple successive occurrences delimit empty records.
+
+`RS == ""'
+ Records are separated by runs of blank lines. The newline
+ character always serves as a field separator, in addition to
+ whatever value `FS' may have. Leading and trailing newlines in a
+ file are ignored.
+
+`RS == REGEXP'
+ Records are separated by occurrences of characters that match
+ REGEXP. Leading and trailing matches of REGEXP delimit empty
+ records.
+
+ In all cases, `gawk' sets `RT' to the input text that matched the
+value specified by `RS'.
+
+
+File: gawk.info, Node: Getline, Prev: Multiple Line, Up: Reading Files
+
+Explicit Input with `getline'
+=============================
+
+ So far we have been getting our input data from `awk''s main input
+stream--either the standard input (usually your terminal, sometimes the
+output from another program) or from the files specified on the command
+line. The `awk' language has a special built-in command called
+`getline' that can be used to read input under your explicit control.
+
+* Menu:
+
+* Getline Intro:: Introduction to the `getline' function.
+* Plain Getline:: Using `getline' with no arguments.
+* Getline/Variable:: Using `getline' into a variable.
+* Getline/File:: Using `getline' from a file.
+* Getline/Variable/File:: Using `getline' into a variable from a
+ file.
+* Getline/Pipe:: Using `getline' from a pipe.
+* Getline/Variable/Pipe:: Using `getline' into a variable from a
+ pipe.
+* Getline Summary:: Summary Of `getline' Variants.
+
+
+File: gawk.info, Node: Getline Intro, Next: Plain Getline, Prev: Getline, Up: Getline
+
+Introduction to `getline'
+-------------------------
+
+ This command is used in several different ways, and should _not_ be
+used by beginners. It is covered here because this is the chapter on
+input. The examples that follow the explanation of the `getline'
+command include material that has not been covered yet. Therefore,
+come back and study the `getline' command _after_ you have reviewed the
+rest of this Info file and have a good knowledge of how `awk' works.
+
+ `getline' returns one if it finds a record, and zero if the end of
+the file is encountered. If there is some error in getting a record,
+such as a file that cannot be opened, then `getline' returns -1. In
+this case, `gawk' sets the variable `ERRNO' to a string describing the
+error that occurred.
+
+ In the following examples, COMMAND stands for a string value that
+represents a shell command.
+
+
+File: gawk.info, Node: Plain Getline, Next: Getline/Variable, Prev: Getline Intro, Up: Getline
+
+Using `getline' with No Arguments
+---------------------------------
+
+ The `getline' command can be used without arguments to read input
+from the current input file. All it does in this case is read the next
+input record and split it up into fields. This is useful if you've
+finished processing the current record, but you want to do some special
+processing _right now_ on the next record. Here's an example:
+
+ awk '{
+ if ((t = index($0, "/*")) != 0) {
+ # value will be "" if t is 1
+ tmp = substr($0, 1, t - 1)
+ u = index(substr($0, t + 2), "*/")
+ while (u == 0) {
+ if (getline <= 0) {
+ m = "unexpected EOF or error"
+ m = (m ": " ERRNO)
+ print m > "/dev/stderr"
+ exit
+ }
+ t = -1
+ u = index($0, "*/")
+ }
+ # substr expression will be "" if */
+ # occurred at end of line
+ $0 = tmp substr($0, t + u + 3)
+ }
+ print $0
+ }'
+
+ This `awk' program deletes all C-style comments, `/* ... */', from
+the input. By replacing the `print $0' with other statements, you
+could perform more complicated processing on the decommented input,
+like searching for matches of a regular expression. This program has a
+subtle problem--it does not work if one comment ends and another begins
+on the same line.
+
+ This form of the `getline' command sets `NF' (the number of fields;
+*note Examining Fields: Fields.), `NR' (the number of records read so
+far; *note How Input is Split into Records: Records.), `FNR' (the
+number of records read from this input file), and the value of `$0'.
+
+ *Note:* the new value of `$0' is used in testing the patterns of any
+subsequent rules. The original value of `$0' that triggered the rule
+which executed `getline' is lost (d.c.). By contrast, the `next'
+statement reads a new record but immediately begins processing it
+normally, starting with the first rule in the program. *Note The
+`next' Statement: Next Statement.
+
+
+File: gawk.info, Node: Getline/Variable, Next: Getline/File, Prev: Plain Getline, Up: Getline
+
+Using `getline' Into a Variable
+-------------------------------
+
+ You can use `getline VAR' to read the next record from `awk''s input
+into the variable VAR. No other processing is done.
+
+ For example, suppose the next line is a comment, or a special string,
+and you want to read it, without triggering any rules. This form of
+`getline' allows you to read that line and store it in a variable so
+that the main read-a-line-and-check-each-rule loop of `awk' never sees
+it.
+
+ The following example swaps every two lines of input. For example,
+given:
+
+ wan
+ tew
+ free
+ phore
+
+it outputs:
+
+ tew
+ wan
+ phore
+ free
+
+Here's the program:
+
+ awk '{
+ if ((getline tmp) > 0) {
+ print tmp
+ print $0
+ } else
+ print $0
+ }'
+
+ The `getline' command used in this way sets only the variables `NR'
+and `FNR' (and of course, VAR). The record is not split into fields,
+so the values of the fields (including `$0') and the value of `NF' do
+not change.
+
+
+File: gawk.info, Node: Getline/File, Next: Getline/Variable/File, Prev: Getline/Variable, Up: Getline
+
+Using `getline' from a File
+---------------------------
+
+ Use `getline < FILE' to read the next record from the file FILE.
+Here FILE is a string-valued expression that specifies the file name.
+`< FILE' is called a "redirection" since it directs input to come from
+a different place.
+
+ For example, the following program reads its input record from the
+file `secondary.input' when it encounters a first field with a value
+equal to 10 in the current input file.
+
+ awk '{
+ if ($1 == 10) {
+ getline < "secondary.input"
+ print
+ } else
+ print
+ }'
+
+ Since the main input stream is not used, the values of `NR' and
+`FNR' are not changed. But the record read is split into fields in the
+normal manner, so the values of `$0' and other fields are changed. So
+is the value of `NF'.
+
+ According to POSIX, `getline < EXPRESSION' is ambiguous if
+EXPRESSION contains unparenthesized operators other than `$'; for
+example, `getline < dir "/" file' is ambiguous because the
+concatenation operator is not parenthesized, and you should write it as
+`getline < (dir "/" file)' if you want your program to be portable to
+other `awk' implementations.
+
+
+File: gawk.info, Node: Getline/Variable/File, Next: Getline/Pipe, Prev: Getline/File, Up: Getline
+
+Using `getline' Into a Variable from a File
+-------------------------------------------
+
+ Use `getline VAR < FILE' to read input the file FILE and put it in
+the variable VAR. As above, FILE is a string-valued expression that
+specifies the file from which to read.
+
+ In this version of `getline', none of the built-in variables are
+changed, and the record is not split into fields. The only variable
+changed is VAR.
+
+ According to POSIX, `getline VAR < EXPRESSION' is ambiguous if
+EXPRESSION contains unparenthesized operators other than `$'; for
+example, `getline < dir "/" file' is ambiguous because the
+concatenation operator is not parenthesized, and you should write it as
+`getline < (dir "/" file)' if you want your program to be portable to
+other `awk' implementations.
+
+ For example, the following program copies all the input files to the
+output, except for records that say `@include FILENAME'. Such a record
+is replaced by the contents of the file FILENAME.
+
+ awk '{
+ if (NF == 2 && $1 == "@include") {
+ while ((getline line < $2) > 0)
+ print line
+ close($2)
+ } else
+ print
+ }'
+
+ Note here how the name of the extra input file is not built into the
+program; it is taken directly from the data, from the second field on
+the `@include' line.
+
+ The `close' function is called to ensure that if two identical
+`@include' lines appear in the input, the entire specified file is
+included twice. *Note Closing Input and Output Files and Pipes: Close
+Files And Pipes.
+
+ One deficiency of this program is that it does not process nested
+`@include' statements (`@include' statements in included files) the way
+a true macro preprocessor would. *Note An Easy Way to Use Library
+Functions: Igawk Program, for a program that does handle nested
+`@include' statements.
+
+
+File: gawk.info, Node: Getline/Pipe, Next: Getline/Variable/Pipe, Prev: Getline/Variable/File, Up: Getline
+
+Using `getline' from a Pipe
+---------------------------
+
+ You can pipe the output of a command into `getline', using `COMMAND
+| getline'. In this case, the string COMMAND is run as a shell command
+and its output is piped into `awk' to be used as input. This form of
+`getline' reads one record at a time from the pipe.
+
+ For example, the following program copies its input to its output,
+except for lines that begin with `@execute', which are replaced by the
+output produced by running the rest of the line as a shell command:
+
+ awk '{
+ if ($1 == "@execute") {
+ tmp = substr($0, 10)
+ while ((tmp | getline) > 0)
+ print
+ close(tmp)
+ } else
+ print
+ }'
+
+The `close' function is called to ensure that if two identical
+`@execute' lines appear in the input, the command is run for each one.
+*Note Closing Input and Output Files and Pipes: Close Files And Pipes.
+
+ Given the input:
+
+ foo
+ bar
+ baz
+ @execute who
+ bletch
+
+the program might produce:
+
+ foo
+ bar
+ baz
+ arnold ttyv0 Jul 13 14:22
+ miriam ttyp0 Jul 13 14:23 (murphy:0)
+ bill ttyp1 Jul 13 14:23 (murphy:0)
+ bletch
+
+Notice that this program ran the command `who' and printed the result.
+(If you try this program yourself, you will of course get different
+results, showing you who is logged in on your system.)
+
+ This variation of `getline' splits the record into fields, sets the
+value of `NF' and recomputes the value of `$0'. The values of `NR' and
+`FNR' are not changed.
+
+ According to POSIX, `EXPRESSION | getline' is ambiguous if
+EXPRESSION contains unparenthesized operators other than `$'; for
+example, `"echo " "date" | getline' is ambiguous because the
+concatenation operator is not parenthesized, and you should write it as
+`("echo " "date") | getline' if you want your program to be portable to
+other `awk' implementations.
+
+
+File: gawk.info, Node: Getline/Variable/Pipe, Next: Getline Summary, Prev: Getline/Pipe, Up: Getline
+
+Using `getline' Into a Variable from a Pipe
+-------------------------------------------
+
+ When you use `COMMAND | getline VAR', the output of the command
+COMMAND is sent through a pipe to `getline' and into the variable VAR.
+For example, the following program reads the current date and time into
+the variable `current_time', using the `date' utility, and then prints
+it.
+
+ awk 'BEGIN {
+ "date" | getline current_time
+ close("date")
+ print "Report printed on " current_time
+ }'
+
+ In this version of `getline', none of the built-in variables are
+changed, and the record is not split into fields.
+
+ According to POSIX, `EXPRESSION | getline VAR' is ambiguous if
+EXPRESSION contains unparenthesized operators other than `$'; for
+example, `"echo " "date" | getline VAR' is ambiguous because the
+concatenation operator is not parenthesized, and you should write it as
+`("echo " "date") | getline VAR' if you want your program to be
+portable to other `awk' implementations.
+
+
+File: gawk.info, Node: Getline Summary, Prev: Getline/Variable/Pipe, Up: Getline
+
+Summary of `getline' Variants
+-----------------------------
+
+ With all the forms of `getline', even though `$0' and `NF', may be
+updated, the record will not be tested against all the patterns in the
+`awk' program, in the way that would happen if the record were read
+normally by the main processing loop of `awk'. However the new record
+is tested against any subsequent rules.
+
+ Many `awk' implementations limit the number of pipelines an `awk'
+program may have open to just one! In `gawk', there is no such limit.
+You can open as many pipelines as the underlying operating system will
+permit.
+
+ An interesting side-effect occurs if you use `getline' (without a
+redirection) inside a `BEGIN' rule. Since an unredirected `getline'
+reads from the command line data files, the first `getline' command
+causes `awk' to set the value of `FILENAME'. Normally, `FILENAME' does
+not have a value inside `BEGIN' rules, since you have not yet started
+to process the command line data files (d.c.). (*Note The `BEGIN' and
+`END' Special Patterns: BEGIN/END, also *note Built-in Variables that
+Convey Information: Auto-set..)
+
+ The following table summarizes the six variants of `getline',
+listing which built-in variables are set by each one.
+
+`getline'
+ sets `$0', `NF', `FNR', and `NR'.
+
+`getline VAR'
+ sets VAR, `FNR', and `NR'.
+
+`getline < FILE'
+ sets `$0', and `NF'.
+
+`getline VAR < FILE'
+ sets VAR.
+
+`COMMAND | getline'
+ sets `$0', and `NF'.
+
+`COMMAND | getline VAR'
+ sets VAR.
+
+
+File: gawk.info, Node: Printing, Next: Expressions, Prev: Reading Files, Up: Top
+
+Printing Output
+***************
+
+ One of the most common actions is to "print", or output, some or all
+of the input. You use the `print' statement for simple output. You
+use the `printf' statement for fancier formatting. Both are described
+in this chapter.
+
+* Menu:
+
+* Print:: The `print' statement.
+* Print Examples:: Simple examples of `print' statements.
+* Output Separators:: The output separators and how to change them.
+* OFMT:: Controlling Numeric Output With `print'.
+* Printf:: The `printf' statement.
+* Redirection:: How to redirect output to multiple files and
+ pipes.
+* Special Files:: File name interpretation in `gawk'.
+ `gawk' allows access to inherited file
+ descriptors.
+* Close Files And Pipes:: Closing Input and Output Files and Pipes.
+
+
+File: gawk.info, Node: Print, Next: Print Examples, Prev: Printing, Up: Printing
+
+The `print' Statement
+=====================
+
+ The `print' statement does output with simple, standardized
+formatting. You specify only the strings or numbers to be printed, in a
+list separated by commas. They are output, separated by single spaces,
+followed by a newline. The statement looks like this:
+
+ print ITEM1, ITEM2, ...
+
+The entire list of items may optionally be enclosed in parentheses. The
+parentheses are necessary if any of the item expressions uses the `>'
+relational operator; otherwise it could be confused with a redirection
+(*note Redirecting Output of `print' and `printf': Redirection.).
+
+ The items to be printed can be constant strings or numbers, fields
+of the current record (such as `$1'), variables, or any `awk'
+expressions. Numeric values are converted to strings, and then printed.
+
+ The `print' statement is completely general for computing _what_
+values to print. However, with two exceptions, you cannot specify _how_
+to print them--how many columns, whether to use exponential notation or
+not, and so on. (For the exceptions, *note Output Separators::, and
+*Note Controlling Numeric Output with `print': OFMT.) For that, you
+need the `printf' statement (*note Using `printf' Statements for
+Fancier Printing: Printf.).
+
+ The simple statement `print' with no items is equivalent to `print
+$0': it prints the entire current record. To print a blank line, use
+`print ""', where `""' is the empty string.
+
+ To print a fixed piece of text, use a string constant such as
+`"Don't Panic"' as one item. If you forget to use the double-quote
+characters, your text will be taken as an `awk' expression, and you
+will probably get an error. Keep in mind that a space is printed
+between any two items.
+
+ Each `print' statement makes at least one line of output. But it
+isn't limited to one line. If an item value is a string that contains a
+newline, the newline is output along with the rest of the string. A
+single `print' can make any number of lines this way.
+
+
+File: gawk.info, Node: Print Examples, Next: Output Separators, Prev: Print, Up: Printing
+
+Examples of `print' Statements
+==============================
+
+ Here is an example of printing a string that contains embedded
+newlines (the `\n' is an escape sequence, used to represent the newline
+character; see *Note Escape Sequences::):
+
+ $ awk 'BEGIN { print "line one\nline two\nline three" }'
+ -| line one
+ -| line two
+ -| line three
+
+ Here is an example that prints the first two fields of each input
+record, with a space between them:
+
+ $ awk '{ print $1, $2 }' inventory-shipped
+ -| Jan 13
+ -| Feb 15
+ -| Mar 15
+ ...
+
+ A common mistake in using the `print' statement is to omit the comma
+between two items. This often has the effect of making the items run
+together in the output, with no space. The reason for this is that
+juxtaposing two string expressions in `awk' means to concatenate them.
+Here is the same program, without the comma:
+
+ $ awk '{ print $1 $2 }' inventory-shipped
+ -| Jan13
+ -| Feb15
+ -| Mar15
+ ...
+
+ To someone unfamiliar with the file `inventory-shipped', neither
+example's output makes much sense. A heading line at the beginning
+would make it clearer. Let's add some headings to our table of months
+(`$1') and green crates shipped (`$2'). We do this using the `BEGIN'
+pattern (*note The `BEGIN' and `END' Special Patterns: BEGIN/END.) to
+force the headings to be printed only once:
+
+ awk 'BEGIN { print "Month Crates"
+ print "----- ------" }
+ { print $1, $2 }' inventory-shipped
+
+Did you already guess what happens? When run, the program prints the
+following:
+
+ Month Crates
+ ----- ------
+ Jan 13
+ Feb 15
+ Mar 15
+ ...
+
+The headings and the table data don't line up! We can fix this by
+printing some spaces between the two fields:
+
+ awk 'BEGIN { print "Month Crates"
+ print "----- ------" }
+ { print $1, " ", $2 }' inventory-shipped
+
+ You can imagine that this way of lining up columns can get pretty
+complicated when you have many columns to fix. Counting spaces for two
+or three columns can be simple, but more than this and you can get lost
+quite easily. This is why the `printf' statement was created (*note
+Using `printf' Statements for Fancier Printing: Printf.); one of its
+specialties is lining up columns of data.
+
+ As a side point, you can continue either a `print' or `printf'
+statement simply by putting a newline after any comma (*note `awk'
+Statements Versus Lines: Statements/Lines.).
+
+
+File: gawk.info, Node: Output Separators, Next: OFMT, Prev: Print Examples, Up: Printing
+
+Output Separators
+=================
+
+ As mentioned previously, a `print' statement contains a list of
+items, separated by commas. In the output, the items are normally
+separated by single spaces. This need not be the case; a single space
+is only the default. You can specify any string of characters to use
+as the "output field separator" by setting the built-in variable `OFS'.
+The initial value of this variable is the string `" "', that is, a
+single space.
+
+ The output from an entire `print' statement is called an "output
+record". Each `print' statement outputs one output record and then
+outputs a string called the "output record separator". The built-in
+variable `ORS' specifies this string. The initial value of `ORS' is
+the string `"\n"', i.e. a newline character; thus, normally each
+`print' statement makes a separate line.
+
+ You can change how output fields and records are separated by
+assigning new values to the variables `OFS' and/or `ORS'. The usual
+place to do this is in the `BEGIN' rule (*note The `BEGIN' and `END'
+Special Patterns: BEGIN/END.), so that it happens before any input is
+processed. You may also do this with assignments on the command line,
+before the names of your input files, or using the `-v' command line
+option (*note Command Line Options: Options.).
+
+ The following example prints the first and second fields of each
+input record separated by a semicolon, with a blank line added after
+each line:
+
+ $ awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
+ > { print $1, $2 }' BBS-list
+ -| aardvark;555-5553
+ -|
+ -| alpo-net;555-3412
+ -|
+ -| barfly;555-7685
+ ...
+
+ If the value of `ORS' does not contain a newline, all your output
+will be run together on a single line, unless you output newlines some
+other way.
+
+
+File: gawk.info, Node: OFMT, Next: Printf, Prev: Output Separators, Up: Printing
+
+Controlling Numeric Output with `print'
+=======================================
+
+ When you use the `print' statement to print numeric values, `awk'
+internally converts the number to a string of characters, and prints
+that string. `awk' uses the `sprintf' function to do this conversion
+(*note Built-in Functions for String Manipulation: String Functions.).
+For now, it suffices to say that the `sprintf' function accepts a
+"format specification" that tells it how to format numbers (or
+strings), and that there are a number of different ways in which
+numbers can be formatted. The different format specifications are
+discussed more fully in *Note Format-Control Letters: Control Letters.
+
+ The built-in variable `OFMT' contains the default format
+specification that `print' uses with `sprintf' when it wants to convert
+a number to a string for printing. The default value of `OFMT' is
+`"%.6g"'. By supplying different format specifications as the value of
+`OFMT', you can change how `print' will print your numbers. As a brief
+example:
+
+ $ awk 'BEGIN {
+ > OFMT = "%.0f" # print numbers as integers (rounds)
+ > print 17.23 }'
+ -| 17
+
+According to the POSIX standard, `awk''s behavior will be undefined if
+`OFMT' contains anything but a floating point conversion specification
+(d.c.).
+
+
+File: gawk.info, Node: Printf, Next: Redirection, Prev: OFMT, Up: Printing
+
+Using `printf' Statements for Fancier Printing
+==============================================
+
+ If you want more precise control over the output format than `print'
+gives you, use `printf'. With `printf' you can specify the width to
+use for each item, and you can specify various formatting choices for
+numbers (such as what radix to use, whether to print an exponent,
+whether to print a sign, and how many digits to print after the decimal
+point). You do this by supplying a string, called the "format string",
+which controls how and where to print the other arguments.
+
+* Menu:
+
+* Basic Printf:: Syntax of the `printf' statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+
+
+File: gawk.info, Node: Basic Printf, Next: Control Letters, Prev: Printf, Up: Printf
+
+Introduction to the `printf' Statement
+--------------------------------------
+
+ The `printf' statement looks like this:
+
+ printf FORMAT, ITEM1, ITEM2, ...
+
+The entire list of arguments may optionally be enclosed in parentheses.
+The parentheses are necessary if any of the item expressions use the
+`>' relational operator; otherwise it could be confused with a
+redirection (*note Redirecting Output of `print' and `printf':
+Redirection.).
+
+ The difference between `printf' and `print' is the FORMAT argument.
+This is an expression whose value is taken as a string; it specifies
+how to output each of the other arguments. It is called the "format
+string".
+
+ The format string is very similar to that in the ANSI C library
+function `printf'. Most of FORMAT is text to be output verbatim.
+Scattered among this text are "format specifiers", one per item. Each
+format specifier says to output the next item in the argument list at
+that place in the format.
+
+ The `printf' statement does not automatically append a newline to its
+output. It outputs only what the format string specifies. So if you
+want a newline, you must include one in the format string. The output
+separator variables `OFS' and `ORS' have no effect on `printf'
+statements. For example:
+
+ BEGIN {
+ ORS = "\nOUCH!\n"; OFS = "!"
+ msg = "Don't Panic!"; printf "%s\n", msg
+ }
+
+ This program still prints the familiar `Don't Panic!' message.
+
+
+File: gawk.info, Node: Control Letters, Next: Format Modifiers, Prev: Basic Printf, Up: Printf
+
+Format-Control Letters
+----------------------
+
+ A format specifier starts with the character `%' and ends with a
+"format-control letter"; it tells the `printf' statement how to output
+one item. (If you actually want to output a `%', write `%%'.) The
+format-control letter specifies what kind of value to print. The rest
+of the format specifier is made up of optional "modifiers" which are
+parameters to use, such as the field width.
+
+ Here is a list of the format-control letters:
+
+`c'
+ This prints a number as an ASCII character. Thus, `printf "%c",
+ 65' outputs the letter `A'. The output for a string value is the
+ first character of the string.
+
+`d'
+`i'
+ These are equivalent. They both print a decimal integer. The `%i'
+ specification is for compatibility with ANSI C.
+
+`e'
+`E'
+ This prints a number in scientific (exponential) notation. For
+ example,
+
+ printf "%4.3e\n", 1950
+
+ prints `1.950e+03', with a total of four significant figures of
+ which three follow the decimal point. The `4.3' are modifiers,
+ discussed below. `%E' uses `E' instead of `e' in the output.
+
+`f'
+ This prints a number in floating point notation. For example,
+
+ printf "%4.3f", 1950
+
+ prints `1950.000', with a total of four significant figures of
+ which three follow the decimal point. The `4.3' are modifiers,
+ discussed below.
+
+`g'
+`G'
+ This prints a number in either scientific notation or floating
+ point notation, whichever uses fewer characters. If the result is
+ printed in scientific notation, `%G' uses `E' instead of `e'.
+
+`o'
+ This prints an unsigned octal integer. (In octal, or base-eight
+ notation, the digits run from `0' to `7'; the decimal number eight
+ is represented as `10' in octal.)
+
+`s'
+ This prints a string.
+
+`x'
+`X'
+ This prints an unsigned hexadecimal integer. (In hexadecimal, or
+ base-16 notation, the digits are `0' through `9' and `a' through
+ `f'. The hexadecimal digit `f' represents the decimal number 15.)
+ `%X' uses the letters `A' through `F' instead of `a' through `f'.
+
+`%'
+ This isn't really a format-control letter, but it does have a
+ meaning when used after a `%': the sequence `%%' outputs one `%'.
+ It does not consume an argument, and it ignores any modifiers.
+
+ When using the integer format-control letters for values that are
+outside the range of a C `long' integer, `gawk' will switch to the `%g'
+format specifier. Other versions of `awk' may print invalid values, or
+do something else entirely (d.c.).
+
+
+File: gawk.info, Node: Format Modifiers, Next: Printf Examples, Prev: Control Letters, Up: Printf
+
+Modifiers for `printf' Formats
+------------------------------
+
+ A format specification can also include "modifiers" that can control
+how much of the item's value is printed and how much space it gets. The
+modifiers come between the `%' and the format-control letter. In the
+examples below, we use the bullet symbol "*" to represent spaces in the
+output. Here are the possible modifiers, in the order in which they may
+appear:
+
+`-'
+ The minus sign, used before the width modifier (see below), says
+ to left-justify the argument within its specified width. Normally
+ the argument is printed right-justified in the specified width.
+ Thus,
+
+ printf "%-4s", "foo"
+
+ prints `foo*'.
+
+`SPACE'
+ For numeric conversions, prefix positive values with a space, and
+ negative values with a minus sign.
+
+`+'
+ The plus sign, used before the width modifier (see below), says to
+ always supply a sign for numeric conversions, even if the data to
+ be formatted is positive. The `+' overrides the space modifier.
+
+`#'
+ Use an "alternate form" for certain control letters. For `%o',
+ supply a leading zero. For `%x', and `%X', supply a leading `0x'
+ or `0X' for a non-zero result. For `%e', `%E', and `%f', the
+ result will always contain a decimal point. For `%g', and `%G',
+ trailing zeros are not removed from the result.
+
+`0'
+ A leading `0' (zero) acts as a flag, that indicates output should
+ be padded with zeros instead of spaces. This applies even to
+ non-numeric output formats (d.c.). This flag only has an effect
+ when the field width is wider than the value to be printed.
+
+`WIDTH'
+ This is a number specifying the desired minimum width of a field.
+ Inserting any number between the `%' sign and the format control
+ character forces the field to be expanded to this width. The
+ default way to do this is to pad with spaces on the left. For
+ example,
+
+ printf "%4s", "foo"
+
+ prints `*foo'.
+
+ The value of WIDTH is a minimum width, not a maximum. If the item
+ value requires more than WIDTH characters, it can be as wide as
+ necessary. Thus,
+
+ printf "%4s", "foobar"
+
+ prints `foobar'.
+
+ Preceding the WIDTH with a minus sign causes the output to be
+ padded with spaces on the right, instead of on the left.
+
+`.PREC'
+ This is a number that specifies the precision to use when printing.
+ For the `e', `E', and `f' formats, this specifies the number of
+ digits you want printed to the right of the decimal point. For
+ the `g', and `G' formats, it specifies the maximum number of
+ significant digits. For the `d', `o', `i', `u', `x', and `X'
+ formats, it specifies the minimum number of digits to print. For
+ a string, it specifies the maximum number of characters from the
+ string that should be printed. Thus,
+
+ printf "%.4s", "foobar"
+
+ prints `foob'.
+
+ The C library `printf''s dynamic WIDTH and PREC capability (for
+example, `"%*.*s"') is supported. Instead of supplying explicit WIDTH
+and/or PREC values in the format string, you pass them in the argument
+list. For example:
+
+ w = 5
+ p = 3
+ s = "abcdefg"
+ printf "%*.*s\n", w, p, s
+
+is exactly equivalent to
+
+ s = "abcdefg"
+ printf "%5.3s\n", s
+
+Both programs output `**abc'.
+
+ Earlier versions of `awk' did not support this capability. If you
+must use such a version, you may simulate this feature by using
+concatenation to build up the format string, like so:
+
+ w = 5
+ p = 3
+ s = "abcdefg"
+ printf "%" w "." p "s\n", s
+
+This is not particularly easy to read, but it does work.
+
+ C programmers may be used to supplying additional `l' and `h' flags
+in `printf' format strings. These are not valid in `awk'. Most `awk'
+implementations silently ignore these flags. If `--lint' is provided
+on the command line (*note Command Line Options: Options.), `gawk' will
+warn about their use. If `--posix' is supplied, their use is a fatal
+error.
+
+
+File: gawk.info, Node: Printf Examples, Prev: Format Modifiers, Up: Printf
+
+Examples Using `printf'
+-----------------------
+
+ Here is how to use `printf' to make an aligned table:
+
+ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
+
+prints the names of bulletin boards (`$1') of the file `BBS-list' as a
+string of 10 characters, left justified. It also prints the phone
+numbers (`$2') afterward on the line. This produces an aligned
+two-column table of names and phone numbers:
+
+ $ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
+ -| aardvark 555-5553
+ -| alpo-net 555-3412
+ -| barfly 555-7685
+ -| bites 555-1675
+ -| camelot 555-0542
+ -| core 555-2912
+ -| fooey 555-1234
+ -| foot 555-6699
+ -| macfoo 555-6480
+ -| sdace 555-3430
+ -| sabafoo 555-2127
+
+ Did you notice that we did not specify that the phone numbers be
+printed as numbers? They had to be printed as strings because the
+numbers are separated by a dash. If we had tried to print the phone
+numbers as numbers, all we would have gotten would have been the first
+three digits, `555'. This would have been pretty confusing.
+
+ We did not specify a width for the phone numbers because they are the
+last things on their lines. We don't need to put spaces after them.
+
+ We could make our table look even nicer by adding headings to the
+tops of the columns. To do this, we use the `BEGIN' pattern (*note The
+`BEGIN' and `END' Special Patterns: BEGIN/END.) to force the header to
+be printed only once, at the beginning of the `awk' program:
+
+ awk 'BEGIN { print "Name Number"
+ print "---- ------" }
+ { printf "%-10s %s\n", $1, $2 }' BBS-list
+
+ Did you notice that we mixed `print' and `printf' statements in the
+above example? We could have used just `printf' statements to get the
+same results:
+
+ awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
+ printf "%-10s %s\n", "----", "------" }
+ { printf "%-10s %s\n", $1, $2 }' BBS-list
+
+By printing each column heading with the same format specification used
+for the elements of the column, we have made sure that the headings are
+aligned just like the columns.
+
+ The fact that the same format specification is used three times can
+be emphasized by storing it in a variable, like this:
+
+ awk 'BEGIN { format = "%-10s %s\n"
+ printf format, "Name", "Number"
+ printf format, "----", "------" }
+ { printf format, $1, $2 }' BBS-list
+
+ See if you can use the `printf' statement to line up the headings and
+table data for our `inventory-shipped' example covered earlier in the
+section on the `print' statement (*note The `print' Statement: Print.).
+
+
+File: gawk.info, Node: Redirection, Next: Special Files, Prev: Printf, Up: Printing
+
+Redirecting Output of `print' and `printf'
+==========================================
+
+ So far we have been dealing only with output that prints to the
+standard output, usually your terminal. Both `print' and `printf' can
+also send their output to other places. This is called "redirection".
+
+ A redirection appears after the `print' or `printf' statement.
+Redirections in `awk' are written just like redirections in shell
+commands, except that they are written inside the `awk' program.
+
+ There are three forms of output redirection: output to a file,
+output appended to a file, and output through a pipe to another command.
+They are all shown for the `print' statement, but they work identically
+for `printf' also.
+
+`print ITEMS > OUTPUT-FILE'
+ This type of redirection prints the items into the output file
+ OUTPUT-FILE. The file name OUTPUT-FILE can be any expression.
+ Its value is changed to a string and then used as a file name
+ (*note Expressions::).
+
+ When this type of redirection is used, the OUTPUT-FILE is erased
+ before the first output is written to it. Subsequent writes to
+ the same OUTPUT-FILE do not erase OUTPUT-FILE, but append to it.
+ If OUTPUT-FILE does not exist, then it is created.
+
+ For example, here is how an `awk' program can write a list of BBS
+ names to a file `name-list' and a list of phone numbers to a file
+ `phone-list'. Each output file contains one name or number per
+ line.
+
+ $ awk '{ print $2 > "phone-list"
+ > print $1 > "name-list" }' BBS-list
+ $ cat phone-list
+ -| 555-5553
+ -| 555-3412
+ ...
+ $ cat name-list
+ -| aardvark
+ -| alpo-net
+ ...
+
+`print ITEMS >> OUTPUT-FILE'
+ This type of redirection prints the items into the pre-existing
+ output file OUTPUT-FILE. The difference between this and the
+ single-`>' redirection is that the old contents (if any) of
+ OUTPUT-FILE are not erased. Instead, the `awk' output is appended
+ to the file. If OUTPUT-FILE does not exist, then it is created.
+
+`print ITEMS | COMMAND'
+ It is also possible to send output to another program through a
+ pipe instead of into a file. This type of redirection opens a
+ pipe to COMMAND and writes the values of ITEMS through this pipe,
+ to another process created to execute COMMAND.
+
+ The redirection argument COMMAND is actually an `awk' expression.
+ Its value is converted to a string, whose contents give the shell
+ command to be run.
+
+ For example, this produces two files, one unsorted list of BBS
+ names and one list sorted in reverse alphabetical order:
+
+ awk '{ print $1 > "names.unsorted"
+ command = "sort -r > names.sorted"
+ print $1 | command }' BBS-list
+
+ Here the unsorted list is written with an ordinary redirection
+ while the sorted list is written by piping through the `sort'
+ utility.
+
+ This example uses redirection to mail a message to a mailing list
+ `bug-system'. This might be useful when trouble is encountered in
+ an `awk' script run periodically for system maintenance.
+
+ report = "mail bug-system"
+ print "Awk script failed:", $0 | report
+ m = ("at record number " FNR " of " FILENAME)
+ print m | report
+ close(report)
+
+ The message is built using string concatenation and saved in the
+ variable `m'. It is then sent down the pipeline to the `mail'
+ program.
+
+ We call the `close' function here because it's a good idea to close
+ the pipe as soon as all the intended output has been sent to it.
+ *Note Closing Input and Output Files and Pipes: Close Files And
+ Pipes, for more information on this. This example also
+ illustrates the use of a variable to represent a FILE or COMMAND:
+ it is not necessary to always use a string constant. Using a
+ variable is generally a good idea, since `awk' requires you to
+ spell the string value identically every time.
+
+ Redirecting output using `>', `>>', or `|' asks the system to open a
+file or pipe only if the particular FILE or COMMAND you've specified
+has not already been written to by your program, or if it has been
+closed since it was last written to.
+
+ Many `awk' implementations limit the number of pipelines an `awk'
+program may have open to just one! In `gawk', there is no such limit.
+You can open as many pipelines as the underlying operating system will
+permit.
+
+
+File: gawk.info, Node: Special Files, Next: Close Files And Pipes, Prev: Redirection, Up: Printing
+
+Special File Names in `gawk'
+============================
+
+ Running programs conventionally have three input and output streams
+already available to them for reading and writing. These are known as
+the "standard input", "standard output", and "standard error output".
+These streams are, by default, connected to your terminal, but they are
+often redirected with the shell, via the `<', `<<', `>', `>>', `>&' and
+`|' operators. Standard error is typically used for writing error
+messages; the reason we have two separate streams, standard output and
+standard error, is so that they can be redirected separately.
+
+ In other implementations of `awk', the only way to write an error
+message to standard error in an `awk' program is as follows:
+
+ print "Serious error detected!" | "cat 1>&2"
+
+This works by opening a pipeline to a shell command which can access the
+standard error stream which it inherits from the `awk' process. This
+is far from elegant, and is also inefficient, since it requires a
+separate process. So people writing `awk' programs often neglect to do
+this. Instead, they send the error messages to the terminal, like this:
+
+ print "Serious error detected!" > "/dev/tty"
+
+This usually has the same effect, but not always: although the standard
+error stream is usually the terminal, it can be redirected, and when
+that happens, writing to the terminal is not correct. In fact, if
+`awk' is run from a background job, it may not have a terminal at all.
+Then opening `/dev/tty' will fail.
+
+ `gawk' provides special file names for accessing the three standard
+streams. When you redirect input or output in `gawk', if the file name
+matches one of these special names, then `gawk' directly uses the
+stream it stands for.
+
+`/dev/stdin'
+ The standard input (file descriptor 0).
+
+`/dev/stdout'
+ The standard output (file descriptor 1).
+
+`/dev/stderr'
+ The standard error output (file descriptor 2).
+
+`/dev/fd/N'
+ The file associated with file descriptor N. Such a file must have
+ been opened by the program initiating the `awk' execution
+ (typically the shell). Unless you take special pains in the shell
+ from which you invoke `gawk', only descriptors 0, 1 and 2 are
+ available.
+
+ The file names `/dev/stdin', `/dev/stdout', and `/dev/stderr' are
+aliases for `/dev/fd/0', `/dev/fd/1', and `/dev/fd/2', respectively,
+but they are more self-explanatory.
+
+ The proper way to write an error message in a `gawk' program is to
+use `/dev/stderr', like this:
+
+ print "Serious error detected!" > "/dev/stderr"
+
+ `gawk' also provides special file names that give access to
+information about the running `gawk' process. Each of these "files"
+provides a single record of information. To read them more than once,
+you must first close them with the `close' function (*note Closing
+Input and Output Files and Pipes: Close Files And Pipes.). The
+filenames are:
+
+`/dev/pid'
+ Reading this file returns the process ID of the current process,
+ in decimal, terminated with a newline.
+
+`/dev/ppid'
+ Reading this file returns the parent process ID of the current
+ process, in decimal, terminated with a newline.
+
+`/dev/pgrpid'
+ Reading this file returns the process group ID of the current
+ process, in decimal, terminated with a newline.
+
+`/dev/user'
+ Reading this file returns a single record terminated with a
+ newline. The fields are separated with spaces. The fields
+ represent the following information:
+
+ `$1'
+ The return value of the `getuid' system call (the real user
+ ID number).
+
+ `$2'
+ The return value of the `geteuid' system call (the effective
+ user ID number).
+
+ `$3'
+ The return value of the `getgid' system call (the real group
+ ID number).
+
+ `$4'
+ The return value of the `getegid' system call (the effective
+ group ID number).
+
+ If there are any additional fields, they are the group IDs
+ returned by `getgroups' system call. (Multiple groups may not be
+ supported on all systems.)
+
+ These special file names may be used on the command line as data
+files, as well as for I/O redirections within an `awk' program. They
+may not be used as source files with the `-f' option.
+
+ Recognition of these special file names is disabled if `gawk' is in
+compatibility mode (*note Command Line Options: Options.).
+
+ *Caution*: Unless your system actually has a `/dev/fd' directory
+(or any of the other above listed special files), the interpretation of
+these file names is done by `gawk' itself. For example, using
+`/dev/fd/4' for output will actually write on file descriptor 4, and
+not on a new file descriptor that was `dup''ed from file descriptor 4.
+Most of the time this does not matter; however, it is important to
+_not_ close any of the files related to file descriptors 0, 1, and 2.
+If you do close one of these files, unpredictable behavior will result.
+
+ The special files that provide process-related information may
+disappear in a future version of `gawk'. *Note Probable Future
+Extensions: Future Extensions.
+
+
+File: gawk.info, Node: Close Files And Pipes, Prev: Special Files, Up: Printing
+
+Closing Input and Output Files and Pipes
+========================================
+
+ If the same file name or the same shell command is used with
+`getline' (*note Explicit Input with `getline': Getline.) more than
+once during the execution of an `awk' program, the file is opened (or
+the command is executed) only the first time. At that time, the first
+record of input is read from that file or command. The next time the
+same file or command is used in `getline', another record is read from
+it, and so on.
+
+ Similarly, when a file or pipe is opened for output, the file name
+or command associated with it is remembered by `awk' and subsequent
+writes to the same file or command are appended to the previous writes.
+The file or pipe stays open until `awk' exits.
+
+ This implies that if you want to start reading the same file again
+from the beginning, or if you want to rerun a shell command (rather than
+reading more output from the command), you must take special steps.
+What you must do is use the `close' function, as follows:
+
+ close(FILENAME)
+
+or
+
+ close(COMMAND)
+
+ The argument FILENAME or COMMAND can be any expression. Its value
+must _exactly_ match the string that was used to open the file or start
+the command (spaces and other "irrelevant" characters included). For
+example, if you open a pipe with this:
+
+ "sort -r names" | getline foo
+
+then you must close it with this:
+
+ close("sort -r names")
+
+ Once this function call is executed, the next `getline' from that
+file or command, or the next `print' or `printf' to that file or
+command, will reopen the file or rerun the command.
+
+ Because the expression that you use to close a file or pipeline must
+exactly match the expression used to open the file or run the command,
+it is good practice to use a variable to store the file name or command.
+The previous example would become
+
+ sortcom = "sort -r names"
+ sortcom | getline foo
+ ...
+ close(sortcom)
+
+This helps avoid hard-to-find typographical errors in your `awk'
+programs.
+
+ Here are some reasons why you might need to close an output file:
+
+ * To write a file and read it back later on in the same `awk'
+ program. Close the file when you are finished writing it; then
+ you can start reading it with `getline'.
+
+ * To write numerous files, successively, in the same `awk' program.
+ If you don't close the files, eventually you may exceed a system
+ limit on the number of open files in one process. So close each
+ one when you are finished writing it.
+
+ * To make a command finish. When you redirect output through a pipe,
+ the command reading the pipe normally continues to try to read
+ input as long as the pipe is open. Often this means the command
+ cannot really do its work until the pipe is closed. For example,
+ if you redirect output to the `mail' program, the message is not
+ actually sent until the pipe is closed.
+
+ * To run the same program a second time, with the same arguments.
+ This is not the same thing as giving more input to the first run!
+
+ For example, suppose you pipe output to the `mail' program. If you
+ output several lines redirected to this pipe without closing it,
+ they make a single message of several lines. By contrast, if you
+ close the pipe after each line of output, then each line makes a
+ separate message.
+
+ `close' returns a value of zero if the close succeeded. Otherwise,
+the value will be non-zero. In this case, `gawk' sets the variable
+`ERRNO' to a string describing the error that occurred.
+
+ If you use more files than the system allows you to have open,
+`gawk' will attempt to multiplex the available open files among your
+data files. `gawk''s ability to do this depends upon the facilities of
+your operating system: it may not always work. It is therefore both
+good practice and good portability advice to always use `close' on your
+files when you are done with them.
+
+
+File: gawk.info, Node: Expressions, Next: Patterns and Actions, Prev: Printing, Up: Top
+
+Expressions
+***********
+
+ Expressions are the basic building blocks of `awk' patterns and
+actions. An expression evaluates to a value, which you can print, test,
+store in a variable or pass to a function. Additionally, an expression
+can assign a new value to a variable or a field, with an assignment
+operator.
+
+ An expression can serve as a pattern or action statement on its own.
+Most other kinds of statements contain one or more expressions which
+specify data on which to operate. As in other languages, expressions
+in `awk' include variables, array references, constants, and function
+calls, as well as combinations of these with various operators.
+
+* Menu:
+
+* Constants:: String, numeric, and regexp constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for later use.
+* Conversion:: The conversion of strings to numbers and vice
+ versa.
+* Arithmetic Ops:: Arithmetic operations (`+', `-',
+ etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a field.
+* Increment Ops:: Incrementing the numeric value of a variable.
+* Truth Values:: What is ``true'' and what is ``false''.
+* Typing and Comparison:: How variables acquire types, and how this
+ affects comparison of numbers and strings with
+ `<', etc.
+* Boolean Ops:: Combining comparison expressions using boolean
+ operators `||' (``or''), `&&'
+ (``and'') and `!' (``not'').
+* Conditional Exp:: Conditional expressions select between two
+ subexpressions under control of a third
+ subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+
+
+File: gawk.info, Node: Constants, Next: Using Constant Regexps, Prev: Expressions, Up: Expressions
+
+Constant Expressions
+====================
+
+ The simplest type of expression is the "constant", which always has
+the same value. There are three types of constants: numeric constants,
+string constants, and regular expression constants.
+
+* Menu:
+
+* Scalar Constants:: Numeric and string constants.
+* Regexp Constants:: Regular Expression constants.
+
+
+File: gawk.info, Node: Scalar Constants, Next: Regexp Constants, Prev: Constants, Up: Constants
+
+Numeric and String Constants
+----------------------------
+
+ A "numeric constant" stands for a number. This number can be an
+integer, a decimal fraction, or a number in scientific (exponential)
+notation.(1) Here are some examples of numeric constants, which all
+have the same value:
+
+ 105
+ 1.05e+2
+ 1050e-1
+
+ A string constant consists of a sequence of characters enclosed in
+double-quote marks. For example:
+
+ "parrot"
+
+represents the string whose contents are `parrot'. Strings in `gawk'
+can be of any length and they can contain any of the possible eight-bit
+ASCII characters including ASCII NUL (character code zero). Other `awk'
+implementations may have difficulty with some character codes.
+
+ ---------- Footnotes ----------
+
+ (1) The internal representation uses double-precision floating point
+numbers. If you don't know what that means, then don't worry about it.
+
+
+File: gawk.info, Node: Regexp Constants, Prev: Scalar Constants, Up: Constants
+
+Regular Expression Constants
+----------------------------
+
+ A regexp constant is a regular expression description enclosed in
+slashes, such as `/^beginning and end$/'. Most regexps used in `awk'
+programs are constant, but the `~' and `!~' matching operators can also
+match computed or "dynamic" regexps (which are just ordinary strings or
+variables that contain a regexp).
+
+
+File: gawk.info, Node: Using Constant Regexps, Next: Variables, Prev: Constants, Up: Expressions
+
+Using Regular Expression Constants
+==================================
+
+ When used on the right hand side of the `~' or `!~' operators, a
+regexp constant merely stands for the regexp that is to be matched.
+
+ Regexp constants (such as `/foo/') may be used like simple
+expressions. When a regexp constant appears by itself, it has the same
+meaning as if it appeared in a pattern, i.e. `($0 ~ /foo/)' (d.c.)
+(*note Expressions as Patterns: Expression Patterns.). This means that
+the two code segments,
+
+ if ($0 ~ /barfly/ || $0 ~ /camelot/)
+ print "found"
+
+and
+
+ if (/barfly/ || /camelot/)
+ print "found"
+
+are exactly equivalent.
+
+ One rather bizarre consequence of this rule is that the following
+boolean expression is valid, but does not do what the user probably
+intended:
+
+ # note that /foo/ is on the left of the ~
+ if (/foo/ ~ $1) print "found foo"
+
+This code is "obviously" testing `$1' for a match against the regexp
+`/foo/'. But in fact, the expression `/foo/ ~ $1' actually means `($0
+~ /foo/) ~ $1'. In other words, first match the input record against
+the regexp `/foo/'. The result will be either zero or one, depending
+upon the success or failure of the match. Then match that result
+against the first field in the record.
+
+ Since it is unlikely that you would ever really wish to make this
+kind of test, `gawk' will issue a warning when it sees this construct in
+a program.
+
+ Another consequence of this rule is that the assignment statement
+
+ matches = /foo/
+
+will assign either zero or one to the variable `matches', depending
+upon the contents of the current input record.
+
+ This feature of the language was never well documented until the
+POSIX specification.
+
+ Constant regular expressions are also used as the first argument for
+the `gensub', `sub' and `gsub' functions, and as the second argument of
+the `match' function (*note Built-in Functions for String Manipulation:
+String Functions.). Modern implementations of `awk', including `gawk',
+allow the third argument of `split' to be a regexp constant, while some
+older implementations do not (d.c.).
+
+ This can lead to confusion when attempting to use regexp constants
+as arguments to user defined functions (*note User-defined Functions:
+User-defined.). For example:
+
+ function mysub(pat, repl, str, global)
+ {
+ if (global)
+ gsub(pat, repl, str)
+ else
+ sub(pat, repl, str)
+ return str
+ }
+
+ {
+ ...
+ text = "hi! hi yourself!"
+ mysub(/hi/, "howdy", text, 1)
+ ...
+ }
+
+ In this example, the programmer wishes to pass a regexp constant to
+the user-defined function `mysub', which will in turn pass it on to
+either `sub' or `gsub'. However, what really happens is that the `pat'
+parameter will be either one or zero, depending upon whether or not
+`$0' matches `/hi/'.
+
+ As it is unlikely that you would ever really wish to pass a truth
+value in this way, `gawk' will issue a warning when it sees a regexp
+constant used as a parameter to a user-defined function.
+
+
+File: gawk.info, Node: Variables, Next: Conversion, Prev: Using Constant Regexps, Up: Expressions
+
+Variables
+=========
+
+ Variables are ways of storing values at one point in your program for
+use later in another part of your program. You can manipulate them
+entirely within your program text, and you can also assign values to
+them on the `awk' command line.
+
+* Menu:
+
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command line and a
+ summary of command line syntax. This is an
+ advanced method of input.
+
+
+File: gawk.info, Node: Using Variables, Next: Assignment Options, Prev: Variables, Up: Variables
+
+Using Variables in a Program
+----------------------------
+
+ Variables let you give names to values and refer to them later. You
+have already seen variables in many of the examples. The name of a
+variable must be a sequence of letters, digits and underscores, but it
+may not begin with a digit. Case is significant in variable names; `a'
+and `A' are distinct variables.
+
+ A variable name is a valid expression by itself; it represents the
+variable's current value. Variables are given new values with
+"assignment operators", "increment operators" and "decrement operators".
+*Note Assignment Expressions: Assignment Ops.
+
+ A few variables have special built-in meanings, such as `FS', the
+field separator, and `NF', the number of fields in the current input
+record. *Note Built-in Variables::, for a list of them. These
+built-in variables can be used and assigned just like all other
+variables, but their values are also used or changed automatically by
+`awk'. All built-in variables names are entirely upper-case.
+
+ Variables in `awk' can be assigned either numeric or string values.
+By default, variables are initialized to the empty string, which is
+zero if converted to a number. There is no need to "initialize" each
+variable explicitly in `awk', the way you would in C and in most other
+traditional languages.
+
+
+File: gawk.info, Node: Assignment Options, Prev: Using Variables, Up: Variables
+
+Assigning Variables on the Command Line
+---------------------------------------
+
+ You can set any `awk' variable by including a "variable assignment"
+among the arguments on the command line when you invoke `awk' (*note
+Other Command Line Arguments: Other Arguments.). Such an assignment has
+this form:
+
+ VARIABLE=TEXT
+
+With it, you can set a variable either at the beginning of the `awk'
+run or in between input files.
+
+ If you precede the assignment with the `-v' option, like this:
+
+ -v VARIABLE=TEXT
+
+then the variable is set at the very beginning, before even the `BEGIN'
+rules are run. The `-v' option and its assignment must precede all the
+file name arguments, as well as the program text. (*Note Command Line
+Options: Options, for more information about the `-v' option.)
+
+ Otherwise, the variable assignment is performed at a time determined
+by its position among the input file arguments: after the processing of
+the preceding input file argument. For example:
+
+ awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list
+
+prints the value of field number `n' for all input records. Before the
+first file is read, the command line sets the variable `n' equal to
+four. This causes the fourth field to be printed in lines from the
+file `inventory-shipped'. After the first file has finished, but
+before the second file is started, `n' is set to two, so that the
+second field is printed in lines from `BBS-list'.
+
+ $ awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list
+ -| 15
+ -| 24
+ ...
+ -| 555-5553
+ -| 555-3412
+ ...
+
+ Command line arguments are made available for explicit examination by
+the `awk' program in an array named `ARGV' (*note Using `ARGC' and
+`ARGV': ARGC and ARGV.).
+
+ `awk' processes the values of command line assignments for escape
+sequences (d.c.) (*note Escape Sequences::).
+
+
+File: gawk.info, Node: Conversion, Next: Arithmetic Ops, Prev: Variables, Up: Expressions
+
+Conversion of Strings and Numbers
+=================================
+
+ Strings are converted to numbers, and numbers to strings, if the
+context of the `awk' program demands it. For example, if the value of
+either `foo' or `bar' in the expression `foo + bar' happens to be a
+string, it is converted to a number before the addition is performed.
+If numeric values appear in string concatenation, they are converted to
+strings. Consider this:
+
+ two = 2; three = 3
+ print (two three) + 4
+
+This prints the (numeric) value 27. The numeric values of the
+variables `two' and `three' are converted to strings and concatenated
+together, and the resulting string is converted back to the number 23,
+to which four is then added.
+
+ If, for some reason, you need to force a number to be converted to a
+string, concatenate the empty string, `""', with that number. To force
+a string to be converted to a number, add zero to that string.
+
+ A string is converted to a number by interpreting any numeric prefix
+of the string as numerals: `"2.5"' converts to 2.5, `"1e3"' converts to
+1000, and `"25fix"' has a numeric value of 25. Strings that can't be
+interpreted as valid numbers are converted to zero.
+
+ The exact manner in which numbers are converted into strings is
+controlled by the `awk' built-in variable `CONVFMT' (*note Built-in
+Variables::). Numbers are converted using the `sprintf' function
+(*note Built-in Functions for String Manipulation: String Functions.)
+with `CONVFMT' as the format specifier.
+
+ `CONVFMT''s default value is `"%.6g"', which prints a value with at
+least six significant digits. For some applications you will want to
+change it to specify more precision. Double precision on most modern
+machines gives you 16 or 17 decimal digits of precision.
+
+ Strange results can happen if you set `CONVFMT' to a string that
+doesn't tell `sprintf' how to format floating point numbers in a useful
+way. For example, if you forget the `%' in the format, all numbers
+will be converted to the same constant string.
+
+ As a special case, if a number is an integer, then the result of
+converting it to a string is _always_ an integer, no matter what the
+value of `CONVFMT' may be. Given the following code fragment:
+
+ CONVFMT = "%2.2f"
+ a = 12
+ b = a ""
+
+`b' has the value `"12"', not `"12.00"' (d.c.).
+
+ Prior to the POSIX standard, `awk' specified that the value of
+`OFMT' was used for converting numbers to strings. `OFMT' specifies
+the output format to use when printing numbers with `print'. `CONVFMT'
+was introduced in order to separate the semantics of conversion from
+the semantics of printing. Both `CONVFMT' and `OFMT' have the same
+default value: `"%.6g"'. In the vast majority of cases, old `awk'
+programs will not change their behavior. However, this use of `OFMT'
+is something to keep in mind if you must port your program to other
+implementations of `awk'; we recommend that instead of changing your
+programs, you just port `gawk' itself! *Note The `print' Statement:
+Print, for more information on the `print' statement.
+
+
+File: gawk.info, Node: Arithmetic Ops, Next: Concatenation, Prev: Conversion, Up: Expressions
+
+Arithmetic Operators
+====================
+
+ The `awk' language uses the common arithmetic operators when
+evaluating expressions. All of these arithmetic operators follow normal
+precedence rules, and work as you would expect them to.
+
+ Here is a file `grades' containing a list of student names and three
+test scores per student (it's a small class):
+
+ Pat 100 97 58
+ Sandy 84 72 93
+ Chris 72 92 89
+
+This programs takes the file `grades', and prints the average of the
+scores.
+
+ $ awk '{ sum = $2 + $3 + $4 ; avg = sum / 3
+ > print $1, avg }' grades
+ -| Pat 85
+ -| Sandy 83
+ -| Chris 84.3333
+
+ This table lists the arithmetic operators in `awk', in order from
+highest precedence to lowest:
+
+`- X'
+ Negation.
+
+`+ X'
+ Unary plus. The expression is converted to a number.
+
+`X ^ Y'
+`X ** Y'
+ Exponentiation: X raised to the Y power. `2 ^ 3' has the value
+ eight. The character sequence `**' is equivalent to `^'. (The
+ POSIX standard only specifies the use of `^' for exponentiation.)
+
+`X * Y'
+ Multiplication.
+
+`X / Y'
+ Division. Since all numbers in `awk' are real numbers, the result
+ is not rounded to an integer: `3 / 4' has the value 0.75.
+
+`X % Y'
+ Remainder. The quotient is rounded toward zero to an integer,
+ multiplied by Y and this result is subtracted from X. This
+ operation is sometimes known as "trunc-mod." The following
+ relation always holds:
+
+ b * int(a / b) + (a % b) == a
+
+ One possibly undesirable effect of this definition of remainder is
+ that `X % Y' is negative if X is negative. Thus,
+
+ -17 % 8 = -1
+
+ In other `awk' implementations, the signedness of the remainder
+ may be machine dependent.
+
+`X + Y'
+ Addition.
+
+`X - Y'
+ Subtraction.
+
+ For maximum portability, do not use the `**' operator.
+
+ Unary plus and minus have the same precedence, the multiplication
+operators all have the same precedence, and addition and subtraction
+have the same precedence.
+
+
+File: gawk.info, Node: Concatenation, Next: Assignment Ops, Prev: Arithmetic Ops, Up: Expressions
+
+String Concatenation
+====================
+
+ There is only one string operation: concatenation. It does not have
+a specific operator to represent it. Instead, concatenation is
+performed by writing expressions next to one another, with no operator.
+For example:
+
+ $ awk '{ print "Field number one: " $1 }' BBS-list
+ -| Field number one: aardvark
+ -| Field number one: alpo-net
+ ...
+
+ Without the space in the string constant after the `:', the line
+would run together. For example:
+
+ $ awk '{ print "Field number one:" $1 }' BBS-list
+ -| Field number one:aardvark
+ -| Field number one:alpo-net
+ ...
+
+ Since string concatenation does not have an explicit operator, it is
+often necessary to insure that it happens where you want it to by using
+parentheses to enclose the items to be concatenated. For example, the
+following code fragment does not concatenate `file' and `name' as you
+might expect:
+
+ file = "file"
+ name = "name"
+ print "something meaningful" > file name
+
+It is necessary to use the following:
+
+ print "something meaningful" > (file name)
+
+ We recommend that you use parentheses around concatenation in all
+but the most common contexts (such as on the right-hand side of `=').
+
+
+File: gawk.info, Node: Assignment Ops, Next: Increment Ops, Prev: Concatenation, Up: Expressions
+
+Assignment Expressions
+======================
+
+ An "assignment" is an expression that stores a new value into a
+variable. For example, let's assign the value one to the variable `z':
+
+ z = 1
+
+ After this expression is executed, the variable `z' has the value
+one. Whatever old value `z' had before the assignment is forgotten.
+
+ Assignments can store string values also. For example, this would
+store the value `"this food is good"' in the variable `message':
+
+ thing = "food"
+ predicate = "good"
+ message = "this " thing " is " predicate
+
+(This also illustrates string concatenation.)
+
+ The `=' sign is called an "assignment operator". It is the simplest
+assignment operator because the value of the right-hand operand is
+stored unchanged.
+
+ Most operators (addition, concatenation, and so on) have no effect
+except to compute a value. If you ignore the value, you might as well
+not use the operator. An assignment operator is different; it does
+produce a value, but even if you ignore the value, the assignment still
+makes itself felt through the alteration of the variable. We call this
+a "side effect".
+
+ The left-hand operand of an assignment need not be a variable (*note
+Variables::); it can also be a field (*note Changing the Contents of a
+Field: Changing Fields.) or an array element (*note Arrays in `awk':
+Arrays.). These are all called "lvalues", which means they can appear
+on the left-hand side of an assignment operator. The right-hand
+operand may be any expression; it produces the new value which the
+assignment stores in the specified variable, field or array element.
+(Such values are called "rvalues").
+
+ It is important to note that variables do _not_ have permanent types.
+The type of a variable is simply the type of whatever value it happens
+to hold at the moment. In the following program fragment, the variable
+`foo' has a numeric value at first, and a string value later on:
+
+ foo = 1
+ print foo
+ foo = "bar"
+ print foo
+
+When the second assignment gives `foo' a string value, the fact that it
+previously had a numeric value is forgotten.
+
+ String values that do not begin with a digit have a numeric value of
+zero. After executing this code, the value of `foo' is five:
+
+ foo = "a string"
+ foo = foo + 5
+
+(Note that using a variable as a number and then later as a string can
+be confusing and is poor programming style. The above examples
+illustrate how `awk' works, _not_ how you should write your own
+programs!)
+
+ An assignment is an expression, so it has a value: the same value
+that is assigned. Thus, `z = 1' as an expression has the value one.
+One consequence of this is that you can write multiple assignments
+together:
+
+ x = y = z = 0
+
+stores the value zero in all three variables. It does this because the
+value of `z = 0', which is zero, is stored into `y', and then the value
+of `y = z = 0', which is zero, is stored into `x'.
+
+ You can use an assignment anywhere an expression is called for. For
+example, it is valid to write `x != (y = 1)' to set `y' to one and then
+test whether `x' equals one. But this style tends to make programs
+hard to read; except in a one-shot program, you should not use such
+nesting of assignments.
+
+ Aside from `=', there are several other assignment operators that do
+arithmetic with the old value of the variable. For example, the
+operator `+=' computes a new value by adding the right-hand value to
+the old value of the variable. Thus, the following assignment adds
+five to the value of `foo':
+
+ foo += 5
+
+This is equivalent to the following:
+
+ foo = foo + 5
+
+Use whichever one makes the meaning of your program clearer.
+
+ There are situations where using `+=' (or any assignment operator)
+is _not_ the same as simply repeating the left-hand operand in the
+right-hand expression. For example:
+
+ # Thanks to Pat Rankin for this example
+ BEGIN {
+ foo[rand()] += 5
+ for (x in foo)
+ print x, foo[x]
+
+ bar[rand()] = bar[rand()] + 5
+ for (x in bar)
+ print x, bar[x]
+ }
+
+The indices of `bar' are guaranteed to be different, because `rand'
+will return different values each time it is called. (Arrays and the
+`rand' function haven't been covered yet. *Note Arrays in `awk':
+Arrays, and see *Note Numeric Built-in Functions: Numeric Functions,
+for more information). This example illustrates an important fact
+about the assignment operators: the left-hand expression is only
+evaluated _once_.
+
+ It is also up to the implementation as to which expression is
+evaluated first, the left-hand one or the right-hand one. Consider
+this example:
+
+ i = 1
+ a[i += 2] = i + 1
+
+The value of `a[3]' could be either two or four.
+
+ Here is a table of the arithmetic assignment operators. In each
+case, the right-hand operand is an expression whose value is converted
+to a number.
+
+`LVALUE += INCREMENT'
+ Adds INCREMENT to the value of LVALUE to make the new value of
+ LVALUE.
+
+`LVALUE -= DECREMENT'
+ Subtracts DECREMENT from the value of LVALUE.
+
+`LVALUE *= COEFFICIENT'
+ Multiplies the value of LVALUE by COEFFICIENT.
+
+`LVALUE /= DIVISOR'
+ Divides the value of LVALUE by DIVISOR.
+
+`LVALUE %= MODULUS'
+ Sets LVALUE to its remainder by MODULUS.
+
+`LVALUE ^= POWER'
+`LVALUE **= POWER'
+ Raises LVALUE to the power POWER. (Only the `^=' operator is
+ specified by POSIX.)
+
+ For maximum portability, do not use the `**=' operator.
+
+
+File: gawk.info, Node: Increment Ops, Next: Truth Values, Prev: Assignment Ops, Up: Expressions
+
+Increment and Decrement Operators
+=================================
+
+ "Increment" and "decrement operators" increase or decrease the value
+of a variable by one. You could do the same thing with an assignment
+operator, so the increment operators add no power to the `awk'
+language; but they are convenient abbreviations for very common
+operations.
+
+ The operator to add one is written `++'. It can be used to increment
+a variable either before or after taking its value.
+
+ To pre-increment a variable V, write `++V'. This adds one to the
+value of V and that new value is also the value of this expression.
+The assignment expression `V += 1' is completely equivalent.
+
+ Writing the `++' after the variable specifies post-increment. This
+increments the variable value just the same; the difference is that the
+value of the increment expression itself is the variable's _old_ value.
+Thus, if `foo' has the value four, then the expression `foo++' has the
+value four, but it changes the value of `foo' to five.
+
+ The post-increment `foo++' is nearly equivalent to writing `(foo +=
+1) - 1'. It is not perfectly equivalent because all numbers in `awk'
+are floating point: in floating point, `foo + 1 - 1' does not
+necessarily equal `foo'. But the difference is minute as long as you
+stick to numbers that are fairly small (less than 10e12).
+
+ Any lvalue can be incremented. Fields and array elements are
+incremented just like variables. (Use `$(i++)' when you wish to do a
+field reference and a variable increment at the same time. The
+parentheses are necessary because of the precedence of the field
+reference operator, `$'.)
+
+ The decrement operator `--' works just like `++' except that it
+subtracts one instead of adding. Like `++', it can be used before the
+lvalue to pre-decrement or after it to post-decrement.
+
+ Here is a summary of increment and decrement expressions.
+
+`++LVALUE'
+ This expression increments LVALUE and the new value becomes the
+ value of the expression.
+
+`LVALUE++'
+ This expression increments LVALUE, but the value of the expression
+ is the _old_ value of LVALUE.
+
+`--LVALUE'
+ Like `++LVALUE', but instead of adding, it subtracts. It
+ decrements LVALUE and delivers the value that results.
+
+`LVALUE--'
+ Like `LVALUE++', but instead of adding, it subtracts. It
+ decrements LVALUE. The value of the expression is the _old_ value
+ of LVALUE.
+
+
+File: gawk.info, Node: Truth Values, Next: Typing and Comparison, Prev: Increment Ops, Up: Expressions
+
+True and False in `awk'
+=======================
+
+ Many programming languages have a special representation for the
+concepts of "true" and "false." Such languages usually use the special
+constants `true' and `false', or perhaps their upper-case equivalents.
+
+ `awk' is different. It borrows a very simple concept of true and
+false from C. In `awk', any non-zero numeric value, _or_ any non-empty
+string value is true. Any other value (zero or the null string, `""')
+is false. The following program will print `A strange truth value'
+three times:
+
+ BEGIN {
+ if (3.1415927)
+ print "A strange truth value"
+ if ("Four Score And Seven Years Ago")
+ print "A strange truth value"
+ if (j = 57)
+ print "A strange truth value"
+ }
+
+ There is a surprising consequence of the "non-zero or non-null" rule:
+The string constant `"0"' is actually true, since it is non-null (d.c.).
+
+
+File: gawk.info, Node: Typing and Comparison, Next: Boolean Ops, Prev: Truth Values, Up: Expressions
+
+Variable Typing and Comparison Expressions
+==========================================
+
+ Unlike other programming languages, `awk' variables do not have a
+fixed type. Instead, they can be either a number or a string, depending
+upon the value that is assigned to them.
+
+ The 1992 POSIX standard introduced the concept of a "numeric
+string", which is simply a string that looks like a number, for
+example, `" +2"'. This concept is used for determining the type of a
+variable.
+
+ The type of the variable is important, since the types of two
+variables determine how they are compared.
+
+ In `gawk', variable typing follows these rules.
+
+ 1. A numeric literal or the result of a numeric operation has the
+ NUMERIC attribute.
+
+ 2. A string literal or the result of a string operation has the STRING
+ attribute.
+
+ 3. Fields, `getline' input, `FILENAME', `ARGV' elements, `ENVIRON'
+ elements and the elements of an array created by `split' that are
+ numeric strings have the STRNUM attribute. Otherwise, they have
+ the STRING attribute. Uninitialized variables also have the
+ STRNUM attribute.
+
+ 4. Attributes propagate across assignments, but are not changed by
+ any use.
+
+ The last rule is particularly important. In the following program,
+`a' has numeric type, even though it is later used in a string
+operation.
+
+ BEGIN {
+ a = 12.345
+ b = a " is a cute number"
+ print b
+ }
+
+ When two operands are compared, either string comparison or numeric
+comparison may be used, depending on the attributes of the operands,
+according to the following, symmetric, matrix:
+
+ +----------------------------------------------
+ | STRING NUMERIC STRNUM
+ --------+----------------------------------------------
+ |
+ STRING | string string string
+ |
+ NUMERIC | string numeric numeric
+ |
+ STRNUM | string numeric numeric
+ --------+----------------------------------------------
+
+ The basic idea is that user input that looks numeric, and _only_
+user input, should be treated as numeric, even though it is actually
+made of characters, and is therefore also a string.
+
+ "Comparison expressions" compare strings or numbers for
+relationships such as equality. They are written using "relational
+operators", which are a superset of those in C. Here is a table of
+them:
+
+`X < Y'
+ True if X is less than Y.
+
+`X <= Y'
+ True if X is less than or equal to Y.
+
+`X > Y'
+ True if X is greater than Y.
+
+`X >= Y'
+ True if X is greater than or equal to Y.
+
+`X == Y'
+ True if X is equal to Y.
+
+`X != Y'
+ True if X is not equal to Y.
+
+`X ~ Y'
+ True if the string X matches the regexp denoted by Y.
+
+`X !~ Y'
+ True if the string X does not match the regexp denoted by Y.
+
+`SUBSCRIPT in ARRAY'
+ True if the array ARRAY has an element with the subscript
+ SUBSCRIPT.
+
+ Comparison expressions have the value one if true and zero if false.
+
+ When comparing operands of mixed types, numeric operands are
+converted to strings using the value of `CONVFMT' (*note Conversion of
+Strings and Numbers: Conversion.).
+
+ Strings are compared by comparing the first character of each, then
+the second character of each, and so on. Thus `"10"' is less than
+`"9"'. If there are two strings where one is a prefix of the other,
+the shorter string is less than the longer one. Thus `"abc"' is less
+than `"abcd"'.
+
+ It is very easy to accidentally mistype the `==' operator, and leave
+off one of the `='s. The result is still valid `awk' code, but the
+program will not do what you mean:
+
+ if (a = b) # oops! should be a == b
+ ...
+ else
+ ...
+
+Unless `b' happens to be zero or the null string, the `if' part of the
+test will always succeed. Because the operators are so similar, this
+kind of error is very difficult to spot when scanning the source code.
+
+ Here are some sample expressions, how `gawk' compares them, and what
+the result of the comparison is.
+
+`1.5 <= 2.0'
+ numeric comparison (true)
+
+`"abc" >= "xyz"'
+ string comparison (false)
+
+`1.5 != " +2"'
+ string comparison (true)
+
+`"1e2" < "3"'
+ string comparison (true)
+
+`a = 2; b = "2"'
+`a == b'
+ string comparison (true)
+
+`a = 2; b = " +2"'
+`a == b'
+ string comparison (false)
+
+ In this example,
+
+ $ echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }'
+ -| false
+
+the result is `false' since both `$1' and `$2' are numeric strings and
+thus both have the STRNUM attribute, dictating a numeric comparison.
+
+ The purpose of the comparison rules and the use of numeric strings is
+to attempt to produce the behavior that is "least surprising," while
+still "doing the right thing."
+
+ String comparisons and regular expression comparisons are very
+different. For example,
+
+ x == "foo"
+
+has the value of one, or is true, if the variable `x' is precisely
+`foo'. By contrast,
+
+ x ~ /foo/
+
+has the value one if `x' contains `foo', such as `"Oh, what a fool am
+I!"'.
+
+ The right hand operand of the `~' and `!~' operators may be either a
+regexp constant (`/.../'), or an ordinary expression, in which case the
+value of the expression as a string is used as a dynamic regexp (*note
+How to Use Regular Expressions: Regexp Usage.; also *note Using Dynamic
+Regexps: Computed Regexps.).
+
+ In recent implementations of `awk', a constant regular expression in
+slashes by itself is also an expression. The regexp `/REGEXP/' is an
+abbreviation for this comparison expression:
+
+ $0 ~ /REGEXP/
+
+ One special place where `/foo/' is _not_ an abbreviation for `$0 ~
+/foo/' is when it is the right-hand operand of `~' or `!~'! *Note
+Using Regular Expression Constants: Using Constant Regexps, where this
+is discussed in more detail.
+
+
+File: gawk.info, Node: Boolean Ops, Next: Conditional Exp, Prev: Typing and Comparison, Up: Expressions
+
+Boolean Expressions
+===================
+
+ A "boolean expression" is a combination of comparison expressions or
+matching expressions, using the boolean operators "or" (`||'), "and"
+(`&&'), and "not" (`!'), along with parentheses to control nesting.
+The truth value of the boolean expression is computed by combining the
+truth values of the component expressions. Boolean expressions are
+also referred to as "logical expressions". The terms are equivalent.
+
+ Boolean expressions can be used wherever comparison and matching
+expressions can be used. They can be used in `if', `while', `do' and
+`for' statements (*note Control Statements in Actions: Statements.).
+They have numeric values (one if true, zero if false), which come into
+play if the result of the boolean expression is stored in a variable, or
+used in arithmetic.
+
+ In addition, every boolean expression is also a valid pattern, so
+you can use one as a pattern to control the execution of rules.
+
+ Here are descriptions of the three boolean operators, with examples.
+
+`BOOLEAN1 && BOOLEAN2'
+ True if both BOOLEAN1 and BOOLEAN2 are true. For example, the
+ following statement prints the current input record if it contains
+ both `2400' and `foo'.
+
+ if ($0 ~ /2400/ && $0 ~ /foo/) print
+
+ The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is true.
+ This can make a difference when BOOLEAN2 contains expressions that
+ have side effects: in the case of `$0 ~ /foo/ && ($2 == bar++)',
+ the variable `bar' is not incremented if there is no `foo' in the
+ record.
+
+`BOOLEAN1 || BOOLEAN2'
+ True if at least one of BOOLEAN1 or BOOLEAN2 is true. For
+ example, the following statement prints all records in the input
+ that contain _either_ `2400' or `foo', or both.
+
+ if ($0 ~ /2400/ || $0 ~ /foo/) print
+
+ The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is false.
+ This can make a difference when BOOLEAN2 contains expressions
+ that have side effects.
+
+`! BOOLEAN'
+ True if BOOLEAN is false. For example, the following program
+ prints all records in the input file `BBS-list' that do _not_
+ contain the string `foo'.
+
+ awk '{ if (! ($0 ~ /foo/)) print }' BBS-list
+
+ The `&&' and `||' operators are called "short-circuit" operators
+because of the way they work. Evaluation of the full expression is
+"short-circuited" if the result can be determined part way through its
+evaluation.
+
+ You can continue a statement that uses `&&' or `||' simply by
+putting a newline after them. But you cannot put a newline in front of
+either of these operators without using backslash continuation (*note
+`awk' Statements Versus Lines: Statements/Lines.).
+
+ The actual value of an expression using the `!' operator will be
+either one or zero, depending upon the truth value of the expression it
+is applied to.
+
+ The `!' operator is often useful for changing the sense of a flag
+variable from false to true and back again. For example, the following
+program is one way to print lines in between special bracketing lines:
+
+ $1 == "START" { interested = ! interested }
+ interested == 1 { print }
+ $1 == "END" { interested = ! interested }
+
+The variable `interested', like all `awk' variables, starts out
+initialized to zero, which is also false. When a line is seen whose
+first field is `START', the value of `interested' is toggled to true,
+using `!'. The next rule prints lines as long as `interested' is true.
+When a line is seen whose first field is `END', `interested' is toggled
+back to false.
+
+
+File: gawk.info, Node: Conditional Exp, Next: Function Calls, Prev: Boolean Ops, Up: Expressions
+
+Conditional Expressions
+=======================
+
+ A "conditional expression" is a special kind of expression with
+three operands. It allows you to use one expression's value to select
+one of two other expressions.
+
+ The conditional expression is the same as in the C language:
+
+ SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP
+
+There are three subexpressions. The first, SELECTOR, is always
+computed first. If it is "true" (not zero and not null) then
+IF-TRUE-EXP is computed next and its value becomes the value of the
+whole expression. Otherwise, IF-FALSE-EXP is computed next and its
+value becomes the value of the whole expression.
+
+ For example, this expression produces the absolute value of `x':
+
+ x > 0 ? x : -x
+
+ Each time the conditional expression is computed, exactly one of
+IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored. This
+is important when the expressions contain side effects. For example,
+this conditional expression examines element `i' of either array `a' or
+array `b', and increments `i'.
+
+ x == y ? a[i++] : b[i++]
+
+This is guaranteed to increment `i' exactly once, because each time
+only one of the two increment expressions is executed, and the other is
+not. *Note Arrays in `awk': Arrays, for more information about arrays.
+
+ As a minor `gawk' extension, you can continue a statement that uses
+`?:' simply by putting a newline after either character. However, you
+cannot put a newline in front of either character without using
+backslash continuation (*note `awk' Statements Versus Lines:
+Statements/Lines.).
+
+
+File: gawk.info, Node: Function Calls, Next: Precedence, Prev: Conditional Exp, Up: Expressions
+
+Function Calls
+==============
+
+ A "function" is a name for a particular calculation. Because it has
+a name, you can ask for it by name at any point in the program. For
+example, the function `sqrt' computes the square root of a number.
+
+ A fixed set of functions are "built-in", which means they are
+available in every `awk' program. The `sqrt' function is one of these.
+*Note Built-in Functions: Built-in, for a list of built-in functions
+and their descriptions. In addition, you can define your own functions
+for use in your program. *Note User-defined Functions: User-defined,
+for how to do this.
+
+ The way to use a function is with a "function call" expression,
+which consists of the function name followed immediately by a list of
+"arguments" in parentheses. The arguments are expressions which
+provide the raw materials for the function's calculations. When there
+is more than one argument, they are separated by commas. If there are
+no arguments, write just `()' after the function name. Here are some
+examples:
+
+ sqrt(x^2 + y^2) one argument
+ atan2(y, x) two arguments
+ rand() no arguments
+
+ *Do not put any space between the function name and the
+open-parenthesis!* A user-defined function name looks just like the
+name of a variable, and space would make the expression look like
+concatenation of a variable with an expression inside parentheses.
+Space before the parenthesis is harmless with built-in functions, but
+it is best not to get into the habit of using space to avoid mistakes
+with user-defined functions.
+
+ Each function expects a particular number of arguments. For
+example, the `sqrt' function must be called with a single argument, the
+number to take the square root of:
+
+ sqrt(ARGUMENT)
+
+ Some of the built-in functions allow you to omit the final argument.
+If you do so, they use a reasonable default. *Note Built-in Functions:
+Built-in, for full details. If arguments are omitted in calls to
+user-defined functions, then those arguments are treated as local
+variables, initialized to the empty string (*note User-defined
+Functions: User-defined.).
+
+ Like every other expression, the function call has a value, which is
+computed by the function based on the arguments you give it. In this
+example, the value of `sqrt(ARGUMENT)' is the square root of ARGUMENT.
+A function can also have side effects, such as assigning values to
+certain variables or doing I/O.
+
+ Here is a command to read numbers, one number per line, and print the
+square root of each one:
+
+ $ awk '{ print "The square root of", $1, "is", sqrt($1) }'
+ 1
+ -| The square root of 1 is 1
+ 3
+ -| The square root of 3 is 1.73205
+ 5
+ -| The square root of 5 is 2.23607
+ Control-d
+
+
+File: gawk.info, Node: Precedence, Prev: Function Calls, Up: Expressions
+
+Operator Precedence (How Operators Nest)
+========================================
+
+ "Operator precedence" determines how operators are grouped, when
+different operators appear close by in one expression. For example,
+`*' has higher precedence than `+'; thus, `a + b * c' means to multiply
+`b' and `c', and then add `a' to the product (i.e. `a + (b * c)').
+
+ You can overrule the precedence of the operators by using
+parentheses. You can think of the precedence rules as saying where the
+parentheses are assumed to be if you do not write parentheses yourself.
+In fact, it is wise to always use parentheses whenever you have an
+unusual combination of operators, because other people who read the
+program may not remember what the precedence is in this case. You
+might forget, too; then you could make a mistake. Explicit parentheses
+will help prevent any such mistake.
+
+ When operators of equal precedence are used together, the leftmost
+operator groups first, except for the assignment, conditional and
+exponentiation operators, which group in the opposite order. Thus, `a
+- b + c' groups as `(a - b) + c', and `a = b = c' groups as `a = (b =
+c)'.
+
+ The precedence of prefix unary operators does not matter as long as
+only unary operators are involved, because there is only one way to
+interpret them--innermost first. Thus, `$++i' means `$(++i)' and
+`++$x' means `++($x)'. However, when another operator follows the
+operand, then the precedence of the unary operators can matter. Thus,
+`$x^2' means `($x)^2', but `-x^2' means `-(x^2)', because `-' has lower
+precedence than `^' while `$' has higher precedence.
+
+ Here is a table of `awk''s operators, in order from highest
+precedence to lowest:
+
+`(...)'
+ Grouping.
+
+`$'
+ Field.
+
+`++ --'
+ Increment, decrement.
+
+`^ **'
+ Exponentiation. These operators group right-to-left. (The `**'
+ operator is not specified by POSIX.)
+
+`+ - !'
+ Unary plus, minus, logical "not".
+
+`* / %'
+ Multiplication, division, modulus.
+
+`+ -'
+ Addition, subtraction.
+
+`Concatenation'
+ No special token is used to indicate concatenation. The operands
+ are simply written side by side.
+
+`< <= == !='
+`> >= >> |'
+ Relational, and redirection. The relational operators and the
+ redirections have the same precedence level. Characters such as
+ `>' serve both as relationals and as redirections; the context
+ distinguishes between the two meanings.
+
+ Note that the I/O redirection operators in `print' and `printf'
+ statements belong to the statement level, not to expressions. The
+ redirection does not produce an expression which could be the
+ operand of another operator. As a result, it does not make sense
+ to use a redirection operator near another operator of lower
+ precedence, without parentheses. Such combinations, for example
+ `print foo > a ? b : c', result in syntax errors. The correct way
+ to write this statement is `print foo > (a ? b : c)'.
+
+`~ !~'
+ Matching, non-matching.
+
+`in'
+ Array membership.
+
+`&&'
+ Logical "and".
+
+`||'
+ Logical "or".
+
+`?:'
+ Conditional. This operator groups right-to-left.
+
+`= += -= *='
+`/= %= ^= **='
+ Assignment. These operators group right-to-left. (The `**='
+ operator is not specified by POSIX.)
+
+
+File: gawk.info, Node: Patterns and Actions, Next: Statements, Prev: Expressions, Up: Top
+
+Patterns and Actions
+********************
+
+ As you have already seen, each `awk' statement consists of a pattern
+with an associated action. This chapter describes how you build
+patterns and actions.
+
+* Menu:
+
+* Pattern Overview:: What goes into a pattern.
+* Action Overview:: What goes into an action.
+
+
+File: gawk.info, Node: Pattern Overview, Next: Action Overview, Prev: Patterns and Actions, Up: Patterns and Actions
+
+Pattern Elements
+================
+
+ Patterns in `awk' control the execution of rules: a rule is executed
+when its pattern matches the current input record. This section
+explains all about how to write patterns.
+
+* Menu:
+
+* Kinds of Patterns:: A list of all kinds of patterns.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a pattern.
+* Ranges:: Pairs of patterns specify record ranges.
+* BEGIN/END:: Specifying initialization and cleanup rules.
+* Empty:: The empty pattern, which matches every record.
+
+
+File: gawk.info, Node: Kinds of Patterns, Next: Regexp Patterns, Prev: Pattern Overview, Up: Pattern Overview
+
+Kinds of Patterns
+-----------------
+
+ Here is a summary of the types of patterns supported in `awk'.
+
+`/REGULAR EXPRESSION/'
+ A regular expression as a pattern. It matches when the text of the
+ input record fits the regular expression. (*Note Regular
+ Expressions: Regexp.)
+
+`EXPRESSION'
+ A single expression. It matches when its value is non-zero (if a
+ number) or non-null (if a string). (*Note Expressions as
+ Patterns: Expression Patterns.)
+
+`PAT1, PAT2'
+ A pair of patterns separated by a comma, specifying a range of
+ records. The range includes both the initial record that matches
+ PAT1, and the final record that matches PAT2. (*Note Specifying
+ Record Ranges with Patterns: Ranges.)
+
+`BEGIN'
+`END'
+ Special patterns for you to supply start-up or clean-up actions
+ for your `awk' program. (*Note The `BEGIN' and `END' Special
+ Patterns: BEGIN/END.)
+
+`EMPTY'
+ The empty pattern matches every input record. (*Note The Empty
+ Pattern: Empty.)
+
+
+File: gawk.info, Node: Regexp Patterns, Next: Expression Patterns, Prev: Kinds of Patterns, Up: Pattern Overview
+
+Regular Expressions as Patterns
+-------------------------------
+
+ We have been using regular expressions as patterns since our early
+examples. This kind of pattern is simply a regexp constant in the
+pattern part of a rule. Its meaning is `$0 ~ /PATTERN/'. The pattern
+matches when the input record matches the regexp. For example:
+
+ /foo|bar|baz/ { buzzwords++ }
+ END { print buzzwords, "buzzwords seen" }
+
+
+File: gawk.info, Node: Expression Patterns, Next: Ranges, Prev: Regexp Patterns, Up: Pattern Overview
+
+Expressions as Patterns
+-----------------------
+
+ Any `awk' expression is valid as an `awk' pattern. Then the pattern
+matches if the expression's value is non-zero (if a number) or non-null
+(if a string).
+
+ The expression is reevaluated each time the rule is tested against a
+new input record. If the expression uses fields such as `$1', the
+value depends directly on the new input record's text; otherwise, it
+depends only on what has happened so far in the execution of the `awk'
+program, but that may still be useful.
+
+ A very common kind of expression used as a pattern is the comparison
+expression, using the comparison operators described in *Note Variable
+Typing and Comparison Expressions: Typing and Comparison.
+
+ Regexp matching and non-matching are also very common expressions.
+The left operand of the `~' and `!~' operators is a string. The right
+operand is either a constant regular expression enclosed in slashes
+(`/REGEXP/'), or any expression, whose string value is used as a
+dynamic regular expression (*note Using Dynamic Regexps: Computed
+Regexps.).
+
+ The following example prints the second field of each input record
+whose first field is precisely `foo'.
+
+ $ awk '$1 == "foo" { print $2 }' BBS-list
+
+(There is no output, since there is no BBS site named "foo".) Contrast
+this with the following regular expression match, which would accept
+any record with a first field that contains `foo':
+
+ $ awk '$1 ~ /foo/ { print $2 }' BBS-list
+ -| 555-1234
+ -| 555-6699
+ -| 555-6480
+ -| 555-2127
+
+ Boolean expressions are also commonly used as patterns. Whether the
+pattern matches an input record depends on whether its subexpressions
+match.
+
+ For example, the following command prints all records in `BBS-list'
+that contain both `2400' and `foo'.
+
+ $ awk '/2400/ && /foo/' BBS-list
+ -| fooey 555-1234 2400/1200/300 B
+
+ The following command prints all records in `BBS-list' that contain
+_either_ `2400' or `foo', or both.
+
+ $ awk '/2400/ || /foo/' BBS-list
+ -| alpo-net 555-3412 2400/1200/300 A
+ -| bites 555-1675 2400/1200/300 A
+ -| fooey 555-1234 2400/1200/300 B
+ -| foot 555-6699 1200/300 B
+ -| macfoo 555-6480 1200/300 A
+ -| sdace 555-3430 2400/1200/300 A
+ -| sabafoo 555-2127 1200/300 C
+
+ The following command prints all records in `BBS-list' that do _not_
+contain the string `foo'.
+
+ $ awk '! /foo/' BBS-list
+ -| aardvark 555-5553 1200/300 B
+ -| alpo-net 555-3412 2400/1200/300 A
+ -| barfly 555-7685 1200/300 A
+ -| bites 555-1675 2400/1200/300 A
+ -| camelot 555-0542 300 C
+ -| core 555-2912 1200/300 C
+ -| sdace 555-3430 2400/1200/300 A
+
+ The subexpressions of a boolean operator in a pattern can be
+constant regular expressions, comparisons, or any other `awk'
+expressions. Range patterns are not expressions, so they cannot appear
+inside boolean patterns. Likewise, the special patterns `BEGIN' and
+`END', which never match any input record, are not expressions and
+cannot appear inside boolean patterns.
+
+ A regexp constant as a pattern is also a special case of an
+expression pattern. `/foo/' as an expression has the value one if `foo'
+appears in the current input record; thus, as a pattern, `/foo/'
+matches any record containing `foo'.
+
+
+File: gawk.info, Node: Ranges, Next: BEGIN/END, Prev: Expression Patterns, Up: Pattern Overview
+
+Specifying Record Ranges with Patterns
+--------------------------------------
+
+ A "range pattern" is made of two patterns separated by a comma, of
+the form `BEGPAT, ENDPAT'. It matches ranges of consecutive input
+records. The first pattern, BEGPAT, controls where the range begins,
+and the second one, ENDPAT, controls where it ends. For example,
+
+ awk '$1 == "on", $1 == "off"'
+
+prints every record between `on'/`off' pairs, inclusive.
+
+ A range pattern starts out by matching BEGPAT against every input
+record; when a record matches BEGPAT, the range pattern becomes "turned
+on". The range pattern matches this record. As long as it stays
+turned on, it automatically matches every input record read. It also
+matches ENDPAT against every input record; when that succeeds, the
+range pattern is turned off again for the following record. Then it
+goes back to checking BEGPAT against each record.
+
+ The record that turns on the range pattern and the one that turns it
+off both match the range pattern. If you don't want to operate on
+these records, you can write `if' statements in the rule's action to
+distinguish them from the records you are interested in.
+
+ It is possible for a pattern to be turned both on and off by the same
+record, if the record satisfies both conditions. Then the action is
+executed for just that record.
+
+ For example, suppose you have text between two identical markers (say
+the `%' symbol) that you wish to ignore. You might try to combine a
+range pattern that describes the delimited text with the `next'
+statement (not discussed yet, *note The `next' Statement: Next
+Statement.), which causes `awk' to skip any further processing of the
+current record and start over again with the next input record. Such a
+program would like this:
+
+ /^%$/,/^%$/ { next }
+ { print }
+
+This program fails because the range pattern is both turned on and
+turned off by the first line with just a `%' on it. To accomplish this
+task, you must write the program this way, using a flag:
+
+ /^%$/ { skip = ! skip; next }
+ skip == 1 { next } # skip lines with `skip' set
+
+ Note that in a range pattern, the `,' has the lowest precedence (is
+evaluated last) of all the operators. Thus, for example, the following
+program attempts to combine a range pattern with another, simpler test.
+
+ echo Yes | awk '/1/,/2/ || /Yes/'
+
+ The author of this program intended it to mean `(/1/,/2/) || /Yes/'.
+However, `awk' interprets this as `/1/, (/2/ || /Yes/)'. This cannot
+be changed or worked around; range patterns do not combine with other
+patterns.
+
+
+File: gawk.info, Node: BEGIN/END, Next: Empty, Prev: Ranges, Up: Pattern Overview
+
+The `BEGIN' and `END' Special Patterns
+--------------------------------------
+
+ `BEGIN' and `END' are special patterns. They are not used to match
+input records. Rather, they supply start-up or clean-up actions for
+your `awk' script.
+
+* Menu:
+
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+
+
+File: gawk.info, Node: Using BEGIN/END, Next: I/O And BEGIN/END, Prev: BEGIN/END, Up: BEGIN/END
+
+Startup and Cleanup Actions
+...........................
+
+ A `BEGIN' rule is executed, once, before the first input record has
+been read. An `END' rule is executed, once, after all the input has
+been read. For example:
+
+ $ awk '
+ > BEGIN { print "Analysis of \"foo\"" }
+ > /foo/ { ++n }
+ > END { print "\"foo\" appears " n " times." }' BBS-list
+ -| Analysis of "foo"
+ -| "foo" appears 4 times.
+
+ This program finds the number of records in the input file `BBS-list'
+that contain the string `foo'. The `BEGIN' rule prints a title for the
+report. There is no need to use the `BEGIN' rule to initialize the
+counter `n' to zero, as `awk' does this automatically (*note
+Variables::).
+
+ The second rule increments the variable `n' every time a record
+containing the pattern `foo' is read. The `END' rule prints the value
+of `n' at the end of the run.
+
+ The special patterns `BEGIN' and `END' cannot be used in ranges or
+with boolean operators (indeed, they cannot be used with any operators).
+
+ An `awk' program may have multiple `BEGIN' and/or `END' rules. They
+are executed in the order they appear, all the `BEGIN' rules at
+start-up and all the `END' rules at termination. `BEGIN' and `END'
+rules may be intermixed with other rules. This feature was added in
+the 1987 version of `awk', and is included in the POSIX standard. The
+original (1978) version of `awk' required you to put the `BEGIN' rule
+at the beginning of the program, and the `END' rule at the end, and
+only allowed one of each. This is no longer required, but it is a good
+idea in terms of program organization and readability.
+
+ Multiple `BEGIN' and `END' rules are useful for writing library
+functions, since each library file can have its own `BEGIN' and/or
+`END' rule to do its own initialization and/or cleanup. Note that the
+order in which library functions are named on the command line controls
+the order in which their `BEGIN' and `END' rules are executed.
+Therefore you have to be careful to write such rules in library files
+so that the order in which they are executed doesn't matter. *Note
+Command Line Options: Options, for more information on using library
+functions. *Note A Library of `awk' Functions: Library Functions, for
+a number of useful library functions.
+
+ If an `awk' program only has a `BEGIN' rule, and no other rules,
+then the program exits after the `BEGIN' rule has been run. (The
+original version of `awk' used to keep reading and ignoring input until
+end of file was seen.) However, if an `END' rule exists, then the
+input will be read, even if there are no other rules in the program.
+This is necessary in case the `END' rule checks the `FNR' and `NR'
+variables (d.c.).
+
+ `BEGIN' and `END' rules must have actions; there is no default
+action for these rules since there is no current record when they run.
+
+
+File: gawk.info, Node: I/O And BEGIN/END, Prev: Using BEGIN/END, Up: BEGIN/END
+
+Input/Output from `BEGIN' and `END' Rules
+.........................................
+
+ There are several (sometimes subtle) issues involved when doing I/O
+from a `BEGIN' or `END' rule.
+
+ The first has to do with the value of `$0' in a `BEGIN' rule. Since
+`BEGIN' rules are executed before any input is read, there simply is no
+input record, and therefore no fields, when executing `BEGIN' rules.
+References to `$0' and the fields yield a null string or zero,
+depending upon the context. One way to give `$0' a real value is to
+execute a `getline' command without a variable (*note Explicit Input
+with `getline': Getline.). Another way is to simply assign a value to
+it.
+
+ The second point is similar to the first, but from the other
+direction. Inside an `END' rule, what is the value of `$0' and `NF'?
+Traditionally, due largely to implementation issues, `$0' and `NF' were
+_undefined_ inside an `END' rule. The POSIX standard specified that
+`NF' was available in an `END' rule, containing the number of fields
+from the last input record. Due most probably to an oversight, the
+standard does not say that `$0' is also preserved, although logically
+one would think that it should be. In fact, `gawk' does preserve the
+value of `$0' for use in `END' rules. Be aware, however, that Unix
+`awk', and possibly other implementations, do not.
+
+ The third point follows from the first two. What is the meaning of
+`print' inside a `BEGIN' or `END' rule? The meaning is the same as
+always, `print $0'. If `$0' is the null string, then this prints an
+empty line. Many long time `awk' programmers use `print' in `BEGIN'
+and `END' rules, to mean `print ""', relying on `$0' being null. While
+you might generally get away with this in `BEGIN' rules, in `gawk' at
+least, it is a very bad idea in `END' rules. It is also poor style,
+since if you want an empty line in the output, you should say so
+explicitly in your program.
+
+
+File: gawk.info, Node: Empty, Prev: BEGIN/END, Up: Pattern Overview
+
+The Empty Pattern
+-----------------
+
+ An empty (i.e. non-existent) pattern is considered to match _every_
+input record. For example, the program:
+
+ awk '{ print $1 }' BBS-list
+
+prints the first field of every record.
+
+
+File: gawk.info, Node: Action Overview, Prev: Pattern Overview, Up: Patterns and Actions
+
+Overview of Actions
+===================
+
+ An `awk' program or script consists of a series of rules and
+function definitions, interspersed. (Functions are described later.
+*Note User-defined Functions: User-defined.)
+
+ A rule contains a pattern and an action, either of which (but not
+both) may be omitted. The purpose of the "action" is to tell `awk'
+what to do once a match for the pattern is found. Thus, in outline, an
+`awk' program generally looks like this:
+
+ [PATTERN] [{ ACTION }]
+ [PATTERN] [{ ACTION }]
+ ...
+ function NAME(ARGS) { ... }
+ ...
+
+ An action consists of one or more `awk' "statements", enclosed in
+curly braces (`{' and `}'). Each statement specifies one thing to be
+done. The statements are separated by newlines or semicolons.
+
+ The curly braces around an action must be used even if the action
+contains only one statement, or even if it contains no statements at
+all. However, if you omit the action entirely, omit the curly braces as
+well. An omitted action is equivalent to `{ print $0 }'.
+
+ /foo/ { } # match foo, do nothing - empty action
+ /foo/ # match foo, print the record - omitted action
+
+ Here are the kinds of statements supported in `awk':
+
+ * Expressions, which can call functions or assign values to variables
+ (*note Expressions::). Executing this kind of statement simply
+ computes the value of the expression. This is useful when the
+ expression has side effects (*note Assignment Expressions:
+ Assignment Ops.).
+
+ * Control statements, which specify the control flow of `awk'
+ programs. The `awk' language gives you C-like constructs (`if',
+ `for', `while', and `do') as well as a few special ones (*note
+ Control Statements in Actions: Statements.).
+
+ * Compound statements, which consist of one or more statements
+ enclosed in curly braces. A compound statement is used in order
+ to put several statements together in the body of an `if',
+ `while', `do' or `for' statement.
+
+ * Input statements, using the `getline' command (*note Explicit
+ Input with `getline': Getline.), the `next' statement (*note The
+ `next' Statement: Next Statement.), and the `nextfile' statement
+ (*note The `nextfile' Statement: Nextfile Statement.).
+
+ * Output statements, `print' and `printf'. *Note Printing Output:
+ Printing.
+
+ * Deletion statements, for deleting array elements. *Note The
+ `delete' Statement: Delete.
+
+
+File: gawk.info, Node: Statements, Next: Built-in Variables, Prev: Patterns and Actions, Up: Top
+
+Control Statements in Actions
+*****************************
+
+ "Control statements" such as `if', `while', and so on control the
+flow of execution in `awk' programs. Most of the control statements in
+`awk' are patterned on similar statements in C.
+
+ All the control statements start with special keywords such as `if'
+and `while', to distinguish them from simple expressions.
+
+ Many control statements contain other statements; for example, the
+`if' statement contains another statement which may or may not be
+executed. The contained statement is called the "body". If you want
+to include more than one statement in the body, group them into a
+single "compound statement" with curly braces, separating them with
+newlines or semicolons.
+
+* Menu:
+
+* If Statement:: Conditionally execute some `awk'
+ statements.
+* While Statement:: Loop until some condition is satisfied.
+* Do Statement:: Do specified action while looping until some
+ condition is satisfied.
+* For Statement:: Another looping statement, that provides
+ initialization and increment clauses.
+* Break Statement:: Immediately exit the innermost enclosing loop.
+* Continue Statement:: Skip to the end of the innermost enclosing
+ loop.
+* Next Statement:: Stop processing the current input record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of `awk'.
+
+
+File: gawk.info, Node: If Statement, Next: While Statement, Prev: Statements, Up: Statements
+
+The `if'-`else' Statement
+=========================
+
+ The `if'-`else' statement is `awk''s decision-making statement. It
+looks like this:
+
+ if (CONDITION) THEN-BODY [else ELSE-BODY]
+
+The CONDITION is an expression that controls what the rest of the
+statement will do. If CONDITION is true, THEN-BODY is executed;
+otherwise, ELSE-BODY is executed. The `else' part of the statement is
+optional. The condition is considered false if its value is zero or
+the null string, and true otherwise.
+
+ Here is an example:
+
+ if (x % 2 == 0)
+ print "x is even"
+ else
+ print "x is odd"
+
+ In this example, if the expression `x % 2 == 0' is true (that is,
+the value of `x' is evenly divisible by two), then the first `print'
+statement is executed, otherwise the second `print' statement is
+executed.
+
+ If the `else' appears on the same line as THEN-BODY, and THEN-BODY
+is not a compound statement (i.e. not surrounded by curly braces), then
+a semicolon must separate THEN-BODY from `else'. To illustrate this,
+let's rewrite the previous example:
+
+ if (x % 2 == 0) print "x is even"; else
+ print "x is odd"
+
+If you forget the `;', `awk' won't be able to interpret the statement,
+and you will get a syntax error.
+
+ We would not actually write this example this way, because a human
+reader might fail to see the `else' if it were not the first thing on
+its line.
+
+
+File: gawk.info, Node: While Statement, Next: Do Statement, Prev: If Statement, Up: Statements
+
+The `while' Statement
+=====================
+
+ In programming, a "loop" means a part of a program that can be
+executed two or more times in succession.
+
+ The `while' statement is the simplest looping statement in `awk'.
+It repeatedly executes a statement as long as a condition is true. It
+looks like this:
+
+ while (CONDITION)
+ BODY
+
+Here BODY is a statement that we call the "body" of the loop, and
+CONDITION is an expression that controls how long the loop keeps
+running.
+
+ The first thing the `while' statement does is test CONDITION. If
+CONDITION is true, it executes the statement BODY. (The CONDITION is
+true when the value is not zero and not a null string.) After BODY has
+been executed, CONDITION is tested again, and if it is still true, BODY
+is executed again. This process repeats until CONDITION is no longer
+true. If CONDITION is initially false, the body of the loop is never
+executed, and `awk' continues with the statement following the loop.
+
+ This example prints the first three fields of each record, one per
+line.
+
+ awk '{ i = 1
+ while (i <= 3) {
+ print $i
+ i++
+ }
+ }' inventory-shipped
+
+Here the body of the loop is a compound statement enclosed in braces,
+containing two statements.
+
+ The loop works like this: first, the value of `i' is set to one.
+Then, the `while' tests whether `i' is less than or equal to three.
+This is true when `i' equals one, so the `i'-th field is printed. Then
+the `i++' increments the value of `i' and the loop repeats. The loop
+terminates when `i' reaches four.
+
+ As you can see, a newline is not required between the condition and
+the body; but using one makes the program clearer unless the body is a
+compound statement or is very simple. The newline after the open-brace
+that begins the compound statement is not required either, but the
+program would be harder to read without it.
+
+
+File: gawk.info, Node: Do Statement, Next: For Statement, Prev: While Statement, Up: Statements
+
+The `do'-`while' Statement
+==========================
+
+ The `do' loop is a variation of the `while' looping statement. The
+`do' loop executes the BODY once, and then repeats BODY as long as
+CONDITION is true. It looks like this:
+
+ do
+ BODY
+ while (CONDITION)
+
+ Even if CONDITION is false at the start, BODY is executed at least
+once (and only once, unless executing BODY makes CONDITION true).
+Contrast this with the corresponding `while' statement:
+
+ while (CONDITION)
+ BODY
+
+This statement does not execute BODY even once if CONDITION is false to
+begin with.
+
+ Here is an example of a `do' statement:
+
+ awk '{ i = 1
+ do {
+ print $0
+ i++
+ } while (i <= 10)
+ }'
+
+This program prints each input record ten times. It isn't a very
+realistic example, since in this case an ordinary `while' would do just
+as well. But this reflects actual experience; there is only
+occasionally a real use for a `do' statement.
+
+
+File: gawk.info, Node: For Statement, Next: Break Statement, Prev: Do Statement, Up: Statements
+
+The `for' Statement
+===================
+
+ The `for' statement makes it more convenient to count iterations of a
+loop. The general form of the `for' statement looks like this:
+
+ for (INITIALIZATION; CONDITION; INCREMENT)
+ BODY
+
+The INITIALIZATION, CONDITION and INCREMENT parts are arbitrary `awk'
+expressions, and BODY stands for any `awk' statement.
+
+ The `for' statement starts by executing INITIALIZATION. Then, as
+long as CONDITION is true, it repeatedly executes BODY and then
+INCREMENT. Typically INITIALIZATION sets a variable to either zero or
+one, INCREMENT adds one to it, and CONDITION compares it against the
+desired number of iterations.
+
+ Here is an example of a `for' statement:
+
+ awk '{ for (i = 1; i <= 3; i++)
+ print $i
+ }' inventory-shipped
+
+This prints the first three fields of each input record, one field per
+line.
+
+ You cannot set more than one variable in the INITIALIZATION part
+unless you use a multiple assignment statement such as `x = y = 0',
+which is possible only if all the initial values are equal. (But you
+can initialize additional variables by writing their assignments as
+separate statements preceding the `for' loop.)
+
+ The same is true of the INCREMENT part; to increment additional
+variables, you must write separate statements at the end of the loop.
+The C compound expression, using C's comma operator, would be useful in
+this context, but it is not supported in `awk'.
+
+ Most often, INCREMENT is an increment expression, as in the example
+above. But this is not required; it can be any expression whatever.
+For example, this statement prints all the powers of two between one
+and 100:
+
+ for (i = 1; i <= 100; i *= 2)
+ print i
+
+ Any of the three expressions in the parentheses following the `for'
+may be omitted if there is nothing to be done there. Thus,
+`for (; x > 0;)' is equivalent to `while (x > 0)'. If the CONDITION is
+omitted, it is treated as TRUE, effectively yielding an "infinite loop"
+(i.e. a loop that will never terminate).
+
+ In most cases, a `for' loop is an abbreviation for a `while' loop,
+as shown here:
+
+ INITIALIZATION
+ while (CONDITION) {
+ BODY
+ INCREMENT
+ }
+
+The only exception is when the `continue' statement (*note The
+`continue' Statement: Continue Statement.) is used inside the loop;
+changing a `for' statement to a `while' statement in this way can
+change the effect of the `continue' statement inside the loop.
+
+ There is an alternate version of the `for' loop, for iterating over
+all the indices of an array:
+
+ for (i in array)
+ DO SOMETHING WITH array[i]
+
+*Note Scanning All Elements of an Array: Scanning an Array, for more
+information on this version of the `for' loop.
+
+ The `awk' language has a `for' statement in addition to a `while'
+statement because often a `for' loop is both less work to type and more
+natural to think of. Counting the number of iterations is very common
+in loops. It can be easier to think of this counting as part of
+looping rather than as something to do inside the loop.
+
+ The next section has more complicated examples of `for' loops.
+
+
+File: gawk.info, Node: Break Statement, Next: Continue Statement, Prev: For Statement, Up: Statements
+
+The `break' Statement
+=====================
+
+ The `break' statement jumps out of the innermost `for', `while', or
+`do' loop that encloses it. The following example finds the smallest
+divisor of any integer, and also identifies prime numbers:
+
+ awk '# find smallest divisor of num
+ { num = $1
+ for (div = 2; div*div <= num; div++)
+ if (num % div == 0)
+ break
+ if (num % div == 0)
+ printf "Smallest divisor of %d is %d\n", num, div
+ else
+ printf "%d is prime\n", num
+ }'
+
+ When the remainder is zero in the first `if' statement, `awk'
+immediately "breaks out" of the containing `for' loop. This means that
+`awk' proceeds immediately to the statement following the loop and
+continues processing. (This is very different from the `exit'
+statement which stops the entire `awk' program. *Note The `exit'
+Statement: Exit Statement.)
+
+ Here is another program equivalent to the previous one. It
+illustrates how the CONDITION of a `for' or `while' could just as well
+be replaced with a `break' inside an `if':
+
+ awk '# find smallest divisor of num
+ { num = $1
+ for (div = 2; ; div++) {
+ if (num % div == 0) {
+ printf "Smallest divisor of %d is %d\n", num, div
+ break
+ }
+ if (div*div > num) {
+ printf "%d is prime\n", num
+ break
+ }
+ }
+ }'
+
+ As described above, the `break' statement has no meaning when used
+outside the body of a loop. However, although it was never documented,
+historical implementations of `awk' have treated the `break' statement
+outside of a loop as if it were a `next' statement (*note The `next'
+Statement: Next Statement.). Recent versions of Unix `awk' no longer
+allow this usage. `gawk' will support this use of `break' only if
+`--traditional' has been specified on the command line (*note Command
+Line Options: Options.). Otherwise, it will be treated as an error,
+since the POSIX standard specifies that `break' should only be used
+inside the body of a loop (d.c.).
+
+
+File: gawk.info, Node: Continue Statement, Next: Next Statement, Prev: Break Statement, Up: Statements
+
+The `continue' Statement
+========================
+
+ The `continue' statement, like `break', is used only inside `for',
+`while', and `do' loops. It skips over the rest of the loop body,
+causing the next cycle around the loop to begin immediately. Contrast
+this with `break', which jumps out of the loop altogether.
+
+ The `continue' statement in a `for' loop directs `awk' to skip the
+rest of the body of the loop, and resume execution with the
+increment-expression of the `for' statement. The following program
+illustrates this fact:
+
+ awk 'BEGIN {
+ for (x = 0; x <= 20; x++) {
+ if (x == 5)
+ continue
+ printf "%d ", x
+ }
+ print ""
+ }'
+
+This program prints all the numbers from zero to 20, except for five,
+for which the `printf' is skipped. Since the increment `x++' is not
+skipped, `x' does not remain stuck at five. Contrast the `for' loop
+above with this `while' loop:
+
+ awk 'BEGIN {
+ x = 0
+ while (x <= 20) {
+ if (x == 5)
+ continue
+ printf "%d ", x
+ x++
+ }
+ print ""
+ }'
+
+This program loops forever once `x' gets to five.
+
+ As described above, the `continue' statement has no meaning when
+used outside the body of a loop. However, although it was never
+documented, historical implementations of `awk' have treated the
+`continue' statement outside of a loop as if it were a `next' statement
+(*note The `next' Statement: Next Statement.). Recent versions of Unix
+`awk' no longer allow this usage. `gawk' will support this use of
+`continue' only if `--traditional' has been specified on the command
+line (*note Command Line Options: Options.). Otherwise, it will be
+treated as an error, since the POSIX standard specifies that `continue'
+should only be used inside the body of a loop (d.c.).
+
+
+File: gawk.info, Node: Next Statement, Next: Nextfile Statement, Prev: Continue Statement, Up: Statements
+
+The `next' Statement
+====================
+
+ The `next' statement forces `awk' to immediately stop processing the
+current record and go on to the next record. This means that no
+further rules are executed for the current record. The rest of the
+current rule's action is not executed either.
+
+ Contrast this with the effect of the `getline' function (*note
+Explicit Input with `getline': Getline.). That too causes `awk' to
+read the next record immediately, but it does not alter the flow of
+control in any way. So the rest of the current action executes with a
+new input record.
+
+ At the highest level, `awk' program execution is a loop that reads
+an input record and then tests each rule's pattern against it. If you
+think of this loop as a `for' statement whose body contains the rules,
+then the `next' statement is analogous to a `continue' statement: it
+skips to the end of the body of this implicit loop, and executes the
+increment (which reads another record).
+
+ For example, if your `awk' program works only on records with four
+fields, and you don't want it to fail when given bad input, you might
+use this rule near the beginning of the program:
+
+ NF != 4 {
+ err = sprintf("%s:%d: skipped: NF != 4\n", FILENAME, FNR)
+ print err > "/dev/stderr"
+ next
+ }
+
+so that the following rules will not see the bad record. The error
+message is redirected to the standard error output stream, as error
+messages should be. *Note Special File Names in `gawk': Special Files.
+
+ According to the POSIX standard, the behavior is undefined if the
+`next' statement is used in a `BEGIN' or `END' rule. `gawk' will treat
+it as a syntax error. Although POSIX permits it, some other `awk'
+implementations don't allow the `next' statement inside function bodies
+(*note User-defined Functions: User-defined.). Just as any other
+`next' statement, a `next' inside a function body reads the next record
+and starts processing it with the first rule in the program.
+
+ If the `next' statement causes the end of the input to be reached,
+then the code in any `END' rules will be executed. *Note The `BEGIN'
+and `END' Special Patterns: BEGIN/END.
+
+ *Caution:* Some `awk' implementations generate a run-time error if
+you use the `next' statement inside a user-defined function (*note
+User-defined Functions: User-defined.). `gawk' does not have this
+problem.
+
+
+File: gawk.info, Node: Nextfile Statement, Next: Exit Statement, Prev: Next Statement, Up: Statements
+
+The `nextfile' Statement
+========================
+
+ `gawk' provides the `nextfile' statement, which is similar to the
+`next' statement. However, instead of abandoning processing of the
+current record, the `nextfile' statement instructs `gawk' to stop
+processing the current data file.
+
+ Upon execution of the `nextfile' statement, `FILENAME' is updated to
+the name of the next data file listed on the command line, `FNR' is
+reset to one, `ARGIND' is incremented, and processing starts over with
+the first rule in the progam. *Note Built-in Variables::.
+
+ If the `nextfile' statement causes the end of the input to be
+reached, then the code in any `END' rules will be executed. *Note The
+`BEGIN' and `END' Special Patterns: BEGIN/END.
+
+ The `nextfile' statement is a `gawk' extension; it is not
+(currently) available in any other `awk' implementation. *Note
+Implementing `nextfile' as a Function: Nextfile Function, for a
+user-defined function you can use to simulate the `nextfile' statement.
+
+ The `nextfile' statement would be useful if you have many data files
+to process, and you expect that you would not want to process every
+record in every file. Normally, in order to move on to the next data
+file, you would have to continue scanning the unwanted records. The
+`nextfile' statement accomplishes this much more efficiently.
+
+ *Caution:* Versions of `gawk' prior to 3.0 used two words (`next
+file') for the `nextfile' statement. This was changed in 3.0 to one
+word, since the treatment of `file' was inconsistent. When it appeared
+after `next', it was a keyword. Otherwise, it was a regular
+identifier. The old usage is still accepted. However, `gawk' will
+generate a warning message, and support for `next file' will eventually
+be discontinued in a future version of `gawk'.
+
+
+File: gawk.info, Node: Exit Statement, Prev: Nextfile Statement, Up: Statements
+
+The `exit' Statement
+====================
+
+ The `exit' statement causes `awk' to immediately stop executing the
+current rule and to stop processing input; any remaining input is
+ignored. It looks like this:
+
+ exit [RETURN CODE]
+
+ If an `exit' statement is executed from a `BEGIN' rule the program
+stops processing everything immediately. No input records are read.
+However, if an `END' rule is present, it is executed (*note The `BEGIN'
+and `END' Special Patterns: BEGIN/END.).
+
+ If `exit' is used as part of an `END' rule, it causes the program to
+stop immediately.
+
+ An `exit' statement that is not part of a `BEGIN' or `END' rule
+stops the execution of any further automatic rules for the current
+record, skips reading any remaining input records, and executes the
+`END' rule if there is one.
+
+ If you do not want the `END' rule to do its job in this case, you
+can set a variable to non-zero before the `exit' statement, and check
+that variable in the `END' rule. *Note Assertions: Assert Function,
+for an example that does this.
+
+ If an argument is supplied to `exit', its value is used as the exit
+status code for the `awk' process. If no argument is supplied, `exit'
+returns status zero (success). In the case where an argument is
+supplied to a first `exit' statement, and then `exit' is called a
+second time with no argument, the previously supplied exit value is
+used (d.c.).
+
+ For example, let's say you've discovered an error condition you
+really don't know how to handle. Conventionally, programs report this
+by exiting with a non-zero status. Your `awk' program can do this
+using an `exit' statement with a non-zero argument. Here is an example:
+
+ BEGIN {
+ if (("date" | getline date_now) < 0) {
+ print "Can't get system date" > "/dev/stderr"
+ exit 1
+ }
+ print "current date is", date_now
+ close("date")
+ }
+
+
+File: gawk.info, Node: Built-in Variables, Next: Arrays, Prev: Statements, Up: Top
+
+Built-in Variables
+******************
+
+ Most `awk' variables are available for you to use for your own
+purposes; they never change except when your program assigns values to
+them, and never affect anything except when your program examines them.
+However, a few variables in `awk' have special built-in meanings. Some
+of them `awk' examines automatically, so that they enable you to tell
+`awk' how to do certain things. Others are set automatically by `awk',
+so that they carry information from the internal workings of `awk' to
+your program.
+
+ This chapter documents all the built-in variables of `gawk'. Most
+of them are also documented in the chapters describing their areas of
+activity.
+
+* Menu:
+
+* User-modified:: Built-in variables that you change to control
+ `awk'.
+* Auto-set:: Built-in variables where `awk' gives you
+ information.
+* ARGC and ARGV:: Ways to use `ARGC' and `ARGV'.
+
+
+File: gawk.info, Node: User-modified, Next: Auto-set, Prev: Built-in Variables, Up: Built-in Variables
+
+Built-in Variables that Control `awk'
+=====================================
+
+ This is an alphabetical list of the variables which you can change to
+control how `awk' does certain things. Those variables that are
+specific to `gawk' are marked with an asterisk, `*'.
+
+`CONVFMT'
+ This string controls conversion of numbers to strings (*note
+ Conversion of Strings and Numbers: Conversion.). It works by
+ being passed, in effect, as the first argument to the `sprintf'
+ function (*note Built-in Functions for String Manipulation: String
+ Functions.). Its default value is `"%.6g"'. `CONVFMT' was
+ introduced by the POSIX standard.
+
+`FIELDWIDTHS *'
+ This is a space separated list of columns that tells `gawk' how to
+ split input with fixed, columnar boundaries. It is an
+ experimental feature. Assigning to `FIELDWIDTHS' overrides the
+ use of `FS' for field splitting. *Note Reading Fixed-width Data:
+ Constant Size, for more information.
+
+ If `gawk' is in compatibility mode (*note Command Line Options:
+ Options.), then `FIELDWIDTHS' has no special meaning, and field
+ splitting operations are done based exclusively on the value of
+ `FS'.
+
+`FS'
+ `FS' is the input field separator (*note Specifying How Fields are
+ Separated: Field Separators.). The value is a single-character
+ string or a multi-character regular expression that matches the
+ separations between fields in an input record. If the value is
+ the null string (`""'), then each character in the record becomes
+ a separate field.
+
+ The default value is `" "', a string consisting of a single space.
+ As a special exception, this value means that any sequence of
+ spaces, tabs, and/or newlines is a single separator.(1) It also
+ causes spaces, tabs, and newlines at the beginning and end of a
+ record to be ignored.
+
+ You can set the value of `FS' on the command line using the `-F'
+ option:
+
+ awk -F, 'PROGRAM' INPUT-FILES
+
+ If `gawk' is using `FIELDWIDTHS' for field-splitting, assigning a
+ value to `FS' will cause `gawk' to return to the normal,
+ `FS'-based, field splitting. An easy way to do this is to simply
+ say `FS = FS', perhaps with an explanatory comment.
+
+`IGNORECASE *'
+ If `IGNORECASE' is non-zero or non-null, then all string
+ comparisons, and all regular expression matching are
+ case-independent. Thus, regexp matching with `~' and `!~', and
+ the `gensub', `gsub', `index', `match', `split' and `sub'
+ functions, record termination with `RS', and field splitting with
+ `FS' all ignore case when doing their particular regexp operations.
+ *Note Case-sensitivity in Matching: Case-sensitivity.
+
+ If `gawk' is in compatibility mode (*note Command Line Options:
+ Options.), then `IGNORECASE' has no special meaning, and string
+ and regexp operations are always case-sensitive.
+
+`OFMT'
+ This string controls conversion of numbers to strings (*note
+ Conversion of Strings and Numbers: Conversion.) for printing with
+ the `print' statement. It works by being passed, in effect, as
+ the first argument to the `sprintf' function (*note Built-in
+ Functions for String Manipulation: String Functions.). Its
+ default value is `"%.6g"'. Earlier versions of `awk' also used
+ `OFMT' to specify the format for converting numbers to strings in
+ general expressions; this is now done by `CONVFMT'.
+
+`OFS'
+ This is the output field separator (*note Output Separators::).
+ It is output between the fields output by a `print' statement. Its
+ default value is `" "', a string consisting of a single space.
+
+`ORS'
+ This is the output record separator. It is output at the end of
+ every `print' statement. Its default value is `"\n"'. (*Note
+ Output Separators::.)
+
+`RS'
+ This is `awk''s input record separator. Its default value is a
+ string containing a single newline character, which means that an
+ input record consists of a single line of text. It can also be
+ the null string, in which case records are separated by runs of
+ blank lines, or a regexp, in which case records are separated by
+ matches of the regexp in the input text. (*Note How Input is
+ Split into Records: Records.)
+
+`SUBSEP'
+ `SUBSEP' is the subscript separator. It has the default value of
+ `"\034"', and is used to separate the parts of the indices of a
+ multi-dimensional array. Thus, the expression `foo["A", "B"]'
+ really accesses `foo["A\034B"]' (*note Multi-dimensional Arrays:
+ Multi-dimensional.).
+
+ ---------- Footnotes ----------
+
+ (1) In POSIX `awk', newline does not count as whitespace.
+
+
+File: gawk.info, Node: Auto-set, Next: ARGC and ARGV, Prev: User-modified, Up: Built-in Variables
+
+Built-in Variables that Convey Information
+==========================================
+
+ This is an alphabetical list of the variables that are set
+automatically by `awk' on certain occasions in order to provide
+information to your program. Those variables that are specific to
+`gawk' are marked with an asterisk, `*'.
+
+`ARGC'
+`ARGV'
+ The command-line arguments available to `awk' programs are stored
+ in an array called `ARGV'. `ARGC' is the number of command-line
+ arguments present. *Note Other Command Line Arguments: Other
+ Arguments. Unlike most `awk' arrays, `ARGV' is indexed from zero
+ to `ARGC' - 1. For example:
+
+ $ awk 'BEGIN {
+ > for (i = 0; i < ARGC; i++)
+ > print ARGV[i]
+ > }' inventory-shipped BBS-list
+ -| awk
+ -| inventory-shipped
+ -| BBS-list
+
+ In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains
+ `"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'. The
+ value of `ARGC' is three, one more than the index of the last
+ element in `ARGV', since the elements are numbered from zero.
+
+ The names `ARGC' and `ARGV', as well as the convention of indexing
+ the array from zero to `ARGC' - 1, are derived from the C
+ language's method of accessing command line arguments. *Note
+ Using `ARGC' and `ARGV': ARGC and ARGV, for information about how
+ `awk' uses these variables.
+
+`ARGIND *'
+ The index in `ARGV' of the current file being processed. Every
+ time `gawk' opens a new data file for processing, it sets `ARGIND'
+ to the index in `ARGV' of the file name. When `gawk' is
+ processing the input files, it is always true that `FILENAME ==
+ ARGV[ARGIND]'.
+
+ This variable is useful in file processing; it allows you to tell
+ how far along you are in the list of data files, and to
+ distinguish between successive instances of the same filename on
+ the command line.
+
+ While you can change the value of `ARGIND' within your `awk'
+ program, `gawk' will automatically set it to a new value when the
+ next file is opened.
+
+ This variable is a `gawk' extension. In other `awk'
+ implementations, or if `gawk' is in compatibility mode (*note
+ Command Line Options: Options.), it is not special.
+
+`ENVIRON'
+ An associative array that contains the values of the environment.
+ The array indices are the environment variable names; the values
+ are the values of the particular environment variables. For
+ example, `ENVIRON["HOME"]' might be `/home/arnold'. Changing this
+ array does not affect the environment passed on to any programs
+ that `awk' may spawn via redirection or the `system' function.
+ (In a future version of `gawk', it may do so.)
+
+ Some operating systems may not have environment variables. On
+ such systems, the `ENVIRON' array is empty (except for
+ `ENVIRON["AWKPATH"]').
+
+`ERRNO *'
+ If a system error occurs either doing a redirection for `getline',
+ during a read for `getline', or during a `close' operation, then
+ `ERRNO' will contain a string describing the error.
+
+ This variable is a `gawk' extension. In other `awk'
+ implementations, or if `gawk' is in compatibility mode (*note
+ Command Line Options: Options.), it is not special.
+
+`FILENAME'
+ This is the name of the file that `awk' is currently reading.
+ When no data files are listed on the command line, `awk' reads
+ from the standard input, and `FILENAME' is set to `"-"'.
+ `FILENAME' is changed each time a new file is read (*note Reading
+ Input Files: Reading Files.). Inside a `BEGIN' rule, the value of
+ `FILENAME' is `""', since there are no input files being processed
+ yet.(1) (d.c.)
+
+`FNR'
+ `FNR' is the current record number in the current file. `FNR' is
+ incremented each time a new record is read (*note Explicit Input
+ with `getline': Getline.). It is reinitialized to zero each time
+ a new input file is started.
+
+`NF'
+ `NF' is the number of fields in the current input record. `NF' is
+ set each time a new record is read, when a new field is created,
+ or when `$0' changes (*note Examining Fields: Fields.).
+
+`NR'
+ This is the number of input records `awk' has processed since the
+ beginning of the program's execution (*note How Input is Split
+ into Records: Records.). `NR' is set each time a new record is
+ read.
+
+`RLENGTH'
+ `RLENGTH' is the length of the substring matched by the `match'
+ function (*note Built-in Functions for String Manipulation: String
+ Functions.). `RLENGTH' is set by invoking the `match' function.
+ Its value is the length of the matched string, or -1 if no match
+ was found.
+
+`RSTART'
+ `RSTART' is the start-index in characters of the substring matched
+ by the `match' function (*note Built-in Functions for String
+ Manipulation: String Functions.). `RSTART' is set by invoking the
+ `match' function. Its value is the position of the string where
+ the matched substring starts, or zero if no match was found.
+
+`RT *'
+ `RT' is set each time a record is read. It contains the input text
+ that matched the text denoted by `RS', the record separator.
+
+ This variable is a `gawk' extension. In other `awk'
+ implementations, or if `gawk' is in compatibility mode (*note
+ Command Line Options: Options.), it is not special.
+
+ A side note about `NR' and `FNR'. `awk' simply increments both of
+these variables each time it reads a record, instead of setting them to
+the absolute value of the number of records read. This means that your
+program can change these variables, and their new values will be
+incremented for each record (d.c.). For example:
+
+ $ echo '1
+ > 2
+ > 3
+ > 4' | awk 'NR == 2 { NR = 17 }
+ > { print NR }'
+ -| 1
+ -| 17
+ -| 18
+ -| 19
+
+Before `FNR' was added to the `awk' language (*note Major Changes
+between V7 and SVR3.1: V7/SVR3.1.), many `awk' programs used this
+feature to track the number of records in a file by resetting `NR' to
+zero when `FILENAME' changed.
+
+ ---------- Footnotes ----------
+
+ (1) Some early implementations of Unix `awk' initialized `FILENAME'
+to `"-"', even if there were data files to be processed. This behavior
+was incorrect, and should not be relied upon in your programs.
+
+
+File: gawk.info, Node: ARGC and ARGV, Prev: Auto-set, Up: Built-in Variables
+
+Using `ARGC' and `ARGV'
+=======================
+
+ In *Note Built-in Variables that Convey Information: Auto-set, you
+saw this program describing the information contained in `ARGC' and
+`ARGV':
+
+ $ awk 'BEGIN {
+ > for (i = 0; i < ARGC; i++)
+ > print ARGV[i]
+ > }' inventory-shipped BBS-list
+ -| awk
+ -| inventory-shipped
+ -| BBS-list
+
+In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains
+`"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'.
+
+ Notice that the `awk' program is not entered in `ARGV'. The other
+special command line options, with their arguments, are also not
+entered. But variable assignments on the command line _are_ treated as
+arguments, and do show up in the `ARGV' array.
+
+ Your program can alter `ARGC' and the elements of `ARGV'. Each time
+`awk' reaches the end of an input file, it uses the next element of
+`ARGV' as the name of the next input file. By storing a different
+string there, your program can change which files are read. You can
+use `"-"' to represent the standard input. By storing additional
+elements and incrementing `ARGC' you can cause additional files to be
+read.
+
+ If you decrease the value of `ARGC', that eliminates input files
+from the end of the list. By recording the old value of `ARGC'
+elsewhere, your program can treat the eliminated arguments as something
+other than file names.
+
+ To eliminate a file from the middle of the list, store the null
+string (`""') into `ARGV' in place of the file's name. As a special
+feature, `awk' ignores file names that have been replaced with the null
+string. You may also use the `delete' statement to remove elements from
+`ARGV' (*note The `delete' Statement: Delete.).
+
+ All of these actions are typically done from the `BEGIN' rule,
+before actual processing of the input begins. *Note Splitting a Large
+File Into Pieces: Split Program, and see *Note Duplicating Output Into
+Multiple Files: Tee Program, for an example of each way of removing
+elements from `ARGV'.
+
+ The following fragment processes `ARGV' in order to examine, and
+then remove, command line options.
+
+ BEGIN {
+ for (i = 1; i < ARGC; i++) {
+ if (ARGV[i] == "-v")
+ verbose = 1
+ else if (ARGV[i] == "-d")
+ debug = 1
+ else if (ARGV[i] ~ /^-?/) {
+ e = sprintf("%s: unrecognized option -- %c",
+ ARGV[0], substr(ARGV[i], 1, ,1))
+ print e > "/dev/stderr"
+ } else
+ break
+ delete ARGV[i]
+ }
+ }
+
+
+File: gawk.info, Node: Arrays, Next: Built-in, Prev: Built-in Variables, Up: Top
+
+Arrays in `awk'
+***************
+
+ An "array" is a table of values, called "elements". The elements of
+an array are distinguished by their indices. "Indices" may be either
+numbers or strings. `awk' maintains a single set of names that may be
+used for naming variables, arrays and functions (*note User-defined
+Functions: User-defined.). Thus, you cannot have a variable and an
+array with the same name in the same `awk' program.
+
+* Menu:
+
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the `for' statement. It
+ loops through the indices of an array's
+ existing elements.
+* Delete:: The `delete' statement removes an element
+ from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ `awk'.
+* Uninitialized Subscripts:: Using Uninitialized variables as subscripts.
+* Multi-dimensional:: Emulating multi-dimensional arrays in
+ `awk'.
+* Multi-scanning:: Scanning multi-dimensional arrays.
+
+
+File: gawk.info, Node: Array Intro, Next: Reference to Elements, Prev: Arrays, Up: Arrays
+
+Introduction to Arrays
+======================
+
+ The `awk' language provides one-dimensional "arrays" for storing
+groups of related strings or numbers.
+
+ Every `awk' array must have a name. Array names have the same
+syntax as variable names; any valid variable name would also be a valid
+array name. But you cannot use one name in both ways (as an array and
+as a variable) in one `awk' program.
+
+ Arrays in `awk' superficially resemble arrays in other programming
+languages; but there are fundamental differences. In `awk', you don't
+need to specify the size of an array before you start to use it.
+Additionally, any number or string in `awk' may be used as an array
+index, not just consecutive integers.
+
+ In most other languages, you have to "declare" an array and specify
+how many elements or components it contains. In such languages, the
+declaration causes a contiguous block of memory to be allocated for that
+many elements. An index in the array usually must be a positive
+integer; for example, the index zero specifies the first element in the
+array, which is actually stored at the beginning of the block of
+memory. Index one specifies the second element, which is stored in
+memory right after the first element, and so on. It is impossible to
+add more elements to the array, because it has room for only as many
+elements as you declared. (Some languages allow arbitrary starting and
+ending indices, e.g., `15 .. 27', but the size of the array is still
+fixed when the array is declared.)
+
+ A contiguous array of four elements might look like this,
+conceptually, if the element values are eight, `"foo"', `""' and 30:
+
+ +---------+---------+--------+---------+
+ | 8 | "foo" | "" | 30 | value
+ +---------+---------+--------+---------+
+ 0 1 2 3 index
+
+Only the values are stored; the indices are implicit from the order of
+the values. Eight is the value at index zero, because eight appears in
+the position with zero elements before it.
+
+ Arrays in `awk' are different: they are "associative". This means
+that each array is a collection of pairs: an index, and its
+corresponding array element value:
+
+ Element 4 Value 30
+ Element 2 Value "foo"
+ Element 1 Value 8
+ Element 3 Value ""
+
+We have shown the pairs in jumbled order because their order is
+irrelevant.
+
+ One advantage of associative arrays is that new pairs can be added
+at any time. For example, suppose we add to the above array a tenth
+element whose value is `"number ten"'. The result is this:
+
+ Element 10 Value "number ten"
+ Element 4 Value 30
+ Element 2 Value "foo"
+ Element 1 Value 8
+ Element 3 Value ""
+
+Now the array is "sparse", which just means some indices are missing:
+it has elements 1-4 and 10, but doesn't have elements 5, 6, 7, 8, or 9.
+
+ Another consequence of associative arrays is that the indices don't
+have to be positive integers. Any number, or even a string, can be an
+index. For example, here is an array which translates words from
+English into French:
+
+ Element "dog" Value "chien"
+ Element "cat" Value "chat"
+ Element "one" Value "un"
+ Element 1 Value "un"
+
+Here we decided to translate the number one in both spelled-out and
+numeric form--thus illustrating that a single array can have both
+numbers and strings as indices. (In fact, array subscripts are always
+strings; this is discussed in more detail in *Note Using Numbers to
+Subscript Arrays: Numeric Array Subscripts.)
+
+ When `awk' creates an array for you, e.g., with the `split' built-in
+function, that array's indices are consecutive integers starting at one.
+(*Note Built-in Functions for String Manipulation: String Functions.)
+
+
+File: gawk.info, Node: Reference to Elements, Next: Assigning Elements, Prev: Array Intro, Up: Arrays
+
+Referring to an Array Element
+=============================
+
+ The principal way of using an array is to refer to one of its
+elements. An array reference is an expression which looks like this:
+
+ ARRAY[INDEX]
+
+Here, ARRAY is the name of an array. The expression INDEX is the index
+of the element of the array that you want.
+
+ The value of the array reference is the current value of that array
+element. For example, `foo[4.3]' is an expression for the element of
+array `foo' at index `4.3'.
+
+ If you refer to an array element that has no recorded value, the
+value of the reference is `""', the null string. This includes elements
+to which you have not assigned any value, and elements that have been
+deleted (*note The `delete' Statement: Delete.). Such a reference
+automatically creates that array element, with the null string as its
+value. (In some cases, this is unfortunate, because it might waste
+memory inside `awk'.)
+
+ You can find out if an element exists in an array at a certain index
+with the expression:
+
+ INDEX in ARRAY
+
+This expression tests whether or not the particular index exists,
+without the side effect of creating that element if it is not present.
+The expression has the value one (true) if `ARRAY[INDEX]' exists, and
+zero (false) if it does not exist.
+
+ For example, to test whether the array `frequencies' contains the
+index `2', you could write this statement:
+
+ if (2 in frequencies)
+ print "Subscript 2 is present."
+
+ Note that this is _not_ a test of whether or not the array
+`frequencies' contains an element whose _value_ is two. (There is no
+way to do that except to scan all the elements.) Also, this _does not_
+create `frequencies[2]', while the following (incorrect) alternative
+would do so:
+
+ if (frequencies[2] != "")
+ print "Subscript 2 is present."
+
+
+File: gawk.info, Node: Assigning Elements, Next: Array Example, Prev: Reference to Elements, Up: Arrays
+
+Assigning Array Elements
+========================
+
+ Array elements are lvalues: they can be assigned values just like
+`awk' variables:
+
+ ARRAY[SUBSCRIPT] = VALUE
+
+Here ARRAY is the name of your array. The expression SUBSCRIPT is the
+index of the element of the array that you want to assign a value. The
+expression VALUE is the value you are assigning to that element of the
+array.
+
+
+File: gawk.info, Node: Array Example, Next: Scanning an Array, Prev: Assigning Elements, Up: Arrays
+
+Basic Array Example
+===================
+
+ The following program takes a list of lines, each beginning with a
+line number, and prints them out in order of line number. The line
+numbers are not in order, however, when they are first read: they are
+scrambled. This program sorts the lines by making an array using the
+line numbers as subscripts. It then prints out the lines in sorted
+order of their numbers. It is a very simple program, and gets confused
+if it encounters repeated numbers, gaps, or lines that don't begin with
+a number.
+
+ {
+ if ($1 > max)
+ max = $1
+ arr[$1] = $0
+ }
+
+ END {
+ for (x = 1; x <= max; x++)
+ print arr[x]
+ }
+
+ The first rule keeps track of the largest line number seen so far;
+it also stores each line into the array `arr', at an index that is the
+line's number.
+
+ The second rule runs after all the input has been read, to print out
+all the lines.
+
+ When this program is run with the following input:
+
+ 5 I am the Five man
+ 2 Who are you? The new number two!
+ 4 . . . And four on the floor
+ 1 Who is number one?
+ 3 I three you.
+
+its output is this:
+
+ 1 Who is number one?
+ 2 Who are you? The new number two!
+ 3 I three you.
+ 4 . . . And four on the floor
+ 5 I am the Five man
+
+ If a line number is repeated, the last line with a given number
+overrides the others.
+
+ Gaps in the line numbers can be handled with an easy improvement to
+the program's `END' rule:
+
+ END {
+ for (x = 1; x <= max; x++)
+ if (x in arr)
+ print arr[x]
+ }
+
+
+File: gawk.info, Node: Scanning an Array, Next: Delete, Prev: Array Example, Up: Arrays
+
+Scanning All Elements of an Array
+=================================
+
+ In programs that use arrays, you often need a loop that executes
+once for each element of an array. In other languages, where arrays are
+contiguous and indices are limited to positive integers, this is easy:
+you can find all the valid indices by counting from the lowest index up
+to the highest. This technique won't do the job in `awk', since any
+number or string can be an array index. So `awk' has a special kind of
+`for' statement for scanning an array:
+
+ for (VAR in ARRAY)
+ BODY
+
+This loop executes BODY once for each index in ARRAY that your program
+has previously used, with the variable VAR set to that index.
+
+ Here is a program that uses this form of the `for' statement. The
+first rule scans the input records and notes which words appear (at
+least once) in the input, by storing a one into the array `used' with
+the word as index. The second rule scans the elements of `used' to
+find all the distinct words that appear in the input. It prints each
+word that is more than 10 characters long, and also prints the number of
+such words. *Note Built-in Functions for String Manipulation: String
+Functions, for more information on the built-in function `length'.
+
+ # Record a 1 for each word that is used at least once.
+ {
+ for (i = 1; i <= NF; i++)
+ used[$i] = 1
+ }
+
+ # Find number of distinct words more than 10 characters long.
+ END {
+ for (x in used)
+ if (length(x) > 10) {
+ ++num_long_words
+ print x
+ }
+ print num_long_words, "words longer than 10 characters"
+ }
+
+*Note Generating Word Usage Counts: Word Sorting, for a more detailed
+example of this type.
+
+ The order in which elements of the array are accessed by this
+statement is determined by the internal arrangement of the array
+elements within `awk' and cannot be controlled or changed. This can
+lead to problems if new elements are added to ARRAY by statements in
+the loop body; you cannot predict whether or not the `for' loop will
+reach them. Similarly, changing VAR inside the loop may produce
+strange results. It is best to avoid such things.
+
+
+File: gawk.info, Node: Delete, Next: Numeric Array Subscripts, Prev: Scanning an Array, Up: Arrays
+
+The `delete' Statement
+======================
+
+ You can remove an individual element of an array using the `delete'
+statement:
+
+ delete ARRAY[INDEX]
+
+ Once you have deleted an array element, you can no longer obtain any
+value the element once had. It is as if you had never referred to it
+and had never given it any value.
+
+ Here is an example of deleting elements in an array:
+
+ for (i in frequencies)
+ delete frequencies[i]
+
+This example removes all the elements from the array `frequencies'.
+
+ If you delete an element, a subsequent `for' statement to scan the
+array will not report that element, and the `in' operator to check for
+the presence of that element will return zero (i.e. false):
+
+ delete foo[4]
+ if (4 in foo)
+ print "This will never be printed"
+
+ It is important to note that deleting an element is _not_ the same
+as assigning it a null value (the empty string, `""').
+
+ foo[4] = ""
+ if (4 in foo)
+ print "This is printed, even though foo[4] is empty"
+
+ It is not an error to delete an element that does not exist.
+
+ You can delete all the elements of an array with a single statement,
+by leaving off the subscript in the `delete' statement.
+
+ delete ARRAY
+
+ This ability is a `gawk' extension; it is not available in
+compatibility mode (*note Command Line Options: Options.).
+
+ Using this version of the `delete' statement is about three times
+more efficient than the equivalent loop that deletes each element one
+at a time.
+
+ The following statement provides a portable, but non-obvious way to
+clear out an array.
+
+ # thanks to Michael Brennan for pointing this out
+ split("", array)
+
+ The `split' function (*note Built-in Functions for String
+Manipulation: String Functions.) clears out the target array first.
+This call asks it to split apart the null string. Since there is no
+data to split out, the function simply clears the array and then
+returns.
+
+
+File: gawk.info, Node: Numeric Array Subscripts, Next: Uninitialized Subscripts, Prev: Delete, Up: Arrays
+
+Using Numbers to Subscript Arrays
+=================================
+
+ An important aspect of arrays to remember is that _array subscripts
+are always strings_. If you use a numeric value as a subscript, it
+will be converted to a string value before it is used for subscripting
+(*note Conversion of Strings and Numbers: Conversion.).
+
+ This means that the value of the built-in variable `CONVFMT' can
+potentially affect how your program accesses elements of an array. For
+example:
+
+ xyz = 12.153
+ data[xyz] = 1
+ CONVFMT = "%2.2f"
+ if (xyz in data)
+ printf "%s is in data\n", xyz
+ else
+ printf "%s is not in data\n", xyz
+
+This prints `12.15 is not in data'. The first statement gives `xyz' a
+numeric value. Assigning to `data[xyz]' subscripts `data' with the
+string value `"12.153"' (using the default conversion value of
+`CONVFMT', `"%.6g"'), and assigns one to `data["12.153"]'. The program
+then changes the value of `CONVFMT'. The test `(xyz in data)'
+generates a new string value from `xyz', this time `"12.15"', since the
+value of `CONVFMT' only allows two significant digits. This test fails,
+since `"12.15"' is a different string from `"12.153"'.
+
+ According to the rules for conversions (*note Conversion of Strings
+and Numbers: Conversion.), integer values are always converted to
+strings as integers, no matter what the value of `CONVFMT' may happen
+to be. So the usual case of:
+
+ for (i = 1; i <= maxsub; i++)
+ do something with array[i]
+
+will work, no matter what the value of `CONVFMT'.
+
+ Like many things in `awk', the majority of the time things work as
+you would expect them to work. But it is useful to have a precise
+knowledge of the actual rules, since sometimes they can have a subtle
+effect on your programs.
+
+
+File: gawk.info, Node: Uninitialized Subscripts, Next: Multi-dimensional, Prev: Numeric Array Subscripts, Up: Arrays
+
+Using Uninitialized Variables as Subscripts
+===========================================
+
+ Suppose you want to print your input data in reverse order. A
+reasonable attempt at a program to do so (with some test data) might
+look like this:
+
+ $ echo 'line 1
+ > line 2
+ > line 3' | awk '{ l[lines] = $0; ++lines }
+ > END {
+ > for (i = lines-1; i >= 0; --i)
+ > print l[i]
+ > }'
+ -| line 3
+ -| line 2
+
+ Unfortunately, the very first line of input data did not come out in
+the output!
+
+ At first glance, this program should have worked. The variable
+`lines' is uninitialized, and uninitialized variables have the numeric
+value zero. So, the value of `l[0]' should have been printed.
+
+ The issue here is that subscripts for `awk' arrays are *always*
+strings. And uninitialized variables, when used as strings, have the
+value `""', not zero. Thus, `line 1' ended up stored in `l[""]'.
+
+ The following version of the program works correctly:
+
+ { l[lines++] = $0 }
+ END {
+ for (i = lines - 1; i >= 0; --i)
+ print l[i]
+ }
+
+ Here, the `++' forces `l' to be numeric, thus making the "old value"
+numeric zero, which is then converted to `"0"' as the array subscript.
+
+ As we have just seen, even though it is somewhat unusual, the null
+string (`""') is a valid array subscript (d.c.). If `--lint' is provided
+on the command line (*note Command Line Options: Options.), `gawk' will
+warn about the use of the null string as a subscript.
+
+
+File: gawk.info, Node: Multi-dimensional, Next: Multi-scanning, Prev: Uninitialized Subscripts, Up: Arrays
+
+Multi-dimensional Arrays
+========================
+
+ A multi-dimensional array is an array in which an element is
+identified by a sequence of indices, instead of a single index. For
+example, a two-dimensional array requires two indices. The usual way
+(in most languages, including `awk') to refer to an element of a
+two-dimensional array named `grid' is with `grid[X,Y]'.
+
+ Multi-dimensional arrays are supported in `awk' through
+concatenation of indices into one string. What happens is that `awk'
+converts the indices into strings (*note Conversion of Strings and
+Numbers: Conversion.) and concatenates them together, with a separator
+between them. This creates a single string that describes the values
+of the separate indices. The combined string is used as a single index
+into an ordinary, one-dimensional array. The separator used is the
+value of the built-in variable `SUBSEP'.
+
+ For example, suppose we evaluate the expression `foo[5,12] = "value"'
+when the value of `SUBSEP' is `"@"'. The numbers five and 12 are
+converted to strings and concatenated with an `@' between them,
+yielding `"5@12"'; thus, the array element `foo["5@12"]' is set to
+`"value"'.
+
+ Once the element's value is stored, `awk' has no record of whether
+it was stored with a single index or a sequence of indices. The two
+expressions `foo[5,12]' and `foo[5 SUBSEP 12]' are always equivalent.
+
+ The default value of `SUBSEP' is the string `"\034"', which contains
+a non-printing character that is unlikely to appear in an `awk' program
+or in most input data.
+
+ The usefulness of choosing an unlikely character comes from the fact
+that index values that contain a string matching `SUBSEP' lead to
+combined strings that are ambiguous. Suppose that `SUBSEP' were `"@"';
+then `foo["a@b", "c"]' and `foo["a", "b@c"]' would be indistinguishable
+because both would actually be stored as `foo["a@b@c"]'.
+
+ You can test whether a particular index-sequence exists in a
+"multi-dimensional" array with the same operator `in' used for single
+dimensional arrays. Instead of a single index as the left-hand operand,
+write the whole sequence of indices, separated by commas, in
+parentheses:
+
+ (SUBSCRIPT1, SUBSCRIPT2, ...) in ARRAY
+
+ The following example treats its input as a two-dimensional array of
+fields; it rotates this array 90 degrees clockwise and prints the
+result. It assumes that all lines have the same number of elements.
+
+ awk '{
+ if (max_nf < NF)
+ max_nf = NF
+ max_nr = NR
+ for (x = 1; x <= NF; x++)
+ vector[x, NR] = $x
+ }
+
+ END {
+ for (x = 1; x <= max_nf; x++) {
+ for (y = max_nr; y >= 1; --y)
+ printf("%s ", vector[x, y])
+ printf("\n")
+ }
+ }'
+
+When given the input:
+
+ 1 2 3 4 5 6
+ 2 3 4 5 6 1
+ 3 4 5 6 1 2
+ 4 5 6 1 2 3
+
+it produces:
+
+ 4 3 2 1
+ 5 4 3 2
+ 6 5 4 3
+ 1 6 5 4
+ 2 1 6 5
+ 3 2 1 6
+
+
+File: gawk.info, Node: Multi-scanning, Prev: Multi-dimensional, Up: Arrays
+
+Scanning Multi-dimensional Arrays
+=================================
+
+ There is no special `for' statement for scanning a
+"multi-dimensional" array; there cannot be one, because in truth there
+are no multi-dimensional arrays or elements; there is only a
+multi-dimensional _way of accessing_ an array.
+
+ However, if your program has an array that is always accessed as
+multi-dimensional, you can get the effect of scanning it by combining
+the scanning `for' statement (*note Scanning All Elements of an Array:
+Scanning an Array.) with the `split' built-in function (*note Built-in
+Functions for String Manipulation: String Functions.). It works like
+this:
+
+ for (combined in array) {
+ split(combined, separate, SUBSEP)
+ ...
+ }
+
+This sets `combined' to each concatenated, combined index in the array,
+and splits it into the individual indices by breaking it apart where
+the value of `SUBSEP' appears. The split-out indices become the
+elements of the array `separate'.
+
+ Thus, suppose you have previously stored a value in `array[1,
+"foo"]'; then an element with index `"1\034foo"' exists in `array'.
+(Recall that the default value of `SUBSEP' is the character with code
+034.) Sooner or later the `for' statement will find that index and do
+an iteration with `combined' set to `"1\034foo"'. Then the `split'
+function is called as follows:
+
+ split("1\034foo", separate, "\034")
+
+The result of this is to set `separate[1]' to `"1"' and `separate[2]'
+to `"foo"'. Presto, the original sequence of separate indices has been
+recovered.
+
+
+File: gawk.info, Node: Built-in, Next: User-defined, Prev: Arrays, Up: Top
+
+Built-in Functions
+******************
+
+ "Built-in" functions are functions that are always available for
+your `awk' program to call. This chapter defines all the built-in
+functions in `awk'; some of them are mentioned in other sections, but
+they are summarized here for your convenience. (You can also define
+new functions yourself. *Note User-defined Functions: User-defined.)
+
+* Menu:
+
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers, including
+ `int', `sin' and `rand'.
+* String Functions:: Functions for string manipulation, such as
+ `split', `match', and
+ `sprintf'.
+* I/O Functions:: Functions for files and shell commands.
+* Time Functions:: Functions for dealing with time stamps.
+
+
+File: gawk.info, Node: Calling Built-in, Next: Numeric Functions, Prev: Built-in, Up: Built-in
+
+Calling Built-in Functions
+==========================
+
+ To call a built-in function, write the name of the function followed
+by arguments in parentheses. For example, `atan2(y + z, 1)' is a call
+to the function `atan2', with two arguments.
+
+ Whitespace is ignored between the built-in function name and the
+open-parenthesis, but we recommend that you avoid using whitespace
+there. User-defined functions do not permit whitespace in this way, and
+you will find it easier to avoid mistakes by following a simple
+convention which always works: no whitespace after a function name.
+
+ Each built-in function accepts a certain number of arguments. In
+some cases, arguments can be omitted. The defaults for omitted
+arguments vary from function to function and are described under the
+individual functions. In some `awk' implementations, extra arguments
+given to built-in functions are ignored. However, in `gawk', it is a
+fatal error to give extra arguments to a built-in function.
+
+ When a function is called, expressions that create the function's
+actual parameters are evaluated completely before the function call is
+performed. For example, in the code fragment:
+
+ i = 4
+ j = sqrt(i++)
+
+the variable `i' is set to five before `sqrt' is called with a value of
+four for its actual parameter.
+
+ The order of evaluation of the expressions used for the function's
+parameters is undefined. Thus, you should not write programs that
+assume that parameters are evaluated from left to right or from right
+to left. For example,
+
+ i = 5
+ j = atan2(i++, i *= 2)
+
+ If the order of evaluation is left to right, then `i' first becomes
+six, and then 12, and `atan2' is called with the two arguments six and
+12. But if the order of evaluation is right to left, `i' first becomes
+10, and then 11, and `atan2' is called with the two arguments 11 and 10.
+
+
+File: gawk.info, Node: Numeric Functions, Next: String Functions, Prev: Calling Built-in, Up: Built-in
+
+Numeric Built-in Functions
+==========================
+
+ Here is a full list of built-in functions that work with numbers.
+Optional parameters are enclosed in square brackets ("[" and "]").
+
+`int(X)'
+ This produces the nearest integer to X, located between X and zero,
+ truncated toward zero.
+
+ For example, `int(3)' is three, `int(3.9)' is three, `int(-3.9)'
+ is -3, and `int(-3)' is -3 as well.
+
+`sqrt(X)'
+ This gives you the positive square root of X. It reports an error
+ if X is negative. Thus, `sqrt(4)' is two.
+
+`exp(X)'
+ This gives you the exponential of X (`e ^ X'), or reports an error
+ if X is out of range. The range of values X can have depends on
+ your machine's floating point representation.
+
+`log(X)'
+ This gives you the natural logarithm of X, if X is positive;
+ otherwise, it reports an error.
+
+`sin(X)'
+ This gives you the sine of X, with X in radians.
+
+`cos(X)'
+ This gives you the cosine of X, with X in radians.
+
+`atan2(Y, X)'
+ This gives you the arctangent of `Y / X' in radians.
+
+`rand()'
+ This gives you a random number. The values of `rand' are
+ uniformly-distributed between zero and one. The value is never
+ zero and never one.
+
+ Often you want random integers instead. Here is a user-defined
+ function you can use to obtain a random non-negative integer less
+ than N:
+
+ function randint(n) {
+ return int(n * rand())
+ }
+
+ The multiplication produces a random real number greater than zero
+ and less than `n'. We then make it an integer (using `int')
+ between zero and `n' - 1, inclusive.
+
+ Here is an example where a similar function is used to produce
+ random integers between one and N. This program prints a new
+ random number for each input record.
+
+ awk '
+ # Function to roll a simulated die.
+ function roll(n) { return 1 + int(rand() * n) }
+
+ # Roll 3 six-sided dice and
+ # print total number of points.
+ {
+ printf("%d points\n",
+ roll(6)+roll(6)+roll(6))
+ }'
+
+ *Caution:* In most `awk' implementations, including `gawk', `rand'
+ starts generating numbers from the same starting number, or
+ "seed", each time you run `awk'. Thus, a program will generate
+ the same results each time you run it. The numbers are random
+ within one `awk' run, but predictable from run to run. This is
+ convenient for debugging, but if you want a program to do
+ different things each time it is used, you must change the seed to
+ a value that will be different in each run. To do this, use
+ `srand'.
+
+`srand([X])'
+ The function `srand' sets the starting point, or seed, for
+ generating random numbers to the value X.
+
+ Each seed value leads to a particular sequence of random
+ numbers.(1) Thus, if you set the seed to the same value a second
+ time, you will get the same sequence of random numbers again.
+
+ If you omit the argument X, as in `srand()', then the current date
+ and time of day are used for a seed. This is the way to get random
+ numbers that are truly unpredictable.
+
+ The return value of `srand' is the previous seed. This makes it
+ easy to keep track of the seeds for use in consistently reproducing
+ sequences of random numbers.
+
+ ---------- Footnotes ----------
+
+ (1) Computer generated random numbers really are not truly random.
+They are technically known as "pseudo-random." This means that while
+the numbers in a sequence appear to be random, you can in fact generate
+the same sequence of random numbers over and over again.
+
+
+File: gawk.info, Node: String Functions, Next: I/O Functions, Prev: Numeric Functions, Up: Built-in
+
+Built-in Functions for String Manipulation
+==========================================
+
+ The functions in this section look at or change the text of one or
+more strings. Optional parameters are enclosed in square brackets ("["
+and "]").
+
+`index(IN, FIND)'
+ This searches the string IN for the first occurrence of the string
+ FIND, and returns the position in characters where that occurrence
+ begins in the string IN. For example:
+
+ $ awk 'BEGIN { print index("peanut", "an") }'
+ -| 3
+
+ If FIND is not found, `index' returns zero. (Remember that string
+ indices in `awk' start at one.)
+
+`length([STRING])'
+ This gives you the number of characters in STRING. If STRING is a
+ number, the length of the digit string representing that number is
+ returned. For example, `length("abcde")' is five. By contrast,
+ `length(15 * 35)' works out to three. How? Well, 15 * 35 = 525,
+ and 525 is then converted to the string `"525"', which has three
+ characters.
+
+ If no argument is supplied, `length' returns the length of `$0'.
+
+ In older versions of `awk', you could call the `length' function
+ without any parentheses. Doing so is marked as "deprecated" in the
+ POSIX standard. This means that while you can do this in your
+ programs, it is a feature that can eventually be removed from a
+ future version of the standard. Therefore, for maximal
+ portability of your `awk' programs, you should always supply the
+ parentheses.
+
+`match(STRING, REGEXP)'
+ The `match' function searches the string, STRING, for the longest,
+ leftmost substring matched by the regular expression, REGEXP. It
+ returns the character position, or "index", of where that
+ substring begins (one, if it starts at the beginning of STRING).
+ If no match is found, it returns zero.
+
+ The `match' function sets the built-in variable `RSTART' to the
+ index. It also sets the built-in variable `RLENGTH' to the length
+ in characters of the matched substring. If no match is found,
+ `RSTART' is set to zero, and `RLENGTH' to -1.
+
+ For example:
+
+ awk '{
+ if ($1 == "FIND")
+ regex = $2
+ else {
+ where = match($0, regex)
+ if (where != 0)
+ print "Match of", regex, "found at", \
+ where, "in", $0
+ }
+ }'
+
+ This program looks for lines that match the regular expression
+ stored in the variable `regex'. This regular expression can be
+ changed. If the first word on a line is `FIND', `regex' is
+ changed to be the second word on that line. Therefore, given:
+
+ FIND ru+n
+ My program runs
+ but not very quickly
+ FIND Melvin
+ JF+KM
+ This line is property of Reality Engineering Co.
+ Melvin was here.
+
+ `awk' prints:
+
+ Match of ru+n found at 12 in My program runs
+ Match of Melvin found at 1 in Melvin was here.
+
+`split(STRING, ARRAY [, FIELDSEP])'
+ This divides STRING into pieces separated by FIELDSEP, and stores
+ the pieces in ARRAY. The first piece is stored in `ARRAY[1]', the
+ second piece in `ARRAY[2]', and so forth. The string value of the
+ third argument, FIELDSEP, is a regexp describing where to split
+ STRING (much as `FS' can be a regexp describing where to split
+ input records). If the FIELDSEP is omitted, the value of `FS' is
+ used. `split' returns the number of elements created.
+
+ The `split' function splits strings into pieces in a manner
+ similar to the way input lines are split into fields. For example:
+
+ split("cul-de-sac", a, "-")
+
+ splits the string `cul-de-sac' into three fields using `-' as the
+ separator. It sets the contents of the array `a' as follows:
+
+ a[1] = "cul"
+ a[2] = "de"
+ a[3] = "sac"
+
+ The value returned by this call to `split' is three.
+
+ As with input field-splitting, when the value of FIELDSEP is
+ `" "', leading and trailing whitespace is ignored, and the elements
+ are separated by runs of whitespace.
+
+ Also as with input field-splitting, if FIELDSEP is the null
+ string, each individual character in the string is split into its
+ own array element. (This is a `gawk'-specific extension.)
+
+ Recent implementations of `awk', including `gawk', allow the third
+ argument to be a regexp constant (`/abc/'), as well as a string
+ (d.c.). The POSIX standard allows this as well.
+
+ Before splitting the string, `split' deletes any previously
+ existing elements in the array ARRAY (d.c.).
+
+`sprintf(FORMAT, EXPRESSION1,...)'
+ This returns (without printing) the string that `printf' would
+ have printed out with the same arguments (*note Using `printf'
+ Statements for Fancier Printing: Printf.). For example:
+
+ sprintf("pi = %.2f (approx.)", 22/7)
+
+ returns the string `"pi = 3.14 (approx.)"'.
+
+`sub(REGEXP, REPLACEMENT [, TARGET])'
+ The `sub' function alters the value of TARGET. It searches this
+ value, which is treated as a string, for the leftmost longest
+ substring matched by the regular expression, REGEXP, extending
+ this match as far as possible. Then the entire string is changed
+ by replacing the matched text with REPLACEMENT. The modified
+ string becomes the new value of TARGET.
+
+ This function is peculiar because TARGET is not simply used to
+ compute a value, and not just any expression will do: it must be a
+ variable, field or array element, so that `sub' can store a
+ modified value there. If this argument is omitted, then the
+ default is to use and alter `$0'.
+
+ For example:
+
+ str = "water, water, everywhere"
+ sub(/at/, "ith", str)
+
+ sets `str' to `"wither, water, everywhere"', by replacing the
+ leftmost, longest occurrence of `at' with `ith'.
+
+ The `sub' function returns the number of substitutions made (either
+ one or zero).
+
+ If the special character `&' appears in REPLACEMENT, it stands for
+ the precise substring that was matched by REGEXP. (If the regexp
+ can match more than one string, then this precise substring may
+ vary.) For example:
+
+ awk '{ sub(/candidate/, "& and his wife"); print }'
+
+ changes the first occurrence of `candidate' to `candidate and his
+ wife' on each input line.
+
+ Here is another example:
+
+ awk 'BEGIN {
+ str = "daabaaa"
+ sub(/a*/, "c&c", str)
+ print str
+ }'
+ -| dcaacbaaa
+
+ This shows how `&' can represent a non-constant string, and also
+ illustrates the "leftmost, longest" rule in regexp matching (*note
+ How Much Text Matches?: Leftmost Longest.).
+
+ The effect of this special character (`&') can be turned off by
+ putting a backslash before it in the string. As usual, to insert
+ one backslash in the string, you must write two backslashes.
+ Therefore, write `\\&' in a string constant to include a literal
+ `&' in the replacement. For example, here is how to replace the
+ first `|' on each line with an `&':
+
+ awk '{ sub(/\|/, "\\&"); print }'
+
+ *Note:* As mentioned above, the third argument to `sub' must be a
+ variable, field or array reference. Some versions of `awk' allow
+ the third argument to be an expression which is not an lvalue. In
+ such a case, `sub' would still search for the pattern and return
+ zero or one, but the result of the substitution (if any) would be
+ thrown away because there is no place to put it. Such versions of
+ `awk' accept expressions like this:
+
+ sub(/USA/, "United States", "the USA and Canada")
+
+ For historical compatibility, `gawk' will accept erroneous code,
+ such as in the above example. However, using any other
+ non-changeable object as the third parameter will cause a fatal
+ error, and your program will not run.
+
+`gsub(REGEXP, REPLACEMENT [, TARGET])'
+ This is similar to the `sub' function, except `gsub' replaces
+ _all_ of the longest, leftmost, _non-overlapping_ matching
+ substrings it can find. The `g' in `gsub' stands for "global,"
+ which means replace everywhere. For example:
+
+ awk '{ gsub(/Britain/, "United Kingdom"); print }'
+
+ replaces all occurrences of the string `Britain' with `United
+ Kingdom' for all input records.
+
+ The `gsub' function returns the number of substitutions made. If
+ the variable to be searched and altered, TARGET, is omitted, then
+ the entire input record, `$0', is used.
+
+ As in `sub', the characters `&' and `\' are special, and the third
+ argument must be an lvalue.
+
+`gensub(REGEXP, REPLACEMENT, HOW [, TARGET])'
+ `gensub' is a general substitution function. Like `sub' and
+ `gsub', it searches the target string TARGET for matches of the
+ regular expression REGEXP. Unlike `sub' and `gsub', the modified
+ string is returned as the result of the function, and the original
+ target string is _not_ changed. If HOW is a string beginning with
+ `g' or `G', then it replaces all matches of REGEXP with
+ REPLACEMENT. Otherwise, HOW is a number indicating which match of
+ REGEXP to replace. If no TARGET is supplied, `$0' is used instead.
+
+ `gensub' provides an additional feature that is not available in
+ `sub' or `gsub': the ability to specify components of a regexp in
+ the replacement text. This is done by using parentheses in the
+ regexp to mark the components, and then specifying `\N' in the
+ replacement text, where N is a digit from one to nine. For
+ example:
+
+ $ gawk '
+ > BEGIN {
+ > a = "abc def"
+ > b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a)
+ > print b
+ > }'
+ -| def abc
+
+ As described above for `sub', you must type two backslashes in
+ order to get one into the string.
+
+ In the replacement text, the sequence `\0' represents the entire
+ matched text, as does the character `&'.
+
+ This example shows how you can use the third argument to control
+ which match of the regexp should be changed.
+
+ $ echo a b c a b c |
+ > gawk '{ print gensub(/a/, "AA", 2) }'
+ -| a b c AA b c
+
+ In this case, `$0' is used as the default target string. `gensub'
+ returns the new string as its result, which is passed directly to
+ `print' for printing.
+
+ If the HOW argument is a string that does not begin with `g' or
+ `G', or if it is a number that is less than zero, only one
+ substitution is performed.
+
+ `gensub' is a `gawk' extension; it is not available in
+ compatibility mode (*note Command Line Options: Options.).
+
+`substr(STRING, START [, LENGTH])'
+ This returns a LENGTH-character-long substring of STRING, starting
+ at character number START. The first character of a string is
+ character number one. For example, `substr("washington", 5, 3)'
+ returns `"ing"'.
+
+ If LENGTH is not present, this function returns the whole suffix of
+ STRING that begins at character number START. For example,
+ `substr("washington", 5)' returns `"ington"'. The whole suffix is
+ also returned if LENGTH is greater than the number of characters
+ remaining in the string, counting from character number START.
+
+ *Note:* The string returned by `substr' _cannot_ be assigned to.
+ Thus, it is a mistake to attempt to change a portion of a string,
+ like this:
+
+ string = "abcdef"
+ # try to get "abCDEf", won't work
+ substr(string, 3, 3) = "CDE"
+
+ or to use `substr' as the third agument of `sub' or `gsub':
+
+ gsub(/xyz/, "pdq", substr($0, 5, 20)) # WRONG
+
+`tolower(STRING)'
+ This returns a copy of STRING, with each upper-case character in
+ the string replaced with its corresponding lower-case character.
+ Non-alphabetic characters are left unchanged. For example,
+ `tolower("MiXeD cAsE 123")' returns `"mixed case 123"'.
+
+`toupper(STRING)'
+ This returns a copy of STRING, with each lower-case character in
+ the string replaced with its corresponding upper-case character.
+ Non-alphabetic characters are left unchanged. For example,
+ `toupper("MiXeD cAsE 123")' returns `"MIXED CASE 123"'.
+
+More About `\' and `&' with `sub', `gsub' and `gensub'
+------------------------------------------------------
+
+ When using `sub', `gsub' or `gensub', and trying to get literal
+backslashes and ampersands into the replacement text, you need to
+remember that there are several levels of "escape processing" going on.
+
+ First, there is the "lexical" level, which is when `awk' reads your
+program, and builds an internal copy of your program that can be
+executed.
+
+ Then there is the run-time level, when `awk' actually scans the
+replacement string to determine what to generate.
+
+ At both levels, `awk' looks for a defined set of characters that can
+come after a backslash. At the lexical level, it looks for the escape
+sequences listed in *Note Escape Sequences::. Thus, for every `\' that
+`awk' will process at the run-time level, you type two `\'s at the
+lexical level. When a character that is not valid for an escape
+sequence follows the `\', Unix `awk' and `gawk' both simply remove the
+initial `\', and put the following character into the string. Thus, for
+example, `"a\qb"' is treated as `"aqb"'.
+
+ At the run-time level, the various functions handle sequences of `\'
+and `&' differently. The situation is (sadly) somewhat complex.
+
+ Historically, the `sub' and `gsub' functions treated the two
+character sequence `\&' specially; this sequence was replaced in the
+generated text with a single `&'. Any other `\' within the REPLACEMENT
+string that did not precede an `&' was passed through unchanged. To
+illustrate with a table:
+
+ You type `sub' sees `sub' generates
+ -------- ---------- ---------------
+ `\&' `&' the matched text
+ `\\&' `\&' a literal `&'
+ `\\\&' `\&' a literal `&'
+ `\\\\&' `\\&' a literal `\&'
+ `\\\\\&' `\\&' a literal `\&'
+ `\\\\\\&' `\\\&' a literal `\\&'
+ `\\q' `\q' a literal `\q'
+
+This table shows both the lexical level processing, where an odd number
+of backslashes becomes an even number at the run time level, and the
+run-time processing done by `sub'. (For the sake of simplicity, the
+rest of the tables below only show the case of even numbers of `\'s
+entered at the lexical level.)
+
+ The problem with the historical approach is that there is no way to
+get a literal `\' followed by the matched text.
+
+ The 1992 POSIX standard attempted to fix this problem. The standard
+says that `sub' and `gsub' look for either a `\' or an `&' after the
+`\'. If either one follows a `\', that character is output literally.
+The interpretation of `\' and `&' then becomes like this:
+
+ You type `sub' sees `sub' generates
+ -------- ---------- ---------------
+ `&' `&' the matched text
+ `\\&' `\&' a literal `&'
+ `\\\\&' `\\&' a literal `\', then the matched text
+ `\\\\\\&' `\\\&' a literal `\&'
+
+This would appear to solve the problem. Unfortunately, the phrasing of
+the standard is unusual. It says, in effect, that `\' turns off the
+special meaning of any following character, but that for anything other
+than `\' and `&', such special meaning is undefined. This wording
+leads to two problems.
+
+ 1. Backslashes must now be doubled in the REPLACEMENT string, breaking
+ historical `awk' programs.
+
+ 2. To make sure that an `awk' program is portable, _every_ character
+ in the REPLACEMENT string must be preceded with a backslash.(1)
+
+ The POSIX standard is under revision.(2) Because of the above
+problems, proposed text for the revised standard reverts to rules that
+correspond more closely to the original existing practice. The proposed
+rules have special cases that make it possible to produce a `\'
+preceding the matched text.
+
+ You type `sub' sees `sub' generates
+ -------- ---------- ---------------
+ `\\\\\\&' `\\\&' a literal `\&'
+ `\\\\&' `\\&' a literal `\', followed by the matched text
+ `\\&' `\&' a literal `&'
+ `\\q' `\q' a literal `\q'
+
+ In a nutshell, at the run-time level, there are now three special
+sequences of characters, `\\\&', `\\&' and `\&', whereas historically,
+there was only one. However, as in the historical case, any `\' that
+is not part of one of these three sequences is not special, and appears
+in the output literally.
+
+ `gawk' 3.0 follows these proposed POSIX rules for `sub' and `gsub'.
+Whether these proposed rules will actually become codified into the
+standard is unknown at this point. Subsequent `gawk' releases will
+track the standard and implement whatever the final version specifies;
+this Info file will be updated as well.
+
+ The rules for `gensub' are considerably simpler. At the run-time
+level, whenever `gawk' sees a `\', if the following character is a
+digit, then the text that matched the corresponding parenthesized
+subexpression is placed in the generated output. Otherwise, no matter
+what the character after the `\' is, that character will appear in the
+generated text, and the `\' will not.
+
+ You type `gensub' sees `gensub' generates
+ -------- ------------- ------------------
+ `&' `&' the matched text
+ `\\&' `\&' a literal `&'
+ `\\\\' `\\' a literal `\'
+ `\\\\&' `\\&' a literal `\', then the matched text
+ `\\\\\\&' `\\\&' a literal `\&'
+ `\\q' `\q' a literal `q'
+
+ Because of the complexity of the lexical and run-time level
+processing, and the special cases for `sub' and `gsub', we recommend
+the use of `gawk' and `gensub' for when you have to do substitutions.
+
+ ---------- Footnotes ----------
+
+ (1) This consequence was certainly unintended.
+
+ (2) As of December 1995, with final approval and publication
+hopefully sometime in 1996.
+
+
+File: gawk.info, Node: I/O Functions, Next: Time Functions, Prev: String Functions, Up: Built-in
+
+Built-in Functions for Input/Output
+===================================
+
+ The following functions are related to Input/Output (I/O). Optional
+parameters are enclosed in square brackets ("[" and "]").
+
+`close(FILENAME)'
+ Close the file FILENAME, for input or output. The argument may
+ alternatively be a shell command that was used for redirecting to
+ or from a pipe; then the pipe is closed. *Note Closing Input and
+ Output Files and Pipes: Close Files And Pipes, for more
+ information.
+
+`fflush([FILENAME])'
+ Flush any buffered output associated FILENAME, which is either a
+ file opened for writing, or a shell command for redirecting output
+ to a pipe.
+
+ Many utility programs will "buffer" their output; they save
+ information to be written to a disk file or terminal in memory,
+ until there is enough for it to be worthwhile to send the data to
+ the ouput device. This is often more efficient than writing every
+ little bit of information as soon as it is ready. However,
+ sometimes it is necessary to force a program to "flush" its
+ buffers; that is, write the information to its destination, even
+ if a buffer is not full. This is the purpose of the `fflush'
+ function; `gawk' too buffers its output, and the `fflush' function
+ can be used to force `gawk' to flush its buffers.
+
+ `fflush' is a recent (1994) addition to the Bell Labs research
+ version of `awk'; it is not part of the POSIX standard, and will
+ not be available if `--posix' has been specified on the command
+ line (*note Command Line Options: Options.).
+
+ `gawk' extends the `fflush' function in two ways. The first is to
+ allow no argument at all. In this case, the buffer for the
+ standard output is flushed. The second way is to allow the null
+ string (`""') as the argument. In this case, the buffers for _all_
+ open output files and pipes are flushed.
+
+ `fflush' returns zero if the buffer was successfully flushed, and
+ nonzero otherwise.
+
+`system(COMMAND)'
+ The system function allows the user to execute operating system
+ commands and then return to the `awk' program. The `system'
+ function executes the command given by the string COMMAND. It
+ returns, as its value, the status returned by the command that was
+ executed.
+
+ For example, if the following fragment of code is put in your `awk'
+ program:
+
+ END {
+ system("date | mail -s 'awk run done' root")
+ }
+
+ the system administrator will be sent mail when the `awk' program
+ finishes processing input and begins its end-of-input processing.
+
+ Note that redirecting `print' or `printf' into a pipe is often
+ enough to accomplish your task. However, if your `awk' program is
+ interactive, `system' is useful for cranking up large
+ self-contained programs, such as a shell or an editor.
+
+ Some operating systems cannot implement the `system' function.
+ `system' causes a fatal error if it is not supported.
+
+Interactive vs. Non-Interactive Buffering
+-----------------------------------------
+
+ As a side point, buffering issues can be even more confusing
+depending upon whether or not your program is "interactive", i.e.,
+communicating with a user sitting at a keyboard.(1)
+
+ Interactive programs generally "line buffer" their output; they
+write out every line. Non-interactive programs wait until they have a
+full buffer, which may be many lines of output.
+
+ Here is an example of the difference.
+
+ $ awk '{ print $1 + $2 }'
+ 1 1
+ -| 2
+ 2 3
+ -| 5
+ Control-d
+
+Each line of output is printed immediately. Compare that behavior with
+this example.
+
+ $ awk '{ print $1 + $2 }' | cat
+ 1 1
+ 2 3
+ Control-d
+ -| 2
+ -| 5
+
+Here, no output is printed until after the `Control-D' is typed, since
+it is all buffered, and sent down the pipe to `cat' in one shot.
+
+Controlling Output Buffering with `system'
+------------------------------------------
+
+ The `fflush' function provides explicit control over output
+buffering for individual files and pipes. However, its use is not
+portable to many other `awk' implementations. An alternative method to
+flush output buffers is by calling `system' with a null string as its
+argument:
+
+ system("") # flush output
+
+`gawk' treats this use of the `system' function as a special case, and
+is smart enough not to run a shell (or other command interpreter) with
+the empty command. Therefore, with `gawk', this idiom is not only
+useful, it is efficient. While this method should work with other
+`awk' implementations, it will not necessarily avoid starting an
+unnecessary shell. (Other implementations may only flush the buffer
+associated with the standard output, and not necessarily all buffered
+output.)
+
+ If you think about what a programmer expects, it makes sense that
+`system' should flush any pending output. The following program:
+
+ BEGIN {
+ print "first print"
+ system("echo system echo")
+ print "second print"
+ }
+
+must print
+
+ first print
+ system echo
+ second print
+
+and not
+
+ system echo
+ first print
+ second print
+
+ If `awk' did not flush its buffers before calling `system', the
+latter (undesirable) output is what you would see.
+
+ ---------- Footnotes ----------
+
+ (1) A program is interactive if the standard output is connected to
+a terminal device.
+
+
+File: gawk.info, Node: Time Functions, Prev: I/O Functions, Up: Built-in
+
+Functions for Dealing with Time Stamps
+======================================
+
+ A common use for `awk' programs is the processing of log files
+containing time stamp information, indicating when a particular log
+record was written. Many programs log their time stamp in the form
+returned by the `time' system call, which is the number of seconds
+since a particular epoch. On POSIX systems, it is the number of
+seconds since Midnight, January 1, 1970, UTC.
+
+ In order to make it easier to process such log files, and to produce
+useful reports, `gawk' provides two functions for working with time
+stamps. Both of these are `gawk' extensions; they are not specified in
+the POSIX standard, nor are they in any other known version of `awk'.
+
+ Optional parameters are enclosed in square brackets ("[" and "]").
+
+`systime()'
+ This function returns the current time as the number of seconds
+ since the system epoch. On POSIX systems, this is the number of
+ seconds since Midnight, January 1, 1970, UTC. It may be a
+ different number on other systems.
+
+`strftime([FORMAT [, TIMESTAMP]])'
+ This function returns a string. It is similar to the function of
+ the same name in ANSI C. The time specified by TIMESTAMP is used
+ to produce a string, based on the contents of the FORMAT string.
+ The TIMESTAMP is in the same format as the value returned by the
+ `systime' function. If no TIMESTAMP argument is supplied, `gawk'
+ will use the current time of day as the time stamp. If no FORMAT
+ argument is supplied, `strftime' uses `"%a %b %d %H:%M:%S %Z %Y"'.
+ This format string produces output (almost) equivalent to that of
+ the `date' utility. (Versions of `gawk' prior to 3.0 require the
+ FORMAT argument.)
+
+ The `systime' function allows you to compare a time stamp from a log
+file with the current time of day. In particular, it is easy to
+determine how long ago a particular record was logged. It also allows
+you to produce log records using the "seconds since the epoch" format.
+
+ The `strftime' function allows you to easily turn a time stamp into
+human-readable information. It is similar in nature to the `sprintf'
+function (*note Built-in Functions for String Manipulation: String
+Functions.), in that it copies non-format specification characters
+verbatim to the returned string, while substituting date and time
+values for format specifications in the FORMAT string.
+
+ `strftime' is guaranteed by the ANSI C standard to support the
+following date format specifications:
+
+`%a'
+ The locale's abbreviated weekday name.
+
+`%A'
+ The locale's full weekday name.
+
+`%b'
+ The locale's abbreviated month name.
+
+`%B'
+ The locale's full month name.
+
+`%c'
+ The locale's "appropriate" date and time representation.
+
+`%d'
+ The day of the month as a decimal number (01-31).
+
+`%H'
+ The hour (24-hour clock) as a decimal number (00-23).
+
+`%I'
+ The hour (12-hour clock) as a decimal number (01-12).
+
+`%j'
+ The day of the year as a decimal number (001-366).
+
+`%m'
+ The month as a decimal number (01-12).
+
+`%M'
+ The minute as a decimal number (00-59).
+
+`%p'
+ The locale's equivalent of the AM/PM designations associated with
+ a 12-hour clock.
+
+`%S'
+ The second as a decimal number (00-60).(1)
+
+`%U'
+ The week number of the year (the first Sunday as the first day of
+ week one) as a decimal number (00-53).
+
+`%w'
+ The weekday as a decimal number (0-6). Sunday is day zero.
+
+`%W'
+ The week number of the year (the first Monday as the first day of
+ week one) as a decimal number (00-53).
+
+`%x'
+ The locale's "appropriate" date representation.
+
+`%X'
+ The locale's "appropriate" time representation.
+
+`%y'
+ The year without century as a decimal number (00-99).
+
+`%Y'
+ The year with century as a decimal number (e.g., 1995).
+
+`%Z'
+ The time zone name or abbreviation, or no characters if no time
+ zone is determinable.
+
+`%%'
+ A literal `%'.
+
+ If a conversion specifier is not one of the above, the behavior is
+undefined.(2)
+
+ Informally, a "locale" is the geographic place in which a program is
+meant to run. For example, a common way to abbreviate the date
+September 4, 1991 in the United States would be "9/4/91". In many
+countries in Europe, however, it would be abbreviated "4.9.91". Thus,
+the `%x' specification in a `"US"' locale might produce `9/4/91', while
+in a `"EUROPE"' locale, it might produce `4.9.91'. The ANSI C standard
+defines a default `"C"' locale, which is an environment that is typical
+of what most C programmers are used to.
+
+ A public-domain C version of `strftime' is supplied with `gawk' for
+systems that are not yet fully ANSI-compliant. If that version is used
+to compile `gawk' (*note Installing `gawk': Installation.), then the
+following additional format specifications are available:
+
+`%D'
+ Equivalent to specifying `%m/%d/%y'.
+
+`%e'
+ The day of the month, padded with a space if it is only one digit.
+
+`%h'
+ Equivalent to `%b', above.
+
+`%n'
+ A newline character (ASCII LF).
+
+`%r'
+ Equivalent to specifying `%I:%M:%S %p'.
+
+`%R'
+ Equivalent to specifying `%H:%M'.
+
+`%T'
+ Equivalent to specifying `%H:%M:%S'.
+
+`%t'
+ A tab character.
+
+`%k'
+ The hour (24-hour clock) as a decimal number (0-23). Single digit
+ numbers are padded with a space.
+
+`%l'
+ The hour (12-hour clock) as a decimal number (1-12). Single digit
+ numbers are padded with a space.
+
+`%C'
+ The century, as a number between 00 and 99.
+
+`%u'
+ The weekday as a decimal number [1 (Monday)-7].
+
+`%V'
+ The week number of the year (the first Monday as the first day of
+ week one) as a decimal number (01-53). The method for determining
+ the week number is as specified by ISO 8601 (to wit: if the week
+ containing January 1 has four or more days in the new year, then
+ it is week one, otherwise it is week 53 of the previous year and
+ the next week is week one).
+
+`%G'
+ The year with century of the ISO week number, as a decimal number.
+
+ For example, January 1, 1993, is in week 53 of 1992. Thus, the year
+ of its ISO week number is 1992, even though its year is 1993.
+ Similarly, December 31, 1973, is in week 1 of 1974. Thus, the year
+ of its ISO week number is 1974, even though its year is 1973.
+
+`%g'
+ The year without century of the ISO week number, as a decimal
+ number (00-99).
+
+`%Ec %EC %Ex %Ey %EY %Od %Oe %OH %OI'
+`%Om %OM %OS %Ou %OU %OV %Ow %OW %Oy'
+ These are "alternate representations" for the specifications that
+ use only the second letter (`%c', `%C', and so on). They are
+ recognized, but their normal representations are used.(3) (These
+ facilitate compliance with the POSIX `date' utility.)
+
+`%v'
+ The date in VMS format (e.g., 20-JUN-1991).
+
+`%z'
+ The timezone offset in a +HHMM format (e.g., the format necessary
+ to produce RFC-822/RFC-1036 date headers).
+
+ This example is an `awk' implementation of the POSIX `date' utility.
+Normally, the `date' utility prints the current date and time of day
+in a well known format. However, if you provide an argument to it that
+begins with a `+', `date' will copy non-format specifier characters to
+the standard output, and will interpret the current time according to
+the format specifiers in the string. For example:
+
+ $ date '+Today is %A, %B %d, %Y.'
+ -| Today is Thursday, July 11, 1991.
+
+ Here is the `gawk' version of the `date' utility. It has a shell
+"wrapper", to handle the `-u' option, which requires that `date' run as
+if the time zone was set to UTC.
+
+ #! /bin/sh
+ #
+ # date --- approximate the P1003.2 'date' command
+
+ case $1 in
+ -u) TZ=GMT0 # use UTC
+ export TZ
+ shift ;;
+ esac
+
+ gawk 'BEGIN {
+ format = "%a %b %d %H:%M:%S %Z %Y"
+ exitval = 0
+
+ if (ARGC > 2)
+ exitval = 1
+ else if (ARGC == 2) {
+ format = ARGV[1]
+ if (format ~ /^\+/)
+ format = substr(format, 2) # remove leading +
+ }
+ print strftime(format)
+ exit exitval
+ }' "$@"
+
+ ---------- Footnotes ----------
+
+ (1) Occasionally there are minutes in a year with a leap second,
+which is why the seconds can go up to 60.
+
+ (2) This is because ANSI C leaves the behavior of the C version of
+`strftime' undefined, and `gawk' will use the system's version of
+`strftime' if it's there. Typically, the conversion specifier will
+either not appear in the returned string, or it will appear literally.
+
+ (3) If you don't understand any of this, don't worry about it; these
+facilities are meant to make it easier to "internationalize" programs.
+
+
+File: gawk.info, Node: User-defined, Next: Invoking Gawk, Prev: Built-in, Up: Top
+
+User-defined Functions
+**********************
+
+ Complicated `awk' programs can often be simplified by defining your
+own functions. User-defined functions can be called just like built-in
+ones (*note Function Calls::), but it is up to you to define them--to
+tell `awk' what they should do.
+
+* Menu:
+
+* Definition Syntax:: How to write definitions and what they mean.
+* Function Example:: An example function definition and what it
+ does.
+* Function Caveats:: Things to watch out for.
+* Return Statement:: Specifying the value a function returns.
+
+
+File: gawk.info, Node: Definition Syntax, Next: Function Example, Prev: User-defined, Up: User-defined
+
+Function Definition Syntax
+==========================
+
+ Definitions of functions can appear anywhere between the rules of an
+`awk' program. Thus, the general form of an `awk' program is extended
+to include sequences of rules _and_ user-defined function definitions.
+There is no need in `awk' to put the definition of a function before
+all uses of the function. This is because `awk' reads the entire
+program before starting to execute any of it.
+
+ The definition of a function named NAME looks like this:
+
+ function NAME(PARAMETER-LIST)
+ {
+ BODY-OF-FUNCTION
+ }
+
+NAME is the name of the function to be defined. A valid function name
+is like a valid variable name: a sequence of letters, digits and
+underscores, not starting with a digit. Within a single `awk' program,
+any particular name can only be used as a variable, array or function.
+
+ PARAMETER-LIST is a list of the function's arguments and local
+variable names, separated by commas. When the function is called, the
+argument names are used to hold the argument values given in the call.
+The local variables are initialized to the empty string. A function
+cannot have two parameters with the same name.
+
+ The BODY-OF-FUNCTION consists of `awk' statements. It is the most
+important part of the definition, because it says what the function
+should actually _do_. The argument names exist to give the body a way
+to talk about the arguments; local variables, to give the body places
+to keep temporary values.
+
+ Argument names are not distinguished syntactically from local
+variable names; instead, the number of arguments supplied when the
+function is called determines how many argument variables there are.
+Thus, if three argument values are given, the first three names in
+PARAMETER-LIST are arguments, and the rest are local variables.
+
+ It follows that if the number of arguments is not the same in all
+calls to the function, some of the names in PARAMETER-LIST may be
+arguments on some occasions and local variables on others. Another way
+to think of this is that omitted arguments default to the null string.
+
+ Usually when you write a function you know how many names you intend
+to use for arguments and how many you intend to use as local variables.
+It is conventional to place some extra space between the arguments and
+the local variables, to document how your function is supposed to be
+used.
+
+ During execution of the function body, the arguments and local
+variable values hide or "shadow" any variables of the same names used
+in the rest of the program. The shadowed variables are not accessible
+in the function definition, because there is no way to name them while
+their names have been taken away for the local variables. All other
+variables used in the `awk' program can be referenced or set normally
+in the function's body.
+
+ The arguments and local variables last only as long as the function
+body is executing. Once the body finishes, you can once again access
+the variables that were shadowed while the function was running.
+
+ The function body can contain expressions which call functions. They
+can even call this function, either directly or by way of another
+function. When this happens, we say the function is "recursive".
+
+ In many `awk' implementations, including `gawk', the keyword
+`function' may be abbreviated `func'. However, POSIX only specifies
+the use of the keyword `function'. This actually has some practical
+implications. If `gawk' is in POSIX-compatibility mode (*note Command
+Line Options: Options.), then the following statement will _not_ define
+a function:
+
+ func foo() { a = sqrt($1) ; print a }
+
+Instead it defines a rule that, for each record, concatenates the value
+of the variable `func' with the return value of the function `foo'. If
+the resulting string is non-null, the action is executed. This is
+probably not what was desired. (`awk' accepts this input as
+syntactically valid, since functions may be used before they are defined
+in `awk' programs.)
+
+ To ensure that your `awk' programs are portable, always use the
+keyword `function' when defining a function.
+
+
+File: gawk.info, Node: Function Example, Next: Function Caveats, Prev: Definition Syntax, Up: User-defined
+
+Function Definition Examples
+============================
+
+ Here is an example of a user-defined function, called `myprint', that
+takes a number and prints it in a specific format.
+
+ function myprint(num)
+ {
+ printf "%6.3g\n", num
+ }
+
+To illustrate, here is an `awk' rule which uses our `myprint' function:
+
+ $3 > 0 { myprint($3) }
+
+This program prints, in our special format, all the third fields that
+contain a positive number in our input. Therefore, when given:
+
+ 1.2 3.4 5.6 7.8
+ 9.10 11.12 -13.14 15.16
+ 17.18 19.20 21.22 23.24
+
+this program, using our function to format the results, prints:
+
+ 5.6
+ 21.2
+
+ This function deletes all the elements in an array.
+
+ function delarray(a, i)
+ {
+ for (i in a)
+ delete a[i]
+ }
+
+ When working with arrays, it is often necessary to delete all the
+elements in an array and start over with a new list of elements (*note
+The `delete' Statement: Delete.). Instead of having to repeat this
+loop everywhere in your program that you need to clear out an array,
+your program can just call `delarray'.
+
+ Here is an example of a recursive function. It takes a string as an
+input parameter, and returns the string in backwards order.
+
+ function rev(str, start)
+ {
+ if (start == 0)
+ return ""
+
+ return (substr(str, start, 1) rev(str, start - 1))
+ }
+
+ If this function is in a file named `rev.awk', we can test it this
+way:
+
+ $ echo "Don't Panic!" |
+ > gawk --source '{ print rev($0, length($0)) }' -f rev.awk
+ -| !cinaP t'noD
+
+ Here is an example that uses the built-in function `strftime'.
+(*Note Functions for Dealing with Time Stamps: Time Functions, for more
+information on `strftime'.) The C `ctime' function takes a timestamp
+and returns it in a string, formatted in a well known fashion. Here is
+an `awk' version:
+
+ # ctime.awk
+ #
+ # awk version of C ctime(3) function
+
+ function ctime(ts, format)
+ {
+ format = "%a %b %d %H:%M:%S %Z %Y"
+ if (ts == 0)
+ ts = systime() # use current time as default
+ return strftime(format, ts)
+ }
+
+
+File: gawk.info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined
+
+Calling User-defined Functions
+==============================
+
+ "Calling a function" means causing the function to run and do its
+job. A function call is an expression, and its value is the value
+returned by the function.
+
+ A function call consists of the function name followed by the
+arguments in parentheses. What you write in the call for the arguments
+are `awk' expressions; each time the call is executed, these
+expressions are evaluated, and the values are the actual arguments. For
+example, here is a call to `foo' with three arguments (the first being
+a string concatenation):
+
+ foo(x y, "lose", 4 * z)
+
+ *Caution:* whitespace characters (spaces and tabs) are not allowed
+between the function name and the open-parenthesis of the argument list.
+If you write whitespace by mistake, `awk' might think that you mean to
+concatenate a variable with an expression in parentheses. However, it
+notices that you used a function name and not a variable name, and
+reports an error.
+
+ When a function is called, it is given a _copy_ of the values of its
+arguments. This is known as "call by value". The caller may use a
+variable as the expression for the argument, but the called function
+does not know this: it only knows what value the argument had. For
+example, if you write this code:
+
+ foo = "bar"
+ z = myfunc(foo)
+
+then you should not think of the argument to `myfunc' as being "the
+variable `foo'." Instead, think of the argument as the string value,
+`"bar"'.
+
+ If the function `myfunc' alters the values of its local variables,
+this has no effect on any other variables. Thus, if `myfunc' does this:
+
+ function myfunc(str)
+ {
+ print str
+ str = "zzz"
+ print str
+ }
+
+to change its first argument variable `str', this _does not_ change the
+value of `foo' in the caller. The role of `foo' in calling `myfunc'
+ended when its value, `"bar"', was computed. If `str' also exists
+outside of `myfunc', the function body cannot alter this outer value,
+because it is shadowed during the execution of `myfunc' and cannot be
+seen or changed from there.
+
+ However, when arrays are the parameters to functions, they are _not_
+copied. Instead, the array itself is made available for direct
+manipulation by the function. This is usually called "call by
+reference". Changes made to an array parameter inside the body of a
+function _are_ visible outside that function. This can be *very*
+dangerous if you do not watch what you are doing. For example:
+
+ function changeit(array, ind, nvalue)
+ {
+ array[ind] = nvalue
+ }
+
+ BEGIN {
+ a[1] = 1; a[2] = 2; a[3] = 3
+ changeit(a, 2, "two")
+ printf "a[1] = %s, a[2] = %s, a[3] = %s\n",
+ a[1], a[2], a[3]
+ }
+
+This program prints `a[1] = 1, a[2] = two, a[3] = 3', because
+`changeit' stores `"two"' in the second element of `a'.
+
+ Some `awk' implementations allow you to call a function that has not
+been defined, and only report a problem at run-time when the program
+actually tries to call the function. For example:
+
+ BEGIN {
+ if (0)
+ foo()
+ else
+ bar()
+ }
+ function bar() { ... }
+ # note that `foo' is not defined
+
+Since the `if' statement will never be true, it is not really a problem
+that `foo' has not been defined. Usually though, it is a problem if a
+program calls an undefined function.
+
+ If `--lint' has been specified (*note Command Line Options:
+Options.), `gawk' will report about calls to undefined functions.
+
+ Some `awk' implementations generate a run-time error if you use the
+`next' statement (*note The `next' Statement: Next Statement.) inside
+a user-defined function. `gawk' does not have this problem.
+
+
+File: gawk.info, Node: Return Statement, Prev: Function Caveats, Up: User-defined
+
+The `return' Statement
+======================
+
+ The body of a user-defined function can contain a `return' statement.
+This statement returns control to the rest of the `awk' program. It
+can also be used to return a value for use in the rest of the `awk'
+program. It looks like this:
+
+ return [EXPRESSION]
+
+ The EXPRESSION part is optional. If it is omitted, then the returned
+value is undefined and, therefore, unpredictable.
+
+ A `return' statement with no value expression is assumed at the end
+of every function definition. So if control reaches the end of the
+function body, then the function returns an unpredictable value. `awk'
+will _not_ warn you if you use the return value of such a function.
+
+ Sometimes, you want to write a function for what it does, not for
+what it returns. Such a function corresponds to a `void' function in C
+or to a `procedure' in Pascal. Thus, it may be appropriate to not
+return any value; you should simply bear in mind that if you use the
+return value of such a function, you do so at your own risk.
+
+ Here is an example of a user-defined function that returns a value
+for the largest number among the elements of an array:
+
+ function maxelt(vec, i, ret)
+ {
+ for (i in vec) {
+ if (ret == "" || vec[i] > ret)
+ ret = vec[i]
+ }
+ return ret
+ }
+
+You call `maxelt' with one argument, which is an array name. The local
+variables `i' and `ret' are not intended to be arguments; while there
+is nothing to stop you from passing two or three arguments to `maxelt',
+the results would be strange. The extra space before `i' in the
+function parameter list indicates that `i' and `ret' are not supposed
+to be arguments. This is a convention that you should follow when you
+define functions.
+
+ Here is a program that uses our `maxelt' function. It loads an
+array, calls `maxelt', and then reports the maximum number in that
+array:
+
+ awk '
+ function maxelt(vec, i, ret)
+ {
+ for (i in vec) {
+ if (ret == "" || vec[i] > ret)
+ ret = vec[i]
+ }
+ return ret
+ }
+
+ # Load all fields of each record into nums.
+ {
+ for(i = 1; i <= NF; i++)
+ nums[NR, i] = $i
+ }
+
+ END {
+ print maxelt(nums)
+ }'
+
+ Given the following input:
+
+ 1 5 23 8 16
+ 44 3 5 2 8 26
+ 256 291 1396 2962 100
+ -6 467 998 1101
+ 99385 11 0 225
+
+our program tells us (predictably) that `99385' is the largest number
+in our array.
+
+
+File: gawk.info, Node: Invoking Gawk, Next: Library Functions, Prev: User-defined, Up: Top
+
+Running `awk'
+*************
+
+ There are two ways to run `awk': with an explicit program, or with
+one or more program files. Here are templates for both of them; items
+enclosed in `[...]' in these templates are optional.
+
+ Besides traditional one-letter POSIX-style options, `gawk' also
+supports GNU long options.
+
+ awk [OPTIONS] -f progfile [`--'] FILE ...
+ awk [OPTIONS] [`--'] 'PROGRAM' FILE ...
+
+ It is possible to invoke `awk' with an empty program:
+
+ $ awk '' datafile1 datafile2
+
+Doing so makes little sense though; `awk' will simply exit silently
+when given an empty program (d.c.). If `--lint' has been specified on
+the command line, `gawk' will issue a warning that the program is empty.
+
+* Menu:
+
+* Options:: Command line options and their meanings.
+* Other Arguments:: Input file names and variable assignments.
+* AWKPATH Variable:: Searching directories for `awk' programs.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Known Bugs:: Known Bugs in `gawk'.
+
+
+File: gawk.info, Node: Options, Next: Other Arguments, Prev: Invoking Gawk, Up: Invoking Gawk
+
+Command Line Options
+====================
+
+ Options begin with a dash, and consist of a single character. GNU
+style long options consist of two dashes and a keyword. The keyword
+can be abbreviated, as long the abbreviation allows the option to be
+uniquely identified. If the option takes an argument, then the keyword
+is either immediately followed by an equals sign (`=') and the
+argument's value, or the keyword and the argument's value are separated
+by whitespace. For brevity, the discussion below only refers to the
+traditional short options; however the long and short options are
+interchangeable in all contexts.
+
+ Each long option for `gawk' has a corresponding POSIX-style option.
+The options and their meanings are as follows:
+
+`-F FS'
+`--field-separator FS'
+ Sets the `FS' variable to FS (*note Specifying How Fields are
+ Separated: Field Separators.).
+
+`-f SOURCE-FILE'
+`--file SOURCE-FILE'
+ Indicates that the `awk' program is to be found in SOURCE-FILE
+ instead of in the first non-option argument.
+
+`-v VAR=VAL'
+`--assign VAR=VAL'
+ Sets the variable VAR to the value VAL *before* execution of the
+ program begins. Such variable values are available inside the
+ `BEGIN' rule (*note Other Command Line Arguments: Other
+ Arguments.).
+
+ The `-v' option can only set one variable, but you can use it more
+ than once, setting another variable each time, like this: `awk
+ -v foo=1 -v bar=2 ...'.
+
+`-mf NNN'
+`-mr NNN'
+ Set various memory limits to the value NNN. The `f' flag sets the
+ maximum number of fields, and the `r' flag sets the maximum record
+ size. These two flags and the `-m' option are from the Bell Labs
+ research version of Unix `awk'. They are provided for
+ compatibility, but otherwise ignored by `gawk', since `gawk' has
+ no predefined limits.
+
+`-W GAWK-OPT'
+ Following the POSIX standard, options that are implementation
+ specific are supplied as arguments to the `-W' option. These
+ options also have corresponding GNU style long options. See below.
+
+`--'
+ Signals the end of the command line options. The following
+ arguments are not treated as options even if they begin with `-'.
+ This interpretation of `--' follows the POSIX argument parsing
+ conventions.
+
+ This is useful if you have file names that start with `-', or in
+ shell scripts, if you have file names that will be specified by
+ the user which could start with `-'.
+
+ The following `gawk'-specific options are available:
+
+`-W traditional'
+`-W compat'
+`--traditional'
+`--compat'
+ Specifies "compatibility mode", in which the GNU extensions to the
+ `awk' language are disabled, so that `gawk' behaves just like the
+ Bell Labs research version of Unix `awk'. `--traditional' is the
+ preferred form of this option. *Note Extensions in `gawk' Not in
+ POSIX `awk': POSIX/GNU, which summarizes the extensions. Also see
+ *Note Downward Compatibility and Debugging: Compatibility Mode.
+
+`-W copyleft'
+`-W copyright'
+`--copyleft'
+`--copyright'
+ Print the short version of the General Public License, and then
+ exit. This option may disappear in a future version of `gawk'.
+
+`-W help'
+`-W usage'
+`--help'
+`--usage'
+ Print a "usage" message summarizing the short and long style
+ options that `gawk' accepts, and then exit.
+
+`-W lint'
+`--lint'
+ Warn about constructs that are dubious or non-portable to other
+ `awk' implementations. Some warnings are issued when `gawk' first
+ reads your program. Others are issued at run-time, as your
+ program executes.
+
+`-W lint-old'
+`--lint-old'
+ Warn about constructs that are not available in the original
+ Version 7 Unix version of `awk' (*note Major Changes between V7
+ and SVR3.1: V7/SVR3.1.).
+
+`-W posix'
+`--posix'
+ Operate in strict POSIX mode. This disables all `gawk' extensions
+ (just like `--traditional'), and adds the following additional
+ restrictions:
+
+ * `\x' escape sequences are not recognized (*note Escape
+ Sequences::).
+
+ * Newlines do not act as whitespace to separate fields when
+ `FS' is equal to a single space.
+
+ * The synonym `func' for the keyword `function' is not
+ recognized (*note Function Definition Syntax: Definition
+ Syntax.).
+
+ * The operators `**' and `**=' cannot be used in place of `^'
+ and `^=' (*note Arithmetic Operators: Arithmetic Ops., and
+ also *note Assignment Expressions: Assignment Ops.).
+
+ * Specifying `-Ft' on the command line does not set the value
+ of `FS' to be a single tab character (*note Specifying How
+ Fields are Separated: Field Separators.).
+
+ * The `fflush' built-in function is not supported (*note
+ Built-in Functions for Input/Output: I/O Functions.).
+
+ If you supply both `--traditional' and `--posix' on the command
+ line, `--posix' will take precedence. `gawk' will also issue a
+ warning if both options are supplied.
+
+`-W re-interval'
+`--re-interval'
+ Allow interval expressions (*note Regular Expression Operators:
+ Regexp Operators.), in regexps. Because interval expressions were
+ traditionally not available in `awk', `gawk' does not provide them
+ by default. This prevents old `awk' programs from breaking.
+
+`-W source PROGRAM-TEXT'
+`--source PROGRAM-TEXT'
+ Program source code is taken from the PROGRAM-TEXT. This option
+ allows you to mix source code in files with source code that you
+ enter on the command line. This is particularly useful when you
+ have library functions that you wish to use from your command line
+ programs (*note The `AWKPATH' Environment Variable: AWKPATH
+ Variable.).
+
+`-W version'
+`--version'
+ Prints version information for this particular copy of `gawk'.
+ This allows you to determine if your copy of `gawk' is up to date
+ with respect to whatever the Free Software Foundation is currently
+ distributing. It is also useful for bug reports (*note Reporting
+ Problems and Bugs: Bugs.).
+
+ Any other options are flagged as invalid with a warning message, but
+are otherwise ignored.
+
+ In compatibility mode, as a special case, if the value of FS supplied
+to the `-F' option is `t', then `FS' is set to the tab character
+(`"\t"'). This is only true for `--traditional', and not for `--posix'
+(*note Specifying How Fields are Separated: Field Separators.).
+
+ The `-f' option may be used more than once on the command line. If
+it is, `awk' reads its program source from all of the named files, as
+if they had been concatenated together into one big file. This is
+useful for creating libraries of `awk' functions. Useful functions can
+be written once, and then retrieved from a standard place, instead of
+having to be included into each individual program.
+
+ You can type in a program at the terminal and still use library
+functions, by specifying `-f /dev/tty'. `awk' will read a file from
+the terminal to use as part of the `awk' program. After typing your
+program, type `Control-d' (the end-of-file character) to terminate it.
+(You may also use `-f -' to read program source from the standard
+input, but then you will not be able to also use the standard input as a
+source of data.)
+
+ Because it is clumsy using the standard `awk' mechanisms to mix
+source file and command line `awk' programs, `gawk' provides the
+`--source' option. This does not require you to pre-empt the standard
+input for your source code, and allows you to easily mix command line
+and library source code (*note The `AWKPATH' Environment Variable:
+AWKPATH Variable.).
+
+ If no `-f' or `--source' option is specified, then `gawk' will use
+the first non-option command line argument as the text of the program
+source code.
+
+ If the environment variable `POSIXLY_CORRECT' exists, then `gawk'
+will behave in strict POSIX mode, exactly as if you had supplied the
+`--posix' command line option. Many GNU programs look for this
+environment variable to turn on strict POSIX mode. If you supply
+`--lint' on the command line, and `gawk' turns on POSIX mode because of
+`POSIXLY_CORRECT', then it will print a warning message indicating that
+POSIX mode is in effect.
+
+ You would typically set this variable in your shell's startup file.
+For a Bourne compatible shell (such as Bash), you would add these lines
+to the `.profile' file in your home directory.
+
+ POSIXLY_CORRECT=true
+ export POSIXLY_CORRECT
+
+ For a `csh' compatible shell,(1) you would add this line to the
+`.login' file in your home directory.
+
+ setenv POSIXLY_CORRECT true
+
+ ---------- Footnotes ----------
+
+ (1) Not recommended.
+
+
+File: gawk.info, Node: Other Arguments, Next: AWKPATH Variable, Prev: Options, Up: Invoking Gawk
+
+Other Command Line Arguments
+============================
+
+ Any additional arguments on the command line are normally treated as
+input files to be processed in the order specified. However, an
+argument that has the form `VAR=VALUE', assigns the value VALUE to the
+variable VAR--it does not specify a file at all.
+
+ All these arguments are made available to your `awk' program in the
+`ARGV' array (*note Built-in Variables::). Command line options and
+the program text (if present) are omitted from `ARGV'. All other
+arguments, including variable assignments, are included. As each
+element of `ARGV' is processed, `gawk' sets the variable `ARGIND' to
+the index in `ARGV' of the current element.
+
+ The distinction between file name arguments and variable-assignment
+arguments is made when `awk' is about to open the next input file. At
+that point in execution, it checks the "file name" to see whether it is
+really a variable assignment; if so, `awk' sets the variable instead of
+reading a file.
+
+ Therefore, the variables actually receive the given values after all
+previously specified files have been read. In particular, the values of
+variables assigned in this fashion are _not_ available inside a `BEGIN'
+rule (*note The `BEGIN' and `END' Special Patterns: BEGIN/END.), since
+such rules are run before `awk' begins scanning the argument list.
+
+ The variable values given on the command line are processed for
+escape sequences (d.c.) (*note Escape Sequences::).
+
+ In some earlier implementations of `awk', when a variable assignment
+occurred before any file names, the assignment would happen _before_
+the `BEGIN' rule was executed. `awk''s behavior was thus inconsistent;
+some command line assignments were available inside the `BEGIN' rule,
+while others were not. However, some applications came to depend upon
+this "feature." When `awk' was changed to be more consistent, the `-v'
+option was added to accommodate applications that depended upon the old
+behavior.
+
+ The variable assignment feature is most useful for assigning to
+variables such as `RS', `OFS', and `ORS', which control input and
+output formats, before scanning the data files. It is also useful for
+controlling state if multiple passes are needed over a data file. For
+example:
+
+ awk 'pass == 1 { PASS 1 STUFF }
+ pass == 2 { PASS 2 STUFF }' pass=1 mydata pass=2 mydata
+
+ Given the variable assignment feature, the `-F' option for setting
+the value of `FS' is not strictly necessary. It remains for historical
+compatibility.
+
+
+File: gawk.info, Node: AWKPATH Variable, Next: Obsolete, Prev: Other Arguments, Up: Invoking Gawk
+
+The `AWKPATH' Environment Variable
+==================================
+
+ The previous section described how `awk' program files can be named
+on the command line with the `-f' option. In most `awk'
+implementations, you must supply a precise path name for each program
+file, unless the file is in the current directory.
+
+ But in `gawk', if the file name supplied to the `-f' option does not
+contain a `/', then `gawk' searches a list of directories (called the
+"search path"), one by one, looking for a file with the specified name.
+
+ The search path is a string consisting of directory names separated
+by colons. `gawk' gets its search path from the `AWKPATH' environment
+variable. If that variable does not exist, `gawk' uses a default path,
+which is `.:/usr/local/share/awk'.(1) (Programs written for use by
+system administrators should use an `AWKPATH' variable that does not
+include the current directory, `.'.)
+
+ The search path feature is particularly useful for building up
+libraries of useful `awk' functions. The library files can be placed
+in a standard directory that is in the default path, and then specified
+on the command line with a short file name. Otherwise, the full file
+name would have to be typed for each file.
+
+ By using both the `--source' and `-f' options, your command line
+`awk' programs can use facilities in `awk' library files. *Note A
+Library of `awk' Functions: Library Functions.
+
+ Path searching is not done if `gawk' is in compatibility mode. This
+is true for both `--traditional' and `--posix'. *Note Command Line
+Options: Options.
+
+ *Note:* if you want files in the current directory to be found, you
+must include the current directory in the path, either by including `.'
+explicitly in the path, or by writing a null entry in the path. (A
+null entry is indicated by starting or ending the path with a colon, or
+by placing two colons next to each other (`::').) If the current
+directory is not included in the path, then files cannot be found in
+the current directory. This path search mechanism is identical to the
+shell's.
+
+ Starting with version 3.0, if `AWKPATH' is not defined in the
+environment, `gawk' will place its default search path into
+`ENVIRON["AWKPATH"]'. This makes it easy to determine the actual search
+path `gawk' will use.
+
+ ---------- Footnotes ----------
+
+ (1) Your version of `gawk' may use a directory that is different
+than `/usr/local/share/awk'; it will depend upon how `gawk' was built
+and installed. The actual directory will be the value of `$(datadir)'
+generated when `gawk' was configured. You probably don't need to worry
+about this though.
+
+
+File: gawk.info, Node: Obsolete, Next: Undocumented, Prev: AWKPATH Variable, Up: Invoking Gawk
+
+Obsolete Options and/or Features
+================================
+
+ This section describes features and/or command line options from
+previous releases of `gawk' that are either not available in the
+current version, or that are still supported but deprecated (meaning
+that they will _not_ be in the next release).
+
+ For version 3.0.1 of `gawk', there are no command line options or
+other deprecated features from the previous version of `gawk'. This
+node is thus essentially a place holder, in case some option becomes
+obsolete in a future version of `gawk'.
+
+
+File: gawk.info, Node: Undocumented, Next: Known Bugs, Prev: Obsolete, Up: Invoking Gawk
+
+Undocumented Options and Features
+=================================
+
+ This section intentionally left blank.
+
+
+File: gawk.info, Node: Known Bugs, Prev: Undocumented, Up: Invoking Gawk
+
+Known Bugs in `gawk'
+====================
+
+ * The `-F' option for changing the value of `FS' (*note Command Line
+ Options: Options.) is not necessary given the command line
+ variable assignment feature; it remains only for backwards
+ compatibility.
+
+ * If your system actually has support for `/dev/fd' and the
+ associated `/dev/stdin', `/dev/stdout', and `/dev/stderr' files,
+ you may get different output from `gawk' than you would get on a
+ system without those files. When `gawk' interprets these files
+ internally, it synchronizes output to the standard output with
+ output to `/dev/stdout', while on a system with those files, the
+ output is actually to different open files (*note Special File
+ Names in `gawk': Special Files.).
+
+ * Syntactically invalid single character programs tend to overflow
+ the parse stack, generating a rather unhelpful message. Such
+ programs are surprisingly difficult to diagnose in the completely
+ general case, and the effort to do so really is not worth it.
+
+
+File: gawk.info, Node: Library Functions, Next: Sample Programs, Prev: Invoking Gawk, Up: Top
+
+A Library of `awk' Functions
+****************************
+
+ This chapter presents a library of useful `awk' functions. The
+sample programs presented later (*note Practical `awk' Programs: Sample
+Programs.) use these functions. The functions are presented here in a
+progression from simple to complex.
+
+ *Note Extracting Programs from Texinfo Source Files: Extract Program,
+presents a program that you can use to extract the source code for
+these example library functions and programs from the Texinfo source
+for this Info file. (This has already been done as part of the `gawk'
+distribution.)
+
+ If you have written one or more useful, general purpose `awk'
+functions, and would like to contribute them for a subsequent edition
+of this Info file, please contact the author. *Note Reporting Problems
+and Bugs: Bugs, for information on doing this. Don't just send code,
+as you will be required to either place your code in the public domain,
+publish it under the GPL (*note GNU GENERAL PUBLIC LICENSE: Copying.),
+or assign the copyright in it to the Free Software Foundation.
+
+* Menu:
+
+* Portability Notes:: What to do if you don't have `gawk'.
+* Nextfile Function:: Two implementations of a `nextfile'
+ function.
+* Assert Function:: A function for assertions in `awk'
+ programs.
+* Round Function:: A function for rounding if `sprintf' does
+ not do it correctly.
+* Ordinal Functions:: Functions for using characters as numbers and
+ vice versa.
+* Join Function:: A function to join an array into a string.
+* Mktime Function:: A function to turn a date into a timestamp.
+* Gettimeofday Function:: A function to get formatted times.
+* Filetrans Function:: A function for handling data file transitions.
+* Getopt Function:: A function for processing command line
+ arguments.
+* Passwd Functions:: Functions for getting user information.
+* Group Functions:: Functions for getting group information.
+* Library Names:: How to best name private global variables in
+ library functions.
+
+
+File: gawk.info, Node: Portability Notes, Next: Nextfile Function, Prev: Library Functions, Up: Library Functions
+
+Simulating `gawk'-specific Features
+===================================
+
+ The programs in this chapter and in *Note Practical `awk' Programs:
+Sample Programs, freely use features that are specific to `gawk'. This
+section briefly discusses how you can rewrite these programs for
+different implementations of `awk'.
+
+ Diagnostic error messages are sent to `/dev/stderr'. Use `| "cat
+1>&2"' instead of `> "/dev/stderr"', if your system does not have a
+`/dev/stderr', or if you cannot use `gawk'.
+
+ A number of programs use `nextfile' (*note The `nextfile' Statement:
+Nextfile Statement.), to skip any remaining input in the input file.
+*Note Implementing `nextfile' as a Function: Nextfile Function, shows
+you how to write a function that will do the same thing.
+
+ Finally, some of the programs choose to ignore upper-case and
+lower-case distinctions in their input. They do this by assigning one
+to `IGNORECASE'. You can achieve the same effect by adding the
+following rule to the beginning of the program:
+
+ # ignore case
+ { $0 = tolower($0) }
+
+Also, verify that all regexp and string constants used in comparisons
+only use lower-case letters.
+
+
+File: gawk.info, Node: Nextfile Function, Next: Assert Function, Prev: Portability Notes, Up: Library Functions
+
+Implementing `nextfile' as a Function
+=====================================
+
+ The `nextfile' statement presented in *Note The `nextfile'
+Statement: Nextfile Statement, is a `gawk'-specific extension. It is
+not available in other implementations of `awk'. This section shows
+two versions of a `nextfile' function that you can use to simulate
+`gawk''s `nextfile' statement if you cannot use `gawk'.
+
+ Here is a first attempt at writing a `nextfile' function.
+
+ # nextfile --- skip remaining records in current file
+
+ # this should be read in before the "main" awk program
+
+ function nextfile() { _abandon_ = FILENAME; next }
+
+ _abandon_ == FILENAME { next }
+
+ This file should be included before the main program, because it
+supplies a rule that must be executed first. This rule compares the
+current data file's name (which is always in the `FILENAME' variable)
+to a private variable named `_abandon_'. If the file name matches,
+then the action part of the rule executes a `next' statement, to go on
+to the next record. (The use of `_' in the variable name is a
+convention. It is discussed more fully in *Note Naming Library
+Function Global Variables: Library Names.)
+
+ The use of the `next' statement effectively creates a loop that reads
+all the records from the current data file. Eventually, the end of the
+file is reached, and a new data file is opened, changing the value of
+`FILENAME'. Once this happens, the comparison of `_abandon_' to
+`FILENAME' fails, and execution continues with the first rule of the
+"real" program.
+
+ The `nextfile' function itself simply sets the value of `_abandon_'
+and then executes a `next' statement to start the loop going.(1)
+
+ This initial version has a subtle problem. What happens if the same
+data file is listed _twice_ on the command line, one right after the
+other, or even with just a variable assignment between the two
+occurrences of the file name?
+
+ In such a case, this code will skip right through the file, a second
+time, even though it should stop when it gets to the end of the first
+occurrence. Here is a second version of `nextfile' that remedies this
+problem.
+
+ # nextfile --- skip remaining records in current file
+ # correctly handle successive occurrences of the same file
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May, 1993
+
+ # this should be read in before the "main" awk program
+
+ function nextfile() { _abandon_ = FILENAME; next }
+
+ _abandon_ == FILENAME {
+ if (FNR == 1)
+ _abandon_ = ""
+ else
+ next
+ }
+
+ The `nextfile' function has not changed. It sets `_abandon_' equal
+to the current file name and then executes a `next' satement. The
+`next' statement reads the next record and increments `FNR', so `FNR'
+is guaranteed to have a value of at least two. However, if `nextfile'
+is called for the last record in the file, then `awk' will close the
+current data file and move on to the next one. Upon doing so,
+`FILENAME' will be set to the name of the new file, and `FNR' will be
+reset to one. If this next file is the same as the previous one,
+`_abandon_' will still be equal to `FILENAME'. However, `FNR' will be
+equal to one, telling us that this is a new occurrence of the file, and
+not the one we were reading when the `nextfile' function was executed.
+In that case, `_abandon_' is reset to the empty string, so that further
+executions of this rule will fail (until the next time that `nextfile'
+is called).
+
+ If `FNR' is not one, then we are still in the original data file,
+and the program executes a `next' statement to skip through it.
+
+ An important question to ask at this point is: "Given that the
+functionality of `nextfile' can be provided with a library file, why is
+it built into `gawk'?" This is an important question. Adding features
+for little reason leads to larger, slower programs that are harder to
+maintain.
+
+ The answer is that building `nextfile' into `gawk' provides
+significant gains in efficiency. If the `nextfile' function is executed
+at the beginning of a large data file, `awk' still has to scan the
+entire file, splitting it up into records, just to skip over it. The
+built-in `nextfile' can simply close the file immediately and proceed
+to the next one, saving a lot of time. This is particularly important
+in `awk', since `awk' programs are generally I/O bound (i.e. they
+spend most of their time doing input and output, instead of performing
+computations).
+
+ ---------- Footnotes ----------
+
+ (1) Some implementations of `awk' do not allow you to execute `next'
+from within a function body. Some other work-around will be necessary
+if you use such a version.
+
+
+File: gawk.info, Node: Assert Function, Next: Round Function, Prev: Nextfile Function, Up: Library Functions
+
+Assertions
+==========
+
+ When writing large programs, it is often useful to be able to know
+that a condition or set of conditions is true. Before proceeding with a
+particular computation, you make a statement about what you believe to
+be the case. Such a statement is known as an "assertion." The C
+language provides an `<assert.h>' header file and corresponding
+`assert' macro that the programmer can use to make assertions. If an
+assertion fails, the `assert' macro arranges to print a diagnostic
+message describing the condition that should have been true but was
+not, and then it kills the program. In C, using `assert' looks this:
+
+ #include <assert.h>
+
+ int myfunc(int a, double b)
+ {
+ assert(a <= 5 && b >= 17);
+ ...
+ }
+
+ If the assertion failed, the program would print a message similar to
+this:
+
+ prog.c:5: assertion failed: a <= 5 && b >= 17
+
+ The ANSI C language makes it possible to turn the condition into a
+string for use in printing the diagnostic message. This is not
+possible in `awk', so this `assert' function also requires a string
+version of the condition that is being tested.
+
+ # assert --- assert that a condition is true. Otherwise exit.
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May, 1993
+
+ function assert(condition, string)
+ {
+ if (! condition) {
+ printf("%s:%d: assertion failed: %s\n",
+ FILENAME, FNR, string) > "/dev/stderr"
+ _assert_exit = 1
+ exit 1
+ }
+ }
+
+ END {
+ if (_assert_exit)
+ exit 1
+ }
+
+ The `assert' function tests the `condition' parameter. If it is
+false, it prints a message to standard error, using the `string'
+parameter to describe the failed condition. It then sets the variable
+`_assert_exit' to one, and executes the `exit' statement. The `exit'
+statement jumps to the `END' rule. If the `END' rules finds
+`_assert_exit' to be true, then it exits immediately.
+
+ The purpose of the `END' rule with its test is to keep any other
+`END' rules from running. When an assertion fails, the program should
+exit immediately. If no assertions fail, then `_assert_exit' will
+still be false when the `END' rule is run normally, and the rest of the
+program's `END' rules will execute. For all of this to work correctly,
+`assert.awk' must be the first source file read by `awk'.
+
+ You would use this function in your programs this way:
+
+ function myfunc(a, b)
+ {
+ assert(a <= 5 && b >= 17, "a <= 5 && b >= 17")
+ ...
+ }
+
+If the assertion failed, you would see a message like this:
+
+ mydata:1357: assertion failed: a <= 5 && b >= 17
+
+ There is a problem with this version of `assert', that it may not be
+possible to work around. An `END' rule is automatically added to the
+program calling `assert'. Normally, if a program consists of just a
+`BEGIN' rule, the input files and/or standard input are not read.
+However, now that the program has an `END' rule, `awk' will attempt to
+read the input data files, or standard input (*note Startup and Cleanup
+Actions: Using BEGIN/END.), most likely causing the program to hang,
+waiting for input.
+
+
+File: gawk.info, Node: Round Function, Next: Ordinal Functions, Prev: Assert Function, Up: Library Functions
+
+Rounding Numbers
+================
+
+ The way `printf' and `sprintf' (*note Using `printf' Statements for
+Fancier Printing: Printf.) do rounding will often depend upon the
+system's C `sprintf' subroutine. On many machines, `sprintf' rounding
+is "unbiased," which means it doesn't always round a trailing `.5' up,
+contrary to naive expectations. In unbiased rounding, `.5' rounds to
+even, rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4.
+The result is that if you are using a format that does rounding (e.g.,
+`"%.0f"') you should check what your system does. The following
+function does traditional rounding; it might be useful if your awk's
+`printf' does unbiased rounding.
+
+ # round --- do normal rounding
+ #
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, August, 1996
+ # Public Domain
+
+ function round(x, ival, aval, fraction)
+ {
+ ival = int(x) # integer part, int() truncates
+
+ # see if fractional part
+ if (ival == x) # no fraction
+ return x
+
+ if (x < 0) {
+ aval = -x # absolute value
+ ival = int(aval)
+ fraction = aval - ival
+ if (fraction >= .5)
+ return int(x) - 1 # -2.5 --> -3
+ else
+ return int(x) # -2.3 --> -2
+ } else {
+ fraction = x - ival
+ if (fraction >= .5)
+ return ival + 1
+ else
+ return ival
+ }
+ }
+
+ # test harness
+ { print $0, round($0) }
+
+
+File: gawk.info, Node: Ordinal Functions, Next: Join Function, Prev: Round Function, Up: Library Functions
+
+Translating Between Characters and Numbers
+==========================================
+
+ One commercial implementation of `awk' supplies a built-in function,
+`ord', which takes a character and returns the numeric value for that
+character in the machine's character set. If the string passed to
+`ord' has more than one character, only the first one is used.
+
+ The inverse of this function is `chr' (from the function of the same
+name in Pascal), which takes a number and returns the corresponding
+character.
+
+ Both functions can be written very nicely in `awk'; there is no real
+reason to build them into the `awk' interpreter.
+
+ # ord.awk --- do ord and chr
+ #
+ # Global identifiers:
+ # _ord_: numerical values indexed by characters
+ # _ord_init: function to initialize _ord_
+ #
+ # Arnold Robbins
+ # arnold@gnu.ai.mit.edu
+ # Public Domain
+ # 16 January, 1992
+ # 20 July, 1992, revised
+
+ BEGIN { _ord_init() }
+
+ function _ord_init( low, high, i, t)
+ {
+ low = sprintf("%c", 7) # BEL is ascii 7
+ if (low == "\a") { # regular ascii
+ low = 0
+ high = 127
+ } else if (sprintf("%c", 128 + 7) == "\a") {
+ # ascii, mark parity
+ low = 128
+ high = 255
+ } else { # ebcdic(!)
+ low = 0
+ high = 255
+ }
+
+ for (i = low; i <= high; i++) {
+ t = sprintf("%c", i)
+ _ord_[t] = i
+ }
+ }
+
+ Some explanation of the numbers used by `chr' is worthwhile. The
+most prominent character set in use today is ASCII. Although an
+eight-bit byte can hold 256 distinct values (from zero to 255), ASCII
+only defines characters that use the values from zero to 127.(1) At
+least one computer manufacturer that we know of uses ASCII, but with
+mark parity, meaning that the leftmost bit in the byte is always one.
+What this means is that on those systems, characters have numeric
+values from 128 to 255. Finally, large mainframe systems use the
+EBCDIC character set, which uses all 256 values. While there are other
+character sets in use on some older systems, they are not really worth
+worrying about.
+
+ function ord(str, c)
+ {
+ # only first character is of interest
+ c = substr(str, 1, 1)
+ return _ord_[c]
+ }
+
+ function chr(c)
+ {
+ # force c to be numeric by adding 0
+ return sprintf("%c", c + 0)
+ }
+
+ #### test code ####
+ # BEGIN \
+ # {
+ # for (;;) {
+ # printf("enter a character: ")
+ # if (getline var <= 0)
+ # break
+ # printf("ord(%s) = %d\n", var, ord(var))
+ # }
+ # }
+
+ An obvious improvement to these functions would be to move the code
+for the `_ord_init' function into the body of the `BEGIN' rule. It was
+written this way initially for ease of development.
+
+ There is a "test program" in a `BEGIN' rule, for testing the
+function. It is commented out for production use.
+
+ ---------- Footnotes ----------
+
+ (1) ASCII has been extended in many countries to use the values from
+128 to 255 for country-specific characters. If your system uses these
+extensions, you can simplify `_ord_init' to simply loop from zero to
+255.
+
+
+File: gawk.info, Node: Join Function, Next: Mktime Function, Prev: Ordinal Functions, Up: Library Functions
+
+Merging an Array Into a String
+==============================
+
+ When doing string processing, it is often useful to be able to join
+all the strings in an array into one long string. The following
+function, `join', accomplishes this task. It is used later in several
+of the application programs (*note Practical `awk' Programs: Sample
+Programs.).
+
+ Good function design is important; this function needs to be
+general, but it should also have a reasonable default behavior. It is
+called with an array and the beginning and ending indices of the
+elements in the array to be merged. This assumes that the array
+indices are numeric--a reasonable assumption since the array was likely
+created with `split' (*note Built-in Functions for String Manipulation:
+String Functions.).
+
+ # join.awk --- join an array into a string
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ function join(array, start, end, sep, result, i)
+ {
+ if (sep == "")
+ sep = " "
+ else if (sep == SUBSEP) # magic value
+ sep = ""
+ result = array[start]
+ for (i = start + 1; i <= end; i++)
+ result = result sep array[i]
+ return result
+ }
+
+ An optional additional argument is the separator to use when joining
+the strings back together. If the caller supplies a non-empty value,
+`join' uses it. If it is not supplied, it will have a null value. In
+this case, `join' uses a single blank as a default separator for the
+strings. If the value is equal to `SUBSEP', then `join' joins the
+strings with no separator between them. `SUBSEP' serves as a "magic"
+value to indicate that there should be no separation between the
+component strings.
+
+ It would be nice if `awk' had an assignment operator for
+concatenation. The lack of an explicit operator for concatenation
+makes string operations more difficult than they really need to be.
+
+
+File: gawk.info, Node: Mktime Function, Next: Gettimeofday Function, Prev: Join Function, Up: Library Functions
+
+Turning Dates Into Timestamps
+=============================
+
+ The `systime' function built in to `gawk' returns the current time
+of day as a timestamp in "seconds since the Epoch." This timestamp can
+be converted into a printable date of almost infinitely variable format
+using the built-in `strftime' function. (For more information on
+`systime' and `strftime', *note Functions for Dealing with Time Stamps:
+Time Functions..)
+
+ An interesting but difficult problem is to convert a readable
+representation of a date back into a timestamp. The ANSI C library
+provides a `mktime' function that does the basic job, converting a
+canonical representation of a date into a timestamp.
+
+ It would appear at first glance that `gawk' would have to supply a
+`mktime' built-in function that was simply a "hook" to the C language
+version. In fact though, `mktime' can be implemented entirely in `awk'.
+
+ Here is a version of `mktime' for `awk'. It takes a simple
+representation of the date and time, and converts it into a timestamp.
+
+ The code is presented here intermixed with explanatory prose. In
+*Note Extracting Programs from Texinfo Source Files: Extract Program,
+you will see how the Texinfo source file for this Info file can be
+processed to extract the code into a single source file.
+
+ The program begins with a descriptive comment and a `BEGIN' rule
+that initializes a table `_tm_months'. This table is a two-dimensional
+array that has the lengths of the months. The first index is zero for
+regular years, and one for leap years. The values are the same for all
+the months in both kinds of years, except for February; thus the use of
+multiple assignment.
+
+ # mktime.awk --- convert a canonical date representation
+ # into a timestamp
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ BEGIN \
+ {
+ # Initialize table of month lengths
+ _tm_months[0,1] = _tm_months[1,1] = 31
+ _tm_months[0,2] = 28; _tm_months[1,2] = 29
+ _tm_months[0,3] = _tm_months[1,3] = 31
+ _tm_months[0,4] = _tm_months[1,4] = 30
+ _tm_months[0,5] = _tm_months[1,5] = 31
+ _tm_months[0,6] = _tm_months[1,6] = 30
+ _tm_months[0,7] = _tm_months[1,7] = 31
+ _tm_months[0,8] = _tm_months[1,8] = 31
+ _tm_months[0,9] = _tm_months[1,9] = 30
+ _tm_months[0,10] = _tm_months[1,10] = 31
+ _tm_months[0,11] = _tm_months[1,11] = 30
+ _tm_months[0,12] = _tm_months[1,12] = 31
+ }
+
+ The benefit of merging multiple `BEGIN' rules (*note The `BEGIN' and
+`END' Special Patterns: BEGIN/END.) is particularly clear when writing
+library files. Functions in library files can cleanly initialize their
+own private data and also provide clean-up actions in private `END'
+rules.
+
+ The next function is a simple one that computes whether a given year
+is or is not a leap year. If a year is evenly divisible by four, but
+not evenly divisible by 100, or if it is evenly divisible by 400, then
+it is a leap year. Thus, 1904 was a leap year, 1900 was not, but 2000
+will be.
+
+ # decide if a year is a leap year
+ function _tm_isleap(year, ret)
+ {
+ ret = (year % 4 == 0 && year % 100 != 0) ||
+ (year % 400 == 0)
+
+ return ret
+ }
+
+ This function is only used a few times in this file, and its
+computation could have been written "in-line" (at the point where it's
+used). Making it a separate function made the original development
+easier, and also avoids the possibility of typing errors when
+duplicating the code in multiple places.
+
+ The next function is more interesting. It does most of the work of
+generating a timestamp, which is converting a date and time into some
+number of seconds since the Epoch. The caller passes an array (rather
+imaginatively named `a') containing six values: the year including
+century, the month as a number between one and 12, the day of the
+month, the hour as a number between zero and 23, the minute in the
+hour, and the seconds within the minute.
+
+ The function uses several local variables to precompute the number of
+seconds in an hour, seconds in a day, and seconds in a year. Often,
+similar C code simply writes out the expression in-line, expecting the
+compiler to do "constant folding". E.g., most C compilers would turn
+`60 * 60' into `3600' at compile time, instead of recomputing it every
+time at run time. Precomputing these values makes the function more
+efficient.
+
+ # convert a date into seconds
+ function _tm_addup(a, total, yearsecs, daysecs,
+ hoursecs, i, j)
+ {
+ hoursecs = 60 * 60
+ daysecs = 24 * hoursecs
+ yearsecs = 365 * daysecs
+
+ total = (a[1] - 1970) * yearsecs
+
+ # extra day for leap years
+ for (i = 1970; i < a[1]; i++)
+ if (_tm_isleap(i))
+ total += daysecs
+
+ j = _tm_isleap(a[1])
+ for (i = 1; i < a[2]; i++)
+ total += _tm_months[j, i] * daysecs
+
+ total += (a[3] - 1) * daysecs
+ total += a[4] * hoursecs
+ total += a[5] * 60
+ total += a[6]
+
+ return total
+ }
+
+ The function starts with a first approximation of all the seconds
+between Midnight, January 1, 1970,(1) and the beginning of the current
+year. It then goes through all those years, and for every leap year,
+adds an additional day's worth of seconds.
+
+ The variable `j' holds either one or zero, if the current year is or
+is not a leap year. For every month in the current year prior to the
+current month, it adds the number of seconds in the month, using the
+appropriate entry in the `_tm_months' array.
+
+ Finally, it adds in the seconds for the number of days prior to the
+current day, and the number of hours, minutes, and seconds in the
+current day.
+
+ The result is a count of seconds since January 1, 1970. This value
+is not yet what is needed though. The reason why is described shortly.
+
+ The main `mktime' function takes a single character string argument.
+This string is a representation of a date and time in a "canonical"
+(fixed) form. This string should be `"YEAR MONTH DAY HOUR MINUTE
+SECOND"'.
+
+ # mktime --- convert a date into seconds,
+ # compensate for time zone
+
+ function mktime(str, res1, res2, a, b, i, j, t, diff)
+ {
+ i = split(str, a, " ") # don't rely on FS
+
+ if (i != 6)
+ return -1
+
+ # force numeric
+ for (j in a)
+ a[j] += 0
+
+ # validate
+ if (a[1] < 1970 ||
+ a[2] < 1 || a[2] > 12 ||
+ a[3] < 1 || a[3] > 31 ||
+ a[4] < 0 || a[4] > 23 ||
+ a[5] < 0 || a[5] > 59 ||
+ a[6] < 0 || a[6] > 60 )
+ return -1
+
+ res1 = _tm_addup(a)
+ t = strftime("%Y %m %d %H %M %S", res1)
+
+ if (_tm_debug)
+ printf("(%s) -> (%s)\n", str, t) > "/dev/stderr"
+
+ split(t, b, " ")
+ res2 = _tm_addup(b)
+
+ diff = res1 - res2
+
+ if (_tm_debug)
+ printf("diff = %d seconds\n", diff) > "/dev/stderr"
+
+ res1 += diff
+
+ return res1
+ }
+
+ The function first splits the string into an array, using spaces and
+tabs as separators. If there are not six elements in the array, it
+returns an error, signaled as the value -1. Next, it forces each
+element of the array to be numeric, by adding zero to it. The
+following `if' statement then makes sure that each element is within an
+allowable range. (This checking could be extended further, e.g., to
+make sure that the day of the month is within the correct range for the
+particular month supplied.) All of this is essentially preliminary
+set-up and error checking.
+
+ Recall that `_tm_addup' generated a value in seconds since Midnight,
+January 1, 1970. This value is not directly usable as the result we
+want, _since the calculation does not account for the local timezone_.
+In other words, the value represents the count in seconds since the
+Epoch, but only for UTC (Universal Coordinated Time). If the local
+timezone is east or west of UTC, then some number of hours should be
+either added to, or subtracted from the resulting timestamp.
+
+ For example, 6:23 p.m. in Atlanta, Georgia (USA), is normally five
+hours west of (behind) UTC. It is only four hours behind UTC if
+daylight savings time is in effect. If you are calling `mktime' in
+Atlanta, with the argument `"1993 5 23 18 23 12"', the result from
+`_tm_addup' will be for 6:23 p.m. UTC, which is only 2:23 p.m. in
+Atlanta. It is necessary to add another four hours worth of seconds to
+the result.
+
+ How can `mktime' determine how far away it is from UTC? This is
+surprisingly easy. The returned timestamp represents the time passed to
+`mktime' _as UTC_. This timestamp can be fed back to `strftime', which
+will format it as a _local_ time; i.e. as if it already had the UTC
+difference added in to it. This is done by giving
+`"%Y %m %d %H %M %S"' to `strftime' as the format argument. It returns
+the computed timestamp in the original string format. The result
+represents a time that accounts for the UTC difference. When the new
+time is converted back to a timestamp, the difference between the two
+timestamps is the difference (in seconds) between the local timezone
+and UTC. This difference is then added back to the original result.
+An example demonstrating this is presented below.
+
+ Finally, there is a "main" program for testing the function.
+
+ BEGIN {
+ if (_tm_test) {
+ printf "Enter date as yyyy mm dd hh mm ss: "
+ getline _tm_test_date
+
+ t = mktime(_tm_test_date)
+ r = strftime("%Y %m %d %H %M %S", t)
+ printf "Got back (%s)\n", r
+ }
+ }
+
+ The entire program uses two variables that can be set on the command
+line to control debugging output and to enable the test in the final
+`BEGIN' rule. Here is the result of a test run. (Note that debugging
+output is to standard error, and test output is to standard output.)
+
+ $ gawk -f mktime.awk -v _tm_test=1 -v _tm_debug=1
+ -| Enter date as yyyy mm dd hh mm ss: 1993 5 23 15 35 10
+ error--> (1993 5 23 15 35 10) -> (1993 05 23 11 35 10)
+ error--> diff = 14400 seconds
+ -| Got back (1993 05 23 15 35 10)
+
+ The time entered was 3:35 p.m. (15:35 on a 24-hour clock), on May
+23, 1993. The first line of debugging output shows the resulting time
+as UTC--four hours ahead of the local time zone. The second line shows
+that the difference is 14400 seconds, which is four hours. (The
+difference is only four hours, since daylight savings time is in effect
+during May.) The final line of test output shows that the timezone
+compensation algorithm works; the returned time is the same as the
+entered time.
+
+ This program does not solve the general problem of turning an
+arbitrary date representation into a timestamp. That problem is very
+involved. However, the `mktime' function provides a foundation upon
+which to build. Other software can convert month names into numeric
+months, and AM/PM times into 24-hour clocks, to generate the
+"canonical" format that `mktime' requires.
+
+ ---------- Footnotes ----------
+
+ (1) This is the Epoch on POSIX systems. It may be different on
+other systems.
+
+
+File: gawk.info, Node: Gettimeofday Function, Next: Filetrans Function, Prev: Mktime Function, Up: Library Functions
+
+Managing the Time of Day
+========================
+
+ The `systime' and `strftime' functions described in *Note Functions
+for Dealing with Time Stamps: Time Functions, provide the minimum
+functionality necessary for dealing with the time of day in human
+readable form. While `strftime' is extensive, the control formats are
+not necessarily easy to remember or intuitively obvious when reading a
+program.
+
+ The following function, `gettimeofday', populates a user-supplied
+array with pre-formatted time information. It returns a string with
+the current time formatted in the same way as the `date' utility.
+
+ # gettimeofday --- get the time of day in a usable format
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain, May 1993
+ #
+ # Returns a string in the format of output of date(1)
+ # Populates the array argument time with individual values:
+ # time["second"] -- seconds (0 - 59)
+ # time["minute"] -- minutes (0 - 59)
+ # time["hour"] -- hours (0 - 23)
+ # time["althour"] -- hours (0 - 12)
+ # time["monthday"] -- day of month (1 - 31)
+ # time["month"] -- month of year (1 - 12)
+ # time["monthname"] -- name of the month
+ # time["shortmonth"] -- short name of the month
+ # time["year"] -- year within century (0 - 99)
+ # time["fullyear"] -- year with century (19xx or 20xx)
+ # time["weekday"] -- day of week (Sunday = 0)
+ # time["altweekday"] -- day of week (Monday = 0)
+ # time["weeknum"] -- week number, Sunday first day
+ # time["altweeknum"] -- week number, Monday first day
+ # time["dayname"] -- name of weekday
+ # time["shortdayname"] -- short name of weekday
+ # time["yearday"] -- day of year (0 - 365)
+ # time["timezone"] -- abbreviation of timezone name
+ # time["ampm"] -- AM or PM designation
+
+ function gettimeofday(time, ret, now, i)
+ {
+ # get time once, avoids unnecessary system calls
+ now = systime()
+
+ # return date(1)-style output
+ ret = strftime("%a %b %d %H:%M:%S %Z %Y", now)
+
+ # clear out target array
+ for (i in time)
+ delete time[i]
+
+ # fill in values, force numeric values to be
+ # numeric by adding 0
+ time["second"] = strftime("%S", now) + 0
+ time["minute"] = strftime("%M", now) + 0
+ time["hour"] = strftime("%H", now) + 0
+ time["althour"] = strftime("%I", now) + 0
+ time["monthday"] = strftime("%d", now) + 0
+ time["month"] = strftime("%m", now) + 0
+ time["monthname"] = strftime("%B", now)
+ time["shortmonth"] = strftime("%b", now)
+ time["year"] = strftime("%y", now) + 0
+ time["fullyear"] = strftime("%Y", now) + 0
+ time["weekday"] = strftime("%w", now) + 0
+ time["altweekday"] = strftime("%u", now) + 0
+ time["dayname"] = strftime("%A", now)
+ time["shortdayname"] = strftime("%a", now)
+ time["yearday"] = strftime("%j", now) + 0
+ time["timezone"] = strftime("%Z", now)
+ time["ampm"] = strftime("%p", now)
+ time["weeknum"] = strftime("%U", now) + 0
+ time["altweeknum"] = strftime("%W", now) + 0
+
+ return ret
+ }
+
+ The string indices are easier to use and read than the various
+formats required by `strftime'. The `alarm' program presented in *Note
+An Alarm Clock Program: Alarm Program, uses this function.
+
+ The `gettimeofday' function is presented above as it was written. A
+more general design for this function would have allowed the user to
+supply an optional timestamp value that would have been used instead of
+the current time.
+
+
+File: gawk.info, Node: Filetrans Function, Next: Getopt Function, Prev: Gettimeofday Function, Up: Library Functions
+
+Noting Data File Boundaries
+===========================
+
+ The `BEGIN' and `END' rules are each executed exactly once, at the
+beginning and end respectively of your `awk' program (*note The `BEGIN'
+and `END' Special Patterns: BEGIN/END.). We (the `gawk' authors) once
+had a user who mistakenly thought that the `BEGIN' rule was executed at
+the beginning of each data file and the `END' rule was executed at the
+end of each data file. When informed that this was not the case, the
+user requested that we add new special patterns to `gawk', named
+`BEGIN_FILE' and `END_FILE', that would have the desired behavior. He
+even supplied us the code to do so.
+
+ However, after a little thought, I came up with the following
+library program. It arranges to call two user-supplied functions,
+`beginfile' and `endfile', at the beginning and end of each data file.
+Besides solving the problem in only nine(!) lines of code, it does so
+_portably_; this will work with any implementation of `awk'.
+
+ # transfile.awk
+ #
+ # Give the user a hook for filename transitions
+ #
+ # The user must supply functions beginfile() and endfile()
+ # that each take the name of the file being started or
+ # finished, respectively.
+ #
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, January 1992
+ # Public Domain
+
+ FILENAME != _oldfilename \
+ {
+ if (_oldfilename != "")
+ endfile(_oldfilename)
+ _oldfilename = FILENAME
+ beginfile(FILENAME)
+ }
+
+ END { endfile(FILENAME) }
+
+ This file must be loaded before the user's "main" program, so that
+the rule it supplies will be executed first.
+
+ This rule relies on `awk''s `FILENAME' variable that automatically
+changes for each new data file. The current file name is saved in a
+private variable, `_oldfilename'. If `FILENAME' does not equal
+`_oldfilename', then a new data file is being processed, and it is
+necessary to call `endfile' for the old file. Since `endfile' should
+only be called if a file has been processed, the program first checks
+to make sure that `_oldfilename' is not the null string. The program
+then assigns the current file name to `_oldfilename', and calls
+`beginfile' for the file. Since, like all `awk' variables,
+`_oldfilename' will be initialized to the null string, this rule
+executes correctly even for the first data file.
+
+ The program also supplies an `END' rule, to do the final processing
+for the last file. Since this `END' rule comes before any `END' rules
+supplied in the "main" program, `endfile' will be called first. Once
+again the value of multiple `BEGIN' and `END' rules should be clear.
+
+ This version has same problem as the first version of `nextfile'
+(*note Implementing `nextfile' as a Function: Nextfile Function.). If
+the same data file occurs twice in a row on command line, then
+`endfile' and `beginfile' will not be executed at the end of the first
+pass and at the beginning of the second pass. This version solves the
+problem.
+
+ # ftrans.awk --- handle data file transitions
+ #
+ # user supplies beginfile() and endfile() functions
+ #
+ # Arnold Robbins, arnold@gnu.ai.mit.edu. November 1992
+ # Public Domain
+
+ FNR == 1 {
+ if (_filename_ != "")
+ endfile(_filename_)
+ _filename_ = FILENAME
+ beginfile(FILENAME)
+ }
+
+ END { endfile(_filename_) }
+
+ In *Note Counting Things: Wc Program, you will see how this library
+function can be used, and how it simplifies writing the main program.
+
+
+File: gawk.info, Node: Getopt Function, Next: Passwd Functions, Prev: Filetrans Function, Up: Library Functions
+
+Processing Command Line Options
+===============================
+
+ Most utilities on POSIX compatible systems take options or
+"switches" on the command line that can be used to change the way a
+program behaves. `awk' is an example of such a program (*note Command
+Line Options: Options.). Often, options take "arguments", data that
+the program needs to correctly obey the command line option. For
+example, `awk''s `-F' option requires a string to use as the field
+separator. The first occurrence on the command line of either `--' or a
+string that does not begin with `-' ends the options.
+
+ Most Unix systems provide a C function named `getopt' for processing
+command line arguments. The programmer provides a string describing
+the one letter options. If an option requires an argument, it is
+followed in the string with a colon. `getopt' is also passed the count
+and values of the command line arguments, and is called in a loop.
+`getopt' processes the command line arguments for option letters. Each
+time around the loop, it returns a single character representing the
+next option letter that it found, or `?' if it found an invalid option.
+When it returns -1, there are no options left on the command line.
+
+ When using `getopt', options that do not take arguments can be
+grouped together. Furthermore, options that take arguments require
+that the argument be present. The argument can immediately follow the
+option letter, or it can be a separate command line argument.
+
+ Given a hypothetical program that takes three command line options,
+`-a', `-b', and `-c', and `-b' requires an argument, all of the
+following are valid ways of invoking the program:
+
+ prog -a -b foo -c data1 data2 data3
+ prog -ac -bfoo -- data1 data2 data3
+ prog -acbfoo data1 data2 data3
+
+ Notice that when the argument is grouped with its option, the rest of
+the command line argument is considered to be the option's argument.
+In the above example, `-acbfoo' indicates that all of the `-a', `-b',
+and `-c' options were supplied, and that `foo' is the argument to the
+`-b' option.
+
+ `getopt' provides four external variables that the programmer can
+use.
+
+`optind'
+ The index in the argument value array (`argv') where the first
+ non-option command line argument can be found.
+
+`optarg'
+ The string value of the argument to an option.
+
+`opterr'
+ Usually `getopt' prints an error message when it finds an invalid
+ option. Setting `opterr' to zero disables this feature. (An
+ application might wish to print its own error message.)
+
+`optopt'
+ The letter representing the command line option. While not
+ usually documented, most versions supply this variable.
+
+ The following C fragment shows how `getopt' might process command
+line arguments for `awk'.
+
+ int
+ main(int argc, char *argv[])
+ {
+ ...
+ /* print our own message */
+ opterr = 0;
+ while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
+ switch (c) {
+ case 'f': /* file */
+ ...
+ break;
+ case 'F': /* field separator */
+ ...
+ break;
+ case 'v': /* variable assignment */
+ ...
+ break;
+ case 'W': /* extension */
+ ...
+ break;
+ case '?':
+ default:
+ usage();
+ break;
+ }
+ }
+ ...
+ }
+
+ As a side point, `gawk' actually uses the GNU `getopt_long' function
+to process both normal and GNU-style long options (*note Command Line
+Options: Options.).
+
+ The abstraction provided by `getopt' is very useful, and would be
+quite handy in `awk' programs as well. Here is an `awk' version of
+`getopt'. This function highlights one of the greatest weaknesses in
+`awk', which is that it is very poor at manipulating single characters.
+Repeated calls to `substr' are necessary for accessing individual
+characters (*note Built-in Functions for String Manipulation: String
+Functions.).
+
+ The discussion walks through the code a bit at a time.
+
+ # getopt --- do C library getopt(3) function in awk
+ #
+ # arnold@gnu.ai.mit.edu
+ # Public domain
+ #
+ # Initial version: March, 1991
+ # Revised: May, 1993
+
+ # External variables:
+ # Optind -- index of ARGV for first non-option argument
+ # Optarg -- string value of argument to current option
+ # Opterr -- if non-zero, print our own diagnostic
+ # Optopt -- current option letter
+
+ # Returns
+ # -1 at end of options
+ # ? for unrecognized option
+ # <c> a character representing the current option
+
+ # Private Data
+ # _opti index in multi-flag option, e.g., -abc
+
+ The function starts out with some documentation: who wrote the code,
+and when it was revised, followed by a list of the global variables it
+uses, what the return values are and what they mean, and any global
+variables that are "private" to this library function. Such
+documentation is essential for any program, and particularly for
+library functions.
+
+ function getopt(argc, argv, options, optl, thisopt, i)
+ {
+ optl = length(options)
+ if (optl == 0) # no options given
+ return -1
+
+ if (argv[Optind] == "--") { # all done
+ Optind++
+ _opti = 0
+ return -1
+ } else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) {
+ _opti = 0
+ return -1
+ }
+
+ The function first checks that it was indeed called with a string of
+options (the `options' parameter). If `options' has a zero length,
+`getopt' immediately returns -1.
+
+ The next thing to check for is the end of the options. A `--' ends
+the command line options, as does any command line argument that does
+not begin with a `-'. `Optind' is used to step through the array of
+command line arguments; it retains its value across calls to `getopt',
+since it is a global variable.
+
+ The regexp used, `/^-[^: \t\n\f\r\v\b]/', is perhaps a bit of
+overkill; it checks for a `-' followed by anything that is not
+whitespace and not a colon. If the current command line argument does
+not match this pattern, it is not an option, and it ends option
+processing.
+
+ if (_opti == 0)
+ _opti = 2
+ thisopt = substr(argv[Optind], _opti, 1)
+ Optopt = thisopt
+ i = index(options, thisopt)
+ if (i == 0) {
+ if (Opterr)
+ printf("%c -- invalid option\n",
+ thisopt) > "/dev/stderr"
+ if (_opti >= length(argv[Optind])) {
+ Optind++
+ _opti = 0
+ } else
+ _opti++
+ return "?"
+ }
+
+ The `_opti' variable tracks the position in the current command line
+argument (`argv[Optind]'). In the case that multiple options were
+grouped together with one `-' (e.g., `-abx'), it is necessary to return
+them to the user one at a time.
+
+ If `_opti' is equal to zero, it is set to two, the index in the
+string of the next character to look at (we skip the `-', which is at
+position one). The variable `thisopt' holds the character, obtained
+with `substr'. It is saved in `Optopt' for the main program to use.
+
+ If `thisopt' is not in the `options' string, then it is an invalid
+option. If `Opterr' is non-zero, `getopt' prints an error message on
+the standard error that is similar to the message from the C version of
+`getopt'.
+
+ Since the option is invalid, it is necessary to skip it and move on
+to the next option character. If `_opti' is greater than or equal to
+the length of the current command line argument, then it is necessary
+to move on to the next one, so `Optind' is incremented and `_opti' is
+reset to zero. Otherwise, `Optind' is left alone and `_opti' is merely
+incremented.
+
+ In any case, since the option was invalid, `getopt' returns `?'.
+The main program can examine `Optopt' if it needs to know what the
+invalid option letter actually was.
+
+ if (substr(options, i + 1, 1) == ":") {
+ # get option argument
+ if (length(substr(argv[Optind], _opti + 1)) > 0)
+ Optarg = substr(argv[Optind], _opti + 1)
+ else
+ Optarg = argv[++Optind]
+ _opti = 0
+ } else
+ Optarg = ""
+
+ If the option requires an argument, the option letter is followed by
+a colon in the `options' string. If there are remaining characters in
+the current command line argument (`argv[Optind]'), then the rest of
+that string is assigned to `Optarg'. Otherwise, the next command line
+argument is used (`-xFOO' vs. `-x FOO'). In either case, `_opti' is
+reset to zero, since there are no more characters left to examine in
+the current command line argument.
+
+ if (_opti == 0 || _opti >= length(argv[Optind])) {
+ Optind++
+ _opti = 0
+ } else
+ _opti++
+ return thisopt
+ }
+
+ Finally, if `_opti' is either zero or greater than the length of the
+current command line argument, it means this element in `argv' is
+through being processed, so `Optind' is incremented to point to the
+next element in `argv'. If neither condition is true, then only
+`_opti' is incremented, so that the next option letter can be processed
+on the next call to `getopt'.
+
+ BEGIN {
+ Opterr = 1 # default is to diagnose
+ Optind = 1 # skip ARGV[0]
+
+ # test program
+ if (_getopt_test) {
+ while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
+ printf("c = <%c>, optarg = <%s>\n",
+ _go_c, Optarg)
+ printf("non-option arguments:\n")
+ for (; Optind < ARGC; Optind++)
+ printf("\tARGV[%d] = <%s>\n",
+ Optind, ARGV[Optind])
+ }
+ }
+
+ The `BEGIN' rule initializes both `Opterr' and `Optind' to one.
+`Opterr' is set to one, since the default behavior is for `getopt' to
+print a diagnostic message upon seeing an invalid option. `Optind' is
+set to one, since there's no reason to look at the program name, which
+is in `ARGV[0]'.
+
+ The rest of the `BEGIN' rule is a simple test program. Here is the
+result of two sample runs of the test program.
+
+ $ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
+ -| c = <a>, optarg = <>
+ -| c = <c>, optarg = <>
+ -| c = <b>, optarg = <ARG>
+ -| non-option arguments:
+ -| ARGV[3] = <bax>
+ -| ARGV[4] = <-x>
+
+ $ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
+ -| c = <a>, optarg = <>
+ error--> x -- invalid option
+ -| c = <?>, optarg = <>
+ -| non-option arguments:
+ -| ARGV[4] = <xyz>
+ -| ARGV[5] = <abc>
+
+ The first `--' terminates the arguments to `awk', so that it does
+not try to interpret the `-a' etc. as its own options.
+
+ Several of the sample programs presented in *Note Practical `awk'
+Programs: Sample Programs, use `getopt' to process their arguments.
+
+
+File: gawk.info, Node: Passwd Functions, Next: Group Functions, Prev: Getopt Function, Up: Library Functions
+
+Reading the User Database
+=========================
+
+ The `/dev/user' special file (*note Special File Names in `gawk':
+Special Files.) provides access to the current user's real and
+effective user and group id numbers, and if available, the user's
+supplementary group set. However, since these are numbers, they do not
+provide very useful information to the average user. There needs to be
+some way to find the user information associated with the user and
+group numbers. This section presents a suite of functions for
+retrieving information from the user database. *Note Reading the Group
+Database: Group Functions, for a similar suite that retrieves
+information from the group database.
+
+ The POSIX standard does not define the file where user information is
+kept. Instead, it provides the `<pwd.h>' header file and several C
+language subroutines for obtaining user information. The primary
+function is `getpwent', for "get password entry." The "password" comes
+from the original user database file, `/etc/passwd', which kept user
+information, along with the encrypted passwords (hence the name).
+
+ While an `awk' program could simply read `/etc/passwd' directly (the
+format is well known), because of the way password files are handled on
+networked systems, this file may not contain complete information about
+the system's set of users.
+
+ To be sure of being able to produce a readable, complete version of
+the user database, it is necessary to write a small C program that
+calls `getpwent'. `getpwent' is defined to return a pointer to a
+`struct passwd'. Each time it is called, it returns the next entry in
+the database. When there are no more entries, it returns `NULL', the
+null pointer. When this happens, the C program should call `endpwent'
+to close the database. Here is `pwcat', a C program that "cats" the
+password database.
+
+ /*
+ * pwcat.c
+ *
+ * Generate a printable version of the password database
+ *
+ * Arnold Robbins
+ * arnold@gnu.ai.mit.edu
+ * May 1993
+ * Public Domain
+ */
+
+ #include <stdio.h>
+ #include <pwd.h>
+
+ int
+ main(argc, argv)
+ int argc;
+ char **argv;
+ {
+ struct passwd *p;
+
+ while ((p = getpwent()) != NULL)
+ printf("%s:%s:%d:%d:%s:%s:%s\n",
+ p->pw_name, p->pw_passwd, p->pw_uid,
+ p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);
+
+ endpwent();
+ exit(0);
+ }
+
+ If you don't understand C, don't worry about it. The output from
+`pwcat' is the user database, in the traditional `/etc/passwd' format
+of colon-separated fields. The fields are:
+
+Login name
+ The user's login name.
+
+Encrypted password
+ The user's encrypted password. This may not be available on some
+ systems.
+
+User-ID
+ The user's numeric user-id number.
+
+Group-ID
+ The user's numeric group-id number.
+
+Full name
+ The user's full name, and perhaps other information associated
+ with the user.
+
+Home directory
+ The user's login, or "home" directory (familiar to shell
+ programmers as `$HOME').
+
+Login shell
+ The program that will be run when the user logs in. This is
+ usually a shell, such as Bash (the Gnu Bourne-Again shell).
+
+ Here are a few lines representative of `pwcat''s output.
+
+ $ pwcat
+ -| root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
+ -| nobody:*:65534:65534::/:
+ -| daemon:*:1:1::/:
+ -| sys:*:2:2::/:/bin/csh
+ -| bin:*:3:3::/bin:
+ -| arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
+ -| miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
+ -| andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
+ ...
+
+ With that introduction, here is a group of functions for getting user
+information. There are several functions here, corresponding to the C
+functions of the same name.
+
+ # passwd.awk --- access password file information
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ BEGIN {
+ # tailor this to suit your system
+ _pw_awklib = "/usr/local/libexec/awk/"
+ }
+
+ function _pw_init( oldfs, oldrs, olddol0, pwcat)
+ {
+ if (_pw_inited)
+ return
+ oldfs = FS
+ oldrs = RS
+ olddol0 = $0
+ FS = ":"
+ RS = "\n"
+ pwcat = _pw_awklib "pwcat"
+ while ((pwcat | getline) > 0) {
+ _pw_byname[$1] = $0
+ _pw_byuid[$3] = $0
+ _pw_bycount[++_pw_total] = $0
+ }
+ close(pwcat)
+ _pw_count = 0
+ _pw_inited = 1
+ FS = oldfs
+ RS = oldrs
+ $0 = olddol0
+ }
+
+ The `BEGIN' rule sets a private variable to the directory where
+`pwcat' is stored. Since it is used to help out an `awk' library
+routine, we have chosen to put it in `/usr/local/libexec/awk'. You
+might want it to be in a different directory on your system.
+
+ The function `_pw_init' keeps three copies of the user information
+in three associative arrays. The arrays are indexed by user name
+(`_pw_byname'), by user-id number (`_pw_byuid'), and by order of
+occurrence (`_pw_bycount').
+
+ The variable `_pw_inited' is used for efficiency; `_pw_init' only
+needs to be called once.
+
+ Since this function uses `getline' to read information from `pwcat',
+it first saves the values of `FS', `RS', and `$0'. Doing so is
+necessary, since these functions could be called from anywhere within a
+user's program, and the user may have his or her own values for `FS'
+and `RS'.
+
+ The main part of the function uses a loop to read database lines,
+split the line into fields, and then store the line into each array as
+necessary. When the loop is done, `_pw_init' cleans up by closing the
+pipeline, setting `_pw_inited' to one, and restoring `FS', `RS', and
+`$0'. The use of `_pw_count' will be explained below.
+
+ function getpwnam(name)
+ {
+ _pw_init()
+ if (name in _pw_byname)
+ return _pw_byname[name]
+ return ""
+ }
+
+ The `getpwnam' function takes a user name as a string argument. If
+that user is in the database, it returns the appropriate line.
+Otherwise it returns the null string.
+
+ function getpwuid(uid)
+ {
+ _pw_init()
+ if (uid in _pw_byuid)
+ return _pw_byuid[uid]
+ return ""
+ }
+
+ Similarly, the `getpwuid' function takes a user-id number argument.
+If that user number is in the database, it returns the appropriate
+line. Otherwise it returns the null string.
+
+ function getpwent()
+ {
+ _pw_init()
+ if (_pw_count < _pw_total)
+ return _pw_bycount[++_pw_count]
+ return ""
+ }
+
+ The `getpwent' function simply steps through the database, one entry
+at a time. It uses `_pw_count' to track its current position in the
+`_pw_bycount' array.
+
+ function endpwent()
+ {
+ _pw_count = 0
+ }
+
+ The `endpwent' function resets `_pw_count' to zero, so that
+subsequent calls to `getpwent' will start over again.
+
+ A conscious design decision in this suite is that each subroutine
+calls `_pw_init' to initialize the database arrays. The overhead of
+running a separate process to generate the user database, and the I/O
+to scan it, will only be incurred if the user's main program actually
+calls one of these functions. If this library file is loaded along
+with a user's program, but none of the routines are ever called, then
+there is no extra run-time overhead. (The alternative would be to move
+the body of `_pw_init' into a `BEGIN' rule, which would always run
+`pwcat'. This simplifies the code but runs an extra process that may
+never be needed.)
+
+ In turn, calling `_pw_init' is not too expensive, since the
+`_pw_inited' variable keeps the program from reading the data more than
+once. If you are worried about squeezing every last cycle out of your
+`awk' program, the check of `_pw_inited' could be moved out of
+`_pw_init' and duplicated in all the other functions. In practice,
+this is not necessary, since most `awk' programs are I/O bound, and it
+would clutter up the code.
+
+ The `id' program in *Note Printing Out User Information: Id Program,
+uses these functions.
+
+
+File: gawk.info, Node: Group Functions, Next: Library Names, Prev: Passwd Functions, Up: Library Functions
+
+Reading the Group Database
+==========================
+
+ Much of the discussion presented in *Note Reading the User Database:
+Passwd Functions, applies to the group database as well. Although
+there has traditionally been a well known file, `/etc/group', in a well
+known format, the POSIX standard only provides a set of C library
+routines (`<grp.h>' and `getgrent') for accessing the information.
+Even though this file may exist, it likely does not have complete
+information. Therefore, as with the user database, it is necessary to
+have a small C program that generates the group database as its output.
+
+ Here is `grcat', a C program that "cats" the group database.
+
+ /*
+ * grcat.c
+ *
+ * Generate a printable version of the group database
+ *
+ * Arnold Robbins, arnold@gnu.ai.mit.edu
+ * May 1993
+ * Public Domain
+ */
+
+ #include <stdio.h>
+ #include <grp.h>
+
+ int
+ main(argc, argv)
+ int argc;
+ char **argv;
+ {
+ struct group *g;
+ int i;
+
+ while ((g = getgrent()) != NULL) {
+ printf("%s:%s:%d:", g->gr_name, g->gr_passwd,
+ g->gr_gid);
+ for (i = 0; g->gr_mem[i] != NULL; i++) {
+ printf("%s", g->gr_mem[i]);
+ if (g->gr_mem[i+1] != NULL)
+ putchar(',');
+ }
+ putchar('\n');
+ }
+ endgrent();
+ exit(0);
+ }
+
+ Each line in the group database represent one group. The fields are
+separated with colons, and represent the following information.
+
+Group Name
+ The name of the group.
+
+Group Password
+ The encrypted group password. In practice, this field is never
+ used. It is usually empty, or set to `*'.
+
+Group ID Number
+ The numeric group-id number. This number should be unique within
+ the file.
+
+Group Member List
+ A comma-separated list of user names. These users are members of
+ the group. Most Unix systems allow users to be members of several
+ groups simultaneously. If your system does, then reading
+ `/dev/user' will return those group-id numbers in `$5' through
+ `$NF'. (Note that `/dev/user' is a `gawk' extension; *note
+ Special File Names in `gawk': Special Files..)
+
+ Here is what running `grcat' might produce:
+
+ $ grcat
+ -| wheel:*:0:arnold
+ -| nogroup:*:65534:
+ -| daemon:*:1:
+ -| kmem:*:2:
+ -| staff:*:10:arnold,miriam,andy
+ -| other:*:20:
+ ...
+
+ Here are the functions for obtaining information from the group
+database. There are several, modeled after the C library functions of
+the same names.
+
+ # group.awk --- functions for dealing with the group file
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ BEGIN \
+ {
+ # Change to suit your system
+ _gr_awklib = "/usr/local/libexec/awk/"
+ }
+
+ function _gr_init( oldfs, oldrs, olddol0, grcat, n, a, i)
+ {
+ if (_gr_inited)
+ return
+
+ oldfs = FS
+ oldrs = RS
+ olddol0 = $0
+ FS = ":"
+ RS = "\n"
+
+ grcat = _gr_awklib "grcat"
+ while ((grcat | getline) > 0) {
+ if ($1 in _gr_byname)
+ _gr_byname[$1] = _gr_byname[$1] "," $4
+ else
+ _gr_byname[$1] = $0
+ if ($3 in _gr_bygid)
+ _gr_bygid[$3] = _gr_bygid[$3] "," $4
+ else
+ _gr_bygid[$3] = $0
+
+ n = split($4, a, "[ \t]*,[ \t]*")
+ for (i = 1; i <= n; i++)
+ if (a[i] in _gr_groupsbyuser)
+ _gr_groupsbyuser[a[i]] = \
+ _gr_groupsbyuser[a[i]] " " $1
+ else
+ _gr_groupsbyuser[a[i]] = $1
+
+ _gr_bycount[++_gr_count] = $0
+ }
+ close(grcat)
+ _gr_count = 0
+ _gr_inited++
+ FS = oldfs
+ RS = oldrs
+ $0 = olddol0
+ }
+
+ The `BEGIN' rule sets a private variable to the directory where
+`grcat' is stored. Since it is used to help out an `awk' library
+routine, we have chosen to put it in `/usr/local/libexec/awk'. You
+might want it to be in a different directory on your system.
+
+ These routines follow the same general outline as the user database
+routines (*note Reading the User Database: Passwd Functions.). The
+`_gr_inited' variable is used to ensure that the database is scanned no
+more than once. The `_gr_init' function first saves `FS', `RS', and
+`$0', and then sets `FS' and `RS' to the correct values for scanning
+the group information.
+
+ The group information is stored is several associative arrays. The
+arrays are indexed by group name (`_gr_byname'), by group-id number
+(`_gr_bygid'), and by position in the database (`_gr_bycount'). There
+is an additional array indexed by user name (`_gr_groupsbyuser'), that
+is a space separated list of groups that each user belongs to.
+
+ Unlike the user database, it is possible to have multiple records in
+the database for the same group. This is common when a group has a
+large number of members. Such a pair of entries might look like:
+
+ tvpeople:*:101:johny,jay,arsenio
+ tvpeople:*:101:david,conan,tom,joan
+
+ For this reason, `_gr_init' looks to see if a group name or group-id
+number has already been seen. If it has, then the user names are
+simply concatenated onto the previous list of users. (There is
+actually a subtle problem with the code presented above. Suppose that
+the first time there were no names. This code adds the names with a
+leading comma. It also doesn't check that there is a `$4'.)
+
+ Finally, `_gr_init' closes the pipeline to `grcat', restores `FS',
+`RS', and `$0', initializes `_gr_count' to zero (it is used later), and
+makes `_gr_inited' non-zero.
+
+ function getgrnam(group)
+ {
+ _gr_init()
+ if (group in _gr_byname)
+ return _gr_byname[group]
+ return ""
+ }
+
+ The `getgrnam' function takes a group name as its argument, and if
+that group exists, it is returned. Otherwise, `getgrnam' returns the
+null string.
+
+ function getgrgid(gid)
+ {
+ _gr_init()
+ if (gid in _gr_bygid)
+ return _gr_bygid[gid]
+ return ""
+ }
+
+ The `getgrgid' function is similar, it takes a numeric group-id, and
+looks up the information associated with that group-id.
+
+ function getgruser(user)
+ {
+ _gr_init()
+ if (user in _gr_groupsbyuser)
+ return _gr_groupsbyuser[user]
+ return ""
+ }
+
+ The `getgruser' function does not have a C counterpart. It takes a
+user name, and returns the list of groups that have the user as a
+member.
+
+ function getgrent()
+ {
+ _gr_init()
+ if (++gr_count in _gr_bycount)
+ return _gr_bycount[_gr_count]
+ return ""
+ }
+
+ The `getgrent' function steps through the database one entry at a
+time. It uses `_gr_count' to track its position in the list.
+
+ function endgrent()
+ {
+ _gr_count = 0
+ }
+
+ `endgrent' resets `_gr_count' to zero so that `getgrent' can start
+over again.
+
+ As with the user database routines, each function calls `_gr_init' to
+initialize the arrays. Doing so only incurs the extra overhead of
+running `grcat' if these functions are used (as opposed to moving the
+body of `_gr_init' into a `BEGIN' rule).
+
+ Most of the work is in scanning the database and building the various
+associative arrays. The functions that the user calls are themselves
+very simple, relying on `awk''s associative arrays to do work.
+
+ The `id' program in *Note Printing Out User Information: Id Program,
+uses these functions.
+
+
+File: gawk.info, Node: Library Names, Prev: Group Functions, Up: Library Functions
+
+Naming Library Function Global Variables
+========================================
+
+ Due to the way the `awk' language evolved, variables are either
+"global" (usable by the entire program), or "local" (usable just by a
+specific function). There is no intermediate state analogous to
+`static' variables in C.
+
+ Library functions often need to have global variables that they can
+use to preserve state information between calls to the function. For
+example, `getopt''s variable `_opti' (*note Processing Command Line
+Options: Getopt Function.), and the `_tm_months' array used by `mktime'
+(*note Turning Dates Into Timestamps: Mktime Function.). Such
+variables are called "private", since the only functions that need to
+use them are the ones in the library.
+
+ When writing a library function, you should try to choose names for
+your private variables so that they will not conflict with any
+variables used by either another library function or a user's main
+program. For example, a name like `i' or `j' is not a good choice,
+since user programs often use variable names like these for their own
+purposes.
+
+ The example programs shown in this chapter all start the names of
+their private variables with an underscore (`_'). Users generally
+don't use leading underscores in their variable names, so this
+convention immediately decreases the chances that the variable name
+will be accidentally shared with the user's program.
+
+ In addition, several of the library functions use a prefix that helps
+indicate what function or set of functions uses the variables. For
+example, `_tm_months' in `mktime' (*note Turning Dates Into Timestamps:
+Mktime Function.), and `_pw_byname' in the user data base routines
+(*note Reading the User Database: Passwd Functions.). This convention
+is recommended, since it even further decreases the chance of
+inadvertent conflict among variable names. Note that this convention
+can be used equally well both for variable names and for private
+function names too.
+
+ While I could have re-written all the library routines to use this
+convention, I did not do so, in order to show how my own `awk'
+programming style has evolved, and to provide some basis for this
+discussion.
+
+ As a final note on variable naming, if a function makes global
+variables available for use by a main program, it is a good convention
+to start that variable's name with a capital letter. For example,
+`getopt''s `Opterr' and `Optind' variables (*note Processing Command
+Line Options: Getopt Function.). The leading capital letter indicates
+that it is global, while the fact that the variable name is not all
+capital letters indicates that the variable is not one of `awk''s
+built-in variables, like `FS'.
+
+ It is also important that _all_ variables in library functions that
+do not need to save state are in fact declared local. If this is not
+done, the variable could accidentally be used in the user's program,
+leading to bugs that are very difficult to track down.
+
+ function lib_func(x, y, l1, l2)
+ {
+ ...
+ USE VARIABLE some_var # some_var could be local
+ ... # but is not by oversight
+ }
+
+ A different convention, common in the Tcl community, is to use a
+single associative array to hold the values needed by the library
+function(s), or "package." This significantly decreases the number of
+actual global names in use. For example, the functions described in
+*Note Reading the User Database: Passwd Functions, might have used
+`PW_data["inited"]', `PW_data["total"]', `PW_data["count"]' and
+`PW_data["awklib"]', instead of `_pw_inited', `_pw_awklib', `_pw_total',
+and `_pw_count'.
+
+ The conventions presented in this section are exactly that,
+conventions. You are not required to write your programs this way, we
+merely recommend that you do so.
+
+
+File: gawk.info, Node: Sample Programs, Next: Language History, Prev: Library Functions, Up: Top
+
+Practical `awk' Programs
+************************
+
+ This chapter presents a potpourri of `awk' programs for your reading
+enjoyment.
+
+ Many of these programs use the library functions presented in *Note
+A Library of `awk' Functions: Library Functions.
+
+* Menu:
+
+* Clones:: Clones of common utilities.
+* Miscellaneous Programs:: Some interesting `awk' programs.
+
+
+File: gawk.info, Node: Clones, Next: Miscellaneous Programs, Prev: Sample Programs, Up: Sample Programs
+
+Re-inventing Wheels for Fun and Profit
+======================================
+
+ This section presents a number of POSIX utilities that are
+implemented in `awk'. Re-inventing these programs in `awk' is often
+enjoyable, since the algorithms can be very clearly expressed, and
+usually the code is very concise and simple. This is true because
+`awk' does so much for you.
+
+ It should be noted that these programs are not necessarily intended
+to replace the installed versions on your system. Instead, their
+purpose is to illustrate `awk' language programming for "real world"
+tasks.
+
+ The programs are presented in alphabetical order.
+
+* Menu:
+
+* Cut Program:: The `cut' utility.
+* Egrep Program:: The `egrep' utility.
+* Id Program:: The `id' utility.
+* Split Program:: The `split' utility.
+* Tee Program:: The `tee' utility.
+* Uniq Program:: The `uniq' utility.
+* Wc Program:: The `wc' utility.
+
+
+File: gawk.info, Node: Cut Program, Next: Egrep Program, Prev: Clones, Up: Clones
+
+Cutting Out Fields and Columns
+------------------------------
+
+ The `cut' utility selects, or "cuts," either characters or fields
+from its standard input and sends them to its standard output. `cut'
+can cut out either a list of characters, or a list of fields. By
+default, fields are separated by tabs, but you may supply a command
+line option to change the field "delimiter", i.e. the field separator
+character. `cut''s definition of fields is less general than `awk''s.
+
+ A common use of `cut' might be to pull out just the login name of
+logged-on users from the output of `who'. For example, the following
+pipeline generates a sorted, unique list of the logged on users:
+
+ who | cut -c1-8 | sort | uniq
+
+ The options for `cut' are:
+
+`-c LIST'
+ Use LIST as the list of characters to cut out. Items within the
+ list may be separated by commas, and ranges of characters can be
+ separated with dashes. The list `1-8,15,22-35' specifies
+ characters one through eight, 15, and 22 through 35.
+
+`-f LIST'
+ Use LIST as the list of fields to cut out.
+
+`-d DELIM'
+ Use DELIM as the field separator character instead of the tab
+ character.
+
+`-s'
+ Suppress printing of lines that do not contain the field delimiter.
+
+ The `awk' implementation of `cut' uses the `getopt' library function
+(*note Processing Command Line Options: Getopt Function.), and the
+`join' library function (*note Merging an Array Into a String: Join
+Function.).
+
+ The program begins with a comment describing the options and a
+`usage' function which prints out a usage message and exits. `usage'
+is called if invalid arguments are supplied.
+
+ # cut.awk --- implement cut in awk
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ # Options:
+ # -f list Cut fields
+ # -d c Field delimiter character
+ # -c list Cut characters
+ #
+ # -s Suppress lines without the delimiter character
+
+ function usage( e1, e2)
+ {
+ e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
+ e2 = "usage: cut [-c list] [files...]"
+ print e1 > "/dev/stderr"
+ print e2 > "/dev/stderr"
+ exit 1
+ }
+
+The variables `e1' and `e2' are used so that the function fits nicely
+on the screen.
+
+ Next comes a `BEGIN' rule that parses the command line options. It
+sets `FS' to a single tab character, since that is `cut''s default
+field separator. The output field separator is also set to be the same
+as the input field separator. Then `getopt' is used to step through
+the command line options. One or the other of the variables
+`by_fields' or `by_chars' is set to true, to indicate that processing
+should be done by fields or by characters respectively. When cutting
+by characters, the output field separator is set to the null string.
+
+ BEGIN \
+ {
+ FS = "\t" # default
+ OFS = FS
+ while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) {
+ if (c == "f") {
+ by_fields = 1
+ fieldlist = Optarg
+ } else if (c == "c") {
+ by_chars = 1
+ fieldlist = Optarg
+ OFS = ""
+ } else if (c == "d") {
+ if (length(Optarg) > 1) {
+ printf("Using first character of %s" \
+ " for delimiter\n", Optarg) > "/dev/stderr"
+ Optarg = substr(Optarg, 1, 1)
+ }
+ FS = Optarg
+ OFS = FS
+ if (FS == " ") # defeat awk semantics
+ FS = "[ ]"
+ } else if (c == "s")
+ suppress++
+ else
+ usage()
+ }
+
+ for (i = 1; i < Optind; i++)
+ ARGV[i] = ""
+
+ Special care is taken when the field delimiter is a space. Using
+`" "' (a single space) for the value of `FS' is incorrect--`awk' would
+separate fields with runs of spaces, tabs and/or newlines, and we want
+them to be separated with individual spaces. Also, note that after
+`getopt' is through, we have to clear out all the elements of `ARGV'
+from one to `Optind', so that `awk' will not try to process the command
+line options as file names.
+
+ After dealing with the command line options, the program verifies
+that the options make sense. Only one or the other of `-c' and `-f'
+should be used, and both require a field list. Then either
+`set_fieldlist' or `set_charlist' is called to pull apart the list of
+fields or characters.
+
+ if (by_fields && by_chars)
+ usage()
+
+ if (by_fields == 0 && by_chars == 0)
+ by_fields = 1 # default
+
+ if (fieldlist == "") {
+ print "cut: needs list for -c or -f" > "/dev/stderr"
+ exit 1
+ }
+
+ if (by_fields)
+ set_fieldlist()
+ else
+ set_charlist()
+ }
+
+ Here is `set_fieldlist'. It first splits the field list apart at
+the commas, into an array. Then, for each element of the array, it
+looks to see if it is actually a range, and if so splits it apart. The
+range is verified to make sure the first number is smaller than the
+second. Each number in the list is added to the `flist' array, which
+simply lists the fields that will be printed. Normal field splitting
+is used. The program lets `awk' handle the job of doing the field
+splitting.
+
+ function set_fieldlist( n, m, i, j, k, f, g)
+ {
+ n = split(fieldlist, f, ",")
+ j = 1 # index in flist
+ for (i = 1; i <= n; i++) {
+ if (index(f[i], "-") != 0) { # a range
+ m = split(f[i], g, "-")
+ if (m != 2 || g[1] >= g[2]) {
+ printf("bad field list: %s\n",
+ f[i]) > "/dev/stderr"
+ exit 1
+ }
+ for (k = g[1]; k <= g[2]; k++)
+ flist[j++] = k
+ } else
+ flist[j++] = f[i]
+ }
+ nfields = j - 1
+ }
+
+ The `set_charlist' function is more complicated than `set_fieldlist'.
+The idea here is to use `gawk''s `FIELDWIDTHS' variable (*note Reading
+Fixed-width Data: Constant Size.), which describes constant width
+input. When using a character list, that is exactly what we have.
+
+ Setting up `FIELDWIDTHS' is more complicated than simply listing the
+fields that need to be printed. We have to keep track of the fields to
+be printed, and also the intervening characters that have to be skipped.
+For example, suppose you wanted characters one through eight, 15, and
+22 through 35. You would use `-c 1-8,15,22-35'. The necessary value
+for `FIELDWIDTHS' would be `"8 6 1 6 14"'. This gives us five fields,
+and what should be printed are `$1', `$3', and `$5'. The intermediate
+fields are "filler," stuff in between the desired data.
+
+ `flist' lists the fields to be printed, and `t' tracks the complete
+field list, including filler fields.
+
+ function set_charlist( field, i, j, f, g, t,
+ filler, last, len)
+ {
+ field = 1 # count total fields
+ n = split(fieldlist, f, ",")
+ j = 1 # index in flist
+ for (i = 1; i <= n; i++) {
+ if (index(f[i], "-") != 0) { # range
+ m = split(f[i], g, "-")
+ if (m != 2 || g[1] >= g[2]) {
+ printf("bad character list: %s\n",
+ f[i]) > "/dev/stderr"
+ exit 1
+ }
+ len = g[2] - g[1] + 1
+ if (g[1] > 1) # compute length of filler
+ filler = g[1] - last - 1
+ else
+ filler = 0
+ if (filler)
+ t[field++] = filler
+ t[field++] = len # length of field
+ last = g[2]
+ flist[j++] = field - 1
+ } else {
+ if (f[i] > 1)
+ filler = f[i] - last - 1
+ else
+ filler = 0
+ if (filler)
+ t[field++] = filler
+ t[field++] = 1
+ last = f[i]
+ flist[j++] = field - 1
+ }
+ }
+ FIELDWIDTHS = join(t, 1, field - 1)
+ nfields = j - 1
+ }
+
+ Here is the rule that actually processes the data. If the `-s'
+option was given, then `suppress' will be true. The first `if'
+statement makes sure that the input record does have the field
+separator. If `cut' is processing fields, `suppress' is true, and the
+field separator character is not in the record, then the record is
+skipped.
+
+ If the record is valid, then at this point, `gawk' has split the data
+into fields, either using the character in `FS' or using fixed-length
+fields and `FIELDWIDTHS'. The loop goes through the list of fields
+that should be printed. If the corresponding field has data in it, it
+is printed. If the next field also has data, then the separator
+character is written out in between the fields.
+
+ {
+ if (by_fields && suppress && $0 !~ FS)
+ next
+
+ for (i = 1; i <= nfields; i++) {
+ if ($flist[i] != "") {
+ printf "%s", $flist[i]
+ if (i < nfields && $flist[i+1] != "")
+ printf "%s", OFS
+ }
+ }
+ print ""
+ }
+
+ This version of `cut' relies on `gawk''s `FIELDWIDTHS' variable to
+do the character-based cutting. While it would be possible in other
+`awk' implementations to use `substr' (*note Built-in Functions for
+String Manipulation: String Functions.), it would also be extremely
+painful to do so. The `FIELDWIDTHS' variable supplies an elegant
+solution to the problem of picking the input line apart by characters.
+
+
+File: gawk.info, Node: Egrep Program, Next: Id Program, Prev: Cut Program, Up: Clones
+
+Searching for Regular Expressions in Files
+------------------------------------------
+
+ The `egrep' utility searches files for patterns. It uses regular
+expressions that are almost identical to those available in `awk'
+(*note Regular Expression Constants: Regexp Constants.). It is used
+this way:
+
+ egrep [ OPTIONS ] 'PATTERN' FILES ...
+
+ The PATTERN is a regexp. In typical usage, the regexp is quoted to
+prevent the shell from expanding any of the special characters as file
+name wildcards. Normally, `egrep' prints the lines that matched. If
+multiple file names are provided on the command line, each output line
+is preceded by the name of the file and a colon.
+
+ The options are:
+
+`-c'
+ Print out a count of the lines that matched the pattern, instead
+ of the lines themselves.
+
+`-s'
+ Be silent. No output is produced, and the exit value indicates
+ whether or not the pattern was matched.
+
+`-v'
+ Invert the sense of the test. `egrep' prints the lines that do
+ _not_ match the pattern, and exits successfully if the pattern was
+ not matched.
+
+`-i'
+ Ignore case distinctions in both the pattern and the input data.
+
+`-l'
+ Only print the names of the files that matched, not the lines that
+ matched.
+
+`-e PATTERN'
+ Use PATTERN as the regexp to match. The purpose of the `-e'
+ option is to allow patterns that start with a `-'.
+
+ This version uses the `getopt' library function (*note Processing
+Command Line Options: Getopt Function.), and the file transition
+library program (*note Noting Data File Boundaries: Filetrans
+Function.).
+
+ The program begins with a descriptive comment, and then a `BEGIN'
+rule that processes the command line arguments with `getopt'. The `-i'
+(ignore case) option is particularly easy with `gawk'; we just use the
+`IGNORECASE' built in variable (*note Built-in Variables::).
+
+ # egrep.awk --- simulate egrep in awk
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ # Options:
+ # -c count of lines
+ # -s silent - use exit value
+ # -v invert test, success if no match
+ # -i ignore case
+ # -l print filenames only
+ # -e argument is pattern
+
+ BEGIN {
+ while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
+ if (c == "c")
+ count_only++
+ else if (c == "s")
+ no_print++
+ else if (c == "v")
+ invert++
+ else if (c == "i")
+ IGNORECASE = 1
+ else if (c == "l")
+ filenames_only++
+ else if (c == "e")
+ pattern = Optarg
+ else
+ usage()
+ }
+
+ Next comes the code that handles the `egrep' specific behavior. If no
+pattern was supplied with `-e', the first non-option on the command
+line is used. The `awk' command line arguments up to `ARGV[Optind]'
+are cleared, so that `awk' won't try to process them as files. If no
+files were specified, the standard input is used, and if multiple files
+were specified, we make sure to note this so that the file names can
+precede the matched lines in the output.
+
+ The last two lines are commented out, since they are not needed in
+`gawk'. They should be uncommented if you have to use another version
+of `awk'.
+
+ if (pattern == "")
+ pattern = ARGV[Optind++]
+
+ for (i = 1; i < Optind; i++)
+ ARGV[i] = ""
+ if (Optind >= ARGC) {
+ ARGV[1] = "-"
+ ARGC = 2
+ } else if (ARGC - Optind > 1)
+ do_filenames++
+
+ # if (IGNORECASE)
+ # pattern = tolower(pattern)
+ }
+
+ The next set of lines should be uncommented if you are not using
+`gawk'. This rule translates all the characters in the input line into
+lower-case if the `-i' option was specified. The rule is commented out
+since it is not necessary with `gawk'.
+
+ #{
+ # if (IGNORECASE)
+ # $0 = tolower($0)
+ #}
+
+ The `beginfile' function is called by the rule in `ftrans.awk' when
+each new file is processed. In this case, it is very simple; all it
+does is initialize a variable `fcount' to zero. `fcount' tracks how
+many lines in the current file matched the pattern.
+
+ function beginfile(junk)
+ {
+ fcount = 0
+ }
+
+ The `endfile' function is called after each file has been processed.
+It is used only when the user wants a count of the number of lines that
+matched. `no_print' will be true only if the exit status is desired.
+`count_only' will be true if line counts are desired. `egrep' will
+therefore only print line counts if printing and counting are enabled.
+The output format must be adjusted depending upon the number of files
+to be processed. Finally, `fcount' is added to `total', so that we
+know how many lines altogether matched the pattern.
+
+ function endfile(file)
+ {
+ if (! no_print && count_only)
+ if (do_filenames)
+ print file ":" fcount
+ else
+ print fcount
+
+ total += fcount
+ }
+
+ This rule does most of the work of matching lines. The variable
+`matches' will be true if the line matched the pattern. If the user
+wants lines that did not match, the sense of the `matches' is inverted
+using the `!' operator. `fcount' is incremented with the value of
+`matches', which will be either one or zero, depending upon a
+successful or unsuccessful match. If the line did not match, the
+`next' statement just moves on to the next record.
+
+ There are several optimizations for performance in the following few
+lines of code. If the user only wants exit status (`no_print' is true),
+and we don't have to count lines, then it is enough to know that one
+line in this file matched, and we can skip on to the next file with
+`nextfile'. Along similar lines, if we are only printing file names,
+and we don't need to count lines, we can print the file name, and then
+skip to the next file with `nextfile'.
+
+ Finally, each line is printed, with a leading filename and colon if
+necessary.
+
+ {
+ matches = ($0 ~ pattern)
+ if (invert)
+ matches = ! matches
+
+ fcount += matches # 1 or 0
+
+ if (! matches)
+ next
+
+ if (no_print && ! count_only)
+ nextfile
+
+ if (filenames_only && ! count_only) {
+ print FILENAME
+ nextfile
+ }
+
+ if (do_filenames && ! count_only)
+ print FILENAME ":" $0
+ else if (! count_only)
+ print
+ }
+
+ The `END' rule takes care of producing the correct exit status. If
+there were no matches, the exit status is one, otherwise it is zero.
+
+ END \
+ {
+ if (total == 0)
+ exit 1
+ exit 0
+ }
+
+ The `usage' function prints a usage message in case of invalid
+options and then exits.
+
+ function usage( e)
+ {
+ e = "Usage: egrep [-csvil] [-e pat] [files ...]"
+ print e > "/dev/stderr"
+ exit 1
+ }
+
+ The variable `e' is used so that the function fits nicely on the
+printed page.
+
+ Just a note on programming style. You may have noticed that the `END'
+rule uses backslash continuation, with the open brace on a line by
+itself. This is so that it more closely resembles the way functions
+are written. Many of the examples use this style. You can decide for
+yourself if you like writing your `BEGIN' and `END' rules this way, or
+not.
+
+
+File: gawk.info, Node: Id Program, Next: Split Program, Prev: Egrep Program, Up: Clones
+
+Printing Out User Information
+-----------------------------
+
+ The `id' utility lists a user's real and effective user-id numbers,
+real and effective group-id numbers, and the user's group set, if any.
+`id' will only print the effective user-id and group-id if they are
+different from the real ones. If possible, `id' will also supply the
+corresponding user and group names. The output might look like this:
+
+ $ id
+ -| uid=2076(arnold) gid=10(staff) groups=10(staff),4(tty)
+
+ This information is exactly what is provided by `gawk''s `/dev/user'
+special file (*note Special File Names in `gawk': Special Files.).
+However, the `id' utility provides a more palatable output than just a
+string of numbers.
+
+ Here is a simple version of `id' written in `awk'. It uses the user
+database library functions (*note Reading the User Database: Passwd
+Functions.), and the group database library functions (*note Reading
+the Group Database: Group Functions.).
+
+ The program is fairly straightforward. All the work is done in the
+`BEGIN' rule. The user and group id numbers are obtained from
+`/dev/user'. If there is no support for `/dev/user', the program gives
+up.
+
+ The code is repetitive. The entry in the user database for the real
+user-id number is split into parts at the `:'. The name is the first
+field. Similar code is used for the effective user-id number, and the
+group numbers.
+
+ # id.awk --- implement id in awk
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ # output is:
+ # uid=12(foo) euid=34(bar) gid=3(baz) \
+ # egid=5(blat) groups=9(nine),2(two),1(one)
+
+ BEGIN \
+ {
+ if ((getline < "/dev/user") < 0) {
+ err = "id: no /dev/user support - cannot run"
+ print err > "/dev/stderr"
+ exit 1
+ }
+ close("/dev/user")
+
+ uid = $1
+ euid = $2
+ gid = $3
+ egid = $4
+
+ printf("uid=%d", uid)
+ pw = getpwuid(uid)
+ if (pw != "") {
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ }
+
+ if (euid != uid) {
+ printf(" euid=%d", euid)
+ pw = getpwuid(euid)
+ if (pw != "") {
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ }
+ }
+
+ printf(" gid=%d", gid)
+ pw = getgrgid(gid)
+ if (pw != "") {
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ }
+
+ if (egid != gid) {
+ printf(" egid=%d", egid)
+ pw = getgrgid(egid)
+ if (pw != "") {
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ }
+ }
+
+ if (NF > 4) {
+ printf(" groups=");
+ for (i = 5; i <= NF; i++) {
+ printf("%d", $i)
+ pw = getgrgid($i)
+ if (pw != "") {
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ }
+ if (i < NF)
+ printf(",")
+ }
+ }
+ print ""
+ }
+
+
+File: gawk.info, Node: Split Program, Next: Tee Program, Prev: Id Program, Up: Clones
+
+Splitting a Large File Into Pieces
+----------------------------------
+
+ The `split' program splits large text files into smaller pieces. By
+default, the output files are named `xaa', `xab', and so on. Each file
+has 1000 lines in it, with the likely exception of the last file. To
+change the number of lines in each file, you supply a number on the
+command line preceded with a minus, e.g., `-500' for files with 500
+lines in them instead of 1000. To change the name of the output files
+to something like `myfileaa', `myfileab', and so on, you supply an
+additional argument that specifies the filename.
+
+ Here is a version of `split' in `awk'. It uses the `ord' and `chr'
+functions presented in *Note Translating Between Characters and
+Numbers: Ordinal Functions.
+
+ The program first sets its defaults, and then tests to make sure
+there are not too many arguments. It then looks at each argument in
+turn. The first argument could be a minus followed by a number. If it
+is, this happens to look like a negative number, so it is made
+positive, and that is the count of lines. The data file name is
+skipped over, and the final argument is used as the prefix for the
+output file names.
+
+ # split.awk --- do split in awk
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ # usage: split [-num] [file] [outname]
+
+ BEGIN \
+ {
+ outfile = "x" # default
+ count = 1000
+ if (ARGC > 4)
+ usage()
+
+ i = 1
+ if (ARGV[i] ~ /^-[0-9]+$/) {
+ count = -ARGV[i]
+ ARGV[i] = ""
+ i++
+ }
+ # test argv in case reading from stdin instead of file
+ if (i in ARGV)
+ i++ # skip data file name
+ if (i in ARGV) {
+ outfile = ARGV[i]
+ ARGV[i] = ""
+ }
+
+ s1 = s2 = "a"
+ out = (outfile s1 s2)
+ }
+
+ The next rule does most of the work. `tcount' (temporary count)
+tracks how many lines have been printed to the output file so far. If
+it is greater than `count', it is time to close the current file and
+start a new one. `s1' and `s2' track the current suffixes for the file
+name. If they are both `z', the file is just too big. Otherwise, `s1'
+moves to the next letter in the alphabet and `s2' starts over again at
+`a'.
+
+ {
+ if (++tcount > count) {
+ close(out)
+ if (s2 == "z") {
+ if (s1 == "z") {
+ printf("split: %s is too large to split\n", \
+ FILENAME) > "/dev/stderr"
+ exit 1
+ }
+ s1 = chr(ord(s1) + 1)
+ s2 = "a"
+ } else
+ s2 = chr(ord(s2) + 1)
+ out = (outfile s1 s2)
+ tcount = 1
+ }
+ print > out
+ }
+
+ The `usage' function simply prints an error message and exits.
+
+ function usage( e)
+ {
+ e = "usage: split [-num] [file] [outname]"
+ print e > "/dev/stderr"
+ exit 1
+ }
+
+The variable `e' is used so that the function fits nicely on the screen.
+
+ This program is a bit sloppy; it relies on `awk' to close the last
+file for it automatically, instead of doing it in an `END' rule.
+
+
+File: gawk.info, Node: Tee Program, Next: Uniq Program, Prev: Split Program, Up: Clones
+
+Duplicating Output Into Multiple Files
+--------------------------------------
+
+ The `tee' program is known as a "pipe fitting." `tee' copies its
+standard input to its standard output, and also duplicates it to the
+files named on the command line. Its usage is:
+
+ tee [-a] file ...
+
+ The `-a' option tells `tee' to append to the named files, instead of
+truncating them and starting over.
+
+ The `BEGIN' rule first makes a copy of all the command line
+arguments, into an array named `copy'. `ARGV[0]' is not copied, since
+it is not needed. `tee' cannot use `ARGV' directly, since `awk' will
+attempt to process each file named in `ARGV' as input data.
+
+ If the first argument is `-a', then the flag variable `append' is
+set to true, and both `ARGV[1]' and `copy[1]' are deleted. If `ARGC' is
+less than two, then no file names were supplied, and `tee' prints a
+usage message and exits. Finally, `awk' is forced to read the standard
+input by setting `ARGV[1]' to `"-"', and `ARGC' to two.
+
+ # tee.awk --- tee in awk
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+ # Revised December 1995
+
+ BEGIN \
+ {
+ for (i = 1; i < ARGC; i++)
+ copy[i] = ARGV[i]
+
+ if (ARGV[1] == "-a") {
+ append = 1
+ delete ARGV[1]
+ delete copy[1]
+ ARGC--
+ }
+ if (ARGC < 2) {
+ print "usage: tee [-a] file ..." > "/dev/stderr"
+ exit 1
+ }
+ ARGV[1] = "-"
+ ARGC = 2
+ }
+
+ The single rule does all the work. Since there is no pattern, it is
+executed for each line of input. The body of the rule simply prints the
+line into each file on the command line, and then to the standard
+output.
+
+ {
+ # moving the if outside the loop makes it run faster
+ if (append)
+ for (i in copy)
+ print >> copy[i]
+ else
+ for (i in copy)
+ print > copy[i]
+ print
+ }
+
+ It would have been possible to code the loop this way:
+
+ for (i in copy)
+ if (append)
+ print >> copy[i]
+ else
+ print > copy[i]
+
+This is more concise, but it is also less efficient. The `if' is
+tested for each record and for each output file. By duplicating the
+loop body, the `if' is only tested once for each input record. If
+there are N input records and M input files, the first method only
+executes N `if' statements, while the second would execute N`*'M `if'
+statements.
+
+ Finally, the `END' rule cleans up, by closing all the output files.
+
+ END \
+ {
+ for (i in copy)
+ close(copy[i])
+ }
+
+
+File: gawk.info, Node: Uniq Program, Next: Wc Program, Prev: Tee Program, Up: Clones
+
+Printing Non-duplicated Lines of Text
+-------------------------------------
+
+ The `uniq' utility reads sorted lines of data on its standard input,
+and (by default) removes duplicate lines. In other words, only unique
+lines are printed, hence the name. `uniq' has a number of options. The
+usage is:
+
+ uniq [-udc [-N]] [+N] [ INPUT FILE [ OUTPUT FILE ]]
+
+ The option meanings are:
+
+`-d'
+ Only print repeated lines.
+
+`-u'
+ Only print non-repeated lines.
+
+`-c'
+ Count lines. This option overrides `-d' and `-u'. Both repeated
+ and non-repeated lines are counted.
+
+`-N'
+ Skip N fields before comparing lines. The definition of fields is
+ similar to `awk''s default: non-whitespace characters separated by
+ runs of spaces and/or tabs.
+
+`+N'
+ Skip N characters before comparing lines. Any fields specified
+ with `-N' are skipped first.
+
+`INPUT FILE'
+ Data is read from the input file named on the command line,
+ instead of from the standard input.
+
+`OUTPUT FILE'
+ The generated output is sent to the named output file, instead of
+ to the standard output.
+
+ Normally `uniq' behaves as if both the `-d' and `-u' options had
+been provided.
+
+ Here is an `awk' implementation of `uniq'. It uses the `getopt'
+library function (*note Processing Command Line Options: Getopt
+Function.), and the `join' library function (*note Merging an Array
+Into a String: Join Function.).
+
+ The program begins with a `usage' function and then a brief outline
+of the options and their meanings in a comment.
+
+ The `BEGIN' rule deals with the command line arguments and options.
+It uses a trick to get `getopt' to handle options of the form `-25',
+treating such an option as the option letter `2' with an argument of
+`5'. If indeed two or more digits were supplied (`Optarg' looks like a
+number), `Optarg' is concatenated with the option digit, and then
+result is added to zero to make it into a number. If there is only one
+digit in the option, then `Optarg' is not needed, and `Optind' must be
+decremented so that `getopt' will process it next time. This code is
+admittedly a bit tricky.
+
+ If no options were supplied, then the default is taken, to print both
+repeated and non-repeated lines. The output file, if provided, is
+assigned to `outputfile'. Earlier, `outputfile' was initialized to the
+standard output, `/dev/stdout'.
+
+ # uniq.awk --- do uniq in awk
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ function usage( e)
+ {
+ e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
+ print e > "/dev/stderr"
+ exit 1
+ }
+
+ # -c count lines. overrides -d and -u
+ # -d only repeated lines
+ # -u only non-repeated lines
+ # -n skip n fields
+ # +n skip n characters, skip fields first
+
+ BEGIN \
+ {
+ count = 1
+ outputfile = "/dev/stdout"
+ opts = "udc0:1:2:3:4:5:6:7:8:9:"
+ while ((c = getopt(ARGC, ARGV, opts)) != -1) {
+ if (c == "u")
+ non_repeated_only++
+ else if (c == "d")
+ repeated_only++
+ else if (c == "c")
+ do_count++
+ else if (index("0123456789", c) != 0) {
+ # getopt requires args to options
+ # this messes us up for things like -5
+ if (Optarg ~ /^[0-9]+$/)
+ fcount = (c Optarg) + 0
+ else {
+ fcount = c + 0
+ Optind--
+ }
+ } else
+ usage()
+ }
+
+ if (ARGV[Optind] ~ /^\+[0-9]+$/) {
+ charcount = substr(ARGV[Optind], 2) + 0
+ Optind++
+ }
+
+ for (i = 1; i < Optind; i++)
+ ARGV[i] = ""
+
+ if (repeated_only == 0 && non_repeated_only == 0)
+ repeated_only = non_repeated_only = 1
+
+ if (ARGC - Optind == 2) {
+ outputfile = ARGV[ARGC - 1]
+ ARGV[ARGC - 1] = ""
+ }
+ }
+
+ The following function, `are_equal', compares the current line,
+`$0', to the previous line, `last'. It handles skipping fields and
+characters.
+
+ If no field count and no character count were specified, `are_equal'
+simply returns one or zero depending upon the result of a simple string
+comparison of `last' and `$0'. Otherwise, things get more complicated.
+
+ If fields have to be skipped, each line is broken into an array using
+`split' (*note Built-in Functions for String Manipulation: String
+Functions.), and then the desired fields are joined back into a line
+using `join'. The joined lines are stored in `clast' and `cline'. If
+no fields are skipped, `clast' and `cline' are set to `last' and `$0'
+respectively.
+
+ Finally, if characters are skipped, `substr' is used to strip off the
+leading `charcount' characters in `clast' and `cline'. The two strings
+are then compared, and `are_equal' returns the result.
+
+ function are_equal( n, m, clast, cline, alast, aline)
+ {
+ if (fcount == 0 && charcount == 0)
+ return (last == $0)
+
+ if (fcount > 0) {
+ n = split(last, alast)
+ m = split($0, aline)
+ clast = join(alast, fcount+1, n)
+ cline = join(aline, fcount+1, m)
+ } else {
+ clast = last
+ cline = $0
+ }
+ if (charcount) {
+ clast = substr(clast, charcount + 1)
+ cline = substr(cline, charcount + 1)
+ }
+
+ return (clast == cline)
+ }
+
+ The following two rules are the body of the program. The first one
+is executed only for the very first line of data. It sets `last' equal
+to `$0', so that subsequent lines of text have something to be compared
+to.
+
+ The second rule does the work. The variable `equal' will be one or
+zero depending upon the results of `are_equal''s comparison. If `uniq'
+is counting repeated lines, then the `count' variable is incremented if
+the lines are equal. Otherwise the line is printed and `count' is
+reset, since the two lines are not equal.
+
+ If `uniq' is not counting, `count' is incremented if the lines are
+equal. Otherwise, if `uniq' is counting repeated lines, and more than
+one line has been seen, or if `uniq' is counting non-repeated lines,
+and only one line has been seen, then the line is printed, and `count'
+is reset.
+
+ Finally, similar logic is used in the `END' rule to print the final
+line of input data.
+
+ NR == 1 {
+ last = $0
+ next
+ }
+
+ {
+ equal = are_equal()
+
+ if (do_count) { # overrides -d and -u
+ if (equal)
+ count++
+ else {
+ printf("%4d %s\n", count, last) > outputfile
+ last = $0
+ count = 1 # reset
+ }
+ next
+ }
+
+ if (equal)
+ count++
+ else {
+ if ((repeated_only && count > 1) ||
+ (non_repeated_only && count == 1))
+ print last > outputfile
+ last = $0
+ count = 1
+ }
+ }
+
+ END {
+ if (do_count)
+ printf("%4d %s\n", count, last) > outputfile
+ else if ((repeated_only && count > 1) ||
+ (non_repeated_only && count == 1))
+ print last > outputfile
+ }
+
+
+File: gawk.info, Node: Wc Program, Prev: Uniq Program, Up: Clones
+
+Counting Things
+---------------
+
+ The `wc' (word count) utility counts lines, words, and characters in
+one or more input files. Its usage is:
+
+ wc [-lwc] [ FILES ... ]
+
+ If no files are specified on the command line, `wc' reads its
+standard input. If there are multiple files, it will also print total
+counts for all the files. The options and their meanings are:
+
+`-l'
+ Only count lines.
+
+`-w'
+ Only count words. A "word" is a contiguous sequence of
+ non-whitespace characters, separated by spaces and/or tabs.
+ Happily, this is the normal way `awk' separates fields in its
+ input data.
+
+`-c'
+ Only count characters.
+
+ Implementing `wc' in `awk' is particularly elegant, since `awk' does
+a lot of the work for us; it splits lines into words (i.e. fields) and
+counts them, it counts lines (i.e. records) for us, and it can easily
+tell us how long a line is.
+
+ This version uses the `getopt' library function (*note Processing
+Command Line Options: Getopt Function.), and the file transition
+functions (*note Noting Data File Boundaries: Filetrans Function.).
+
+ This version has one major difference from traditional versions of
+`wc'. Our version always prints the counts in the order lines, words,
+and characters. Traditional versions note the order of the `-l', `-w',
+and `-c' options on the command line, and print the counts in that
+order.
+
+ The `BEGIN' rule does the argument processing. The variable
+`print_total' will be true if more than one file was named on the
+command line.
+
+ # wc.awk --- count lines, words, characters
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ # Options:
+ # -l only count lines
+ # -w only count words
+ # -c only count characters
+ #
+ # Default is to count lines, words, characters
+
+ BEGIN {
+ # let getopt print a message about
+ # invalid options. we ignore them
+ while ((c = getopt(ARGC, ARGV, "lwc")) != -1) {
+ if (c == "l")
+ do_lines = 1
+ else if (c == "w")
+ do_words = 1
+ else if (c == "c")
+ do_chars = 1
+ }
+ for (i = 1; i < Optind; i++)
+ ARGV[i] = ""
+
+ # if no options, do all
+ if (! do_lines && ! do_words && ! do_chars)
+ do_lines = do_words = do_chars = 1
+
+ print_total = (ARGC - i > 2)
+ }
+
+ The `beginfile' function is simple; it just resets the counts of
+lines, words, and characters to zero, and saves the current file name in
+`fname'.
+
+ The `endfile' function adds the current file's numbers to the running
+totals of lines, words, and characters. It then prints out those
+numbers for the file that was just read. It relies on `beginfile' to
+reset the numbers for the following data file.
+
+ function beginfile(file)
+ {
+ chars = lines = words = 0
+ fname = FILENAME
+ }
+
+ function endfile(file)
+ {
+ tchars += chars
+ tlines += lines
+ twords += words
+ if (do_lines)
+ printf "\t%d", lines
+ if (do_words)
+ printf "\t%d", words
+ if (do_chars)
+ printf "\t%d", chars
+ printf "\t%s\n", fname
+ }
+
+ There is one rule that is executed for each line. It adds the length
+of the record to `chars'. It has to add one, since the newline
+character separating records (the value of `RS') is not part of the
+record itself. `lines' is incremented for each line read, and `words'
+is incremented by the value of `NF', the number of "words" on this
+line.(1)
+
+ Finally, the `END' rule simply prints the totals for all the files.
+
+ # do per line
+ {
+ chars += length($0) + 1 # get newline
+ lines++
+ words += NF
+ }
+
+ END {
+ if (print_total) {
+ if (do_lines)
+ printf "\t%d", tlines
+ if (do_words)
+ printf "\t%d", twords
+ if (do_chars)
+ printf "\t%d", tchars
+ print "\ttotal"
+ }
+ }
+
+ ---------- Footnotes ----------
+
+ (1) Examine the code in *Note Noting Data File Boundaries: Filetrans
+Function. Why must `wc' use a separate `lines' variable, instead of
+using the value of `FNR' in `endfile'?
+
+
+File: gawk.info, Node: Miscellaneous Programs, Prev: Clones, Up: Sample Programs
+
+A Grab Bag of `awk' Programs
+============================
+
+ This section is a large "grab bag" of miscellaneous programs. We
+hope you find them both interesting and enjoyable.
+
+* Menu:
+
+* Dupword Program:: Finding duplicated words in a document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the `tr' utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage count.
+* History Sorting:: Eliminating duplicate entries from a history
+ file.
+* Extract Program:: Pulling out programs from Texinfo source
+ files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for `awk' that includes files.
+
+
+File: gawk.info, Node: Dupword Program, Next: Alarm Program, Prev: Miscellaneous Programs, Up: Miscellaneous Programs
+
+Finding Duplicated Words in a Document
+--------------------------------------
+
+ A common error when writing large amounts of prose is to accidentally
+duplicate words. Often you will see this in text as something like "the
+the program does the following ...." When the text is on-line, often
+the duplicated words occur at the end of one line and the beginning of
+another, making them very difficult to spot.
+
+ This program, `dupword.awk', scans through a file one line at a time,
+and looks for adjacent occurrences of the same word. It also saves the
+last word on a line (in the variable `prev') for comparison with the
+first word on the next line.
+
+ The first two statements make sure that the line is all lower-case,
+so that, for example, "The" and "the" compare equal to each other. The
+second statement removes all non-alphanumeric and non-whitespace
+characters from the line, so that punctuation does not affect the
+comparison either. This sometimes leads to reports of duplicated words
+that really are different, but this is unusual.
+
+ # dupword --- find duplicate words in text
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # December 1991
+
+ {
+ $0 = tolower($0)
+ gsub(/[^A-Za-z0-9 \t]/, "");
+ if ($1 == prev)
+ printf("%s:%d: duplicate %s\n",
+ FILENAME, FNR, $1)
+ for (i = 2; i <= NF; i++)
+ if ($i == $(i-1))
+ printf("%s:%d: duplicate %s\n",
+ FILENAME, FNR, $i)
+ prev = $NF
+ }
+
+
+File: gawk.info, Node: Alarm Program, Next: Translate Program, Prev: Dupword Program, Up: Miscellaneous Programs
+
+An Alarm Clock Program
+----------------------
+
+ The following program is a simple "alarm clock" program. You give
+it a time of day, and an optional message. At the given time, it
+prints the message on the standard output. In addition, you can give it
+the number of times to repeat the message, and also a delay between
+repetitions.
+
+ This program uses the `gettimeofday' function from *Note Managing
+the Time of Day: Gettimeofday Function.
+
+ All the work is done in the `BEGIN' rule. The first part is argument
+checking and setting of defaults; the delay, the count, and the message
+to print. If the user supplied a message, but it does not contain the
+ASCII BEL character (known as the "alert" character, `\a'), then it is
+added to the message. (On many systems, printing the ASCII BEL
+generates some sort of audible alert. Thus, when the alarm goes off,
+the system calls attention to itself, in case the user is not looking
+at their computer or terminal.)
+
+ # alarm --- set an alarm
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ # usage: alarm time [ "message" [ count [ delay ] ] ]
+
+ BEGIN \
+ {
+ # Initial argument sanity checking
+ usage1 = "usage: alarm time ['message' [count [delay]]]"
+ usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
+
+ if (ARGC < 2) {
+ print usage > "/dev/stderr"
+ exit 1
+ } else if (ARGC == 5) {
+ delay = ARGV[4] + 0
+ count = ARGV[3] + 0
+ message = ARGV[2]
+ } else if (ARGC == 4) {
+ count = ARGV[3] + 0
+ message = ARGV[2]
+ } else if (ARGC == 3) {
+ message = ARGV[2]
+ } else if (ARGV[1] !~ /[0-9]?[0-9]:[0-9][0-9]/) {
+ print usage1 > "/dev/stderr"
+ print usage2 > "/dev/stderr"
+ exit 1
+ }
+
+ # set defaults for once we reach the desired time
+ if (delay == 0)
+ delay = 180 # 3 minutes
+ if (count == 0)
+ count = 5
+ if (message == "")
+ message = sprintf("\aIt is now %s!\a", ARGV[1])
+ else if (index(message, "\a") == 0)
+ message = "\a" message "\a"
+
+ The next section of code turns the alarm time into hours and minutes,
+and converts it if necessary to a 24-hour clock. Then it turns that
+time into a count of the seconds since midnight. Next it turns the
+current time into a count of seconds since midnight. The difference
+between the two is how long to wait before setting off the alarm.
+
+ # split up dest time
+ split(ARGV[1], atime, ":")
+ hour = atime[1] + 0 # force numeric
+ minute = atime[2] + 0 # force numeric
+
+ # get current broken down time
+ gettimeofday(now)
+
+ # if time given is 12-hour hours and it's after that
+ # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
+ # then add 12 to real hour
+ if (hour < 12 && now["hour"] > hour)
+ hour += 12
+
+ # set target time in seconds since midnight
+ target = (hour * 60 * 60) + (minute * 60)
+
+ # get current time in seconds since midnight
+ current = (now["hour"] * 60 * 60) + \
+ (now["minute"] * 60) + now["second"]
+
+ # how long to sleep for
+ naptime = target - current
+ if (naptime <= 0) {
+ print "time is in the past!" > "/dev/stderr"
+ exit 1
+ }
+
+ Finally, the program uses the `system' function (*note Built-in
+Functions for Input/Output: I/O Functions.) to call the `sleep'
+utility. The `sleep' utility simply pauses for the given number of
+seconds. If the exit status is not zero, the program assumes that
+`sleep' was interrupted, and exits. If `sleep' exited with an OK status
+(zero), then the program prints the message in a loop, again using
+`sleep' to delay for however many seconds are necessary.
+
+ # zzzzzz..... go away if interrupted
+ if (system(sprintf("sleep %d", naptime)) != 0)
+ exit 1
+
+ # time to notify!
+ command = sprintf("sleep %d", delay)
+ for (i = 1; i <= count; i++) {
+ print message
+ # if sleep command interrupted, go away
+ if (system(command) != 0)
+ break
+ }
+
+ exit 0
+ }
+
+
+File: gawk.info, Node: Translate Program, Next: Labels Program, Prev: Alarm Program, Up: Miscellaneous Programs
+
+Transliterating Characters
+--------------------------
+
+ The system `tr' utility transliterates characters. For example, it
+is often used to map upper-case letters into lower-case, for further
+processing.
+
+ GENERATE DATA | tr '[A-Z]' '[a-z]' | PROCESS DATA ...
+
+ You give `tr' two lists of characters enclosed in square brackets.
+Usually, the lists are quoted to keep the shell from attempting to do a
+filename expansion.(1) When processing the input, the first character
+in the first list is replaced with the first character in the second
+list, the second character in the first list is replaced with the
+second character in the second list, and so on. If there are more
+characters in the "from" list than in the "to" list, the last character
+of the "to" list is used for the remaining characters in the "from"
+list.
+
+ Some time ago, a user proposed to us that we add a transliteration
+function to `gawk'. Being opposed to "creeping featurism," I wrote the
+following program to prove that character transliteration could be done
+with a user-level function. This program is not as complete as the
+system `tr' utility, but it will do most of the job.
+
+ The `translate' program demonstrates one of the few weaknesses of
+standard `awk': dealing with individual characters is very painful,
+requiring repeated use of the `substr', `index', and `gsub' built-in
+functions (*note Built-in Functions for String Manipulation: String
+Functions.).(2)
+
+ There are two functions. The first, `stranslate', takes three
+arguments.
+
+`from'
+ A list of characters to translate from.
+
+`to'
+ A list of characters to translate to.
+
+`target'
+ The string to do the translation on.
+
+ Associative arrays make the translation part fairly easy. `t_ar'
+holds the "to" characters, indexed by the "from" characters. Then a
+simple loop goes through `from', one character at a time. For each
+character in `from', if the character appears in `target', `gsub' is
+used to change it to the corresponding `to' character.
+
+ The `translate' function simply calls `stranslate' using `$0' as the
+target. The main program sets two global variables, `FROM' and `TO',
+from the command line, and then changes `ARGV' so that `awk' will read
+from the standard input.
+
+ Finally, the processing rule simply calls `translate' for each
+record.
+
+ # translate --- do tr like stuff
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # August 1989
+
+ # bugs: does not handle things like: tr A-Z a-z, it has
+ # to be spelled out. However, if `to' is shorter than `from',
+ # the last character in `to' is used for the rest of `from'.
+
+ function stranslate(from, to, target, lf, lt, t_ar, i, c)
+ {
+ lf = length(from)
+ lt = length(to)
+ for (i = 1; i <= lt; i++)
+ t_ar[substr(from, i, 1)] = substr(to, i, 1)
+ if (lt < lf)
+ for (; i <= lf; i++)
+ t_ar[substr(from, i, 1)] = substr(to, lt, 1)
+ for (i = 1; i <= lf; i++) {
+ c = substr(from, i, 1)
+ if (index(target, c) > 0)
+ gsub(c, t_ar[c], target)
+ }
+ return target
+ }
+
+ function translate(from, to)
+ {
+ return $0 = stranslate(from, to, $0)
+ }
+
+ # main program
+ BEGIN {
+ if (ARGC < 3) {
+ print "usage: translate from to" > "/dev/stderr"
+ exit
+ }
+ FROM = ARGV[1]
+ TO = ARGV[2]
+ ARGC = 2
+ ARGV[1] = "-"
+ }
+
+ {
+ translate(FROM, TO)
+ print
+ }
+
+ While it is possible to do character transliteration in a user-level
+function, it is not necessarily efficient, and we started to consider
+adding a built-in function. However, shortly after writing this
+program, we learned that the System V Release 4 `awk' had added the
+`toupper' and `tolower' functions. These functions handle the vast
+majority of the cases where character transliteration is necessary, and
+so we chose to simply add those functions to `gawk' as well, and then
+leave well enough alone.
+
+ An obvious improvement to this program would be to set up the `t_ar'
+array only once, in a `BEGIN' rule. However, this assumes that the
+"from" and "to" lists will never change throughout the lifetime of the
+program.
+
+ ---------- Footnotes ----------
+
+ (1) On older, non-POSIX systems, `tr' often does not require that
+the lists be enclosed in square brackets and quoted. This is a feature.
+
+ (2) This program was written before `gawk' acquired the ability to
+split each character in a string into separate array elements. How
+might this ability simplify the program?
+
+
+File: gawk.info, Node: Labels Program, Next: Word Sorting, Prev: Translate Program, Up: Miscellaneous Programs
+
+Printing Mailing Labels
+-----------------------
+
+ Here is a "real world"(1) program. This script reads lists of names
+and addresses, and generates mailing labels. Each page of labels has
+20 labels on it, two across and ten down. The addresses are guaranteed
+to be no more than five lines of data. Each address is separated from
+the next by a blank line.
+
+ The basic idea is to read 20 labels worth of data. Each line of
+each label is stored in the `line' array. The single rule takes care
+of filling the `line' array and printing the page when 20 labels have
+been read.
+
+ The `BEGIN' rule simply sets `RS' to the empty string, so that `awk'
+will split records at blank lines (*note How Input is Split into
+Records: Records.). It sets `MAXLINES' to 100, since `MAXLINE' is the
+maximum number of lines on the page (20 * 5 = 100).
+
+ Most of the work is done in the `printpage' function. The label
+lines are stored sequentially in the `line' array. But they have to be
+printed horizontally; `line[1]' next to `line[6]', `line[2]' next to
+`line[7]', and so on. Two loops are used to accomplish this. The
+outer loop, controlled by `i', steps through every 10 lines of data;
+this is each row of labels. The inner loop, controlled by `j', goes
+through the lines within the row. As `j' goes from zero to four, `i+j'
+is the `j''th line in the row, and `i+j+5' is the entry next to it.
+The output ends up looking something like this:
+
+ line 1 line 6
+ line 2 line 7
+ line 3 line 8
+ line 4 line 9
+ line 5 line 10
+
+ As a final note, at lines 21 and 61, an extra blank line is printed,
+to keep the output lined up on the labels. This is dependent on the
+particular brand of labels in use when the program was written. You
+will also note that there are two blank lines at the top and two blank
+lines at the bottom.
+
+ The `END' rule arranges to flush the final page of labels; there may
+not have been an even multiple of 20 labels in the data.
+
+ # labels.awk
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # June 1992
+
+ # Program to print labels. Each label is 5 lines of data
+ # that may have blank lines. The label sheets have 2
+ # blank lines at the top and 2 at the bottom.
+
+ BEGIN { RS = "" ; MAXLINES = 100 }
+
+ function printpage( i, j)
+ {
+ if (Nlines <= 0)
+ return
+
+ printf "\n\n" # header
+
+ for (i = 1; i <= Nlines; i += 10) {
+ if (i == 21 || i == 61)
+ print ""
+ for (j = 0; j < 5; j++) {
+ if (i + j > MAXLINES)
+ break
+ printf " %-41s %s\n", line[i+j], line[i+j+5]
+ }
+ print ""
+ }
+
+ printf "\n\n" # footer
+
+ for (i in line)
+ line[i] = ""
+ }
+
+ # main rule
+ {
+ if (Count >= 20) {
+ printpage()
+ Count = 0
+ Nlines = 0
+ }
+ n = split($0, a, "\n")
+ for (i = 1; i <= n; i++)
+ line[++Nlines] = a[i]
+ for (; i <= 5; i++)
+ line[++Nlines] = ""
+ Count++
+ }
+
+ END \
+ {
+ printpage()
+ }
+
+ ---------- Footnotes ----------
+
+ (1) "Real world" is defined as "a program actually used to get
+something done."
+
+
+File: gawk.info, Node: Word Sorting, Next: History Sorting, Prev: Labels Program, Up: Miscellaneous Programs
+
+Generating Word Usage Counts
+----------------------------
+
+ The following `awk' program prints the number of occurrences of each
+word in its input. It illustrates the associative nature of `awk'
+arrays by using strings as subscripts. It also demonstrates the `for X
+in ARRAY' construction. Finally, it shows how `awk' can be used in
+conjunction with other utility programs to do a useful task of some
+complexity with a minimum of effort. Some explanations follow the
+program listing.
+
+ awk '
+ # Print list of word frequencies
+ {
+ for (i = 1; i <= NF; i++)
+ freq[$i]++
+ }
+
+ END {
+ for (word in freq)
+ printf "%s\t%d\n", word, freq[word]
+ }'
+
+ The first thing to notice about this program is that it has two
+rules. The first rule, because it has an empty pattern, is executed on
+every line of the input. It uses `awk''s field-accessing mechanism
+(*note Examining Fields: Fields.) to pick out the individual words from
+the line, and the built-in variable `NF' (*note Built-in Variables::)
+to know how many fields are available.
+
+ For each input word, an element of the array `freq' is incremented to
+reflect that the word has been seen an additional time.
+
+ The second rule, because it has the pattern `END', is not executed
+until the input has been exhausted. It prints out the contents of the
+`freq' table that has been built up inside the first action.
+
+ This program has several problems that would prevent it from being
+useful by itself on real text files:
+
+ * Words are detected using the `awk' convention that fields are
+ separated by whitespace and that other characters in the input
+ (except newlines) don't have any special meaning to `awk'. This
+ means that punctuation characters count as part of words.
+
+ * The `awk' language considers upper- and lower-case characters to be
+ distinct. Therefore, `bartender' and `Bartender' are not treated
+ as the same word. This is undesirable since, in normal text, words
+ are capitalized if they begin sentences, and a frequency analyzer
+ should not be sensitive to capitalization.
+
+ * The output does not come out in any useful order. You're more
+ likely to be interested in which words occur most frequently, or
+ having an alphabetized table of how frequently each word occurs.
+
+ The way to solve these problems is to use some of the more advanced
+features of the `awk' language. First, we use `tolower' to remove case
+distinctions. Next, we use `gsub' to remove punctuation characters.
+Finally, we use the system `sort' utility to process the output of the
+`awk' script. Here is the new version of the program:
+
+ # Print list of word frequencies
+ {
+ $0 = tolower($0) # remove case distinctions
+ gsub(/[^a-z0-9_ \t]/, "", $0) # remove punctuation
+ for (i = 1; i <= NF; i++)
+ freq[$i]++
+ }
+
+ END {
+ for (word in freq)
+ printf "%s\t%d\n", word, freq[word]
+ }
+
+ Assuming we have saved this program in a file named `wordfreq.awk',
+and that the data is in `file1', the following pipeline
+
+ awk -f wordfreq.awk file1 | sort +1 -nr
+
+produces a table of the words appearing in `file1' in order of
+decreasing frequency.
+
+ The `awk' program suitably massages the data and produces a word
+frequency table, which is not ordered.
+
+ The `awk' script's output is then sorted by the `sort' utility and
+printed on the terminal. The options given to `sort' in this example
+specify to sort using the second field of each input line (skipping one
+field), that the sort keys should be treated as numeric quantities
+(otherwise `15' would come before `5'), and that the sorting should be
+done in descending (reverse) order.
+
+ We could have even done the `sort' from within the program, by
+changing the `END' action to:
+
+ END {
+ sort = "sort +1 -nr"
+ for (word in freq)
+ printf "%s\t%d\n", word, freq[word] | sort
+ close(sort)
+ }
+
+ You would have to use this way of sorting on systems that do not
+have true pipes.
+
+ See the general operating system documentation for more information
+on how to use the `sort' program.
+
+
+File: gawk.info, Node: History Sorting, Next: Extract Program, Prev: Word Sorting, Up: Miscellaneous Programs
+
+Removing Duplicates from Unsorted Text
+--------------------------------------
+
+ The `uniq' program (*note Printing Non-duplicated Lines of Text:
+Uniq Program.), removes duplicate lines from _sorted_ data.
+
+ Suppose, however, you need to remove duplicate lines from a data
+file, but that you wish to preserve the order the lines are in? A good
+example of this might be a shell history file. The history file keeps
+a copy of all the commands you have entered, and it is not unusual to
+repeat a command several times in a row. Occasionally you might wish
+to compact the history by removing duplicate entries. Yet it is
+desirable to maintain the order of the original commands.
+
+ This simple program does the job. It uses two arrays. The `data'
+array is indexed by the text of each line. For each line, `data[$0]'
+is incremented.
+
+ If a particular line has not been seen before, then `data[$0]' will
+be zero. In that case, the text of the line is stored in
+`lines[count]'. Each element of `lines' is a unique command, and the
+indices of `lines' indicate the order in which those lines were
+encountered. The `END' rule simply prints out the lines, in order.
+
+ # histsort.awk --- compact a shell history file
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ # Thanks to Byron Rakitzis for the general idea
+ {
+ if (data[$0]++ == 0)
+ lines[++count] = $0
+ }
+
+ END {
+ for (i = 1; i <= count; i++)
+ print lines[i]
+ }
+
+ This program also provides a foundation for generating other useful
+information. For example, using the following `print' satement in the
+`END' rule would indicate how often a particular command was used.
+
+ print data[lines[i]], lines[i]
+
+ This works because `data[$0]' was incremented each time a line was
+seen.
+
+
+File: gawk.info, Node: Extract Program, Next: Simple Sed, Prev: History Sorting, Up: Miscellaneous Programs
+
+Extracting Programs from Texinfo Source Files
+---------------------------------------------
+
+ The nodes *Note A Library of `awk' Functions: Library Functions, and
+*Note Practical `awk' Programs: Sample Programs, are the top level
+nodes for a large number of `awk' programs. If you wish to experiment
+with these programs, it is tedious to have to type them in by hand.
+Here we present a program that can extract parts of a Texinfo input
+file into separate files.
+
+ This Info file is written in Texinfo, the GNU project's document
+formatting language. A single Texinfo source file can be used to
+produce both printed and on-line documentation. The Texinfo language
+is described fully, starting with *Note Introduction: (texi)Top.
+
+ For our purposes, it is enough to know three things about Texinfo
+input files.
+
+ * The "at" symbol, `@', is special in Texinfo, much like `\' in C or
+ `awk'. Literal `@' symbols are represented in Texinfo source
+ files as `@@'.
+
+ * Comments start with either `@c' or `@comment'. The file
+ extraction program will work by using special comments that start
+ at the beginning of a line.
+
+ * Example text that should not be split across a page boundary is
+ bracketed between lines containing `@group' and `@end group'
+ commands.
+
+ The following program, `extract.awk', reads through a Texinfo source
+file, and does two things, based on the special comments. Upon seeing
+`@c system ...', it runs a command, by extracting the command text from
+the control line and passing it on to the `system' function (*note
+Built-in Functions for Input/Output: I/O Functions.). Upon seeing `@c
+file FILENAME', each subsequent line is sent to the file FILENAME,
+until `@c endfile' is encountered. The rules in `extract.awk' will
+match either `@c' or `@comment' by letting the `omment' part be
+optional. Lines containing `@group' and `@end group' are simply
+removed. `extract.awk' uses the `join' library function (*note Merging
+an Array Into a String: Join Function.).
+
+ The example programs in the on-line Texinfo source for `The GNU Awk
+User's Guide' (`gawk.texi') have all been bracketed inside `file', and
+`endfile' lines. The `gawk' distribution uses a copy of `extract.awk'
+to extract the sample programs and install many of them in a standard
+directory, where `gawk' can find them.
+
+ `extract.awk' begins by setting `IGNORECASE' to one, so that mixed
+upper-case and lower-case letters in the directives won't matter.
+
+ The first rule handles calling `system', checking that a command was
+given (`NF' is at least three), and also checking that the command
+exited with a zero exit status, signifying OK.
+
+ # extract.awk --- extract files and run programs
+ # from texinfo files
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # May 1993
+
+ BEGIN { IGNORECASE = 1 }
+
+ /^@c(omment)?[ \t]+system/ \
+ {
+ if (NF < 3) {
+ e = (FILENAME ":" FNR)
+ e = (e ": badly formed `system' line")
+ print e > "/dev/stderr"
+ next
+ }
+ $1 = ""
+ $2 = ""
+ stat = system($0)
+ if (stat != 0) {
+ e = (FILENAME ":" FNR)
+ e = (e ": warning: system returned " stat)
+ print e > "/dev/stderr"
+ }
+ }
+
+The variable `e' is used so that the function fits nicely on the screen.
+
+ The second rule handles moving data into files. It verifies that a
+file name was given in the directive. If the file named is not the
+current file, then the current file is closed. This means that an `@c
+endfile' was not given for that file. (We should probably print a
+diagnostic in this case, although at the moment we do not.)
+
+ The `for' loop does the work. It reads lines using `getline' (*note
+Explicit Input with `getline': Getline.). For an unexpected end of
+file, it calls the `unexpected_eof' function. If the line is an
+"endfile" line, then it breaks out of the loop. If the line is an
+`@group' or `@end group' line, then it ignores it, and goes on to the
+next line.
+
+ Most of the work is in the following few lines. If the line has no
+`@' symbols, it can be printed directly. Otherwise, each leading `@'
+must be stripped off.
+
+ To remove the `@' symbols, the line is split into separate elements
+of the array `a', using the `split' function (*note Built-in Functions
+for String Manipulation: String Functions.). Each element of `a' that
+is empty indicates two successive `@' symbols in the original line.
+For each two empty elements (`@@' in the original file), we have to add
+back in a single `@' symbol.
+
+ When the processing of the array is finished, `join' is called with
+the value of `SUBSEP', to rejoin the pieces back into a single line.
+That line is then printed to the output file.
+
+ /^@c(omment)?[ \t]+file/ \
+ {
+ if (NF != 3) {
+ e = (FILENAME ":" FNR ": badly formed `file' line")
+ print e > "/dev/stderr"
+ next
+ }
+ if ($3 != curfile) {
+ if (curfile != "")
+ close(curfile)
+ curfile = $3
+ }
+
+ for (;;) {
+ if ((getline line) <= 0)
+ unexpected_eof()
+ if (line ~ /^@c(omment)?[ \t]+endfile/)
+ break
+ else if (line ~ /^@(end[ \t]+)?group/)
+ continue
+ if (index(line, "@") == 0) {
+ print line > curfile
+ continue
+ }
+ n = split(line, a, "@")
+ # if a[1] == "", means leading @,
+ # don't add one back in.
+ for (i = 2; i <= n; i++) {
+ if (a[i] == "") { # was an @@
+ a[i] = "@"
+ if (a[i+1] == "")
+ i++
+ }
+ }
+ print join(a, 1, n, SUBSEP) > curfile
+ }
+ }
+
+ An important thing to note is the use of the `>' redirection.
+Output done with `>' only opens the file once; it stays open and
+subsequent output is appended to the file (*note Redirecting Output of
+`print' and `printf': Redirection.). This allows us to easily mix
+program text and explanatory prose for the same sample source file (as
+has been done here!) without any hassle. The file is only closed when
+a new data file name is encountered, or at the end of the input file.
+
+ Finally, the function `unexpected_eof' prints an appropriate error
+message and then exits.
+
+ The `END' rule handles the final cleanup, closing the open file.
+
+ function unexpected_eof()
+ {
+ printf("%s:%d: unexpected EOF or error\n", \
+ FILENAME, FNR) > "/dev/stderr"
+ exit 1
+ }
+
+ END {
+ if (curfile)
+ close(curfile)
+ }
+
+
+File: gawk.info, Node: Simple Sed, Next: Igawk Program, Prev: Extract Program, Up: Miscellaneous Programs
+
+A Simple Stream Editor
+----------------------
+
+ The `sed' utility is a "stream editor," a program that reads a
+stream of data, makes changes to it, and passes the modified data on.
+It is often used to make global changes to a large file, or to a stream
+of data generated by a pipeline of commands.
+
+ While `sed' is a complicated program in its own right, its most
+common use is to perform global substitutions in the middle of a
+pipeline:
+
+ command1 < orig.data | sed 's/old/new/g' | command2 > result
+
+ Here, the `s/old/new/g' tells `sed' to look for the regexp `old' on
+each input line, and replace it with the text `new', globally (i.e. all
+the occurrences on a line). This is similar to `awk''s `gsub' function
+(*note Built-in Functions for String Manipulation: String Functions.).
+
+ The following program, `awksed.awk', accepts at least two command
+line arguments; the pattern to look for and the text to replace it
+with. Any additional arguments are treated as data file names to
+process. If none are provided, the standard input is used.
+
+ # awksed.awk --- do s/foo/bar/g using just print
+ # Thanks to Michael Brennan for the idea
+
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # August 1995
+
+ function usage()
+ {
+ print "usage: awksed pat repl [files...]" > "/dev/stderr"
+ exit 1
+ }
+
+ BEGIN {
+ # validate arguments
+ if (ARGC < 3)
+ usage()
+
+ RS = ARGV[1]
+ ORS = ARGV[2]
+
+ # don't use arguments as files
+ ARGV[1] = ARGV[2] = ""
+ }
+
+ # look ma, no hands!
+ {
+ if (RT == "")
+ printf "%s", $0
+ else
+ print
+ }
+
+ The program relies on `gawk''s ability to have `RS' be a regexp and
+on the setting of `RT' to the actual text that terminated the record
+(*note How Input is Split into Records: Records.).
+
+ The idea is to have `RS' be the pattern to look for. `gawk' will
+automatically set `$0' to the text between matches of the pattern.
+This is text that we wish to keep, unmodified. Then, by setting `ORS'
+to the replacement text, a simple `print' statement will output the
+text we wish to keep, followed by the replacement text.
+
+ There is one wrinkle to this scheme, which is what to do if the last
+record doesn't end with text that matches `RS'? Using a `print'
+statement unconditionally prints the replacement text, which is not
+correct.
+
+ However, if the file did not end in text that matches `RS', `RT'
+will be set to the null string. In this case, we can print `$0' using
+`printf' (*note Using `printf' Statements for Fancier Printing:
+Printf.).
+
+ The `BEGIN' rule handles the setup, checking for the right number of
+arguments, and calling `usage' if there is a problem. Then it sets `RS'
+and `ORS' from the command line arguments, and sets `ARGV[1]' and
+`ARGV[2]' to the null string, so that they will not be treated as file
+names (*note Using `ARGC' and `ARGV': ARGC and ARGV.).
+
+ The `usage' function prints an error message and exits.
+
+ Finally, the single rule handles the printing scheme outlined above,
+using `print' or `printf' as appropriate, depending upon the value of
+`RT'.
+
+
+File: gawk.info, Node: Igawk Program, Prev: Simple Sed, Up: Miscellaneous Programs
+
+An Easy Way to Use Library Functions
+------------------------------------
+
+ Using library functions in `awk' can be very beneficial. It
+encourages code re-use and the writing of general functions. Programs
+are smaller, and therefore clearer. However, using library functions
+is only easy when writing `awk' programs; it is painful when running
+them, requiring multiple `-f' options. If `gawk' is unavailable, then
+so too is the `AWKPATH' environment variable and the ability to put
+`awk' functions into a library directory (*note Command Line Options:
+Options.).
+
+ It would be nice to be able to write programs like so:
+
+ # library functions
+ @include getopt.awk
+ @include join.awk
+ ...
+
+ # main program
+ BEGIN {
+ while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
+ ...
+ ...
+ }
+
+ The following program, `igawk.sh', provides this service. It
+simulates `gawk''s searching of the `AWKPATH' variable, and also allows
+"nested" includes; i.e. a file that has been included with `@include'
+can contain further `@include' statements. `igawk' will make an effort
+to only include files once, so that nested includes don't accidentally
+include a library function twice.
+
+ `igawk' should behave externally just like `gawk'. This means it
+should accept all of `gawk''s command line arguments, including the
+ability to have multiple source files specified via `-f', and the
+ability to mix command line and library source files.
+
+ The program is written using the POSIX Shell (`sh') command language.
+The way the program works is as follows:
+
+ 1. Loop through the arguments, saving anything that doesn't represent
+ `awk' source code for later, when the expanded program is run.
+
+ 2. For any arguments that do represent `awk' text, put the arguments
+ into a temporary file that will be expanded. There are two cases.
+
+ a. Literal text, provided with `--source' or `--source='. This
+ text is just echoed directly. The `echo' program will
+ automatically supply a trailing newline.
+
+ b. File names provided with `-f'. We use a neat trick, and echo
+ `@include FILENAME' into the temporary file. Since the file
+ inclusion program will work the way `gawk' does, this will
+ get the text of the file included into the program at the
+ correct point.
+
+ 3. Run an `awk' program (naturally) over the temporary file to expand
+ `@include' statements. The expanded program is placed in a second
+ temporary file.
+
+ 4. Run the expanded program with `gawk' and any other original
+ command line arguments that the user supplied (such as the data
+ file names).
+
+ The initial part of the program turns on shell tracing if the first
+argument was `debug'. Otherwise, a shell `trap' statement arranges to
+clean up any temporary files on program exit or upon an interrupt.
+
+ The next part loops through all the command line arguments. There
+are several cases of interest.
+
+`--'
+ This ends the arguments to `igawk'. Anything else should be
+ passed on to the user's `awk' program without being evaluated.
+
+`-W'
+ This indicates that the next option is specific to `gawk'. To make
+ argument processing easier, the `-W' is appended to the front of
+ the remaining arguments and the loop continues. (This is an `sh'
+ programming trick. Don't worry about it if you are not familiar
+ with `sh'.)
+
+`-v'
+`-F'
+ These are saved and passed on to `gawk'.
+
+`-f'
+`--file'
+`--file='
+`-Wfile='
+ The file name is saved to the temporary file `/tmp/ig.s.$$' with an
+ `@include' statement. The `sed' utility is used to remove the
+ leading option part of the argument (e.g., `--file=').
+
+`--source'
+`--source='
+`-Wsource='
+ The source text is echoed into `/tmp/ig.s.$$'.
+
+`--version'
+`--version'
+`-Wversion'
+ `igawk' prints its version number, and runs `gawk --version' to
+ get the `gawk' version information, and then exits.
+
+ If none of `-f', `--file', `-Wfile', `--source', or `-Wsource', were
+supplied, then the first non-option argument should be the `awk'
+program. If there are no command line arguments left, `igawk' prints
+an error message and exits. Otherwise, the first argument is echoed
+into `/tmp/ig.s.$$'.
+
+ In any case, after the arguments have been processed, `/tmp/ig.s.$$'
+contains the complete text of the original `awk' program.
+
+ The `$$' in `sh' represents the current process ID number. It is
+often used in shell programs to generate unique temporary file names.
+This allows multiple users to run `igawk' without worrying that the
+temporary file names will clash.
+
+ Here's the program:
+
+ #! /bin/sh
+
+ # igawk --- like gawk but do @include processing
+ # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
+ # July 1993
+
+ if [ "$1" = debug ]
+ then
+ set -x
+ shift
+ else
+ # cleanup on exit, hangup, interrupt, quit, termination
+ trap 'rm -f /tmp/ig.[se].$$' 0 1 2 3 15
+ fi
+
+ while [ $# -ne 0 ] # loop over arguments
+ do
+ case $1 in
+ --) shift; break;;
+
+ -W) shift
+ set -- -W"$@"
+ continue;;
+
+ -[vF]) opts="$opts $1 '$2'"
+ shift;;
+
+ -[vF]*) opts="$opts '$1'" ;;
+
+ -f) echo @include "$2" >> /tmp/ig.s.$$
+ shift;;
+
+ -f*) f=`echo "$1" | sed 's/-f//'`
+ echo @include "$f" >> /tmp/ig.s.$$ ;;
+
+ -?file=*) # -Wfile or --file
+ f=`echo "$1" | sed 's/-.file=//'`
+ echo @include "$f" >> /tmp/ig.s.$$ ;;
+
+ -?file) # get arg, $2
+ echo @include "$2" >> /tmp/ig.s.$$
+ shift;;
+
+ -?source=*) # -Wsource or --source
+ t=`echo "$1" | sed 's/-.source=//'`
+ echo "$t" >> /tmp/ig.s.$$ ;;
+
+ -?source) # get arg, $2
+ echo "$2" >> /tmp/ig.s.$$
+ shift;;
+
+ -?version)
+ echo igawk: version 1.0 1>&2
+ gawk --version
+ exit 0 ;;
+
+ -[W-]*) opts="$opts '$1'" ;;
+
+ *) break;;
+ esac
+ shift
+ done
+
+ if [ ! -s /tmp/ig.s.$$ ]
+ then
+ if [ -z "$1" ]
+ then
+ echo igawk: no program! 1>&2
+ exit 1
+ else
+ echo "$1" > /tmp/ig.s.$$
+ shift
+ fi
+ fi
+
+ # at this point, /tmp/ig.s.$$ has the program
+
+ The `awk' program to process `@include' directives reads through the
+program, one line at a time using `getline' (*note Explicit Input with
+`getline': Getline.). The input file names and `@include' statements
+are managed using a stack. As each `@include' is encountered, the
+current file name is "pushed" onto the stack, and the file named in the
+`@include' directive becomes the current file name. As each file is
+finished, the stack is "popped," and the previous input file becomes
+the current input file again. The process is started by making the
+original file the first one on the stack.
+
+ The `pathto' function does the work of finding the full path to a
+file. It simulates `gawk''s behavior when searching the `AWKPATH'
+environment variable (*note The `AWKPATH' Environment Variable: AWKPATH
+Variable.). If a file name has a `/' in it, no path search is done.
+Otherwise, the file name is concatenated with the name of each
+directory in the path, and an attempt is made to open the generated file
+name. The only way in `awk' to test if a file can be read is to go
+ahead and try to read it with `getline'; that is what `pathto' does.
+If the file can be read, it is closed, and the file name is returned.
+
+ gawk -- '
+ # process @include directives
+
+ function pathto(file, i, t, junk)
+ {
+ if (index(file, "/") != 0)
+ return file
+
+ for (i = 1; i <= ndirs; i++) {
+ t = (pathlist[i] "/" file)
+ if ((getline junk < t) > 0) {
+ # found it
+ close(t)
+ return t
+ }
+ }
+ return ""
+ }
+
+ The main program is contained inside one `BEGIN' rule. The first
+thing it does is set up the `pathlist' array that `pathto' uses. After
+splitting the path on `:', null elements are replaced with `"."', which
+represents the current directory.
+
+ BEGIN {
+ path = ENVIRON["AWKPATH"]
+ ndirs = split(path, pathlist, ":")
+ for (i = 1; i <= ndirs; i++) {
+ if (pathlist[i] == "")
+ pathlist[i] = "."
+ }
+
+ The stack is initialized with `ARGV[1]', which will be
+`/tmp/ig.s.$$'. The main loop comes next. Input lines are read in
+succession. Lines that do not start with `@include' are printed
+verbatim.
+
+ If the line does start with `@include', the file name is in `$2'.
+`pathto' is called to generate the full path. If it could not, then we
+print an error message and continue.
+
+ The next thing to check is if the file has been included already.
+The `processed' array is indexed by the full file name of each included
+file, and it tracks this information for us. If the file has been
+seen, a warning message is printed. Otherwise, the new file name is
+pushed onto the stack and processing continues.
+
+ Finally, when `getline' encounters the end of the input file, the
+file is closed and the stack is popped. When `stackptr' is less than
+zero, the program is done.
+
+ stackptr = 0
+ input[stackptr] = ARGV[1] # ARGV[1] is first file
+
+ for (; stackptr >= 0; stackptr--) {
+ while ((getline < input[stackptr]) > 0) {
+ if (tolower($1) != "@include") {
+ print
+ continue
+ }
+ fpath = pathto($2)
+ if (fpath == "") {
+ printf("igawk:%s:%d: cannot find %s\n", \
+ input[stackptr], FNR, $2) > "/dev/stderr"
+ continue
+ }
+ if (! (fpath in processed)) {
+ processed[fpath] = input[stackptr]
+ input[++stackptr] = fpath
+ } else
+ print $2, "included in", input[stackptr], \
+ "already included in", \
+ processed[fpath] > "/dev/stderr"
+ }
+ close(input[stackptr])
+ }
+ }' /tmp/ig.s.$$ > /tmp/ig.e.$$
+
+ The last step is to call `gawk' with the expanded program and the
+original options and command line arguments that the user supplied.
+`gawk''s exit status is passed back on to `igawk''s calling program.
+
+ eval gawk -f /tmp/ig.e.$$ $opts -- "$@"
+
+ exit $?
+
+ This version of `igawk' represents my third attempt at this program.
+There are three key simplifications that made the program work better.
+
+ 1. Using `@include' even for the files named with `-f' makes building
+ the initial collected `awk' program much simpler; all the
+ `@include' processing can be done once.
+
+ 2. The `pathto' function doesn't try to save the line read with
+ `getline' when testing for the file's accessibility. Trying to
+ save this line for use with the main program complicates things
+ considerably.
+
+ 3. Using a `getline' loop in the `BEGIN' rule does it all in one
+ place. It is not necessary to call out to a separate loop for
+ processing nested `@include' statements.
+
+ Also, this program illustrates that it is often worthwhile to combine
+`sh' and `awk' programming together. You can usually accomplish quite
+a lot, without having to resort to low-level programming in C or C++,
+and it is frequently easier to do certain kinds of string and argument
+manipulation using the shell than it is in `awk'.
+
+ Finally, `igawk' shows that it is not always necessary to add new
+features to a program; they can often be layered on top. With `igawk',
+there is no real reason to build `@include' processing into `gawk'
+itself.
+
+ As an additional example of this, consider the idea of having two
+files in a directory in the search path.
+
+`default.awk'
+ This file would contain a set of default library functions, such
+ as `getopt' and `assert'.
+
+`site.awk'
+ This file would contain library functions that are specific to a
+ site or installation, i.e. locally developed functions. Having a
+ separate file allows `default.awk' to change with new `gawk'
+ releases, without requiring the system administrator to update it
+ each time by adding the local functions.
+
+ One user suggested that `gawk' be modified to automatically read
+these files upon startup. Instead, it would be very simple to modify
+`igawk' to do this. Since `igawk' can process nested `@include'
+directives, `default.awk' could simply contain `@include' statements
+for the desired library functions.
+
+
+File: gawk.info, Node: Language History, Next: Gawk Summary, Prev: Sample Programs, Up: Top
+
+The Evolution of the `awk' Language
+***********************************
+
+ This Info file describes the GNU implementation of `awk', which
+follows the POSIX specification. Many `awk' users are only familiar
+with the original `awk' implementation in Version 7 Unix. (This
+implementation was the basis for `awk' in Berkeley Unix, through
+4.3-Reno. The 4.4 release of Berkeley Unix uses `gawk' 2.15.2 for its
+version of `awk'.) This chapter briefly describes the evolution of the
+`awk' language, with cross references to other parts of the Info file
+where you can find more information.
+
+* Menu:
+
+* V7/SVR3.1:: The major changes between V7 and System V
+ Release 3.1.
+* SVR4:: Minor changes between System V Releases 3.1
+ and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from the Bell Laboratories
+ version of `awk'.
+* POSIX/GNU:: The extensions in `gawk' not in POSIX
+ `awk'.
+
+
+File: gawk.info, Node: V7/SVR3.1, Next: SVR4, Prev: Language History, Up: Language History
+
+Major Changes between V7 and SVR3.1
+===================================
+
+ The `awk' language evolved considerably between the release of
+Version 7 Unix (1978) and the new version first made generally
+available in System V Release 3.1 (1987). This section summarizes the
+changes, with cross-references to further details.
+
+ * The requirement for `;' to separate rules on a line (*note `awk'
+ Statements Versus Lines: Statements/Lines.).
+
+ * User-defined functions, and the `return' statement (*note
+ User-defined Functions: User-defined.).
+
+ * The `delete' statement (*note The `delete' Statement: Delete.).
+
+ * The `do'-`while' statement (*note The `do'-`while' Statement: Do
+ Statement.).
+
+ * The built-in functions `atan2', `cos', `sin', `rand' and `srand'
+ (*note Numeric Built-in Functions: Numeric Functions.).
+
+ * The built-in functions `gsub', `sub', and `match' (*note Built-in
+ Functions for String Manipulation: String Functions.).
+
+ * The built-in functions `close', and `system' (*note Built-in
+ Functions for Input/Output: I/O Functions.).
+
+ * The `ARGC', `ARGV', `FNR', `RLENGTH', `RSTART', and `SUBSEP'
+ built-in variables (*note Built-in Variables::).
+
+ * The conditional expression using the ternary operator `?:' (*note
+ Conditional Expressions: Conditional Exp.).
+
+ * The exponentiation operator `^' (*note Arithmetic Operators:
+ Arithmetic Ops.) and its assignment operator form `^=' (*note
+ Assignment Expressions: Assignment Ops.).
+
+ * C-compatible operator precedence, which breaks some old `awk'
+ programs (*note Operator Precedence (How Operators Nest):
+ Precedence.).
+
+ * Regexps as the value of `FS' (*note Specifying How Fields are
+ Separated: Field Separators.), and as the third argument to the
+ `split' function (*note Built-in Functions for String
+ Manipulation: String Functions.).
+
+ * Dynamic regexps as operands of the `~' and `!~' operators (*note
+ How to Use Regular Expressions: Regexp Usage.).
+
+ * The escape sequences `\b', `\f', and `\r' (*note Escape
+ Sequences::). (Some vendors have updated their old versions of
+ `awk' to recognize `\r', `\b', and `\f', but this is not something
+ you can rely on.)
+
+ * Redirection of input for the `getline' function (*note Explicit
+ Input with `getline': Getline.).
+
+ * Multiple `BEGIN' and `END' rules (*note The `BEGIN' and `END'
+ Special Patterns: BEGIN/END.).
+
+ * Multi-dimensional arrays (*note Multi-dimensional Arrays:
+ Multi-dimensional.).
+
+
+File: gawk.info, Node: SVR4, Next: POSIX, Prev: V7/SVR3.1, Up: Language History
+
+Changes between SVR3.1 and SVR4
+===============================
+
+ The System V Release 4 version of Unix `awk' added these features
+(some of which originated in `gawk'):
+
+ * The `ENVIRON' variable (*note Built-in Variables::).
+
+ * Multiple `-f' options on the command line (*note Command Line
+ Options: Options.).
+
+ * The `-v' option for assigning variables before program execution
+ begins (*note Command Line Options: Options.).
+
+ * The `--' option for terminating command line options.
+
+ * The `\a', `\v', and `\x' escape sequences (*note Escape
+ Sequences::).
+
+ * A defined return value for the `srand' built-in function (*note
+ Numeric Built-in Functions: Numeric Functions.).
+
+ * The `toupper' and `tolower' built-in string functions for case
+ translation (*note Built-in Functions for String Manipulation:
+ String Functions.).
+
+ * A cleaner specification for the `%c' format-control letter in the
+ `printf' function (*note Format-Control Letters: Control Letters.).
+
+ * The ability to dynamically pass the field width and precision
+ (`"%*.*d"') in the argument list of the `printf' function (*note
+ Format-Control Letters: Control Letters.).
+
+ * The use of regexp constants such as `/foo/' as expressions, where
+ they are equivalent to using the matching operator, as in `$0 ~
+ /foo/' (*note Using Regular Expression Constants: Using Constant
+ Regexps.).
+
+
+File: gawk.info, Node: POSIX, Next: BTL, Prev: SVR4, Up: Language History
+
+Changes between SVR4 and POSIX `awk'
+====================================
+
+ The POSIX Command Language and Utilities standard for `awk'
+introduced the following changes into the language:
+
+ * The use of `-W' for implementation-specific options.
+
+ * The use of `CONVFMT' for controlling the conversion of numbers to
+ strings (*note Conversion of Strings and Numbers: Conversion.).
+
+ * The concept of a numeric string, and tighter comparison rules to go
+ with it (*note Variable Typing and Comparison Expressions: Typing
+ and Comparison.).
+
+ * More complete documentation of many of the previously undocumented
+ features of the language.
+
+ The following common extensions are not permitted by the POSIX
+standard:
+
+ * `\x' escape sequences are not recognized (*note Escape
+ Sequences::).
+
+ * Newlines do not act as whitespace to separate fields when `FS' is
+ equal to a single space.
+
+ * The synonym `func' for the keyword `function' is not recognized
+ (*note Function Definition Syntax: Definition Syntax.).
+
+ * The operators `**' and `**=' cannot be used in place of `^' and
+ `^=' (*note Arithmetic Operators: Arithmetic Ops., and also *note
+ Assignment Expressions: Assignment Ops.).
+
+ * Specifying `-Ft' on the command line does not set the value of
+ `FS' to be a single tab character (*note Specifying How Fields are
+ Separated: Field Separators.).
+
+ * The `fflush' built-in function is not supported (*note Built-in
+ Functions for Input/Output: I/O Functions.).
+
+
+File: gawk.info, Node: BTL, Next: POSIX/GNU, Prev: POSIX, Up: Language History
+
+Extensions in the Bell Laboratories `awk'
+=========================================
+
+ Brian Kernighan, one of the original designers of Unix `awk', has
+made his version available via anonymous `ftp' (*note Other Freely
+Available `awk' Implementations: Other Versions.). This section
+describes extensions in his version of `awk' that are not in POSIX
+`awk'.
+
+ * The `-mf NNN' and `-mr NNN' command line options to set the
+ maximum number of fields, and the maximum record size, respectively
+ (*note Command Line Options: Options.).
+
+ * The `fflush' built-in function for flushing buffered output (*note
+ Built-in Functions for Input/Output: I/O Functions.).
+
+
+
+File: gawk.info, Node: POSIX/GNU, Prev: BTL, Up: Language History
+
+Extensions in `gawk' Not in POSIX `awk'
+=======================================
+
+ The GNU implementation, `gawk', adds a number of features. This
+sections lists them in the order they were added to `gawk'. They can
+all be disabled with either the `--traditional' or `--posix' options
+(*note Command Line Options: Options.).
+
+ Version 2.10 of `gawk' introduced these features:
+
+ * The `AWKPATH' environment variable for specifying a path search for
+ the `-f' command line option (*note Command Line Options:
+ Options.).
+
+ * The `IGNORECASE' variable and its effects (*note Case-sensitivity
+ in Matching: Case-sensitivity.).
+
+ * The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/N'
+ file name interpretation (*note Special File Names in `gawk':
+ Special Files.).
+
+ Version 2.13 of `gawk' introduced these features:
+
+ * The `FIELDWIDTHS' variable and its effects (*note Reading
+ Fixed-width Data: Constant Size.).
+
+ * The `systime' and `strftime' built-in functions for obtaining and
+ printing time stamps (*note Functions for Dealing with Time
+ Stamps: Time Functions.).
+
+ * The `-W lint' option to provide source code and run time error and
+ portability checking (*note Command Line Options: Options.).
+
+ * The `-W compat' option to turn off these extensions (*note Command
+ Line Options: Options.).
+
+ * The `-W posix' option for full POSIX compliance (*note Command
+ Line Options: Options.).
+
+ Version 2.14 of `gawk' introduced these features:
+
+ * The `next file' statement for skipping to the next data file
+ (*note The `nextfile' Statement: Nextfile Statement.).
+
+ Version 2.15 of `gawk' introduced these features:
+
+ * The `ARGIND' variable, that tracks the movement of `FILENAME'
+ through `ARGV' (*note Built-in Variables::).
+
+ * The `ERRNO' variable, that contains the system error message when
+ `getline' returns -1, or when `close' fails (*note Built-in
+ Variables::).
+
+ * The ability to use GNU-style long named options that start with
+ `--' (*note Command Line Options: Options.).
+
+ * The `--source' option for mixing command line and library file
+ source code (*note Command Line Options: Options.).
+
+ * The `/dev/pid', `/dev/ppid', `/dev/pgrpid', and `/dev/user' file
+ name interpretation (*note Special File Names in `gawk': Special
+ Files.).
+
+ Version 3.0 of `gawk' introduced these features:
+
+ * The `next file' statement became `nextfile' (*note The `nextfile'
+ Statement: Nextfile Statement.).
+
+ * The `--lint-old' option to warn about constructs that are not
+ available in the original Version 7 Unix version of `awk' (*note
+ Major Changes between V7 and SVR3.1: V7/SVR3.1.).
+
+ * The `--traditional' option was added as a better name for
+ `--compat' (*note Command Line Options: Options.).
+
+ * The ability for `FS' to be a null string, and for the third
+ argument to `split' to be the null string (*note Making Each
+ Character a Separate Field: Single Character Fields.).
+
+ * The ability for `RS' to be a regexp (*note How Input is Split into
+ Records: Records.).
+
+ * The `RT' variable (*note How Input is Split into Records:
+ Records.).
+
+ * The `gensub' function for more powerful text manipulation (*note
+ Built-in Functions for String Manipulation: String Functions.).
+
+ * The `strftime' function acquired a default time format, allowing
+ it to be called with no arguments (*note Functions for Dealing
+ with Time Stamps: Time Functions.).
+
+ * Full support for both POSIX and GNU regexps (*note Regular
+ Expressions: Regexp.).
+
+ * The `--re-interval' option to provide interval expressions in
+ regexps (*note Regular Expression Operators: Regexp Operators.).
+
+ * `IGNORECASE' changed, now applying to string comparison as well as
+ regexp operations (*note Case-sensitivity in Matching:
+ Case-sensitivity.).
+
+ * The `-m' option and the `fflush' function from the Bell Labs
+ research version of `awk' (*note Command Line Options: Options.;
+ also *note Built-in Functions for Input/Output: I/O Functions.).
+
+ * The use of GNU Autoconf to control the configuration process
+ (*note Compiling `gawk' for Unix: Quick Installation.).
+
+ * Amiga support (*note Installing `gawk' on an Amiga: Amiga
+ Installation.).
+
+
+
+File: gawk.info, Node: Gawk Summary, Next: Installation, Prev: Language History, Up: Top
+
+`gawk' Summary
+**************
+
+ This appendix provides a brief summary of the `gawk' command line
+and the `awk' language. It is designed to serve as "quick reference."
+It is therefore terse, but complete.
+
+* Menu:
+
+* Command Line Summary:: Recapitulation of the command line.
+* Language Summary:: A terse review of the language.
+* Variables/Fields:: Variables, fields, and arrays.
+* Rules Summary:: Patterns and Actions, and their component
+ parts.
+* Actions Summary:: Quick overview of actions.
+* Functions Summary:: Defining and calling functions.
+* Historical Features:: Some undocumented but supported ``features''.
+
+
+File: gawk.info, Node: Command Line Summary, Next: Language Summary, Prev: Gawk Summary, Up: Gawk Summary
+
+Command Line Options Summary
+============================
+
+ The command line consists of options to `gawk' itself, the `awk'
+program text (if not supplied via the `-f' option), and values to be
+made available in the `ARGC' and `ARGV' predefined `awk' variables:
+
+ gawk [POSIX OR GNU STYLE OPTIONS] -f SOURCE-FILE [`--'] FILE ...
+ gawk [POSIX OR GNU STYLE OPTIONS] [`--'] 'PROGRAM' FILE ...
+
+ The options that `gawk' accepts are:
+
+`-F FS'
+`--field-separator FS'
+ Use FS for the input field separator (the value of the `FS'
+ predefined variable).
+
+`-f PROGRAM-FILE'
+`--file PROGRAM-FILE'
+ Read the `awk' program source from the file PROGRAM-FILE, instead
+ of from the first command line argument.
+
+`-mf NNN'
+`-mr NNN'
+ The `f' flag sets the maximum number of fields, and the `r' flag
+ sets the maximum record size. These options are ignored by
+ `gawk', since `gawk' has no predefined limits; they are only for
+ compatibility with the Bell Labs research version of Unix `awk'.
+
+`-v VAR=VAL'
+`--assign VAR=VAL'
+ Assign the variable VAR the value VAL before program execution
+ begins.
+
+`-W traditional'
+`-W compat'
+`--traditional'
+`--compat'
+ Use compatibility mode, in which `gawk' extensions are turned off.
+
+`-W copyleft'
+`-W copyright'
+`--copyleft'
+`--copyright'
+ Print the short version of the General Public License on the
+ standard output, and exit. This option may disappear in a future
+ version of `gawk'.
+
+`-W help'
+`-W usage'
+`--help'
+`--usage'
+ Print a relatively short summary of the available options on the
+ standard output, and exit.
+
+`-W lint'
+`--lint'
+ Give warnings about dubious or non-portable `awk' constructs.
+
+`-W lint-old'
+`--lint-old'
+ Warn about constructs that are not available in the original
+ Version 7 Unix version of `awk'.
+
+`-W posix'
+`--posix'
+ Use POSIX compatibility mode, in which `gawk' extensions are
+ turned off and additional restrictions apply.
+
+`-W re-interval'
+`--re-interval'
+ Allow interval expressions (*note Regular Expression Operators:
+ Regexp Operators.), in regexps.
+
+`-W source=PROGRAM-TEXT'
+`--source PROGRAM-TEXT'
+ Use PROGRAM-TEXT as `awk' program source code. This option allows
+ mixing command line source code with source code from files, and is
+ particularly useful for mixing command line programs with library
+ functions.
+
+`-W version'
+`--version'
+ Print version information for this particular copy of `gawk' on
+ the error output.
+
+`--'
+ Signal the end of options. This is useful to allow further
+ arguments to the `awk' program itself to start with a `-'. This
+ is mainly for consistency with POSIX argument parsing conventions.
+
+ Any other options are flagged as invalid, but are otherwise ignored.
+*Note Command Line Options: Options, for more details.
+
+
+File: gawk.info, Node: Language Summary, Next: Variables/Fields, Prev: Command Line Summary, Up: Gawk Summary
+
+Language Summary
+================
+
+ An `awk' program consists of a sequence of zero or more
+pattern-action statements and optional function definitions. One or
+the other of the pattern and action may be omitted.
+
+ PATTERN { ACTION STATEMENTS }
+ PATTERN
+ { ACTION STATEMENTS }
+
+ function NAME(PARAMETER LIST) { ACTION STATEMENTS }
+
+ `gawk' first reads the program source from the PROGRAM-FILE(s), if
+specified, or from the first non-option argument on the command line.
+The `-f' option may be used multiple times on the command line. `gawk'
+reads the program text from all the PROGRAM-FILE files, effectively
+concatenating them in the order they are specified. This is useful for
+building libraries of `awk' functions, without having to include them
+in each new `awk' program that uses them. To use a library function in
+a file from a program typed in on the command line, specify `--source
+'PROGRAM'', and type your program in between the single quotes. *Note
+Command Line Options: Options.
+
+ The environment variable `AWKPATH' specifies a search path to use
+when finding source files named with the `-f' option. The default
+path, which is `.:/usr/local/share/awk'(1) is used if `AWKPATH' is not
+set. If a file name given to the `-f' option contains a `/' character,
+no path search is performed. *Note The `AWKPATH' Environment Variable:
+AWKPATH Variable.
+
+ `gawk' compiles the program into an internal form, and then proceeds
+to read each file named in the `ARGV' array. The initial values of
+`ARGV' come from the command line arguments. If there are no files
+named on the command line, `gawk' reads the standard input.
+
+ If a "file" named on the command line has the form `VAR=VAL', it is
+treated as a variable assignment: the variable VAR is assigned the
+value VAL. If any of the files have a value that is the null string,
+that element in the list is skipped.
+
+ For each record in the input, `gawk' tests to see if it matches any
+PATTERN in the `awk' program. For each pattern that the record
+matches, the associated ACTION is executed.
+
+ ---------- Footnotes ----------
+
+ (1) The path may use a directory other than `/usr/local/share/awk',
+depending upon how `gawk' was built and installed.
+
+
+File: gawk.info, Node: Variables/Fields, Next: Rules Summary, Prev: Language Summary, Up: Gawk Summary
+
+Variables and Fields
+====================
+
+ `awk' variables are not declared; they come into existence when they
+are first used. Their values are either floating-point numbers or
+strings. `awk' also has one-dimensional arrays; multiple-dimensional
+arrays may be simulated. There are several predefined variables that
+`awk' sets as a program runs; these are summarized below.
+
+* Menu:
+
+* Fields Summary:: Input field splitting.
+* Built-in Summary:: `awk''s built-in variables.
+* Arrays Summary:: Using arrays.
+* Data Type Summary:: Values in `awk' are numbers or strings.
+
+
+File: gawk.info, Node: Fields Summary, Next: Built-in Summary, Prev: Variables/Fields, Up: Variables/Fields
+
+Fields
+------
+
+ As each input line is read, `gawk' splits the line into FIELDS,
+using the value of the `FS' variable as the field separator. If `FS'
+is a single character, fields are separated by that character.
+Otherwise, `FS' is expected to be a full regular expression. In the
+special case that `FS' is a single space, fields are separated by runs
+of spaces, tabs and/or newlines.(1) If `FS' is the null string (`""'),
+then each individual character in the record becomes a separate field.
+Note that the value of `IGNORECASE' (*note Case-sensitivity in
+Matching: Case-sensitivity.) also affects how fields are split when
+`FS' is a regular expression.
+
+ Each field in the input line may be referenced by its position, `$1',
+`$2', and so on. `$0' is the whole line. The value of a field may be
+assigned to as well. Field numbers need not be constants:
+
+ n = 5
+ print $n
+
+prints the fifth field in the input line. The variable `NF' is set to
+the total number of fields in the input line.
+
+ References to non-existent fields (i.e. fields after `$NF') return
+the null string. However, assigning to a non-existent field (e.g.,
+`$(NF+2) = 5') increases the value of `NF', creates any intervening
+fields with the null string as their value, and causes the value of
+`$0' to be recomputed, with the fields being separated by the value of
+`OFS'. Decrementing `NF' causes the values of fields past the new
+value to be lost, and the value of `$0' to be recomputed, with the
+fields being separated by the value of `OFS'. *Note Reading Input
+Files: Reading Files.
+
+ ---------- Footnotes ----------
+
+ (1) In POSIX `awk', newline does not separate fields.
+
+
+File: gawk.info, Node: Built-in Summary, Next: Arrays Summary, Prev: Fields Summary, Up: Variables/Fields
+
+Built-in Variables
+------------------
+
+ `gawk''s built-in variables are:
+
+`ARGC'
+ The number of elements in `ARGV'. See below for what is actually
+ included in `ARGV'.
+
+`ARGIND'
+ The index in `ARGV' of the current file being processed. When
+ `gawk' is processing the input data files, it is always true that
+ `FILENAME == ARGV[ARGIND]'.
+
+`ARGV'
+ The array of command line arguments. The array is indexed from
+ zero to `ARGC' - 1. Dynamically changing `ARGC' and the contents
+ of `ARGV' can control the files used for data. A null-valued
+ element in `ARGV' is ignored. `ARGV' does not include the options
+ to `awk' or the text of the `awk' program itself.
+
+`CONVFMT'
+ The conversion format to use when converting numbers to strings.
+
+`FIELDWIDTHS'
+ A space separated list of numbers describing the fixed-width input
+ data.
+
+`ENVIRON'
+ An array of environment variable values. The array is indexed by
+ variable name, each element being the value of that variable.
+ Thus, the environment variable `HOME' is `ENVIRON["HOME"]'. One
+ possible value might be `/home/arnold'.
+
+ Changing this array does not affect the environment seen by
+ programs which `gawk' spawns via redirection or the `system'
+ function. (This may change in a future version of `gawk'.)
+
+ Some operating systems do not have environment variables. The
+ `ENVIRON' array is empty when running on these systems.
+
+`ERRNO'
+ The system error message when an error occurs using `getline' or
+ `close'.
+
+`FILENAME'
+ The name of the current input file. If no files are specified on
+ the command line, the value of `FILENAME' is the null string.
+
+`FNR'
+ The input record number in the current input file.
+
+`FS'
+ The input field separator, a space by default.
+
+`IGNORECASE'
+ The case-sensitivity flag for string comparisons and regular
+ expression operations. If `IGNORECASE' has a non-zero value, then
+ pattern matching in rules, record separating with `RS', field
+ splitting with `FS', regular expression matching with `~' and
+ `!~', and the `gensub', `gsub', `index', `match', `split' and
+ `sub' built-in functions all ignore case when doing regular
+ expression operations, and all string comparisons are done
+ ignoring case.
+
+`NF'
+ The number of fields in the current input record.
+
+`NR'
+ The total number of input records seen so far.
+
+`OFMT'
+ The output format for numbers for the `print' statement, `"%.6g"'
+ by default.
+
+`OFS'
+ The output field separator, a space by default.
+
+`ORS'
+ The output record separator, by default a newline.
+
+`RS'
+ The input record separator, by default a newline. If `RS' is set
+ to the null string, then records are separated by blank lines.
+ When `RS' is set to the null string, then the newline character
+ always acts as a field separator, in addition to whatever value
+ `FS' may have. If `RS' is set to a multi-character string, it
+ denotes a regexp; input text matching the regexp separates records.
+
+`RT'
+ The input text that matched the text denoted by `RS', the record
+ separator.
+
+`RSTART'
+ The index of the first character last matched by `match'; zero if
+ no match.
+
+`RLENGTH'
+ The length of the string last matched by `match'; -1 if no match.
+
+`SUBSEP'
+ The string used to separate multiple subscripts in array elements,
+ by default `"\034"'.
+
+ *Note Built-in Variables::, for more information.
+
+
+File: gawk.info, Node: Arrays Summary, Next: Data Type Summary, Prev: Built-in Summary, Up: Variables/Fields
+
+Arrays
+------
+
+ Arrays are subscripted with an expression between square brackets
+(`[' and `]'). Array subscripts are _always_ strings; numbers are
+converted to strings as necessary, following the standard conversion
+rules (*note Conversion of Strings and Numbers: Conversion.).
+
+ If you use multiple expressions separated by commas inside the square
+brackets, then the array subscript is a string consisting of the
+concatenation of the individual subscript values, converted to strings,
+separated by the subscript separator (the value of `SUBSEP').
+
+ The special operator `in' may be used in a conditional context to
+see if an array has an index consisting of a particular value.
+
+ if (val in array)
+ print array[val]
+
+ If the array has multiple subscripts, use `(i, j, ...) in ARRAY' to
+test for existence of an element.
+
+ The `in' construct may also be used in a `for' loop to iterate over
+all the elements of an array. *Note Scanning All Elements of an Array:
+Scanning an Array.
+
+ You can remove an element from an array using the `delete' statement.
+
+ You can clear an entire array using `delete ARRAY'.
+
+ *Note Arrays in `awk': Arrays.
+
+
+File: gawk.info, Node: Data Type Summary, Prev: Arrays Summary, Up: Variables/Fields
+
+Data Types
+----------
+
+ The value of an `awk' expression is always either a number or a
+string.
+
+ Some contexts (such as arithmetic operators) require numeric values.
+They convert strings to numbers by interpreting the text of the string
+as a number. If the string does not look like a number, it converts to
+zero.
+
+ Other contexts (such as concatenation) require string values. They
+convert numbers to strings by effectively printing them with `sprintf'.
+*Note Conversion of Strings and Numbers: Conversion, for the details.
+
+ To force conversion of a string value to a number, simply add zero
+to it. If the value you start with is already a number, this does not
+change it.
+
+ To force conversion of a numeric value to a string, concatenate it
+with the null string.
+
+ Comparisons are done numerically if both operands are numeric, or if
+one is numeric and the other is a numeric string. Otherwise one or
+both operands are converted to strings and a string comparison is
+performed. Fields, `getline' input, `FILENAME', `ARGV' elements,
+`ENVIRON' elements and the elements of an array created by `split' are
+the only items that can be numeric strings. String constants, such as
+`"3.1415927"' are not numeric strings, they are string constants. The
+full rules for comparisons are described in *Note Variable Typing and
+Comparison Expressions: Typing and Comparison.
+
+ Uninitialized variables have the string value `""' (the null, or
+empty, string). In contexts where a number is required, this is
+equivalent to zero.
+
+ *Note Variables::, for more information on variable naming and
+initialization; *note Conversion of Strings and Numbers: Conversion.,
+for more information on how variable values are interpreted.
+
+
+File: gawk.info, Node: Rules Summary, Next: Actions Summary, Prev: Variables/Fields, Up: Gawk Summary
+
+Patterns
+========
+
+* Menu:
+
+* Pattern Summary:: Quick overview of patterns.
+* Regexp Summary:: Quick overview of regular expressions.
+
+ An `awk' program is mostly composed of rules, each consisting of a
+pattern followed by an action. The action is enclosed in `{' and `}'.
+Either the pattern may be missing, or the action may be missing, but
+not both. If the pattern is missing, the action is executed for every
+input record. A missing action is equivalent to `{ print }', which
+prints the entire line.
+
+ Comments begin with the `#' character, and continue until the end of
+the line. Blank lines may be used to separate statements. Statements
+normally end with a newline; however, this is not the case for lines
+ending in a `,', `{', `?', `:', `&&', or `||'. Lines ending in `do' or
+`else' also have their statements automatically continued on the
+following line. In other cases, a line can be continued by ending it
+with a `\', in which case the newline is ignored.
+
+ Multiple statements may be put on one line by separating each one
+with a `;'. This applies to both the statements within the action part
+of a rule (the usual case), and to the rule statements.
+
+ *Note Comments in `awk' Programs: Comments, for information on
+`awk''s commenting convention; *note `awk' Statements Versus Lines:
+Statements/Lines., for a description of the line continuation mechanism
+in `awk'.
+
+
+File: gawk.info, Node: Pattern Summary, Next: Regexp Summary, Prev: Rules Summary, Up: Rules Summary
+
+Pattern Summary
+---------------
+
+ `awk' patterns may be one of the following:
+
+ /REGULAR EXPRESSION/
+ RELATIONAL EXPRESSION
+ PATTERN && PATTERN
+ PATTERN || PATTERN
+ PATTERN ? PATTERN : PATTERN
+ (PATTERN)
+ ! PATTERN
+ PATTERN1, PATTERN2
+ BEGIN
+ END
+
+ `BEGIN' and `END' are two special kinds of patterns that are not
+tested against the input. The action parts of all `BEGIN' rules are
+concatenated as if all the statements had been written in a single
+`BEGIN' rule. They are executed before any of the input is read.
+Similarly, all the `END' rules are concatenated, and executed when all
+the input is exhausted (or when an `exit' statement is executed).
+`BEGIN' and `END' patterns cannot be combined with other patterns in
+pattern expressions. `BEGIN' and `END' rules cannot have missing
+action parts.
+
+ For `/REGULAR-EXPRESSION/' patterns, the associated statement is
+executed for each input record that matches the regular expression.
+Regular expressions are summarized below.
+
+ A RELATIONAL EXPRESSION may use any of the operators defined below in
+the section on actions. These generally test whether certain fields
+match certain regular expressions.
+
+ The `&&', `||', and `!' operators are logical "and," logical "or,"
+and logical "not," respectively, as in C. They do short-circuit
+evaluation, also as in C, and are used for combining more primitive
+pattern expressions. As in most languages, parentheses may be used to
+change the order of evaluation.
+
+ The `?:' operator is like the same operator in C. If the first
+pattern matches, then the second pattern is matched against the input
+record; otherwise, the third is matched. Only one of the second and
+third patterns is matched.
+
+ The `PATTERN1, PATTERN2' form of a pattern is called a range
+pattern. It matches all input lines starting with a line that matches
+PATTERN1, and continuing until a line that matches PATTERN2, inclusive.
+A range pattern cannot be used as an operand of any of the pattern
+operators.
+
+ *Note Pattern Elements: Pattern Overview.
+
+
+File: gawk.info, Node: Regexp Summary, Prev: Pattern Summary, Up: Rules Summary
+
+Regular Expressions
+-------------------
+
+ Regular expressions are based on POSIX EREs (extended regular
+expressions). The escape sequences allowed in string constants are
+also valid in regular expressions (*note Escape Sequences::). Regexps
+are composed of characters as follows:
+
+`C'
+ matches the character C (assuming C is none of the characters
+ listed below).
+
+`\C'
+ matches the literal character C.
+
+`.'
+ matches any character, _including_ newline. In strict POSIX mode,
+ `.' does not match the NUL character, which is a character with
+ all bits equal to zero.
+
+`^'
+ matches the beginning of a string.
+
+`$'
+ matches the end of a string.
+
+`[ABC...]'
+ matches any of the characters ABC... (character list).
+
+`[[:CLASS:]]'
+ matches any character in the character class CLASS. Allowable
+ classes are `alnum', `alpha', `blank', `cntrl', `digit', `graph',
+ `lower', `print', `punct', `space', `upper', and `xdigit'.
+
+`[[.SYMBOL.]]'
+ matches the multi-character collating symbol SYMBOL. `gawk' does
+ not currently support collating symbols.
+
+`[[=CLASSNAME=]]'
+ matches any of the equivalent characters in the current locale
+ named by the equivalence class CLASSNAME. `gawk' does not
+ currently support equivalence classes.
+
+`[^ABC...]'
+ matches any character except ABC... (negated character list).
+
+`R1|R2'
+ matches either R1 or R2 (alternation).
+
+`R1R2'
+ matches R1, and then R2 (concatenation).
+
+`R+'
+ matches one or more R's.
+
+`R*'
+ matches zero or more R's.
+
+`R?'
+ matches zero or one R's.
+
+`(R)'
+ matches R (grouping).
+
+`R{N}'
+`R{N,}'
+`R{N,M}'
+ matches at least N, N to any number, or N to M occurrences of R
+ (interval expressions).
+
+`\y'
+ matches the empty string at either the beginning or the end of a
+ word.
+
+`\B'
+ matches the empty string within a word.
+
+`\<'
+ matches the empty string at the beginning of a word.
+
+`\>'
+ matches the empty string at the end of a word.
+
+`\w'
+ matches any word-constituent character (alphanumeric characters and
+ the underscore).
+
+`\W'
+ matches any character that is not word-constituent.
+
+`\`'
+ matches the empty string at the beginning of a buffer (same as a
+ string in `gawk').
+
+`\''
+ matches the empty string at the end of a buffer.
+
+ The various command line options control how `gawk' interprets
+characters in regexps.
+
+No options
+ In the default case, `gawk' provide all the facilities of POSIX
+ regexps and the GNU regexp operators described above. However,
+ interval expressions are not supported.
+
+`--posix'
+ Only POSIX regexps are supported, the GNU operators are not special
+ (e.g., `\w' matches a literal `w'). Interval expressions are
+ allowed.
+
+`--traditional'
+ Traditional Unix `awk' regexps are matched. The GNU operators are
+ not special, interval expressions are not available, and neither
+ are the POSIX character classes (`[[:alnum:]]' and so on).
+ Characters described by octal and hexadecimal escape sequences are
+ treated literally, even if they represent regexp metacharacters.
+
+`--re-interval'
+ Allow interval expressions in regexps, even if `--traditional' has
+ been provided.
+
+ *Note Regular Expressions: Regexp.
+
+
+File: gawk.info, Node: Actions Summary, Next: Functions Summary, Prev: Rules Summary, Up: Gawk Summary
+
+Actions
+=======
+
+ Action statements are enclosed in braces, `{' and `}'. A missing
+action statement is equivalent to `{ print }'.
+
+ Action statements consist of the usual assignment, conditional, and
+looping statements found in most languages. The operators, control
+statements, and Input/Output statements available are similar to those
+in C.
+
+ Comments begin with the `#' character, and continue until the end of
+the line. Blank lines may be used to separate statements. Statements
+normally end with a newline; however, this is not the case for lines
+ending in a `,', `{', `?', `:', `&&', or `||'. Lines ending in `do' or
+`else' also have their statements automatically continued on the
+following line. In other cases, a line can be continued by ending it
+with a `\', in which case the newline is ignored.
+
+ Multiple statements may be put on one line by separating each one
+with a `;'. This applies to both the statements within the action part
+of a rule (the usual case), and to the rule statements.
+
+ *Note Comments in `awk' Programs: Comments, for information on
+`awk''s commenting convention; *note `awk' Statements Versus Lines:
+Statements/Lines., for a description of the line continuation mechanism
+in `awk'.
+
+* Menu:
+
+* Operator Summary:: `awk' operators.
+* Control Flow Summary:: The control statements.
+* I/O Summary:: The I/O statements.
+* Printf Summary:: A summary of `printf'.
+* Special File Summary:: Special file names interpreted internally.
+* Built-in Functions Summary:: Built-in numeric and string functions.
+* Time Functions Summary:: Built-in time functions.
+* String Constants Summary:: Escape sequences in strings.
+
+
+File: gawk.info, Node: Operator Summary, Next: Control Flow Summary, Prev: Actions Summary, Up: Actions Summary
+
+Operators
+---------
+
+ The operators in `awk', in order of decreasing precedence, are:
+
+`(...)'
+ Grouping.
+
+`$'
+ Field reference.
+
+`++ --'
+ Increment and decrement, both prefix and postfix.
+
+`^'
+ Exponentiation (`**' may also be used, and `**=' for the assignment
+ operator, but they are not specified in the POSIX standard).
+
+`+ - !'
+ Unary plus, unary minus, and logical negation.
+
+`* / %'
+ Multiplication, division, and modulus.
+
+`+ -'
+ Addition and subtraction.
+
+`SPACE'
+ String concatenation.
+
+`< <= > >= != =='
+ The usual relational operators.
+
+`~ !~'
+ Regular expression match, negated match.
+
+`in'
+ Array membership.
+
+`&&'
+ Logical "and".
+
+`||'
+ Logical "or".
+
+`?:'
+ A conditional expression. This has the form `EXPR1 ? EXPR2 :
+ EXPR3'. If EXPR1 is true, the value of the expression is EXPR2;
+ otherwise it is EXPR3. Only one of EXPR2 and EXPR3 is evaluated.
+
+`= += -= *= /= %= ^='
+ Assignment. Both absolute assignment (`VAR=VALUE') and operator
+ assignment (the other forms) are supported.
+
+ *Note Expressions::.
+
+
+File: gawk.info, Node: Control Flow Summary, Next: I/O Summary, Prev: Operator Summary, Up: Actions Summary
+
+Control Statements
+------------------
+
+ The control statements are as follows:
+
+ if (CONDITION) STATEMENT [ else STATEMENT ]
+ while (CONDITION) STATEMENT
+ do STATEMENT while (CONDITION)
+ for (EXPR1; EXPR2; EXPR3) STATEMENT
+ for (VAR in ARRAY) STATEMENT
+ break
+ continue
+ delete ARRAY[INDEX]
+ delete ARRAY
+ exit [ EXPRESSION ]
+ { STATEMENTS }
+
+ *Note Control Statements in Actions: Statements.
+
+
+File: gawk.info, Node: I/O Summary, Next: Printf Summary, Prev: Control Flow Summary, Up: Actions Summary
+
+I/O Statements
+--------------
+
+ The Input/Output statements are as follows:
+
+`getline'
+ Set `$0' from next input record; set `NF', `NR', `FNR'. *Note
+ Explicit Input with `getline': Getline.
+
+`getline <FILE'
+ Set `$0' from next record of FILE; set `NF'.
+
+`getline VAR'
+ Set VAR from next input record; set `NR', `FNR'.
+
+`getline VAR <FILE'
+ Set VAR from next record of FILE.
+
+`COMMAND | getline'
+ Run COMMAND, piping its output into `getline'; sets `$0', `NF',
+ `NR'.
+
+`COMMAND | getline `var''
+ Run COMMAND, piping its output into `getline'; sets VAR.
+
+`next'
+ Stop processing the current input record. The next input record
+ is read and processing starts over with the first pattern in the
+ `awk' program. If the end of the input data is reached, the `END'
+ rule(s), if any, are executed. *Note The `next' Statement: Next
+ Statement.
+
+`nextfile'
+ Stop processing the current input file. The next input record
+ read comes from the next input file. `FILENAME' is updated, `FNR'
+ is set to one, `ARGIND' is incremented, and processing starts over
+ with the first pattern in the `awk' program. If the end of the
+ input data is reached, the `END' rule(s), if any, are executed.
+ Earlier versions of `gawk' used `next file'; this usage is still
+ supported, but is considered to be deprecated. *Note The
+ `nextfile' Statement: Nextfile Statement.
+
+`print'
+ Prints the current record. *Note Printing Output: Printing.
+
+`print EXPR-LIST'
+ Prints expressions.
+
+`print EXPR-LIST > FILE'
+ Prints expressions to FILE. If FILE does not exist, it is created.
+ If it does exist, its contents are deleted the first time the
+ `print' is executed.
+
+`print EXPR-LIST >> FILE'
+ Prints expressions to FILE. The previous contents of FILE are
+ retained, and the output of `print' is appended to the file.
+
+`print EXPR-LIST | COMMAND'
+ Prints expressions, sending the output down a pipe to COMMAND.
+ The pipeline to the command stays open until the `close' function
+ is called.
+
+`printf FMT, EXPR-LIST'
+ Format and print.
+
+`printf FMT, EXPR-LIST > file'
+ Format and print to FILE. If FILE does not exist, it is created.
+ If it does exist, its contents are deleted the first time the
+ `printf' is executed.
+
+`printf FMT, EXPR-LIST >> FILE'
+ Format and print to FILE. The previous contents of FILE are
+ retained, and the output of `printf' is appended to the file.
+
+`printf FMT, EXPR-LIST | COMMAND'
+ Format and print, sending the output down a pipe to COMMAND. The
+ pipeline to the command stays open until the `close' function is
+ called.
+
+ `getline' returns zero on end of file, and -1 on an error. In the
+event of an error, `getline' will set `ERRNO' to the value of a
+system-dependent string that describes the error.
+
+
+File: gawk.info, Node: Printf Summary, Next: Special File Summary, Prev: I/O Summary, Up: Actions Summary
+
+`printf' Summary
+----------------
+
+ Conversion specification have the form
+`%'[FLAG][WIDTH][`.'PREC]FORMAT. Items in brackets are optional.
+
+ The `awk' `printf' statement and `sprintf' function accept the
+following conversion specification formats:
+
+`%c'
+ An ASCII character. If the argument used for `%c' is numeric, it
+ is treated as a character and printed. Otherwise, the argument is
+ assumed to be a string, and the only first character of that
+ string is printed.
+
+`%d'
+`%i'
+ A decimal number (the integer part).
+
+`%e'
+`%E'
+ A floating point number of the form `[-]d.dddddde[+-]dd'. The
+ `%E' format uses `E' instead of `e'.
+
+`%f'
+ A floating point number of the form [`-']`ddd.dddddd'.
+
+`%g'
+`%G'
+ Use either the `%e' or `%f' formats, whichever produces a shorter
+ string, with non-significant zeros suppressed. `%G' will use `%E'
+ instead of `%e'.
+
+`%o'
+ An unsigned octal number (again, an integer).
+
+`%s'
+ A character string.
+
+`%x'
+`%X'
+ An unsigned hexadecimal number (an integer). The `%X' format uses
+ `A' through `F' instead of `a' through `f' for decimal 10 through
+ 15.
+
+`%%'
+ A single `%' character; no argument is converted.
+
+ There are optional, additional parameters that may lie between the
+`%' and the control letter:
+
+`-'
+ The expression should be left-justified within its field.
+
+`SPACE'
+ For numeric conversions, prefix positive values with a space, and
+ negative values with a minus sign.
+
+`+'
+ The plus sign, used before the width modifier (see below), says to
+ always supply a sign for numeric conversions, even if the data to
+ be formatted is positive. The `+' overrides the space modifier.
+
+`#'
+ Use an "alternate form" for certain control letters. For `o',
+ supply a leading zero. For `x', and `X', supply a leading `0x' or
+ `0X' for a non-zero result. For `e', `E', and `f', the result
+ will always contain a decimal point. For `g', and `G', trailing
+ zeros are not removed from the result.
+
+`0'
+ A leading `0' (zero) acts as a flag, that indicates output should
+ be padded with zeros instead of spaces. This applies even to
+ non-numeric output formats. This flag only has an effect when the
+ field width is wider than the value to be printed.
+
+`WIDTH'
+ The field should be padded to this width. The field is normally
+ padded with spaces. If the `0' flag has been used, it is padded
+ with zeros.
+
+`.PREC'
+ A number that specifies the precision to use when printing. For
+ the `e', `E', and `f' formats, this specifies the number of digits
+ you want printed to the right of the decimal point. For the `g',
+ and `G' formats, it specifies the maximum number of significant
+ digits. For the `d', `o', `i', `u', `x', and `X' formats, it
+ specifies the minimum number of digits to print. For the `s'
+ format, it specifies the maximum number of characters from the
+ string that should be printed.
+
+ Either or both of the WIDTH and PREC values may be specified as `*'.
+In that case, the particular value is taken from the argument list.
+
+ *Note Using `printf' Statements for Fancier Printing: Printf.
+
+
+File: gawk.info, Node: Special File Summary, Next: Built-in Functions Summary, Prev: Printf Summary, Up: Actions Summary
+
+Special File Names
+------------------
+
+ When doing I/O redirection from either `print' or `printf' into a
+file, or via `getline' from a file, `gawk' recognizes certain special
+file names internally. These file names allow access to open file
+descriptors inherited from `gawk''s parent process (usually the shell).
+The file names are:
+
+`/dev/stdin'
+ The standard input.
+
+`/dev/stdout'
+ The standard output.
+
+`/dev/stderr'
+ The standard error output.
+
+`/dev/fd/N'
+ The file denoted by the open file descriptor N.
+
+ In addition, reading the following files provides process related
+information about the running `gawk' program. All returned records are
+terminated with a newline.
+
+`/dev/pid'
+ Returns the process ID of the current process.
+
+`/dev/ppid'
+ Returns the parent process ID of the current process.
+
+`/dev/pgrpid'
+ Returns the process group ID of the current process.
+
+`/dev/user'
+ At least four space-separated fields, containing the return values
+ of the `getuid', `geteuid', `getgid', and `getegid' system calls.
+ If there are any additional fields, they are the group IDs
+ returned by `getgroups' system call. (Multiple groups may not be
+ supported on all systems.)
+
+These file names may also be used on the command line to name data
+files. These file names are only recognized internally if you do not
+actually have files with these names on your system.
+
+ *Note Special File Names in `gawk': Special Files, for a longer
+description that provides the motivation for this feature.
+
+
+File: gawk.info, Node: Built-in Functions Summary, Next: Time Functions Summary, Prev: Special File Summary, Up: Actions Summary
+
+Built-in Functions
+------------------
+
+ `awk' provides a number of built-in functions for performing numeric
+operations, string related operations, and I/O related operations.
+
+ The built-in arithmetic functions are:
+
+`atan2(Y, X)'
+ the arctangent of Y/X in radians.
+
+`cos(EXPR)'
+ the cosine of EXPR, which is in radians.
+
+`exp(EXPR)'
+ the exponential function (`e ^ EXPR').
+
+`int(EXPR)'
+ truncates to integer.
+
+`log(EXPR)'
+ the natural logarithm of `expr'.
+
+`rand()'
+ a random number between zero and one.
+
+`sin(EXPR)'
+ the sine of EXPR, which is in radians.
+
+`sqrt(EXPR)'
+ the square root function.
+
+`srand([EXPR])'
+ use EXPR as a new seed for the random number generator. If no EXPR
+ is provided, the time of day is used. The return value is the
+ previous seed for the random number generator.
+
+ `awk' has the following built-in string functions:
+
+`gensub(REGEX, SUBST, HOW [, TARGET])'
+ If HOW is a string beginning with `g' or `G', then replace each
+ match of REGEX in TARGET with SUBST. Otherwise, replace the
+ HOW'th occurrence. If TARGET is not supplied, use `$0'. The
+ return value is the changed string; the original TARGET is not
+ modified. Within SUBST, `\N', where N is a digit from one to nine,
+ can be used to indicate the text that matched the N'th
+ parenthesized subexpression. This function is `gawk'-specific.
+
+`gsub(REGEX, SUBST [, TARGET])'
+ for each substring matching the regular expression REGEX in the
+ string TARGET, substitute the string SUBST, and return the number
+ of substitutions. If TARGET is not supplied, use `$0'.
+
+`index(STR, SEARCH)'
+ returns the index of the string SEARCH in the string STR, or zero
+ if SEARCH is not present.
+
+`length([STR])'
+ returns the length of the string STR. The length of `$0' is
+ returned if no argument is supplied.
+
+`match(STR, REGEX)'
+ returns the position in STR where the regular expression REGEX
+ occurs, or zero if REGEX is not present, and sets the values of
+ `RSTART' and `RLENGTH'.
+
+`split(STR, ARR [, REGEX])'
+ splits the string STR into the array ARR on the regular expression
+ REGEX, and returns the number of elements. If REGEX is omitted,
+ `FS' is used instead. REGEX can be the null string, causing each
+ character to be placed into its own array element. The array ARR
+ is cleared first.
+
+`sprintf(FMT, EXPR-LIST)'
+ prints EXPR-LIST according to FMT, and returns the resulting
+ string.
+
+`sub(REGEX, SUBST [, TARGET])'
+ just like `gsub', but only the first matching substring is
+ replaced.
+
+`substr(STR, INDEX [, LEN])'
+ returns the LEN-character substring of STR starting at INDEX. If
+ LEN is omitted, the rest of STR is used.
+
+`tolower(STR)'
+ returns a copy of the string STR, with all the upper-case
+ characters in STR translated to their corresponding lower-case
+ counterparts. Non-alphabetic characters are left unchanged.
+
+`toupper(STR)'
+ returns a copy of the string STR, with all the lower-case
+ characters in STR translated to their corresponding upper-case
+ counterparts. Non-alphabetic characters are left unchanged.
+
+ The I/O related functions are:
+
+`close(EXPR)'
+ Close the open file or pipe denoted by EXPR.
+
+`fflush([EXPR])'
+ Flush any buffered output for the output file or pipe denoted by
+ EXPR. If EXPR is omitted, standard output is flushed. If EXPR is
+ the null string (`""'), all output buffers are flushed.
+
+`system(CMD-LINE)'
+ Execute the command CMD-LINE, and return the exit status. If your
+ operating system does not support `system', calling it will
+ generate a fatal error.
+
+ `system("")' can be used to force `awk' to flush any pending
+ output. This is more portable, but less obvious, than calling
+ `fflush'.
+
+
+File: gawk.info, Node: Time Functions Summary, Next: String Constants Summary, Prev: Built-in Functions Summary, Up: Actions Summary
+
+Time Functions
+--------------
+
+ The following two functions are available for getting the current
+time of day, and for formatting time stamps. They are specific to
+`gawk'.
+
+`systime()'
+ returns the current time of day as the number of seconds since a
+ particular epoch (Midnight, January 1, 1970 UTC, on POSIX systems).
+
+`strftime([FORMAT[, TIMESTAMP]])'
+ formats TIMESTAMP according to the specification in FORMAT. The
+ current time of day is used if no TIMESTAMP is supplied. A
+ default format equivalent to the output of the `date' utility is
+ used if no FORMAT is supplied. *Note Functions for Dealing with
+ Time Stamps: Time Functions, for the details on the conversion
+ specifiers that `strftime' accepts.
+
+
+File: gawk.info, Node: String Constants Summary, Prev: Time Functions Summary, Up: Actions Summary
+
+String Constants
+----------------
+
+ String constants in `awk' are sequences of characters enclosed in
+double quotes (`"'). Within strings, certain "escape sequences" are
+recognized, as in C. These are:
+
+`\\'
+ A literal backslash.
+
+`\a'
+ The "alert" character; usually the ASCII BEL character.
+
+`\b'
+ Backspace.
+
+`\f'
+ Formfeed.
+
+`\n'
+ Newline.
+
+`\r'
+ Carriage return.
+
+`\t'
+ Horizontal tab.
+
+`\v'
+ Vertical tab.
+
+`\xHEX DIGITS'
+ The character represented by the string of hexadecimal digits
+ following the `\x'. As in ANSI C, all following hexadecimal
+ digits are considered part of the escape sequence. E.g., `"\x1B"'
+ is a string containing the ASCII ESC (escape) character. (The `\x'
+ escape sequence is not in POSIX `awk'.)
+
+`\DDD'
+ The character represented by the one, two, or three digit sequence
+ of octal digits. Thus, `"\033"' is also a string containing the
+ ASCII ESC (escape) character.
+
+`\C'
+ The literal character C, if C is not one of the above.
+
+ The escape sequences may also be used inside constant regular
+expressions (e.g., the regexp `/[ \t\f\n\r\v]/' matches whitespace
+characters).
+
+ *Note Escape Sequences::.
+
+
+File: gawk.info, Node: Functions Summary, Next: Historical Features, Prev: Actions Summary, Up: Gawk Summary
+
+User-defined Functions
+======================
+
+ Functions in `awk' are defined as follows:
+
+ function NAME(PARAMETER LIST) { STATEMENTS }
+
+ Actual parameters supplied in the function call are used to
+instantiate the formal parameters declared in the function. Arrays are
+passed by reference, other variables are passed by value.
+
+ If there are fewer arguments passed than there are names in
+PARAMETER-LIST, the extra names are given the null string as their
+value. Extra names have the effect of local variables.
+
+ The open-parenthesis in a function call of a user-defined function
+must immediately follow the function name, without any intervening
+white space. This is to avoid a syntactic ambiguity with the
+concatenation operator.
+
+ The word `func' may be used in place of `function' (but not in POSIX
+`awk').
+
+ Use the `return' statement to return a value from a function.
+
+ *Note User-defined Functions: User-defined.
+
+
+File: gawk.info, Node: Historical Features, Prev: Functions Summary, Up: Gawk Summary
+
+Historical Features
+===================
+
+ There are two features of historical `awk' implementations that
+`gawk' supports.
+
+ First, it is possible to call the `length' built-in function not only
+with no arguments, but even without parentheses!
+
+ a = length
+
+is the same as either of
+
+ a = length()
+ a = length($0)
+
+For example:
+
+ $ echo abcdef | awk '{ print length }'
+ -| 6
+
+This feature is marked as "deprecated" in the POSIX standard, and
+`gawk' will issue a warning about its use if `--lint' is specified on
+the command line. (The ability to use `length' this way was actually
+an accident of the original Unix `awk' implementation. If any built-in
+function used `$0' as its default argument, it was possible to call
+that function without the parentheses. In particular, it was common
+practice to use the `length' function in this fashion, and this usage
+was documented in the `awk' manual page.)
+
+ The other historical feature is the use of either the `break'
+statement, or the `continue' statement outside the body of a `while',
+`for', or `do' loop. Traditional `awk' implementations have treated
+such usage as equivalent to the `next' statement. More recent versions
+of Unix `awk' do not allow it. `gawk' supports this usage if
+`--traditional' has been specified.
+
+ *Note Command Line Options: Options, for more information about the
+`--posix' and `--lint' options.
+
+
+File: gawk.info, Node: Installation, Next: Notes, Prev: Gawk Summary, Up: Top
+
+Installing `gawk'
+*****************
+
+ This appendix provides instructions for installing `gawk' on the
+various platforms that are supported by the developers. The primary
+developers support Unix (and one day, GNU), while the other ports were
+contributed. The file `ACKNOWLEDGMENT' in the `gawk' distribution
+lists the electronic mail addresses of the people who did the
+respective ports, and they are also provided in *Note Reporting
+Problems and Bugs: Bugs.
+
+* Menu:
+
+* Gawk Distribution:: What is in the `gawk' distribution.
+* Unix Installation:: Installing `gawk' under various versions
+ of Unix.
+* VMS Installation:: Installing `gawk' on VMS.
+* PC Installation:: Installing and Compiling `gawk' on MS-DOS
+ and OS/2
+* Atari Installation:: Installing `gawk' on the Atari ST.
+* Amiga Installation:: Installing `gawk' on an Amiga.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available `awk'
+ implementations.
+
+
+File: gawk.info, Node: Gawk Distribution, Next: Unix Installation, Prev: Installation, Up: Installation
+
+The `gawk' Distribution
+=======================
+
+ This section first describes how to get the `gawk' distribution, how
+to extract it, and then what is in the various files and subdirectories.
+
+* Menu:
+
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+
+
+File: gawk.info, Node: Getting, Next: Extracting, Prev: Gawk Distribution, Up: Gawk Distribution
+
+Getting the `gawk' Distribution
+-------------------------------
+
+ There are three ways you can get GNU software.
+
+ 1. You can copy it from someone else who already has it.
+
+ 2. You can order `gawk' directly from the Free Software Foundation.
+ Software distributions are available for Unix, MS-DOS, and VMS, on
+ tape, CD-ROM, or floppies (MS-DOS only). The address is:
+
+ Free Software Foundation
+ 59 Temple Place--Suite 330
+ Boston, MA 02111-1307 USA
+ Phone: +1-617-542-5942
+ Fax (including Japan): +1-617-542-2652
+ E-mail: `gnu@prep.ai.mit.edu'
+
+ Ordering from the FSF directly contributes to the support of the
+ foundation and to the production of more free software.
+
+ 3. You can get `gawk' by using anonymous `ftp' to the Internet host
+ `ftp.gnu.ai.mit.edu', in the directory `/pub/gnu'.
+
+ Here is a list of alternate `ftp' sites from which you can obtain
+ GNU software. When a site is listed as "SITE`:'DIRECTORY" the
+ DIRECTORY indicates the directory where GNU software is kept. You
+ should use a site that is geographically close to you.
+
+ Asia:
+
+ `cair-archive.kaist.ac.kr:/pub/gnu'
+ `ftp.cs.titech.ac.jp'
+ `ftp.nectec.or.th:/pub/mirrors/gnu'
+ `utsun.s.u-tokyo.ac.jp:/ftpsync/prep'
+
+ Australia:
+
+ `archie.au:/gnu'
+ (`archie.oz' or `archie.oz.au' for ACSnet)
+
+ Africa:
+
+ `ftp.sun.ac.za:/pub/gnu'
+
+ Middle East:
+
+ `ftp.technion.ac.il:/pub/unsupported/gnu'
+
+ Europe:
+
+ `archive.eu.net'
+ `ftp.denet.dk'
+ `ftp.eunet.ch'
+ `ftp.funet.fi:/pub/gnu'
+ `ftp.ieunet.ie:pub/gnu'
+ `ftp.informatik.rwth-aachen.de:/pub/gnu'
+ `ftp.informatik.tu-muenchen.de'
+ `ftp.luth.se:/pub/unix/gnu'
+ `ftp.mcc.ac.uk'
+ `ftp.stacken.kth.se'
+ `ftp.sunet.se:/pub/gnu'
+ `ftp.univ-lyon1.fr:pub/gnu'
+ `ftp.win.tue.nl:/pub/gnu'
+ `irisa.irisa.fr:/pub/gnu'
+ `isy.liu.se'
+ `nic.switch.ch:/mirror/gnu'
+ `src.doc.ic.ac.uk:/gnu'
+ `unix.hensa.ac.uk:/pub/uunet/systems/gnu'
+
+ South America:
+
+ `ftp.inf.utfsm.cl:/pub/gnu'
+ `ftp.unicamp.br:/pub/gnu'
+
+ Western Canada:
+
+ `ftp.cs.ubc.ca:/mirror2/gnu'
+
+ USA:
+
+ `col.hp.com:/mirrors/gnu'
+ `f.ms.uky.edu:/pub3/gnu'
+ `ftp.cc.gatech.edu:/pub/gnu'
+ `ftp.cs.columbia.edu:/archives/gnu/prep'
+ `ftp.digex.net:/pub/gnu'
+ `ftp.hawaii.edu:/mirrors/gnu'
+ `ftp.kpc.com:/pub/mirror/gnu'
+
+ USA (continued):
+ `ftp.uu.net:/systems/gnu'
+ `gatekeeper.dec.com:/pub/GNU'
+ `jaguar.utah.edu:/gnustuff'
+ `labrea.stanford.edu'
+ `mrcnext.cso.uiuc.edu:/pub/gnu'
+ `vixen.cso.uiuc.edu:/gnu'
+ `wuarchive.wustl.edu:/systems/gnu'
+
+
+File: gawk.info, Node: Extracting, Next: Distribution contents, Prev: Getting, Up: Gawk Distribution
+
+Extracting the Distribution
+---------------------------
+
+ `gawk' is distributed as a `tar' file compressed with the GNU Zip
+program, `gzip'.
+
+ Once you have the distribution (for example, `gawk-3.0.1.tar.gz'),
+first use `gzip' to expand the file, and then use `tar' to extract it.
+You can use the following pipeline to produce the `gawk' distribution:
+
+ # Under System V, add 'o' to the tar flags
+ gzip -d -c gawk-3.0.1.tar.gz | tar -xvpf -
+
+This will create a directory named `gawk-3.0.1' in the current
+directory.
+
+ The distribution file name is of the form `gawk-V.R.N.tar.gz'. The
+V represents the major version of `gawk', the R represents the current
+release of version V, and the N represents a "patch level", meaning
+that minor bugs have been fixed in the release. The current patch
+level is 0, but when retrieving distributions, you should get the
+version with the highest version, release, and patch level. (Note that
+release levels greater than or equal to 90 denote "beta," or
+non-production software; you may not wish to retrieve such a version
+unless you don't mind experimenting.)
+
+ If you are not on a Unix system, you will need to make other
+arrangements for getting and extracting the `gawk' distribution. You
+should consult a local expert.
+
+
+File: gawk.info, Node: Distribution contents, Prev: Extracting, Up: Gawk Distribution
+
+Contents of the `gawk' Distribution
+-----------------------------------
+
+ The `gawk' distribution has a number of C source files,
+documentation files, subdirectories and files related to the
+configuration process (*note Compiling and Installing `gawk' on Unix:
+Unix Installation.), and several subdirectories related to different,
+non-Unix, operating systems.
+
+various `.c', `.y', and `.h' files
+ These files are the actual `gawk' source code.
+
+`README'
+`README_d/README.*'
+ Descriptive files: `README' for `gawk' under Unix, and the rest
+ for the various hardware and software combinations.
+
+`INSTALL'
+ A file providing an overview of the configuration and installation
+ process.
+
+`PORTS'
+ A list of systems to which `gawk' has been ported, and which have
+ successfully run the test suite.
+
+`ACKNOWLEDGMENT'
+ A list of the people who contributed major parts of the code or
+ documentation.
+
+`ChangeLog'
+ A detailed list of source code changes as bugs are fixed or
+ improvements made.
+
+`NEWS'
+ A list of changes to `gawk' since the last release or patch.
+
+`COPYING'
+ The GNU General Public License.
+
+`FUTURES'
+ A brief list of features and/or changes being contemplated for
+ future releases, with some indication of the time frame for the
+ feature, based on its difficulty.
+
+`LIMITATIONS'
+ A list of those factors that limit `gawk''s performance. Most of
+ these depend on the hardware or operating system software, and are
+ not limits in `gawk' itself.
+
+`POSIX.STD'
+ A description of one area where the POSIX standard for `awk' is
+ incorrect, and how `gawk' handles the problem.
+
+`PROBLEMS'
+ A file describing known problems with the current release.
+
+`doc/awkforai.txt'
+ A short article describing why `gawk' is a good language for AI
+ (Artificial Intelligence) programming.
+
+`doc/README.card'
+`doc/ad.block'
+`doc/awkcard.in'
+`doc/cardfonts'
+`doc/colors'
+`doc/macros'
+`doc/no.colors'
+`doc/setter.outline'
+ The `troff' source for a five-color `awk' reference card. A
+ modern version of `troff', such as GNU Troff (`groff') is needed
+ to produce the color version. See the file `README.card' for
+ instructions if you have an older `troff'.
+
+`doc/gawk.1'
+ The `troff' source for a manual page describing `gawk'. This is
+ distributed for the convenience of Unix users.
+
+`doc/gawk.texi'
+ The Texinfo source file for this Info file. It should be
+ processed with TeX to produce a printed document, and with
+ `makeinfo' to produce an Info file.
+
+`doc/gawk.info'
+ The generated Info file for this Info file.
+
+`doc/igawk.1'
+ The `troff' source for a manual page describing the `igawk'
+ program presented in *Note An Easy Way to Use Library Functions:
+ Igawk Program.
+
+`doc/Makefile.in'
+ The input file used during the configuration process to generate
+ the actual `Makefile' for creating the documentation.
+
+`Makefile.in'
+`acconfig.h'
+`aclocal.m4'
+`configh.in'
+`configure.in'
+`configure'
+`custom.h'
+`missing/*'
+ These files and subdirectory are used when configuring `gawk' for
+ various Unix systems. They are explained in detail in *Note
+ Compiling and Installing `gawk' on Unix: Unix Installation.
+
+`awklib/extract.awk'
+`awklib/Makefile.in'
+ The `awklib' directory contains a copy of `extract.awk' (*note
+ Extracting Programs from Texinfo Source Files: Extract Program.),
+ which can be used to extract the sample programs from the Texinfo
+ source file for this Info file, and a `Makefile.in' file, which
+ `configure' uses to generate a `Makefile'. As part of the process
+ of building `gawk', the library functions from *Note A Library of
+ `awk' Functions: Library Functions, and the `igawk' program from
+ *Note An Easy Way to Use Library Functions: Igawk Program, are
+ extracted into ready to use files. They are installed as part of
+ the installation process.
+
+`amiga/*'
+ Files needed for building `gawk' on an Amiga. *Note Installing
+ `gawk' on an Amiga: Amiga Installation, for details.
+
+`atari/*'
+ Files needed for building `gawk' on an Atari ST. *Note Installing
+ `gawk' on the Atari ST: Atari Installation, for details.
+
+`pc/*'
+ Files needed for building `gawk' under MS-DOS and OS/2. *Note
+ MS-DOS and OS/2 Installation and Compilation: PC Installation, for
+ details.
+
+`vms/*'
+ Files needed for building `gawk' under VMS. *Note How to Compile
+ and Install `gawk' on VMS: VMS Installation, for details.
+
+`test/*'
+ A test suite for `gawk'. You can use `make check' from the top
+ level `gawk' directory to run your version of `gawk' against the
+ test suite. If `gawk' successfully passes `make check' then you
+ can be confident of a successful port.
+
+
+File: gawk.info, Node: Unix Installation, Next: VMS Installation, Prev: Gawk Distribution, Up: Installation
+
+Compiling and Installing `gawk' on Unix
+=======================================
+
+ Usually, you can compile and install `gawk' by typing only two
+commands. However, if you do use an unusual system, you may need to
+configure `gawk' for your system yourself.
+
+* Menu:
+
+* Quick Installation:: Compiling `gawk' under Unix.
+* Configuration Philosophy:: How it's all supposed to work.
+
+
+File: gawk.info, Node: Quick Installation, Next: Configuration Philosophy, Prev: Unix Installation, Up: Unix Installation
+
+Compiling `gawk' for Unix
+-------------------------
+
+ After you have extracted the `gawk' distribution, `cd' to
+`gawk-3.0.1'. Like most GNU software, `gawk' is configured
+automatically for your Unix system by running the `configure' program.
+This program is a Bourne shell script that was generated automatically
+using GNU `autoconf'. (The `autoconf' software is described fully
+starting with *Note Introduction: (autoconf)Top.)
+
+ To configure `gawk', simply run `configure':
+
+ sh ./configure
+
+ This produces a `Makefile' and `config.h' tailored to your system.
+The `config.h' file describes various facts about your system. You may
+wish to edit the `Makefile' to change the `CFLAGS' variable, which
+controls the command line options that are passed to the C compiler
+(such as optimization levels, or compiling for debugging).
+
+ Alternatively, you can add your own values for most `make'
+variables, such as `CC' and `CFLAGS', on the command line when running
+`configure':
+
+ CC=cc CFLAGS=-g sh ./configure
+
+See the file `INSTALL' in the `gawk' distribution for all the details.
+
+ After you have run `configure', and possibly edited the `Makefile',
+type:
+
+ make
+
+and shortly thereafter, you should have an executable version of `gawk'.
+That's all there is to it! (If these steps do not work, please send in
+a bug report; *note Reporting Problems and Bugs: Bugs..)
+
+
+File: gawk.info, Node: Configuration Philosophy, Prev: Quick Installation, Up: Unix Installation
+
+The Configuration Process
+-------------------------
+
+ (This section is of interest only if you know something about using
+the C language and the Unix operating system.)
+
+ The source code for `gawk' generally attempts to adhere to formal
+standards wherever possible. This means that `gawk' uses library
+routines that are specified by the ANSI C standard and by the POSIX
+operating system interface standard. When using an ANSI C compiler,
+function prototypes are used to help improve the compile-time checking.
+
+ Many Unix systems do not support all of either the ANSI or the POSIX
+standards. The `missing' subdirectory in the `gawk' distribution
+contains replacement versions of those subroutines that are most likely
+to be missing.
+
+ The `config.h' file that is created by the `configure' program
+contains definitions that describe features of the particular operating
+system where you are attempting to compile `gawk'. The three things
+described by this file are what header files are available, so that
+they can be correctly included, what (supposedly) standard functions
+are actually available in your C libraries, and other miscellaneous
+facts about your variant of Unix. For example, there may not be an
+`st_blksize' element in the `stat' structure. In this case
+`HAVE_ST_BLKSIZE' would be undefined.
+
+ It is possible for your C compiler to lie to `configure'. It may do
+so by not exiting with an error when a library function is not
+available. To get around this, you can edit the file `custom.h'. Use
+an `#ifdef' that is appropriate for your system, and either `#define'
+any constants that `configure' should have defined but didn't, or
+`#undef' any constants that `configure' defined and should not have.
+`custom.h' is automatically included by `config.h'.
+
+ It is also possible that the `configure' program generated by
+`autoconf' will not work on your system in some other fashion. If you
+do have a problem, the file `configure.in' is the input for `autoconf'.
+You may be able to change this file, and generate a new version of
+`configure' that will work on your system. *Note Reporting Problems
+and Bugs: Bugs, for information on how to report problems in
+configuring `gawk'. The same mechanism may be used to send in updates
+to `configure.in' and/or `custom.h'.
+
+
+File: gawk.info, Node: VMS Installation, Next: PC Installation, Prev: Unix Installation, Up: Installation
+
+How to Compile and Install `gawk' on VMS
+========================================
+
+ This section describes how to compile and install `gawk' under VMS.
+
+* Menu:
+
+* VMS Compilation:: How to compile `gawk' under VMS.
+* VMS Installation Details:: How to install `gawk' under VMS.
+* VMS Running:: How to run `gawk' under VMS.
+* VMS POSIX:: Alternate instructions for VMS POSIX.
+
+
+File: gawk.info, Node: VMS Compilation, Next: VMS Installation Details, Prev: VMS Installation, Up: VMS Installation
+
+Compiling `gawk' on VMS
+-----------------------
+
+ To compile `gawk' under VMS, there is a `DCL' command procedure that
+will issue all the necessary `CC' and `LINK' commands, and there is
+also a `Makefile' for use with the `MMS' utility. From the source
+directory, use either
+
+ $ @[.VMS]VMSBUILD.COM
+
+or
+
+ $ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK
+
+ Depending upon which C compiler you are using, follow one of the sets
+of instructions in this table:
+
+VAX C V3.x
+ Use either `vmsbuild.com' or `descrip.mms' as is. These use
+ `CC/OPTIMIZE=NOLINE', which is essential for Version 3.0.
+
+VAX C V2.x
+ You must have Version 2.3 or 2.4; older ones won't work. Edit
+ either `vmsbuild.com' or `descrip.mms' according to the comments
+ in them. For `vmsbuild.com', this just entails removing two `!'
+ delimiters. Also edit `config.h' (which is a copy of file
+ `[.config]vms-conf.h') and comment out or delete the two lines
+ `#define __STDC__ 0' and `#define VAXC_BUILTINS' near the end.
+
+GNU C
+ Edit `vmsbuild.com' or `descrip.mms'; the changes are different
+ from those for VAX C V2.x, but equally straightforward. No
+ changes to `config.h' should be needed.
+
+DEC C
+ Edit `vmsbuild.com' or `descrip.mms' according to their comments.
+ No changes to `config.h' should be needed.
+
+ `gawk' has been tested under VAX/VMS 5.5-1 using VAX C V3.2, GNU C
+1.40 and 2.3. It should work without modifications for VMS V4.6 and up.
+
+
+File: gawk.info, Node: VMS Installation Details, Next: VMS Running, Prev: VMS Compilation, Up: VMS Installation
+
+Installing `gawk' on VMS
+------------------------
+
+ To install `gawk', all you need is a "foreign" command, which is a
+`DCL' symbol whose value begins with a dollar sign. For example:
+
+ $ GAWK :== $disk1:[gnubin]GAWK
+
+(Substitute the actual location of `gawk.exe' for `$disk1:[gnubin]'.)
+The symbol should be placed in the `login.com' of any user who wishes
+to run `gawk', so that it will be defined every time the user logs on.
+Alternatively, the symbol may be placed in the system-wide
+`sylogin.com' procedure, which will allow all users to run `gawk'.
+
+ Optionally, the help entry can be loaded into a VMS help library:
+
+ $ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP
+
+(You may want to substitute a site-specific help library rather than
+the standard VMS library `HELPLIB'.) After loading the help text,
+
+ $ HELP GAWK
+
+will provide information about both the `gawk' implementation and the
+`awk' programming language.
+
+ The logical name `AWK_LIBRARY' can designate a default location for
+`awk' program files. For the `-f' option, if the specified filename
+has no device or directory path information in it, `gawk' will look in
+the current directory first, then in the directory specified by the
+translation of `AWK_LIBRARY' if the file was not found. If after
+searching in both directories, the file still is not found, then `gawk'
+appends the suffix `.awk' to the filename and the file search will be
+re-tried. If `AWK_LIBRARY' is not defined, that portion of the file
+search will fail benignly.
+
+
+File: gawk.info, Node: VMS Running, Next: VMS POSIX, Prev: VMS Installation Details, Up: VMS Installation
+
+Running `gawk' on VMS
+---------------------
+
+ Command line parsing and quoting conventions are significantly
+different on VMS, so examples in this Info file or from other sources
+often need minor changes. They _are_ minor though, and all `awk'
+programs should run correctly.
+
+ Here are a couple of trivial tests:
+
+ $ gawk -- "BEGIN {print ""Hello, World!""}"
+ $ gawk -"W" version
+ ! could also be -"W version" or "-W version"
+
+Note that upper-case and mixed-case text must be quoted.
+
+ The VMS port of `gawk' includes a `DCL'-style interface in addition
+to the original shell-style interface (see the help entry for details).
+One side-effect of dual command line parsing is that if there is only a
+single parameter (as in the quoted string program above), the command
+becomes ambiguous. To work around this, the normally optional `--'
+flag is required to force Unix style rather than `DCL' parsing. If any
+other dash-type options (or multiple parameters such as data files to be
+processed) are present, there is no ambiguity and `--' can be omitted.
+
+ The default search path when looking for `awk' program files
+specified by the `-f' option is `"SYS$DISK:[],AWK_LIBRARY:"'. The
+logical name `AWKPATH' can be used to override this default. The format
+of `AWKPATH' is a comma-separated list of directory specifications.
+When defining it, the value should be quoted so that it retains a single
+translation, and not a multi-translation `RMS' searchlist.
+
+
+File: gawk.info, Node: VMS POSIX, Prev: VMS Running, Up: VMS Installation
+
+Building and Using `gawk' on VMS POSIX
+--------------------------------------
+
+ Ignore the instructions above, although `vms/gawk.hlp' should still
+be made available in a help library. The source tree should be unpacked
+into a container file subsystem rather than into the ordinary VMS file
+system. Make sure that the two scripts, `configure' and
+`vms/posix-cc.sh', are executable; use `chmod +x' on them if necessary.
+Then execute the following two commands:
+
+ psx> CC=vms/posix-cc.sh configure
+ psx> make CC=c89 gawk
+
+The first command will construct files `config.h' and `Makefile' out of
+templates, using a script to make the C compiler fit `configure''s
+expectations. The second command will compile and link `gawk' using
+the C compiler directly; ignore any warnings from `make' about being
+unable to redefine `CC'. `configure' will take a very long time to
+execute, but at least it provides incremental feedback as it runs.
+
+ This has been tested with VAX/VMS V6.2, VMS POSIX V2.0, and DEC C
+V5.2.
+
+ Once built, `gawk' will work like any other shell utility. Unlike
+the normal VMS port of `gawk', no special command line manipulation is
+needed in the VMS POSIX environment.
+
+
+File: gawk.info, Node: PC Installation, Next: Atari Installation, Prev: VMS Installation, Up: Installation
+
+MS-DOS and OS/2 Installation and Compilation
+============================================
+
+ If you have received a binary distribution prepared by the DOS
+maintainers, then `gawk' and the necessary support files will appear
+under the `gnu' directory, with executables in `gnu/bin', libraries in
+`gnu/lib/awk', and manual pages under `gnu/man'. This is designed for
+easy installation to a `/gnu' directory on your drive, but the files
+can be installed anywhere provided `AWKPATH' is set properly.
+Regardless of the installation directory, the first line of `igawk.cmd'
+and `igawk.bat' (in `gnu/bin') may need to be edited.
+
+ The binary distribution will contain a separate file describing the
+contents. In particular, it may include more than one version of the
+`gawk' executable. OS/2 binary distributions may have a different
+arrangement, but installation is similar.
+
+ The OS/2 and MS-DOS versions of `gawk' search for program files as
+described in *Note The `AWKPATH' Environment Variable: AWKPATH Variable.
+However, semicolons (rather than colons) separate elements in the
+`AWKPATH' variable. If `AWKPATH' is not set or is empty, then the
+default search path is `".;c:/lib/awk;c:/gnu/lib/awk"'.
+
+ An `sh'-like shell (as opposed to `command.com' under MS-DOS or
+`cmd.exe' under OS/2) may be useful for `awk' programming. Ian
+Stewartson has written an excellent shell for MS-DOS and OS/2, and a
+`ksh' clone and GNU Bash are available for OS/2. The file
+`README_d/README.pc' in the `gawk' distribution contains information on
+these shells. Users of Stewartson's shell on DOS should examine its
+documentation on handling of command-lines. In particular, the setting
+for `gawk' in the shell configuration may need to be changed, and the
+`ignoretype' option may also be of interest.
+
+ `gawk' can be compiled for MS-DOS and OS/2 using the GNU development
+tools from DJ Delorie (DJGPP, MS-DOS-only) or Eberhard Mattes (EMX,
+MS-DOS and OS/2). Microsoft C can be used to build 16-bit versions for
+MS-DOS and OS/2. The file `README_d/README.pc' in the `gawk'
+distribution contains additional notes, and `pc/Makefile' contains
+important notes on compilation options.
+
+ To build `gawk', copy the files in the `pc' directory (_except_ for
+`ChangeLog') to the directory with the rest of the `gawk' sources. The
+`Makefile' contains a configuration section with comments, and may need
+to be edited in order to work with your `make' utility.
+
+ The `Makefile' contains a number of targets for building various
+MS-DOS and OS/2 versions. A list of targets will be printed if the
+`make' command is given without a target. As an example, to build `gawk'
+using the DJGPP tools, enter `make djgpp'.
+
+ Using `make' to run the standard tests and to install `gawk'
+requires additional Unix-like tools, including `sh', `sed', and `cp'.
+In order to run the tests, the `test/*.ok' files may need to be
+converted so that they have the usual DOS-style end-of-line markers.
+Most of the tests will work properly with Stewartson's shell along with
+the companion utilities or appropriate GNU utilities. However, some
+editing of `test/Makefile' is required. It is recommended that the file
+`pc/Makefile.tst' be copied to `test/Makefile' as a replacement.
+Details can be found in `README_d/README.pc'.
+
+
+File: gawk.info, Node: Atari Installation, Next: Amiga Installation, Prev: PC Installation, Up: Installation
+
+Installing `gawk' on the Atari ST
+=================================
+
+ There are no substantial differences when installing `gawk' on
+various Atari models. Compiled `gawk' executables do not require a
+large amount of memory with most `awk' programs and should run on all
+Motorola processor based models (called further ST, even if that is not
+exactly right).
+
+ In order to use `gawk', you need to have a shell, either text or
+graphics, that does not map all the characters of a command line to
+upper-case. Maintaining case distinction in option flags is very
+important (*note Command Line Options: Options.). These days this is
+the default, and it may only be a problem for some very old machines.
+If your system does not preserve the case of option flags, you will
+need to upgrade your tools. Support for I/O redirection is necessary
+to make it easy to import `awk' programs from other environments.
+Pipes are nice to have, but not vital.
+
+* Menu:
+
+* Atari Compiling:: Compiling `gawk' on Atari
+* Atari Using:: Running `gawk' on Atari
+
+
+File: gawk.info, Node: Atari Compiling, Next: Atari Using, Prev: Atari Installation, Up: Atari Installation
+
+Compiling `gawk' on the Atari ST
+--------------------------------
+
+ A proper compilation of `gawk' sources when `sizeof(int)' differs
+from `sizeof(void *)' requires an ANSI C compiler. An initial port was
+done with `gcc'. You may actually prefer executables where `int's are
+four bytes wide, but the other variant works as well.
+
+ You may need quite a bit of memory when trying to recompile the
+`gawk' sources, as some source files (`regex.c' in particular) are quite
+big. If you run out of memory compiling such a file, try reducing the
+optimization level for this particular file; this may help.
+
+ With a reasonable shell (Bash will do), and in particular if you run
+Linux, MiNT or a similar operating system, you have a pretty good
+chance that the `configure' utility will succeed. Otherwise sample
+versions of `config.h' and `Makefile.st' are given in the `atari'
+subdirectory and can be edited and copied to the corresponding files in
+the main source directory. Even if `configure' produced something, it
+might be advisable to compare its results with the sample versions and
+possibly make adjustments.
+
+ Some `gawk' source code fragments depend on a preprocessor define
+`atarist'. This basically assumes the TOS environment with `gcc'.
+Modify these sections as appropriate if they are not right for your
+environment. Also see the remarks about `AWKPATH' and `envsep' in
+*Note Running `gawk' on the Atari ST: Atari Using.
+
+ As shipped, the sample `config.h' claims that the `system' function
+is missing from the libraries, which is not true, and an alternative
+implementation of this function is provided in `atari/system.c'.
+Depending upon your particular combination of shell and operating
+system, you may wish to change the file to indicate that `system' is
+available.
+
+
+File: gawk.info, Node: Atari Using, Prev: Atari Compiling, Up: Atari Installation
+
+Running `gawk' on the Atari ST
+------------------------------
+
+ An executable version of `gawk' should be placed, as usual, anywhere
+in your `PATH' where your shell can find it.
+
+ While executing, `gawk' creates a number of temporary files. When
+using `gcc' libraries for TOS, `gawk' looks for either of the
+environment variables `TEMP' or `TMPDIR', in that order. If either one
+is found, its value is assumed to be a directory for temporary files.
+This directory must exist, and if you can spare the memory, it is a
+good idea to put it on a RAM drive. If neither `TEMP' nor `TMPDIR' are
+found, then `gawk' uses the current directory for its temporary files.
+
+ The ST version of `gawk' searches for its program files as described
+in *Note The `AWKPATH' Environment Variable: AWKPATH Variable. The
+default value for the `AWKPATH' variable is taken from `DEFPATH'
+defined in `Makefile'. The sample `gcc'/TOS `Makefile' for the ST in
+the distribution sets `DEFPATH' to `".,c:\lib\awk,c:\gnu\lib\awk"'.
+The search path can be modified by explicitly setting `AWKPATH' to
+whatever you wish. Note that colons cannot be used on the ST to
+separate elements in the `AWKPATH' variable, since they have another,
+reserved, meaning. Instead, you must use a comma to separate elements
+in the path. When recompiling, the separating character can be
+modified by initializing the `envsep' variable in `atari/gawkmisc.atr'
+to another value.
+
+ Although `awk' allows great flexibility in doing I/O redirections
+from within a program, this facility should be used with care on the ST
+running under TOS. In some circumstances the OS routines for file
+handle pool processing lose track of certain events, causing the
+computer to crash, and requiring a reboot. Often a warm reboot is
+sufficient. Fortunately, this happens infrequently, and in rather
+esoteric situations. In particular, avoid having one part of an `awk'
+program using `print' statements explicitly redirected to
+`"/dev/stdout"', while other `print' statements use the default
+standard output, and a calling shell has redirected standard output to
+a file.
+
+ When `gawk' is compiled with the ST version of `gcc' and its usual
+libraries, it will accept both `/' and `\' as path separators. While
+this is convenient, it should be remembered that this removes one,
+technically valid, character (`/') from your file names, and that it
+may create problems for external programs, called via the `system'
+function, which may not support this convention. Whenever it is
+possible that a file created by `gawk' will be used by some other
+program, use only backslashes. Also remember that in `awk',
+backslashes in strings have to be doubled in order to get literal
+backslashes (*note Escape Sequences::).
+
+
+File: gawk.info, Node: Amiga Installation, Next: Bugs, Prev: Atari Installation, Up: Installation
+
+Installing `gawk' on an Amiga
+=============================
+
+ You can install `gawk' on an Amiga system using a Unix emulation
+environment available via anonymous `ftp' from `wuarchive.wustl.edu' in
+the directory `pub/aminet/dev/gcc'. This includes a shell based on
+`pdksh'. The primary component of this environment is a Unix emulation
+library, `ixemul.lib'.
+
+ A more complete distribution for the Amiga is available on the
+FreshFish CD-ROM from:
+
+ CRONUS
+ 1840 E. Warner Road #105-265
+ Tempe, AZ 85284 USA
+ US Toll Free: (800) 804-0833
+ Phone: +1-602-491-0442
+ FAX: +1-602-491-0048
+ Email: `info@ninemoons.com'
+ WWW: `http://www.ninemoons.com'
+ Anonymous `ftp' site: `ftp.ninemoons.com'
+
+ Once you have the distribution, you can configure `gawk' simply by
+running `configure':
+
+ configure -v m68k-cbm-amigados
+
+ Then run `make', and you should be all set! (If these steps do not
+work, please send in a bug report; *note Reporting Problems and Bugs:
+Bugs..)
+
+
+File: gawk.info, Node: Bugs, Next: Other Versions, Prev: Amiga Installation, Up: Installation
+
+Reporting Problems and Bugs
+===========================
+
+ If you have problems with `gawk' or think that you have found a bug,
+please report it to the developers; we cannot promise to do anything
+but we might well want to fix it.
+
+ Before reporting a bug, make sure you have actually found a real bug.
+Carefully reread the documentation and see if it really says you can do
+what you're trying to do. If it's not clear whether you should be able
+to do something or not, report that too; it's a bug in the
+documentation!
+
+ Before reporting a bug or trying to fix it yourself, try to isolate
+it to the smallest possible `awk' program and input data file that
+reproduces the problem. Then send us the program and data file, some
+idea of what kind of Unix system you're using, and the exact results
+`gawk' gave you. Also say what you expected to occur; this will help
+us decide whether the problem was really in the documentation.
+
+ Once you have a precise problem, there are two e-mail addresses you
+can send mail to.
+
+Internet:
+ `bug-gnu-utils@prep.ai.mit.edu'
+
+UUCP:
+ `uunet!prep.ai.mit.edu!bug-gnu-utils'
+
+ Please include the version number of `gawk' you are using. You can
+get this information with the command `gawk --version'. You should
+send a carbon copy of your mail to Arnold Robbins, who can be reached
+at `arnold@gnu.ai.mit.edu'.
+
+ *Important!* Do _not_ try to report bugs in `gawk' by posting to the
+Usenet/Internet newsgroup `comp.lang.awk'. While the `gawk' developers
+do occasionally read this newsgroup, there is no guarantee that we will
+see your posting. The steps described above are the official,
+recognized ways for reporting bugs.
+
+ Non-bug suggestions are always welcome as well. If you have
+questions about things that are unclear in the documentation or are
+just obscure features, ask Arnold Robbins; he will try to help you out,
+although he may not have the time to fix the problem. You can send him
+electronic mail at the Internet address above.
+
+ If you find bugs in one of the non-Unix ports of `gawk', please send
+an electronic mail message to the person who maintains that port. They
+are listed below, and also in the `README' file in the `gawk'
+distribution. Information in the `README' file should be considered
+authoritative if it conflicts with this Info file.
+
+ The people maintaining the non-Unix ports of `gawk' are:
+
+MS-DOS
+ Scott Deifik, `scottd@amgen.com', and Darrel Hankerson,
+ `hankedr@mail.auburn.edu'.
+
+OS/2
+ Kai Uwe Rommel, `rommel@ars.de'.
+
+VMS
+ Pat Rankin, `rankin@eql.caltech.edu'.
+
+Atari ST
+ Michal Jaegermann, `michal@gortel.phys.ualberta.ca'.
+
+Amiga
+ Fred Fish, `fnf@ninemoons.com'.
+
+ If your bug is also reproducible under Unix, please send copies of
+your report to the general GNU bug list, as well as to Arnold Robbins,
+at the addresses listed above.
+
+
+File: gawk.info, Node: Other Versions, Prev: Bugs, Up: Installation
+
+Other Freely Available `awk' Implementations
+============================================
+
+ It's kind of fun to put comments like this in your awk code.
+ `// Do C++ comments work? answer: yes! of course'
+ Michael Brennan
+
+ There are two other freely available `awk' implementations. This
+section briefly describes where to get them.
+
+Unix `awk'
+ Brian Kernighan has been able to make his implementation of `awk'
+ freely available. You can get it via anonymous `ftp' to the host
+ `netlib.att.com'. Change directory to `/netlib/research'. Use
+ "binary" or "image" mode, and retrieve `awk.bundle.Z'.
+
+ This is a shell archive that has been compressed with the
+ `compress' utility. It can be uncompressed with either
+ `uncompress' or the GNU `gunzip' utility.
+
+ This version requires an ANSI C compiler; GCC (the GNU C compiler)
+ works quite nicely.
+
+`mawk'
+ Michael Brennan has written an independent implementation of `awk',
+ called `mawk'. It is available under the GPL (*note GNU GENERAL
+ PUBLIC LICENSE: Copying.), just as `gawk' is.
+
+ You can get it via anonymous `ftp' to the host `ftp.whidbey.net'.
+ Change directory to `/pub/brennan'. Use "binary" or "image" mode,
+ and retrieve `mawk1.3.3.tar.gz' (or the latest version that is
+ there).
+
+ `gunzip' may be used to decompress this file. Installation is
+ similar to `gawk''s (*note Compiling and Installing `gawk' on
+ Unix: Unix Installation.).
+
+
+File: gawk.info, Node: Notes, Next: Glossary, Prev: Installation, Up: Top
+
+Implementation Notes
+********************
+
+ This appendix contains information mainly of interest to
+implementors and maintainers of `gawk'. Everything in it applies
+specifically to `gawk', and not to other implementations.
+
+* Menu:
+
+* Compatibility Mode:: How to disable certain `gawk' extensions.
+* Additions:: Making Additions To `gawk'.
+* Future Extensions:: New features that may be implemented one day.
+* Improvements:: Suggestions for improvements by volunteers.
+
+
+File: gawk.info, Node: Compatibility Mode, Next: Additions, Prev: Notes, Up: Notes
+
+Downward Compatibility and Debugging
+====================================
+
+ *Note Extensions in `gawk' Not in POSIX `awk': POSIX/GNU, for a
+summary of the GNU extensions to the `awk' language and program. All
+of these features can be turned off by invoking `gawk' with the
+`--traditional' option, or with the `--posix' option.
+
+ If `gawk' is compiled for debugging with `-DDEBUG', then there is
+one more option available on the command line:
+
+`-W parsedebug'
+`--parsedebug'
+ Print out the parse stack information as the program is being
+ parsed.
+
+ This option is intended only for serious `gawk' developers, and not
+for the casual user. It probably has not even been compiled into your
+version of `gawk', since it slows down execution.
+
+
+File: gawk.info, Node: Additions, Next: Future Extensions, Prev: Compatibility Mode, Up: Notes
+
+Making Additions to `gawk'
+==========================
+
+ If you should find that you wish to enhance `gawk' in a significant
+fashion, you are perfectly free to do so. That is the point of having
+free software; the source code is available, and you are free to change
+it as you wish (*note GNU GENERAL PUBLIC LICENSE: Copying.).
+
+ This section discusses the ways you might wish to change `gawk', and
+any considerations you should bear in mind.
+
+* Menu:
+
+* Adding Code:: Adding code to the main body of `gawk'.
+* New Ports:: Porting `gawk' to a new operating system.
+
+
+File: gawk.info, Node: Adding Code, Next: New Ports, Prev: Additions, Up: Additions
+
+Adding New Features
+-------------------
+
+ You are free to add any new features you like to `gawk'. However,
+if you want your changes to be incorporated into the `gawk'
+distribution, there are several steps that you need to take in order to
+make it possible for me to include to your changes.
+
+ 1. Get the latest version. It is much easier for me to integrate
+ changes if they are relative to the most recent distributed
+ version of `gawk'. If your version of `gawk' is very old, I may
+ not be able to integrate them at all. *Note Getting the `gawk'
+ Distribution: Getting, for information on getting the latest
+ version of `gawk'.
+
+ 2. See *note (Version)Top:: standards, GNU Coding Standards. This
+ document describes how GNU software should be written. If you
+ haven't read it, please do so, preferably _before_ starting to
+ modify `gawk'. (The `GNU Coding Standards' are available as part
+ of the Autoconf distribution, from the FSF.)
+
+ 3. Use the `gawk' coding style. The C code for `gawk' follows the
+ instructions in the `GNU Coding Standards', with minor exceptions.
+ The code is formatted using the traditional "K&R" style,
+ particularly as regards the placement of braces and the use of
+ tabs. In brief, the coding rules for `gawk' are:
+
+ * Use old style (non-prototype) function headers when defining
+ functions.
+
+ * Put the name of the function at the beginning of its own line.
+
+ * Put the return type of the function, even if it is `int', on
+ the line above the line with the name and arguments of the
+ function.
+
+ * The declarations for the function arguments should not be
+ indented.
+
+ * Put spaces around parentheses used in control structures
+ (`if', `while', `for', `do', `switch' and `return').
+
+ * Do not put spaces in front of parentheses used in function
+ calls.
+
+ * Put spaces around all C operators, and after commas in
+ function calls.
+
+ * Do not use the comma operator to produce multiple
+ side-effects, except in `for' loop initialization and
+ increment parts, and in macro bodies.
+
+ * Use real tabs for indenting, not spaces.
+
+ * Use the "K&R" brace layout style.
+
+ * Use comparisons against `NULL' and `'\0'' in the conditions of
+ `if', `while' and `for' statements, and in the `case's of
+ `switch' statements, instead of just the plain pointer or
+ character value.
+
+ * Use the `TRUE', `FALSE', and `NULL' symbolic constants, and
+ the character constant `'\0'' where appropriate, instead of
+ `1' and `0'.
+
+ * Provide one-line descriptive comments for each function.
+
+ * Do not use `#elif'. Many older Unix C compilers cannot handle
+ it.
+
+ * Do not use the `alloca' function for allocating memory off
+ the stack. Its use causes more portability trouble than the
+ minor benefit of not having to free the storage. Instead, use
+ `malloc' and `free'.
+
+ If I have to reformat your code to follow the coding style used in
+ `gawk', I may not bother.
+
+ 4. Be prepared to sign the appropriate paperwork. In order for the
+ FSF to distribute your changes, you must either place those
+ changes in the public domain, and submit a signed statement to that
+ effect, or assign the copyright in your changes to the FSF. Both
+ of these actions are easy to do, and _many_ people have done so
+ already. If you have questions, please contact me (*note Reporting
+ Problems and Bugs: Bugs.), or `gnu@prep.ai.mit.edu'.
+
+ 5. Update the documentation. Along with your new code, please supply
+ new sections and or chapters for this Info file. If at all
+ possible, please use real Texinfo, instead of just supplying
+ unformatted ASCII text (although even that is better than no
+ documentation at all). Conventions to be followed in `The GNU Awk
+ User's Guide' are provided after the `@bye' at the end of the
+ Texinfo source file. If possible, please update the man page as
+ well.
+
+ You will also have to sign paperwork for your documentation
+ changes.
+
+ 6. Submit changes as context diffs or unified diffs. Use `diff -c -r
+ -N' or `diff -u -r -N' to compare the original `gawk' source tree
+ with your version. (I find context diffs to be more readable, but
+ unified diffs are more compact.) I recommend using the GNU
+ version of `diff'. Send the output produced by either run of
+ `diff' to me when you submit your changes. *Note Reporting
+ Problems and Bugs: Bugs, for the electronic mail information.
+
+ Using this format makes it easy for me to apply your changes to the
+ master version of the `gawk' source code (using `patch'). If I
+ have to apply the changes manually, using a text editor, I may not
+ do so, particularly if there are lots of changes.
+
+ Although this sounds like a lot of work, please remember that while
+you may write the new code, I have to maintain it and support it, and
+if it isn't possible for me to do that with a minimum of extra work,
+then I probably will not.
+
+
+File: gawk.info, Node: New Ports, Prev: Adding Code, Up: Additions
+
+Porting `gawk' to a New Operating System
+----------------------------------------
+
+ If you wish to port `gawk' to a new operating system, there are
+several steps to follow.
+
+ 1. Follow the guidelines in *Note Adding New Features: Adding Code,
+ concerning coding style, submission of diffs, and so on.
+
+ 2. When doing a port, bear in mind that your code must co-exist
+ peacefully with the rest of `gawk', and the other ports. Avoid
+ gratuitous changes to the system-independent parts of the code. If
+ at all possible, avoid sprinkling `#ifdef's just for your port
+ throughout the code.
+
+ If the changes needed for a particular system affect too much of
+ the code, I probably will not accept them. In such a case, you
+ will, of course, be able to distribute your changes on your own,
+ as long as you comply with the GPL (*note GNU GENERAL PUBLIC
+ LICENSE: Copying.).
+
+ 3. A number of the files that come with `gawk' are maintained by other
+ people at the Free Software Foundation. Thus, you should not
+ change them unless it is for a very good reason. I.e. changes are
+ not out of the question, but changes to these files will be
+ scrutinized extra carefully. The files are `alloca.c',
+ `getopt.h', `getopt.c', `getopt1.c', `regex.h', `regex.c', `dfa.h',
+ `dfa.c', `install-sh', and `mkinstalldirs'.
+
+ 4. Be willing to continue to maintain the port. Non-Unix operating
+ systems are supported by volunteers who maintain the code needed
+ to compile and run `gawk' on their systems. If no-one volunteers
+ to maintain a port, that port becomes unsupported, and it may be
+ necessary to remove it from the distribution.
+
+ 5. Supply an appropriate `gawkmisc.???' file. Each port has its own
+ `gawkmisc.???' that implements certain operating system specific
+ functions. This is cleaner than a plethora of `#ifdef's scattered
+ throughout the code. The `gawkmisc.c' in the main source
+ directory includes the appropriate `gawkmisc.???' file from each
+ subdirectory. Be sure to update it as well.
+
+ Each port's `gawkmisc.???' file has a suffix reminiscent of the
+ machine or operating system for the port. For example,
+ `pc/gawkmisc.pc' and `vms/gawkmisc.vms'. The use of separate
+ suffixes, instead of plain `gawkmisc.c', makes it possible to move
+ files from a port's subdirectory into the main subdirectory,
+ without accidentally destroying the real `gawkmisc.c' file.
+ (Currently, this is only an issue for the MS-DOS and OS/2 ports.)
+
+ 6. Supply a `Makefile' and any other C source and header files that
+ are necessary for your operating system. All your code should be
+ in a separate subdirectory, with a name that is the same as, or
+ reminiscent of, either your operating system or the computer
+ system. If possible, try to structure things so that it is not
+ necessary to move files out of the subdirectory into the main
+ source directory. If that is not possible, then be sure to avoid
+ using names for your files that duplicate the names of files in
+ the main source directory.
+
+ 7. Update the documentation. Please write a section (or sections)
+ for this Info file describing the installation and compilation
+ steps needed to install and/or compile `gawk' for your system.
+
+ 8. Be prepared to sign the appropriate paperwork. In order for the
+ FSF to distribute your code, you must either place your code in
+ the public domain, and submit a signed statement to that effect,
+ or assign the copyright in your code to the FSF. Both of these
+ actions are easy to do, and _many_ people have done so already. If
+ you have questions, please contact me, or `gnu@prep.ai.mit.edu'.
+
+ Following these steps will make it much easier to integrate your
+changes into `gawk', and have them co-exist happily with the code for
+other operating systems that is already there.
+
+ In the code that you supply, and that you maintain, feel free to use
+a coding style and brace layout that suits your taste.
+
+
+File: gawk.info, Node: Future Extensions, Next: Improvements, Prev: Additions, Up: Notes
+
+Probable Future Extensions
+==========================
+
+ AWK is a language similar to PERL, only considerably more elegant.
+ Arnold Robbins
+
+ Hey!
+ Larry Wall
+
+ This section briefly lists extensions and possible improvements that
+indicate the directions we are currently considering for `gawk'. The
+file `FUTURES' in the `gawk' distributions lists these extensions as
+well.
+
+ This is a list of probable future changes that will be usable by the
+`awk' language programmer.
+
+Localization
+ The GNU project is starting to support multiple languages. It
+ will at least be possible to make `gawk' print its warnings and
+ error messages in languages other than English. It may be
+ possible for `awk' programs to also use the multiple language
+ facilities, separate from `gawk' itself.
+
+Databases
+ It may be possible to map a GDBM/NDBM/SDBM file into an `awk'
+ array.
+
+A `PROCINFO' Array
+ The special files that provide process-related information (*note
+ Special File Names in `gawk': Special Files.) may be superseded
+ by a `PROCINFO' array that would provide the same information, in
+ an easier to access fashion.
+
+More `lint' warnings
+ There are more things that could be checked for portability.
+
+Control of subprocess environment
+ Changes made in `gawk' to the array `ENVIRON' may be propagated to
+ subprocesses run by `gawk'.
+
+ This is a list of probable improvements that will make `gawk'
+perform better.
+
+An Improved Version of `dfa'
+ The `dfa' pattern matcher from GNU `grep' has some problems.
+ Either a new version or a fixed one will deal with some important
+ regexp matching issues.
+
+Use of GNU `malloc'
+ The GNU version of `malloc' could potentially speed up `gawk',
+ since it relies heavily on the use of dynamic memory allocation.
+
+Use of the `rx' regexp library
+ The `rx' regular expression library could potentially speed up all
+ regexp operations that require knowing the exact location of
+ matches. This includes record termination, field and array
+ splitting, and the `sub', `gsub', `gensub' and `match' functions.
+
+
+File: gawk.info, Node: Improvements, Prev: Future Extensions, Up: Notes
+
+Suggestions for Improvements
+============================
+
+ Here are some projects that would-be `gawk' hackers might like to
+take on. They vary in size from a few days to a few weeks of
+programming, depending on which one you choose and how fast a
+programmer you are. Please send any improvements you write to the
+maintainers at the GNU project. *Note Adding New Features: Adding Code,
+for guidelines to follow when adding new features to `gawk'. *Note
+Reporting Problems and Bugs: Bugs, for information on contacting the
+maintainers.
+
+ 1. Compilation of `awk' programs: `gawk' uses a Bison (YACC-like)
+ parser to convert the script given it into a syntax tree; the
+ syntax tree is then executed by a simple recursive evaluator.
+ This method incurs a lot of overhead, since the recursive
+ evaluator performs many procedure calls to do even the simplest
+ things.
+
+ It should be possible for `gawk' to convert the script's parse tree
+ into a C program which the user would then compile, using the
+ normal C compiler and a special `gawk' library to provide all the
+ needed functions (regexps, fields, associative arrays, type
+ coercion, and so on).
+
+ An easier possibility might be for an intermediate phase of `awk'
+ to convert the parse tree into a linear byte code form like the
+ one used in GNU Emacs Lisp. The recursive evaluator would then be
+ replaced by a straight line byte code interpreter that would be
+ intermediate in speed between running a compiled program and doing
+ what `gawk' does now.
+
+ 2. The programs in the test suite could use documenting in this
+ Info file.
+
+ 3. See the `FUTURES' file for more ideas. Contact us if you would
+ seriously like to tackle any of the items listed there.
+
+
+File: gawk.info, Node: Glossary, Next: Copying, Prev: Notes, Up: Top
+
+Glossary
+********
+
+Action
+ A series of `awk' statements attached to a rule. If the rule's
+ pattern matches an input record, `awk' executes the rule's action.
+ Actions are always enclosed in curly braces. *Note Overview of
+ Actions: Action Overview.
+
+Amazing `awk' Assembler
+ Henry Spencer at the University of Toronto wrote a retargetable
+ assembler completely as `awk' scripts. It is thousands of lines
+ long, including machine descriptions for several eight-bit
+ microcomputers. It is a good example of a program that would have
+ been better written in another language.
+
+Amazingly Workable Formatter (`awf')
+ Henry Spencer at the University of Toronto wrote a formatter that
+ accepts a large subset of the `nroff -ms' and `nroff -man'
+ formatting commands, using `awk' and `sh'.
+
+ANSI
+ The American National Standards Institute. This organization
+ produces many standards, among them the standards for the C and
+ C++ programming languages.
+
+Assignment
+ An `awk' expression that changes the value of some `awk' variable
+ or data object. An object that you can assign to is called an
+ "lvalue". The assigned values are called "rvalues". *Note
+ Assignment Expressions: Assignment Ops.
+
+`awk' Language
+ The language in which `awk' programs are written.
+
+`awk' Program
+ An `awk' program consists of a series of "patterns" and "actions",
+ collectively known as "rules". For each input record given to the
+ program, the program's rules are all processed in turn. `awk'
+ programs may also contain function definitions.
+
+`awk' Script
+ Another name for an `awk' program.
+
+Bash
+ The GNU version of the standard shell (the Bourne-Again shell).
+ See "Bourne Shell."
+
+BBS
+ See "Bulletin Board System."
+
+Boolean Expression
+ Named after the English mathematician Boole. See "Logical
+ Expression."
+
+Bourne Shell
+ The standard shell (`/bin/sh') on Unix and Unix-like systems,
+ originally written by Steven R. Bourne. Many shells (Bash, `ksh',
+ `pdksh', `zsh') are generally upwardly compatible with the Bourne
+ shell.
+
+Built-in Function
+ The `awk' language provides built-in functions that perform various
+ numerical, time stamp related, and string computations. Examples
+ are `sqrt' (for the square root of a number) and `substr' (for a
+ substring of a string). *Note Built-in Functions: Built-in.
+
+Built-in Variable
+ `ARGC', `ARGIND', `ARGV', `CONVFMT', `ENVIRON', `ERRNO',
+ `FIELDWIDTHS', `FILENAME', `FNR', `FS', `IGNORECASE', `NF', `NR',
+ `OFMT', `OFS', `ORS', `RLENGTH', `RSTART', `RS', `RT', and
+ `SUBSEP', are the variables that have special meaning to `awk'.
+ Changing some of them affects `awk''s running environment.
+ Several of these variables are specific to `gawk'. *Note Built-in
+ Variables::.
+
+Braces
+ See "Curly Braces."
+
+Bulletin Board System
+ A computer system allowing users to log in and read and/or leave
+ messages for other users of the system, much like leaving paper
+ notes on a bulletin board.
+
+C
+ The system programming language that most GNU software is written
+ in. The `awk' programming language has C-like syntax, and this
+ Info file points out similarities between `awk' and C when
+ appropriate.
+
+Character Set
+ The set of numeric codes used by a computer system to represent the
+ characters (letters, numbers, punctuation, etc.) of a particular
+ country or place. The most common character set in use today is
+ ASCII (American Standard Code for Information Interchange). Many
+ European countries use an extension of ASCII known as ISO-8859-1
+ (ISO Latin-1).
+
+CHEM
+ A preprocessor for `pic' that reads descriptions of molecules and
+ produces `pic' input for drawing them. It was written in `awk' by
+ Brian Kernighan and Jon Bentley, and is available from
+ `netlib@research.att.com'.
+
+Compound Statement
+ A series of `awk' statements, enclosed in curly braces. Compound
+ statements may be nested. *Note Control Statements in Actions:
+ Statements.
+
+Concatenation
+ Concatenating two strings means sticking them together, one after
+ another, giving a new string. For example, the string `foo'
+ concatenated with the string `bar' gives the string `foobar'.
+ *Note String Concatenation: Concatenation.
+
+Conditional Expression
+ An expression using the `?:' ternary operator, such as `EXPR1 ?
+ EXPR2 : EXPR3'. The expression EXPR1 is evaluated; if the result
+ is true, the value of the whole expression is the value of EXPR2,
+ otherwise the value is EXPR3. In either case, only one of EXPR2
+ and EXPR3 is evaluated. *Note Conditional Expressions:
+ Conditional Exp.
+
+Comparison Expression
+ A relation that is either true or false, such as `(a < b)'.
+ Comparison expressions are used in `if', `while', `do', and `for'
+ statements, and in patterns to select which input records to
+ process. *Note Variable Typing and Comparison Expressions: Typing
+ and Comparison.
+
+Curly Braces
+ The characters `{' and `}'. Curly braces are used in `awk' for
+ delimiting actions, compound statements, and function bodies.
+
+Dark Corner
+ An area in the language where specifications often were (or still
+ are) not clear, leading to unexpected or undesirable behavior.
+ Such areas are marked in this Info file with "(d.c.)" in the text,
+ and are indexed under the heading "dark corner."
+
+Data Objects
+ These are numbers and strings of characters. Numbers are
+ converted into strings and vice versa, as needed. *Note
+ Conversion of Strings and Numbers: Conversion.
+
+Double Precision
+ An internal representation of numbers that can have fractional
+ parts. Double precision numbers keep track of more digits than do
+ single precision numbers, but operations on them are more
+ expensive. This is the way `awk' stores numeric values. It is
+ the C type `double'.
+
+Dynamic Regular Expression
+ A dynamic regular expression is a regular expression written as an
+ ordinary expression. It could be a string constant, such as
+ `"foo"', but it may also be an expression whose value can vary.
+ *Note Using Dynamic Regexps: Computed Regexps.
+
+Environment
+ A collection of strings, of the form NAME`='VAL, that each program
+ has available to it. Users generally place values into the
+ environment in order to provide information to various programs.
+ Typical examples are the environment variables `HOME' and `PATH'.
+
+Empty String
+ See "Null String."
+
+Escape Sequences
+ A special sequence of characters used for describing non-printing
+ characters, such as `\n' for newline, or `\033' for the ASCII ESC
+ (escape) character. *Note Escape Sequences::.
+
+Field
+ When `awk' reads an input record, it splits the record into pieces
+ separated by whitespace (or by a separator regexp which you can
+ change by setting the built-in variable `FS'). Such pieces are
+ called fields. If the pieces are of fixed length, you can use the
+ built-in variable `FIELDWIDTHS' to describe their lengths. *Note
+ Specifying How Fields are Separated: Field Separators, and also see
+ *Note Reading Fixed-width Data: Constant Size.
+
+Floating Point Number
+ Often referred to in mathematical terms as a "rational" number,
+ this is just a number that can have a fractional part. See
+ "Double Precision" and "Single Precision."
+
+Format
+ Format strings are used to control the appearance of output in the
+ `printf' statement. Also, data conversions from numbers to strings
+ are controlled by the format string contained in the built-in
+ variable `CONVFMT'. *Note Format-Control Letters: Control Letters.
+
+Function
+ A specialized group of statements used to encapsulate general or
+ program-specific tasks. `awk' has a number of built-in functions,
+ and also allows you to define your own. *Note Built-in Functions:
+ Built-in, and *Note User-defined Functions: User-defined.
+
+FSF
+ See "Free Software Foundation."
+
+Free Software Foundation
+ A non-profit organization dedicated to the production and
+ distribution of freely distributable software. It was founded by
+ Richard M. Stallman, the author of the original Emacs editor. GNU
+ Emacs is the most widely used version of Emacs today.
+
+`gawk'
+ The GNU implementation of `awk'.
+
+General Public License
+ This document describes the terms under which `gawk' and its source
+ code may be distributed. (*note GNU GENERAL PUBLIC LICENSE:
+ Copying.)
+
+GNU
+ "GNU's not Unix". An on-going project of the Free Software
+ Foundation to create a complete, freely distributable,
+ POSIX-compliant computing environment.
+
+GPL
+ See "General Public License."
+
+Hexadecimal
+ Base 16 notation, where the digits are `0'-`9' and `A'-`F', with
+ `A' representing 10, `B' representing 11, and so on up to `F' for
+ 15. Hexadecimal numbers are written in C using a leading `0x', to
+ indicate their base. Thus, `0x12' is 18 (one times 16 plus 2).
+
+I/O
+ Abbreviation for "Input/Output," the act of moving data into and/or
+ out of a running program.
+
+Input Record
+ A single chunk of data read in by `awk'. Usually, an `awk' input
+ record consists of one line of text. *Note How Input is Split
+ into Records: Records.
+
+Integer
+ A whole number, i.e. a number that does not have a fractional part.
+
+Keyword
+ In the `awk' language, a keyword is a word that has special
+ meaning. Keywords are reserved and may not be used as variable
+ names.
+
+ `gawk''s keywords are: `BEGIN', `END', `if', `else', `while',
+ `do...while', `for', `for...in', `break', `continue', `delete',
+ `next', `nextfile', `function', `func', and `exit'.
+
+Logical Expression
+ An expression using the operators for logic, AND, OR, and NOT,
+ written `&&', `||', and `!' in `awk'. Often called Boolean
+ expressions, after the mathematician who pioneered this kind of
+ mathematical logic.
+
+Lvalue
+ An expression that can appear on the left side of an assignment
+ operator. In most languages, lvalues can be variables or array
+ elements. In `awk', a field designator can also be used as an
+ lvalue.
+
+Null String
+ A string with no characters in it. It is represented explicitly in
+ `awk' programs by placing two double-quote characters next to each
+ other (`""'). It can appear in input data by having two successive
+ occurrences of the field separator appear next to each other.
+
+Number
+ A numeric valued data object. The `gawk' implementation uses
+ double precision floating point to represent numbers. Very old
+ `awk' implementations use single precision floating point.
+
+Octal
+ Base-eight notation, where the digits are `0'-`7'. Octal numbers
+ are written in C using a leading `0', to indicate their base.
+ Thus, `013' is 11 (one times 8 plus 3).
+
+Pattern
+ Patterns tell `awk' which input records are interesting to which
+ rules.
+
+ A pattern is an arbitrary conditional expression against which
+ input is tested. If the condition is satisfied, the pattern is
+ said to "match" the input record. A typical pattern might compare
+ the input record against a regular expression. *Note Pattern
+ Elements: Pattern Overview.
+
+POSIX
+ The name for a series of standards being developed by the IEEE
+ that specify a Portable Operating System interface. The "IX"
+ denotes the Unix heritage of these standards. The main standard
+ of interest for `awk' users is `IEEE Standard for Information
+ Technology, Standard 1003.2-1992, Portable Operating System
+ Interface (POSIX) Part 2: Shell and Utilities'. Informally, this
+ standard is often referred to as simply "P1003.2."
+
+Private
+ Variables and/or functions that are meant for use exclusively by
+ library functions, and not for the main `awk' program. Special
+ care must be taken when naming such variables and functions.
+ *Note Naming Library Function Global Variables: Library Names.
+
+Range (of input lines)
+ A sequence of consecutive lines from the input file. A pattern
+ can specify ranges of input lines for `awk' to process, or it can
+ specify single lines. *Note Pattern Elements: Pattern Overview.
+
+Recursion
+ When a function calls itself, either directly or indirectly. If
+ this isn't clear, refer to the entry for "recursion."
+
+Redirection
+ Redirection means performing input from other than the standard
+ input stream, or output to other than the standard output stream.
+
+ You can redirect the output of the `print' and `printf' statements
+ to a file or a system command, using the `>', `>>', and `|'
+ operators. You can redirect input to the `getline' statement using
+ the `<' and `|' operators. *Note Redirecting Output of `print'
+ and `printf': Redirection, and *Note Explicit Input with
+ `getline': Getline.
+
+Regexp
+ Short for "regular expression". A regexp is a pattern that
+ denotes a set of strings, possibly an infinite set. For example,
+ the regexp `R.*xp' matches any string starting with the letter `R'
+ and ending with the letters `xp'. In `awk', regexps are used in
+ patterns and in conditional expressions. Regexps may contain
+ escape sequences. *Note Regular Expressions: Regexp.
+
+Regular Expression
+ See "regexp."
+
+Regular Expression Constant
+ A regular expression constant is a regular expression written
+ within slashes, such as `/foo/'. This regular expression is chosen
+ when you write the `awk' program, and cannot be changed doing its
+ execution. *Note How to Use Regular Expressions: Regexp Usage.
+
+Rule
+ A segment of an `awk' program that specifies how to process single
+ input records. A rule consists of a "pattern" and an "action".
+ `awk' reads an input record; then, for each rule, if the input
+ record satisfies the rule's pattern, `awk' executes the rule's
+ action. Otherwise, the rule does nothing for that input record.
+
+Rvalue
+ A value that can appear on the right side of an assignment
+ operator. In `awk', essentially every expression has a value.
+ These values are rvalues.
+
+`sed'
+ See "Stream Editor."
+
+Short-Circuit
+ The nature of the `awk' logical operators `&&' and `||'. If the
+ value of the entire expression can be deduced from evaluating just
+ the left-hand side of these operators, the right-hand side will not
+ be evaluated (*note Boolean Expressions: Boolean Ops.).
+
+Side Effect
+ A side effect occurs when an expression has an effect aside from
+ merely producing a value. Assignment expressions, increment and
+ decrement expressions and function calls have side effects. *Note
+ Assignment Expressions: Assignment Ops.
+
+Single Precision
+ An internal representation of numbers that can have fractional
+ parts. Single precision numbers keep track of fewer digits than
+ do double precision numbers, but operations on them are less
+ expensive in terms of CPU time. This is the type used by some
+ very old versions of `awk' to store numeric values. It is the C
+ type `float'.
+
+Space
+ The character generated by hitting the space bar on the keyboard.
+
+Special File
+ A file name interpreted internally by `gawk', instead of being
+ handed directly to the underlying operating system. For example,
+ `/dev/stderr'. *Note Special File Names in `gawk': Special Files.
+
+Stream Editor
+ A program that reads records from an input stream and processes
+ them one or more at a time. This is in contrast with batch
+ programs, which may expect to read their input files in entirety
+ before starting to do anything, and with interactive programs,
+ which require input from the user.
+
+String
+ A datum consisting of a sequence of characters, such as `I am a
+ string'. Constant strings are written with double-quotes in the
+ `awk' language, and may contain escape sequences. *Note Escape
+ Sequences::.
+
+Tab
+ The character generated by hitting the `TAB' key on the keyboard.
+ It usually expands to up to eight spaces upon output.
+
+Unix
+ A computer operating system originally developed in the early
+ 1970's at AT&T Bell Laboratories. It initially became popular in
+ universities around the world, and later moved into commercial
+ evnironments as a software development system and network server
+ system. There are many commercial versions of Unix, as well as
+ several work-alike systems whose source code is freely available
+ (such as Linux, NetBSD, and FreeBSD).
+
+Whitespace
+ A sequence of space, tab, or newline characters occurring inside
+ an input record or a string.
+
+
+File: gawk.info, Node: Copying, Next: Index, Prev: Glossary, Up: Top
+
+GNU GENERAL PUBLIC LICENSE
+**************************
+
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 59 Temple Place --- Suite 330, Boston, MA 02111-1307, USA
+
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+Preamble
+========
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it in
+new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software,
+and (2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains a
+ notice placed by the copyright holder saying it may be distributed
+ under the terms of this General Public License. The "Program",
+ below, refers to any such program or work, and a "work based on
+ the Program" means either the Program or any derivative work under
+ copyright law: that is to say, a work containing the Program or a
+ portion of it, either verbatim or with modifications and/or
+ translated into another language. (Hereinafter, translation is
+ included without limitation in the term "modification".) Each
+ licensee is addressed as "you".
+
+ Activities other than copying, distribution and modification are
+ not covered by this License; they are outside its scope. The act
+ of running the Program is not restricted, and the output from the
+ Program is covered only if its contents constitute a work based on
+ the Program (independent of having been made by running the
+ Program). Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+ source code as you receive it, in any medium, provided that you
+ conspicuously and appropriately publish on each copy an appropriate
+ copyright notice and disclaimer of warranty; keep intact all the
+ notices that refer to this License and to the absence of any
+ warranty; and give any other recipients of the Program a copy of
+ this License along with the Program.
+
+ You may charge a fee for the physical act of transferring a copy,
+ and you may at your option offer warranty protection in exchange
+ for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+ of it, thus forming a work based on the Program, and copy and
+ distribute such modifications or work under the terms of Section 1
+ above, provided that you also meet all of these conditions:
+
+ a. You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b. You must cause any work that you distribute or publish, that
+ in whole or in part contains or is derived from the Program
+ or any part thereof, to be licensed as a whole at no charge
+ to all third parties under the terms of this License.
+
+ c. If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display
+ an announcement including an appropriate copyright notice and
+ a notice that there is no warranty (or else, saying that you
+ provide a warranty) and that users may redistribute the
+ program under these conditions, and telling the user how to
+ view a copy of this License. (Exception: if the Program
+ itself is interactive but does not normally print such an
+ announcement, your work based on the Program is not required
+ to print an announcement.)
+
+ These requirements apply to the modified work as a whole. If
+ identifiable sections of that work are not derived from the
+ Program, and can be reasonably considered independent and separate
+ works in themselves, then this License, and its terms, do not
+ apply to those sections when you distribute them as separate
+ works. But when you distribute the same sections as part of a
+ whole which is a work based on the Program, the distribution of
+ the whole must be on the terms of this License, whose permissions
+ for other licensees extend to the entire whole, and thus to each
+ and every part regardless of who wrote it.
+
+ Thus, it is not the intent of this section to claim rights or
+ contest your rights to work written entirely by you; rather, the
+ intent is to exercise the right to control the distribution of
+ derivative or collective works based on the Program.
+
+ In addition, mere aggregation of another work not based on the
+ Program with the Program (or with a work based on the Program) on
+ a volume of a storage or distribution medium does not bring the
+ other work under the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+ under Section 2) in object code or executable form under the terms
+ of Sections 1 and 2 above provided that you also do one of the
+ following:
+
+ a. Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of
+ Sections 1 and 2 above on a medium customarily used for
+ software interchange; or,
+
+ b. Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a
+ medium customarily used for software interchange; or,
+
+ c. Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for non-commercial distribution and only if you
+ received the program in object code or executable form with
+ such an offer, in accord with Subsection b above.)
+
+ The source code for a work means the preferred form of the work for
+ making modifications to it. For an executable work, complete
+ source code means all the source code for all modules it contains,
+ plus any associated interface definition files, plus the scripts
+ used to control compilation and installation of the executable.
+ However, as a special exception, the source code distributed need
+ not include anything that is normally distributed (in either
+ source or binary form) with the major components (compiler,
+ kernel, and so on) of the operating system on which the executable
+ runs, unless that component itself accompanies the executable.
+
+ If distribution of executable or object code is made by offering
+ access to copy from a designated place, then offering equivalent
+ access to copy the source code from the same place counts as
+ distribution of the source code, even though third parties are not
+ compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+ except as expressly provided under this License. Any attempt
+ otherwise to copy, modify, sublicense or distribute the Program is
+ void, and will automatically terminate your rights under this
+ License. However, parties who have received copies, or rights,
+ from you under this License will not have their licenses
+ terminated so long as such parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+ signed it. However, nothing else grants you permission to modify
+ or distribute the Program or its derivative works. These actions
+ are prohibited by law if you do not accept this License.
+ Therefore, by modifying or distributing the Program (or any work
+ based on the Program), you indicate your acceptance of this
+ License to do so, and all its terms and conditions for copying,
+ distributing or modifying the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+ Program), the recipient automatically receives a license from the
+ original licensor to copy, distribute or modify the Program
+ subject to these terms and conditions. You may not impose any
+ further restrictions on the recipients' exercise of the rights
+ granted herein. You are not responsible for enforcing compliance
+ by third parties to this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+ infringement or for any other reason (not limited to patent
+ issues), conditions are imposed on you (whether by court order,
+ agreement or otherwise) that contradict the conditions of this
+ License, they do not excuse you from the conditions of this
+ License. If you cannot distribute so as to satisfy simultaneously
+ your obligations under this License and any other pertinent
+ obligations, then as a consequence you may not distribute the
+ Program at all. For example, if a patent license would not permit
+ royalty-free redistribution of the Program by all those who
+ receive copies directly or indirectly through you, then the only
+ way you could satisfy both it and this License would be to refrain
+ entirely from distribution of the Program.
+
+ If any portion of this section is held invalid or unenforceable
+ under any particular circumstance, the balance of the section is
+ intended to apply and the section as a whole is intended to apply
+ in other circumstances.
+
+ It is not the purpose of this section to induce you to infringe any
+ patents or other property right claims or to contest validity of
+ any such claims; this section has the sole purpose of protecting
+ the integrity of the free software distribution system, which is
+ implemented by public license practices. Many people have made
+ generous contributions to the wide range of software distributed
+ through that system in reliance on consistent application of that
+ system; it is up to the author/donor to decide if he or she is
+ willing to distribute software through any other system and a
+ licensee cannot impose that choice.
+
+ This section is intended to make thoroughly clear what is believed
+ to be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+ certain countries either by patents or by copyrighted interfaces,
+ the original copyright holder who places the Program under this
+ License may add an explicit geographical distribution limitation
+ excluding those countries, so that distribution is permitted only
+ in or among countries not thus excluded. In such case, this
+ License incorporates the limitation as if written in the body of
+ this License.
+
+ 9. The Free Software Foundation may publish revised and/or new
+ versions of the General Public License from time to time. Such
+ new versions will be similar in spirit to the present version, but
+ may differ in detail to address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+ Program specifies a version number of this License which applies
+ to it and "any later version", you have the option of following
+ the terms and conditions either of that version or of any later
+ version published by the Free Software Foundation. If the Program
+ does not specify a version number of this License, you may choose
+ any version ever published by the Free Software Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+ programs whose distribution conditions are different, write to the
+ author to ask for permission. For software which is copyrighted
+ by the Free Software Foundation, write to the Free Software
+ Foundation; we sometimes make exceptions for this. Our decision
+ will be guided by the two goals of preserving the free status of
+ all derivatives of our free software and of promoting the sharing
+ and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
+ WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
+ LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
+ WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
+ NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
+ QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+ PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
+ SERVICING, REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+ WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
+ MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
+ LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
+ INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
+ INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
+ OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
+ OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+How to Apply These Terms to Your New Programs
+=============================================
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these
+terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ ONE LINE TO GIVE THE PROGRAM'S NAME AND AN IDEA OF WHAT IT DOES.
+ Copyright (C) 19YY NAME OF AUTHOR
+
+ This program is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License
+ as published by the Free Software Foundation; either version 2
+ of the License, or (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place --- Suite 330, Boston, MA 02111-1307, USA.
+
+ Also add information on how to contact you by electronic and paper
+mail.
+
+ If the program is interactive, make it output a short notice like
+this when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
+ type `show w'. This is free software, and you are welcome
+ to redistribute it under certain conditions; type `show c'
+ for details.
+
+ The hypothetical commands `show w' and `show c' should show the
+appropriate parts of the General Public License. Of course, the
+commands you use may be called something other than `show w' and `show
+c'; they could even be mouse-clicks or menu items--whatever suits your
+program.
+
+ You should also get your employer (if you work as a programmer) or
+your school, if any, to sign a "copyright disclaimer" for the program,
+if necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright
+ interest in the program `Gnomovision'
+ (which makes passes at compilers) written
+ by James Hacker.
+
+ SIGNATURE OF TY COON, 1 April 1989
+ Ty Coon, President of Vice
+
+ This General Public License does not permit incorporating your
+program into proprietary programs. If your program is a subroutine
+library, you may consider it more useful to permit linking proprietary
+applications with the library. If this is what you want to do, use the
+GNU Library General Public License instead of this License.
+
+
+File: gawk.info, Node: Index, Prev: Copying, Up: Top
+
+Index
+*****
+
+* Menu:
+
+* ! operator: Boolean Ops.
+* != operator: Typing and Comparison.
+* !~ operator <1>: Typing and Comparison.
+* !~ operator <2>: Regexp Constants.
+* !~ operator <3>: Computed Regexps.
+* !~ operator <4>: Case-sensitivity.
+* !~ operator: Regexp Usage.
+* # (comment): Comments.
+* #! (executable scripts): Executable Scripts.
+* $ (field operator): Fields.
+* && operator: Boolean Ops.
+* --assign option: Options.
+* --compat option: Options.
+* --copyleft option: Options.
+* --copyright option: Options.
+* --field-separator option: Options.
+* --file option: Options.
+* --help option: Options.
+* --lint option: Options.
+* --lint-old option: Options.
+* --posix option: Options.
+* --source option: Options.
+* --traditional option: Options.
+* --usage option: Options.
+* --version option: Options.
+* -f option: Options.
+* -F option <1>: Options.
+* -F option: Command Line Field Separator.
+* -f option: Long.
+* -v option: Options.
+* -W option: Options.
+* /dev/fd: Special Files.
+* /dev/pgrpid: Special Files.
+* /dev/pid: Special Files.
+* /dev/ppid: Special Files.
+* /dev/stderr: Special Files.
+* /dev/stdin: Special Files.
+* /dev/stdout: Special Files.
+* /dev/user <1>: Passwd Functions.
+* /dev/user: Special Files.
+* < operator: Typing and Comparison.
+* <= operator: Typing and Comparison.
+* == operator: Typing and Comparison.
+* > operator: Typing and Comparison.
+* >= operator: Typing and Comparison.
+* \' regexp operator: GNU Regexp Operators.
+* \< regexp operator: GNU Regexp Operators.
+* \> regexp operator: GNU Regexp Operators.
+* \` regexp operator: GNU Regexp Operators.
+* \B regexp operator: GNU Regexp Operators.
+* \W regexp operator: GNU Regexp Operators.
+* \w regexp operator: GNU Regexp Operators.
+* \y regexp operator: GNU Regexp Operators.
+* _gr_init: Group Functions.
+* _pw_init: Passwd Functions.
+* _tm_addup: Mktime Function.
+* _tm_isleap: Mktime Function.
+* accessing fields: Fields.
+* account information <1>: Group Functions.
+* account information: Passwd Functions.
+* acronym: History.
+* action, curly braces: Action Overview.
+* action, default: Very Simple.
+* action, definition of: Action Overview.
+* action, empty: Very Simple.
+* action, separating statements: Action Overview.
+* adding new features: Adding Code.
+* addition: Arithmetic Ops.
+* Aho, Alfred: History.
+* AI programming, using gawk: Distribution contents.
+* alarm.awk: Alarm Program.
+* amiga: Amiga Installation.
+* anchors in regexps: Regexp Operators.
+* and operator: Boolean Ops.
+* anonymous ftp <1>: Other Versions.
+* anonymous ftp: Getting.
+* applications of awk: When.
+* ARGC: Auto-set.
+* ARGIND <1>: Other Arguments.
+* ARGIND: Auto-set.
+* argument processing: Getopt Function.
+* arguments in function call: Function Calls.
+* arguments, command line: Invoking Gawk.
+* ARGV <1>: Other Arguments.
+* ARGV: Auto-set.
+* arithmetic operators: Arithmetic Ops.
+* array assignment: Assigning Elements.
+* array reference: Reference to Elements.
+* array subscripts, uninitialized variables: Uninitialized Subscripts.
+* arrays: Array Intro.
+* arrays, associative: Array Intro.
+* arrays, definition of: Array Intro.
+* arrays, deleting an element: Delete.
+* arrays, deleting entire contents: Delete.
+* arrays, multi-dimensional subscripts: Multi-dimensional.
+* arrays, presence of elements: Reference to Elements.
+* arrays, sparse: Array Intro.
+* arrays, special for statement: Scanning an Array.
+* arrays, the in operator: Reference to Elements.
+* artificial intelligence, using gawk: Distribution contents.
+* ASCII: Ordinal Functions.
+* assert: Assert Function.
+* assert, C version: Assert Function.
+* assertions: Assert Function.
+* assignment operators: Assignment Ops.
+* assignment to fields: Changing Fields.
+* associative arrays: Array Intro.
+* atan2: Numeric Functions.
+* atari: Atari Installation.
+* automatic initialization: More Complex.
+* awk language, POSIX version <1>: Definition Syntax.
+* awk language, POSIX version <2>: String Functions.
+* awk language, POSIX version <3>: User-modified.
+* awk language, POSIX version <4>: Next Statement.
+* awk language, POSIX version <5>: Continue Statement.
+* awk language, POSIX version <6>: Break Statement.
+* awk language, POSIX version <7>: Precedence.
+* awk language, POSIX version <8>: Assignment Ops.
+* awk language, POSIX version <9>: Arithmetic Ops.
+* awk language, POSIX version <10>: Conversion.
+* awk language, POSIX version <11>: Format Modifiers.
+* awk language, POSIX version <12>: OFMT.
+* awk language, POSIX version <13>: Field Splitting Summary.
+* awk language, POSIX version <14>: Regexp Operators.
+* awk language, POSIX version: Escape Sequences.
+* awk language, V.4 version <1>: SVR4.
+* awk language, V.4 version: Escape Sequences.
+* AWKPATH environment variable: AWKPATH Variable.
+* awksed: Simple Sed.
+* backslash continuation <1>: Egrep Program.
+* backslash continuation: Statements/Lines.
+* backslash continuation and comments: Statements/Lines.
+* backslash continuation in csh <1>: Statements/Lines.
+* backslash continuation in csh: More Complex.
+* basic function of awk: Getting Started.
+* BBS-list file: Sample Data Files.
+* BEGIN special pattern: BEGIN/END.
+* beginfile: Filetrans Function.
+* body of a loop: While Statement.
+* book, using this: This Manual.
+* boolean expressions: Boolean Ops.
+* boolean operators: Boolean Ops.
+* break statement: Break Statement.
+* break, outside of loops: Break Statement.
+* Brennan, Michael <1>: Other Versions.
+* Brennan, Michael <2>: Simple Sed.
+* Brennan, Michael: Delete.
+* buffer matching operators: GNU Regexp Operators.
+* buffering output: I/O Functions.
+* buffering, interactive vs. non-interactive: I/O Functions.
+* buffering, non-interactive vs. interactive: I/O Functions.
+* buffers, flushing: I/O Functions.
+* bugs, known in gawk: Known Bugs.
+* built-in functions: Built-in.
+* built-in variables: Built-in Variables.
+* built-in variables, convey information: Auto-set.
+* built-in variables, user modifiable: User-modified.
+* call by reference: Function Caveats.
+* call by value: Function Caveats.
+* calling a function <1>: Function Caveats.
+* calling a function: Function Calls.
+* case conversion: String Functions.
+* case sensitivity: Case-sensitivity.
+* changing contents of a field: Changing Fields.
+* changing the record separator: Records.
+* character classes: Regexp Operators.
+* character encodings: Ordinal Functions.
+* character list: Regexp Operators.
+* character list, complemented: Regexp Operators.
+* character sets: Ordinal Functions.
+* chr: Ordinal Functions.
+* close <1>: I/O Functions.
+* close: Close Files And Pipes.
+* closing input files and pipes: Close Files And Pipes.
+* closing output files and pipes: Close Files And Pipes.
+* coding style used in gawk: Adding Code.
+* collating elements: Regexp Operators.
+* collating symbols: Regexp Operators.
+* command line: Invoking Gawk.
+* command line formats: Running gawk.
+* command line, setting FS on: Command Line Field Separator.
+* comments: Comments.
+* comments and backslash continuation: Statements/Lines.
+* common mistakes <1>: Typing and Comparison.
+* common mistakes <2>: Print Examples.
+* common mistakes <3>: Basic Field Splitting.
+* common mistakes: Computed Regexps.
+* comp.lang.awk: Bugs.
+* comparison expressions: Typing and Comparison.
+* comparisons, string vs. regexp: Typing and Comparison.
+* compatibility mode <1>: POSIX/GNU.
+* compatibility mode: Options.
+* complemented character list: Regexp Operators.
+* compound statement: Statements.
+* computed regular expressions: Computed Regexps.
+* concatenation: Concatenation.
+* conditional expression: Conditional Exp.
+* configuring gawk: Configuration Philosophy.
+* constants, types of: Constants.
+* continuation of lines: Statements/Lines.
+* continue statement: Continue Statement.
+* continue, outside of loops: Continue Statement.
+* control statement: Statements.
+* conversion of case: String Functions.
+* conversion of strings and numbers: Conversion.
+* conversions, during subscripting: Numeric Array Subscripts.
+* converting dates to timestamps: Mktime Function.
+* CONVFMT <1>: Numeric Array Subscripts.
+* CONVFMT <2>: User-modified.
+* CONVFMT: Conversion.
+* cos: Numeric Functions.
+* csh, backslash continuation <1>: Statements/Lines.
+* csh, backslash continuation: More Complex.
+* curly braces: Action Overview.
+* custom.h configuration file: Configuration Philosophy.
+* cut utility: Cut Program.
+* cut.awk: Cut Program.
+* d.c., see "dark corner": This Manual.
+* dark corner <1>: Other Arguments.
+* dark corner <2>: Invoking Gawk.
+* dark corner <3>: String Functions.
+* dark corner <4>: Uninitialized Subscripts.
+* dark corner <5>: Auto-set.
+* dark corner <6>: Exit Statement.
+* dark corner <7>: Continue Statement.
+* dark corner <8>: Break Statement.
+* dark corner <9>: Using BEGIN/END.
+* dark corner <10>: Truth Values.
+* dark corner <11>: Conversion.
+* dark corner <12>: Assignment Options.
+* dark corner <13>: Using Constant Regexps.
+* dark corner <14>: Format Modifiers.
+* dark corner <15>: Control Letters.
+* dark corner <16>: OFMT.
+* dark corner <17>: Getline Summary.
+* dark corner <18>: Plain Getline.
+* dark corner <19>: Multiple Line.
+* dark corner <20>: Field Splitting Summary.
+* dark corner <21>: Single Character Fields.
+* dark corner <22>: Records.
+* dark corner <23>: Escape Sequences.
+* dark corner: This Manual.
+* data-driven languages: Getting Started.
+* dates, converting to timestamps: Mktime Function.
+* decrement operators: Increment Ops.
+* default action: Very Simple.
+* default pattern: Very Simple.
+* defining functions: Definition Syntax.
+* Deifik, Scott <1>: Bugs.
+* Deifik, Scott: Acknowledgements.
+* delete statement: Delete.
+* deleting elements of arrays: Delete.
+* deleting entire arrays: Delete.
+* deprecated features: Obsolete.
+* deprecated options: Obsolete.
+* differences between gawk and awk <1>: AWKPATH Variable.
+* differences between gawk and awk <2>: String Functions.
+* differences between gawk and awk <3>: Calling Built-in.
+* differences between gawk and awk <4>: Delete.
+* differences between gawk and awk <5>: Nextfile Statement.
+* differences between gawk and awk <6>: I/O And BEGIN/END.
+* differences between gawk and awk <7>: Conditional Exp.
+* differences between gawk and awk <8>: Arithmetic Ops.
+* differences between gawk and awk <9>: Using Constant Regexps.
+* differences between gawk and awk <10>: Scalar Constants.
+* differences between gawk and awk <11>: Close Files And Pipes.
+* differences between gawk and awk <12>: Special Files.
+* differences between gawk and awk <13>: Redirection.
+* differences between gawk and awk <14>: Getline Summary.
+* differences between gawk and awk <15>: Getline Intro.
+* differences between gawk and awk <16>: Single Character Fields.
+* differences between gawk and awk <17>: Records.
+* differences between gawk and awk: Case-sensitivity.
+* directory search: AWKPATH Variable.
+* division: Arithmetic Ops.
+* documenting awk programs <1>: Library Names.
+* documenting awk programs: Comments.
+* dupword.awk: Dupword Program.
+* dynamic regular expressions: Computed Regexps.
+* EBCDIC: Ordinal Functions.
+* egrep <1>: Regexp Operators.
+* egrep: One-shot.
+* egrep utility: Egrep Program.
+* egrep.awk: Egrep Program.
+* element assignment: Assigning Elements.
+* element of array: Reference to Elements.
+* empty action: Very Simple.
+* empty pattern: Empty.
+* empty program: Invoking Gawk.
+* empty string <1>: Truth Values.
+* empty string <2>: Conversion.
+* empty string <3>: Regexp Field Splitting.
+* empty string: Records.
+* END special pattern: BEGIN/END.
+* endfile: Filetrans Function.
+* endgrent: Group Functions.
+* endpwent: Passwd Functions.
+* ENVIRON: Auto-set.
+* environment variable, AWKPATH: AWKPATH Variable.
+* environment variable, POSIXLY_CORRECT: Options.
+* equivalence classes: Regexp Operators.
+* ERRNO <1>: Auto-set.
+* ERRNO <2>: Close Files And Pipes.
+* ERRNO: Getline Intro.
+* errors, common <1>: Typing and Comparison.
+* errors, common <2>: Print Examples.
+* errors, common <3>: Basic Field Splitting.
+* errors, common: Computed Regexps.
+* escape processing, sub et. al.: String Functions.
+* escape sequence notation: Escape Sequences.
+* evaluation, order of: Calling Built-in.
+* examining fields: Fields.
+* executable scripts: Executable Scripts.
+* exit statement: Exit Statement.
+* exp: Numeric Functions.
+* explicit input: Getline.
+* exponentiation: Arithmetic Ops.
+* expression: Expressions.
+* expression, assignment: Assignment Ops.
+* expression, boolean: Boolean Ops.
+* expression, comparison: Typing and Comparison.
+* expression, conditional: Conditional Exp.
+* expression, matching: Typing and Comparison.
+* extract.awk: Extract Program.
+* features, adding: Adding Code.
+* fflush: I/O Functions.
+* field operator $: Fields.
+* field separator, choice of: Basic Field Splitting.
+* field separator, FS: Basic Field Splitting.
+* field separator, on command line: Command Line Field Separator.
+* field, changing contents of: Changing Fields.
+* fields: Fields.
+* fields, separating: Basic Field Splitting.
+* FIELDWIDTHS: User-modified.
+* file descriptors: Special Files.
+* file, awk program: Long.
+* FILENAME <1>: Auto-set.
+* FILENAME <2>: Getline Summary.
+* FILENAME: Reading Files.
+* FILENAME, being set by getline: Getline Summary.
+* Fish, Fred: Bugs.
+* flushing buffers: I/O Functions.
+* FNR <1>: Auto-set.
+* FNR: Records.
+* for (x in ...): Scanning an Array.
+* for statement: For Statement.
+* format specifier: Control Letters.
+* format string: Basic Printf.
+* format, numeric output: OFMT.
+* formatted output: Printf.
+* formatted timestamps: Gettimeofday Function.
+* Free Software Foundation <1>: Getting.
+* Free Software Foundation: Manual History.
+* FreeBSD: Manual History.
+* Friedl, Jeffrey: Acknowledgements.
+* FS <1>: User-modified.
+* FS: Basic Field Splitting.
+* ftp, anonymous <1>: Other Versions.
+* ftp, anonymous: Getting.
+* function call <1>: Function Caveats.
+* function call: Function Calls.
+* function definition: Definition Syntax.
+* function, recursive: Definition Syntax.
+* functions, undefined: Function Caveats.
+* functions, user-defined: User-defined.
+* gawk coding style: Adding Code.
+* gensub: String Functions.
+* getgrent: Group Functions.
+* getgrent, C version: Group Functions.
+* getgrgid: Group Functions.
+* getgrnam: Group Functions.
+* getgruser: Group Functions.
+* getline: Getline.
+* getline, return values: Getline Intro.
+* getline, setting FILENAME: Getline Summary.
+* getopt: Getopt Function.
+* getopt, C version: Getopt Function.
+* getpwent: Passwd Functions.
+* getpwent, C version: Passwd Functions.
+* getpwnam: Passwd Functions.
+* getpwuid: Passwd Functions.
+* gettimeofday: Gettimeofday Function.
+* getting gawk: Getting.
+* GNU Project: Manual History.
+* grcat program: Group Functions.
+* grcat.c: Group Functions.
+* group file: Group Functions.
+* group information: Group Functions.
+* gsub: String Functions.
+* gsub, third argument of: String Functions.
+* Hankerson, Darrel <1>: Bugs.
+* Hankerson, Darrel: Acknowledgements.
+* historical features <1>: Historical Features.
+* historical features <2>: String Functions.
+* historical features <3>: Continue Statement.
+* historical features <4>: Break Statement.
+* historical features: Command Line Field Separator.
+* history of awk: History.
+* histsort.awk: History Sorting.
+* how awk works: Two Rules.
+* Hughes, Phil: Acknowledgements.
+* I/O from BEGIN and END: I/O And BEGIN/END.
+* id utility: Id Program.
+* id.awk: Id Program.
+* if-else statement: If Statement.
+* igawk.sh: Igawk Program.
+* IGNORECASE <1>: User-modified.
+* IGNORECASE: Case-sensitivity.
+* ignoring case: Case-sensitivity.
+* implementation limits <1>: Redirection.
+* implementation limits: Getline Summary.
+* in operator: Typing and Comparison.
+* increment operators: Increment Ops.
+* index: String Functions.
+* initialization, automatic: More Complex.
+* input: Reading Files.
+* input file, sample: Sample Data Files.
+* input files, skipping: Nextfile Function.
+* input pipeline: Getline/Pipe.
+* input redirection: Getline/File.
+* input, explicit: Getline.
+* input, getline command: Getline.
+* input, multiple line records: Multiple Line.
+* input, standard: Read Terminal.
+* installation, amiga: Amiga Installation.
+* installation, atari: Atari Installation.
+* installation, MS-DOS and OS/2: PC Installation.
+* installation, unix: Quick Installation.
+* installation, vms: VMS Installation.
+* int: Numeric Functions.
+* interaction, awk and other programs: I/O Functions.
+* interactive buffering vs. non-interactive: I/O Functions.
+* interval expressions: Regexp Operators.
+* inventory-shipped file: Sample Data Files.
+* invocation of gawk: Invoking Gawk.
+* ISO 8601: Time Functions.
+* ISO 8859-1 <1>: Glossary.
+* ISO 8859-1: Case-sensitivity.
+* ISO Latin-1 <1>: Glossary.
+* ISO Latin-1: Case-sensitivity.
+* Jaegermann, Michal <1>: Bugs.
+* Jaegermann, Michal: Acknowledgements.
+* join: Join Function.
+* Kernighan, Brian <1>: Other Versions.
+* Kernighan, Brian <2>: BTL.
+* Kernighan, Brian <3>: Acknowledgements.
+* Kernighan, Brian: History.
+* known bugs: Known Bugs.
+* labels.awk: Labels Program.
+* language, awk: This Manual.
+* language, data-driven: Getting Started.
+* language, procedural: Getting Started.
+* leftmost longest match <1>: Multiple Line.
+* leftmost longest match: Leftmost Longest.
+* length: String Functions.
+* limitations <1>: Redirection.
+* limitations: Getline Summary.
+* line break: Statements/Lines.
+* line continuation <1>: Conditional Exp.
+* line continuation <2>: Boolean Ops.
+* line continuation <3>: Print Examples.
+* line continuation: Statements/Lines.
+* Linux <1>: Atari Compiling.
+* Linux: Manual History.
+* locale, definition of: Time Functions.
+* log: Numeric Functions.
+* logical false: Truth Values.
+* logical operations: Boolean Ops.
+* logical true: Truth Values.
+* login information: Passwd Functions.
+* long options: Invoking Gawk.
+* loop: While Statement.
+* loops, exiting: Break Statement.
+* lvalue: Assignment Ops.
+* mark parity: Ordinal Functions.
+* match: String Functions.
+* matching ranges of lines: Ranges.
+* matching, leftmost longest <1>: Multiple Line.
+* matching, leftmost longest: Leftmost Longest.
+* mawk: Other Versions.
+* merging strings: Join Function.
+* metacharacters: Regexp Operators.
+* mistakes, common <1>: Typing and Comparison.
+* mistakes, common <2>: Print Examples.
+* mistakes, common <3>: Basic Field Splitting.
+* mistakes, common: Computed Regexps.
+* mktime: Mktime Function.
+* modifiers (in format specifiers): Format Modifiers.
+* multi-dimensional subscripts: Multi-dimensional.
+* multiple line records: Multiple Line.
+* multiple passes over data: Other Arguments.
+* multiple statements on one line: Statements/Lines.
+* multiplication: Arithmetic Ops.
+* names, use of: Definition Syntax.
+* namespace issues in awk: Library Names.
+* namespaces: Definition Syntax.
+* NetBSD: Manual History.
+* new awk: History.
+* new awk vs. old awk: Names.
+* newline: Statements/Lines.
+* next file statement: Nextfile Statement.
+* next statement: Next Statement.
+* next, inside a user-defined function: Next Statement.
+* nextfile function: Nextfile Function.
+* nextfile statement: Nextfile Statement.
+* NF <1>: Auto-set.
+* NF: Fields.
+* non-interactive buffering vs. interactive: I/O Functions.
+* not operator: Boolean Ops.
+* NR <1>: Auto-set.
+* NR: Records.
+* null string <1>: Truth Values.
+* null string <2>: Conversion.
+* null string: Regexp Field Splitting.
+* null string, as array subscript: Uninitialized Subscripts.
+* number of fields, NF: Fields.
+* number of records, NR, FNR: Records.
+* numbers, used as subscripts: Numeric Array Subscripts.
+* numeric character values: Ordinal Functions.
+* numeric constant: Scalar Constants.
+* numeric output format: OFMT.
+* numeric string: Typing and Comparison.
+* numeric value: Scalar Constants.
+* obsolete features: Obsolete.
+* obsolete options: Obsolete.
+* OFMT <1>: User-modified.
+* OFMT <2>: Conversion.
+* OFMT: OFMT.
+* OFS <1>: User-modified.
+* OFS: Output Separators.
+* old awk: History.
+* old awk vs. new awk: Names.
+* one-liners: One-liners.
+* operations, logical: Boolean Ops.
+* operator precedence: Precedence.
+* operators, arithmetic: Arithmetic Ops.
+* operators, assignment: Assignment Ops.
+* operators, boolean: Boolean Ops.
+* operators, decrement: Increment Ops.
+* operators, increment: Increment Ops.
+* operators, regexp matching: Regexp Usage.
+* operators, relational: Typing and Comparison.
+* operators, short-circuit: Boolean Ops.
+* operators, string: Concatenation.
+* operators, string-matching: Regexp Usage.
+* options, command line: Invoking Gawk.
+* options, long: Invoking Gawk.
+* or operator: Boolean Ops.
+* ord: Ordinal Functions.
+* order of evaluation: Calling Built-in.
+* ORS <1>: User-modified.
+* ORS: Output Separators.
+* output: Printing.
+* output field separator, OFS: Output Separators.
+* output format specifier, OFMT: OFMT.
+* output record separator, ORS: Output Separators.
+* output redirection: Redirection.
+* output, buffering: I/O Functions.
+* output, formatted: Printf.
+* output, piping: Redirection.
+* passes, multiple: Other Arguments.
+* password file: Passwd Functions.
+* path, search: AWKPATH Variable.
+* pattern, BEGIN: BEGIN/END.
+* pattern, default: Very Simple.
+* pattern, definition of: Patterns and Actions.
+* pattern, empty: Empty.
+* pattern, END: BEGIN/END.
+* pattern, range: Ranges.
+* pattern, regular expressions: Regexp.
+* patterns, types of: Kinds of Patterns.
+* per file initialization and clean-up: Filetrans Function.
+* PERL: Future Extensions.
+* pipeline, input: Getline/Pipe.
+* pipes for output: Redirection.
+* portability issues <1>: Portability Notes.
+* portability issues <2>: Definition Syntax.
+* portability issues <3>: I/O Functions.
+* portability issues <4>: String Functions.
+* portability issues <5>: Delete.
+* portability issues <6>: Close Files And Pipes.
+* portability issues <7>: Escape Sequences.
+* portability issues: Statements/Lines.
+* porting gawk: New Ports.
+* POSIX awk <1>: Definition Syntax.
+* POSIX awk <2>: String Functions.
+* POSIX awk <3>: User-modified.
+* POSIX awk <4>: Next Statement.
+* POSIX awk <5>: Continue Statement.
+* POSIX awk <6>: Break Statement.
+* POSIX awk <7>: Precedence.
+* POSIX awk <8>: Assignment Ops.
+* POSIX awk <9>: Arithmetic Ops.
+* POSIX awk <10>: Conversion.
+* POSIX awk <11>: Format Modifiers.
+* POSIX awk <12>: OFMT.
+* POSIX awk <13>: Field Splitting Summary.
+* POSIX awk <14>: Regexp Operators.
+* POSIX awk: Escape Sequences.
+* POSIX mode: Options.
+* POSIXLY_CORRECT environment variable: Options.
+* precedence: Precedence.
+* precedence, regexp operators: Regexp Operators.
+* print statement: Print.
+* printf statement, syntax of: Basic Printf.
+* printf, format-control characters: Control Letters.
+* printf, modifiers: Format Modifiers.
+* printing: Printing.
+* procedural languages: Getting Started.
+* process information: Special Files.
+* processing arguments: Getopt Function.
+* program file: Long.
+* program, awk: This Manual.
+* program, definition of: Getting Started.
+* program, self contained: Executable Scripts.
+* programs, documenting <1>: Library Names.
+* programs, documenting: Comments.
+* pwcat program: Passwd Functions.
+* pwcat.c: Passwd Functions.
+* quotient: Arithmetic Ops.
+* quoting, shell <1>: Long.
+* quoting, shell: Read Terminal.
+* Rakitzis, Byron: History Sorting.
+* rand: Numeric Functions.
+* random numbers, seed of: Numeric Functions.
+* range pattern: Ranges.
+* Rankin, Pat <1>: Bugs.
+* Rankin, Pat <2>: Assignment Ops.
+* Rankin, Pat: Acknowledgements.
+* reading files: Reading Files.
+* reading files, getline command: Getline.
+* reading files, multiple line records: Multiple Line.
+* record separator, RS: Records.
+* record terminator, RT: Records.
+* record, definition of: Records.
+* records, multiple line: Multiple Line.
+* recursive function: Definition Syntax.
+* redirection of input: Getline/File.
+* redirection of output: Redirection.
+* reference to array: Reference to Elements.
+* regexp: Regexp.
+* regexp as expression: Typing and Comparison.
+* regexp comparison vs. string comparison: Typing and Comparison.
+* regexp constant: Regexp Usage.
+* regexp constants, difference between slashes and quotes: Computed Regexps.
+* regexp match/non-match operators <1>: Typing and Comparison.
+* regexp match/non-match operators: Regexp Usage.
+* regexp matching operators: Regexp Usage.
+* regexp operators: Regexp Operators.
+* regexp operators, GNU specific: GNU Regexp Operators.
+* regexp operators, precedence of: Regexp Operators.
+* regexp, anchors: Regexp Operators.
+* regexp, dynamic: Computed Regexps.
+* regexp, effect of command line options: GNU Regexp Operators.
+* regular expression: Regexp.
+* regular expression metacharacters: Regexp Operators.
+* regular expressions as field separators: Basic Field Splitting.
+* regular expressions as patterns: Regexp.
+* regular expressions as record separators: Records.
+* regular expressions, computed: Computed Regexps.
+* relational operators: Typing and Comparison.
+* remainder: Arithmetic Ops.
+* removing elements of arrays: Delete.
+* return statement: Return Statement.
+* RFC-1036: Time Functions.
+* RFC-822: Time Functions.
+* RLENGTH <1>: String Functions.
+* RLENGTH: Auto-set.
+* Robbins, Miriam: Acknowledgements.
+* Rommel, Kai Uwe <1>: Bugs.
+* Rommel, Kai Uwe: Acknowledgements.
+* round: Round Function.
+* rounding: Round Function.
+* RS <1>: User-modified.
+* RS: Records.
+* RSTART <1>: String Functions.
+* RSTART: Auto-set.
+* RT <1>: Auto-set.
+* RT <2>: Multiple Line.
+* RT: Records.
+* rule, definition of: Getting Started.
+* running awk programs: Running gawk.
+* running long programs: Long.
+* rvalue: Assignment Ops.
+* sample input file: Sample Data Files.
+* scanning an array: Scanning an Array.
+* script, definition of: Getting Started.
+* scripts, executable: Executable Scripts.
+* scripts, shell: Executable Scripts.
+* search path: AWKPATH Variable.
+* search path, for source files: AWKPATH Variable.
+* sed utility <1>: Igawk Program.
+* sed utility <2>: Simple Sed.
+* sed utility: Field Splitting Summary.
+* seed for random numbers: Numeric Functions.
+* self contained programs: Executable Scripts.
+* shell quoting <1>: Long.
+* shell quoting: Read Terminal.
+* shell scripts: Executable Scripts.
+* short-circuit operators: Boolean Ops.
+* side effect: Assignment Ops.
+* simple stream editor: Simple Sed.
+* sin: Numeric Functions.
+* single character fields: Single Character Fields.
+* single quotes, why needed: One-shot.
+* skipping input files: Nextfile Function.
+* skipping lines between markers: Ranges.
+* sparse arrays: Array Intro.
+* split: String Functions.
+* split utility: Split Program.
+* split.awk: Split Program.
+* sprintf: String Functions.
+* sqrt: Numeric Functions.
+* srand: Numeric Functions.
+* Stallman, Richard <1>: Acknowledgements.
+* Stallman, Richard: Manual History.
+* standard error output: Special Files.
+* standard input <1>: Special Files.
+* standard input <2>: Reading Files.
+* standard input: Read Terminal.
+* standard output: Special Files.
+* statement, compound: Statements.
+* stream editor: Field Splitting Summary.
+* stream editor, simple: Simple Sed.
+* strftime: Time Functions.
+* string comparison vs. regexp comparison: Typing and Comparison.
+* string constants: Constants.
+* string operators: Concatenation.
+* string-matching operators: Regexp Usage.
+* sub: String Functions.
+* sub, third argument of: String Functions.
+* subscripts in arrays: Multi-dimensional.
+* SUBSEP <1>: Multi-dimensional.
+* SUBSEP: User-modified.
+* substr: String Functions.
+* subtraction: Arithmetic Ops.
+* system: I/O Functions.
+* systime: Time Functions.
+* Tcl: Library Names.
+* tee utility: Tee Program.
+* tee.awk: Tee Program.
+* terminator, record: Records.
+* time of day: Time Functions.
+* timestamps: Time Functions.
+* timestamps, converting from dates: Mktime Function.
+* timestamps, formatted: Gettimeofday Function.
+* tolower: String Functions.
+* toupper: String Functions.
+* translate.awk: Translate Program.
+* Trueman, David: Acknowledgements.
+* truth values: Truth Values.
+* type conversion: Conversion.
+* types of variables <1>: Typing and Comparison.
+* types of variables: Assignment Ops.
+* undefined functions: Function Caveats.
+* undocumented features: Undocumented.
+* uninitialized variables, as array subscripts: Uninitialized Subscripts.
+* uniq utility: Uniq Program.
+* uniq.awk: Uniq Program.
+* use of comments: Comments.
+* user information: Passwd Functions.
+* user-defined functions: User-defined.
+* user-defined variables: Using Variables.
+* uses of awk: What Is Awk.
+* using this book: This Manual.
+* values of characters as numbers: Ordinal Functions.
+* variable shadowing: Definition Syntax.
+* variable typing: Typing and Comparison.
+* variables, user-defined: Using Variables.
+* Wall, Larry: Future Extensions.
+* wc utility: Wc Program.
+* wc.awk: Wc Program.
+* Weinberger, Peter: History.
+* when to use awk: When.
+* while statement: While Statement.
+* word boundaries, matching: GNU Regexp Operators.
+* word, regexp definition of: GNU Regexp Operators.
+* wordfreq.sh: Word Sorting.
+* || operator: Boolean Ops.
+* ~ operator <1>: Typing and Comparison.
+* ~ operator <2>: Regexp Constants.
+* ~ operator <3>: Computed Regexps.
+* ~ operator <4>: Case-sensitivity.
+* ~ operator: Regexp Usage.
+
+
+
+Tag Table:
+Node: Top1197
+Node: Preface20700
+Ref: Preface-Footnote-121817
+Node: History22049
+Node: Manual History23407
+Node: Acknowledgements26997
+Node: What Is Awk30624
+Node: This Manual32278
+Node: Conventions34919
+Node: Sample Data Files36211
+Node: Getting Started39294
+Node: Names41602
+Ref: Names-Footnote-143099
+Node: Running gawk43171
+Node: One-shot44332
+Node: Read Terminal45719
+Node: Long47331
+Node: Executable Scripts48724
+Ref: Executable Scripts-Footnote-150374
+Ref: Executable Scripts-Footnote-250523
+Node: Comments50977
+Node: Very Simple52137
+Node: Two Rules54184
+Node: More Complex56363
+Node: Statements/Lines59479
+Node: Other Features63752
+Node: When64478
+Node: One-liners66412
+Node: Regexp69299
+Node: Regexp Usage70625
+Node: Escape Sequences72775
+Node: Regexp Operators78227
+Node: GNU Regexp Operators89260
+Node: Case-sensitivity92965
+Node: Leftmost Longest96080
+Node: Computed Regexps97615
+Node: Reading Files100272
+Node: Records102039
+Node: Fields108534
+Ref: Fields-Footnote-1111516
+Node: Non-Constant Fields111602
+Node: Changing Fields113888
+Node: Field Separators118295
+Node: Basic Field Splitting118997
+Node: Regexp Field Splitting122226
+Node: Single Character Fields124792
+Node: Command Line Field Separator125869
+Node: Field Splitting Summary129109
+Ref: Field Splitting Summary-Footnote-1131028
+Node: Constant Size131129
+Node: Multiple Line135166
+Node: Getline140574
+Node: Getline Intro141648
+Node: Plain Getline142611
+Node: Getline/Variable144875
+Node: Getline/File146017
+Node: Getline/Variable/File147327
+Node: Getline/Pipe149301
+Node: Getline/Variable/Pipe151391
+Node: Getline Summary152509
+Node: Printing154103
+Node: Print155171
+Node: Print Examples157271
+Node: Output Separators159882
+Node: OFMT161780
+Node: Printf163182
+Node: Basic Printf164086
+Node: Control Letters165620
+Node: Format Modifiers168308
+Node: Printf Examples172457
+Node: Redirection175236
+Node: Special Files179874
+Node: Close Files And Pipes185111
+Node: Expressions189172
+Node: Constants191378
+Node: Scalar Constants191857
+Ref: Scalar Constants-Footnote-1192717
+Node: Regexp Constants192861
+Node: Using Constant Regexps193323
+Node: Variables196524
+Node: Using Variables197178
+Node: Assignment Options198613
+Node: Conversion200557
+Node: Arithmetic Ops203738
+Node: Concatenation205872
+Node: Assignment Ops207227
+Node: Increment Ops212822
+Node: Truth Values215350
+Node: Typing and Comparison216398
+Node: Boolean Ops222298
+Node: Conditional Exp225991
+Node: Function Calls227668
+Node: Precedence230548
+Node: Patterns and Actions233936
+Node: Pattern Overview234362
+Node: Kinds of Patterns235137
+Node: Regexp Patterns236274
+Node: Expression Patterns236828
+Node: Ranges240480
+Node: BEGIN/END243199
+Node: Using BEGIN/END243668
+Node: I/O And BEGIN/END246630
+Node: Empty248646
+Node: Action Overview248945
+Node: Statements251516
+Node: If Statement253222
+Node: While Statement254725
+Node: Do Statement256756
+Node: For Statement257858
+Node: Break Statement261115
+Node: Continue Statement263386
+Node: Next Statement265382
+Node: Nextfile Statement267879
+Node: Exit Statement269793
+Node: Built-in Variables271803
+Node: User-modified272899
+Ref: User-modified-Footnote-1277688
+Node: Auto-set277750
+Ref: Auto-set-Footnote-1284073
+Node: ARGC and ARGV284279
+Node: Arrays286981
+Node: Array Intro288444
+Node: Reference to Elements292320
+Node: Assigning Elements294270
+Node: Array Example294772
+Node: Scanning an Array296491
+Node: Delete298821
+Node: Numeric Array Subscripts300881
+Node: Uninitialized Subscripts302787
+Node: Multi-dimensional304427
+Node: Multi-scanning307522
+Node: Built-in309165
+Node: Calling Built-in310154
+Node: Numeric Functions312125
+Ref: Numeric Functions-Footnote-1315673
+Node: String Functions315943
+Ref: String Functions-Footnote-1334729
+Ref: String Functions-Footnote-2334780
+Node: I/O Functions334873
+Ref: I/O Functions-Footnote-1340366
+Node: Time Functions340457
+Ref: Time Functions-Footnote-1348776
+Ref: Time Functions-Footnote-2348887
+Ref: Time Functions-Footnote-3349163
+Node: User-defined349307
+Node: Definition Syntax350019
+Node: Function Example354268
+Node: Function Caveats356598
+Node: Return Statement360469
+Node: Invoking Gawk363124
+Node: Options364359
+Ref: Options-Footnote-1373162
+Node: Other Arguments373187
+Node: AWKPATH Variable375833
+Ref: AWKPATH Variable-Footnote-1378281
+Node: Obsolete378581
+Node: Undocumented379247
+Node: Known Bugs379455
+Node: Library Functions380593
+Node: Portability Notes383012
+Node: Nextfile Function384296
+Ref: Nextfile Function-Footnote-1389001
+Node: Assert Function389171
+Node: Round Function392510
+Node: Ordinal Functions394155
+Ref: Ordinal Functions-Footnote-1397387
+Node: Join Function397606
+Node: Mktime Function399658
+Ref: Mktime Function-Footnote-1411149
+Node: Gettimeofday Function411232
+Node: Filetrans Function415244
+Node: Getopt Function418921
+Node: Passwd Functions430277
+Node: Group Functions438612
+Node: Library Names446509
+Node: Sample Programs450434
+Node: Clones450925
+Node: Cut Program452019
+Node: Egrep Program462048
+Node: Id Program469710
+Node: Split Program472981
+Node: Tee Program476359
+Node: Uniq Program479155
+Node: Wc Program486700
+Ref: Wc Program-Footnote-1490936
+Node: Miscellaneous Programs491117
+Node: Dupword Program492027
+Node: Alarm Program493698
+Node: Translate Program498243
+Ref: Translate Program-Footnote-1502730
+Ref: Translate Program-Footnote-2502873
+Node: Labels Program503053
+Ref: Labels Program-Footnote-1506512
+Node: Word Sorting506596
+Node: History Sorting510940
+Node: Extract Program512909
+Node: Simple Sed519866
+Node: Igawk Program523210
+Node: Language History536353
+Node: V7/SVR3.1537586
+Node: SVR4540239
+Node: POSIX541759
+Node: BTL543378
+Node: POSIX/GNU544142
+Node: Gawk Summary548573
+Node: Command Line Summary549397
+Node: Language Summary552373
+Ref: Language Summary-Footnote-1554630
+Node: Variables/Fields554753
+Node: Fields Summary555487
+Ref: Fields Summary-Footnote-1557215
+Node: Built-in Summary557273
+Node: Arrays Summary560918
+Node: Data Type Summary562211
+Node: Rules Summary564037
+Node: Pattern Summary565565
+Node: Regexp Summary567750
+Node: Actions Summary571132
+Node: Operator Summary572964
+Node: Control Flow Summary574191
+Node: I/O Summary574748
+Node: Printf Summary577737
+Node: Special File Summary581075
+Node: Built-in Functions Summary582753
+Node: Time Functions Summary586753
+Node: String Constants Summary587644
+Node: Functions Summary588964
+Node: Historical Features590025
+Node: Installation591523
+Node: Gawk Distribution592738
+Node: Getting593241
+Node: Extracting596226
+Node: Distribution contents597613
+Node: Unix Installation602527
+Node: Quick Installation603036
+Node: Configuration Philosophy604554
+Node: VMS Installation606956
+Node: VMS Compilation607495
+Node: VMS Installation Details609099
+Node: VMS Running610741
+Node: VMS POSIX612331
+Node: PC Installation613611
+Node: Atari Installation617014
+Node: Atari Compiling618198
+Node: Atari Using620107
+Node: Amiga Installation622953
+Node: Bugs624071
+Node: Other Versions627040
+Node: Notes628614
+Node: Compatibility Mode629221
+Node: Additions630064
+Node: Adding Code630762
+Node: New Ports636102
+Node: Future Extensions640270
+Node: Improvements642519
+Node: Glossary644387
+Node: Copying661452
+Node: Index680644
+
+End Tag Table