Diffstat (limited to 'doc/gawk.info')
-rw-r--r-- doc/gawk.info 29436
1 files changed, 450 insertions, 28986 deletions
diff --git a/doc/gawk.info b/doc/gawk.info
index 498c7668..54a8ccf8 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -1,4 +1,4 @@
-This is gawk.info, produced by makeinfo version 4.13 from gawk.texi.
+This is gawk.info, produced by makeinfo version 4.13 from foo.texi.
INFO-DIR-SECTION Text creation and manipulation
START-INFO-DIR-ENTRY
@@ -33,28992 +33,456 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
software freedom."

-File: gawk.info, Node: Top, Next: Foreword, Up: (dir)
-
-General Introduction
-********************
-
-This file documents `awk', a program that you can use to select
-particular records in a file and perform operations upon them.
-
- Copyright (C) 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000,
-2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012 Free
-Software Foundation, Inc.
-
-
- This is Edition 4 of `GAWK: Effective AWK Programming: A User's
-Guide for GNU Awk', for the 4.0.1 (or later) version of the GNU
-implementation of AWK.
-
- Permission is granted to copy, distribute and/or modify this document
-under the terms of the GNU Free Documentation License, Version 1.3 or
-any later version published by the Free Software Foundation; with the
-Invariant Sections being "GNU General Public License", the Front-Cover
-texts being (a) (see below), and with the Back-Cover Texts being (b)
-(see below). A copy of the license is included in the section entitled
-"GNU Free Documentation License".
-
- a. "A GNU Manual"
-
- b. "You have the freedom to copy and modify this GNU manual. Buying
- copies from the FSF supports it in developing GNU and promoting
- software freedom."
-
-* Menu:
-
-* Foreword:: Some nice words about this
- Info file.
-* Preface:: What this Info file is about; brief
- history and acknowledgments.
-* Getting Started:: A basic introduction to using
- `awk'. How to run an `awk'
- program. Command-line syntax.
-* Invoking Gawk:: How to run `gawk'.
-* Regexp:: All about matching things using regular
- expressions.
-* Reading Files:: How to read files and manipulate fields.
-* Printing:: How to print using `awk'. Describes
- the `print' and `printf'
- statements. Also describes redirection of
- output.
-* Expressions:: Expressions are the basic building blocks
- of statements.
-* Patterns and Actions:: Overviews of patterns and actions.
-* Arrays:: The description and use of arrays. Also
- includes array-oriented control statements.
-* Functions:: Built-in and user-defined functions.
-* Internationalization:: Getting `gawk' to speak your
- language.
-* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
- `gawk'.
-* Advanced Features:: Stuff for advanced users, specific to
- `gawk'.
-* Library Functions:: A Library of `awk' Functions.
-* Sample Programs:: Many `awk' programs with complete
- explanations.
-* Debugger:: The `gawk' debugger.
-* Language History:: The evolution of the `awk'
- language.
-* Installation:: Installing `gawk' under various
- operating systems.
-* Notes:: Notes about `gawk' extensions and
- possible future work.
-* Basic Concepts:: A very quick introduction to programming
- concepts.
-* Glossary:: An explanation of some unfamiliar terms.
-* Copying:: Your right to copy and distribute
- `gawk'.
-* GNU Free Documentation License:: The license for this Info file.
-* Index:: Concept and Variable Index.
-
-* History:: The history of `gawk' and
- `awk'.
-* Names:: What name to use to find `awk'.
-* This Manual:: Using this Info file. Includes
- sample input files that you can use.
-* Conventions:: Typographical Conventions.
-* Manual History:: Brief history of the GNU project and this
- Info file.
-* How To Contribute:: Helping to save the world.
-* Acknowledgments:: Acknowledgments.
-* Running gawk:: How to run `gawk' programs;
- includes command-line syntax.
-* One-shot:: Running a short throwaway `awk'
- program.
-* Read Terminal:: Using no input files (input from terminal
- instead).
-* Long:: Putting permanent `awk' programs in
- files.
-* Executable Scripts:: Making self-contained `awk'
- programs.
-* Comments:: Adding documentation to `gawk'
- programs.
-* Quoting:: More discussion of shell quoting issues.
-* DOS Quoting:: Quoting in Windows Batch Files.
-* Sample Data Files:: Sample data files for use in the
- `awk' programs illustrated in this
- Info file.
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example using two
- rules.
-* More Complex:: A more complex example.
-* Statements/Lines:: Subdividing or combining statements into
- lines.
-* Other Features:: Other Features of `awk'.
-* When:: When to use `gawk' and when to use
- other things.
-* Command Line:: How to run `awk'.
-* Options:: Command-line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* Naming Standard Input:: How to specify standard input with other
- files.
-* Environment Variables:: The environment variables `gawk'
- uses.
-* AWKPATH Variable:: Searching directories for `awk'
- programs.
-* AWKLIBPATH Variable:: Searching directories for `awk'
- shared libraries.
-* Other Environment Variables:: The environment variables.
-* Exit Status:: `gawk''s exit status.
-* Include Files:: Including other files into your program.
-* Loading Shared Libraries:: Loading shared libraries into your program.
-* Obsolete:: Obsolete Options and/or features.
-* Undocumented:: Undocumented Options and Features.
-* Regexp Usage:: How to Use Regular Expressions.
-* Escape Sequences:: How to write nonprinting characters.
-* Regexp Operators:: Regular Expression Operators.
-* Bracket Expressions:: What can go between `[...]'.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
-* Leftmost Longest:: How much text matches.
-* Computed Regexps:: Using Dynamic Regexps.
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Nonconstant Fields:: Nonconstant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Default Field Splitting:: How fields are normally separated.
-* Regexp Field Splitting:: Using regexps as the field separator.
-* Single Character Fields:: Making each character a separate field.
-* Command Line Field Separator:: Setting `FS' from the command-line.
-* Field Splitting Summary:: Some final points and a summary table.
-* Constant Size:: Reading constant width data.
-* Splitting By Content:: Defining Fields By Content
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program
- control using the `getline' function.
-* Plain Getline:: Using `getline' with no arguments.
-* Getline/Variable:: Using `getline' into a variable.
-* Getline/File:: Using `getline' from a file.
-* Getline/Variable/File:: Using `getline' into a variable from a
- file.
-* Getline/Pipe:: Using `getline' from a pipe.
-* Getline/Variable/Pipe:: Using `getline' into a variable from a
- pipe.
-* Getline/Coprocess:: Using `getline' from a coprocess.
-* Getline/Variable/Coprocess:: Using `getline' into a variable from a
- coprocess.
-* Getline Notes:: Important things to know about
- `getline'.
-* Getline Summary:: Summary of `getline' Variants.
-* Read Timeout:: Reading input with a timeout.
-* Command line directories:: What happens if you put a directory on the
- command line.
-* Print:: The `print' statement.
-* Print Examples:: Simple examples of `print' statements.
-* Output Separators:: The output separators and how to change
- them.
-* OFMT:: Controlling Numeric Output With
- `print'.
-* Printf:: The `printf' statement.
-* Basic Printf:: Syntax of the `printf' statement.
-* Control Letters:: Format-control letters.
-* Format Modifiers:: Format-specification modifiers.
-* Printf Examples:: Several examples.
-* Redirection:: How to redirect output to multiple files
- and pipes.
-* Special Files:: File name interpretation in `gawk'.
- `gawk' allows access to inherited
- file descriptors.
-* Special FD:: Special files for I/O.
-* Special Network:: Special files for network communications.
-* Special Caveats:: Things to watch out for.
-* Close Files And Pipes:: Closing Input and Output Files and Pipes.
-* Values:: Constants, Variables, and Regular
- Expressions.
-* Constants:: String, numeric and regexp constants.
-* Scalar Constants:: Numeric and string constants.
-* Nondecimal-numbers:: What are octal and hex numbers.
-* Regexp Constants:: Regular Expression constants.
-* Using Constant Regexps:: When and how to use a regexp constant.
-* Variables:: Variables give names to values for later
- use.
-* Using Variables:: Using variables in your programs.
-* Assignment Options:: Setting variables on the command-line and a
- summary of command-line syntax. This is an
- advanced method of input.
-* Conversion:: The conversion of strings to numbers and
- vice versa.
-* All Operators:: `gawk''s operators.
-* Arithmetic Ops:: Arithmetic operations (`+', `-',
- etc.)
-* Concatenation:: Concatenating strings.
-* Assignment Ops:: Changing the value of a variable or a
- field.
-* Increment Ops:: Incrementing the numeric value of a
- variable.
-* Truth Values and Conditions:: Testing for true and false.
-* Truth Values:: What is ``true'' and what is ``false''.
-* Typing and Comparison:: How variables acquire types and how this
- affects comparison of numbers and strings
- with `<', etc.
-* Variable Typing:: String type versus numeric type.
-* Comparison Operators:: The comparison operators.
-* POSIX String Comparison:: String comparison with POSIX rules.
-* Boolean Ops:: Combining comparison expressions using
- boolean operators `||' (``or''),
- `&&' (``and'') and `!' (``not'').
-* Conditional Exp:: Conditional expressions select between two
- subexpressions under control of a third
- subexpression.
-* Function Calls:: A function call is an expression.
-* Precedence:: How various operators nest.
-* Locales:: How the locale affects things.
-* Pattern Overview:: What goes into a pattern.
-* Regexp Patterns:: Using regexps as patterns.
-* Expression Patterns:: Any expression can be used as a pattern.
-* Ranges:: Pairs of patterns specify record ranges.
-* BEGIN/END:: Specifying initialization and cleanup
- rules.
-* Using BEGIN/END:: How and why to use BEGIN/END rules.
-* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
-* BEGINFILE/ENDFILE:: Two special patterns for advanced control.
-* Empty:: The empty pattern, which matches every
- record.
-* Using Shell Variables:: How to use shell variables with
- `awk'.
-* Action Overview:: What goes into an action.
-* Statements:: Describes the various control statements in
- detail.
-* If Statement:: Conditionally execute some `awk'
- statements.
-* While Statement:: Loop until some condition is satisfied.
-* Do Statement:: Do specified action while looping until
- some condition is satisfied.
-* For Statement:: Another looping statement, that provides
- initialization and increment clauses.
-* Switch Statement:: Switch/case evaluation for conditional
- execution of statements based on a value.
-* Break Statement:: Immediately exit the innermost enclosing
- loop.
-* Continue Statement:: Skip to the end of the innermost enclosing
- loop.
-* Next Statement:: Stop processing the current input record.
-* Nextfile Statement:: Stop processing the current file.
-* Exit Statement:: Stop execution of `awk'.
-* Built-in Variables:: Summarizes the built-in variables.
-* User-modified:: Built-in variables that you change to
- control `awk'.
-* Auto-set:: Built-in variables where `awk'
- gives you information.
-* ARGC and ARGV:: Ways to use `ARGC' and `ARGV'.
-* Array Basics:: The basics of arrays.
-* Array Intro:: Introduction to Arrays
-* Reference to Elements:: How to examine one element of an array.
-* Assigning Elements:: How to change an element of an array.
-* Array Example:: Basic Example of an Array
-* Scanning an Array:: A variation of the `for' statement. It
- loops through the indices of an array's
- existing elements.
-* Controlling Scanning:: Controlling the order in which arrays are
- scanned.
-* Delete:: The `delete' statement removes an
- element from an array.
-* Numeric Array Subscripts:: How to use numbers as subscripts in
- `awk'.
-* Uninitialized Subscripts:: Using Uninitialized variables as
- subscripts.
-* Multi-dimensional:: Emulating multidimensional arrays in
- `awk'.
-* Multi-scanning:: Scanning multidimensional arrays.
-* Arrays of Arrays:: True multidimensional arrays.
-* Built-in:: Summarizes the built-in functions.
-* Calling Built-in:: How to call built-in functions.
-* Numeric Functions:: Functions that work with numbers, including
- `int()', `sin()' and
- `rand()'.
-* String Functions:: Functions for string manipulation, such as
- `split()', `match()' and
- `sprintf()'.
-* Gory Details:: More than you want to know about `\'
- and `&' with `sub()',
- `gsub()', and `gensub()'.
-* I/O Functions:: Functions for files and shell commands.
-* Time Functions:: Functions for dealing with timestamps.
-* Bitwise Functions:: Functions for bitwise operations.
-* Type Functions:: Functions for type information.
-* I18N Functions:: Functions for string translation.
-* User-defined:: Describes User-defined functions in detail.
-* Definition Syntax:: How to write definitions and what they
- mean.
-* Function Example:: An example function definition and what it
- does.
-* Function Caveats:: Things to watch out for.
-* Calling A Function:: Don't use spaces.
-* Variable Scope:: Controlling variable scope.
-* Pass By Value/Reference:: Passing parameters.
-* Return Statement:: Specifying the value a function returns.
-* Dynamic Typing:: How variable types can change at runtime.
-* Indirect Calls:: Choosing the function to call at runtime.
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU `gettext' works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging `printf' arguments.
-* I18N Portability:: `awk'-level portability issues.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: `gawk' is also internationalized.
-* General Arithmetic:: An introduction to computer arithmetic.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
-* Integer Programming:: Effective integer programming.
-* Floating-point Programming:: Effective floating-point programming.
-* Floating-point Representation:: Binary floating-point representation.
-* Floating-point Context:: Floating-point context.
-* Rounding Mode:: Floating-point rounding mode.
-* Gawk and MPFR:: How `gawk' provides
- aribitrary-precision arithmetic.
-* Arbitrary Precision Floats:: Arbitrary precision floating-point
- arithmetic with `gawk'.
-* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode.
-* Floating-point Constants:: Representing floating-point constants.
-* Changing Precision:: Changing the precision of a number.
-* Exact Arithmetic:: Exact arithmetic with floating-point
- numbers.
-* Arbitrary Precision Integers:: Arbitrary precision integer arithmetic with
- `gawk'.
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal
- and sorting arrays.
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use `asort()' and
- `asorti()'.
-* Two-way I/O:: Two-way communications with another
- process.
-* TCP/IP Networking:: Using `gawk' for network
- programming.
-* Profiling:: Profiling your `awk' programs.
-* Library Names:: How to best name private global variables
- in library functions.
-* General Functions:: Functions that are of general use.
-* Strtonum Function:: A replacement for the built-in
- `strtonum()' function.
-* Assert Function:: A function for assertions in `awk'
- programs.
-* Round Function:: A function for rounding if `sprintf()'
- does not do it correctly.
-* Cliff Random Function:: The Cliff Random Number Generator.
-* Ordinal Functions:: Functions for using characters as numbers
- and vice versa.
-* Join Function:: A function to join an array into a string.
-* Gettimeofday Function:: A function to get formatted times.
-* Data File Management:: Functions for managing command-line data
- files.
-* Filetrans Function:: A function for handling data file
- transitions.
-* Rewind Function:: A function for rereading the current file.
-* File Checking:: Checking that data files are readable.
-* Empty Files:: Checking for zero-length files.
-* Ignoring Assigns:: Treating assignments as file names.
-* Getopt Function:: A function for processing command-line
- arguments.
-* Passwd Functions:: Functions for getting user information.
-* Group Functions:: Functions for getting group information.
-* Walking Arrays:: A function to walk arrays of arrays.
-* Running Examples:: How to run these examples.
-* Clones:: Clones of common utilities.
-* Cut Program:: The `cut' utility.
-* Egrep Program:: The `egrep' utility.
-* Id Program:: The `id' utility.
-* Split Program:: The `split' utility.
-* Tee Program:: The `tee' utility.
-* Uniq Program:: The `uniq' utility.
-* Wc Program:: The `wc' utility.
-* Miscellaneous Programs:: Some interesting `awk' programs.
-* Dupword Program:: Finding duplicated words in a document.
-* Alarm Program:: An alarm clock.
-* Translate Program:: A program similar to the `tr'
- utility.
-* Labels Program:: Printing mailing labels.
-* Word Sorting:: A program to produce a word usage count.
-* History Sorting:: Eliminating duplicate entries from a
- history file.
-* Extract Program:: Pulling out programs from Texinfo source
- files.
-* Simple Sed:: A Simple Stream Editor.
-* Igawk Program:: A wrapper for `awk' that includes
- files.
-* Anagram Program:: Finding anagrams from a dictionary.
-* Signature Program:: People do amazing things with too much time
- on their hands.
-* Debugging:: Introduction to `gawk' debugger.
-* Debugging Concepts:: Debugging in General.
-* Debugging Terms:: Additional Debugging Concepts.
-* Awk Debugging:: Awk Debugging.
-* Sample Debugging Session:: Sample debugging session.
-* Debugger Invocation:: How to Start the Debugger.
-* Finding The Bug:: Finding the Bug.
-* List of Debugger Commands:: Main debugger commands.
-* Breakpoint Control:: Control of Breakpoints.
-* Debugger Execution Control:: Control of Execution.
-* Viewing And Changing Data:: Viewing and Changing Data.
-* Execution Stack:: Dealing with the Stack.
-* Debugger Info:: Obtaining Information about the Program and
- the Debugger State.
-* Miscellaneous Debugger Commands:: Miscellaneous Commands.
-* Readline Support:: Readline support.
-* Limitations:: Limitations and future plans.
-* V7/SVR3.1:: The major changes between V7 and System V
- Release 3.1.
-* SVR4:: Minor changes between System V Releases 3.1
- and 4.
-* POSIX:: New features from the POSIX standard.
-* BTL:: New features from Brian Kernighan's version
- of `awk'.
-* POSIX/GNU:: The extensions in `gawk' not in
- POSIX `awk'.
-* Common Extensions:: Common Extensions Summary.
-* Ranges and Locales:: How locales used to affect regexp ranges.
-* Contributors:: The major contributors to `gawk'.
-* Gawk Distribution:: What is in the `gawk' distribution.
-* Getting:: How to get the distribution.
-* Extracting:: How to extract the distribution.
-* Distribution contents:: What is in the distribution.
-* Unix Installation:: Installing `gawk' under various
- versions of Unix.
-* Quick Installation:: Compiling `gawk' under Unix.
-* Additional Configuration Options:: Other compile-time options.
-* Configuration Philosophy:: How it's all supposed to work.
-* Non-Unix Installation:: Installation on Other Operating Systems.
-* PC Installation:: Installing and Compiling `gawk' on
- MS-DOS and OS/2.
-* PC Binary Installation:: Installing a prepared distribution.
-* PC Compiling:: Compiling `gawk' for MS-DOS,
- Windows32, and OS/2.
-* PC Testing:: Testing `gawk' on PC systems.
-* PC Using:: Running `gawk' on MS-DOS, Windows32
- and OS/2.
-* Cygwin:: Building and running `gawk' for
- Cygwin.
-* MSYS:: Using `gawk' In The MSYS
- Environment.
-* VMS Installation:: Installing `gawk' on VMS.
-* VMS Compilation:: How to compile `gawk' under VMS.
-* VMS Installation Details:: How to install `gawk' under VMS.
-* VMS Running:: How to run `gawk' under VMS.
-* VMS Old Gawk:: An old version comes with some VMS systems.
-* Bugs:: Reporting Problems and Bugs.
-* Other Versions:: Other freely available `awk'
- implementations.
-* Compatibility Mode:: How to disable certain `gawk'
- extensions.
-* Additions:: Making Additions To `gawk'.
-* Accessing The Source:: Accessing the Git repository.
-* Adding Code:: Adding code to the main body of
- `gawk'.
-* New Ports:: Porting `gawk' to a new operating
- system.
-* Dynamic Extensions:: Adding new built-in functions to
- `gawk'.
-* Internals:: A brief look at some `gawk'
- internals.
-* Plugin License:: A note about licensing.
-* Loading Extensions:: How to load dynamic extensions.
-* Sample Library:: A example of new functions.
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-* Future Extensions:: New features that may be implemented one
- day.
-* Basic High Level:: The high level view.
-* Basic Data Typing:: A very quick intro to data types.
-
- To Miriam, for making me complete.
-
- To Chana, for the joy you bring us.
-
- To Rivka, for the exponential increase.
-
- To Nachum, for the added dimension.
-
- To Malka, for the new beginning.
-
-File: gawk.info, Node: Foreword, Next: Preface, Prev: Top, Up: Top
-
-Foreword
-********
-
-Arnold Robbins and I are good friends. We were introduced in 1990 by
-circumstances--and our favorite programming language, AWK. The
-circumstances started a couple of years earlier. I was working at a new
-job and noticed an unplugged Unix computer sitting in the corner. No
-one knew how to use it, and neither did I. However, a couple of days
-later it was running, and I was `root' and the one-and-only user. That
-day, I began the transition from statistician to Unix programmer.
-
- On one of many trips to the library or bookstore in search of books
-on Unix, I found the gray AWK book, a.k.a. Aho, Kernighan and
-Weinberger, `The AWK Programming Language', Addison-Wesley, 1988.
-AWK's simple programming paradigm--find a pattern in the input and then
-perform an action--often reduced complex or tedious data manipulations
-to few lines of code. I was excited to try my hand at programming in
-AWK.
-
- Alas, the `awk' on my computer was a limited version of the
-language described in the AWK book. I discovered that my computer had
-"old `awk'" and the AWK book described "new `awk'." I learned that
-this was typical; the old version refused to step aside or relinquish
-its name. If a system had a new `awk', it was invariably called
-`nawk', and few systems had it. The best way to get a new `awk' was to
-`ftp' the source code for `gawk' from `prep.ai.mit.edu'. `gawk' was a
-version of new `awk' written by David Trueman and Arnold, and available
-under the GNU General Public License.
-
- (Incidentally, it's no longer difficult to find a new `awk'. `gawk'
-ships with GNU/Linux, and you can download binaries or source code for
-almost any system; my wife uses `gawk' on her VMS box.)
-
- My Unix system started out unplugged from the wall; it certainly was
-not plugged into a network. So, oblivious to the existence of `gawk'
-and the Unix community in general, and desiring a new `awk', I wrote my
-own, called `mawk'. Before I was finished I knew about `gawk', but it
-was too late to stop, so I eventually posted to a `comp.sources'
-newsgroup.
-
- A few days after my posting, I got a friendly email from Arnold
-introducing himself. He suggested we share design and algorithms and
-attached a draft of the POSIX standard so that I could update `mawk' to
-support language extensions added after publication of the AWK book.
-
- Frankly, if our roles had been reversed, I would not have been so
-open and we probably would have never met. I'm glad we did meet. He
-is an AWK expert's AWK expert and a genuinely nice person. Arnold
-contributes significant amounts of his expertise and time to the Free
-Software Foundation.
-
- This book is the `gawk' reference manual, but at its core it is a
-book about AWK programming that will appeal to a wide audience. It is
-a definitive reference to the AWK language as defined by the 1987 Bell
-Laboratories release and codified in the 1992 POSIX Utilities standard.
-
- On the other hand, the novice AWK programmer can study a wealth of
-practical programs that emphasize the power of AWK's basic idioms: data
-driven control-flow, pattern matching with regular expressions, and
-associative arrays. Those looking for something new can try out
-`gawk''s interface to network protocols via special `/inet' files.
-
- The programs in this book make clear that an AWK program is
-typically much smaller and faster to develop than a counterpart written
-in C. Consequently, there is often a payoff to prototype an algorithm
-or design in AWK to get it running quickly and expose problems early.
-Often, the interpreted performance is adequate and the AWK prototype
-becomes the product.
-
- The new `pgawk' (profiling `gawk'), produces program execution
-counts. I recently experimented with an algorithm that for n lines of
-input, exhibited ~ C n^2 performance, while theory predicted ~ C n log n
-behavior. A few minutes poring over the `awkprof.out' profile
-pinpointed the problem to a single line of code. `pgawk' is a welcome
-addition to my programmer's toolbox.
-
- Arnold has distilled over a decade of experience writing and using
-AWK programs, and developing `gawk', into this book. If you use AWK or
-want to learn how, then read this book.
-
- Michael Brennan
- Author of `mawk'
- March, 2001
-
-
-File: gawk.info, Node: Preface, Next: Getting Started, Prev: Foreword, Up: Top
-
-Preface
-*******
-
-Several kinds of tasks occur repeatedly when working with text files.
-You might want to extract certain lines and discard the rest. Or you
-may need to make changes wherever certain patterns appear, but leave
-the rest of the file alone. Writing single-use programs for these
-tasks in languages such as C, C++, or Java is time-consuming and
-inconvenient. Such jobs are often easier with `awk'. The `awk'
-utility interprets a special-purpose programming language that makes it
-easy to handle simple data-reformatting jobs.
-
- The GNU implementation of `awk' is called `gawk'; if you invoke it
-with the proper options or environment variables (*note Options::), it
-is fully compatible with the POSIX(1) specification of the `awk'
-language and with the Unix version of `awk' maintained by Brian
-Kernighan. This means that all properly written `awk' programs should
-work with `gawk'. Thus, we usually don't distinguish between `gawk'
-and other `awk' implementations.
-
- Using `awk' allows you to:
-
- * Manage small, personal databases
-
- * Generate reports
-
- * Validate data
-
- * Produce indexes and perform other document preparation tasks
-
- * Experiment with algorithms that you can adapt later to other
- computer languages
-
- In addition, `gawk' provides facilities that make it easy to:
-
- * Extract bits and pieces of data for processing
-
- * Sort data
-
- * Perform simple network communications
-
- This Info file teaches you about the `awk' language and how you can
-use it effectively. You should already be familiar with basic system
-commands, such as `cat' and `ls',(2) as well as basic shell facilities,
-such as input/output (I/O) redirection and pipes.
-
- Implementations of the `awk' language are available for many
-different computing environments. This Info file, while describing the
-`awk' language in general, also describes the particular implementation
-of `awk' called `gawk' (which stands for "GNU awk"). `gawk' runs on a
-broad range of Unix systems, ranging from Intel(R)-architecture
-PC-based computers up through large-scale systems, such as Crays.
-`gawk' has also been ported to Mac OS X, Microsoft Windows (all
-versions) and OS/2 PCs, and VMS. (Some other, obsolete systems to
-which `gawk' was once ported are no longer supported and the code for
-those systems has been removed.)
-
-* Menu:
-
-* History:: The history of `gawk' and
- `awk'.
-* Names:: What name to use to find `awk'.
-* This Manual:: Using this Info file. Includes sample
- input files that you can use.
-* Conventions:: Typographical Conventions.
-* Manual History:: Brief history of the GNU project and this
- Info file.
-* How To Contribute:: Helping to save the world.
-* Acknowledgments:: Acknowledgments.
-
- ---------- Footnotes ----------
-
- (1) The 2008 POSIX standard can be found online at
-`http://www.opengroup.org/onlinepubs/9699919799/'.
-
- (2) These commands are available on POSIX-compliant systems, as well
-as on traditional Unix-based systems. If you are using some other
-operating system, you still need to be familiar with the ideas of I/O
-redirection and pipes.
-
-
-File: gawk.info, Node: History, Next: Names, Up: Preface
-
-History of `awk' and `gawk'
-===========================
-
- Recipe For A Programming Language
-
- 1 part `egrep' 1 part `snobol'
- 2 parts `ed' 3 parts C
-
- Blend all parts well using `lex' and `yacc'. Document minimally
- and release.
-
- After eight years, add another part `egrep' and two more parts C.
- Document very well and release.
-
- The name `awk' comes from the initials of its designers: Alfred V.
-Aho, Peter J. Weinberger and Brian W. Kernighan. The original version
-of `awk' was written in 1977 at AT&T Bell Laboratories. In 1985, a new
-version made the programming language more powerful, introducing
-user-defined functions, multiple input streams, and computed regular
-expressions. This new version became widely available with Unix System
-V Release 3.1 (1987). The version in System V Release 4 (1989) added
-some new features and cleaned up the behavior in some of the "dark
-corners" of the language. The specification for `awk' in the POSIX
-Command Language and Utilities standard further clarified the language.
-Both the `gawk' designers and the original Bell Laboratories `awk'
-designers provided feedback for the POSIX specification.
-
- Paul Rubin wrote the GNU implementation, `gawk', in 1986. Jay
-Fenlason completed it, with advice from Richard Stallman. John Woods
-contributed parts of the code as well. In 1988 and 1989, David
-Trueman, with help from me, thoroughly reworked `gawk' for compatibility
-with the newer `awk'. Circa 1994, I became the primary maintainer.
-Current development focuses on bug fixes, performance improvements,
-standards compliance, and occasionally, new features.
-
- In May of 1997, Ju"rgen Kahrs felt the need for network access from
-`awk', and with a little help from me, set about adding features to do
-this for `gawk'. At that time, he also wrote the bulk of `TCP/IP
-Internetworking with `gawk'' (a separate document, available as part of
-the `gawk' distribution). His code finally became part of the main
-`gawk' distribution with `gawk' version 3.1.
-
- John Haque rewrote the `gawk' internals, in the process providing an
-`awk'-level debugger. This version became available as `gawk' version
-4.0, in 2011.
-
- *Note Contributors::, for a complete list of those who made
-important contributions to `gawk'.
-
-
-File: gawk.info, Node: Names, Next: This Manual, Prev: History, Up: Preface
-
-A Rose by Any Other Name
-========================
-
-The `awk' language has evolved over the years. Full details are
-provided in *note Language History::. The language described in this
-Info file is often referred to as "new `awk'" (`nawk').
-
- Because of this, there are systems with multiple versions of `awk'.
-Some systems have an `awk' utility that implements the original version
-of the `awk' language and a `nawk' utility for the new version. Others
-have an `oawk' version for the "old `awk'" language and plain `awk' for
-the new one. Still others only have one version, which is usually the
-new one.(1)
-
- All in all, this makes it difficult for you to know which version of
-`awk' you should run when writing your programs. The best advice we
-can give here is to check your local documentation. Look for `awk',
-`oawk', and `nawk', as well as for `gawk'. It is likely that you
-already have some version of new `awk' on your system, which is what
-you should use when running your programs. (Of course, if you're
-reading this Info file, chances are good that you have `gawk'!)
-
- Throughout this Info file, whenever we refer to a language feature
-that should be available in any complete implementation of POSIX `awk',
-we simply use the term `awk'. When referring to a feature that is
-specific to the GNU implementation, we use the term `gawk'.
-
- ---------- Footnotes ----------
-
- (1) Often, these systems use `gawk' for their `awk' implementation!
-
-
-File: gawk.info, Node: This Manual, Next: Conventions, Prev: Names, Up: Preface
-
-Using This Book
-===============
-
-The term `awk' refers to a particular program as well as to the
-language you use to tell this program what to do. When we need to be
-careful, we call the language "the `awk' language," and the program
-"the `awk' utility." This Info file explains both how to write
-programs in the `awk' language and how to run the `awk' utility. The
-term "`awk' program" refers to a program written by you in the `awk'
-programming language.
-
- Primarily, this Info file explains the features of `awk' as defined
-in the POSIX standard. It does so in the context of the `gawk'
-implementation. While doing so, it also attempts to describe important
-differences between `gawk' and other `awk' implementations.(1) Finally,
-any `gawk' features that are not in the POSIX standard for `awk' are
-noted.
-
- There are subsections labeled as *Advanced Notes* scattered
-throughout the Info file. They add a more complete explanation of
-points that are relevant, but not likely to be of interest on first
-reading. All appear in the index, under the heading "advanced
-features."
-
- Most of the time, the examples use complete `awk' programs. Some of
-the more advanced sections show only the part of the `awk' program that
-illustrates the concept currently being described.
-
- While this Info file is aimed principally at people who have not been
-exposed to `awk', there is a lot of information here that even the `awk'
-expert should find useful. In particular, the description of POSIX
-`awk' and the example programs in *note Library Functions::, and in
-*note Sample Programs::, should be of interest.
-
- *note Getting Started::, provides the essentials you need to know to
-begin using `awk'.
-
- *note Invoking Gawk::, describes how to run `gawk', the meaning of
-its command-line options, and how it finds `awk' program source files.
-
- *note Regexp::, introduces regular expressions in general, and in
-particular the flavors supported by POSIX `awk' and `gawk'.
-
- *note Reading Files::, describes how `awk' reads your data. It
-introduces the concepts of records and fields, as well as the `getline'
-command. I/O redirection is first described here. Network I/O is also
-briefly introduced here.
-
- *note Printing::, describes how `awk' programs can produce output
-with `print' and `printf'.
-
- *note Expressions::, describes expressions, which are the basic
-building blocks for getting most things done in a program.
-
- *note Patterns and Actions::, describes how to write patterns for
-matching records, actions for doing something when a record is matched,
-and the built-in variables `awk' and `gawk' use.
-
- *note Arrays::, covers `awk''s one-and-only data structure:
-associative arrays. Deleting array elements and whole arrays is also
-described, as well as sorting arrays in `gawk'. It also describes how
-`gawk' provides arrays of arrays.
-
- *note Functions::, describes the built-in functions `awk' and `gawk'
-provide, as well as how to define your own functions.
-
- *note Internationalization::, describes special features in `gawk'
-for translating program messages into different languages at runtime.
-
- *note Advanced Features::, describes a number of `gawk'-specific
-advanced features. Of particular note are the abilities to have
-two-way communications with another process, perform TCP/IP networking,
-and profile your `awk' programs.
-
- *note Library Functions::, and *note Sample Programs::, provide many
-sample `awk' programs. Reading them allows you to see `awk' solving
-real problems.
-
- *note Debugger::, describes the `awk' debugger.
-
- *note Language History::, describes how the `awk' language has
-evolved since its first release to the present. It also describes how
-`gawk' has acquired features over time.
-
- *note Installation::, describes how to get `gawk', how to compile it
-on POSIX-compatible systems, and how to compile and use it on different
-non-POSIX systems. It also describes how to report bugs in `gawk' and
-where to get other freely available `awk' implementations.
-
- *note Notes::, describes how to disable `gawk''s extensions, as well
-as how to contribute new code to `gawk', how to write extension
-libraries, and some possible future directions for `gawk' development.
-
- *note Basic Concepts::, provides some very cursory background
-material for those who are completely unfamiliar with computer
-programming. Also centralized there is a discussion of some of the
-issues surrounding floating-point numbers.
-
- The *note Glossary::, defines most, if not all, of the significant
-terms used throughout the book. If you find terms that you aren't
-familiar with, try looking them up here.
-
- *note Copying::, and *note GNU Free Documentation License::, present
-the licenses that cover the `gawk' source code and this Info file,
-respectively.
-
- ---------- Footnotes ----------
-
- (1) All such differences appear in the index under the entry
-"differences in `awk' and `gawk'."
-
-
-File: gawk.info, Node: Conventions, Next: Manual History, Prev: This Manual, Up: Preface
-
-Typographical Conventions
-=========================
-
-This Info file is written in Texinfo (http://texinfo.org), the GNU
-documentation formatting language. A single Texinfo source file is
-used to produce both the printed and online versions of the
-documentation. This minor node briefly documents the typographical
-conventions used in Texinfo.
-
- Examples you would type at the command-line are preceded by the
-common shell primary and secondary prompts, `$' and `>'. Input that
-you type is shown `like this'. Output from the command is preceded by
-the glyph "-|". This typically represents the command's standard
-output. Error messages, and other output on the command's standard
-error, are preceded by the glyph "error-->". For example:
-
- $ echo hi on stdout
- -| hi on stdout
- $ echo hello on stderr 1>&2
- error--> hello on stderr
-
- Characters that you type at the keyboard look `like this'. In
-particular, there are special characters called "control characters."
-These are characters that you type by holding down both the `CONTROL'
-key and another key, at the same time. For example, a `Ctrl-d' is typed
-by first pressing and holding the `CONTROL' key, next pressing the `d'
-key and finally releasing both keys.
-
-Dark Corners
-............
-
- Dark corners are basically fractal -- no matter how much you
- illuminate, there's always a smaller but darker one.
- Brian Kernighan
-
- Until the POSIX standard (and `GAWK: Effective AWK Programming'),
-many features of `awk' were either poorly documented or not documented
-at all. Descriptions of such features (often called "dark corners")
-are noted in this Info file with "(d.c.)". They also appear in the
-index under the heading "dark corner."
-
- As noted by the opening quote, though, any coverage of dark corners
-is, by definition, incomplete.
-
- Extensions to the standard `awk' language that are supported by more
-than one `awk' implementation are marked "(c.e.)," and listed in the
-index under "common extensions" and "extensions, common."
-
-
-File: gawk.info, Node: Manual History, Next: How To Contribute, Prev: Conventions, Up: Preface
-
-The GNU Project and This Book
-=============================
-
-The Free Software Foundation (FSF) is a nonprofit organization dedicated
-to the production and distribution of freely distributable software.
-It was founded by Richard M. Stallman, the author of the original Emacs
-editor. GNU Emacs is the most widely used version of Emacs today.
-
- The GNU(1) Project is an ongoing effort on the part of the Free
-Software Foundation to create a complete, freely distributable,
-POSIX-compliant computing environment. The FSF uses the "GNU General
-Public License" (GPL) to ensure that their software's source code is
-always available to the end user. A copy of the GPL is included for
-your reference (*note Copying::). The GPL applies to the C language
-source code for `gawk'. To find out more about the FSF and the GNU
-Project online, see the GNU Project's home page (http://www.gnu.org).
-This Info file may also be read from their web site
-(http://www.gnu.org/software/gawk/manual/).
-
- A shell, an editor (Emacs), highly portable optimizing C, C++, and
-Objective-C compilers, a symbolic debugger and dozens of large and
-small utilities (such as `gawk') have all been completed and are
-freely available. The GNU operating system kernel (the HURD) has been
-released but remains in an early stage of development.
-
- Until the GNU operating system is more fully developed, you should
-consider using GNU/Linux, a freely distributable, Unix-like operating
-system for Intel(R), Power Architecture, Sun SPARC, IBM S/390, and other
-systems.(2) Many GNU/Linux distributions are available for download
-from the Internet.
-
- (There are numerous other freely available, Unix-like operating
-systems based on the Berkeley Software Distribution, and some of them
-use recent versions of `gawk' for their versions of `awk'. NetBSD
-(http://www.netbsd.org), FreeBSD (http://www.freebsd.org), and OpenBSD
-(http://www.openbsd.org) are three of the most popular ones, but there
-are others.)
-
- The Info file itself has gone through a number of previous editions.
-Paul Rubin wrote the very first draft of `The GAWK Manual'; it was
-around 40 pages in size. Diane Close and Richard Stallman improved it,
-yielding a version that was around 90 pages long and barely described
-the original, "old" version of `awk'.
-
- I started working with that version in the fall of 1988. As work on
-it progressed, the FSF published several preliminary versions (numbered
-0.X). In 1996, Edition 1.0 was released with `gawk' 3.0.0. The FSF
-published the first two editions under the title `The GNU Awk User's
-Guide'.
-
- This edition maintains the basic structure of the previous editions.
-For Edition 4.0, the content has been thoroughly reviewed and updated.
-All references to versions prior to 4.0 have been removed. Of
-significant note for this edition is *note Debugger::.
-
- `GAWK: Effective AWK Programming' will undoubtedly continue to
-evolve. An electronic version comes with the `gawk' distribution from
-the FSF. If you find an error in this Info file, please report it!
-*Note Bugs::, for information on submitting problem reports
-electronically.
-
- ---------- Footnotes ----------
-
- (1) GNU stands for "GNU's not Unix."
-
- (2) The terminology "GNU/Linux" is explained in the *note Glossary::.
-
-
-File: gawk.info, Node: How To Contribute, Next: Acknowledgments, Prev: Manual History, Up: Preface
-
-How to Contribute
-=================
-
-As the maintainer of GNU `awk', I once thought that I would be able to
-manage a collection of publicly available `awk' programs, and I even
-solicited contributions. Making things available on the Internet helps
-keep the `gawk' distribution down to a manageable size.
-
- The initial collection of material, such as it is, is still available
-at `ftp://ftp.freefriends.org/arnold/Awkstuff'. In the hope of doing
-something broader, I acquired the `awk.info' domain.
-
- However, I found that I could not dedicate enough time to managing
-contributed code: the archive did not grow and the domain went unused
-for several years.
-
- Fortunately, late in 2008, a volunteer took on the task of setting up
-an `awk'-related web site--`http://awk.info'--and did a very nice job.
-
- If you have written an interesting `awk' program, or have written a
-`gawk' extension that you would like to share with the rest of the
-world, please see `http://awk.info/?contribute' for how to contribute
-it to the web site.
-
-
-File: gawk.info, Node: Acknowledgments, Prev: How To Contribute, Up: Preface
-
-Acknowledgments
-===============
-
-The initial draft of `The GAWK Manual' had the following
-acknowledgments:
-
- Many people need to be thanked for their assistance in producing
- this manual. Jay Fenlason contributed many ideas and sample
- programs. Richard Mlynarik and Robert Chassell gave helpful
- comments on drafts of this manual. The paper `A Supplemental
- Document for `awk'' by John W. Pierce of the Chemistry Department
- at UC San Diego, pinpointed several issues relevant both to `awk'
- implementation and to this manual, that would otherwise have
- escaped us.
-
- I would like to acknowledge Richard M. Stallman, for his vision of a
-better world and for his courage in founding the FSF and starting the
-GNU Project.
-
- Earlier editions of this Info file had the following
-acknowledgments:
-
- The following people (in alphabetical order) provided helpful
- comments on various versions of this book: Rick Adams, Dr. Nelson
- H.F. Beebe, Karl Berry, Dr. Michael Brennan, Rich Burridge, Claire
- Cloutier, Diane Close, Scott Deifik, Christopher ("Topher") Eliot,
- Jeffrey Friedl, Dr. Darrel Hankerson, Michal Jaegermann, Dr.
- Richard J. LeBlanc, Michael Lijewski, Pat Rankin, Miriam Robbins,
- Mary Sheehan, and Chuck Toporek.
-
- Robert J. Chassell provided much valuable advice on the use of
- Texinfo. He also deserves special thanks for convincing me _not_
- to title this Info file `How To Gawk Politely'. Karl Berry helped
- significantly with the TeX part of Texinfo.
-
- I would like to thank Marshall and Elaine Hartholz of Seattle and
- Dr. Bert and Rita Schreiber of Detroit for large amounts of quiet
- vacation time in their homes, which allowed me to make significant
- progress on this Info file and on `gawk' itself.
-
- Phil Hughes of SSC contributed in a very important way by loaning
- me his laptop GNU/Linux system, not once, but twice, which allowed
- me to do a lot of work while away from home.
-
- David Trueman deserves special credit; he has done a yeoman job of
- evolving `gawk' so that it performs well and without bugs.
- Although he is no longer involved with `gawk', working with him on
- this project was a significant pleasure.
-
- The intrepid members of the GNITS mailing list, and most notably
- Ulrich Drepper, provided invaluable help and feedback for the
- design of the internationalization features.
-
- Chuck Toporek, Mary Sheehan, and Claire Cloutier of O'Reilly &
- Associates contributed significant editorial help for this Info
- file for the 3.1 release of `gawk'.
-
- Dr. Nelson Beebe, Andreas Buening, Antonio Colombo, Stephen Davies,
-Scott Deifik, John H. DuBois III, Darrel Hankerson, Michal Jaegermann,
-Jürgen Kahrs, Dave Pitts, Stepan Kasal, Pat Rankin, Andrew Schorr,
-Corinna Vinschen, Anders Wallin, and Eli Zaretskii (in alphabetical
-order) make up the current `gawk' "crack portability team." Without
-their hard work and help, `gawk' would not be nearly the fine program
-it is today. It has been and continues to be a pleasure working with
-this team of fine people.
-
- John Haque contributed the modifications to convert `gawk' into a
-byte-code interpreter, including the debugger, and the additional
-modifications for support of arbitrary precision arithmetic. Stephen
-Davies contributed to the effort to bring the byte-code changes into
-the mainstream code base. Efraim Yawitz contributed the initial text
-of *note Debugger::. John Haque contributed the initial text of *note
-Arbitrary Precision Arithmetic::.
-
- I would like to thank Brian Kernighan for invaluable assistance
-during the testing and debugging of `gawk', and for ongoing help and
-advice in clarifying numerous points about the language. We could not
-have done nearly as good a job on either `gawk' or its documentation
-without his help.
-
- I must thank my wonderful wife, Miriam, for her patience through the
-many versions of this project, for her proofreading, and for sharing me
-with the computer. I would like to thank my parents for their love,
-and for the grace with which they raised and educated me. Finally, I
-also must acknowledge my gratitude to G-d, for the many opportunities
-He has sent my way, as well as for the gifts He has given me with which
-to take advantage of those opportunities.
-
-
-Arnold Robbins
-Nof Ayalon
-ISRAEL
-March, 2011
-
-
-File: gawk.info, Node: Getting Started, Next: Invoking Gawk, Prev: Preface, Up: Top
-
-1 Getting Started with `awk'
-****************************
-
-The basic function of `awk' is to search files for lines (or other
-units of text) that contain certain patterns. When a line matches one
-of the patterns, `awk' performs specified actions on that line. `awk'
-keeps processing input lines in this way until it reaches the end of
-the input files.
-
- Programs in `awk' are different from programs in most other
-languages, because `awk' programs are "data-driven"; that is, you
-describe the data you want to work with and then what to do when you
-find it. Most other languages are "procedural"; you have to describe,
-in great detail, every step the program is to take. When working with
-procedural languages, it is usually much harder to clearly describe the
-data your program will process. For this reason, `awk' programs are
-often refreshingly easy to read and write.
-
- When you run `awk', you specify an `awk' "program" that tells `awk'
-what to do. The program consists of a series of "rules". (It may also
-contain "function definitions", an advanced feature that we will ignore
-for now. *Note User-defined::.) Each rule specifies one pattern to
-search for and one action to perform upon finding the pattern.
-
- Syntactically, a rule consists of a pattern followed by an action.
-The action is enclosed in curly braces to separate it from the pattern.
-Newlines usually separate rules. Therefore, an `awk' program looks
-like this:
-
- PATTERN { ACTION }
- PATTERN { ACTION }
- ...
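Here is a minimal sketch of the PATTERN { ACTION } shape. It assumes a POSIX shell with some POSIX `awk' on the search path. It pipes three made-up input lines into a two-rule program. The first rule matches lines that contain `foo'. The second matches lines whose first field is `x' and prints their second field. The variable name `out' is only for illustration.

```shell
# Three sample input lines (made up for this sketch) piped into a
# two-rule awk program; each rule has the form PATTERN { ACTION }.
out=$(printf '%s\n' 'foo bar' 'x 42' 'baz qux' |
  awk '/foo/     { print "matched:", $0 }
       $1 == "x" { print $2 }')
printf '%s\n' "$out"
```

`foo bar' satisfies only the first rule and `x 42' only the second. `baz qux' matches neither rule, so it produces no output.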
-
-* Menu:
-
-* Running gawk:: How to run `gawk' programs; includes
- command-line syntax.
-* Sample Data Files:: Sample data files for use in the `awk'
- programs illustrated in this Info file.
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example using two
- rules.
-* More Complex:: A more complex example.
-* Statements/Lines:: Subdividing or combining statements into
- lines.
-* Other Features:: Other Features of `awk'.
-* When:: When to use `gawk' and when to use
- other things.
-
-
-File: gawk.info, Node: Running gawk, Next: Sample Data Files, Up: Getting Started
-
-1.1 How to Run `awk' Programs
-=============================
-
-There are several ways to run an `awk' program. If the program is
-short, it is easiest to include it in the command that runs `awk', like
-this:
-
- awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
-
- When the program is long, it is usually more convenient to put it in
-a file and run it with a command like this:
-
- awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ...
-
- This minor node discusses both mechanisms, along with several
-variations of each.
-
-* Menu:
-
-* One-shot:: Running a short throwaway `awk'
- program.
-* Read Terminal:: Using no input files (input from terminal
- instead).
-* Long:: Putting permanent `awk' programs in
- files.
-* Executable Scripts:: Making self-contained `awk' programs.
-* Comments:: Adding documentation to `gawk'
- programs.
-* Quoting:: More discussion of shell quoting issues.
-
-
-File: gawk.info, Node: One-shot, Next: Read Terminal, Up: Running gawk
-
-1.1.1 One-Shot Throwaway `awk' Programs
----------------------------------------
-
-Once you are familiar with `awk', you will often type in simple
-programs the moment you want to use them. Then you can write the
-program as the first argument of the `awk' command, like this:
-
- awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
-
-where PROGRAM consists of a series of PATTERNS and ACTIONS, as
-described earlier.
-
- This command format instructs the "shell", or command interpreter,
-to start `awk' and use the PROGRAM to process records in the input
-file(s). There are single quotes around PROGRAM so the shell won't
-interpret any `awk' characters as special shell characters. The quotes
-also cause the shell to treat all of PROGRAM as a single argument for
-`awk', and allow PROGRAM to be more than one line long.
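The following small sketch shows why the single quotes matter. It assumes a POSIX shell, and `shell-arg' is an arbitrary placeholder value. It compares what `awk' receives when `$1' is single-quoted with what it receives when `$1' is double-quoted. In the double-quoted case the shell substitutes its own first positional parameter before `awk' ever runs. For that reason the sketch only builds that program string and does not execute it.

```shell
# Give the shell a first positional parameter so the difference shows.
set -- shell-arg

# Single quotes: the shell passes $1 through untouched, so awk sees
# its own field reference and prints the first field, "a".
single=$(echo 'a b' | awk '{ print $1 }')

# Double quotes: the shell expands $1 first, so the program text
# that awk would receive is "{ print shell-arg }".
expanded="{ print $1 }"

printf '%s\n' "$single" "$expanded"
```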
-
- This format is also useful for running short or medium-sized `awk'
-programs from shell scripts, because it avoids the need for a separate
-file for the `awk' program. A self-contained shell script is more
-reliable because there are no other files to misplace.
-
- *note Very Simple::, presents several short, self-contained programs.
-
-
-File: gawk.info, Node: Read Terminal, Next: Long, Prev: One-shot, Up: Running gawk
-
-1.1.2 Running `awk' Without Input Files
----------------------------------------
-
-You can also run `awk' without any input files. If you type the
-following command line:
-
- awk 'PROGRAM'
-
-`awk' applies the PROGRAM to the "standard input", which usually means
-whatever you type on the terminal. This continues until you indicate
-end-of-file by typing `Ctrl-d'. (On other operating systems, the
-end-of-file character may be different. For example, on OS/2, it is
-`Ctrl-z'.)
-
- As an example, the following program prints a friendly piece of
-advice (from Douglas Adams's `The Hitchhiker's Guide to the Galaxy'),
-to keep you from worrying about the complexities of computer
-programming(1) (`BEGIN' is a feature we haven't discussed yet):
-
- $ awk "BEGIN { print \"Don't Panic!\" }"
- -| Don't Panic!
-
- This program does not read any input. The `\' before each of the
-inner double quotes is necessary because of the shell's quoting
-rules--in particular because it mixes both single quotes and double
-quotes.(2)
-
- This next simple `awk' program emulates the `cat' utility; it copies
-whatever you type on the keyboard to its standard output (why this
-works is explained shortly).
-
- $ awk '{ print }'
- Now is the time for all good men
- -| Now is the time for all good men
- to come to the aid of their country.
- -| to come to the aid of their country.
- Four score and seven years ago, ...
- -| Four score and seven years ago, ...
- What, me worry?
- -| What, me worry?
- Ctrl-d
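The same `cat'-like program also works when standard input comes from a pipe instead of the keyboard. In that case the end of the pipe plays the role of `Ctrl-d'. This is a minimal sketch, assuming a POSIX shell and arbitrary sample lines:

```shell
# printf writes two lines into the pipe; when it exits, awk sees
# end-of-file just as if Ctrl-d had been typed at the terminal.
out=$(printf '%s\n' 'Now is the time' 'What, me worry?' | awk '{ print }')
printf '%s\n' "$out"
```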
-
- ---------- Footnotes ----------
-
- (1) If you use Bash as your shell, you should execute the command
-`set +H' before running this program interactively, to disable the C
-shell-style command history, which treats `!' as a special character.
-We recommend putting this command into your personal startup file.
-
- (2) Although we generally recommend the use of single quotes around
-the program text, double quotes are needed here in order to put the
-single quote into the message.
-
-
-File: gawk.info, Node: Long, Next: Executable Scripts, Prev: Read Terminal, Up: Running gawk
-
-1.1.3 Running Long Programs
----------------------------
-
-Sometimes your `awk' programs can be very long. In this case, it is
-more convenient to put the program into a separate file. In order to
-tell `awk' to use that file for its program, you type:
-
- awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ...
-
- The `-f' instructs the `awk' utility to get the `awk' program from
-the file SOURCE-FILE. Any file name can be used for SOURCE-FILE. For
-example, you could put the program:
-
- BEGIN { print "Don't Panic!" }
-
-into the file `advice'. Then this command:
-
- awk -f advice
-
-does the same thing as this one:
-
- awk "BEGIN { print \"Don't Panic!\" }"
-
-This was explained earlier (*note Read Terminal::). Note that you
-don't usually need single quotes around the file name that you specify
-with `-f', because most file names don't contain any of the shell's
-special characters. Notice that in `advice', the `awk' program did not
-have single quotes around it. The quotes are only needed for programs
-that are provided on the `awk' command line.
-
- If you want to clearly identify your `awk' program files as such,
-you can add the extension `.awk' to the file name. This doesn't affect
-the execution of the `awk' program but it does make "housekeeping"
-easier.
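This is a sketch of the `-f' workflow. It assumes a POSIX shell plus the widely available (but not POSIX-mandated) `mktemp' utility for a scratch directory. The here-document delimiter is quoted ('EOF'), so the shell leaves the program text alone. As a result, the apostrophe in `Don't' needs no escaping inside the file.

```shell
# Scratch directory for the program file (mktemp -d is assumed available).
tmpdir=$(mktemp -d)

# Write the program into advice.awk; the quoted 'EOF' delimiter stops
# the shell from expanding anything in the program text.
cat > "$tmpdir/advice.awk" <<'EOF'
BEGIN { print "Don't Panic!" }
EOF

# -f tells awk to read its program from the named file.
out=$(awk -f "$tmpdir/advice.awk")
printf '%s\n' "$out"

rm -rf "$tmpdir"
```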
-
-
-File: gawk.info, Node: Executable Scripts, Next: Comments, Prev: Long, Up: Running gawk
-
-1.1.4 Executable `awk' Programs
--------------------------------
-
-Once you have learned `awk', you may want to write self-contained `awk'
-scripts, using the `#!' script mechanism. You can do this on many
-systems.(1) For example, you could update the file `advice' to look
-like this:
-
- #! /bin/awk -f
-
- BEGIN { print "Don't Panic!" }
-
-After making this file executable (with the `chmod' utility), simply
-type `advice' at the shell and the system arranges to run `awk'(2) as
-if you had typed `awk -f advice':
-
- $ chmod +x advice
- $ advice
- -| Don't Panic!
-
-(We assume you have the current directory in your shell's search path
-variable [typically `$PATH']. If not, you may need to type `./advice'
-at the shell.)
-
- Self-contained `awk' scripts are useful when you want to write a
-program that users can invoke without their having to know that the
-program is written in `awk'.
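`awk' may live in `/usr/bin' rather than `/bin', as the footnote on `#!' notes. A sketch that builds such a script can therefore look up the interpreter path with `command -v awk' instead of hard-coding it. This assumes a POSIX shell, `mktemp', and a temporary directory that permits executing files.

```shell
# Find the awk on the search path instead of assuming /bin/awk.
awk_path=$(command -v awk)
tmpdir=$(mktemp -d)

# Unquoted EOF: the shell expands $awk_path into the #! line; the
# rest of the program text contains nothing the shell would expand.
cat > "$tmpdir/advice" <<EOF
#! $awk_path -f

BEGIN { print "Don't Panic!" }
EOF

chmod +x "$tmpdir/advice"
# Run it by path, so the current directory need not be in $PATH.
out=$("$tmpdir/advice")
printf '%s\n' "$out"

rm -rf "$tmpdir"
```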
-
-Advanced Notes: Portability Issues with `#!'
---------------------------------------------
-
-Some systems limit the length of the interpreter name to 32 characters.
-Often, this can be dealt with by using a symbolic link.
-
- You should not put more than one argument on the `#!' line after the
-path to `awk'. It does not work. The operating system treats the rest
-of the line as a single argument and passes it to `awk'. Doing this
-leads to confusing behavior--most likely a usage diagnostic of some
-sort from `awk'.
-
- Finally, the value of `ARGV[0]' (*note Built-in Variables::) varies
-depending upon your operating system. Some systems put `awk' there,
-some put the full pathname of `awk' (such as `/bin/awk'), and some put
-the name of your script (`advice'). (d.c.) Don't rely on the value of
-`ARGV[0]' to provide your script name.
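`ARGV[0]' cannot be relied upon, so a sketch that inspects its arguments should start at index 1. The following assumes a POSIX shell and `awk', and prints each ordinary command-line argument. The program has only a `BEGIN' rule, so `awk' never tries to open `first' or `second' as files.

```shell
# Loop over ARGV[1] .. ARGV[ARGC-1]; ARGV[0] is deliberately skipped
# because its contents vary between systems.
out=$(awk 'BEGIN {
    for (i = 1; i < ARGC; i++)
        print i, ARGV[i]
}' first second)
printf '%s\n' "$out"
```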
-
- ---------- Footnotes ----------
-
- (1) The `#!' mechanism works on GNU/Linux systems, BSD-based systems
-and commercial Unix systems.
-
- (2) The line beginning with `#!' lists the full file name of an
-interpreter to run and an optional initial command-line argument to
-pass to that interpreter. The operating system then runs the
-interpreter with the given argument and the full argument list of the
-executed program. The first argument in the list is the full file name
-of the `awk' program. The rest of the argument list contains either
-options to `awk', or data files, or both. Note that on many systems
-`awk' may be found in `/usr/bin' instead of in `/bin'. Caveat Emptor.
-
-
-File: gawk.info, Node: Comments, Next: Quoting, Prev: Executable Scripts, Up: Running gawk
-
-1.1.5 Comments in `awk' Programs
---------------------------------
-
-A "comment" is some text that is included in a program for the sake of
-human readers; it is not really an executable part of the program.
-Comments can explain what the program does and how it works. Nearly all
-programming languages have provisions for comments, as programs are
-typically hard to understand without them.
-
- In the `awk' language, a comment starts with the sharp sign
-character (`#') and continues to the end of the line. The `#' does not
-have to be the first character on the line. The `awk' language ignores
-the rest of a line following a sharp sign. For example, we could have
-put the following into `advice':
-
- # This program prints a nice friendly message. It helps
- # keep novice users from being afraid of the computer.
- BEGIN { print "Don't Panic!" }
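A comment can also share a line with code. This minimal sketch assumes a POSIX shell. Both kinds of comment sit inside a single-quoted command-line program, so they deliberately contain no apostrophes.

```shell
# The awk comments inside the single quotes avoid apostrophes on purpose.
out=$(echo 'any input line' | awk '
# Print a fixed greeting for every input record.
{ print "hello" }   # a comment may follow code on the same line
')
printf '%s\n' "$out"
```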
-
- You can put comment lines into keyboard-composed throwaway `awk'
-programs, but this usually isn't very useful; the purpose of a comment
-is to help you or another person understand the program when reading it
-at a later time.
-
- CAUTION: As mentioned in *note One-shot::, you can enclose small
- to medium programs in single quotes, in order to keep your shell
- scripts self-contained. When doing so, _don't_ put an apostrophe
- (i.e., a single quote) into a comment (or anywhere else in your
- program). The shell interprets the quote as the closing quote for
- the entire program. As a result, usually the shell prints a
- message about mismatched quotes, and if `awk' actually runs, it
- will probably print strange messages about syntax errors. For
- example, look at the following:
-
- $ awk '{ print "hello" } # let's be cute'
- >
-
- The shell sees that the first two quotes match, and that a new
- quoted object begins at the end of the command line. It therefore
- prompts with the secondary prompt, waiting for more input. With
- Unix `awk', closing the quoted string produces this result:
-
- $ awk '{ print "hello" } # let's be cute'
- > '
- error--> awk: can't open file be
- error--> source line number 1
-
- Putting a backslash before the single quote in `let's' wouldn't
- help, since backslashes are not special inside single quotes. The
- next node describes the shell's quoting rules.
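One way around the problem is to move the program into a file and run it with `-f'. The apostrophe in `let's' is then just part of an `awk' comment, and the shell never sees it. This sketch assumes a POSIX shell and `mktemp'.

```shell
tmpdir=$(mktemp -d)

# Quoted 'EOF' delimiter: the file receives the program text verbatim,
# apostrophe included.
cat > "$tmpdir/hello.awk" <<'EOF'
{ print "hello" } # let's be cute
EOF

out=$(echo 'one line of input' | awk -f "$tmpdir/hello.awk")
printf '%s\n' "$out"

rm -rf "$tmpdir"
```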
-
-
-File: gawk.info, Node: Quoting, Prev: Comments, Up: Running gawk
-
-1.1.6 Shell-Quoting Issues
---------------------------
-
-* Menu:
-
-* DOS Quoting:: Quoting in Windows Batch Files.
-
- For short to medium length `awk' programs, it is most convenient to
-enter the program on the `awk' command line. This is best done by
-enclosing the entire program in single quotes. This is true whether
-you are entering the program interactively at the shell prompt, or
-writing it as part of a larger shell script:
-
- awk 'PROGRAM TEXT' INPUT-FILE1 INPUT-FILE2 ...
-
- Once you are working with the shell, it is helpful to have a basic
-knowledge of shell quoting rules. The following rules apply only to
-POSIX-compliant, Bourne-style shells (such as Bash, the GNU Bourne-Again
-Shell). If you use the C shell, you're on your own.
-
- * Quoted items can be concatenated with nonquoted items as well as
- with other quoted items. The shell turns everything into one
- argument for the command.
-
- * Preceding any single character with a backslash (`\') quotes that
- character. The shell removes the backslash and passes the quoted
- character on to the command.
-
- * Single quotes protect everything between the opening and closing
- quotes. The shell does no interpretation of the quoted text,
- passing it on verbatim to the command. It is _impossible_ to
- embed a single quote inside single-quoted text. Refer back to
- *note Comments::, for an example of what happens if you try.
-
- * Double quotes protect most things between the opening and closing
- quotes. The shell does at least variable and command substitution
- on the quoted text. Different shells may do additional kinds of
- processing on double-quoted text.
-
- Since certain characters within double-quoted text are processed
- by the shell, they must be "escaped" within the text. Of note are
- the characters `$', ``', `\', and `"', all of which must be
- preceded by a backslash within double-quoted text if they are to
- be passed on literally to the program. (The leading backslash is
- stripped first.) Thus, the example seen in *note Read Terminal::,
- is applicable:
-
- $ awk "BEGIN { print \"Don't Panic!\" }"
- -| Don't Panic!
-
- Note that the single quote is not special within double quotes.
-
- * Null strings are removed when they occur as part of a non-null
- command-line argument, while explicit non-null objects are kept.
- For example, to specify that the field separator `FS' should be
- set to the null string, use:
-
- awk -F "" 'PROGRAM' FILES # correct
-
- Don't use this:
-
- awk -F"" 'PROGRAM' FILES # wrong!
-
- In the second case, `awk' will attempt to use the text of the
- program as the value of `FS', and the first file name as the text
- of the program! This results in syntax errors at best, and
- confusing behavior at worst.
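-
- To see what the shell actually hands to `awk' in the two cases above,
-you can load the same words into the shell's positional parameters with
-`set --' and count them. This is an illustrative sketch for a POSIX
-shell; the variable names are made up for the example:
-
```shell
# `set --' replaces the positional parameters, so $# and $1..$n
# show exactly which words the shell would pass to a command.

# Correct form: -F and "" are separate words, so the null string
# survives as an argument of its own.
set -- -F "" 'PROGRAM' FILES
correct_count=$#      # 4 words: -F, (empty), PROGRAM, FILES
correct_fs=$2         # the empty field-separator argument

# Wrong form: "" is glued onto -F, so the shell drops it and the
# word is just -F. awk would then take PROGRAM as the value of FS.
set -- -F"" 'PROGRAM' FILES
wrong_count=$#        # 3 words: -F, PROGRAM, FILES
wrong_first=$1        # -F

echo "correct: $correct_count args, wrong: $wrong_count args"
```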
-
- Mixing single and double quotes is difficult. You have to resort to
-shell quoting tricks, like this:
-
- $ awk 'BEGIN { print "Here is a single quote <'"'"'>" }'
- -| Here is a single quote <'>
-
-This program consists of three concatenated quoted strings. The first
-and the third are single-quoted; the second is double-quoted.
-
- This can be "simplified" to:
-
- $ awk 'BEGIN { print "Here is a single quote <'\''>" }'
- -| Here is a single quote <'>
-
-Judge for yourself which of these two is the more readable.
-
- Another option is to use double quotes, escaping the embedded,
-`awk'-level double quotes:
-
- $ awk "BEGIN { print \"Here is a single quote <'>\" }"
- -| Here is a single quote <'>
-
-This option is also painful, because double quotes, backslashes, and
-dollar signs are very common in more advanced `awk' programs.
-
- A third option is to use the octal escape sequence equivalents
-(*note Escape Sequences::) for the single- and double-quote characters,
-like so:
-
- $ awk 'BEGIN { print "Here is a single quote <\47>" }'
- -| Here is a single quote <'>
- $ awk 'BEGIN { print "Here is a double quote <\42>" }'
- -| Here is a double quote <">
-
-This works nicely, except that you should comment clearly what the
-escapes mean.
-
- A fourth option is to use command-line variable assignment, like
-this:
-
- $ awk -v sq="'" 'BEGIN { print "Here is a single quote <" sq ">" }'
- -| Here is a single quote <'>
-
- If you really need both single and double quotes in your `awk'
-program, it is probably best to move it into a separate file, where the
-shell won't be part of the picture, and you can say what you mean.
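-
- Here is a minimal sketch of that approach, assuming a POSIX shell and
-the widely available (but not POSIX-mandated) `mktemp' utility; the
-`prog' and `quote_demo' variable names are illustrative. Quoting the
-here-document delimiter turns off all shell processing of the program
-text, which is then run with `-f':
-
```shell
# Write the program to a temporary file. Quoting the here-document
# delimiter ('EOF') turns off all shell processing of the body, so
# single quotes, double quotes, and dollar signs appear as written.
prog=$(mktemp) || exit 1
cat > "$prog" <<'EOF'
BEGIN { print "single <'>, double <\">, dollar <$>" }
EOF

# Run the program with -f: the shell never sees its text.
quote_demo=$(awk -f "$prog")
rm -f "$prog"

printf '%s\n' "$quote_demo"
```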
-
-
-File: gawk.info, Node: DOS Quoting, Up: Quoting
-
-1.1.6.1 Quoting in MS-Windows Batch Files
-.........................................
-
-Although this Info file generally only worries about POSIX systems and
-the POSIX shell, the following issue arises often enough for many users
-that it is worth addressing.
-
- The "shells" on Microsoft Windows systems use the double-quote
-character for quoting, and make it difficult or impossible to include an
-escaped double-quote character in a command-line script. The following
-example, courtesy of Jeroen Brink, shows how to print all lines in a
-file surrounded by double quotes:
-
- gawk "{ print \"\042\" $0 \"\042\" }" FILE
-
-
-File: gawk.info, Node: Sample Data Files, Next: Very Simple, Prev: Running gawk, Up: Getting Started
-
-1.2 Data Files for the Examples
-===============================
-
-Many of the examples in this Info file take their input from two sample
-data files. The first, `BBS-list', represents a list of computer
-bulletin board systems together with information about those systems.
-The second data file, called `inventory-shipped', contains information
-about monthly shipments. In both files, each line is considered to be
-one "record".
-
- In the data file `BBS-list', each record contains the name of a
-computer bulletin board, its phone number, the board's baud rate(s),
-and a code for the number of hours it is operational. An `A' in the
-last column means the board operates 24 hours a day. A `B' in the last
-column means the board only operates on evening and weekend hours. A
-`C' means the board operates only on weekends:
-
- aardvark 555-5553 1200/300 B
- alpo-net 555-3412 2400/1200/300 A
- barfly 555-7685 1200/300 A
- bites 555-1675 2400/1200/300 A
- camelot 555-0542 300 C
- core 555-2912 1200/300 C
- fooey 555-1234 2400/1200/300 B
- foot 555-6699 1200/300 B
- macfoo 555-6480 1200/300 A
- sdace 555-3430 2400/1200/300 A
- sabafoo 555-2127 1200/300 C
-
- The data file `inventory-shipped' represents information about
-shipments during the year. Each record contains the month, the number
-of green crates shipped, the number of red boxes shipped, the number of
-orange bags shipped, and the number of blue packages shipped,
-respectively. There are 16 entries, covering the 12 months of last year
-and the first four months of the current year.
-
- Jan 13 25 15 115
- Feb 15 32 24 226
- Mar 15 24 34 228
- Apr 31 52 63 420
- May 16 34 29 208
- Jun 31 42 75 492
- Jul 24 34 67 436
- Aug 15 34 47 316
- Sep 13 55 37 277
- Oct 29 54 68 525
- Nov 20 87 82 577
- Dec 17 35 61 401
-
- Jan 21 36 64 620
- Feb 26 58 80 652
- Mar 24 75 70 495
- Apr 21 70 74 514
-
- If you are reading this in GNU Emacs using Info, you can copy the
-regions of text showing these sample files into your own test files.
-This way you can try out the examples shown in the remainder of this
-document. You do this by using the command `M-x write-region' to copy
-text from the Info file into a file for use with `awk' (*Note
-Miscellaneous File Operations: (emacs)Misc File Ops, for more
-information). Using this information, create your own `BBS-list' and
-`inventory-shipped' files and practice what you learn in this Info file.
-
- If you are using the stand-alone version of Info, see *note Extract
-Program::, for an `awk' program that extracts these data files from
-`gawk.texi', the Texinfo source file for this Info file.
-
-
-File: gawk.info, Node: Very Simple, Next: Two Rules, Prev: Sample Data Files, Up: Getting Started
-
-1.3 Some Simple Examples
-========================
-
-The following command runs a simple `awk' program that searches the
-input file `BBS-list' for the character string `foo' (a grouping of
-characters is usually called a "string"; the term "string" is based on
-similar usage in English, such as "a string of pearls," or "a string of
-cars in a train"):
-
- awk '/foo/ { print $0 }' BBS-list
-
-When lines containing `foo' are found, they are printed because
-`print $0' means print the current line. (Just `print' by itself means
-the same thing, so we could have written that instead.)
-
- You will notice that slashes (`/') surround the string `foo' in the
-`awk' program. The slashes indicate that `foo' is the pattern to
-search for. This type of pattern is called a "regular expression",
-which is covered in more detail later (*note Regexp::). The pattern is
-allowed to match parts of words. There are single quotes around the
-`awk' program so that the shell won't interpret any of it as special
-shell characters.
-
- Here is what this program prints:
-
- $ awk '/foo/ { print $0 }' BBS-list
- -| fooey 555-1234 2400/1200/300 B
- -| foot 555-6699 1200/300 B
- -| macfoo 555-6480 1200/300 A
- -| sabafoo 555-2127 1200/300 C
-
- In an `awk' rule, either the pattern or the action can be omitted,
-but not both. If the pattern is omitted, then the action is performed
-for _every_ input line. If the action is omitted, the default action
-is to print all lines that match the pattern.
-
- Thus, we could leave out the action (the `print' statement and the
-curly braces) in the previous example and the result would be the same:
-`awk' prints all lines matching the pattern `foo'. By comparison,
-omitting the `print' statement but retaining the curly braces makes an
-empty action that does nothing (i.e., no lines are printed).
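-
- The following sketch compares the three forms on a few records taken
-from `BBS-list', fed to `awk' with `printf' (the shell variable names
-are illustrative). The first two commands print the same two lines;
-the third prints nothing:
-
```shell
# A few records from BBS-list, one per line.
records='fooey 555-1234 2400/1200/300 B
barfly 555-7685 1200/300 A
macfoo 555-6480 1200/300 A'

# Pattern and explicit action.
with_action=$(printf '%s\n' "$records" | awk '/foo/ { print $0 }')

# Pattern only: the default action prints each matching record.
pattern_only=$(printf '%s\n' "$records" | awk '/foo/')

# Pattern with an empty action: records match, but nothing is printed.
empty_action=$(printf '%s\n' "$records" | awk '/foo/ { }')

printf '%s\n' "$pattern_only"
```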
-
- Many practical `awk' programs are just a line or two. Following is a
-collection of useful, short programs to get you started. Some of these
-programs contain constructs that haven't been covered yet. (The
-description of the program will give you a good idea of what is going
-on, but please read the rest of the Info file to become an `awk'
-expert!) Most of the examples use a data file named `data'. This is
-just a placeholder; if you use these programs yourself, substitute your
-own file names for `data'. For future reference, note that there is
-often more than one way to do things in `awk'. At some point, you may
-want to look back at these examples and see if you can come up with
-different ways to do the same things shown here:
-
- * Print the length of the longest input line:
-
- awk '{ if (length($0) > max) max = length($0) }
- END { print max }' data
-
- * Print every line that is longer than 80 characters:
-
- awk 'length($0) > 80' data
-
- The sole rule has a relational expression as its pattern and it
- has no action--so the default action, printing the record, is used.
-
- * Print the length of the longest line in `data':
-
- expand data | awk '{ if (x < length()) x = length() }
- END { print "maximum line length is " x }'
-
- The input is processed by the `expand' utility to change TABs into
- spaces, so the widths compared are actually the right-margin
- columns.
-
- * Print every line that has at least one field:
-
- awk 'NF > 0' data
-
- This is an easy way to delete blank lines from a file (or rather,
- to create a new file similar to the old file but from which the
- blank lines have been removed).
-
- * Print seven random numbers from 0 to 100, inclusive:
-
- awk 'BEGIN { for (i = 1; i <= 7; i++)
- print int(101 * rand()) }'
-
- * Print the total number of bytes used by FILES:
-
- ls -l FILES | awk '{ x += $5 }
- END { print "total bytes: " x }'
-
- * Print the total number of kilobytes used by FILES:
-
- ls -l FILES | awk '{ x += $5 }
- END { print "total K-bytes:", x / 1024 }'
-
- * Print a sorted list of the login names of all users:
-
- awk -F: '{ print $1 }' /etc/passwd | sort
-
- * Count the lines in a file:
-
- awk 'END { print NR }' data
-
- * Print the even-numbered lines in the data file:
-
- awk 'NR % 2 == 0' data
-
- If you use the expression `NR % 2 == 1' instead, the program
- prints the odd-numbered lines.
-
-
-File: gawk.info, Node: Two Rules, Next: More Complex, Prev: Very Simple, Up: Getting Started
-
-1.4 An Example with Two Rules
-=============================
-
-The `awk' utility reads the input files one line at a time. For each
-line, `awk' tries the patterns of each of the rules. If several
-patterns match, then several actions are run in the order in which they
-appear in the `awk' program. If no patterns match, then no actions are
-run.
-
- After processing all the rules that match the line (and perhaps
-there are none), `awk' reads the next line. (However, *note Next
-Statement::, and also *note Nextfile Statement::). This continues
-until the program reaches the end of the file. For example, the
-following `awk' program contains two rules:
-
- /12/ { print $0 }
- /21/ { print $0 }
-
-The first rule has the string `12' as the pattern and `print $0' as the
-action. The second rule has the string `21' as the pattern and also
-has `print $0' as the action. Each rule's action is enclosed in its
-own pair of braces.
-
- This program prints every line that contains the string `12' _or_
-the string `21'. If a line contains both strings, it is printed twice,
-once by each rule.
-
- This is what happens if we run this program on our two sample data
-files, `BBS-list' and `inventory-shipped':
-
- $ awk '/12/ { print $0 }
- > /21/ { print $0 }' BBS-list inventory-shipped
- -| aardvark 555-5553 1200/300 B
- -| alpo-net 555-3412 2400/1200/300 A
- -| barfly 555-7685 1200/300 A
- -| bites 555-1675 2400/1200/300 A
- -| core 555-2912 1200/300 C
- -| fooey 555-1234 2400/1200/300 B
- -| foot 555-6699 1200/300 B
- -| macfoo 555-6480 1200/300 A
- -| sdace 555-3430 2400/1200/300 A
- -| sabafoo 555-2127 1200/300 C
- -| sabafoo 555-2127 1200/300 C
- -| Jan 21 36 64 620
- -| Apr 21 70 74 514
-
-Note how the line beginning with `sabafoo' in `BBS-list' was printed
-twice, once for each rule.
-
-
-File: gawk.info, Node: More Complex, Next: Statements/Lines, Prev: Two Rules, Up: Getting Started
-
-1.5 A More Complex Example
-==========================
-
-Now that we've mastered some simple tasks, let's look at what typical
-`awk' programs do. This example shows how `awk' can be used to
-summarize, select, and rearrange the output of another utility. It uses
-features that haven't been covered yet, so don't worry if you don't
-understand all the details:
-
- LC_ALL=C ls -l | awk '$6 == "Nov" { sum += $5 }
- END { print sum }'
-
- This command prints the total number of bytes in all the files in the
-current directory that were last modified in November (of any year).
-The `ls -l' part of this example is a system command that gives you a
-listing of the files in a directory, including each file's size and the
-date the file was last modified. Its output looks like this:
-
- -rw-r--r-- 1 arnold user 1933 Nov 7 13:05 Makefile
- -rw-r--r-- 1 arnold user 10809 Nov 7 13:03 awk.h
- -rw-r--r-- 1 arnold user 983 Apr 13 12:14 awk.tab.h
- -rw-r--r-- 1 arnold user 31869 Jun 15 12:20 awkgram.y
- -rw-r--r-- 1 arnold user 22414 Nov 7 13:03 awk1.c
- -rw-r--r-- 1 arnold user 37455 Nov 7 13:03 awk2.c
- -rw-r--r-- 1 arnold user 27511 Dec 9 13:07 awk3.c
- -rw-r--r-- 1 arnold user 7989 Nov 7 13:03 awk4.c
-
-The first field contains read-write permissions, the second field
-contains the number of links to the file, and the third field
-identifies the owner of the file. The fourth field identifies the group
-of the file. The fifth field contains the size of the file in bytes.
-The sixth, seventh, and eighth fields contain the month, day, and time,
-respectively, that the file was last modified. Finally, the ninth field
-contains the file name.(1)
-
- The `$6 == "Nov"' in our `awk' program is an expression that tests
-whether the sixth field of the output from `ls -l' matches the string
-`Nov'. Each time a line has the string `Nov' for its sixth field, the
-action `sum += $5' is performed. This adds the fifth field (the file's
-size) to the variable `sum'. As a result, when `awk' has finished
-reading all the input lines, `sum' is the total of the sizes of the
-files whose lines matched the pattern. (This works because `awk'
-variables are automatically initialized to zero.)
-
- After the last line of output from `ls' has been processed, the
-`END' rule executes and prints the value of `sum'. In this example,
-the value of `sum' is 80600.
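-
- If you want to check that figure without depending on the files in
-your own directory, you can feed the sample `ls -l' lines shown above
-to the same program through a here-document. This is a sketch for a
-POSIX shell; `nov_total' is an illustrative variable name:
-
```shell
# The sample `ls -l' output from above, supplied through a
# here-document instead of running ls in the current directory.
nov_total=$(awk '$6 == "Nov" { sum += $5 }
                 END { print sum }' <<'EOF'
-rw-r--r--  1 arnold   user   1933 Nov  7 13:05 Makefile
-rw-r--r--  1 arnold   user  10809 Nov  7 13:03 awk.h
-rw-r--r--  1 arnold   user    983 Apr 13 12:14 awk.tab.h
-rw-r--r--  1 arnold   user  31869 Jun 15 12:20 awkgram.y
-rw-r--r--  1 arnold   user  22414 Nov  7 13:03 awk1.c
-rw-r--r--  1 arnold   user  37455 Nov  7 13:03 awk2.c
-rw-r--r--  1 arnold   user  27511 Dec  9 13:07 awk3.c
-rw-r--r--  1 arnold   user   7989 Nov  7 13:03 awk4.c
EOF
)

echo "$nov_total"   # 1933 + 10809 + 22414 + 37455 + 7989
```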
-
- These more advanced `awk' techniques are covered in later sections
-(*note Action Overview::). Before you can move on to more advanced
-`awk' programming, you have to know how `awk' interprets your input and
-displays your output. By manipulating fields and using `print'
-statements, you can produce some very useful and impressive-looking
-reports.
-
- ---------- Footnotes ----------
-
- (1) The `LC_ALL=C' is needed to produce this traditional-style
-output from `ls'.
-
-
-File: gawk.info, Node: Statements/Lines, Next: Other Features, Prev: More Complex, Up: Getting Started
-
-1.6 `awk' Statements Versus Lines
-=================================
-
-Most often, each line in an `awk' program is a separate statement or
-separate rule, like this:
-
- awk '/12/ { print $0 }
- /21/ { print $0 }' BBS-list inventory-shipped
-
- However, `gawk' ignores newlines after any of the following symbols
-and keywords:
-
- , { ? : || && do else
-
-A newline at any other point is considered the end of the statement.(1)
-
- If you would like to split a single statement into two lines at a
-point where a newline would terminate it, you can "continue" it by
-ending the first line with a backslash character (`\'). The backslash
-must be the final character on the line in order to be recognized as a
-continuation character. A backslash is allowed anywhere in the
-statement, even in the middle of a string or regular expression. For
-example:
-
- awk '/This regular expression is too long, so continue it\
- on the next line/ { print $1 }'
-
-We have generally not used backslash continuation in our sample
-programs. `gawk' places no limit on the length of a line, so backslash
-continuation is never strictly necessary; it just makes programs more
-readable. For this same reason, as well as for clarity, we have kept
-most statements short in the sample programs presented throughout the
-Info file. Backslash continuation is most useful when your `awk'
-program is in a separate source file instead of entered from the
-command line. You should also note that many `awk' implementations are
-more particular about where you may use backslash continuation. For
-example, they may not allow you to split a string constant using
-backslash continuation. Thus, for maximum portability of your `awk'
-programs, it is best not to split your lines in the middle of a regular
-expression or a string.
-
- CAUTION: _Backslash continuation does not work as described with
- the C shell._ It works for `awk' programs in files and for
- one-shot programs, _provided_ you are using a POSIX-compliant
- shell, such as the Unix Bourne shell or Bash. But the C shell
- behaves differently! There, you must use two backslashes in a
- row, followed by a newline. Note also that when using the C
- shell, _every_ newline in your `awk' program must be escaped with
- a backslash. To illustrate:
-
- % awk 'BEGIN { \
- ? print \\
- ? "hello, world" \
- ? }'
- -| hello, world
-
- Here, the `%' and `?' are the C shell's primary and secondary
- prompts, analogous to the standard shell's `$' and `>'.
-
- Compare the previous example to how it is done with a
- POSIX-compliant shell:
-
- $ awk 'BEGIN {
- > print \
- > "hello, world"
- > }'
- -| hello, world
-
- `awk' is a line-oriented language. Each rule's action has to begin
-on the same line as the pattern. To have the pattern and action on
-separate lines, you _must_ use backslash continuation; there is no
-other option.
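-
- The effect of the backslash is easy to see by running the same
-pattern and action with and without it. In this sketch (POSIX shell;
-variable names are illustrative), the first command has one rule. The
-second has two: `/foo/' with the default action, and a pattern-less
-action that runs for every input line:
-
```shell
input=$(printf 'foo\nbar\n')

# With the backslash, awk discards the newline, so the pattern and
# the action form a single rule.
one_rule=$(printf '%s\n' "$input" | awk '/foo/ \
{ print "matched: " $0 }')

# Without it, awk sees two rules: /foo/ with the default action,
# and an action with no pattern, which runs for every line.
two_rules=$(printf '%s\n' "$input" | awk '/foo/
{ print "every line: " $0 }')

printf '%s\n---\n%s\n' "$one_rule" "$two_rules"
```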
-
- Another thing to keep in mind is that backslash continuation and
-comments do not mix. As soon as `awk' sees the `#' that starts a
-comment, it ignores _everything_ on the rest of the line. For example:
-
- $ gawk 'BEGIN { print "dont panic" # a friendly \
- > BEGIN rule
- > }'
- error--> gawk: cmd. line:2: BEGIN rule
- error--> gawk: cmd. line:2: ^ parse error
-
-In this case, it looks like the backslash would continue the comment
-onto the next line. However, the backslash-newline combination is never
-even noticed because it is "hidden" inside the comment. Thus, the
-`BEGIN' is noted as a syntax error.
-
- When `awk' statements within one rule are short, you might want to
-put more than one of them on a line. This is accomplished by
-separating the statements with a semicolon (`;'). This also applies to
-the rules themselves. Thus, the program shown at the start of this
-minor node could also be written this way:
-
- /12/ { print $0 } ; /21/ { print $0 }
-
- NOTE: The requirement that rules on the same line be separated
- with a semicolon was not in the original `awk' language; it was
- added for consistency with the treatment of statements within an
- action.
-
- ---------- Footnotes ----------
-
- (1) The `?' and `:' referred to here are the two parts of the
-three-operand conditional expression described in
-*note Conditional Exp::. Splitting lines after `?' and `:' is a minor
-`gawk' extension; if `--posix' is specified (*note Options::), then
-this extension is disabled.
-
-
-File: gawk.info, Node: Other Features, Next: When, Prev: Statements/Lines, Up: Getting Started
-
-1.7 Other Features of `awk'
-===========================
-
-The `awk' language provides a number of predefined, or "built-in",
-variables that your programs can use to get information from `awk'.
-There are other variables your program can set as well to control how
-`awk' processes your data.
-
- In addition, `awk' provides a number of built-in functions for doing
-common computational and string-related operations. `gawk' provides
-built-in functions for working with timestamps, performing bit
-manipulation, translating strings at runtime (internationalization),
-determining the type of a variable, and sorting arrays.
-
- As we develop our presentation of the `awk' language, we introduce
-most of the variables and many of the functions. They are described
-systematically in *note Built-in Variables::, and *note Built-in::.
-
-
-File: gawk.info, Node: When, Prev: Other Features, Up: Getting Started
-
-1.8 When to Use `awk'
-=====================
-
-Now that you've seen some of what `awk' can do, you might wonder how
-`awk' could be useful for you. By using utility programs, advanced
-patterns, field separators, arithmetic statements, and other selection
-criteria, you can produce much more complex output. The `awk' language
-is very useful for producing reports from large amounts of raw data,
-such as summarizing information from the output of other utility
-programs like `ls'. (*Note More Complex::.)
-
- Programs written with `awk' are usually much smaller than they would
-be in other languages. This makes `awk' programs easy to compose and
-use. Often, `awk' programs can be quickly composed at your keyboard,
-used once, and thrown away. Because `awk' programs are interpreted, you
-can avoid the (usually lengthy) compilation part of the typical
-edit-compile-test-debug cycle of software development.
-
- Complex programs have been written in `awk', including a complete
-retargetable assembler for eight-bit microprocessors (*note Glossary::,
-for more information), and a microcode assembler for a special-purpose
-Prolog computer. While the original `awk''s capabilities were strained
-by tasks of such complexity, modern versions are more capable. Even
-Brian Kernighan's version of `awk' has fewer predefined limits, and
-those that it has are much larger than they used to be.
-
- If you find yourself writing `awk' scripts of more than, say, a few
-hundred lines, you might consider using a different programming
-language. Emacs Lisp is a good choice if you need sophisticated string
-or pattern matching capabilities. The shell is also good at string and
-pattern matching; in addition, it allows powerful use of the system
-utilities. More conventional languages, such as C, C++, and Java, offer
-better facilities for system programming and for managing the complexity
-of large programs. Programs in these languages may require more lines
-of source code than the equivalent `awk' programs, but they are easier
-to maintain and usually run more efficiently.
-
-
-File: gawk.info, Node: Invoking Gawk, Next: Regexp, Prev: Getting Started, Up: Top
-
-2 Running `awk' and `gawk'
-**************************
-
-This major node covers how to run `awk', both POSIX-standard and
-`gawk'-specific command-line options, and what `awk' and `gawk' do with
-non-option arguments. It then proceeds to cover how `gawk' searches
-for source files, reading standard input along with other files,
-`gawk''s environment variables, `gawk''s exit status, using include
-files, and obsolete and undocumented options and/or features.
-
- Many of the options and features described here are discussed in
-more detail later in the Info file; feel free to skip over things in
-this major node that don't interest you right now.
-
-* Menu:
-
-* Command Line:: How to run `awk'.
-* Options:: Command-line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* Naming Standard Input:: How to specify standard input with other
- files.
-* Environment Variables:: The environment variables `gawk' uses.
-* Exit Status:: `gawk''s exit status.
-* Include Files:: Including other files into your program.
-* Loading Shared Libraries:: Loading shared libraries into your program.
-* Obsolete:: Obsolete Options and/or features.
-* Undocumented:: Undocumented Options and Features.
-
-
-File: gawk.info, Node: Command Line, Next: Options, Up: Invoking Gawk
-
-2.1 Invoking `awk'
-==================
-
-There are two ways to run `awk'--with an explicit program or with one
-or more program files. Here are templates for both of them; items
-enclosed in [...] in these templates are optional:
-
- awk [OPTIONS] -f progfile [`--'] FILE ...
- awk [OPTIONS] [`--'] 'PROGRAM' FILE ...
-
- Besides traditional one-letter POSIX-style options, `gawk' also
-supports GNU long options.
-
- It is possible to invoke `awk' with an empty program:
-
- awk '' datafile1 datafile2
-
-Doing so makes little sense, though; `awk' exits silently when given an
-empty program. (d.c.) If `--lint' has been specified on the command
-line, `gawk' issues a warning that the program is empty.
-
-
-File: gawk.info, Node: Options, Next: Other Arguments, Prev: Command Line, Up: Invoking Gawk
-
-2.2 Command-Line Options
-========================
-
-Options begin with a dash and consist of a single character. GNU-style
-long options consist of two dashes and a keyword. The keyword can be
-abbreviated, as long as the abbreviation allows the option to be
-uniquely identified. If the option takes an argument, then the keyword
-is either immediately followed by an equals sign (`=') and the
-argument's value, or the keyword and the argument's value are separated
-by whitespace. If a particular option with a value is given more than
-once, it is the last value that counts.
-
- Each long option for `gawk' has a corresponding POSIX-style short
-option. The long and short options are interchangeable in all contexts.
-The following list describes options mandated by the POSIX standard:
-
-`-F FS'
-`--field-separator FS'
- Set the `FS' variable to FS (*note Field Separators::).
-
-`-f SOURCE-FILE'
-`--file SOURCE-FILE'
- Read `awk' program source from SOURCE-FILE instead of in the first
- non-option argument. This option may be given multiple times; the
- `awk' program consists of the concatenation of the contents of each
- specified SOURCE-FILE.
-
-`-v VAR=VAL'
-`--assign VAR=VAL'
- Set the variable VAR to the value VAL _before_ execution of the
- program begins. Such variable values are available inside the
- `BEGIN' rule (*note Other Arguments::).
-
- The `-v' option can only set one variable, but it can be used more
- than once, setting another variable each time, like this: `awk
- -v foo=1 -v bar=2 ...'.
-
- CAUTION: Using `-v' to set the values of the built-in
- variables may lead to surprising results. `awk' will reset
- the values of those variables as it needs to, possibly
- ignoring any predefined value you may have given.
-
-`-W GAWK-OPT'
- Provide an implementation-specific option. This is the POSIX
- convention for providing implementation-specific options. These
- options also have corresponding GNU-style long options. Note that
- the long options may be abbreviated, as long as the abbreviations
- remain unique. The full list of `gawk'-specific options is
- provided next.
-
-`--'
- Signal the end of the command-line options. The following
- arguments are not treated as options even if they begin with `-'.
- This interpretation of `--' follows the POSIX argument parsing
- conventions.
-
- This is useful if you have file names that start with `-', or in
- shell scripts, if you have file names that will be specified by
- the user that could start with `-'. It is also useful for passing
- options on to the `awk' program; see *note Getopt Function::.
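-
- As a rough illustration of how these POSIX options combine, the
-following sketch (the input data and the names `foo', `bar', and
-`combined' are made up for the example) sets `FS' with `-F', assigns
-two variables with `-v', and uses `--' to mark the end of the options:
-
```shell
# -F sets FS, each -v assigns one variable before BEGIN runs, and
# -- ends option processing just before the program text.
combined=$(printf 'alpha:beta:gamma\n' |
    awk -F: -v foo=1 -v bar=2 -- '
        BEGIN { sum = foo + bar }   # foo and bar are already set here
        { print sum, $2 }           # $2 is "beta" because FS is ":"
    ')

printf '%s\n' "$combined"
```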
-
- The following list describes `gawk'-specific options:
-
-`-b'
-`--characters-as-bytes'
- Cause `gawk' to treat all input data as single-byte characters.
- In addition, all output written with `print' or `printf' is
- treated as single-byte characters.
-
- Normally, `gawk' follows the POSIX standard and attempts to process
- its input data according to the current locale. This can often
- involve converting multibyte characters into wide characters
- (internally), and can lead to problems or confusion if the input
- data does not contain valid multibyte characters. This option is
- an easy way to tell `gawk': "hands off my data!".
-
-`-c'
-`--traditional'
- Specify "compatibility mode", in which the GNU extensions to the
- `awk' language are disabled, so that `gawk' behaves just like
- Brian Kernighan's version of `awk'. *Note POSIX/GNU::, which
- summarizes the extensions. Also see *note Compatibility Mode::.
-
-`-C'
-`--copyright'
- Print the short version of the General Public License and then
- exit.
-
-`-d[FILE]'
-`--dump-variables[=FILE]'
- Print a sorted list of global variables, their types, and final
- values to FILE. If no FILE is provided, print this list to the
- file named `awkvars.out' in the current directory. No space is
- allowed between the `-d' and FILE, if FILE is supplied.
-
- Having a list of all global variables is a good way to look for
- typographical errors in your programs. You would also use this
- option if you have a large program with a lot of functions, and
- you want to be sure that your functions don't inadvertently use
- global variables that you meant to be local. (This is a
- particularly easy mistake to make with simple variable names like
- `i', `j', etc.)
-
-`-D[FILE]'
-`--debug[=FILE]'
- Enable debugging of `awk' programs (*note Debugging::). By
- default, the debugger reads commands interactively from the
- terminal. The optional FILE argument allows you to specify a file
- with a list of commands for the debugger to execute
- non-interactively. No space is allowed between the `-D' and FILE,
- if FILE is supplied.
-
-`-e PROGRAM-TEXT'
-`--source PROGRAM-TEXT'
- Provide program source code in the PROGRAM-TEXT. This option
- allows you to mix source code in files with source code that you
- enter on the command line. This is particularly useful when you
- have library functions that you want to use from your command-line
- programs (*note AWKPATH Variable::).
-
-`-E FILE'
-`--exec FILE'
- Similar to `-f', read `awk' program text from FILE. There are two
- differences from `-f':
-
- * This option terminates option processing; anything else on
- the command line is passed on directly to the `awk' program.
-
- * Command-line variable assignments of the form `VAR=VALUE' are
- disallowed.
-
- This option is particularly necessary for World Wide Web CGI
- applications that pass arguments through the URL; using this
- option prevents a malicious (or other) user from passing in
- options, assignments, or `awk' source code (via `--source') to the
- CGI application. This option should be used with `#!' scripts
- (*note Executable Scripts::), like so:
-
- #! /usr/local/bin/gawk -E
-
- AWK PROGRAM HERE ...
-
-`-g'
-`--gen-pot'
- Analyze the source program and generate a GNU `gettext' Portable
- Object Template file on standard output for all string constants
- that have been marked for translation. *Note
- Internationalization::, for information about this option.
-
-`-h'
-`--help'
- Print a "usage" message summarizing the short and long style
- options that `gawk' accepts and then exit.
-
-`-l LIB'
-`--load LIB'
- Load a shared library LIB. This searches for the library using the
- `AWKLIBPATH' environment variable. The correct library suffix for
- your platform will be supplied by default, so it need not be
- specified in the library name. The library initialization routine
- should be named `dlload()'. An alternative is to use the `@load'
- keyword inside the program to load a shared library.
-
-`-L [value]'
-`--lint[=value]'
- Warn about constructs that are dubious or nonportable to other
- `awk' implementations. Some warnings are issued when `gawk' first
- reads your program. Others are issued at runtime, as your program
- executes. With an optional argument of `fatal', lint warnings
- become fatal errors. This may be drastic, but its use will
- certainly encourage the development of cleaner `awk' programs.
- With an optional argument of `invalid', only warnings about things
- that are actually invalid are issued. (This is not fully
- implemented yet.)
-
- Some warnings are only printed once, even if the dubious
- constructs they warn about occur multiple times in your `awk'
- program. Thus, when eliminating problems pointed out by `--lint',
- you should take care to search for all occurrences of each
- inappropriate construct. As `awk' programs are usually short,
- doing so is not burdensome.
-
-`-M'
-`--bignum'
- Force arbitrary precision arithmetic on numbers. This option has
- no effect if `gawk' is not compiled to use the GNU MPFR and MP
- libraries (*note Arbitrary Precision Arithmetic::).
-
-`-n'
-`--non-decimal-data'
- Enable automatic interpretation of octal and hexadecimal values in
- input data (*note Nondecimal Data::).
-
- CAUTION: This option can severely break old programs. Use
- with care.
-
-`-N'
-`--use-lc-numeric'
- Force the use of the locale's decimal point character when parsing
- numeric input data (*note Locales::).
-
-`-o[FILE]'
-`--pretty-print[=FILE]'
- Enable pretty-printing of `awk' programs. By default, the output
- program is written to a file named `awkprof.out'. The optional
- FILE argument allows you to specify a different file name for the
- output. No space is allowed between the `-o' and FILE, if FILE is
- supplied.
-
-`-O'
-`--optimize'
- Enable some optimizations on the internal representation of the
- program. At the moment this includes just simple constant
- folding. The `gawk' maintainer hopes to add more optimizations
- over time.
-
-`-p[FILE]'
-`--profile[=FILE]'
- Enable profiling of `awk' programs (*note Profiling::). By
- default, profiles are created in a file named `awkprof.out'. The
- optional FILE argument allows you to specify a different file name
- for the profile file. No space is allowed between the `-p' and
- FILE, if FILE is supplied.
-
- The profile contains execution counts for each statement in the
- program in the left margin, and function call counts for each
- function.
-
-`-P'
-`--posix'
- Operate in strict POSIX mode. This disables all `gawk' extensions
- (just like `--traditional') and disables all extensions not
- allowed by POSIX. *Note Common Extensions::, for a summary of the
- extensions in `gawk' that are disabled by this option. Also, the
- following additional restrictions apply:
-
- * Newlines do not act as whitespace to separate fields when
- `FS' is equal to a single space (*note Fields::).
-
- * Newlines are not allowed after `?' or `:' (*note Conditional
- Exp::).
-
- * Specifying `-Ft' on the command line does not set the value
- of `FS' to be a single TAB character (*note Field
- Separators::).
-
- * The locale's decimal point character is used for parsing input
- data (*note Locales::).
-
- If you supply both `--traditional' and `--posix' on the command
- line, `--posix' takes precedence. `gawk' also issues a warning if
- both options are supplied.
-
-`-r'
-`--re-interval'
- Allow interval expressions (*note Regexp Operators::) in regexps.
- This is now `gawk''s default behavior. Nevertheless, this option
- remains both for backward compatibility, and for use in
- combination with the `--traditional' option.
-
-`-S'
-`--sandbox'
- Disable the `system()' function, input redirections with `getline',
- output redirections with `print' and `printf', and dynamic
- extensions. This is particularly useful when you want to run
- `awk' scripts from questionable sources and need to make sure the
- scripts can't access your system (other than the specified input
- data file).
-
-`-t'
-`--lint-old'
- Warn about constructs that are not available in the original
- version of `awk' from Version 7 Unix (*note V7/SVR3.1::).
-
-`-V'
-`--version'
- Print version information for this particular copy of `gawk'.
- This allows you to determine if your copy of `gawk' is up to date
- with respect to whatever the Free Software Foundation is currently
- distributing. It is also useful for bug reports (*note Bugs::).
-
- As long as program text has been supplied, any other options are
-flagged as invalid with a warning message but are otherwise ignored.
-
- In compatibility mode, as a special case, if the value of FS supplied
-to the `-F' option is `t', then `FS' is set to the TAB character
-(`"\t"'). This is true only for `--traditional' and not for `--posix'
-(*note Field Separators::).
-
- The `-f' option may be used more than once on the command line. If
-it is, `awk' reads its program source from all of the named files, as
-if they had been concatenated together into one big file. This is
-useful for creating libraries of `awk' functions. These functions can
-be written once and then retrieved from a standard place, instead of
-having to be included into each individual program. (As mentioned in
-*note Definition Syntax::, function names must be unique.)
-
- With standard `awk', library functions can still be used, even if
-the program is entered at the terminal, by specifying `-f /dev/tty'.
-After typing your program, type `Ctrl-d' (the end-of-file character) to
-terminate it. (You may also use `-f -' to read program source from the
-standard input, but then you will not be able to also use the standard
-input as a source of data.)
-
- Because it is clumsy to use the standard `awk' mechanisms to mix
-source-file and command-line `awk' programs, `gawk' provides the
-`--source' option. This does not require you to pre-empt the standard
-input for your source code; it allows you to easily mix command-line
-and library source code (*note AWKPATH Variable::). The `--source'
-option may also be used multiple times on the command line.
-
- If no `-f' or `--source' option is specified, then `gawk' uses the
-first non-option command-line argument as the text of the program
-source code.
-
- If the environment variable `POSIXLY_CORRECT' exists, then `gawk'
-behaves in strict POSIX mode, exactly as if you had supplied the
-`--posix' command-line option. Many GNU programs look for this
-environment variable to suppress extensions that conflict with POSIX,
-but `gawk' behaves differently: it suppresses all extensions, even
-those that do not conflict with POSIX, and behaves in strict POSIX
-mode. If `--lint' is supplied on the command line and `gawk' turns on
-POSIX mode because of `POSIXLY_CORRECT', then it issues a warning
-message indicating that POSIX mode is in effect. You would typically
-set this variable in your shell's startup file. For a
-Bourne-compatible shell (such as Bash), you would add these lines to
-the `.profile' file in your home directory:
-
- POSIXLY_CORRECT=true
- export POSIXLY_CORRECT
-
- For a C shell-compatible shell,(1) you would add this line to the
-`.login' file in your home directory:
-
- setenv POSIXLY_CORRECT true
-
- Having `POSIXLY_CORRECT' set is not recommended for daily use, but
-it is good for testing the portability of your programs to other
-environments.
-
- ---------- Footnotes ----------
-
- (1) Not recommended.
-
-
-File: gawk.info, Node: Other Arguments, Next: Naming Standard Input, Prev: Options, Up: Invoking Gawk
-
-2.3 Other Command-Line Arguments
-================================
-
-Any additional arguments on the command line are normally treated as
-input files to be processed in the order specified. However, an
-argument that has the form `VAR=VALUE' assigns the value VALUE to the
-variable VAR--it does not specify a file at all. (See *note Assignment
-Options::.)
-
- All these arguments are made available to your `awk' program in the
-`ARGV' array (*note Built-in Variables::). Command-line options and
-the program text (if present) are omitted from `ARGV'. All other
-arguments, including variable assignments, are included. As each
-element of `ARGV' is processed, `gawk' sets the variable `ARGIND' to
-the index in `ARGV' of the current element.
-
- The distinction between file name arguments and variable-assignment
-arguments is made when `awk' is about to open the next input file. At
-that point in execution, it checks the file name to see whether it is
-really a variable assignment; if so, `awk' sets the variable instead of
-reading a file.
-
- Therefore, the variables actually receive the given values after all
-previously specified files have been read. In particular, the values of
-variables assigned in this fashion are _not_ available inside a `BEGIN'
-rule (*note BEGIN/END::), because such rules are run before `awk'
-begins scanning the argument list.
-
- The variable values given on the command line are processed for
-escape sequences (*note Escape Sequences::). (d.c.)
-
- In some earlier implementations of `awk', when a variable assignment
-occurred before any file names, the assignment would happen _before_
-the `BEGIN' rule was executed. `awk''s behavior was thus inconsistent;
-some command-line assignments were available inside the `BEGIN' rule,
-while others were not. Unfortunately, some applications came to depend
-upon this "feature." When `awk' was changed to be more consistent, the
-`-v' option was added to accommodate applications that depended upon
-the old behavior.
-
- The variable assignment feature is most useful for assigning to
-variables such as `RS', `OFS', and `ORS', which control input and
-output formats before scanning the data files. It is also useful for
-controlling state if multiple passes are needed over a data file. For
-example:
-
- awk 'pass == 1 { PASS 1 STUFF }
- pass == 2 { PASS 2 STUFF }' pass=1 mydata pass=2 mydata
-
- Given the variable assignment feature, the `-F' option for setting
-the value of `FS' is not strictly necessary. It remains for historical
-compatibility.
-
-
-File: gawk.info, Node: Naming Standard Input, Next: Environment Variables, Prev: Other Arguments, Up: Invoking Gawk
-
-2.4 Naming Standard Input
-=========================
-
-Often, you may wish to read standard input together with other files.
-For example, you may wish to read one file, read standard input coming
-from a pipe, and then read another file.
-
- The way to name the standard input, with all versions of `awk', is
-to use a single, standalone minus sign or dash, `-'. For example:
-
- SOME_COMMAND | awk -f myprog.awk file1 - file2
-
-Here, `awk' first reads `file1', then it reads the output of
-SOME_COMMAND, and finally it reads `file2'.
-
- You may also use `"-"' to name standard input when reading files
-with `getline' (*note Getline/File::).
-
- In addition, `gawk' allows you to specify the special file name
-`/dev/stdin', both on the command line and with `getline'. Some other
-versions of `awk' also support this, but it is not standard. (Some
-operating systems provide a `/dev/stdin' file in the file system;
-however, `gawk' always processes this file name itself.)
-
-
-File: gawk.info, Node: Environment Variables, Next: Exit Status, Prev: Naming Standard Input, Up: Invoking Gawk
-
-2.5 The Environment Variables `gawk' Uses
-=========================================
-
-A number of environment variables influence how `gawk' behaves.
-
-* Menu:
-
-* AWKPATH Variable:: Searching directories for `awk'
- programs.
-* AWKLIBPATH Variable:: Searching directories for `awk' shared
- libraries.
-* Other Environment Variables:: Other, more specialized environment variables.
-
-
-File: gawk.info, Node: AWKPATH Variable, Next: AWKLIBPATH Variable, Up: Environment Variables
-
-2.5.1 The `AWKPATH' Environment Variable
-----------------------------------------
-
-The previous minor node described how `awk' program files can be named
-on the command line with the `-f' option. In most `awk'
-implementations, you must supply a precise path name for each program
-file, unless the file is in the current directory. But in `gawk', if
-the file name supplied to the `-f' option does not contain a `/', then
-`gawk' searches a list of directories (called the "search path"), one
-by one, looking for a file with the specified name.
-
-The search path is a string consisting of directory names separated by
-colons. `gawk' gets its search path from the `AWKPATH' environment
-variable. If that variable does not exist, `gawk' uses a default path,
-`.:/usr/local/share/awk'.(1)
-
- The search path feature is particularly useful for building libraries
-of useful `awk' functions. The library files can be placed in a
-standard directory in the default path and then specified on the
-command line with a short file name. Otherwise, the full file name
-would have to be typed for each file.
-
- By using both the `--source' and `-f' options, your command-line
-`awk' programs can use facilities in `awk' library files (*note Library
-Functions::). Path searching is not done if `gawk' is in compatibility
-mode. This is true for both `--traditional' and `--posix'. *Note
-Options::.
-
- NOTE: To include the current directory in the path, either place
- `.' explicitly in the path or write a null entry in the path. (A
- null entry is indicated by starting or ending the path with a
- colon or by placing two colons next to each other (`::').) This
- path search mechanism is similar to the shell's.
-
- However, `gawk' always looks in the current directory _before_
- searching `AWKPATH', so there is no real reason to include the
- current directory in the search path.
-
- If `AWKPATH' is not defined in the environment, `gawk' places its
-default search path into `ENVIRON["AWKPATH"]'. This makes it easy to
-determine the actual search path that `gawk' will use from within an
-`awk' program.
-
- While you can change `ENVIRON["AWKPATH"]' within your `awk' program,
-this has no effect on the running program's behavior. This makes
-sense: the `AWKPATH' environment variable is used to find the program
-source files. Once your program is running, all the files have been
-found, and `gawk' no longer needs to use `AWKPATH'.
-
- ---------- Footnotes ----------
-
- (1) Your version of `gawk' may use a different directory; it will
-depend upon how `gawk' was built and installed. The actual directory is
-the value of `$(datadir)' generated when `gawk' was configured. You
-probably don't need to worry about this, though.
-
-
-File: gawk.info, Node: AWKLIBPATH Variable, Next: Other Environment Variables, Prev: AWKPATH Variable, Up: Environment Variables
-
-2.5.2 The `AWKLIBPATH' Environment Variable
--------------------------------------------
-
-The `AWKLIBPATH' environment variable is similar to the `AWKPATH'
-variable, but it is used to search for shared libraries specified with
-the `-l' option rather than for source files. If the library is not
-found, the path is searched again after adding the appropriate shared
-library suffix for the platform. For example, on GNU/Linux systems,
-the suffix `.so' is used.
-
-
-File: gawk.info, Node: Other Environment Variables, Prev: AWKLIBPATH Variable, Up: Environment Variables
-
-2.5.3 Other Environment Variables
----------------------------------
-
-A number of other environment variables affect `gawk''s behavior, but
-they are more specialized. Those in the following list are meant to be
-used by regular users.
-
-`POSIXLY_CORRECT'
- Causes `gawk' to switch to POSIX compatibility mode, disabling all
- traditional and GNU extensions. *Note Options::.
-
-`GAWK_SOCK_RETRIES'
- Controls the number of times `gawk' will attempt to retry a two-way
- TCP/IP (socket) connection before giving up. *Note TCP/IP
- Networking::.
-
-`GAWK_MSEC_SLEEP'
- Specifies the interval between connection retries, in
- milliseconds. On systems that do not support the `usleep()' system
- call, the value is rounded up to an integral number of seconds.
-
-`GAWK_READ_TIMEOUT'
- Specifies the time, in milliseconds, for `gawk' to wait for input
- before returning with an error. *Note Read Timeout::.
-
- The environment variables in the following list are meant for use by
-the `gawk' developers for testing and tuning. They are subject to
-change. The variables are:
-
-`AVG_CHAIN_MAX'
- The average number of items `gawk' will maintain on a hash chain
- for managing arrays.
-
-`AWK_HASH'
- If this variable exists with a value of `gst', `gawk' will switch
- to using the hash function from GNU Smalltalk for managing arrays.
- This function may be marginally faster than the standard function.
-
-`AWKREADFUNC'
- If this variable exists, `gawk' switches to reading source files
- one line at a time, instead of reading in blocks. This exists for
- debugging problems on filesystems on non-POSIX operating systems
- where I/O is performed in records, not in blocks.
-
-`GAWK_NO_DFA'
- If this variable exists, `gawk' does not use the DFA regexp matcher
- for "does it match" kinds of tests. This can cause `gawk' to be
- slower. Its purpose is to help isolate differences between the two
- regexp matchers that `gawk' uses internally. (There aren't
- supposed to be differences, but occasionally theory and practice
- don't coordinate with each other.)
-
-`GAWK_STACKSIZE'
- This specifies the amount by which `gawk' should grow its internal
- evaluation stack, when needed.
-
-`TIDYMEM'
- If this variable exists, `gawk' uses the `mtrace()' library calls
- from GNU LIBC to help track down possible memory leaks.
-
-
-File: gawk.info, Node: Exit Status, Next: Include Files, Prev: Environment Variables, Up: Invoking Gawk
-
-2.6 `gawk''s Exit Status
-========================
-
-If the `exit' statement is used with a value (*note Exit Statement::),
-then `gawk' exits with the numeric value given to it.
-
- Otherwise, if there were no problems during execution, `gawk' exits
-with the value of the C constant `EXIT_SUCCESS'. This is usually zero.
-
- If an error occurs, `gawk' exits with the value of the C constant
-`EXIT_FAILURE'. This is usually one.
-
- If `gawk' exits because of a fatal error, the exit status is 2. On
-non-POSIX systems, this value may be mapped to `EXIT_FAILURE'.
-
-
-File: gawk.info, Node: Include Files, Next: Loading Shared Libraries, Prev: Exit Status, Up: Invoking Gawk
-
-2.7 Including Other Files Into Your Program
-===========================================
-
-This minor node describes a feature that is specific to `gawk'.
-
- The `@include' keyword can be used to read external `awk' source
-files. This gives you the ability to split large `awk' source files
-into smaller, more manageable pieces, and also lets you reuse common
-`awk' code from various `awk' scripts. In other words, you can group
-together `awk' functions, used to carry out specific tasks, into
-external files. These files can be used just like function libraries,
-using the `@include' keyword in conjunction with the `AWKPATH'
-environment variable.
-
- Let's see an example. We'll start with two (trivial) `awk' scripts,
-namely `test1' and `test2'. Here is the `test1' script:
-
- BEGIN {
- print "This is script test1."
- }
-
-and here is `test2':
-
- @include "test1"
- BEGIN {
- print "This is script test2."
- }
-
- Running `gawk' with `test2' produces the following result:
-
- $ gawk -f test2
- -| This is script test1.
- -| This is script test2.
-
- `gawk' runs the `test2' script, which includes `test1' using the
-`@include' keyword. So, to include external `awk' source files you just
-use `@include' followed by the name of the file to be included,
-enclosed in double quotes.
-
- NOTE: Keep in mind that this is a language construct and the file
- name cannot be a string variable, but rather just a literal string
- in double quotes.
-
- The files to be included may be nested; e.g., given a third script,
-namely `test3':
-
- @include "test2"
- BEGIN {
- print "This is script test3."
- }
-
-Running `gawk' with the `test3' script produces the following results:
-
- $ gawk -f test3
- -| This is script test1.
- -| This is script test2.
- -| This is script test3.
-
- The file name can, of course, be a pathname. For example:
-
- @include "../io_funcs"
-
-or:
-
- @include "/usr/awklib/network"
-
-are valid. The `AWKPATH' environment variable can be of great value
-when using `@include'. The same rules for the use of the `AWKPATH'
-variable in command-line file searches (*note AWKPATH Variable::) apply
-to `@include' also.
-
- This is very helpful in constructing `gawk' function libraries. If
-you have a large script with useful, general purpose `awk' functions,
-you can break it down into library files and put those files in a
-special directory. You can then include those "libraries," using
-either the full pathnames of the files, or by setting the `AWKPATH'
-environment variable accordingly and then using `@include' with just
-the file part of the full pathname. Of course you can have more than
-one directory to keep library files; the more complex the working
-environment is, the more directories you may need to organize the files
-to be included.
-
- Given the ability to specify multiple `-f' options, the `@include'
-mechanism is not strictly necessary. However, the `@include' keyword
-can help you in constructing self-contained `gawk' programs, thus
-reducing the need for writing complex and tedious command lines. In
-particular, `@include' is very useful for writing CGI scripts to be run
-from web pages.
-
- As mentioned in *note AWKPATH Variable::, the current directory is
-always searched first for source files, before searching in `AWKPATH',
-and this also applies to files named with `@include'.
-
-
-File: gawk.info, Node: Loading Shared Libraries, Next: Obsolete, Prev: Include Files, Up: Invoking Gawk
-
-2.8 Loading Shared Libraries Into Your Program
-==============================================
-
-This minor node describes a feature that is specific to `gawk'.
-
- The `@load' keyword can be used to read external `awk' shared
-libraries. This allows you to link in compiled code that may offer
-superior performance and/or give you access to extended capabilities
-not supported by the `awk' language. The `AWKLIBPATH' variable is used
-to search for the shared library. Using `@load' is completely
-equivalent to using the `-l' command-line option.
-
- If the shared library is not initially found in `AWKLIBPATH', another
-search is conducted after appending the platform's default shared
-library suffix to the filename. For example, on GNU/Linux systems, the
-suffix `.so' is used.
-
- $ gawk '@load "ordchr"; BEGIN {print chr(65)}'
- -| A
-
-This is equivalent to the following example:
-
- $ gawk -lordchr 'BEGIN {print chr(65)}'
- -| A
-
-For command-line usage, the `-l' option is more convenient, but `@load'
-is useful for embedding inside an `awk' source file that requires
-access to a shared library.
-
-
-File: gawk.info, Node: Obsolete, Next: Undocumented, Prev: Loading Shared Libraries, Up: Invoking Gawk
-
-2.9 Obsolete Options and/or Features
-====================================
-
-This minor node describes features and/or command-line options from
-previous releases of `gawk' that are either not available in the
-current version or that are still supported but deprecated (meaning that
-they will _not_ be in the next release).
-
- The process-related special files `/dev/pid', `/dev/ppid',
-`/dev/pgrpid', and `/dev/user' were deprecated in `gawk' 3.1, but still
-worked. As of version 4.0, they are no longer interpreted specially by
-`gawk'. (Use `PROCINFO' instead; see *note Auto-set::.)
-
-
-File: gawk.info, Node: Undocumented, Prev: Obsolete, Up: Invoking Gawk
-
-2.10 Undocumented Options and Features
-======================================
-
- Use the Source, Luke!
- Obi-Wan
-
- This minor node intentionally left blank.
-
-
-File: gawk.info, Node: Regexp, Next: Reading Files, Prev: Invoking Gawk, Up: Top
-
-3 Regular Expressions
-*********************
-
-A "regular expression", or "regexp", is a way of describing a set of
-strings. Because regular expressions are such a fundamental part of
-`awk' programming, their format and use deserve a separate major node.
-
- A regular expression enclosed in slashes (`/') is an `awk' pattern
-that matches every input record whose text belongs to that set. The
-simplest regular expression is a sequence of letters, numbers, or both.
-Such a regexp matches any string that contains that sequence. Thus,
-the regexp `foo' matches any string containing `foo'. Therefore, the
-pattern `/foo/' matches any input record containing the three
-characters `foo' _anywhere_ in the record. Other kinds of regexps let
-you specify more complicated classes of strings.
-
-* Menu:
-
-* Regexp Usage:: How to Use Regular Expressions.
-* Escape Sequences:: How to write nonprinting characters.
-* Regexp Operators:: Regular Expression Operators.
-* Bracket Expressions:: What can go between `[...]'.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
-* Leftmost Longest:: How much text matches.
-* Computed Regexps:: Using Dynamic Regexps.
-
-
-File: gawk.info, Node: Regexp Usage, Next: Escape Sequences, Up: Regexp
-
-3.1 How to Use Regular Expressions
-==================================
-
-A regular expression can be used as a pattern by enclosing it in
-slashes. Then the regular expression is tested against the entire text
-of each record. (Normally, it only needs to match some part of the
-text in order to succeed.) For example, the following prints the
-second field of each record that contains the string `foo' anywhere in
-it:
-
- $ awk '/foo/ { print $2 }' BBS-list
- -| 555-1234
- -| 555-6699
- -| 555-6480
- -| 555-2127
-
- Regular expressions can also be used in matching expressions. These
-expressions allow you to specify the string to match against; it need
-not be the entire current input record. The two operators `~' and `!~'
-perform regular expression comparisons. Expressions using these
-operators can be used as patterns, or in `if', `while', `for', and `do'
-statements. (*Note Statements::.) For example:
-
- EXP ~ /REGEXP/
-
-is true if the expression EXP (taken as a string) matches REGEXP. The
-following example matches, or selects, all input records with the
-uppercase letter `J' somewhere in the first field:
-
- $ awk '$1 ~ /J/' inventory-shipped
- -| Jan 13 25 15 115
- -| Jun 31 42 75 492
- -| Jul 24 34 67 436
- -| Jan 21 36 64 620
-
- So does this:
-
- awk '{ if ($1 ~ /J/) print }' inventory-shipped
-
- This next example is true if the expression EXP (taken as a
-character string) does _not_ match REGEXP:
-
- EXP !~ /REGEXP/
-
- The following example matches, or selects, all input records whose
-first field _does not_ contain the uppercase letter `J':
-
- $ awk '$1 !~ /J/' inventory-shipped
- -| Feb 15 32 24 226
- -| Mar 15 24 34 228
- -| Apr 31 52 63 420
- -| May 16 34 29 208
- ...
-
- When a regexp is enclosed in slashes, such as `/foo/', we call it a
-"regexp constant", much like `5.27' is a numeric constant and `"foo"'
-is a string constant.
-
-
-File: gawk.info, Node: Escape Sequences, Next: Regexp Operators, Prev: Regexp Usage, Up: Regexp
-
-3.2 Escape Sequences
-====================
-
-Some characters cannot be included literally in string constants
-(`"foo"') or regexp constants (`/foo/'). Instead, they should be
-represented with "escape sequences", which are character sequences
-beginning with a backslash (`\'). One use of an escape sequence is to
-include a double-quote character in a string constant. Because a plain
-double quote ends the string, you must use `\"' to represent an actual
-double-quote character as a part of the string. For example:
-
- $ awk 'BEGIN { print "He said \"hi!\" to her." }'
- -| He said "hi!" to her.
-
- The backslash character itself is another character that cannot be
-included normally; you must write `\\' to put one backslash in the
-string or regexp. Thus, the string whose contents are the two
-characters `"' and `\' must be written `"\"\\"'.
-
- Other escape sequences represent unprintable characters such as TAB
-or newline. While there is nothing to stop you from entering most
-unprintable characters directly in a string constant or regexp constant,
-they may look ugly.
-
- The following table lists all the escape sequences used in `awk' and
-what they represent. Unless noted otherwise, all these escape sequences
-apply to both string constants and regexp constants:
-
-`\\'
- A literal backslash, `\'.
-
-`\a'
- The "alert" character, `Ctrl-g', ASCII code 7 (BEL). (This
- usually makes some sort of audible noise.)
-
-`\b'
- Backspace, `Ctrl-h', ASCII code 8 (BS).
-
-`\f'
- Formfeed, `Ctrl-l', ASCII code 12 (FF).
-
-`\n'
- Newline, `Ctrl-j', ASCII code 10 (LF).
-
-`\r'
- Carriage return, `Ctrl-m', ASCII code 13 (CR).
-
-`\t'
- Horizontal TAB, `Ctrl-i', ASCII code 9 (HT).
-
-`\v'
- Vertical tab, `Ctrl-k', ASCII code 11 (VT).
-
-`\NNN'
- The octal value NNN, where NNN stands for 1 to 3 digits between
- `0' and `7'. For example, the code for the ASCII ESC (escape)
- character is `\033'.
-
-`\xHH...'
- The hexadecimal value HH, where HH stands for a sequence of
- hexadecimal digits (`0'-`9', and either `A'-`F' or `a'-`f'). Like
- the same construct in ISO C, the escape sequence continues until
- the first nonhexadecimal digit is seen. (c.e.) However, using
- more than two hexadecimal digits produces undefined results. (The
- `\x' escape sequence is not allowed in POSIX `awk'.)
-
-`\/'
- A literal slash (necessary for regexp constants only). This
- sequence is used when you want to write a regexp constant that
- contains a slash. Because the regexp is delimited by slashes, you
- need to escape the slash that is part of the pattern, in order to
- tell `awk' to keep processing the rest of the regexp.
-
-`\"'
- A literal double quote (necessary for string constants only).
- This sequence is used when you want to write a string constant
- that contains a double quote. Because the string is delimited by
- double quotes, you need to escape the quote that is part of the
- string, in order to tell `awk' to keep processing the rest of the
- string.
-
- In `gawk', a number of additional two-character sequences that begin
-with a backslash have special meaning in regexps. *Note GNU Regexp
-Operators::.
-
- In a regexp, a backslash before any character that is not in the
-previous list and not listed in *note GNU Regexp Operators:: means
-that the next character should be taken literally, even if it would
-normally be a regexp operator. For example, `/a\+b/' matches the three
-characters `a+b'.
-
- For complete portability, do not use a backslash before any
-character not shown in the previous list.
-
- To summarize:
-
- * The escape sequences in the table above are always processed first,
- for both string constants and regexp constants. This happens very
- early, as soon as `awk' reads your program.
-
- * `gawk' processes both regexp constants and dynamic regexps (*note
- Computed Regexps::), for the special operators listed in *note GNU
- Regexp Operators::.
-
- * A backslash before any other character means to treat that
- character literally.
-
-Advanced Notes: Backslash Before Regular Characters
----------------------------------------------------
-
-If you place a backslash in a string constant before something that is
-not one of the characters previously listed, POSIX `awk' purposely
-leaves what happens as undefined. There are two choices:
-
-Strip the backslash out
- This is what Brian Kernighan's `awk' and `gawk' both do. For
- example, `"a\qc"' is the same as `"aqc"'. (Because this is such
- an easy bug both to introduce and to miss, `gawk' warns you about
- it.) For example, consider `FS = "[ \t]+\|[ \t]+"', intended to
- use vertical bars surrounded by whitespace as the field separator.
- The backslash is stripped from the string, so there should be two
- backslashes in it: `FS = "[ \t]+\\|[ \t]+"'.
-
-Leave the backslash alone
- Some other `awk' implementations do this. In such
- implementations, typing `"a\qc"' is the same as typing `"a\\qc"'.
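The corrected field separator from the first item can be tried directly. In this sketch the function name `second_bar_field' is hypothetical. The field separator uses two backslashes in the string constant, so the regexp sees `\|', a literal vertical bar. The blank inside `beta gamma' therefore does not split the field.

```shell
# Hypothetical helper: split on a vertical bar surrounded by blanks and
# print the second field.  "\\|" in the string becomes \| in the regexp.
second_bar_field() {
    awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print $2 }'
}

echo 'alpha | beta gamma | delta' | second_bar_field
```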
-
-Advanced Notes: Escape Sequences for Metacharacters
----------------------------------------------------
-
-Suppose you use an octal or hexadecimal escape to represent a regexp
-metacharacter. (See *note Regexp Operators::.) Does `awk' treat the
-character as a literal character or as a regexp operator?
-
- Historically, such characters were taken literally. (d.c.)
-However, the POSIX standard indicates that they should be treated as
-real metacharacters, which is what `gawk' does. In compatibility mode
-(*note Options::), `gawk' treats the characters represented by octal
-and hexadecimal escape sequences literally when used in regexp
-constants. Thus, `/a\52b/' is equivalent to `/a\*b/'.
-
-
-File: gawk.info, Node: Regexp Operators, Next: Bracket Expressions, Prev: Escape Sequences, Up: Regexp
-
-3.3 Regular Expression Operators
-================================
-
-You can combine regular expressions with special characters, called
-"regular expression operators" or "metacharacters", to increase the
-power and versatility of regular expressions.
-
- The escape sequences described in *note Escape Sequences::, are
-valid inside a regexp. They are introduced by a `\' and are recognized
-and converted into corresponding real characters as the very first step
-in processing regexps.
-
- Here is a list of metacharacters. All characters that are not escape
-sequences and that are not listed in the table stand for themselves:
-
-`\'
- This is used to suppress the special meaning of a character when
- matching. For example, `\$' matches the character `$'.
-
-`^'
- This matches the beginning of a string. For example, `^@chapter'
- matches `@chapter' at the beginning of a string and can be used to
- identify chapter beginnings in Texinfo source files. The `^' is
- known as an "anchor", because it anchors the pattern to match only
- at the beginning of the string.
-
- It is important to realize that `^' does not match the beginning of
- a line embedded in a string. The condition is not true in the
- following example:
-
- if ("line1\nLINE 2" ~ /^L/) ...
-
-`$'
- This is similar to `^', but it matches only at the end of a string.
- For example, `p$' matches a record that ends with a `p'. The `$'
- is an anchor and does not match the end of a line embedded in a
- string. The condition in the following example is not true:
-
- if ("line1\nLINE 2" ~ /1$/) ...
-
-`. (period)'
- This matches any single character, _including_ the newline
- character. For example, `.P' matches any single character
- followed by a `P' in a string. Using concatenation, we can make a
- regular expression such as `U.A', which matches any
- three-character sequence that begins with `U' and ends with `A'.
-
- In strict POSIX mode (*note Options::), `.' does not match the NUL
- character, which is a character with all bits equal to zero.
- Otherwise, NUL is just another character. Other versions of `awk'
- may not be able to match the NUL character.
-
-`[...]'
- This is called a "bracket expression".(1) It matches any _one_ of
- the characters that are enclosed in the square brackets. For
- example, `[MVX]' matches any one of the characters `M', `V', or
- `X' in a string. A full discussion of what can be inside the
- square brackets of a bracket expression is given in *note Bracket
- Expressions::.
-
-`[^ ...]'
- This is a "complemented bracket expression". The first character
- after the `[' _must_ be a `^'. It matches any character _except_
- those in the square brackets. For example, `[^awk]' matches any
- character that is not an `a', `w', or `k'.
-
-`|'
- This is the "alternation operator" and it is used to specify
- alternatives. The `|' has the lowest precedence of all the regular
- expression operators. For example, `^P|[[:digit:]]' matches any
- string that matches either `^P' or `[[:digit:]]'. This means it
- matches any string that starts with `P' or contains a digit.
-
- The alternation applies to the largest possible regexps on either
- side.
-
-`(...)'
- Parentheses are used for grouping in regular expressions, as in
- arithmetic. They can be used to concatenate regular expressions
- containing the alternation operator, `|'. For example,
- `@(samp|code)\{[^}]+\}' matches both `@code{foo}' and `@samp{bar}'.
- (These are Texinfo formatting control sequences. The `+' is
- explained further on in this list.)
-
-`*'
- This symbol means that the preceding regular expression should be
- repeated as many times as necessary to find a match. For example,
- `ph*' applies the `*' symbol to the preceding `h' and looks for
- matches of one `p' followed by any number of `h's. This also
- matches just `p' if no `h's are present.
-
- The `*' repeats the _smallest_ possible preceding expression.
- (Use parentheses if you want to repeat a larger expression.) It
- finds as many repetitions as possible. For example, `awk
- '/\(c[ad][ad]*r x\)/ { print }' sample' prints every record in
- `sample' containing a string of the form `(car x)', `(cdr x)',
- `(cadr x)', and so on. Notice the escaping of the parentheses by
- preceding them with backslashes.
-
-`+'
- This symbol is similar to `*', except that the preceding
- expression must be matched at least once. This means that `wh+y'
- would match `why' and `whhy', but not `wy', whereas `wh*y' would
- match all three of these strings. The following is a simpler way
- of writing the last `*' example:
-
- awk '/\(c[ad]+r x\)/ { print }' sample
-
-`?'
- This symbol is similar to `*', except that the preceding
- expression can be matched either once or not at all. For example,
- `fe?d' matches `fed' and `fd', but nothing else.
-
-`{N}'
-`{N,}'
-`{N,M}'
- One or two numbers inside braces denote an "interval expression".
- If there is one number in the braces, the preceding regexp is
- repeated N times. If there are two numbers separated by a comma,
- the preceding regexp is repeated N to M times. If there is one
- number followed by a comma, then the preceding regexp is repeated
- at least N times:
-
- `wh{3}y'
- Matches `whhhy', but not `why' or `whhhhy'.
-
- `wh{3,5}y'
- Matches `whhhy', `whhhhy', or `whhhhhy', only.
-
- `wh{2,}y'
- Matches `whhy' or `whhhy', and so on.
-
- Interval expressions were not traditionally available in `awk'.
- They were added as part of the POSIX standard to make `awk' and
- `egrep' consistent with each other.
-
- Initially, because old programs may use `{' and `}' in regexp
- constants, `gawk' did _not_ match interval expressions in regexps.
-
- However, beginning with version 4.0, `gawk' does match interval
- expressions by default. This is because compatibility with POSIX
- has become more important to most `gawk' users than compatibility
- with old programs.
-
- For programs that use `{' and `}' in regexp constants, it is good
- practice to always escape them with a backslash. Then the regexp
- constants are valid and work the way you want them to, using any
- version of `awk'.(2)
-
- Finally, when `{' and `}' appear in regexp constants in a way that
- cannot be interpreted as an interval expression (such as
- `/q{a}/'), then they stand for themselves.
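The sketch below exercises several of the operators in this list using only portable regexp constants. Each `print' outputs 1 for a match and 0 for no match. The function name `operator_demo' is hypothetical. Interval expressions are left out because older `awk' implementations may not support them.

```shell
# Hypothetical helper: 1 means the string matches, 0 means it does not.
operator_demo() {
    awk 'BEGIN {
        s = "line1\nLINE 2"
        print (s ~ /^L/)           # ^ does not match after the embedded newline
        print (s ~ /1$/)           # $ does not match before the embedded newline
        print ("fd" ~ /^fe?d$/)    # ? allows zero occurrences of e
        print ("wy" ~ /^wh+y$/)    # + needs at least one h
        print ("whhy" ~ /^wh*y$/)  # * allows any number of h
    }'
}

operator_demo
```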
-
- In regular expressions, the `*', `+', and `?' operators, as well as
-the braces `{' and `}', have the highest precedence, followed by
-concatenation, and finally by `|'. As in arithmetic, parentheses can
-change how operators are grouped.
-
- In POSIX `awk' and `gawk', the `*', `+', and `?' operators stand for
-themselves when there is nothing in the regexp that precedes them. For
-example, `/+/' matches a literal plus sign. However, many other
-versions of `awk' treat such a usage as a syntax error.
-
- If `gawk' is in compatibility mode (*note Options::), interval
-expressions are not available in regular expressions.
-
- ---------- Footnotes ----------
-
- (1) In other literature, you may see a bracket expression referred
-to as either a "character set", a "character class", or a "character
-list".
-
- (2) Use two backslashes if you're using a string constant with a
-regexp operator or function.
-
-
-File: gawk.info, Node: Bracket Expressions, Next: GNU Regexp Operators, Prev: Regexp Operators, Up: Regexp
-
-3.4 Using Bracket Expressions
-=============================
-
-As mentioned earlier, a bracket expression matches any character amongst
-those listed between the opening and closing square brackets.
-
- Within a bracket expression, a "range expression" consists of two
-characters separated by a hyphen. It matches any single character that
-sorts between the two characters, based upon the system's native
-character set. For example, `[0-9]' is equivalent to `[0123456789]'.
-(See *note Ranges and Locales::, for an explanation of how the POSIX
-standard and `gawk' have changed over time. This is mainly of
-historical interest.)
-
- To include one of the characters `\', `]', `-', or `^' in a bracket
-expression, put a `\' in front of it. For example:
-
- [d\]]
-
-matches either `d' or `]'.
-
- This treatment of `\' in bracket expressions is compatible with
-other `awk' implementations and is also mandated by POSIX. The regular
-expressions in `awk' are a superset of the POSIX specification for
-Extended Regular Expressions (EREs). POSIX EREs are based on the
-regular expressions accepted by the traditional `egrep' utility.
-
- "Character classes" are a feature introduced in the POSIX standard.
-A character class is a special notation for describing lists of
-characters that have a specific attribute, but the actual characters
-can vary from country to country and/or from character set to character
-set. For example, the notion of what is an alphabetic character
-differs between the United States and France.
-
- A character class is only valid in a regexp _inside_ the brackets of
-a bracket expression. Character classes consist of `[:', a keyword
-denoting the class, and `:]'. *note table-char-classes:: lists the
-character classes defined by the POSIX standard.
-
-Class Meaning
---------------------------------------------------------------------------
-`[:alnum:]' Alphanumeric characters.
-`[:alpha:]' Alphabetic characters.
-`[:blank:]' Space and TAB characters.
-`[:cntrl:]' Control characters.
-`[:digit:]' Numeric characters.
-`[:graph:]' Characters that are both printable and visible. (A space is
- printable but not visible, whereas an `a' is both.)
-`[:lower:]' Lowercase alphabetic characters.
-`[:print:]' Printable characters (characters that are not control
- characters).
-`[:punct:]' Punctuation characters (characters that are not letters,
- digits, control characters, or space characters).
-`[:space:]' Space characters (such as space, TAB, and formfeed, to name
- a few).
-`[:upper:]' Uppercase alphabetic characters.
-`[:xdigit:]' Characters that are hexadecimal digits.
-
-Table 3.1: POSIX Character Classes
-
- For example, before the POSIX standard, you had to write
-`/[A-Za-z0-9]/' to match alphanumeric characters. If your character
-set had other alphabetic characters in it, this would not match them.
-With the POSIX character classes, you can write `/[[:alnum:]]/' to
-match the alphabetic and numeric characters in your character set.
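As a small illustration, the hypothetical function below counts the alphanumeric characters on each line with `[[:alnum:]]'. It assumes an `awk' that implements the POSIX character classes, which some older implementations do not. `gsub()' returns the number of substitutions it made. Using `&' as the replacement puts each match back, so the record itself is unchanged.

```shell
# Hypothetical helper: print how many alphanumeric characters each line has.
# "&" reinserts every match, so only the returned count matters here.
count_alnum() {
    awk '{ print gsub(/[[:alnum:]]/, "&") }'
}

echo 'a1-b2_c3.' | count_alnum
```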
-
- Two additional special sequences can appear in bracket expressions.
-These apply to non-ASCII character sets, which can have single symbols
-(called "collating elements") that are represented with more than one
-character. They can also have several characters that are equivalent for
-"collating", or sorting, purposes. (For example, in French, a plain "e"
-and a grave-accented "e`" are equivalent.) These sequences are:
-
-Collating symbols
- Multicharacter collating elements enclosed between `[.' and `.]'.
- For example, if `ch' is a collating element, then `[[.ch.]]' is a
- regexp that matches this collating element, whereas `[ch]' is a
- regexp that matches either `c' or `h'.
-
-Equivalence classes
- Locale-specific names for a list of characters that are equal. The
- name is enclosed between `[=' and `=]'. For example, the name `e'
- might be used to represent all of "e," "e`," and "e'." In this
- case, `[[=e=]]' is a regexp that matches any of `e', `e'', or `e`'.
-
- These features are very valuable in non-English-speaking locales.
-
- CAUTION: The library functions that `gawk' uses for regular
- expression matching currently recognize only POSIX character
- classes; they do not recognize collating symbols or equivalence
- classes.
-
-
-File: gawk.info, Node: GNU Regexp Operators, Next: Case-sensitivity, Prev: Bracket Expressions, Up: Regexp
-
-3.5 `gawk'-Specific Regexp Operators
-====================================
-
-GNU software that deals with regular expressions provides a number of
-additional regexp operators. These operators are described in this
-minor node and are specific to `gawk'; they are not available in other
-`awk' implementations. Most of the additional operators deal with word
-matching. For our purposes, a "word" is a sequence of one or more
-letters, digits, or underscores (`_'):
-
-`\s'
- Matches any whitespace character. Think of it as shorthand for
- `[[:space:]]'.
-
-`\S'
- Matches any character that is not whitespace. Think of it as
- shorthand for `[^[:space:]]'.
-
-`\w'
- Matches any word-constituent character--that is, it matches any
- letter, digit, or underscore. Think of it as shorthand for
- `[[:alnum:]_]'.
-
-`\W'
- Matches any character that is not word-constituent. Think of it
- as shorthand for `[^[:alnum:]_]'.
-
-`\<'
- Matches the empty string at the beginning of a word. For example,
- `/\<away/' matches `away' but not `stowaway'.
-
-`\>'
- Matches the empty string at the end of a word. For example,
- `/stow\>/' matches `stow' but not `stowaway'.
-
-`\y'
- Matches the empty string at either the beginning or the end of a
- word (i.e., the word boundar*y*). For example, `\yballs?\y'
- matches either `ball' or `balls', as a separate word.
-
-`\B'
- Matches the empty string that occurs between two word-constituent
- characters. For example, `/\Brat\B/' matches `crate' but it does
- not match `dirty rat'. `\B' is essentially the opposite of `\y'.
-
- There are two other operators that work on buffers. In Emacs, a
-"buffer" is, naturally, an Emacs buffer. For other programs, `gawk''s
-regexp library routines consider the entire string to match as the
-buffer. The operators are:
-
-`\`'
- Matches the empty string at the beginning of a buffer (string).
-
-`\''
- Matches the empty string at the end of a buffer (string).
-
- Because `^' and `$' always work in terms of the beginning and end of
-strings, these operators don't add any new capabilities for `awk'.
-They are provided for compatibility with other GNU software.
-
- In other GNU software, the word-boundary operator is `\b'. However,
-that conflicts with the `awk' language's definition of `\b' as
-backspace, so `gawk' uses a different letter. An alternative method
-would have been to require two backslashes in the GNU operators, but
-this was deemed too confusing. The current method of using `\y' for the
-GNU `\b' appears to be the lesser of two evils.
-
- The various command-line options (*note Options::) control how
-`gawk' interprets characters in regexps:
-
-No options
- In the default case, `gawk' provides all the facilities of POSIX
- regexps (*note Regexp Operators::) and the GNU regexp operators
- described previously in this minor node.
-
-`--posix'
- Only POSIX regexps are supported; the GNU operators are not special
- (e.g., `\w' matches a literal `w'). Interval expressions are
- allowed.
-
-`--traditional'
- Traditional Unix `awk' regexps are matched. The GNU operators are
- not special, and interval expressions are not available. The
- POSIX character classes (`[[:alnum:]]', etc.) are supported, as
- Brian Kernighan's `awk' does support them. Characters described
- by octal and hexadecimal escape sequences are treated literally,
- even if they represent regexp metacharacters.
-
-`--re-interval'
- Allow interval expressions in regexps, if `--traditional' has been
- provided. Otherwise, interval expressions are available by
- default.
-
-
-File: gawk.info, Node: Case-sensitivity, Next: Leftmost Longest, Prev: GNU Regexp Operators, Up: Regexp
-
-3.6 Case Sensitivity in Matching
-================================
-
-Case is normally significant in regular expressions, both when matching
-ordinary characters (i.e., not metacharacters) and inside bracket
-expressions. Thus, a `w' in a regular expression matches only a
-lowercase `w' and not an uppercase `W'.
-
- The simplest way to do a case-independent match is to use a bracket
-expression--for example, `[Ww]'. However, this can be cumbersome if
-you need to use it often, and it can make the regular expressions harder
-to read. There are two alternatives that you might prefer.
-
- One way to perform a case-insensitive match at a particular point in
-the program is to convert the data to a single case, using the
-`tolower()' or `toupper()' built-in string functions (which we haven't
-discussed yet; *note String Functions::). For example:
-
- tolower($1) ~ /foo/ { ... }
-
-converts the first field to lowercase before matching against it. This
-works in any POSIX-compliant `awk'.
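A runnable version of this idea follows. The function name `first_field_has_foo' is hypothetical. The program prints the first field of each record whose first field contains `foo' in any mixture of case. Only the first field is lowercased, so a `foo' in a later field does not cause a match.

```shell
# Hypothetical helper: case-insensitive match on $1 using only POSIX tolower().
first_field_has_foo() {
    awk 'tolower($1) ~ /foo/ { print $1 }'
}

printf 'FOObar x\nfOo y\nbar foo\n' | first_field_has_foo
```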
-
- Another method, specific to `gawk', is to set the variable
-`IGNORECASE' to a nonzero value (*note Built-in Variables::). When
-`IGNORECASE' is not zero, _all_ regexp and string operations ignore
-case. Changing the value of `IGNORECASE' dynamically controls the
-case-sensitivity of the program as it runs. Case is significant by
-default because `IGNORECASE' (like most variables) is initialized to
-zero:
-
- x = "aB"
- if (x ~ /ab/) ... # this test will fail
-
- IGNORECASE = 1
- if (x ~ /ab/) ... # now it will succeed
-
- In general, you cannot use `IGNORECASE' to make certain rules
-case-insensitive and other rules case-sensitive, because there is no
-straightforward way to set `IGNORECASE' just for the pattern of a
-particular rule.(1) To do this, use either bracket expressions or
-`tolower()'. However, one thing you can do with `IGNORECASE' only is
-dynamically turn case-sensitivity on or off for all the rules at once.
-
- `IGNORECASE' can be set on the command line or in a `BEGIN' rule
-(*note Other Arguments::; also *note Using BEGIN/END::). Setting
-`IGNORECASE' from the command line is a way to make a program
-case-insensitive without having to edit it.
-
- Both regexp and string comparison operations are affected by
-`IGNORECASE'.
-
- In multibyte locales, the equivalences between upper- and lowercase
-characters are tested based on the wide-character values of the
-locale's character set. Otherwise, the characters are tested based on
-the ISO-8859-1 (ISO Latin-1) character set. This character set is a
-superset of the traditional 128 ASCII characters, which also provides a
-number of characters suitable for use with European languages.(2)
-
- The value of `IGNORECASE' has no effect if `gawk' is in
-compatibility mode (*note Options::). Case is always significant in
-compatibility mode.
-
- ---------- Footnotes ----------
-
- (1) Experienced C and C++ programmers will note that it is possible,
-using something like `IGNORECASE = 1 && /foObAr/ { ... }' and
-`IGNORECASE = 0 || /foobar/ { ... }'. However, this is somewhat
-obscure and we don't recommend it.
-
- (2) If you don't understand this, don't worry about it; it just
-means that `gawk' does the right thing.
-
-
-File: gawk.info, Node: Leftmost Longest, Next: Computed Regexps, Prev: Case-sensitivity, Up: Regexp
-
-3.7 How Much Text Matches?
-==========================
-
-Consider the following:
-
- echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
-
- This example uses the `sub()' function (which we haven't discussed
-yet; *note String Functions::) to make a change to the input record.
-Here, the regexp `/a+/' indicates "one or more `a' characters," and the
-replacement text is `<A>'.
-
- The input contains four `a' characters. `awk' (and POSIX) regular
-expressions always match the leftmost, _longest_ sequence of input
-characters that can match. Thus, all four `a' characters are replaced
-with `<A>' in this example:
-
- $ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
- -| <A>bcd
-
- For simple match/no-match tests, this is not so important. But when
-doing text matching and substitutions with the `match()', `sub()',
-`gsub()', and `gensub()' functions, it is very important. *Note String
-Functions::, for more information on these functions. Understanding
-this principle is also important for regexp-based record and field
-splitting (*note Records::, and also *note Field Separators::).
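The `match()' function makes the leftmost-longest rule visible through the POSIX variables `RSTART' (where the match begins) and `RLENGTH' (how long it is). In this sketch, with the hypothetical function name `locate_a_run', the match starts at the first `a' and covers all four. The later `aa' is never considered.

```shell
# Hypothetical helper: report the start and length of the match for /a+/.
locate_a_run() {
    awk '{
        if (match($0, /a+/))
            print RSTART, RLENGTH
        else
            print "no match"
    }'
}

echo xaaaabcdaa | locate_a_run
```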
-
-
-File: gawk.info, Node: Computed Regexps, Prev: Leftmost Longest, Up: Regexp
-
-3.8 Using Dynamic Regexps
-=========================
-
-The righthand side of a `~' or `!~' operator need not be a regexp
-constant (i.e., a string of characters between slashes). It may be any
-expression. The expression is evaluated and converted to a string if
-necessary; the contents of the string are then used as the regexp. A
-regexp computed in this way is called a "dynamic regexp":
-
- BEGIN { digits_regexp = "[[:digit:]]+" }
- $0 ~ digits_regexp { print }
-
-This sets `digits_regexp' to a regexp that describes one or more digits,
-and tests whether the input record matches this regexp.
-
- NOTE: When using the `~' and `!~' operators, there is a difference
- between a regexp constant enclosed in slashes and a string
- constant enclosed in double quotes. If you are going to use a
- string constant, you have to understand that the string is, in
- essence, scanned _twice_: the first time when `awk' reads your
- program, and the second time when it goes to match the string on
- the lefthand side of the operator with the pattern on the right.
- This is true of any string-valued expression (such as
- `digits_regexp', shown previously), not just string constants.
-
- What difference does it make if the string is scanned twice? The
-answer has to do with escape sequences, and particularly with
-backslashes. To get a backslash into a regular expression inside a
-string, you have to type two backslashes.
-
- For example, `/\*/' is a regexp constant for a literal `*'. Only
-one backslash is needed. To do the same thing with a string, you have
-to type `"\\*"'. The first backslash escapes the second one so that
-the string actually contains the two characters `\' and `*'.
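The two spellings can be compared side by side. In this sketch the function name `star_tests' is hypothetical. Each line of output shows the result of the regexp constant `/\*/' and of the string `"\\*"'. Both should agree because they describe the same regexp.

```shell
# Hypothetical helper: a regexp constant needs one backslash, a string
# constant needs two, because awk processes the string's escapes first.
star_tests() {
    awk '{
        by_constant = ($0 ~ /\*/)    # regexp constant
        by_string   = ($0 ~ "\\*")   # string constant, scanned twice
        print by_constant, by_string
    }'
}

printf 'a*b\nab\n' | star_tests
```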
-
- Given that you can use both regexp and string constants to describe
-regular expressions, which should you use? The answer is "regexp
-constants," for several reasons:
-
- * String constants are more complicated to write and more difficult
- to read. Using regexp constants makes your programs less
- error-prone. Not understanding the difference between the two
- kinds of constants is a common source of errors.
-
- * It is more efficient to use regexp constants. `awk' can note that
- you have supplied a regexp and store it internally in a form that
- makes pattern matching more efficient. When using a string
- constant, `awk' must first convert the string into this internal
- form and then perform the pattern matching.
-
- * Using regexp constants is better form; it shows clearly that you
- intend a regexp match.
-
-Advanced Notes: Using `\n' in Bracket Expressions of Dynamic Regexps
---------------------------------------------------------------------
-
-Some commercial versions of `awk' do not allow the newline character to
-be used inside a bracket expression for a dynamic regexp:
-
- $ awk '$0 ~ "[ \t\n]"'
- error--> awk: newline in character class [
- error--> ]...
- error--> source line number 1
- error--> context is
- error--> >>> <<<
-
- But a newline in a regexp constant works with no problem:
-
- $ awk '$0 ~ /[ \t\n]/'
- here is a sample line
- -| here is a sample line
- Ctrl-d
-
- `gawk' does not have this problem, and it isn't likely to occur
-often in practice, but it's worth noting for future reference.
-
-
-File: gawk.info, Node: Reading Files, Next: Printing, Prev: Regexp, Up: Top
-
-4 Reading Input Files
-*********************
-
-In the typical `awk' program, `awk' reads all input either from the
-standard input (by default, this is the keyboard, but often it is a
-pipe from another command) or from files whose names you specify on the
-`awk' command line. If you specify input files, `awk' reads them in
-order, processing all the data from one before going on to the next.
-The name of the current input file can be found in the built-in variable
-`FILENAME' (*note Built-in Variables::).
-
- The input is read in units called "records", and is processed by the
-rules of your program one record at a time. By default, each record is
-one line. Each record is automatically split into chunks called
-"fields". This makes it more convenient for programs to work on the
-parts of a record.
-
- On rare occasions, you may need to use the `getline' command. The
-`getline' command is valuable, both because it can do explicit input
-from any number of files, and because the files used with it do not
-have to be named on the `awk' command line (*note Getline::).
-
-* Menu:
-
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Nonconstant Fields:: Nonconstant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Constant Size:: Reading constant width data.
-* Splitting By Content:: Defining Fields By Content
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program control
- using the `getline' function.
-* Read Timeout:: Reading input with a timeout.
-* Command line directories:: What happens if you put a directory on the
- command line.
-
-
-File: gawk.info, Node: Records, Next: Fields, Up: Reading Files
-
-4.1 How Input Is Split into Records
-===================================
-
-The `awk' utility divides the input for your `awk' program into records
-and fields. `awk' keeps track of the number of records that have been
-read so far from the current input file. This value is stored in a
-built-in variable called `FNR'. It is reset to zero when a new file is
-started. Another built-in variable, `NR', records the total number of
-input records read so far from all data files. It starts at zero, but
-is never automatically reset to zero.
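The difference between the two counters shows up as soon as more than one file is read. The sketch below creates two small temporary files; the function and file names are hypothetical. For every record it prints `FILENAME', `FNR', and `NR'. `FNR' starts counting again with the second file, while `NR' does not.

```shell
# Hypothetical helper: print FILENAME, FNR, and NR for two temporary files.
show_counters() {
    dir=$(mktemp -d)
    printf 'a\nb\n' > "$dir/one"
    printf 'c\n' > "$dir/two"
    # cd first so that FILENAME holds the short names "one" and "two"
    (cd "$dir" && awk '{ print FILENAME, FNR, NR }' one two)
    rm -rf "$dir"
}

show_counters
```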
-
- Records are separated by a character called the "record separator".
-By default, the record separator is the newline character. This is why
-records are, by default, single lines. A different character can be
-used for the record separator by assigning the character to the
-built-in variable `RS'.
-
- Like any other variable, the value of `RS' can be changed in the
-`awk' program with the assignment operator, `=' (*note Assignment
-Ops::). The new record-separator character should be enclosed in
-quotation marks, which indicate a string constant. Often the right
-time to do this is at the beginning of execution, before any input is
-processed, so that the very first record is read with the proper
-separator. To do this, use the special `BEGIN' pattern (*note
-BEGIN/END::). For example:
-
- awk 'BEGIN { RS = "/" }
- { print $0 }' BBS-list
-
-changes the value of `RS' to `"/"', before reading any input. This is
-a string whose first character is a slash; as a result, records are
-separated by slashes. Then the input file is read, and the second rule
-in the `awk' program (the action with no pattern) prints each record.
-Because each `print' statement adds a newline at the end of its output,
-this `awk' program copies the input with each slash changed to a
-newline. Here are the results of running the program on `BBS-list':
-
- $ awk 'BEGIN { RS = "/" }
- > { print $0 }' BBS-list
- -| aardvark 555-5553 1200
- -| 300 B
- -| alpo-net 555-3412 2400
- -| 1200
- -| 300 A
- -| barfly 555-7685 1200
- -| 300 A
- -| bites 555-1675 2400
- -| 1200
- -| 300 A
- -| camelot 555-0542 300 C
- -| core 555-2912 1200
- -| 300 C
- -| fooey 555-1234 2400
- -| 1200
- -| 300 B
- -| foot 555-6699 1200
- -| 300 B
- -| macfoo 555-6480 1200
- -| 300 A
- -| sdace 555-3430 2400
- -| 1200
- -| 300 A
- -| sabafoo 555-2127 1200
- -| 300 C
- -|
-
-Note that the entry for the `camelot' BBS is not split. In the
-original data file (*note Sample Data Files::), the line looks like
-this:
-
- camelot 555-0542 300 C
-
-It has one baud rate only, so there are no slashes in the record,
-unlike the others, which have two or more baud rates. In fact, this
-record is treated as part of the record for the `core' BBS; the newline
-separating them in the output is the original newline in the data file,
-not the one added by `awk' when it printed the record!
-
- Another way to change the record separator is on the command line,
-using the variable-assignment feature (*note Other Arguments::):
-
- awk '{ print $0 }' RS="/" BBS-list
-
-This sets `RS' to `/' before processing `BBS-list'.
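The same technique works when the data come from a pipe. In that case, the file operand `-' (standard input) can follow the assignment so that `RS' is already set when reading starts. In this sketch the function name `number_slash_records' is hypothetical. The final record is terminated by the end of the input rather than by a slash.

```shell
# Hypothetical helper: number the slash-separated records on standard input.
# The RS="/" operand is processed before the "-" operand is read.
number_slash_records() {
    awk '{ print NR ": " $0 }' RS="/" -
}

printf 'a/b/c' | number_slash_records
```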
-
- Using an unusual character such as `/' for the record separator
-produces correct behavior in the vast majority of cases. However, the
-following (extreme) pipeline prints a surprising `1':
-
- $ echo | awk 'BEGIN { RS = "a" } ; { print NF }'
- -| 1
-
- There is one field, consisting of a newline. The value of the
-built-in variable `NF' is the number of fields in the current record.
-
- Reaching the end of an input file terminates the current input
-record, even if the last character in the file is not the character in
-`RS'. (d.c.)
-
- The empty string `""' (a string without any characters) has a
-special meaning as the value of `RS'. It means that records are
-separated by one or more blank lines and nothing else. *Note Multiple
-Line::, for more details.
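A minimal sketch of this "paragraph mode" follows; the function name `count_paragraphs' is hypothetical. With `RS' set to the empty string, the three consecutive blank lines in the input act as a single separator. The two paragraphs are therefore counted as two records.

```shell
# Hypothetical helper: count blank-line-separated paragraphs.
count_paragraphs() {
    awk 'BEGIN { RS = "" } END { print NR }'
}

printf 'first para\nline two\n\n\n\nsecond para\n' | count_paragraphs
```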
-
- If you change the value of `RS' in the middle of an `awk' run, the
-new value is used to delimit subsequent records, but the record
-currently being processed, as well as records already processed, are not
-affected.
-
- After the end of the record has been determined, `gawk' sets the
-variable `RT' to the text in the input that matched `RS'.
-
- When using `gawk', the value of `RS' is not limited to a
-one-character string. It can be any regular expression (*note
-Regexp::). (c.e.) In general, each record ends at the next string that
-matches the regular expression; the next record starts at the end of
-the matching string. This general rule is actually at work in the
-usual case, where `RS' contains just a newline: a record ends at the
-beginning of the next matching string (the next newline in the input),
-and the following record starts just after the end of this string (at
-the first character of the following line). The newline, because it
-matches `RS', is not part of either record.
-
- When `RS' is a single character, `RT' contains the same single
-character. However, when `RS' is a regular expression, `RT' contains
-the actual input text that matched the regular expression.
-
- If the input file ended without any text that matches `RS', `gawk'
-sets `RT' to the null string.
-
- The following example illustrates both of these features. It sets
-`RS' equal to a regular expression that matches either a newline or a
-series of one or more uppercase letters with optional leading and/or
-trailing whitespace:
-
- $ echo record 1 AAAA record 2 BBBB record 3 |
- > gawk 'BEGIN { RS = "\n|( *[[:upper:]]+ *)" }
- > { print "Record =", $0, "and RT =", RT }'
- -| Record = record 1 and RT = AAAA
- -| Record = record 2 and RT = BBBB
- -| Record = record 3 and RT =
- -|
-
-The final line of output has an extra blank line. This is because the
-value of `RT' is a newline, and the `print' statement supplies its own
-terminating newline. *Note Simple Sed::, for a more useful example of
-`RS' as a regexp and `RT'.
-
- If you set `RS' to a regular expression that allows optional
-trailing text, such as `RS = "abc(XYZ)?"', then, due to implementation
-constraints, `gawk' might match the leading part of the regular
-expression but not the trailing part, particularly if the input text
-that could match the trailing part is fairly long. `gawk' attempts to
-avoid this problem, but currently there is no guarantee that this will
-never happen.
-
- NOTE: Remember that in `awk', the `^' and `$' anchor
- metacharacters match the beginning and end of a _string_, and not
- the beginning and end of a _line_. As a result, something like
- `RS = "^[[:upper:]]"' can only match at the beginning of a file.
- This is because `gawk' views the input file as one long string
- that happens to contain newline characters. It is thus best to
- avoid anchor characters in the value of `RS'.
-
- The use of `RS' as a regular expression and the `RT' variable are
-`gawk' extensions; they are not available in compatibility mode (*note
-Options::). In compatibility mode, only the first character of the
-value of `RS' is used to determine the end of the record.
-
-Advanced Notes: `RS = "\0"' Is Not Portable
--------------------------------------------
-
-There are times when you might want to treat an entire data file as a
-single record. The only way to make this happen is to give `RS' a
-value that you know doesn't occur in the input file. This is hard to
-do in a general way, such that a program always works for arbitrary
-input files.
-
- You might think that for text files, the NUL character, a
-character whose bits are all zero, is a good value to use for `RS' in
-this case:
-
- BEGIN { RS = "\0" } # whole file becomes one record?
-
- `gawk' in fact accepts this, and uses the NUL character for the
-record separator. However, this usage is _not_ portable to other `awk'
-implementations.
-
- All other `awk' implementations(1) store strings internally as
-C-style strings. C strings use the NUL character as the string
-terminator. In effect, this means that `RS = "\0"' is the same as `RS
-= ""'. (d.c.)
-
- The best way to treat a whole file as a single record is to simply
-read the file in, one record at a time, concatenating each record onto
-the end of the previous ones.
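-
- The following is a minimal sketch of that approach: a portable `awk'
-program wrapped in a shell function. The function name `slurp' is
-hypothetical, and the program assumes the default `RS' (a newline),
-which it puts back between records:
-
```shell
# slurp (hypothetical helper): read standard input one record at a time
# and concatenate the records into a single string.  Assumes the default
# record separator (newline), which is re-inserted between records.
slurp() {
    awk '
    # Append each record to the ones read so far
    { whole = (NR == 1) ? $0 : whole "\n" $0 }
    END {
        # "whole" now holds every record as one string
        printf "%d records, %d characters\n", NR, length(whole)
    }'
}

printf 'one\ntwo\nthree\n' | slurp
```
-
- The last record's newline is not stored in `whole', because it was
-consumed as the record separator and the sketch only adds `"\n"'
-between records.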
-
- ---------- Footnotes ----------
-
- (1) At least that we know about.
-
-
-File: gawk.info, Node: Fields, Next: Nonconstant Fields, Prev: Records, Up: Reading Files
-
-4.2 Examining Fields
-====================
-
-When `awk' reads an input record, the record is automatically "parsed"
-or separated by the `awk' utility into chunks called "fields". By
-default, fields are separated by "whitespace", like words in a line.
-Whitespace in `awk' means any string of one or more spaces, TABs, or
-newlines;(1) other characters, such as formfeed, vertical tab, etc.,
-that are considered whitespace by other languages, are _not_ considered
-whitespace by `awk'.
-
- The purpose of fields is to make it more convenient for you to refer
-to these pieces of the record. You don't have to use them--you can
-operate on the whole record if you want--but fields are what make
-simple `awk' programs so powerful.
-
- A dollar sign (`$') is used to refer to a field in an `awk' program,
-followed by the number of the field you want. Thus, `$1' refers to the
-first field, `$2' to the second, and so on. (Unlike the Unix shells,
-the field numbers are not limited to single digits. `$127' is the one
-hundred twenty-seventh field in the record.) For example, suppose the
-following is a line of input:
-
- This seems like a pretty nice example.
-
-Here the first field, or `$1', is `This', the second field, or `$2', is
-`seems', and so on. Note that the last field, `$7', is `example.'.
-Because there is no space between the `e' and the `.', the period is
-considered part of the seventh field.
-
- `NF' is a built-in variable whose value is the number of fields in
-the current record. `awk' automatically updates the value of `NF' each
-time it reads a record. No matter how many fields there are, the last
-field in a record can be represented by `$NF'. So, `$NF' is the same
-as `$7', which is `example.'. If you try to reference a field beyond
-the last one (such as `$8' when the record has only seven fields), you
-get the empty string. (If used in a numeric operation, you get zero.)
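-
- Here is a small sketch that applies these rules to the sample line
-above. The shell function name `field_demo' is only for illustration:
-
```shell
# field_demo (hypothetical name): show NF, $NF, and an out-of-range field.
field_demo() {
    echo 'This seems like a pretty nice example.' |
    awk '{
        print NF             # number of fields: 7
        print $NF            # the last field, same as $7
        print "[" $8 "]"     # past the last field: the empty string
        print $8 + 0         # ...which is zero in a numeric context
    }'
}

field_demo
```
-
- It prints `7', `example.', `[]', and `0', one per line.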
-
- The use of `$0', which looks like a reference to the "zero-th"
-field, is a special case: it represents the whole input record when you
-are not interested in specific fields. Here are some more examples:
-
- $ awk '$1 ~ /foo/ { print $0 }' BBS-list
- -| fooey 555-1234 2400/1200/300 B
- -| foot 555-6699 1200/300 B
- -| macfoo 555-6480 1200/300 A
- -| sabafoo 555-2127 1200/300 C
-
-This example prints each record in the file `BBS-list' whose first
-field contains the string `foo'. The operator `~' is called a
-"matching operator" (*note Regexp Usage::); it tests whether a string
-(here, the field `$1') matches a given regular expression.
-
- By contrast, the following example looks for `foo' in _the entire
-record_ and prints the first field and the last field for each matching
-input record:
-
- $ awk '/foo/ { print $1, $NF }' BBS-list
- -| fooey B
- -| foot B
- -| macfoo A
- -| sabafoo C
-
- ---------- Footnotes ----------
-
- (1) In POSIX `awk', newlines are not considered whitespace for
-separating fields.
-
-
-File: gawk.info, Node: Nonconstant Fields, Next: Changing Fields, Prev: Fields, Up: Reading Files
-
-4.3 Nonconstant Field Numbers
-=============================
-
-The number of a field does not need to be a constant. Any expression in
-the `awk' language can be used after a `$' to refer to a field. The
-value of the expression specifies the field number. If the value is a
-string, rather than a number, it is converted to a number. Consider
-this example:
-
- awk '{ print $NR }'
-
-Recall that `NR' is the number of records read so far: one in the first
-record, two in the second, etc. So this example prints the first field
-of the first record, the second field of the second record, and so on.
-For the twentieth record, field number 20 is printed; most likely, the
-record has fewer than 20 fields, so this prints a blank line. Here is
-another example of using expressions as field numbers:
-
- awk '{ print $(2*2) }' BBS-list
-
- `awk' evaluates the expression `(2*2)' and uses its value as the
-number of the field to print. The `*' sign represents multiplication,
-so the expression `2*2' evaluates to four. The parentheses are used so
-that the multiplication is done before the `$' operation; they are
-necessary whenever there is a binary operator in the field-number
-expression. This example, then, prints the hours of operation (the
-fourth field) for every line of the file `BBS-list'. (All of the `awk'
-operators are listed, in order of decreasing precedence, in *note
-Precedence::.)
-
- If the field number you compute is zero, you get the entire record.
-Thus, `$(2-2)' has the same value as `$0'. Negative field numbers are
-not allowed; trying to reference one usually terminates the program.
-(The POSIX standard does not define what happens when you reference a
-negative field number. `gawk' notices this and terminates your
-program. Other `awk' implementations may behave differently.)
-
- As mentioned in *note Fields::, `awk' stores the current record's
-number of fields in the built-in variable `NF' (also *note Built-in
-Variables::). The expression `$NF' is not a special feature--it is the
-direct consequence of evaluating `NF' and using its value as a field
-number.
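-
- Because any expression can follow `$', a loop variable can walk
-through the fields of a record. This sketch (the function name
-`reverse_fields' is hypothetical) prints each record's fields in
-reverse order:
-
```shell
# reverse_fields (hypothetical name): print the fields of each record
# last-to-first.  $i selects a different field on every pass of the loop.
reverse_fields() {
    awk '{
        out = $NF                        # start with the last field
        for (i = NF - 1; i >= 1; i--)
            out = out " " $i             # append field number i
        print out
    }'
}

echo 'one two three' | reverse_fields
```
-
- For the input `one two three' it prints `three two one'. Building the
-line in a variable, rather than printing each field separately, keeps
-all the fields of a record on one output line.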
-
-
-File: gawk.info, Node: Changing Fields, Next: Field Separators, Prev: Nonconstant Fields, Up: Reading Files
-
-4.4 Changing the Contents of a Field
-====================================
-
-The contents of a field, as seen by `awk', can be changed within an
-`awk' program; this changes what `awk' perceives as the current input
-record. (The actual input is untouched; `awk' _never_ modifies the
-input file.) Consider the following example and its output:
-
- $ awk '{ nboxes = $3 ; $3 = $3 - 10
- > print nboxes, $3 }' inventory-shipped
- -| 25 15
- -| 32 22
- -| 24 14
- ...
-
-The program first saves the original value of field three in the
-variable `nboxes'. The `-' sign represents subtraction, so this
-program reassigns field three, `$3', as the original value of field
-three minus ten: `$3 - 10'. (*Note Arithmetic Ops::.) Then it prints
-the original and new values for field three. (Someone in the warehouse
-made a consistent mistake while inventorying the red boxes.)
-
- For this to work, the text in field `$3' must make sense as a
-number; the string of characters must be converted to a number for the
-computer to do arithmetic on it. The number resulting from the
-subtraction is converted back to a string of characters that then
-becomes field three. *Note Conversion::.
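-
- A quick way to see both conversions is to do arithmetic on a field
-with leading zeros. In this sketch (the function name `renumber' is
-hypothetical), adding zero turns the text `007' into the number 7, and
-storing the result back in `$2' turns it into the string `7':
-
```shell
# renumber (hypothetical name): string -> number for the arithmetic,
# then number -> string when the result is stored back into field 2.
renumber() {
    awk '{ $2 = $2 + 0; print }'
}

echo 'x 007 y' | renumber
```
-
- For the input `x 007 y' it prints `x 7 y'.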
-
- When the value of a field is changed (as perceived by `awk'), the
-text of the input record is recalculated to contain the new field where
-the old one was. In other words, `$0' changes to reflect the altered
-field. Thus, this program prints a copy of the input file, with 10
-subtracted from the second field of each line:
-
- $ awk '{ $2 = $2 - 10; print $0 }' inventory-shipped
- -| Jan 3 25 15 115
- -| Feb 5 32 24 226
- -| Mar 5 24 34 228
- ...
-
- It is also possible to assign contents to fields that are out of
-range. For example:
-
- $ awk '{ $6 = ($5 + $4 + $3 + $2)
- > print $6 }' inventory-shipped
- -| 168
- -| 297
- -| 301
- ...
-
-We've just created `$6', whose value is the sum of fields `$2', `$3',
-`$4', and `$5'. The `+' sign represents addition. For the file
-`inventory-shipped', `$6' represents the total number of parcels
-shipped for a particular month.
-
- Creating a new field changes `awk''s internal copy of the current
-input record, which is the value of `$0'. Thus, if you do `print $0'
-after adding a field, the record printed includes the new field, with
-the appropriate number of field separators between it and the previously
-existing fields.
-
- This recomputation affects and is affected by `NF' (the number of
-fields; *note Fields::). For example, the value of `NF' is set to the
-number of the highest field you create. The exact format of `$0' is
-also affected by a feature that has not been discussed yet: the "output
-field separator", `OFS', used to separate the fields (*note Output
-Separators::).
-
- Note, however, that merely _referencing_ an out-of-range field does
-_not_ change the value of either `$0' or `NF'. Referencing an
-out-of-range field only produces an empty string. For example:
-
- if ($(NF+1) != "")
- print "can't happen"
- else
- print "everything is normal"
-
-should print `everything is normal', because `NF+1' is certain to be
-out of range. (*Note If Statement::, for more information about
-`awk''s `if-else' statements. *Note Typing and Comparison::, for more
-information about the `!=' operator.)
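-
- The following sketch contrasts referencing and assigning an
-out-of-range field on the record `a b'. The function name `nf_demo' is
-hypothetical:
-
```shell
# nf_demo (hypothetical name): referencing a field past the end leaves
# NF and $0 alone; assigning to one extends the record.
nf_demo() {
    awk '{
        x = $(NF + 1)        # reference only: x is "", nothing changes
        print NF, $0
        $(NF + 2) = "z"      # assignment creates fields NF+1 and NF+2
        print NF, $0
    }'
}

echo 'a b' | nf_demo
```
-
- The first line printed is `2 a b'. After the assignment to `$4', `NF'
-is 4 and `$0' is `a b  z', with the empty third field between the two
-spaces.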
-
- It is important to note that making an assignment to an existing
-field changes the value of `$0' but does not change the value of `NF',
-even when you assign the empty string to a field. For example:
-
- $ echo a b c d | awk '{ OFS = ":"; $2 = ""
- > print $0; print NF }'
- -| a::c:d
- -| 4
-
-The field is still there; it just has an empty value, denoted by the
-two colons between `a' and `c'. This example shows what happens if you
-create a new field:
-
- $ echo a b c d | awk '{ OFS = ":"; $2 = ""; $6 = "new"
- > print $0; print NF }'
- -| a::c:d::new
- -| 6
-
-The intervening field, `$5', is created with an empty value (indicated
-by the second pair of adjacent colons), and `NF' is updated with the
-value six.
-
- Decrementing `NF' throws away the values of the fields after the new
-value of `NF' and recomputes `$0'. (d.c.) Here is an example:
-
- $ echo a b c d e f | awk '{ print "NF =", NF;
- > NF = 3; print $0 }'
- -| NF = 6
- -| a b c
-
- CAUTION: Some versions of `awk' don't rebuild `$0' when `NF' is
- decremented. Caveat emptor.
-
- Finally, there are times when it is convenient to force `awk' to
-rebuild the entire record, using the current value of the fields and
-`OFS'. To do this, use the seemingly innocuous assignment:
-
- $1 = $1 # force record to be reconstituted
- print $0 # or whatever else with $0
-
-This forces `awk' to rebuild the record. It does help to add a
-comment, as we've shown here.
-
- There is a flip side to the relationship between `$0' and the
-fields. Any assignment to `$0' causes the record to be reparsed into
-fields using the _current_ value of `FS'. This also applies to any
-built-in function that updates `$0', such as `sub()' and `gsub()'
-(*note String Functions::).
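-
- A minimal sketch of this reparsing that needs no input file: the
-program builds the record itself in a `BEGIN' rule (the function name
-`reparse' is hypothetical):
-
```shell
# reparse (hypothetical name): assigning to $0, or changing it with sub(),
# re-splits the record using the current value of FS.
reparse() {
    awk 'BEGIN {
        FS = ":"
        $0 = "root:x:0:0"     # assignment to $0: split into 4 fields
        print NF, $1
        sub(/:x:/, ":")       # sub() on $0 also triggers re-splitting
        print NF, $0
    }'
}

reparse
```
-
- It prints `4 root' and then `3 root:0:0'.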
-
-Advanced Notes: Understanding `$0'
-----------------------------------
-
-It is important to remember that `$0' is the _full_ record, exactly as
-it was read from the input. This includes any leading or trailing
-whitespace, and the exact whitespace (or other characters) that
-separate the fields.
-
- It is a not-uncommon error to try to change the field separators in
-a record simply by setting `FS' and `OFS', and then expecting a plain
-`print' or `print $0' to print the modified record.
-
- But this does not work, since nothing was done to change the record
-itself. Instead, you must force the record to be rebuilt, typically
-with a statement such as `$1 = $1', as described earlier.
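-
- The following sketch shows the difference. The function name
-`rebuild_demo' is hypothetical:
-
```shell
# rebuild_demo (hypothetical name): setting OFS alone does not change $0;
# the record is rebuilt with the new OFS only after a field is assigned.
rebuild_demo() {
    awk 'BEGIN { OFS = "-" }
    {
        print          # record as read, unchanged
        $1 = $1        # force the record to be rebuilt using OFS
        print          # fields now joined with "-"
    }'
}

echo 'a b c' | rebuild_demo
```
-
- With the input `a b c', the first `print' shows `a b c' and the
-second shows `a-b-c'.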
-
-
-File: gawk.info, Node: Field Separators, Next: Constant Size, Prev: Changing Fields, Up: Reading Files
-
-4.5 Specifying How Fields Are Separated
-=======================================
-
-* Menu:
-
-* Default Field Splitting:: How fields are normally separated.
-* Regexp Field Splitting:: Using regexps as the field separator.
-* Single Character Fields:: Making each character a separate field.
-* Command Line Field Separator:: Setting `FS' from the command-line.
-* Field Splitting Summary:: Some final points and a summary table.
-
- The "field separator", which is either a single character or a
-regular expression, controls the way `awk' splits an input record into
-fields. `awk' scans the input record for character sequences that
-match the separator; the fields themselves are the text between the
-matches.
-
- In the examples that follow, we use the bullet symbol (*) to
-represent spaces in the output. If the field separator is `oo', then
-the following line:
-
- moo goo gai pan
-
-is split into three fields: `m', `*g', and `*gai*pan'. Note the
-leading spaces in the values of the second and third fields.
-
- The field separator is represented by the built-in variable `FS'.
-Shell programmers take note: `awk' does _not_ use the name `IFS' that
-is used by the POSIX-compliant shells (such as the Unix Bourne shell,
-`sh', or Bash).
-
- The value of `FS' can be changed in the `awk' program with the
-assignment operator, `=' (*note Assignment Ops::). Often the right
-time to do this is at the beginning of execution before any input has
-been processed, so that the very first record is read with the proper
-separator. To do this, use the special `BEGIN' pattern (*note
-BEGIN/END::). For example, here we set the value of `FS' to the string
-`","':
-
- awk 'BEGIN { FS = "," } ; { print $2 }'
-
-Given the input line:
-
- John Q. Smith, 29 Oak St., Walamazoo, MI 42139
-
-this `awk' program extracts and prints the string `*29*Oak*St.'.
-
- Sometimes the input data contains separator characters that don't
-separate fields the way you thought they would. For instance, the
-person's name in the example we just used might have a title or suffix
-attached, such as:
-
- John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
-
-The same program would extract `*LXIX', instead of `*29*Oak*St.'. If
-you were expecting the program to print the address, you would be
-surprised. The moral is to choose your data layout and separator
-characters carefully to prevent such problems. (If the data is not in
-a form that is easy to process, perhaps you can massage it first with a
-separate `awk' program.)
-
-
-File: gawk.info, Node: Default Field Splitting, Next: Regexp Field Splitting, Up: Field Separators
-
-4.5.1 Whitespace Normally Separates Fields
-------------------------------------------
-
-Fields are normally separated by whitespace sequences (spaces, TABs,
-and newlines), not by single spaces. Two spaces in a row do not
-delimit an empty field. The default value of the field separator `FS'
-is a string containing a single space, `" "'. If `awk' interpreted
-this value in the usual way, each space character would separate
-fields, so two spaces in a row would make an empty field between them.
-The reason this does not happen is that a single space as the value of
-`FS' is a special case--it is taken to specify the default manner of
-delimiting fields.
-
- If `FS' is any other single character, such as `","', then each
-occurrence of that character separates two fields. Two consecutive
-occurrences delimit an empty field. If the character occurs at the
-beginning or the end of the line, that too delimits an empty field. The
-space character is the only single character that does not follow these
-rules.
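-
- The contrast is easy to see by counting fields. This sketch (the
-function name `count_fields' is hypothetical) prints 2 for the default
-`FS' and 5 when the separator is a comma:
-
```shell
# count_fields (hypothetical name): the default FS treats runs of blanks
# as one separator and ignores leading/trailing blanks; a single comma
# separator turns adjacent or edge commas into empty fields.
count_fields() {
    printf '%s\n' '  a   b  ' | awk '{ print NF }'      # default FS
    printf '%s\n' ',a,,b,'     | awk -F, '{ print NF }'  # FS = ","
}

count_fields
```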
-
-
-File: gawk.info, Node: Regexp Field Splitting, Next: Single Character Fields, Prev: Default Field Splitting, Up: Field Separators
-
-4.5.2 Using Regular Expressions to Separate Fields
---------------------------------------------------
-
-The previous node discussed the use of single characters or simple
-strings as the value of `FS'. More generally, the value of `FS' may be
-a string containing any regular expression. In this case, each match
-in the record for the regular expression separates fields. For
-example, the assignment:
-
- FS = ", \t"
-
-makes every area of an input line that consists of a comma followed by a
-space and a TAB into a field separator. (`\t' is an "escape sequence"
-that stands for a TAB; *note Escape Sequences::, for the complete list
-of similar escape sequences.)
-
- For a less trivial example of a regular expression, try using single
-spaces to separate fields the way single commas are used. `FS' can be
-set to `"[ ]"' (left bracket, space, right bracket). This regular
-expression matches a single space and nothing else (*note Regexp::).
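-
- For example, with `FS = "[ ]"' two adjacent spaces delimit an empty
-field, just as two adjacent commas do with `FS = ","'. This sketch uses
-the hypothetical function name `bracket_space':
-
```shell
# bracket_space (hypothetical name): FS = "[ ]" makes every single space
# a separator, so "a  b" (two spaces) has an empty second field.
bracket_space() {
    printf 'a  b\n' |
    awk 'BEGIN { FS = "[ ]" } { print NF; print "[" $2 "]" }'
}

bracket_space
```
-
- It prints `3' and then `[]', showing that the second field is empty.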
-
- There is an important difference between the two cases of `FS = " "'
-(a single space) and `FS = "[ \t\n]+"' (a regular expression matching
-one or more spaces, TABs, or newlines). For both values of `FS',
-fields are separated by "runs" (multiple adjacent occurrences) of
-spaces, TABs, and/or newlines. However, when the value of `FS' is
-`" "', `awk' first strips leading and trailing whitespace from the
-record and then decides where the fields are. For example, the
-following pipeline prints `b':
-
- $ echo ' a b c d ' | awk '{ print $2 }'
- -| b
-
-However, this pipeline prints `a' (note the extra spaces around each
-letter):
-
- $ echo ' a b c d ' | awk 'BEGIN { FS = "[ \t\n]+" }
- > { print $2 }'
- -| a
-
-In this case, the first field is "null" or empty.
-
- The stripping of leading and trailing whitespace also comes into
-play whenever `$0' is recomputed. For instance, study this pipeline:
-
- $ echo ' a b c d' | awk '{ print; $2 = $2; print }'
- -| a b c d
- -| a b c d
-
-The first `print' statement prints the record as it was read, with
-leading whitespace intact. The assignment to `$2' rebuilds `$0' by
-concatenating `$1' through `$NF' together, separated by the value of
-`OFS'. Because the leading whitespace was ignored when finding `$1',
-it is not part of the new `$0'. Finally, the last `print' statement
-prints the new `$0'.
-
- There is an additional subtlety to be aware of when using regular
-expressions for field splitting. It is not well-specified in the POSIX
-standard, or anywhere else, what `^' means when splitting fields. Does
-the `^' match only at the beginning of the entire record? Or is each
-field separator a new string? It turns out that different `awk'
-versions answer this question differently, and you should not rely on
-any specific behavior in your programs. (d.c.)
-
- As a point of information, Brian Kernighan's `awk' allows `^' to
-match only at the beginning of the record. `gawk' also works this way.
-For example:
-
- $ echo 'xxAA xxBxx C' |
- > gawk -F '(^x+)|( +)' '{ for (i = 1; i <= NF; i++)
- > printf "-->%s<--\n", $i }'
- -| --><--
- -| -->AA<--
- -| -->xxBxx<--
- -| -->C<--
-
-
-File: gawk.info, Node: Single Character Fields, Next: Command Line Field Separator, Prev: Regexp Field Splitting, Up: Field Separators
-
-4.5.3 Making Each Character a Separate Field
---------------------------------------------
-
-There are times when you may want to examine each character of a record
-separately. This can be done in `gawk' by simply assigning the null
-string (`""') to `FS'. (c.e.) In this case, each individual character
-in the record becomes a separate field. For example:
-
- $ echo a b | gawk 'BEGIN { FS = "" }
- > {
- > for (i = 1; i <= NF; i = i + 1)
- > print "Field", i, "is", $i
- > }'
- -| Field 1 is a
- -| Field 2 is
- -| Field 3 is b
-
- Traditionally, the behavior of `FS' equal to `""' was not defined.
-In this case, most versions of Unix `awk' simply treat the entire record
-as only having one field. (d.c.) In compatibility mode (*note
-Options::), if `FS' is the null string, then `gawk' also behaves this
-way.
-
-
-File: gawk.info, Node: Command Line Field Separator, Next: Field Splitting Summary, Prev: Single Character Fields, Up: Field Separators
-
-4.5.4 Setting `FS' from the Command Line
-----------------------------------------
-
-`FS' can be set on the command line. Use the `-F' option to do so.
-For example:
-
- awk -F, 'PROGRAM' INPUT-FILES
-
-sets `FS' to the `,' character. Notice that the option uses an
-uppercase `F' instead of a lowercase `f'. The latter option (`-f')
-specifies a file containing an `awk' program. Case is significant in
-command-line options: the `-F' and `-f' options have nothing to do with
-each other. You can use both options at the same time to set the `FS'
-variable _and_ get an `awk' program from a file.
-
- The value used for the argument to `-F' is processed in exactly the
-same way as assignments to the built-in variable `FS'. Any special
-characters in the field separator must be escaped appropriately. For
-example, to use a `\' as the field separator on the command line, you
-would have to type:
-
- # same as FS = "\\"
- awk -F\\\\ '...' files ...
-
-Because `\' is used for quoting in the shell, `awk' sees `-F\\'. Then
-`awk' processes the `\\' for escape characters (*note Escape
-Sequences::), finally yielding a single `\' to use for the field
-separator.
-
- As a special case, in compatibility mode (*note Options::), if the
-argument to `-F' is `t', then `FS' is set to the TAB character. If you
-type `-F\t' at the shell, without any quotes, the `\' gets deleted, so
-`awk' figures that you really want your fields to be separated with
-TABs and not `t's. Use `-v FS="t"' or `-F"[t]"' on the command line if
-you really do want to separate your fields with `t's.
-
- As an example, let's use an `awk' program file called `baud.awk'
-that contains the pattern `/300/' and the action `print $1':
-
- /300/ { print $1 }
-
- Let's also set `FS' to be the `-' character and run the program on
-the file `BBS-list'. The following command prints a list of the names
-of the bulletin boards that operate at 300 baud and the first three
-digits of their phone numbers:
-
- $ awk -F- -f baud.awk BBS-list
- -| aardvark 555
- -| alpo
- -| barfly 555
- -| bites 555
- -| camelot 555
- -| core 555
- -| fooey 555
- -| foot 555
- -| macfoo 555
- -| sdace 555
- -| sabafoo 555
-
-Note the second line of output. The second line in the original file
-looked like this:
-
- alpo-net 555-3412 2400/1200/300 A
-
- The `-' as part of the system's name was used as the field
-separator, instead of the `-' in the phone number that was originally
-intended. This demonstrates why you have to be careful in choosing
-your field and record separators.
-
- Perhaps the most common use of a single character as the field
-separator occurs when processing the Unix system password file. On
-many Unix systems, each user has a separate entry in the system password
-file, one line per user. The information in these lines is separated
-by colons. The first field is the user's login name and the second is
-the user's (encrypted or shadow) password. A password file entry might
-look like this:
-
- arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/bash
-
- The following program searches the system password file and prints
-the entries for users who have no password:
-
- awk -F: '$2 == ""' /etc/passwd
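-
- To try the program without touching the real password file, you can
-feed it sample lines instead. The two entries and the function name
-`no_password' below are made up for illustration:
-
```shell
# no_password (hypothetical name): print entries whose second
# colon-separated field is empty.
no_password() {
    awk -F: '$2 == ""'
}

printf '%s\n' 'alice:xyzzy:1001:10:Alice:/home/alice:/bin/sh' \
              'guest::1002:10:Guest:/home/guest:/bin/sh' | no_password
```
-
- Only the `guest' entry is printed. On systems that keep passwords in
-a separate shadow file, the second field of `/etc/passwd' usually holds
-a placeholder such as `x', so the check may find nothing there.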
-
-
-File: gawk.info, Node: Field Splitting Summary, Prev: Command Line Field Separator, Up: Field Separators
-
-4.5.5 Field-Splitting Summary
------------------------------
-
-It is important to remember that when you assign a string constant as
-the value of `FS', it undergoes normal `awk' string processing. For
-example, with Unix `awk' and `gawk', the assignment `FS = "\.."'
-assigns the character string `".."' to `FS' (the backslash is
-stripped). This creates a regexp meaning "fields are separated by
-occurrences of any two characters." If instead you want fields to be
-separated by a literal period followed by any single character, use `FS
-= "\\.."'.
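-
- Here is a sketch of the `"\\.."' case (the function name `dot_any' is
-hypothetical). It prints `3 a b c', because `.x' and `.y' are the two
-separators:
-
```shell
# dot_any (hypothetical name): the string "\\.." becomes \.. which, as a
# regexp, means a literal period followed by any one character.
dot_any() {
    echo 'a.xb.yc' | awk 'BEGIN { FS = "\\.." } { print NF, $1, $2, $3 }'
}

dot_any
```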
-
- The following table summarizes how fields are split, based on the
-value of `FS' (`==' means "is equal to"):
-
-`FS == " "'
- Fields are separated by runs of whitespace. Leading and trailing
- whitespace are ignored. This is the default.
-
-`FS == ANY OTHER SINGLE CHARACTER'
- Fields are separated by each occurrence of the character. Multiple
- successive occurrences delimit empty fields, as do leading and
- trailing occurrences. The character can even be a regexp
- metacharacter; it does not need to be escaped.
-
-`FS == REGEXP'
- Fields are separated by occurrences of characters that match
- REGEXP. Leading and trailing matches of REGEXP delimit empty
- fields.
-
-`FS == ""'
- Each individual character in the record becomes a separate field.
- (This is a `gawk' extension; it is not specified by the POSIX
- standard.)
-
-Advanced Notes: Changing `FS' Does Not Affect the Fields
---------------------------------------------------------
-
-According to the POSIX standard, `awk' is supposed to behave as if each
-record is split into fields at the time it is read. In particular,
-this means that if you change the value of `FS' after a record is read,
-the value of the fields (i.e., how they were split) should reflect the
-old value of `FS', not the new one.
-
- However, many older implementations of `awk' do not work this way.
-Instead, they defer splitting the fields until a field is actually
-referenced. The fields are split using the _current_ value of `FS'!
-(d.c.) This behavior can be difficult to diagnose. The following
-example illustrates the difference between the two methods. (The
-`sed'(1) command prints just the first line of `/etc/passwd'.)
-
- sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'
-
-which usually prints:
-
- root
-
-on an incorrect implementation of `awk', while `gawk' prints something
-like:
-
- root:nSijPlPhZZwgE:0:0:Root:/:
-
-Advanced Notes: `FS' and `IGNORECASE'
--------------------------------------
-
-The `IGNORECASE' variable (*note User-modified::) affects field
-splitting _only_ when the value of `FS' is a regexp. It has no effect
-when `FS' is a single character, even if that character is a letter.
-Thus, in the following code:
-
- FS = "c"
- IGNORECASE = 1
- $0 = "aCa"
- print $1
-
-The output is `aCa'. If you really want to split fields on an
-alphabetic character while ignoring case, use a regexp that will do it
-for you. E.g., `FS = "[c]"'. In this case, `IGNORECASE' will take
-effect.
-
- ---------- Footnotes ----------
-
- (1) The `sed' utility is a "stream editor." Its behavior is also
-defined by the POSIX standard.
-
-
-File: gawk.info, Node: Constant Size, Next: Splitting By Content, Prev: Field Separators, Up: Reading Files
-
-4.6 Reading Fixed-Width Data
-============================
-
-(This minor node discusses an advanced feature of `awk'. If you are a
-novice `awk' user, you might want to skip it on the first reading.)
-
-`gawk' provides a facility for dealing with fixed-width fields with no
-distinctive field separator. For example, data of this nature arises
-in the input for old Fortran programs where numbers are run together,
-or in the output of programs that did not anticipate the use of their
-output as input for other programs.
-
- An example of the latter is a table where all the columns are lined
-up by the use of a variable number of spaces and _empty fields are just
-spaces_. Clearly, `awk''s normal field splitting based on `FS' does
-not work well in this case. Although a portable `awk' program can use
-a series of `substr()' calls on `$0' (*note String Functions::), this
-is awkward and inefficient for a large number of fields.
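-
- For comparison, here is a minimal sketch of the portable `substr()'
-approach. The `widths' variable, its space-separated format, and the
-function name `fixed_split' are assumptions made for this example. The
-fields are printed separated by `|' so that their boundaries are
-visible:
-
```shell
# fixed_split (hypothetical name): split each record into fixed-width
# pieces with substr().  $1 is a space-separated list of widths.
fixed_split() {
    awk -v widths="$1" '
    {
        n = split(widths, w, " ")
        start = 1
        out = ""
        for (i = 1; i <= n; i++) {
            piece = substr($0, start, w[i])   # w[i] characters from start
            out = (i == 1) ? piece : out "|" piece
            start += w[i]                     # next field begins here
        }
        print out
    }'
}

echo 'abcdeFGHI' | fixed_split "3 2 4"
```
-
- For the input `abcdeFGHI' and widths `3 2 4', it prints
-`abc|de|FGHI'. Every field needs its own `substr()' call and its own
-bookkeeping of the starting column, which is the awkwardness that
-`FIELDWIDTHS' removes.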
-
- The splitting of an input record into fixed-width fields is
-specified by assigning a string containing space-separated numbers to
-the built-in variable `FIELDWIDTHS'. Each number specifies the width
-of the field, _including_ columns between fields. If you want to
-ignore the columns between fields, you can specify the width as a
-separate field that is subsequently ignored. It is a fatal error to
-supply a field width that is not a positive number. The following data
-is the output of the Unix `w' utility. It is useful to illustrate the
-use of `FIELDWIDTHS':
-
- 10:06pm up 21 days, 14:04, 23 users
- User     tty       login  idle   JCPU   PCPU  what
- hzuo     ttyV0     8:58pm            9      5  vi p24.tex
- hzang    ttyV3     6:37pm    50                -csh
- eklye    ttyV5     9:53pm            7      1  em thes.tex
- dportein ttyV6     8:17pm  1:47                -csh
- gierd    ttyD3    10:00pm     1                elm
- dave     ttyD4     9:47pm            4      4  w
- brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
- dave     ttyq4    26Jun9115days     46     46  wnewmail
-
- The following program takes the above input, converts the idle time
-to number of seconds, and prints out the first two fields and the
-calculated idle time:
-
- NOTE: This program uses a number of `awk' features that haven't
- been introduced yet.
-
- BEGIN { FIELDWIDTHS = "9 6 10 6 7 7 35" }
- NR > 2 {
- idle = $4
- sub(/^ */, "", idle) # strip leading spaces
- if (idle == "")
- idle = 0
- if (idle ~ /:/) {
- split(idle, t, ":")
- idle = t[1] * 60 + t[2]
- }
- if (idle ~ /days/)
- idle *= 24 * 60 * 60
-
- print $1, $2, idle
- }
-
- Running the program on the data produces the following results:
-
- hzuo      ttyV0  0
- hzang     ttyV3  50
- eklye     ttyV5  0
- dportein  ttyV6  107
- gierd     ttyD3  1
- dave      ttyD4  0
- brent     ttyp0  286
- dave      ttyq4  1296000
-
- Another (possibly more practical) example of fixed-width input data
-is the input from a deck of balloting cards. In some parts of the
-United States, voters mark their choices by punching holes in computer
-cards. These cards are then processed to count the votes for any
-particular candidate or on any particular issue. Because a voter may
-choose not to vote on some issue, any column on the card may be empty.
-An `awk' program for processing such data could use the `FIELDWIDTHS'
-feature to simplify reading the data. (Of course, getting `gawk' to
-run on a system with card readers is another story!)
-
- Assigning a value to `FS' causes `gawk' to use `FS' for field
-splitting again. Use `FS = FS' to make this happen, without having to
-know the current value of `FS'. In order to tell which kind of field
-splitting is in effect, use `PROCINFO["FS"]' (*note Auto-set::). The
-value is `"FS"' if regular field splitting is being used, or it is
-`"FIELDWIDTHS"' if fixed-width field splitting is being used:
-
- if (PROCINFO["FS"] == "FS")
- REGULAR FIELD SPLITTING ...
- else if (PROCINFO["FS"] == "FIELDWIDTHS")
- FIXED-WIDTH FIELD SPLITTING ...
- else
- CONTENT-BASED FIELD SPLITTING ... (see next minor node)
-
- This information is useful when writing a function that needs to
-temporarily change `FS' or `FIELDWIDTHS', read some records, and then
-restore the original settings (*note Passwd Functions::, for an example
-of such a function).
-
-
-File: gawk.info, Node: Splitting By Content, Next: Multiple Line, Prev: Constant Size, Up: Reading Files
-
-4.7 Defining Fields By Content
-==============================
-
-(This minor node discusses an advanced feature of `awk'. If you are a
-novice `awk' user, you might want to skip it on the first reading.)
-
-Normally, when using `FS', `gawk' defines the fields as the parts of
-the record that occur in between each field separator. In other words,
-`FS' defines what a field _is not_, instead of what a field _is_.
-However, there are times when you really want to define the fields by
-what they are, and not by what they are not.
-
- The most notorious such case is so-called "comma-separated values"
-(CSV) data. Many spreadsheet programs, for example, can export their
-data into text files, where each record is terminated with a newline,
-and fields are separated by commas. If only commas separated the data,
-there wouldn't be an issue. The problem comes when one of the fields
-contains an _embedded_ comma. While there is no formal standard
-specification for CSV data(1), in such cases, most programs embed the
-field in double quotes. So we might have data like this:
-
- Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
-
- The `FPAT' variable offers a solution for cases like this. The
-value of `FPAT' should be a string that provides a regular expression.
-This regular expression describes the contents of each field.
-
- In the case of CSV data as presented above, each field is either
-"anything that is not a comma," or "a double quote, anything that is
-not a double quote, and a closing double quote." If written as a
-regular expression constant (*note Regexp::), we would have
-`/([^,]+)|("[^"]+")/'. Writing this as a string requires us to escape
-the double quotes, leading to:
-
- FPAT = "([^,]+)|(\"[^\"]+\")"
-
- Putting this to use, here is a simple program to parse the data:
-
- BEGIN {
-     FPAT = "([^,]+)|(\"[^\"]+\")"
- }
-
- {
-     print "NF =", NF
-     for (i = 1; i <= NF; i++) {
-         printf("$%d = <%s>\n", i, $i)
-     }
- }
-
- When run, we get the following:
-
- $ gawk -f simple-csv.awk addresses.csv
- -| NF = 7
- -| $1 = <Robbins>
- -| $2 = <Arnold>
- -| $3 = <"1234 A Pretty Street, NE">
- -| $4 = <MyTown>
- -| $5 = <MyState>
- -| $6 = <12345-6789>
- -| $7 = <USA>
-
- Note the embedded comma in the value of `$3'.
-
- A straightforward improvement when processing CSV data of this sort
-would be to remove the quotes when they occur, with something like this:
-
- if (substr($i, 1, 1) == "\"") {
-     len = length($i)
-     $i = substr($i, 2, len - 2)    # Get text within the two quotes
- }
-
- As with `FS', the `IGNORECASE' variable (*note User-modified::)
-affects field splitting with `FPAT'.
-
- Similar to `FIELDWIDTHS', the value of `PROCINFO["FS"]' will be
-`"FPAT"' if content-based field splitting is being used.
-
- NOTE: Some programs export CSV data that contains embedded
- newlines between the double quotes. `gawk' provides no way to
- deal with this. Since there is no formal specification for CSV
- data, there isn't much more to be done; the `FPAT' mechanism
- provides an elegant solution for the majority of cases, and the
- `gawk' maintainer is satisfied with that.
-
- As written, the regexp used for `FPAT' requires that each field have
-at least one character. A straightforward modification (changing the
-first `+' to `*') allows fields to be empty:
-
- FPAT = "([^,]*)|(\"[^\"]+\")"
-
- Finally, the `patsplit()' function makes the same functionality
-available for splitting regular strings (*note String Functions::).
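In an `awk' without `FPAT' or `patsplit()', you can approximate the same idea with the standard `match()' function, which sets `RSTART' and `RLENGTH'. The approach is to find the next piece of the record that matches the field regexp, save it, and continue after it. The following is a minimal sketch under that assumption. The names `fpat_split' and `csv_fields' are hypothetical. Like the first `FPAT' regexp above, the sketch only finds non-empty fields. The quoted alternative is written first. This makes no difference under `awk''s leftmost-longest matching.

```shell
# Hypothetical helper: print the fields of each CSV line, FPAT-style,
# using only POSIX awk features (match(), RSTART, RLENGTH).
csv_fields() {
    awk '
    # Store each successive match of re in s into fields[1..n]; return n.
    function fpat_split(s, fields, re,    n) {
        n = 0
        while (s != "" && match(s, re)) {
            fields[++n] = substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)   # continue after this field
        }
        return n
    }
    {
        n = fpat_split($0, f, "\"[^\"]+\"|[^,]+")
        print "NF =", n
        for (i = 1; i <= n; i++)
            printf("$%d = <%s>\n", i, f[i])
    }'
}
```

Run on the `Robbins,Arnold,...' line shown earlier, this sketch should produce the same seven fields as the `FPAT' program, quotes included.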
-
- ---------- Footnotes ----------
-
- (1) At least, we don't know of one.
-
-
-File: gawk.info, Node: Multiple Line, Next: Getline, Prev: Splitting By Content, Up: Reading Files
-
-4.8 Multiple-Line Records
-=========================
-
-In some databases, a single line cannot conveniently hold all the
-information in one entry. In such cases, you can use multiline
-records. The first step in doing this is to choose your data format.
-
- One technique is to use an unusual character or string to separate
-records. For example, you could use the formfeed character (written
-`\f' in `awk', as in C) to separate them, making each record a page of
-the file. To do this, just set the variable `RS' to `"\f"' (a string
-containing the formfeed character). Any other character could equally
-well be used, as long as it won't be part of the data in a record.
-
- Another technique is to have blank lines separate records. By a
-special dispensation, an empty string as the value of `RS' indicates
-that records are separated by one or more blank lines. When `RS' is set
-to the empty string, each record always ends at the first blank line
-encountered. The next record doesn't start until the first nonblank
-line that follows. No matter how many blank lines appear in a row, they
-all act as one record separator. (Blank lines must be completely
-empty; lines that contain only whitespace do not count.)
-
- You can achieve the same effect as `RS = ""' by assigning the string
-`"\n\n+"' to `RS'. This regexp matches the newline at the end of the
-record and one or more blank lines after the record. In addition, a
-regular expression always matches the longest possible sequence when
-there is a choice (*note Leftmost Longest::). So the next record
-doesn't start until the first nonblank line that follows--no matter how
-many blank lines appear in a row, they are considered one record
-separator.
-
- There is an important difference between `RS = ""' and `RS =
-"\n\n+"'. In the first case, leading newlines in the input data file
-are ignored, and if a file ends without extra blank lines after the
-last record, the final newline is removed from the record. In the
-second case, this special processing is not done. (d.c.)
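A quick way to see `RS = ""' at work is to count records. In the following minimal sketch, the function name `count_paragraphs' is hypothetical. The two leading blank lines are skipped, and the run of three blank lines acts as a single separator, so the sample input holds exactly two records.

```shell
# Hypothetical helper: count blank-line-separated records ("paragraphs").
count_paragraphs() {
    awk 'BEGIN { RS = "" }   # runs of blank lines separate records
         END   { print NR }'
}
```

For example, `printf '\n\nalpha\nbeta\n\n\n\ngamma\n' | count_paragraphs' prints `2'.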
-
- Now that the input is separated into records, the second step is to
-separate the fields in the record. One way to do this is to divide each
-of the lines into fields in the normal manner. This happens by default
-as the result of a special feature. When `RS' is set to the empty
-string, _and_ `FS' is set to a single character, the newline character
-_always_ acts as a field separator. This is in addition to whatever
-field separations result from `FS'.(1)
-
- The original motivation for this special exception was probably to
-provide useful behavior in the default case (i.e., `FS' is equal to
-`" "'). This feature can be a problem if you really don't want the
-newline character to separate fields, because there is no way to
-prevent it. However, you can work around this by using the `split()'
-function to break up the record manually (*note String Functions::).
-If you have a single character field separator, you can work around the
-special feature in a different way, by making `FS' into a regexp for
-that single character. For example, if the field separator is a
-percent character, instead of `FS = "%"', use `FS = "[%]"'.
-
- Another way to separate fields is to put each field on a separate
-line: to do this, just set the variable `FS' to the string `"\n"'.
-(This single character separator matches a single newline.) A
-practical example of a data file organized this way might be a mailing
-list, where each entry is separated by blank lines. Consider a mailing
-list in a file named `addresses', which looks like this:
-
- Jane Doe
- 123 Main Street
- Anywhere, SE 12345-6789
-
- John Smith
- 456 Tree-lined Avenue
- Smallville, MW 98765-4321
- ...
-
-A simple program to process this file is as follows:
-
- # addrs.awk --- simple mailing list program
-
- # Records are separated by blank lines.
- # Each line is one field.
- BEGIN { RS = "" ; FS = "\n" }
-
- {
-     print "Name is:", $1
-     print "Address is:", $2
-     print "City and State are:", $3
-     print ""
- }
-
- Running the program produces the following output:
-
- $ awk -f addrs.awk addresses
- -| Name is: Jane Doe
- -| Address is: 123 Main Street
- -| City and State are: Anywhere, SE 12345-6789
- -|
- -| Name is: John Smith
- -| Address is: 456 Tree-lined Avenue
- -| City and State are: Smallville, MW 98765-4321
- -|
- ...
-
- *Note Labels Program::, for a more realistic program that deals with
-address lists. The following table summarizes how records are split,
-based on the value of `RS'. (`==' means "is equal to.")
-
-`RS == "\n"'
- Records are separated by the newline character (`\n'). In effect,
- every line in the data file is a separate record, including blank
- lines. This is the default.
-
-`RS == ANY SINGLE CHARACTER'
- Records are separated by each occurrence of the character.
- Multiple successive occurrences delimit empty records.
-
-`RS == ""'
- Records are separated by runs of blank lines. When `FS' is a
- single character, then the newline character always serves as a
- field separator, in addition to whatever value `FS' may have.
- Leading and trailing newlines in a file are ignored.
-
-`RS == REGEXP'
- Records are separated by occurrences of characters that match
- REGEXP. Leading and trailing matches of REGEXP delimit empty
- records. (This is a `gawk' extension; it is not specified by the
- POSIX standard.)
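You can check the single-character case directly. In this minimal sketch, the function name `split_on_semicolons' is hypothetical. Two adjacent semicolons delimit an empty record. The sample input deliberately has no trailing newline, because with `RS = ";"' a final newline would become part of the last record.

```shell
# Hypothetical helper: treat each ";" as a record separator and show
# every record between brackets, so empty records are visible.
split_on_semicolons() {
    awk 'BEGIN { RS = ";" }
         { printf("%d:[%s]\n", NR, $0) }'
}
```

For example, `printf 'a;;b' | split_on_semicolons' prints `1:[a]', `2:[]', and `3:[b]' on separate lines.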
-
- In all cases, `gawk' sets `RT' to the input text that matched the
-value specified by `RS'. But if the input file ended without any text
-that matches `RS', then `gawk' sets `RT' to the null string.
-
- ---------- Footnotes ----------
-
- (1) When `FS' is the null string (`""') or a regexp, this special
-feature of `RS' does not apply. It does apply to the default field
-separator of a single space: `FS = " "'.
-
-
-File: gawk.info, Node: Getline, Next: Read Timeout, Prev: Multiple Line, Up: Reading Files
-
-4.9 Explicit Input with `getline'
-=================================
-
-So far we have been getting our input data from `awk''s main input
-stream--either the standard input (usually your terminal, sometimes the
-output from another program) or from the files specified on the command
-line. The `awk' language has a special built-in command called
-`getline' that can be used to read input under your explicit control.
-
- The `getline' command is used in several different ways and should
-_not_ be used by beginners. The examples that follow the explanation
-of the `getline' command include material that has not been covered
-yet. Therefore, come back and study the `getline' command _after_ you
-have reviewed the rest of this Info file and have a good knowledge of
-how `awk' works.
-
- The `getline' command returns one if it finds a record and zero if
-it encounters the end of the file. If there is some error in getting a
-record, such as a file that cannot be opened, then `getline' returns
--1. In this case, `gawk' sets the variable `ERRNO' to a string
-describing the error that occurred.
-
- In the following examples, COMMAND stands for a string value that
-represents a shell command.
-
- NOTE: When `--sandbox' is specified (*note Options::), reading
- lines from files, pipes and coprocesses is disabled.
-
-* Menu:
-
-* Plain Getline:: Using `getline' with no arguments.
-* Getline/Variable:: Using `getline' into a variable.
-* Getline/File:: Using `getline' from a file.
-* Getline/Variable/File:: Using `getline' into a variable from a
- file.
-* Getline/Pipe:: Using `getline' from a pipe.
-* Getline/Variable/Pipe:: Using `getline' into a variable from a
- pipe.
-* Getline/Coprocess:: Using `getline' from a coprocess.
-* Getline/Variable/Coprocess:: Using `getline' into a variable from a
- coprocess.
-* Getline Notes:: Important things to know about `getline'.
-* Getline Summary:: Summary of `getline' Variants.
-
-
-File: gawk.info, Node: Plain Getline, Next: Getline/Variable, Up: Getline
-
-4.9.1 Using `getline' with No Arguments
----------------------------------------
-
-The `getline' command can be used without arguments to read input from
-the current input file. All it does in this case is read the next
-input record and split it up into fields. This is useful if you've
-finished processing the current record, but want to do some special
-processing on the next record _right now_. For example:
-
- {
-     if ((t = index($0, "/*")) != 0) {
-         # value of `tmp' will be "" if t is 1
-         tmp = substr($0, 1, t - 1)
-         u = index(substr($0, t + 2), "*/")
-         offset = t + 1   # offset + u is where "*/" starts in $0
-         while (u == 0) {
-             if ((getline) <= 0) {
-                 m = "unexpected EOF or error"
-                 m = (m ": " ERRNO)
-                 print m > "/dev/stderr"
-                 exit
-             }
-             u = index($0, "*/")
-             offset = 0
-         }
-         # substr() expression will be "" if */
-         # occurred at end of line
-         $0 = tmp substr($0, offset + u + 2)
-     }
-     print $0
- }
-
- This `awk' program deletes C-style comments (`/* ... */') from the
-input. By replacing the `print $0' with other statements, you could
-perform more complicated processing on the decommented input, such as
-searching for matches of a regular expression. (This program has a
-subtle problem--it does not work if one comment ends and another begins
-on the same line.)
-
- This form of the `getline' command sets `NF', `NR', `FNR', and the
-value of `$0'.
-
- NOTE: The new value of `$0' is used to test the patterns of any
- subsequent rules. The original value of `$0' that triggered the
- rule that executed `getline' is lost. By contrast, the `next'
- statement reads a new record but immediately begins processing it
- normally, starting with the first rule in the program. *Note Next
- Statement::.
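Here is a further illustration of plain `getline'. This minimal sketch is not from this manual, and the function name `join_continued' is hypothetical. It joins each line that ends in a backslash with the line that follows. Each `getline' replaces `$0' and advances `NR', so the main input loop never sees the lines that were joined.

```shell
# Hypothetical helper: join each line that ends in "\" with the next line.
join_continued() {
    awk '{
        line = $0
        while (line ~ /\\$/) {
            sub(/\\$/, "", line)        # drop the trailing backslash
            if ((getline) <= 0)         # read the next record into $0
                break                   # EOF or error: keep what we have
            line = line $0
        }
        print line
    }'
}
```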
-
-
-File: gawk.info, Node: Getline/Variable, Next: Getline/File, Prev: Plain Getline, Up: Getline
-
-4.9.2 Using `getline' into a Variable
--------------------------------------
-
-You can use `getline VAR' to read the next record from `awk''s input
-into the variable VAR. No other processing is done. For example,
-suppose the next line is a comment or a special string, and you want to
-read it without triggering any rules. This form of `getline' allows
-you to read that line and store it in a variable so that the main
-read-a-line-and-check-each-rule loop of `awk' never sees it. The
-following example swaps every two lines of input:
-
- {
-     if ((getline tmp) > 0) {
-         print tmp
-         print $0
-     } else
-         print $0
- }
-
-It takes the following list:
-
- wan
- tew
- free
- phore
-
-and produces these results:
-
- tew
- wan
- phore
- free
-
- The `getline' command used in this way sets only the variables `NR'
-and `FNR' (and of course, VAR). The record is not split into fields,
-so the values of the fields (including `$0') and the value of `NF' do
-not change.
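The following minimal sketch makes this visible. The function name `getline_var_demo' is hypothetical. After `getline nextline', `$0' and `NF' still describe the first record, while `NR' has advanced to 2.

```shell
# Hypothetical helper: show what getline VAR changes and what it leaves alone.
getline_var_demo() {
    awk 'NR == 1 {
        # $0 and NF will still describe record 1; NR will be 2
        if ((getline nextline) > 0)
            print $0 "|" nextline "|" NF "|" NR
    }'
}
```

For example, `printf 'a b\nc d e\n' | getline_var_demo' prints `a b|c d e|2|2'.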
-
-
-File: gawk.info, Node: Getline/File, Next: Getline/Variable/File, Prev: Getline/Variable, Up: Getline
-
-4.9.3 Using `getline' from a File
----------------------------------
-
-Use `getline < FILE' to read the next record from FILE. Here FILE is a
-string-valued expression that specifies the file name. `< FILE' is
-called a "redirection" because it directs input to come from a
-different place. For example, the following program reads its input
-record from the file `secondary.input' when it encounters a first field
-with a value equal to 10 in the current input file:
-
- {
-     if ($1 == 10) {
-         getline < "secondary.input"
-         print
-     } else
-         print
- }
-
- Because the main input stream is not used, the values of `NR' and
-`FNR' are not changed. However, the record it reads is split into
-fields in the normal manner, so the values of `$0' and the other fields
-are changed, resulting in a new value of `NF'.
-
- According to POSIX, `getline < EXPRESSION' is ambiguous if
-EXPRESSION contains unparenthesized operators other than `$'; for
-example, `getline < dir "/" file' is ambiguous because the
-concatenation operator is not parenthesized. You should write it as
-`getline < (dir "/" file)' if you want your program to be portable to
-all `awk' implementations.
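The following minimal sketch reads a whole file this way. The function name `count_fields_in' and its directory and file-name arguments are hypothetical. It parenthesizes the concatenated file name and loops only while `getline' returns a value greater than zero. Because `getline' returns -1 when the file cannot be opened, a bare `while (getline < file)' would loop forever on a missing file.

```shell
# Hypothetical helper: total the fields of DIR/FILE, or print 0 if unreadable.
count_fields_in() {
    awk -v dir="$1" -v file="$2" 'BEGIN {
        total = 0
        # Parenthesize the concatenation for portability; stop on 0 (EOF) or -1 (error).
        while ((getline < (dir "/" file)) > 0)
            total += NF                      # getline < FILE sets $0 and NF
        close(dir "/" file)
        print total
    }'
}
```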
-
-
-File: gawk.info, Node: Getline/Variable/File, Next: Getline/Pipe, Prev: Getline/File, Up: Getline
-
-4.9.4 Using `getline' into a Variable from a File
--------------------------------------------------
-
-Use `getline VAR < FILE' to read input from the file FILE, and put it
-in the variable VAR. As above, FILE is a string-valued expression that
-specifies the file from which to read.
-
- In this version of `getline', none of the built-in variables are
-changed and the record is not split into fields. The only variable
-changed is VAR.(1) For example, the following program copies all the
-input files to the output, except for records that say
-`@include FILENAME'. Such a record is replaced by the contents of the
-file FILENAME:
-
- {
-     if (NF == 2 && $1 == "@include") {
-         while ((getline line < $2) > 0)
-             print line
-         close($2)
-     } else
-         print
- }
-
- Note here how the name of the extra input file is not built into the
-program; it is taken directly from the data, specifically from the
-second field on the `@include' line.
-
- The `close()' function is called to ensure that if two identical
-`@include' lines appear in the input, the entire specified file is
-included twice. *Note Close Files And Pipes::.
-
- One deficiency of this program is that it does not process nested
-`@include' statements (i.e., `@include' statements in included files)
-the way a true macro preprocessor would. *Note Igawk Program::, for a
-program that does handle nested `@include' statements.
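You can see the effect of `close()' by reading the same file twice. In this minimal sketch, the function name `read_twice' is hypothetical. The second pass finds the lines again only because the first pass closed the file. Without the `close()' call, the second loop would hit end of file immediately and count zero lines.

```shell
# Hypothetical helper: count the lines of a file on two separate passes.
read_twice() {
    awk -v f="$1" 'BEGIN {
        for (pass = 1; pass <= 2; pass++) {
            n = 0
            while ((getline line < f) > 0)
                n++
            close(f)    # the next getline from f reopens it from the start
            print "pass " pass ": " n " lines"
        }
    }'
}
```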
-
- ---------- Footnotes ----------
-
- (1) This is not quite true. `RT' could be changed if `RS' is a
-regular expression.
-
-
-File: gawk.info, Node: Getline/Pipe, Next: Getline/Variable/Pipe, Prev: Getline/Variable/File, Up: Getline
-
-4.9.5 Using `getline' from a Pipe
----------------------------------
-
-The output of a command can also be piped into `getline', using
-`COMMAND | getline'. In this case, the string COMMAND is run as a
-shell command and its output is piped into `awk' to be used as input.
-This form of `getline' reads one record at a time from the pipe. For
-example, the following program copies its input to its output, except
-for lines that begin with `@execute', which are replaced by the output
-produced by running the rest of the line as a shell command:
-
- {
-     if ($1 == "@execute") {
-         tmp = substr($0, 10)        # Remove "@execute"
-         while ((tmp | getline) > 0)
-             print
-         close(tmp)
-     } else
-         print
- }
-
-The `close()' function is called to ensure that if two identical
-`@execute' lines appear in the input, the command is run for each one.
-*Note Close Files And Pipes::. Given the input:
-
- foo
- bar
- baz
- @execute who
- bletch
-
-the program might produce:
-
- foo
- bar
- baz
- arnold ttyv0 Jul 13 14:22
- miriam ttyp0 Jul 13 14:23 (murphy:0)
- bill ttyp1 Jul 13 14:23 (murphy:0)
- bletch
-
-Notice that this program ran the command `who' and printed the previous
-result. (If you try this program yourself, you will of course get
-different results, depending upon who is logged in on your system.)
-
- This variation of `getline' splits the record into fields, sets the
-value of `NF', and recomputes the value of `$0'. The values of `NR'
-and `FNR' are not changed.
-
- According to POSIX, `EXPRESSION | getline' is ambiguous if
-EXPRESSION contains unparenthesized operators other than `$'--for
-example, `"echo " "date" | getline' is ambiguous because the
-concatenation operator is not parenthesized. You should write it as
-`("echo " "date") | getline' if you want your program to be portable to
-all `awk' implementations.
-
- NOTE: Unfortunately, `gawk' has not been consistent in its
- treatment of a construct like `"echo " "date" | getline'. Most
- versions, including the current version, treat it as `("echo "
- "date") | getline'. (This is how Brian Kernighan's `awk' behaves.)
- Some versions changed and treated it as `"echo " ("date" |
- getline)'. (This is how `mawk' behaves.) In short, _always_ use
- explicit parentheses, and then you won't have to worry.
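Here is a minimal sketch of the portable form. The function name `word_count_via_echo' is hypothetical. The command string is parenthesized where it is built, and the same string is passed to `close()'. `NF' then reflects the line read from the pipe. The sketch assumes the argument is plain words. Passing untrusted text into a shell command this way is unsafe.

```shell
# Hypothetical helper: run "echo WORDS" and report NF of the line it prints.
word_count_via_echo() {
    awk -v words="$1" 'BEGIN {
        cmd = ("echo " words)         # explicit parentheses, as recommended
        if ((cmd | getline) > 0)      # sets $0 and NF from the pipe
            print NF
        close(cmd)
    }'
}
```

For example, `word_count_via_echo "alpha beta gamma"' prints `3'.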
-
-
-File: gawk.info, Node: Getline/Variable/Pipe, Next: Getline/Coprocess, Prev: Getline/Pipe, Up: Getline
-
-4.9.6 Using `getline' into a Variable from a Pipe
--------------------------------------------------
-
-When you use `COMMAND | getline VAR', the output of COMMAND is sent
-through a pipe to `getline' and into the variable VAR. For example, the
-following program reads the current date and time into the variable
-`current_time', using the `date' utility, and then prints it:
-
- BEGIN {
-     "date" | getline current_time
-     close("date")
-     print "Report printed on " current_time
- }
-
- In this version of `getline', none of the built-in variables are
-changed and the record is not split into fields.
-
- According to POSIX, `EXPRESSION | getline VAR' is ambiguous if
-EXPRESSION contains unparenthesized operators other than `$'; for
-example, `"echo " "date" | getline VAR' is ambiguous because the
-concatenation operator is not parenthesized. You should write it as
-`("echo " "date") | getline VAR' if you want your program to be
-portable to other `awk' implementations.
-
-
-File: gawk.info, Node: Getline/Coprocess, Next: Getline/Variable/Coprocess, Prev: Getline/Variable/Pipe, Up: Getline
-
-4.9.7 Using `getline' from a Coprocess
---------------------------------------
-
-Input into `getline' from a pipe is a one-way operation. The command
-that is started with `COMMAND | getline' only sends data _to_ your
-`awk' program.
-
- On occasion, you might want to send data to another program for
-processing and then read the results back. `gawk' allows you to start
-a "coprocess", with which two-way communications are possible. This is
-done with the `|&' operator. Typically, you write data to the
-coprocess first and then read results back, as shown in the following:
-
- print "SOME QUERY" |& "db_server"
- "db_server" |& getline
-
-which sends a query to `db_server' and then reads the results.
-
- The values of `NR' and `FNR' are not changed, because the main input
-stream is not used. However, the record is split into fields in the
-normal manner, thus changing the values of `$0', of the other fields,
-and of `NF'.
-
- Coprocesses are an advanced feature. They are discussed here only
-because this is the minor node on `getline'. *Note Two-way I/O::,
-where coprocesses are discussed in more detail.
-
-
-File: gawk.info, Node: Getline/Variable/Coprocess, Next: Getline Notes, Prev: Getline/Coprocess, Up: Getline
-
-4.9.8 Using `getline' into a Variable from a Coprocess
-------------------------------------------------------
-
-When you use `COMMAND |& getline VAR', the output from the coprocess
-COMMAND is sent through a two-way pipe to `getline' and into the
-variable VAR.
-
- In this version of `getline', none of the built-in variables are
-changed and the record is not split into fields. The only variable
-changed is VAR.
-
- Coprocesses are an advanced feature. They are discussed here only
-because this is the minor node on `getline'. *Note Two-way I/O::,
-where coprocesses are discussed in more detail.
-
-
-File: gawk.info, Node: Getline Notes, Next: Getline Summary, Prev: Getline/Variable/Coprocess, Up: Getline
-
-4.9.9 Points to Remember About `getline'
-----------------------------------------
-
-Here are some miscellaneous points about `getline' that you should bear
-in mind:
-
- * When `getline' changes the value of `$0' and `NF', `awk' does
- _not_ automatically jump to the start of the program and start
- testing the new record against every pattern. However, the new
- record is tested against any subsequent rules.
-
- * Many `awk' implementations limit the number of pipelines that an
- `awk' program may have open to just one. In `gawk', there is no
- such limit. You can open as many pipelines (and coprocesses) as
- the underlying operating system permits.
-
- * An interesting side effect occurs if you use `getline' without a
- redirection inside a `BEGIN' rule. Because an unredirected
- `getline' reads from the command-line data files, the first
- `getline' command causes `awk' to set the value of `FILENAME'.
- Normally, `FILENAME' does not have a value inside `BEGIN' rules,
- because you have not yet started to process the command-line data
- files. (d.c.) (*Note BEGIN/END::, also *note Auto-set::.)
-
- * Using `FILENAME' with `getline' (`getline < FILENAME') is likely
- to be a source for confusion. `awk' opens a separate input stream
- from the current input file. However, by not using a variable,
- `$0' and `NF' are still updated. If you're doing this, it's
- probably by accident, and you should reconsider what it is you're
- trying to accomplish.
-
- * *note Getline Summary::, presents a table summarizing the
- `getline' variants and which variables they can affect. It is
- worth noting that those variants which do not use redirection can
- cause `FILENAME' to be updated if they cause `awk' to start
- reading a new input file.
-
-
-File: gawk.info, Node: Getline Summary, Prev: Getline Notes, Up: Getline
-
-4.9.10 Summary of `getline' Variants
-------------------------------------
-
-*note table-getline-variants:: summarizes the eight variants of
-`getline', listing which built-in variables are set by each one, and
-whether the variant is standard or a `gawk' extension.
-
-Variant                  Effect                      Standard /
-                                                     Extension
--------------------------------------------------------------------------
-`getline'                Sets `$0', `NF', `FNR',     Standard
-                         and `NR'
-`getline' VAR            Sets VAR, `FNR', and `NR'   Standard
-`getline <' FILE         Sets `$0' and `NF'          Standard
-`getline VAR < FILE'     Sets VAR                    Standard
-COMMAND `| getline'      Sets `$0' and `NF'          Standard
-COMMAND `| getline' VAR  Sets VAR                    Standard
-COMMAND `|& getline'     Sets `$0' and `NF'          Extension
-COMMAND `|& getline' VAR Sets VAR                    Extension
-
-Table 4.1: getline Variants and What They Set
-
-
-File: gawk.info, Node: Read Timeout, Next: Command line directories, Prev: Getline, Up: Reading Files
-
-4.10 Reading Input With A Timeout
-=================================
-
-You may specify a timeout in milliseconds for reading input from a
-terminal, pipe, or two-way communication, including TCP/IP sockets. This
-can be done on a per-input, per-command, or per-connection basis by setting a
-special element in the `PROCINFO' array:
-
- PROCINFO["input_name", "READ_TIMEOUT"] = TIMEOUT IN MILLISECONDS
-
- When set, this will cause `gawk' to time out and return failure if
-no data is available to read within the specified timeout period. For
-example, a TCP client can decide to give up on receiving any response
-from the server after a certain amount of time:
-
- Service = "/inet/tcp/0/localhost/daytime"
- PROCINFO[Service, "READ_TIMEOUT"] = 100
- if ((Service |& getline) > 0)
-     print $0
- else if (ERRNO != "")
-     print ERRNO
-
- Here is how to read interactively from the terminal(1) without
-waiting for more than five seconds:
-
- PROCINFO["/dev/stdin", "READ_TIMEOUT"] = 5000
- while ((getline < "/dev/stdin") > 0)
-     print $0
-
- `gawk' will terminate the read operation if input does not arrive
-after waiting for the timeout period, return failure and set the
-`ERRNO' variable to an appropriate string value. A negative or zero
-value for the timeout is the same as specifying no timeout at all.
-
- A timeout can also be set for reading from the terminal in the
-implicit loop that reads input records and matches them against
-patterns, like so:
-
- $ gawk 'BEGIN { PROCINFO["-", "READ_TIMEOUT"] = 5000 }
- > { print "You entered: " $0 }'
- gawk
- -| You entered: gawk
-
- In this case, failure to respond within five seconds results in the
-following error message:
-
- error--> gawk: cmd. line:2: (FILENAME=- FNR=1) fatal: error reading input file `-': Connection timed out
-
- The timeout can be set or changed at any time, and will take effect
-on the next attempt to read from the input device. In the following
-example, we start with a timeout value of one second, and progressively
-reduce it by one-tenth of a second until we wait indefinitely for the
-input to arrive:
-
- PROCINFO[Service, "READ_TIMEOUT"] = 1000
- while ((Service |& getline) > 0) {
-     print $0
-     PROCINFO[Service, "READ_TIMEOUT"] -= 100
- }
-
- NOTE: You should not assume that the read operation will block
- exactly after the tenth record has been printed. It is possible
- that `gawk' will read and buffer more than one record's worth of
- data the first time. Because of this, changing the value of the
- timeout as in the above example is not very useful.
-
- If the `PROCINFO' element is not present and the environment
-variable `GAWK_READ_TIMEOUT' exists, `gawk' uses its value to
-initialize the timeout value. The exclusive use of the environment
-variable to specify timeout has the disadvantage of not being able to
-control it on a per-command or per-connection basis.
-
- `gawk' considers a timeout event to be an error even though the
-attempt to read from the underlying device may succeed in a later
-attempt. This is a limitation, and it also means that you cannot use
-this to multiplex input from two or more sources.
-
- Assigning a timeout value prevents read operations from blocking
-indefinitely. But bear in mind that there are other ways `gawk' can
-stall waiting for an input device to be ready. A network client can
-sometimes take a long time to establish a connection before it can
-start reading any data, or the attempt to open a FIFO special file for
-reading can block indefinitely until some other process opens it for
-writing.
-
- ---------- Footnotes ----------
-
- (1) This assumes that standard input is the keyboard.
-
-
-File: gawk.info, Node: Command line directories, Prev: Read Timeout, Up: Reading Files
-
-4.11 Directories On The Command Line
-====================================
-
-According to the POSIX standard, files named on the `awk' command line
-must be text files. It is a fatal error if they are not. Most
-versions of `awk' treat a directory on the command line as a fatal
-error.
-
- By default, `gawk' produces a warning for a directory on the command
-line, but otherwise ignores it. If either of the `--posix' or
-`--traditional' options is given, then `gawk' reverts to treating a
-directory on the command line as a fatal error.
-
-
-File: gawk.info, Node: Printing, Next: Expressions, Prev: Reading Files, Up: Top
-
-5 Printing Output
-*****************
-
-One of the most common programming actions is to "print", or output,
-some or all of the input. Use the `print' statement for simple output,
-and the `printf' statement for fancier formatting. The `print'
-statement is not limited when computing _which_ values to print.
-However, with two exceptions, you cannot specify _how_ to print
-them--how many columns, whether to use exponential notation or not, and
-so on. (For the exceptions, *note Output Separators::, and *note
-OFMT::.) For printing with specifications, you need the `printf'
-statement (*note Printf::).
-
- Besides basic and formatted printing, this major node also covers
-I/O redirections to files and pipes, introduces the special file names
-that `gawk' processes internally, and discusses the `close()' built-in
-function.
-
-* Menu:
-
-* Print:: The `print' statement.
-* Print Examples:: Simple examples of `print' statements.
-* Output Separators:: The output separators and how to change them.
-* OFMT:: Controlling Numeric Output With `print'.
-* Printf:: The `printf' statement.
-* Redirection:: How to redirect output to multiple files and
- pipes.
-* Special Files:: File name interpretation in `gawk'.
- `gawk' allows access to inherited file
- descriptors.
-* Close Files And Pipes:: Closing Input and Output Files and Pipes.
-
-
-File: gawk.info, Node: Print, Next: Print Examples, Up: Printing
-
-5.1 The `print' Statement
-=========================
-
-The `print' statement is used for producing output with simple,
-standardized formatting. Specify only the strings or numbers to print,
-in a list separated by commas. They are output, separated by single
-spaces, followed by a newline. The statement looks like this:
-
- print ITEM1, ITEM2, ...
-
-The entire list of items may be optionally enclosed in parentheses. The
-parentheses are necessary if any of the item expressions uses the `>'
-relational operator; otherwise it could be confused with an output
-redirection (*note Redirection::).
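A quick way to see the difference is to run both forms in a scratch directory. This is a minimal sketch. It assumes a POSIX `awk' (gawk or another implementation) on the `PATH' and a `mktemp' that supports `-d'. Only the unparenthesized form creates a file, and that file is named `2'.

```shell
#!/bin/sh
# Work in a throwaway directory so the redirection target is harmless.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Whole item list in parentheses: `>` is the comparison operator.
cmp=$(awk 'BEGIN { print (3 > 2, 1 > 2) }')

# No parentheses: `> 2` is an output redirection to a file named "2".
awk 'BEGIN { print 3 > 2 }'
redirected=$(cat 2)

printf 'comparison: %s\nfile 2 holds: %s\n' "$cmp" "$redirected"
```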
-
- The items to print can be constant strings or numbers, fields of the
-current record (such as `$1'), variables, or any `awk' expression.
-Numeric values are converted to strings and then printed.
-
- The simple statement `print' with no items is equivalent to `print
-$0': it prints the entire current record. To print a blank line, use
-`print ""', where `""' is the empty string. To print a fixed piece of
-text, use a string constant, such as `"Don't Panic"', as one item. If
-you forget to use the double-quote characters, your text is taken as an
-`awk' expression, and you will probably get an error. Keep in mind
-that a space is printed between any two items.
-
-
-File: gawk.info, Node: Print Examples, Next: Output Separators, Prev: Print, Up: Printing
-
-5.2 `print' Statement Examples
-==============================
-
-Each `print' statement makes at least one line of output. However, it
-isn't limited to only one line. If an item value is a string
-containing a newline, the newline is output along with the rest of the
-string. A single `print' statement can make any number of lines this
-way.
-
- The following is an example of printing a string that contains
-embedded newlines (the `\n' is an escape sequence, used to represent
-the newline character; *note Escape Sequences::):
-
- $ awk 'BEGIN { print "line one\nline two\nline three" }'
- -| line one
- -| line two
- -| line three
-
- The next example, which is run on the `inventory-shipped' file,
-prints the first two fields of each input record, with a space between
-them:
-
- $ awk '{ print $1, $2 }' inventory-shipped
- -| Jan 13
- -| Feb 15
- -| Mar 15
- ...
-
- A common mistake in using the `print' statement is to omit the comma
-between two items. This often has the effect of making the items run
-together in the output, with no space. The reason for this is that
-juxtaposing two string expressions in `awk' means to concatenate them.
-Here is the same program, without the comma:
-
- $ awk '{ print $1 $2 }' inventory-shipped
- -| Jan13
- -| Feb15
- -| Mar15
- ...
-
- To someone unfamiliar with the `inventory-shipped' file, neither
-example's output makes much sense. A heading line at the beginning
-would make it clearer. Let's add some headings to our table of months
-(`$1') and green crates shipped (`$2'). We do this using the `BEGIN'
-pattern (*note BEGIN/END::) so that the headings are only printed once:
-
- awk 'BEGIN { print "Month Crates"
- print "----- ------" }
- { print $1, $2 }' inventory-shipped
-
-When run, the program prints the following:
-
- Month Crates
- ----- ------
- Jan 13
- Feb 15
- Mar 15
- ...
-
-The only problem, however, is that the headings and the table data
-don't line up! We can fix this by printing some spaces between the two
-fields:
-
- awk 'BEGIN { print "Month Crates"
- print "----- ------" }
- { print $1, " ", $2 }' inventory-shipped
-
- Lining up columns this way can get pretty complicated when there are
-many columns to fix. Counting spaces for two or three columns is
-simple, but any more than this can take up a lot of time. This is why
-the `printf' statement was created (*note Printf::); one of its
-specialties is lining up columns of data.
-
- NOTE: You can continue either a `print' or `printf' statement
- simply by putting a newline after any comma (*note
- Statements/Lines::).
-
-
-File: gawk.info, Node: Output Separators, Next: OFMT, Prev: Print Examples, Up: Printing
-
-5.3 Output Separators
-=====================
-
-As mentioned previously, a `print' statement contains a list of items
-separated by commas. In the output, the items are normally separated
-by single spaces. However, this doesn't need to be the case; a single
-space is simply the default. Any string of characters may be used as
-the "output field separator" by setting the built-in variable `OFS'.
-The initial value of this variable is the string `" "'--that is, a
-single space.
-
- The output from an entire `print' statement is called an "output
-record". Each `print' statement outputs one output record, and then
-outputs a string called the "output record separator" (or `ORS'). The
-initial value of `ORS' is the string `"\n"'; i.e., a newline character.
-Thus, each `print' statement normally makes a separate line.
-
- In order to change how output fields and records are separated,
-assign new values to the variables `OFS' and `ORS'. The usual place to
-do this is in the `BEGIN' rule (*note BEGIN/END::), so that it happens
-before any input is processed. It can also be done with assignments on
-the command line, before the names of the input files, or using the
-`-v' command-line option (*note Options::). The following example
-prints the first and second fields of each input record, separated by a
-semicolon, with a blank line after each record:
-
- $ awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
- > { print $1, $2 }' BBS-list
- -| aardvark;555-5553
- -|
- -| alpo-net;555-3412
- -|
- -| barfly;555-7685
- ...
-
- If the value of `ORS' does not contain a newline, the program's
-output runs together on a single line.
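The following sketch sends the same `print' through each of the ways of setting `OFS' described above. It also shows an `ORS' that contains no newline. The two input lines are sample data borrowed from the `BBS-list' output shown earlier. The input file name is `-' (standard input), so the command-line assignment comes before the input it affects.

```shell
#!/bin/sh
# Sample input: two name/number pairs taken from the BBS-list output above.
data='aardvark 555-5553
alpo-net 555-3412'

# 1. Assign OFS in a BEGIN rule.
begin_out=$(printf '%s\n' "$data" | awk 'BEGIN { OFS = ";" } { print $1, $2 }')
# 2. Assign OFS with -v, before the program starts running.
v_out=$(printf '%s\n' "$data" | awk -v OFS=';' '{ print $1, $2 }')
# 3. Assign OFS on the command line, before the input file name ("-" = stdin).
arg_out=$(printf '%s\n' "$data" | awk '{ print $1, $2 }' OFS=';' -)
# An ORS without a newline makes the output records run together.
ors_out=$(printf '%s\n' "$data" | awk 'BEGIN { ORS = "|" } { print $1 }')

printf '%s\n' "$begin_out" "$v_out" "$arg_out" "$ors_out"
```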
-
-
-File: gawk.info, Node: OFMT, Next: Printf, Prev: Output Separators, Up: Printing
-
-5.4 Controlling Numeric Output with `print'
-===========================================
-
-When printing numeric values with the `print' statement, `awk'
-internally converts the number to a string of characters and prints
-that string. `awk' uses the `sprintf()' function to do this conversion
-(*note String Functions::). For now, it suffices to say that the
-`sprintf()' function accepts a "format specification" that tells it how
-to format numbers (or strings), and that there are a number of
-different ways in which numbers can be formatted. The different format
-specifications are discussed more fully in *note Control Letters::.
-
- The built-in variable `OFMT' contains the default format
-specification that `print' uses with `sprintf()' when it wants to
-convert a number to a string for printing. The default value of `OFMT'
-is `"%.6g"'. The way `print' prints numbers can be changed by
-supplying different format specifications as the value of `OFMT', as
-shown in the following example:
-
- $ awk 'BEGIN {
- > OFMT = "%.0f" # print numbers as integers (rounds)
- > print 17.23, 17.54 }'
- -| 17 18
-
-According to the POSIX standard, `awk''s behavior is undefined if
-`OFMT' contains anything but a floating-point conversion specification.
-(d.c.)
-
-
-File: gawk.info, Node: Printf, Next: Redirection, Prev: OFMT, Up: Printing
-
-5.5 Using `printf' Statements for Fancier Printing
-==================================================
-
-For more precise control over the output format than what is provided
-by `print', use `printf'. With `printf' you can specify the width to
-use for each item, as well as various formatting choices for numbers
-(such as what output base to use, whether to print an exponent, whether
-to print a sign, and how many digits to print after the decimal point).
-You do this by supplying a string, called the "format string", that
-controls how and where to print the other arguments.
-
-* Menu:
-
-* Basic Printf:: Syntax of the `printf' statement.
-* Control Letters:: Format-control letters.
-* Format Modifiers:: Format-specification modifiers.
-* Printf Examples:: Several examples.
-
-
-File: gawk.info, Node: Basic Printf, Next: Control Letters, Up: Printf
-
-5.5.1 Introduction to the `printf' Statement
---------------------------------------------
-
-A simple `printf' statement looks like this:
-
- printf FORMAT, ITEM1, ITEM2, ...
-
-The entire list of arguments may optionally be enclosed in parentheses.
-The parentheses are necessary if any of the item expressions use the `>'
-relational operator; otherwise, it can be confused with an output
-redirection (*note Redirection::).
-
- The difference between `printf' and `print' is the FORMAT argument.
-This is an expression whose value is taken as a string; it specifies
-how to output each of the other arguments. It is called the "format
-string".
-
- The format string is very similar to that in the ISO C library
-function `printf()'. Most of FORMAT is text to output verbatim.
-Scattered among this text are "format specifiers"--one per item. Each
-format specifier says to output the next item in the argument list at
-that place in the format.
-
- The `printf' statement does not automatically append a newline to
-its output. It outputs only what the format string specifies. So if a
-newline is needed, you must include one in the format string. The
-output separator variables `OFS' and `ORS' have no effect on `printf'
-statements. For example:
-
- $ awk 'BEGIN {
- > ORS = "\nOUCH!\n"; OFS = "+"
- > msg = "Dont Panic!"
- > printf "%s\n", msg
- > }'
- -| Dont Panic!
-
-Here, neither the `+' nor the `OUCH' appear in the output message.
-
-
-File: gawk.info, Node: Control Letters, Next: Format Modifiers, Prev: Basic Printf, Up: Printf
-
-5.5.2 Format-Control Letters
-----------------------------
-
-A format specifier starts with the character `%' and ends with a
-"format-control letter"--it tells the `printf' statement how to output
-one item. The format-control letter specifies what _kind_ of value to
-print. The rest of the format specifier is made up of optional
-"modifiers" that control _how_ to print the value, such as the field
-width. Here is a list of the format-control letters:
-
-`%c'
- Print a number as an ASCII character; thus, `printf "%c", 65'
- outputs the letter `A'. The output for a string value is the first
- character of the string.
-
- NOTE: The POSIX standard says the first character of a string
- is printed. In locales with multibyte characters, `gawk'
- attempts to convert the leading bytes of the string into a
- valid wide character and then to print the multibyte encoding
- of that character. Similarly, when printing a numeric value,
- `gawk' allows the value to be within the numeric range of
- values that can be held in a wide character.
-
- Other `awk' versions generally restrict themselves to printing
- the first byte of a string or to numeric values within the
- range of a single byte (0-255).
-
-`%d, %i'
- Print a decimal integer. The two control letters are equivalent.
- (The `%i' specification is for compatibility with ISO C.)
-
-`%e, %E'
- Print a number in scientific (exponential) notation; for example:
-
- printf "%4.3e\n", 1950
-
- prints `1.950e+03'. The precision `3' puts three digits after the
- decimal point, for four significant figures in all. The `4' is a
- minimum field width, which this nine-character result already
- exceeds. (These two modifiers are discussed in the next node.)
- `%E' uses `E' instead of `e' in the output.
-
-`%f'
- Print a number in floating-point notation. For example:
-
- printf "%4.3f", 1950
-
- prints `1950.000'. Here too, the `3' means three digits after the
- decimal point, and the `4' is a minimum field width, so the result
- is not padded. (These modifiers are discussed in the next node.)
-
- On systems supporting IEEE 754 floating point format, values
- representing negative infinity are formatted as `-inf' or
- `-infinity', and positive infinity as `inf' or `infinity'. The
- special "not a number" value formats as `-nan' or `nan'.
-
-`%F'
- Like `%f' but the infinity and "not a number" values are spelled
- using uppercase letters.
-
- The `%F' format is a POSIX extension to ISO C; not all systems
- support it. On those that don't, `gawk' uses `%f' instead.
-
-`%g, %G'
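- Print a number in either scientific notation or in floating-point
- notation. As in ISO C, scientific notation is used when the
- decimal exponent is less than -4 or is at least the precision (six
- by default). Trailing zeros are removed from the result. If the
- result is printed in scientific notation, `%G' uses `E' instead of
- `e'.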
-
-`%o'
- Print an unsigned octal integer (*note Nondecimal-numbers::).
-
-`%s'
- Print a string.
-
-`%u'
- Print an unsigned decimal integer. (This format is of marginal
- use, because all numbers in `awk' are floating-point; it is
- provided primarily for compatibility with C.)
-
-`%x, %X'
- Print an unsigned hexadecimal integer; `%X' uses the letters `A'
- through `F' instead of `a' through `f' (*note
- Nondecimal-numbers::).
-
-`%%'
- Print a single `%'. This does not consume an argument and it
- ignores any modifiers.
-
- NOTE: When using the integer format-control letters for values
- that are outside the range of the widest C integer type, `gawk'
- switches to the `%g' format specifier. If `--lint' is provided on
- the command line (*note Options::), `gawk' warns about this.
- Other versions of `awk' may print invalid values or do something
- else entirely. (d.c.)
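The sketch below prints several of the control letters described above, one per line, so their output can be compared side by side. It assumes a POSIX `awk' on the `PATH'. `%i', `%u', and `%F' are left out to keep the example portable.

```shell
#!/bin/sh
letters=$(awk 'BEGIN {
    printf "%c\n", 65                 # A
    printf "%c\n", "hello"            # h (first character of a string)
    printf "%d %d\n", 17.9, -3.2      # 17 -3 (integer conversion truncates)
    printf "%.3e\n", 1950             # 1.950e+03
    printf "%.3f\n", 1950             # 1950.000
    printf "%g %g\n", 1950, 1234567   # 1950 1.23457e+06
    printf "%o %x %X\n", 8, 255, 255  # 10 ff FF
    printf "%s %d%%\n", "done", 100   # done 100%
}')
printf '%s\n' "$letters"
```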
-
-
-File: gawk.info, Node: Format Modifiers, Next: Printf Examples, Prev: Control Letters, Up: Printf
-
-5.5.3 Modifiers for `printf' Formats
-------------------------------------
-
-A format specification can also include "modifiers" that can control
-how much of the item's value is printed, as well as how much space it
-gets. The modifiers come between the `%' and the format-control letter.
-We will use the bullet symbol "*" in the following examples to represent
-spaces in the output. Here are the possible modifiers, in the order in
-which they may appear:
-
-`N$'
- An integer constant followed by a `$' is a "positional specifier".
- Normally, format specifications are applied to arguments in the
- order given in the format string. With a positional specifier,
- the format specification is applied to a specific argument,
- instead of what would be the next argument in the list.
- Positional specifiers begin counting with one. Thus:
-
- printf "%s %s\n", "don't", "panic"
- printf "%2$s %1$s\n", "panic", "don't"
-
- prints the famous friendly message twice.
-
- At first glance, this feature doesn't seem to be of much use. It
- is in fact a `gawk' extension, intended for use in translating
- messages at runtime. *Note Printf Ordering::, which describes how
- and why to use positional specifiers. For now, we will not use
- them.
-
-`-'
- The minus sign, used before the width modifier (see later on in
- this list), says to left-justify the argument within its specified
- width. Normally, the argument is printed right-justified in the
- specified width. Thus:
-
- printf "%-4s", "foo"
-
- prints `foo*'.
-
-`SPACE'
- For numeric conversions, prefix positive values with a space and
- negative values with a minus sign.
-
-`+'
- The plus sign, used before the width modifier (see later on in
- this list), says to always supply a sign for numeric conversions,
- even if the data to format is positive. The `+' overrides the
- space modifier.
-
-`#'
- Use an "alternate form" for certain control letters. For `%o',
- supply a leading zero. For `%x' and `%X', supply a leading `0x'
- or `0X' for a nonzero result. For `%e', `%E', `%f', and `%F', the
- result always contains a decimal point. For `%g' and `%G',
- trailing zeros are not removed from the result.
-
-`0'
- A leading `0' (zero) acts as a flag that indicates that output
- should be padded with zeros instead of spaces. This applies only
- to the numeric output formats. This flag only has an effect when
- the field width is wider than the value to print.
-
-`''
- A single quote or apostrophe character is a POSIX extension to ISO
- C. It indicates that the integer part of a floating point value,
- or the entire part of an integer decimal value, should have a
- thousands-separator character in it. This only works in locales
- that support such characters. For example:
-
- $ cat thousands.awk          Show source program
- -| BEGIN { printf "%'d\n", 1234567 }
- $ LC_ALL=C gawk -f thousands.awk
- -| 1234567                   Results in "C" locale
- $ LC_ALL=en_US.UTF-8 gawk -f thousands.awk
- -| 1,234,567                 Results in US English UTF locale
-
- For more information about locales and internationalization issues,
- see *note Locales::.
-
- NOTE: The `'' flag is a nice feature, but its use complicates
- things: it becomes difficult to use it in command-line
- programs. For information on appropriate quoting tricks, see
- *note Quoting::.
-
-`WIDTH'
- This is a number specifying the desired minimum width of a field.
- Inserting any number between the `%' sign and the format-control
- character forces the field to expand to this width. The default
- way to do this is to pad with spaces on the left. For example:
-
- printf "%4s", "foo"
-
- prints `*foo'.
-
- The value of WIDTH is a minimum width, not a maximum. If the item
- value requires more than WIDTH characters, it can be as wide as
- necessary. Thus, the following:
-
- printf "%4s", "foobar"
-
- prints `foobar'.
-
- Preceding the WIDTH with a minus sign causes the output to be
- padded with spaces on the right, instead of on the left.
-
-`.PREC'
- A period followed by an integer constant specifies the precision
- to use when printing. The meaning of the precision varies by
- control letter:
-
- `%d', `%i', `%o', `%u', `%x', `%X'
- Minimum number of digits to print.
-
- `%e', `%E', `%f', `%F'
- Number of digits to the right of the decimal point.
-
- `%g', `%G'
- Maximum number of significant digits.
-
- `%s'
- Maximum number of characters from the string that should
- print.
-
- Thus, the following:
-
- printf "%.4s", "foobar"
-
- prints `foob'.
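The following sketch exercises most of the modifiers in the list above. Square brackets around each field make the padding visible, in place of the bullet notation used in the text. It assumes a POSIX `awk' on the `PATH'. Positional specifiers and the `'' flag are omitted because they are not portable.

```shell
#!/bin/sh
mods=$(awk 'BEGIN {
    printf "[%-4s] [%4s]\n", "foo", "foo"             # [foo ] [ foo]
    printf "[%4s] [%.4s]\n", "foobar", "foobar"       # [foobar] [foob]
    printf "[% d] [% d] [%+d]\n", 42, -42, 42         # [ 42] [-42] [+42]
    printf "[%05d] [%5d]\n", 42, 42                   # [00042] [   42]
    printf "[%#o] [%#x] [%#X]\n", 8, 255, 255         # [010] [0xff] [0XFF]
    printf "[%.3d] [%.2f] [%.3g]\n", 7, 3.14159, 3.14159  # [007] [3.14] [3.14]
}')
printf '%s\n' "$mods"
```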
-
- The C library `printf''s dynamic WIDTH and PREC capability (for
-example, `"%*.*s"') is supported. Instead of supplying explicit WIDTH
-and/or PREC values in the format string, they are passed in the
-argument list. For example:
-
- w = 5
- p = 3
- s = "abcdefg"
- printf "%*.*s\n", w, p, s
-
-is exactly equivalent to:
-
- s = "abcdefg"
- printf "%5.3s\n", s
-
-Both programs output `**abc'. Earlier versions of `awk' did not
-support this capability. If you must use such a version, you may
-simulate this feature by using concatenation to build up the format
-string, like so:
-
- w = 5
- p = 3
- s = "abcdefg"
- printf "%" w "." p "s\n", s
-
-This is not particularly easy to read but it does work.
-
- C programmers may be used to supplying additional `l', `L', and `h'
-modifiers in `printf' format strings. These are not valid in `awk'.
-Most `awk' implementations silently ignore them. If `--lint' is
-provided on the command line (*note Options::), `gawk' warns about
-their use. If `--posix' is supplied, their use is a fatal error.
-
-
-File: gawk.info, Node: Printf Examples, Prev: Format Modifiers, Up: Printf
-
-5.5.4 Examples Using `printf'
------------------------------
-
-The following simple example shows how to use `printf' to make an
-aligned table:
-
- awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
-
-This command prints the names of the bulletin boards (`$1') in the file
-`BBS-list' as a string of 10 characters that are left-justified. It
-also prints the phone numbers (`$2') next on the line. This produces
-an aligned two-column table of names and phone numbers, as shown here:
-
- $ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
- -| aardvark   555-5553
- -| alpo-net   555-3412
- -| barfly     555-7685
- -| bites      555-1675
- -| camelot    555-0542
- -| core       555-2912
- -| fooey      555-1234
- -| foot       555-6699
- -| macfoo     555-6480
- -| sdace      555-3430
- -| sabafoo    555-2127
-
- In this case, the phone numbers had to be printed as strings because
-the numbers are separated by a dash. Printing the phone numbers as
-numbers would have produced just the first three digits: `555'. This
-would have been pretty confusing.
-
- It wasn't necessary to specify a width for the phone numbers because
-they are last on their lines. They don't need to have spaces after
-them.
-
- The table could be made to look even nicer by adding headings to the
-tops of the columns. This is done using the `BEGIN' pattern (*note
-BEGIN/END::) so that the headers are only printed once, at the
-beginning of the `awk' program:
-
- awk 'BEGIN { print "Name       Number"
- print "----       ------" }
- { printf "%-10s %s\n", $1, $2 }' BBS-list
-
- The above example mixes `print' and `printf' statements in the same
-program. Using just `printf' statements can produce the same results:
-
- awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
- printf "%-10s %s\n", "----", "------" }
- { printf "%-10s %s\n", $1, $2 }' BBS-list
-
-Printing each column heading with the same format specification used
-for the column elements ensures that the headings are aligned just like
-the columns.
-
- The fact that the same format specification is used three times can
-be emphasized by storing it in a variable, like this:
-
- awk 'BEGIN { format = "%-10s %s\n"
- printf format, "Name", "Number"
- printf format, "----", "------" }
- { printf format, $1, $2 }' BBS-list
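`BBS-list' is not reproduced here, so the following sketch feeds three of its name/number pairs to the last program on standard input. The input lines are sample data only. In the output, the headings should line up with the columns.

```shell
#!/bin/sh
# Three sample rows standing in for BBS-list; only $1 and $2 are printed.
table=$(printf '%s\n' 'aardvark 555-5553' 'alpo-net 555-3412' 'barfly 555-7685' |
    awk 'BEGIN { format = "%-10s %s\n"
                 printf format, "Name", "Number"
                 printf format, "----", "------" }
               { printf format, $1, $2 }')
printf '%s\n' "$table"
```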
-
- At this point, it would be a worthwhile exercise to use the `printf'
-statement to line up the headings and table data for the
-`inventory-shipped' example that was covered earlier in the minor node
-of `print' statement examples (*note Print Examples::).
-
-
-File: gawk.info, Node: Redirection, Next: Special Files, Prev: Printf, Up: Printing
-
-5.6 Redirecting Output of `print' and `printf'
-==============================================
-
-So far, the output from `print' and `printf' has gone to the standard
-output, usually the screen. Both `print' and `printf' can also send
-their output to other places. This is called "redirection".
-
- NOTE: When `--sandbox' is specified (*note Options::), redirecting
- output to files and pipes is disabled.
-
- A redirection appears after the `print' or `printf' statement.
-Redirections in `awk' are written just like redirections in shell
-commands, except that they are written inside the `awk' program.
-
- There are four forms of output redirection: output to a file, output
-appended to a file, output through a pipe to another command, and output
-to a coprocess. They are all shown for the `print' statement, but they
-work identically for `printf':
-
-`print ITEMS > OUTPUT-FILE'
- This redirection prints the items into the output file named
- OUTPUT-FILE. The file name OUTPUT-FILE can be any expression.
- Its value is changed to a string and then used as a file name
- (*note Expressions::).
-
- When this type of redirection is used, the OUTPUT-FILE is erased
- before the first output is written to it. Subsequent writes to
- the same OUTPUT-FILE do not erase OUTPUT-FILE, but append to it.
- (This is different from how you use redirections in shell scripts.)
- If OUTPUT-FILE does not exist, it is created. For example, here
- is how an `awk' program can write a list of BBS names to one file
- named `name-list', and a list of phone numbers to another file
- named `phone-list':
-
- $ awk '{ print $2 > "phone-list"
- > print $1 > "name-list" }' BBS-list
- $ cat phone-list
- -| 555-5553
- -| 555-3412
- ...
- $ cat name-list
- -| aardvark
- -| alpo-net
- ...
-
- Each output file contains one name or number per line.
-
-`print ITEMS >> OUTPUT-FILE'
- This redirection prints the items into the output file named
- OUTPUT-FILE. The difference between this and the single-`>'
- redirection is that the old contents (if any) of OUTPUT-FILE are
- not erased. Instead, the `awk' output is appended to the file.
- If OUTPUT-FILE does not exist, then it is created.
-
-`print ITEMS | COMMAND'
- It is possible to send output to another program through a pipe
- instead of into a file. This redirection opens a pipe to
- COMMAND, and writes the values of ITEMS through this pipe to
- another process created to execute COMMAND.
-
- The redirection argument COMMAND is actually an `awk' expression.
- Its value is converted to a string whose contents give the shell
- command to be run. For example, the following produces two files,
- one unsorted list of BBS names, and one list sorted in reverse
- alphabetical order:
-
- awk '{ print $1 > "names.unsorted"
- command = "sort -r > names.sorted"
- print $1 | command }' BBS-list
-
- The unsorted list is written with an ordinary redirection, while
- the sorted list is written by piping through the `sort' utility.
-
- The next example uses redirection to mail a message to the mailing
- list `bug-system'. This might be useful when trouble is
- encountered in an `awk' script run periodically for system
- maintenance:
-
- report = "mail bug-system"
- print "Awk script failed:", $0 | report
- m = ("at record number " FNR " of " FILENAME)
- print m | report
- close(report)
-
- The message is built using string concatenation and saved in the
- variable `m'. It's then sent down the pipeline to the `mail'
- program. (The parentheses group the items to concatenate--see
- *note Concatenation::.)
-
- The `close()' function is called here because it's a good idea to
- close the pipe as soon as all the intended output has been sent to
- it. *Note Close Files And Pipes::, for more information.
-
- This example also illustrates the use of a variable to represent a
- FILE or COMMAND--it is not necessary to always use a string
- constant. Using a variable is generally a good idea, because (if
- you mean to refer to that same file or command) `awk' requires
- that the string value be spelled identically every time.
-
-`print ITEMS |& COMMAND'
- This redirection prints the items to the input of COMMAND. The
- difference between this and the single-`|' redirection is that the
- output from COMMAND can be read with `getline'. Thus COMMAND is a
- "coprocess", which works together with, but subsidiary to, the
- `awk' program.
-
- This feature is a `gawk' extension, and is not available in POSIX
- `awk'. *Note Getline/Coprocess::, for a brief discussion. *Note
- Two-way I/O::, for a more complete discussion.
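The following sketch is a self-contained variant of the `sort -r' example above. It writes the sorted names to standard output instead of to `names.sorted'. The sample input is borrowed from the `BBS-list' output shown earlier. The command string is kept in a variable, so `close()' receives exactly the string that opened the pipe.

```shell
#!/bin/sh
sorted=$(printf '%s\n' 'aardvark 555-5553' 'barfly 555-7685' 'alpo-net 555-3412' |
    awk '{ command = "sort -r"
           print $1 | command }        # every line goes to the same sort process
         END { close(command) }')      # flush the pipe and wait for sort
printf '%s\n' "$sorted"
```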
-
- Redirecting output using `>', `>>', `|', or `|&' asks the system to
-open a file, pipe, or coprocess only if the particular FILE or COMMAND
-you specify has not already been written to by your program or if it
-has been closed since it was last written to.
-
- It is a common error to use `>' redirection for the first `print' to
-a file, and then to use `>>' for subsequent output:
-
- # clear the file
- print "Don't panic" > "guide.txt"
- ...
- # append
- print "Avoid improbability generators" >> "guide.txt"
-
-This is indeed how redirections must be used from the shell. But in
-`awk', it isn't necessary. In this kind of case, a program should use
-`>' for all the `print' statements, since the output file is only
-opened once. (It happens that if you mix `>' and `>>' that output is
-produced in the expected order. However, mixing the operators for the
-same file is definitely poor style, and is confusing to readers of your
-program.)
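The following sketch checks this behavior in a temporary directory. A file's old contents are replaced at the first `>', and the second `>' adds to the same open file. A later run that uses `>>' then appends without truncating. The file name and messages are made up for illustration, and `mktemp -d' is assumed to be available.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
guide="$dir/guide.txt"
printf 'old contents\n' > "$guide"     # pre-existing data

# The first > truncates guide.txt; the second > reuses the already-open file.
awk -v f="$guide" 'BEGIN {
    print "Dont panic" > f
    print "Avoid improbability generators" > f
    close(f)
}'
after_gt=$(cat "$guide")

# >> never truncates, so a new run appends to what is already there.
awk -v f="$guide" 'BEGIN { print "So long" >> f }'
after_append=$(cat "$guide")

printf '%s\n---\n%s\n' "$after_gt" "$after_append"
```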
-
- Many older `awk' implementations limit the number of pipelines that
-an `awk' program may have open to just one! In `gawk', there is no
-such limit. `gawk' allows a program to open as many pipelines as the
-underlying operating system permits.
-
-Advanced Notes: Piping into `sh'
---------------------------------
-
-A particularly powerful way to use redirection is to build command lines
-and pipe them into the shell, `sh'. For example, suppose you have a
-list of files brought over from a system where all the file names are
-stored in uppercase, and you wish to rename them to have names in all
-lowercase. The following program is both simple and efficient:
-
- { printf("mv %s %s\n", $0, tolower($0)) | "sh" }
-
- END { close("sh") }
-
- The `tolower()' function returns its argument string with all
-uppercase characters converted to lowercase (*note String Functions::).
-The program builds up a list of command lines, using the `mv' utility
-to rename the files. It then sends the list to the shell for execution.
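The sketch below runs that program in a scratch directory containing two made-up uppercase file names, then lists the result. It assumes a POSIX `awk', `sh', and `mktemp -d'. The generated `mv' commands break on file names that contain spaces or shell metacharacters. Treat the sketch as an illustration, not a general-purpose renamer.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch README.TXT MAKEFILE              # hypothetical uppercase file names

# Build one mv command per name and pipe the commands into a single sh.
ls | awk '{ printf("mv %s %s\n", $0, tolower($0)) | "sh" }
          END { close("sh") }'

renamed=$(ls)
printf '%s\n' "$renamed"
```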
-
-
-File: gawk.info, Node: Special Files, Next: Close Files And Pipes, Prev: Redirection, Up: Printing
-
-5.7 Special File Names in `gawk'
-================================
-
-`gawk' provides a number of special file names that it interprets
-internally. These file names provide access to standard file
-descriptors and TCP/IP networking.
-
-* Menu:
-
-* Special FD:: Special files for I/O.
-* Special Network:: Special files for network communications.
-* Special Caveats:: Things to watch out for.
-
-
-File: gawk.info, Node: Special FD, Next: Special Network, Up: Special Files
-
-5.7.1 Special Files for Standard Descriptors
---------------------------------------------
-
-Running programs conventionally have three input and output streams
-already available to them for reading and writing. These are known as
-the "standard input", "standard output", and "standard error output".
-These streams are, by default, connected to your keyboard and screen,
-but they are often redirected with the shell, via the `<', `<<', `>',
-`>>', `>&', and `|' operators. Standard error is typically used for
-writing error messages; the reason there are two separate streams,
-standard output and standard error, is so that they can be redirected
-separately.
-
- In other implementations of `awk', the only way to write an error
-message to standard error in an `awk' program is as follows:
-
- print "Serious error detected!" | "cat 1>&2"
-
-This works by opening a pipeline to a shell command that can access the
-standard error stream that it inherits from the `awk' process. This is
-far from elegant, and it is also inefficient, because it requires a
-separate process. So people writing `awk' programs often don't do
-this. Instead, they send the error messages to the screen, like this:
-
- print "Serious error detected!" > "/dev/tty"
-
-(`/dev/tty' is a special file supplied by the operating system that is
-connected to your keyboard and screen. It represents the "terminal,"(1)
-which on modern systems is a keyboard and screen, not a serial console.)
-This usually has the same effect but not always: although the standard
-error stream is usually the screen, it can be redirected; when that
-happens, writing to the screen is not correct. In fact, if `awk' is
-run from a background job, it may not have a terminal at all. Then
-opening `/dev/tty' fails.
-
- `gawk' provides special file names for accessing the three standard
-streams (c.e.). It also provides syntax for accessing any other
-inherited open files. If the file name matches one of these special
-names when `gawk' redirects input or output, then it directly uses the
-stream that the file name stands for. These special file names work
-for all operating systems that `gawk' has been ported to, not just
-those that are POSIX-compliant:
-
-`/dev/stdin'
- The standard input (file descriptor 0).
-
-`/dev/stdout'
- The standard output (file descriptor 1).
-
-`/dev/stderr'
- The standard error output (file descriptor 2).
-
-`/dev/fd/N'
- The file associated with file descriptor N. Such a file must be
- opened by the program initiating the `awk' execution (typically
- the shell). Unless special pains are taken in the shell from which
- `gawk' is invoked, only descriptors 0, 1, and 2 are available.
-
- The file names `/dev/stdin', `/dev/stdout', and `/dev/stderr' are
-aliases for `/dev/fd/0', `/dev/fd/1', and `/dev/fd/2', respectively.
-However, they are more self-explanatory. The proper way to write an
-error message in a `gawk' program is to use `/dev/stderr', like this:
-
- print "Serious error detected!" > "/dev/stderr"
-
- Note the use of quotes around the file name. Like any other
-redirection, the value must be a string. It is a common error to omit
-the quotes, which leads to confusing results.
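To confirm which stream each message goes to, the following sketch runs the same program twice. The first run keeps only standard output, and the second keeps only standard error. It assumes an `awk' that recognizes `/dev/stderr' (gawk does, as do several other versions), or a system where that file exists.

```shell
#!/bin/sh
prog='BEGIN {
    print "normal output"
    print "Serious error detected!" > "/dev/stderr"
}'

out=$(awk "$prog" 2>/dev/null)         # discard stderr, keep stdout
err=$(awk "$prog" 2>&1 >/dev/null)     # route stderr to the capture, discard stdout

printf 'stdout: %s\nstderr: %s\n' "$out" "$err"
```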
-
- Finally, using the `close()' function on a file name of the form
-`"/dev/fd/N"', for file descriptor numbers above two, will actually
-close the given file descriptor.
-
- The `/dev/stdin', `/dev/stdout', and `/dev/stderr' special files are
-also recognized internally by several other versions of `awk'.
-
- ---------- Footnotes ----------
-
- (1) The "tty" in `/dev/tty' stands for "Teletype," a serial terminal.
-
-
-File: gawk.info, Node: Special Network, Next: Special Caveats, Prev: Special FD, Up: Special Files
-
-5.7.2 Special Files for Network Communications
-----------------------------------------------
-
-`gawk' programs can open a two-way TCP/IP connection, acting as either
-a client or a server. This is done using a special file name of the
-form:
-
- `/NET-TYPE/PROTOCOL/LOCAL-PORT/REMOTE-HOST/REMOTE-PORT'
-
- The NET-TYPE is one of `inet', `inet4' or `inet6'. The PROTOCOL is
-one of `tcp' or `udp', and the other fields represent the other
-essential pieces of information for making a networking connection.
-These file names are used with the `|&' operator for communicating with
-a coprocess (*note Two-way I/O::). This is an advanced feature,
-mentioned here only for completeness. Full discussion is delayed until
-*note TCP/IP Networking::.
-
-
-File: gawk.info, Node: Special Caveats, Prev: Special Network, Up: Special Files
-
-5.7.3 Special File Name Caveats
--------------------------------
-
-Here is a list of things to bear in mind when using the special file
-names that `gawk' provides:
-
- * Recognition of these special file names is disabled if `gawk' is in
- compatibility mode (*note Options::).
-
- * `gawk' _always_ interprets these special file names. For example,
- using `/dev/fd/4' for output actually writes on file descriptor 4,
- and not on a new file descriptor that is `dup()''ed from file
- descriptor 4. Most of the time this does not matter; however, it
- is important to _not_ close any of the files related to file
- descriptors 0, 1, and 2. Doing so results in unpredictable
- behavior.
-
-
-File: gawk.info, Node: Close Files And Pipes, Prev: Special Files, Up: Printing
-
-5.8 Closing Input and Output Redirections
-=========================================
-
-If the same file name or the same shell command is used with `getline'
-more than once during the execution of an `awk' program (*note
-Getline::), the file is opened (or the command is executed) the first
-time only. At that time, the first record of input is read from that
-file or command. The next time the same file or command is used with
-`getline', another record is read from it, and so on.
-
- Similarly, when a file or pipe is opened for output, `awk' remembers
-the file name or command associated with it, and subsequent writes to
-the same file or command are appended to the previous writes. The file
-or pipe stays open until `awk' exits.
-
- This implies that special steps are necessary in order to read the
-same file again from the beginning, or to rerun a shell command (rather
-than reading more output from the same command). The `close()' function
-makes these things possible:
-
- close(FILENAME)
-
-or:
-
- close(COMMAND)
-
- The argument FILENAME or COMMAND can be any expression. Its value
-must _exactly_ match the string that was used to open the file or start
-the command (spaces and other "irrelevant" characters included). For
-example, if you open a pipe with this:
-
- "sort -r names" | getline foo
-
-then you must close it with this:
-
- close("sort -r names")
-
- Once this function call is executed, the next `getline' from that
-file or command, or the next `print' or `printf' to that file or
-command, reopens the file or reruns the command. Because the
-expression that you use to close a file or pipeline must exactly match
-the expression used to open the file or run the command, it is good
-practice to use a variable to store the file name or command. The
-previous example becomes the following:
-
- sortcom = "sort -r names"
- sortcom | getline foo
- ...
- close(sortcom)
-
-This helps avoid hard-to-find typographical errors in your `awk'
-programs. Here are some of the reasons for closing an output file:
-
- * To write a file and read it back later on in the same `awk'
- program. Close the file after writing it, then begin reading it
- with `getline'.
-
- * To write numerous files, successively, in the same `awk' program.
- If the files aren't closed, eventually `awk' may exceed a system
- limit on the number of open files in one process. It is best to
- close each one when the program has finished writing it.
-
- * To make a command finish. When output is redirected through a
- pipe, the command reading the pipe normally continues to try to
- read input as long as the pipe is open. Often this means the
- command cannot really do its work until the pipe is closed. For
- example, if output is redirected to the `mail' program, the
- message is not actually sent until the pipe is closed.
-
- * To run the same program a second time, with the same arguments.
- This is not the same thing as giving more input to the first run!
-
- For example, suppose a program pipes output to the `mail' program.
- If it outputs several lines redirected to this pipe without closing
- it, they make a single message of several lines. By contrast, if
- the program closes the pipe after each line of output, then each
- line makes a separate message.
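-
- The first reason in this list can be sketched as follows: write a
-file, close it, and then read it back with `getline'. The scratch file
-name `/tmp/close-demo.txt' is a hypothetical choice for this example,
-and the file is left in place afterwards:
-
-     BEGIN {
-         tmp = "/tmp/close-demo.txt"   # hypothetical scratch file
-         for (i = 1; i <= 3; i++)
-             print "line " i > tmp
-         close(tmp)    # flush and close before reading it back
-         n = 0
-         while ((getline line < tmp) > 0)
-             n++
-         close(tmp)    # release the input file as well
-         print n " lines read back"
-     }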
-
- If you use more files than the system allows you to have open,
-`gawk' attempts to multiplex the available open files among your data
-files. `gawk''s ability to do this depends upon the facilities of your
-operating system, so it may not always work. It is therefore both good
-practice and good portability advice to always use `close()' on your
-files when you are done with them. In fact, if you are using a lot of
-pipes, it is essential that you close commands when done. For example,
-consider something like this:
-
- {
- ...
- command = ("grep " $1 " /some/file | my_prog -q " $3)
- while ((command | getline) > 0) {
- PROCESS OUTPUT OF command
- }
- # need close(command) here
- }
-
- This example creates a new pipeline based on data in _each_ record.
-Without the call to `close()' indicated in the comment, `awk' creates
-child processes to run the commands, until it eventually runs out of
-file descriptors for more pipelines.
-
- Even though each command has finished (as indicated by the
-end-of-file return status from `getline'), the child process is not
-terminated;(1) more importantly, the file descriptor for the pipe is
-not closed and released until `close()' is called or `awk' exits.
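-
- The fix is to call `close()' once each pipeline's output has been
-consumed. Here is a minimal runnable sketch in which `echo' stands in
-for the `grep ... | my_prog' pipeline of the earlier example:
-
-     BEGIN {
-         for (i = 1; i <= 3; i++) {
-             command = ("echo item " i)  # a different command each time
-             while ((command | getline result) > 0)
-                 count++
-             close(command)   # reap the child and free its descriptor
-         }
-         print count " lines read"
-     }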
-
- `close()' will silently do nothing if given an argument that does
-not represent a file, pipe or coprocess that was opened with a
-redirection.
-
- Note also that `close(FILENAME)' has no "magic" effects on the
-implicit loop that reads through the files named on the command line.
-More likely, it is a close of a file that was never opened, so `awk'
-silently does nothing.
-
- When using the `|&' operator to communicate with a coprocess, it is
-occasionally useful to be able to close one end of the two-way pipe
-without closing the other. This is done by supplying a second argument
-to `close()'. As in any other call to `close()', the first argument is
-the name of the command or special file used to start the coprocess.
-The second argument should be a string, with either of the values
-`"to"' or `"from"'. Case does not matter. As this is an advanced
-feature, a more complete discussion is delayed until *note Two-way
-I/O::, which discusses it in more detail and gives an example.
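-
- As a preview, here is a minimal sketch of the idea; it assumes a
-`sort' utility on the search path. Closing the `"to"' end sends
-end-of-file to `sort', which must see all of its input before it can
-produce any output:
-
-     BEGIN {
-         cmd = "sort"
-         print "pear"   |& cmd
-         print "apple"  |& cmd
-         print "banana" |& cmd
-         close(cmd, "to")   # sort now sees end-of-file on its input
-         while ((cmd |& getline line) > 0)
-             sorted = sorted line " "
-         close(cmd)         # close the remaining "from" end
-         print sorted
-     }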
-
-Advanced Notes: Using `close()''s Return Value
-----------------------------------------------
-
-In many versions of Unix `awk', the `close()' function is actually a
-statement. It is a syntax error to try to use the return value from
-`close()': (d.c.)
-
- command = "..."
- command | getline info
- retval = close(command) # syntax error in many Unix awks
-
- `gawk' treats `close()' as a function. The return value is -1 if
-the argument names something that was never opened with a redirection,
-or if there is a system problem closing the file or process. In these
-cases, `gawk' sets the built-in variable `ERRNO' to a string describing
-the problem.
-
- In `gawk', when closing a pipe or coprocess (input or output), the
-return value is the exit status of the command.(2) Otherwise, it is the
-return value from the system's `close()' or `fclose()' C functions when
-closing input or output files, respectively. This value is zero if the
-close succeeds, or -1 if it fails.
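-
- For example, the following sketch checks the return value when
-closing something that was never opened with a redirection; the name
-`never-opened' is arbitrary:
-
-     BEGIN {
-         r = close("never-opened")   # nothing was opened by this name
-         if (r == -1)
-             print "close failed: " ERRNO
-     }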
-
- The POSIX standard is very vague; it says that `close()' returns
-zero on success and nonzero otherwise. In general, different
-implementations vary in what they report when closing pipes; thus the
-return value cannot be used portably. (d.c.) In POSIX mode (*note
-Options::), `gawk' just returns zero when closing a pipe.
-
- ---------- Footnotes ----------
-
- (1) The technical terminology is rather morbid. The finished child
-is called a "zombie," and cleaning up after it is referred to as
-"reaping."
-
- (2) This is a full 16-bit value as returned by the `wait()' system
-call. See the system manual pages for information on how to decode this
-value.
-
-
-File: gawk.info, Node: Expressions, Next: Patterns and Actions, Prev: Printing, Up: Top
-
-6 Expressions
-*************
-
-Expressions are the basic building blocks of `awk' patterns and
-actions. An expression evaluates to a value that you can print, test,
-or pass to a function. Additionally, an expression can assign a new
-value to a variable or a field by using an assignment operator.
-
- An expression can serve as a pattern or action statement on its own.
-Most other kinds of statements contain one or more expressions that
-specify the data on which to operate. As in other languages,
-expressions in `awk' include variables, array references, constants,
-and function calls, as well as combinations of these with various
-operators.
-
-* Menu:
-
-* Values:: Constants, Variables, and Regular Expressions.
-* All Operators:: `gawk''s operators.
-* Truth Values and Conditions:: Testing for true and false.
-* Function Calls:: A function call is an expression.
-* Precedence:: How various operators nest.
-* Locales:: How the locale affects things.
-
-
-File: gawk.info, Node: Values, Next: All Operators, Up: Expressions
-
-6.1 Constants, Variables and Conversions
-========================================
-
-Expressions are built up from values and the operations performed upon
-them. This minor node describes the elementary objects which provide
-the values used in expressions.
-
-* Menu:
-
-* Constants:: String, numeric and regexp constants.
-* Using Constant Regexps:: When and how to use a regexp constant.
-* Variables:: Variables give names to values for later use.
-* Conversion:: The conversion of strings to numbers and vice
- versa.
-
-
-File: gawk.info, Node: Constants, Next: Using Constant Regexps, Up: Values
-
-6.1.1 Constant Expressions
---------------------------
-
-The simplest type of expression is the "constant", which always has the
-same value. There are three types of constants: numeric, string, and
-regular expression.
-
- Each is used in the appropriate context when you need a data value
-that isn't going to change. Numeric constants can have different
-forms, but are stored identically internally.
-
-* Menu:
-
-* Scalar Constants:: Numeric and string constants.
-* Nondecimal-numbers:: What are octal and hex numbers.
-* Regexp Constants:: Regular Expression constants.
-
-
-File: gawk.info, Node: Scalar Constants, Next: Nondecimal-numbers, Up: Constants
-
-6.1.1.1 Numeric and String Constants
-....................................
-
-A "numeric constant" stands for a number. This number can be an
-integer, a decimal fraction, or a number in scientific (exponential)
-notation.(1) Here are some examples of numeric constants that all have
-the same value:
-
- 105
- 1.05e+2
- 1050e-1
-
- A string constant consists of a sequence of characters enclosed in
-double-quotation marks. For example:
-
- "parrot"
-
-represents the string whose contents are `parrot'. Strings in `gawk'
-can be of any length, and they can contain any of the possible
-eight-bit characters, including ASCII NUL (character code zero).
-Other `awk' implementations may have difficulty with some character
-codes.
-
- ---------- Footnotes ----------
-
- (1) The internal representation of all numbers, including integers,
-uses double precision floating-point numbers. On most modern systems,
-these are in IEEE 754 standard format.
-
-
-File: gawk.info, Node: Nondecimal-numbers, Next: Regexp Constants, Prev: Scalar Constants, Up: Constants
-
-6.1.1.2 Octal and Hexadecimal Numbers
-.....................................
-
-In `awk', all numbers are in decimal; i.e., base 10. Many other
-programming languages allow you to specify numbers in other bases, often
-octal (base 8) and hexadecimal (base 16). In octal, the numbers go 0,
-1, 2, 3, 4, 5, 6, 7, 10, 11, 12, etc. Just as `11', in decimal, is 1
-times 10 plus 1, so `11', in octal, is 1 times 8, plus 1. This equals 9
-in decimal. In hexadecimal, there are 16 digits. Since the everyday
-decimal number system only has ten digits (`0'-`9'), the letters `a'
-through `f' are used to represent the rest. (Case in the letters is
-usually irrelevant; hexadecimal `a' and `A' have the same value.)
-Thus, `11', in hexadecimal, is 1 times 16 plus 1, which equals 17 in
-decimal.
-
- Just by looking at plain `11', you can't tell what base it's in.
-So, in C, C++, and other languages derived from C, there is a special
-notation to signify the base. Octal numbers start with a leading `0',
-and hexadecimal numbers start with a leading `0x' or `0X':
-
-`11'
- Decimal value 11.
-
-`011'
- Octal 11, decimal value 9.
-
-`0x11'
- Hexadecimal 11, decimal value 17.
-
- This example shows the difference:
-
- $ gawk 'BEGIN { printf "%d, %d, %d\n", 011, 11, 0x11 }'
- -| 9, 11, 17
-
- Being able to use octal and hexadecimal constants in your programs
-is most useful when working with data that cannot be represented
-conveniently as characters or as regular numbers, such as binary data
-of various sorts.
-
- `gawk' allows the use of octal and hexadecimal constants in your
-program text. However, such numbers in the input data are not treated
-differently; doing so by default would break old programs. (If you
-really need to do this, use the `--non-decimal-data' command-line
-option; *note Nondecimal Data::.) If you have octal or hexadecimal
-data, you can use the `strtonum()' function (*note String Functions::)
-to convert the data into a number. Most of the time, you will want to
-use octal or hexadecimal constants when working with the built-in bit
-manipulation functions; see *note Bitwise Functions::, for more
-information.
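-
- For instance, this minimal sketch uses `strtonum()' to convert
-strings written in hexadecimal, octal, and decimal notation:
-
-     BEGIN {
-         hex = strtonum("0x11")   # 17
-         oct = strtonum("011")    # 9
-         dec = strtonum("11")     # 11
-         printf "%d %d %d\n", hex, oct, dec
-     }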
-
- Unlike some early C implementations, `8' and `9' are not valid in
-octal constants; e.g., `gawk' treats `018' as decimal 18:
-
- $ gawk 'BEGIN { print "021 is", 021 ; print 018 }'
- -| 021 is 17
- -| 18
-
- Octal and hexadecimal source code constants are a `gawk' extension.
-If `gawk' is in compatibility mode (*note Options::), they are not
-available.
-
-Advanced Notes: A Constant's Base Does Not Affect Its Value
------------------------------------------------------------
-
-Once a numeric constant has been converted internally into a number,
-`gawk' no longer remembers what the original form of the constant was;
-the internal value is always used. This has particular consequences
-for conversion of numbers to strings:
-
- $ gawk 'BEGIN { printf "0x11 is <%s>\n", 0x11 }'
- -| 0x11 is <17>
-
-
-File: gawk.info, Node: Regexp Constants, Prev: Nondecimal-numbers, Up: Constants
-
-6.1.1.3 Regular Expression Constants
-....................................
-
-A regexp constant is a regular expression description enclosed in
-slashes, such as `/^beginning and end$/'. Most regexps used in `awk'
-programs are constant, but the `~' and `!~' matching operators can also
-match computed or dynamic regexps (which are just ordinary strings or
-variables that contain a regexp).
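-
- The two forms can perform the same test. In this sketch, `pat' is an
-ordinary variable (an arbitrary name) holding a dynamic regexp:
-
-     BEGIN {
-         pat = "^[0-9]+$"             # dynamic regexp in a string
-         r1 = ("12345" ~ pat)         # 1: all digits
-         r2 = ("12a45" ~ pat)         # 0: contains a letter
-         r3 = ("12345" ~ /^[0-9]+$/)  # 1: same test, regexp constant
-         print r1, r2, r3
-     }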
-
-
-File: gawk.info, Node: Using Constant Regexps, Next: Variables, Prev: Constants, Up: Values
-
-6.1.2 Using Regular Expression Constants
-----------------------------------------
-
-When used on the righthand side of the `~' or `!~' operators, a regexp
-constant merely stands for the regexp that is to be matched. However,
-regexp constants (such as `/foo/') may be used like simple expressions.
-When a regexp constant appears by itself, it has the same meaning as if
-it appeared in a pattern, i.e., `($0 ~ /foo/)'. (d.c.) *Note Expression
-Patterns::. This means that the following two code segments:
-
- if ($0 ~ /barfly/ || $0 ~ /camelot/)
- print "found"
-
-and:
-
- if (/barfly/ || /camelot/)
- print "found"
-
-are exactly equivalent. One rather bizarre consequence of this rule is
-that the following Boolean expression is valid, but does not do what
-the user probably intended:
-
- # Note that /foo/ is on the left of the ~
- if (/foo/ ~ $1) print "found foo"
-
-This code is "obviously" testing `$1' for a match against the regexp
-`/foo/'. But in fact, the expression `/foo/ ~ $1' really means `($0 ~
-/foo/) ~ $1'. In other words, first match the input record against the
-regexp `/foo/'. The result is either zero or one, depending upon the
-success or failure of the match. That result is then matched against
-the first field in the record. Because it is unlikely that you would
-ever really want to make this kind of test, `gawk' issues a warning
-when it sees this construct in a program. Another consequence of this
-rule is that the assignment statement:
-
- matches = /foo/
-
-assigns either zero or one to the variable `matches', depending upon
-the contents of the current input record.
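-
- The following sketch makes the effect visible by assigning `$0'
-directly in a `BEGIN' rule:
-
-     BEGIN {
-         $0 = "food fight"
-         matches = /foo/   # same as ($0 ~ /foo/): 1
-         $0 = "bar"
-         nomatch = /foo/   # 0, because "bar" does not match
-         print matches, nomatch
-     }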
-
- Constant regular expressions are also used as the first argument for
-the `gensub()', `sub()', and `gsub()' functions, as the second argument
-of the `match()' function, and as the third argument of the
-`patsplit()' function (*note String Functions::). Modern
-implementations of `awk', including `gawk', allow the third argument of
-`split()' to be a regexp constant, but some older implementations do
-not. (d.c.) This can lead to confusion when attempting to use regexp
-constants as arguments to user-defined functions (*note User-defined::).
-For example:
-
- function mysub(pat, repl, str, global)
- {
- if (global)
- gsub(pat, repl, str)
- else
- sub(pat, repl, str)
- return str
- }
-
- {
- ...
- text = "hi! hi yourself!"
- text = mysub(/hi/, "howdy", text, 1)
- ...
- }
-
- In this example, the programmer wants to pass a regexp constant to
-the user-defined function `mysub', which in turn passes it on to either
-`sub()' or `gsub()'. However, what really happens is that the `pat'
-parameter is either one or zero, depending upon whether or not `$0'
-matches `/hi/'. `gawk' issues a warning when it sees a regexp constant
-used as a parameter to a user-defined function, since passing a truth
-value in this way is probably not what was intended.
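-
- One way to get the intended behavior is to pass the regexp as a
-string, which `sub()' and `gsub()' then treat as a dynamic regexp.
-Because the scalar parameter `str' is a local copy, the result must
-also be assigned back:
-
-     function mysub(pat, repl, str, global)
-     {
-         if (global)
-             gsub(pat, repl, str)
-         else
-             sub(pat, repl, str)
-         return str
-     }
-
-     BEGIN {
-         text = "hi! hi yourself!"
-         text = mysub("hi", "howdy", text, 1)  # "hi" is a string
-         print text    # howdy! howdy yourself!
-     }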
-
-
-File: gawk.info, Node: Variables, Next: Conversion, Prev: Using Constant Regexps, Up: Values
-
-6.1.3 Variables
----------------
-
-Variables are ways of storing values at one point in your program for
-use later in another part of your program. They can be manipulated
-entirely within the program text, and they can also be assigned values
-on the `awk' command line.
-
-* Menu:
-
-* Using Variables:: Using variables in your programs.
-* Assignment Options:: Setting variables on the command-line and a
- summary of command-line syntax. This is an
- advanced method of input.
-
-
-File: gawk.info, Node: Using Variables, Next: Assignment Options, Up: Variables
-
-6.1.3.1 Using Variables in a Program
-....................................
-
-Variables let you give names to values and refer to them later.
-Variables have already been used in many of the examples. The name of
-a variable must be a sequence of letters, digits, or underscores, and
-it may not begin with a digit. Case is significant in variable names;
-`a' and `A' are distinct variables.
-
- A variable name is a valid expression by itself; it represents the
-variable's current value. Variables are given new values with
-"assignment operators", "increment operators", and "decrement
-operators". *Note Assignment Ops::. In addition, the `sub()' and
-`gsub()' functions can change a variable's value, and the `match()',
-`patsplit()' and `split()' functions can change the contents of their
-array parameters. *Note String Functions::.
-
- A few variables have special built-in meanings, such as `FS' (the
-field separator), and `NF' (the number of fields in the current input
-record). *Note Built-in Variables::, for a list of the built-in
-variables. These built-in variables can be used and assigned just like
-all other variables, but their values are also used or changed
-automatically by `awk'. All built-in variables' names are entirely
-uppercase.
-
- Variables in `awk' can be assigned either numeric or string values.
-The kind of value a variable holds can change over the life of a
-program. By default, variables are initialized to the empty string,
-which is zero if converted to a number. There is no need to explicitly
-"initialize" a variable in `awk', which is what you would do in C and
-in most other traditional languages.
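-
- This minimal sketch shows both properties; `never_set' is a
-hypothetical variable that the program never assigns:
-
-     BEGIN {
-         is_empty = (never_set == "")  # 1: string value is ""
-         is_zero  = (never_set == 0)   # 1: numeric value is zero
-         x = 42                        # x holds a number...
-         x = "forty-two"               # ...and now a string
-         print is_empty, is_zero, x
-     }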
-
-
-File: gawk.info, Node: Assignment Options, Prev: Using Variables, Up: Variables
-
-6.1.3.2 Assigning Variables on the Command Line
-...............................................
-
-Any `awk' variable can be set by including a "variable assignment"
-among the arguments on the command line when `awk' is invoked (*note
-Other Arguments::). Such an assignment has the following form:
-
- VARIABLE=TEXT
-
-With it, a variable is set either at the beginning of the `awk' run or
-in between input files. When the assignment is preceded with the `-v'
-option, as in the following:
-
- -v VARIABLE=TEXT
-
-the variable is set at the very beginning, even before the `BEGIN'
-rules execute. The `-v' option and its assignment must precede all the
-file name arguments, as well as the program text. (*Note Options::,
-for more information about the `-v' option.) Otherwise, the variable
-assignment is performed at a time determined by its position among the
-input file arguments--after the processing of the preceding input file
-argument. For example:
-
- awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list
-
-prints the value of field number `n' for all input records. Before the
-first file is read, the command line sets the variable `n' equal to
-four. This causes the fourth field to be printed in lines from
-`inventory-shipped'. After the first file has finished, but before the
-second file is started, `n' is set to two, so that the second field is
-printed in lines from `BBS-list':
-
- $ awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list
- -| 15
- -| 24
- ...
- -| 555-5553
- -| 555-3412
- ...
-
- Command-line arguments are made available for explicit examination by
-the `awk' program in the `ARGV' array (*note ARGC and ARGV::). `awk'
-processes the values of command-line assignments for escape sequences
-(*note Escape Sequences::). (d.c.)
-
-
-File: gawk.info, Node: Conversion, Prev: Variables, Up: Values
-
-6.1.4 Conversion of Strings and Numbers
----------------------------------------
-
-Strings are converted to numbers and numbers are converted to strings,
-if the context of the `awk' program demands it. For example, if the
-value of either `foo' or `bar' in the expression `foo + bar' happens to
-be a string, it is converted to a number before the addition is
-performed. If numeric values appear in string concatenation, they are
-converted to strings. Consider the following:
-
- two = 2; three = 3
- print (two three) + 4
-
-This prints the (numeric) value 27. The numeric values of the
-variables `two' and `three' are converted to strings and concatenated
-together. The resulting string is converted back to the number 23, to
-which 4 is then added.
-
- If, for some reason, you need to force a number to be converted to a
-string, concatenate that number with the empty string, `""'. To force
-a string to be converted to a number, add zero to that string. A
-string is converted to a number by interpreting any numeric prefix of
-the string as numerals: `"2.5"' converts to 2.5, `"1e3"' converts to
-1000, and `"25fix"' has a numeric value of 25. Strings that can't be
-interpreted as valid numbers convert to zero.
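-
- These rules can be checked directly; in this sketch each result is
-stored in a variable (the names are arbitrary) and printed:
-
-     BEGIN {
-         a = "2.5" + 0     # 2.5
-         b = "1e3" + 0     # 1000
-         c = "25fix" + 0   # 25: only the numeric prefix counts
-         d = "fix25" + 0   # 0: no numeric prefix
-         s = 42 ""         # the two-character string "42"
-         print a, b, c, d, length(s)
-     }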
-
- The exact manner in which numbers are converted into strings is
-controlled by the `awk' built-in variable `CONVFMT' (*note Built-in
-Variables::). Numbers are converted using the `sprintf()' function
-with `CONVFMT' as the format specifier (*note String Functions::).
-
- `CONVFMT''s default value is `"%.6g"', which prints a value with at
-most six significant digits. For some applications, you might want to
-change it to specify more precision. On most modern machines, 17
-digits is usually enough to capture a floating-point number's value
-exactly.(1)
-
- Strange results can occur if you set `CONVFMT' to a string that
-doesn't tell `sprintf()' how to format floating-point numbers in a
-useful way. For example, if you forget the `%' in the format, `awk'
-converts all numbers to the same constant string.
-
- As a special case, if a number is an integer, then the result of
-converting it to a string is _always_ an integer, no matter what the
-value of `CONVFMT' may be. Given the following code fragment:
-
- CONVFMT = "%2.2f"
- a = 12
- b = a ""
-
-`b' has the value `"12"', not `"12.00"'. (d.c.)
-
- Prior to the POSIX standard, `awk' used the value of `OFMT' for
-converting numbers to strings. `OFMT' specifies the output format to
-use when printing numbers with `print'. `CONVFMT' was introduced in
-order to separate the semantics of conversion from the semantics of
-printing. Both `CONVFMT' and `OFMT' have the same default value:
-`"%.6g"'. In the vast majority of cases, old `awk' programs do not
-change their behavior. However, these semantics for `OFMT' are
-something to keep in mind if you must port your new-style program to
-older implementations of `awk'. We recommend that, instead of changing
-your programs, you just port `gawk' itself. *Note Print::, for more
-information on the `print' statement.
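-
- The separation between the two variables shows up in a sketch like
-the following, where the formats are chosen only for illustration:
-
-     BEGIN {
-         CONVFMT = "%.2f"
-         OFMT = "%.4f"
-         x = 3.14159265
-         s = x ""      # conversion uses CONVFMT: "3.14"
-         print s
-         print x       # print uses OFMT: 3.1416
-         t = 12 ""     # integer values ignore CONVFMT: "12"
-         print t
-     }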
-
- And, once again, where you are can matter when it comes to converting
-between numbers and strings. In *note Locales::, we mentioned that the
-local character set and language (the locale) can affect how `gawk'
-matches characters. The locale also affects numeric formats. In
-particular, for `awk' programs, it affects the decimal point character.
-The `"C"' locale, and most English-language locales, use the period
-character (`.') as the decimal point. However, many (if not most)
-European and non-English locales use the comma (`,') as the decimal
-point character.
-
- The POSIX standard says that `awk' always uses the period as the
-decimal point when reading the `awk' program source code, and for
-command-line variable assignments (*note Other Arguments::). However,
-when interpreting input data, for `print' and `printf' output, and for
-number to string conversion, the local decimal point character is used.
-Here are some examples indicating the difference in behavior, on a
-GNU/Linux system, with POSIX behavior forced by setting the
-`POSIXLY_CORRECT' environment variable:
-
- $ export POSIXLY_CORRECT=1
- $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
- -| 3.14159
- $ LC_ALL=en_DK gawk 'BEGIN { printf "%g\n", 3.1415927 }'
- -| 3,14159
- $ echo 4,321 | gawk '{ print $1 + 1 }'
- -| 5
- $ echo 4,321 | LC_ALL=en_DK gawk '{ print $1 + 1 }'
- -| 5,321
-
-The `en_DK' locale is for English in Denmark, where the comma acts as
-the decimal point separator. In the normal `"C"' locale, `gawk' treats
-`4,321' as `4', while in the Danish locale, it's treated as the full
-number, 4.321.
-
- Some earlier versions of `gawk' fully complied with this aspect of
-the standard. However, many users in non-English locales complained
-about this behavior, since their data used a period as the decimal
-point, so the default behavior was restored to use a period as the
-decimal point character. You can use the `--use-lc-numeric' option
-(*note Options::) to force `gawk' to use the locale's decimal point
-character. (`gawk' also uses the locale's decimal point character when
-in POSIX mode, either via `--posix', or the `POSIXLY_CORRECT'
-environment variable.)
-
- *note table-locale-affects:: describes the cases in which the
-locale's decimal point character is used and when a period is used.
-Some of these features have not been described yet.
-
-Feature        Default        `--posix' or `--use-lc-numeric'
--------------------------------------------------------------
-`%'g'          Use locale     Use locale
-`%g'           Use period     Use locale
-Input          Use period     Use locale
-`strtonum()'   Use period     Use locale
-
-Table 6.1: Locale Decimal Point versus A Period
-
- Finally, modern-day formal standards and IEEE standard floating point
-representation can have an unusual but important effect on the way
-`gawk' converts some special string values to numbers. The details are
-presented in *note POSIX Floating Point Problems::.
-
- ---------- Footnotes ----------
-
- (1) Pathological cases can require up to 752 digits (!), but we
-doubt that you need to worry about this.
-
-
-File: gawk.info, Node: All Operators, Next: Truth Values and Conditions, Prev: Values, Up: Expressions
-
-6.2 Operators: Doing Something With Values
-==========================================
-
-This minor node introduces the "operators" which make use of the values
-provided by constants and variables.
-
-* Menu:
-
-* Arithmetic Ops:: Arithmetic operations (`+', `-',
- etc.)
-* Concatenation:: Concatenating strings.
-* Assignment Ops:: Changing the value of a variable or a field.
-* Increment Ops:: Incrementing the numeric value of a variable.
-
-
-File: gawk.info, Node: Arithmetic Ops, Next: Concatenation, Up: All Operators
-
-6.2.1 Arithmetic Operators
---------------------------
-
-The `awk' language uses the common arithmetic operators when evaluating
-expressions. All of these arithmetic operators follow normal
-precedence rules and work as you would expect them to.
-
- The following example uses a file named `grades', which contains a
-list of student names as well as three test scores per student (it's a
-small class):
-
- Pat 100 97 58
- Sandy 84 72 93
- Chris 72 92 89
-
-This program takes the file `grades' and prints the average of the
-scores:
-
- $ awk '{ sum = $2 + $3 + $4 ; avg = sum / 3
- > print $1, avg }' grades
- -| Pat 85
- -| Sandy 83
- -| Chris 84.3333
-
- The following list provides the arithmetic operators in `awk', in
-order from the highest precedence to the lowest:
-
-`- X'
- Negation.
-
-`+ X'
- Unary plus; the expression is converted to a number.
-
-`X ^ Y'
-`X ** Y'
- Exponentiation; X raised to the Y power. `2 ^ 3' has the value
- eight; the character sequence `**' is equivalent to `^'. (c.e.)
-
-`X * Y'
- Multiplication.
-
-`X / Y'
- Division; because all numbers in `awk' are floating-point
- numbers, the result is _not_ rounded to an integer--`3 / 4' has
- the value 0.75. (It is a common mistake, especially for C
- programmers, to forget that _all_ numbers in `awk' are
- floating-point, and that division of integer-looking constants
- produces a real number, not an integer.)
-
-`X % Y'
- Remainder; further discussion is provided in the text, just after
- this list.
-
-`X + Y'
- Addition.
-
-`X - Y'
- Subtraction.
-
- Unary plus and minus have the same precedence, the multiplication
-operators all have the same precedence, and addition and subtraction
-have the same precedence.
-
- When computing the remainder of `X % Y', the quotient is rounded
-toward zero to an integer and multiplied by Y. This result is
-subtracted from X; this operation is sometimes known as "trunc-mod."
-The following relation always holds:
-
- Y * int(X / Y) + (X % Y) == X
-
- One possibly undesirable effect of this definition of remainder is
-that `X % Y' is negative if X is negative. Thus:
-
- -17 % 8 = -1
-
- In other `awk' implementations, the signedness of the remainder may
-be machine-dependent.
-
- NOTE: The POSIX standard only specifies the use of `^' for
- exponentiation. For maximum portability, do not use the `**'
- operator.
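-
- The following sketch checks the trunc-mod relation and the
-floating-point division described above, using the values from the
-`-17 % 8' example:
-
-     BEGIN {
-         a = -17; b = 8
-         r = a % b          # -1: the sign follows the dividend
-         q = int(a / b)     # -2: int() truncates toward zero
-         print r, q, (b * q + r == a)   # -1 -2 1
-         print 3 / 4, int(3 / 4)        # 0.75 0
-     }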
-
-
-File: gawk.info, Node: Concatenation, Next: Assignment Ops, Prev: Arithmetic Ops, Up: All Operators
-
-6.2.2 String Concatenation
---------------------------
-
- It seemed like a good idea at the time.
- Brian Kernighan
-
- There is only one string operation: concatenation. It does not have
-a specific operator to represent it. Instead, concatenation is
-performed by writing expressions next to one another, with no operator.
-For example:
-
- $ awk '{ print "Field number one: " $1 }' BBS-list
- -| Field number one: aardvark
- -| Field number one: alpo-net
- ...
-
- Without the space in the string constant after the `:', the line
-runs together. For example:
-
- $ awk '{ print "Field number one:" $1 }' BBS-list
- -| Field number one:aardvark
- -| Field number one:alpo-net
- ...
-
- Because string concatenation does not have an explicit operator, it
-is often necessary to ensure that it happens at the right time by using
-parentheses to enclose the items to concatenate. For example, you
-might expect that the following code fragment concatenates `file' and
-`name':
-
- file = "file"
- name = "name"
- print "something meaningful" > file name
-
-This produces a syntax error with some versions of Unix `awk'.(1) It is
-necessary to use the following:
-
- print "something meaningful" > (file name)
-
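You can try the parenthesized redirection without cluttering the current directory. This sketch runs in a scratch directory made with `mktemp -d', which is assumed to be available (it is on most Unix-like systems). It creates a file named `filename' and then displays its contents:

```shell
# Make a throwaway directory so the output file lands somewhere harmless.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
awk 'BEGIN {
    file = "file"
    name = "name"
    # the parentheses make the whole concatenation the redirection target
    print "something meaningful" > (file name)
    close(file name)
}'
cat filename    # something meaningful
```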
- Parentheses should be used around concatenation in all but the most
-common contexts, such as on the righthand side of `='. Be careful
-about the kinds of expressions used in string concatenation. In
-particular, the order of evaluation of expressions used for
-concatenation is undefined in the `awk' language. Consider this
-example:
-
- BEGIN {
- a = "don't"
- print (a " " (a = "panic"))
- }
-
-It is not defined whether the assignment to `a' happens before or after
-the value of `a' is retrieved for producing the concatenated value.
-The result could be either `don't panic', or `panic panic'.
-
- The precedence of concatenation, when mixed with other operators, is
-often counter-intuitive. Consider this example:
-
- $ awk 'BEGIN { print -12 " " -24 }'
- -| -12-24
-
- This "obviously" is concatenating -12, a space, and -24. But where
-did the space disappear to? The answer lies in the combination of
-operator precedences and `awk''s automatic conversion rules. To get
-the desired result, write the program this way:
-
- $ awk 'BEGIN { print -12 " " (-24) }'
- -| -12 -24
-
- This forces `awk' to treat the `-' on the `-24' as unary.
-Otherwise, it's parsed as follows:
-
- -12 (`" "' - 24)
- => -12 (0 - 24)
- => -12 (-24)
- => -12-24
-
- As mentioned earlier, when doing concatenation, _parenthesize_.
-Otherwise, you're never quite sure what you'll get.
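Here is a small sketch, assuming any POSIX `awk', of how concatenation interacts with other operators. It shows three things:

* Multiplication binds more tightly than concatenation.
* A parenthesized operand keeps its minus sign unary.
* Concatenating numbers yields a string that converts back to a number.

```shell
awk 'BEGIN {
    print 2 " " 3 * 4      # 2 12: * is applied before concatenation
    print -12 " " (-24)    # -12 -24: parentheses keep the minus unary
    n = (1 2) + 1          # "1" and "2" concatenate to "12"
    print n                # 13
}'
```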
-
- ---------- Footnotes ----------
-
- (1) It happens that Brian Kernighan's `awk', `gawk' and `mawk' all
-"get it right," but you should not rely on this.
-
-
-File: gawk.info, Node: Assignment Ops, Next: Increment Ops, Prev: Concatenation, Up: All Operators
-
-6.2.3 Assignment Expressions
-----------------------------
-
-An "assignment" is an expression that stores a (usually different)
-value into a variable. For example, let's assign the value one to the
-variable `z':
-
- z = 1
-
- After this expression is executed, the variable `z' has the value
-one. Whatever old value `z' had before the assignment is forgotten.
-
- Assignments can also store string values. For example, the
-following stores the value `"this food is good"' in the variable
-`message':
-
- thing = "food"
- predicate = "good"
- message = "this " thing " is " predicate
-
-This also illustrates string concatenation. The `=' sign is called an
-"assignment operator". It is the simplest assignment operator because
-the value of the righthand operand is stored unchanged. Most operators
-(addition, concatenation, and so on) have no effect except to compute a
-value. If the value isn't used, there's no reason to use the operator.
-An assignment operator is different; it does produce a value, but even
-if you ignore it, the assignment still makes itself felt through the
-alteration of the variable. We call this a "side effect".
-
- The lefthand operand of an assignment need not be a variable (*note
-Variables::); it can also be a field (*note Changing Fields::) or an
-array element (*note Arrays::). These are all called "lvalues", which
-means they can appear on the lefthand side of an assignment operator.
-The righthand operand may be any expression; it produces the new value
-that the assignment stores in the specified variable, field, or array
-element. (Such values are called "rvalues".)
-
- It is important to note that variables do _not_ have permanent types.
-A variable's type is simply the type of whatever value it happens to
-hold at the moment. In the following program fragment, the variable
-`foo' has a numeric value at first, and a string value later on:
-
- foo = 1
- print foo
- foo = "bar"
- print foo
-
-When the second assignment gives `foo' a string value, the fact that it
-previously had a numeric value is forgotten.
-
- String values whose leading characters (after any leading blanks) do
-not form a number have a numeric value of zero. After executing the
-following code, the value of `foo' is five:
-
- foo = "a string"
- foo = foo + 5
-
- NOTE: Using a variable as a number and then later as a string can
- be confusing and is poor programming style. The previous two
- examples illustrate how `awk' works, _not_ how you should write
- your programs!
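This sketch checks the numeric value of a few strings. It is meant only for observing the conversion, not as a model of good style. The last two lines assume that the conversion uses the longest leading numeric prefix after any leading blanks, as the common `awk' implementations do:

```shell
# Observe string-to-number conversion (not a style to imitate).
awk 'BEGIN {
    foo = "a string"
    foo = foo + 5
    print foo              # 5: "a string" converts to zero
    print " +3" + 0        # 3: leading blanks and a sign are accepted
    print "3abc" + 0       # 3: only the leading numeric part is used
}'
```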
-
- An assignment is an expression, so it has a value--the same value
-that is assigned. Thus, `z = 1' is an expression with the value one.
-One consequence of this is that you can write multiple assignments
-together, such as:
-
- x = y = z = 5
-
-This example stores the value five in all three variables (`x', `y',
-and `z'). It does so because the value of `z = 5', which is five, is
-stored into `y' and then the value of `y = z = 5', which is five, is
-stored into `x'.
-
- Assignments may be used anywhere an expression is called for. For
-example, it is valid to write `x != (y = 1)' to set `y' to one, and
-then test whether `x' is not equal to one. But this style tends to make
-programs hard to read; such nesting of assignments should be avoided,
-except perhaps in a one-shot program.
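Here is a sketch of both forms, assuming any POSIX `awk'. In it, `x != (y = 1)' finds that `x' is one, so the `else' branch runs:

```shell
awk 'BEGIN {
    x = y = z = 5          # groups as x = (y = (z = 5))
    print x, y, z          # 5 5 5
    x = 1
    if (x != (y = 1))      # stores 1 in y, then compares x with that value
        print "x is not one"
    else
        print "x is one; y is now", y    # this branch runs
}'
```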
-
- Aside from `=', there are several other assignment operators that do
-arithmetic with the old value of the variable. For example, the
-operator `+=' computes a new value by adding the righthand value to the
-old value of the variable. Thus, the following assignment adds five to
-the value of `foo':
-
- foo += 5
-
-This is equivalent to the following:
-
- foo = foo + 5
-
-Use whichever makes the meaning of your program clearer.
-
- There are situations where using `+=' (or any assignment operator)
-is _not_ the same as simply repeating the lefthand operand in the
-righthand expression. For example:
-
- # Thanks to Pat Rankin for this example
- BEGIN {
- foo[rand()] += 5
- for (x in foo)
- print x, foo[x]
-
- bar[rand()] = bar[rand()] + 5
- for (x in bar)
- print x, bar[x]
- }
-
-The indices of `bar' are practically guaranteed to be different, because
-`rand()' returns different values each time it is called. (Arrays and
-the `rand()' function haven't been covered yet. *Note Arrays::, and
-see *note Numeric Functions::, for more information). This example
-illustrates an important fact about assignment operators: the lefthand
-expression is only evaluated _once_. It is up to the implementation
-which expression is evaluated first, the lefthand or the righthand.
-Consider this example:
-
- i = 1
- a[i += 2] = i + 1
-
-The value of `a[3]' could be either two or four.
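If a program needs a definite answer, one option is to perform the side effect first, as a separate statement. This sketch deliberately picks the reading in which the lefthand side is evaluated first, so every `awk' prints four:

```shell
awk 'BEGIN {
    i = 1
    i += 2                 # side effect first, in its own statement
    a[i] = i + 1           # both sides now see i == 3
    print a[3]             # 4
}'
```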
-
- *note table-assign-ops:: lists the arithmetic assignment operators.
-In each case, the righthand operand is an expression whose value is
-converted to a number.
-
-Operator Effect
---------------------------------------------------------------------------
-LVALUE `+=' INCREMENT    Adds INCREMENT to the value of LVALUE.
-LVALUE `-=' DECREMENT    Subtracts DECREMENT from the value of LVALUE.
-LVALUE `*=' COEFFICIENT  Multiplies the value of LVALUE by COEFFICIENT.
-LVALUE `/=' DIVISOR      Divides the value of LVALUE by DIVISOR.
-LVALUE `%=' MODULUS      Sets LVALUE to its remainder by MODULUS.
-LVALUE `^=' POWER        Raises LVALUE to the power POWER.
-LVALUE `**=' POWER       Same as `^='. (c.e.)
-
-Table 6.2: Arithmetic Assignment Operators
-
- NOTE: Only the `^=' operator is specified by POSIX. For maximum
- portability, do not use the `**=' operator.
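The operators in the table can be exercised in sequence, as in this sketch (assuming a POSIX `awk'). Noninteger results are printed using `OFMT', whose default is `"%.6g"'. That is why the remainder prints as `0.8':

```shell
awk 'BEGIN {
    v = 10
    v += 5;  print v       # 15
    v -= 3;  print v       # 12
    v *= 2;  print v       # 24
    v /= 5;  print v       # 4.8
    v %= 2;  print v       # 0.8 (printed with OFMT)
    v = 3
    v ^= 2;  print v       # 9
}'
```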
-
-Advanced Notes: Syntactic Ambiguities Between `/=' and Regular Expressions
---------------------------------------------------------------------------
-
-There is a syntactic ambiguity between the `/=' assignment operator and
-regexp constants whose first character is an `='. (d.c.) This is most
-notable in commercial `awk' versions. For example:
-
- $ awk /==/ /dev/null
- error--> awk: syntax error at source line 1
- error--> context is
- error--> >>> /= <<<
- error--> awk: bailing out at source line 1
-
-A workaround is:
-
- awk '/[=]=/' /dev/null
-
- `gawk' does not have this problem, nor do the other freely available
-versions described in *note Other Versions::.
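Here is a quick check that the bracket-expression workaround matches exactly the records that `/==/' is meant to match, namely those containing two consecutive equal signs. It is a sketch, assuming `printf' and a POSIX `awk':

```shell
# Only the first line contains two consecutive equal signs.
printf 'a==b\na=b\nab\n' | awk '/[=]=/'    # prints: a==b
```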
-
-
-File: gawk.info, Node: Increment Ops, Prev: Assignment Ops, Up: All Operators
-
-6.2.4 Increment and Decrement Operators
----------------------------------------
-
-"Increment" and "decrement operators" increase or decrease the value of
-a variable by one. An assignment operator can do the same thing, so
-the increment operators add no power to the `awk' language; however,
-they are convenient abbreviations for very common operations.
-
- The operator used for adding one is written `++'. It can be used to
-increment a variable either before or after taking its value. To
-pre-increment a variable `v', write `++v'. This adds one to the value
-of `v'--that new value is also the value of the expression. (The
-assignment expression `v += 1' is completely equivalent.) Writing the
-`++' after the variable specifies post-increment. This increments the
-variable value just the same; the difference is that the value of the
-increment expression itself is the variable's _old_ value. Thus, if
-`foo' has the value four, then the expression `foo++' has the value
-four, but it changes the value of `foo' to five. In other words, the
-operator returns the old value of the variable, but with the side
-effect of incrementing it.
-
- The post-increment `foo++' is nearly the same as writing
-`(foo += 1) - 1'. It is not perfectly equivalent because all numbers in
-`awk' are floating-point--in floating-point, `foo + 1 - 1' does not
-necessarily equal `foo'. But the difference is minute as long as you
-stick to numbers that are fairly small (less than 10e12).
-
- Fields and array elements are incremented just like variables. (Use
-`$(i++)' when you want to do a field reference and a variable increment
-at the same time. The parentheses are necessary because of the
-precedence of the field reference operator `$'.)
-
- The decrement operator `--' works just like `++', except that it
-subtracts one instead of adding it. As with `++', it can be used before
-the lvalue to pre-decrement or after it to post-decrement. Following
-is a summary of increment and decrement expressions:
-
-`++LVALUE'
- Increment LVALUE, returning the new value as the value of the
- expression.
-
-`LVALUE++'
- Increment LVALUE, returning the _old_ value of LVALUE as the value
- of the expression.
-
-`--LVALUE'
- Decrement LVALUE, returning the new value as the value of the
- expression. (This expression is like `++LVALUE', but instead of
- adding, it subtracts.)
-
-`LVALUE--'
- Decrement LVALUE, returning the _old_ value of LVALUE as the value
- of the expression. (This expression is like `LVALUE++', but
- instead of adding, it subtracts.)
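This sketch, assuming a POSIX `awk', shows each form's value alongside the variable's new value. It also shows the `$(i++)' idiom. Each increment is stored in its own statement, so the results do not depend on evaluation order:

```shell
echo 'a b c' | awk '{
    foo = 4
    old = foo++; print old, foo    # 4 5: post-increment yields the old value
    pre = ++foo; print pre, foo    # 6 6: pre-increment yields the new value
    old = foo--; print old, foo    # 6 5: post-decrement
    i = 1
    f = $(i++)                     # fetch field 1, then increment i
    print f, i                     # a 2
}'
```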
-
-Advanced Notes: Operator Evaluation Order
------------------------------------------
-
- Doctor, doctor! It hurts when I do this!
- So don't do that!
- Groucho Marx
-
-What happens for something like the following?
-
- b = 6
- print b += b++
-
-Or something even stranger?
-
- b = 6
- b += ++b + b++
- print b
-
- In other words, when do the various side effects prescribed by the
-postfix operators (`b++') take effect? When side effects happen is
-"implementation defined": it is up to the particular version of
-`awk'. The result for the first example may be 12 or 13, and for the
-second, it may be 22 or 23.
-
- In short, doing things like this is not recommended and definitely
-not anything that you can rely upon for portability. You should avoid
-such things in your own programs.
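When you do want both effects, write them as separate statements so that the order is explicit. This sketch (assuming any POSIX `awk') chooses the reading of the first example in which the increment happens before the addition. It therefore always prints 13:

```shell
awk 'BEGIN {
    b = 6
    old = b++              # old is 6 and b is now 7
    b += old               # 7 + 6, with no ambiguity left
    print b                # 13
}'
```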
-
-
-File: gawk.info, Node: Truth Values and Conditions, Next: Function Calls, Prev: All Operators, Up: Expressions
-
-6.3 Truth Values and Conditions
-===============================
-
-In certain contexts, expression values also serve as "truth values;"
-i.e., they determine what should happen next as the program runs. This
-minor node describes how `awk' defines "true" and "false" and how
-values are compared.
-
-* Menu:
-
-* Truth Values:: What is ``true'' and what is ``false''.
-* Typing and Comparison:: How variables acquire types and how this
- affects comparison of numbers and strings with
- `<', etc.
-* Boolean Ops:: Combining comparison expressions using boolean
- operators `||' (``or''), `&&'
- (``and'') and `!' (``not'').
-* Conditional Exp:: Conditional expressions select between two
- subexpressions under control of a third
- subexpression.
-
-
-File: gawk.info, Node: Truth Values, Next: Typing and Comparison, Up: Truth Values and Conditions
-
-6.3.1 True and False in `awk'
------------------------------
-
-Many programming languages have a special representation for the
-concepts of "true" and "false." Such languages usually use the special
-constants `true' and `false', or perhaps their uppercase equivalents.
-However, `awk' is different. It borrows a very simple concept of true
-and false from C. In `awk', any nonzero numeric value _or_ any
-nonempty string value is true. Any other value (zero or the null
-string, `""') is false. The following program prints `A strange truth
-value' three times:
-
- BEGIN {
- if (3.1415927)
- print "A strange truth value"
- if ("Four Score And Seven Years Ago")
- print "A strange truth value"
- if (j = 57)
- print "A strange truth value"
- }
-
- There is a surprising consequence of the "nonzero or non-null" rule:
-the string constant `"0"' is actually true, because it is non-null.
-(d.c.)
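Here is a short sketch confirming the rule, assuming a POSIX `awk':

```shell
awk 'BEGIN {
    if ("0") print "the string constant \"0\" is true"
    if (!0)  print "the number 0 is false"
    if (!"") print "the null string is false"
}'
```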
-
-
-File: gawk.info, Node: Typing and Comparison, Next: Boolean Ops, Prev: Truth Values, Up: Truth Values and Conditions
-
-6.3.2 Variable Typing and Comparison Expressions
-------------------------------------------------
-
- The Guide is definitive. Reality is frequently inaccurate.
- The Hitchhiker's Guide to the Galaxy
-
- Unlike other programming languages, `awk' variables do not have a
-fixed type. Instead, they can be either a number or a string, depending
-upon the value that is assigned to them. We look now at how variables
-are typed, and how `awk' compares variables.
-
-* Menu:
-
-* Variable Typing:: String type versus numeric type.
-* Comparison Operators:: The comparison operators.
-* POSIX String Comparison:: String comparison with POSIX rules.
-
-
-File: gawk.info, Node: Variable Typing, Next: Comparison Operators, Up: Typing and Comparison
-
-6.3.2.1 String Type Versus Numeric Type
-.......................................
-
-The 1992 POSIX standard introduced the concept of a "numeric string",
-which is simply a string that looks like a number--for example,
-`" +2"'. This concept is used for determining the type of a variable.
-The type of the variable is important because the types of two variables
-determine how they are compared. Several editions of the POSIX
-standard did not get the rules quite right.
-Fortunately, as of at least the 2008 standard (and possibly earlier),
-the standard has been fixed, and variable typing follows these rules:(1)
-
- * A numeric constant or the result of a numeric operation has the
- NUMERIC attribute.
-
- * A string constant or the result of a string operation has the
- STRING attribute.
-
- * Fields, `getline' input, `FILENAME', `ARGV' elements, `ENVIRON'
- elements, and the elements of an array created by `patsplit()',
- `split()' and `match()' that are numeric strings have the STRNUM
- attribute. Otherwise, they have the STRING attribute.
- Uninitialized variables also have the STRNUM attribute.
-
- * Attributes propagate across assignments but are not changed by any
- use.
-
- The last rule is particularly important. In the following program,
-`a' has numeric type, even though it is later used in a string
-operation:
-
- BEGIN {
- a = 12.345
- b = a " is a cute number"
- print b
- }
-
- When two operands are compared, either string comparison or numeric
-comparison may be used. This depends upon the attributes of the
-operands, according to the following symmetric matrix:
-
- +---------------------------------------------
- | STRING NUMERIC STRNUM
- -------+---------------------------------------------
- |
- STRING | string string string
- |
- NUMERIC | string numeric numeric
- |
- STRNUM | string numeric numeric
- -------+---------------------------------------------
-
- The basic idea is that user input that looks numeric--and _only_
-user input--should be treated as numeric, even though it is actually
-made of characters and is therefore also a string. Thus, for example,
-the string constant `" +3.14"', when it appears in program source code,
-is a string--even though it looks numeric--and is _never_ treated as a
-number for comparison purposes.
-
- In short, when one operand is a "pure" string, such as a string
-constant, then a string comparison is performed. Otherwise, a numeric
-comparison is performed.
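Here is a sketch of the difference using the values 10 and 9, assuming a POSIX `awk'. The two fields are numeric strings and so compare numerically. The two string constants compare as strings:

```shell
echo '10 9' | awk '{
    fields = ($1 < $2)     # both STRNUM: numeric, 10 < 9 is false
    consts = ("10" < "9")  # both string constants: "1" sorts before "9"
    print fields, consts   # 0 1
}'
```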
-
- This point bears additional emphasis: All user input is made of
-characters, and so is first and foremost of STRING type; input strings
-that look numeric are additionally given the STRNUM attribute. Thus,
-the six-character input string ` +3.14' receives the STRNUM attribute.
-In contrast, the eight-character literal `" +3.14"' appearing in
-program text is a string constant. The following examples print `1'
-when the comparison between the two different constants is true, `0'
-otherwise:
-
- $ echo ' +3.14' | gawk '{ print $0 == " +3.14" }' # True
- -| 1
- $ echo ' +3.14' | gawk '{ print $0 == "+3.14" }' # False
- -| 0
- $ echo ' +3.14' | gawk '{ print $0 == "3.14" }' # False
- -| 0
- $ echo ' +3.14' | gawk '{ print $0 == 3.14 }' # True
- -| 1
- $ echo ' +3.14' | gawk '{ print $1 == " +3.14" }' # False
- -| 0
- $ echo ' +3.14' | gawk '{ print $1 == "+3.14" }' # True
- -| 1
- $ echo ' +3.14' | gawk '{ print $1 == "3.14" }' # False
- -| 0
- $ echo ' +3.14' | gawk '{ print $1 == 3.14 }' # True
- -| 1
-
- ---------- Footnotes ----------
-
- (1) `gawk' has followed these rules for many years, and it is
-gratifying that the POSIX standard is also now correct.
-
-
-File: gawk.info, Node: Comparison Operators, Next: POSIX String Comparison, Prev: Variable Typing, Up: Typing and Comparison
-
-6.3.2.2 Comparison Operators
-............................
-
-"Comparison expressions" compare strings or numbers for relationships
-such as equality. They are written using "relational operators", which
-are a superset of those in C. *note table-relational-ops:: describes
-them.
-
-Expression Result
---------------------------------------------------------------------------
-X `<' Y True if X is less than Y.
-X `<=' Y True if X is less than or equal to Y.
-X `>' Y True if X is greater than Y.
-X `>=' Y True if X is greater than or equal to Y.
-X `==' Y True if X is equal to Y.
-X `!=' Y True if X is not equal to Y.
-X `~' Y True if the string X matches the regexp denoted by Y.
-X `!~' Y True if the string X does not match the regexp
- denoted by Y.
-SUBSCRIPT `in' True if the array ARRAY has an element with the
-ARRAY subscript SUBSCRIPT.
-
-Table 6.3: Relational Operators
-
- Comparison expressions have the value one if true and zero if false.
-When comparing operands of mixed types, numeric operands are converted
-to strings using the value of `CONVFMT' (*note Conversion::).
-
- Strings are compared by comparing the first character of each, then
-the second character of each, and so on. Thus, `"10"' is less than
-`"9"'. If there are two strings where one is a prefix of the other,
-the shorter string is less than the longer one. Thus, `"abc"' is less
-than `"abcd"'.
-
- It is very easy to accidentally mistype the `==' operator and leave
-off one of the `=' characters. The result is still valid `awk' code,
-but the program does not do what is intended:
-
- if (a = b) # oops! should be a == b
- ...
- else
- ...
-
-Unless `b' happens to be zero or the null string, the `if' part of the
-test always succeeds. Because the operators are so similar, this kind
-of error is very difficult to spot when scanning the source code.
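This sketch (assuming any POSIX `awk') shows the difference. The mistyped test assigns five to `a' and succeeds. The intended comparison fails and leaves `a' alone:

```shell
awk 'BEGIN {
    a = 0; b = 5
    if (a = b)             # oops: assigns 5 to a, and 5 is true
        print "taken; a is now", a
    a = 0
    if (a == b)            # the comparison that was intended
        print "not printed"
    else
        print "not taken; a is still", a
}'
```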
-
- The following table of expressions illustrates the kind of comparison
-`gawk' performs, as well as what the result of the comparison is:
-
-`1.5 <= 2.0'
- numeric comparison (true)
-
-`"abc" >= "xyz"'
- string comparison (false)
-
-`1.5 != " +2"'
- string comparison (true)
-
-`"1e2" < "3"'
- string comparison (true)
-
-`a = 2; b = "2"'
-`a == b'
- string comparison (true)
-
-`a = 2; b = " +2"'
-`a == b'
- string comparison (false)
-
- In this example:
-
- $ echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }'
- -| false
-
-the result is `false' because both `$1' and `$2' are user input. They
-are numeric strings--therefore both have the STRNUM attribute,
-dictating a numeric comparison. The purpose of the comparison rules
-and the use of numeric strings is to attempt to produce the behavior
-that is "least surprising," while still "doing the right thing."
-
- String comparisons and regular expression comparisons are very
-different. For example:
-
- x == "foo"
-
-is true (has the value one) if the variable `x' is precisely `foo'.
-By contrast:
-
- x ~ /foo/
-
-has the value one if `x' contains `foo', such as `"Oh, what a fool am
-I!"'.
-
- The righthand operand of the `~' and `!~' operators may be either a
-regexp constant (`/.../') or an ordinary expression. In the latter
-case, the value of the expression as a string is used as a dynamic
-regexp (*note Regexp Usage::; also *note Computed Regexps::).
-
- In modern implementations of `awk', a constant regular expression in
-slashes by itself is also an expression. The regexp `/REGEXP/' is an
-abbreviation for the following comparison expression:
-
- $0 ~ /REGEXP/
-
- One special place where `/foo/' is _not_ an abbreviation for `$0 ~
-/foo/' is when it is the righthand operand of `~' or `!~'. *Note Using
-Constant Regexps::, where this is discussed in more detail.
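The three kinds of test can be compared side by side. In this sketch, which assumes a POSIX `awk', the last comparison uses a string as a dynamic regexp:

```shell
awk 'BEGIN {
    x = "Oh, what a fool am I!"
    eq  = (x == "foo")     # 0: x is not exactly "foo"
    re  = (x ~ /foo/)      # 1: x contains "foo"
    dyn = (x ~ "fo+")      # 1: the string is used as a dynamic regexp
    print eq, re, dyn      # 0 1 1
}'
```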
-
-
-File: gawk.info, Node: POSIX String Comparison, Prev: Comparison Operators, Up: Typing and Comparison
-
-6.3.2.3 String Comparison With POSIX Rules
-..........................................
-
-The POSIX standard says that string comparison is performed based on
-the locale's collating order. This is usually very different from the
-results obtained when doing straight character-by-character
-comparison.(1)
-
- Because this behavior differs considerably from existing practice,
-`gawk' only implements it when in POSIX mode (*note Options::). Here
-is an example to illustrate the difference, in an `en_US.UTF-8' locale:
-
- $ gawk 'BEGIN { printf("ABC < abc = %s\n",
- > ("ABC" < "abc" ? "TRUE" : "FALSE")) }'
- -| ABC < abc = TRUE
- $ gawk --posix 'BEGIN { printf("ABC < abc = %s\n",
- > ("ABC" < "abc" ? "TRUE" : "FALSE")) }'
- -| ABC < abc = FALSE
-
- ---------- Footnotes ----------
-
- (1) Technically, string comparison is supposed to behave the same
-way as if the strings are compared with the C `strcoll()' function.
-
-
-File: gawk.info, Node: Boolean Ops, Next: Conditional Exp, Prev: Typing and Comparison, Up: Truth Values and Conditions
-
-6.3.3 Boolean Expressions
--------------------------
-
-A "Boolean expression" is a combination of comparison expressions or
-matching expressions, using the Boolean operators "or" (`||'), "and"
-(`&&'), and "not" (`!'), along with parentheses to control nesting.
-The truth value of the Boolean expression is computed by combining the
-truth values of the component expressions. Boolean expressions are
-also referred to as "logical expressions". The terms are equivalent.
-
- Boolean expressions can be used wherever comparison and matching
-expressions can be used. They can be used in `if', `while', `do', and
-`for' statements (*note Statements::). They have numeric values (one
-if true, zero if false) that come into play if the result of the
-Boolean expression is stored in a variable or used in arithmetic.
-
- In addition, every Boolean expression is also a valid pattern, so
-you can use one as a pattern to control the execution of rules. The
-Boolean operators are:
-
-`BOOLEAN1 && BOOLEAN2'
- True if both BOOLEAN1 and BOOLEAN2 are true. For example, the
- following statement prints the current input record if it contains
- both `2400' and `foo':
-
- if ($0 ~ /2400/ && $0 ~ /foo/) print
-
- The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is true.
- This can make a difference when BOOLEAN2 contains expressions that
- have side effects. In the case of `$0 ~ /foo/ && ($2 == bar++)',
- the variable `bar' is not incremented if there is no substring
- `foo' in the record.
-
-`BOOLEAN1 || BOOLEAN2'
- True if at least one of BOOLEAN1 or BOOLEAN2 is true. For
- example, the following statement prints all records in the input
- that contain _either_ `2400' or `foo' or both:
-
- if ($0 ~ /2400/ || $0 ~ /foo/) print
-
- The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is false.
- This can make a difference when BOOLEAN2 contains expressions that
- have side effects.
-
-`! BOOLEAN'
- True if BOOLEAN is false. For example, the following program
- prints `no home!' in the unusual event that the `HOME' environment
- variable is not defined:
-
- BEGIN { if (! ("HOME" in ENVIRON))
- print "no home!" }
-
- (The `in' operator is described in *note Reference to Elements::.)
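The short-circuit behavior of `&&' can be observed with the `bar++' example from the list above. In this sketch (assuming `printf' and a POSIX `awk'), `bar' is incremented only for the two records containing `foo'. The first record does not contain `foo', so its `bar++' is never evaluated:

```shell
printf 'x 1\nfoo 0\nfoo 5\n' | awk '
$0 ~ /foo/ && ($2 == bar++) { print "matched:", $0 }
END { print "bar was incremented", bar, "times" }'
```

This should print `matched: foo 0' followed by `bar was incremented 2 times'.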
-
- The `&&' and `||' operators are called "short-circuit" operators
-because of the way they work. Evaluation of the full expression is
-"short-circuited" if the result can be determined part way through its
-evaluation.
-
- Statements that use `&&' or `||' can be continued simply by putting
-a newline after them. But you cannot put a newline in front of either
-of these operators without using backslash continuation (*note
-Statements/Lines::).
-
- The actual value of an expression using the `!' operator is either
-one or zero, depending upon the truth value of the expression it is
-applied to. The `!' operator is often useful for changing the sense of
-a flag variable from false to true and back again. For example, the
-following program is one way to print lines in between special
-bracketing lines:
-
- $1 == "START" { interested = ! interested; next }
- interested == 1 { print }
- $1 == "END" { interested = ! interested; next }
-
-The variable `interested', as with all `awk' variables, starts out
-initialized to zero, which is also false. When a line is seen whose
-first field is `START', the value of `interested' is toggled to true,
-using `!'. The next rule prints lines as long as `interested' is true.
-When a line is seen whose first field is `END', `interested' is toggled
-back to false.(1)
-
- NOTE: The `next' statement is discussed in *note Next Statement::.
- `next' tells `awk' to skip the rest of the rules, get the next
- record, and start processing the rules over again at the top. The
- reason it's there is to avoid printing the bracketing `START' and
- `END' lines.
-
- ---------- Footnotes ----------
-
- (1) This program has a bug; it prints lines starting with `END'. How
-would you fix it?
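Here is one possible answer to the exercise in the footnote, offered only as a sketch. It tests for `END' before the printing rule, so the flag is turned off before the `END' line can be printed. With a POSIX `awk', the input below produces just `b' and `c':

```shell
printf 'a\nSTART\nb\nc\nEND\nd\n' | awk '
$1 == "START" { interested = ! interested; next }
$1 == "END"   { interested = ! interested; next }   # now checked before printing
interested == 1 { print }'
```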
-
-
-File: gawk.info, Node: Conditional Exp, Prev: Boolean Ops, Up: Truth Values and Conditions
-
-6.3.4 Conditional Expressions
------------------------------
-
-A "conditional expression" is a special kind of expression that has
-three operands. It allows you to use one expression's value to select
-one of two other expressions. The conditional expression is the same
-as in the C language, as shown here:
-
- SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP
-
-There are three subexpressions. The first, SELECTOR, is always
-computed first. If it is "true" (nonzero or non-null), then
-IF-TRUE-EXP is computed next and its value becomes the value of the
-whole expression. Otherwise, IF-FALSE-EXP is computed next and its
-value becomes the value of the whole expression. For example, the
-following expression produces the absolute value of `x':
-
- x >= 0 ? x : -x
-
- Each time the conditional expression is computed, only one of
-IF-TRUE-EXP and IF-FALSE-EXP is used; the other is ignored. This is
-important when the expressions have side effects. For example, this
-conditional expression examines element `i' of either array `a' or
-array `b', and increments `i':
-
- x == y ? a[i++] : b[i++]
-
-This is guaranteed to increment `i' exactly once, because each time
-only one of the two increment expressions is executed and the other is
-not. *Note Arrays::, for more information about arrays.
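Here is a sketch of both examples, assuming a POSIX `awk'. Here `x' and `y' differ, so only the `b[i++]' branch is evaluated and `i' ends up as two:

```shell
awk 'BEGIN {
    x = -3; y = 2
    absx = (x >= 0 ? x : -x)           # absolute value
    print absx                         # 3
    i = 1; a[1] = "a1"; b[1] = "b1"
    v = (x == y ? a[i++] : b[i++])     # only one branch is evaluated
    print v, i                         # b1 2
}'
```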
-
- As a minor `gawk' extension, a statement that uses `?:' can be
-continued simply by putting a newline after either character. However,
-putting a newline in front of either character does not work without
-using backslash continuation (*note Statements/Lines::). If `--posix'
-is specified (*note Options::), then this extension is disabled.
-
-
-File: gawk.info, Node: Function Calls, Next: Precedence, Prev: Truth Values and Conditions, Up: Expressions
-
-6.4 Function Calls
-==================
-
-A "function" is a name for a particular calculation. This enables you
-to ask for it by name at any point in the program. For example, the
-function `sqrt()' computes the square root of a number.
-
- A fixed set of functions is "built-in", which means they are
-available in every `awk' program. The `sqrt()' function is one of
-these. *Note Built-in::, for a list of built-in functions and their
-descriptions. In addition, you can define functions for use in your
-program. *Note User-defined::, for instructions on how to do this.
-
- The way to use a function is with a "function call" expression,
-which consists of the function name followed immediately by a list of
-"arguments" in parentheses. The arguments are expressions that provide
-the raw materials for the function's calculations. When there is more
-than one argument, they are separated by commas. If there are no
-arguments, just write `()' after the function name. The following
-examples show function calls with and without arguments:
-
- sqrt(x^2 + y^2) one argument
- atan2(y, x) two arguments
- rand() no arguments
-
- CAUTION: Do not put any space between the function name and the
- open-parenthesis! A user-defined function name looks just like
- the name of a variable--a space would make the expression look
- like concatenation of a variable with an expression inside
- parentheses. With built-in functions, space before the
- parenthesis is harmless, but it is best not to get into the habit
- of using space to avoid mistakes with user-defined functions.
-
- Each function expects a particular number of arguments. For
-example, the `sqrt()' function must be called with a single argument,
-the number of which to take the square root:
-
- sqrt(ARGUMENT)
-
- Some of the built-in functions have one or more optional arguments.
-If those arguments are not supplied, the functions use a reasonable
-default value. *Note Built-in::, for full details. If arguments are
-omitted in calls to user-defined functions, then those arguments are
-treated as local variables and initialized to the empty string (*note
-User-defined::).
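This sketch illustrates the extra-parameter convention with a hypothetical function `greet()', assuming a POSIX `awk'. The caller omits `greeting', so inside the function it is a fresh local variable. The global `greeting' stays empty:

```shell
awk '
# greet() is a hypothetical example; "greeting" is an extra parameter
# that callers leave out, so it serves as a local variable
function greet(name,    greeting) {
    greeting = "hello, " name
    return greeting
}
BEGIN {
    print greet("world")       # hello, world
    print "[" greeting "]"     # []: the global greeting is untouched
}'
```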
-
- As an advanced feature, `gawk' provides indirect function calls,
-which is a way to choose the function to call at runtime, instead of
-when you write the source code to your program. We defer discussion of
-this feature until later; see *note Indirect Calls::.
-
- Like every other expression, the function call has a value, which is
-computed by the function based on the arguments you give it. In this
-example, the value of `sqrt(ARGUMENT)' is the square root of ARGUMENT.
-The following program reads numbers, one number per line, and prints the
-square root of each one:
-
- $ awk '{ print "The square root of", $1, "is", sqrt($1) }'
- 1
- -| The square root of 1 is 1
- 3
- -| The square root of 3 is 1.73205
- 5
- -| The square root of 5 is 2.23607
- Ctrl-d
-
- A function can also have side effects, such as assigning values to
-certain variables or doing I/O. This program shows how the `match()'
-function (*note String Functions::) changes the variables `RSTART' and
-`RLENGTH':
-
- {
- if (match($1, $2))
- print RSTART, RLENGTH
- else
- print "no match"
- }
-
-Here is a sample run:
-
- $ awk -f matchit.awk
- aaccdd c+
- -| 3 2
- foo bar
- -| no match
- abcdefg e
- -| 5 1
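The same `match()' program can also be run non-interactively. In this sketch, the three input lines from the sample run are supplied with `printf' instead of being typed at the terminal. The wrapper name `matchit' is hypothetical.

```shell
# matchit (hypothetical name): $1 is the string, $2 the dynamic regexp.
matchit() {
    awk '{
        if (match($1, $2))
            print RSTART, RLENGTH   # match() set these as a side effect
        else
            print "no match"
    }'
}

printf '%s\n' 'aaccdd c+' 'foo bar' 'abcdefg e' | matchit
```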
-
-
-File: gawk.info, Node: Precedence, Next: Locales, Prev: Function Calls, Up: Expressions
-
-6.5 Operator Precedence (How Operators Nest)
-============================================
-
-"Operator precedence" determines how operators are grouped when
-different operators appear close by in one expression. For example,
-`*' has higher precedence than `+'; thus, `a + b * c' means to multiply
-`b' and `c', and then add `a' to the product (i.e., `a + (b * c)').
-
- The normal precedence of the operators can be overruled by using
-parentheses. Think of the precedence rules as saying where the
-parentheses are assumed to be. In fact, it is wise to always use
-parentheses whenever there is an unusual combination of operators,
-because other people who read the program may not remember what the
-precedence is in this case. Even experienced programmers occasionally
-forget the exact rules, which leads to mistakes. Explicit parentheses
-help prevent any such mistakes.
-
- When operators of equal precedence are used together, the leftmost
-operator groups first, except for the assignment, conditional, and
-exponentiation operators, which group in the opposite order. Thus, `a
-- b + c' groups as `(a - b) + c' and `a = b = c' groups as `a = (b =
-c)'.
-
- Normally the precedence of prefix unary operators does not matter,
-because there is only one way to interpret them: innermost first.
-Thus, `$++i' means `$(++i)' and `++$x' means `++($x)'. However, when
-another operator follows the operand, then the precedence of the unary
-operators can matter. `$x^2' means `($x)^2', but `-x^2' means
-`-(x^2)', because `-' has lower precedence than `^', whereas `$' has
-higher precedence. Also, operators cannot be combined in a way that
-violates the precedence rules; for example, `$$0++--' is not a valid
-expression because the first `$' has higher precedence than the `++';
-to avoid the problem, the expression can be rewritten as `$($0++)--'.
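The grouping rules above can be checked directly. This sketch evaluates each expression in a `BEGIN' rule, so no input is read. It uses only POSIX operators (`^', not `**'). The wrapper name `precedence_demo' is hypothetical.

```shell
precedence_demo() {
    awk 'BEGIN {
        x = 2
        print -x^2, (-x)^2   # unary minus binds less tightly than ^: -(x^2), then (-x)^2
        print 2^3^2          # ^ groups right-to-left: 2^(3^2)
        print 10 - 4 + 3     # - and + group left-to-right: (10 - 4) + 3
        a = b = 5            # assignment groups right-to-left: a = (b = 5)
        print a, b
    }'
}

precedence_demo
```

The four output lines are `-4 4', `512', `9', and `5 5'.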
-
- This table presents `awk''s operators, in order of highest to lowest
-precedence:
-
-`(...)'
- Grouping.
-
-`$'
- Field reference.
-
-`++ --'
- Increment, decrement.
-
-`^ **'
- Exponentiation. These operators group right-to-left.
-
-`+ - !'
- Unary plus, minus, logical "not."
-
-`* / %'
- Multiplication, division, remainder.
-
-`+ -'
- Addition, subtraction.
-
-`String Concatenation'
- There is no special symbol for concatenation. The operands are
- simply written side by side (*note Concatenation::).
-
-`< <= == != > >= >> | |&'
- Relational and redirection. The relational operators and the
- redirections have the same precedence level. Characters such as
- `>' serve both as relationals and as redirections; the context
- distinguishes between the two meanings.
-
- Note that the I/O redirection operators in `print' and `printf'
- statements belong to the statement level, not to expressions. The
- redirection does not produce an expression that could be the
- operand of another operator. As a result, it does not make sense
- to use a redirection operator near another operator of lower
- precedence without parentheses. Such combinations (for example,
- `print foo > a ? b : c') result in syntax errors. The correct
- way to write this statement is `print foo > (a ? b : c)'.
-
-`~ !~'
- Matching, nonmatching.
-
-`in'
- Array membership.
-
-`&&'
- Logical "and".
-
-`||'
- Logical "or".
-
-`?:'
- Conditional. This operator groups right-to-left.
-
-`= += -= *= /= %= ^= **='
- Assignment. These operators group right-to-left.
-
- NOTE: The `|&', `**', and `**=' operators are not specified by
- POSIX. For maximum portability, do not use them.
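The parenthesized redirection form from the table can be used to route output. In this sketch, each input number is written to one of two files chosen by `?:'. The directory comes from a hypothetical `dir' variable, and the function name `split_by_parity' is likewise illustrative.

```shell
# Write odd numbers to DIR/odd and even numbers to DIR/even.
split_by_parity() {
    awk -v dir="$1" '{
        # The parentheses are required; "print $1 > $1 % 2 ? ..." is a syntax error.
        print $1 > ($1 % 2 ? dir "/odd" : dir "/even")
    }'
}

tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 | split_by_parity "$tmp"
cat "$tmp/odd"     # 1 and 3
cat "$tmp/even"    # 2 and 4
rm -r "$tmp"
```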
-
-
-File: gawk.info, Node: Locales, Prev: Precedence, Up: Expressions
-
-6.6 Where You Are Makes A Difference
-====================================
-
-Modern systems support the notion of "locales": a way to tell the
-system about the local character set and language.
-
- Once upon a time, the locale setting used to affect regexp matching
-(*note Ranges and Locales::), but this is no longer true.
-
- Locales can affect record splitting. For the normal case of `RS =
-"\n"', the locale is largely irrelevant. For other single-character
-record separators, setting `LC_ALL=C' in the environment will give you
-much better performance when reading records. Otherwise, `gawk' has to
-make several function calls, _per input character_, to find the record
-terminator.
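As a sketch of the advice above, `LC_ALL=C' can be set only in the environment of the `awk' command that uses a single-character record separator. The records are the same either way. The setting only affects how much work `gawk' does per input character. The wrapper name is hypothetical.

```shell
# Split input on ";" instead of newlines, in the C locale.
split_on_semicolons() {
    LC_ALL=C awk 'BEGIN { RS = ";" } { print NR ": " $0 }'
}

# No trailing newline, so the last record is exactly "c".
printf 'a;b;c' | split_on_semicolons
```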
-
- According to POSIX, string comparison is also affected by locales
-(similar to regular expressions). The details are presented in *note
-POSIX String Comparison::.
-
- Finally, the locale affects the value of the decimal point character
-used when `gawk' parses input data. This is discussed in detail in
-*note Conversion::.
-
-
-File: gawk.info, Node: Patterns and Actions, Next: Arrays, Prev: Expressions, Up: Top
-
-7 Patterns, Actions, and Variables
-**********************************
-
-As you have already seen, each `awk' statement consists of a pattern
-with an associated action. This major node describes how you build
-patterns and actions, what kinds of things you can do within actions,
-and `awk''s built-in variables.
-
- The pattern-action rules and the statements available for use within
-actions form the core of `awk' programming. In a sense, everything
-covered up to here has been the foundation that programs are built on
-top of. Now it's time to start building something useful.
-
-* Menu:
-
-* Pattern Overview:: What goes into a pattern.
-* Using Shell Variables:: How to use shell variables with `awk'.
-* Action Overview:: What goes into an action.
-* Statements:: Describes the various control statements in
- detail.
-* Built-in Variables:: Summarizes the built-in variables.
-
-
-File: gawk.info, Node: Pattern Overview, Next: Using Shell Variables, Up: Patterns and Actions
-
-7.1 Pattern Elements
-====================
-
-* Menu:
-
-* Regexp Patterns:: Using regexps as patterns.
-* Expression Patterns:: Any expression can be used as a pattern.
-* Ranges:: Pairs of patterns specify record ranges.
-* BEGIN/END:: Specifying initialization and cleanup rules.
-* BEGINFILE/ENDFILE:: Two special patterns for advanced control.
-* Empty:: The empty pattern, which matches every record.
-
- Patterns in `awk' control the execution of rules--a rule is executed
-when its pattern matches the current input record. The following is a
-summary of the types of `awk' patterns:
-
-`/REGULAR EXPRESSION/'
- A regular expression. It matches when the text of the input record
- fits the regular expression. (*Note Regexp::.)
-
-`EXPRESSION'
- A single expression. It matches when its value is nonzero (if a
- number) or non-null (if a string). (*Note Expression Patterns::.)
-
-`PAT1, PAT2'
- A pair of patterns separated by a comma, specifying a range of
- records. The range includes both the initial record that matches
- PAT1 and the final record that matches PAT2. (*Note Ranges::.)
-
-`BEGIN'
-`END'
- Special patterns for you to supply startup or cleanup actions for
- your `awk' program. (*Note BEGIN/END::.)
-
-`BEGINFILE'
-`ENDFILE'
- Special patterns for you to supply startup or cleanup actions to be
- done on a per file basis. (*Note BEGINFILE/ENDFILE::.)
-
-`EMPTY'
- The empty pattern matches every input record. (*Note Empty::.)
-
-
-File: gawk.info, Node: Regexp Patterns, Next: Expression Patterns, Up: Pattern Overview
-
-7.1.1 Regular Expressions as Patterns
--------------------------------------
-
-Regular expressions are one of the first kinds of patterns presented in
-this book. This kind of pattern is simply a regexp constant in the
-pattern part of a rule. Its meaning is `$0 ~ /PATTERN/'. The pattern
-matches when the input record matches the regexp. For example:
-
- /foo|bar|baz/ { buzzwords++ }
- END { print buzzwords, "buzzwords seen" }
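Here is the buzzword counter run on a few made-up lines. A record counts once even if it contains several buzzwords, because the pattern matches whole records. This sketch prints `buzzwords + 0' rather than `buzzwords', so that a run with no matches prints `0' instead of an empty string. That change is an addition to the program shown above.

```shell
count_buzzwords() {
    awk '/foo|bar|baz/ { buzzwords++ }
         END { print buzzwords + 0, "buzzwords seen" }'
}

# "foo and baz" matches once; "nothing here" does not match.
printf '%s\n' 'food' 'nothing here' 'crowbar' 'foo and baz' | count_buzzwords
```

This prints `3 buzzwords seen'.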
-
-
-File: gawk.info, Node: Expression Patterns, Next: Ranges, Prev: Regexp Patterns, Up: Pattern Overview
-
-7.1.2 Expressions as Patterns
------------------------------
-
-Any `awk' expression is valid as an `awk' pattern. The pattern matches
-if the expression's value is nonzero (if a number) or non-null (if a
-string). The expression is reevaluated each time the rule is tested
-against a new input record. If the expression uses fields such as
-`$1', the value depends directly on the new input record's text;
-otherwise, it depends on only what has happened so far in the execution
-of the `awk' program.
-
- Comparison expressions, using the comparison operators described in
-*note Typing and Comparison::, are a very common kind of pattern.
-Regexp matching and nonmatching are also very common expressions. The
-left operand of the `~' and `!~' operators is a string. The right
-operand is either a constant regular expression enclosed in slashes
-(`/REGEXP/'), or any expression whose string value is used as a dynamic
-regular expression (*note Computed Regexps::). The following example
-prints the second field of each input record whose first field is
-precisely `foo':
-
- $ awk '$1 == "foo" { print $2 }' BBS-list
-
-(There is no output, because there is no BBS site with the exact name
-`foo'.) Contrast this with the following regular expression match,
-which accepts any record with a first field that contains `foo':
-
- $ awk '$1 ~ /foo/ { print $2 }' BBS-list
- -| 555-1234
- -| 555-6699
- -| 555-6480
- -| 555-2127
-
- A regexp constant as a pattern is also a special case of an
-expression pattern. The expression `/foo/' has the value one if `foo'
-appears in the current input record. Thus, as a pattern, `/foo/'
-matches any record containing `foo'.
-
- Boolean expressions are also commonly used as patterns. Whether the
-pattern matches an input record depends on whether its subexpressions
-match. For example, the following command prints all the records in
-`BBS-list' that contain both `2400' and `foo':
-
- $ awk '/2400/ && /foo/' BBS-list
- -| fooey 555-1234 2400/1200/300 B
-
- The following command prints all records in `BBS-list' that contain
-_either_ `2400' or `foo' (or both, of course):
-
- $ awk '/2400/ || /foo/' BBS-list
- -| alpo-net 555-3412 2400/1200/300 A
- -| bites 555-1675 2400/1200/300 A
- -| fooey 555-1234 2400/1200/300 B
- -| foot 555-6699 1200/300 B
- -| macfoo 555-6480 1200/300 A
- -| sdace 555-3430 2400/1200/300 A
- -| sabafoo 555-2127 1200/300 C
-
- The following command prints all records in `BBS-list' that do _not_
-contain the string `foo':
-
- $ awk '! /foo/' BBS-list
- -| aardvark 555-5553 1200/300 B
- -| alpo-net 555-3412 2400/1200/300 A
- -| barfly 555-7685 1200/300 A
- -| bites 555-1675 2400/1200/300 A
- -| camelot 555-0542 300 C
- -| core 555-2912 1200/300 C
- -| sdace 555-3430 2400/1200/300 A
-
- The subexpressions of a Boolean operator in a pattern can be
-constant regular expressions, comparisons, or any other `awk'
-expressions. Range patterns are not expressions, so they cannot appear
-inside Boolean patterns. Likewise, the special patterns `BEGIN', `END',
-`BEGINFILE' and `ENDFILE', which never match any input record, are not
-expressions and cannot appear inside Boolean patterns.
-
- The precedence of the different operators which can appear in
-patterns is described in *note Precedence::.
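To try these patterns without the `BBS-list' file, the sketch below feeds a few records copied from the sample output above. The spacing is simplified, and the helper name `bbs_sample' is hypothetical.

```shell
bbs_sample() {
    printf '%s\n' \
        'alpo-net 555-3412 2400/1200/300 A' \
        'fooey 555-1234 2400/1200/300 B' \
        'foot 555-6699 1200/300 B' \
        'camelot 555-0542 300 C'
}

bbs_sample | awk '/2400/ && /foo/'   # fooey
bbs_sample | awk '/2400/ || /foo/'   # alpo-net, fooey, foot
bbs_sample | awk '! /foo/'           # alpo-net, camelot
bbs_sample | awk '$1 == "foo"'       # no output: no first field is exactly foo
```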
-
-
-File: gawk.info, Node: Ranges, Next: BEGIN/END, Prev: Expression Patterns, Up: Pattern Overview
-
-7.1.3 Specifying Record Ranges with Patterns
---------------------------------------------
-
-A "range pattern" is made of two patterns separated by a comma, in the
-form `BEGPAT, ENDPAT'. It is used to match ranges of consecutive input
-records. The first pattern, BEGPAT, controls where the range begins,
-while ENDPAT controls where the range ends. For example, the
-following:
-
- awk '$1 == "on", $1 == "off"' myfile
-
-prints every record in `myfile' between `on'/`off' pairs, inclusive.
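The on/off range can be tried on made-up input. Both the `on' and `off' records are printed. Because the last range is never closed, it stays on until the end of the input. The wrapper name is hypothetical.

```shell
on_off() {
    awk '$1 == "on", $1 == "off"'
}

# Prints: on, b, off, on, d ("a" and "c" fall outside any range).
printf '%s\n' a on b off c on d | on_off
```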
-
- A range pattern starts out by matching BEGPAT against every input
-record. When a record matches BEGPAT, the range pattern is "turned on"
-and the range pattern matches this record as well. As long as the
-range pattern stays turned on, it automatically matches every input
-record read. The range pattern also matches ENDPAT against every input
-record; when this succeeds, the range pattern is turned off again for
-the following record. Then the range pattern goes back to checking
-BEGPAT against each record.
-
- The record that turns on the range pattern and the one that turns it
-off both match the range pattern. If you don't want to operate on
-these records, you can write `if' statements in the rule's action to
-distinguish them from the records you are interested in.
-
- It is possible for a pattern to be turned on and off by the same
-record. If the record satisfies both conditions, then the action is
-executed for just that record. For example, suppose there is text
-between two identical markers (e.g., the `%' symbol), each on its own
-line, that should be ignored. A first attempt would be to combine a
-range pattern that describes the delimited text with the `next'
-statement (not discussed yet, *note Next Statement::). This causes
-`awk' to skip any further processing of the current record and start
-over again with the next input record. Such a program looks like this:
-
- /^%$/,/^%$/ { next }
- { print }
-
-This program fails because the range pattern is both turned on and
-turned off by the first line, which just has a `%' on it. To
-accomplish this task, write the program in the following manner, using
-a flag:
-
- /^%$/ { skip = ! skip; next }
- skip == 1 { next } # skip lines with `skip' set
- { print }
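The difference between the two approaches shows up on a small made-up input with one `%'-delimited block. The function names are hypothetical. The range version lets the hidden line through, because each `%' line turns the range on and off at once.

```shell
# First attempt: the range opens and closes on the same "%" line.
range_attempt() {
    awk '/^%$/,/^%$/ { next }
         { print }'
}

# Flag-based version: each "%" line toggles skip.
flag_version() {
    awk '/^%$/ { skip = ! skip; next }
         skip == 1 { next }
         { print }'
}

sample_text() { printf '%s\n' keep1 % hidden % keep2; }

sample_text | range_attempt   # keep1, hidden, keep2
sample_text | flag_version    # keep1, keep2
```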
-
- In a range pattern, the comma (`,') has the lowest precedence of all
-the operators (i.e., it is evaluated last). Thus, the following
-program attempts to combine a range pattern with another, simpler test:
-
- echo Yes | awk '/1/,/2/ || /Yes/'
-
- The intent of this program is `(/1/,/2/) || /Yes/'. However, `awk'
-interprets this as `/1/, (/2/ || /Yes/)'. This cannot be changed or
-worked around; range patterns do not combine with other patterns:
-
- $ echo Yes | gawk '(/1/,/2/) || /Yes/'
- error--> gawk: cmd. line:1: (/1/,/2/) || /Yes/
- error--> gawk: cmd. line:1: ^ syntax error
-
-
-File: gawk.info, Node: BEGIN/END, Next: BEGINFILE/ENDFILE, Prev: Ranges, Up: Pattern Overview
-
-7.1.4 The `BEGIN' and `END' Special Patterns
---------------------------------------------
-
-All the patterns described so far are for matching input records. The
-`BEGIN' and `END' special patterns are different. They supply startup
-and cleanup actions for `awk' programs. `BEGIN' and `END' rules must
-have actions; there is no default action for these rules because there
-is no current record when they run. `BEGIN' and `END' rules are often
-referred to as "`BEGIN' and `END' blocks" by long-time `awk'
-programmers.
-
-* Menu:
-
-* Using BEGIN/END:: How and why to use BEGIN/END rules.
-* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
-
-
-File: gawk.info, Node: Using BEGIN/END, Next: I/O And BEGIN/END, Up: BEGIN/END
-
-7.1.4.1 Startup and Cleanup Actions
-...................................
-
-A `BEGIN' rule is executed once only, before the first input record is
-read. Likewise, an `END' rule is executed once only, after all the
-input is read. For example:
-
- $ awk '
- > BEGIN { print "Analysis of \"foo\"" }
- > /foo/ { ++n }
- > END { print "\"foo\" appears", n, "times." }' BBS-list
- -| Analysis of "foo"
- -| "foo" appears 4 times.
-
- This program finds the number of records in the input file `BBS-list'
-that contain the string `foo'. The `BEGIN' rule prints a title for the
-report. There is no need to use the `BEGIN' rule to initialize the
-counter `n' to zero, since `awk' does this automatically (*note
-Variables::). The second rule increments the variable `n' every time a
-record containing the pattern `foo' is read. The `END' rule prints the
-value of `n' at the end of the run.
-
- The special patterns `BEGIN' and `END' cannot be used in ranges or
-with Boolean operators (indeed, they cannot be used with any operators).
-An `awk' program may have multiple `BEGIN' and/or `END' rules. They
-are executed in the order in which they appear: all the `BEGIN' rules
-at startup and all the `END' rules at termination. `BEGIN' and `END'
-rules may be intermixed with other rules. This feature was added in
-the 1987 version of `awk' and is included in the POSIX standard. The
-original (1978) version of `awk' required the `BEGIN' rule to be placed
-at the beginning of the program, the `END' rule to be placed at the
-end, and only allowed one of each. This is no longer required, but it
-is a good idea to follow this template in terms of program organization
-and readability.
-
- Multiple `BEGIN' and `END' rules are useful for writing library
-functions, because each library file can have its own `BEGIN' and/or
-`END' rule to do its own initialization and/or cleanup. The order in
-which library functions are named on the command line controls the
-order in which their `BEGIN' and `END' rules are executed. Therefore,
-you have to be careful when writing such rules in library files so that
-the order in which they are executed doesn't matter. *Note Options::,
-for more information on using library functions. *Note Library
-Functions::, for a number of useful library functions.
-
- If an `awk' program has only `BEGIN' rules and no other rules, then
-the program exits after the `BEGIN' rule is run.(1) However, if an
-`END' rule exists, then the input is read, even if there are no other
-rules in the program. This is necessary in case the `END' rule checks
-the `FNR' and `NR' variables.
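The two cases can be compared with a sketch that uses hypothetical wrapper names. A program with only a `BEGIN' rule exits without reading its input. Adding an `END' rule makes `awk' read all the input, so `NR' holds the record count there.

```shell
begin_only() { awk 'BEGIN { print "no input needed" }'; }
end_count()  { awk 'END { print NR, "records read" }'; }

printf '%s\n' a b c | begin_only   # input is never read
printf '%s\n' a b c | end_count    # 3 records read
```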
-
- ---------- Footnotes ----------
-
- (1) The original version of `awk' kept reading and ignoring input
-until the end of the file was seen.
-
-
-File: gawk.info, Node: I/O And BEGIN/END, Prev: Using BEGIN/END, Up: BEGIN/END
-
-7.1.4.2 Input/Output from `BEGIN' and `END' Rules
-.................................................
-
-There are several (sometimes subtle) points to remember when doing I/O
-from a `BEGIN' or `END' rule. The first has to do with the value of
-`$0' in a `BEGIN' rule. Because `BEGIN' rules are executed before any
-input is read, there simply is no input record, and therefore no
-fields, when executing `BEGIN' rules. References to `$0' and the fields
-yield a null string or zero, depending upon the context. One way to
-give `$0' a real value is to execute a `getline' command without a
-variable (*note Getline::). Another way is simply to assign a value to
-`$0'.
-
- The second point is similar to the first but from the other
-direction. Traditionally, due largely to implementation issues, `$0'
-and `NF' were _undefined_ inside an `END' rule. The POSIX standard
-specifies that `NF' is available in an `END' rule. It contains the
-number of fields from the last input record. Most probably due to an
-oversight, the standard does not say that `$0' is also preserved,
-although logically one would think that it should be. In fact, `gawk'
-does preserve the value of `$0' for use in `END' rules. Be aware,
-however, that Brian Kernighan's `awk', and possibly other
-implementations, do not.
-
- The third point follows from the first two. The meaning of `print'
-inside a `BEGIN' or `END' rule is the same as always: `print $0'. If
-`$0' is the null string, then this prints an empty record. Many
-long-time `awk' programmers use an unadorned `print' in `BEGIN' and `END'
-rules, to mean `print ""', relying on `$0' being null. Although one
-might generally get away with this in `BEGIN' rules, it is a very bad
-idea in `END' rules, at least in `gawk'. It is also poor style, since
-if an empty line is needed in the output, the program should print one
-explicitly.
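The sketch below illustrates the first three points. A plain `getline' in `BEGIN' gives `$0' a value. `NF' in `END' still describes the last record, as POSIX requires. The empty line is printed explicitly with `print ""'. It does not rely on `$0' in `END', because implementations differ there. The wrapper name is hypothetical.

```shell
io_demo() {
    awk 'BEGIN { getline; print "first record, read in BEGIN:", $1 }
         END   { print "NF of last record:", NF; print "" }'
}

printf '%s\n' 'a b' 'c d e' | io_demo
```

The output is `first record, read in BEGIN: a', then `NF of last record: 3', then an empty line.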
-
- Finally, the `next' and `nextfile' statements are not allowed in a
-`BEGIN' rule, because the implicit
-read-a-record-and-match-against-the-rules loop has not started yet.
-Similarly, those statements are not valid in an `END' rule, since all
-the input has been read. (*Note Next Statement::, and see *note
-Nextfile Statement::.)
-
-
-File: gawk.info, Node: BEGINFILE/ENDFILE, Next: Empty, Prev: BEGIN/END, Up: Pattern Overview
-
-7.1.5 The `BEGINFILE' and `ENDFILE' Special Patterns
-----------------------------------------------------
-
-This minor node describes a `gawk'-specific feature.
-
- Two special kinds of rule, `BEGINFILE' and `ENDFILE', give you
-"hooks" into `gawk''s command-line file processing loop. As with the
-`BEGIN' and `END' rules (*note BEGIN/END::), all `BEGINFILE' rules in a
-program are merged, in the order they are read by `gawk', and all
-`ENDFILE' rules are merged as well.
-
- The body of the `BEGINFILE' rules is executed just before `gawk'
-reads the first record from a file. `FILENAME' is set to the name of
-the current file, and `FNR' is set to zero.
-
- The `BEGINFILE' rule lets you perform two tasks that would otherwise
-be difficult or impossible:
-
- * You can test if the file is readable. Normally, it is a fatal
- error if a file named on the command line cannot be opened for
- reading. However, you can bypass the fatal error and move on to
- the next file on the command line.
-
- You do this by checking if the `ERRNO' variable is not the empty
- string; if so, then `gawk' was not able to open the file. In this
- case, your program can execute the `nextfile' statement (*note
- Nextfile Statement::). This causes `gawk' to skip the file
- entirely. Otherwise, `gawk' exits with the usual fatal error.
-
- * If you have written extensions that modify the record handling (by
- inserting an "open hook"), you can invoke them at this point,
- before `gawk' has started processing the file. (This is a _very_
- advanced feature, currently used only by the XMLgawk project
- (http://xmlgawk.sourceforge.net).)
-
- The `ENDFILE' rule is called when `gawk' has finished processing the
-last record in an input file. For the last input file, it will be
-called before any `END' rules. The `ENDFILE' rule is executed even for
-empty input files.
-
- Normally, when an error occurs while reading input in the normal input
-processing loop, the error is fatal. However, if an `ENDFILE' rule is
-present, the error becomes non-fatal, and instead `ERRNO' is set. This
-makes it possible to catch and process I/O errors at the level of the
-`awk' program.
-
- The `next' statement (*note Next Statement::) is not allowed inside
-either a `BEGINFILE' or an `ENDFILE' rule. The `nextfile' statement
-(*note Nextfile Statement::) is allowed only inside a `BEGINFILE' rule,
-but not inside an `ENDFILE' rule.
-
- The `getline' statement (*note Getline::) is restricted inside both
-`BEGINFILE' and `ENDFILE'. Only the `getline VARIABLE < FILE' form is
-allowed.
-
- `BEGINFILE' and `ENDFILE' are `gawk' extensions. In most other
-`awk' implementations, or if `gawk' is in compatibility mode (*note
-Options::), they are not special.
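The sketch below uses the unreadable-file check described above. It requires `gawk', so it runs only when a `gawk' binary is found. It skips files it cannot open and reports a per-file record count from `ENDFILE'. The function name and file names are hypothetical. The main rule `{ total++ }' is included so the program also reports a grand total in `END'.

```shell
# gawk only: BEGINFILE and ENDFILE are not special in other awks.
count_per_file() {
    gawk '
    BEGINFILE {
        if (ERRNO != "") {          # gawk could not open this file
            print FILENAME ": unreadable, skipped"
            nextfile                # skip it instead of a fatal error
        }
    }
    { total++ }
    ENDFILE { print FILENAME ":", FNR, "records" }
    END     { print total + 0, "records in all" }
    ' "$@"
}

if command -v gawk >/dev/null 2>&1; then
    tmp=$(mktemp -d)
    printf '%s\n' x y > "$tmp/data"
    count_per_file "$tmp/missing" "$tmp/data"
    rm -r "$tmp"
fi
```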
-
-
-File: gawk.info, Node: Empty, Prev: BEGINFILE/ENDFILE, Up: Pattern Overview
-
-7.1.6 The Empty Pattern
------------------------
-
-An empty (i.e., nonexistent) pattern is considered to match _every_
-input record. For example, the program:
-
- awk '{ print $1 }' BBS-list
-
-prints the first field of every record.
-
-
-File: gawk.info, Node: Using Shell Variables, Next: Action Overview, Prev: Pattern Overview, Up: Patterns and Actions
-
-7.2 Using Shell Variables in Programs
-=====================================
-
-`awk' programs are often used as components in larger programs written
-in shell. For example, it is very common to use a shell variable to
-hold a pattern that the `awk' program searches for. There are two ways
-to get the value of the shell variable into the body of the `awk'
-program.
-
- The most common method is to use shell quoting to substitute the
-variable's value into the program inside the script. For example, in
-the following program:
-
- printf "Enter search pattern: "
- read pattern
- awk "/$pattern/ "'{ nmatches++ }
- END { print nmatches, "found" }' /path/to/data
-
-the `awk' program consists of two pieces of quoted text that are
-concatenated together to form the program. The first part is
-double-quoted, which allows substitution of the `pattern' shell
-variable inside the quotes. The second part is single-quoted.
-
- Variable substitution via quoting works, but it can be messy. It
-requires a good understanding of the shell's quoting rules
-(*note Quoting::), and it's often difficult to correctly match up the
-quotes when reading the program.
-
- A better method is to use `awk''s variable assignment feature (*note
-Assignment Options::) to assign the shell variable's value to an `awk'
-variable. Then use dynamic regexps to match the pattern (*note
-Computed Regexps::). The following shows how to redo the previous
-example using this technique:
-
- printf "Enter search pattern: "
- read pattern
- awk -v pat="$pattern" '$0 ~ pat { nmatches++ }
- END { print nmatches, "found" }' /path/to/data
-
-Now, the `awk' program is just one single-quoted string. The
-assignment `-v pat="$pattern"' still requires double quotes, in case
-there is whitespace in the value of `$pattern'. The `awk' variable
-`pat' could be named `pattern' too, but that would be more confusing.
-Using a variable also provides more flexibility, since the variable can
-be used anywhere inside the program--for printing, as an array
-subscript, or for any other use--without requiring the quoting tricks
-at every point in the program.
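The `-v' technique is wrapped in a reusable shell function in the sketch below. It takes the pattern as its first argument instead of reading it interactively. The function name is hypothetical. One caveat applies: a value given with `-v' is processed for escape sequences as if it were an `awk' string constant, so backslashes in the pattern may need doubling.

```shell
# count_matches PATTERN: count input lines matching PATTERN as a dynamic regexp.
count_matches() {
    awk -v pat="$1" '$0 ~ pat { nmatches++ }
        END { print nmatches + 0, "found" }'
}

printf '%s\n' apple banana cherry | count_matches 'an'            # 1 found
printf '%s\n' 'red fox' 'blue fox' 'fox' | count_matches 'e f'    # 1 found
```

The double quotes around `"$1"' keep a pattern containing spaces, such as `e f', in one piece.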
-
-
-File: gawk.info, Node: Action Overview, Next: Statements, Prev: Using Shell Variables, Up: Patterns and Actions
-
-7.3 Actions
-===========
-
-An `awk' program or script consists of a series of rules and function
-definitions interspersed. (Functions are described later. *Note
-User-defined::.) A rule contains a pattern and an action, either of
-which (but not both) may be omitted. The purpose of the "action" is to
-tell `awk' what to do once a match for the pattern is found. Thus, in
-outline, an `awk' program generally looks like this:
-
- [PATTERN] { ACTION }
- PATTERN [{ ACTION }]
- ...
- function NAME(ARGS) { ... }
- ...
-
- An action consists of one or more `awk' "statements", enclosed in
-curly braces (`{...}'). Each statement specifies one thing to do. The
-statements are separated by newlines or semicolons. The curly braces
-around an action must be used even if the action contains only one
-statement, or if it contains no statements at all. However, if you
-omit the action entirely, omit the curly braces as well. An omitted
-action is equivalent to `{ print $0 }':
-
- /foo/ { } match `foo', do nothing -- empty action
- /foo/ match `foo', print the record -- omitted action
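The difference between an empty action and an omitted action is easy to see on two lines of input. The wrapper names in this sketch are hypothetical.

```shell
empty_action()   { awk '/foo/ { }'; }   # matches foo, does nothing
omitted_action() { awk '/foo/'; }       # same as /foo/ { print $0 }

printf '%s\n' foo bar | empty_action     # no output
printf '%s\n' foo bar | omitted_action   # foo
```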
-
- The following types of statements are supported in `awk':
-
-Expressions
- Call functions or assign values to variables (*note
- Expressions::). Executing this kind of statement simply computes
- the value of the expression. This is useful when the expression
- has side effects (*note Assignment Ops::).
-
-Control statements
- Specify the control flow of `awk' programs. The `awk' language
- gives you C-like constructs (`if', `for', `while', and `do') as
- well as a few special ones (*note Statements::).
-
-Compound statements
- Consist of one or more statements enclosed in curly braces. A
- compound statement is used in order to put several statements
- together in the body of an `if', `while', `do', or `for' statement.
-
-Input statements
- Use the `getline' command (*note Getline::). Also supplied in
- `awk' are the `next' statement (*note Next Statement::), and the
- `nextfile' statement (*note Nextfile Statement::).
-
-Output statements
- Such as `print' and `printf'. *Note Printing::.
-
-Deletion statements
- For deleting array elements. *Note Delete::.
-
-
-File: gawk.info, Node: Statements, Next: Built-in Variables, Prev: Action Overview, Up: Patterns and Actions
-
-7.4 Control Statements in Actions
-=================================
-
-"Control statements", such as `if', `while', and so on, control the
-flow of execution in `awk' programs. Most of `awk''s control
-statements are patterned after similar statements in C.
-
- All the control statements start with special keywords, such as `if'
-and `while', to distinguish them from simple expressions. Many control
-statements contain other statements. For example, the `if' statement
-contains another statement that may or may not be executed. The
-contained statement is called the "body". To include more than one
-statement in the body, group them into a single "compound statement"
-with curly braces, separating them with newlines or semicolons.
-
-* Menu:
-
-* If Statement:: Conditionally execute some `awk'
- statements.
-* While Statement:: Loop until some condition is satisfied.
-* Do Statement:: Do specified action while looping until some
- condition is satisfied.
-* For Statement:: Another looping statement, that provides
- initialization and increment clauses.
-* Switch Statement:: Switch/case evaluation for conditional
- execution of statements based on a value.
-* Break Statement:: Immediately exit the innermost enclosing loop.
-* Continue Statement:: Skip to the end of the innermost enclosing
- loop.
-* Next Statement:: Stop processing the current input record.
-* Nextfile Statement:: Stop processing the current file.
-* Exit Statement:: Stop execution of `awk'.
-
-
-File: gawk.info, Node: If Statement, Next: While Statement, Up: Statements
-
-7.4.1 The `if'-`else' Statement
--------------------------------
-
-The `if'-`else' statement is `awk''s decision-making statement. It
-looks like this:
-
- if (CONDITION) THEN-BODY [else ELSE-BODY]
-
-The CONDITION is an expression that controls what the rest of the
-statement does. If the CONDITION is true, THEN-BODY is executed;
-otherwise, ELSE-BODY is executed. The `else' part of the statement is
-optional. The condition is considered false if its value is zero or
-the null string; otherwise, the condition is true. Refer to the
-following:
-
- if (x % 2 == 0)
- print "x is even"
- else
- print "x is odd"
-
- In this example, if the expression `x % 2 == 0' is true (that is, if
-the value of `x' is evenly divisible by two), then the first `print'
-statement is executed; otherwise, the second `print' statement is
-executed. If the `else' keyword appears on the same line as THEN-BODY
-and THEN-BODY is not a compound statement (i.e., not surrounded by
-curly braces), then a semicolon must separate THEN-BODY from the `else'.
-To illustrate this, the previous example can be rewritten as:
-
- if (x % 2 == 0) print "x is even"; else
- print "x is odd"
-
-If the `;' is left out, `awk' can't interpret the statement and it
-produces a syntax error. Don't actually write programs this way,
-because a human reader might fail to see the `else' if it is not the
-first thing on its line.
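Both layouts of the parity test are shown below as complete programs reading numbers from input. The recommended multi-line form comes first. The one-line form needs a semicolon before `else'. The function names are hypothetical.

```shell
parity() {
    awk '{
        if ($1 % 2 == 0)
            print $1, "is even"
        else
            print $1, "is odd"
    }'
}

# Same logic on one line: the ";" before else is required.
parity_oneline() {
    awk '{ if ($1 % 2 == 0) print $1, "is even"; else print $1, "is odd" }'
}

printf '%s\n' 1 2 7 | parity   # 1 is odd, 2 is even, 7 is odd
```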
-
-
-File: gawk.info, Node: While Statement, Next: Do Statement, Prev: If Statement, Up: Statements
-
-7.4.2 The `while' Statement
----------------------------
-
-In programming, a "loop" is a part of a program that can be executed
-two or more times in succession. The `while' statement is the simplest
-looping statement in `awk'. It repeatedly executes a statement as long
-as a condition is true. For example:
-
- while (CONDITION)
- BODY
-
-BODY is a statement called the "body" of the loop, and CONDITION is an
-expression that controls how long the loop keeps running. The first
-thing the `while' statement does is test the CONDITION. If the
-CONDITION is true, it executes the statement BODY. (The CONDITION is
-true when the value is not zero and not a null string.) After BODY has
-been executed, CONDITION is tested again, and if it is still true, BODY
-is executed again. This process repeats until the CONDITION is no
-longer true. If the CONDITION is initially false, the body of the loop
-is never executed and `awk' continues with the statement following the
-loop. This example prints the first three fields of each record, one
-per line:
-
-     awk '{
-         i = 1
-         while (i <= 3) {
-             print $i
-             i++
-         }
-     }' inventory-shipped
-
-The body of this loop is a compound statement enclosed in braces,
-containing two statements. The loop works in the following manner:
-first, the value of `i' is set to one. Then, the `while' statement
-tests whether `i' is less than or equal to three. This is true when
-`i' equals one, so the `i'-th field is printed. Then the `i++'
-increments the value of `i' and the loop repeats. The loop terminates
-when `i' reaches four.
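-
-   The example above reads the sample file `inventory-shipped'. To
-try the same loop without that file, a sketch like the following feeds
-two made-up records to `awk' on standard input instead:
-
-     $ printf 'a b c d\ne f g h\n' | awk '{
-     >     i = 1
-     >     while (i <= 3) {
-     >         print $i
-     >         i++
-     >     }
-     > }'
-     -| a
-     -| b
-     -| c
-     -| e
-     -| f
-     -| g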
-
- A newline is not required between the condition and the body;
-however, using one makes the program clearer unless the body is a
-compound statement or else is very simple. The newline after the
-open-brace that begins the compound statement is not required either,
-but the program is harder to read without it.
-
-
-File: gawk.info, Node: Do Statement, Next: For Statement, Prev: While Statement, Up: Statements
-
-7.4.3 The `do'-`while' Statement
---------------------------------
-
-The `do' loop is a variation of the `while' looping statement. The
-`do' loop executes the BODY once and then repeats the BODY as long as
-the CONDITION is true. It looks like this:
-
-     do
-         BODY
-     while (CONDITION)
-
- Even if the CONDITION is false at the start, the BODY is executed at
-least once (and only once, unless executing BODY makes CONDITION true).
-Contrast this with the corresponding `while' statement:
-
-     while (CONDITION)
-         BODY
-
-This statement does not execute BODY even once if the CONDITION is
-false to begin with. The following is an example of a `do' statement:
-
-     {
-         i = 1
-         do {
-             print $0
-             i++
-         } while (i <= 10)
-     }
-
-This program prints each input record 10 times. However, it isn't a
-very realistic example, since in this case an ordinary `while' would do
-just as well. This situation reflects actual experience; only
-occasionally is there a real use for a `do' statement.
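-
-   The difference matters only when CONDITION starts out false. In
-the following illustrative sketch, where the variables `i' and `j' are
-arbitrary, the `do' body runs once even though `i <= 3' is false,
-while the `while' body never runs:
-
-     $ awk 'BEGIN {
-     >     i = 5
-     >     do {
-     >         print "do body ran with i =", i
-     >         i++
-     >     } while (i <= 3)
-     >     j = 5
-     >     while (j <= 3) {
-     >         print "while body ran"    # never reached
-     >         j++
-     >     }
-     >     print "done"
-     > }'
-     -| do body ran with i = 5
-     -| done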
-
-
-File: gawk.info, Node: For Statement, Next: Switch Statement, Prev: Do Statement, Up: Statements
-
-7.4.4 The `for' Statement
--------------------------
-
-The `for' statement makes it more convenient to count iterations of a
-loop. The general form of the `for' statement looks like this:
-
-     for (INITIALIZATION; CONDITION; INCREMENT)
-         BODY
-
-The INITIALIZATION, CONDITION, and INCREMENT parts are arbitrary `awk'
-expressions, and BODY stands for any `awk' statement.
-
- The `for' statement starts by executing INITIALIZATION. Then, as
-long as the CONDITION is true, it repeatedly executes BODY and then
-INCREMENT. Typically, INITIALIZATION sets a variable to either zero or
-one, INCREMENT adds one to it, and CONDITION compares it against the
-desired number of iterations. For example:
-
-     awk '{
-         for (i = 1; i <= 3; i++)
-             print $i
-     }' inventory-shipped
-
-This prints the first three fields of each input record, with one field
-per line.
-
- It isn't possible to set more than one variable in the
-INITIALIZATION part without using a multiple assignment statement such
-as `x = y = 0'. This makes sense only if all the initial values are
-equal. (But it is possible to initialize additional variables by
-writing their assignments as separate statements preceding the `for'
-loop.)
-
- The same is true of the INCREMENT part. Incrementing additional
-variables requires separate statements at the end of the loop. The C
-compound expression, using C's comma operator, is useful in this
-context, but it is not supported in `awk'.
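-
-   The following sketch illustrates both workarounds. The variables
-`sum' and `j' are made up for this illustration. A multiple assignment
-starts `i' and `j' at the same value, `sum' is initialized by a
-separate statement before the loop, and the extra update of `j' is done
-at the end of the body:
-
-     $ awk 'BEGIN {
-     >     sum = 0                          # separate statement before the loop
-     >     for (i = j = 1; i <= 5; i++) {   # i and j both start at 1
-     >         sum += i
-     >         j *= 2                       # extra update, done in the body
-     >     }
-     >     print "sum:", sum, "j:", j
-     > }'
-     -| sum: 15 j: 32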
-
- Most often, INCREMENT is an increment expression, as in the previous
-example. But this is not required; it can be any expression
-whatsoever. For example, the following statement prints all the powers
-of two between 1 and 100:
-
-     for (i = 1; i <= 100; i *= 2)
-         print i
-
- If there is nothing to be done, any of the three expressions in the
-parentheses following the `for' keyword may be omitted. Thus,
-`for (; x > 0;)' is equivalent to `while (x > 0)'. If the CONDITION is
-omitted, it is treated as true, effectively yielding an "infinite loop"
-(i.e., a loop that never terminates).
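-
-   Both points can be checked with a small `BEGIN' rule. In this
-sketch, the first loop uses `i *= 2' as its INCREMENT. The second loop
-omits INITIALIZATION and INCREMENT, so it behaves like `while (x > 0)':
-
-     $ awk 'BEGIN {
-     >     for (i = 1; i <= 100; i *= 2)
-     >         printf "%d ", i
-     >     print ""
-     >     x = 3
-     >     for (; x > 0;)
-     >         print "x =", x--
-     > }'
-     -| 1 2 4 8 16 32 64
-     -| x = 3
-     -| x = 2
-     -| x = 1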
-
- In most cases, a `for' loop is an abbreviation for a `while' loop,
-as shown here:
-
-     INITIALIZATION
-     while (CONDITION) {
-         BODY
-         INCREMENT
-     }
-
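-The only exception is when the `continue' statement (*note Continue
-Statement::) is used inside the loop. Changing a `for' statement to a
-`while' statement in this way can change the effect of the `continue'
-statement inside the loop. In the `for' loop, `continue' still
-executes INCREMENT. In the rewritten `while' loop, it skips the
-INCREMENT at the end of BODY.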
-
- The `awk' language has a `for' statement in addition to a `while'
-statement because a `for' loop is often both less work to type and more
-natural to think of. Counting the number of iterations is very common
-in loops. It can be easier to think of this counting as part of
-looping rather than as something to do inside the loop.
-
- There is an alternate version of the `for' loop, for iterating over
-all the indices of an array:
-
-     for (i in array)
-         DO SOMETHING WITH array[i]
-
-*Note Scanning an Array::, for more information on this version of the
-`for' loop.
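-
-   The order in which such a loop visits the indices is not
-specified, so a quick experiment is safest when its result does not
-depend on that order. The array `price' and its contents are made up
-for this illustration:
-
-     $ awk 'BEGIN {
-     >     price["apple"] = 3; price["pear"] = 5; price["plum"] = 2
-     >     total = 0
-     >     for (item in price)    # visiting order is unspecified
-     >         total += price[item]
-     >     print "total:", total
-     > }'
-     -| total: 10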
-
-
-File: gawk.info, Node: Switch Statement, Next: Break Statement, Prev: For Statement, Up: Statements
-
-7.4.5 The `switch' Statement
-----------------------------
-
-The `switch' statement allows the evaluation of an expression and the
-execution of statements based on a `case' match. Case statements are
-checked for a match in the order they are defined. If no suitable
-`case' is found, the `default' section is executed, if supplied.
-
- Each `case' contains a single constant, be it numeric, string, or
-regexp. The `switch' expression is evaluated, and then each `case''s
-constant is compared against the result in turn. The type of constant
-determines the comparison: numeric or string do the usual comparisons.
-A regexp constant does a regular expression match against the string
-value of the original expression. The general form of the `switch'
-statement looks like this:
-
-     switch (EXPRESSION) {
-     case VALUE OR REGULAR EXPRESSION:
-         CASE-BODY
-     default:
-         DEFAULT-BODY
-     }
-
- Control flow in the `switch' statement works as it does in C. Once a
-match to a given case is made, the case statement bodies execute until
-a `break', `continue', `next', `nextfile' or `exit' is encountered, or
-the end of the `switch' statement itself. For example:
-
-     switch (NR * 2 + 1) {
-     case 3:
-     case "11":
-         print NR - 1
-         break
-
-     case /2[[:digit:]]+/:
-         print NR
-
-     default:
-         print NR + 1
-
-     case -1:
-         print NR * -1
-     }
-
-   Note that if none of the statements listed above halts execution
-of a matched `case', execution falls through to the next `case' until
-one of them is encountered or the end of the `switch' is reached. In
-the above example, the regexp case matches any value whose string form
-contains a `2' followed by one or more digits, since the regexp is not
-anchored. For such a value, that `print' statement is executed, and
-execution then falls through into the `default' section, executing its
-`print' statement. In turn, the -1 case is also executed, since the
-`default' does not halt execution either.
-
- This `switch' statement is a `gawk' extension. If `gawk' is in
-compatibility mode (*note Options::), it is not available.
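-
-   The following sketch shows `break' and fall-through side by side.
-It assumes that `gawk' is installed under that name and is not running
-in compatibility mode. The loop over the values 1 to 3 is there only to
-exercise each `case':
-
-     $ gawk 'BEGIN {
-     >     for (n = 1; n <= 3; n++) {
-     >         switch (n) {
-     >         case 1:
-     >             print n, "one"
-     >             break
-     >         case 2:
-     >             print n, "two"    # no break: falls through
-     >         default:
-     >             print n, "default"
-     >         }
-     >     }
-     > }'
-     -| 1 one
-     -| 2 two
-     -| 2 default
-     -| 3 default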
-
-
-File: gawk.info, Node: Break Statement, Next: Continue Statement, Prev: Switch Statement, Up: Statements
-
-7.4.6 The `break' Statement
----------------------------
-
-The `break' statement jumps out of the innermost `for', `while', or
-`do' loop that encloses it. The following example finds the smallest
-divisor of any integer, and also identifies prime numbers:
-
-     # find smallest divisor of num
-     {
-         num = $1
-         for (div = 2; div * div <= num; div++) {
-             if (num % div == 0)
-                 break
-         }
-         if (num % div == 0)
-             printf "Smallest divisor of %d is %d\n", num, div
-         else
-             printf "%d is prime\n", num
-     }
-
- When the remainder is zero in the first `if' statement, `awk'
-immediately "breaks out" of the containing `for' loop. This means that
-`awk' proceeds immediately to the statement following the loop and
-continues processing. (This is very different from the `exit'
-statement, which stops the entire `awk' program. *Note Exit
-Statement::.)
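-
-   To try the first program without a data file, you can pipe a few
-numbers into it from the shell. The numbers below are arbitrary:
-
-     $ printf '15\n17\n49\n' | awk '{
-     >     num = $1
-     >     for (div = 2; div * div <= num; div++) {
-     >         if (num % div == 0)
-     >             break
-     >     }
-     >     if (num % div == 0)
-     >         printf "Smallest divisor of %d is %d\n", num, div
-     >     else
-     >         printf "%d is prime\n", num
-     > }'
-     -| Smallest divisor of 15 is 3
-     -| 17 is prime
-     -| Smallest divisor of 49 is 7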
-
- The following program illustrates how the CONDITION of a `for' or
-`while' statement could be replaced with a `break' inside an `if':
-
-     # find smallest divisor of num
-     {
-         num = $1
-         for (div = 2; ; div++) {
-             if (num % div == 0) {
-                 printf "Smallest divisor of %d is %d\n", num, div
-                 break
-             }
-             if (div * div > num) {
-                 printf "%d is prime\n", num
-                 break
-             }
-         }
-     }
-
- The `break' statement is also used to break out of the `switch'
-statement. This is discussed in *note Switch Statement::.
-
- The `break' statement has no meaning when used outside the body of a
-loop or `switch'. However, although it was never documented,
-historical implementations of `awk' treated the `break' statement
-outside of a loop as if it were a `next' statement (*note Next
-Statement::). (d.c.) Recent versions of Brian Kernighan's `awk' no
-longer allow this usage, nor does `gawk'.
-
-
-File: gawk.info, Node: Continue Statement, Next: Next Statement, Prev: Break Statement, Up: Statements
-
-7.4.7 The `continue' Statement
-------------------------------
-
-Similar to `break', the `continue' statement is used only inside `for',
-`while', and `do' loops. It skips over the rest of the loop body,
-causing the next cycle around the loop to begin immediately. Contrast
-this with `break', which jumps out of the loop altogether.
-
- The `continue' statement in a `for' loop directs `awk' to skip the
-rest of the body of the loop and resume execution with the
-increment-expression of the `for' statement. The following program
-illustrates this fact:
-
-     BEGIN {
-         for (x = 0; x <= 20; x++) {
-             if (x == 5)
-                 continue
-             printf "%d ", x
-         }
-         print ""
-     }
-
-This program prints all the numbers from 0 to 20--except for 5, for
-which the `printf' is skipped. Because the increment `x++' is not
-skipped, `x' does not remain stuck at 5. Contrast the `for' loop from
-the previous example with the following `while' loop:
-
-     BEGIN {
-         x = 0
-         while (x <= 20) {
-             if (x == 5)
-                 continue
-             printf "%d ", x
-             x++
-         }
-         print ""
-     }
-
-This program loops forever once `x' reaches 5, because the `continue'
-skips the `x++' at the end of the loop body.
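-
-   One way to repair the `while' version, sketched here, is to
-advance `x' before the `continue' so that the loop still makes
-progress:
-
-     $ awk 'BEGIN {
-     >     x = 0
-     >     while (x <= 20) {
-     >         if (x == 5) {
-     >             x++         # advance before skipping the rest of the body
-     >             continue
-     >         }
-     >         printf "%d ", x
-     >         x++
-     >     }
-     >     print ""
-     > }'
-     -| 0 1 2 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20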
-
- The `continue' statement has no special meaning with respect to the
-`switch' statement, nor does it have any meaning when used outside the
-body of a loop. Historical versions of `awk' treated a `continue'
-statement outside a loop the same way they treated a `break' statement
-outside a loop: as if it were a `next' statement (*note Next
-Statement::). (d.c.) Recent versions of Brian Kernighan's `awk' no
-longer work this way, nor does `gawk'.
-
-
-File: gawk.info, Node: Next Statement, Next: Nextfile Statement, Prev: Continue Statement, Up: Statements
-
-7.4.8 The `next' Statement
---------------------------
-
-The `next' statement forces `awk' to immediately stop processing the
-current record and go on to the next record. This means that no
-further rules are executed for the current record, and the rest of the
-current rule's action isn't executed.
-
- Contrast this with the effect of the `getline' function (*note
-Getline::). That also causes `awk' to read the next record
-immediately, but it does not alter the flow of control in any way
-(i.e., the rest of the current action executes with a new input record).
-
- At the highest level, `awk' program execution is a loop that reads
-an input record and then tests each rule's pattern against it. If you
-think of this loop as a `for' statement whose body contains the rules,
-then the `next' statement is analogous to a `continue' statement. It
-skips to the end of the body of this implicit loop and executes the
-increment (which reads another record).
-
- For example, suppose an `awk' program works only on records with
-four fields, and it shouldn't fail when given bad input. To avoid
-complicating the rest of the program, write a "weed out" rule near the
-beginning, in the following manner:
-
-     NF != 4 {
-         printf("%s:%d: skipped: NF != 4\n", FILENAME, FNR) > "/dev/stderr"
-         next
-     }
-
-Because of the `next' statement, the program's subsequent rules won't
-see the bad record. The error message is redirected to the standard
-error output stream, as error messages should be. For more detail see
-*note Special Files::.
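-
-   The weed-out rule can be tried on a few made-up records. In this
-sketch, the second rule (which just prints the first field) stands in
-for the rest of a real program. `FILENAME' is left out of the message
-because the input comes from a pipe. The records with four fields are
-printed, and the message `record 2 skipped: NF != 4' appears on
-standard error:
-
-     $ printf 'a b c d\nbad record\ne f g h\n' | awk '
-     > NF != 4 {
-     >     printf("record %d skipped: NF != 4\n", FNR) > "/dev/stderr"
-     >     next
-     > }
-     > { print $1 }'
-     -| a
-     -| e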
-
- If the `next' statement causes the end of the input to be reached,
-then the code in any `END' rules is executed. *Note BEGIN/END::.
-
- The `next' statement is not allowed inside `BEGINFILE' and `ENDFILE'
-rules. *Note BEGINFILE/ENDFILE::.
-
- According to the POSIX standard, the behavior is undefined if the
-`next' statement is used in a `BEGIN' or `END' rule. `gawk' treats it
-as a syntax error. Although POSIX permits it, some other `awk'
-implementations don't allow the `next' statement inside function bodies
-(*note User-defined::). Just as with any other `next' statement, a
-`next' statement inside a function body reads the next record and
-starts processing it with the first rule in the program.
-
-
-File: gawk.info, Node: Nextfile Statement, Next: Exit Statement, Prev: Next Statement, Up: Statements
-
-7.4.9 Using `gawk''s `nextfile' Statement
------------------------------------------
-
-`gawk' provides the `nextfile' statement, which is similar to the
-`next' statement. (c.e.) However, instead of abandoning processing of
-the current record, the `nextfile' statement instructs `gawk' to stop
-processing the current data file.
-
- The `nextfile' statement is a `gawk' extension. In most other `awk'
-implementations, or if `gawk' is in compatibility mode (*note
-Options::), `nextfile' is not special.
-
-   Upon execution of the `nextfile' statement, the following happens:
-any `ENDFILE' rules are executed (except in the `BEGINFILE' case
-described below), `FILENAME' is updated to the name of the next data
-file listed on the command line, `FNR' is reset to one, `ARGIND' is
-incremented, any `BEGINFILE' rules are executed, and processing starts
-over with the first rule in the program. (`ARGIND' hasn't been
-introduced yet. *Note Built-in Variables::.) If the `nextfile'
-statement causes the end of the input to be reached, then the code in
-any `END' rules is executed. An exception to this is when `nextfile'
-is invoked during execution of any statement in an `END' rule; in this
-case, it causes the program to stop immediately. *Note BEGIN/END::.
-
- The `nextfile' statement is useful when there are many data files to
-process but it isn't necessary to process every record in every file.
-Normally, in order to move on to the next data file, a program has to
-continue scanning the unwanted records. The `nextfile' statement
-accomplishes this much more efficiently.
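-
-   The following sketch prints only the first record of each data
-file. It assumes a POSIX shell with `mktemp' and the `gawk' executable
-(`ARGIND' is specific to `gawk'). The two small files are created only
-for the demonstration:
-
-     $ dir=$(mktemp -d)
-     $ printf 'first of one\nsecond of one\n' > "$dir/one"
-     $ printf 'first of two\nsecond of two\n' > "$dir/two"
-     $ gawk '{ print ARGIND ": " $0; nextfile }' "$dir/one" "$dir/two"
-     -| 1: first of one
-     -| 2: first of two
-     $ rm -r "$dir"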
-
- In addition, `nextfile' is useful inside a `BEGINFILE' rule to skip
-over a file that would otherwise cause `gawk' to exit with a fatal
-error. In this case, `ENDFILE' rules are not executed. *Note
-BEGINFILE/ENDFILE::.
-
- While one might think that `close(FILENAME)' would accomplish the
-same as `nextfile', this isn't true. `close()' is reserved for closing
-files, pipes, and coprocesses that are opened with redirections. It is
-not related to the main processing that `awk' does with the files
-listed in `ARGV'.
-
-   The current version of Brian Kernighan's `awk' (*note Other
-Versions::) also supports `nextfile'. However, it doesn't allow the
-`nextfile' statement inside function bodies (*note User-defined::).
-`gawk' does. A `nextfile' inside a function body skips the rest of the
-current data file and starts processing the next file's first record
-with the first rule in the program, just as any other `nextfile'
-statement does.
-
-
-File: gawk.info, Node: Exit Statement, Prev: Nextfile Statement, Up: Statements
-
-7.4.10 The `exit' Statement
----------------------------
-
-The `exit' statement causes `awk' to immediately stop executing the
-current rule and to stop processing input; any remaining input is
-ignored. The `exit' statement is written as follows:
-
-     exit [RETURN CODE]
-
- When an `exit' statement is executed from a `BEGIN' rule, the
-program stops processing everything immediately. No input records are
-read. However, if an `END' rule is present, as part of executing the
-`exit' statement, the `END' rule is executed (*note BEGIN/END::). If
-`exit' is used in the body of an `END' rule, it causes the program to
-stop immediately.
-
- An `exit' statement that is not part of a `BEGIN' or `END' rule
-stops the execution of any further automatic rules for the current
-record, skips reading any remaining input records, and executes the
-`END' rule if there is one. Any `ENDFILE' rules are also skipped; they
-are not executed.
-
- In such a case, if you don't want the `END' rule to do its job, set
-a variable to nonzero before the `exit' statement and check that
-variable in the `END' rule. *Note Assert Function::, for an example
-that does this.
-
- If an argument is supplied to `exit', its value is used as the exit
-status code for the `awk' process. If no argument is supplied, `exit'
-causes `awk' to return a "success" status. In the case where an
-argument is supplied to a first `exit' statement, and then `exit' is
-called a second time from an `END' rule with no argument, `awk' uses
-the previously supplied exit value. (d.c.) *Note Exit Status::, for
-more information.
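-
-   The following sketch uses made-up input to combine these points.
-The flag `failed' (a name chosen for this illustration) tells the `END'
-rule that `exit' was called. Because the `END' rule does not call
-`exit' itself, the shell sees the status 3 that was supplied to the
-first `exit':
-
-     $ printf '1\n2\n3\n' | awk '
-     > $1 == 2 { failed = 1; exit 3 }
-     > { print "saw", $1 }
-     > END {
-     >     if (failed)
-     >         print "stopped early at record", NR
-     > }'
-     -| saw 1
-     -| stopped early at record 2
-     $ echo $?
-     -| 3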
-
- For example, suppose an error condition occurs that is difficult or
-impossible to handle. Conventionally, programs report this by exiting
-with a nonzero status. An `awk' program can do this using an `exit'
-statement with a nonzero argument, as shown in the following example:
-
-     BEGIN {
-         if (("date" | getline date_now) <= 0) {
-             print "Can't get system date" > "/dev/stderr"
-             exit 1
-         }
-         print "current date is", date_now
-         close("date")
-     }
-
- NOTE: For full portability, exit values should be between zero and
- 126, inclusive. Negative values, and values of 127 or greater,
- may not produce consistent results across different operating
- systems.
-
-
-File: gawk.info, Node: Built-in Variables, Prev: Statements, Up: Patterns and Actions
-
-7.5 Built-in Variables
-======================
-
-Most `awk' variables are available to use for your own purposes; they
-never change unless your program assigns values to them, and they never
-affect anything unless your program examines them. However, a few
-variables in `awk' have special built-in meanings. `awk' examines some
-of these automatically, so that they enable you to tell `awk' how to do
-certain things. Others are set automatically by `awk', so that they
-carry information from the internal workings of `awk' to your program.
-
- This minor node documents all the built-in variables of `gawk', most
-of which are also documented in the chapters describing their areas of
-activity.
-
-* Menu:
-
-* User-modified:: Built-in variables that you change to control
- `awk'.
-* Auto-set:: Built-in variables where `awk' gives
- you information.
-* ARGC and ARGV:: Ways to use `ARGC' and `ARGV'.
-
-
-File: gawk.info, Node: User-modified, Next: Auto-set, Up: Built-in Variables
-
-7.5.1 Built-in Variables That Control `awk'
--------------------------------------------
-
-The following is an alphabetical list of variables that you can change
-to control how `awk' does certain things. The variables that are
-specific to `gawk' are marked with a pound sign (`#').
-
-`BINMODE #'
- On non-POSIX systems, this variable specifies use of binary mode
- for all I/O. Numeric values of one, two, or three specify that
- input files, output files, or all files, respectively, should use
- binary I/O. A numeric value less than zero is treated as zero,
- and a numeric value greater than three is treated as three.
- Alternatively, string values of `"r"' or `"w"' specify that input
- files and output files, respectively, should use binary I/O. A
- string value of `"rw"' or `"wr"' indicates that all files should
- use binary I/O. Any other string value is treated the same as
- `"rw"', but causes `gawk' to generate a warning message.
- `BINMODE' is described in more detail in *note PC Using::.
-
- This variable is a `gawk' extension. In other `awk'
- implementations (except `mawk', *note Other Versions::), or if
- `gawk' is in compatibility mode (*note Options::), it is not
- special.
-
-`CONVFMT'
- This string controls conversion of numbers to strings (*note
- Conversion::). It works by being passed, in effect, as the first
- argument to the `sprintf()' function (*note String Functions::).
- Its default value is `"%.6g"'. `CONVFMT' was introduced by the
- POSIX standard.
-
-`FIELDWIDTHS #'
- This is a space-separated list of columns that tells `gawk' how to
- split input with fixed columnar boundaries. Assigning a value to
- `FIELDWIDTHS' overrides the use of `FS' and `FPAT' for field
- splitting. *Note Constant Size::, for more information.
-
- If `gawk' is in compatibility mode (*note Options::), then
- `FIELDWIDTHS' has no special meaning, and field-splitting
- operations occur based exclusively on the value of `FS'.
-
-`FPAT #'
- This is a regular expression (as a string) that tells `gawk' to
- create the fields based on text that matches the regular
- expression. Assigning a value to `FPAT' overrides the use of `FS'
- and `FIELDWIDTHS' for field splitting. *Note Splitting By
- Content::, for more information.
-
- If `gawk' is in compatibility mode (*note Options::), then `FPAT'
- has no special meaning, and field-splitting operations occur based
- exclusively on the value of `FS'.
-
-`FS'
- This is the input field separator (*note Field Separators::). The
- value is a single-character string or a multi-character regular
- expression that matches the separations between fields in an input
- record. If the value is the null string (`""'), then each
- character in the record becomes a separate field. (This behavior
- is a `gawk' extension. POSIX `awk' does not specify the behavior
- when `FS' is the null string. Nonetheless, some other versions of
- `awk' also treat `""' specially.)
-
- The default value is `" "', a string consisting of a single space.
- As a special exception, this value means that any sequence of
- spaces, TABs, and/or newlines is a single separator.(1) It also
- causes spaces, TABs, and newlines at the beginning and end of a
- record to be ignored.
-
- You can set the value of `FS' on the command line using the `-F'
- option:
-
-          awk -F, 'PROGRAM' INPUT-FILES
-
- If `gawk' is using `FIELDWIDTHS' or `FPAT' for field splitting,
- assigning a value to `FS' causes `gawk' to return to the normal,
- `FS'-based field splitting. An easy way to do this is to simply
- say `FS = FS', perhaps with an explanatory comment.
-
-`IGNORECASE #'
- If `IGNORECASE' is nonzero or non-null, then all string comparisons
- and all regular expression matching are case independent. Thus,
- regexp matching with `~' and `!~', as well as the `gensub()',
- `gsub()', `index()', `match()', `patsplit()', `split()', and
- `sub()' functions, record termination with `RS', and field
- splitting with `FS' and `FPAT', all ignore case when doing their
- particular regexp operations. However, the value of `IGNORECASE'
- does _not_ affect array subscripting and it does not affect field
- splitting when using a single-character field separator. *Note
- Case-sensitivity::.
-
- If `gawk' is in compatibility mode (*note Options::), then
- `IGNORECASE' has no special meaning. Thus, string and regexp
- operations are always case-sensitive.
-
-`LINT #'
- When this variable is true (nonzero or non-null), `gawk' behaves
- as if the `--lint' command-line option is in effect (*note
- Options::). With a value of `"fatal"', lint warnings become fatal
- errors. With a value of `"invalid"', only warnings about things
- that are actually invalid are issued. (This is not fully
- implemented yet.) Any other true value prints nonfatal warnings.
- Assigning a false value to `LINT' turns off the lint warnings.
-
- This variable is a `gawk' extension. It is not special in other
- `awk' implementations. Unlike the other special variables,
- changing `LINT' does affect the production of lint warnings, even
- if `gawk' is in compatibility mode. Much as the `--lint' and
- `--traditional' options independently control different aspects of
- `gawk''s behavior, the control of lint warnings during program
- execution is independent of the flavor of `awk' being executed.
-
-`OFMT'
- This string controls conversion of numbers to strings (*note
- Conversion::) for printing with the `print' statement. It works
- by being passed as the first argument to the `sprintf()' function
- (*note String Functions::). Its default value is `"%.6g"'.
- Earlier versions of `awk' also used `OFMT' to specify the format
- for converting numbers to strings in general expressions; this is
- now done by `CONVFMT'.
-
-`OFS'
- This is the output field separator (*note Output Separators::).
- It is output between the fields printed by a `print' statement.
- Its default value is `" "', a string consisting of a single space.
-
-`ORS'
- This is the output record separator. It is output at the end of
- every `print' statement. Its default value is `"\n"', the newline
- character. (*Note Output Separators::.)
-
-`PREC #'
- The working precision of arbitrary precision floating-point
- numbers, 53 by default (*note Setting Precision::).
-
-`ROUNDMODE #'
- The rounding mode to use for arbitrary precision arithmetic on
- numbers, by default `"N"' (`roundTiesToEven' in the IEEE-754
- standard) (*note Setting Rounding Mode::).
-
-`RS'
- This is `awk''s input record separator. Its default value is a
- string containing a single newline character, which means that an
- input record consists of a single line of text. It can also be
- the null string, in which case records are separated by runs of
- blank lines. If it is a regexp, records are separated by matches
- of the regexp in the input text. (*Note Records::.)
-
- The ability for `RS' to be a regular expression is a `gawk'
- extension. In most other `awk' implementations, or if `gawk' is
- in compatibility mode (*note Options::), just the first character
- of `RS''s value is used.
-
-`SUBSEP'
- This is the subscript separator. It has the default value of
- `"\034"' and is used to separate the parts of the indices of a
- multidimensional array. Thus, the expression `foo["A", "B"]'
- really accesses `foo["A\034B"]' (*note Multi-dimensional::).
-
-`TEXTDOMAIN #'
- This variable is used for internationalization of programs at the
- `awk' level. It sets the default text domain for specially marked
- string constants in the source text, as well as for the
- `dcgettext()', `dcngettext()' and `bindtextdomain()' functions
- (*note Internationalization::). The default value of `TEXTDOMAIN'
- is `"messages"'.
-
- This variable is a `gawk' extension. In other `awk'
- implementations, or if `gawk' is in compatibility mode (*note
- Options::), it is not special.
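-
-   The following sketch shows several of these variables at work; the
-values assigned are arbitrary. `OFMT' controls how `print' outputs a
-number, and `CONVFMT' controls the conversion in a concatenation. `OFS'
-and `ORS' shape what `print' writes, and `SUBSEP' joins the parts of a
-multidimensional subscript:
-
-     $ awk 'BEGIN {
-     >     CONVFMT = "%.2f"
-     >     OFMT = "%.4f"
-     >     x = 3.14159265
-     >     print x                 # number printed using OFMT
-     >     print (x "")            # string made using CONVFMT
-     >     OFS = "-"; ORS = "|\n"
-     >     print "a", "b"
-     >     a["A", "B"] = 1
-     >     for (k in a)
-     >         print (k == "A" SUBSEP "B")
-     > }'
-     -| 3.1416
-     -| 3.14
-     -| a-b|
-     -| 1|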
-
- ---------- Footnotes ----------
-
- (1) In POSIX `awk', newline does not count as whitespace.
-
-
-File: gawk.info, Node: Auto-set, Next: ARGC and ARGV, Prev: User-modified, Up: Built-in Variables
-
-7.5.2 Built-in Variables That Convey Information
-------------------------------------------------
-
-The following is an alphabetical list of variables that `awk' sets
-automatically on certain occasions in order to provide information to
-your program. The variables that are specific to `gawk' are marked
-with a pound sign (`#').
-
-`ARGC, ARGV'
- The command-line arguments available to `awk' programs are stored
- in an array called `ARGV'. `ARGC' is the number of command-line
- arguments present. *Note Other Arguments::. Unlike most `awk'
- arrays, `ARGV' is indexed from 0 to `ARGC' - 1. In the following
- example:
-
-          $ awk 'BEGIN {
-          >     for (i = 0; i < ARGC; i++)
-          >         print ARGV[i]
-          > }' inventory-shipped BBS-list
-          -| awk
-          -| inventory-shipped
-          -| BBS-list
-
- `ARGV[0]' contains `awk', `ARGV[1]' contains `inventory-shipped',
- and `ARGV[2]' contains `BBS-list'. The value of `ARGC' is three,
- one more than the index of the last element in `ARGV', because the
- elements are numbered from zero.
-
- The names `ARGC' and `ARGV', as well as the convention of indexing
- the array from 0 to `ARGC' - 1, are derived from the C language's
- method of accessing command-line arguments.
-
- The value of `ARGV[0]' can vary from system to system. Also, you
- should note that the program text is _not_ included in `ARGV', nor
- are any of `awk''s command-line options. *Note ARGC and ARGV::,
- for information about how `awk' uses these variables. (d.c.)
-
-`ARGIND #'
- The index in `ARGV' of the current file being processed. Every
- time `gawk' opens a new data file for processing, it sets `ARGIND'
- to the index in `ARGV' of the file name. When `gawk' is
- processing the input files, `FILENAME == ARGV[ARGIND]' is always
- true.
-
- This variable is useful in file processing; it allows you to tell
- how far along you are in the list of data files as well as to
- distinguish between successive instances of the same file name on
- the command line.
-
- While you can change the value of `ARGIND' within your `awk'
- program, `gawk' automatically sets it to a new value when the next
- file is opened.
-
- This variable is a `gawk' extension. In other `awk'
- implementations, or if `gawk' is in compatibility mode (*note
- Options::), it is not special.
-
-`ENVIRON'
- An associative array containing the values of the environment.
- The array indices are the environment variable names; the elements
- are the values of the particular environment variables. For
- example, `ENVIRON["HOME"]' might be `/home/arnold'. Changing this
- array does not affect the environment passed on to any programs
- that `awk' may spawn via redirection or the `system()' function.
-
- Some operating systems may not have environment variables. On
- such systems, the `ENVIRON' array is empty (except for
- `ENVIRON["AWKPATH"]', *note AWKPATH Variable:: and
- `ENVIRON["AWKLIBPATH"]', *note AWKLIBPATH Variable::).
-
-`ERRNO #'
- If a system error occurs during a redirection for `getline',
- during a read for `getline', or during a `close()' operation, then
- `ERRNO' contains a string describing the error.
-
- In addition, `gawk' clears `ERRNO' before opening each
- command-line input file. This enables checking if the file is
- readable inside a `BEGINFILE' pattern (*note BEGINFILE/ENDFILE::).
-
- Otherwise, `ERRNO' works similarly to the C variable `errno'.
- Except for the case just mentioned, `gawk' _never_ clears it (sets
- it to zero or `""'). Thus, you should only expect its value to be
- meaningful when an I/O operation returns a failure value, such as
- `getline' returning -1. You are, of course, free to clear it
- yourself before doing an I/O operation.
-
- This variable is a `gawk' extension. In other `awk'
- implementations, or if `gawk' is in compatibility mode (*note
- Options::), it is not special.
-
-`FILENAME'
- The name of the file that `awk' is currently reading. When no
- data files are listed on the command line, `awk' reads from the
- standard input and `FILENAME' is set to `"-"'. `FILENAME' is
- changed each time a new file is read (*note Reading Files::).
- Inside a `BEGIN' rule, the value of `FILENAME' is `""', since
- there are no input files being processed yet.(1) (d.c.) Note,
- though, that using `getline' (*note Getline::) inside a `BEGIN'
- rule can give `FILENAME' a value.
-
-`FNR'
- The current record number in the current file. `FNR' is
- incremented each time a new record is read (*note Records::). It
- is reinitialized to zero each time a new input file is started.
-
-`NF'
- The number of fields in the current input record. `NF' is set
- each time a new record is read, when a new field is created or
- when `$0' changes (*note Fields::).
-
- Unlike most of the variables described in this node, assigning a
- value to `NF' has the potential to affect `awk''s internal
- workings. In particular, assignments to `NF' can be used to
- create or remove fields from the current record. *Note Changing
- Fields::.
-
-`NR'
- The number of input records `awk' has processed since the
- beginning of the program's execution (*note Records::). `NR' is
- incremented each time a new record is read.
-
-`PROCINFO #'
- The elements of this array provide access to information about the
- running `awk' program. The following elements (listed
- alphabetically) are guaranteed to be available:
-
- `PROCINFO["egid"]'
- The value of the `getegid()' system call.
-
- `PROCINFO["euid"]'
- The value of the `geteuid()' system call.
-
- `PROCINFO["FS"]'
- This is `"FS"' if field splitting with `FS' is in effect,
- `"FIELDWIDTHS"' if field splitting with `FIELDWIDTHS' is in
- effect, or `"FPAT"' if field matching with `FPAT' is in
- effect.
-
- `PROCINFO["gid"]'
- The value of the `getgid()' system call.
-
- `PROCINFO["pgrpid"]'
- The process group ID of the current process.
-
- `PROCINFO["pid"]'
- The process ID of the current process.
-
- `PROCINFO["ppid"]'
- The parent process ID of the current process.
-
- `PROCINFO["sorted_in"]'
- If this element exists in `PROCINFO', its value controls the
- order in which array indices will be processed by `for (index
- in array) ...' loops. Since this is an advanced feature, we
- defer the full description until later; see *note Scanning an
- Array::.
-
- `PROCINFO["strftime"]'
- The default time format string for `strftime()'. Assigning a
- new value to this element changes the default. *Note Time
- Functions::.
-
- `PROCINFO["uid"]'
- The value of the `getuid()' system call.
-
- `PROCINFO["version"]'
- The version of `gawk'.
-
- The following additional elements in the array are available to
- provide information about the MPFR and GMP libraries if your
- version of `gawk' supports arbitrary precision numbers (*note
- Arbitrary Precision Arithmetic::):
-
- `PROCINFO["mpfr_version"]'
- The version of the GNU MPFR library.
-
- `PROCINFO["gmp_version"]'
- The version of the GNU MP library.
-
- `PROCINFO["prec_max"]'
- The maximum precision supported by MPFR.
-
- `PROCINFO["prec_min"]'
- The minimum precision required by MPFR.
-
-     On some systems, the array may contain elements `"group1"'
-     through `"groupN"', where N is the number of supplementary
-     groups that the process has.  Use the `in' operator to test for
-     these elements (*note Reference to Elements::).
-
- The `PROCINFO' array is also used to cause coprocesses to
- communicate over pseudo-ttys instead of through two-way pipes;
- this is discussed further in *note Two-way I/O::.
-
- This array is a `gawk' extension. In other `awk' implementations,
- or if `gawk' is in compatibility mode (*note Options::), it is not
- special.
-
-`RLENGTH'
- The length of the substring matched by the `match()' function
- (*note String Functions::). `RLENGTH' is set by invoking the
- `match()' function. Its value is the length of the matched
- string, or -1 if no match is found.
-
-`RSTART'
- The start-index in characters of the substring that is matched by
- the `match()' function (*note String Functions::). `RSTART' is
- set by invoking the `match()' function. Its value is the position
- of the string where the matched substring starts, or zero if no
- match was found.
-
-`RT #'
- This is set each time a record is read. It contains the input text
- that matched the text denoted by `RS', the record separator.
-
- This variable is a `gawk' extension. In other `awk'
- implementations, or if `gawk' is in compatibility mode (*note
- Options::), it is not special.
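-
-   The following short sessions sketch how `RSTART', `RLENGTH', and
-`RT' behave.  They assume `gawk' is run from a POSIX-style shell, and
-the input strings are arbitrary test data.  `match()' sets `RSTART' and
-`RLENGTH', which can then be passed to `substr()' to extract the
-matched text:
-
-     $ echo 'foo123bar' | gawk '{
-     >     if (match($0, /[0-9]+/))
-     >         print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
-     > }'
-     -| 4 3 123
-
-When `RS' is a regular expression, `RT' holds the text that actually
-ended each record.  The final record here has no terminator, so its
-`RT' is empty:
-
-     $ printf 'a1b22c' | gawk 'BEGIN { RS = "[0-9]+" }
-     >                         { print $0, "[" RT "]" }'
-     -| a [1]
-     -| b [22]
-     -| c []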
-
-Advanced Notes: Changing `NR' and `FNR'
----------------------------------------
-
-`awk' increments `NR' and `FNR' each time it reads a record, instead of
-setting them to the absolute value of the number of records read. This
-means that a program can change these variables and their new values
-are incremented for each record. (d.c.) The following example shows
-this:
-
- $ echo '1
- > 2
- > 3
- > 4' | awk 'NR == 2 { NR = 17 }
- > { print NR }'
- -| 1
- -| 17
- -| 18
- -| 19
-
-Before `FNR' was added to the `awk' language (*note V7/SVR3.1::), many
-`awk' programs used this feature to track the number of records in a
-file by resetting `NR' to zero when `FILENAME' changed.
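-
-   For comparison, when there are two input files and no assignments,
-`FNR' restarts at one for each file while `NR' keeps counting.  In
-this sketch, the file names `f1' and `f2' are only placeholders:
-
-     $ printf 'a\nb\n' > f1
-     $ printf 'c\n' > f2
-     $ awk '{ print FILENAME, FNR, NR }' f1 f2
-     -| f1 1 1
-     -| f1 2 2
-     -| f2 1 3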
-
- ---------- Footnotes ----------
-
- (1) Some early implementations of Unix `awk' initialized `FILENAME'
-to `"-"', even if there were data files to be processed. This behavior
-was incorrect and should not be relied upon in your programs.
-
-
-File: gawk.info, Node: ARGC and ARGV, Prev: Auto-set, Up: Built-in Variables
-
-7.5.3 Using `ARGC' and `ARGV'
------------------------------
-
-*note Auto-set::, presented the following program describing the
-information contained in `ARGC' and `ARGV':
-
- $ awk 'BEGIN {
- > for (i = 0; i < ARGC; i++)
- > print ARGV[i]
- > }' inventory-shipped BBS-list
- -| awk
- -| inventory-shipped
- -| BBS-list
-
-In this example, `ARGV[0]' contains `awk', `ARGV[1]' contains
-`inventory-shipped', and `ARGV[2]' contains `BBS-list'. Notice that
-the `awk' program is not entered in `ARGV'. The other command-line
-options, with their arguments, are also not entered. This includes
-variable assignments done with the `-v' option (*note Options::).
-Normal variable assignments on the command line _are_ treated as
-arguments and do show up in the `ARGV' array. Given the following
-program in a file named `showargs.awk':
-
- BEGIN {
- printf "A=%d, B=%d\n", A, B
- for (i = 0; i < ARGC; i++)
- printf "\tARGV[%d] = %s\n", i, ARGV[i]
- }
- END { printf "A=%d, B=%d\n", A, B }
-
-Running it produces the following:
-
- $ awk -v A=1 -f showargs.awk B=2 /dev/null
- -| A=1, B=0
- -| ARGV[0] = awk
- -| ARGV[1] = B=2
- -| ARGV[2] = /dev/null
- -| A=1, B=2
-
- A program can alter `ARGC' and the elements of `ARGV'. Each time
-`awk' reaches the end of an input file, it uses the next element of
-`ARGV' as the name of the next input file. By storing a different
-string there, a program can change which files are read. Use `"-"' to
-represent the standard input. Storing additional elements and
-incrementing `ARGC' causes additional files to be read.
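-
-   As a sketch of the last point, the following program appends `"-"'
-to `ARGV' and increments `ARGC'.  After the named file (a placeholder
-called `f1'), `awk' therefore also reads the standard input:
-
-     $ printf 'from file\n' > f1
-     $ echo 'from stdin' | awk 'BEGIN { ARGV[ARGC++] = "-" }
-     >                          { print }' f1
-     -| from file
-     -| from stdin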
-
- If the value of `ARGC' is decreased, that eliminates input files
-from the end of the list. By recording the old value of `ARGC'
-elsewhere, a program can treat the eliminated arguments as something
-other than file names.
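-
-   For example, this sketch treats its last command-line argument as a
-regular expression rather than as a file name.  It saves the argument
-and then decrements `ARGC' so that `awk' does not try to open it.  The
-input file `f1' and the pattern `an' are arbitrary:
-
-     $ printf 'apple\nbanana\n' > f1
-     $ awk 'BEGIN { pattern = ARGV[ARGC - 1]; ARGC-- }
-     >      $0 ~ pattern' f1 an
-     -| banana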
-
- To eliminate a file from the middle of the list, store the null
-string (`""') into `ARGV' in place of the file's name. As a special
-feature, `awk' ignores file names that have been replaced with the null
-string. Another option is to use the `delete' statement to remove
-elements from `ARGV' (*note Delete::).
-
- All of these actions are typically done in the `BEGIN' rule, before
-actual processing of the input begins. *Note Split Program::, and see
-*note Tee Program::, for examples of each way of removing elements from
-`ARGV'. The following fragment processes `ARGV' in order to examine,
-and then remove, command-line options:
-
- BEGIN {
- for (i = 1; i < ARGC; i++) {
- if (ARGV[i] == "-v")
- verbose = 1
- else if (ARGV[i] == "-q")
- debug = 1
- else if (ARGV[i] ~ /^-./) {
- e = sprintf("%s: unrecognized option -- %c",
- ARGV[0], substr(ARGV[i], 2, 1))
- print e > "/dev/stderr"
- } else
- break
- delete ARGV[i]
- }
- }
-
- To actually get the options into the `awk' program, end the `awk'
-options with `--' and then supply the `awk' program's options, in the
-following manner:
-
- awk -f myprog -- -v -q file1 file2 ...
-
- This is not necessary in `gawk'. Unless `--posix' has been
-specified, `gawk' silently puts any unrecognized options into `ARGV'
-for the `awk' program to deal with. As soon as it sees an unknown
-option, `gawk' stops looking for other options that it might otherwise
-recognize. The previous example with `gawk' would be:
-
- gawk -f myprog -q -v file1 file2 ...
-
-Because `-q' is not a valid `gawk' option, it and the following `-v'
-are passed on to the `awk' program. (*Note Getopt Function::, for an
-`awk' library function that parses command-line options.)
-
-
-File: gawk.info, Node: Arrays, Next: Functions, Prev: Patterns and Actions, Up: Top
-
-8 Arrays in `awk'
-*****************
-
-An "array" is a table of values called "elements". The elements of an
-array are distinguished by their "indices". Indices may be either
-numbers or strings.
-
- This major node describes how arrays work in `awk', how to use array
-elements, how to scan through every element in an array, and how to
-remove array elements. It also describes how `awk' simulates
-multidimensional arrays, as well as some of the less obvious points
-about array usage. The major node moves on to discuss `gawk''s facility
-for sorting arrays, and ends with a brief description of `gawk''s
-ability to support true multidimensional arrays.
-
- `awk' maintains a single set of names that may be used for naming
-variables, arrays, and functions (*note User-defined::). Thus, you
-cannot have a variable and an array with the same name in the same
-`awk' program.
-
-* Menu:
-
-* Array Basics:: The basics of arrays.
-* Delete:: The `delete' statement removes an element
- from an array.
-* Numeric Array Subscripts:: How to use numbers as subscripts in
- `awk'.
-* Uninitialized Subscripts::    Using uninitialized variables as subscripts.
-* Multi-dimensional:: Emulating multidimensional arrays in
- `awk'.
-* Arrays of Arrays:: True multidimensional arrays.
-
-
-File: gawk.info, Node: Array Basics, Next: Delete, Up: Arrays
-
-8.1 The Basics of Arrays
-========================
-
-This minor node presents the basics: working with elements in arrays
-one at a time, and traversing all of the elements in an array.
-
-* Menu:
-
-* Array Intro:: Introduction to Arrays
-* Reference to Elements:: How to examine one element of an array.
-* Assigning Elements:: How to change an element of an array.
-* Array Example:: Basic Example of an Array
-* Scanning an Array:: A variation of the `for' statement. It
- loops through the indices of an array's
- existing elements.
-* Controlling Scanning:: Controlling the order in which arrays are
- scanned.
-
-
-File: gawk.info, Node: Array Intro, Next: Reference to Elements, Up: Array Basics
-
-8.1.1 Introduction to Arrays
-----------------------------
-
- Doing linear scans over an associative array is like trying to
- club someone to death with a loaded Uzi.
- Larry Wall
-
- The `awk' language provides one-dimensional arrays for storing
-groups of related strings or numbers. Every `awk' array must have a
-name. Array names have the same syntax as variable names; any valid
-variable name would also be a valid array name. But one name cannot be
-used in both ways (as an array and as a variable) in the same `awk'
-program.
-
- Arrays in `awk' superficially resemble arrays in other programming
-languages, but there are fundamental differences. In `awk', it isn't
-necessary to specify the size of an array before starting to use it.
-Additionally, any number or string in `awk', not just consecutive
-integers, may be used as an array index.
-
- In most other languages, arrays must be "declared" before use,
-including a specification of how many elements or components they
-contain. In such languages, the declaration causes a contiguous block
-of memory to be allocated for that many elements. Usually, an index in
-the array must be a nonnegative integer.  For example, the index zero
-specifies the first element in the array, which is actually stored at
-the beginning of the block of memory. Index one specifies the second
-element, which is stored in memory right after the first element, and
-so on. It is impossible to add more elements to the array, because it
-has room only for as many elements as given in the declaration. (Some
-languages allow arbitrary starting and ending indices--e.g., `15 ..
-27'--but the size of the array is still fixed when the array is
-declared.)
-
- A contiguous array of four elements might look like the following
-example, conceptually, if the element values are 8, `"foo"', `""', and
-30:
-
- +---------+---------+--------+---------+
- | 8 | "foo" | "" | 30 | Value
- +---------+---------+--------+---------+
- 0 1 2 3 Index
-
-Only the values are stored; the indices are implicit from the order of
-the values. Here, 8 is the value at index zero, because 8 appears in the
-position with zero elements before it.
-
- Arrays in `awk' are different--they are "associative". This means
-that each array is a collection of pairs: an index and its corresponding
-array element value:
-
- Index 3 Value 30
- Index 1 Value "foo"
- Index 0 Value 8
- Index 2 Value ""
-
-The pairs are shown in jumbled order because their order is irrelevant.
-
- One advantage of associative arrays is that new pairs can be added
-at any time.  For example, suppose an element with index 10 and the
-value `"number ten"' is added to the array.  The result is:
-
- Index 10 Value "number ten"
- Index 3 Value 30
- Index 1 Value "foo"
- Index 0 Value 8
- Index 2 Value ""
-
-Now the array is "sparse", which just means some indices are missing.
-It has elements 0-3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or
-9.
-
- Another consequence of associative arrays is that the indices don't
-have to be positive integers. Any number, or even a string, can be an
-index. For example, the following is an array that translates words
-from English to French:
-
- Index "dog" Value "chien"
- Index "cat" Value "chat"
- Index "one" Value "un"
- Index 1 Value "un"
-
-Here we decided to translate the number one in both spelled-out and
-numeric form--thus illustrating that a single array can have both
-numbers and strings as indices. In fact, array subscripts are always
-strings; this is discussed in more detail in *note Numeric Array
-Subscripts::. Here, the number `1' isn't double-quoted, since `awk'
-automatically converts it to a string.
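-
-   A sketch of this array in `awk' follows; the array name `fr' is
-arbitrary.  It shows that the numeric index `1' and the string index
-`"1"' refer to the same element:
-
-     $ awk 'BEGIN {
-     >     fr["dog"] = "chien"; fr["cat"] = "chat"
-     >     fr["one"] = "un";    fr[1] = "un"
-     >     print fr["dog"], fr[1], fr["1"]
-     > }'
-     -| chien un un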
-
- The value of `IGNORECASE' has no effect upon array subscripting.
-The identical string value used to store an array element must be used
-to retrieve it. When `awk' creates an array (e.g., with the `split()'
-built-in function), that array's indices are consecutive integers
-starting at one. (*Note String Functions::.)
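-
-   A brief sketch of both points follows; the array names `a' and
-`parts' are arbitrary.  Because `IGNORECASE' is a `gawk' extension,
-the session uses `gawk':
-
-     $ gawk 'BEGIN {
-     >     IGNORECASE = 1
-     >     a["Foo"] = 1
-     >     lower = ("foo" in a); exact = ("Foo" in a)
-     >     print lower, exact
-     >     n = split("x:y:z", parts, ":")
-     >     for (i = 1; i <= n; i++)
-     >         print i, parts[i]
-     > }'
-     -| 0 1
-     -| 1 x
-     -| 2 y
-     -| 3 z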
-
-   `awk''s arrays are efficient: on average, the time to access an
-element does not depend on the number of elements in the array.
-
-
-File: gawk.info, Node: Reference to Elements, Next: Assigning Elements, Prev: Array Intro, Up: Array Basics
-
-8.1.2 Referring to an Array Element
------------------------------------
-
-The principal way to use an array is to refer to one of its elements.
-An array reference is an expression as follows:
-
- ARRAY[INDEX-EXPRESSION]
-
-Here, ARRAY is the name of an array. The expression INDEX-EXPRESSION is
-the index of the desired element of the array.
-
- The value of the array reference is the current value of that array
-element. For example, `foo[4.3]' is an expression for the element of
-array `foo' at index `4.3'.
-
- A reference to an array element that has no recorded value yields a
-value of `""', the null string. This includes elements that have not
-been assigned any value as well as elements that have been deleted
-(*note Delete::).
-
- NOTE: A reference to an element that does not exist
- _automatically_ creates that array element, with the null string
- as its value. (In some cases, this is unfortunate, because it
- might waste memory inside `awk'.)
-
- Novice `awk' programmers often make the mistake of checking if an
- element exists by checking if the value is empty:
-
- # Check if "foo" exists in a: Incorrect!
- if (a["foo"] != "") ...
-
- This is incorrect, since this will _create_ `a["foo"]' if it
- didn't exist before!
-
- To determine whether an element exists in an array at a certain
-index, use the following expression:
-
- IND in ARRAY
-
-This expression tests whether the particular index IND exists, without
-the side effect of creating that element if it is not present. The
-expression has the value one (true) if `ARRAY[IND]' exists and zero
-(false) if it does not exist. For example, this statement tests
-whether the array `frequencies' contains the index `2':
-
- if (2 in frequencies)
- print "Subscript 2 is present."
-
- Note that this is _not_ a test of whether the array `frequencies'
-contains an element whose _value_ is two. There is no way to do that
-except to scan all the elements. Also, this _does not_ create
-`frequencies[2]', while the following (incorrect) alternative does:
-
- if (frequencies[2] != "")
- print "Subscript 2 is present."
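-
-   The side effect is easy to observe with `gawk', whose `length()'
-function also accepts an array and returns its number of elements
-(*note String Functions::):
-
-     $ gawk 'BEGIN {
-     >     if (a["foo"] != "")
-     >         print "not reached"
-     >     print length(a)       # a["foo"] was created
-     >     if ("bar" in a)
-     >         print "not reached"
-     >     print length(a)       # "in" created nothing
-     > }'
-     -| 1
-     -| 1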
-
-
-File: gawk.info, Node: Assigning Elements, Next: Array Example, Prev: Reference to Elements, Up: Array Basics
-
-8.1.3 Assigning Array Elements
-------------------------------
-
-Array elements can be assigned values just like `awk' variables:
-
- ARRAY[INDEX-EXPRESSION] = VALUE
-
-ARRAY is the name of an array. The expression INDEX-EXPRESSION is the
-index of the element of the array that is assigned a value. The
-expression VALUE is the value to assign to that element of the array.
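-
-   A common idiom combines assignment with the increment operator to
-count occurrences.  This sketch counts how many times each first field
-appears; the array name `count' is arbitrary:
-
-     $ printf 'b\na\nb\n' | awk '{ count[$1]++ }
-     >                           END { print count["a"], count["b"] }'
-     -| 1 2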
-
-
-File: gawk.info, Node: Array Example, Next: Scanning an Array, Prev: Assigning Elements, Up: Array Basics
-
-8.1.4 Basic Array Example
--------------------------
-
-The following program takes a list of lines, each beginning with a line
-number, and prints them out in order of line number. The line numbers
-are not in order when they are first read--instead they are scrambled.
-This program sorts the lines by making an array using the line numbers
-as subscripts. The program then prints out the lines in sorted order
-of their numbers. It is a very simple program and gets confused upon
-encountering repeated numbers, gaps, or lines that don't begin with a
-number:
-
- {
- if ($1 > max)
- max = $1
- arr[$1] = $0
- }
-
- END {
- for (x = 1; x <= max; x++)
- print arr[x]
- }
-
- The first rule keeps track of the largest line number seen so far;
-it also stores each line into the array `arr', at an index that is the
-line's number. The second rule runs after all the input has been read,
-to print out all the lines. When this program is run with the
-following input:
-
- 5 I am the Five man
- 2 Who are you? The new number two!
- 4 . . . And four on the floor
- 1 Who is number one?
- 3 I three you.
-
-Its output is:
-
- 1 Who is number one?
- 2 Who are you? The new number two!
- 3 I three you.
- 4 . . . And four on the floor
- 5 I am the Five man
-
- If a line number is repeated, the last line with a given number
-overrides the others. Gaps in the line numbers can be handled with an
-easy improvement to the program's `END' rule, as follows:
-
- END {
- for (x = 1; x <= max; x++)
- if (x in arr)
- print arr[x]
- }
-
-
-File: gawk.info, Node: Scanning an Array, Next: Controlling Scanning, Prev: Array Example, Up: Array Basics
-
-8.1.5 Scanning All Elements of an Array
----------------------------------------
-
-In programs that use arrays, it is often necessary to use a loop that
-executes once for each element of an array. In other languages, where
-arrays are contiguous and indices are limited to positive integers,
-this is easy: all the valid indices can be found by counting from the
-lowest index up to the highest. This technique won't do the job in
-`awk', because any number or string can be an array index. So `awk'
-has a special kind of `for' statement for scanning an array:
-
- for (VAR in ARRAY)
- BODY
-
-This loop executes BODY once for each index in ARRAY that the program
-has previously used, with the variable VAR set to that index.
-
- The following program uses this form of the `for' statement. The
-first rule scans the input records and notes which words appear (at
-least once) in the input, by storing a one into the array `used' with
-the word as index. The second rule scans the elements of `used' to
-find all the distinct words that appear in the input. It prints each
-word that is more than 10 characters long and also prints the number of
-such words. *Note String Functions::, for more information on the
-built-in function `length()'.
-
- # Record a 1 for each word that is used at least once
- {
- for (i = 1; i <= NF; i++)
- used[$i] = 1
- }
-
- # Find number of distinct words more than 10 characters long
- END {
- for (x in used) {
- if (length(x) > 10) {
- ++num_long_words
- print x
- }
- }
-         print num_long_words + 0, "words longer than 10 characters"  # "+ 0" prints 0, not "", if none
- }
-
-*Note Word Sorting::, for a more detailed example of this type.
-
- The order in which elements of the array are accessed by this
-statement is determined by the internal arrangement of the array
-elements within `awk' and normally cannot be controlled or changed.
-This can lead to problems if new elements are added to ARRAY by
-statements in the loop body; it is not predictable whether the `for'
-loop will reach them. Similarly, changing VAR inside the loop may
-produce strange results. It is best to avoid such things.
-
-
-File: gawk.info, Node: Controlling Scanning, Prev: Scanning an Array, Up: Array Basics
-
-8.1.6 Using Predefined Array Scanning Orders
---------------------------------------------
-
-By default, when a `for' loop traverses an array, the order is
-undefined, meaning that the `awk' implementation determines the order
-in which the array is traversed. This order is usually based on the
-internal implementation of arrays and may vary from one version of
-`awk' to the next.
-
- Often, though, you may wish to do something simple, such as
-"traverse the array by comparing the indices in ascending order," or
-"traverse the array by comparing the values in descending order."
-`gawk' provides two mechanisms which give you this control.
-
- * Set `PROCINFO["sorted_in"]' to one of a set of predefined values.
- We describe this now.
-
- * Set `PROCINFO["sorted_in"]' to the name of a user-defined function
- to be used for comparison of array elements. This advanced feature
- is described later, in *note Array Sorting::.
-
- The following special values for `PROCINFO["sorted_in"]' are
-available:
-
-`"@unsorted"'
- Array elements are processed in arbitrary order, which is the
- default `awk' behavior.
-
-`"@ind_str_asc"'
- Order by indices compared as strings; this is the most basic sort.
- (Internally, array indices are always strings, so with `a[2*5] = 1'
- the index is `"10"' rather than numeric 10.)
-
-`"@ind_num_asc"'
- Order by indices but force them to be treated as numbers in the
- process. Any index with a non-numeric value will end up
- positioned as if it were zero.
-
-`"@val_type_asc"'
- Order by element values rather than indices. Ordering is by the
- type assigned to the element (*note Typing and Comparison::). All
- numeric values come before all string values, which in turn come
- before all subarrays. (Subarrays have not been described yet;
- *note Arrays of Arrays::).
-
-`"@val_str_asc"'
- Order by element values rather than by indices. Scalar values are
- compared as strings. Subarrays, if present, come out last.
-
-`"@val_num_asc"'
- Order by element values rather than by indices. Scalar values are
- compared as numbers. Subarrays, if present, come out last. When
- numeric values are equal, the string values are used to provide an
- ordering: this guarantees consistent results across different
- versions of the C `qsort()' function,(1) which `gawk' uses
- internally to perform the sorting.
-
-`"@ind_str_desc"'
- Reverse order from the most basic sort.
-
-`"@ind_num_desc"'
- Numeric indices ordered from high to low.
-
-`"@val_type_desc"'
- Element values, based on type, in descending order.
-
-`"@val_str_desc"'
- Element values, treated as strings, ordered from high to low.
- Subarrays, if present, come out first.
-
-`"@val_num_desc"'
- Element values, treated as numbers, ordered from high to low.
- Subarrays, if present, come out first.
-
- The array traversal order is determined before the `for' loop starts
-to run. Changing `PROCINFO["sorted_in"]' in the loop body will not
-affect the loop.
-
- For example:
-
- $ gawk 'BEGIN {
- > a[4] = 4
- > a[3] = 3
- > for (i in a)
- > print i, a[i]
- > }'
- -| 4 4
- -| 3 3
- $ gawk 'BEGIN {
- > PROCINFO["sorted_in"] = "@ind_str_asc"
- > a[4] = 4
- > a[3] = 3
- > for (i in a)
- > print i, a[i]
- > }'
- -| 3 3
- -| 4 4
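-
-   The difference between string and numeric index ordering appears
-as soon as the indices have different numbers of digits.  In this
-sketch, the index `10' sorts before `9' when the indices are compared
-as strings, but after it when they are compared as numbers:
-
-     $ gawk 'BEGIN {
-     >     a[10] = "x"; a[9] = "y"
-     >     PROCINFO["sorted_in"] = "@ind_str_asc"
-     >     for (i in a)
-     >         print "str:", i
-     >     PROCINFO["sorted_in"] = "@ind_num_asc"
-     >     for (i in a)
-     >         print "num:", i
-     > }'
-     -| str: 10
-     -| str: 9
-     -| num: 9
-     -| num: 10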
-
- When sorting an array by element values, if a value happens to be a
-subarray then it is considered to be greater than any string or numeric
-value, regardless of what the subarray itself contains, and all
-subarrays are treated as being equal to each other. Their order
-relative to each other is determined by their index strings.
-
- Here are some additional things to bear in mind about sorted array
-traversal.
-
- * The value of `PROCINFO["sorted_in"]' is global. That is, it affects
- all array traversal `for' loops. If you need to change it within
- your own code, you should see if it's defined and save and restore
- the value:
-
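-          ...
-          # Remember any existing setting, then install our own
-          if ("sorted_in" in PROCINFO)
-              save_sorted = PROCINFO["sorted_in"]
-          PROCINFO["sorted_in"] = "@val_str_desc" # or whatever
-          ...
-          # Restore the old setting, or remove ours if there was none
-          if (save_sorted)
-              PROCINFO["sorted_in"] = save_sorted
-          else
-              delete PROCINFO["sorted_in"]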
-
- * As mentioned, the default array traversal order is represented by
- `"@unsorted"'. You can also get the default behavior by assigning
- the null string to `PROCINFO["sorted_in"]' or by just deleting the
- `"sorted_in"' element from the `PROCINFO' array with the `delete'
- statement. (The `delete' statement hasn't been described yet;
- *note Delete::.)
-
- In addition, `gawk' provides built-in functions for sorting arrays;
-see *note Array Sorting Functions::.
-
- ---------- Footnotes ----------
-
- (1) When two elements compare as equal, the C `qsort()' function
-does not guarantee that they will maintain their original relative
-order after sorting. Using the string value to provide a unique
-ordering when the numeric values are equal ensures that `gawk' behaves
-consistently across different environments.
-
-
-File: gawk.info, Node: Delete, Next: Numeric Array Subscripts, Prev: Array Basics, Up: Arrays
-
-8.2 The `delete' Statement
-==========================
-
-To remove an individual element of an array, use the `delete' statement:
-
- delete ARRAY[INDEX-EXPRESSION]
-
- Once an array element has been deleted, any value the element once
-had is no longer available. It is as if the element had never been
-referred to or been given a value. The following is an example of
-deleting elements in an array:
-
- for (i in frequencies)
- delete frequencies[i]
-
-This example removes all the elements from the array `frequencies'.
-Once an element is deleted, a subsequent `for' statement to scan the
-array does not report that element and the `in' operator to check for
-the presence of that element returns zero (i.e., false):
-
- delete foo[4]
- if (4 in foo)
- print "This will never be printed"
-
- It is important to note that deleting an element is _not_ the same
-as assigning it a null value (the empty string, `""'). For example:
-
- foo[4] = ""
- if (4 in foo)
- print "This is printed, even though foo[4] is empty"
-
- It is not an error to delete an element that does not exist.
-However, if `--lint' is provided on the command line (*note Options::),
-`gawk' issues a warning message when an element that is not in the
-array is deleted.
-
- All the elements of an array may be deleted with a single statement
-(c.e.) by leaving off the subscript in the `delete' statement, as
-follows:
-
- delete ARRAY
-
- This ability is a `gawk' extension; it is not available in
-compatibility mode (*note Options::).
-
- Using this version of the `delete' statement is about three times
-more efficient than the equivalent loop that deletes each element one
-at a time.
-
- The following statement provides a portable but nonobvious way to
-clear out an array:(1)
-
- split("", array)
-
- The `split()' function (*note String Functions::) clears out the
-target array first. This call asks it to split apart the null string.
-Because there is no data to split out, the function simply clears the
-array and then returns.
-
- CAUTION: Deleting an array does not change its type; you cannot
- delete an array and then use the array's name as a scalar (i.e., a
- regular variable). For example, the following does not work:
-
- a[1] = 3
- delete a
- a = 3
-
- ---------- Footnotes ----------
-
- (1) Thanks to Michael Brennan for pointing this out.
-
-
-File: gawk.info, Node: Numeric Array Subscripts, Next: Uninitialized Subscripts, Prev: Delete, Up: Arrays
-
-8.3 Using Numbers to Subscript Arrays
-=====================================
-
-An important aspect to remember about arrays is that _array subscripts
-are always strings_. When a numeric value is used as a subscript, it
-is converted to a string value before being used for subscripting
-(*note Conversion::). This means that the value of the built-in
-variable `CONVFMT' can affect how your program accesses elements of an
-array. For example:
-
- xyz = 12.153
- data[xyz] = 1
- CONVFMT = "%2.2f"
- if (xyz in data)
- printf "%s is in data\n", xyz
- else
- printf "%s is not in data\n", xyz
-
-This prints `12.15 is not in data'. The first statement gives `xyz' a
-numeric value. Assigning to `data[xyz]' subscripts `data' with the
-string value `"12.153"' (using the default conversion value of
-`CONVFMT', `"%.6g"'). Thus, the array element `data["12.153"]' is
-assigned the value one. The program then changes the value of
-`CONVFMT'. The test `(xyz in data)' generates a new string value from
-`xyz'--this time `"12.15"'--because the value of `CONVFMT' only allows
-two digits after the decimal point.  This test fails, since `"12.15"'
-is different from `"12.153"'.
-
- According to the rules for conversions (*note Conversion::), integer
-values are always converted to strings as integers, no matter what the
-value of `CONVFMT' may happen to be. So the usual case of the
-following works:
-
- for (i = 1; i <= maxsub; i++)
-         print array[i]    # or do something else with array[i]
-
- The "integer values always convert to strings as integers" rule has
-an additional consequence for array indexing. Octal and hexadecimal
-constants (*note Nondecimal-numbers::) are converted internally into
-numbers, and their original form is forgotten. This means, for
-example, that `array[17]', `array[021]', and `array[0x11]' all refer to
-the same element!
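-
-   Here is a quick check with `gawk'.  Octal and hexadecimal constants
-in program text are a `gawk' extension (*note Nondecimal-numbers::):
-
-     $ gawk 'BEGIN {
-     >     array[17] = "seventeen"
-     >     print array[021], array[0x11]
-     > }'
-     -| seventeen seventeen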
-
- As with many things in `awk', the majority of the time things work
-as one would expect them to. But it is useful to have a precise
-knowledge of the actual rules since they can sometimes have a subtle
-effect on your programs.
-
-
-File: gawk.info, Node: Uninitialized Subscripts, Next: Multi-dimensional, Prev: Numeric Array Subscripts, Up: Arrays
-
-8.4 Using Uninitialized Variables as Subscripts
-===============================================
-
-Suppose it's necessary to write a program to print the input data in
-reverse order. A reasonable attempt to do so (with some test data)
-might look like this:
-
- $ echo 'line 1
- > line 2
- > line 3' | awk '{ l[lines] = $0; ++lines }
- > END {
- > for (i = lines-1; i >= 0; --i)
- > print l[i]
- > }'
- -| line 3
- -| line 2
-
- Unfortunately, the very first line of input data did not come out in
-the output!
-
-   At first glance, we would think that this program should have
-worked. The variable `lines' is uninitialized, and uninitialized
-variables have the numeric value zero. So, `awk' should have printed
-the value of `l[0]'.
-
- The issue here is that subscripts for `awk' arrays are _always_
-strings. Uninitialized variables, when used as strings, have the value
-`""', not zero. Thus, `line 1' ends up stored in `l[""]'. The
-following version of the program works correctly:
-
- { l[lines++] = $0 }
- END {
- for (i = lines - 1; i >= 0; --i)
- print l[i]
- }
-
- Here, the `++' forces `lines' to be numeric, thus making the "old
-value" numeric zero. This is then converted to `"0"' as the array
-subscript.
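The difference between the two versions comes down to which subscript the first line is stored under. The sketch below checks both cases directly. The names `nullsub_demo', `l', and `m' are hypothetical.

```shell
#!/bin/sh
# Sketch: an uninitialized variable used as a subscript is "", not "0".
nullsub_demo() {
    awk 'BEGIN {
        l[lines] = "line 1"    # lines is uninitialized: subscript ""
        a = ("" in l)
        b = ("0" in l)
        print a, b             # found under "", not under "0"

        m[count++] = "line 1"  # count++ yields numeric 0: subscript "0"
        c = ("0" in m)
        print c, count
    }'
}

nullsub_demo
```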
-
- Even though it is somewhat unusual, the null string (`""') is a
-valid array subscript. (d.c.) `gawk' warns about the use of the null
-string as a subscript if `--lint' is provided on the command line
-(*note Options::).
-
-
-File: gawk.info, Node: Multi-dimensional, Next: Arrays of Arrays, Prev: Uninitialized Subscripts, Up: Arrays
-
-8.5 Multidimensional Arrays
-===========================
-
-* Menu:
-
-* Multi-scanning:: Scanning multidimensional arrays.
-
- A multidimensional array is an array in which an element is
-identified by a sequence of indices instead of a single index. For
-example, a two-dimensional array requires two indices. The usual way
-(in most languages, including `awk') to refer to an element of a
-two-dimensional array named `grid' is with `grid[X,Y]'.
-
- Multidimensional arrays are supported in `awk' through concatenation
-of indices into one string. `awk' converts the indices into strings
-(*note Conversion::) and concatenates them together, with a separator
-between them. This creates a single string that describes the values
-of the separate indices. The combined string is used as a single index
-into an ordinary, one-dimensional array. The separator used is the
-value of the built-in variable `SUBSEP'.
-
- For example, suppose we evaluate the expression `foo[5,12] = "value"'
-when the value of `SUBSEP' is `"@"'. The numbers 5 and 12 are
-converted to strings and concatenated with an `@' between them,
-yielding `"5@12"'; thus, the array element `foo["5@12"]' is set to
-`"value"'.
-
- Once the element's value is stored, `awk' has no record of whether
-it was stored with a single index or a sequence of indices. The two
-expressions `foo[5,12]' and `foo[5 SUBSEP 12]' are always equivalent.
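The equivalence can be seen by setting `SUBSEP' to `"@"', as in the text, and then reading the element back through the combined string. `subsep_demo' is a hypothetical name.

```shell
#!/bin/sh
# Sketch: foo[5, 12] is just foo[5 SUBSEP 12] under another spelling.
subsep_demo() {
    awk 'BEGIN {
        SUBSEP = "@"
        foo[5, 12] = "value"
        print foo["5@12"]          # same element, written as one string
        print foo[5 SUBSEP 12]     # same element again
        if ((5, 12) in foo)
            print "(5, 12) in foo"
        for (k in foo)
            print k                # the only subscript is "5@12"
    }'
}

subsep_demo
```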
-
- The default value of `SUBSEP' is the string `"\034"', which contains
-a nonprinting character that is unlikely to appear in an `awk' program
-or in most input data. The usefulness of choosing an unlikely
-character comes from the fact that index values that contain a string
-matching `SUBSEP' can lead to combined strings that are ambiguous.
-Suppose that `SUBSEP' is `"@"'; then `foo["a@b", "c"]' and
-`foo["a", "b@c"]' are indistinguishable because both are actually
-stored as `foo["a@b@c"]'.
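The ambiguity is easy to reproduce. In this sketch, `SUBSEP' is again set to `"@"'. The second assignment silently overwrites the first because both index sequences produce the same combined key. `collide_demo' is hypothetical.

```shell
#!/bin/sh
# Sketch: with SUBSEP = "@", two different index sequences collide.
collide_demo() {
    awk 'BEGIN {
        SUBSEP = "@"
        foo["a@b", "c"] = "first"
        foo["a", "b@c"] = "second"    # also stored as foo["a@b@c"]
        n = 0
        for (k in foo)
            n++
        print n, foo["a@b@c"]         # one element, holding "second"
    }'
}

collide_demo
```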
-
- To test whether a particular index sequence exists in a
-multidimensional array, use the same operator (`in') that is used for
-single dimensional arrays. Write the whole sequence of indices in
-parentheses, separated by commas, as the left operand:
-
- (SUBSCRIPT1, SUBSCRIPT2, ...) in ARRAY
-
- The following example treats its input as a two-dimensional array of
-fields; it rotates this array 90 degrees clockwise and prints the
-result. It assumes that all lines have the same number of elements:
-
- {
- if (max_nf < NF)
- max_nf = NF
- max_nr = NR
- for (x = 1; x <= NF; x++)
- vector[x, NR] = $x
- }
-
- END {
- for (x = 1; x <= max_nf; x++) {
- for (y = max_nr; y >= 1; --y)
- printf("%s ", vector[x, y])
- printf("\n")
- }
- }
-
-When given the input:
-
- 1 2 3 4 5 6
- 2 3 4 5 6 1
- 3 4 5 6 1 2
- 4 5 6 1 2 3
-
-the program produces the following output:
-
- 4 3 2 1
- 5 4 3 2
- 6 5 4 3
- 1 6 5 4
- 2 1 6 5
- 3 2 1 6
-
-
-File: gawk.info, Node: Multi-scanning, Up: Multi-dimensional
-
-8.5.1 Scanning Multidimensional Arrays
---------------------------------------
-
-There is no special `for' statement for scanning a "multidimensional"
-array. There cannot be one, because, in truth, `awk' does not have
-multidimensional arrays or elements--there is only a multidimensional
-_way of accessing_ an array.
-
- However, if your program has an array that is always accessed as
-multidimensional, you can get the effect of scanning it by combining
-the scanning `for' statement (*note Scanning an Array::) with the
-built-in `split()' function (*note String Functions::). It works in
-the following manner:
-
- for (combined in array) {
- split(combined, separate, SUBSEP)
- ...
- }
-
-This sets the variable `combined' to each concatenated combined index
-in the array, and splits it into the individual indices by breaking it
-apart where the value of `SUBSEP' appears. The individual indices then
-become the elements of the array `separate'.
-
- Thus, if a value is previously stored in `array[1, "foo"]', then an
-element with index `"1\034foo"' exists in `array'. (Recall that the
-default value of `SUBSEP' is the character with code 034.) Sooner or
-later, the `for' statement finds that index and does an iteration with
-the variable `combined' set to `"1\034foo"'. Then the `split()'
-function is called as follows:
-
- split("1\034foo", separate, "\034")
-
-The result is to set `separate[1]' to `"1"' and `separate[2]' to
-`"foo"'. Presto! The original sequence of separate indices is
-recovered.
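Putting the pieces together, the following sketch stores two elements with two-part indices. It then recovers the separate indices with `split()'. Because `for (... in ...)' visits elements in an unspecified order, the output is piped through `sort'. `multiscan_demo' is a hypothetical name.

```shell
#!/bin/sh
# Sketch: scan a "multidimensional" array and split each combined index.
multiscan_demo() {
    awk 'BEGIN {
        array[1, "foo"] = "x"
        array[2, "bar"] = "y"
        for (combined in array) {
            # separate[1] and separate[2] receive the original indices
            split(combined, separate, SUBSEP)
            print separate[1], separate[2], array[combined]
        }
    }' | sort
}

multiscan_demo
```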
-
-
-File: gawk.info, Node: Arrays of Arrays, Prev: Multi-dimensional, Up: Arrays
-
-8.6 Arrays of Arrays
-====================
-
-`gawk' goes beyond standard `awk''s multidimensional array access and
-provides true arrays of arrays. Elements of a subarray are referred to
-by their own indices enclosed in square brackets, just like the
-elements of the main array. For example, the following creates a
-two-element subarray at index `1' of the main array `a':
-
- a[1][1] = 1
- a[1][2] = 2
-
- This simulates a true two-dimensional array. Each subarray element
-can contain another subarray as a value, which in turn can hold other
-arrays as well. In this way, you can create arrays of three or more
-dimensions. The indices can be any `awk' expression, including scalars
-separated by commas (that is, a regular `awk' simulated
-multidimensional subscript). So the following is valid in `gawk':
-
- a[1][3][1, "name"] = "barney"
-
- Each subarray and the main array can be of different lengths. In
-fact, the elements of an array or its subarray do not all have to have
-the same type. This means that the main array and any of its subarrays
-can be non-rectangular, or jagged in structure. For example, you can
-assign a scalar value to index `4' of the main array `a':
-
- a[4] = "An element in a jagged array"
-
- The terms "dimension", "row" and "column" are meaningless when
-applied to such an array, but we will use "dimension" henceforth to
-mean the maximum number of indices needed to refer to an existing
-element. The type of any element that has already been assigned cannot
-be changed by assigning a value of a different type. You have to first
-delete the current element, which effectively makes `gawk' forget about
-the element at that index:
-
- delete a[4]
- a[4][5][6][7] = "An element in a four-dimensional array"
-
-This removes the scalar value from index `4' and then inserts three
-levels of nested subarrays, with a scalar at the innermost level. You
-can also delete an entire subarray or subarray of subarrays:
-
- delete a[4][5]
- a[4][5] = "An element in subarray a[4]"
-
- But recall that you cannot delete the main array `a' and then use it
-as a scalar.
-
- The built-in functions that take array arguments can also be used
-with subarrays. For example, the following code fragment uses `length()'
-(*note String Functions::) to determine the number of elements in the
-main array `a' and its subarrays:
-
- print length(a), length(a[1]), length(a[1][3])
-
-This results in the following output for our main array `a':
-
- 2 3 1
-
-The `SUBSCRIPT in ARRAY' expression (*note Reference to Elements::)
-works similarly for both regular `awk'-style arrays and arrays of
-arrays. For example, the tests `1 in a', `3 in a[1]', and `(1, "name")
-in a[1][3]' all evaluate to one (true) for our array `a'.
-
- The `for (item in array)' statement (*note Scanning an Array::) can
-be nested to scan all the elements of an array of arrays if it is
-rectangular in structure. In order to print the contents (scalar
-values) of a two-dimensional array of arrays (i.e., in which each
-first-level element is itself an array, not necessarily of the same
-length) you could use the following code:
-
- for (i in array)
- for (j in array[i])
- print array[i][j]
-
- The `isarray()' function (*note Type Functions::) lets you test if
-an array element is itself an array:
-
- for (i in array) {
- if (isarray(array[i])) {
- for (j in array[i]) {
- print array[i][j]
- }
- }
- }
-
- If the structure of a jagged array of arrays is known in advance,
-you can often devise workarounds using control statements. For example,
-the following code prints the elements of our main array `a':
-
- for (i in a) {
- for (j in a[i]) {
- if (j == 3) {
- for (k in a[i][j])
- print a[i][j][k]
- } else
- print a[i][j]
- }
- }
-
-*Note Walking Arrays::, for a user-defined function that will "walk" an
-arbitrarily-dimensioned array of arrays.
-
- Recall that a reference to an uninitialized array element yields a
-value of `""', the null string. This has one important implication when
-you intend to use a subarray as an argument to a function, as
-illustrated by the following example:
-
- $ gawk 'BEGIN { split("a b c d", b[1]); print b[1][1] }'
- error--> gawk: cmd. line:1: fatal: split: second argument is not an array
-
- The way to work around this is to first force `b[1]' to be an array
-by creating an arbitrary index:
-
- $ gawk 'BEGIN { b[1][1] = ""; split("a b c d", b[1]); print b[1][1] }'
- -| a
-
-
-File: gawk.info, Node: Functions, Next: Internationalization, Prev: Arrays, Up: Top
-
-9 Functions
-***********
-
-This major node describes `awk''s built-in functions, which fall into
-three categories: numeric, string, and I/O. `gawk' provides additional
-groups of functions to work with values that represent time, do bit
-manipulation, sort arrays, and internationalize and localize programs.
-
- Besides the built-in functions, `awk' has provisions for writing new
-functions that the rest of a program can use. The second half of this
-major node describes these "user-defined" functions.
-
-* Menu:
-
-* Built-in:: Summarizes the built-in functions.
-* User-defined:: Describes user-defined functions in detail.
-* Indirect Calls:: Choosing the function to call at runtime.
-
-
-File: gawk.info, Node: Built-in, Next: User-defined, Up: Functions
-
-9.1 Built-in Functions
-======================
-
-"Built-in" functions are always available for your `awk' program to
-call. This minor node defines all the built-in functions in `awk';
-some of these are mentioned in other sections but are summarized here
-for your convenience.
-
-* Menu:
-
-* Calling Built-in:: How to call built-in functions.
-* Numeric Functions:: Functions that work with numbers, including
- `int()', `sin()' and `rand()'.
-* String Functions:: Functions for string manipulation, such as
- `split()', `match()' and
- `sprintf()'.
-* I/O Functions:: Functions for files and shell commands.
-* Time Functions:: Functions for dealing with timestamps.
-* Bitwise Functions:: Functions for bitwise operations.
-* Type Functions:: Functions for type information.
-* I18N Functions:: Functions for string translation.
-
-
-File: gawk.info, Node: Calling Built-in, Next: Numeric Functions, Up: Built-in
-
-9.1.1 Calling Built-in Functions
---------------------------------
-
-To call one of `awk''s built-in functions, write the name of the
-function followed by arguments in parentheses. For example, `atan2(y +
-z, 1)' is a call to the function `atan2()' and has two arguments.
-
- Whitespace is ignored between the built-in function name and the
-open parenthesis, but nonetheless it is good practice to avoid using
-whitespace there. User-defined functions do not permit whitespace in
-this way, and it is easier to avoid mistakes by following a simple
-convention that always works--no whitespace after a function name.
-
- Each built-in function accepts a certain number of arguments. In
-some cases, arguments can be omitted. The defaults for omitted
-arguments vary from function to function and are described under the
-individual functions. In some `awk' implementations, extra arguments
-given to built-in functions are ignored. However, in `gawk', it is a
-fatal error to give extra arguments to a built-in function.
-
- When a function is called, expressions that create the function's
-actual parameters are evaluated completely before the call is performed.
-For example, in the following code fragment:
-
- i = 4
- j = sqrt(i++)
-
-the variable `i' is incremented to the value five before `sqrt()' is
-called with a value of four for its actual parameter. The order of
-evaluation of the expressions used for the function's parameters is
-undefined. Thus, avoid writing programs that assume that parameters
-are evaluated from left to right or from right to left. For example:
-
- i = 5
- j = atan2(i++, i *= 2)
-
- If the order of evaluation is left to right, then `i++' yields 5 and
-leaves `i' equal to 6; `i *= 2' then makes `i' 12, and `atan2()' is
-called with the two arguments 5 and 12. But if the order of evaluation
-is right to left, `i *= 2' first makes `i' 10; `i++' then yields 10
-and leaves `i' equal to 11, and `atan2()' is called with the two
-arguments 10 and 10.
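The well-defined part of this discussion is that an argument expression is evaluated completely before the call. It can be checked with the `sqrt(i++)' fragment. The `atan2()' case is deliberately left out, since its result depends on the implementation. `evalorder_demo' is hypothetical.

```shell
#!/bin/sh
# Sketch: i++ is evaluated (yielding 4, leaving i at 5) before sqrt() runs.
evalorder_demo() {
    awk 'BEGIN {
        i = 4
        j = sqrt(i++)
        print i, j      # i is 5; j is sqrt(4), which is 2
    }'
}

evalorder_demo
```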
-
-
-File: gawk.info, Node: Numeric Functions, Next: String Functions, Prev: Calling Built-in, Up: Built-in
-
-9.1.2 Numeric Functions
------------------------
-
-The following list describes all of the built-in functions that work
-with numbers. Optional parameters are enclosed in square
-brackets ([ ]):
-
-`atan2(Y, X)'
- Return the arctangent of `Y / X' in radians. You can use `pi =
- atan2(0, -1)' to retrieve the value of pi.
-
-`cos(X)'
- Return the cosine of X, with X in radians.
-
-`exp(X)'
- Return the exponential of X (`e ^ X') or report an error if X is
- out of range. The range of values X can have depends on your
- machine's floating-point representation.
-
-`int(X)'
- Return the nearest integer to X, located between X and zero and
- truncated toward zero.
-
- For example, `int(3)' is 3, `int(3.9)' is 3, `int(-3.9)' is -3,
- and `int(-3)' is -3 as well.
-
-`log(X)'
- Return the natural logarithm of X, if X is positive; otherwise,
- report an error.
-
-`rand()'
- Return a random number. The values of `rand()' are uniformly
- distributed between zero and one. The value could be zero but is
- never one.(1)
-
- Often random integers are needed instead. Following is a
- user-defined function that can be used to obtain a random
- non-negative integer less than N:
-
- function randint(n) {
- return int(n * rand())
- }
-
- The multiplication produces a random number greater than or equal
- to zero and less than `n'. Using `int()', this result is made into
- an integer between zero and `n' - 1, inclusive.
-
- The following example uses a similar function to produce random
- integers between one and N. This program prints a new random
- number for each input record:
-
- # Function to roll a simulated die.
- function roll(n) { return 1 + int(rand() * n) }
-
- # Roll 3 six-sided dice and
- # print total number of points.
- {
- printf("%d points\n",
- roll(6)+roll(6)+roll(6))
- }
-
- CAUTION: In most `awk' implementations, including `gawk',
- `rand()' starts generating numbers from the same starting
- number, or "seed", each time you run `awk'.(2) Thus, a
- program generates the same results each time you run it. The
- numbers are random within one `awk' run but predictable from
- run to run. This is convenient for debugging, but if you want
- a program to do different things each time it is used, you
- must change the seed to a value that is different in each
- run. To do this, use `srand()'.
-
-`sin(X)'
- Return the sine of X, with X in radians.
-
-`sqrt(X)'
- Return the positive square root of X. `gawk' prints a warning
- message if X is negative. For example, `sqrt(4)' is 2.
-
-`srand([X])'
- Set the starting point, or seed, for generating random numbers to
- the value X.
-
- Each seed value leads to a particular sequence of random
- numbers.(3) Thus, if the seed is set to the same value a second
- time, the same sequence of random numbers is produced again.
-
- CAUTION: Different `awk' implementations use different
- random-number generators internally. Don't expect the same
- `awk' program to produce the same series of random numbers
- when executed by different versions of `awk'.
-
- If the argument X is omitted, as in `srand()', then the current
- date and time of day are used for a seed. This is the way to get
- random numbers that differ from one run to the next.
-
- The return value of `srand()' is the previous seed. This makes it
- easy to keep track of the seeds in case you need to consistently
- reproduce sequences of random numbers.
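Several of the entries above can be checked together. The sketch below assumes only POSIX `awk' and confirms three things:

* `int()' truncates toward zero.
* The `randint()' helper from the `rand()' entry stays within 0 to N - 1 for N = 6.
* Reseeding with the same value replays the same sequence, and `srand()' returns the previous seed.

The seeds 1 and 42 and the name `numeric_demo' are arbitrary choices.

```shell
#!/bin/sh
# Sketch: int() truncation, the randint() range, and srand() reproducibility.
numeric_demo() {
    awk '
    function randint(n) { return int(n * rand()) }
    BEGIN {
        # int() truncates toward zero
        print int(3.9), int(-3.9)

        # every randint(6) result should be an integer from 0 to 5
        srand(1)
        ok = 1
        for (i = 0; i < 1000; i++) {
            r = randint(6)
            if (r < 0 || r > 5 || r != int(r))
                ok = 0
        }
        print ok

        # the same seed replays the same sequence; srand() returns the old seed
        srand(42); a = rand(); b = rand()
        prev = srand(42)
        c = rand(); d = rand()
        same = (a == c && b == d)
        print same, prev
    }'
}

numeric_demo
```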
-
- ---------- Footnotes ----------
-
- (1) The C version of `rand()' on many Unix systems is known to
-produce fairly poor sequences of random numbers. However, nothing
-requires that an `awk' implementation use the C `rand()' to implement
-the `awk' version of `rand()'. In fact, `gawk' uses the BSD `random()'
-function, which is considerably better than `rand()', to produce random
-numbers.
-
- (2) `mawk' uses a different seed each time.
-
- (3) Computer-generated random numbers really are not truly random.
-They are technically known as "pseudorandom." This means that while
-the numbers in a sequence appear to be random, you can in fact generate
-the same sequence of random numbers over and over again.
-
-
-File: gawk.info, Node: String Functions, Next: I/O Functions, Prev: Numeric Functions, Up: Built-in
-
-9.1.3 String-Manipulation Functions
------------------------------------
-
-The functions in this minor node look at or change the text of one or
-more strings. `gawk' understands locales (*note Locales::), and does
-all string processing in terms of _characters_, not _bytes_. This
-distinction is particularly important to understand for locales where
-one character may be represented by multiple bytes. Thus, for example,
-`length()' returns the number of characters in a string, and not the
-number of bytes used to represent those characters. Similarly,
-`index()' works with character indices, and not byte indices.
-
- In the following list, optional parameters are enclosed in square
-brackets ([ ]). Several functions perform string substitution; the
-full discussion is provided in the description of the `sub()' function,
-which comes towards the end since the list is presented in alphabetic
-order. Those functions that are specific to `gawk' are marked with a
-pound sign (`#'):
-
-* Menu:
-
-* Gory Details:: More than you want to know about `\' and
- `&' with `sub()', `gsub()', and
- `gensub()'.
-
-`asort(SOURCE [, DEST [, HOW ] ]) #'
- Return the number of elements in the array SOURCE. `gawk' sorts
- the contents of SOURCE and replaces the indices of the sorted
- values of SOURCE with sequential integers starting with one. If
- the optional array DEST is specified, then SOURCE is duplicated
- into DEST. DEST is then sorted, leaving the indices of SOURCE
- unchanged. The optional third argument HOW is a string that
- controls the rule for comparing values and the sort direction. A
- single space is required between the comparison mode, `string' or
- `number', and the direction specification, `ascending' or
- `descending'. You can omit the direction and/or the mode, in which
- case they default to `ascending' and `string', respectively. An
- empty string "" is the same as the default `"ascending string"' for
- the value of HOW. If the SOURCE array contains subarrays as values,
- they come out last in the DEST array for an `ascending' sort and
- first for a `descending' sort. The value of `IGNORECASE' affects
- the sorting. The third argument can also be a user-defined function
- name, in which case the value returned by the function is used to
- order the array elements before constructing the result array.
- *Note Array Sorting Functions::, for more information.
-
- For example, if the contents of `a' are as follows:
-
- a["last"] = "de"
- a["first"] = "sac"
- a["middle"] = "cul"
-
- A call to `asort()':
-
- asort(a)
-
- results in the following contents of `a':
-
- a[1] = "cul"
- a[2] = "de"
- a[3] = "sac"
-
- In order to reverse the direction of the sorted results in the
- above example, `asort()' can be called with three arguments as
- follows:
-
- asort(a, a, "descending")
-
- The `asort()' function is described in more detail in *note Array
- Sorting Functions::. `asort()' is a `gawk' extension; it is not
- available in compatibility mode (*note Options::).
-
-`asorti(SOURCE [, DEST [, HOW ] ]) #'
- Return the number of elements in the array SOURCE. It works
- similarly to `asort()'; however, the _indices_ are sorted, instead
- of the values. (Here too, `IGNORECASE' affects the sorting.)
-
- The `asorti()' function is described in more detail in *note Array
- Sorting Functions::. `asorti()' is a `gawk' extension; it is not
- available in compatibility mode (*note Options::).
-
-`gensub(REGEXP, REPLACEMENT, HOW [, TARGET]) #'
- Search the target string TARGET for matches of the regular
- expression REGEXP. If HOW is a string beginning with `g' or `G'
- (short for "global"), then replace all matches of REGEXP with
- REPLACEMENT. Otherwise, HOW is treated as a number indicating
- which match of REGEXP to replace. If no TARGET is supplied, use
- `$0'. It returns the modified string as the result of the
- function; the original target string is _not_ changed.
-
- `gensub()' is a general substitution function. Its purpose is to
- provide more features than the standard `sub()' and `gsub()'
- functions.
-
- `gensub()' provides an additional feature that is not available in
- `sub()' or `gsub()': the ability to specify components of a regexp
- in the replacement text. This is done by using parentheses in the
- regexp to mark the components and then specifying `\N' in the
- replacement text, where N is a digit from 1 to 9. For example:
-
- $ gawk '
- > BEGIN {
- > a = "abc def"
- > b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a)
- > print b
- > }'
- -| def abc
-
- As with `sub()', you must type two backslashes in order to get one
- into the string. In the replacement text, the sequence `\0'
- represents the entire matched text, as does the character `&'.
-
- The following example shows how you can use the third argument to
- control which match of the regexp should be changed:
-
- $ echo a b c a b c |
- > gawk '{ print gensub(/a/, "AA", 2) }'
- -| a b c AA b c
-
- In this case, `$0' is the default target string. `gensub()'
- returns the new string as its result, which is passed directly to
- `print' for printing.
-
- If the HOW argument is a string that does not begin with `g' or
- `G', or if it is a number that is less than or equal to zero, only
- one substitution is performed. If HOW is zero, `gawk' issues a
- warning message.
-
- If REGEXP does not match TARGET, `gensub()''s return value is the
- original unchanged value of TARGET.
-
- `gensub()' is a `gawk' extension; it is not available in
- compatibility mode (*note Options::).
-
-`gsub(REGEXP, REPLACEMENT [, TARGET])'
- Search TARGET for _all_ of the longest, leftmost, _nonoverlapping_
- matching substrings it can find and replace them with REPLACEMENT.
- The `g' in `gsub()' stands for "global," which means replace
- everywhere. For example:
-
- { gsub(/Britain/, "United Kingdom"); print }
-
- replaces all occurrences of the string `Britain' with `United
- Kingdom' for all input records.
-
- The `gsub()' function returns the number of substitutions made. If
- the variable to search and alter (TARGET) is omitted, then the
- entire input record (`$0') is used. As in `sub()', the characters
- `&' and `\' are special, and the third argument must be assignable.
-
-`index(IN, FIND)'
- Search the string IN for the first occurrence of the string FIND,
- and return the position in characters where that occurrence begins
- in the string IN. Consider the following example:
-
- $ awk 'BEGIN { print index("peanut", "an") }'
- -| 3
-
- If FIND is not found, `index()' returns zero. (Remember that
- string indices in `awk' start at one.)
-
-`length([STRING])'
- Return the number of characters in STRING. If STRING is a number,
- the length of the digit string representing that number is
- returned. For example, `length("abcde")' is five. By contrast,
- `length(15 * 35)' works out to three. In this example, 15 * 35 =
- 525, and 525 is then converted to the string `"525"', which has
- three characters.
-
- If no argument is supplied, `length()' returns the length of `$0'.
-
- NOTE: In older versions of `awk', the `length()' function
- could be called without any parentheses. Doing so is
- considered poor practice, although the 2008 POSIX standard
- explicitly allows it, to support historical practice. For
- programs to be maximally portable, always supply the
- parentheses.
-
- If `length()' is called with a variable that has not been used,
- `gawk' forces the variable to be a scalar. Other implementations
- of `awk' leave the variable without a type. (d.c.) Consider:
-
- $ gawk 'BEGIN { print length(x) ; x[1] = 1 }'
- -| 0
- error--> gawk: fatal: attempt to use scalar `x' as array
-
- $ nawk 'BEGIN { print length(x) ; x[1] = 1 }'
- -| 0
-
- If `--lint' has been specified on the command line, `gawk' issues a
- warning about this.
-
- With `gawk' and several other `awk' implementations, when given an
- array argument, the `length()' function returns the number of
- elements in the array. (c.e.) This is less useful than it might
- seem at first, as the array is not guaranteed to be indexed from
- one to the number of elements in it. If `--lint' is provided on
- the command line (*note Options::), `gawk' warns that passing an
- array argument is not portable. If `--posix' is supplied, using
- an array argument is a fatal error (*note Arrays::).
-
-`match(STRING, REGEXP [, ARRAY])'
- Search STRING for the longest, leftmost substring matched by the
- regular expression REGEXP and return the character position, or
- "index", at which that substring begins (one, if it starts at the
- beginning of STRING). If no match is found, return zero.
-
- The REGEXP argument may be either a regexp constant (`/.../') or a
- string constant (`"..."'). In the latter case, the string is
- treated as a regexp to be matched. *Note Computed Regexps::, for a
- discussion of the difference between the two forms, and the
- implications for writing your program correctly.
-
- The order of the first two arguments is backwards from most other
- string functions that work with regular expressions, such as
- `sub()' and `gsub()'. It might help to remember that for
- `match()', the order is the same as for the `~' operator: `STRING
- ~ REGEXP'.
-
- The `match()' function sets the built-in variable `RSTART' to the
- index. It also sets the built-in variable `RLENGTH' to the length
- in characters of the matched substring. If no match is found,
- `RSTART' is set to zero, and `RLENGTH' to -1.
-
- For example:
-
- {
- if ($1 == "FIND")
- regex = $2
- else {
- where = match($0, regex)
- if (where != 0)
- print "Match of", regex, "found at",
- where, "in", $0
- }
- }
-
- This program looks for lines that match the regular expression
- stored in the variable `regex'. This regular expression can be
- changed. If the first word on a line is `FIND', `regex' is
- changed to be the second word on that line. Therefore, if given:
-
- FIND ru+n
- My program runs
- but not very quickly
- FIND Melvin
- JF+KM
- This line is property of Reality Engineering Co.
- Melvin was here.
-
- `awk' prints:
-
- Match of ru+n found at 12 in My program runs
- Match of Melvin found at 1 in Melvin was here.
-
- If ARRAY is present, it is cleared, and then the zeroth element of
- ARRAY is set to the entire portion of STRING matched by REGEXP.
- If REGEXP contains parentheses, the integer-indexed elements of
- ARRAY are set to contain the portion of STRING matching the
- corresponding parenthesized subexpression. For example:
-
- $ echo foooobazbarrrrr |
- > gawk '{ match($0, /(fo+).+(bar*)/, arr)
- > print arr[1], arr[2] }'
- -| foooo barrrrr
-
- In addition, multidimensional subscripts are available providing
- the start index and length of each matched subexpression:
-
- $ echo foooobazbarrrrr |
- > gawk '{ match($0, /(fo+).+(bar*)/, arr)
- > print arr[1], arr[2]
- > print arr[1, "start"], arr[1, "length"]
- > print arr[2, "start"], arr[2, "length"]
- > }'
- -| foooo barrrrr
- -| 1 5
- -| 9 7
-
- There may not be subscripts for the start and length of every
- parenthesized subexpression, since they may not all have matched
- text; thus they should be tested for with the `in' operator (*note
- Reference to Elements::).
-
- The ARRAY argument to `match()' is a `gawk' extension. In
- compatibility mode (*note Options::), using a third argument is a
- fatal error.
-
-`patsplit(STRING, ARRAY [, FIELDPAT [, SEPS ] ]) #'
- Divide STRING into pieces defined by FIELDPAT and store the pieces
- in ARRAY and the separator strings in the SEPS array. The first
- piece is stored in `ARRAY[1]', the second piece in `ARRAY[2]', and
- so forth. The third argument, FIELDPAT, is a regexp describing
- the fields in STRING (just as `FPAT' is a regexp describing the
- fields in input records). It may be either a regexp constant or a
- string. If FIELDPAT is omitted, the value of `FPAT' is used.
- `patsplit()' returns the number of elements created. `SEPS[I]' is
- the separator string between `ARRAY[I]' and `ARRAY[I+1]'. Any
- leading separator will be in `SEPS[0]'.
-
- The `patsplit()' function splits strings into pieces in a manner
- similar to the way input lines are split into fields using `FPAT'
- (*note Splitting By Content::).
-
- Before splitting the string, `patsplit()' deletes any previously
- existing elements in the arrays ARRAY and SEPS.
-
- The `patsplit()' function is a `gawk' extension. In compatibility
- mode (*note Options::), it is not available.
-
-`split(STRING, ARRAY [, FIELDSEP [, SEPS ] ])'
- Divide STRING into pieces separated by FIELDSEP and store the
- pieces in ARRAY and the separator strings in the SEPS array. The
- first piece is stored in `ARRAY[1]', the second piece in
- `ARRAY[2]', and so forth. The string value of the third argument,
- FIELDSEP, is a regexp describing where to split STRING (much as
- `FS' can be a regexp describing where to split input records;
- *note Regexp Field Splitting::). If FIELDSEP is omitted, the
- value of `FS' is used. `split()' returns the number of elements
- created. SEPS is a `gawk' extension with `SEPS[I]' being the
- separator string between `ARRAY[I]' and `ARRAY[I+1]'. If FIELDSEP
- is a single space then any leading whitespace goes into `SEPS[0]'
- and any trailing whitespace goes into `SEPS[N]' where N is the
- return value of `split()' (that is, the number of elements in
- ARRAY).
-
- The `split()' function splits strings into pieces in a manner
- similar to the way input lines are split into fields. For example:
-
- split("cul-de-sac", a, "-", seps)
-
- splits the string `cul-de-sac' into three fields using `-' as the
- separator. It sets the contents of the array `a' as follows:
-
- a[1] = "cul"
- a[2] = "de"
- a[3] = "sac"
-
- and sets the contents of the array `seps' as follows:
-
- seps[1] = "-"
- seps[2] = "-"
-
- The value returned by this call to `split()' is three.
-
- As with input field-splitting, when the value of FIELDSEP is
- `" "', leading and trailing whitespace is ignored in values
- assigned to the elements of ARRAY but not in SEPS, and the elements
- are separated by runs of whitespace. Also as with input
- field-splitting, if FIELDSEP is the null string, each individual
- character in the string is split into its own array element.
- (c.e.)
-
- Note, however, that `RS' has no effect on the way `split()' works.
- Even though `RS = ""' causes newline to also be an input field
- separator, this does not affect how `split()' splits strings.
-
- Modern implementations of `awk', including `gawk', allow the third
- argument to be a regexp constant (`/abc/') as well as a string.
- (d.c.) The POSIX standard allows this as well. *Note Computed
- Regexps::, for a discussion of the difference between using a
- string constant or a regexp constant, and the implications for
- writing your program correctly.
-
- Before splitting the string, `split()' deletes any previously
- existing elements in the arrays ARRAY and SEPS.
-
- If STRING is null, the array has no elements. (So this is a
- portable way to delete an entire array with one statement. *Note
- Delete::.)
-
- If STRING does not match FIELDSEP at all (but is not null), ARRAY
- has one element only. The value of that element is the original
- STRING.
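-
- The following sketch exercises these rules three ways: with a
- single-space FIELDSEP, with a regexp-constant FIELDSEP, and with a
- null STRING. It assumes a POSIX-compatible `awk' on the `PATH' and
- does not use the `gawk'-only SEPS argument. `split_demo' is a
- hypothetical wrapper name.
-
```shell
#!/bin/sh
# A minimal sketch of split() using only POSIX awk features
# (no gawk-only SEPS argument).
split_demo() {
    awk 'function show(n, arr,    i) {
        # Print the element count, then each element in brackets.
        printf "%d", n
        for (i = 1; i <= n; i++)
            printf " [%s]", arr[i]
        printf "\n"
    }
    BEGIN {
        # Single-space FIELDSEP: leading and trailing blanks are
        # ignored, and runs of whitespace act as one separator.
        n = split("  alpha   beta gamma  ", words, " ")
        show(n, words)

        # Regexp-constant FIELDSEP: one or more commas or semicolons.
        n = split("a,b;;c", parts, /[,;]+/)
        show(n, parts)

        # Splitting the null string leaves the array with no elements.
        n = split("", words)
        count = 0
        for (k in words)
            count++
        printf "%d %d\n", n, count
    }'
}
```
-
- Running `split_demo' prints `3 [alpha] [beta] [gamma]',
- `3 [a] [b] [c]', and `0 0', one per line.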
-
-`sprintf(FORMAT, EXPRESSION1, ...)'
- Return (without printing) the string that `printf' would have
- printed out with the same arguments (*note Printf::). For example:
-
- pival = sprintf("pi = %.2f (approx.)", 22/7)
-
- assigns the string `pi = 3.14 (approx.)' to the variable `pival'.
-
-`strtonum(STR) #'
- Examine STR and return its numeric value. If STR begins with a
- leading `0', `strtonum()' assumes that STR is an octal number. If
- STR begins with a leading `0x' or `0X', `strtonum()' assumes that
- STR is a hexadecimal number. For example:
-
- $ echo 0x11 |
- > gawk '{ printf "%d\n", strtonum($1) }'
- -| 17
-
- Using the `strtonum()' function is _not_ the same as adding zero
- to a string value; the automatic coercion of strings to numbers
- works only for decimal data, not for octal or hexadecimal.(1)
-
- Note also that `strtonum()' uses the current locale's decimal point
- for recognizing numbers (*note Locales::).
-
- `strtonum()' is a `gawk' extension; it is not available in
- compatibility mode (*note Options::).
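-
- Other `awk' implementations do not have `strtonum()'. For simple
- inputs, a rough portable substitute might look like the following
- sketch. The names `my_strtonum()' and `to_number' are hypothetical.
- The sketch assumes that each input consists entirely of hexadecimal,
- octal, or decimal characters. Unlike a real parser, it does not stop
- at the first invalid digit.
-
```shell
#!/bin/sh
# Hypothetical helper, not part of any awk: my_strtonum() applies the
# 0x (hex) and leading-0 (octal) rules described above in POSIX awk.
to_number() {
    printf '%s\n' "$@" | awk '
    function my_strtonum(str,    digits, ret, n, i) {
        digits = "0123456789abcdef"
        str = tolower(str)
        if (str ~ /^0x[0-9a-f]+$/) {
            # Hexadecimal: accumulate base-16 digits after the "0x".
            ret = 0
            n = length(str)
            for (i = 3; i <= n; i++)
                ret = ret * 16 + index(digits, substr(str, i, 1)) - 1
        } else if (str ~ /^0[0-7]+$/) {
            # Octal: a leading zero followed only by octal digits.
            ret = 0
            n = length(str)
            for (i = 2; i <= n; i++)
                ret = ret * 8 + substr(str, i, 1)
        } else
            ret = str + 0    # decimal: ordinary string-to-number coercion
        return ret
    }
    { print my_strtonum($1) }'
}
```
-
- For example, `to_number 0x11 011 17' prints 17, 9, and 17 on
- separate lines.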
-
-`sub(REGEXP, REPLACEMENT [, TARGET])'
- Search TARGET, which is treated as a string, for the leftmost,
- longest substring matched by the regular expression REGEXP.
- Modify the entire string by replacing the matched text with
- REPLACEMENT. The modified string becomes the new value of TARGET.
- Return the number of substitutions made (zero or one).
-
- The REGEXP argument may be either a regexp constant (`/.../') or a
- string constant (`"..."'). In the latter case, the string is
- treated as a regexp to be matched. *Note Computed Regexps::, for a
- discussion of the difference between the two forms, and the
- implications for writing your program correctly.
-
- This function is peculiar because TARGET is not simply used to
- compute a value, and not just any expression will do--it must be a
- variable, field, or array element so that `sub()' can store a
- modified value there. If this argument is omitted, then the
- default is to use and alter `$0'.(2) For example:
-
- str = "water, water, everywhere"
- sub(/at/, "ith", str)
-
- sets `str' to `wither, water, everywhere', by replacing the
- leftmost longest occurrence of `at' with `ith'.
-
- If the special character `&' appears in REPLACEMENT, it stands for
- the precise substring that was matched by REGEXP. (If the regexp
- can match more than one string, then this precise substring may
- vary.) For example:
-
- { sub(/candidate/, "& and his wife"); print }
-
- changes the first occurrence of `candidate' to `candidate and his
- wife' on each input line. Here is another example:
-
- $ awk 'BEGIN {
- > str = "daabaaa"
- > sub(/a+/, "C&C", str)
- > print str
- > }'
- -| dCaaCbaaa
-
- This shows how `&' can represent a nonconstant string and also
- illustrates the "leftmost, longest" rule in regexp matching (*note
- Leftmost Longest::).
-
- The effect of this special character (`&') can be turned off by
- putting a backslash before it in the string. As usual, to insert
- one backslash in the string, you must write two backslashes.
- Therefore, write `\\&' in a string constant to include a literal
- `&' in the replacement. For example, the following shows how to
- replace the first `|' on each line with an `&':
-
- { sub(/\|/, "\\&"); print }
-
- As mentioned, the third argument to `sub()' must be a variable,
- field, or array element. Some versions of `awk' allow the third
- argument to be an expression that is not an lvalue. In such a
- case, `sub()' still searches for the pattern and returns zero or
- one, but the result of the substitution (if any) is thrown away
- because there is no place to put it. Such versions of `awk'
- accept expressions like the following:
-
- sub(/USA/, "United States", "the USA and Canada")
-
- For historical compatibility, `gawk' accepts such erroneous code.
- However, using any other nonchangeable object as the third
- parameter causes a fatal error and your program will not run.
-
- Finally, if the REGEXP is not a regexp constant, it is converted
- into a string, and then the value of that string is treated as the
- regexp to match.
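-
- The sketch below puts several of these rules side by side:
-
- * `sub()' versus `gsub()' on the same TARGET, with their return
- values;
- * `&' in REPLACEMENT;
- * `\\&' for a literal ampersand.
-
- It uses only POSIX `awk' features. `sub_demo' is a hypothetical
- name.
-
```shell
#!/bin/sh
# A minimal sketch contrasting sub() with gsub() and showing "&" and
# "\\&" in REPLACEMENT; POSIX awk features only.
sub_demo() {
    awk 'BEGIN {
        str = "water, water, everywhere"
        n = sub(/at/, "ith", str)       # leftmost match only
        print n, str

        str = "water, water, everywhere"
        n = gsub(/at/, "ith", str)      # every match
        print n, str

        str = "price: 5"
        sub(/[0-9]+/, "$&.00", str)     # "&" stands for the matched text
        print str

        str = "a|b"
        sub(/\|/, "\\&", str)           # "\\&" yields a literal "&"
        print str
    }'
}
```
-
- `sub_demo' prints the following four lines:
-
- 1 wither, water, everywhere
- 2 wither, wither, everywhere
- price: $5.00
- a&b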
-
-`substr(STRING, START [, LENGTH])'
- Return a LENGTH-character-long substring of STRING, starting at
- character number START. The first character of a string is
- character number one.(3) For example, `substr("washington", 5, 3)'
- returns `"ing"'.
-
- If LENGTH is not present, `substr()' returns the whole suffix of
- STRING that begins at character number START. For example,
- `substr("washington", 5)' returns `"ington"'. The whole suffix is
- also returned if LENGTH is greater than the number of characters
- remaining in the string, counting from character START.
-
- If START is less than one, `substr()' treats it as if it were one.
- (POSIX doesn't specify what to do in this case: Brian Kernighan's
- `awk' acts this way, and therefore `gawk' does too.) If START is
- greater than the number of characters in the string, `substr()'
- returns the null string. Similarly, if LENGTH is present but less
- than or equal to zero, the null string is returned.
-
- The string returned by `substr()' _cannot_ be assigned. Thus, it
- is a mistake to attempt to change a portion of a string, as shown
- in the following example:
-
- string = "abcdef"
- # try to get "abCDEf", won't work
- substr(string, 3, 3) = "CDE"
-
- It is also a mistake to use `substr()' as the third argument of
- `sub()' or `gsub()':
-
- gsub(/xyz/, "pdq", substr($0, 5, 20)) # WRONG
-
- (Some commercial versions of `awk' treat `substr()' as assignable,
- but doing so is not portable.)
-
- If you need to replace bits and pieces of a string, combine
- `substr()' with string concatenation, in the following manner:
-
- string = "abcdef"
- ...
- string = substr(string, 1, 2) "CDE" substr(string, 6)
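-
- The concatenation idiom can be packaged as a small helper. The
- `splice()' function in this sketch is hypothetical, not a built-in
- function. It replaces LEN characters of S, starting at START, with
- REPL. It assumes that START and LEN are positive integers that fall
- within the string.
-
```shell
#!/bin/sh
# splice() is a hypothetical helper built on substr() and string
# concatenation; it is not a built-in awk function.
substr_demo() {
    awk 'function splice(s, start, len, repl) {
        # Keep the text before START, insert REPL, then keep the text
        # that follows the LEN characters being replaced.
        return substr(s, 1, start - 1) repl substr(s, start + len)
    }
    BEGIN {
        print substr("washington", 5, 3)
        print substr("washington", 5)
        print splice("abcdef", 3, 3, "CDE")
    }'
}
```
-
- `substr_demo' prints `ing', `ington', and `abCDEf'.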
-
-`tolower(STRING)'
- Return a copy of STRING, with each uppercase character in the
- string replaced with its corresponding lowercase character.
- Nonalphabetic characters are left unchanged. For example,
- `tolower("MiXeD cAsE 123")' returns `"mixed case 123"'.
-
-`toupper(STRING)'
- Return a copy of STRING, with each lowercase character in the
- string replaced with its corresponding uppercase character.
- Nonalphabetic characters are left unchanged. For example,
- `toupper("MiXeD cAsE 123")' returns `"MIXED CASE 123"'.
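-
- Both functions leave nonalphabetic characters alone, so they combine
- easily with `substr()' and with comparisons. This sketch defines two
- hypothetical helpers, `capitalize()' and `same_word()'. It assumes
- plain ASCII text.
-
```shell
#!/bin/sh
# capitalize() and same_word() are hypothetical helpers built from
# the POSIX tolower() and toupper() functions.
case_demo() {
    awk 'function capitalize(w) {
        # Uppercase the first character and lowercase the rest.
        return toupper(substr(w, 1, 1)) tolower(substr(w, 2))
    }
    function same_word(a, b) {
        # Case-insensitive comparison: compare lowercased copies.
        return tolower(a) == tolower(b)
    }
    BEGIN {
        print capitalize("mIXED")
        print same_word("Awk", "AWK"), same_word("awk", "sed")
        print tolower("MiXeD cAsE 123")
    }'
}
```
-
- `case_demo' prints `Mixed', `1 0', and `mixed case 123'.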
-
- ---------- Footnotes ----------
-
- (1) Unless you use the `--non-decimal-data' option, which isn't
-recommended. *Note Nondecimal Data::, for more information.
-
- (2) Note that this means that the record will first be regenerated
-using the value of `OFS' if any fields have been changed, and that the
-fields will be updated after the substitution, even if the operation is
-a "no-op" such as `sub(/^/, "")'.
-
- (3) This is different from C and C++, in which the first character
-is number zero.
-
-
-File: gawk.info, Node: Gory Details, Up: String Functions
-
-9.1.3.1 More About `\' and `&' with `sub()', `gsub()', and `gensub()'
-.....................................................................
-
-When using `sub()', `gsub()', or `gensub()', and trying to get literal
-backslashes and ampersands into the replacement text, you need to
-remember that there are several levels of "escape processing" going on.
-
- First, there is the "lexical" level, which is when `awk' reads your
-program and builds an internal copy of it that can be executed. Then
-there is the runtime level, which is when `awk' actually scans the
-replacement string to determine what to generate.
-
- At both levels, `awk' looks for a defined set of characters that can
-come after a backslash. At the lexical level, it looks for the escape
-sequences listed in *note Escape Sequences::. Thus, for every `\' that
-`awk' processes at the runtime level, you must type two backslashes at
-the lexical level. When a character that is not valid for an escape
-sequence follows the `\', Brian Kernighan's `awk' and `gawk' both
-simply remove the initial `\' and put the next character into the
-string. Thus, for example, `"a\qb"' is treated as `"aqb"'.
-
- At the runtime level, the various functions handle sequences of `\'
-and `&' differently. The situation is (sadly) somewhat complex.
-Historically, the `sub()' and `gsub()' functions treated the
-two-character sequence `\&' specially; this sequence was replaced in the
-generated text with a single `&'. Any other `\' within the REPLACEMENT
-string that did not precede an `&' was passed through unchanged. This
-is illustrated in *note table-sub-escapes::.
-
- You type `sub()' sees `sub()' generates
- ------- --------- --------------
- `\&' `&' the matched text
- `\\&' `\&' a literal `&'
- `\\\&' `\&' a literal `&'
- `\\\\&' `\\&' a literal `\&'
- `\\\\\&' `\\&' a literal `\&'
- `\\\\\\&' `\\\&' a literal `\\&'
- `\\q' `\q' a literal `\q'
-
-Table 9.1: Historical Escape Sequence Processing for `sub()' and
-`gsub()'
-
-This table shows both the lexical-level processing, where an odd number
-of backslashes becomes an even number at the runtime level, and the
-runtime processing done by `sub()'. (For the sake of simplicity,
-the rest of the following tables only show the case of even numbers of
-backslashes entered at the lexical level.)
-
- The problem with the historical approach is that there is no way to
-get a literal `\' followed by the matched text.
-
- The 1992 POSIX standard attempted to fix this problem. That standard
-says that `sub()' and `gsub()' look for either a `\' or an `&' after
-the `\'. If either one follows a `\', that character is output
-literally. The interpretation of `\' and `&' then becomes as shown in
-*note table-sub-posix-92::.
-
- You type `sub()' sees `sub()' generates
- ------- --------- --------------
- `&' `&' the matched text
- `\\&' `\&' a literal `&'
- `\\\\&' `\\&' a literal `\', then the matched text
- `\\\\\\&' `\\\&' a literal `\&'
-
-Table 9.2: 1992 POSIX Rules for sub and gsub Escape Sequence Processing
-
-This appears to solve the problem. Unfortunately, the phrasing of the
-standard is unusual. It says, in effect, that `\' turns off the special
-meaning of any following character, but for anything other than `\' and
-`&', such special meaning is undefined. This wording leads to two
-problems:
-
- * Backslashes must now be doubled in the REPLACEMENT string, breaking
- historical `awk' programs.
-
- * To make sure that an `awk' program is portable, _every_ character
- in the REPLACEMENT string must be preceded with a backslash.(1)
-
- Because of the problems just listed, in 1996, the `gawk' maintainer
-submitted proposed text for a revised standard that reverts to rules
-that correspond more closely to the original existing practice. The
-proposed rules have special cases that make it possible to produce a
-`\' preceding the matched text. This is shown in *note
-table-sub-proposed::.
-
- You type `sub()' sees `sub()' generates
- ------- --------- --------------
- `\\\\\\&' `\\\&' a literal `\&'
- `\\\\&' `\\&' a literal `\', followed by the matched text
- `\\&' `\&' a literal `&'
- `\\q' `\q' a literal `\q'
- `\\\\' `\\' `\\'
-
-Table 9.3: Proposed rules for sub and backslash
-
- In a nutshell, at the runtime level, there are now three special
-sequences of characters (`\\\&', `\\&' and `\&') whereas historically
-there was only one. However, as in the historical case, any `\' that
-is not part of one of these three sequences is not special and appears
-in the output literally.
-
- `gawk' 3.0 and 3.1 follow these proposed POSIX rules for `sub()' and
-`gsub()'. The POSIX standard took much longer to be revised than was
-expected in 1996. The 2001 standard does not follow the above rules.
-Instead, the rules there are somewhat simpler. The results are similar
-except for one case.
-
- The POSIX rules state that `\&' in the replacement string produces a
-literal `&', `\\' produces a literal `\', and `\' followed by anything
-else is not special; the `\' is placed straight into the output. These
-rules are presented in *note table-posix-sub::.
-
- You type `sub()' sees `sub()' generates
- ------- --------- --------------
- `\\\\\\&' `\\\&' a literal `\&'
- `\\\\&' `\\&' a literal `\', followed by the matched text
- `\\&' `\&' a literal `&'
- `\\q' `\q' a literal `\q'
- `\\\\' `\\' `\'
-
-Table 9.4: POSIX rules for `sub()' and `gsub()'
-
- The only case where the difference is noticeable is the last one:
-`\\\\' is seen as `\\' and produces `\' instead of `\\'.
-
- Starting with version 3.1.4, `gawk' followed the POSIX rules when
-`--posix' is specified (*note Options::). Otherwise, it continued to
-follow the 1996 proposed rules, since that had been its behavior for
-many years.
-
- When version 4.0.0 was released, the `gawk' maintainer made the
-POSIX rules the default, breaking well over a decade's worth of
-backwards compatibility.(2) Needless to say, this was a bad idea, and
-as of version 4.0.1, `gawk' resumed its historical behavior, and only
-follows the POSIX rules when `--posix' is given.
-
- The rules for `gensub()' are considerably simpler. At the runtime
-level, whenever `gawk' sees a `\', if the following character is a
-digit, then the text that matched the corresponding parenthesized
-subexpression is placed in the generated output. Otherwise, no matter
-what character follows the `\', it appears in the generated text and
-the `\' does not, as shown in *note table-gensub-escapes::.
-
- You type `gensub()' sees `gensub()' generates
- ------- ------------ -----------------
- `&' `&' the matched text
- `\\&' `\&' a literal `&'
- `\\\\' `\\' a literal `\'
- `\\\\&' `\\&' a literal `\', then the matched text
- `\\\\\\&' `\\\&' a literal `\&'
- `\\q' `\q' a literal `q'
-
-Table 9.5: Escape Sequence Processing for `gensub()'
-
- Because of the complexity of the lexical and runtime level processing
-and the special cases for `sub()' and `gsub()', we recommend the use of
-`gawk' and `gensub()' when you have to do substitutions.
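-
- As an illustration, the following sketch requires `gawk'. It does
-three things:
-
- * It swaps two words using the `\\2' and `\\1' subexpression
- references.
- * It shows that `gensub()' returns a new string rather than
- modifying its target.
- * It applies the `\\\\&' row of the `gensub()' table above.
-
-`gensub_demo' is a hypothetical name.
-
```shell
#!/bin/sh
# Requires gawk: gensub() is a gawk extension.
gensub_demo() {
    gawk 'BEGIN {
        s = "hello world"
        # "\\2 \\1" reaches gensub() as \2 \1: the second and first
        # parenthesized subexpressions, in swapped order.
        t = gensub(/([a-z]+) ([a-z]+)/, "\\2 \\1", "g", s)
        print t
        print s       # gensub() returns a new string; s is unchanged
        # "\\\\&" reaches gensub() as \\&: a literal backslash,
        # followed by the matched text.
        print gensub(/b/, "\\\\&", "g", "abc")
    }'
}
```
-
-With `gawk' installed, `gensub_demo' prints `world hello',
-`hello world', and `a\bc'.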
-
-Advanced Notes: Matching the Null String
-----------------------------------------
-
-In `awk', the `*' operator can match the null string. This is
-particularly important for the `sub()', `gsub()', and `gensub()'
-functions. For example:
-
- $ echo abc | awk '{ gsub(/m*/, "X"); print }'
- -| XaXbXcX
-
-Although this makes a certain amount of sense, it can be surprising.
-
- ---------- Footnotes ----------
-
- (1) This consequence was certainly unintended.
-
- (2) This was rather naive of him, despite there being a note in this
-section indicating that the next major version would move to the POSIX
-rules.
-
-
-File: gawk.info, Node: I/O Functions, Next: Time Functions, Prev: String Functions, Up: Built-in
-
-9.1.4 Input/Output Functions
-----------------------------
-
-The following functions relate to input/output (I/O). Optional
-parameters are enclosed in square brackets ([ ]):
-
-`close(FILENAME [, HOW])'
- Close the file FILENAME for input or output. Alternatively, the
- argument may be a shell command that was used for creating a
- coprocess, or for redirecting to or from a pipe; then the
- coprocess or pipe is closed. *Note Close Files And Pipes::, for
- more information.
-
- When closing a coprocess, it is occasionally useful to first close
- one end of the two-way pipe and then to close the other. This is
- done by providing a second argument to `close()'. This second
- argument should be one of the two string values `"to"' or `"from"',
- indicating which end of the pipe to close. Case in the string does
- not matter. *Note Two-way I/O::, which discusses this feature in
- more detail and gives an example.
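-
- A common reason to call `close()' on an output pipe is to wait for
- the command to finish. This sketch assumes a POSIX `awk' and `sort';
- `close_demo' is a hypothetical name. The sorted lines appear before
- `done' because `close()' waits for `sort' to read end-of-file,
- write its output, and exit before it returns.
-
```shell
#!/bin/sh
# A minimal sketch: close() on an output pipe waits for the command
# to finish, so its output appears before anything printed later.
close_demo() {
    awk 'BEGIN {
        cmd = "sort"
        print "banana" | cmd
        print "cherry" | cmd
        print "apple" | cmd
        close(cmd)        # sort sees end-of-file, writes, and exits
        print "done"
    }'
}
```
-
- `close_demo' prints `apple', `banana', `cherry', and `done', in
- that order.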
-
-`fflush([FILENAME])'
- Flush any buffered output associated with FILENAME, which is
- either a file opened for writing or a shell command for
- redirecting output to a pipe or coprocess. (c.e.)
-
- Many utility programs "buffer" their output; i.e., they hold
- information destined for a disk file or the screen in memory until
- there is enough of it to be worth sending to the output device.
- This is often more efficient than writing every
- little bit of information as soon as it is ready. However,
- sometimes it is necessary to force a program to "flush" its
- buffers; that is, write the information to its destination, even
- if a buffer is not full. This is the purpose of the `fflush()'
- function--`gawk' also buffers its output and the `fflush()'
- function forces `gawk' to flush its buffers.
-
- `fflush()' was added to Brian Kernighan's version of `awk' in
- 1994; it is not part of the POSIX standard and is not available if
- `--posix' has been specified on the command line (*note Options::).
-
- `gawk' extends the `fflush()' function in two ways. The first is
- to allow no argument at all. In this case, the buffer for the
- standard output is flushed. The second is to allow the null string
- (`""') as the argument. In this case, the buffers for _all_ open
- output files and pipes are flushed. Brian Kernighan's `awk' also
- supports these extensions.
-
- `fflush()' returns zero if the buffer is successfully flushed;
- otherwise, it returns -1. In the case where all buffers are
- flushed, the return value is zero only if all buffers were flushed
- successfully. Otherwise, it is -1, and `gawk' warns about the
- FILENAME that had the problem.
-
- `gawk' also issues a warning message if you attempt to flush a
- file or pipe that was opened for reading (such as with `getline'),
- or if FILENAME is not an open file, pipe, or coprocess. In such a
- case, `fflush()' returns -1, as well.
-
-`system(COMMAND)'
- Execute the operating-system command COMMAND and then return to
- the `awk' program. Return COMMAND's exit status.
-
- For example, if the following fragment of code is put in your `awk'
- program:
-
- END {
- system("date | mail -s 'awk run done' root")
- }
-
- the system administrator is sent mail when the `awk' program
- finishes processing input and begins its end-of-input processing.
-
- Note that redirecting `print' or `printf' into a pipe is often
- enough to accomplish your task. If you need to run many commands,
- it is more efficient to simply print them down a pipeline to the
- shell:
-
- while (MORE STUFF TO DO)
- print COMMAND | "/bin/sh"
- close("/bin/sh")
-
- However, if your `awk' program is interactive, `system()' is
- useful for running large self-contained programs, such as a shell
- or an editor. Some operating systems cannot implement the
- `system()' function. `system()' causes a fatal error if it is not
- supported.
-
- NOTE: When `--sandbox' is specified, the `system()' function
- is disabled (*note Options::).
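-
- Because `system()' returns the exit status of COMMAND, a program can
- branch on whether a command succeeded. This sketch assumes a POSIX
- shell as the command interpreter. `system_demo' is a hypothetical
- name.
-
```shell
#!/bin/sh
# A minimal sketch: system() returns the exit status of COMMAND.
system_demo() {
    awk 'BEGIN {
        # "test -d /" exits with status zero when / is a directory.
        if (system("test -d /") == 0)
            print "root directory exists"
        # A nonzero exit status is passed back to the awk program.
        status = system("exit 3")
        print "status", status
    }'
}
```
-
- `system_demo' prints `root directory exists' and then `status 3'.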
-
-
-Advanced Notes: Interactive Versus Noninteractive Buffering
------------------------------------------------------------
-
-As a side point, buffering issues can be even more confusing, depending
-upon whether your program is "interactive", i.e., communicating with a
-user sitting at a keyboard.(1)
-
- Interactive programs generally "line buffer" their output; i.e., they
-write out each line as soon as it is complete. Noninteractive programs
-wait until they have a full buffer, which may be many lines of output.
-Here is an example of
-the difference:
-
- $ awk '{ print $1 + $2 }'
- 1 1
- -| 2
- 2 3
- -| 5
- Ctrl-d
-
-Each line of output is printed immediately. Compare that behavior with
-this example:
-
- $ awk '{ print $1 + $2 }' | cat
- 1 1
- 2 3
- Ctrl-d
- -| 2
- -| 5
-
-Here, no output is printed until after the `Ctrl-d' is typed, because
-it is all buffered and sent down the pipe to `cat' in one shot.
-
-Advanced Notes: Controlling Output Buffering with `system()'
-------------------------------------------------------------
-
-The `fflush()' function provides explicit control over output buffering
-for individual files and pipes. However, its use is not portable to
-many other `awk' implementations. An alternative method to flush output
-buffers is to call `system()' with a null string as its argument:
-
- system("") # flush output
-
-`gawk' treats this use of the `system()' function as a special case and
-is smart enough not to run a shell (or other command interpreter) with
-the empty command. Therefore, with `gawk', this idiom is not only
-useful, it is also efficient. While this method should work with other
-`awk' implementations, it does not necessarily avoid starting an
-unnecessary shell. (Other implementations may only flush the buffer
-associated with the standard output and not necessarily all buffered
-output.)
-
- If you think about what a programmer expects, it makes sense that
-`system()' should flush any pending output. The following program:
-
- BEGIN {
- print "first print"
- system("echo system echo")
- print "second print"
- }
-
-must print:
-
- first print
- system echo
- second print
-
-and not:
-
- system echo
- first print
- second print
-
- If `awk' did not flush its buffers before calling `system()', you
-would see the latter (undesirable) output.
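-
- To see this in practice, run the program above with its output piped
-through `cat'. The pipe makes the output noninteractive, so `awk'
-fully buffers it. The lines still come out in the expected order
-because `awk' flushes its buffers before `system()' runs the command.
-`flush_demo' is a hypothetical name.
-
```shell
#!/bin/sh
# The program above, with output piped through cat so that awk fully
# buffers stdout; the order relies on awk flushing before system().
flush_demo() {
    awk 'BEGIN {
        print "first print"
        system("echo system echo")
        print "second print"
    }' | cat
}
```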
-
- ---------- Footnotes ----------
-
- (1) A program is interactive if the standard output is connected to
-a terminal device. On modern systems, this means your keyboard and
-screen.
-
-
-File: gawk.info, Node: Time Functions, Next: Bitwise Functions, Prev: I/O Functions, Up: Built-in
-
-9.1.5 Time Functions
---------------------
-
-`awk' programs are commonly used to process log files containing
-timestamp information, indicating when a particular log record was
-written. Many programs log their timestamp in the form returned by the
-`time()' system call, which is the number of seconds since a particular
-epoch. On POSIX-compliant systems, it is the number of seconds since
-1970-01-01 00:00:00 UTC, not counting leap seconds.(1) All known
-POSIX-compliant systems support timestamps from 0 through 2^31 - 1,
-which is sufficient to represent times through 2038-01-19 03:14:07 UTC.
-Many systems support a wider range of timestamps, including negative
-timestamps that represent times before the epoch.
-
- In order to make it easier to process such log files and to produce
-useful reports, `gawk' provides the following functions for working
-with timestamps. They are `gawk' extensions; they are not specified in
-the POSIX standard, nor are they in any other known version of `awk'.(2)
-Optional parameters are enclosed in square brackets ([ ]):
-
-`mktime(DATESPEC)'
- Turn DATESPEC into a timestamp in the same form as is returned by
- `systime()'. It is similar to the function of the same name in
- ISO C. The argument, DATESPEC, is a string of the form
- `"YYYY MM DD HH MM SS [DST]"'. The string consists of six or
- seven numbers representing, respectively, the full year including
- century, the month from 1 to 12, the day of the month from 1 to
- 31, the hour of the day from 0 to 23, the minute from 0 to 59, the
- second from 0 to 60,(3) and an optional daylight-savings flag.
-
- The values of these numbers need not be within the ranges
- specified; for example, an hour of -1 means 1 hour before midnight.
- The origin-zero Gregorian calendar is assumed, with year 0
- preceding year 1 and year -1 preceding year 0. The time is
- assumed to be in the local timezone. If the daylight-savings flag
- is positive, the time is assumed to be daylight savings time; if
- zero, the time is assumed to be standard time; and if negative
- (the default), `mktime()' attempts to determine whether daylight
- savings time is in effect for the specified time.
-
- If DATESPEC does not contain enough elements or if the resulting
- time is out of range, `mktime()' returns -1.
-
-`strftime([FORMAT [, TIMESTAMP [, UTC-FLAG]]])'
- Format the time specified by TIMESTAMP based on the contents of
- the FORMAT string and return the result. It is similar to the
- function of the same name in ISO C. If UTC-FLAG is present and is
- either nonzero or non-null, the value is formatted as UTC
- (Coordinated Universal Time, formerly GMT or Greenwich Mean Time).
- Otherwise, the value is formatted for the local time zone. The
- TIMESTAMP is in the same format as the value returned by the
- `systime()' function. If no TIMESTAMP argument is supplied,
- `gawk' uses the current time of day as the timestamp. If no
- FORMAT argument is supplied, `strftime()' uses the value of
- `PROCINFO["strftime"]' as the format string (*note Built-in
- Variables::). The default string value is
- `"%a %b %e %H:%M:%S %Z %Y"'. This format string produces output
- that is equivalent to that of the `date' utility. You can assign
- a new value to `PROCINFO["strftime"]' to change the default format.
-
-`systime()'
- Return the current time as the number of seconds since the system
- epoch. On POSIX systems, this is the number of seconds since
- 1970-01-01 00:00:00 UTC, not counting leap seconds. It may be a
- different number on other systems.
-
- The `systime()' function allows you to compare a timestamp from a
-log file with the current time of day. In particular, it is easy to
-determine how long ago a particular record was logged. It also allows
-you to produce log records using the "seconds since the epoch" format.
-
- The `mktime()' function allows you to convert a textual
-representation of a date and time into a timestamp. This makes it
-easy to do before/after comparisons of dates and times, particularly
-when dealing with date and time data coming from an external source,
-such as a log file.
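-
- For instance, you can compute the number of days between two dates
-by subtracting two `mktime()' results. This sketch requires `gawk'.
-It sets `TZ=UTC0' so that "local time" is UTC and no daylight-savings
-shifts apply. `days_between' is a hypothetical name. Because `mktime()'
-accepts out-of-range values, a day of 32 in January is treated as
-February 1.
-
```shell
#!/bin/sh
# Requires gawk: mktime() is a gawk extension. TZ=UTC0 makes local
# time UTC, so results do not depend on the machine's time zone.
days_between() {
    TZ=UTC0 gawk -v d1="$1" -v d2="$2" 'BEGIN {
        # DATESPEC is "YYYY MM DD HH MM SS"; use midnight for both.
        t1 = mktime(d1 " 00 00 00")
        t2 = mktime(d2 " 00 00 00")
        if (t1 == -1 || t2 == -1) {
            print "bad date" > "/dev/stderr"
            exit 1
        }
        days = (t2 - t1) / 86400
        print days
    }'
}
```
-
-Under these assumptions, `days_between "2012 01 01" "2012 09 04"'
-prints 247.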
-
- The `strftime()' function allows you to easily turn a timestamp into
-human-readable information. It is similar in nature to the `sprintf()'
-function (*note String Functions::), in that it copies nonformat
-specification characters verbatim to the returned string, while
-substituting date and time values for format specifications in the
-FORMAT string.
-
- `strftime()' is guaranteed by the 1999 ISO C standard(4) to support
-the following date format specifications:
-
-`%a'
- The locale's abbreviated weekday name.
-
-`%A'
- The locale's full weekday name.
-
-`%b'
- The locale's abbreviated month name.
-
-`%B'
- The locale's full month name.
-
-`%c'
- The locale's "appropriate" date and time representation. (This is
- `%A %B %d %T %Y' in the `"C"' locale.)
-
-`%C'
- The century part of the current year. This is the year divided by
- 100 and truncated to the next lower integer.
-
-`%d'
- The day of the month as a decimal number (01-31).
-
-`%D'
- Equivalent to specifying `%m/%d/%y'.
-
-`%e'
- The day of the month, padded with a space if it is only one digit.
-
-`%F'
- Equivalent to specifying `%Y-%m-%d'. This is the ISO 8601 date
- format.
-
-`%g'
- The year modulo 100 of the ISO 8601 week number, as a decimal
- number (00-99). For example, January 1, 1993 is in week 53 of
- 1992. Thus, the year of its ISO 8601 week number is 1992, even
- though its year is 1993. Similarly, December 31, 1973 is in week
- 1 of 1974. Thus, the year of its ISO week number is 1974, even
- though its year is 1973.
-
-`%G'
- The full year of the ISO week number, as a decimal number.
-
-`%h'
- Equivalent to `%b'.
-
-`%H'
- The hour (24-hour clock) as a decimal number (00-23).
-
-`%I'
- The hour (12-hour clock) as a decimal number (01-12).
-
-`%j'
- The day of the year as a decimal number (001-366).
-
-`%m'
- The month as a decimal number (01-12).
-
-`%M'
- The minute as a decimal number (00-59).
-
-`%n'
- A newline character (ASCII LF).
-
-`%p'
- The locale's equivalent of the AM/PM designations associated with
- a 12-hour clock.
-
-`%r'
- The locale's 12-hour clock time. (This is `%I:%M:%S %p' in the
- `"C"' locale.)
-
-`%R'
- Equivalent to specifying `%H:%M'.
-
-`%S'
- The second as a decimal number (00-60).
-
-`%t'
- A TAB character.
-
-`%T'
- Equivalent to specifying `%H:%M:%S'.
-
-`%u'
- The weekday as a decimal number (1-7). Monday is day one.
-
-`%U'
- The week number of the year (the first Sunday as the first day of
- week one) as a decimal number (00-53).
-
-`%V'
- The week number of the year (the first Monday as the first day of
- week one) as a decimal number (01-53). The method for determining
- the week number is as specified by ISO 8601. (To wit: if the week
- containing January 1 has four or more days in the new year, then
- it is week one; otherwise it is week 53 of the previous year and
- the next week is week one.)
-
-`%w'
- The weekday as a decimal number (0-6). Sunday is day zero.
-
-`%W'
- The week number of the year (the first Monday as the first day of
- week one) as a decimal number (00-53).
-
-`%x'
- The locale's "appropriate" date representation. (This is `%A %B
- %d %Y' in the `"C"' locale.)
-
-`%X'
- The locale's "appropriate" time representation. (This is `%T' in
- the `"C"' locale.)
-
-`%y'
- The year modulo 100 as a decimal number (00-99).
-
-`%Y'
- The full year as a decimal number (e.g., 2011).
-
-`%z'
- The timezone offset in a +HHMM format (e.g., the format necessary
- to produce RFC 822/RFC 1036 date headers).
-
-`%Z'
- The time zone name or abbreviation; no characters if no time zone
- is determinable.
-
-`%Ec %EC %Ex %EX %Ey %EY %Od %Oe %OH'
-`%OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy'
- "Alternate representations" for the specifications that use only
- the second letter (`%c', `%C', and so on).(5) (These facilitate
- compliance with the POSIX `date' utility.)
-
-`%%'
- A literal `%'.
-
- If a conversion specifier is not one of the above, the behavior is
-undefined.(6)
-
- Informally, a "locale" is the geographic place in which a program is
-meant to run. For example, a common way to abbreviate the date
-September 4, 2012 in the United States is "9/4/12." In many countries
-in Europe, however, it is abbreviated "4.9.12." Thus, the `%x'
-specification in a `"US"' locale might produce `9/4/12', while in a
-`"EUROPE"' locale, it might produce `4.9.12'. The ISO C standard
-defines a default `"C"' locale, which is an environment that is typical
-of what many C programmers are used to.
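-
- A quick way to try several of these specifications is to format fixed
-timestamps in UTC under the `"C"' locale. This sketch requires `gawk'
-and a `strftime()' that supports `%F' and `%T'. `strftime_demo' is a
-hypothetical name. It also shows how assigning to
-`PROCINFO["strftime"]' changes the default format.
-
```shell
#!/bin/sh
# Requires gawk. LC_ALL=C selects the "C" locale for names such as
# %a; a third strftime() argument of 1 (UTC-FLAG) formats in UTC.
strftime_demo() {
    LC_ALL=C gawk 'BEGIN {
        print strftime("%F %T", 1000000000, 1)
        print strftime("%a %j %u", 0, 1)
        # With no arguments, PROCINFO["strftime"] supplies the format.
        PROCINFO["strftime"] = "%Y-%m-%d"
        d = strftime()
        ok = (d ~ /^[0-9]+-[0-9][0-9]-[0-9][0-9]$/)
        print ok
    }'
}
```
-
-The first two lines printed are `2001-09-09 01:46:40' and `Thu 001 4'.
-The third line is 1, meaning that today's date matched the new
-`%Y-%m-%d' default.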
-
- For systems that are not yet fully standards-compliant, `gawk'
-supplies a copy of `strftime()' from the GNU C Library. It supports
-all of the just-listed format specifications. If that version is used
-to compile `gawk' (*note Installation::), then the following additional
-format specifications are available:
-
-`%k'
- The hour (24-hour clock) as a decimal number (0-23). Single-digit
- numbers are padded with a space.
-
-`%l'
- The hour (12-hour clock) as a decimal number (1-12). Single-digit
- numbers are padded with a space.
-
-`%s'
- The time as a decimal timestamp in seconds since the epoch.
-
-
- Additionally, the alternate representations are recognized but their
-normal representations are used.
-
- The following example is an `awk' implementation of the POSIX `date'
-utility. Normally, the `date' utility prints the current date and time
-of day in a well-known format. However, if you provide an argument to
-it that begins with a `+', `date' copies nonformat specifier characters
-to the standard output and interprets the current time according to the
-format specifiers in the string. For example:
-
- $ date '+Today is %A, %B %d, %Y.'
- -| Today is Wednesday, March 30, 2011.
-
- Here is the `gawk' version of the `date' utility. It has a shell
-"wrapper" to handle the `-u' option, which requires that `date' run as
-if the time zone is set to UTC:
-
- #! /bin/sh
- #
- # date --- approximate the POSIX 'date' command
-
- case $1 in
- -u)  TZ=UTC0     # use UTC
-      export TZ
-      shift ;;
- esac
-
- gawk 'BEGIN {
-     format = "%a %b %e %H:%M:%S %Z %Y"
-     exitval = 0
-
-     if (ARGC > 2)
-         exitval = 1
-     else if (ARGC == 2) {
-         format = ARGV[1]
-         if (format ~ /^\+/)
-             format = substr(format, 2)   # remove leading +
-     }
-     print strftime(format)
-     exit exitval
- }' "$@"
-
- ---------- Footnotes ----------
-
- (1) *Note Glossary::, especially the entries "Epoch" and "UTC."
-
- (2) The GNU `date' utility can also do many of the things described
-here. Its use may be preferable for simple time-related operations in
-shell scripts.
-
- (3) Occasionally there are minutes in a year with a leap second,
-which is why the seconds can go up to 60.
-
- (4) Unfortunately, not every system's `strftime()' necessarily
-supports all of the conversions listed here.
-
- (5) If you don't understand any of this, don't worry about it; these
-facilities are meant to make it easier to "internationalize" programs.
-Other internationalization features are described in *note
-Internationalization::.
-
- (6) This is because ISO C leaves the behavior of the C version of
-`strftime()' undefined, and `gawk' uses the system's version of
-`strftime()' if it's there. Typically, the conversion specifier either
-does not appear in the returned string or appears literally.
-
-
-File: gawk.info, Node: Bitwise Functions, Next: Type Functions, Prev: Time Functions, Up: Built-in
-
-9.1.6 Bit-Manipulation Functions
---------------------------------
-
- I can explain it for you, but I can't understand it for you.
- Anonymous
-
- Many languages provide the ability to perform "bitwise" operations
-on two integers. Such an operation is performed separately on each
-corresponding pair of bits in the operands. Three common operations
-are bitwise AND, OR, and XOR. The operations are described in *note
-table-bitwise-ops::.
-
-                     Bit Operator
-          |  AND  |   OR  |  XOR
-          |---+---+---+---+---+---
- Operands | 0 | 1 | 0 | 1 | 0 | 1
- ---------+---+---+---+---+---+---
-     0    | 0   0 | 0   1 | 0   1
-     1    | 0   1 | 1   1 | 1   0
-
-Table 9.6: Bitwise Operations
-
- As you can see, the result of an AND operation is 1 only when _both_
-bits are 1. The result of an OR operation is 1 if _either_ bit is 1.
-The result of an XOR operation is 1 if either bit is 1, but not both.
-Another common operation is the "complement": the complement of 1 is 0,
-and the complement of 0 is 1. Thus, this operation "flips" all the bits
-of a given value.
-
- Finally, two other common operations are to shift the bits left or
-right. For example, if you have a bit string `10111001' and you shift
-it right by three bits, you end up with `00010111'.(1) If you start over
-again with `10111001' and shift it left by three bits, keeping only the
-low eight bits, you end up with `11001000'. `gawk' provides built-in
-functions that implement the bitwise operations just described. They
-are:
-
-`and(V1, V2)'
- Return the bitwise AND of the values provided by V1 and V2.
-
-`compl(VAL)'
- Return the bitwise complement of VAL.
-
-`lshift(VAL, COUNT)'
- Return the value of VAL, shifted left by COUNT bits.
-
-`or(V1, V2)'
- Return the bitwise OR of the values provided by V1 and V2.
-
-`rshift(VAL, COUNT)'
- Return the value of VAL, shifted right by COUNT bits.
-
-`xor(V1, V2)'
- Return the bitwise XOR of the values provided by V1 and V2.
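-
- The following sketch exercises these functions on small values, so
-the results are easy to check by hand. The variable names are
-arbitrary. `lshift()' does not discard bits beyond the eighth. To
-reproduce the 8-bit result `11001000' from the shift example above, you
-must therefore mask the result with 255 (binary `11111111').
-
- BEGIN {
-     a = 12                        # binary 1100
-     b = 10                        # binary 1010
-     print and(a, b)               # 1000, prints 8
-     print or(a, b)                # 1110, prints 14
-     print xor(a, b)               # 0110, prints 6
-
-     v = 185                       # binary 10111001
-     print rshift(v, 3)            # 00010111, prints 23
-     print lshift(v, 3)            # 10111001000, prints 1480
-     print and(lshift(v, 3), 255)  # keep low 8 bits: 11001000, prints 200
- }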
-
- For all of these functions, each argument's double-precision
-floating-point value is first converted to the widest C unsigned
-integer type, and then the bitwise operation is performed. If the
-result cannot be represented exactly as a C `double', leading nonzero
-bits are removed one by one until it can be represented exactly. The
-result is then converted back into a C `double'. (If you don't
-understand this paragraph, don't worry about it.)
-
- Here is a user-defined function (*note User-defined::) that
-illustrates the use of these functions:
-
- # bits2str --- turn a byte into readable 1's and 0's
-
- function bits2str(bits,        data, mask)
- {
-     if (bits == 0)
-         return "0"
-
-     mask = 1
-     for (; bits != 0; bits = rshift(bits, 1))
-         data = (and(bits, mask) ? "1" : "0") data
-
-     while ((length(data) % 8) != 0)
-         data = "0" data
-
-     return data
- }
-
- BEGIN {
-     printf "123 = %s\n", bits2str(123)
-     printf "0123 = %s\n", bits2str(0123)
-     printf "0x99 = %s\n", bits2str(0x99)
-     comp = compl(0x99)
-     printf "compl(0x99) = %#x = %s\n", comp, bits2str(comp)
-     shift = lshift(0x99, 2)
-     printf "lshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
-     shift = rshift(0x99, 2)
-     printf "rshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
- }
-
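-When run, this program produces output like the following. (The value
-printed for `compl(0x99)' depends on the width of the unsigned integer
-type that `gawk' uses, so it may differ on your system.)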
-
- $ gawk -f testbits.awk
- -| 123 = 01111011
- -| 0123 = 01010011
- -| 0x99 = 10011001
- -| compl(0x99) = 0xffffff66 = 11111111111111111111111101100110
- -| lshift(0x99, 2) = 0x264 = 0000001001100100
- -| rshift(0x99, 2) = 0x26 = 00100110
-
- The `bits2str()' function turns a binary number into a string of
-ones and zeros. The variable `mask' is set to 1, a binary value in
-which only the rightmost bit is set. Using this mask, the function
-repeatedly checks the rightmost bit: ANDing the mask with the value
-indicates whether that bit is 1. If so, a `"1"' is concatenated onto
-the front of the string; otherwise, a `"0"' is added. The value is then
-shifted right by one bit, and the loop continues until there are no
-more 1 bits.
-
- If the initial value is zero, the function returns a simple `"0"'.
-Otherwise, at the end, it pads the value on the left with zeros to a
-multiple of 8 bits, which matches the 8-bit bytes typical of modern
-computers.
-
- The main code in the `BEGIN' rule shows the difference between a
-decimal constant and an octal constant written with the same digits
-(`123' and `0123'), shows a hexadecimal constant (*note
-Nondecimal-numbers::), and then demonstrates the results of the
-`compl()', `lshift()', and `rshift()' functions.
-
- ---------- Footnotes ----------
-
- (1) This example shows that 0's come in on the left side. For
-`gawk', this is always true, but in some languages, it's possible to
-have the left side fill with 1's. Caveat emptor.
-
-
-File: gawk.info, Node: Type Functions, Next: I18N Functions, Prev: Bitwise Functions, Up: Built-in
-
-9.1.7 Getting Type Information
-------------------------------
-
-`gawk' provides a single function that lets you distinguish an array
-from a scalar variable. This is necessary for writing code that
-traverses every element of a true multidimensional array (*note Arrays
-of Arrays::).
-
-`isarray(X)'
- Return a true value if X is an array. Otherwise return false.
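-
- For example, the following sketch walks an array of arrays and prints
-every scalar element. It descends into a subelement whenever
-`isarray()' reports that the subelement is itself an array. The
-function name `walk_array()' is hypothetical. Because `for (i in ...)'
-visits elements in an unspecified order, the order of the output lines
-may vary.
-
- # walk_array --- print each scalar element of a (possibly
- #                multidimensional) array; a hypothetical helper
- function walk_array(arr, name,    i)
- {
-     for (i in arr) {
-         if (isarray(arr[i]))    # a subarray: recurse into it
-             walk_array(arr[i], name "[" i "]")
-         else
-             printf("%s[%s] = %s\n", name, i, arr[i])
-     }
- }
-
- BEGIN {
-     a[1] = "one"
-     a[2][1] = "two-one"     # a[2] is itself an array
-     a[2][2] = "two-two"
-     walk_array(a, "a")
- }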
-
-
-File: gawk.info, Node: I18N Functions, Prev: Type Functions, Up: Built-in
-
-9.1.8 String-Translation Functions
-----------------------------------
-
-`gawk' provides facilities for internationalizing `awk' programs.
-These include the functions described in the following list. The
-descriptions here are purposely brief. *Note Internationalization::,
-for the full story. Optional parameters are enclosed in square
-brackets ([ ]):
-
-`bindtextdomain(DIRECTORY [, DOMAIN])'
- Set the directory in which `gawk' will look for message
- translation files, in case they will not or cannot be placed in
- the "standard" locations (e.g., during testing). It returns the
- directory in which DOMAIN is "bound."
-
- The default DOMAIN is the value of `TEXTDOMAIN'. If DIRECTORY is
- the null string (`""'), then `bindtextdomain()' returns the
- current binding for the given DOMAIN.
-
-`dcgettext(STRING [, DOMAIN [, CATEGORY]])'
- Return the translation of STRING in text domain DOMAIN for locale
- category CATEGORY. The default value for DOMAIN is the current
- value of `TEXTDOMAIN'. The default value for CATEGORY is
- `"LC_MESSAGES"'.
-
-`dcngettext(STRING1, STRING2, NUMBER [, DOMAIN [, CATEGORY]])'
- Return the plural form used for NUMBER of the translation of
- STRING1 and STRING2 in text domain DOMAIN for locale category
- CATEGORY. STRING1 is the English singular variant of a message,
- and STRING2 the English plural variant of the same message. The
- default value for DOMAIN is the current value of `TEXTDOMAIN'.
- The default value for CATEGORY is `"LC_MESSAGES"'.
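-
- Here is a minimal sketch of the fallback behavior. It assumes that
-no message catalog is installed for the hypothetical text domain
-`myprog'. In that case there is no translation to find, so
-`dcgettext()' returns its argument unchanged. `dcngettext()' chooses
-between the English singular and plural forms according to NUMBER.
-
- BEGIN {
-     TEXTDOMAIN = "myprog"    # hypothetical domain with no catalog
-     print dcgettext("Hello, world")
-     for (n = 1; n <= 2; n++)
-         printf("%d %s\n", n, dcngettext("file", "files", n))
- }
-
-Under that assumption, the output is `Hello, world', then `1 file' and
-`2 files'.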
-
-
-File: gawk.info, Node: User-defined, Next: Indirect Calls, Prev: Built-in, Up: Functions
-
-9.2 User-Defined Functions
-==========================
-
-Complicated `awk' programs can often be simplified by defining your own
-functions. User-defined functions can be called just like built-in
-ones (*note Function Calls::), but it is up to you to define them,
-i.e., to tell `awk' what they should do.
-
-* Menu:
-
-* Definition Syntax:: How to write definitions and what they mean.
-* Function Example:: An example function definition and what it
- does.
-* Function Caveats:: Things to watch out for.
-* Return Statement:: Specifying the value a function returns.
-* Dynamic Typing:: How variable types can change at runtime.
-
-
-File: gawk.info, Node: Definition Syntax, Next: Function Example, Up: User-defined
-
-9.2.1 Function Definition Syntax
---------------------------------
-
-Definitions of functions can appear anywhere between the rules of an
-`awk' program. Thus, the general form of an `awk' program is extended
-to include sequences of rules _and_ user-defined function definitions.
-There is no need to put the definition of a function before all uses of
-the function. This is because `awk' reads the entire program before
-starting to execute any of it.
-
- The definition of a function named NAME looks like this:
-
- function NAME([PARAMETER-LIST])
- {
-     BODY-OF-FUNCTION
- }
-
-Here, NAME is the name of the function to define. A valid function
-name is like a valid variable name: a sequence of letters, digits, and
-underscores that doesn't start with a digit. Within a single `awk'
-program, any particular name can be used as only one of a variable, an
-array, or a function.
-
- PARAMETER-LIST is an optional list of the function's arguments and
-local variable names, separated by commas. When the function is called,
-the argument names are used to hold the argument values given in the
-call. The local variables are initialized to the empty string. A
-function cannot have two parameters with the same name, nor may it have
-a parameter with the same name as the function itself.
-
- In addition, according to the POSIX standard, function parameters
-cannot have the same name as one of the special built-in variables
-(*note Built-in Variables::). Not all versions of `awk' enforce this
-restriction.
-
- The BODY-OF-FUNCTION consists of `awk' statements. It is the most
-important part of the definition, because it says what the function
-should actually _do_. The argument names exist to give the body a way
-to talk about the arguments; local variables exist to give the body
-places to keep temporary values.
-
- Argument names are not distinguished syntactically from local
-variable names. Instead, the number of arguments supplied when the
-function is called determines how many argument variables there are.
-Thus, if three argument values are given, the first three names in
-PARAMETER-LIST are arguments and the rest are local variables.
-
- It follows that if the number of arguments is not the same in all
-calls to the function, some of the names in PARAMETER-LIST may be
-arguments on some occasions and local variables on others. Another way
-to think of this is that omitted arguments default to the null string.
-
- Usually when you write a function, you know how many names you
-intend to use for arguments and how many you intend to use as local
-variables. It is conventional to place some extra space between the
-arguments and the local variables, in order to document how your
-function is supposed to be used.
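-
- The following sketch makes this concrete. The hypothetical function
-`describe()' declares three parameters and uses the extra-space
-convention to mark `c' as a local variable. When it is called with two
-arguments, `a' and `b' receive values. When it is called with one, `b'
-is just another local variable that starts out as the null string.
-
- # describe --- report the parameter values (a hypothetical example)
- function describe(a, b,    c)
- {
-     return sprintf("a=<%s> b=<%s> c=<%s>", a, b, c)
- }
-
- BEGIN {
-     print describe(1, 2)    # prints "a=<1> b=<2> c=<>"
-     print describe(1)       # prints "a=<1> b=<> c=<>"
- }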
-
- During execution of the function body, the arguments and local
-variable values hide, or "shadow", any variables of the same names used
-in the rest of the program. The shadowed variables are not accessible
-in the function definition, because there is no way to name them while
-their names have been taken away for the local variables. All other
-variables used in the `awk' program can be referenced or set normally
-in the function's body.
-
- The arguments and local variables last only as long as the function
-body is executing. Once the body finishes, you can once again access
-the variables that were shadowed while the function was running.
-
- The function body can contain expressions that call functions. They
-can even call this function, either directly or by way of another
-function. When this happens, we say the function is "recursive". The
-act of a function calling itself is called "recursion".
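-
- As a small sketch of recursion, here is a hypothetical `fact()'
-function. It computes a factorial by calling itself with a smaller
-argument. The test `n <= 1' stops the recursion.
-
- # fact --- compute n! recursively (a hypothetical example)
- function fact(n)
- {
-     if (n <= 1)              # base case: stops the recursion
-         return 1
-     return n * fact(n - 1)   # recursive case
- }
-
- BEGIN {
-     print fact(5)            # prints 120
- }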
-
- All the built-in functions return a value to their caller.
-User-defined functions can do so as well, using the `return' statement, which
-is described in detail in *note Return Statement::. Many of the
-subsequent examples in this minor node use the `return' statement.
-
- In many `awk' implementations, including `gawk', the keyword
-`function' may be abbreviated `func'. (c.e.) However, POSIX only
-specifies the use of the keyword `function'. This actually has some
-practical implications. If `gawk' is in POSIX-compatibility mode
-(*note Options::), then the following statement does _not_ define a
-function:
-
- func foo() { a = sqrt($1) ; print a }
-
-Instead, it defines a rule that, for each record, concatenates the value
-of the variable `func' with the return value of the function `foo'. If
-the resulting string is non-null, the action is executed. This is
-probably not what is desired. (`awk' accepts this input as
-syntactically valid, because functions may be used before they are
-defined in `awk' programs.(1))
-
- To ensure that your `awk' programs are portable, always use the
-keyword `function' when defining a function.
-
- ---------- Footnotes ----------
-
- (1) This program won't actually run, since `foo()' is undefined.
-
-
-File: gawk.info, Node: Function Example, Next: Function Caveats, Prev: Definition Syntax, Up: User-defined
-
-9.2.2 Function Definition Examples
-----------------------------------
-
-Here is an example of a user-defined function, called `myprint()', that
-takes a number and prints it in a specific format:
-
- function myprint(num)
- {
-     printf "%6.3g\n", num
- }
-
-To illustrate, here is an `awk' rule that uses our `myprint' function:
-
- $3 > 0 { myprint($3) }
-
-This program prints, in our special format, all the third fields that
-contain a positive number in our input. Therefore, when given the
-following input:
-
- 1.2 3.4 5.6 7.8
- 9.10 11.12 -13.14 15.16
- 17.18 19.20 21.22 23.24
-
-this program, using our function to format the results, prints:
-
-    5.6
-   21.2
-
- This function deletes all the elements in an array:
-
- function delarray(a,    i)
- {
-     for (i in a)
-         delete a[i]
- }
-
- When working with arrays, it is often necessary to delete all the
-elements in an array and start over with a new list of elements (*note
-Delete::). Instead of having to repeat this loop everywhere that you
-need to clear out an array, your program can just call `delarray'.
-(This guarantees portability. The use of `delete ARRAY' to delete the
-contents of an entire array is a nonstandard extension.)
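-
- Another commonly used portable way to clear an array relies on
-`split()'. POSIX requires `split()' to delete all existing elements of
-its target array before it stores the new pieces, so splitting the
-empty string leaves the array empty. Here is a minimal sketch (the
-array and variable names are arbitrary):
-
- BEGIN {
-     n = split("red green blue", colors)   # colors has 3 elements
-     split("", colors)                     # removes all of them
-     count = 0
-     for (k in colors)
-         count++
-     print n, count                        # prints "3 0"
- }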
-
- The following is an example of a recursive function. It takes a
-string as an input parameter and returns the string in backwards order.
-Recursive functions must always have a test that stops the recursion.
-In this case, the recursion terminates when the starting position is
-zero, i.e., when there are no more characters left in the string.
-
- function rev(str, start)
- {
-     if (start == 0)
-         return ""
-
-     return (substr(str, start, 1) rev(str, start - 1))
- }
-
- If this function is in a file named `rev.awk', it can be tested this
-way:
-
- $ echo "Don't Panic!" |
- > gawk --source '{ print rev($0, length($0)) }' -f rev.awk
- -| !cinaP t'noD
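-
- Because `rev()' makes one nested call per character, very long
-strings produce deep recursion. The following sketch shows an
-iterative equivalent. The name `rev_iter()' is hypothetical. It takes
-only the string and computes the length itself.
-
- # rev_iter --- iterative string reversal (a hypothetical example)
- function rev_iter(str,    i, out)
- {
-     out = ""
-     for (i = length(str); i > 0; i--)   # walk from the last character
-         out = out substr(str, i, 1)
-     return out
- }
-
- BEGIN {
-     print rev_iter("Don't Panic!")      # prints "!cinaP t'noD"
- }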
-
- The C `ctime()' function takes a timestamp and returns it in a
-string, formatted in a well-known fashion. The following example uses
-the built-in `strftime()' function (*note Time Functions::) to create
-an `awk' version of `ctime()':
-
- # ctime.awk
- #
- # awk version of C ctime(3) function
-
- function ctime(ts,    format)
- {
-     format = "%a %b %e %H:%M:%S %Z %Y"
-     if (ts == 0)
-         ts = systime()       # use current time as default
-     return strftime(format, ts)
- }
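-
- Because of the `ts == 0' test, `ctime(0)' formats the current time
-rather than the epoch. The following sketch is a hypothetical variant,
-`gmctime()', that treats only an omitted argument as "now" and formats
-the result in UTC. It assumes that `strftime()' accepts the optional
-third "utc-flag" argument, and it writes `UTC' literally instead of
-using `%Z'.
-
- # gmctime --- like ctime(), but in UTC (a hypothetical variant)
- function gmctime(ts,    format)
- {
-     format = "%a %b %e %H:%M:%S UTC %Y"
-     if (ts == "")                     # true only if no argument given
-         ts = systime()
-     return strftime(format, ts, 1)    # utc-flag: format in UTC
- }
-
- BEGIN {
-     print gmctime(0)      # the epoch, not the current time
- }
-
-In the `"C"' locale, this should print `Thu Jan  1 00:00:00 UTC 1970'.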
-
-
-File: gawk.info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined
-
-9.2.3 Calling User-Defined Functions
-------------------------------------
-
-This section describes how to call a user-defined function.
-
-* Menu:
-
-* Calling A Function:: Don't use spaces.
-* Variable Scope:: Controlling variable scope.
-* Pass By Value/Reference:: Passing parameters.
-
-
-File: gawk.info, Node: Calling A Function, Next: Variable Scope, Up: Function Caveats
-
-9.2.3.1 Writing A Function Call
-...............................
-
-"Calling a function" means causing the function to run and do its job.
-A function call is an expression and its value is the value returned by
-the function.
-
- A function call consists of the function name followed by the
-arguments in parentheses. `awk' expressions are what you write in the
-call for the arguments. Each time the call is executed, these
-expressions are evaluated, and the values become the actual arguments.
-For example, here is a call to `foo()' with three arguments (the first
-being a string concatenation):
-
- foo(x y, "lose", 4 * z)
-
- CAUTION: Whitespace characters (spaces and TABs) are not allowed
- between the function name and the open-parenthesis of the argument
- list. If you write whitespace by mistake, `awk' might think that
- you mean to concatenate a variable with an expression in
- parentheses. However, it notices that you used a function name
- and not a variable name, and reports an error.
-
-
-File: gawk.info, Node: Variable Scope, Next: Pass By Value/Reference, Prev: Calling A Function, Up: Function Caveats
-
-9.2.3.2 Controlling Variable Scope
-..................................
-
-There is no way to make a variable local to a `{ ... }' block in `awk',
-but you can make a variable local to a function. It is good practice to
-do so whenever a variable is needed only in that function.
-
- To make a variable local to a function, simply declare the variable
-as an argument after the actual function arguments (*note Definition
-Syntax::). Look at the following example where variable `i' is a
-global variable used by both functions `foo()' and `bar()':
-
- function bar()
- {
-     for (i = 0; i < 3; i++)
-         print "bar's i=" i
- }
-
- function foo(j)
- {
-     i = j + 1
-     print "foo's i=" i
-     bar()
-     print "foo's i=" i
- }
-
- BEGIN {
-     i = 10
-     print "top's i=" i
-     foo(0)
-     print "top's i=" i
- }
-
- Running this script produces the following, because the `i' in
-functions `foo()' and `bar()' and at the top level refer to the same
-variable instance:
-
- top's i=10
- foo's i=1
- bar's i=0
- bar's i=1
- bar's i=2
- foo's i=3
- top's i=3
-
- If you want `i' to be local to both `foo()' and `bar()', do as
-follows (the extra space before `i' is a coding convention to indicate
-that `i' is a local variable, not an argument):
-
- function bar(    i)
- {
-     for (i = 0; i < 3; i++)
-         print "bar's i=" i
- }
-
- function foo(j,    i)
- {
-     i = j + 1
-     print "foo's i=" i
-     bar()
-     print "foo's i=" i
- }
-
- BEGIN {
-     i = 10
-     print "top's i=" i
-     foo(0)
-     print "top's i=" i
- }
-
- Running the corrected script produces the following:
-
- top's i=10
- foo's i=1
- bar's i=0
- bar's i=1
- bar's i=2
- foo's i=1
- top's i=10
-
-
-File: gawk.info, Node: Pass By Value/Reference, Prev: Variable Scope, Up: Function Caveats
-
-9.2.3.3 Passing Function Arguments By Value Or By Reference
-...........................................................
-
-In `awk', when you declare a function, there is no way to declare
-explicitly whether the arguments are passed "by value" or "by
-reference".
-
- Instead, the passing convention is determined at runtime when the
-function is called, according to the following rule:
-
- * If the argument is an array variable, then it is passed by
- reference.
-
- * Otherwise, the argument is passed by value.
-
- Passing an argument by value means that when a function is called, it
-is given a _copy_ of the value of this argument. The caller may use a
-variable as the expression for the argument, but the called function
-does not know this--it only knows what value the argument had. For
-example, if you write the following code:
-
- foo = "bar"
- z = myfunc(foo)
-
-then you should not think of the argument to `myfunc()' as being "the
-variable `foo'." Instead, think of the argument as the string value
-`"bar"'. If the function `myfunc()' alters the values of its local
-variables, this has no effect on any other variables. Thus, if
-`myfunc()' does this:
-
- function myfunc(str)
- {
-     print str
-     str = "zzz"
-     print str
- }
-
-to change its first argument variable `str', it does _not_ change the
-value of `foo' in the caller. The role of `foo' in calling `myfunc()'
-ended when its value (`"bar"') was computed. If `str' also exists
-outside of `myfunc()', the function body cannot alter this outer value,
-because it is shadowed during the execution of `myfunc()' and cannot be
-seen or changed from there.
-
- However, when arrays are the parameters to functions, they are _not_
-copied. Instead, the array itself is made available for direct
-manipulation by the function. This is usually termed "call by
-reference". Changes made to an array parameter inside the body of a
-function _are_ visible outside that function.
-
- NOTE: Changing an array parameter inside a function can be very
- dangerous if you do not watch what you are doing. For example:
-
- function changeit(array, ind, nvalue)
- {
-     array[ind] = nvalue
- }
-
- BEGIN {
-     a[1] = 1; a[2] = 2; a[3] = 3
-     changeit(a, 2, "two")
-     printf "a[1] = %s, a[2] = %s, a[3] = %s\n",
-             a[1], a[2], a[3]
- }
-
- prints `a[1] = 1, a[2] = two, a[3] = 3', because `changeit' stores
- `"two"' in the second element of `a'.
-
- Some `awk' implementations allow you to call a function that has not
-been defined. They only report a problem at runtime when the program
-actually tries to call the function. For example:
-
- BEGIN {
-     if (0)
-         foo()
-     else
-         bar()
- }
- function bar() { ... }
- # note that `foo' is not defined
-
-Because the `if' statement will never be true, it is not really a
-problem that `foo()' has not been defined. Usually, though, it is a
-problem if a program calls an undefined function.
-
- If `--lint' is specified (*note Options::), `gawk' reports calls to
-undefined functions.
-
- Some `awk' implementations generate a runtime error if you use the
-`next' statement (*note Next Statement::) inside a user-defined
-function. `gawk' does not have this limitation.
-
-
-File: gawk.info, Node: Return Statement, Next: Dynamic Typing, Prev: Function Caveats, Up: User-defined
-
-9.2.4 The `return' Statement
-----------------------------
-
-As seen in several earlier examples, the body of a user-defined
-function can contain a `return' statement. This statement returns
-control to the calling part of the `awk' program. It can also be used
-to return a value for use in the rest of the `awk' program. It looks
-like this:
-
- return [EXPRESSION]
-
- The EXPRESSION part is optional. Due most likely to an oversight,
-POSIX does not define what the return value is if you omit the
-EXPRESSION. Technically speaking, this makes the returned value
-undefined, and therefore, unpredictable. In practice, though, all
-versions of `awk' simply return the null string, which acts like zero
-if used in a numeric context.
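-
- Here is a minimal sketch of that behavior in practice. The function
-name `no_value()' is hypothetical. Its `return' statement has no
-expression, so the caller receives the null string, which compares
-equal to `""' and acts like zero in arithmetic. (This is how
-implementations behave, not something POSIX guarantees.)
-
- # no_value --- return without a value (a hypothetical example)
- function no_value()
- {
-     return
- }
-
- BEGIN {
-     x = no_value()
-     print (x == "")    # prints 1: x is the null string
-     print x + 5        # prints 5: the null string acts like zero
- }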
-
- A `return' statement with no value expression is assumed at the end
-of every function definition. So if control reaches the end of the
-function body, then technically, the function returns an unpredictable
-value. In practice, it returns the empty string. `awk' does _not_
-warn you if you use the return value of such a function.
-
- Sometimes, you want to write a function for what it does, not for
-what it returns. Such a function corresponds to a `void' function in
-C, C++ or Java, or to a `procedure' in Ada. Thus, it may be
-appropriate to not return any value; simply bear in mind that you
-should not be using the return value of such a function.
-
- The following is an example of a user-defined function that returns
-the largest number among the elements of an array:
-
- function maxelt(vec,   i, ret)
- {
-     for (i in vec) {
-         if (ret == "" || vec[i] > ret)
-             ret = vec[i]
-     }
-     return ret
- }
-
-You call `maxelt()' with one argument, which is an array name. The
-local variables `i' and `ret' are not intended to be arguments; while
-there is nothing to stop you from passing more than one argument to
-`maxelt()', the results would be strange. The extra space before `i'
-in the function parameter list indicates that `i' and `ret' are local
-variables. You should follow this convention when defining functions.
-
- The following program uses the `maxelt()' function. It loads an
-array, calls `maxelt()', and then reports the maximum number in that
-array:
-
- function maxelt(vec,   i, ret)
- {
-     for (i in vec) {
-         if (ret == "" || vec[i] > ret)
-             ret = vec[i]
-     }
-     return ret
- }
-
- # Load all fields of each record into nums.
- {
-     for (i = 1; i <= NF; i++)
-         nums[NR, i] = $i
- }
-
- END {
-     print maxelt(nums)
- }
-
- Given the following input:
-
- 1 5 23 8 16
- 44 3 5 2 8 26
- 256 291 1396 2962 100
- -6 467 998 1101
- 99385 11 0 225
-
-the program reports (predictably) that 99,385 is the largest value in
-the array.
-
-
-File: gawk.info, Node: Dynamic Typing, Prev: Return Statement, Up: User-defined
-
-9.2.5 Functions and Their Effects on Variable Typing
-----------------------------------------------------
-
-`awk' is a very fluid language. It is possible that `awk' can't tell
-if an identifier represents a scalar variable or an array until runtime.
-Here is an annotated sample program:
-
- function foo(a)
- {
-     a[1] = 1     # parameter is an array
- }
-
- BEGIN {
-     b = 1
-     foo(b)       # invalid: fatal type mismatch
-
-     foo(x)       # x uninitialized, becomes an array dynamically
-     x = 1        # now not allowed, runtime error
- }
-
- Usually, such things aren't a big issue, but it's worth being aware
-of them.
-
-
-File: gawk.info, Node: Indirect Calls, Prev: User-defined, Up: Functions
-
-9.3 Indirect Function Calls
-===========================
-
-This section describes a `gawk'-specific extension.
-
- Often, you may wish to defer the choice of function to call until
-runtime. For example, you may have different kinds of records, each of
-which should be processed differently.
-
- Normally, you would have to use a series of `if'-`else' statements
-to decide which function to call. By using "indirect" function calls,
-you can specify the name of the function to call as a string variable,
-and then call the function. Let's look at an example.
-
- Suppose you have a file with your test scores for the classes you
-are taking. The first field is the class name. The following fields
-are the functions to call to process the data, up to a "marker" field
-`data:'. Following the marker, to the end of the record, are the
-various numeric test scores.
-
- Here is the initial file; you wish to get the sum and the average of
-your test scores:
-
- Biology_101 sum average data: 87.0 92.4 78.5 94.9
- Chemistry_305 sum average data: 75.2 98.3 94.7 88.2
- English_401 sum average data: 100.0 95.6 87.1 93.4
-
- To process the data, you might write initially:
-
- {
-     class = $1
-     for (i = 2; $i != "data:"; i++) {
-         if ($i == "sum")
-             sum()   # processes the whole record
-         else if ($i == "average")
-             average()
-         ...           # and so on
-     }
- }
-
-This style of programming works, but can be awkward. With "indirect"
-function calls, you tell `gawk' to use the _value_ of a variable as the
-name of the function to call.
-
- The syntax is similar to that of a regular function call: an
-identifier immediately followed by a left parenthesis, any arguments,
-and then a closing right parenthesis, with the addition of a leading `@'
-character:
-
- the_func = "sum"
- result = @the_func() # calls the `sum' function
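-
- Before the full program, here is a minimal sketch of dispatching
-through a string variable. The functions `greet()' and `shout()' are
-hypothetical. The loop stores each function name in `f' and calls the
-function with `@f(...)'. Only user-defined functions are called
-indirectly here.
-
- function greet(who) { return "Hello, " who }
- function shout(who) { return toupper(who) "!" }
-
- BEGIN {
-     n = split("greet shout", handlers)
-     for (i = 1; i <= n; i++) {
-         f = handlers[i]
-         print @f("world")    # calls greet(), then shout()
-     }
- }
-
-This prints `Hello, world' followed by `WORLD!'.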
-
- Here is a full program that processes the previously shown data,
-using indirect function calls.
-
- # indirectcall.awk --- Demonstrate indirect function calls
-
- # average --- return the average of the values in fields $first - $last
-
- function average(first, last,   total, i)
- {
-     total = 0
-     for (i = first; i <= last; i++)
-         total += $i
-
-     return total / (last - first + 1)
- }
-
- # sum --- return the sum of the values in fields $first - $last
-
- function sum(first, last,   ret, i)
- {
-     ret = 0
-     for (i = first; i <= last; i++)
-         ret += $i
-
-     return ret
- }
-
- These two functions expect to work on fields; thus the parameters
-`first' and `last' indicate where in the fields to start and end.
-Otherwise they perform the expected computations and are not unusual.
-
- # For each record, print the class name and the requested statistics
-
- {
-     class_name = $1
-     gsub(/_/, " ", class_name)  # Replace _ with spaces
-
-     # find start
-     for (i = 1; i <= NF; i++) {
-         if ($i == "data:") {
-             start = i + 1
-             break
-         }
-     }
-
-     printf("%s:\n", class_name)
-     for (i = 2; $i != "data:"; i++) {
-         the_function = $i
-         printf("\t%s: <%s>\n", $i, @the_function(start, NF) "")
-     }
-     print ""
- }
-
- This is the main processing for each record. It prints the class
-name (with underscores replaced with spaces). It then finds the start
-of the actual data, saving it in `start'. The last part of the code
-loops through each function name (from `$2' up to the marker, `data:'),
-calling the function named by the field. The indirect function call
-itself occurs as a parameter in the call to `printf'. (The `printf'
-format string uses `%s' as the format specifier so that we can use
-functions that return strings, as well as numbers. Note that the result
-from the indirect call is concatenated with the empty string, in order
-to force it to be a string value.)
-
- Here is the result of running the program:
-
- $ gawk -f indirectcall.awk class_data1
- -| Biology 101:
- -|     sum: <352.8>
- -|     average: <88.2>
- -|
- -| Chemistry 305:
- -|     sum: <356.4>
- -|     average: <89.1>
- -|
- -| English 401:
- -|     sum: <376.1>
- -|     average: <94.025>
-
- The ability to use indirect function calls is more powerful than you
-may think at first. The C and C++ languages provide "function
-pointers," which are a mechanism for calling a function chosen at
-runtime. One of the most well-known uses of this ability is the C
-`qsort()' function, which sorts an array using the famous "quick sort"
-algorithm (see the Wikipedia article
-(http://en.wikipedia.org/wiki/Quick_sort) for more information). To
-use this function, you supply a pointer to a comparison function. This
-mechanism allows you to sort arbitrary data in an arbitrary fashion.
-
- We can do something similar using `gawk', like this:
-
- # quicksort.awk --- Quicksort algorithm, with user-supplied
- #                   comparison function
-
- # quicksort --- C.A.R. Hoare's quick sort algorithm. See Wikipedia
- #               or almost any algorithms or computer science text
-
- function quicksort(data, left, right, less_than,    i, last)
- {
-     if (left >= right)  # do nothing if array contains fewer
-         return          # than two elements
-
-     quicksort_swap(data, left, int((left + right) / 2))
-     last = left
-     for (i = left + 1; i <= right; i++)
-         if (@less_than(data[i], data[left]))
-             quicksort_swap(data, ++last, i)
-     quicksort_swap(data, left, last)
-     quicksort(data, left, last - 1, less_than)
-     quicksort(data, last + 1, right, less_than)
- }
-
- # quicksort_swap --- helper function for quicksort, should really be inline
-
- function quicksort_swap(data, i, j,      temp)
- {
-     temp = data[i]
-     data[i] = data[j]
-     data[j] = temp
- }
-
- The `quicksort()' function receives the `data' array, the starting
-and ending indices to sort (`left' and `right'), and the name of a
-function that performs a "less than" comparison. It then implements
-the quick sort algorithm.
-
- To make use of the sorting function, we return to our previous
-example. The first thing to do is write some comparison functions:
-
- # num_lt --- do a numeric less than comparison
-
- function num_lt(left, right)
- {
-     return ((left + 0) < (right + 0))
- }
-
- # num_ge --- do a numeric greater than or equal to comparison
-
- function num_ge(left, right)
- {
-     return ((left + 0) >= (right + 0))
- }
-
- The `num_ge()' function is needed to perform a descending sort; when
-used to perform a "less than" test, it actually does the opposite
-(greater than or equal to), which yields data sorted in descending
-order.
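For readers coming from C, the same algorithm can be sketched with a real function pointer in place of the indirect call. This is a hypothetical port for illustration only, not part of `gawk'. It sorts an array of `double' values, and the names `less_than_fn', `num_lt()' and `num_ge()' were chosen to mirror the `awk' code above.

```c
/* Pointer to a "less than" predicate; it plays the role of the
   function name held in `less_than' in the awk version. */
typedef int (*less_than_fn)(double, double);

static void quicksort_swap(double *data, int i, int j)
{
    double temp = data[i];

    data[i] = data[j];
    data[j] = temp;
}

/* Same scheme as the awk quicksort(): move the middle element to
   `left', partition the rest around it, then sort each side. */
void quicksort(double *data, int left, int right, less_than_fn less_than)
{
    int i, last;

    if (left >= right)          /* fewer than two elements */
        return;

    quicksort_swap(data, left, (left + right) / 2);
    last = left;
    for (i = left + 1; i <= right; i++)
        if (less_than(data[i], data[left]))
            quicksort_swap(data, ++last, i);
    quicksort_swap(data, left, last);
    quicksort(data, left, last - 1, less_than);
    quicksort(data, last + 1, right, less_than);
}

/* C counterparts of num_lt() and num_ge(). */
int num_lt(double left, double right) { return left < right; }
int num_ge(double left, double right) { return left >= right; }
```

Calling `quicksort(v, 0, n - 1, num_ge)' sorts in descending order. The reason is the same as in the `awk' version: the "less than" test is reversed.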
-
- Next comes a sorting function. It is parameterized with the
-starting and ending field numbers and the comparison function. It
-builds an array with the data and calls `quicksort' appropriately, and
-then formats the results as a single string:
-
- # do_sort --- sort the data according to `compare'
- # and return it as a string
-
- function do_sort(first, last, compare,      data, i, retval)
- {
-     delete data
-     for (i = 1; first <= last; first++) {
-         data[i] = $first
-         i++
-     }
-
-     quicksort(data, 1, i-1, compare)
-
-     retval = data[1]
-     for (i = 2; i in data; i++)
-         retval = retval " " data[i]
-
-     return retval
- }
-
- Finally, the two sorting functions call `do_sort()', passing in the
-names of the two comparison functions:
-
- # sort --- sort the data in ascending order and return it as a string
-
- function sort(first, last)
- {
-     return do_sort(first, last, "num_lt")
- }
-
- # rsort --- sort the data in descending order and return it as a string
-
- function rsort(first, last)
- {
-     return do_sort(first, last, "num_ge")
- }
-
- Here is an extended version of the data file:
-
- Biology_101 sum average sort rsort data: 87.0 92.4 78.5 94.9
- Chemistry_305 sum average sort rsort data: 75.2 98.3 94.7 88.2
- English_401 sum average sort rsort data: 100.0 95.6 87.1 93.4
-
- Finally, here are the results when the enhanced program is run:
-
- $ gawk -f quicksort.awk -f indirectcall.awk class_data2
- -| Biology 101:
- -| sum: <352.8>
- -| average: <88.2>
- -| sort: <78.5 87.0 92.4 94.9>
- -| rsort: <94.9 92.4 87.0 78.5>
- -|
- -| Chemistry 305:
- -| sum: <356.4>
- -| average: <89.1>
- -| sort: <75.2 88.2 94.7 98.3>
- -| rsort: <98.3 94.7 88.2 75.2>
- -|
- -| English 401:
- -| sum: <376.1>
- -| average: <94.025>
- -| sort: <87.1 93.4 95.6 100.0>
- -| rsort: <100.0 95.6 93.4 87.1>
-
- Remember that you must supply a leading `@' in front of an indirect
-function call.
-
- Unfortunately, indirect function calls cannot be used with the
-built-in functions. However, you can generally write "wrapper"
-functions that call the built-in ones, and those can be called
-indirectly. (Other than, perhaps, the mathematical functions, there is
-not a lot of reason to try to call the built-in functions indirectly.)
-
- `gawk' does its best to make indirect function calls efficient. For
-example, in the following case:
-
- for (i = 1; i <= n; i++)
-     @the_func()
-
-`gawk' will look up the actual function to call only once.
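One way to picture this is the C sketch below. It resolves a function name to a function pointer once, before the loop, and then calls through the pointer on every iteration. The lookup table and the names `lookup_stat()' and `apply_to_rows()' are hypothetical. The sketch illustrates the idea only and does not describe how `gawk' is implemented internally.

```c
#include <stddef.h>
#include <string.h>

typedef double (*stat_fn)(const double *, size_t);

static double stat_sum(const double *v, size_t n)
{
    double s = 0;

    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

static double stat_average(const double *v, size_t n)
{
    return n ? stat_sum(v, n) / n : 0;
}

/* Hypothetical name-to-function table. */
static const struct { const char *name; stat_fn fn; } stat_table[] = {
    { "sum", stat_sum },
    { "average", stat_average },
};

/* Resolve a name to a function pointer; NULL if the name is unknown. */
stat_fn lookup_stat(const char *name)
{
    for (size_t i = 0; i < sizeof stat_table / sizeof stat_table[0]; i++)
        if (strcmp(stat_table[i].name, name) == 0)
            return stat_table[i].fn;
    return NULL;
}

/* Apply the named function to each of `nrows' rows of `ncols' values.
   The name is looked up once; the loop only calls through `fn'. */
int apply_to_rows(const char *name, const double *rows, size_t nrows,
                  size_t ncols, double *out)
{
    stat_fn fn = lookup_stat(name);

    if (fn == NULL)
        return -1;
    for (size_t r = 0; r < nrows; r++)
        out[r] = fn(rows + r * ncols, ncols);
    return 0;
}
```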
-
-
-File: gawk.info, Node: Internationalization, Next: Arbitrary Precision Arithmetic, Prev: Functions, Up: Top
-
-10 Internationalization with `gawk'
-***********************************
-
-Once upon a time, computer makers wrote software that worked only in
-English. Eventually, hardware and software vendors noticed that if
-their systems worked in the native languages of non-English-speaking
-countries, they were able to sell more systems. As a result,
-internationalization and localization of programs and software systems
-became a common practice.
-
- For many years, the ability to provide internationalization was
-largely restricted to programs written in C and C++. This major node
-describes the underlying library `gawk' uses for internationalization,
-as well as how `gawk' makes internationalization features available at
-the `awk' program level. Having internationalization available at the
-`awk' level gives software developers additional flexibility--they are
-no longer forced to write in C or C++ when internationalization is a
-requirement.
-
-* Menu:
-
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU `gettext' works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: `gawk' is also internationalized.
-
-
-File: gawk.info, Node: I18N and L10N, Next: Explaining gettext, Up: Internationalization
-
-10.1 Internationalization and Localization
-==========================================
-
-"Internationalization" means writing (or modifying) a program once, in
-such a way that it can use multiple languages without requiring further
-source-code changes. "Localization" means providing the data necessary
-for an internationalized program to work in a particular language.
-Most typically, these terms refer to features such as the language used
-for printing error messages, the language used to read responses, and
-information related to how numerical and monetary values are printed
-and read.
-
-
-File: gawk.info, Node: Explaining gettext, Next: Programmer i18n, Prev: I18N and L10N, Up: Internationalization
-
-10.2 GNU `gettext'
-==================
-
-The facilities in GNU `gettext' focus on messages: strings printed by a
-program, either directly or via formatting with `printf' or
-`sprintf()'.(1)
-
- When using GNU `gettext', each application has its own "text
-domain". This is a unique name, such as `kpilot' or `gawk', that
-identifies the application. A complete application may have multiple
-components--programs written in C or C++, as well as scripts written in
-`sh' or `awk'. All of the components use the same text domain.
-
- To make the discussion concrete, assume we're writing an application
-named `guide'. Internationalization consists of the following steps,
-in this order:
-
- 1. The programmer goes through the source for all of `guide''s
- components and marks each string that is a candidate for
- translation. For example, `"`-F': option required"' is a good
- candidate for translation. A table with strings of option names
- is not (e.g., `gawk''s `--profile' option should remain the same,
- no matter what the local language).
-
- 2. The programmer indicates the application's text domain (`"guide"')
- to the `gettext' library, by calling the `textdomain()' function.
-
- 3. Messages from the application are extracted from the source code
- and collected into a portable object template file (`guide.pot'),
- which lists the strings and their translations. The translations
- are initially empty. The original (usually English) messages
- serve as the key for lookup of the translations.
-
- 4. For each language with a translator, `guide.pot' is copied to a
- portable object file (`.po') and translations are created and
- shipped with the application. For example, there might be a
- `fr.po' for a French translation.
-
- 5. Each language's `.po' file is converted into a binary message
- object (`.mo') file. A message object file contains the original
- messages and their translations in a binary format that allows
- fast lookup of translations at runtime.
-
- 6. When `guide' is built and installed, the binary translation files
- are installed in a standard place.
-
- 7. For testing and development, it is possible to tell `gettext' to
- use `.mo' files in a different directory than the standard one by
- using the `bindtextdomain()' function.
-
- 8. At runtime, `guide' looks up each string via a call to
- `gettext()'. The returned string is the translated string if
- available, or the original string if not.
-
- 9. If necessary, it is possible to access messages from a different
- text domain than the one belonging to the application, without
- having to switch the application's default text domain back and
- forth.
-
- In C (or C++), the string marking and dynamic translation lookup are
-accomplished by wrapping each string in a call to `gettext()':
-
- printf("%s", gettext("Don't Panic!\n"));
-
- The tools that extract messages from source code pull out all
-strings enclosed in calls to `gettext()'.
-
- The GNU `gettext' developers, recognizing that typing `gettext(...)'
-over and over again is both painful and ugly to look at, use the macro
-`_' (an underscore) to make things easier:
-
- /* In the standard header file: */
- #define _(str) gettext(str)
-
- /* In the program text: */
- printf("%s", _("Don't Panic!\n"));
-
-This reduces the typing overhead to just three extra characters per
-string and is considerably easier to read as well.
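Putting the steps and the `_' macro together, a minimal C sketch might look like the following. It assumes a system where `libintl.h' is available (on GNU/Linux it is part of the C library). The domain name `guide', the directory `./locale', and the function names are hypothetical. When no `guide.mo' is found, `gettext()' returns the original string.

```c
#include <libintl.h>
#include <locale.h>

/* The conventional shorthand described above. */
#define _(str) gettext(str)

/* Steps 2, 7 and 8: pick up the user's locale, optionally point
   gettext at a private directory, and set the default text domain. */
void guide_init_i18n(void)
{
    setlocale(LC_ALL, "");               /* honor the user's locale */
    bindtextdomain("guide", "./locale"); /* where guide.mo would live */
    textdomain("guide");                 /* default domain for gettext() */
}

/* Returns the translation if one is installed, else the original. */
const char *guide_greeting(void)
{
    return _("Don't Panic!");
}
```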
-
- There are locale "categories" for different types of locale-related
-information. The defined locale categories that `gettext' knows about
-are:
-
-`LC_MESSAGES'
- Text messages. This is the default category for `gettext'
- operations, but it is possible to supply a different one
- explicitly, if necessary. (It is almost never necessary to supply
- a different category.)
-
-`LC_COLLATE'
- Text-collation information; i.e., how different characters and/or
- groups of characters sort in a given language.
-
-`LC_CTYPE'
- Character-type information (alphabetic, digit, upper- or
- lowercase, and so on). This information is accessed via the POSIX
- character classes in regular expressions, such as `/[[:alnum:]]/'
- (*note Regexp Operators::).
-
-`LC_MONETARY'
- Monetary information, such as the currency symbol, and whether the
- symbol goes before or after a number.
-
-`LC_NUMERIC'
- Numeric information, such as which characters to use for the
- decimal point and the thousands separator.(2)
-
-`LC_RESPONSE'
- Response information, such as how "yes" and "no" appear in the
- local language, and possibly other information as well.
-
-`LC_TIME'
- Time- and date-related information, such as 12- or 24-hour clock,
- month printed before or after the day in a date, local month
- abbreviations, and so on.
-
-`LC_ALL'
- All of the above. (Not too useful in the context of `gettext'.)
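From C, the information in a category can be queried with `setlocale()' and `localeconv()'. The sketch below reads the `LC_NUMERIC' decimal-point string for a named locale. `decimal_point_for()' is a hypothetical helper. Note that it changes the program's current `LC_NUMERIC' setting as a side effect.

```c
#include <locale.h>

/* Switch LC_NUMERIC to `locale_name' and return its decimal-point
   string, or NULL if that locale is not available on this system. */
const char *decimal_point_for(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}
```

For the `"C"' locale the result is `"."'. Where a German locale such as `de_DE.UTF-8' is installed, the result would typically be `","'.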
-
- ---------- Footnotes ----------
-
- (1) For some operating systems, the `gawk' port doesn't support GNU
-`gettext'. Therefore, these features are not available if you are
-using one of those operating systems. Sorry.
-
- (2) Americans use a comma every three digits and a period for the
-decimal point, while many Europeans do exactly the opposite:
-1,234.56 versus 1.234,56.
-
-
-File: gawk.info, Node: Programmer i18n, Next: Translator i18n, Prev: Explaining gettext, Up: Internationalization
-
-10.3 Internationalizing `awk' Programs
-======================================
-
-`gawk' provides the following variables and functions for
-internationalization:
-
-`TEXTDOMAIN'
- This variable indicates the application's text domain. For
- compatibility with GNU `gettext', the default value is
- `"messages"'.
-
-`_"your message here"'
- String constants marked with a leading underscore are candidates
- for translation at runtime. String constants without a leading
- underscore are not translated.
-
-`dcgettext(STRING [, DOMAIN [, CATEGORY]])'
- Return the translation of STRING in text domain DOMAIN for locale
- category CATEGORY. The default value for DOMAIN is the current
- value of `TEXTDOMAIN'. The default value for CATEGORY is
- `"LC_MESSAGES"'.
-
- If you supply a value for CATEGORY, it must be a string equal to
- one of the known locale categories described in *note Explaining
- gettext::. You must also supply a text domain. Use `TEXTDOMAIN'
- if you want to use the current domain.
-
- CAUTION: The order of arguments to the `awk' version of the
- `dcgettext()' function is purposely different from the order
- for the C version. The `awk' version's order was chosen to
- be simple and to allow for reasonable `awk'-style default
- arguments.
-
-`dcngettext(STRING1, STRING2, NUMBER [, DOMAIN [, CATEGORY]])'
- Return the plural form used for NUMBER of the translation of
- STRING1 and STRING2 in text domain DOMAIN for locale category
- CATEGORY. STRING1 is the English singular variant of a message,
- and STRING2 the English plural variant of the same message. The
- default value for DOMAIN is the current value of `TEXTDOMAIN'.
- The default value for CATEGORY is `"LC_MESSAGES"'.
-
- The same remarks about argument order as for the `dcgettext()'
- function apply.
-
-`bindtextdomain(DIRECTORY [, DOMAIN])'
- Change the directory in which `gettext' looks for `.mo' files, in
- case they will not or cannot be placed in the standard locations
- (e.g., during testing). Return the directory in which DOMAIN is
- "bound."
-
- The default DOMAIN is the value of `TEXTDOMAIN'. If DIRECTORY is
- the null string (`""'), then `bindtextdomain()' returns the
- current binding for the given DOMAIN.
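For comparison, the C library's `dcngettext()' takes its arguments in a different order: the domain comes first, then the two message forms, then the count, and the category comes last. As the CAUTION for `dcgettext()' explains, the `awk' versions deliberately rearrange this order. In this sketch, the domain `guide' and the function `format_copied()' are hypothetical. When no catalog is found, the C function falls back to the first string if the count is 1 and to the second string otherwise.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Format "N file(s) copied", choosing the singular or plural message
   with the C dcngettext(): domain, msgid1, msgid2, n, category. */
int format_copied(char *buf, size_t size, unsigned long n)
{
    const char *fmt = dcngettext("guide", "%lu file copied",
                                 "%lu files copied", n, LC_MESSAGES);

    return snprintf(buf, size, fmt, n);
}
```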
-
- To use these facilities in your `awk' program, follow the steps
-outlined in *note Explaining gettext::, like so:
-
- 1. Set the variable `TEXTDOMAIN' to the text domain of your program.
- This is best done in a `BEGIN' rule (*note BEGIN/END::), or it can
- also be done via the `-v' command-line option (*note Options::):
-
- BEGIN {
-     TEXTDOMAIN = "guide"
-     ...
- }
-
- 2. Mark all translatable strings with a leading underscore (`_')
- character. It _must_ be adjacent to the opening quote of the
- string. For example:
-
- print _"hello, world"
- x = _"you goofed"
- printf(_"Number of users is %d\n", nusers)
-
- 3. If you are creating strings dynamically, you can still translate
- them, using the `dcgettext()' built-in function:
-
- message = nusers " users logged in"
- message = dcgettext(message, "adminprog")
- print message
-
- Here, the call to `dcgettext()' supplies a different text domain
- (`"adminprog"') in which to find the message, but it uses the
- default `"LC_MESSAGES"' category.
-
- 4. During development, you might want to put the `.mo' file in a
- private directory for testing. This is done with the
- `bindtextdomain()' built-in function:
-
- BEGIN {
-     TEXTDOMAIN = "guide"    # our text domain
-     if (Testing) {
-         # where to find our files
-         bindtextdomain("testdir")
-         # joe is in charge of adminprog
-         bindtextdomain("../joe/testdir", "adminprog")
-     }
-     ...
- }
-
-
- *Note I18N Example::, for an example program showing the steps to
-create and use translations from `awk'.
-
-
-File: gawk.info, Node: Translator i18n, Next: I18N Example, Prev: Programmer i18n, Up: Internationalization
-
-10.4 Translating `awk' Programs
-===============================
-
-Once a program's translatable strings have been marked, they must be
-extracted to create the initial `.pot' file. As part of translation, it
-is often helpful to rearrange the order in which arguments to `printf'
-are output.
-
- `gawk''s `--gen-pot' command-line option extracts the messages and
-is discussed next. After that, `printf''s ability to rearrange the
-order of `printf' arguments at runtime is covered.
-
-* Menu:
-
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging `printf' arguments.
-* I18N Portability:: `awk'-level portability issues.
-
-
-File: gawk.info, Node: String Extraction, Next: Printf Ordering, Up: Translator i18n
-
-10.4.1 Extracting Marked Strings
---------------------------------
-
-Once your `awk' program is working, and all the strings have been
-marked and you've set (and perhaps bound) the text domain, it is time
-to produce translations. First, use the `--gen-pot' command-line
-option to create the initial `.pot' file:
-
- $ gawk --gen-pot -f guide.awk > guide.pot
-
- When run with `--gen-pot', `gawk' does not execute your program.
-Instead, it parses it as usual and prints all marked strings to
-standard output in the format of a GNU `gettext' Portable Object file.
-Also included in the output are any constant strings that appear as the
-first argument to `dcgettext()' or as the first and second arguments to
-`dcngettext()'.(1) *Note I18N Example::, for the full list of steps to
-go through to create and test translations for `guide'.
-
- ---------- Footnotes ----------
-
- (1) The `xgettext' utility that comes with GNU `gettext' can handle
-`.awk' files.
-
-
-File: gawk.info, Node: Printf Ordering, Next: I18N Portability, Prev: String Extraction, Up: Translator i18n
-
-10.4.2 Rearranging `printf' Arguments
--------------------------------------
-
-Format strings for `printf' and `sprintf()' (*note Printf::) present a
-special problem for translation. Consider the following:(1)
-
- printf(_"String `%s' has %d characters\n",
-        string, length(string))
-
- A possible German translation for this might be:
-
- "%d Zeichen lang ist die Zeichenkette `%s'\n"
-
- The problem should be obvious: the order of the format
-specifications is different from the original! Even though `gettext()'
-can return the translated string at runtime, it cannot change the
-argument order in the call to `printf'.
-
- To solve this problem, `printf' format specifiers may have an
-additional optional element, which we call a "positional specifier".
-For example:
-
- "%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
-
- Here, the positional specifier consists of an integer count, which
-indicates which argument to use, and a `$'. Counts are one-based, and
-the format string itself is _not_ included. Thus, in the following
-example, `string' is the first argument and `length(string)' is the
-second:
-
- $ gawk 'BEGIN {
- > string = "Dont Panic"
- > printf _"%2$d characters live in \"%1$s\"\n",
- > string, length(string)
- > }'
- -| 10 characters live in "Dont Panic"
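The POSIX `printf()' family in C accepts the same `%N$' positional specifiers. They are a POSIX extension, not part of ISO C, and they are what make reordering possible for C programs that use `gettext()'. Here is a minimal sketch with a hypothetical helper, `describe_length()':

```c
#include <stdio.h>
#include <string.h>

/* Argument 1 is the string and argument 2 its length, but the format
   prints them in the opposite order using positional specifiers. */
int describe_length(char *buf, size_t size, const char *string)
{
    return snprintf(buf, size, "%2$d characters live in \"%1$s\"",
                    string, (int) strlen(string));
}
```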
-
- If present, positional specifiers come first in the format
-specification, before the flags, the field width, and/or the precision.
-
- Positional specifiers can be used with the dynamic field width and
-precision capability:
-
- $ gawk 'BEGIN {
- >    printf("%*.*s\n", 10, 20, "hello")
- >    printf("%3$*2$.*1$s\n", 20, 10, "hello")
- > }'
- -|      hello
- -|      hello
-
- NOTE: When using `*' with a positional specifier, the `*' comes
- first, then the integer position, and then the `$'. This is
- somewhat counterintuitive.
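The same `*'-before-position ordering applies in C. The hypothetical helper `padded()' reproduces the second `printf' call above with `snprintf()': the width comes from argument 2 and the precision from argument 1.

```c
#include <stdio.h>

/* `*2$' takes the field width from argument 2; `.*1$' takes the
   precision from argument 1; `%3$' prints argument 3. */
int padded(char *buf, size_t size, int precision, int width, const char *s)
{
    return snprintf(buf, size, "%3$*2$.*1$s", precision, width, s);
}
```

With a precision of 20 and a width of 10, `"hello"' is right-justified in a 10-character field, which gives the same result as the `gawk' example.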
-
- `gawk' does not allow you to mix regular format specifiers and those
-with positional specifiers in the same string:
-
- $ gawk 'BEGIN { printf _"%d %3$s\n", 1, 2, "hi" }'
- error--> gawk: cmd. line:1: fatal: must use `count$' on all formats or none
-
- NOTE: There are some pathological cases that `gawk' may fail to
- diagnose. In such cases, the output may not be what you expect.
- It's still a bad idea to try mixing them, even if `gawk' doesn't
- detect it.
-
- Although positional specifiers can be used directly in `awk'
-programs, their primary purpose is to help in producing correct
-translations of format strings into languages different from the one in
-which the program is first written.
-
- ---------- Footnotes ----------
-
- (1) This example is borrowed from the GNU `gettext' manual.
-
-
-File: gawk.info, Node: I18N Portability, Prev: Printf Ordering, Up: Translator i18n
-
-10.4.3 `awk' Portability Issues
--------------------------------
-
-`gawk''s internationalization features were purposely chosen to have as
-little impact as possible on the portability of `awk' programs that use
-them to other versions of `awk'. Consider this program:
-
- BEGIN {
-     TEXTDOMAIN = "guide"
-     if (Test_Guide)   # set with -v
-         bindtextdomain("/test/guide/messages")
-     print _"don't panic!"
- }
-
-As written, it won't work on other versions of `awk'. However, it is
-actually almost portable, requiring very little change:
-
- * Assignments to `TEXTDOMAIN' won't have any effect, since
- `TEXTDOMAIN' is not special in other `awk' implementations.
-
- * Non-GNU versions of `awk' treat marked strings as the
- concatenation of a variable named `_' with the string following
- it.(1) Typically, the variable `_' has the null string (`""') as
- its value, leaving the original string constant as the result.
-
- * By defining "dummy" functions to replace `dcgettext()',
- `dcngettext()' and `bindtextdomain()', the `awk' program can be
- made to run, but all the messages are output in the original
- language. For example:
-
- function bindtextdomain(dir, domain)
- {
-     return dir
- }
-
- function dcgettext(string, domain, category)
- {
-     return string
- }
-
- function dcngettext(string1, string2, number, domain, category)
- {
-     return (number == 1 ? string1 : string2)
- }
-
- * The use of positional specifications in `printf' or `sprintf()' is
- _not_ portable. To support `gettext()' at the C level, many
- systems' C versions of `sprintf()' do support positional
- specifiers. But it works only if enough arguments are supplied in
- the function call. Many versions of `awk' pass `printf' formats
- and arguments unchanged to the underlying C library version of
- `sprintf()', but only one format and argument at a time. What
- happens if a positional specification is used is anybody's guess.
- However, since the positional specifications are primarily for use
- in _translated_ format strings, and since non-GNU `awk's never
- retrieve the translated string, this should not be a problem in
- practice.
-
- ---------- Footnotes ----------
-
- (1) This is good fodder for an "Obfuscated `awk'" contest.
-
-
-File: gawk.info, Node: I18N Example, Next: Gawk I18N, Prev: Translator i18n, Up: Internationalization
-
-10.5 A Simple Internationalization Example
-==========================================
-
-Now let's look at a step-by-step example of how to internationalize and
-localize a simple `awk' program, using `guide.awk' as our original
-source:
-
- BEGIN {
-     TEXTDOMAIN = "guide"
-     bindtextdomain(".")  # for testing
-     print _"Don't Panic"
-     print _"The Answer Is", 42
-     print "Pardon me, Zaphod who?"
- }
-
-Run `gawk --gen-pot' to create the `.pot' file:
-
- $ gawk --gen-pot -f guide.awk > guide.pot
-
-This produces:
-
- #: guide.awk:4
- msgid "Don't Panic"
- msgstr ""
-
- #: guide.awk:5
- msgid "The Answer Is"
- msgstr ""
-
- This original portable object template file is saved and reused for
-each language into which the application is translated. The `msgid' is
-the original string and the `msgstr' is the translation.
-
- NOTE: Strings not marked with a leading underscore do not appear
- in the `guide.pot' file.
-
- Next, the messages must be translated. Here is a translation to a
-hypothetical dialect of English, called "Mellow":(1)
-
- $ cp guide.pot guide-mellow.po
- ADD TRANSLATIONS TO guide-mellow.po ...
-
-Following are the translations:
-
- #: guide.awk:4
- msgid "Don't Panic"
- msgstr "Hey man, relax!"
-
- #: guide.awk:5
- msgid "The Answer Is"
- msgstr "Like, the scoop is"
-
- The next step is to make the directory to hold the binary message
-object file and then to create the `guide.mo' file. The directory
-layout shown here is standard for GNU `gettext' on GNU/Linux systems.
-Other versions of `gettext' may use a different layout:
-
- $ mkdir en_US en_US/LC_MESSAGES
-
- The `msgfmt' utility does the conversion from human-readable `.po'
-file to machine-readable `.mo' file. By default, `msgfmt' creates a
-file named `messages'. This file must be renamed and placed in the
-proper directory so that `gawk' can find it:
-
- $ msgfmt guide-mellow.po
- $ mv messages en_US/LC_MESSAGES/guide.mo
-
- Finally, we run the program to test it:
-
- $ gawk -f guide.awk
- -| Hey man, relax!
- -| Like, the scoop is 42
- -| Pardon me, Zaphod who?
-
- If the three replacement functions for `dcgettext()', `dcngettext()'
-and `bindtextdomain()' (*note I18N Portability::) are in a file named
-`libintl.awk', then we can run `guide.awk' unchanged as follows:
-
- $ gawk --posix -f guide.awk -f libintl.awk
- -| Don't Panic
- -| The Answer Is 42
- -| Pardon me, Zaphod who?
-
- ---------- Footnotes ----------
-
- (1) Perhaps it would be better if it were called "Hippy." Ah, well.
-
-
-File: gawk.info, Node: Gawk I18N, Prev: I18N Example, Up: Internationalization
-
-10.6 `gawk' Can Speak Your Language
-===================================
-
-`gawk' itself has been internationalized using the GNU `gettext'
-package. (GNU `gettext' is described in complete detail in *note (GNU
-`gettext' utilities)Top:: gettext, GNU gettext tools.) As of this
-writing, the latest version of GNU `gettext' is version 0.18.1
-(ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz).
-
- If a translation of `gawk''s messages exists, then `gawk' produces
-usage messages, warnings, and fatal errors in the local language.
-
-
-File: gawk.info, Node: Arbitrary Precision Arithmetic, Next: Advanced Features, Prev: Internationalization, Up: Top
-
-11 Arithmetic and Arbitrary Precision Arithmetic with `gawk'
-************************************************************
-
- There's a credibility gap: We don't know how much of the
- computer's answers to believe. Novice computer users solve this
- problem by implicitly trusting in the computer as an infallible
- authority; they tend to believe that all digits of a printed
- answer are significant. Disillusioned computer users have just the
- opposite approach; they are constantly afraid that their answers
- are almost meaningless.
- Donald Knuth(1)
-
- This major node discusses issues that you may encounter when
-performing arithmetic. It begins by discussing some of the general
-attributes of computer arithmetic, along with how this can influence
-what you see when running `awk' programs. This discussion applies to
-all versions of `awk'.
-
- Then the discussion moves on to "arbitrary precision arithmetic", a
-feature that is specific to `gawk'.
-
-* Menu:
-
-* General Arithmetic:: An introduction to computer arithmetic.
-* Floating-point Programming:: Effective floating-point programming.
-* Gawk and MPFR:: How `gawk' provides
- arbitrary-precision arithmetic.
-* Arbitrary Precision Floats:: Arbitrary precision floating-point arithmetic
- with `gawk'.
-* Arbitrary Precision Integers:: Arbitrary precision integer arithmetic with
- `gawk'.
-
- ---------- Footnotes ----------
-
- (1) Donald E. Knuth. `The Art of Computer Programming'. Volume 2,
-`Seminumerical Algorithms', third edition, 1998, ISBN 0-201-89683-4, p.
-229.
-
-
-File: gawk.info, Node: General Arithmetic, Next: Floating-point Programming, Up: Arbitrary Precision Arithmetic
-
-11.1 A General Description of Computer Arithmetic
-=================================================
-
-Within computers, there are two kinds of numeric values: "integers" and
-"floating-point". In school, integer values were referred to as
-"whole" numbers--that is, numbers without any fractional part, such as
-1, 42, or -17. The advantage to integer numbers is that they represent
-values exactly. The disadvantage is that their range is limited. On
-most systems, this range is -2,147,483,648 to 2,147,483,647. However,
-many systems now support a range from -9,223,372,036,854,775,808 to
-9,223,372,036,854,775,807.
-
- Integer values come in two flavors: "signed" and "unsigned". Signed
-values may be negative or positive, with the range of values just
-described. Unsigned values are never negative. On most systems, the
-range is from 0 to 4,294,967,295. However, many systems now support a
-range from 0 to 18,446,744,073,709,551,615.
-
- Floating-point numbers represent what are called "real" numbers;
-i.e., those that do have a fractional part, such as 3.1415927. The
-advantage to floating-point numbers is that they can represent a much
-larger range of values. The disadvantage is that there are numbers
-that they cannot represent exactly. `awk' uses "double precision"
-floating-point numbers, which can hold more digits than "single
-precision" floating-point numbers.
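Here is a quick C illustration of values that floating-point numbers cannot represent exactly. Neither 0.1 nor 0.2 has an exact binary representation, so their sum is not the `double' closest to 0.3. Sums of powers of two, such as 0.5 and 0.25, are exact. The helper names are hypothetical. The relative-tolerance comparison is one common workaround, not the only one.

```c
/* Absolute value without <math.h>, to keep the sketch self-contained. */
static double abs_d(double x)
{
    return x < 0 ? -x : x;
}

/* Exact comparison: true only if a + b is exactly `expected'. */
int sums_exactly(double a, double b, double expected)
{
    return a + b == expected;
}

/* Treat values as equal when they differ by at most rel_tol times
   the larger of the two magnitudes. */
int nearly_equal(double a, double b, double rel_tol)
{
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);

    return abs_d(a - b) <= rel_tol * scale;
}
```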
-
- There are several important issues to be aware of, described next.
-
-* Menu:
-
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* Integer Programming:: Effective integer programming.
-
-
-File: gawk.info, Node: Floating Point Issues, Next: Integer Programming, Up: General Arithmetic
-
-11.1.1 Floating-Point Number Caveats
-------------------------------------
-
-As mentioned earlier, floating-point numbers represent what are called
-"real" numbers, i.e., those that have a fractional part. `awk' uses
-double precision floating-point numbers to represent all numeric
-values. This minor node describes some of the issues involved in using
-floating-point numbers.
-
- There is a very nice paper on floating-point arithmetic
-(http://www.validlab.com/goldberg/paper.pdf) by David Goldberg, "What
-Every Computer Scientist Should Know About Floating-point Arithmetic,"
-`ACM Computing Surveys' *23*, 1 (1991-03), 5-48. This is worth reading
-if you are interested in the details, but it does require a background
-in computer science.
-
-* Menu:
-
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
-
-
-File: gawk.info, Node: String Conversion Precision, Next: Unexpected Results, Up: Floating Point Issues
-
-11.1.1.1 The String Value Can Lie
-.................................
-
-Internally, `awk' keeps both the numeric value (double precision
-floating-point) and the string value for a variable. Separately, `awk'
-keeps track of what type the variable has (*note Typing and
-Comparison::), which plays a role in how variables are used in
-comparisons.
-
- It is important to note that the string value for a number may not
-reflect the full value (all the digits) that the numeric value actually
-contains. The following program (`values.awk') illustrates this:
-
- {
-     sum = $1 + $2
-     # see it for what it is
-     printf("sum = %.12g\n", sum)
-     # use CONVFMT
-     a = "<" sum ">"
-     print "a =", a
-     # use OFMT
-     print "sum =", sum
- }
-
-This program shows the full value of the sum of `$1' and `$2' using
-`printf', and then prints the string values obtained from both
-automatic conversion (via `CONVFMT') and from printing (via `OFMT').
-
- Here is what happens when the program is run:
-
- $ echo 3.654321 1.2345678 | awk -f values.awk
- -| sum = 4.8888888
- -| a = <4.88889>
- -| sum = 4.88889
-
- This makes it clear that the full numeric value is different from
-what the default string representations show.
-
- `CONVFMT''s default value is `"%.6g"', which yields a value with at
-most six significant digits. For some applications, you might want to
-change it to specify more precision. On most modern machines, most of
-the time, 17 digits is enough to capture a floating-point number's
-value exactly.(1)
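The difference between six and seventeen significant digits can be checked directly in C. Format a value with `%.Ng', read it back with `strtod()', and compare. `round_trips()' is a hypothetical helper. The input values are the ones from the `values.awk' run above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format x with `digits' significant digits (as "%.Ng" does), leave
   the text in buf, and report whether parsing it back gives exactly x. */
int round_trips(double x, int digits, char *buf, size_t size)
{
    snprintf(buf, size, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

With 6 digits the sum prints as `4.88889' and does not read back as the same value. With 17 digits it does.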
-
- ---------- Footnotes ----------
-
- (1) Pathological cases can require up to 752 digits (!), but we
-doubt that you need to worry about this.
-
-
-File: gawk.info, Node: Unexpected Results, Next: POSIX Floating Point Problems, Prev: String Conversion Precision, Up: Floating Point Issues
-
-11.1.1.2 Floating Point Numbers Are Not Abstract Numbers
-........................................................
-
-Unlike numbers in the abstract sense (such as what you studied in high
-school or college arithmetic), numbers stored in computers are limited
-in certain ways. They cannot represent an infinite number of digits,
-nor can they always represent things exactly. In particular,
-floating-point numbers cannot always represent values exactly. Here is
-an example:
-
- $ awk '{ printf("%010d\n", $1 * 100) }'
- 515.79
- -| 0000051579
- 515.80
- -| 0000051579
- 515.81
- -| 0000051580
- 515.82
- -| 0000051582
- Ctrl-d
-
-This shows that some values can be represented exactly, whereas others
-are only approximated. This is not a "bug" in `awk', but simply an
-artifact of how computers represent numbers.
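When such values must become integers (cents, for example), a common remedy is to round rather than truncate. This C sketch contrasts the two approaches, and its function names are hypothetical. Truncation is what `%d' does in the example above, so a product that lands just below 51580 loses a cent.

```c
/* Truncating conversion: a product just below a whole number drops
   to the next lower integer. */
long cents_truncated(double amount)
{
    return (long) (amount * 100);
}

/* Round to the nearest cent instead (halves away from zero). */
long cents_rounded(double amount)
{
    double scaled = amount * 100;

    return (long) (scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}
```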
-
- Another peculiarity of floating-point numbers on modern systems is
-that they often have more than one representation for the number zero!
-In particular, it is possible to represent "minus zero" as well as
-regular, or "positive" zero.
-
- This example shows that negative and positive zero are distinct
-values when stored internally, but that they are in fact equal to each
-other, as well as to "regular" zero:
-
- $ gawk 'BEGIN { mz = -0 ; pz = 0
- > printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz
- > printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0
- > }'
- -| -0 = -0, +0 = 0, (-0 == +0) -> 1
- -| mz == 0 -> 1, pz == 0 -> 1
-
- It helps to keep this in mind should you process numeric data that
-contains negative zero values: the fact that the zero is negative is
-preserved and shows up in output such as `%g', even though numeric
-comparisons treat it as equal to zero.
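Python floats follow the same IEEE rules, so the `gawk' example can be mirrored in a short sketch. Python needs the literal `-0.0'; the integer `-0' is plain zero. The standard `math.copysign()' exposes the sign bit that `==' ignores.

```python
import math

mz, pz = -0.0, 0.0
print("%g %g" % (mz, pz))       # -0 0
print(mz == pz, mz == 0)        # True True
print(math.copysign(1.0, mz))   # -1.0: the sign bit is still there
```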
-
-
-File: gawk.info, Node: POSIX Floating Point Problems, Prev: Unexpected Results, Up: Floating Point Issues
-
-11.1.1.3 Standards Versus Existing Practice
-...........................................
-
-Historically, `awk' has converted any non-numeric-looking string to
-the numeric value zero, when required. Furthermore, the original
-definition of the language and the original POSIX standards specified
-that `awk' only understands decimal numbers (base 10), and not octal
-(base 8) or hexadecimal numbers (base 16).
-
- Changes in the language of the 2001 and 2004 POSIX standards can be
-interpreted to imply that `awk' should support additional features.
-These features are:
-
- * Interpretation of floating point data values specified in
- hexadecimal notation (`0xDEADBEEF'). (Note: data values, _not_
- source code constants.)
-
- * Support for the special IEEE 754 floating point values "Not A
- Number" (NaN), positive Infinity ("inf") and negative Infinity
- ("-inf"). In particular, the format for these values is as
- specified by the ISO 1999 C standard, which ignores case and can
- allow machine-dependent additional characters after the `nan' and
- allow either `inf' or `infinity'.
-
- The first problem is that both of these are clear changes to
-historical practice:
-
- * The `gawk' maintainer feels that supporting hexadecimal floating
- point values, in particular, is ugly, and was never intended by the
- original designers to be part of the language.
-
- * Allowing completely alphabetic strings to have valid numeric
- values is also a very severe departure from historical practice.
-
- The second problem is that the `gawk' maintainer feels that this
-interpretation of the standard, which requires a certain amount of
-"language lawyering" to arrive at in the first place, was not even
-intended by the standard developers. In other words, "we see how you
-got where you are, but we don't think that that's where you want to be."
-
- Recognizing the above issues, but attempting to provide compatibility
-with the earlier versions of the standard, the 2008 POSIX standard
-added explicit wording to allow, but not require, that `awk' support
-hexadecimal floating point values and special values for "Not A Number"
-and infinity.
-
- Although the `gawk' maintainer continues to feel that providing
-those features is inadvisable, nevertheless, on systems that support
-IEEE floating point, it seems reasonable to provide _some_ way to
-support NaN and Infinity values. The solution implemented in `gawk' is
-as follows:
-
- * With the `--posix' command-line option, `gawk' becomes "hands
- off." String values are passed directly to the system library's
- `strtod()' function, and if it successfully returns a numeric
- value, that is what's used.(1) By definition, the results are not
- portable across different systems. They are also a little
- surprising:
-
- $ echo nanny | gawk --posix '{ print $1 + 0 }'
- -| nan
- $ echo 0xDeadBeef | gawk --posix '{ print $1 + 0 }'
- -| 3735928559
-
- * Without `--posix', `gawk' interprets the four strings `+inf',
- `-inf', `+nan', and `-nan' specially, producing the corresponding
- special numeric values. The leading sign acts as a signal to `gawk'
- (and the user) that the value is really numeric. Hexadecimal
- floating point is not supported (unless you also use
- `--non-decimal-data', which is _not_ recommended). For example:
-
- $ echo nanny | gawk '{ print $1 + 0 }'
- -| 0
- $ echo +nan | gawk '{ print $1 + 0 }'
- -| nan
- $ echo 0xDeadBeef | gawk '{ print $1 + 0 }'
- -| 0
-
- `gawk' does ignore case in the four special values. Thus `+nan'
- and `+NaN' are the same.
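The non-POSIX rule amounts to a case-insensitive lookup of exactly four spellings. The Python sketch below uses a hypothetical helper, `special_value()', written only for illustration. It is not `gawk''s implementation and does not model how other strings are converted.

```python
import math

# Hypothetical helper, not gawk's source: the four signed spellings that
# gawk (without --posix) treats as special, matched ignoring case.
_SPECIAL = {
    "+inf": math.inf,
    "-inf": -math.inf,
    "+nan": math.nan,
    "-nan": -math.nan,
}

def special_value(field):
    """Return the IEEE special value for `field`, or None if it is not
    one of the four recognized spellings."""
    return _SPECIAL.get(field.lower())

print(special_value("+NaN"))    # nan
print(special_value("-INF"))    # -inf
print(special_value("nanny"))   # None: no leading sign, so not special
```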
-
- ---------- Footnotes ----------
-
- (1) You asked for it, you got it.
-
-
-File: gawk.info, Node: Integer Programming, Prev: Floating Point Issues, Up: General Arithmetic
-
-11.1.2 Mixing Integers And Floating-point
------------------------------------------
-
-As has been mentioned already, `gawk' ordinarily uses hardware double
-precision with 64-bit IEEE binary floating-point representation for
-numbers on most systems. A large integer like 9007199254740997 has a
-binary representation that, although finite, is more than 53 bits long,
-so it must be rounded to 53 bits when stored. The biggest integer that
-can be stored in a C `double' is usually the same as the largest
-possible value of a `double'. If your system `double' is an IEEE
-64-bit `double', this largest possible value is an integer and can be
-represented precisely. What more should one know about integers?
-
- If you want to know the largest integer such that it and all smaller
-integers can be stored in 64-bit doubles without losing precision, the
-answer is 2^53. The next representable number is the even number
-2^53 + 2, meaning it is unlikely that you will be able to make `gawk'
-print 2^53 + 1 in integer format. The range of contiguous integers
-exactly representable by a 64-bit double is [-2^53, 2^53]. If you ever
-see an integer outside this range in `gawk' using 64-bit doubles, you
-have reason to be very suspicious about the accuracy of the output.
-Here is a simple program with erroneous output:
-
- $ gawk 'BEGIN { i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j }'
- -| 9007199254740991
- -| 9007199254740992
- -| 9007199254740992
- -| 9007199254740994
-
- The lesson is to not assume that any large integer printed by `gawk'
-represents an exact result from your computation, especially if it wraps
-around on your screen.
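Python floats are the same 64-bit doubles, so the erroneous output above can be reproduced with a minimal Python sketch. It assumes IEEE binary64 with the default round-to-nearest-even mode.

```python
i = 2.0**53 - 1        # 9007199254740991, still exact
for j in range(4):
    print(int(i + j))  # ...991, ...992, ...992, ...994

# 2^53 + 1 lies exactly halfway between two doubles; ties go to even (2^53).
print(2.0**53 + 1 == 2.0**53)   # True
```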
-
-
-File: gawk.info, Node: Floating-point Programming, Next: Gawk and MPFR, Prev: General Arithmetic, Up: Arbitrary Precision Arithmetic
-
-11.2 Understanding Floating-point Programming
-=============================================
-
-Numerical programming is an extensive area; if you need to develop
-sophisticated numerical algorithms, then `gawk' may not be the ideal
-tool, and this documentation may not be sufficient. It might require
-digesting a book or two to really internalize how to compute with ideal
-accuracy and precision, and the result often depends on the particular
-application.
-
- NOTE: A floating-point calculation's "accuracy" is how close it
- comes to the real value. This is as opposed to the "precision",
- which usually refers to the number of bits used to represent the
- number (see the Wikipedia article
- (http://en.wikipedia.org/wiki/Accuracy_and_precision) for more
- information).
-
- There are two options for doing floating-point calculations:
-hardware floating-point (as used by standard `awk' and the default for
-`gawk'), and "arbitrary-precision" floating-point, which is software
-based. This major node aims to provide enough information to
-understand both, and then will focus on `gawk''s facilities for the
-latter.
-
- Binary floating-point representations and arithmetic are inexact.
-Simple values like 0.1 cannot be precisely represented using binary
-floating-point numbers, and the limited precision of floating-point
-numbers means that slight changes in the order of operations or the
-precision of intermediate storage can change the result. To make
-matters worse, with arbitrary precision floating-point, you can set the
-precision before starting a computation, but then you cannot be sure of
-the number of significant decimal places in the final result.
-
- Sometimes, before you start to write any code, you should think more
-about what you really want and what's really happening. Consider the
-two numbers in the following example:
-
- x = 0.875 # 1/2 + 1/4 + 1/8
- y = 0.425
-
- Unlike the number in `y', the number stored in `x' is exactly
-representable in binary since it can be written as a finite sum of one
-or more fractions whose denominators are all powers of two. When
-`gawk' reads a floating-point number from program source, it
-automatically rounds that number to whatever precision your machine
-supports. If you try to print the numeric content of a variable using
-an output format string of `"%.17g"', it may not produce the same
-number as you assigned to it:
-
- $ gawk 'BEGIN { x = 0.875; y = 0.425
- > printf("%0.17g, %0.17g\n", x, y) }'
- -| 0.875, 0.42499999999999999
-
- Often the error is so small you do not even notice it, and if you do,
-you can always specify how much precision you would like in your output.
-Usually this is a format string like `"%.15g"', which when used in the
-previous example, produces an output identical to the input.
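Whether a decimal literal survives conversion to binary can be checked mechanically by comparing exact rational values. The Python sketch below uses a hypothetical helper, `stored_exactly()', built on the standard `fractions' module. Like the `gawk' example, it assumes IEEE 64-bit doubles.

```python
from fractions import Fraction

def stored_exactly(literal):
    """True if converting the decimal string `literal` to a double is exact."""
    return Fraction(float(literal)) == Fraction(literal)

print(stored_exactly("0.875"))   # True: 1/2 + 1/4 + 1/8
print(stored_exactly("0.425"))   # False
print("%.17g" % 0.425)           # 0.42499999999999999
print("%.15g" % 0.425)           # 0.425
```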
-
- Because the underlying representation can be a little bit off from
-the exact value, comparing floating-point values to see if they are
-equal is generally not a good idea. Here is an example where it does
-not work as you might expect:
-
- $ gawk 'BEGIN { print (0.1 + 12.2 == 12.3) }'
- -| 0
-
- The loss of accuracy during a single computation with floating-point
-numbers usually isn't enough to worry about. However, if you compute a
-value which is the result of a sequence of floating point operations,
-the error can accumulate and greatly affect the computation itself.
-Here is an attempt to compute the value of the constant pi using one of
-its many series representations:
-
- BEGIN {
- x = 1.0 / sqrt(3.0)
- n = 6
- for (i = 1; i < 30; i++) {
- n = n * 2.0
- x = (sqrt(x * x + 1) - 1) / x
- printf("%.15f\n", n * x)
- }
- }
-
- When run, the early errors propagating through later computations
-cause the loop to terminate prematurely after an attempt to divide by
-zero.
-
- $ gawk -f pi.awk
- -| 3.215390309173475
- -| 3.159659942097510
- -| 3.146086215131467
- -| 3.142714599645573
- ...
- -| 3.224515243534819
- -| 2.791117213058638
- -| 0.000000000000000
- error--> gawk: pi.awk:6: fatal: division by zero attempted
-
- Here is one more example where the inaccuracies in internal
-representations yield an unexpected result:
-
- $ gawk 'BEGIN {
- > for (d = 1.1; d <= 1.5; d += 0.1)
- > i++
- > print i
- > }'
- -| 4
-
- Can computation using arbitrary precision help with the previous
-examples? If you are impatient to know, see *note Exact Arithmetic::.
-
- Instead of arbitrary precision floating-point arithmetic, often all
-you need is an adjustment of your logic or a different order for the
-operations in your calculation. The stability and the accuracy of the
-computation of the constant pi in the previous example can be enhanced
-by using the following simple algebraic transformation:
-
- (sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1)
-
-After making this change, the program converges to pi in under 30
-iterations:
-
- $ gawk -f /tmp/pi2.awk
- -| 3.215390309173473
- -| 3.159659942097501
- -| 3.146086215131436
- -| 3.142714599645370
- -| 3.141873049979825
- ...
- -| 3.141592653589797
- -| 3.141592653589797
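Both versions of the loop can be ported to Python and compared side by side. This sketch assumes that Python floats and `math.sqrt()' behave like the C `double' arithmetic `gawk' uses: IEEE 64-bit with a correctly rounded square root. `pi_estimates()' is a hypothetical name. A `ZeroDivisionError' stands in for `gawk''s fatal error.

```python
import math

def pi_estimates(stable, iterations=29):
    """Port of pi.awk (stable=False) and of the rewritten loop (stable=True).

    Returns the successive estimates n * x, stopping early if the
    unstable recurrence tries to divide by zero.
    """
    x = 1.0 / math.sqrt(3.0)
    n = 6.0
    estimates = []
    for _ in range(iterations):          # like: for (i = 1; i < 30; i++)
        n *= 2.0
        try:
            if stable:
                x = x / (math.sqrt(x * x + 1) + 1)
            else:
                x = (math.sqrt(x * x + 1) - 1) / x
        except ZeroDivisionError:
            break
        estimates.append(n * x)
    return estimates

unstable = pi_estimates(stable=False)
stable = pi_estimates(stable=True)
print(len(unstable), unstable[-1])   # stops early, far from pi
print(stable[-1])                    # agrees with math.pi to ~15 digits
```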
-
- There is no need to be unduly suspicious about the results from
-floating-point arithmetic. The lesson to remember is that
-floating-point arithmetic is always more complex than the arithmetic
-using pencil and paper. In order to take advantage of the power of
-computer floating-point, you need to know its limitations and work
-within them. For most casual use of floating-point arithmetic, you will
-often get the expected result in the end if you simply round the
-display of your final results to the correct number of significant
-decimal digits. Also, avoid presenting numerical data in a manner that
-implies better precision than is actually the case.
-
-* Menu:
-
-* Floating-point Representation:: Binary floating-point representation.
-* Floating-point Context:: Floating-point context.
-* Rounding Mode:: Floating-point rounding mode.
-
-
-File: gawk.info, Node: Floating-point Representation, Next: Floating-point Context, Up: Floating-point Programming
-
-11.2.1 Binary Floating-point Representation
--------------------------------------------
-
-Although floating-point representations vary from machine to machine,
-the most commonly encountered representation is that defined by the
-IEEE 754 Standard. An IEEE-754 format value has three components:
-
- * A sign bit telling whether the number is positive or negative.
-
- * An "exponent" giving its order of magnitude, E.
-
- * A "significand", S, specifying the actual digits of the number.
-
- The value of the number is then S * 2^E. The first bit of a
-non-zero binary significand is always one, so the significand in an
-IEEE-754 format only includes the fractional part, leaving the leading
-one implicit.
-
- Three of the standard IEEE-754 types are 32-bit single precision,
-64-bit double precision and 128-bit quadruple precision. The standard
-also specifies extended precision formats to allow greater precisions
-and larger exponent ranges.
-
- The significand is stored in "normalized" format, which means that
-the first bit is always a one.
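The three fields can be extracted from a 64-bit double directly. The following Python sketch unpacks the bits with the standard `struct' module and handles only normal numbers; the helper name `decompose()' is hypothetical. It restores the implicit leading one, so the value equals (-1)^sign * S * 2^E.

```python
import struct

def decompose(value):
    """Split a normal, finite double into (sign bit, exponent E, significand S)
    so that value == (-1)**sign * S * 2**E."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63
    biased = (bits >> 52) & 0x7FF          # 11-bit stored exponent
    fraction = bits & ((1 << 52) - 1)      # 52 stored significand bits
    if biased == 0 or biased == 0x7FF:
        raise ValueError("zero, subnormals, infinities and NaN not handled")
    exponent = biased - 1023               # remove the exponent bias
    significand = 1 + fraction / 2**52     # restore the implicit leading one
    return sign, exponent, significand

print(decompose(0.875))   # (0, -1, 1.75)
```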
-
-
-File: gawk.info, Node: Floating-point Context, Next: Rounding Mode, Prev: Floating-point Representation, Up: Floating-point Programming
-
-11.2.2 Floating-point Context
------------------------------
-
-A floating-point "context" defines the environment for arithmetic
-operations. It governs precision, sets rules for rounding, and limits
-the range for exponents. The context has the following primary
-components:
-
-"Precision"
- Precision of the floating-point format in bits.
-
-"emax"
- Maximum exponent allowed for this format.
-
-"emin"
- Minimum exponent allowed for this format.
-
-"Underflow behavior"
- The format may or may not support gradual underflow.
-
-"Rounding"
- The rounding mode of this context.
-
- *note table-ieee-formats:: lists the precision and exponent field
-values for the basic IEEE-754 binary formats:
-
-Name        Total bits   Precision   emin     emax
----------------------------------------------------------------------------
-Single      32           24          -126     +127
-Double      64           53          -1022    +1023
-Quadruple   128          113         -16382   +16383
-
-Table 11.1: Basic IEEE Format Context Values
-
- NOTE: The precision numbers include the implied leading one that
- gives them one extra bit of significand.
-
- A floating-point context can also determine which signals are treated
-as exceptions, and can set rules for arithmetic with special values.
-Please consult the IEEE-754 standard or other resources for details.
-
- `gawk' ordinarily uses the hardware double precision representation
-for numbers. On most systems, this is IEEE-754 floating-point format,
-corresponding to 64-bit binary with 53 bits of precision.
-
- NOTE: In case an underflow occurs, the standard allows, but does
- not require, the result from an arithmetic operation to be a
- number smaller than the smallest nonzero normalized number. Such
- numbers do not have as many significant digits as normal numbers,
- and are called "denormals" or "subnormals". The alternative,
- simply returning a zero, is called "flush to zero". The basic
- IEEE-754 binary formats support subnormal numbers.
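The Double row of the table and gradual underflow can be checked from Python on any system whose `double' is IEEE-754 binary64. Python's `sys.float_info' mirrors the C `<float.h>' constants. A minimal sketch, assuming such a platform:

```python
import sys

info = sys.float_info
print(info.mant_dig)      # 53: precision, including the implied leading one
# Like C's DBL_MAX_EXP/DBL_MIN_EXP, these count exponents for a significand
# in [0.5, 1), so subtract one to get the emax/emin of the table.
print(info.max_exp - 1)   # 1023
print(info.min_exp - 1)   # -1022

# Gradual underflow: halving the smallest normal number gives a subnormal.
tiny = info.min / 2
print(tiny > 0.0)         # True
print(2.0**-1074 / 2)     # 0.0: below the smallest subnormal, ties to even
```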
-
-
-File: gawk.info, Node: Rounding Mode, Prev: Floating-point Context, Up: Floating-point Programming
-
-11.2.3 Floating-point Rounding Mode
------------------------------------
-
-The "rounding mode" specifies the behavior for the results of numerical
-operations when discarding extra precision. Each rounding mode indicates
-how the least significant returned digit of a rounded result is to be
-calculated. *note table-rounding-modes:: lists the IEEE-754 defined
-rounding modes:
-
-Rounding Mode                    IEEE Name
---------------------------------------------------------------------------
-Round to nearest, ties to even   `roundTiesToEven'
-Round toward plus Infinity       `roundTowardPositive'
-Round toward negative Infinity   `roundTowardNegative'
-Round toward zero                `roundTowardZero'
-Round to nearest, ties away      `roundTiesToAway'
-from zero
-
-Table 11.2: IEEE 754 Rounding Modes
-
- The default mode `roundTiesToEven' is the most preferred, but the
-least intuitive. This method does the obvious thing for most values, by
-rounding them up or down to the nearest digit. For example, rounding
-1.132 to two decimal places yields 1.13, and rounding 1.157 yields
-1.16.
-
- However, when it comes to rounding a value that is exactly halfway
-between, things do not work the way you probably learned in school. In
-this case, the number is rounded to the nearest even digit. So
-rounding 0.125 to two decimal places rounds down to 0.12, but rounding
-0.6875 to three decimal places rounds up to 0.688. You probably have
-already encountered this rounding mode when using the `printf' routine
-to format floating-point numbers. For example:
-
- BEGIN {
- x = -4.5
- for (i = 1; i < 10; i++) {
- x += 1.0
- printf("%4.1f => %2.0f\n", x, x)
- }
- }
-
-produces the following output when run:(1)
-
- -3.5 => -4
- -2.5 => -2
- -1.5 => -2
- -0.5 => 0
- 0.5 => 0
- 1.5 => 2
- 2.5 => 2
- 3.5 => 4
- 4.5 => 4
-
- The theory behind the rounding mode `roundTiesToEven' is that it
-more or less evenly distributes upward and downward rounds of exact
-halves, which might cause the round-off error to cancel itself out.
-This is the default rounding mode used in IEEE-754 computing functions
-and operators.
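The tie-breaking rule is easiest to see with decimal digits. The Python sketch below rounds decimal strings with the standard `decimal' module; the helper `round_decimal()' is hypothetical. It illustrates the rule itself, not binary rounding inside `gawk'. It also reproduces the `printf' table with Python's own `%.0f', which breaks exact ties to even. The value -0.5 is left out because Python prints it as `-0'.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_decimal(text, places, mode=ROUND_HALF_EVEN):
    """Round the decimal string `text` to `places` digits after the point."""
    quantum = Decimal(1).scaleb(-places)          # 10 ** -places
    return Decimal(text).quantize(quantum, rounding=mode)

print(round_decimal("1.132", 2))                  # 1.13
print(round_decimal("0.125", 2))                  # 0.12  (tie goes to even 2)
print(round_decimal("0.6875", 3))                 # 0.688 (tie goes to even 8)
print(round_decimal("0.125", 2, ROUND_HALF_UP))   # 0.13  (the school rule)

# The printf table above, using Python's correctly rounded "%.0f".
printed = ["%.0f" % v for v in (-3.5, -2.5, -1.5, 0.5, 1.5, 2.5, 3.5, 4.5)]
print(printed)   # ['-4', '-2', '-2', '0', '2', '2', '4', '4']
```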
-
- The other rounding modes are rarely used. Round toward positive
-infinity (`roundTowardPositive') and round toward negative infinity
-(`roundTowardNegative') are often used to implement interval arithmetic,
-where you adjust the rounding mode to calculate upper and lower bounds
-for the range of output. The `roundTowardZero' mode can be used for
-converting floating-point numbers to integers. The rounding mode
-`roundTiesToAway' rounds the result to the nearest number and selects
-the number with the larger magnitude if a tie occurs.
-
- Some numerical analysts will tell you that your choice of rounding
-style has tremendous impact on the final outcome, and advise you to
-wait until final output for any rounding. Instead, you can often avoid
-round-off error problems by setting the precision initially to some
-value sufficiently larger than the final desired precision, so that the
-accumulation of round-off error does not influence the outcome. If you
-suspect that results from your computation are sensitive to
-accumulation of round-off error, one way to be sure is to look for a
-significant difference in output when you change the rounding mode.
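Directed rounding makes the interval idea concrete. Rounding every operation down yields a lower bound, and rounding every operation up yields an upper bound. A wide gap between the two signals sensitivity to round-off. The following Python sketch uses the standard `decimal' module as a stand-in for MPFR; with `gawk -M' you would set `ROUNDMODE' instead. The helper `harmonic()', the sum it computes, and the 6-digit precision are illustrative choices.

```python
from decimal import Decimal, localcontext, ROUND_FLOOR, ROUND_CEILING, ROUND_HALF_EVEN
from fractions import Fraction

def harmonic(n, rounding, prec=6):
    """Sum 1/1 + ... + 1/n with `prec` significant decimal digits,
    rounding every division and addition in the given direction."""
    with localcontext() as ctx:
        ctx.prec = prec
        ctx.rounding = rounding
        total = Decimal(0)
        for k in range(1, n + 1):
            total += Decimal(1) / Decimal(k)
        return total

exact = sum(Fraction(1, k) for k in range(1, 101))
low = harmonic(100, ROUND_FLOOR)        # every step rounded down
high = harmonic(100, ROUND_CEILING)     # every step rounded up
nearest = harmonic(100, ROUND_HALF_EVEN)
print(low, nearest, high)
print(Fraction(low) <= exact <= Fraction(high))   # True
```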
-
- ---------- Footnotes ----------
-
- (1) It is possible for the output to be completely different if the
-C library in your system does not use the IEEE-754 even-rounding rule
-to round halfway cases for `printf()'.
-
-
-File: gawk.info, Node: Gawk and MPFR, Next: Arbitrary Precision Floats, Prev: Floating-point Programming, Up: Arbitrary Precision Arithmetic
-
-11.3 `gawk' + MPFR = Powerful Arithmetic
-========================================
-
-The rest of this major node describes how to use the arbitrary precision
-(also known as "multiple precision" or "infinite precision") numeric
-capabilities in `gawk' to produce maximally accurate results when you
-need it.
-
- But first you should check if your version of `gawk' supports
-arbitrary precision arithmetic. The easiest way to find out is to look
-at the output of the following command:
-
- $ gawk --version
- -| GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3)
- -| Copyright (C) 1989, 1991-2012 Free Software Foundation.
- ...
-
- `gawk' uses the GNU MPFR (http://www.mpfr.org) and GNU MP
-(http://gmplib.org) (GMP) libraries for arbitrary precision arithmetic
-on numbers. So if you do not see the names of these libraries in the
-output, then your version of `gawk' does not support arbitrary
-precision arithmetic.
-
- Additionally, there are a few elements available in the `PROCINFO'
-array to provide information about the MPFR and GMP libraries. *Note
-Auto-set::, for more information.
-
-
-File: gawk.info, Node: Arbitrary Precision Floats, Next: Arbitrary Precision Integers, Prev: Gawk and MPFR, Up: Arbitrary Precision Arithmetic
-
-11.4 Arbitrary Precision Floating-point Arithmetic with `gawk'
-==============================================================
-
-`gawk' uses the GNU MPFR library for arbitrary precision floating-point
-arithmetic. The MPFR library provides precise control over precisions
-and rounding modes, and gives correctly rounded reproducible
-platform-independent results. With the command-line option `--bignum'
-or `-M', all floating-point arithmetic operators and numeric functions
-can yield results to any desired precision level supported by MPFR.
-Two built-in variables `PREC' (*note Setting Precision::) and
-`ROUNDMODE' (*note Setting Rounding Mode::) provide control over the
-working precision and the rounding mode. The precision and the
-rounding mode are set globally for every operation to follow.
-
- The default working precision for arbitrary precision floating-point
-values is 53, and the default value for `ROUNDMODE' is `"N"', which
-selects the IEEE-754 `roundTiesToEven' (*note Rounding Mode::) rounding
-mode.(1) `gawk' uses the default exponent range in MPFR (EMAX = 2^30 -
-1, EMIN = -EMAX) for all floating-point contexts. There is no explicit
-mechanism to adjust the exponent range. MPFR does not implement
-subnormal numbers by default, and this behavior cannot be changed in
-`gawk'.
-
- NOTE: When emulating an IEEE-754 format (*note Setting
- Precision::), `gawk' internally adjusts the exponent range to the
- value defined for the format and also performs computations needed
- for gradual underflow (subnormal numbers).
-
- NOTE: MPFR numbers are variable-size entities, consuming only as
- much space as needed to store the significant digits. Since the
- performance using MPFR numbers pales in comparison to doing
- arithmetic using the underlying machine types, you should consider
- using only as much precision as needed by your program.
-
-* Menu:
-
-* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode.
-* Floating-point Constants:: Representing floating-point constants.
-* Changing Precision:: Changing the precision of a number.
-* Exact Arithmetic:: Exact arithmetic with floating-point numbers.
-
- ---------- Footnotes ----------
-
- (1) The default precision is 53, since according to the MPFR
-documentation, the library should be able to exactly reproduce all
-computations with double-precision machine floating-point numbers
-(`double' type in C), except the default exponent range is much wider
-and subnormal numbers are not implemented.
-
-
-File: gawk.info, Node: Setting Precision, Next: Setting Rounding Mode, Up: Arbitrary Precision Floats
-
-11.4.1 Setting the Working Precision
-------------------------------------
-
-`gawk' uses a global working precision; it does not keep track of the
-precision or accuracy of individual numbers. Performing an arithmetic
-operation or calling a built-in function rounds the result to the
-current working precision. The default working precision is 53 bits,
-which can be modified using the built-in variable `PREC'. You can also
-set the value to one of the following predefined case-insensitive
-strings to emulate an IEEE-754 binary format:
-
-`PREC'       IEEE-754 Binary Format
----------------------------------------------------
-`"half"'     16-bit half-precision.
-`"single"'   Basic 32-bit single precision.
-`"double"'   Basic 64-bit double precision.
-`"quad"'     Basic 128-bit quadruple precision.
-`"oct"'      256-bit octuple precision.
-
- The following example illustrates the effects of changing precision
-on arithmetic operations:
-
- $ gawk -M -vPREC=100 'BEGIN { x = 1.0e-400; print x + 0; \
- > PREC = "double"; print x + 0 }'
- -| 1e-400
- -| 0
-
- Binary and decimal precisions are related approximately according to
-the formula:
-
- PREC = 3.322 * DPS
-
-Here, PREC denotes the binary precision (measured in bits) and DPS
-(short for decimal places) is the number of decimal digits. We can
-easily calculate how many decimal digits the 53-bit significand of an
-IEEE double is equivalent to: 53 / 3.322, which is about 15.95. But
-what does 15.95 digits actually mean? It depends on whether you are
-concerned about how many digits you can rely on, or how many digits you
-need.
-
- It is important to know how many bits it takes to uniquely identify
-a double-precision value (the C type `double'). If you want to convert
-from `double' to decimal and back to `double' (e.g., saving a `double'
-representing an intermediate result to a file, and later reading it
-back to restart the computation), then a few more decimal digits are
-required. 17 digits is generally enough for a `double'.
-
- It can also be important to know what decimal numbers can be uniquely
-represented with a `double'. If you want to convert from decimal to
-`double' and back again, 15 digits is the most that you can get. Stated
-differently, you should not present the numbers from your
-floating-point computations with more than 15 significant digits in
-them.
-
- Conversely, it takes a precision of 332 bits to hold an approximation
-of the constant pi that is accurate to 100 decimal places. You should
-always add some extra bits in order to avoid the confusing round-off
-issues that occur because numbers are stored internally in binary.
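These conversions are easy to check in code. The following is a minimal Python sketch; the round-trip checks assume IEEE 64-bit doubles. `bits_to_digits()' and `digits_to_bits()' are hypothetical helper names.

```python
import math

LOG2_10 = math.log2(10)   # about 3.322, the factor in the formula above

def bits_to_digits(bits):
    return bits / LOG2_10

def digits_to_bits(digits):
    return digits * LOG2_10

print(round(bits_to_digits(53), 2))    # 15.95
print(round(digits_to_bits(100), 1))   # 332.2

# The 15- and 17-digit rules for doubles:
s = "0.123456789012345"                # 15 significant digits
print("%.15g" % float(s) == s)         # True: decimal -> double -> decimal
x = 2 / 3
print(float("%.17g" % x) == x)         # True: double -> decimal -> double
```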
-
-
-File: gawk.info, Node: Setting Rounding Mode, Next: Floating-point Constants, Prev: Setting Precision, Up: Arbitrary Precision Floats
-
-11.4.2 Setting the Rounding Mode
---------------------------------
-
-The `ROUNDMODE' variable provides program-level control over the
-rounding mode. The correspondence between `ROUNDMODE' and the IEEE
-rounding modes is shown in *note table-gawk-rounding-modes::.
-
-Rounding Mode                    IEEE Name               `ROUNDMODE'
----------------------------------------------------------------------------
-Round to nearest, ties to even   `roundTiesToEven'       `"N"' or `"n"'
-Round toward plus Infinity       `roundTowardPositive'   `"U"' or `"u"'
-Round toward negative Infinity   `roundTowardNegative'   `"D"' or `"d"'
-Round toward zero                `roundTowardZero'       `"Z"' or `"z"'
-Round to nearest, ties away      `roundTiesToAway'       `"A"' or `"a"'
-from zero
-
-Table 11.3: `gawk' Rounding Modes
-
- `ROUNDMODE' has the default value `"N"', which selects the IEEE-754
-rounding mode `roundTiesToEven'. The value `"A"' listed in *note Table
-11.3: table-gawk-rounding-modes, which selects the IEEE-754 mode
-`roundTiesToAway', works only if your version of the MPFR library
-supports it; otherwise, setting `ROUNDMODE' to this value has no
-effect. *Note Rounding Mode::, for the meanings of the various rounding
-modes.
-
- Here is an example of how to change the default rounding behavior of
-`printf''s output:
-
- $ gawk -M -vROUNDMODE="Z" 'BEGIN { printf("%.2f\n", 1.378) }'
- -| 1.37
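Python's standard `decimal' module offers the same five rules, which makes it handy for experimenting with them. The mapping below is a hypothetical illustration and is not part of `gawk'. It rounds decimal strings, whereas `gawk -M' rounds binary MPFR values. The two can therefore disagree when a decimal input is not exactly representable in binary.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_DOWN, ROUND_HALF_UP)

# Hypothetical table pairing gawk's ROUNDMODE letters with the decimal
# module's constants for the same IEEE-754 rules.
ROUNDMODE_TO_DECIMAL = {
    "N": ROUND_HALF_EVEN,   # roundTiesToEven
    "U": ROUND_CEILING,     # roundTowardPositive
    "D": ROUND_FLOOR,       # roundTowardNegative
    "Z": ROUND_DOWN,        # roundTowardZero
    "A": ROUND_HALF_UP,     # roundTiesToAway (decimal's HALF_UP: ties away from zero)
}

def printf_2f(text, roundmode):
    """Mimic printf("%.2f") on the decimal string `text` under a ROUNDMODE letter."""
    mode = ROUNDMODE_TO_DECIMAL[roundmode.upper()]
    return str(Decimal(text).quantize(Decimal("0.01"), rounding=mode))

print(printf_2f("1.378", "Z"))   # 1.37, as in the gawk example
print(printf_2f("1.378", "N"))   # 1.38
```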
-
-
-File: gawk.info, Node: Floating-point Constants, Next: Changing Precision, Prev: Setting Rounding Mode, Up: Arbitrary Precision Floats
-
-11.4.3 Representing Floating-point Constants
---------------------------------------------
-
-Be wary of floating-point constants! When reading a floating-point
-constant from program source code, `gawk' uses the default precision,
-unless overridden by an assignment to the special variable `PREC' on
-the command line, to store it internally as an MPFR number. Changing
-the precision using `PREC' in the program text does not change the
-precision of a constant. If you need to represent a floating-point
-constant at a higher precision than the default and cannot use a
-command line assignment to `PREC', you should either specify the
-constant as a string, or as a rational number whenever possible. The
-following example illustrates the differences among various ways to
-print a floating-point constant:
-
- $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", 0.1) }'
- -| 0.1000000000000000055511151
- $ gawk -M -vPREC=113 'BEGIN { printf("%0.25f\n", 0.1) }'
- -| 0.1000000000000000000000000
- $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", "0.1") }'
- -| 0.1000000000000000000000000
- $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", 1/10) }'
- -| 0.1000000000000000000000000
-
- In the first case, the constant `0.1' is converted when the program
-is read, before the assignment to `PREC' runs, so it is stored with the
-default precision of 53 bits.
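Python's `decimal' module shows the same trap in a form that is easy to experiment with. In this sketch, a context precision of 34 decimal digits stands in roughly for `PREC = 113'. `Decimal(0.1)' plays the role of a constant that was already rounded to 53 bits before the higher precision applied. This is an analogy, not `gawk''s code.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 34                            # roughly 113 bits (34 * 3.322)
    from_double = Decimal(0.1)               # the double was rounded first
    from_string = Decimal("0.1")             # exact, like passing "0.1"
    from_ratio = Decimal(1) / Decimal(10)    # computed at working precision

print(f"{from_double:.25f}")   # 0.1000000000000000055511151
print(f"{from_string:.25f}")   # 0.1000000000000000000000000
print(f"{from_ratio:.25f}")    # 0.1000000000000000000000000
```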
-
-
-File: gawk.info, Node: Changing Precision, Next: Exact Arithmetic, Prev: Floating-point Constants, Up: Arbitrary Precision Floats
-
-11.4.4 Changing the Precision of a Number
------------------------------------------
-
- The point is that in any variable-precision package, a decision is
- made on how to treat numbers given as data, or arising in
- intermediate results, which are represented in floating-point
- format to a precision lower than working precision. Do we promote
- them to full membership of the high-precision club, or do we treat
- them and all their associates as second-class citizens? Sometimes
- the first course is proper, sometimes the second, and it takes
- careful analysis to tell which.
-
- Dirk Laurie(1)
-
- `gawk' does not implicitly modify the precision of any previously
-computed results when the working precision is changed with an
-assignment to `PREC'. The precision of a number is always the one that
-was used at the time of its creation, and there is no way for the user
-to explicitly change it afterwards. However, since the result of a
-floating-point arithmetic operation is always an arbitrary precision
-floating-point value--with a precision set by the value of `PREC'--one
-of the following workarounds effectively accomplishes the desired
-behavior:
-
- x = x + 0.0
-
-or:
-
- x += 0.0
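Python's `decimal' module behaves the same way, so it can be used to try this out. In the sketch below, changing the context precision plays the role of assigning to `PREC'. An existing value keeps its digits until it takes part in an operation such as adding zero. This is an analogy under those assumptions, not `gawk' itself.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 50
    x = Decimal(1) / Decimal(3)                 # created with 50 digits
    ctx.prec = 10                               # like a later PREC assignment
    digits_before = len(x.as_tuple().digits)    # 50: x keeps its precision
    x = x + 0                                   # like `x = x + 0.0' in gawk
    digits_after = len(x.as_tuple().digits)     # 10: rounded to new precision

print(digits_before, digits_after, x)           # 50 10 0.3333333333
```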
-
- ---------- Footnotes ----------
-
- (1) Dirk Laurie. `Variable-precision Arithmetic Considered Perilous
--- A Detective Story'. Electronic Transactions on Numerical Analysis.
-Volume 28, pp. 168-173, 2008.
-
-
-File: gawk.info, Node: Exact Arithmetic, Prev: Changing Precision, Up: Arbitrary Precision Floats
-
-11.4.5 Exact Arithmetic with Floating-point Numbers
----------------------------------------------------
-
- CAUTION: Never depend on the exactness of floating-point
- arithmetic, even for apparently simple expressions!
-
- Can arbitrary precision arithmetic give exact results? There are no
-easy answers. The standard rules of algebra often do not apply when
-using floating-point arithmetic. Among other things, the distributive
-and associative laws do not hold completely, and order of operation may
-be important for your computation. Rounding error, cumulative precision
-loss and underflow are often troublesome.
-
- When `gawk' tests the expressions `0.1 + 12.2' and `12.3' for
-equality using the machine double precision arithmetic, it decides that
-they are not equal! (*Note Floating-point Programming::.) You can get
-the result you want by increasing the precision; 56 in this case will
-get the job done:
-
- $ gawk -M -vPREC=56 'BEGIN { print (0.1 + 12.2 == 12.3) }'
- -| 1
-
- If adding more bits is good, perhaps adding even more bits of
-precision is better? Here is what happens if we use an even larger
-value of `PREC':
-
- $ gawk -M -vPREC=201 'BEGIN { print (0.1 + 12.2 == 12.3) }'
- -| 0
-
- This is not a bug in `gawk' or in the MPFR library. It is easy to
-forget that the finite number of bits used to store the value is often
-just an approximation after proper rounding. The test for equality
-succeeds if and only if _all_ bits in the two operands are exactly the
-same. Since this is not necessarily true after floating-point
-computations with a particular precision and effective rounding rule, a
-straight test for equality may not work.
-
- So, don't assume that floating-point values can be compared for
-equality. You should also exercise caution when using other forms of
-comparisons. The standard way to compare floating-point numbers is
-to determine how much error (or "tolerance") you will allow in a
-comparison and then check whether one value is within this error
-range of the other.
-
- In applications where 15 or fewer decimal places suffice, hardware
-double precision arithmetic can be adequate, and is usually much faster.
-But you do need to keep in mind that every floating-point operation can
-suffer a new rounding error with catastrophic consequences as
-illustrated by our attempt to compute the value of the constant pi
-(*note Floating-point Programming::). Extra precision can greatly
-enhance the stability and the accuracy of your computation in such
-cases.
-
- Repeated addition is not necessarily equivalent to multiplication in
-floating-point arithmetic. In the example in *note Floating-point
-Programming:::
-
- $ gawk 'BEGIN {
- > for (d = 1.1; d <= 1.5; d += 0.1)
- > i++
- > print i
- > }'
- -| 4
-
-you may or may not succeed in getting the correct result by choosing an
-arbitrarily large value for `PREC'. Reformulation of the problem at
-hand is often the correct approach in such situations.
-
-
-File: gawk.info, Node: Arbitrary Precision Integers, Prev: Arbitrary Precision Floats, Up: Arbitrary Precision Arithmetic
-
-11.5 Arbitrary Precision Integer Arithmetic with `gawk'
-=======================================================
-
-If the option `--bignum' or `-M' is specified, `gawk' performs all
-integer arithmetic using GMP arbitrary precision integers. Any number
-that looks like an integer in a program source or data file is stored
-as an arbitrary precision integer. The size of the integer is limited
-only by your computer's memory. The current floating-point context has
-no effect on operations involving integers. For example, the following
-computes 5^4^3^2, the result of which is beyond the limits of ordinary
-`gawk' numbers:
-
- $ gawk -M 'BEGIN {
- > x = 5^4^3^2
- > print "# of digits =", length(x)
- > print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)
- > }'
- -| # of digits = 183231
- -| 62060698786608744707 ... 92256259918212890625
-
- If you were to compute the same value using arbitrary precision
-floating-point values instead, the precision needed for correct output
-(using the formula `prec = 3.322 * dps', where `prec' is measured in
-bits and `dps' is the number of decimal digits) would be 3.322 x
-183231, or 608693 bits.
-
- The result from an arithmetic operation with an integer and a
-floating-point value is a floating-point value with a precision equal
-to the working precision. The following program calculates the eighth
-term in Sylvester's sequence(1) using a recurrence:
-
- $ gawk -M 'BEGIN {
- > s = 2.0
- > for (i = 1; i <= 7; i++)
- > s = s * (s - 1) + 1
- > print s
- > }'
- -| 113423713055421845118910464
-
- The output differs from the actual number,
-113423713055421844361000443, because the default precision of 53 is not
-enough to represent the floating-point results exactly. You can either
-increase the precision (100 is enough in this case), or replace the
-floating-point constant `2.0' with an integer, to perform all
-computations using integer arithmetic to get the correct output.
-
- It will sometimes be necessary for `gawk' to implicitly convert an
-arbitrary precision integer into an arbitrary precision floating-point
-value. This is primarily because the MPFR library does not always
-provide the relevant interface to process arbitrary precision integers
-or mixed-mode numbers as needed by an operation or function. In such a
-case, the precision is set to the minimum value necessary for exact
-conversion, and the working precision is not used for this purpose. If
-this is not what you need or want, you can employ a subterfuge like
-this:
-
- gawk -M 'BEGIN { n = 13; print (n + 0.0) % 2.0 }'
-
- You can avoid this issue altogether by specifying the number as a
-floating-point value to begin with:
-
- gawk -M 'BEGIN { n = 13.0; print n % 2.0 }'
-
- Note that for this particular example, it is likely best to just
-use the following:
-
- gawk -M 'BEGIN { n = 13; print n % 2 }'
-
- ---------- Footnotes ----------
-
- (1) Weisstein, Eric W. `Sylvester's Sequence'. From MathWorld--A
-Wolfram Web Resource.
-`http://mathworld.wolfram.com/SylvestersSequence.html'
-
-
-File: gawk.info, Node: Advanced Features, Next: Library Functions, Prev: Arbitrary Precision Arithmetic, Up: Top
-
-12 Advanced Features of `gawk'
-******************************
-
- Write documentation as if whoever reads it is a violent psychopath
- who knows where you live.
- Steve English, as quoted by Peter Langston
-
- This major node discusses advanced features in `gawk'. It's a bit
-of a "grab bag" of items that are otherwise unrelated to each other.
-First, a command-line option allows `gawk' to recognize nondecimal
-numbers in input data, not just in `awk' programs. Then, `gawk''s
-special features for sorting arrays are presented. Next, two-way I/O,
-discussed briefly in earlier parts of this Info file, is described in
-full detail, along with the basics of TCP/IP networking. Finally,
-`gawk' can "profile" an `awk' program, making it possible to tune it
-for performance.
-
- *note Dynamic Extensions::, discusses the ability to dynamically add
-new built-in functions to `gawk'. As this feature is still immature
-and likely to change, its description is relegated to an appendix.
-
-* Menu:
-
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal and
- sorting arrays.
-* Two-way I/O:: Two-way communications with another process.
-* TCP/IP Networking:: Using `gawk' for network programming.
-* Profiling:: Profiling your `awk' programs.
-
-
-File: gawk.info, Node: Nondecimal Data, Next: Array Sorting, Up: Advanced Features
-
-12.1 Allowing Nondecimal Input Data
-===================================
-
-If you run `gawk' with the `--non-decimal-data' option, you can have
-nondecimal constants in your input data:
-
- $ echo 0123 123 0x123 |
- > gawk --non-decimal-data '{ printf "%d, %d, %d\n",
- > $1, $2, $3 }'
- -| 83, 123, 291
-
- For this feature to work, write your program so that `gawk' treats
-your data as numeric:
-
- $ echo 0123 123 0x123 | gawk '{ print $1, $2, $3 }'
- -| 0123 123 0x123
-
-The `print' statement treats its expressions as strings. Although the
-fields can act as numbers when necessary, they are still strings, so
-`print' does not try to treat them numerically. You may need to add
-zero to a field to force it to be treated as a number. For example:
-
- $ echo 0123 123 0x123 | gawk --non-decimal-data '
- > { print $1, $2, $3
- > print $1 + 0, $2 + 0, $3 + 0 }'
- -| 0123 123 0x123
- -| 83 123 291
-
- Because it is common to have decimal data with leading zeros, and
-because using this facility could lead to surprising results, the
-default is to leave it disabled. If you want it, you must explicitly
-request it.
-
- CAUTION: _Use of this option is not recommended._ It can break old
- programs very badly. Instead, use the `strtonum()' function to
- convert your data (*note Nondecimal-numbers::). This makes your
- programs easier to write and easier to read, and leads to less
- surprising results.
-
-
-File: gawk.info, Node: Array Sorting, Next: Two-way I/O, Prev: Nondecimal Data, Up: Advanced Features
-
-12.2 Controlling Array Traversal and Array Sorting
-==================================================
-
-`gawk' lets you control the order in which a `for (i in array)' loop
-traverses an array.
-
- In addition, two built-in functions, `asort()' and `asorti()', let
-you sort arrays based on the array values and indices, respectively.
-These two functions also provide control over the sorting criteria used
-to order the elements during sorting.
-
-* Menu:
-
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use `asort()' and `asorti()'.
-
-
-File: gawk.info, Node: Controlling Array Traversal, Next: Array Sorting Functions, Up: Array Sorting
-
-12.2.1 Controlling Array Traversal
-----------------------------------
-
-By default, the order in which a `for (i in array)' loop scans an array
-is not defined; it is generally based upon the internal implementation
-of arrays inside `awk'.
-
- Often, though, it is desirable to be able to loop over the elements
-in a particular order that you, the programmer, choose. `gawk' lets
-you do this.
-
- *note Controlling Scanning::, describes how you can assign special,
-predefined values to `PROCINFO["sorted_in"]' in order to control the
-order in which `gawk' will traverse an array during a `for' loop.
-
- In addition, the value of `PROCINFO["sorted_in"]' can be a function
-name. This lets you traverse an array based on any custom criterion.
-The array elements are ordered according to the return value of this
-function. The comparison function should be defined with at least four
-arguments:
-
- function comp_func(i1, v1, i2, v2)
- {
- COMPARE ELEMENTS 1 AND 2 IN SOME FASHION
- RETURN < 0; 0; OR > 0
- }
-
- Here, I1 and I2 are the indices, and V1 and V2 are the corresponding
-values of the two elements being compared. Either V1 or V2, or both,
-can be arrays if the array being traversed contains subarrays as values.
-(*Note Arrays of Arrays::, for more information about subarrays.) The
-three possible return values are interpreted as follows:
-
-`comp_func(i1, v1, i2, v2) < 0'
- Index I1 comes before index I2 during loop traversal.
-
-`comp_func(i1, v1, i2, v2) == 0'
- Indices I1 and I2 come together but the relative order with
- respect to each other is undefined.
-
-`comp_func(i1, v1, i2, v2) > 0'
- Index I1 comes after index I2 during loop traversal.
-
- Our first comparison function can be used to scan an array in
-numerical order of the indices:
-
- function cmp_num_idx(i1, v1, i2, v2)
- {
- # numerical index comparison, ascending order
- return (i1 - i2)
- }
-
- Our second function traverses an array based on the string order of
-the element values rather than by indices:
-
- function cmp_str_val(i1, v1, i2, v2)
- {
- # string value comparison, ascending order
- v1 = v1 ""
- v2 = v2 ""
- if (v1 < v2)
- return -1
- return (v1 != v2)
- }
-
- The third comparison function makes all numbers, and numeric strings
-without any leading or trailing spaces, come out first during loop
-traversal:
-
- function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
- {
- # numbers before string value comparison, ascending order
- n1 = v1 + 0
- n2 = v2 + 0
- if (n1 == v1)
- return (n2 == v2) ? (n1 - n2) : -1
- else if (n2 == v2)
- return 1
- return (v1 < v2) ? -1 : (v1 != v2)
- }
-
- Here is a main program to demonstrate how `gawk' behaves using each
-of the previous functions:
-
- BEGIN {
- data["one"] = 10
- data["two"] = 20
- data[10] = "one"
- data[100] = 100
- data[20] = "two"
-
- f[1] = "cmp_num_idx"
- f[2] = "cmp_str_val"
- f[3] = "cmp_num_str_val"
- for (i = 1; i <= 3; i++) {
- printf("Sort function: %s\n", f[i])
- PROCINFO["sorted_in"] = f[i]
- for (j in data)
- printf("\tdata[%s] = %s\n", j, data[j])
- print ""
- }
- }
-
- Here are the results when the program is run:
-
- $ gawk -f compdemo.awk
- -| Sort function: cmp_num_idx Sort by numeric index
- -| data[two] = 20
- -| data[one] = 10 Both strings are numerically zero
- -| data[10] = one
- -| data[20] = two
- -| data[100] = 100
- -|
- -| Sort function: cmp_str_val Sort by element values as strings
- -| data[one] = 10
- -| data[100] = 100 String 100 is less than string 20
- -| data[two] = 20
- -| data[10] = one
- -| data[20] = two
- -|
- -| Sort function: cmp_num_str_val Sort all numeric values before all strings
- -| data[one] = 10
- -| data[two] = 20
- -| data[100] = 100
- -| data[10] = one
- -| data[20] = two
-
- Consider sorting the entries of a GNU/Linux system password file
-according to login name. The following program sorts records by a
-specific field position and can be used for this purpose:
-
- # sort.awk --- simple program to sort by field position
- # field position is specified by the global variable POS
-
- function cmp_field(i1, v1, i2, v2)
- {
- # comparison by value, as string, and ascending order
- return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
- }
-
- {
- for (i = 1; i <= NF; i++)
- a[NR][i] = $i
- }
-
- END {
- PROCINFO["sorted_in"] = "cmp_field"
- if (POS < 1 || POS > NF)
- POS = 1
- for (i in a) {
- for (j = 1; j <= NF; j++)
- printf("%s%c", a[i][j], j < NF ? ":" : "")
- print ""
- }
- }
-
- The first field in each entry of the password file is the user's
-login name, and the fields are separated by colons. Each record
-defines a subarray, with each field as an element in the subarray.
-Running the program produces the following output:
-
- $ gawk -vPOS=1 -F: -f sort.awk /etc/passwd
- -| adm:x:3:4:adm:/var/adm:/sbin/nologin
- -| apache:x:48:48:Apache:/var/www:/sbin/nologin
- -| avahi:x:70:70:Avahi daemon:/:/sbin/nologin
- ...
-
- The comparison should normally always return the same value when
-given a specific pair of array elements as its arguments. If
-inconsistent results are returned then the order is undefined. This
-behavior can be exploited to introduce random order into otherwise
-seemingly ordered data:
-
- function cmp_randomize(i1, v1, i2, v2)
- {
- # random order
- return (2 - 4 * rand())
- }
-
- As mentioned above, the order of the indices is arbitrary if two
-elements compare equal. This is usually not a problem, but letting the
-tied elements come out in arbitrary order can be an issue, especially
-when comparing item values. The partial ordering of the equal elements
-may change during the next loop traversal, if other elements are added
-or removed from the array. One way to resolve ties when comparing
-elements with otherwise equal values is to include the indices in the
-comparison rules. Note that doing this may make the loop traversal
-less efficient, so consider it only if necessary. The following
-comparison functions force a deterministic order, and are based on the
-fact that the indices of two elements are never equal:
-
- function cmp_numeric(i1, v1, i2, v2)
- {
- # numerical value (and index) comparison, descending order
- return (v1 != v2) ? (v2 - v1) : (i2 - i1)
- }
-
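- function cmp_string(i1, v1, i2, v2)
- {
- # string value (and index) comparison, descending order
- v1 = v1 ""
- v2 = v2 ""
- if (v1 != v2)
- return (v1 > v2) ? -1 : 1
- # equal values: break the tie by comparing indices as strings
- i1 = i1 ""
- i2 = i2 ""
- return (i1 > i2) ? -1 : (i1 != i2)
- }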
-
- A custom comparison function can often simplify ordered loop
-traversal, and the sky is really the limit when it comes to designing
-such a function.
-
- When string comparisons are made during a sort, either for element
-values where one or both aren't numbers, or for element indices handled
-as strings, the value of `IGNORECASE' (*note Built-in Variables::)
-controls whether the comparisons treat corresponding uppercase and
-lowercase letters as equivalent or distinct.
-
- Another point to keep in mind is that in the case of subarrays the
-element values can themselves be arrays; a production comparison
-function should use the `isarray()' function (*note Type Functions::),
-to check for this, and choose a defined sorting order for subarrays.
-
- All sorting based on `PROCINFO["sorted_in"]' is disabled in POSIX
-mode, since the `PROCINFO' array is not special in that case.
-
- As a side note, sorting the array indices before traversing the
-array has been reported to add 15% to 20% overhead to the execution
-time of `awk' programs. For this reason, sorted array traversal is not
-the default.
-
-
-File: gawk.info, Node: Array Sorting Functions, Prev: Controlling Array Traversal, Up: Array Sorting
-
-12.2.2 Sorting Array Values and Indices with `gawk'
----------------------------------------------------
-
-In most `awk' implementations, sorting an array requires writing a
-`sort()' function. While this can be educational for exploring
-different sorting algorithms, usually that's not the point of the
-program. `gawk' provides the built-in `asort()' and `asorti()'
-functions (*note String Functions::) for sorting arrays. For example:
-
- POPULATE THE ARRAY data
- n = asort(data)
- for (i = 1; i <= n; i++)
- DO SOMETHING WITH data[i]
-
- After the call to `asort()', the array `data' is indexed from 1 to
-some number N, the total number of elements in `data'. (This count is
-`asort()''s return value.) `data[1]' <= `data[2]' <= `data[3]', and so
-on. The comparison is based on the type of the elements (*note Typing
-and Comparison::). All numeric values come before all string values,
-which in turn come before all subarrays.
-
- An important side effect of calling `asort()' is that _the array's
-original indices are irrevocably lost_. As this isn't always
-desirable, `asort()' accepts a second argument:
-
- POPULATE THE ARRAY source
- n = asort(source, dest)
- for (i = 1; i <= n; i++)
- DO SOMETHING WITH dest[i]
-
- In this case, `gawk' copies the `source' array into the `dest' array
-and then sorts `dest', destroying its indices. However, the `source'
-array is not affected.
-
- `asort()' accepts a third string argument to control comparison of
-array elements. As with `PROCINFO["sorted_in"]', this argument may be
-one of the predefined names that `gawk' provides (*note Controlling
-Scanning::), or the name of a user-defined function (*note Controlling
-Array Traversal::).
-
- NOTE: In all cases, the sorted element values consist of the
- original array's element values. The ability to control
- comparison merely affects the way in which they are sorted.
-
- Often, what's needed is to sort on the values of the _indices_
-instead of the values of the elements. To do that, use the `asorti()'
-function. The interface is identical to that of `asort()', except that
-the index values are used for sorting, and become the values of the
-result array:
-
- { source[$0] = some_func($0) }
-
- END {
- n = asorti(source, dest)
- for (i = 1; i <= n; i++) {
- Work with sorted indices directly:
- DO SOMETHING WITH dest[i]
- ...
- Access original array via sorted indices:
- DO SOMETHING WITH source[dest[i]]
- }
- }
-
- Similar to `asort()', in all cases, the sorted element values
-consist of the original array's indices. The ability to control
-comparison merely affects the way in which they are sorted.
-
- Sorting the array by replacing the indices provides maximal
-flexibility. To traverse the elements in decreasing order, use a loop
-that goes from N down to 1, either over the elements or over the
-indices.(1)
-
- Copying array indices and elements isn't expensive in terms of
-memory. Internally, `gawk' maintains "reference counts" to data. For
-example, when `asort()' copies the first array to the second one, there
-is only one copy of the original array elements' data, even though both
-arrays use the values.
-
- Because `IGNORECASE' affects string comparisons, the value of
-`IGNORECASE' also affects sorting for both `asort()' and `asorti()'.
-Note also that the locale's sorting order does _not_ come into play;
-comparisons are based on character values only.(2) Caveat Emptor.
-
- ---------- Footnotes ----------
-
- (1) You may also use one of the predefined sorting names that sorts
-in decreasing order.
-
- (2) This is true because locale-based comparison occurs only when in
-POSIX compatibility mode, and since `asort()' and `asorti()' are `gawk'
-extensions, they are not available in that case.
-
-
-File: gawk.info, Node: Two-way I/O, Next: TCP/IP Networking, Prev: Array Sorting, Up: Advanced Features
-
-12.3 Two-Way Communications with Another Process
-================================================
-
- From: brennan@whidbey.com (Mike Brennan)
- Newsgroups: comp.lang.awk
- Subject: Re: Learn the SECRET to Attract Women Easily
- Date: 4 Aug 1997 17:34:46 GMT
- Message-ID: <5s53rm$eca@news.whidbey.com>
-
- On 3 Aug 1997 13:17:43 GMT, Want More Dates???
- <tracy78@kilgrona.com> wrote:
- >Learn the SECRET to Attract Women Easily
- >
- >The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
-
- The scent of awk programmers is a lot more attractive to women than
- the scent of perl programmers.
- --
- Mike Brennan
-
- It is often useful to be able to send data to a separate program for
-processing and then read the result. This can always be done with
-temporary files:
-
- # Write the data for processing
- tempfile = ("mydata." PROCINFO["pid"])
- while (NOT DONE WITH DATA)
- print DATA | ("subprogram > " tempfile)
- close("subprogram > " tempfile)
-
- # Read the results, remove tempfile when done
- while ((getline newdata < tempfile) > 0)
- PROCESS newdata APPROPRIATELY
- close(tempfile)
- system("rm " tempfile)
-
-This works, but not elegantly. Among other things, it requires that
-the program be run in a directory that cannot be shared among users;
-for example, `/tmp' will not do, as another user might happen to be
-using a temporary file with the same name.
-
- However, with `gawk', it is possible to open a _two-way_ pipe to
-another process. The second process is termed a "coprocess", since it
-runs in parallel with `gawk'. The two-way connection is created using
-the `|&' operator (borrowed from the Korn shell, `ksh'):(1)
-
- do {
- print DATA |& "subprogram"
- "subprogram" |& getline results
- } while (DATA LEFT TO PROCESS)
- close("subprogram")
-
- The first time an I/O operation is executed using the `|&' operator,
-`gawk' creates a two-way pipeline to a child process that runs the
-other program. Output created with `print' or `printf' is written to
-the program's standard input, and output from the program's standard
-output can be read by the `gawk' program using `getline'. As is the
-case with processes started by `|', the subprogram can be any program,
-or pipeline of programs, that can be started by the shell.
-
- There are some cautionary items to be aware of:
-
- * As the code inside `gawk' currently stands, the coprocess's
- standard error goes to the same place that the parent `gawk''s
- standard error goes. It is not possible to read the child's
- standard error separately.
-
- * I/O buffering may be a problem. `gawk' automatically flushes all
- output down the pipe to the coprocess. However, if the coprocess
- does not flush its output, `gawk' may hang when doing a `getline'
- in order to read the coprocess's results. This could lead to a
- situation known as "deadlock", where each process is waiting for
- the other one to do something.
-
- It is possible to close just one end of the two-way pipe to a
-coprocess, by supplying a second argument to the `close()' function of
-either `"to"' or `"from"' (*note Close Files And Pipes::). These
-strings tell `gawk' to close the end of the pipe that sends data to the
-coprocess or the end that reads from it, respectively.
-
- This is particularly necessary in order to use the system `sort'
-utility as part of a coprocess; `sort' must read _all_ of its input
-data before it can produce any output. The `sort' program does not
-receive an end-of-file indication until `gawk' closes the write end of
-the pipe.
-
- When you have finished writing data to the `sort' utility, you can
-close the `"to"' end of the pipe, and then start reading sorted data
-via `getline'. For example:
-
- BEGIN {
- command = "LC_ALL=C sort"
- n = split("abcdefghijklmnopqrstuvwxyz", a, "")
-
- for (i = n; i > 0; i--)
- print a[i] |& command
- close(command, "to")
-
- while ((command |& getline line) > 0)
- print "got", line
- close(command)
- }
-
- This program writes the letters of the alphabet in reverse order, one
-per line, down the two-way pipe to `sort'. It then closes the write
-end of the pipe, so that `sort' receives an end-of-file indication.
-This causes `sort' to sort the data and write the sorted data back to
-the `gawk' program. Once all of the data has been read, `gawk'
-terminates the coprocess and exits.
-
- As a side note, the assignment `LC_ALL=C' in the `sort' command
-ensures traditional Unix (ASCII) sorting from `sort'.
-
- You may also use pseudo-ttys (ptys) for two-way communication
-instead of pipes, if your system supports them. This is done on a
-per-command basis, by setting a special element in the `PROCINFO' array
-(*note Auto-set::), like so:
-
- command = "sort -nr" # command, save in convenience variable
- PROCINFO[command, "pty"] = 1 # update PROCINFO
- print ... |& command # start two-way pipe
- ...
-
-Using ptys avoids the buffer deadlock issues described earlier, at some
-loss in performance. If your system does not have ptys, or if all the
-system's ptys are in use, `gawk' automatically falls back to using
-regular pipes.
-
- ---------- Footnotes ----------
-
- (1) This is very different from the same operator in the C shell.
-
-
-File: gawk.info, Node: TCP/IP Networking, Next: Profiling, Prev: Two-way I/O, Up: Advanced Features
-
-12.4 Using `gawk' for Network Programming
-=========================================
-
- `EMISTERED':
- A host is a host from coast to coast,
- and no-one can talk to host that's close,
- unless the host that isn't close
- is busy hung or dead.
-
- In addition to being able to open a two-way pipeline to a coprocess
-on the same system (*note Two-way I/O::), it is possible to make a
-two-way connection to another process on another system across an IP
-network connection.
-
- You can think of this as just a _very long_ two-way pipeline to a
-coprocess. The way `gawk' decides that you want to use TCP/IP
-networking is by recognizing special file names that begin with one of
-`/inet/', `/inet4/' or `/inet6/'.
-
- The full syntax of the special file name is
-`/NET-TYPE/PROTOCOL/LOCAL-PORT/REMOTE-HOST/REMOTE-PORT'. The
-components are:
-
-NET-TYPE
- Specifies the kind of Internet connection to make. Use `/inet4/'
- to force IPv4, and `/inet6/' to force IPv6. Plain `/inet/' (which
- used to be the only option) uses the system default, most likely
- IPv4.
-
-PROTOCOL
- The protocol to use over IP. This must be either `tcp', or `udp',
- for a TCP or UDP IP connection, respectively. The use of TCP is
- recommended for most applications.
-
-LOCAL-PORT
- The local TCP or UDP port number to use. Use a port number of `0'
- when you want the system to pick a port. This is what you should do
- when writing a TCP or UDP client. You may also use a well-known
- service name, such as `smtp' or `http', in which case `gawk'
- attempts to determine the predefined port number using the C
- `getaddrinfo()' function.
-
-REMOTE-HOST
- The IP address or fully-qualified domain name of the Internet host
- to which you want to connect.
-
-REMOTE-PORT
- The TCP or UDP port number to use on the given REMOTE-HOST.
- Again, use `0' if you don't care, or else a well-known service
- name.
-
- NOTE: Failure in opening a two-way socket will result in a
- non-fatal error being returned to the calling code. The value of
- `ERRNO' indicates the error (*note Auto-set::).
-
- Consider the following very simple example:
-
- BEGIN {
- Service = "/inet/tcp/0/localhost/daytime"
- Service |& getline
- print $0
- close(Service)
- }
-
- This program reads the current date and time from the local system's
-TCP `daytime' server. It then prints the results and closes the
-connection.
-
- Because this topic is extensive, the use of `gawk' for TCP/IP
-programming is documented separately. See *note (General
-Introduction)Top:: gawkinet, TCP/IP Internetworking with `gawk', for a
-much more complete introduction and discussion, as well as extensive
-examples.
-
-
-File: gawk.info, Node: Profiling, Prev: TCP/IP Networking, Up: Advanced Features
-
-12.5 Profiling Your `awk' Programs
-==================================
-
-You may produce execution traces of your `awk' programs. This is done
-by passing the option `--profile' to `gawk'. When `gawk' has finished
-running, it creates a profile of your program in a file named
-`awkprof.out'. Because it is profiling, it also executes up to 45%
-slower than `gawk' normally does.
-
- As shown in the following example, the `--profile' option can be
-used to change the name of the file where `gawk' will write the profile:
-
- gawk --profile=myprog.prof -f myprog.awk data1 data2
-
-In the above example, `gawk' places the profile in `myprog.prof'
-instead of in `awkprof.out'.
-
- Here is a sample session showing a simple `awk' program, its input
-data, and the results from running `gawk' with the `--profile' option.
-First, the `awk' program:
-
- BEGIN { print "First BEGIN rule" }
-
- END { print "First END rule" }
-
- /foo/ {
- print "matched /foo/, gosh"
- for (i = 1; i <= 3; i++)
- sing()
- }
-
- {
- if (/foo/)
- print "if is true"
- else
- print "else is true"
- }
-
- BEGIN { print "Second BEGIN rule" }
-
- END { print "Second END rule" }
-
- function sing( dummy)
- {
- print "I gotta be me!"
- }
-
- Following is the input data:
-
- foo
- bar
- baz
- foo
- junk
-
- Here is the `awkprof.out' that results from running the `gawk'
-profiler on this program and data (this example also illustrates that
-`awk' programmers sometimes have to work late):
-
- # gawk profile, created Sun Aug 13 00:00:15 2000
-
- # BEGIN block(s)
-
- BEGIN {
- 1 print "First BEGIN rule"
- 1 print "Second BEGIN rule"
- }
-
- # Rule(s)
-
- 5 /foo/ { # 2
- 2 print "matched /foo/, gosh"
- 6 for (i = 1; i <= 3; i++) {
- 6 sing()
- }
- }
-
- 5 {
- 5 if (/foo/) { # 2
- 2 print "if is true"
- 3 } else {
- 3 print "else is true"
- }
- }
-
- # END block(s)
-
- END {
- 1 print "First END rule"
- 1 print "Second END rule"
- }
-
- # Functions, listed alphabetically
-
- 6 function sing(dummy)
- {
- 6 print "I gotta be me!"
- }
-
- This example illustrates many of the basic features of profiling
-output. They are as follows:
-
- * The program is printed in the order `BEGIN' rule, `BEGINFILE' rule,
- pattern/action rules, `ENDFILE' rule, `END' rule and functions,
- listed alphabetically. Multiple `BEGIN' and `END' rules are
- merged together, as are multiple `BEGINFILE' and `ENDFILE' rules.
-
- * Pattern-action rules have two counts. The first count, to the
- left of the rule, shows how many times the rule's pattern was
- _tested_. The second count, to the right of the rule's opening
- left brace in a comment, shows how many times the rule's action
- was _executed_. The difference between the two indicates how many
- times the rule's pattern evaluated to false.
-
- * Similarly, the count for an `if'-`else' statement shows how many
- times the condition was tested. To the right of the opening left
- brace for the `if''s body is a count showing how many times the
- condition was true. The count for the `else' indicates how many
- times the test failed.
-
- * The count for a loop header (such as `for' or `while') shows how
- many times the loop test was executed. (Because of this, you
- can't just look at the count on the first statement in a rule to
- determine how many times the rule was executed. If the first
- statement is a loop, the count is misleading.)
-
- * For user-defined functions, the count next to the `function'
- keyword indicates how many times the function was called. The
- counts next to the statements in the body show how many times
- those statements were executed.
-
- * The layout uses "K&R" style with TABs. Braces are used
- everywhere, even when the body of an `if', `else', or loop is only
- a single statement.
-
- * Parentheses are used only where needed, as indicated by the
- structure of the program and the precedence rules. For example,
- `(3 + 5) * 4' means add three plus five, then multiply the total
- by four. However, `3 + 5 * 4' has no parentheses, and means `3 +
- (5 * 4)'.
-
- * Parentheses are used around the arguments to `print' and `printf'
- only when the `print' or `printf' statement is followed by a
- redirection. Similarly, if the target of a redirection isn't a
- scalar, it gets parenthesized.
-
- * `gawk' supplies leading comments in front of the `BEGIN' and `END'
- rules, the pattern/action rules, and the functions.
-
-
- The profiled version of your program may not look exactly like what
-you typed when you wrote it. This is because `gawk' creates the
-profiled version by "pretty printing" its internal representation of
-the program. The advantage to this is that `gawk' can produce a
-standard representation. The disadvantage is that all source-code
-comments are lost, as are the distinctions among multiple `BEGIN',
-`END', `BEGINFILE', and `ENDFILE' rules. Also, things such as:
-
- /foo/
-
-come out as:
-
- /foo/ {
- print $0
- }
-
-which is correct, but possibly surprising.
-
- Besides creating profiles when a program has completed, `gawk' can
-produce a profile while it is running. This is useful if your `awk'
-program goes into an infinite loop and you want to see what has been
-executed. To use this feature, run `gawk' with the `--profile' option
-in the background:
-
- $ gawk --profile -f myprog &
- [1] 13992
-
-The shell prints a job number and process ID number; in this case,
-13992. Use the `kill' command to send the `USR1' signal to `gawk':
-
- $ kill -USR1 13992
-
-As usual, the profiled version of the program is written to
-`awkprof.out', or to a different file if one was specified with the
-`--profile' option.
-
- Along with the regular profile, as shown earlier, the profile
-includes a trace of any active functions:
-
- # Function Call Stack:
-
- # 3. baz
- # 2. bar
- # 1. foo
- # -- main --
-
- You may send `gawk' the `USR1' signal as many times as you like.
-Each time, the profile and function call trace are appended to the
-output profile file.
-
- If you use the `HUP' signal instead of the `USR1' signal, `gawk'
-produces the profile and the function call trace and then exits.
-
- When `gawk' runs on MS-Windows systems, it uses the `INT' and `QUIT'
-signals for producing the profile and, in the case of the `INT' signal,
-`gawk' exits. This is because these systems don't support the `kill'
-command, so the only signals you can deliver to a program are those
-generated by the keyboard. The `INT' signal is generated by the
-`Ctrl-<C>' or `Ctrl-<BREAK>' key, while the `QUIT' signal is generated
-by the `Ctrl-<\>' key.
-
- Finally, `gawk' also accepts another option, `--pretty-print'. When
-called this way, `gawk' "pretty prints" the program into `awkprof.out',
-without any execution counts.
-
-
-File: gawk.info, Node: Library Functions, Next: Sample Programs, Prev: Advanced Features, Up: Top
-
-13 A Library of `awk' Functions
-*******************************
-
-*note User-defined::, describes how to write your own `awk' functions.
-Writing functions is important, because it allows you to encapsulate
-algorithms and program tasks in a single place. It simplifies
-programming, making program development more manageable, and making
-programs more readable.
-
- One valuable way to learn a new programming language is to _read_
-programs in that language. To that end, this major node and *note
-Sample Programs::, provide a good-sized body of code for you to read,
-and hopefully, to learn from.
-
- This major node presents a library of useful `awk' functions. Many
-of the sample programs presented later in this Info file use these
-functions. The functions are presented here in a progression from
-simple to complex.
-
- *note Extract Program::, presents a program that you can use to
-extract the source code for these example library functions and
-programs from the Texinfo source for this Info file. (This has already
-been done as part of the `gawk' distribution.)
-
- If you have written one or more useful, general-purpose `awk'
-functions and would like to contribute them to the `awk' user
-community, see *note How To Contribute::, for more information.
-
- The programs in this major node and in *note Sample Programs::,
-freely use features that are `gawk'-specific. Rewriting these programs
-for different implementations of `awk' is pretty straightforward:
-
- * Diagnostic error messages are sent to `/dev/stderr'. Use `| "cat
- 1>&2"' instead of `> "/dev/stderr"' if your system does not have a
- `/dev/stderr', or if you cannot use `gawk'.
-
- * A number of programs use `nextfile' (*note Nextfile Statement::)
- to skip any remaining input in the input file.
-
- * Finally, some of the programs choose to ignore upper- and lowercase
- distinctions in their input. They do so by assigning one to
- `IGNORECASE'. You can achieve almost the same effect(1) by adding
- the following rule to the beginning of the program:
-
- # ignore case
- { $0 = tolower($0) }
-
- Also, verify that all regexp and string constants used in
- comparisons use only lowercase letters.
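-
- As a quick illustration of the `tolower()' technique, here is a
-minimal sketch that wraps it in a shell function so that it runs under
-any `awk'. The function name `grep_ci' is hypothetical, and the pattern
-passed to it is assumed to be lowercase already:
-
```shell
# grep_ci: hypothetical helper that emulates IGNORECASE = 1 in any awk.
# The pattern in $1 must already be written in lowercase.
grep_ci() {
  awk -v pat="$1" '
  { $0 = tolower($0) }   # ignore case by folding the whole record
  $0 ~ pat { print }     # dynamic regexp match on the folded record
  '
}
```
-
-With the input line `Foo', the sketch prints `foo': the matching record
-comes out lowercased, which is the difference noted in the footnote to
-this node.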
-
-* Menu:
-
-* Library Names:: How to best name private global variables in
- library functions.
-* General Functions:: Functions that are of general use.
-* Data File Management:: Functions for managing command-line data
- files.
-* Getopt Function:: A function for processing command-line
- arguments.
-* Passwd Functions:: Functions for getting user information.
-* Group Functions:: Functions for getting group information.
-* Walking Arrays:: A function to walk arrays of arrays.
-
- ---------- Footnotes ----------
-
- (1) The effects are not identical. Output of the transformed record
-will be in all lowercase, while `IGNORECASE' preserves the original
-contents of the input record.
-
-
-File: gawk.info, Node: Library Names, Next: General Functions, Up: Library Functions
-
-13.1 Naming Library Function Global Variables
-=============================================
-
-Due to the way the `awk' language evolved, variables are either
-"global" (usable by the entire program) or "local" (usable just by a
-specific function). There is no intermediate state analogous to
-`static' variables in C.
-
- Library functions often need to have global variables that they can
-use to preserve state information between calls to the function--for
-example, `getopt()''s variable `_opti' (*note Getopt Function::). Such
-variables are called "private", since the only functions that need to
-use them are the ones in the library.
-
- When writing a library function, you should try to choose names for
-your private variables that will not conflict with any variables used by
-either another library function or a user's main program. For example,
-a name like `i' or `j' is not a good choice, because user programs
-often use variable names like these for their own purposes.
-
- The example programs shown in this major node all start the names of
-their private variables with an underscore (`_'). Users generally
-don't use leading underscores in their variable names, so this
-convention immediately decreases the chances that the variable name
-will be accidentally shared with the user's program.
-
- In addition, several of the library functions use a prefix that helps
-indicate what function or set of functions uses the variables--for
-example, `_pw_byname' in the user database routines (*note Passwd
-Functions::). This convention is recommended, since it even further
-decreases the chance of inadvertent conflict among variable names.
-Note that this convention is used equally well for variable names and
-for private function names.(1)
-
- As a final note on variable naming, if a function makes global
-variables available for use by a main program, it is a good convention
-to start that variable's name with a capital letter--for example,
-`getopt()''s `Opterr' and `Optind' variables (*note Getopt Function::).
-The leading capital letter indicates that it is global, while the fact
-that the variable name is not all capital letters indicates that the
-variable is not one of `awk''s built-in variables, such as `FS'.
-
- It is also important that _all_ variables in library functions that
-do not need to save state are, in fact, declared local.(2) If this is
-not done, the variable could accidentally be used in the user's
-program, leading to bugs that are very difficult to track down:
-
- function lib_func(x, y, l1, l2)
- {
- l1 = x + y # l1 is local (an extra parameter)
- some_var = l1 * 2 # some_var should be local
- return some_var # but is not, by oversight
- }
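-
- The following sketch shows the kind of bug this causes. Both function
-names are hypothetical. `sum_bad()' forgets to declare its loop counter
-`i' as local, so it overwrites the caller's `i' and the caller's loop
-stops after one pass; `sum_good()' lists `s' and `i' as extra
-parameters and leaves the caller's variables alone:
-
```shell
# leak_demo: hypothetical demonstration of an accidentally global variable
leak_demo() {
  awk '
  # sum_bad: s and i are (wrongly) global
  function sum_bad(n) {
    s = 0
    for (i = 1; i <= n; i++) s += i
    return s
  }
  # sum_good: the extra parameters make s and i local
  function sum_good(n,    s, i) {
    s = 0
    for (i = 1; i <= n; i++) s += i
    return s
  }
  BEGIN {
    for (i = 1; i <= 3; i++) { sum_bad(2); bad++ }    # i is clobbered
    for (i = 1; i <= 3; i++) { sum_good(2); good++ }  # i is untouched
    print bad, good
  }'
}
```
-
-The sketch prints `1 3': the first loop ran once, the second three
-times. Running such a program with `--dump-variables' would list `s'
-as a global.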
-
- A different convention, common in the Tcl community, is to use a
-single associative array to hold the values needed by the library
-function(s), or "package." This significantly decreases the number of
-actual global names in use. For example, the functions described in
-*note Passwd Functions::, might have used array elements
-`PW_data["inited"]', `PW_data["total"]', `PW_data["count"]', and
-`PW_data["awklib"]', instead of `_pw_inited', `_pw_awklib', `_pw_total',
-and `_pw_count'.
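-
- A minimal sketch of the single-array convention, assuming a
-hypothetical word-counting package named `cnt' whose private state is
-kept entirely in the array `_cnt_data':
-
```shell
# count_words: hypothetical demo of the "package array" convention.
# All private state lives in _cnt_data, so the library adds only one
# global variable name (plus its function names) to the user's program.
count_words() {
  awk '
  function cnt_add(word) {
    _cnt_data["total"]++
    _cnt_data["seen", word]++      # per-word count, SUBSEP-joined key
  }
  function cnt_total() { return _cnt_data["total"] + 0 }
  function cnt_count(word) { return _cnt_data["seen", word] + 0 }

  # user main program
  { for (i = 1; i <= NF; i++) cnt_add($i) }
  END { print cnt_total(), cnt_count("a") }
  '
}
```
-
-Given the two lines `a b' and `a', it prints `3 2' (three words in
-total, two of them `a').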
-
- The conventions presented in this minor node are exactly that:
-conventions. You are not required to write your programs this way--we
-merely recommend that you do so.
-
- ---------- Footnotes ----------
-
- (1) While all the library routines could have been rewritten to use
-this convention, this was not done, in order to show how our own `awk'
-programming style has evolved and to provide some basis for this
-discussion.
-
- (2) `gawk''s `--dump-variables' command-line option is useful for
-verifying this.
-
-
-File: gawk.info, Node: General Functions, Next: Data File Management, Prev: Library Names, Up: Library Functions
-
-13.2 General Programming
-========================
-
-This minor node presents a number of functions that are of general
-programming use.
-
-* Menu:
-
-* Strtonum Function:: A replacement for the built-in
- `strtonum()' function.
-* Assert Function:: A function for assertions in `awk'
- programs.
-* Round Function:: A function for rounding if `sprintf()'
- does not do it correctly.
-* Cliff Random Function:: The Cliff Random Number Generator.
-* Ordinal Functions:: Functions for using characters as numbers and
- vice versa.
-* Join Function:: A function to join an array into a string.
-* Gettimeofday Function:: A function to get formatted times.
-
-
-File: gawk.info, Node: Strtonum Function, Next: Assert Function, Up: General Functions
-
-13.2.1 Converting Strings To Numbers
-------------------------------------
-
-The `strtonum()' function (*note String Functions::) is a `gawk'
-extension. The following function provides an implementation for other
-versions of `awk':
-
- # mystrtonum --- convert string to number
-
- function mystrtonum(str, ret, chars, n, i, k, c)
- {
- if (str ~ /^0[0-7]*$/) {
- # octal
- n = length(str)
- ret = 0
- for (i = 1; i <= n; i++) {
- c = substr(str, i, 1)
- if ((k = index("01234567", c)) > 0)
- k-- # adjust for 1-basing in awk
-
- ret = ret * 8 + k
- }
- } else if (str ~ /^0[xX][[:xdigit:]]+$/) {
- # hexadecimal
- str = substr(str, 3) # lop off leading 0x
- n = length(str)
- ret = 0
- for (i = 1; i <= n; i++) {
- c = substr(str, i, 1)
- c = tolower(c)
- if ((k = index("0123456789", c)) > 0)
- k-- # adjust for 1-basing in awk
- else if ((k = index("abcdef", c)) > 0)
- k += 9
-
- ret = ret * 16 + k
- }
- } else if (str ~ \
- /^[-+]?([0-9]+([.][0-9]*)?([Ee][-+]?[0-9]+)?|[.][0-9]+([Ee][-+]?[0-9]+)?)$/) {
- # decimal number, possibly floating point
- ret = str + 0
- } else
- ret = "NOT-A-NUMBER"
-
- return ret
- }
-
- # BEGIN { # gawk test harness
- # a[1] = "25"
- # a[2] = ".31"
- # a[3] = "0123"
- # a[4] = "0xdeadBEEF"
- # a[5] = "123.45"
- # a[6] = "1.e3"
- # a[7] = "1.32"
- # a[8] = "1.32E2"
- #
- # for (i = 1; i in a; i++)
- # print a[i], strtonum(a[i]), mystrtonum(a[i])
- # }
-
- The function first looks for C-style octal numbers (base 8). If the
-input string matches a regular expression describing octal numbers,
-then `mystrtonum()' loops through each character in the string. It
-sets `k' to the index in `"01234567"' of the current octal digit.
-Since `index()' returns a one-based position, the `k--' adjusts `k' so
-it can be used in computing the return value.
-
- Similar logic applies to the code that checks for and converts a
-hexadecimal value, which starts with `0x' or `0X'. The use of
-`tolower()' simplifies the computation for finding the correct numeric
-value for each hexadecimal digit.
-
- Finally, if the string matches the (rather complicated) regexp for a
-regular decimal integer or floating-point number, the computation `ret
-= str + 0' lets `awk' convert the value to a number.
-
- A commented-out test program is included, so that the function can
-be tested with `gawk' and the results compared to the built-in
-`strtonum()' function.
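-
- The digit-by-digit conversion can also be tried directly from the
-shell with the following sketch. It is not `mystrtonum()' itself: the
-wrapper name `to_dec' is hypothetical, it handles only the octal and
-hexadecimal cases, and it spells out `[0-9a-fA-F]' instead of
-`[[:xdigit:]]' for `awk' versions that lack character classes:
-
```shell
# to_dec: hypothetical wrapper around the octal and hexadecimal
# branches of mystrtonum(); any other input yields NOT-A-NUMBER.
to_dec() {
  awk -v s="$1" 'BEGIN {
    if (s ~ /^0[0-7]*$/) {                      # C-style octal (and "0")
      base = 8; digits = s
    } else if (s ~ /^0[xX][0-9a-fA-F]+$/) {     # hexadecimal
      base = 16; digits = tolower(substr(s, 3)) # drop the leading 0x
    } else {
      print "NOT-A-NUMBER"
      exit
    }
    ret = 0
    for (i = 1; i <= length(digits); i++) {
      # index() is one-based, so subtract 1 to get the digit value
      k = index("0123456789abcdef", substr(digits, i, 1)) - 1
      ret = ret * base + k
    }
    printf "%.0f\n", ret   # %.0f avoids integer overflow in %d
  }'
}
```
-
-For example, `to_dec 0123' prints 83 and `to_dec 0xdeadBEEF' prints
-3735928559, while the trailing `$' in the hexadecimal regexp makes
-`to_dec 0x1g' print `NOT-A-NUMBER' instead of a partial result.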
-
-
-File: gawk.info, Node: Assert Function, Next: Round Function, Prev: Strtonum Function, Up: General Functions
-
-13.2.2 Assertions
------------------
-
-When writing large programs, it is often useful to know that a
-condition or set of conditions is true. Before proceeding with a
-particular computation, you make a statement about what you believe to
-be the case. Such a statement is known as an "assertion". The C
-language provides an `<assert.h>' header file and corresponding
-`assert()' macro that the programmer can use to make assertions. If an
-assertion fails, the `assert()' macro arranges to print a diagnostic
-message describing the condition that should have been true but was
-not, and then it kills the program. In C, using `assert()' looks like this:
-
- #include <assert.h>
-
- int myfunc(int a, double b)
- {
- assert(a <= 5 && b >= 17.1);
- ...
- }
-
- If the assertion fails, the program prints a message similar to this:
-
- prog.c:5: assertion failed: a <= 5 && b >= 17.1
-
- The C language makes it possible to turn the condition into a string
-for use in printing the diagnostic message. This is not possible in
-`awk', so this `assert()' function also requires a string version of
-the condition that is being tested. Following is the function:
-
- # assert --- assert that a condition is true. Otherwise exit.
-
- function assert(condition, string)
- {
- if (! condition) {
- printf("%s:%d: assertion failed: %s\n",
- FILENAME, FNR, string) > "/dev/stderr"
- _assert_exit = 1
- exit 1
- }
- }
-
- END {
- if (_assert_exit)
- exit 1
- }
-
- The `assert()' function tests the `condition' parameter. If it is
-false, it prints a message to standard error, using the `string'
-parameter to describe the failed condition. It then sets the variable
-`_assert_exit' to one and executes the `exit' statement. The `exit'
-statement jumps to the `END' rule. If the `END' rule finds
-`_assert_exit' to be true, it exits immediately.
-
- The purpose of the test in the `END' rule is to keep any other `END'
-rules from running. When an assertion fails, the program should exit
-immediately. If no assertions fail, then `_assert_exit' is still false
-when the `END' rule is run normally, and the rest of the program's
-`END' rules execute. For all of this to work correctly, `assert.awk'
-must be the first source file read by `awk'. The function can be used
-in a program in the following way:
-
- function myfunc(a, b)
- {
- assert(a <= 5 && b >= 17.1, "a <= 5 && b >= 17.1")
- ...
- }
-
-If the assertion fails, you see a message similar to the following:
-
- mydata:1357: assertion failed: a <= 5 && b >= 17.1
-
- There is a small problem with this version of `assert()'. An `END'
-rule is automatically added to the program calling `assert()'.
-Normally, if a program consists of just a `BEGIN' rule, the input files
-and/or standard input are not read. However, now that the program has
-an `END' rule, `awk' attempts to read the input data files or standard
-input (*note Using BEGIN/END::), most likely causing the program to
-hang as it waits for input.
-
- There is a simple workaround to this: make sure that such a `BEGIN'
-rule always ends with an `exit' statement.
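-
- To watch the interplay of the two `END' rules, the following sketch
-places `assert()' and its `END' rule ahead of a user program that has
-its own `END' rule. The wrapper name `assert_demo' and the assertion
-`$1 <= 5' are hypothetical. Because the library's `END' rule comes
-first, a failed assertion suppresses the user's `END' output and `awk'
-exits with status 1:
-
```shell
# assert_demo: hypothetical driver that checks each argument against
# the assertion "$1 <= 5" using the assert() function shown above.
assert_demo() {
  printf '%s\n' "$@" | awk '
  # --- assert.awk, which must come first ---
  function assert(condition, string) {
    if (! condition) {
      printf("%s:%d: assertion failed: %s\n",
             FILENAME, FNR, string) > "/dev/stderr"
      _assert_exit = 1
      exit 1
    }
  }
  END {
    if (_assert_exit)
      exit 1
  }

  # --- user program ---
  { assert($1 <= 5, "$1 <= 5"); print "ok", $1 }
  END { print "done" }
  '
}
```
-
-`assert_demo 3 4' prints `ok 3', `ok 4', and `done'; `assert_demo 3 9'
-prints only `ok 3', writes the diagnostic to standard error, and
-returns a nonzero exit status.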
-
-
-File: gawk.info, Node: Round Function, Next: Cliff Random Function, Prev: Assert Function, Up: General Functions
-
-13.2.3 Rounding Numbers
------------------------
-
-The way `printf' and `sprintf()' (*note Printf::) perform rounding
-often depends upon the system's C `sprintf()' subroutine. On many
-machines, `sprintf()' rounding is "unbiased," which means it doesn't
-always round a trailing `.5' up, contrary to naive expectations. In
-unbiased rounding, `.5' rounds to even, rather than always up, so 1.5
-rounds to 2 but 4.5 rounds to 4. This means that if you are using a
-format that does rounding (e.g., `"%.0f"'), you should check what your
-system does. The following function does traditional rounding; it
-might be useful if your `awk''s `printf' does unbiased rounding:
-
- # round.awk --- do normal rounding
-
- function round(x, ival, aval, fraction)
- {
- ival = int(x) # integer part, int() truncates
-
- # see if there is a fractional part
- if (ival == x) # no fraction
- return ival # ensure no decimals
-
- if (x < 0) {
- aval = -x # absolute value
- ival = int(aval)
- fraction = aval - ival
- if (fraction >= .5)
- return int(x) - 1 # -2.5 --> -3
- else
- return int(x) # -2.3 --> -2
- } else {
- fraction = x - ival
- if (fraction >= .5)
- return ival + 1
- else
- return ival
- }
- }
-
- # test harness
- { print $0, round($0) }
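-
- To try `round()' on a few values, the following sketch wraps it in a
-shell function; the name `round_demo' is hypothetical. Unlike
-`printf "%.0f"', whose result for halfway values depends on the C
-library, `round()' always rounds halves away from zero:
-
```shell
# round_demo: hypothetical wrapper that rounds each argument with round()
round_demo() {
  printf '%s\n' "$@" | awk '
  function round(x,    ival, aval, fraction) {
    ival = int(x)                 # integer part, int() truncates
    if (ival == x)                # no fractional part
      return ival
    if (x < 0) {
      aval = -x                   # absolute value
      ival = int(aval)
      fraction = aval - ival
      if (fraction >= .5)
        return int(x) - 1         # -2.5 --> -3
      else
        return int(x)             # -2.3 --> -2
    } else {
      fraction = x - ival
      if (fraction >= .5)
        return ival + 1           # 4.5 --> 5
      else
        return ival               # 2.3 --> 2
    }
  }
  { print round($0) }
  '
}
```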
-
-
-File: gawk.info, Node: Cliff Random Function, Next: Ordinal Functions, Prev: Round Function, Up: General Functions
-
-13.2.4 The Cliff Random Number Generator
-----------------------------------------
-
-The Cliff random number generator
-(http://mathworld.wolfram.com/CliffRandomNumberGenerator.html) is a
-very simple random number generator that "passes the noise sphere test
-for randomness by showing no structure." It is easily programmed, in
-less than 10 lines of `awk' code:
-
- # cliff_rand.awk --- generate Cliff random numbers
-
- BEGIN { _cliff_seed = 0.1 }
-
- function cliff_rand()
- {
- _cliff_seed = (100 * log(_cliff_seed)) % 1
- if (_cliff_seed < 0)
- _cliff_seed = - _cliff_seed
- return _cliff_seed
- }
-
- This algorithm requires an initial "seed" of 0.1. Each new value
-uses the current seed as input for the calculation. If the built-in
-`rand()' function (*note Numeric Functions::) isn't random enough, you
-might try using this function instead.
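-
- Because the seed is always initialized to 0.1, every run produces the
-same sequence of values, each in the range from 0 up to (but not
-including) 1. The following sketch prints the first few values so you
-can see this; the wrapper name `cliff_demo' is hypothetical:
-
```shell
# cliff_demo: hypothetical wrapper that prints the first N Cliff numbers
cliff_demo() {
  awk -v n="$1" '
  BEGIN { _cliff_seed = 0.1 }

  function cliff_rand()
  {
    _cliff_seed = (100 * log(_cliff_seed)) % 1
    if (_cliff_seed < 0)
      _cliff_seed = - _cliff_seed
    return _cliff_seed
  }

  BEGIN { for (i = 1; i <= n; i++) printf "%.4f\n", cliff_rand() }
  '
}
```
-
-The first value is always 0.2585, since 100 * log(0.1) is about
--230.2585.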
-
-
-File: gawk.info, Node: Ordinal Functions, Next: Join Function, Prev: Cliff Random Function, Up: General Functions
-
-13.2.5 Translating Between Characters and Numbers
--------------------------------------------------
-
-One commercial implementation of `awk' supplies a built-in function,
-`ord()', which takes a character and returns the numeric value for that
-character in the machine's character set. If the string passed to
-`ord()' has more than one character, only the first one is used.
-
- The inverse of this function is `chr()' (from the function of the
-same name in Pascal), which takes a number and returns the
-corresponding character. Both functions are written very nicely in
-`awk'; there is no real reason to build them into the `awk' interpreter:
-
- # ord.awk --- do ord and chr
-
- # Global identifiers:
- # _ord_: numerical values indexed by characters
- # _ord_init: function to initialize _ord_
-
- BEGIN { _ord_init() }
-
- function _ord_init( low, high, i, t)
- {
- low = sprintf("%c", 7) # BEL is ascii 7
- if (low == "\a") { # regular ascii
- low = 0
- high = 127
- } else if (sprintf("%c", 128 + 7) == "\a") {
- # ascii, mark parity
- low = 128
- high = 255
- } else { # ebcdic(!)
- low = 0
- high = 255
- }
-
- for (i = low; i <= high; i++) {
- t = sprintf("%c", i)
- _ord_[t] = i
- }
- }
-
- Some explanation of the numbers used by `_ord_init()' is worthwhile. The
-most prominent character set in use today is ASCII.(1) Although an
-8-bit byte can hold 256 distinct values (from 0 to 255), ASCII only
-defines characters that use the values from 0 to 127.(2) In the now
-distant past, at least one minicomputer manufacturer used ASCII, but
-with mark parity, meaning that the leftmost bit in the byte is always
-1. This means that on those systems, characters have numeric values
-from 128 to 255. Finally, large mainframe systems use the EBCDIC
-character set, which uses all 256 values. While there are other
-character sets in use on some older systems, they are not really worth
-worrying about:
-
- function ord(str, c)
- {
- # only first character is of interest
- c = substr(str, 1, 1)
- return _ord_[c]
- }
-
- function chr(c)
- {
- # force c to be numeric by adding 0
- return sprintf("%c", c + 0)
- }
-
- #### test code ####
- # BEGIN \
- # {
- # for (;;) {
- # printf("enter a character: ")
- # if (getline var <= 0)
- # break
- # printf("ord(%s) = %d\n", var, ord(var))
- # }
- # }
-
- An obvious improvement to these functions is to move the code for the
-`_ord_init' function into the body of the `BEGIN' rule. It was written
-this way initially for ease of development. There is a "test program"
-in a `BEGIN' rule, to test the function. It is commented out for
-production use.
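-
- A minimal sketch of that improvement, assuming a plain ASCII
-character set (so the table covers codes 0 through 127 and the
-mark-parity and EBCDIC cases are dropped). The shell wrappers `ord' and
-`chr' are hypothetical conveniences for trying the functions:
-
```shell
# ord: print the numeric code of the first character of $1 (ASCII only)
ord() {
  awk -v str="$1" '
  BEGIN {
    # table built directly in BEGIN instead of in _ord_init()
    for (i = 0; i <= 127; i++)
      _ord_[sprintf("%c", i)] = i
    print _ord_[substr(str, 1, 1)]   # only the first character matters
  }'
}

# chr: print the character whose numeric code is $1
chr() {
  awk -v c="$1" 'BEGIN { printf "%c\n", c + 0 }'   # + 0 forces a number
}
```
-
-For example, `ord A' prints 65, `ord hello' prints 104, and `chr 97'
-prints `a'.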
-
- ---------- Footnotes ----------
-
- (1) This is changing; many systems use Unicode, a very large
-character set that includes ASCII as a subset. On systems with full
-Unicode support, a character can occupy up to 32 bits, making simple
-tests such as used here prohibitively expensive.
-
- (2) ASCII has been extended in many countries to use the values from
-128 to 255 for country-specific characters. If your system uses these
-extensions, you can simplify `_ord_init' to loop from 0 to 255.
-
-
-File: gawk.info, Node: Join Function, Next: Gettimeofday Function, Prev: Ordinal Functions, Up: General Functions
-
-13.2.6 Merging an Array into a String
--------------------------------------
-
-When doing string processing, it is often useful to be able to join all
-the strings in an array into one long string. The following function,
-`join()', accomplishes this task. It is used later in several of the
-application programs (*note Sample Programs::).
-
- Good function design is important; this function needs to be general
-but it should also have a reasonable default behavior. It is called
-with an array as well as the beginning and ending indices of the
-elements in the array to be merged. This assumes that the array
-indices are numeric--a reasonable assumption since the array was likely
-created with `split()' (*note String Functions::):
-
- # join.awk --- join an array into a string
-
- function join(array, start, end, sep, result, i)
- {
- if (sep == "")
- sep = " "
- else if (sep == SUBSEP) # magic value
- sep = ""
- result = array[start]
- for (i = start + 1; i <= end; i++)
- result = result sep array[i]
- return result
- }
-
- An optional additional argument is the separator to use when joining
-the strings back together. If the caller supplies a nonempty value,
-`join()' uses it; if it is not supplied, it has a null value. In this
-case, `join()' uses a single space as a default separator for the
-strings. If the value is equal to `SUBSEP', then `join()' joins the
-strings with no separator between them. `SUBSEP' serves as a "magic"
-value to indicate that there should be no separation between the
-component strings.(1)
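-
- The following sketch splits a comma-separated list with `split()' and
-rejoins it three ways, showing the default separator, an explicit one,
-and the `SUBSEP' magic value. The wrapper name `join_demo' is
-hypothetical:
-
```shell
# join_demo: hypothetical wrapper; $1 is a comma-separated list
join_demo() {
  awk -v list="$1" '
  function join(array, start, end, sep,    result, i)
  {
    if (sep == "")
      sep = " "
    else if (sep == SUBSEP)          # magic value: no separator
      sep = ""
    result = array[start]
    for (i = start + 1; i <= end; i++)
      result = result sep array[i]
    return result
  }
  BEGIN {
    n = split(list, parts, ",")
    print join(parts, 1, n)          # default: a single space
    print join(parts, 1, n, "-")     # explicit separator
    print join(parts, 1, n, SUBSEP)  # no separator at all
  }'
}
```
-
-`join_demo a,b,c' prints `a b c', `a-b-c', and `abc'.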
-
- ---------- Footnotes ----------
-
- (1) It would be nice if `awk' had an assignment operator for
-concatenation. The lack of an explicit operator for concatenation
-makes string operations more difficult than they really need to be.
-
-
-File: gawk.info, Node: Gettimeofday Function, Prev: Join Function, Up: General Functions
-
-13.2.7 Managing the Time of Day
--------------------------------
-
-The `systime()' and `strftime()' functions described in *note Time
-Functions::, provide the minimum functionality necessary for dealing
-with the time of day in human readable form. While `strftime()' is
-extensive, the control formats are not necessarily easy to remember or
-intuitively obvious when reading a program.
-
- The following function, `gettimeofday()', populates a user-supplied
-array with preformatted time information. It returns a string with the
-current time formatted in the same way as the `date' utility:
-
- # gettimeofday.awk --- get the time of day in a usable format
-
- # Returns a string in the format of output of date(1)
- # Populates the array argument time with individual values:
- # time["second"] -- seconds (0 - 59)
- # time["minute"] -- minutes (0 - 59)
- # time["hour"] -- hours (0 - 23)
- # time["althour"] -- hours (1 - 12)
- # time["monthday"] -- day of month (1 - 31)
- # time["month"] -- month of year (1 - 12)
- # time["monthname"] -- name of the month
- # time["shortmonth"] -- short name of the month
- # time["year"] -- year modulo 100 (0 - 99)
- # time["fullyear"] -- full year
- # time["weekday"] -- day of week (Sunday = 0)
- # time["altweekday"] -- day of week (Monday = 1)
- # time["dayname"] -- name of weekday
- # time["shortdayname"] -- short name of weekday
- # time["yearday"] -- day of year (1 - 366)
- # time["timezone"] -- abbreviation of timezone name
- # time["ampm"] -- AM or PM designation
- # time["weeknum"] -- week number, Sunday first day
- # time["altweeknum"] -- week number, Monday first day
-
- function gettimeofday(time, ret, now, i)
- {
- # get time once, avoids unnecessary system calls
- now = systime()
-
- # return date(1)-style output
- ret = strftime("%a %b %e %H:%M:%S %Z %Y", now)
-
- # clear out target array
- delete time
-
- # fill in values, force numeric values to be
- # numeric by adding 0
- time["second"] = strftime("%S", now) + 0
- time["minute"] = strftime("%M", now) + 0
- time["hour"] = strftime("%H", now) + 0
- time["althour"] = strftime("%I", now) + 0
- time["monthday"] = strftime("%d", now) + 0
- time["month"] = strftime("%m", now) + 0
- time["monthname"] = strftime("%B", now)
- time["shortmonth"] = strftime("%b", now)
- time["year"] = strftime("%y", now) + 0
- time["fullyear"] = strftime("%Y", now) + 0
- time["weekday"] = strftime("%w", now) + 0
- time["altweekday"] = strftime("%u", now) + 0
- time["dayname"] = strftime("%A", now)
- time["shortdayname"] = strftime("%a", now)
- time["yearday"] = strftime("%j", now) + 0
- time["timezone"] = strftime("%Z", now)
- time["ampm"] = strftime("%p", now)
- time["weeknum"] = strftime("%U", now) + 0
- time["altweeknum"] = strftime("%W", now) + 0
-
- return ret
- }
-
- The string indices are easier to use and read than the various
-formats required by `strftime()'. The `alarm' program presented in
-*note Alarm Program::, uses this function. A more general design for
-the `gettimeofday()' function would have allowed the user to supply an
-optional timestamp value to use instead of the current time.
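-
- A minimal sketch of that more general design, assuming `gawk' (for
-`systime()' and `strftime()'). The second parameter `when' and the
-wrapper name `stamp_info' are hypothetical, and only three array
-elements are filled in to keep the example short:
-
```shell
# stamp_info: hypothetical wrapper; $1 is a timestamp in seconds since
# the epoch, or empty for "now". TZ=UTC and LC_ALL=C make the output
# reproducible.
stamp_info() {
  TZ=UTC LC_ALL=C gawk -v ts="$1" '
  function gettimeofday(time, when,    ret)
  {
    if (when == "")
      when = systime()        # no timestamp supplied: use current time
    ret = strftime("%a %b %e %H:%M:%S %Z %Y", when)
    delete time
    time["hour"] = strftime("%H", when) + 0
    time["yearday"] = strftime("%j", when) + 0
    time["altweekday"] = strftime("%u", when) + 0
    return ret
  }
  BEGIN {
    print gettimeofday(t, ts)
    print t["hour"], t["yearday"], t["altweekday"]
  }'
}
```
-
-`stamp_info 0' reports the epoch itself: `Thu Jan  1 00:00:00 UTC 1970',
-followed by `0 1 4' (hour 0, day 1 of the year, Thursday).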
-
-
-File: gawk.info, Node: Data File Management, Next: Getopt Function, Prev: General Functions, Up: Library Functions
-
-13.3 Data File Management
-=========================
-
-This minor node presents functions that are useful for managing
-command-line data files.
-
-* Menu:
-
-* Filetrans Function:: A function for handling data file transitions.
-* Rewind Function:: A function for rereading the current file.
-* File Checking:: Checking that data files are readable.
-* Empty Files:: Checking for zero-length files.
-* Ignoring Assigns:: Treating assignments as file names.
-
-
-File: gawk.info, Node: Filetrans Function, Next: Rewind Function, Up: Data File Management
-
-13.3.1 Noting Data File Boundaries
-----------------------------------
-
-The `BEGIN' and `END' rules are each executed exactly once at the
-beginning and end of your `awk' program, respectively (*note
-BEGIN/END::). We (the `gawk' authors) once had a user who mistakenly
-thought that the `BEGIN' rule is executed at the beginning of each data
-file and the `END' rule is executed at the end of each data file.
-
- When informed that this was not the case, the user requested that we
-add new special patterns to `gawk', named `BEGIN_FILE' and `END_FILE',
-that would have the desired behavior. He even supplied us the code to
-do so.
-
- Adding these special patterns to `gawk' wasn't necessary; the job
-can be done cleanly in `awk' itself, as illustrated by the following
-library program. It arranges to call two user-supplied functions,
-`beginfile()' and `endfile()', at the beginning and end of each data
-file. Besides solving the problem in only nine(!) lines of code, it
-does so _portably_; this works with any implementation of `awk':
-
- # transfile.awk
- #
- # Give the user a hook for filename transitions
- #
- # The user must supply functions beginfile() and endfile()
- # that each take the name of the file being started or
- # finished, respectively.
-
- FILENAME != _oldfilename \
- {
- if (_oldfilename != "")
- endfile(_oldfilename)
- _oldfilename = FILENAME
- beginfile(FILENAME)
- }
-
- END { endfile(FILENAME) }
-
- This file must be loaded before the user's "main" program, so that
-the rule it supplies is executed first.
-
- This rule relies on `awk''s `FILENAME' variable that automatically
-changes for each new data file. The current file name is saved in a
-private variable, `_oldfilename'. If `FILENAME' does not equal
-`_oldfilename', then a new data file is being processed and it is
-necessary to call `endfile()' for the old file. Because `endfile()'
-should only be called if a file has been processed, the program first
-checks to make sure that `_oldfilename' is not the null string. The
-program then assigns the current file name to `_oldfilename' and calls
-`beginfile()' for the file. Because, like all `awk' variables,
-`_oldfilename' is initialized to the null string, this rule executes
-correctly even for the first data file.
-
- The program also supplies an `END' rule to do the final processing
-for the last file. Because this `END' rule comes before any `END' rules
-supplied in the "main" program, `endfile()' is called first. Once
-again the value of multiple `BEGIN' and `END' rules should be clear.
-
- If the same data file occurs twice in a row on the command line, then
-`endfile()' and `beginfile()' are not executed at the end of the first
-pass and at the beginning of the second pass. The following version
-solves the problem:
-
- # ftrans.awk --- handle data file transitions
- #
- # user supplies beginfile() and endfile() functions
-
- FNR == 1 {
- if (_filename_ != "")
- endfile(_filename_)
- _filename_ = FILENAME
- beginfile(FILENAME)
- }
-
- END { endfile(_filename_) }
-
- *note Wc Program::, shows how this library function can be used and
-how it simplifies writing the main program.
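-
- As a small usage sketch, the following shell function places the
-`ftrans.awk' rule ahead of hypothetical `beginfile()' and `endfile()'
-functions that count the records in each file. Naming the same file
-twice in a row shows that each pass is reported separately:
-
```shell
# ftrans_demo: prints "FILENAME RECORD-COUNT" for each pass over a file
ftrans_demo() {
  awk '
  # --- ftrans.awk, which must come first ---
  FNR == 1 {
    if (_filename_ != "")
      endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
  }
  END { endfile(_filename_) }

  # --- hypothetical user-supplied hooks and main rule ---
  function beginfile(file) { _nrec = 0 }
  function endfile(file)   { print file, _nrec }
  { _nrec++ }
  ' "$@"
}
```
-
-If `x' holds two lines and `y' one, `ftrans_demo x x y' prints `x 2',
-`x 2', and `y 1'.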
-
-Advanced Notes: So Why Does `gawk' have `BEGINFILE' and `ENDFILE'?
-------------------------------------------------------------------
-
-You are probably wondering, if `beginfile()' and `endfile()' functions
-can do the job, why does `gawk' have `BEGINFILE' and `ENDFILE' patterns
-(*note BEGINFILE/ENDFILE::)?
-
- Good question. Normally, if `awk' cannot open a file, this causes
-an immediate fatal error. In this case, there is no way for a
-user-defined function to deal with the problem, since the mechanism for
-calling it relies on the file being open and at the first record. Thus,
-the main reason for `BEGINFILE' is to give you a "hook" to catch files
-that cannot be processed. `ENDFILE' exists for symmetry, and because
-it provides an easy way to do per-file cleanup processing.
-
-
-File: gawk.info, Node: Rewind Function, Next: File Checking, Prev: Filetrans Function, Up: Data File Management
-
-13.3.2 Rereading the Current File
----------------------------------
-
-Another request for a new built-in function was for a `rewind()'
-function that would make it possible to reread the current file. The
-requesting user didn't want to have to use `getline' (*note Getline::)
-inside a loop.
-
- However, as long as you are not in the `END' rule, it is quite easy
-to arrange to immediately close the current input file and then start
-over with it from the top. For lack of a better name, we'll call it
-`rewind()':
-
- # rewind.awk --- rewind the current file and start over
-
- function rewind( i)
- {
- # shift remaining arguments up
- for (i = ARGC; i > ARGIND; i--)
- ARGV[i] = ARGV[i-1]
-
- # make sure gawk knows to keep going
- ARGC++
-
- # make current file next to get done
- ARGV[ARGIND+1] = FILENAME
-
- # do it
- nextfile
- }
-
- This code relies on the `ARGIND' variable (*note Auto-set::), which
-is specific to `gawk'. If you are not using `gawk', you can use ideas
-presented in *note Filetrans Function::, to either update `ARGIND' on
-your own or modify this code as appropriate.
-
- The `rewind()' function also relies on the `nextfile' keyword (*note
-Nextfile Statement::).
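-
- The following sketch, which needs `gawk' for `ARGIND', shows
-`rewind()' in action. The wrapper name and the "read one record, then
-start over" logic are hypothetical illustrations:
-
```shell
# rewind_demo: hypothetical driver; after printing the first record of
# the first pass, it rewinds once and then prints the whole file.
rewind_demo() {
  gawk '
  function rewind(    i)
  {
    # shift remaining arguments up
    for (i = ARGC; i > ARGIND; i--)
      ARGV[i] = ARGV[i-1]
    ARGC++                       # make sure gawk keeps going
    ARGV[ARGIND+1] = FILENAME    # current file is next to be read
    nextfile
  }
  { print FNR ": " $0 }
  FNR == 1 && ! _rewound { _rewound = 1; rewind() }
  ' "$@"
}
```
-
-For a file containing `a', `b', and `c', the output is `1: a', then
-`1: a', `2: b', and `3: c' from the second pass.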
-
-
-File: gawk.info, Node: File Checking, Next: Empty Files, Prev: Rewind Function, Up: Data File Management
-
-13.3.3 Checking for Readable Data Files
----------------------------------------
-
-Normally, if you give `awk' a data file that isn't readable, it stops
-with a fatal error. There are times when you might want to just ignore
-such files and keep going. You can do this by prepending the following
-program to your `awk' program:
-
- # readable.awk --- library file to skip over unreadable files
-
- BEGIN {
- for (i = 1; i < ARGC; i++) {
- if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
- || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
- continue # assignment or standard input
- else if ((getline junk < ARGV[i]) < 0) # unreadable
- delete ARGV[i]
- else
- close(ARGV[i])
- }
- }
-
- This works because `getline' returns -1 for a file it cannot open,
-instead of causing a fatal error. Removing the element from `ARGV' with
-`delete' skips the file (since it's no longer in the list). See also
-*note ARGC and ARGV::.
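-
- For comparison, here is a C sketch of the same filter. It is a
-hypothetical helper, not part of any library. It uses the success of
-`fopen()' as its readability test, which is only an approximation (on
-some systems a directory can be opened for reading). Unlike `delete
-ARGV[i]', which leaves a hole that `awk' skips, the sketch compacts the
-array.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* True if s looks like NAME=..., with NAME matching
 * [[:alpha:]_][[:alnum:]_]* as in readable.awk. */
static int is_assignment(const char *s)
{
    if (!(isalpha((unsigned char) *s) || *s == '_'))
        return 0;
    for (s++; isalnum((unsigned char) *s) || *s == '_'; s++)
        continue;
    return *s == '=';
}

/* Hypothetical helper: remove unreadable file names from
 * args[1..argc-1], compacting the array in place.  Assignments, "-",
 * and "/dev/stdin" are always kept.  Returns the new count. */
int keep_readable(const char *args[], int argc)
{
    int i, out = 1;
    FILE *fp;

    for (i = 1; i < argc; i++) {
        const char *a = args[i];

        if (is_assignment(a) || strcmp(a, "-") == 0
            || strcmp(a, "/dev/stdin") == 0)
            args[out++] = a;            /* assignment or standard input */
        else if ((fp = fopen(a, "r")) != NULL) {
            fclose(fp);                 /* readable: keep it */
            args[out++] = a;
        }
        /* otherwise unreadable: leave it out */
    }
    return out;
}
```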
-
-
-File: gawk.info, Node: Empty Files, Next: Ignoring Assigns, Prev: File Checking, Up: Data File Management
-
-13.3.4 Checking for Zero-length Files
--------------------------------------
-
-All known `awk' implementations silently skip over zero-length files.
-This is a by-product of `awk''s implicit
-read-a-record-and-match-against-the-rules loop: when `awk' tries to
-read a record from an empty file, it immediately receives an end of
-file indication, closes the file, and proceeds on to the next
-command-line data file, _without_ executing any user-level `awk'
-program code.
-
- Using `gawk''s `ARGIND' variable (*note Built-in Variables::), it is
-possible to detect when an empty data file has been skipped. Similar
-to the library file presented in *note Filetrans Function::, the
-following library file calls a function named `zerofile()' that the
-user must provide. The arguments passed are the file name and the
-position in `ARGV' where it was found:
-
- # zerofile.awk --- library file to process empty input files
-
- BEGIN { Argind = 0 }
-
- ARGIND > Argind + 1 {
- for (Argind++; Argind < ARGIND; Argind++)
- zerofile(ARGV[Argind], Argind)
- }
-
- ARGIND != Argind { Argind = ARGIND }
-
- END {
- if (ARGIND > Argind)
- for (Argind++; Argind <= ARGIND; Argind++)
- zerofile(ARGV[Argind], Argind)
- }
-
- The user-level variable `Argind' allows the `awk' program to track
-its progress through `ARGV'. Whenever the program detects that
-`ARGIND' is greater than `Argind + 1', it means that one or more empty
-files were skipped. The action then calls `zerofile()' for each such
-file, incrementing `Argind' along the way.
-
- The `Argind != ARGIND' rule simply keeps `Argind' up to date in the
-normal case.
-
- Finally, the `END' rule catches the case of any empty files at the
-end of the command-line arguments. Note that the test in the condition
-of the `for' loop uses the `<=' operator, not `<'.
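-
- The bookkeeping with `Argind' may be easier to follow in a small C
-model. In this hypothetical sketch, `seen[]' lists the `ARGIND' value
-for each record read, in order, and `last_argind' is the value of
-`ARGIND' in the `END' rule, as `zerofile.awk' uses it. The function
-reports the indexes of files for which no record was ever seen.

```c
/* Hypothetical model of zerofile.awk.  seen[0..nseen-1] holds the
 * ARGIND value at which each record was read; last_argind is ARGIND
 * in the END rule.  Indexes of files that produced no records are
 * stored in skipped[] (at most cap of them); the count is returned. */
int find_skipped(const int seen[], int nseen, int last_argind,
                 int skipped[], int cap)
{
    int argind_seen = 0;   /* plays the role of Argind */
    int k = 0, i;

    for (i = 0; i < nseen; i++) {
        /* rule: ARGIND > Argind + 1 */
        if (seen[i] > argind_seen + 1) {
            for (argind_seen++; argind_seen < seen[i]; argind_seen++)
                if (k < cap)
                    skipped[k++] = argind_seen;
        }
        /* rule: ARGIND != Argind */
        if (seen[i] != argind_seen)
            argind_seen = seen[i];
    }

    /* END rule: empty files at the end of the arguments (note <=) */
    if (last_argind > argind_seen) {
        for (argind_seen++; argind_seen <= last_argind; argind_seen++)
            if (k < cap)
                skipped[k++] = argind_seen;
    }
    return k;
}
```

- With records seen only from files 1 and 3, and `last_argind' equal to
-5, the function reports files 2, 4, and 5 as empty.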
-
- As an exercise, you might consider whether this same problem can be
-solved without relying on `gawk''s `ARGIND' variable.
-
- As a second exercise, revise this code to handle the case where an
-intervening value in `ARGV' is a variable assignment.
-
-
-File: gawk.info, Node: Ignoring Assigns, Prev: Empty Files, Up: Data File Management
-
-13.3.5 Treating Assignments as File Names
------------------------------------------
-
-Occasionally, you might not want `awk' to process command-line variable
-assignments (*note Assignment Options::). In particular, if you have a
-file name that contains an `=' character, `awk' treats the file name as
-an assignment and does not read it as a data file.
-
- Some users have suggested an additional command-line option for
-`gawk' to disable command-line assignments. However, some simple
-programming with a library file does the trick:
-
- # noassign.awk --- library file to avoid the need for a
- # special option that disables command-line assignments
-
- function disable_assigns(argc, argv, i)
- {
- for (i = 1; i < argc; i++)
- if (argv[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/)
- argv[i] = ("./" argv[i])
- }
-
- BEGIN {
- if (No_command_assign)
- disable_assigns(ARGC, ARGV)
- }
-
- You then run your program this way:
-
- awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk *
-
- The function works by looping through the arguments. It prepends
-`./' to any argument that matches the form of a variable assignment,
-turning that argument into a file name.
-
- The use of `No_command_assign' allows you to disable command-line
-assignments at invocation time, by giving the variable a true value.
-When not set, it is initially zero (i.e., false), so the command-line
-arguments are left alone.
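-
- The same transformation in C comes down to recognizing the `NAME='
-prefix by hand. This sketch assumes that `NAME' follows the pattern
-used in `noassign.awk' (a letter or underscore, followed by letters,
-digits, or underscores); the helper names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* True if s looks like NAME=..., with NAME matching
 * [[:alpha:]_][[:alnum:]_]* as in noassign.awk. */
static int is_assignment(const char *s)
{
    if (!(isalpha((unsigned char) *s) || *s == '_'))
        return 0;
    for (s++; isalnum((unsigned char) *s) || *s == '_'; s++)
        continue;
    return *s == '=';
}

/* Hypothetical helper: copy arg into buf, prefixing "./" if it looks
 * like an assignment, so that it is treated as a file name instead.
 * Returns 1 if the prefix was added, 0 if not, and -1 if buf (of the
 * given size) is too small. */
int disable_assign(const char *arg, char *buf, size_t size)
{
    const char *prefix = is_assignment(arg) ? "./" : "";

    if (strlen(prefix) + strlen(arg) + 1 > size)
        return -1;
    strcpy(buf, prefix);
    strcat(buf, arg);
    return prefix[0] != '\0';
}
```

- For example, `count=3' becomes `./count=3', while `data.txt' and
-`3x=1' are left unchanged, because neither matches the assignment
-pattern.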
-
-
-File: gawk.info, Node: Getopt Function, Next: Passwd Functions, Prev: Data File Management, Up: Library Functions
-
-13.4 Processing Command-Line Options
-====================================
-
-Most utilities on POSIX-compatible systems take options on the command
-line that can be used to change the way a program behaves. `awk' is an
-example of such a program (*note Options::). Often, options take
-"arguments", that is, data that the program needs to correctly obey the
-command-line option. For example, `awk''s `-F' option requires a
-string to use as the field separator. The first occurrence on the
-command line of either `--' or a string that does not begin with `-'
-ends the options.
-
- Modern Unix systems provide a C function named `getopt()' for
-processing command-line arguments. The programmer provides a string
-describing the one-letter options. If an option requires an argument,
-it is followed in the string by a colon. `getopt()' is also passed
-the count and values of the command-line arguments and is called in a
-loop. `getopt()' processes the command-line arguments for option
-letters. Each time around the loop, it returns a single character
-representing the next option letter that it finds, or `?' if it finds
-an invalid option. When it returns -1, there are no options left on
-the command line.
-
- When using `getopt()', options that do not take arguments can be
-grouped together. Furthermore, options that take arguments require
-that the argument be present. The argument can immediately follow the
-option letter, or it can be a separate command-line argument.
-
- Given a hypothetical program that takes three command-line options,
-`-a', `-b', and `-c', where `-b' requires an argument, all of the
-following are valid ways of invoking the program:
-
- prog -a -b foo -c data1 data2 data3
- prog -ac -bfoo -- data1 data2 data3
- prog -acbfoo data1 data2 data3
-
- Notice that when the argument is grouped with its option, the rest of
-the argument is considered to be the option's argument. In this
-example, `-acbfoo' indicates that all of the `-a', `-b', and `-c'
-options were supplied, and that `foo' is the argument to the `-b'
-option.
-
- `getopt()' provides four external variables that the programmer can
-use:
-
-`optind'
- The index in the argument value array (`argv') where the first
- nonoption command-line argument can be found.
-
-`optarg'
- The string value of the argument to an option.
-
-`opterr'
- Usually `getopt()' prints an error message when it finds an invalid
- option. Setting `opterr' to zero disables this feature. (An
- application might want to print its own error message.)
-
-`optopt'
- The letter representing the command-line option.
-
- The following C fragment shows how `getopt()' might process
-command-line arguments for `awk':
-
- int
- main(int argc, char *argv[])
- {
- int c; /* option letter returned by getopt() */
- ...
- /* print our own message */
- opterr = 0;
- while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
- switch (c) {
- case 'f': /* file */
- ...
- break;
- case 'F': /* field separator */
- ...
- break;
- case 'v': /* variable assignment */
- ...
- break;
- case 'W': /* extension */
- ...
- break;
- case '?':
- default:
- usage();
- break;
- }
- }
- ...
- }
-
- As a side point, `gawk' actually uses the GNU `getopt_long()'
-function to process both normal and GNU-style long options (*note
-Options::).
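-
- The grouping rules described earlier can be checked against a real
-`getopt()'. This C sketch assumes a POSIX system that declares
-`getopt()', `optind', `optarg', and `opterr' in `<unistd.h>'. It scans
-options for the hypothetical `-a', `-b ARG', `-c' program from above;
-`scan_opts()' is our own name. Resetting `optind' to 1 is enough for a
-fresh scan on most systems, but some implementations keep extra
-internal state, and GNU `getopt()' may reorder `argv' when options
-appear after non-option arguments.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getopt() in <unistd.h> */
#include <stddef.h>
#include <unistd.h>

/* Sketch: scan options for the hypothetical program that takes -a,
 * -b ARG, and -c.  The option letters seen are stored in letters[]
 * (NUL-terminated, at most cap - 1 of them) and the argument of -b,
 * if any, in *b_arg.  Returns optind, the index of the first
 * non-option argument, or -1 if getopt() reported an error. */
int scan_opts(int argc, char *argv[], char *letters, size_t cap,
              const char **b_arg)
{
    int c, result = 0;
    size_t n = 0;

    opterr = 0;      /* print our own message, as awk does */
    optind = 1;      /* start with argv[1] */
    *b_arg = NULL;
    while ((c = getopt(argc, argv, "ab:c")) != -1) {
        if (c == '?') {          /* invalid option or missing argument */
            result = -1;
            break;
        }
        if (c == 'b')
            *b_arg = optarg;
        if (n + 1 < cap)
            letters[n++] = (char) c;
    }
    if (cap > 0)
        letters[n] = '\0';
    return result == -1 ? -1 : optind;
}
```

- Called with `prog -acbfoo data1 data2', it records the letters `a',
-`c', and `b', sets the `-b' argument to `foo', and returns 2, the index
-of `data1'.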
-
- The abstraction provided by `getopt()' is very useful and is quite
-handy in `awk' programs as well. Following is an `awk' version of
-`getopt()'. This function highlights one of the greatest weaknesses in
-`awk', which is that it is very poor at manipulating single characters.
-Repeated calls to `substr()' are necessary for accessing individual
-characters (*note String Functions::).(1)
-
- The discussion that follows walks through the code a bit at a time:
-
- # getopt.awk --- Do C library getopt(3) function in awk
-
- # External variables:
- # Optind -- index in ARGV of first nonoption argument
- # Optarg -- string value of argument to current option
- # Opterr -- if nonzero, print our own diagnostic
- # Optopt -- current option letter
-
- # Returns:
- # -1 at end of options
- # "?" for unrecognized option
- # <c> a character representing the current option
-
- # Private Data:
- # _opti -- index in multi-flag option, e.g., -abc
-
- The function starts out with comments presenting a list of the
-global variables it uses, what the return values are, what they mean,
-and any global variables that are "private" to this library function.
-Such documentation is essential for any program, and particularly for
-library functions.
-
- The `getopt()' function first checks that it was indeed called with
-a string of options (the `options' parameter). If `options' has a zero
-length, `getopt()' immediately returns -1:
-
- function getopt(argc, argv, options, thisopt, i)
- {
- if (length(options) == 0) # no options given
- return -1
-
- if (argv[Optind] == "--") { # all done
- Optind++
- _opti = 0
- return -1
- } else if (argv[Optind] !~ /^-[^:[:space:]]/) {
- _opti = 0
- return -1
- }
-
- The next thing to check for is the end of the options. A `--' ends
-the command-line options, as does any command-line argument that does
-not begin with a `-'. `Optind' is used to step through the array of
-command-line arguments; it retains its value across calls to
-`getopt()', because it is a global variable.
-
- The regular expression that is used, `/^-[^:[:space:]]/', checks for
-a `-' followed by anything that is not whitespace and not a colon. If
-the current command-line argument does not match this pattern, it is
-not an option, and it ends option processing. Continuing on:
-
- if (_opti == 0)
- _opti = 2
- thisopt = substr(argv[Optind], _opti, 1)
- Optopt = thisopt
- i = index(options, thisopt)
- if (i == 0) {
- if (Opterr)
- printf("%c -- invalid option\n",
- thisopt) > "/dev/stderr"
- if (_opti >= length(argv[Optind])) {
- Optind++
- _opti = 0
- } else
- _opti++
- return "?"
- }
-
- The `_opti' variable tracks the position in the current command-line
-argument (`argv[Optind]'). If multiple options are grouped together
-with one `-' (e.g., `-abx'), it is necessary to return them to the user
-one at a time.
-
- If `_opti' is equal to zero, it is set to two, which is the index in
-the string of the next character to look at (we skip the `-', which is
-at position one). The variable `thisopt' holds the character, obtained
-with `substr()'. It is saved in `Optopt' for the main program to use.
-
- If `thisopt' is not in the `options' string, then it is an invalid
-option. If `Opterr' is nonzero, `getopt()' prints an error message on
-the standard error that is similar to the message from the C version of
-`getopt()'.
-
- Because the option is invalid, it is necessary to skip it and move
-on to the next option character. If `_opti' is greater than or equal
-to the length of the current command-line argument, it is necessary to
-move on to the next argument, so `Optind' is incremented and `_opti' is
-reset to zero. Otherwise, `Optind' is left alone and `_opti' is merely
-incremented.
-
- In any case, because the option is invalid, `getopt()' returns `"?"'.
-The main program can examine `Optopt' if it needs to know what the
-invalid option letter actually is. Continuing on:
-
- if (substr(options, i + 1, 1) == ":") {
- # get option argument
- if (length(substr(argv[Optind], _opti + 1)) > 0)
- Optarg = substr(argv[Optind], _opti + 1)
- else
- Optarg = argv[++Optind]
- _opti = 0
- } else
- Optarg = ""
-
- If the option requires an argument, the option letter is followed by
-a colon in the `options' string. If there are remaining characters in
-the current command-line argument (`argv[Optind]'), then the rest of
-that string is assigned to `Optarg'. Otherwise, the next command-line
-argument is used (`-xFOO' versus `-x FOO'). In either case, `_opti' is
-reset to zero, because there are no more characters left to examine in
-the current command-line argument. Continuing:
-
- if (_opti == 0 || _opti >= length(argv[Optind])) {
- Optind++
- _opti = 0
- } else
- _opti++
- return thisopt
- }
-
- Finally, if `_opti' is either zero or greater than or equal to the
-length of the current command-line argument, this element of `argv' has
-been completely processed, so `Optind' is incremented to point to the
-next element in `argv'. If neither condition is true, then only
-`_opti' is incremented, so that the next option letter can be processed
-on the next call to `getopt()'.
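-
- To see the whole control flow in one place, here is a hypothetical C
-transliteration of `getopt.awk'. Its globals mirror `Optind', `Optarg',
-`Opterr', and `Optopt'. The private `_opti' becomes a static `opti'
-that counts from zero, so every position test is shifted by one
-relative to the `awk' code. The sketch also rejects `:' as an option
-letter, which the `awk' version does not check for inside grouped
-options.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C transliteration of getopt.awk.  The globals mirror
 * the awk ones; opti is the private _opti, but counts from zero. */
int Optind = 1, Opterr = 1, Optopt;
const char *Optarg = "";
static size_t opti;   /* position within a grouped option; 0 = start */

void awk_getopt_reset(void)
{
    Optind = 1;
    opti = 0;
    Optarg = "";
}

int awk_getopt(int argc, char *argv[], const char *options)
{
    const char *arg, *p;
    int thisopt;

    if (options[0] == '\0' || Optind >= argc)
        return -1;                       /* nothing to do */
    arg = argv[Optind];
    if (strcmp(arg, "--") == 0) {        /* all done */
        Optind++;
        opti = 0;
        return -1;
    }
    /* same test as /^-[^:[:space:]]/ */
    if (arg[0] != '-' || arg[1] == '\0' || arg[1] == ':'
        || isspace((unsigned char) arg[1])) {
        opti = 0;
        return -1;
    }

    if (opti == 0)
        opti = 1;                        /* skip the leading '-' */
    thisopt = (unsigned char) arg[opti];
    Optopt = thisopt;
    p = (thisopt == ':') ? NULL : strchr(options, thisopt);
    if (p == NULL) {                     /* invalid option */
        if (Opterr)
            fprintf(stderr, "%c -- invalid option\n", thisopt);
        if (opti + 1 >= strlen(arg)) {
            Optind++;
            opti = 0;
        } else
            opti++;
        return '?';
    }

    if (p[1] == ':') {                   /* option takes an argument */
        if (arg[opti + 1] != '\0')
            Optarg = arg + opti + 1;     /* -xFOO */
        else {
            Optind++;                    /* -x FOO */
            Optarg = (Optind < argc) ? argv[Optind] : "";
        }
        opti = 0;
    } else
        Optarg = "";

    if (opti == 0 || opti + 1 >= strlen(arg)) {
        Optind++;                        /* done with this element */
        opti = 0;
    } else
        opti++;                          /* next letter in the group */
    return thisopt;
}
```

- Run on the arguments of the first sample run shown below (`-a
--cbARG bax -x' with options `ab:cd'), it returns `a', `c', and `b'
-with `Optarg' set to `ARG', and then -1 with `Optind' equal to 3.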
-
- The `BEGIN' rule initializes both `Opterr' and `Optind' to one.
-`Opterr' is set to one, since the default behavior is for `getopt()' to
-print a diagnostic message upon seeing an invalid option. `Optind' is
-set to one, since there's no reason to look at the program name, which
-is in `ARGV[0]':
-
- BEGIN {
- Opterr = 1 # default is to diagnose
- Optind = 1 # skip ARGV[0]
-
- # test program
- if (_getopt_test) {
- while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
- printf("c = <%c>, optarg = <%s>\n",
- _go_c, Optarg)
- printf("non-option arguments:\n")
- for (; Optind < ARGC; Optind++)
- printf("\tARGV[%d] = <%s>\n",
- Optind, ARGV[Optind])
- }
- }
-
- The rest of the `BEGIN' rule is a simple test program. Here is the
-result of two sample runs of the test program:
-
- $ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
- -| c = <a>, optarg = <>
- -| c = <c>, optarg = <>
- -| c = <b>, optarg = <ARG>
- -| non-option arguments:
- -| ARGV[3] = <bax>
- -| ARGV[4] = <-x>
-
- $ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
- -| c = <a>, optarg = <>
- error--> x -- invalid option
- -| c = <?>, optarg = <>
- -| non-option arguments:
- -| ARGV[4] = <xyz>
- -| ARGV[5] = <abc>
-
- In both runs, the first `--' terminates the arguments to `awk', so
-that it does not try to interpret the `-a', etc., as its own options.
-
- NOTE: After `getopt()' is through, it is the responsibility of the
- user-level code to clear out all the elements of `ARGV' from 1 to
- `Optind - 1', so that `awk' does not try to process the
- command-line options as file names.
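-
- In C the same cleanup is usually done by shifting the array. In this
-hypothetical sketch, everything from `args[first]' on is moved down to
-start at `args[1]'. The first non-option argument is kept, and the
-options before it are dropped.

```c
/* Hypothetical helper: drop args[1] through args[first - 1] (the
 * options) by moving args[first], args[first + 1], ... down so that
 * they start at args[1].  Returns the new argument count. */
int drop_options(const char *args[], int argc, int first)
{
    int i, out = 1;

    for (i = first; i < argc; i++)
        args[out++] = args[i];
    return out;
}
```

- With the first sample run above, `first' is 3, so `bax' and `-x'
-become the only remaining arguments.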
-
- Several of the sample programs presented in *note Sample Programs::,
-use `getopt()' to process their arguments.
-
- ---------- Footnotes ----------
-
- (1) This function was written before `gawk' acquired the ability to
-split strings into single characters using `""' as the separator. We
-have left it alone, since using `substr()' is more portable.
-
-
-File: gawk.info, Node: Passwd Functions, Next: Group Functions, Prev: Getopt Function, Up: Library Functions
-
-13.5 Reading the User Database
-==============================
-
-The `PROCINFO' array (*note Built-in Variables::) provides access to
-the current user's real and effective user and group ID numbers, and if
-available, the user's supplementary group set. However, because these
-are numbers, they do not provide very useful information to the average
-user. There needs to be some way to find the user information
-associated with the user and group ID numbers. This minor node
-presents a suite of functions for retrieving information from the user
-database. *Note Group Functions::, for a similar suite that retrieves
-information from the group database.
-
- The POSIX standard does not define the file where user information is
-kept. Instead, it provides the `<pwd.h>' header file and several C
-language subroutines for obtaining user information. The primary
-function is `getpwent()', for "get password entry." The "password"
-comes from the original user database file, `/etc/passwd', which stores
-user information, along with the encrypted passwords (hence the name).
-
- While an `awk' program could simply read `/etc/passwd' directly,
-this file may not contain complete information about the system's set
-of users.(1) To be sure you are able to produce a readable and complete
-version of the user database, it is necessary to write a small C
-program that calls `getpwent()'. `getpwent()' is defined as returning
-a pointer to a `struct passwd'. Each time it is called, it returns the
-next entry in the database. When there are no more entries, it returns
-`NULL', the null pointer. When this happens, the C program should call
-`endpwent()' to close the database. Following is `pwcat', a C program
-that "cats" the password database:
-
- /*
- * pwcat.c
- *
- * Generate a printable version of the password database
- */
- #include <stdio.h>
- #include <pwd.h>
-
- int
- main(int argc, char **argv)
- {
- struct passwd *p;
-
- while ((p = getpwent()) != NULL)
- printf("%s:%s:%ld:%ld:%s:%s:%s\n",
- p->pw_name, p->pw_passwd, (long) p->pw_uid,
- (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);
-
- endpwent();
- return 0;
- }
-
- If you don't understand C, don't worry about it. The output from
-`pwcat' is the user database, in the traditional `/etc/passwd' format
-of colon-separated fields. The fields are:
-
-Login name
- The user's login name.
-
-Encrypted password
- The user's encrypted password. This may not be available on some
- systems.
-
-User-ID
- The user's numeric user ID number. (On some systems it's a C
- `long', and not an `int'. Thus we cast it to `long' for all
- cases.)
-
-Group-ID
- The user's numeric group ID number. (Similar comments about
- `long' vs. `int' apply here.)
-
-Full name
- The user's full name, and perhaps other information associated
- with the user.
-
-Home directory
- The user's login (or "home") directory (familiar to shell
- programmers as `$HOME').
-
-Login shell
- The program that is run when the user logs in. This is usually a
- shell, such as Bash.
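-
- The `awk' library functions below split these lines by setting `FS'
-to `:'. For comparison, a C sketch of the same split has to take care
-to keep empty fields, such as the missing shell in the `nobody' entry
-shown below; `strtok()' would silently merge adjacent colons. The
-`struct pw_entry' type and `parse_pw_line()' are hypothetical names.

```c
#include <string.h>

/* Hypothetical record type for one line of pwcat output. */
struct pw_entry {
    char *name, *passwd, *uid, *gid, *gecos, *dir, *shell;
};

/* Split a passwd-format line in place at each ':'.  Empty fields are
 * kept as empty strings.  Returns 0 on success, or -1 if the line
 * does not have exactly seven fields (line may be modified anyway). */
int parse_pw_line(char *line, struct pw_entry *e)
{
    char *field[7];
    char *p = line, *colon;
    int n = 0;

    line[strcspn(line, "\n")] = '\0';    /* drop a trailing newline */
    for (;;) {
        if (n == 7)
            return -1;                   /* too many fields */
        field[n++] = p;
        if ((colon = strchr(p, ':')) == NULL)
            break;
        *colon = '\0';
        p = colon + 1;
    }
    if (n != 7)
        return -1;                       /* too few fields */

    e->name = field[0];
    e->passwd = field[1];
    e->uid = field[2];
    e->gid = field[3];
    e->gecos = field[4];
    e->dir = field[5];
    e->shell = field[6];
    return 0;
}
```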
-
- A few lines representative of `pwcat''s output are as follows:
-
- $ pwcat
- -| root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
- -| nobody:*:65534:65534::/:
- -| daemon:*:1:1::/:
- -| sys:*:2:2::/:/bin/csh
- -| bin:*:3:3::/bin:
- -| arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
- -| miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
- -| andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
- ...
-
- With that introduction, following is a group of functions for
-getting user information. There are several functions here,
-corresponding to the C functions of the same names:
-
- # passwd.awk --- access password file information
-
- BEGIN {
- # tailor this to suit your system
- _pw_awklib = "/usr/local/libexec/awk/"
- }
-
- function _pw_init( oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
- {
- if (_pw_inited)
- return
-
- oldfs = FS
- oldrs = RS
- olddol0 = $0
- using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
- using_fpat = (PROCINFO["FS"] == "FPAT")
- FS = ":"
- RS = "\n"
-
- pwcat = _pw_awklib "pwcat"
- while ((pwcat | getline) > 0) {
- _pw_byname[$1] = $0
- _pw_byuid[$3] = $0
- _pw_bycount[++_pw_total] = $0
- }
- close(pwcat)
- _pw_count = 0
- _pw_inited = 1
- FS = oldfs
- if (using_fw)
- FIELDWIDTHS = FIELDWIDTHS
- else if (using_fpat)
- FPAT = FPAT
- RS = oldrs
- $0 = olddol0
- }
-
- The `BEGIN' rule sets a private variable to the directory where
-`pwcat' is stored. Because it is used to help out an `awk' library
-routine, we have chosen to put it in `/usr/local/libexec/awk'; however,
-you might want it to be in a different directory on your system.
-
- The function `_pw_init()' keeps three copies of the user information
-in three associative arrays. The arrays are indexed by username
-(`_pw_byname'), by user ID number (`_pw_byuid'), and by order of
-occurrence (`_pw_bycount'). The variable `_pw_inited' is used for
-efficiency, since `_pw_init()' needs to be called only once.
-
- Because this function uses `getline' to read information from
-`pwcat', it first saves the values of `FS', `RS', and `$0'. It notes
-in the variable `using_fw' whether field splitting with `FIELDWIDTHS'
-is in effect or not. Doing so is necessary, since these functions
-could be called from anywhere within a user's program, and the user may
-have his or her own way of splitting records and fields.
-
- The `using_fw' variable checks `PROCINFO["FS"]', which is
-`"FIELDWIDTHS"' if field splitting is being done with `FIELDWIDTHS'.
-This makes it possible to restore the correct field-splitting mechanism
-later. The test can only be true for `gawk'. It is false if using
-`FS' or `FPAT', or on some other `awk' implementation.
-
- The check for `FPAT', which uses `using_fpat' and `PROCINFO["FS"]',
-is similar.
-
- The main part of the function uses a loop to read database lines,
-split the line into fields, and then store the line into each array as
-necessary. When the loop is done, `_pw_init()' cleans up by closing
-the pipeline, setting `_pw_inited' to one, and restoring `FS' (and
-`FIELDWIDTHS' or `FPAT' if necessary), `RS', and `$0'. The use of
-`_pw_count' is explained shortly.
-
- The `getpwnam()' function takes a username as a string argument. If
-that user is in the database, it returns the appropriate line.
-Otherwise, it relies on the array reference to a nonexistent element to
-create the element with the null string as its value:
-
- function getpwnam(name)
- {
- _pw_init()
- return _pw_byname[name]
- }
-
- Similarly, the `getpwuid()' function takes a user ID number argument.
-If that user number is in the database, it returns the appropriate
-line. Otherwise, it returns the null string:
-
- function getpwuid(uid)
- {
- _pw_init()
- return _pw_byuid[uid]
- }
-
- The `getpwent()' function simply steps through the database, one
-entry at a time. It uses `_pw_count' to track its current position in
-the `_pw_bycount' array:
-
- function getpwent()
- {
- _pw_init()
- if (_pw_count < _pw_total)
- return _pw_bycount[++_pw_count]
- return ""
- }
-
- The `endpwent()' function resets `_pw_count' to zero, so that
-subsequent calls to `getpwent()' start over again:
-
- function endpwent()
- {
- _pw_count = 0
- }
-
- A conscious design decision in this suite is that each subroutine
-calls `_pw_init()' to initialize the database arrays. The overhead of
-running a separate process to generate the user database, and the I/O
-to scan it, are only incurred if the user's main program actually calls
-one of these functions. If this library file is loaded along with a
-user's program, but none of the routines are ever called, then there is
-no extra runtime overhead. (The alternative is to move the body of
-`_pw_init()' into a `BEGIN' rule, which always runs `pwcat'. This
-simplifies the code but runs an extra process that may never be needed.)
-
- In turn, calling `_pw_init()' is not too expensive, because the
-`_pw_inited' variable keeps the program from reading the data more than
-once. If you are worried about squeezing every last cycle out of your
-`awk' program, the check of `_pw_inited' could be moved out of
-`_pw_init()' and duplicated in all the other functions. In practice,
-this is not necessary, since most `awk' programs are I/O-bound, and
-such a change would clutter up the code.
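-
- The lazy-initialization pattern is not specific to `awk'. The
-following C sketch is a hypothetical model of `_pw_init()',
-`getpwent()', and `endpwent()'. A hard-coded table stands in for the
-output of `pwcat', and the counter `pw_loads' records how many times
-the (simulated) expensive load actually happens. The C version indexes
-from zero, whereas `_pw_bycount' starts at one.

```c
#include <stddef.h>

#define PW_MAX 16

/* Hypothetical stand-in for the output of pwcat. */
static const char *const pw_source[] = {
    "root:*:0:0:Operator:/:/bin/sh",
    "daemon:*:1:1::/:",
    NULL
};

static const char *pw_bycount[PW_MAX];  /* like _pw_bycount, 0-based */
static int pw_total, pw_count, pw_inited;
int pw_loads;                            /* times the "database" was read */

/* Like _pw_init(): do the expensive work only on the first call. */
static void pw_init(void)
{
    int i;

    if (pw_inited)
        return;
    pw_loads++;
    for (i = 0; i < PW_MAX && pw_source[i] != NULL; i++)
        pw_bycount[pw_total++] = pw_source[i];
    pw_count = 0;
    pw_inited = 1;
}

/* Like getpwent(): the next entry, or "" at the end. */
const char *my_getpwent(void)
{
    pw_init();
    if (pw_count < pw_total)
        return pw_bycount[pw_count++];
    return "";
}

/* Like endpwent(): start over on the next my_getpwent(). */
void my_endpwent(void)
{
    pw_count = 0;
}
```

- However many times `my_getpwent()' and `my_endpwent()' are called,
-`pw_loads' never exceeds one.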
-
- The `id' program in *note Id Program::, uses these functions.
-
- ---------- Footnotes ----------
-
- (1) It is often the case that password information is stored in a
-network database.
-
-
-File: gawk.info, Node: Group Functions, Next: Walking Arrays, Prev: Passwd Functions, Up: Library Functions
-
-13.6 Reading the Group Database
-===============================
-
-Much of the discussion presented in *note Passwd Functions::, applies
-to the group database as well. Although there has traditionally been a
-well-known file (`/etc/group') in a well-known format, the POSIX
-standard only provides a set of C library routines (`<grp.h>' and
-`getgrent()') for accessing the information. Even though this file may
-exist, it may not have complete information. Therefore, as with the
-user database, it is necessary to have a small C program that generates
-the group database as its output. `grcat', a C program that "cats" the
-group database, is as follows:
-
- /*
- * grcat.c
- *
- * Generate a printable version of the group database
- */
- #include <stdio.h>
- #include <grp.h>
-
- int
- main(int argc, char **argv)
- {
- struct group *g;
- int i;
-
- while ((g = getgrent()) != NULL) {
- printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
- (long) g->gr_gid);
- for (i = 0; g->gr_mem[i] != NULL; i++) {
- printf("%s", g->gr_mem[i]);
- if (g->gr_mem[i+1] != NULL)
- putchar(',');
- }
- putchar('\n');
- }
- endgrent();
- return 0;
- }
-
- Each line in the group database represents one group. The fields are
-separated with colons and represent the following information:
-
-Group Name
- The group's name.
-
-Group Password
- The group's encrypted password. In practice, this field is never
- used; it is usually empty or set to `*'.
-
-Group ID Number
- The group's numeric group ID number; this number must be unique
- within the file. (On some systems it's a C `long', and not an
- `int'. Thus we cast it to `long' for all cases.)
-
-Group Member List
- A comma-separated list of user names. These users are members of
- the group. Modern Unix systems allow users to be members of
- several groups simultaneously. If your system does, then there
- are elements `"group1"' through `"groupN"' in `PROCINFO' for those
- group ID numbers. (Note that `PROCINFO' is a `gawk' extension;
- *note Built-in Variables::.)
-
- Here is what running `grcat' might produce:
-
- $ grcat
- -| wheel:*:0:arnold
- -| nogroup:*:65534:
- -| daemon:*:1:
- -| kmem:*:2:
- -| staff:*:10:arnold,miriam,andy
- -| other:*:20:
- ...
-
- Here are the functions for obtaining information from the group
-database. There are several, modeled after the C library functions of
-the same names:
-
- # group.awk --- functions for dealing with the group file
-
- BEGIN \
- {
- # Change to suit your system
- _gr_awklib = "/usr/local/libexec/awk/"
- }
-
- function _gr_init( oldfs, oldrs, olddol0, grcat,
- using_fw, using_fpat, n, a, i)
- {
- if (_gr_inited)
- return
-
- oldfs = FS
- oldrs = RS
- olddol0 = $0
- using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
- using_fpat = (PROCINFO["FS"] == "FPAT")
- FS = ":"
- RS = "\n"
-
- grcat = _gr_awklib "grcat"
- while ((grcat | getline) > 0) {
- if ($1 in _gr_byname)
- _gr_byname[$1] = _gr_byname[$1] "," $4
- else
- _gr_byname[$1] = $0
- if ($3 in _gr_bygid)
- _gr_bygid[$3] = _gr_bygid[$3] "," $4
- else
- _gr_bygid[$3] = $0
-
- n = split($4, a, "[ \t]*,[ \t]*")
- for (i = 1; i <= n; i++)
- if (a[i] in _gr_groupsbyuser)
- _gr_groupsbyuser[a[i]] = \
- _gr_groupsbyuser[a[i]] " " $1
- else
- _gr_groupsbyuser[a[i]] = $1
-
- _gr_bycount[++_gr_count] = $0
- }
- close(grcat)
- _gr_count = 0
- _gr_inited++
- FS = oldfs
- if (using_fw)
- FIELDWIDTHS = FIELDWIDTHS
- else if (using_fpat)
- FPAT = FPAT
- RS = oldrs
- $0 = olddol0
- }
-
- The `BEGIN' rule sets a private variable to the directory where
-`grcat' is stored. Because it is used to help out an `awk' library
-routine, we have chosen to put it in `/usr/local/libexec/awk'. You
-might want it to be in a different directory on your system.
-
- These routines follow the same general outline as the user database
-routines (*note Passwd Functions::). The `_gr_inited' variable is used
-to ensure that the database is scanned no more than once. The
-`_gr_init()' function first saves `FS', `RS', and `$0', and then sets
-`FS' and `RS' to the correct values for scanning the group information.
-It also takes care to note whether `FIELDWIDTHS' or `FPAT' is being
-used, and to restore the appropriate field splitting mechanism.
-
- The group information is stored in several associative arrays. The
-arrays are indexed by group name (`_gr_byname'), by group ID number
-(`_gr_bygid'), and by position in the database (`_gr_bycount'). There
-is an additional array indexed by user name (`_gr_groupsbyuser'), whose
-values are space-separated lists of the groups to which each user belongs.
-
- Unlike the user database, it is possible to have multiple records in
-the database for the same group. This is common when a group has a
-large number of members. A pair of such entries might look like the
-following:
-
- tvpeople:*:101:johnny,jay,arsenio
- tvpeople:*:101:david,conan,tom,joan
-
- For this reason, `_gr_init()' checks whether a group name or group
-ID number has already been seen. If it has, then the user names are simply
-concatenated onto the previous list of users. (There is actually a
-subtle problem with the code just presented. Suppose that the first
-time there were no names. This code adds the names with a leading
-comma. It also doesn't check that there is a `$4'.)
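-
- One way to avoid both problems is to add a comma only when the
-existing member list and the new one are both nonempty. The C sketch
-below shows the idea. `append_members()' is a hypothetical helper, and
-it assumes each record already has four colon-separated fields, so that
-an empty member list shows up as a trailing `:'.

```c
#include <string.h>

/* Hypothetical helper: append the comma-separated member list `more'
 * to the group record in rec (a buffer of `size' bytes).  A comma is
 * added only if rec already lists at least one member, i.e., does not
 * end in ':'.  Returns 0, or -1 if rec is too small. */
int append_members(char *rec, size_t size, const char *more)
{
    size_t len = strlen(rec);
    size_t comma;

    if (more == NULL || more[0] == '\0')
        return 0;                  /* no members to add (no $4) */
    comma = (len > 0 && rec[len - 1] != ':') ? 1 : 0;
    if (len + comma + strlen(more) + 1 > size)
        return -1;
    if (comma)
        rec[len++] = ',';
    strcpy(rec + len, more);
    return 0;
}
```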
-
- Finally, `_gr_init()' closes the pipeline to `grcat', restores `FS'
-(and `FIELDWIDTHS' or `FPAT' if necessary), `RS', and `$0', initializes
-`_gr_count' to zero (it is used later), and makes `_gr_inited' nonzero.
-
- The `getgrnam()' function takes a group name as its argument, and if
-that group exists, returns its database entry. Otherwise, it relies on
-the array reference to a nonexistent element to create the element with
-the null string as its value:
-
- function getgrnam(group)
- {
- _gr_init()
- return _gr_byname[group]
- }
-
- The `getgrgid()' function is similar; it takes a numeric group ID and
-looks up the information associated with that group ID:
-
- function getgrgid(gid)
- {
- _gr_init()
- return _gr_bygid[gid]
- }
-
- The `getgruser()' function does not have a C counterpart. It takes a
-user name and returns the list of groups that have the user as a member:
-
- function getgruser(user)
- {
- _gr_init()
- return _gr_groupsbyuser[user]
- }
-
- The `getgrent()' function steps through the database one entry at a
-time. It uses `_gr_count' to track its position in the list:
-
- function getgrent()
- {
- _gr_init()
- if (++_gr_count in _gr_bycount)
- return _gr_bycount[_gr_count]
- return ""
- }
-
- The `endgrent()' function resets `_gr_count' to zero so that
-`getgrent()' can start over again:
-
- function endgrent()
- {
- _gr_count = 0
- }
-
- As with the user database routines, each function calls `_gr_init()'
-to initialize the arrays. Doing so only incurs the extra overhead of
-running `grcat' if these functions are used (as opposed to moving the
-body of `_gr_init()' into a `BEGIN' rule).
-
- Most of the work is in scanning the database and building the various
-associative arrays. The functions that the user calls are themselves
-very simple, relying on `awk''s associative arrays to do the work.
-
- The `id' program in *note Id Program::, uses these functions.
-
-
-File: gawk.info, Node: Walking Arrays, Prev: Group Functions, Up: Library Functions
-
-13.7 Traversing Arrays of Arrays
-================================
-
-*note Arrays of Arrays::, described how `gawk' provides arrays of
-arrays. In particular, any element of an array may be either a scalar,
-or another array. The `isarray()' function (*note Type Functions::)
-lets you distinguish an array from a scalar. The following function,
-`walk_array()', recursively traverses an array, printing each element's
-indices and value. You call it with the array and a string
-representing the name of the array:
-
- function walk_array(arr, name, i)
- {
- for (i in arr) {
- if (isarray(arr[i]))
- walk_array(arr[i], (name "[" i "]"))
- else
- printf("%s[%s] = %s\n", name, i, arr[i])
- }
- }
-
-It works by looping over each element of the array. If any given
-element is itself an array, the function calls itself recursively,
-passing the subarray and a new string representing the current index.
-Otherwise, the function simply prints the element's name, index, and
-value. Here is a main program to demonstrate:
-
-     BEGIN {
-         a[1] = 1
-         a[2][1] = 21
-         a[2][2] = 22
-         a[3] = 3
-         a[4][1][1] = 411
-         a[4][2] = 42
-
-         walk_array(a, "a")
-     }
-
- When run, the program produces the following output:
-
-     $ gawk -f walk_array.awk
-     -| a[4][1][1] = 411
-     -| a[4][2] = 42
-     -| a[1] = 1
-     -| a[2][1] = 21
-     -| a[2][2] = 22
-     -| a[3] = 3
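-
- Because `for (i in arr)' visits elements in an unspecified order, the
-output above is not sorted. Here is a sketch that assumes `gawk' 4.0
-or later. Setting `PROCINFO["sorted_in"]' to `"@ind_num_asc"' before
-the call asks `gawk' to traverse indices in ascending numeric order.
-The setting applies to every `for'-`in' loop, including the recursive
-ones inside `walk_array()':
-
-     BEGIN {
-         a[1] = 1
-         a[2][1] = 21
-         a[2][2] = 22
-         a[3] = 3
-         a[4][1][1] = 411
-         a[4][2] = 42
-
-         # traverse all for-in loops in ascending numeric index order
-         PROCINFO["sorted_in"] = "@ind_num_asc"
-         walk_array(a, "a")
-     }
-
-Use this rule together with `walk_array()' in place of the earlier
-`BEGIN' rule. It should then print `a[1]' first and `a[4][2]' last.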
-
-
-File: gawk.info, Node: Sample Programs, Next: Debugger, Prev: Library Functions, Up: Top
-
-14 Practical `awk' Programs
-***************************
-
-*note Library Functions::, presents the idea that reading programs in a
-language contributes to learning that language. This major node
-continues that theme, presenting a potpourri of `awk' programs for your
-reading enjoyment.
-
- Many of these programs use library functions presented in *note
-Library Functions::.
-
-* Menu:
-
-* Running Examples:: How to run these examples.
-* Clones:: Clones of common utilities.
-* Miscellaneous Programs:: Some interesting `awk' programs.
-
-
-File: gawk.info, Node: Running Examples, Next: Clones, Up: Sample Programs
-
-14.1 Running the Example Programs
-=================================
-
-To run a given program, you would typically do something like this:
-
-     awk -f PROGRAM -- OPTIONS FILES
-
-Here, PROGRAM is the name of the `awk' program (such as `cut.awk'),
-OPTIONS are any command-line options for the program that start with a
-`-', and FILES are the actual data files.
-
- If your system supports the `#!' executable interpreter mechanism
-(*note Executable Scripts::), you can instead run your program directly:
-
-     cut.awk -c1-8 myfiles > results
-
- If your `awk' is not `gawk', you may instead need to use this:
-
-     cut.awk -- -c1-8 myfiles > results
-
-
-File: gawk.info, Node: Clones, Next: Miscellaneous Programs, Prev: Running Examples, Up: Sample Programs
-
-14.2 Reinventing Wheels for Fun and Profit
-==========================================
-
-This minor node presents a number of POSIX utilities implemented in
-`awk'. Reinventing these programs in `awk' is often enjoyable, because
-the algorithms can be very clearly expressed, and the code is usually
-very concise and simple. This is true because `awk' does so much for
-you.
-
- It should be noted that these programs are not necessarily intended
-to replace the installed versions on your system. Nor may all of these
-programs be fully compliant with the most recent POSIX standard. This
-is not a problem; their purpose is to illustrate `awk' language
-programming for "real world" tasks.
-
- The programs are presented in alphabetical order.
-
-* Menu:
-
-* Cut Program:: The `cut' utility.
-* Egrep Program:: The `egrep' utility.
-* Id Program:: The `id' utility.
-* Split Program:: The `split' utility.
-* Tee Program:: The `tee' utility.
-* Uniq Program:: The `uniq' utility.
-* Wc Program:: The `wc' utility.
-
-
-File: gawk.info, Node: Cut Program, Next: Egrep Program, Up: Clones
-
-14.2.1 Cutting out Fields and Columns
--------------------------------------
-
-The `cut' utility selects, or "cuts," characters or fields from its
-standard input and sends them to its standard output. Fields are
-separated by TABs by default, but you may supply a command-line option
-to change the field "delimiter" (i.e., the field-separator character).
-`cut''s definition of fields is less general than `awk''s.
-
- A common use of `cut' might be to pull out just the login name of
-logged-on users from the output of `who'. For example, the following
-pipeline generates a sorted, unique list of the logged-on users:
-
-     who | cut -c1-8 | sort | uniq
-
- The options for `cut' are:
-
-`-c LIST'
- Use LIST as the list of characters to cut out. Items within the
- list may be separated by commas, and ranges of characters can be
- separated with dashes. The list `1-8,15,22-35' specifies
- characters 1 through 8, 15, and 22 through 35.
-
-`-f LIST'
- Use LIST as the list of fields to cut out.
-
-`-d DELIM'
- Use DELIM as the field-separator character instead of the TAB
- character.
-
-`-s'
- Suppress printing of lines that do not contain the field delimiter.
-
- The `awk' implementation of `cut' uses the `getopt()' library
-function (*note Getopt Function::) and the `join()' library function
-(*note Join Function::).
-
- The program begins with a comment describing the options, the library
-functions needed, and a `usage()' function that prints out a usage
-message and exits. `usage()' is called if invalid arguments are
-supplied:
-
-     # cut.awk --- implement cut in awk
-
-     # Options:
-     #    -f list     Cut fields
-     #    -d c        Field delimiter character
-     #    -c list     Cut characters
-     #
-     #    -s          Suppress lines without the delimiter
-     #
-     # Requires getopt() and join() library functions
-
-     function usage(    e1, e2)
-     {
-         e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
-         e2 = "usage: cut [-c list] [files...]"
-         print e1 > "/dev/stderr"
-         print e2 > "/dev/stderr"
-         exit 1
-     }
-
-The variables `e1' and `e2' are used so that the function fits nicely
-on the screen.
-
- Next comes a `BEGIN' rule that parses the command-line options. It
-sets `FS' to a single TAB character, because that is `cut''s default
-field separator. The rule then sets the output field separator to be the
-same as the input field separator. A loop using `getopt()' steps
-through the command-line options. Exactly one of the variables
-`by_fields' or `by_chars' is set to true, to indicate that processing
-should be done by fields or by characters, respectively. When cutting
-by characters, the output field separator is set to the null string:
-
-     BEGIN \
-     {
-         FS = "\t"    # default
-         OFS = FS
-         while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) {
-             if (c == "f") {
-                 by_fields = 1
-                 fieldlist = Optarg
-             } else if (c == "c") {
-                 by_chars = 1
-                 fieldlist = Optarg
-                 OFS = ""
-             } else if (c == "d") {
-                 if (length(Optarg) > 1) {
-                     printf("Using first character of %s" \
-                     " for delimiter\n", Optarg) > "/dev/stderr"
-                     Optarg = substr(Optarg, 1, 1)
-                 }
-                 FS = Optarg
-                 OFS = FS
-                 if (FS == " ")    # defeat awk semantics
-                     FS = "[ ]"
-             } else if (c == "s")
-                 suppress++
-             else
-                 usage()
-         }
-
-         # Clear out options
-         for (i = 1; i < Optind; i++)
-             ARGV[i] = ""
-
- The code must take special care when the field delimiter is a space.
-Using a single space (`" "') for the value of `FS' is incorrect--`awk'
-would separate fields with runs of spaces, TABs, and/or newlines, and
-we want them to be separated with individual spaces. Also remember
-that after `getopt()' is through (as described in *note Getopt
-Function::), we have to clear out all the elements of `ARGV' from 1 up
-to, but not including, `Optind', so that `awk' does not try to process
-the command-line options as file names.
-
- After dealing with the command-line options, the program verifies
-that the options make sense. Only one or the other of `-c' and `-f'
-should be used, and both require a field list. Then the program calls
-either `set_fieldlist()' or `set_charlist()' to pull apart the list of
-fields or characters:
-
-         if (by_fields && by_chars)
-             usage()
-
-         if (by_fields == 0 && by_chars == 0)
-             by_fields = 1    # default
-
-         if (fieldlist == "") {
-             print "cut: needs list for -c or -f" > "/dev/stderr"
-             exit 1
-         }
-
-         if (by_fields)
-             set_fieldlist()
-         else
-             set_charlist()
-     }
-
- `set_fieldlist()' splits the field list apart at the commas into an
-array. Then, for each element of the array, it looks to see if the
-element is actually a range, and if so, splits it apart. The function
-checks the range to make sure that the first number is smaller than the
-second. Each number in the list is added to the `flist' array, which
-simply lists the fields that will be printed. No special splitting is
-needed; the program lets `awk' split each record into fields as usual:
-
-     function set_fieldlist(        n, m, i, j, k, f, g)
-     {
-         n = split(fieldlist, f, ",")
-         j = 1    # index in flist
-         for (i = 1; i <= n; i++) {
-             if (index(f[i], "-") != 0) { # a range
-                 m = split(f[i], g, "-")
-                 if (m != 2 || g[1] >= g[2]) {
-                     printf("bad field list: %s\n",
-                                       f[i]) > "/dev/stderr"
-                     exit 1
-                 }
-                 for (k = g[1]; k <= g[2]; k++)
-                     flist[j++] = k
-             } else
-                 flist[j++] = f[i]
-         }
-         nfields = j - 1
-     }
-
- The `set_charlist()' function is more complicated than
-`set_fieldlist()'. The idea here is to use `gawk''s `FIELDWIDTHS'
-variable (*note Constant Size::), which describes constant-width input.
-When using a character list, that is exactly what we have.
-
- Setting up `FIELDWIDTHS' is more complicated than simply listing the
-fields that need to be printed. We have to keep track of the fields to
-print and also the intervening characters that have to be skipped. For
-example, suppose you wanted characters 1 through 8, 15, and 22 through
-35. You would use `-c 1-8,15,22-35'. The necessary value for
-`FIELDWIDTHS' is `"8 6 1 6 14"'. This yields five fields, and the
-fields to print are `$1', `$3', and `$5'. The intermediate fields are
-"filler", which is stuff in between the desired data. `flist' lists
-the fields to print, and `t' tracks the complete field list, including
-filler fields:
-
-     function set_charlist(    field, i, j, f, g, n, m, t,
-                               filler, last, len)
-     {
-         field = 1   # count total fields
-         n = split(fieldlist, f, ",")
-         j = 1       # index in flist
-         for (i = 1; i <= n; i++) {
-             if (index(f[i], "-") != 0) { # range
-                 m = split(f[i], g, "-")
-                 if (m != 2 || g[1] >= g[2]) {
-                     printf("bad character list: %s\n",
-                                    f[i]) > "/dev/stderr"
-                     exit 1
-                 }
-                 len = g[2] - g[1] + 1
-                 if (g[1] > 1)  # compute length of filler
-                     filler = g[1] - last - 1
-                 else
-                     filler = 0
-                 if (filler)
-                     t[field++] = filler
-                 t[field++] = len  # length of field
-                 last = g[2]
-                 flist[j++] = field - 1
-             } else {
-                 if (f[i] > 1)
-                     filler = f[i] - last - 1
-                 else
-                     filler = 0
-                 if (filler)
-                     t[field++] = filler
-                 t[field++] = 1
-                 last = f[i]
-                 flist[j++] = field - 1
-             }
-         }
-         FIELDWIDTHS = join(t, 1, field - 1)
-         nfields = j - 1
-     }
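-
- To see what the computed value does, here is a minimal `gawk' sketch.
-It applies `"8 6 1 6 14"', the value for `-c 1-8,15,22-35', to a
-made-up 35-character line. Assigning to `$0' makes `gawk' split the
-record again using `FIELDWIDTHS':
-
-     BEGIN {
-         FIELDWIDTHS = "8 6 1 6 14"
-         # chars 1-8, filler 9-14, char 15, filler 16-21, chars 22-35
-         $0 = "abcdefgh123456X654321ABCDEFGHIJKLMN"
-         print $1, $3, $5    # the three fields cut would keep
-     }
-
-This should print `abcdefgh X ABCDEFGHIJKLMN'. The filler fields `$2'
-and `$4' are simply never printed.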
-
- Next is the rule that actually processes the data. If the `-s'
-option is given, then `suppress' is true. The first `if' statement
-makes sure that the input record does have the field separator. If
-`cut' is processing fields, `suppress' is true, and the field separator
-character is not in the record, then the record is skipped.
-
- If the record is valid, then `gawk' has split the data into fields,
-either using the character in `FS' or using fixed-length fields and
-`FIELDWIDTHS'. The loop goes through the list of fields that should be
-printed. The corresponding field is printed if it contains data. If
-the next field also has data, then the separator character is written
-out between the fields:
-
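-     {
-         # With -s, skip records that lack the delimiter.  OFS holds
-         # the literal delimiter character (FS may have become "[ ]").
-         if (by_fields && suppress && index($0, OFS) == 0)
-             next
-
-         for (i = 1; i <= nfields; i++) {
-             if ($flist[i] != "") {
-                 printf "%s", $flist[i]
-                 if (i < nfields && $flist[i+1] != "")
-                     printf "%s", OFS
-             }
-         }
-         print ""
-     }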
-
- This version of `cut' relies on `gawk''s `FIELDWIDTHS' variable to
-do the character-based cutting. While it is possible in other `awk'
-implementations to use `substr()' (*note String Functions::), it is
-also extremely painful. The `FIELDWIDTHS' variable supplies an elegant
-solution to the problem of picking the input line apart by characters.
-
-
-File: gawk.info, Node: Egrep Program, Next: Id Program, Prev: Cut Program, Up: Clones
-
-14.2.2 Searching for Regular Expressions in Files
--------------------------------------------------
-
-The `egrep' utility searches files for patterns. It uses regular
-expressions that are almost identical to those available in `awk'
-(*note Regexp::). You invoke it as follows:
-
-     egrep [ OPTIONS ] 'PATTERN' FILES ...
-
- The PATTERN is a regular expression. In typical usage, the regular
-expression is quoted to prevent the shell from expanding any of the
-special characters as file name wildcards. Normally, `egrep' prints
-the lines that matched. If multiple file names are provided on the
-command line, each output line is preceded by the name of the file and
-a colon.
-
- The options to `egrep' are as follows:
-
-`-c'
- Print out a count of the lines that matched the pattern, instead
- of the lines themselves.
-
-`-s'
- Be silent. No output is produced and the exit value indicates
- whether the pattern was matched.
-
-`-v'
- Invert the sense of the test. `egrep' prints the lines that do
- _not_ match the pattern and exits successfully if the pattern is
- not matched.
-
-`-i'
- Ignore case distinctions in both the pattern and the input data.
-
-`-l'
- Only print (list) the names of the files that matched, not the
- lines that matched.
-
-`-e PATTERN'
- Use PATTERN as the regexp to match. The purpose of the `-e'
- option is to allow patterns that start with a `-'.
-
- This version uses the `getopt()' library function (*note Getopt
-Function::) and the file transition library program (*note Filetrans
-Function::).
-
- The program begins with a descriptive comment and then a `BEGIN' rule
-that processes the command-line arguments with `getopt()'. The `-i'
-(ignore case) option is particularly easy with `gawk'; we just use the
-`IGNORECASE' built-in variable (*note Built-in Variables::):
-
-     # egrep.awk --- simulate egrep in awk
-     #
-     # Options:
-     #    -c    count of lines
-     #    -s    silent - use exit value
-     #    -v    invert test, success if no match
-     #    -i    ignore case
-     #    -l    print filenames only
-     #    -e    argument is pattern
-     #
-     # Requires getopt and file transition library functions
-
-     BEGIN {
-         while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
-             if (c == "c")
-                 count_only++
-             else if (c == "s")
-                 no_print++
-             else if (c == "v")
-                 invert++
-             else if (c == "i")
-                 IGNORECASE = 1
-             else if (c == "l")
-                 filenames_only++
-             else if (c == "e")
-                 pattern = Optarg
-             else
-                 usage()
-         }
-
- Next comes the code that handles the `egrep'-specific behavior. If no
-pattern is supplied with `-e', the first nonoption on the command line
-is used. The `awk' command-line arguments up to `ARGV[Optind]' are
-cleared, so that `awk' won't try to process them as files. If no files
-are specified, the standard input is used, and if multiple files are
-specified, we make sure to note this so that the file names can precede
-the matched lines in the output:
-
-         if (pattern == "")
-             pattern = ARGV[Optind++]
-
-         for (i = 1; i < Optind; i++)
-             ARGV[i] = ""
-         if (Optind >= ARGC) {
-             ARGV[1] = "-"
-             ARGC = 2
-         } else if (ARGC - Optind > 1)
-             do_filenames++
-
-     #    if (IGNORECASE)
-     #        pattern = tolower(pattern)
-     }
-
- The last two lines are commented out, since they are not needed in
-`gawk'. They should be uncommented if you have to use another version
-of `awk'.
-
- The next set of lines should be uncommented if you are not using
-`gawk'. This rule translates all the characters in the input line into
-lowercase if the `-i' option is specified.(1) The rule is commented out
-since it is not necessary with `gawk':
-
-     #{
-     #    if (IGNORECASE)
-     #        $0 = tolower($0)
-     #}
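-
- Because that rule assigns to `$0', a matching line would be printed in
-lowercase. One possible workaround for other versions of `awk' is to
-test a lowercased copy of the record instead. The following minimal
-sketch shows the idea in isolation; the variable names are made up for
-the example:
-
-     BEGIN {
-         pattern = tolower("hel+o")     # lowercase the pattern once
-         line = "Say HELLO World"
-         if (tolower(line) ~ pattern)   # test a lowercased copy...
-             print line                 # ...but print the original
-     }
-
-This should print `Say HELLO World' with its original case intact.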
-
- The `beginfile()' function is called by the rule in `ftrans.awk'
-when each new file is processed. In this case, it is very simple; all
-it does is initialize a variable `fcount' to zero. `fcount' tracks how
-many lines in the current file matched the pattern. Naming the
-parameter `junk' shows we know that `beginfile()' is called with a
-parameter, but that we're not interested in its value:
-
-     function beginfile(junk)
-     {
-         fcount = 0
-     }
-
- The `endfile()' function is called after each file has been
-processed. It affects the output only when the user wants a count of
-the number of lines that matched. `no_print' is true only if the exit
-status is desired. `count_only' is true if line counts are desired.
-`egrep' therefore only prints line counts if printing and counting are
-enabled. The output format must be adjusted depending upon the number
-of files to process. Finally, `fcount' is added to `total', so that we
-know the total number of lines that matched the pattern:
-
-     function endfile(file)
-     {
-         if (! no_print && count_only) {
-             if (do_filenames)
-                 print file ":" fcount
-             else
-                 print fcount
-         }
-
-         total += fcount
-     }
-
- The following rule does most of the work of matching lines. The
-variable `matches' is true if the line matched the pattern. If the user
-wants lines that did not match, the sense of `matches' is inverted
-using the `!' operator. `fcount' is incremented with the value of
-`matches', which is either one or zero, depending upon a successful or
-unsuccessful match. If the line does not match, the `next' statement
-just moves on to the next record.
-
- A number of additional tests are made, but they are only done if we
-are not counting lines. First, if the user only wants exit status
-(`no_print' is true), then it is enough to know that _one_ line in this
-file matched, and we can skip on to the next file with `nextfile'.
-Similarly, if we are only printing file names, we can print the file
-name, and then skip to the next file with `nextfile'. Finally, each
-line is printed, with a leading file name and colon if necessary:
-
-     {
-         matches = ($0 ~ pattern)
-         if (invert)
-             matches = ! matches
-
-         fcount += matches    # 1 or 0
-
-         if (! matches)
-             next
-
-         if (! count_only) {
-             if (no_print)
-                 nextfile
-
-             if (filenames_only) {
-                 print FILENAME
-                 nextfile
-             }
-
-             if (do_filenames)
-                 print FILENAME ":" $0
-             else
-                 print
-         }
-     }
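-
- The rule relies on two facts. A comparison such as `$0 ~ pattern' has
-the numeric value one or zero, and `!' flips between the two. A small
-standalone sketch with made-up data shows both. Since `pattern' holds
-a string, it is used as a dynamic regexp:
-
-     BEGIN {
-         pattern = "^ab+c"
-         n = split("abbc xyz abc", w, " ")
-         for (i = 1; i <= n; i++) {
-             matches = (w[i] ~ pattern)    # 1 or 0
-             print w[i], matches, ! matches
-         }
-     }
-
-This should print `abbc 1 0', `xyz 0 1', and `abc 1 0' on separate
-lines.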
-
- The `END' rule takes care of producing the correct exit status. If
-there are no matches, the exit status is one; otherwise it is zero:
-
-     END \
-     {
-         if (total == 0)
-             exit 1
-         exit 0
-     }
-
- The `usage()' function prints a usage message in case of invalid
-options, and then exits:
-
-     function usage(    e)
-     {
-         e = "Usage: egrep [-csvil] [-e pat] [files ...]"
-         e = e "\n\tegrep [-csvil] pat [files ...]"
-         print e > "/dev/stderr"
-         exit 1
-     }
-
- The variable `e' is used so that the function fits nicely on the
-printed page.
-
- Just a note on programming style: you may have noticed that the `END'
-rule uses backslash continuation, with the open brace on a line by
-itself. This is so that it more closely resembles the way functions
-are written. Many of the examples in this major node use this style.
-You can decide for yourself if you like writing your `BEGIN' and `END'
-rules this way or not.
-
- ---------- Footnotes ----------
-
- (1) It also introduces a subtle bug; if a match happens, we output
-the translated line, not the original.
-
-
-File: gawk.info, Node: Id Program, Next: Split Program, Prev: Egrep Program, Up: Clones
-
-14.2.3 Printing out User Information
-------------------------------------
-
-The `id' utility lists a user's real and effective user ID numbers,
-real and effective group ID numbers, and the user's group set, if any.
-`id' only prints the effective user ID and group ID if they are
-different from the real ones. If possible, `id' also supplies the
-corresponding user and group names. The output might look like this:
-
-     $ id
-     -| uid=500(arnold) gid=500(arnold) groups=6(disk),7(lp),19(floppy)
-
- This information is part of what is provided by `gawk''s `PROCINFO'
-array (*note Built-in Variables::). However, the `id' utility provides
-a more palatable output than just individual numbers.
-
- Here is a simple version of `id' written in `awk'. It uses the user
-database library functions (*note Passwd Functions::) and the group
-database library functions (*note Group Functions::).
-
- The program is fairly straightforward. All the work is done in the
-`BEGIN' rule. The user and group ID numbers are obtained from
-`PROCINFO'. The code is repetitive. The entry in the user database
-for the real user ID number is split into parts at the `:'. The name is
-the first field. Similar code is used for the effective user ID number
-and the group numbers:
-
-     # id.awk --- implement id in awk
-     #
-     # Requires user and group library functions
-     # output is:
-     # uid=12(foo) euid=34(bar) gid=3(baz) \
-     #             egid=5(blat) groups=9(nine),2(two),1(one)
-
-     BEGIN \
-     {
-         uid = PROCINFO["uid"]
-         euid = PROCINFO["euid"]
-         gid = PROCINFO["gid"]
-         egid = PROCINFO["egid"]
-
-         printf("uid=%d", uid)
-         pw = getpwuid(uid)
-         if (pw != "") {
-             split(pw, a, ":")
-             printf("(%s)", a[1])
-         }
-
-         if (euid != uid) {
-             printf(" euid=%d", euid)
-             pw = getpwuid(euid)
-             if (pw != "") {
-                 split(pw, a, ":")
-                 printf("(%s)", a[1])
-             }
-         }
-
-         printf(" gid=%d", gid)
-         pw = getgrgid(gid)
-         if (pw != "") {
-             split(pw, a, ":")
-             printf("(%s)", a[1])
-         }
-
-         if (egid != gid) {
-             printf(" egid=%d", egid)
-             pw = getgrgid(egid)
-             if (pw != "") {
-                 split(pw, a, ":")
-                 printf("(%s)", a[1])
-             }
-         }
-
-         for (i = 1; ("group" i) in PROCINFO; i++) {
-             if (i == 1)
-                 printf(" groups=")
-             group = PROCINFO["group" i]
-             printf("%d", group)
-             pw = getgrgid(group)
-             if (pw != "") {
-                 split(pw, a, ":")
-                 printf("(%s)", a[1])
-             }
-             if (("group" (i+1)) in PROCINFO)
-                 printf(",")
-         }
-
-         print ""
-     }
-
- The test in the `for' loop is worth noting. Any supplementary
-groups in the `PROCINFO' array have the indices `"group1"' through
-`"groupN"' for some N, i.e., the total number of supplementary groups.
-However, we don't know in advance how many of these groups there are.
-
- This loop works by starting at one, concatenating the value with
-`"group"', and then using `in' to see if that value is in the array.
-Eventually, `i' is incremented past the last group in the array and the
-loop exits.
-
- The loop is also correct if there are _no_ supplementary groups;
-then the condition is false the first time it's tested, and the loop
-body never executes.
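-
- The same "probe with `in'" technique works for any array whose
-indices are a fixed prefix plus a counter. The following sketch uses a
-made-up array `info' in place of `PROCINFO', so it runs in any `awk'.
-It also shows a variation on comma placement: it prints a separator
-before every element except the first, instead of looking ahead to
-the next index:
-
-     BEGIN {
-         info["group1"] = 6
-         info["group2"] = 7
-         info["group3"] = 19
-
-         sep = ""
-         for (i = 1; ("group" i) in info; i++) {
-             printf("%s%d", sep, info["group" i])
-             sep = ","    # every later element gets a leading comma
-         }
-         print ""
-     }
-
-This should print `6,7,19'.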
-
-
-File: gawk.info, Node: Split Program, Next: Tee Program, Prev: Id Program, Up: Clones
-
-14.2.4 Splitting a Large File into Pieces
------------------------------------------
-
-The `split' program splits large text files into smaller pieces. Usage
-is as follows:(1)
-
-     split [-COUNT] file [ PREFIX ]
-
- By default, the output files are named `xaa', `xab', and so on. Each
-file has 1000 lines in it, with the likely exception of the last file.
-To change the number of lines in each file, supply a number on the
-command line preceded with a minus; e.g., `-500' for files with 500
-lines in them instead of 1000. To change the name of the output files
-to something like `myfileaa', `myfileab', and so on, supply an
-additional argument that specifies the file name prefix.
-
- Here is a version of `split' in `awk'. It uses the `ord()' and
-`chr()' functions presented in *note Ordinal Functions::.
-
- The program first sets its defaults, and then tests to make sure
-there are not too many arguments. It then looks at each argument in
-turn. The first argument could be a minus sign followed by a number.
-If it is, this happens to look like a negative number, so it is made
-positive, and that is the count of lines. The data file name is
-skipped over and the final argument is used as the prefix for the
-output file names:
-
-     # split.awk --- do split in awk
-     #
-     # Requires ord() and chr() library functions
-     # usage: split [-num] [file] [outname]
-
-     BEGIN {
-         outfile = "x"    # default
-         count = 1000
-         if (ARGC > 4)
-             usage()
-
-         i = 1
-         if (ARGV[i] ~ /^-[[:digit:]]+$/) {
-             count = -ARGV[i]
-             ARGV[i] = ""
-             i++
-         }
-         # test argv in case reading from stdin instead of file
-         if (i in ARGV)
-             i++    # skip data file name
-         if (i in ARGV) {
-             outfile = ARGV[i]
-             ARGV[i] = ""
-         }
-
-         s1 = s2 = "a"
-         out = (outfile s1 s2)
-     }
-
- The next rule does most of the work. `tcount' (temporary count)
-tracks how many lines have been printed to the output file so far. If
-it is greater than `count', it is time to close the current file and
-start a new one. `s1' and `s2' track the current suffixes for the file
-name. If they are both `z', the file is just too big. Otherwise, if
-`s2' is `z', `s1' moves to the next letter in the alphabet and `s2'
-starts over again at `a'; if not, only `s2' advances:
-
-     {
-         if (++tcount > count) {
-             close(out)
-             if (s2 == "z") {
-                 if (s1 == "z") {
-                     printf("split: %s is too large to split\n",
-                            FILENAME) > "/dev/stderr"
-                     exit 1
-                 }
-                 s1 = chr(ord(s1) + 1)
-                 s2 = "a"
-             }
-             else
-                 s2 = chr(ord(s2) + 1)
-             out = (outfile s1 s2)
-             tcount = 1
-         }
-         print > out
-     }
-
-The `usage()' function simply prints an error message and exits:
-
-     function usage(    e)
-     {
-         e = "usage: split [-num] [file] [outname]"
-         print e > "/dev/stderr"
-         exit 1
-     }
-
-The variable `e' is used so that the function fits nicely on the screen.
-
- This program is a bit sloppy; it relies on `awk' to automatically
-close the last file instead of doing it in an `END' rule. It also
-assumes that letters are contiguous in the character set, which isn't
-true for EBCDIC systems.
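-
- Here is a sketch of one way to remove the character-set assumption.
-Instead of using `ord()' and `chr()', it finds the next suffix letter
-by its position in an explicit string of letters. The function name
-`next_letter()' is hypothetical:
-
-     # Return the letter after c, or "" if c is "z".
-     function next_letter(c,    letters)
-     {
-         letters = "abcdefghijklmnopqrstuvwxyz"
-         return substr(letters, index(letters, c) + 1, 1)
-     }
-
-     BEGIN {
-         print next_letter("a"), next_letter("y")
-         print (next_letter("z") == "")    # nothing follows z
-     }
-
-This should print `b z' and then `1'. Similarly, adding the rule
-`END { close(out) }' would close the last file explicitly.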
-
- ---------- Footnotes ----------
-
- (1) This is the traditional usage. The POSIX usage is different, but
-not relevant for what the program aims to demonstrate.
-
-
-File: gawk.info, Node: Tee Program, Next: Uniq Program, Prev: Split Program, Up: Clones
-
-14.2.5 Duplicating Output into Multiple Files
----------------------------------------------
-
-The `tee' program is known as a "pipe fitting." `tee' copies its
-standard input to its standard output and also duplicates it to the
-files named on the command line. Its usage is as follows:
-
-     tee [-a] file ...
-
- The `-a' option tells `tee' to append to the named files, instead of
-truncating them and starting over.
-
- The `BEGIN' rule first makes a copy of all the command-line arguments
-into an array named `copy'. `ARGV[0]' is not copied, since it is not
-needed. `tee' cannot use `ARGV' directly, since `awk' attempts to
-process each file name in `ARGV' as input data.
-
- If the first argument is `-a', then the flag variable `append' is
-set to true, and both `ARGV[1]' and `copy[1]' are deleted. If `ARGC' is
-less than two, then no file names were supplied and `tee' prints a
-usage message and exits. Finally, `awk' is forced to read the standard
-input by setting `ARGV[1]' to `"-"' and `ARGC' to two:
-
-     # tee.awk --- tee in awk
-     #
-     # Copy standard input to all named output files.
-     # Append content if -a option is supplied.
-     #
-     BEGIN \
-     {
-         for (i = 1; i < ARGC; i++)
-             copy[i] = ARGV[i]
-
-         if (ARGV[1] == "-a") {
-             append = 1
-             delete ARGV[1]
-             delete copy[1]
-             ARGC--
-         }
-         if (ARGC < 2) {
-             print "usage: tee [-a] file ..." > "/dev/stderr"
-             exit 1
-         }
-         ARGV[1] = "-"
-         ARGC = 2
-     }
-
- The following single rule does all the work. Since there is no
-pattern, it is executed for each line of input. The body of the rule
-simply prints the line into each file on the command line, and then to
-the standard output:
-
-     {
-         # moving the if outside the loop makes it run faster
-         if (append)
-             for (i in copy)
-                 print >> copy[i]
-         else
-             for (i in copy)
-                 print > copy[i]
-         print
-     }
-
-It is also possible to write the loop this way:
-
-     for (i in copy)
-         if (append)
-             print >> copy[i]
-         else
-             print > copy[i]
-
-This is more concise but it is also less efficient. The `if' is tested
-for each record and for each output file. By duplicating the loop
-body, the `if' is only tested once for each input record. If there are
-N input records and M output files, the first method only executes N
-`if' statements, while the second executes N`*'M `if' statements.
-
- Finally, the `END' rule cleans up by closing all the output files:
-
-     END \
-     {
-         for (i in copy)
-             close(copy[i])
-     }
-
-
-File: gawk.info, Node: Uniq Program, Next: Wc Program, Prev: Tee Program, Up: Clones
-
-14.2.6 Printing Nonduplicated Lines of Text
--------------------------------------------
-
-The `uniq' utility reads sorted lines of data on its standard input,
-and by default removes duplicate lines. In other words, it only prints
-unique lines--hence the name. `uniq' has a number of options. The
-usage is as follows:
-
-     uniq [-udc [-N]] [+N] [ INPUT FILE [ OUTPUT FILE ]]
-
- The options for `uniq' are:
-
-`-d'
- Print only repeated lines.
-
-`-u'
- Print only nonrepeated lines.
-
-`-c'
- Count lines. This option overrides `-d' and `-u'. Both repeated
- and nonrepeated lines are counted.
-
-`-N'
- Skip N fields before comparing lines. The definition of fields is
- similar to `awk''s default: nonwhitespace characters separated by
- runs of spaces and/or TABs.
-
-`+N'
- Skip N characters before comparing lines. Any fields specified
- with `-N' are skipped first.
-
-`INPUT FILE'
- Data is read from the input file named on the command line,
- instead of from the standard input.
-
-`OUTPUT FILE'
- The generated output is sent to the named output file, instead of
- to the standard output.
-
- Normally `uniq' behaves as if both the `-d' and `-u' options are
-provided.
-
- `uniq' uses the `getopt()' library function (*note Getopt Function::)
-and the `join()' library function (*note Join Function::).
-
- The program begins with a `usage()' function and then a brief
-outline of the options and their meanings in comments. The `BEGIN'
-rule deals with the command-line arguments and options. It uses a trick
-to get `getopt()' to handle options of the form `-25', treating such an
-option as the option letter `2' with an argument of `5'. If indeed two
-or more digits are supplied (`Optarg' looks like a number), `Optarg' is
-concatenated with the option digit and then the result is added to zero
-to make it into a number. If there is only one digit in the option,
-then `Optarg' is not needed. In this case, `Optind' must be decremented
-so that `getopt()' processes it next time. This code is admittedly a
-bit tricky.
-
- If no options are supplied, then the default is taken, to print both
-repeated and nonrepeated lines. The output file, if provided, is
-assigned to `outputfile'. Early on, `outputfile' is initialized to the
-standard output, `/dev/stdout':
-
-     # uniq.awk --- do uniq in awk
-     #
-     # Requires getopt() and join() library functions
-
-     function usage(    e)
-     {
-         e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
-         print e > "/dev/stderr"
-         exit 1
-     }
-
-     # -c    count lines. overrides -d and -u
-     # -d    only repeated lines
-     # -u    only nonrepeated lines
-     # -n    skip n fields
-     # +n    skip n characters, skip fields first
-
-     BEGIN \
-     {
-         count = 1
-         outputfile = "/dev/stdout"
-         opts = "udc0:1:2:3:4:5:6:7:8:9:"
-         while ((c = getopt(ARGC, ARGV, opts)) != -1) {
-             if (c == "u")
-                 non_repeated_only++
-             else if (c == "d")
-                 repeated_only++
-             else if (c == "c")
-                 do_count++
-             else if (index("0123456789", c) != 0) {
-                 # getopt requires args to options
-                 # this messes us up for things like -5
-                 if (Optarg ~ /^[[:digit:]]+$/)
-                     fcount = (c Optarg) + 0
-                 else {
-                     fcount = c + 0
-                     Optind--
-                 }
-             } else
-                 usage()
-         }
-
-         if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) {
-             charcount = substr(ARGV[Optind], 2) + 0
-             Optind++
-         }
-
-         for (i = 1; i < Optind; i++)
-             ARGV[i] = ""
-
-         if (repeated_only == 0 && non_repeated_only == 0)
-             repeated_only = non_repeated_only = 1
-
-         if (ARGC - Optind == 2) {
-             outputfile = ARGV[ARGC - 1]
-             ARGV[ARGC - 1] = ""
-         }
-     }
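-
- The handling of options such as `-25' in that `BEGIN' rule can be
-isolated in a standalone sketch. The values of `c' and `Optarg' are
-made up to stand in for what `getopt()' would return for `-25':
-
-     BEGIN {
-         c = "2"         # the option "letter", here a digit
-         Optarg = "5"    # the rest of the option text
-         if (Optarg ~ /^[0-9]+$/)
-             fcount = (c Optarg) + 0    # "2" "5" becomes 25
-         else
-             fcount = c + 0
-         print fcount
-     }
-
-This should print `25'.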
-
- The following function, `are_equal()', compares the current line,
-`$0', to the previous line, `last'. It handles skipping fields and
-characters. If no field count and no character count are specified,
-`are_equal()' simply returns one or zero depending upon the result of a
-simple string comparison of `last' and `$0'. Otherwise, things get more
-complicated. If fields have to be skipped, each line is broken into an
-array using `split()' (*note String Functions::); the desired fields
-are then joined back into a line using `join()'. The joined lines are
-stored in `clast' and `cline'. If no fields are skipped, `clast' and
-`cline' are set to `last' and `$0', respectively. Finally, if
-characters are skipped, `substr()' is used to strip off the leading
-`charcount' characters in `clast' and `cline'. The two strings are
-then compared and `are_equal()' returns the result:
-
- function are_equal( n, m, clast, cline, alast, aline)
- {
- if (fcount == 0 && charcount == 0)
- return (last == $0)
-
- if (fcount > 0) {
- n = split(last, alast)
- m = split($0, aline)
- clast = join(alast, fcount+1, n)
- cline = join(aline, fcount+1, m)
- } else {
- clast = last
- cline = $0
- }
- if (charcount) {
- clast = substr(clast, charcount + 1)
- cline = substr(cline, charcount + 1)
- }
-
- return (clast == cline)
- }
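To make the field and character skipping concrete, here is a minimal standalone sketch that prints, for each input line, the string `are_equal()' would compare. The shell function name `uniq_key' is hypothetical. It rebuilds the remaining fields with single spaces, which is what `join()' does when it is called without a separator, as in the code above.

```shell
#!/bin/sh
# uniq_key FCOUNT CHARCOUNT: hypothetical helper, for illustration only.
# Prints the string that are_equal() would compare for each input line:
# skip FCOUNT fields, rejoin the rest with single spaces, then drop the
# first CHARCOUNT characters.
uniq_key() {
    awk -v fcount="$1" -v charcount="$2" '
    {
        key = $0
        if (fcount > 0) {
            n = split($0, f)          # split on default whitespace
            key = ""
            sep = ""
            for (i = fcount + 1; i <= n; i++) {
                key = key sep f[i]    # same effect as join(f, fcount+1, n)
                sep = " "
            }
        }
        if (charcount > 0)
            key = substr(key, charcount + 1)
        print key
    }'
}
```

For example, with FCOUNT set to one, `1 apple' and `2  apple' both produce the key `apple', so `uniq -1' would treat them as duplicates.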
-
- The following two rules are the body of the program. The first one
-is executed only for the very first line of data. It sets `last' equal
-to `$0', so that subsequent lines of text have something to be compared
-to.
-
- The second rule does the work. The variable `equal' is one or zero,
-depending upon the results of `are_equal()''s comparison. If `uniq' is
-counting repeated lines (the `-c' option), and the lines are equal, then
-it increments the `count' variable. Otherwise, it prints the previous
-line, `last', preceded by its count, and resets `count', since the two
-lines are not equal.
-
- If `uniq' is not counting, and if the lines are equal, `count' is
-incremented. Nothing is printed, since the point is to remove
-duplicates. Otherwise, the previous line is printed if `repeated_only'
-is set (`-d') and more than one copy was seen, or if
-`non_repeated_only' is set (`-u') and only one copy was seen; in either
-case, `last' and `count' are then reset. (When neither `-d' nor `-u' is
-given, the `BEGIN' rule sets both variables, so every distinct line is
-printed.)
-
- Finally, similar logic is used in the `END' rule to print the final
-line of input data:
-
- NR == 1 {
- last = $0
- next
- }
-
- {
- equal = are_equal()
-
- if (do_count) { # overrides -d and -u
- if (equal)
- count++
- else {
- printf("%4d %s\n", count, last) > outputfile
- last = $0
- count = 1 # reset
- }
- next
- }
-
- if (equal)
- count++
- else {
- if ((repeated_only && count > 1) ||
- (non_repeated_only && count == 1))
- print last > outputfile
- last = $0
- count = 1
- }
- }
-
- END {
- if (do_count)
- printf("%4d %s\n", count, last) > outputfile
- else if ((repeated_only && count > 1) ||
- (non_repeated_only && count == 1))
- print last > outputfile
- close(outputfile)
- }
-
-
-File: gawk.info, Node: Wc Program, Prev: Uniq Program, Up: Clones
-
-14.2.7 Counting Things
-----------------------
-
-The `wc' (word count) utility counts lines, words, and characters in
-one or more input files. Its usage is as follows:
-
- wc [-lwc] [ FILES ... ]
-
- If no files are specified on the command line, `wc' reads its
-standard input. If there are multiple files, it also prints total
-counts for all the files. The options and their meanings are shown in
-the following list:
-
-`-l'
- Count only lines.
-
-`-w'
- Count only words. A "word" is a contiguous sequence of
- nonwhitespace characters, separated by spaces and/or TABs.
- Luckily, this is the normal way `awk' separates fields in its
- input data.
-
-`-c'
- Count only characters.
-
- Implementing `wc' in `awk' is particularly elegant, since `awk' does
-a lot of the work for us; it splits lines into words (i.e., fields) and
-counts them, it counts lines (i.e., records), and it can easily tell us
-how long a line is.
-
- This program uses the `getopt()' library function (*note Getopt
-Function::) and the file-transition functions (*note Filetrans
-Function::).
-
- This version has one notable difference from traditional versions of
-`wc': it always prints the counts in the order lines, words, and
-characters. Traditional versions note the order of the `-l', `-w', and
-`-c' options on the command line, and print the counts in that order.
-
- The `BEGIN' rule does the argument processing. The variable
-`print_total' is true if more than one file is named on the command
-line:
-
- # wc.awk --- count lines, words, characters
-
- # Options:
- # -l only count lines
- # -w only count words
- # -c only count characters
- #
- # Default is to count lines, words, characters
- #
- # Requires getopt() and file transition library functions
-
- BEGIN {
- # let getopt() print a message about
- # invalid options. we ignore them
- while ((c = getopt(ARGC, ARGV, "lwc")) != -1) {
- if (c == "l")
- do_lines = 1
- else if (c == "w")
- do_words = 1
- else if (c == "c")
- do_chars = 1
- }
- for (i = 1; i < Optind; i++)
- ARGV[i] = ""
-
- # if no options, do all
- if (! do_lines && ! do_words && ! do_chars)
- do_lines = do_words = do_chars = 1
-
- print_total = (ARGC - i > 1)
- }
-
- The `beginfile()' function is simple; it just resets the counts of
-lines, words, and characters to zero, and saves the current file name in
-`fname':
-
- function beginfile(file)
- {
- lines = words = chars = 0
- fname = FILENAME
- }
-
- The `endfile()' function adds the current file's numbers to the
-running totals of lines, words, and characters.(1) It then prints out
-those numbers for the file that was just read. It relies on
-`beginfile()' to reset the numbers for the following data file:
-
- function endfile(file)
- {
- tlines += lines
- twords += words
- tchars += chars
- if (do_lines)
- printf "\t%d", lines
- if (do_words)
- printf "\t%d", words
- if (do_chars)
- printf "\t%d", chars
- printf "\t%s\n", fname
- }
-
- There is one rule that is executed for each line. It adds the length
-of the record, plus one, to `chars'.(2) Adding one to the record
-length is needed because the newline character separating records (the
-value of `RS') is not part of the record itself, and thus not included
-in its length. Next, `lines' is incremented for each line read, and
-`words' is incremented by the value of `NF', which is the number of
-"words" on this line:
-
- # do per line
- {
- chars += length($0) + 1 # get newline
- lines++
- words += NF
- }
-
- Finally, the `END' rule simply prints the totals for all the files:
-
- END {
- if (print_total) {
- if (do_lines)
- printf "\t%d", tlines
- if (do_words)
- printf "\t%d", twords
- if (do_chars)
- printf "\t%d", tchars
- print "\ttotal"
- }
- }
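If the file-transition library is not available, a rough equivalent can be sketched by noticing the start of each file with `FNR == 1'. This is a minimal sketch, not the library-based program above: the shell function name `wc_awk' is hypothetical, it always prints all three counts, and because an empty file never produces a record with `FNR == 1', empty files are silently skipped. Like the program above, it adds one per record for the newline, so a final line without a trailing newline is overcounted by one character.

```shell
#!/bin/sh
# wc_awk FILE...: hypothetical helper, for illustration only.
# Per-file and total line/word/character counts without the
# beginfile()/endfile() library; a file boundary is detected by FNR == 1.
wc_awk() {
    awk '
    # print the counts for the file just finished and add them to the totals
    function report() {
        printf "\t%d\t%d\t%d\t%s\n", lines, words, chars, fname
        tlines += lines; twords += words; tchars += chars
    }
    FNR == 1 {                   # first record of a new (nonempty) file
        if (nfiles++ > 0)
            report()
        lines = words = chars = 0
        fname = FILENAME
    }
    {
        chars += length($0) + 1  # + 1 for the newline that ended the record
        lines++
        words += NF
    }
    END {
        if (nfiles > 0)
            report()
        if (nfiles > 1)
            printf "\t%d\t%d\t%d\ttotal\n", tlines, twords, tchars
    }' "$@"
}
```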
-
- ---------- Footnotes ----------
-
- (1) `wc' can't just use the value of `FNR' in `endfile()'. If you
-examine the code in *note Filetrans Function::, you will see that `FNR'
-has already been reset by the time `endfile()' is called.
-
- (2) Since `gawk' understands multibyte locales, this code counts
-characters, not bytes.
-
-
-File: gawk.info, Node: Miscellaneous Programs, Prev: Clones, Up: Sample Programs
-
-14.3 A Grab Bag of `awk' Programs
-=================================
-
-This minor node is a large "grab bag" of miscellaneous programs. We
-hope you find them both interesting and enjoyable.
-
-* Menu:
-
-* Dupword Program:: Finding duplicated words in a document.
-* Alarm Program:: An alarm clock.
-* Translate Program:: A program similar to the `tr' utility.
-* Labels Program:: Printing mailing labels.
-* Word Sorting:: A program to produce a word usage count.
-* History Sorting:: Eliminating duplicate entries from a history
- file.
-* Extract Program:: Pulling out programs from Texinfo source
- files.
-* Simple Sed:: A Simple Stream Editor.
-* Igawk Program:: A wrapper for `awk' that includes
- files.
-* Anagram Program:: Finding anagrams from a dictionary.
-* Signature Program:: People do amazing things with too much time on
- their hands.
-
-
-File: gawk.info, Node: Dupword Program, Next: Alarm Program, Up: Miscellaneous Programs
-
-14.3.1 Finding Duplicated Words in a Document
----------------------------------------------
-
-A common error when writing large amounts of prose is to accidentally
-duplicate words. Typically you will see this in text as something like
-"the the program does the following..." When the text is online, often
-the duplicated words occur at the end of one line and the beginning of
-another, making them very difficult to spot.
-
- This program, `dupword.awk', scans through a file one line at a time
-and looks for adjacent occurrences of the same word. It also saves the
-last word on a line (in the variable `prev') for comparison with the
-first word on the next line.
-
- The first statement makes sure that the line is all lowercase, so
-that, for example, "The" and "the" compare equal to each other. The
-next statement replaces nonalphanumeric and nonwhitespace characters
-with spaces, so that punctuation does not affect the comparison either.
-The characters are replaced with spaces so that formatting controls
-don't create nonsense words (e.g., the Texinfo `@code{NF}' becomes
-`codeNF' if punctuation is simply deleted). The record is then resplit
-into fields, yielding just the actual words on the line, and ensuring
-that there are no empty fields.
-
- If there are no fields left after removing all the punctuation, the
-current record is skipped. Otherwise, the program loops through each
-word, comparing it to the previous one:
-
- # dupword.awk --- find duplicate words in text
- {
- $0 = tolower($0)
- gsub(/[^[:alnum:][:blank:]]/, " ");
- $0 = $0 # re-split
- if (NF == 0)
- next
- if ($1 == prev)
- printf("%s:%d: duplicate %s\n",
- FILENAME, FNR, $1)
- for (i = 2; i <= NF; i++)
- if ($i == $(i-1))
- printf("%s:%d: duplicate %s\n",
- FILENAME, FNR, $i)
- prev = $NF
- }
-
-
-File: gawk.info, Node: Alarm Program, Next: Translate Program, Prev: Dupword Program, Up: Miscellaneous Programs
-
-14.3.2 An Alarm Clock Program
------------------------------
-
- Nothing cures insomnia like a ringing alarm clock.
- Arnold Robbins
-
- The following program is a simple "alarm clock" program. You give
-it a time of day and an optional message. At the specified time, it
-prints the message on the standard output. In addition, you can give it
-the number of times to repeat the message as well as a delay between
-repetitions.
-
- This program uses the `gettimeofday()' function from *note
-Gettimeofday Function::.
-
- All the work is done in the `BEGIN' rule. The first part is argument
-checking and setting of defaults: the delay, the count, and the message
-to print. If the user supplied a message without the ASCII BEL
-character (known as the "alert" character, `"\a"'), then it is added to
-the message. (On many systems, printing the ASCII BEL generates an
-audible alert. Thus when the alarm goes off, the system calls attention
-to itself in case the user is not looking at the computer.) Just for a
-change, this program uses a `switch' statement (*note Switch
-Statement::), but the processing could be done with a series of
-`if'-`else' statements instead. Here is the program:
-
- # alarm.awk --- set an alarm
- #
- # Requires gettimeofday() library function
- # usage: alarm time [ "message" [ count [ delay ] ] ]
-
- BEGIN \
- {
- # Initial argument sanity checking
- usage1 = "usage: alarm time ['message' [count [delay]]]"
- usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
-
- if (ARGC < 2) {
- print usage1 > "/dev/stderr"
- print usage2 > "/dev/stderr"
- exit 1
- }
- if (ARGV[1] !~ /^[[:digit:]]?[[:digit:]]:[[:digit:]]{2}$/) {
- print usage1 > "/dev/stderr"
- print usage2 > "/dev/stderr"
- exit 1
- }
- switch (ARGC) {
- case 5:
- delay = ARGV[4] + 0
- # fall through
- case 4:
- count = ARGV[3] + 0
- # fall through
- case 3:
- message = ARGV[2]
- break
- }
-
- # set defaults for once we reach the desired time
- if (delay == 0)
- delay = 180 # 3 minutes
- if (count == 0)
- count = 5
- if (message == "")
- message = sprintf("\aIt is now %s!\a", ARGV[1])
- else if (index(message, "\a") == 0)
- message = "\a" message "\a"
-
- The next section of code turns the alarm time into hours and
-minutes, converts it (if necessary) to a 24-hour clock, and then turns
-that time into a count of the seconds since midnight. Next it turns
-the current time into a count of seconds since midnight. The
-difference between the two is how long to wait before setting off the
-alarm:
-
- # split up alarm time
- split(ARGV[1], atime, ":")
- hour = atime[1] + 0 # force numeric
- minute = atime[2] + 0 # force numeric
-
- # get current broken down time
- gettimeofday(now)
-
- # if the time given is before noon but that hour has
- # already passed, e.g., `alarm 5:30' at 9 a.m., treat it
- # as a 12-hour time (5:30 p.m.) and add 12 to the hour
- if (hour < 12 && now["hour"] > hour)
- hour += 12
-
- # set target time in seconds since midnight
- target = (hour * 60 * 60) + (minute * 60)
-
- # get current time in seconds since midnight
- current = (now["hour"] * 60 * 60) + \
- (now["minute"] * 60) + now["second"]
-
- # how long to sleep for
- naptime = target - current
- if (naptime <= 0) {
- print "time is in the past!" > "/dev/stderr"
- exit 1
- }
-
- Finally, the program uses the `system()' function (*note I/O
-Functions::) to call the `sleep' utility. The `sleep' utility simply
-pauses for the given number of seconds. If the exit status is not zero,
-the program assumes that `sleep' was interrupted and exits. If `sleep'
-exited with an OK status (zero), then the program prints the message in
-a loop, again using `sleep' to delay for however many seconds are
-necessary:
-
- # zzzzzz..... go away if interrupted
- if (system(sprintf("sleep %d", naptime)) != 0)
- exit 1
-
- # time to notify!
- command = sprintf("sleep %d", delay)
- for (i = 1; i <= count; i++) {
- print message
- # if sleep command interrupted, go away
- if (system(command) != 0)
- break
- }
-
- exit 0
- }
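The time arithmetic in `alarm.awk' can be checked without waiting for the clock. In this minimal sketch the shell function name `naptime' is hypothetical, and the current hour, minute, and second are passed in as arguments instead of coming from `gettimeofday()'; the 12-hour adjustment and the seconds-since-midnight computation are the same as in the program.

```shell
#!/bin/sh
# naptime HH:MM NOW_H NOW_M NOW_S: hypothetical helper, for illustration only.
# Prints how many seconds to sleep, using the same rules as alarm.awk,
# but with the current time passed in rather than read from the clock.
naptime() {
    awk -v alarm="$1" -v nh="$2" -v nm="$3" -v ns="$4" 'BEGIN {
        split(alarm, atime, ":")
        hour = atime[1] + 0            # force numeric
        minute = atime[2] + 0
        now_h = nh + 0; now_m = nm + 0; now_s = ns + 0
        # same rule as alarm.awk: 5:30 asked for at 9 a.m. means 5:30 p.m.
        if (hour < 12 && now_h > hour)
            hour += 12
        target = (hour * 60 * 60) + (minute * 60)
        current = (now_h * 60 * 60) + (now_m * 60) + now_s
        print target - current         # <= 0 means the time is in the past
    }'
}
```

For example, `naptime 5:30 9 0 0' prints 30600 (eight and a half hours), while `naptime 8:00 8 30 0' prints -1800, which `alarm.awk' would reject as a time in the past.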
-
-
-File: gawk.info, Node: Translate Program, Next: Labels Program, Prev: Alarm Program, Up: Miscellaneous Programs
-
-14.3.3 Transliterating Characters
----------------------------------
-
-The system `tr' utility transliterates characters. For example, it is
-often used to map uppercase letters into lowercase for further
-processing:
-
- GENERATE DATA | tr 'A-Z' 'a-z' | PROCESS DATA ...
-
- `tr' requires two lists of characters.(1) When processing the
-input, the first character in the first list is replaced with the first
-character in the second list, the second character in the first list is
-replaced with the second character in the second list, and so on. If
-there are more characters in the "from" list than in the "to" list, the
-last character of the "to" list is used for the remaining characters in
-the "from" list.
-
- Some time ago, a user proposed that a transliteration function should
-be added to `gawk'. The following program was written to prove that
-character transliteration could be done with a user-level function.
-This program is not as complete as the system `tr' utility, but it does
-most of the job.
-
- The `translate' program demonstrates one of the few weaknesses of
-standard `awk': dealing with individual characters is very painful,
-requiring repeated use of the `substr()', `index()', and `gsub()'
-built-in functions (*note String Functions::).(2) There are two
-functions. The first, `stranslate()', takes three arguments:
-
-`from'
- A list of characters from which to translate.
-
-`to'
- A list of characters to which to translate.
-
-`target'
- The string on which to do the translation.
-
- Associative arrays make the translation part fairly easy. `t_ar'
-holds the "to" characters, indexed by the "from" characters. Then a
-simple loop goes through `target', one character at a time. Each
-character that is an index in `t_ar' is replaced with the corresponding
-"to" character; all other characters are copied unchanged. The
-characters are accumulated in `result', which is returned.
-
- The `translate()' function simply calls `stranslate()' using `$0' as
-the target. The main program sets two global variables, `FROM' and
-`TO', from the command line, and then changes `ARGV' so that `awk'
-reads from the standard input.
-
- Finally, the processing rule simply calls `translate()' for each
-record:
-
- # translate.awk --- do tr-like stuff
- # Bugs: does not handle things like: tr A-Z a-z, it has
- # to be spelled out. However, if `to' is shorter than `from',
- # the last character in `to' is used for the rest of `from'.
-
- function stranslate(from, to, target, lf, lt, ltarget, t_ar, i, c,
- result)
- {
- lf = length(from)
- lt = length(to)
- ltarget = length(target)
- for (i = 1; i <= lt; i++)
- t_ar[substr(from, i, 1)] = substr(to, i, 1)
- if (lt < lf)
- for (; i <= lf; i++)
- t_ar[substr(from, i, 1)] = substr(to, lt, 1)
- for (i = 1; i <= ltarget; i++) {
- c = substr(target, i, 1)
- if (c in t_ar)
- c = t_ar[c]
- result = result c
- }
- return result
- }
-
- function translate(from, to)
- {
- return $0 = stranslate(from, to, $0)
- }
-
- # main program
- BEGIN {
- if (ARGC < 3) {
- print "usage: translate from to" > "/dev/stderr"
- exit
- }
- FROM = ARGV[1]
- TO = ARGV[2]
- ARGC = 2
- ARGV[1] = "-"
- }
-
- {
- translate(FROM, TO)
- print
- }
-
- While it is possible to do character transliteration in a user-level
-function, it is not necessarily efficient, and we (the `gawk' authors)
-started to consider adding a built-in function. However, shortly after
-writing this program, we learned that the System V Release 4 `awk' had
-added the `toupper()' and `tolower()' functions (*note String
-Functions::). These functions handle the vast majority of the cases
-where character transliteration is necessary, and so we chose to simply
-add those functions to `gawk' as well and then leave well enough alone.
-
- An obvious improvement to this program would be to set up the `t_ar'
-array only once, in a `BEGIN' rule. However, this assumes that the
-"from" and "to" lists will never change throughout the lifetime of the
-program.
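A minimal sketch of that improvement follows, assuming, as the text says, that the lists never change while the program runs: `t_ar' is filled once in a `BEGIN' rule, and each record is then translated with one table lookup per character. The shell function name `tr_awk' is hypothetical, and like `translate.awk' it does not understand ranges such as `A-Z'.

```shell
#!/bin/sh
# tr_awk FROM TO: hypothetical helper, for illustration only.
# Like translate.awk, but the translation table is built once, in BEGIN.
tr_awk() {
    awk -v from="$1" -v to="$2" '
    BEGIN {
        lf = length(from); lt = length(to)
        for (i = 1; i <= lf; i++)
            # past the end of "to", reuse its last character
            t_ar[substr(from, i, 1)] = substr(to, (i <= lt ? i : lt), 1)
    }
    {
        result = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c in t_ar)
                c = t_ar[c]
            result = result c
        }
        print result
    }'
}
```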
-
- ---------- Footnotes ----------
-
- (1) On some older systems, `tr' may require that the lists be
-written as range expressions enclosed in square brackets (`[a-z]') and
-quoted, to prevent the shell from attempting a file name expansion.
-This is not a feature.
-
- (2) This program was written before `gawk' acquired the ability to
-split each character in a string into separate array elements.
-
-
-File: gawk.info, Node: Labels Program, Next: Word Sorting, Prev: Translate Program, Up: Miscellaneous Programs
-
-14.3.4 Printing Mailing Labels
-------------------------------
-
-Here is a "real world"(1) program. This script reads lists of names and
-addresses and generates mailing labels. Each page of labels has 20
-labels on it, two across and 10 down. The addresses are guaranteed to
-be no more than five lines of data. Each address is separated from the
-next by a blank line.
-
- The basic idea is to read 20 labels worth of data. Each line of
-each label is stored in the `line' array. The single rule takes care
-of filling the `line' array and printing the page when 20 labels have
-been read.
-
- The `BEGIN' rule simply sets `RS' to the empty string, so that `awk'
-splits records at blank lines (*note Records::). It sets `MAXLINES' to
-100, since 100 is the maximum number of lines on the page (20 * 5 =
-100).
-
- Most of the work is done in the `printpage()' function. The label
-lines are stored sequentially in the `line' array. But they have to
-print horizontally; `line[1]' next to `line[6]', `line[2]' next to
-`line[7]', and so on. Two loops are used to accomplish this. The
-outer loop, controlled by `i', steps through every 10 lines of data;
-this is each row of labels. The inner loop, controlled by `j', goes
-through the lines within the row. As `j' goes from 0 to 4, `i+j' is
-the `j'-th line in the row, and `i+j+5' is the entry next to it. The
-output ends up looking something like this:
-
- line 1 line 6
- line 2 line 7
- line 3 line 8
- line 4 line 9
- line 5 line 10
- ...
-
-The `printf' format string `%-41s' left-aligns the data and prints it
-within a fixed-width field.
-
- As a final note, an extra blank line is printed at lines 21 and 61,
-to keep the output lined up on the labels. This is dependent on the
-particular brand of labels in use when the program was written. You
-will also note that there are two blank lines at the top and two blank
-lines at the bottom.
-
- The `END' rule arranges to flush the final page of labels; there may
-not have been an even multiple of 20 labels in the data:
-
- # labels.awk --- print mailing labels
-
- # Each label is 5 lines of data that may have blank lines.
- # The label sheets have 2 blank lines at the top and 2 at
- # the bottom.
-
- BEGIN { RS = "" ; MAXLINES = 100 }
-
- function printpage( i, j)
- {
- if (Nlines <= 0)
- return
-
- printf "\n\n" # header
-
- for (i = 1; i <= Nlines; i += 10) {
- if (i == 21 || i == 61)
- print ""
- for (j = 0; j < 5; j++) {
- if (i + j > MAXLINES)
- break
- printf " %-41s %s\n", line[i+j], line[i+j+5]
- }
- print ""
- }
-
- printf "\n\n" # footer
-
- delete line
- }
-
- # main rule
- {
- if (Count >= 20) {
- printpage()
- Count = 0
- Nlines = 0
- }
- n = split($0, a, "\n")
- for (i = 1; i <= n; i++)
- line[++Nlines] = a[i]
- for (; i <= 5; i++)
- line[++Nlines] = ""
- Count++
- }
-
- END \
- {
- printpage()
- }
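The pairing done by `printpage()' can be tried on its own. This stripped-down sketch stores every input line and prints each group of ten as five rows, pairing line I+J with line I+J+5 as the inner loop above does. The function name `two_up' and the 10-character column width are assumptions chosen to keep the output short; the header, footer, and extra blank lines are omitted.

```shell
#!/bin/sh
# two_up: hypothetical helper, for illustration only.
# Reads lines and prints each block of ten as two columns of five,
# pairing line[i+j] with line[i+j+5] the way printpage() does.
two_up() {
    awk '
    { line[NR] = $0 }
    END {
        for (i = 1; i <= NR; i += 10)       # one row of labels per 10 lines
            for (j = 0; j < 5; j++)
                if (i + j <= NR)
                    printf "%-10s %s\n", line[i+j], line[i+j+5]
    }'
}
```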
-
- ---------- Footnotes ----------
-
- (1) "Real world" is defined as "a program actually used to get
-something done."
-
-
-File: gawk.info, Node: Word Sorting, Next: History Sorting, Prev: Labels Program, Up: Miscellaneous Programs
-
-14.3.5 Generating Word-Usage Counts
------------------------------------
-
-When working with large amounts of text, it can be interesting to know
-how often different words appear. For example, an author may overuse
-certain words, in which case she might wish to find synonyms to
-substitute for words that appear too often. This node develops a
-program for counting words and presenting the frequency information in
-a useful format.
-
- At first glance, a program like this would seem to do the job:
-
- # Print list of word frequencies
-
- {
- for (i = 1; i <= NF; i++)
- freq[$i]++
- }
-
- END {
- for (word in freq)
- printf "%s\t%d\n", word, freq[word]
- }
-
- The program relies on `awk''s default field splitting mechanism to
-break each line up into "words," and uses an associative array named
-`freq', indexed by each word, to count the number of times the word
-occurs. In the `END' rule, it prints the counts.
-
- This program has several problems that prevent it from being useful
-on real text files:
-
- * The `awk' language considers upper- and lowercase characters to be
- distinct. Therefore, "bartender" and "Bartender" are not treated
- as the same word. This is undesirable, since in normal text, words
- are capitalized if they begin sentences, and a frequency analyzer
- should not be sensitive to capitalization.
-
- * Words are detected using the `awk' convention that fields are
- separated just by whitespace. Other characters in the input
- (except newlines) don't have any special meaning to `awk'. This
- means that punctuation characters count as part of words.
-
- * The output does not come out in any useful order. You're more
- likely to be interested in which words occur most frequently or in
- having an alphabetized table of how frequently each word occurs.
-
- The first problem can be solved by using `tolower()' to remove case
-distinctions. The second problem can be solved by using `gsub()' to
-remove punctuation characters. Finally, we solve the third problem by
-using the system `sort' utility to process the output of the `awk'
-script. Here is the new version of the program:
-
- # wordfreq.awk --- print list of word frequencies
-
- {
- $0 = tolower($0) # remove case distinctions
- # remove punctuation
- gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
- for (i = 1; i <= NF; i++)
- freq[$i]++
- }
-
- END {
- for (word in freq)
- printf "%s\t%d\n", word, freq[word]
- }
-
- Assuming we have saved this program in a file named `wordfreq.awk',
-and that the data is in `file1', the following pipeline:
-
- awk -f wordfreq.awk file1 | sort -k 2nr
-
-produces a table of the words appearing in `file1' in order of
-decreasing frequency.
-
- The `awk' program suitably massages the data and produces a word
-frequency table, which is not ordered. The `awk' script's output is
-then sorted by the `sort' utility and printed on the screen.
-
- The options given to `sort' specify a sort that uses the second
-field of each input line (skipping one field), that the sort keys
-should be treated as numeric quantities (otherwise `15' would come
-before `5'), and that the sorting should be done in descending
-(reverse) order.
-
- The `sort' could even be done from within the program, by changing
-the `END' action to:
-
- END {
- sort = "sort -k 2nr"
- for (word in freq)
- printf "%s\t%d\n", word, freq[word] | sort
- close(sort)
- }
-
- This way of sorting must be used on systems that do not have true
-pipes at the command-line (or batch-file) level. See the general
-operating system documentation for more information on how to use the
-`sort' program.
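Putting the pieces together, here is a minimal sketch of the whole pipeline as one shell function. The name `wordfreq' is hypothetical. It spells the character class as ASCII ranges instead of `[:alnum:]' so that it also runs on `awk' versions without POSIX character classes (which means non-ASCII letters are deleted along with the punctuation), and it adds a secondary alphabetical key to `sort' so that words with equal counts come out in a predictable order.

```shell
#!/bin/sh
# wordfreq [FILE...]: hypothetical helper, for illustration only.
# Counts words case-insensitively, ignoring punctuation, then sorts by
# decreasing frequency (ties broken alphabetically).
wordfreq() {
    awk '
    {
        $0 = tolower($0)                  # remove case distinctions
        gsub(/[^a-z0-9_ \t]/, "", $0)     # remove punctuation (ASCII only)
        for (i = 1; i <= NF; i++)
            freq[$i]++
    }
    END {
        for (word in freq)
            printf "%s\t%d\n", word, freq[word]
    }' "$@" | LC_ALL=C sort -k 2,2nr -k 1,1
}
```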
-
-
-File: gawk.info, Node: History Sorting, Next: Extract Program, Prev: Word Sorting, Up: Miscellaneous Programs
-
-14.3.6 Removing Duplicates from Unsorted Text
----------------------------------------------
-
-The `uniq' program (*note Uniq Program::) removes duplicate lines from
-_sorted_ data.
-
- Suppose, however, you need to remove duplicate lines from a data
-file but that you want to preserve the order the lines are in. A good
-example of this might be a shell history file. The history file keeps
-a copy of all the commands you have entered, and it is not unusual to
-repeat a command several times in a row. Occasionally you might want
-to compact the history by removing duplicate entries. Yet it is
-desirable to maintain the order of the original commands.
-
- This simple program does the job. It uses two arrays. The `data'
-array is indexed by the text of each line. For each line, `data[$0]'
-is incremented. If a particular line has not been seen before, then
-`data[$0]' is zero. In this case, the text of the line is stored in
-`lines[count]'. Each element of `lines' is a unique command, and the
-indices of `lines' indicate the order in which those lines are
-encountered. The `END' rule simply prints out the lines, in order:
-
- # histsort.awk --- compact a shell history file
- # Thanks to Byron Rakitzis for the general idea
-
- {
- if (data[$0]++ == 0)
- lines[++count] = $0
- }
-
- END {
- for (i = 1; i <= count; i++)
- print lines[i]
- }
-
- This program also provides a foundation for generating other useful
-information. For example, using the following `print' statement in the
-`END' rule indicates how often a particular command is used:
-
- print data[lines[i]], lines[i]
-
- This works because `data[$0]' is incremented each time a line is
-seen.
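As a sketch of that variation, here is the complete program with the modified `print' statement, wrapped in a shell function whose name, `histcount', is hypothetical. It prints each distinct line once, in order of first appearance, preceded by the number of times it occurred.

```shell
#!/bin/sh
# histcount [FILE...]: hypothetical helper, for illustration only.
# histsort.awk with the counting print statement from the text.
histcount() {
    awk '
    {
        if (data[$0]++ == 0)      # first time this line has been seen
            lines[++count] = $0
    }
    END {
        for (i = 1; i <= count; i++)
            print data[lines[i]], lines[i]
    }' "$@"
}
```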
-
-
-File: gawk.info, Node: Extract Program, Next: Simple Sed, Prev: History Sorting, Up: Miscellaneous Programs
-
-14.3.7 Extracting Programs from Texinfo Source Files
-----------------------------------------------------
-
-The nodes *note Library Functions::, and *note Sample Programs::, are
-the top level nodes for a large number of `awk' programs. If you want
-to experiment with these programs, it is tedious to have to type them
-in by hand. Here we present a program that can extract parts of a
-Texinfo input file into separate files.
-
-This Info file is written in Texinfo (http://texinfo.org), the GNU
-project's document formatting language. A single Texinfo source file
-can be used to produce both printed and online documentation. The
-Texinfo language is described fully in `Texinfo--The GNU Documentation
-Format' (*note (Texinfo)Top::).
-
- For our purposes, it is enough to know three things about Texinfo
-input files:
-
- * The "at" symbol (`@') is special in Texinfo, much as the backslash
- (`\') is in C or `awk'. Literal `@' symbols are represented in
- Texinfo source files as `@@'.
-
- * Comments start with either `@c' or `@comment'. The
- file-extraction program works by using special comments that start
- at the beginning of a line.
-
- * Lines containing `@group' and `@end group' commands bracket
- example text that should not be split across a page boundary.
- (Unfortunately, TeX isn't always smart enough to do things exactly
- right, so we have to give it some help.)
-
- The following program, `extract.awk', reads through a Texinfo source
-file and does two things, based on the special comments. Upon seeing
-`@c system ...', it runs a command, by extracting the command text from
-the control line and passing it on to the `system()' function (*note
-I/O Functions::). Upon seeing `@c file FILENAME', each subsequent line
-is sent to the file FILENAME, until `@c endfile' is encountered. The
-rules in `extract.awk' match either `@c' or `@comment' by letting the
-`omment' part be optional. Lines containing `@group' and `@end group'
-are simply removed. `extract.awk' uses the `join()' library function
-(*note Join Function::).
-
- The example programs in the online Texinfo source for `GAWK:
-Effective AWK Programming' (`gawk.texi') have all been bracketed inside
-`file' and `endfile' lines. The `gawk' distribution uses a copy of
-`extract.awk' to extract the sample programs and install many of them
-in a standard directory where `gawk' can find them. The Texinfo file
-looks something like this:
-
- ...
- This program has a @code{BEGIN} rule,
- that prints a nice message:
-
- @example
- @c file examples/messages.awk
- BEGIN @{ print "Don't panic!" @}
- @c endfile
- @end example
-
- It also prints some final advice:
-
- @example
- @c file examples/messages.awk
- END @{ print "Always avoid bored archeologists!" @}
- @c endfile
- @end example
- ...
-
- `extract.awk' begins by setting `IGNORECASE' to one, so that mixed
-upper- and lowercase letters in the directives won't matter.
-
- The first rule handles calling `system()', checking that a command is
-given (`NF' is at least three) and also checking that the command exits
-with a zero exit status, signifying OK:
-
- # extract.awk --- extract files and run programs
- # from texinfo files
-
- BEGIN { IGNORECASE = 1 }
-
- /^@c(omment)?[ \t]+system/ \
- {
- if (NF < 3) {
- e = (FILENAME ":" FNR)
- e = (e ": badly formed `system' line")
- print e > "/dev/stderr"
- next
- }
- $1 = ""
- $2 = ""
- stat = system($0)
- if (stat != 0) {
- e = (FILENAME ":" FNR)
- e = (e ": warning: system returned " stat)
- print e > "/dev/stderr"
- }
- }
-
-The variable `e' is used so that the rule fits nicely on the screen.
-
- The second rule handles moving data into files. It verifies that a
-file name is given in the directive. If the file named is not the
-current file, then the current file is closed. Keeping the current file
-open until a new file is encountered allows the use of the `>'
-redirection for printing the contents, keeping open file management
-simple.
-
- The `for' loop does the work. It reads lines using `getline' (*note
-Getline::). For an unexpected end of file, it calls the
-`unexpected_eof()' function. If the line is an "endfile" line, then it
-breaks out of the loop. If the line is an `@group' or `@end group'
-line, then it ignores it and goes on to the next line. Similarly,
-comments within examples are also ignored.
-
- Most of the work is in the following few lines. If the line has no
-`@' symbols, the program can print it directly. Otherwise, each
-leading `@' must be stripped off. To remove the `@' symbols, the line
-is split into separate elements of the array `a', using the `split()'
-function (*note String Functions::). The `@' symbol is used as the
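-separator character. Each element of `a' that is empty indicates two
-successive `@' symbols (`@@') in the original line, which Texinfo uses
-to stand for one literal `@', so the program puts a single `@' back in
-its place. A run such as `@@@@' produces several adjacent empty
-elements; after turning one of them into `@', the loop skips the next
-one if it is also empty, so that each `@@' yields exactly one `@'.(1)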
-
- When the processing of the array is finished, `join()' is called
-with the value of `SUBSEP', which `join()' treats as a request for no
-separator at all, to rejoin the pieces into a single line. That line
-is then printed to the output file:
-
- /^@c(omment)?[ \t]+file/ \
- {
- if (NF != 3) {
- e = (FILENAME ":" FNR ": badly formed `file' line")
- print e > "/dev/stderr"
- next
- }
- if ($3 != curfile) {
- if (curfile != "")
- close(curfile)
- curfile = $3
- }
-
- for (;;) {
- if ((getline line) <= 0)
- unexpected_eof()
- if (line ~ /^@c(omment)?[ \t]+endfile/)
- break
- else if (line ~ /^@(end[ \t]+)?group/)
- continue
- else if (line ~ /^@c(omment)?[ \t]+/)
- continue
- if (index(line, "@") == 0) {
- print line > curfile
- continue
- }
- n = split(line, a, "@")
- # if a[1] == "", means leading @,
- # don't add one back in.
- for (i = 2; i <= n; i++) {
- if (a[i] == "") { # was an @@
- a[i] = "@"
- if (a[i+1] == "")
- i++
- }
- }
- print join(a, 1, n, SUBSEP) > curfile
- }
- }
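The `@' handling can be tried in isolation. The two shell functions below have hypothetical names. The first is the `split()' loop from `extract.awk', with the `join(a, 1, n, SUBSEP)' call replaced by plain concatenation, which is what `join()' does when given `SUBSEP'. The second is a swapped-in alternative, not the program's method: it protects each `@@' with a placeholder character (assumed never to appear in the input), deletes the remaining `@' signs, and then restores the placeholders. The two agree except on lines that end in an odd number of `@' signs, where the `split()' version keeps one more `@'.

```shell
#!/bin/sh
# strip_at_split / strip_at_gsub: hypothetical helpers, for illustration.
# Both turn "@@" into "@" and drop every other "@" on each input line.

# The split() loop from extract.awk, with join(a, 1, n, SUBSEP)
# replaced by plain concatenation.
strip_at_split() {
    awk '
    {
        n = split($0, a, "@")
        for (i = 2; i <= n; i++) {    # a[1] == "" means a leading @
            if (a[i] == "") {         # was an @@
                a[i] = "@"
                if (a[i+1] == "")
                    i++
            }
        }
        out = ""
        for (i = 1; i <= n; i++)
            out = out a[i]
        print out
    }'
}

# gsub() alternative: protect each "@@" with a placeholder character,
# delete the remaining "@" signs, then turn the placeholders into "@".
strip_at_gsub() {
    awk '
    {
        ph = "\001"                   # assumed never to appear in the input
        gsub(/@@/, ph)
        gsub(/@/, "")
        gsub(ph, "@")
        print
    }'
}
```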
-
- An important thing to note is the use of the `>' redirection.
-Output done with `>' only opens the file once; it stays open and
-subsequent output is appended to the file (*note Redirection::). This
-makes it easy to mix program text and explanatory prose for the same
-sample source file (as has been done here!) without any hassle. The
-file is only closed when a new data file name is encountered or at the
-end of the input file.
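The following sketch isolates that behavior. Two `print ... >' statements aimed at the same file keep both lines, because `>' truncates only when `awk' first opens the file; after an explicit `close()', the next `>' reopens and truncates it again. The function names are hypothetical, and the sketch assumes a `mktemp' utility is available.

```shell
#!/bin/sh
# write_twice FILE: two prints to FILE while awk keeps it open;
# ">" truncates only on the first open, so both lines are kept.
write_twice() {
    awk -v f="$1" 'BEGIN {
        print "first"  > f
        print "second" > f
    }'
}

# write_reopen FILE: close() in between, so the second ">" reopens
# and truncates the file, leaving only the last line.
write_reopen() {
    awk -v f="$1" 'BEGIN {
        print "one" > f
        close(f)
        print "two" > f
    }'
}

tmp=$(mktemp) || exit 1
write_twice "$tmp";  cat "$tmp"    # first, then second
write_reopen "$tmp"; cat "$tmp"    # two
rm -f "$tmp"
```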
-
- Finally, the function `unexpected_eof()' prints an appropriate error
-message and then exits. The `END' rule handles the final cleanup,
-closing the open file:
-
- function unexpected_eof()
- {
- printf("%s:%d: unexpected EOF or error\n",
- FILENAME, FNR) > "/dev/stderr"
- exit 1
- }
-
- END {
- if (curfile)
- close(curfile)
- }
-
- ---------- Footnotes ----------
-
- (1) This program was written before `gawk' had the `gensub()'
-function. Consider how you might use it to simplify the code.
-
-
-File: gawk.info, Node: Simple Sed, Next: Igawk Program, Prev: Extract Program, Up: Miscellaneous Programs
-
-14.3.8 A Simple Stream Editor
------------------------------
-
-The `sed' utility is a stream editor, a program that reads a stream of
-data, makes changes to it, and passes it on. It is often used to make
-global changes to a large file or to a stream of data generated by a
-pipeline of commands. While `sed' is a complicated program in its own
-right, its most common use is to perform global substitutions in the
-middle of a pipeline:
-
- command1 < orig.data | sed 's/old/new/g' | command2 > result
-
- Here, `s/old/new/g' tells `sed' to look for the regexp `old' on each
-input line and globally replace it with the text `new', i.e., all the
-occurrences on a line. This is similar to `awk''s `gsub()' function
-(*note String Functions::).
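For comparison, here is a minimal sketch of the same pipeline stage written both ways; the shell function names are hypothetical. With no third argument, `gsub()' edits `$0' in place and replaces every match on the line, which is what the `g' flag does for `sed'.

```shell
#!/bin/sh
# sed_sub: the pipeline stage from the text, using sed.
sed_sub() { sed 's/old/new/g'; }

# awk_sub: the same global replacement with awk's gsub().
awk_sub() { awk '{ gsub(/old/, "new"); print }'; }

printf 'old and old again\n' | sed_sub    # new and new again
printf 'old and old again\n' | awk_sub    # new and new again
```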
-
- The following program, `awksed.awk', accepts at least two
-command-line arguments: the pattern to look for and the text to replace
-it with. Any additional arguments are treated as data file names to
-process. If none are provided, the standard input is used:
-
- # awksed.awk --- do s/foo/bar/g using just print
- # Thanks to Michael Brennan for the idea
-
- function usage()
- {
- print "usage: awksed pat repl [files...]" > "/dev/stderr"
- exit 1
- }
-
- BEGIN {
- # validate arguments
- if (ARGC < 3)
- usage()
-
- RS = ARGV[1]
- ORS = ARGV[2]
-
- # don't use arguments as files
- ARGV[1] = ARGV[2] = ""
- }
-
- # look ma, no hands!
- {
- if (RT == "")
- printf "%s", $0
- else
- print
- }
-
- The program relies on `gawk''s ability to have `RS' be a regexp, as
-well as on the setting of `RT' to the actual text that terminates the
-record (*note Records::).
-
- The idea is to have `RS' be the pattern to look for. `gawk'
-automatically sets `$0' to the text between matches of the pattern.
-This is text that we want to keep, unmodified. Then, by setting `ORS'
-to the replacement text, a simple `print' statement outputs the text we
-want to keep, followed by the replacement text.
-
- There is one wrinkle to this scheme, which is what to do if the last
-record doesn't end with text that matches `RS'. Using a `print'
-statement unconditionally prints the replacement text, which is not
-correct. However, if the file did not end in text that matches `RS',
-`RT' is set to the null string. In this case, we can print `$0' using
-`printf' (*note Printf::).
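`RT' is a `gawk' extension. As a rough sketch for other `awk' implementations (an assumption, not part of `awksed.awk'), you can limit `RS' to a single character, which POSIX allows, and print the replacement text *between* records instead of after each one. That avoids the spurious replacement after the last record, but it gets the opposite case wrong: if the input ends with a match, the final replacement is lost, because `awk' creates no empty record after it. Handling both cases is exactly what `RT' makes possible. The name `portable_sed' is hypothetical.

```shell
#!/bin/sh
# portable_sed CHAR REPL: replace each occurrence of the single
# character CHAR with REPL, using the RS idea without gawk's RT.
# The replacement is printed before every record except the first.
# Limitation: a match at the very end of the input is dropped.
portable_sed() {
    awk -v rs="$1" -v repl="$2" '
    BEGIN { RS = rs }
    {
        if (NR > 1)
            printf "%s", repl
        printf "%s", $0
    }'
}

printf 'a:b:c\n' | portable_sed ':' '-'    # a-b-c
```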
-
- The `BEGIN' rule handles the setup, checking for the right number of
-arguments and calling `usage()' if there is a problem. Then it sets
-`RS' and `ORS' from the command-line arguments and sets `ARGV[1]' and
-`ARGV[2]' to the null string, so that they are not treated as file names
-(*note ARGC and ARGV::).
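The effect of clearing an `ARGV' element can be seen on its own. In this sketch (the function name `label_lines' is hypothetical), the first argument is consumed as a label in `BEGIN' and then set to the null string, so `awk' does not try to open a file by that name; any remaining arguments are data files, and with none left `awk' reads standard input, just as `awksed.awk' does.

```shell
#!/bin/sh
# label_lines LABEL [FILE...]: prefix each input line with LABEL.
# ARGV[1] is read in BEGIN and then cleared so awk skips it.
label_lines() {
    awk 'BEGIN {
        if (ARGC < 2) {
            print "usage: label_lines label [files...]" > "/dev/stderr"
            exit 1
        }
        label = ARGV[1]
        ARGV[1] = ""
    }
    { print label ": " $0 }' "$@"
}

printf 'hello\n' | label_lines greeting    # greeting: hello
```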
-
- The `usage()' function prints an error message and exits. Finally,
-the single rule handles the printing scheme outlined above, using
-`print' or `printf' as appropriate, depending upon the value of `RT'.
-
-
-File: gawk.info, Node: Igawk Program, Next: Anagram Program, Prev: Simple Sed, Up: Miscellaneous Programs
-
-14.3.9 An Easy Way to Use Library Functions
--------------------------------------------
-
-In *note Include Files::, we saw how `gawk' provides a built-in
-file-inclusion capability. However, this is a `gawk' extension. This
-minor node provides the motivation for making file inclusion available
-for standard `awk', and shows how to do it using a combination of shell
-and `awk' programming.
-
- Using library functions in `awk' can be very beneficial. It
-encourages code reuse and the writing of general functions. Programs are
-smaller and therefore clearer. However, using library functions is
-only easy when writing `awk' programs; it is painful when running them,
-requiring multiple `-f' options. If `gawk' is unavailable, then so too
-is the `AWKPATH' environment variable and the ability to put `awk'
-functions into a library directory (*note Options::). It would be nice
-to be able to write programs in the following manner:
-
- # library functions
- @include getopt.awk
- @include join.awk
- ...
-
- # main program
- BEGIN {
- while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
- ...
- ...
- }
-
- The following program, `igawk.sh', provides this service. It
-simulates `gawk''s searching of the `AWKPATH' variable and also allows
-"nested" includes; i.e., a file that is included with `@include' can
-contain further `@include' statements. `igawk' makes an effort to only
-include files once, so that nested includes don't accidentally include
-a library function twice.
-
- `igawk' should behave just like `gawk' externally. This means it
-should accept all of `gawk''s command-line arguments, including the
-ability to have multiple source files specified via `-f', and the
-ability to mix command-line and library source files.
-
- The program is written using the POSIX Shell (`sh') command
-language.(1) It works as follows:
-
- 1. Loop through the arguments, saving anything that doesn't represent
- `awk' source code for later, when the expanded program is run.
-
- 2. For any arguments that do represent `awk' text, put the arguments
- into a shell variable that will be expanded. There are two cases:
-
- a. Literal text, provided with `--source' or `--source='. This
- text is just appended directly.
-
- b. Source file names, provided with `-f'. We use a neat trick
- and append `@include FILENAME' to the shell variable's
- contents. Since the file-inclusion program works the way
- `gawk' does, this gets the text of the file included into the
- program at the correct point.
-
- 3. Run an `awk' program (naturally) over the shell variable's
- contents to expand `@include' statements. The expanded program is
- placed in a second shell variable.
-
- 4. Run the expanded program with `gawk' and any other original
- command-line arguments that the user supplied (such as the data
- file names).
-
- This program uses shell variables extensively: to store the
-command-line arguments, the text of the `awk' program that expands
-the user's program, the user's original program, and the expanded
-program. Doing so removes some potential problems that might arise
-were we to use temporary files instead, at the cost of making the
-script somewhat more complicated.
-
- The initial part of the program turns on shell tracing if the first
-argument is `debug'.
-
- The next part loops through all the command-line arguments. There
-are several cases of interest:
-
-`--'
- This ends the arguments to `igawk'. Anything else should be
- passed on to the user's `awk' program without being evaluated.
-
-`-W'
- This indicates that the next option is specific to `gawk'. To make
- argument processing easier, the `-W' is prepended to the next
- argument and the loop continues. (This is an `sh' programming
- trick. Don't worry about it if you are not familiar with `sh'.)
-
-`-v, -F'
- These are saved and passed on to `gawk'.
-
-`-f, --file, --file=, -Wfile='
- The file name is appended to the shell variable `program' with an
- `@include' statement. The `expr' utility is used to remove the
- leading option part of the argument (e.g., `--file='). (Typical
- `sh' usage would be to use the `echo' and `sed' utilities to do
- this work. Unfortunately, some versions of `echo' evaluate escape
- sequences in their arguments, possibly mangling the program text.
- Using `expr' avoids this problem.)
-
-`--source, --source=, -Wsource='
- The source text is appended to `program'.
-
-`--version, -Wversion'
- `igawk' prints its version number, runs `gawk --version' to get
- the `gawk' version information, and then exits.
-
- If none of the `-f', `--file', `-Wfile', `--source', or `-Wsource'
-arguments are supplied, then the first nonoption argument should be the
-`awk' program. If there are no command-line arguments left, `igawk'
-prints an error message and exits. Otherwise, the first argument is
-appended to `program'. In any case, after the arguments have been
-processed, `program' contains the complete text of the original `awk'
-program.
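The `expr' idiom from the `-f' and `--file=' cases can be tried on its own. `expr STRING : REGEXP' matches from the start of STRING and prints the text captured by the `\(...\)' group, which strips the option prefix without passing the file name through `echo'. The function name `file_operand' is hypothetical; the patterns are the ones the script uses.

```shell
#!/bin/sh
# file_operand ARG: print the file name from an igawk-style option:
# -fNAME, --file=NAME, or -Wfile=NAME. Fails for anything else.
file_operand() {
    case $1 in
    -f?*)         expr "$1" : '-f\(.*\)' ;;
    -[W-]file=*)  expr "$1" : '-.file=\(.*\)' ;;
    *)            return 1 ;;
    esac
}

file_operand --file=getopt.awk    # getopt.awk
file_operand -Wfile=join.awk      # join.awk
file_operand -fmain.awk           # main.awk
```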
-
- The program is as follows:
-
- #! /bin/sh
- # igawk --- like gawk but do @include processing
-
- if [ "$1" = debug ]
- then
- set -x
- shift
- fi
-
- # A literal newline, so that program text is formatted correctly
- n='
- '
-
- # Initialize variables to empty
- program=
- opts=
-
- while [ $# -ne 0 ] # loop over arguments
- do
- case $1 in
- --) shift
- break ;;
-
- -W) shift
- # The ${x?'message here'} construct prints a
- # diagnostic and exits if $x is unset
- set -- -W"${@?'missing operand'}"
- continue ;;
-
- -[vF]) opts="$opts $1 '${2?'missing operand'}'"
- shift ;;
-
- -[vF]*) opts="$opts '$1'" ;;
-
- -f) program="$program$n@include ${2?'missing operand'}"
- shift ;;
-
- -f*) f=$(expr "$1" : '-f\(.*\)')
- program="$program$n@include $f" ;;
-
- -[W-]file=*)
- f=$(expr "$1" : '-.file=\(.*\)')
- program="$program$n@include $f" ;;
-
- -[W-]file)
- program="$program$n@include ${2?'missing operand'}"
- shift ;;
-
- -[W-]source=*)
- t=$(expr "$1" : '-.source=\(.*\)')
- program="$program$n$t" ;;
-
- -[W-]source)
- program="$program$n${2?'missing operand'}"
- shift ;;
-
- -[W-]version)
- echo igawk: version 3.0 1>&2
- gawk --version
- exit 0 ;;
-
- -[W-]*) opts="$opts '$1'" ;;
-
- *) break ;;
- esac
- shift
- done
-
- if [ -z "$program" ]
- then
- program=${1?'missing program'}
- shift
- fi
-
- # At this point, `program' has the program.
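The `${2?'missing operand'}' checks in this loop use the `?' form of parameter expansion, which reports an error only when the parameter is unset; the `:?' form also rejects a parameter that is set but empty. The small sketch below (function names hypothetical) makes the difference visible. Each check runs in a subshell, because a failed expansion makes a noninteractive shell exit.

```shell
#!/bin/sh
# has_operand OPT [ARG]: succeeds if a second argument exists at all,
# even an empty one, like igawk's ${2?'missing operand'}.
has_operand() {
    ( : "${2?missing operand}" ) 2>/dev/null
}

# has_nonempty_operand OPT [ARG]: the ${2:?...} form also fails
# when the second argument is present but empty.
has_nonempty_operand() {
    ( : "${2:?missing operand}" ) 2>/dev/null
}

has_operand -f prog.awk && echo "operand present"
has_operand -f ""       && echo "empty operand accepted by ?"
has_nonempty_operand -f "" || echo "empty operand rejected by :?"
```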
-
- The `awk' program to process `@include' directives is stored in the
-shell variable `expand_prog'. Doing this keeps the shell script
-readable. The `awk' program reads through the user's program, one line
-at a time, using `getline' (*note Getline::). The input file names and
-`@include' statements are managed using a stack. As each `@include' is
-encountered, the current file name is "pushed" onto the stack and the
-file named in the `@include' directive becomes the current file name.
-As each file is finished, the stack is "popped," and the previous input
-file becomes the current input file again. The process is started by
-making the original file the first one on the stack.
-
- The `pathto()' function does the work of finding the full path to a
-file. It simulates `gawk''s behavior when searching the `AWKPATH'
-environment variable (*note AWKPATH Variable::). If a file name has a
-`/' in it, no path search is done. Similarly, if the file name is
-`"-"', then that string is used as-is. Otherwise, the file name is
-concatenated with the name of each directory in the path, and an
-attempt is made to open the generated file name. The only way to test
-if a file can be read in `awk' is to go ahead and try to read it with
-`getline'; this is what `pathto()' does.(2) If the file can be read, it
-is closed and the file name is returned:
-
- expand_prog='
-
- function pathto(file, i, t, junk)
- {
- if (index(file, "/") != 0)
- return file
-
- if (file == "-")
- return file
-
- for (i = 1; i <= ndirs; i++) {
- t = (pathlist[i] "/" file)
- if ((getline junk < t) > 0) {
- # found it
- close(t)
- return t
- }
- }
- return ""
- }
-
- The main program is contained inside one `BEGIN' rule. The first
-thing it does is set up the `pathlist' array that `pathto()' uses.
-After splitting the path on `:', null elements are replaced with `"."',
-which represents the current directory:
-
- BEGIN {
- path = ENVIRON["AWKPATH"]
- ndirs = split(path, pathlist, ":")
- for (i = 1; i <= ndirs; i++) {
- if (pathlist[i] == "")
- pathlist[i] = "."
- }
-
- The stack is initialized with `ARGV[1]', which will be `/dev/stdin'.
-The main loop comes next. Input lines are read in succession. Lines
-that do not start with `@include' are printed verbatim. If the line
-does start with `@include', the file name is in `$2'. `pathto()' is
-called to generate the full path. If it cannot, then the program
-prints an error message and continues.
-
- The next thing to check is whether the file has already been
-included. The `processed' array is indexed by the full file name of
-each included file and it tracks this information for us. If the file
-is seen again, a warning message is printed. Otherwise, the new file
-name is pushed onto the stack and processing continues.
-
- Finally, when `getline' encounters the end of the input file, the
-file is closed and the stack is popped. When `stackptr' is less than
-zero, the program is done:
-
- stackptr = 0
- input[stackptr] = ARGV[1] # ARGV[1] is first file
- lineno[stackptr] = 0 # getline < file does not update FNR
-
- for (; stackptr >= 0; stackptr--) {
- while ((getline < input[stackptr]) > 0) {
- lineno[stackptr]++
- if (tolower($1) != "@include") {
- print
- continue
- }
- fpath = pathto($2)
- if (fpath == "") {
- printf("igawk:%s:%d: cannot find %s\n",
- input[stackptr], lineno[stackptr], $2) > "/dev/stderr"
- continue
- }
- if (! (fpath in processed)) {
- processed[fpath] = input[stackptr]
- input[++stackptr] = fpath # push onto stack
- lineno[stackptr] = 0
- } else
- print $2, "included in", input[stackptr],
- "already included in",
- processed[fpath] > "/dev/stderr"
- }
- close(input[stackptr])
- }
- }' # close quote ends `expand_prog' variable
-
- processed_program=$(gawk -- "$expand_prog" /dev/stdin << EOF
- $program
- EOF
- )
-
- The shell construct `COMMAND << MARKER' is called a "here document".
-Everything in the shell script up to the MARKER is fed to COMMAND as
-input. The shell processes the contents of the here document for
-variable and command substitution (and possibly other things as well,
-depending upon the shell).
-
- The shell construct `$(...)' is called "command substitution". The
-output of the command inside the parentheses is substituted into the
-command line. Because the result is used in a variable assignment, it
-is saved as a single string, even if the results contain whitespace.
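A stripped-down sketch of the same pattern, with `tr' standing in for the `@include' processor (an assumption made only to keep the example small): the here document feeds a shell variable to a command's standard input, and command substitution captures the output, embedded newline included, in another variable.

```shell
#!/bin/sh
# Text to process, including an embedded newline.
program='BEGIN { print "hi" }
END { print "bye" }'

# The unquoted EOF marker lets the shell expand $program inside the
# here document; $(...) stores tr's whole output as one string
# (only trailing newlines are removed).
processed=$(tr 'a-z' 'A-Z' << EOF
$program
EOF
)

printf '%s\n' "$processed"
# BEGIN { PRINT "HI" }
# END { PRINT "BYE" }
```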
-
- The expanded program is saved in the variable `processed_program'.
-It's done in these steps:
-
- 1. Run `gawk' with the `@include'-processing program (the value of
- the `expand_prog' shell variable) on standard input.
-
- 2. Standard input is the contents of the user's program, from the
- shell variable `program'. Its contents are fed to `gawk' via a
- here document.
-
- 3. The results of this processing are saved in the shell variable
- `processed_program' by using command substitution.
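To experiment with the expansion step apart from the rest of `igawk', here is a reduced, hypothetical sketch in POSIX `awk'. It reads a program from standard input, follows `@include' lines through a stack of open files, and skips files it has already included. Unlike the real `pathto()', it does no `AWKPATH' search: file names are used exactly as written, an assumption made to keep the sketch short.

```shell
#!/bin/sh
# expand_includes: read an awk program on standard input and replace
# each "@include FILE" line with the recursively expanded text of FILE.
# Each file is included at most once; no AWKPATH search is done.
# Usage: expand_includes < main.awk > expanded.awk
expand_includes() {
    awk 'BEGIN {
        sp = 0
        input[sp] = "/dev/stdin"      # bottom of the stack: main program
        for (; sp >= 0; sp--) {
            while ((status = (getline line < input[sp])) > 0) {
                n = split(line, f)
                if (n < 2 || f[1] != "@include") {
                    print line        # ordinary program text
                    continue
                }
                if (f[2] in seen) {
                    printf("%s: already included\n", f[2]) > "/dev/stderr"
                    continue
                }
                seen[f[2]] = 1
                input[++sp] = f[2]    # push: read the included file next
            }
            if (status < 0)
                printf("cannot read %s\n", input[sp]) > "/dev/stderr"
            if (input[sp] != "/dev/stdin")
                close(input[sp])      # pop: the including file resumes
        }
    }'
}
```

Because the parent file stays open while an included file is read, popping the stack simply resumes reading the parent where it left off, which is the same design the listing above relies on.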
-
- The last step is to call `gawk' with the expanded program, along
-with the original options and command-line arguments that the user
-supplied.
-
- eval gawk $opts -- '"$processed_program"' '"$@"'
-
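- The `eval' command makes the shell parse the command line a second
-time. On that second pass, the quotes saved in `opts' and the double
-quotes around `$processed_program' and `$@' take effect, so each option
-and argument reaches `gawk' as a single, properly quoted word.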
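Why `eval' is needed can be seen with a smaller case. In this sketch (the helper `count_args' is hypothetical), `opts' holds a `-v' assignment whose value contains a space, quoted the way the `-[vF])' case stores it. Expanding `$opts' directly splits on the space and leaves the single quotes as literal characters; running the command through `eval' lets the shell honor those quotes.

```shell
#!/bin/sh
# count_args: print the number of arguments, then each one in brackets.
count_args() {
    echo "$#"
    for a; do printf '[%s]\n' "$a"; done
}

opts="-v 'greeting=hello world'"

count_args $opts         # 3: [-v] ['greeting=hello] [world']
eval count_args $opts    # 2: [-v] [greeting=hello world]
```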
-
- This version of `igawk' represents my fifth version of this program.
-There are four key simplifications that make the program work better:
-
- * Using `@include' even for the files named with `-f' makes building
- the initial collected `awk' program much simpler; all the
- `@include' processing can be done once.
-
- * Not trying to save the line read with `getline' in the `pathto()'
- function when testing for the file's accessibility for use with
- the main program simplifies things considerably.
-
- * Using a `getline' loop in the `BEGIN' rule does it all in one
- place. It is not necessary to call out to a separate loop for
- processing nested `@include' statements.
-
- * Instead of saving the expanded program in a temporary file,
- putting it in a shell variable avoids some potential security
- problems. This has the disadvantage that the script relies upon
- more features of the `sh' language, making it harder to follow for
- those who aren't familiar with `sh'.
-
- Also, this program illustrates that it is often worthwhile to combine
-`sh' and `awk' programming together. You can usually accomplish quite
-a lot, without having to resort to low-level programming in C or C++,
-and it is frequently easier to do certain kinds of string and argument
-manipulation using the shell than it is in `awk'.
-
- Finally, `igawk' shows that it is not always necessary to add new
-features to a program; they can often be layered on top.
-
- As an additional example of this, consider the idea of having two
-files in a directory in the search path:
-
-`default.awk'
- This file contains a set of default library functions, such as
- `getopt()' and `assert()'.
-
-`site.awk'
- This file contains library functions that are specific to a site or
- installation; i.e., locally developed functions. Having a
- separate file allows `default.awk' to change with new `gawk'
- releases, without requiring the system administrator to update it
- each time by adding the local functions.
-
- One user suggested that `gawk' be modified to automatically read
-these files upon startup. Instead, it would be very simple to modify
-`igawk' to do this. Since `igawk' can process nested `@include'
-directives, `default.awk' could simply contain `@include' statements
-for the desired library functions.
-
- ---------- Footnotes ----------
-
- (1) Fully explaining the `sh' language is beyond the scope of this
-book. We provide some minimal explanations, but see a good shell
-programming book if you wish to understand things in more depth.
-
- (2) On some very old versions of `awk', the test `getline junk < t'
-can loop forever if the file exists but is empty. Caveat emptor.
-
-
-File: gawk.info, Node: Anagram Program, Next: Signature Program, Prev: Igawk Program, Up: Miscellaneous Programs
-
-14.3.10 Finding Anagrams From A Dictionary
-------------------------------------------
-
-An interesting programming challenge is to search for "anagrams" in a
-word list (such as `/usr/share/dict/words' on many GNU/Linux systems).
-One word is an anagram of another if both words contain the same letters
-(for example, "babbling" and "blabbing").
-
- An elegant algorithm is presented in Column 2, Problem C of Jon
-Bentley's `Programming Pearls', second edition. The idea is to give
-words that are anagrams a common signature, sort all the words together
-by their signature, and then print them. Dr. Bentley observes that
-taking the letters in each word and sorting them produces that common
-signature.
-
- The following program uses arrays of arrays to bring together words
-with the same signature and array sorting to print the words in sorted
-order.
-
- # anagram.awk --- An implementation of the anagram finding algorithm
- # from Jon Bentley's "Programming Pearls", 2nd edition.
- # Addison Wesley, 2000, ISBN 0-201-65788-0.
- # Column 2, Problem C, section 2.8, pp 18-20.
-
- /'s$/ { next } # Skip possessives
-
- The program starts with a header, and then a rule to skip
-possessives in the dictionary file. The next rule builds up the data
-structure. The first dimension of the array is indexed by the
-signature; the second dimension is the word itself:
-
- {
- key = word2key($1) # Build signature
- data[key][$1] = $1 # Store word with signature
- }
-
- The `word2key()' function creates the signature. It splits the word
-apart into individual letters, sorts the letters, and then joins them
-back together:
-
- # word2key --- split word apart into letters, sort, joining back together
-
- function word2key(word, a, i, n, result)
- {
- n = split(word, a, "")
- asort(a)
-
- for (i = 1; i <= n; i++)
- result = result a[i]
-
- return result
- }
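`asort()' is a `gawk' extension. As a minimal sketch for other `awk' implementations, the same signature can be built by pulling the letters out with `substr()' and placing them with an insertion sort, which is adequate for dictionary-length words. The function keeps the name `word2key', but this body is an assumption, not the book's code; the wrapper `word_keys' is hypothetical.

```shell
#!/bin/sh
# word_keys: print each input word followed by its sorted-letter key.
word_keys() {
    awk '
    function word2key(word,    a, i, j, n, c, result) {
        n = length(word)
        for (i = 1; i <= n; i++) {
            c = substr(word, i, 1)
            # shift larger letters right, then drop c into the gap
            for (j = i - 1; j >= 1 && a[j] > c; j--)
                a[j + 1] = a[j]
            a[j + 1] = c
        }
        result = ""
        for (i = 1; i <= n; i++)
            result = result a[i]
        return result
    }
    { print $1, word2key($1) }'
}

printf 'babbling\nblabbing\n' | word_keys
# babbling abbbgiln
# blabbing abbbgiln
```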
-
- Finally, the `END' rule traverses the array and prints out the
-anagram lists. It sends the output to the system `sort' command, since
-otherwise the anagrams would appear in arbitrary order:
-
- END {
- sort = "sort"
- for (key in data) {
- # Sort words with same key
- nwords = asorti(data[key], words)
- if (nwords == 1)
- continue
-
- # And print. Minor glitch: trailing space at end of each line
- for (j = 1; j <= nwords; j++)
- printf("%s ", words[j]) | sort
- print "" | sort
- }
- close(sort)
- }
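The comment in the `END' rule mentions a trailing space on each output line. One way around it, sketched here under the assumption that arrays of arrays are not available, is to keep one space-separated string per signature and print it only when it holds more than one word. The helper `key()' is a compact stand-in for `word2key()', possessives are not filtered, and words within a group appear in input order rather than `asorti()' order (for a sorted word list, these are the same).

```shell
#!/bin/sh
# anagrams: print groups of words sharing a sorted-letter signature,
# one group per line, words separated by single spaces.
anagrams() {
    awk '
    function key(w,    a, i, j, n, c, r) {
        n = length(w)
        for (i = 1; i <= n; i++) {        # insertion sort of the letters
            c = substr(w, i, 1)
            for (j = i - 1; j >= 1 && a[j] > c; j--)
                a[j + 1] = a[j]
            a[j + 1] = c
        }
        r = ""
        for (i = 1; i <= n; i++)
            r = r a[i]
        return r
    }
    {
        k = key($1)
        if (k in words)
            words[k] = words[k] " " $1    # separator only between words
        else
            words[k] = $1
        count[k]++
    }
    END {
        for (k in words)
            if (count[k] > 1)
                print words[k] | "sort"
        close("sort")
    }' "$@"
}

printf 'babbled\nblabbed\nbabel\nbable\nzebra\n' | anagrams
# babbled blabbed
# babel bable
```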
-
- Here is some partial output when the program is run:
-
- $ gawk -f anagram.awk /usr/share/dict/words | grep '^b'
- ...
- babbled blabbed
- babbler blabber brabble
- babblers blabbers brabbles
- babbling blabbing
- babbly blabby
- babel bable
- babels beslab
- babery yabber
- ...
-
-
-File: gawk.info, Node: Signature Program, Prev: Anagram Program, Up: Miscellaneous Programs
-
-14.3.11 And Now For Something Completely Different
---------------------------------------------------
-
-The following program was written by Davide Brini and is published on
-his website (http://backreference.org/2011/02/03/obfuscated-awk/). It
-serves as his signature in the Usenet group `comp.lang.awk'. He
-supplies the following copyright terms:
-
- Copyright (C) 2008 Davide Brini
-
- Copying and distribution of the code published in this page, with
- or without modification, are permitted in any medium without
- royalty provided the copyright notice and this notice are
- preserved.
-
- Here is the program:
-
- awk 'BEGIN{O="~"~"~";o="=="=="==";o+=+o;x=O""O;while(X++<=x+o+o)c=c"%c";
- printf c,(x-O)*(x-O),x*(x-o)-o,x*(x-O)+x-O-o,+x*(x-O)-x+o,X*(o*o+O)+x-O,
- X*(X-x)-o*o,(x+X)*o*o+o,x*(X-x)-O-O,x-O+(O+o+X+x)*(o+O),X*X-X*(x-O)-x+O,
- O+X*(o*(o+O)+O),+x+O+X*o,x*(x-o),(o+X+x)*o*o-(x-O-O),O+(X-x)*(X+O),x-O}'
-
- We leave it to you to determine what the program does.
-
-
-File: gawk.info, Node: Debugger, Next: Language History, Prev: Sample Programs, Up: Top
-
-15 Debugging `awk' Programs
-***************************
-
-It would be nice if computer programs worked perfectly the first time
-they were run, but in real life, this rarely happens for programs of
-any complexity. Thus, most programming languages have facilities
-available for "debugging" programs, and now `awk' is no exception.
-
- The `gawk' debugger is purposely modeled after the GNU Debugger
-(GDB) (http://www.gnu.org/software/gdb/) command-line debugger. If you
-are familiar with GDB, learning how to use `gawk' for debugging your
-program is easy.
-
-* Menu:
-
-* Debugging:: Introduction to `gawk' debugger.
-* Sample Debugging Session:: Sample debugging session.
-* List of Debugger Commands:: Main debugger commands.
-* Readline Support:: Readline support.
-* Limitations:: Limitations and future plans.
-
-
-File: gawk.info, Node: Debugging, Next: Sample Debugging Session, Up: Debugger
-
-15.1 Introduction to `gawk' Debugger
-====================================
-
-This minor node introduces debugging in general and begins the
-discussion of debugging in `gawk'.
-
-* Menu:
-
-* Debugging Concepts:: Debugging in General.
-* Debugging Terms:: Additional Debugging Concepts.
-* Awk Debugging:: Awk Debugging.
-
-
-File: gawk.info, Node: Debugging Concepts, Next: Debugging Terms, Up: Debugging
-
-15.1.1 Debugging in General
----------------------------
-
-(If you have used debuggers in other languages, you may want to skip
-ahead to the next section on the specific features of the `awk'
-debugger.)
-
- Of course, a debugging program cannot remove bugs for you, since it
-has no way of knowing what you or your users consider a "bug" and what
-is a "feature." (Sometimes, we humans have a hard time with this
-ourselves.) So what can you expect from such a tool? The
-answer to that depends on the language being debugged, but in general,
-you can expect at least the following:
-
- * The ability to watch a program execute its instructions one by one,
- giving you, the programmer, the opportunity to think about what is
- happening on a time scale of seconds, minutes, or hours, rather
- than the nanosecond time scale at which the code usually runs.
-
- * The opportunity to not only passively observe the operation of your
- program, but to control it and try different paths of execution,
- without having to change your source files.
-
- * The chance to see the values of data in the program at any point in
- execution, and also to change that data on the fly, to see how that
- affects what happens afterwards. (This often includes the ability
- to look at internal data structures besides the variables you
- actually defined in your code.)
-
- * The ability to obtain additional information about your program's
- state or even its internal structure.
-
- All of these capabilities provide a great amount of help in using your own
-skills and understanding of the goals of your program to find where it
-is going wrong (or, for that matter, to better comprehend a perfectly
-functional program that you or someone else wrote).
-
-
-File: gawk.info, Node: Debugging Terms, Next: Awk Debugging, Prev: Debugging Concepts, Up: Debugging
-
-15.1.2 Additional Debugging Concepts
-------------------------------------
-
-Before diving into the details, we need to introduce several important
-concepts that apply to just about all debuggers. The following list
-defines terms used throughout the rest of this major node.
-
-"Stack Frame"
- Programs generally call functions during the course of their
- execution. One function can call another, or a function can call
- itself (recursion). You can view the chain of called functions
- (main program calls A, which calls B, which calls C), as a stack
- of executing functions: the currently running function is the
- topmost one on the stack, and when it finishes (returns), the next
- one down then becomes the active function. Such a stack is termed
- a "call stack".
-
- For each function on the call stack, the system maintains a data
- area that contains the function's parameters, local variables, and
- return value, as well as any other "bookkeeping" information
- needed to manage the call stack. This data area is termed a
- "stack frame".
-
- `gawk' also follows this model, and gives you access to the call
- stack and to each stack frame. You can see the call stack, as well
- as from where each function on the stack was invoked. Commands
- that print the call stack print information about each stack frame
- (as detailed later on).
-
-"Breakpoint"
- During debugging, you often wish to let the program run until it
- reaches a certain point, and then continue execution from there one
- statement (or instruction) at a time. The way to do this is to set
- a "breakpoint" within the program. A breakpoint is where the
- execution of the program should break off (stop), so that you can
- take over control of the program's execution. You can add and
- remove as many breakpoints as you like.
-
-"Watchpoint"
- A watchpoint is similar to a breakpoint. The difference is that
- breakpoints are oriented around the code: stop when a certain
- point in the code is reached. A watchpoint, however, specifies
- that program execution should stop when a _data value_ is changed.
- This is useful because a variable sometimes receives an erroneous
- value, and it is hard to track down where this happens just by
- looking at the code. By using a watchpoint,
- you can stop whenever a variable is assigned to, and usually find
- the errant code quite quickly.
-
-
-File: gawk.info, Node: Awk Debugging, Prev: Debugging Terms, Up: Debugging
-
-15.1.3 Awk Debugging
---------------------
-
-Debugging an `awk' program has some specific aspects that are not
-shared with other programming languages.
-
- First of all, `awk' programs usually read input line by line from
-one or more files and operate on those lines using specific rules, so
-it is especially useful to view the execution of the program in terms
-of these rules. As we will see, each `awk' rule is treated almost like
-a function call, with its own specific block of instructions.
-
- In addition, since `awk' is by design a very concise language, it is
-easy to lose sight of everything that is going on "inside" each line of
-`awk' code. The debugger provides the opportunity to look at the
-individual primitive instructions carried out by the higher-level `awk'
-commands.
-
-
-File: gawk.info, Node: Sample Debugging Session, Next: List of Debugger Commands, Prev: Debugging, Up: Debugger
-
-15.2 Sample Debugging Session
-=============================
-
-In order to illustrate the use of `gawk' as a debugger, let's look at a
-sample debugging session. We will use the `awk' implementation of the
-POSIX `uniq' command described earlier (*note Uniq Program::) as our
-example.
-
-* Menu:
-
-* Debugger Invocation:: How to Start the Debugger.
-* Finding The Bug:: Finding the Bug.
-
-
-File: gawk.info, Node: Debugger Invocation, Next: Finding The Bug, Up: Sample Debugging Session
-
-15.2.1 How to Start the Debugger
---------------------------------
-
-Starting the debugger is almost exactly like running `awk', except you
-have to pass an additional option `--debug' or the corresponding short
-option `-D'. The file(s) containing the program and any supporting
-code are given on the command line as arguments to one or more `-f'
-options. (`gawk' is not designed to debug command-line programs, only
-programs contained in files.) In our case, we invoke the debugger like
-this:
-
- $ gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile
-
-where `getopt.awk', `join.awk', and `uniq.awk' are all in `$AWKPATH'.
-(Experienced users of GDB or similar debuggers should note that this
-syntax is slightly different from what they are used to. With the
-`gawk' debugger, the arguments for running the program are given on
-the command line to the debugger rather than as part of the `run'
-command at the debugger prompt.)
-
- Instead of immediately running the program on `inputfile', as `gawk'
-would ordinarily do, the debugger merely loads all the program source
-files, compiles them internally, and then gives us a prompt:
-
- gawk>
-
-from which we can issue commands to the debugger. At this point, no
-code has been executed.
-
-
-File: gawk.info, Node: Finding The Bug, Prev: Debugger Invocation, Up: Sample Debugging Session
-
-15.2.2 Finding the Bug
-----------------------
-
-Let's say that we are having a problem using (a faulty version of)
-`uniq.awk' in the "field-skipping" mode, and it doesn't seem to be
-catching lines which should be identical when skipping the first field,
-such as:
-
- awk is a wonderful program!
- gawk is a wonderful program!
-
- This could happen if we were thinking (C-like) of the fields in a
-record as being numbered in a zero-based fashion, so instead of the
-lines:
-
- clast = join(alast, fcount+1, n)
- cline = join(aline, fcount+1, m)
-
-we wrote:
-
- clast = join(alast, fcount, n)
- cline = join(aline, fcount, m)
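The effect of this off-by-one can be reproduced outside the debugger. The sketch below uses a small `join()' written for the example (an assumption; it always joins with the separator it is given) and compares the correct and buggy calls with `fcount' set to 1. With the buggy call the first field is not skipped, so "awk is ..." and "gawk is ..." never compare equal. The wrapper name `skip_fields' is hypothetical.

```shell
#!/bin/sh
# skip_fields N: for each line, show join() from field N+1 (correct)
# and from field N (the zero-based mistake).
skip_fields() {
    awk -v fcount="$1" '
    function join(array, start, end, sep,    result, i) {
        result = array[start]
        for (i = start + 1; i <= end; i++)
            result = result sep array[i]
        return result
    }
    {
        n = split($0, a)
        print "correct: " join(a, fcount + 1, n, " ")
        print "buggy:   " join(a, fcount, n, " ")
    }'
}

printf 'gawk is a wonderful program!\n' | skip_fields 1
# correct: is a wonderful program!
# buggy:   gawk is a wonderful program!
```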
-
- The first thing we usually want to do when trying to investigate a
-problem like this is to put a breakpoint in the program so that we can
-watch it at work and catch what it is doing wrong. A reasonable spot
-for a breakpoint in `uniq.awk' is at the beginning of the function
-`are_equal()', which compares the current line with the previous one.
-To set the breakpoint, use the `b' (breakpoint) command:
-
- gawk> b are_equal
- -| Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 64
-
- The debugger tells us the file and line number where the breakpoint
-is. Now type `r' or `run' and the program runs until it hits the
-breakpoint for the first time:
-
- gawk> r
- -| Starting program:
- -| Stopping in Rule ...
- -| Breakpoint 1, are_equal(n, m, clast, cline, alast, aline)
- at `awklib/eg/prog/uniq.awk':64
- -| 64 if (fcount == 0 && charcount == 0)
- gawk>
-
- Now we can look at what's going on inside our program. First of all,
-let's see how we got to where we are. At the prompt, we type `bt'
-(short for "backtrace"), and the debugger responds with a listing of
-the current stack frames:
-
- gawk> bt
- -| #0 are_equal(n, m, clast, cline, alast, aline)
- at `awklib/eg/prog/uniq.awk':64
- -| #1 in main() at `awklib/eg/prog/uniq.awk':89
-
- This tells us that `are_equal()' was called by the main program at
-line 89 of `uniq.awk'. (This is not a big surprise, since this is the
-only call to `are_equal()' in the program, but in more complex
-programs, knowing who called a function and with what parameters can be
-the key to finding the source of the problem.)
-
- Now that we're in `are_equal()', we can start looking at the values
-of some variables. Let's say we type `p n' (`p' is short for "print").
-We would expect to see the value of `n', a parameter to `are_equal()'.
-Actually, the debugger gives us:
-
- gawk> p n
- -| n = untyped variable
-
-In this case, `n' is an uninitialized local variable, since the
-function was called without arguments (*note Function Calls::).
-
- A more useful variable to display might be the current record:
-
- gawk> p $0
- -| $0 = string ("gawk is a wonderful program!")
-
-This might be a bit puzzling at first since this is the second line of
-our test input above. Let's look at `NR':
-
- gawk> p NR
- -| NR = number (2)
-
-So we can see that `are_equal()' was only called for the second record
-of the file. Of course, this is because our program contained a rule
-for `NR == 1':
-
- NR == 1 {
- last = $0
- next
- }
-
- OK, let's just check that that rule worked correctly:
-
- gawk> p last
- -| last = string ("awk is a wonderful program!")
-
- Everything we have done so far has verified that the program has
-worked as planned, up to and including the call to `are_equal()', so
-the problem must be inside this function. To investigate further, we
-must begin "stepping through" the lines of `are_equal()'. We start by
-typing `n' (for "next"):
-
- gawk> n
- -| 67 if (fcount > 0) {
-
- This tells us that `gawk' is now ready to execute line 67, which
-decides whether to give the lines the special "field skipping" treatment
-indicated by the `-f' command-line option. (Notice that we skipped
-from where we were before at line 64 to here, since the condition in
-line 64
-
- if (fcount == 0 && charcount == 0)
-
-was false.)
-
- Continuing to step, we now get to the splitting of the current and
-last records:
-
- gawk> n
- -| 68 n = split(last, alast)
- gawk> n
- -| 69 m = split($0, aline)
-
- At this point, we should be curious to see what our records were
-split into, so we try to look:
-
- gawk> p n m alast aline
- -| n = number (5)
- -| m = number (5)
- -| alast = array, 5 elements
- -| aline = array, 5 elements
-
-(The `p' command can take more than one argument, similar to `awk''s
-`print' statement.)
-
- This is kind of disappointing, though. All we found out is that
-there are five elements in each of our arrays. Useful enough (we now
-know that none of the words were accidentally left out), but what if we
-want to see inside the array?
-
- The first choice would be to use subscripts:
-
- gawk> p alast[0]
- -| "0" not in array `alast'
-
-Oops!
-
- gawk> p alast[1]
- -| alast["1"] = string ("awk")
-
- This would be kind of slow for a 100-member array, though, so `gawk'
-provides a shortcut (reminiscent of another language not to be
-mentioned):
-
- gawk> p @alast
- -| alast["1"] = string ("awk")
- -| alast["2"] = string ("is")
- -| alast["3"] = string ("a")
- -| alast["4"] = string ("wonderful")
- -| alast["5"] = string ("program!")
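-
- The `Oops!' above happens because `split()' numbers the elements it
-creates starting at one, not zero. The following is a minimal sketch in
-plain `awk', run outside the debugger. It checks which indices exist and
-then walks the array with a loop, which is roughly what `p @alast' shows
-in a single command:
-
- awk 'BEGIN {
- n = split("awk is a wonderful program!", alast)
- print n, (0 in alast), (1 in alast) # 5 elements, indices 1 to 5
- for (i = 1; i <= n; i++) # visit each element in index order
- printf "alast[%d] = %s\n", i, alast[i]
- }'
- -| 5 0 1
- -| alast[1] = awk
- -| alast[2] = is
- -| alast[3] = a
- -| alast[4] = wonderful
- -| alast[5] = program!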
-
- It looks like we got this far OK. Let's take another step or two:
-
- gawk> n
- -| 70 clast = join(alast, fcount, n)
- gawk> n
- -| 71 cline = join(aline, fcount, m)
-
- Well, here we are at our error (sorry to spoil the suspense). What
-we had in mind was to join the fields starting from the second one to
-make the virtual record to compare, and if the first field was numbered
-zero, this would work. Let's look at what we've got:
-
- gawk> p cline clast
- -| cline = string ("gawk is a wonderful program!")
- -| clast = string ("awk is a wonderful program!")
-
- Hey, those look pretty familiar! They're just our original,
-unaltered, input records. A little thinking (the human brain is still
-the best debugging tool), and we realize that we were off by one!
-
- We get out of the debugger:
-
- gawk> q
- -| The program is running. Exit anyway (y/n)? y
-
-Then we get into an editor:
-
- clast = join(alast, fcount+1, n)
- cline = join(aline, fcount+1, m)
-
-and problem solved!
-
-
-File: gawk.info, Node: List of Debugger Commands, Next: Readline Support, Prev: Sample Debugging Session, Up: Debugger
-
-15.3 Main Debugger Commands
-===========================
-
-The `gawk' debugger command set can be divided into the following
-categories:
-
- * Breakpoint control
-
- * Execution control
-
- * Viewing and changing data
-
- * Working with the stack
-
- * Getting information
-
- * Miscellaneous
-
- Each of these is discussed in the following subsections. In the
-following descriptions, commands that may be abbreviated show the
-abbreviation on a second description line. A debugger command name may
-also be truncated if that partial name is unambiguous. The debugger can
-automatically repeat the previous command when you just press <Enter>.
-This works for the commands `list', `next', `nexti', `step', `stepi',
-and `continue' when they are executed without any argument.
-
-* Menu:
-
-* Breakpoint Control:: Control of Breakpoints.
-* Debugger Execution Control:: Control of Execution.
-* Viewing And Changing Data:: Viewing and Changing Data.
-* Execution Stack:: Dealing with the Stack.
-* Debugger Info:: Obtaining Information about the Program and
- the Debugger State.
-* Miscellaneous Debugger Commands:: Miscellaneous Commands.
-
-
-File: gawk.info, Node: Breakpoint Control, Next: Debugger Execution Control, Up: List of Debugger Commands
-
-15.3.1 Control of Breakpoints
------------------------------
-
-As we saw above, the first thing you probably want to do in a debugging
-session is to get your breakpoints set up, since otherwise your program
-will just run as if it were not under the debugger. The commands for
-controlling breakpoints are:
-
-`break' [[FILENAME`:']N | FUNCTION] [`"EXPRESSION"']
-`b' [[FILENAME`:']N | FUNCTION] [`"EXPRESSION"']
- Without any argument, set a breakpoint at the next instruction to
- be executed in the selected stack frame. Arguments can be one of
- the following:
-
- N
- Set a breakpoint at line number N in the current source file.
-
- FILENAME`:'N
- Set a breakpoint at line number N in source file FILENAME.
-
- FUNCTION
- Set a breakpoint at entry to (the first instruction of)
- function FUNCTION.
-
- Each breakpoint is assigned a number which can be used to delete
- it from the breakpoint list using the `delete' command.
-
- With a breakpoint, you may also supply a condition. This is an
- `awk' expression (enclosed in double quotes) that the debugger
- evaluates whenever the breakpoint is reached. If the condition is
- true, then the debugger stops execution and prompts for a command.
- Otherwise, it continues executing the program.
-
-`clear' [[FILENAME`:']N | FUNCTION]
- Without any argument, delete any breakpoint at the next instruction
- to be executed in the selected stack frame. If the program stops at
- a breakpoint, this deletes that breakpoint so that the program
- does not stop at that location again. Arguments can be one of the
- following:
-
- N
- Delete breakpoint(s) set at line number N in the current
- source file.
-
- FILENAME`:'N
- Delete breakpoint(s) set at line number N in source file
- FILENAME.
-
- FUNCTION
- Delete breakpoint(s) set at entry to function FUNCTION.
-
-`condition' N `"EXPRESSION"'
- Add a condition to an existing breakpoint or watchpoint N. The
- condition is an `awk' expression that the debugger evaluates
- whenever the breakpoint or watchpoint is reached. If the condition
- is true, then the debugger stops execution and prompts for a
- command. Otherwise, the debugger continues executing the program.
- If the condition expression is not specified, any existing
- condition is removed; i.e., the breakpoint or watchpoint is made
- unconditional.
-
-`delete' [N1 N2 ...] [N-M]
-`d' [N1 N2 ...] [N-M]
- Delete specified breakpoints or a range of breakpoints. Deletes
- all defined breakpoints if no argument is supplied.
-
-`disable' [N1 N2 ... | N-M]
- Disable specified breakpoints or a range of breakpoints. Without
- any argument, disables all breakpoints.
-
-`enable' [`del' | `once'] [N1 N2 ...] [N-M]
-`e' [`del' | `once'] [N1 N2 ...] [N-M]
- Enable specified breakpoints or a range of breakpoints. Without
- any argument, enables all breakpoints. Optionally, you can
- specify how to enable the breakpoint:
-
- `del'
- Enable the breakpoint(s) temporarily, then delete each one
- when the program stops at it.
-
- `once'
- Enable the breakpoint(s) temporarily, then disable each one
- when the program stops at it.
-
-`ignore' N COUNT
- Ignore breakpoint number N the next COUNT times it is hit.
-
-`tbreak' [[FILENAME`:']N | FUNCTION]
-`t' [[FILENAME`:']N | FUNCTION]
- Set a temporary breakpoint (enabled for only one stop). The
- arguments are the same as for `break'.
-
-
-File: gawk.info, Node: Debugger Execution Control, Next: Viewing And Changing Data, Prev: Breakpoint Control, Up: List of Debugger Commands
-
-15.3.2 Control of Execution
----------------------------
-
-Now that your breakpoints are ready, you can start running the program
-and observing its behavior. There are more commands for controlling
-execution of the program than we saw in our earlier example:
-
-`commands' [N]
-`silent'
-...
-`end'
- Set a list of commands to be executed upon stopping at a
- breakpoint or watchpoint. N is the breakpoint or watchpoint number.
- Without a number, the last one set is used. The actual commands
- follow, starting on the next line, and terminated by the `end'
- command. If the command `silent' is in the list, the usual
- messages about stopping at a breakpoint and the source line are
- not printed. Any command in the list that resumes execution (e.g.,
- `continue') terminates the list (an implicit `end'), and
- subsequent commands are ignored. For example:
-
- gawk> commands
- > silent
- > printf "A silent breakpoint; i = %d\n", i
- > info locals
- > set i = 10
- > continue
- > end
- gawk>
-
-`continue' [COUNT]
-`c' [COUNT]
- Resume program execution. If continued from a breakpoint and COUNT
- is specified, ignores the breakpoint at that location the next
- COUNT times before stopping.
-
-`finish'
- Execute until the selected stack frame returns. Print the
- returned value.
-
-`next' [COUNT]
-`n' [COUNT]
- Continue execution to the next source line, stepping over function
- calls. The argument COUNT controls how many times to repeat the
- action, as in `step'.
-
-`nexti' [COUNT]
-`ni' [COUNT]
- Execute one (or COUNT) instruction(s), stepping over function
- calls.
-
-`return' [VALUE]
- Cancel execution of a function call. If VALUE (either a string or a
- number) is specified, it is used as the function's return value.
- If used in a frame other than the innermost one (the currently
- executing function, i.e., frame number 0), discard all inner
- frames in addition to the selected one, and the caller of that
- frame becomes the innermost frame.
-
-`run'
-`r'
- Start/restart execution of the program. When restarting, the
- debugger retains the current breakpoints, watchpoints, command
- history, automatic display variables, and debugger options.
-
-`step' [COUNT]
-`s' [COUNT]
- Continue execution until control reaches a different source line
- in the current stack frame. `step' steps inside any function
- called within the line. If the argument COUNT is supplied, steps
- that many times before stopping, unless it encounters a breakpoint
- or watchpoint.
-
-`stepi' [COUNT]
-`si' [COUNT]
- Execute one (or COUNT) instruction(s), stepping inside function
- calls. (For illustration of what is meant by an "instruction" in
- `gawk', see the output shown under `dump' in *note Miscellaneous
- Debugger Commands::.)
-
-`until' [[FILENAME`:']N | FUNCTION]
-`u' [[FILENAME`:']N | FUNCTION]
- Without any argument, continue execution until a line past the
- current line in current stack frame is reached. With an argument,
- continue execution until the specified location is reached, or the
- current stack frame returns.
-
-
-File: gawk.info, Node: Viewing And Changing Data, Next: Execution Stack, Prev: Debugger Execution Control, Up: List of Debugger Commands
-
-15.3.3 Viewing and Changing Data
---------------------------------
-
-The commands for viewing and changing variables inside of `gawk' are:
-
-`display' [VAR | `$'N]
- Add variable VAR (or field `$N') to the display list. The value
- of the variable or field is displayed each time the program stops.
- Each variable added to the list is identified by a unique number:
-
- gawk> display x
- -| 10: x = 1
-
- This displays the assigned item number, the variable name, and its
- current value. If the display variable refers to a function
- parameter, it is silently deleted from the list as soon as
- execution reaches a context where no variable of that name exists.
- Without an argument, `display' displays the current values of the
- items on the list.
-
-`eval "AWK STATEMENTS"'
- Evaluate AWK STATEMENTS in the context of the running program.
- You can do anything that an `awk' program would do: assign values
- to variables, call functions, and so on.
-
-`eval' PARAM, ...
-AWK STATEMENTS
-`end'
- This form of `eval' is similar, but it allows you to define "local
- variables" that exist in the context of the AWK STATEMENTS,
- instead of using variables or function parameters defined by the
- program.
-
-`print' VAR1[`,' VAR2 ...]
-`p' VAR1[`,' VAR2 ...]
- Print the value of a `gawk' variable or field. Fields must be
- referenced by constants:
-
- gawk> print $3
-
- This prints the third field in the input record (if the specified
- field does not exist, it prints `Null field'). A variable can be
- an array element, with the subscripts being constant values. To
- print the contents of an array, prefix the name of the array with
- the `@' symbol:
-
- gawk> print @a
-
- This prints the indices and the corresponding values for all
- elements in the array `a'.
-
-`printf' FORMAT [`,' ARG ...]
- Print formatted text. The FORMAT may include escape sequences,
- such as `\n' (*note Escape Sequences::). No newline is printed
- unless one is specified.
-
-`set' VAR`='VALUE
- Assign a constant (number or string) value to an `awk' variable or
- field. String values must be enclosed between double quotes
- (`"..."').
-
- You can also set special `awk' variables, such as `FS', `NF',
- `NR', etc.
-
-`watch' VAR | `$'N [`"EXPRESSION"']
-`w' VAR | `$'N [`"EXPRESSION"']
- Add variable VAR (or field `$N') to the watch list. The debugger
- then stops whenever the value of the variable or field changes.
- Each watched item is assigned a number which can be used to delete
- it from the watch list using the `unwatch' command.
-
- With a watchpoint, you may also supply a condition. This is an
- `awk' expression (enclosed in double quotes) that the debugger
- evaluates whenever the watchpoint is reached. If the condition is
- true, then the debugger stops execution and prompts for a command.
- Otherwise, `gawk' continues executing the program.
-
-`undisplay' [N]
- Remove item number N (or all items, if no argument) from the
- automatic display list.
-
-`unwatch' [N]
- Remove item number N (or all items, if no argument) from the watch
- list.
-
-
-
-File: gawk.info, Node: Execution Stack, Next: Debugger Info, Prev: Viewing And Changing Data, Up: List of Debugger Commands
-
-15.3.4 Dealing with the Stack
------------------------------
-
-Whenever you run a program which contains any function calls, `gawk'
-maintains a stack of all of the function calls leading up to where the
-program is right now. You can see how you got to where you are, and
-also move around in the stack to see what the state of things was in the
-functions which called the one you are in. The commands for doing this
-are:
-
-`backtrace' [COUNT]
-`bt' [COUNT]
- Print a backtrace of all function calls (stack frames), or
- innermost COUNT frames if COUNT > 0. Print the outermost COUNT
- frames if COUNT < 0. The backtrace displays the name and
- arguments to each function, the source file name, and the line
- number.
-
-`down' [COUNT]
- Move COUNT (default 1) frames down the stack toward the innermost
- frame. Then select and print the frame.
-
-`frame' [N]
-`f' [N]
- Select and print stack frame N (frame number, function and argument
- names, source file, and source line). Frame 0 is the currently
- executing, or "innermost", frame (function call); frame 1 is the
- frame that called the innermost one. The highest-numbered frame is
- the one for the main program.
-
-`up' [COUNT]
- Move COUNT (default 1) frames up the stack toward the outermost
- frame. Then select and print the frame.
-
-
-File: gawk.info, Node: Debugger Info, Next: Miscellaneous Debugger Commands, Prev: Execution Stack, Up: List of Debugger Commands
-
-15.3.5 Obtaining Information about the Program and the Debugger State
----------------------------------------------------------------------
-
-Besides looking at the values of variables, there is often a need to get
-other sorts of information about the state of your program and of the
-debugging environment itself. The `gawk' debugger has one command which
-provides this information, appropriately called `info'. `info' is used
-with one of a number of arguments that tell it exactly what you want to
-know:
-
-`info' WHAT
-`i' WHAT
- The value for WHAT should be one of the following:
-
- `args'
- Arguments of the selected frame.
-
- `break'
- List all currently set breakpoints.
-
- `display'
- List all items in the automatic display list.
-
- `frame'
- Description of the selected stack frame.
-
- `functions'
- List all function definitions including source file names and
- line numbers.
-
- `locals'
- Local variables of the selected frame.
-
- `source'
- The name of the current source file. Each time the program
- stops, the current source file is the file containing the
- current instruction. When the debugger first starts, the
- current source file is the first file included via the `-f'
- option. The `list FILENAME:LINENO' command can be used at any
- time to change the current source.
-
- `sources'
- List all program sources.
-
- `variables'
- List all global variables.
-
- `watch'
- List all items in the watch list.
-
- Additional commands give you control over the debugger, the ability
-to save the debugger's state, and the ability to run debugger commands
-from a file. The commands are:
-
-`option' [NAME[`='VALUE]]
-`o' [NAME[`='VALUE]]
- Without an argument, display the available debugger options and
- their current values. `option NAME' shows the current value of the
- named option. `option NAME=VALUE' assigns a new value to the named
- option. The available options are:
-
- `history_size'
- The maximum number of lines to keep in the history file
- `./.gawk_history'. The default is 100.
-
- `listsize'
- The number of lines that `list' prints. The default is 15.
-
- `outfile'
- Send `gawk' output to a file; debugger output still goes to
- standard output. An empty string (`""') resets output to
- standard output.
-
- `prompt'
- The debugger prompt. The default is `gawk> '.
-
- `save_history [on | off]'
- Save command history to file `./.gawk_history'. The default
- is `on'.
-
- `save_options [on | off]'
- Save current options to file `./.gawkrc' upon exit. The
- default is `on'. Options are read back in to the next
- session upon startup.
-
- `trace [on | off]'
- Turn instruction tracing on or off. The default is `off'.
-
-`save' FILENAME
- Save the commands from the current session to the given file name,
- so that they can be replayed using the `source' command.
-
-`source' FILENAME
- Run command(s) from a file; an error in any command does not
- terminate execution of subsequent commands. Comments (lines
- starting with `#') are allowed in a command file. Empty lines are
- ignored; they do _not_ repeat the last command. You can't restart
- the program by having more than one `run' command in the file.
- Also, the list of commands may include additional `source'
- commands; however, the `gawk' debugger will not source the same
- file more than once in order to avoid infinite recursion.
-
- In addition to, or instead of, the `source' command, you can use
- the `-D FILE' or `--debug=FILE' command-line options to execute
- commands from a file non-interactively (*note Options::).
-
-
-File: gawk.info, Node: Miscellaneous Debugger Commands, Prev: Debugger Info, Up: List of Debugger Commands
-
-15.3.6 Miscellaneous Commands
------------------------------
-
-There are a few more commands which do not fit into the previous
-categories, as follows:
-
-`dump' [FILENAME]
- Dump bytecode of the program to standard output or to the file
- named in FILENAME. This prints a representation of the internal
- instructions which `gawk' executes to implement the `awk' commands
- in a program. This can be very enlightening, as the following
- partial dump of Davide Brini's obfuscated code (*note Signature
- Program::) demonstrates:
-
- gawk> dump
- -| # BEGIN
- -|
- -| [ 2:0x89faef4] Op_rule : [in_rule = BEGIN] [source_file = brini.awk]
- -| [ 3:0x89fa428] Op_push_i : "~" [PERM|STRING|STRCUR]
- -| [ 3:0x89fa464] Op_push_i : "~" [PERM|STRING|STRCUR]
- -| [ 3:0x89fa450] Op_match :
- -| [ 3:0x89fa3ec] Op_store_var : O [do_reference = FALSE]
- -| [ 4:0x89fa48c] Op_push_i : "==" [PERM|STRING|STRCUR]
- -| [ 4:0x89fa4c8] Op_push_i : "==" [PERM|STRING|STRCUR]
- -| [ 4:0x89fa4b4] Op_equal :
- -| [ 4:0x89fa400] Op_store_var : o [do_reference = FALSE]
- -| [ 5:0x89fa4f0] Op_push : o
- -| [ 5:0x89fa4dc] Op_plus_i : 0 [PERM|NUMCUR|NUMBER]
- -| [ 5:0x89fa414] Op_push_lhs : o [do_reference = TRUE]
- -| [ 5:0x89fa4a0] Op_assign_plus :
- -| [ :0x89fa478] Op_pop :
- -| [ 6:0x89fa540] Op_push : O
- -| [ 6:0x89fa554] Op_push_i : "" [PERM|STRING|STRCUR]
- -| [ :0x89fa5a4] Op_no_op :
- -| [ 6:0x89fa590] Op_push : O
- -| [ :0x89fa5b8] Op_concat : [expr_count = 3] [concat_flag = 0]
- -| [ 6:0x89fa518] Op_store_var : x [do_reference = FALSE]
- -| [ 7:0x89fa504] Op_push_loop : [target_continue = 0x89fa568] [target_break = 0x89fa680]
- -| [ 7:0x89fa568] Op_push_lhs : X [do_reference = TRUE]
- -| [ 7:0x89fa52c] Op_postincrement :
- -| [ 7:0x89fa5e0] Op_push : x
- -| [ 7:0x89fa61c] Op_push : o
- -| [ 7:0x89fa5f4] Op_plus :
- -| [ 7:0x89fa644] Op_push : o
- -| [ 7:0x89fa630] Op_plus :
- -| [ 7:0x89fa5cc] Op_leq :
- -| [ :0x89fa57c] Op_jmp_false : [target_jmp = 0x89fa680]
- -| [ 7:0x89fa694] Op_push_i : "%c" [PERM|STRING|STRCUR]
- -| [ :0x89fa6d0] Op_no_op :
- -| [ 7:0x89fa608] Op_assign_concat : c
- -| [ :0x89fa6a8] Op_jmp : [target_jmp = 0x89fa568]
- -| [ :0x89fa680] Op_pop_loop :
- -|
- ...
- -|
- -| [ 8:0x89fa658] Op_K_printf : [expr_count = 17] [redir_type = ""]
- -| [ :0x89fa374] Op_no_op :
- -| [ :0x89fa3d8] Op_atexit :
- -| [ :0x89fa6bc] Op_stop :
- -| [ :0x89fa39c] Op_no_op :
- -| [ :0x89fa3b0] Op_after_beginfile :
- -| [ :0x89fa388] Op_no_op :
- -| [ :0x89fa3c4] Op_after_endfile :
- gawk>
-
-`help'
-`h'
- Print a list of all of the `gawk' debugger commands with a short
- summary of their usage. `help COMMAND' prints the information
- about the command COMMAND.
-
-`list' [`-' | `+' | N | FILENAME`:'N | N-M | FUNCTION]
-`l' [`-' | `+' | N | FILENAME`:'N | N-M | FUNCTION]
- Print the specified lines (default 15) from the current source file
- or the file named FILENAME. The possible arguments to `list' are
- as follows:
-
- `-'
- Print lines before the lines last printed.
-
- `+'
- Print lines after the lines last printed. `list' without any
- argument does the same thing.
-
- N
- Print lines centered around line number N.
-
- N-M
- Print lines from N to M.
-
- FILENAME`:'N
- Print lines centered around line number N in source file
- FILENAME. This command may change the current source file.
-
- FUNCTION
- Print lines centered around the beginning of the function
- FUNCTION. This command may change the current source file.
-
-`quit'
-`q'
- Exit the debugger. Debugging is great fun, but sometimes we all
- have to tend to other obligations in life, and sometimes we find
- the bug, and are free to go on to the next one! As we saw above,
- if you are running a program, the debugger warns you if you
- accidentally type `q' or `quit', to make sure you really want to
- quit.
-
-`trace' `on' | `off'
- Turn on or off continuous printing of the instructions that are
- about to be executed, along with the `awk' source line that they
- implement. The default is `off'.
-
- It is to be hoped that most of the "opcodes" in these instructions
- are fairly self-explanatory, and using `stepi' and `nexti' while
- `trace' is on will make them into familiar friends.
-
-
-
-File: gawk.info, Node: Readline Support, Next: Limitations, Prev: List of Debugger Commands, Up: Debugger
-
-15.4 Readline Support
-=====================
-
-If `gawk' is compiled with the `readline' library, you can take
-advantage of that library's command completion and history expansion
-features. The following types of completion are available:
-
-Command completion
- Command names.
-
-Source file name completion
- Source file names. Relevant commands are `break', `clear', `list',
- `tbreak', and `until'.
-
-Argument completion
- Non-numeric arguments to a command. Relevant commands are
- `enable' and `info'.
-
-Variable name completion
- Global variable names, and function arguments in the current
- context if the program is running. Relevant commands are `display',
- `print', `set', and `watch'.
-
-
-
-File: gawk.info, Node: Limitations, Prev: Readline Support, Up: Debugger
-
-15.5 Limitations and Future Plans
-=================================
-
-We hope you find the `gawk' debugger useful and enjoyable to work with,
-but as with any program, especially in its early releases, it still has
-some limitations. A few that are worth being aware of are:
-
- * At this point, the debugger does not give a detailed explanation of
- what you did wrong when you type in something it doesn't like.
- Rather, it just responds `syntax error'. When you do figure out
- what your mistake was, though, you'll feel like a real guru.
-
- * If you perused the dump of opcodes in *note Miscellaneous Debugger
- Commands:: (or if you are already familiar with `gawk' internals),
- you will realize that much of the internal manipulation of data in
- `gawk', as in many interpreters, is done on a stack. `Op_push',
- `Op_pop', etc., are the "bread and butter" of most `gawk' code.
- Unfortunately, as of now, the `gawk' debugger does not allow you
- to examine the stack's contents.
-
- That is, the intermediate results of expression evaluation are on
- the stack, but cannot be printed. Rather, only variables which
- are defined in the program can be printed. Of course, a
- workaround for this is to use more explicit variables at the
- debugging stage and then change back to obscure, perhaps more
- optimal code later.
-
- * There is no way to look "inside" the process of compiling regular
- expressions to see if you got it right. As an `awk' programmer,
- you are expected to know what `/[^[:alnum:][:blank:]]/' means.
-
- * The `gawk' debugger is designed to be used by running a program
- (with all its parameters) on the command line, as described in
- *note Debugger Invocation::. There is no way (as of now) to
- attach or "break in" to a running program. This seems reasonable
- for a language which is used mainly for quickly executing, short
- programs.
-
- * The `gawk' debugger only accepts source supplied with the `-f'
- option.
-
- Look forward to a future release when these and other missing
-features may be added, and of course feel free to try to add them
-yourself!
-
-
-File: gawk.info, Node: Language History, Next: Installation, Prev: Debugger, Up: Top
-
-Appendix A The Evolution of the `awk' Language
-**********************************************
-
-This Info file describes the GNU implementation of `awk', which follows
-the POSIX specification. Many long-time `awk' users learned `awk'
-programming with the original `awk' implementation in Version 7 Unix.
-(This implementation was the basis for `awk' in Berkeley Unix, through
-4.3-Reno. Subsequent versions of Berkeley Unix, and some systems
-derived from 4.4BSD-Lite, use various versions of `gawk' for their
-`awk'.) This major node briefly describes the evolution of the `awk'
-language, with cross-references to other parts of the Info file where
-you can find more information.
-
-* Menu:
-
-* V7/SVR3.1:: The major changes between V7 and System V
- Release 3.1.
-* SVR4:: Minor changes between System V Releases 3.1
- and 4.
-* POSIX:: New features from the POSIX standard.
-* BTL:: New features from Brian Kernighan's version of
- `awk'.
-* POSIX/GNU:: The extensions in `gawk' not in POSIX
- `awk'.
-* Common Extensions:: Common Extensions Summary.
-* Ranges and Locales:: How locales used to affect regexp ranges.
-* Contributors:: The major contributors to `gawk'.
-
-
-File: gawk.info, Node: V7/SVR3.1, Next: SVR4, Up: Language History
-
-A.1 Major Changes Between V7 and SVR3.1
-=======================================
-
-The `awk' language evolved considerably between the release of Version
-7 Unix (1978) and the new version that was first made generally
-available in System V Release 3.1 (1987). This minor node summarizes
-the changes, with cross-references to further details:
-
- * The requirement for `;' to separate rules on a line (*note
- Statements/Lines::).
-
- * User-defined functions and the `return' statement (*note
- User-defined::).
-
- * The `delete' statement (*note Delete::).
-
- * The `do'-`while' statement (*note Do Statement::).
-
- * The built-in functions `atan2()', `cos()', `sin()', `rand()', and
- `srand()' (*note Numeric Functions::).
-
- * The built-in functions `gsub()', `sub()', and `match()' (*note
- String Functions::).
-
- * The built-in functions `close()' and `system()' (*note I/O
- Functions::).
-
- * The `ARGC', `ARGV', `FNR', `RLENGTH', `RSTART', and `SUBSEP'
- built-in variables (*note Built-in Variables::).
-
- * Assignable `$0' (*note Changing Fields::).
-
- * The conditional expression using the ternary operator `?:' (*note
- Conditional Exp::).
-
- * The expression `INDEX-VARIABLE in ARRAY' outside of `for'
- statements (*note Reference to Elements::).
-
- * The exponentiation operator `^' (*note Arithmetic Ops::) and its
- assignment operator form `^=' (*note Assignment Ops::).
-
- * C-compatible operator precedence, which breaks some old `awk'
- programs (*note Precedence::).
-
- * Regexps as the value of `FS' (*note Field Separators::) and as the
- third argument to the `split()' function (*note String
- Functions::), rather than using only the first character of `FS'.
-
- * Dynamic regexps as operands of the `~' and `!~' operators (*note
- Regexp Usage::).
-
- * The escape sequences `\b', `\f', and `\r' (*note Escape
- Sequences::). (Some vendors have updated their old versions of
- `awk' to recognize `\b', `\f', and `\r', but this is not something
- you can rely on.)
-
- * Redirection of input for the `getline' function (*note Getline::).
-
- * Multiple `BEGIN' and `END' rules (*note BEGIN/END::).
-
- * Multidimensional arrays (*note Multi-dimensional::).
-
-
-File: gawk.info, Node: SVR4, Next: POSIX, Prev: V7/SVR3.1, Up: Language History
-
-A.2 Changes Between SVR3.1 and SVR4
-===================================
-
-The System V Release 4 (1989) version of Unix `awk' added these features
-(some of which originated in `gawk'):
-
- * The `ENVIRON' array (*note Built-in Variables::).
-
- * Multiple `-f' options on the command line (*note Options::).
-
- * The `-v' option for assigning variables before program execution
- begins (*note Options::).
-
- * The `--' option for terminating command-line options.
-
- * The `\a', `\v', and `\x' escape sequences (*note Escape
- Sequences::).
-
- * A defined return value for the `srand()' built-in function (*note
- Numeric Functions::).
-
- * The `toupper()' and `tolower()' built-in string functions for case
- translation (*note String Functions::).
-
- * A cleaner specification for the `%c' format-control letter in the
- `printf' function (*note Control Letters::).
-
- * The ability to dynamically pass the field width and precision
- (`"%*.*d"') in the argument list of the `printf' function (*note
- Control Letters::).
-
- * The use of regexp constants, such as `/foo/', as expressions, where
- they are equivalent to using the matching operator, as in `$0 ~
- /foo/' (*note Using Constant Regexps::).
-
- * Processing of escape sequences inside command-line variable
- assignments (*note Assignment Options::).
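A minimal sketch of some of these SVR4 additions follows, assuming any POSIX `awk': the `ENVIRON' array, the `-v' option, `toupper()', and a regexp constant used as a value. The function name `svr4_features' and the environment variable `GREETING' are illustrative only; neither is defined by `awk' itself.

```shell
#!/bin/sh
# Sketch: SVR4-era awk features. svr4_features and GREETING are hypothetical.
svr4_features() {
    GREETING=hello awk -v n=3 'BEGIN {
        # ENVIRON array, toupper(), and a variable preset with -v
        print toupper(ENVIRON["GREETING"]), n
    }'
    # A regexp constant used as a value is shorthand for $0 ~ /foo/
    echo foobar | awk '{ matched = /foo/; print matched }'
}

svr4_features    # prints "HELLO 3", then "1"
```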
-
-
-File: gawk.info, Node: POSIX, Next: BTL, Prev: SVR4, Up: Language History
-
-A.3 Changes Between SVR4 and POSIX `awk'
-========================================
-
-The POSIX Command Language and Utilities standard for `awk' (1992)
-introduced the following changes into the language:
-
- * The use of `-W' for implementation-specific options (*note
- Options::).
-
- * The use of `CONVFMT' for controlling the conversion of numbers to
- strings (*note Conversion::).
-
- * The concept of a numeric string and tighter comparison rules to go
- with it (*note Typing and Comparison::).
-
- * The use of built-in variables as function parameter names is
- forbidden (*note Definition Syntax::).
-
- * More complete documentation of many of the previously undocumented
- features of the language.
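The following sketch, assuming a POSIX-compliant `awk', shows `CONVFMT' controlling number-to-string conversion, and the difference between comparing two string constants and comparing numeric strings read from input. The function name `posix_rules' is hypothetical. Because `print' formats a number with `OFMT' rather than `CONVFMT', the sketch forces the conversion with a concatenation.

```shell
#!/bin/sh
# Sketch: CONVFMT and numeric-string comparison. posix_rules is hypothetical.
posix_rules() {
    awk 'BEGIN {
        CONVFMT = "%.2g"
        a = 3.14159
        b = a ""              # concatenation converts a using CONVFMT
        print b               # 3.1
        print ("10" > "9")    # string constants compare as strings: 0
    }'
    # Input fields that look like numbers are numeric strings,
    # so this comparison is numeric: 10 > 9 is true.
    echo '10 9' | awk '{ print ($1 > $2) }'
}

posix_rules    # prints 3.1, 0, and 1 on separate lines
```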
-
- *Note Common Extensions::, for a list of common extensions not
-permitted by the POSIX standard.
-
- The 2008 POSIX standard can be found online at
-`http://www.opengroup.org/onlinepubs/9699919799/'.
-
-
-File: gawk.info, Node: BTL, Next: POSIX/GNU, Prev: POSIX, Up: Language History
-
-A.4 Extensions in Brian Kernighan's `awk'
-=========================================
-
-Brian Kernighan has made his version available via his home page (*note
-Other Versions::).
-
- This minor node describes common extensions that originally appeared
-in his version of `awk'.
-
- * The `**' and `**=' operators (*note Arithmetic Ops:: and *note
- Assignment Ops::).
-
- * The use of `func' as an abbreviation for `function' (*note
- Definition Syntax::).
-
- * The `fflush()' built-in function for flushing buffered output
- (*note I/O Functions::).
-
-
- *Note Common Extensions::, for a full list of the extensions
-available in his `awk'.
-
-
-File: gawk.info, Node: POSIX/GNU, Next: Common Extensions, Prev: BTL, Up: Language History
-
-A.5 Extensions in `gawk' Not in POSIX `awk'
-===========================================
-
-The GNU implementation, `gawk', adds a large number of features. They
-can all be disabled with either the `--traditional' or `--posix' options
-(*note Options::).
-
- A number of features have come and gone over the years. This minor
-node summarizes the additional features over POSIX `awk' that are in
-the current version of `gawk'.
-
- * Additional built-in variables:
-
- - The `ARGIND', `BINMODE', `ERRNO', `FIELDWIDTHS', `FPAT',
- `IGNORECASE', `LINT', `PROCINFO', `RT', and `TEXTDOMAIN'
- variables (*note Built-in Variables::).
-
- * Special files in I/O redirections:
-
- - The `/dev/stdin', `/dev/stdout', `/dev/stderr' and
- `/dev/fd/N' special file names (*note Special Files::).
-
- - The `/inet', `/inet4', and `/inet6' special files for TCP/IP
- networking using `|&'; the file name specifies which version of
- the IP protocol to use (*note TCP/IP Networking::).
-
- * Changes and/or additions to the language:
-
- - The `\x' escape sequence (*note Escape Sequences::).
-
- - Full support for both POSIX and GNU regexps (*note Regexp::).
-
- - The ability for `FS' and for the third argument to `split()'
- to be null strings (*note Single Character Fields::).
-
- - The ability for `RS' to be a regexp (*note Records::).
-
- - The ability to use octal and hexadecimal constants in `awk'
- program source code (*note Nondecimal-numbers::).
-
- - The `|&' operator for two-way I/O to a coprocess (*note
- Two-way I/O::).
-
- - Indirect function calls (*note Indirect Calls::).
-
- - Directories on the command line produce a warning and are
- skipped (*note Command line directories::).
-
- * New keywords:
-
- - The `BEGINFILE' and `ENDFILE' special patterns (*note
- BEGINFILE/ENDFILE::).
-
- - The ability to delete all of an array at once with `delete
- ARRAY' (*note Delete::).
-
- - The `nextfile' statement (*note Nextfile Statement::).
-
- - The `switch' statement (*note Switch Statement::).
-
- * Changes to standard `awk' functions:
-
- - The optional second argument to `close()' that allows closing
- one end of a two-way pipe to a coprocess (*note Two-way
- I/O::).
-
- - POSIX compliance for `gsub()' and `sub()'.
-
- - The `length()' function accepts an array argument and returns
- the number of elements in the array (*note String
- Functions::).
-
- - The optional third argument to the `match()' function for
- capturing text-matching subexpressions within a regexp (*note
- String Functions::).
-
- - Positional specifiers in `printf' formats for making
- translations easier (*note Printf Ordering::).
-
- - The optional fourth argument to the `split()' function, an
- array that holds the text of the field separators (*note
- String Functions::).
-
- * Additional functions only in `gawk':
-
- - The `and()', `compl()', `lshift()', `or()', `rshift()', and
- `xor()' functions for bit manipulation (*note Bitwise
- Functions::).
-
- - The `asort()' and `asorti()' functions for sorting arrays
- (*note Array Sorting::).
-
- - The `bindtextdomain()', `dcgettext()' and `dcngettext()'
- functions for internationalization (*note Programmer i18n::).
-
- - The `extension()' built-in function and the ability to add
- new functions dynamically (*note Dynamic Extensions::).
-
- - The `fflush()' function from Brian Kernighan's version of
- `awk' (*note I/O Functions::).
-
- - The `gensub()', `patsplit()', and `strtonum()' functions for
- more powerful text manipulation (*note String Functions::).
-
- - The `mktime()', `systime()', and `strftime()' functions for
- working with timestamps (*note Time Functions::).
-
- * Changes and/or additions in the command-line options:
-
- - The `AWKPATH' environment variable for specifying a path
- search for the `-f' command-line option (*note Options::).
-
- - The `AWKLIBPATH' environment variable for specifying a path
- search for the `-l' command-line option (*note Options::).
-
- - The ability to use GNU-style long-named options that start
- with `--' and the `--characters-as-bytes', `--compat',
- `--dump-variables', `--exec', `--gen-pot', `--lint',
- `--lint-old', `--non-decimal-data', `--posix', `--profile',
- `--re-interval', `--sandbox', `--source', `--traditional', and
- `--use-lc-numeric' options (*note Options::).
-
- * Support for the following obsolete systems was removed from the
- code and the documentation for `gawk' version 4.0:
-
- - Amiga
-
- - Atari
-
- - BeOS
-
- - Cray
-
- - MIPS RiscOS
-
- - MS-DOS with the Microsoft Compiler
-
- - MS-Windows with the Microsoft Compiler
-
- - NeXT
-
- - SunOS 3.x, Sun 386 (Road Runner)
-
- - Tandem (non-POSIX)
-
- - Prestandard VAX C compiler for VAX/VMS
-
-
-
-
-File: gawk.info, Node: Common Extensions, Next: Ranges and Locales, Prev: POSIX/GNU, Up: Language History
-
-A.6 Common Extensions Summary
-=============================
-
-This minor node summarizes the common extensions supported by `gawk',
-Brian Kernighan's `awk', and `mawk', the three most widely-used freely
-available versions of `awk' (*note Other Versions::).
-
-Feature BWK Awk Mawk GNU Awk
---------------------------------------------------------
-`\x' Escape sequence X X X
-`RS' as regexp X X
-`FS' as null string X X X
-`/dev/stdin' special file X X
-`/dev/stdout' special file X X X
-`/dev/stderr' special file X X X
-`**' and `**=' operators X X
-`func' keyword X X
-`nextfile' statement X X X
-`delete' without subscript X X X
-`length()' of an array X X
-`fflush()' function X X X
-`BINMODE' variable X X
-
-
-File: gawk.info, Node: Ranges and Locales, Next: Contributors, Prev: Common Extensions, Up: Language History
-
-A.7 Regexp Ranges and Locales: A Long Sad Story
-===============================================
-
-This minor node describes the confusing history of ranges within
-regular expressions and their interactions with locales, and how this
-affected different versions of `gawk'.
-
- The original Unix tools that worked with regular expressions defined
-character ranges (such as `[a-z]') to match any character between the
-first character in the range and the last character in the range,
-inclusive. Ordering was based on the numeric value of each character
-in the machine's native character set. Thus, on ASCII-based systems,
-`[a-z]' matched all the lowercase letters, and only the lowercase
-letters, since the numeric values for the letters from `a' through `z'
-were contiguous. (On an EBCDIC system, the range `[a-z]' includes
-additional, non-alphabetic characters as well.)
-
- Almost all introductory Unix literature explained range expressions
-as working in this fashion, and in particular, would teach that the
-"correct" way to match lowercase letters was with `[a-z]', and that
-`[A-Z]' was the "correct" way to match uppercase letters. And indeed,
-this was true.
-
- The 1993 POSIX standard introduced the idea of locales (*note
-Locales::). Since many locales include other letters besides the plain
-twenty-six letters of the American English alphabet, the POSIX standard
-added character classes (*note Bracket Expressions::) as a way to match
-different kinds of characters besides the traditional ones in the ASCII
-character set.
-
- However, the standard _changed_ the interpretation of range
-expressions. In the `"C"' and `"POSIX"' locales, a range expression
-like `[a-dx-z]' is still equivalent to `[abcdxyz]', as in ASCII. But
-outside those locales, the ordering was defined to be based on
-"collation order".
-
- In many locales, `A' and `a' are both less than `B'. In other
-words, these locales sort characters in dictionary order, and
-`[a-dx-z]' is typically not equivalent to `[abcdxyz]'; instead it might
-be equivalent to `[aBbCcdXxYyz]', for example.
-
- This point needs to be emphasized: Much literature teaches that you
-should use `[a-z]' to match a lowercase character. But on systems with
-non-ASCII locales, this also matched all of the uppercase characters
-except `Z'! This was a constant source of confusion, even well into
-the twenty-first century.
-
- To demonstrate these issues, the following example uses the `sub()'
-function, which does text replacement (*note String Functions::). Here,
-the intent is to remove trailing uppercase characters:
-
- $ echo something1234abc | gawk-3.1.8 '{ sub("[A-Z]*$", ""); print }'
- -| something1234a
-
-This output is unexpected, since the `bc' at the end of
-`something1234abc' should not normally match `[A-Z]*'. This result is
-due to the locale setting (and thus you may not see it on your system).
-
- Similar considerations apply to other ranges. For example, `["-/]'
-is perfectly valid in ASCII, but is not valid in many Unicode locales,
-such as `en_US.UTF-8'.
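Two portable ways to write the command shown earlier are sketched below: use a character class, whose meaning does not depend on collation order, or force the `"C"' locale, where `[A-Z]' keeps its traditional meaning. This assumes an `awk' that supports POSIX character classes; the function names `strip_upper_class' and `strip_upper_c_locale' are hypothetical.

```shell
#!/bin/sh
# Sketch: strip trailing uppercase letters without depending on the locale.
# Both function names are hypothetical.
strip_upper_class() {
    # [[:upper:]] means "an uppercase letter" in every locale.
    printf '%s\n' "$1" | awk '{ sub(/[[:upper:]]*$/, ""); print }'
}

strip_upper_c_locale() {
    # In the "C" locale, [A-Z] keeps its traditional ASCII meaning.
    printf '%s\n' "$1" | LC_ALL=C awk '{ sub(/[A-Z]*$/, ""); print }'
}

strip_upper_class something1234abc      # prints: something1234abc
strip_upper_class somethingABC          # prints: something
strip_upper_c_locale something1234abc   # prints: something1234abc
```

For `something1234abc', the regexp can only match the empty string at the end of the record, so `sub()' leaves the text unchanged.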
-
- Early versions of `gawk' used regexp matching code that was not
-locale aware, so ranges had their traditional interpretation.
-
- When `gawk' switched to using locale-aware regexp matchers, the
-problems began, especially as both GNU/Linux and commercial Unix
-vendors started implementing non-ASCII locales, _and making them the
-default_. Perhaps the most frequently asked question became something
-like "why does `[A-Z]' match lowercase letters?!?"
-
- This situation persisted for ten years or more, and the
-`gawk' maintainer grew weary of trying to explain that `gawk' was being
-nicely standards-compliant, and that the issue was in the user's
-locale. During the development of version 4.0, he modified `gawk' to
-always treat ranges in the original, pre-POSIX fashion, unless
-`--posix' was used (*note Options::).
-
- Fortunately, shortly before the final release of `gawk' 4.0, the
-maintainer learned that the 2008 standard had changed the definition of
-ranges, such that outside the `"C"' and `"POSIX"' locales, the meaning
-of range expressions was _undefined_.(1)
-
- By using this lovely technical term, the standard gives license to
-implementors to implement ranges in whatever way they choose. The
-`gawk' maintainer chose to apply the pre-POSIX meaning in all cases:
-with the default regexp matching, with `--traditional', and with
-`--posix'. In all cases, `gawk' remains POSIX compliant.
-
- ---------- Footnotes ----------
-
- (1) See the standard
-(http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05)
-and its rationale
-(http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05).
-
-
-File: gawk.info, Node: Contributors, Prev: Ranges and Locales, Up: Language History
-
-A.8 Major Contributors to `gawk'
-================================
-
- Always give credit where credit is due.
- Anonymous
-
- This minor node names the major contributors to `gawk' and/or this
-Info file, in approximate chronological order:
-
- * Dr. Alfred V. Aho, Dr. Peter J. Weinberger, and Dr. Brian W.
- Kernighan, all of Bell Laboratories, designed and implemented Unix
- `awk', from which `gawk' gets the majority of its feature set.
-
- * Paul Rubin did the initial design and implementation in 1986, and
- wrote the first draft (around 40 pages) of this Info file.
-
- * Jay Fenlason finished the initial implementation.
-
- * Diane Close revised the first draft of this Info file, bringing it
- to around 90 pages.
-
- * Richard Stallman helped finish the implementation and the initial
- draft of this Info file. He is also the founder of the FSF and
- the GNU project.
-
- * John Woods contributed parts of the code (mostly fixes) in the
- initial version of `gawk'.
-
- * In 1988, David Trueman took over primary maintenance of `gawk',
- making it compatible with "new" `awk', and greatly improving its
- performance.
-
- * Conrad Kwok, Scott Garfinkle, and Kent Williams did the initial
- ports to MS-DOS with various versions of MSC.
-
- * Pat Rankin provided the VMS port and its documentation.
-
- * Hal Peterson provided help in porting `gawk' to Cray systems.
- (This is no longer supported.)
-
- * Kai Uwe Rommel provided the initial port to OS/2 and its
- documentation.
-
- * Michal Jaegermann provided the port to Atari systems and its
- documentation. (This port is no longer supported.) He continues
- to provide portability checking with DEC Alpha systems, and has
- done a lot of work to make sure `gawk' works on non-32-bit systems.
-
- * Fred Fish provided the port to Amiga systems and its documentation.
- (With Fred's sad passing, this is no longer supported.)
-
- * Scott Deifik currently maintains the MS-DOS port using DJGPP.
-
- * Eli Zaretskii currently maintains the MS-Windows port using MinGW.
-
- * Juan Grigera provided a port to Windows32 systems. (This is no
- longer supported.)
-
- * For many years, Dr. Darrel Hankerson acted as coordinator for the
- various ports to different PC platforms and created binary
- distributions for various PC operating systems. He was also
- instrumental in keeping the documentation up to date for the
- various PC platforms.
-
- * Christos Zoulas provided the `extension()' built-in function for
- dynamically adding new modules.
-
- * Jürgen Kahrs contributed the initial version of the TCP/IP
- networking code and documentation, and motivated the inclusion of
- the `|&' operator.
-
- * Stephen Davies provided the initial port to Tandem systems and its
- documentation. (However, this is no longer supported.) He was
- also instrumental in the initial work to integrate the byte-code
- internals into the `gawk' code base.
-
- * Matthew Woehlke provided improvements for Tandem's POSIX-compliant
- systems.
-
- * Martin Brown provided the port to BeOS and its documentation.
- (This is no longer supported.)
-
- * Arno Peters did the initial work to convert `gawk' to use GNU
- Automake and GNU `gettext'.
-
- * Alan J. Broder provided the initial version of the `asort()'
- function as well as the code for the optional third argument to the
- `match()' function.
-
- * Andreas Buening updated the `gawk' port for OS/2.
-
- * Isamu Hasegawa, of IBM in Japan, contributed support for multibyte
- characters.
-
- * Michael Benzinger contributed the initial code for `switch'
- statements.
-
- * Patrick T.J. McPhee contributed the code for dynamic loading in
- Windows32 environments. (This is no longer supported.)
-
- * John Haque reworked the `gawk' internals to use a byte-code engine,
- providing the `gawk' debugger for `awk' programs.
-
- * Efraim Yawitz contributed the original text for *note Debugger::.
-
- * Arnold Robbins has been working on `gawk' since 1988, at first
- helping David Trueman, and as the primary maintainer since around
- 1994.
-
-
-File: gawk.info, Node: Installation, Next: Notes, Prev: Language History, Up: Top
-
-Appendix B Installing `gawk'
-****************************
-
-This appendix provides instructions for installing `gawk' on the
-various platforms that are supported by the developers. The primary
-developer supports GNU/Linux (and Unix), whereas the other ports are
-contributed. *Note Bugs::, for the electronic mail addresses of the
-people who did the respective ports.
-
-* Menu:
-
-* Gawk Distribution:: What is in the `gawk' distribution.
-* Unix Installation:: Installing `gawk' under various
- versions of Unix.
-* Non-Unix Installation:: Installation on Other Operating Systems.
-* Bugs:: Reporting Problems and Bugs.
-* Other Versions:: Other freely available `awk'
- implementations.
-
-
-File: gawk.info, Node: Gawk Distribution, Next: Unix Installation, Up: Installation
-
-B.1 The `gawk' Distribution
-===========================
-
-This minor node describes how to get the `gawk' distribution, how to
-extract it, and then what is in the various files and subdirectories.
-
-* Menu:
-
-* Getting:: How to get the distribution.
-* Extracting:: How to extract the distribution.
-* Distribution contents:: What is in the distribution.
-
-
-File: gawk.info, Node: Getting, Next: Extracting, Up: Gawk Distribution
-
-B.1.1 Getting the `gawk' Distribution
--------------------------------------
-
-There are two ways to get GNU software:
-
- * Copy it from someone else who already has it.
-
- * Retrieve `gawk' from the Internet host `ftp.gnu.org', in the
- directory `/gnu/gawk'. Both anonymous `ftp' and `http' access are
- supported. If you have the `wget' program, you can use a command
- like the following:
-
- wget http://ftp.gnu.org/gnu/gawk/gawk-4.0.1.tar.gz
-
- The GNU software archive is mirrored around the world. The
-up-to-date list of mirror sites is available from the main FSF web site
-(http://www.gnu.org/order/ftp.html). Try to use one of the mirrors;
-they will be less busy, and you can usually find one closer to your
-site.
-
-
-File: gawk.info, Node: Extracting, Next: Distribution contents, Prev: Getting, Up: Gawk Distribution
-
-B.1.2 Extracting the Distribution
----------------------------------
-
-`gawk' is distributed as several `tar' files compressed with different
-compression programs: `gzip', `bzip2', and `xz'. For simplicity, the
-rest of these instructions assume you are using the one compressed with
-the GNU Zip program, `gzip'.
-
- Once you have the distribution (for example, `gawk-4.0.1.tar.gz'),
-use `gzip' to expand the file and then use `tar' to extract it. You
-can use the following pipeline to produce the `gawk' distribution:
-
- # Under System V, add 'o' to the tar options
- gzip -d -c gawk-4.0.1.tar.gz | tar -xvpf -
-
- On a system with GNU `tar', you can let `tar' do the decompression
-for you:
-
- tar -xvpzf gawk-4.0.1.tar.gz
-
-Extracting the archive creates a directory named `gawk-4.0.1' in the
-current directory.
-
- The distribution file name is of the form `gawk-V.R.P.tar.gz'. The
-V represents the major version of `gawk', the R represents the current
-release of version V, and the P represents a "patch level", meaning
-that minor bugs have been fixed in the release. The current patch
-level is 1, but when retrieving distributions, you should get the
-version with the highest version, release, and patch level. (Note,
-however, that patch levels greater than or equal to 70 denote "beta" or
-nonproduction software; you might not want to retrieve such a version
-unless you don't mind experimenting.) If you are not on a Unix or
-GNU/Linux system, you need to make other arrangements for getting and
-extracting the `gawk' distribution. You should consult a local expert.
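To make the naming rule concrete, here is a small shell sketch that extracts V.R.P from a distribution file name and applies the "patch level 70 or greater means beta" rule. The function name `classify_gawk_tarball' is hypothetical, only the `.tar.gz' form is handled, and `gawk-4.0.75.tar.gz' below is an invented name used purely to show the beta case.

```shell
#!/bin/sh
# Sketch: classify a gawk-V.R.P.tar.gz name as a release or a beta.
# classify_gawk_tarball is hypothetical; other suffixes are not handled.
classify_gawk_tarball() {
    name=${1##*/}            # drop any leading directory
    ver=${name#gawk-}        # drop the "gawk-" prefix
    ver=${ver%.tar.gz}       # drop the ".tar.gz" suffix, leaving V.R.P
    patch=${ver##*.}         # the last dotted component is P
    if [ "$patch" -ge 70 ]; then
        echo "$ver beta"
    else
        echo "$ver release"
    fi
}

classify_gawk_tarball gawk-4.0.1.tar.gz    # prints: 4.0.1 release
classify_gawk_tarball gawk-4.0.75.tar.gz   # prints: 4.0.75 beta
```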
-
-
-File: gawk.info, Node: Distribution contents, Prev: Extracting, Up: Gawk Distribution
-
-B.1.3 Contents of the `gawk' Distribution
------------------------------------------
-
-The `gawk' distribution has a number of C source files, documentation
-files, subdirectories, and files related to the configuration process
-(*note Unix Installation::), as well as several subdirectories related
-to different non-Unix operating systems:
-
-Various `.c', `.y', and `.h' files
- The actual `gawk' source code.
-
-`README'
-`README_d/README.*'
- Descriptive files: `README' for `gawk' under Unix and the rest for
- the various hardware and software combinations.
-
-`INSTALL'
- A file providing an overview of the configuration and installation
- process.
-
-`ChangeLog'
- A detailed list of source code changes as bugs are fixed or
- improvements made.
-
-`ChangeLog.0'
- An older list of source code changes.
-
-`NEWS'
- A list of changes to `gawk' since the last release or patch.
-
-`NEWS.0'
- An older list of changes to `gawk'.
-
-`COPYING'
- The GNU General Public License.
-
-`FUTURES'
- A brief list of features and changes being contemplated for future
- releases, with some indication of the time frame for the feature,
- based on its difficulty.
-
-`LIMITATIONS'
- A list of those factors that limit `gawk''s performance. Most of
- these depend on the hardware or operating system software and are
- not limits in `gawk' itself.
-
-`POSIX.STD'
- A description of behaviors in the POSIX standard for `awk' which
- are left undefined, or where `gawk' may not comply fully, as well
- as a list of things that the POSIX standard should describe but
- does not.
-
-`doc/awkforai.txt'
- A short article describing why `gawk' is a good language for
- Artificial Intelligence (AI) programming.
-
-`doc/bc_notes'
- A brief description of `gawk''s "byte code" internals.
-
-`doc/README.card'
-`doc/ad.block'
-`doc/awkcard.in'
-`doc/cardfonts'
-`doc/colors'
-`doc/macros'
-`doc/no.colors'
-`doc/setter.outline'
- The `troff' source for a five-color `awk' reference card. A
- modern version of `troff' such as GNU `troff' (`groff') is needed
- to produce the color version. See the file `README.card' for
- instructions if you have an older `troff'.
-
-`doc/gawk.1'
- The `troff' source for a manual page describing `gawk'. This is
- distributed for the convenience of Unix users.
-
-`doc/gawk.texi'
- The Texinfo source file for this Info file. It should be
- processed with TeX (via `texi2dvi' or `texi2pdf') to produce a
- printed document, and with `makeinfo' to produce an Info or HTML
- file.
-
-`doc/gawk.info'
- This Info file, generated from `doc/gawk.texi' by `makeinfo'.
-
-`doc/gawkinet.texi'
- The Texinfo source file for `TCP/IP Internetworking with `gawk''
- (*note (gawkinet)Top::). It should be processed with TeX (via
- `texi2dvi' or `texi2pdf') to produce a printed document, and with
- `makeinfo' to produce an Info or HTML
- file.
-
-`doc/gawkinet.info'
- The generated Info file for `TCP/IP Internetworking with `gawk''.
-
-`doc/igawk.1'
- The `troff' source for a manual page describing the `igawk'
- program presented in *note Igawk Program::.
-
-`doc/Makefile.in'
- The input file used during the configuration process to generate
- the actual `Makefile' for creating the documentation.
-
-`Makefile.am'
-`*/Makefile.am'
- Files used by the GNU `automake' software for generating the
- `Makefile.in' files used by `autoconf' and `configure'.
-
-`Makefile.in'
-`aclocal.m4'
-`configh.in'
-`configure.ac'
-`configure'
-`custom.h'
-`missing_d/*'
-`m4/*'
- These files and subdirectories are used when configuring `gawk'
- for various Unix systems. They are explained in *note Unix
- Installation::.
-
-`po/*'
- The `po' directory contains message translations.
-
-`awklib/extract.awk'
-`awklib/Makefile.am'
-`awklib/Makefile.in'
-`awklib/eg/*'
- The `awklib' directory contains a copy of `extract.awk' (*note
- Extract Program::), which can be used to extract the sample
- programs from the Texinfo source file for this Info file. It also
- contains a `Makefile.in' file, which `configure' uses to generate
- a `Makefile'. `Makefile.am' is used by GNU Automake to create
- `Makefile.in'. The library functions from *note Library
- Functions::, and the `igawk' program from *note Igawk Program::,
- are included as ready-to-use files in the `gawk' distribution.
- They are installed as part of the installation process. The rest
- of the programs in this Info file are available in appropriate
- subdirectories of `awklib/eg'.
-
-`posix/*'
- Files needed for building `gawk' on POSIX-compliant systems.
-
-`pc/*'
- Files needed for building `gawk' under MS-Windows and OS/2 (*note
- PC Installation::, for details).
-
-`vms/*'
- Files needed for building `gawk' under VMS (*note VMS
- Installation::, for details).
-
-`test/*'
- A test suite for `gawk'. You can use `make check' from the
- top-level `gawk' directory to run your version of `gawk' against
- the test suite. If `gawk' successfully passes `make check', then
- you can be confident of a successful port.
-
-
-File: gawk.info, Node: Unix Installation, Next: Non-Unix Installation, Prev: Gawk Distribution, Up: Installation
-
-B.2 Compiling and Installing `gawk' on Unix-like Systems
-========================================================
-
-Usually, you can compile and install `gawk' by typing only two
-commands. However, if you use an unusual system, you may need to
-configure `gawk' for your system yourself.
-
-* Menu:
-
-* Quick Installation:: Compiling `gawk' under Unix.
-* Additional Configuration Options:: Other compile-time options.
-* Configuration Philosophy:: How it's all supposed to work.
-
-
-File: gawk.info, Node: Quick Installation, Next: Additional Configuration Options, Up: Unix Installation
-
-B.2.1 Compiling `gawk' for Unix-like Systems
---------------------------------------------
-
-The normal installation steps should work on all modern commercial
-Unix-derived systems, GNU/Linux, BSD-based systems, and the Cygwin
-environment for MS-Windows.
-
- After you have extracted the `gawk' distribution, `cd' to
-`gawk-4.0.1'. Like most GNU software, `gawk' is configured
-automatically for your system by running the `configure' program. This
-program is a Bourne shell script that is generated automatically using
-GNU `autoconf'. (The `autoconf' software is described fully in
-`Autoconf--Generating Automatic Configuration Scripts' (*note
-(autoconf)Top::).)
-
- To configure `gawk', simply run `configure':
-
- sh ./configure
-
- This produces a `Makefile' and `config.h' tailored to your system.
-The `config.h' file describes various facts about your system. You
-might want to edit the `Makefile' to change the `CFLAGS' variable,
-which controls the command-line options that are passed to the C
-compiler (such as optimization levels or compiling for debugging).
-
- Alternatively, you can add your own values for most `make' variables
-on the command line, such as `CC' and `CFLAGS', when running
-`configure':
-
- CC=cc CFLAGS=-g sh ./configure
-
-See the file `INSTALL' in the `gawk' distribution for all the details.
-
- After you have run `configure' and possibly edited the `Makefile',
-type:
-
- make
-
-Shortly thereafter, you should have an executable version of `gawk'.
-That's all there is to it! To verify that `gawk' is working properly,
-run `make check'. All of the tests should succeed. If these steps do
-not work, or if any of the tests fail, check the files in the
-`README_d' directory to see if you've found a known problem. If the
-failure is not described there, please send in a bug report (*note
-Bugs::).
-
-
-File: gawk.info, Node: Additional Configuration Options, Next: Configuration Philosophy, Prev: Quick Installation, Up: Unix Installation
-
-B.2.2 Additional Configuration Options
---------------------------------------
-
-There are several additional options you may use on the `configure'
-command line when compiling `gawk' from scratch, including:
-
-`--disable-lint'
- Disable all lint checking within `gawk'. The `--lint' and
- `--lint-old' options (*note Options::) are accepted, but silently
- do nothing. Similarly, setting the `LINT' variable (*note
- User-modified::) has no effect on the running `awk' program.
-
- When used with GCC's automatic dead-code-elimination, this option
- cuts almost 200K bytes off the size of the `gawk' executable on
- GNU/Linux x86 systems. Results on other systems and with other
- compilers are likely to vary. Using this option may bring you
- some slight performance improvement.
-
- Using this option will cause some of the tests in the test suite
- to fail. This option may be removed at a later date.
-
-`--disable-nls'
- Disable all message-translation facilities. This is usually not
- desirable, but it may bring you some slight performance
- improvement.
-
-`--with-whiny-user-strftime'
- Force use of the included version of the `strftime()' function for
- deficient systems.
-
- Use the command `./configure --help' to see the full list of options
-that `configure' supplies.
-
-
-File: gawk.info, Node: Configuration Philosophy, Prev: Additional Configuration Options, Up: Unix Installation
-
-B.2.3 The Configuration Process
--------------------------------
-
-This minor node is of interest only if you know something about using
-the C language and Unix-like operating systems.
-
- The source code for `gawk' generally attempts to adhere to formal
-standards wherever possible. This means that `gawk' uses library
-routines that are specified by the ISO C standard and by the POSIX
-operating system interface standard. The `gawk' source code requires
-using an ISO C compiler (the 1990 standard).
-
- Many Unix systems do not support all of either the ISO or the POSIX
-standards. The `missing_d' subdirectory in the `gawk' distribution
-contains replacement versions of those functions that are most likely
-to be missing.
-
- The `config.h' file that `configure' creates contains definitions
-that describe features of the particular operating system where you are
-attempting to compile `gawk'. The three things described by this file
-are: what header files are available, so that they can be correctly
-included; what (supposedly) standard functions are actually available
-in your C libraries; and various miscellaneous facts about your
-operating system. For example, there may not be an `st_blksize'
-element in the `stat' structure. In this case, `HAVE_ST_BLKSIZE' is
-undefined.
-
- It is possible for your C compiler to lie to `configure'. It may do
-so by not exiting with an error when a library function is not
-available. To get around this, edit the file `custom.h'. Use an
-`#ifdef' that is appropriate for your system, and either `#define' any
-constants that `configure' should have defined but didn't, or `#undef'
-any constants that `configure' defined and should not have. `custom.h'
-is automatically included by `config.h'.
-
- It is also possible that the `configure' program generated by
-`autoconf' will not work on your system in some other fashion. If you
-do have a problem, note that `configure.ac' is the input for `autoconf'.
-You may be able to change this file and generate a new version of
-`configure' that works on your system (*note Bugs::, for information on
-how to report problems in configuring `gawk'). The same mechanism may
-be used to send in updates to `configure.ac' and/or `custom.h'.
-
-
-File: gawk.info, Node: Non-Unix Installation, Next: Bugs, Prev: Unix Installation, Up: Installation
-
-B.3 Installation on Other Operating Systems
-===========================================
-
-This minor node describes how to install `gawk' on various non-Unix
-systems.
-
-* Menu:
-
-* PC Installation:: Installing and Compiling `gawk' on
- MS-DOS and OS/2.
-* VMS Installation:: Installing `gawk' on VMS.
-
-
-File: gawk.info, Node: PC Installation, Next: VMS Installation, Up: Non-Unix Installation
-
-B.3.1 Installation on PC Operating Systems
-------------------------------------------
-
-This minor node covers installation and usage of `gawk' on x86 machines
-running MS-DOS, any version of MS-Windows, or OS/2. In this minor
-node, the term "Windows32" refers to any of Microsoft
-Windows-95/98/ME/NT/2000/XP/Vista/7.
-
- The limitations of MS-DOS (and MS-DOS shells under Windows32 or
-OS/2) have meant that various "DOS extenders" are often used with
-programs such as `gawk'. The varying capabilities of Microsoft Windows
-3.1 and Windows32 can add to the confusion. For an overview of the
-considerations, please refer to `README_d/README.pc' in the
-distribution.
-
-* Menu:
-
-* PC Binary Installation:: Installing a prepared distribution.
-* PC Compiling:: Compiling `gawk' for MS-DOS,
- Windows32, and OS/2.
-* PC Testing:: Testing `gawk' on PC systems.
-* PC Using:: Running `gawk' on MS-DOS, Windows32
- and OS/2.
-* Cygwin:: Building and running `gawk' for
- Cygwin.
-* MSYS:: Using `gawk' In The MSYS Environment.
-
-
-File: gawk.info, Node: PC Binary Installation, Next: PC Compiling, Up: PC Installation
-
-B.3.1.1 Installing a Prepared Distribution for PC Systems
-.........................................................
-
-If you have received a binary distribution prepared by the MS-DOS
-maintainers, then `gawk' and the necessary support files appear under
-the `gnu' directory, with executables in `gnu/bin', libraries in
-`gnu/lib/awk', and manual pages under `gnu/man'. This is designed for
-easy installation into a `/gnu' directory on your drive; however, the
-files can be installed anywhere provided `AWKPATH' is set properly.
-Regardless of the installation directory, the first line of `igawk.cmd'
-and `igawk.bat' (in `gnu/bin') may need to be edited.
-
- The binary distribution contains a separate file describing the
-contents. In particular, it may include more than one version of the
-`gawk' executable.
-
- OS/2 (32 bit, EMX) binary distributions are prepared for the `/usr'
-directory of your preferred drive. Set `UNIXROOT' to your installation
-drive (e.g., `e:') if you want to install `gawk' onto a drive other
-than the hardcoded default `c:'. Executables appear in `/usr/bin',
-libraries under `/usr/share/awk', manual pages under `/usr/man',
-Texinfo documentation under `/usr/info', and NLS files under
-`/usr/share/locale'. Note that the files can be installed anywhere
-provided `AWKPATH' is set properly.
-
- If you already have a file `/usr/info/dir' from another package, _do
-not overwrite it!_ Instead, enter the following commands at your prompt
-(replace `x:' by your installation drive):
-
- install-info --info-dir=x:/usr/info x:/usr/info/gawk.info
- install-info --info-dir=x:/usr/info x:/usr/info/gawkinet.info
-
- The binary distribution may include a separate file with additional
-or more detailed installation instructions.
-
-
-File: gawk.info, Node: PC Compiling, Next: PC Testing, Prev: PC Binary Installation, Up: PC Installation
-
-B.3.1.2 Compiling `gawk' for PC Operating Systems
-.................................................
-
-`gawk' can be compiled for MS-DOS, Windows32, and OS/2 using the GNU
-development tools from DJ Delorie (DJGPP: MS-DOS only) or Eberhard
-Mattes (EMX: MS-DOS, Windows32 and OS/2). The file
-`README_d/README.pc' in the `gawk' distribution contains additional
-notes, and `pc/Makefile' contains important information on compilation
-options.
-
- To build `gawk' for MS-DOS and Windows32, copy the files in the `pc'
-directory (_except_ for `ChangeLog') to the directory with the rest of
-the `gawk' sources, then invoke `make' with the appropriate target name
-as an argument to build `gawk'. The `Makefile' copied from the `pc'
-directory contains a configuration section with comments and may need
-to be edited in order to work with your `make' utility.
-
- The `Makefile' supports a number of targets for building various
-MS-DOS and Windows32 versions. A list of targets is printed if the
-`make' command is given without a target. As an example, to build
-`gawk' using the DJGPP tools, enter `make djgpp'. (The DJGPP tools
-needed for the build may be found at
-`ftp://ftp.delorie.com/pub/djgpp/current/v2gnu/'.) To build a native
-MS-Windows binary of `gawk', type `make mingw32'.
-
- The 32 bit EMX version of `gawk' works "out of the box" under OS/2.
-However, it is highly recommended to use GCC 2.95.3 for the compilation.
-In principle, it is possible to compile `gawk' the following way:
-
- $ ./configure
- $ make
-
- This is not recommended, though. To get an OMF executable you should
-use the following commands at your `sh' prompt:
-
- $ CFLAGS="-O2 -Zomf -Zmt"
- $ export CFLAGS
- $ LDFLAGS="-s -Zcrtdll -Zlinker /exepack:2 -Zlinker /pm:vio -Zstack 0x6000"
- $ export LDFLAGS
- $ RANLIB="echo"
- $ export RANLIB
- $ ./configure --prefix=c:/usr
- $ make AR=emxomfar
-
- These are just suggestions for use with GCC 2.x. You may use any
-other set of (self-consistent) environment variables and compiler flags.
-
- If you use GCC 2.95, it is also recommended to set:
-
- $ LIBS="-lgcc"
- $ export LIBS
-
- You can also get an `a.out' executable if you prefer:
-
- $ CFLAGS="-O2 -Zmt"
- $ export CFLAGS
- $ LDFLAGS="-s -Zstack 0x6000"
- $ export LDFLAGS
- $ LIBS="-lgcc"
- $ export LIBS
- $ unset RANLIB
- $ ./configure --prefix=c:/usr
- $ make
-
- NOTE: Compilation of `a.out' executables also works with GCC 3.2.
- Versions later than GCC 3.2 have not been tested successfully.
-
- `make install' works as expected with the EMX build.
-
- NOTE: Ancient OS/2 ports of GNU `make' are not able to handle the
- Makefiles of this package. If you encounter any problems with
- `make', try GNU Make 3.79.1 or later versions. You should find
- the latest version on `ftp://hobbes.nmsu.edu/pub/os2/'.
-
-
-File: gawk.info, Node: PC Testing, Next: PC Using, Prev: PC Compiling, Up: PC Installation
-
-B.3.1.3 Testing `gawk' on PC Operating Systems
-..............................................
-
-Using `make' to run the standard tests and to install `gawk' requires
-additional Unix-like tools, including `sh', `sed', and `cp'. In order
-to run the tests, the `test/*.ok' files may need to be converted so
-that they have the usual MS-DOS-style end-of-line markers.
-Alternatively, run `make check CMP="diff -a"' to use GNU `diff' in text
-mode instead of `cmp' to compare the resulting files.
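-
- Converting the `.ok' files as described above means turning each bare
-linefeed into a carriage-return/linefeed pair. The following C function
-is a minimal sketch of that transformation on an in-memory buffer. The
-name `lf_to_crlf()' is hypothetical; it is not a tool shipped with
-`gawk'.
```c
#include <stddef.h>

/* Hypothetical: copy LEN bytes from IN to OUT, turning each bare "\n"
   into "\r\n".  A "\n" that already follows "\r" is copied unchanged,
   so converting a file twice is harmless.  OUT must hold at least
   2 * LEN bytes.  Returns the number of bytes written. */
size_t
lf_to_crlf(const char *in, size_t len, char *out)
{
	size_t i, j = 0;

	for (i = 0; i < len; i++) {
		if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
			out[j++] = '\r';        /* add the missing CR */
		out[j++] = in[i];
	}
	return j;
}
```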
-
- Most of the tests work properly with Stewartson's shell along with
-the companion utilities or appropriate GNU utilities. However, some
-editing of `test/Makefile' is required. It is recommended that you copy
-the file `pc/Makefile.tst' over the file `test/Makefile' as a
-replacement. Details can be found in `README_d/README.pc' and in the
-file `pc/Makefile.tst'.
-
- On OS/2, the `pid' test fails because `spawnl()' is used instead of
-`fork()'/`execl()' to start child processes. Also, the `mbfw1' and
-`mbprintf1' tests fail because the needed multibyte functionality is
-not available.
-
-
-File: gawk.info, Node: PC Using, Next: Cygwin, Prev: PC Testing, Up: PC Installation
-
-B.3.1.4 Using `gawk' on PC Operating Systems
-............................................
-
-With the exception of the Cygwin environment, the `|&' operator and
-TCP/IP networking (*note TCP/IP Networking::) are not supported for
-MS-DOS or MS-Windows. EMX (OS/2 only) does support at least the `|&'
-operator.
-
- The MS-DOS and MS-Windows versions of `gawk' search for program
-files as described in *note AWKPATH Variable::. However, semicolons
-(rather than colons) separate elements in the `AWKPATH' variable. If
-`AWKPATH' is not set or is empty, then the default search path for
-MS-Windows and MS-DOS versions is `".;c:/lib/awk;c:/gnu/lib/awk"'.
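-
- As a rough illustration of the search-path rule above, the following
-sketch splits a semicolon-separated `AWKPATH'. It falls back to the
-documented default when the variable is unset or empty. The function
-name is hypothetical. The sketch simply skips empty elements, such as
-the middle one in `a;;b'; how `gawk' itself treats them is not covered
-here.
```c
#include <string.h>

#define DEFAULT_DOS_AWKPATH ".;c:/lib/awk;c:/gnu/lib/awk"

/* Hypothetical: split AWKPATH (may be NULL) into at most MAX entries.
   The text is copied into BUF (SIZE bytes) and ENTRIES point into it.
   Returns the number of entries, or -1 if BUF is too small. */
int
split_awkpath(const char *awkpath, char *buf, size_t size,
              char *entries[], int max)
{
	const char *src = (awkpath != NULL && *awkpath != '\0')
	                  ? awkpath : DEFAULT_DOS_AWKPATH;
	char *p;
	int n = 0;

	if (strlen(src) + 1 > size)
		return -1;
	strcpy(buf, src);

	/* Semicolons, not colons, separate elements on these systems.
	   strtok() skips empty elements and is not reentrant. */
	for (p = strtok(buf, ";"); p != NULL && n < max; p = strtok(NULL, ";"))
		entries[n++] = p;
	return n;
}
```
-A caller would typically pass `getenv("AWKPATH")' as the first
-argument.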
-
- The search path for OS/2 (32 bit, EMX) is determined by the prefix
-directory (most likely `/usr' or `c:/usr') that has been specified as
-an option of the `configure' script, as is the case for the Unix
-versions. If `c:/usr' is the prefix directory, then the default search
-path contains `.' and `c:/usr/share/awk'. Additionally, to support
-binary distributions of `gawk' for OS/2 systems whose drive `c:' might
-not support long file names or might not exist at all, there is a
-special environment variable. If `UNIXROOT' specifies a drive, then
-this specific drive is also searched for program files. For example,
-if `UNIXROOT' is set to `e:', the complete default search path is
-`".;c:/usr/share/awk;e:/usr/share/awk"'.
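-
- The following sketch shows how the OS/2 default path could be
-assembled from the `configure' prefix and `UNIXROOT'. This hypothetical
-helper only reproduces the example above; it is not the code that
-`gawk' uses. It assumes that the prefix has no trailing slash and that
-the `UNIXROOT' entry is always `/usr/share/awk' on that drive.
```c
#include <stdio.h>

/* Hypothetical: write the default OS/2 search path into OUT (SIZE
   bytes).  PREFIX is the configure prefix, e.g. "c:/usr"; UNIXROOT is
   a drive such as "e:", or NULL/empty if unset.  Returns the
   snprintf() result (length needed, or negative on error). */
int
os2_default_path(char *out, size_t size, const char *prefix,
                 const char *unixroot)
{
	if (unixroot != NULL && *unixroot != '\0')
		return snprintf(out, size, ".;%s/share/awk;%s/usr/share/awk",
		                prefix, unixroot);
	return snprintf(out, size, ".;%s/share/awk", prefix);
}
```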
-
- An `sh'-like shell (as opposed to `command.com' under MS-DOS or
-`cmd.exe' under MS-Windows or OS/2) may be useful for `awk' programming.
-The DJGPP collection of tools includes an MS-DOS port of Bash, and
-several shells are available for OS/2, including `ksh'.
-
- Under MS-Windows, OS/2 and MS-DOS, `gawk' (and many other text
-programs) silently translate end-of-line `"\r\n"' to `"\n"' on input
-and `"\n"' to `"\r\n"' on output. A special `BINMODE' variable
-(c.e.) allows control over these translations and is interpreted as
-follows:
-
- * If `BINMODE' is `"r"', or one, then binary mode is set on read
- (i.e., no translations on reads).
-
- * If `BINMODE' is `"w"', or two, then binary mode is set on write
- (i.e., no translations on writes).
-
- * If `BINMODE' is `"rw"' or `"wr"' or three, binary mode is set for
- both read and write.
-
- * `BINMODE=NON-NULL-STRING' is the same as `BINMODE=3' (i.e., no
- translations on reads or writes). However, `gawk' issues a warning
- message if the string is not one of `"rw"' or `"wr"'.
-
-The modes for standard input and standard output are set one time only
-(after the command line is read, but before processing any of the `awk'
-program). Setting `BINMODE' for standard input or standard output is
-accomplished by using an appropriate `-v BINMODE=N' option on the
-command line. `BINMODE' is set at the time a file or pipe is opened
-and cannot be changed mid-stream.
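-
- The `BINMODE' interpretation rules listed above can be summarized as
-a small C function. This is a hypothetical sketch, not `gawk''s own
-code. It takes the value as a string and treats `"0"' through `"3"' as
-the numeric settings. It reports through `*warn' when some other
-non-null string is treated as 3.
```c
#include <string.h>

enum { BIN_READ = 1, BIN_WRITE = 2 };

/* Hypothetical: map a BINMODE setting, given as text, to a set of
   BIN_READ/BIN_WRITE flags.  *warn is set to 1 when an unrecognized
   non-null string is treated as 3 (binary on reads and writes). */
int
binmode_flags(const char *val, int *warn)
{
	*warn = 0;
	if (val == NULL || *val == '\0' || strcmp(val, "0") == 0)
		return 0;                       /* text mode: translate */
	if (strcmp(val, "r") == 0 || strcmp(val, "1") == 0)
		return BIN_READ;                /* no translation on reads */
	if (strcmp(val, "w") == 0 || strcmp(val, "2") == 0)
		return BIN_WRITE;               /* no translation on writes */
	if (strcmp(val, "rw") == 0 || strcmp(val, "wr") == 0
	    || strcmp(val, "3") == 0)
		return BIN_READ | BIN_WRITE;
	*warn = 1;                              /* other non-null string */
	return BIN_READ | BIN_WRITE;
}
```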
-
- The name `BINMODE' was chosen to match `mawk' (*note Other
-Versions::). `mawk' and `gawk' handle `BINMODE' similarly; however,
-`mawk' adds a `-W BINMODE=N' option and an environment variable that
-can set `BINMODE', `RS', and `ORS'. The files `binmode[1-3].awk'
-(under `gnu/lib/awk' in some of the prepared distributions) have been
-chosen to match `mawk''s `-W BINMODE=N' option. These can be changed
-or discarded; in particular, the setting of `RS' giving the fewest
-"surprises" is open to debate. `mawk' uses `RS = "\r\n"' if binary
-mode is set on read, which is appropriate for files with the
-MS-DOS-style end-of-line.
-
- To illustrate, the following examples set binary mode on writes for
-standard output and other files, and set `ORS' as the "usual"
-MS-DOS-style end-of-line:
-
- gawk -v BINMODE=2 -v ORS="\r\n" ...
-
-or:
-
- gawk -v BINMODE=w -f binmode2.awk ...
-
-These give the same result as the `-W BINMODE=2' option in `mawk'. The
-following changes the record separator to `"\r\n"' and sets binary mode
-on reads, but does not affect the mode on standard input:
-
- gawk -v RS="\r\n" --source "BEGIN { BINMODE = 1 }" ...
-
-or:
-
- gawk -f binmode1.awk ...
-
-With proper quoting, in the first example the setting of `RS' can be
-moved into the `BEGIN' rule.
-
-
-File: gawk.info, Node: Cygwin, Next: MSYS, Prev: PC Using, Up: PC Installation
-
-B.3.1.5 Using `gawk' In The Cygwin Environment
-..............................................
-
-`gawk' can be built and used "out of the box" under MS-Windows if you
-are using the Cygwin environment (http://www.cygwin.com). This
-environment provides an excellent simulation of Unix, using the GNU
-tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make, and
-other GNU programs. Compilation and installation for Cygwin is the
-same as for a Unix system:
-
- tar -xvpzf gawk-4.0.1.tar.gz
- cd gawk-4.0.1
- ./configure
- make
-
- When compared to GNU/Linux on the same system, the `configure' step
-on Cygwin takes considerably longer. However, it does finish, and then
-the `make' proceeds as usual.
-
- NOTE: The `|&' operator and TCP/IP networking (*note TCP/IP
- Networking::) are fully supported in the Cygwin environment. This
- is not true for any other environment on MS-Windows.
-
-
-File: gawk.info, Node: MSYS, Prev: Cygwin, Up: PC Installation
-
-B.3.1.6 Using `gawk' In The MSYS Environment
-............................................
-
-In the MSYS environment under MS-Windows, `gawk' automatically uses
-binary mode for reading and writing files. Thus there is no need to
-use the `BINMODE' variable.
-
- This can cause problems with other Unix-like components ported to
-MS-Windows that expect `gawk' to translate `"\r\n"' automatically,
-since it does not. Caveat Emptor!
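-
- A component that relied on that translation can perform it itself.
-The hypothetical function below is a minimal sketch of the input-side
-translation described earlier (`"\r\n"' to `"\n"'), applied in place to
-a buffer. A carriage return that is not followed by a linefeed is kept.
```c
#include <stddef.h>

/* Hypothetical: remove the CR of every CRLF pair in BUF (LEN bytes),
   in place.  Returns the new length. */
size_t
crlf_to_lf(char *buf, size_t len)
{
	size_t i, j = 0;

	for (i = 0; i < len; i++) {
		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
			continue;       /* drop the CR; the LF is copied next */
		buf[j++] = buf[i];      /* j <= i, so in-place copy is safe */
	}
	return j;
}
```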
-
-
-File: gawk.info, Node: VMS Installation, Prev: PC Installation, Up: Non-Unix Installation
-
-B.3.2 How to Compile and Install `gawk' on VMS
-----------------------------------------------
-
-This node describes how to compile and install `gawk' under VMS. The
-older designation "VMS" is used throughout to refer to OpenVMS.
-
-* Menu:
-
-* VMS Compilation:: How to compile `gawk' under VMS.
-* VMS Installation Details:: How to install `gawk' under VMS.
-* VMS Running:: How to run `gawk' under VMS.
-* VMS Old Gawk:: An old version comes with some VMS systems.
-
-
-File: gawk.info, Node: VMS Compilation, Next: VMS Installation Details, Up: VMS Installation
-
-B.3.2.1 Compiling `gawk' on VMS
-...............................
-
-To compile `gawk' under VMS, there is a `DCL' command procedure that
-issues all the necessary `CC' and `LINK' commands. There is also a
-`Makefile' for use with the `MMS' utility. From the source directory,
-use either:
-
- $ @[.VMS]VMSBUILD.COM
-
-or:
-
- $ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK
-
- Older versions of `gawk' could be built with VAX C or GNU C on
-VAX/VMS, as well as with DEC C, but that is no longer supported. DEC C
-(also briefly known as "Compaq C" and now known as "HP C," but referred
-to here as "DEC C") is required. Both `VMSBUILD.COM' and `DESCRIP.MMS'
-contain some obsolete support for the older compilers but are set up to
-use DEC C by default.
-
- `gawk' has been tested under Alpha/VMS 7.3-1 using Compaq C V6.4,
-and on Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3.(1)
-
- ---------- Footnotes ----------
-
- (1) The IA64 architecture is also known as "Itanium."
-
-
-File: gawk.info, Node: VMS Installation Details, Next: VMS Running, Prev: VMS Compilation, Up: VMS Installation
-
-B.3.2.2 Installing `gawk' on VMS
-................................
-
-To install `gawk', all you need is a "foreign" command, which is a
-`DCL' symbol whose value begins with a dollar sign. For example:
-
- $ GAWK :== $disk1:[gnubin]GAWK
-
-Substitute the actual location of `gawk.exe' for `$disk1:[gnubin]'. The
-symbol should be placed in the `login.com' of any user who wants to run
-`gawk', so that it is defined every time the user logs on.
-Alternatively, the symbol may be placed in the system-wide
-`sylogin.com' procedure, which allows all users to run `gawk'.
-
- Optionally, the help entry can be loaded into a VMS help library:
-
- $ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP
-
-(You may want to substitute a site-specific help library rather than
-the standard VMS library `HELPLIB'.) After loading the help text, the
-command:
-
- $ HELP GAWK
-
-provides information about both the `gawk' implementation and the `awk'
-programming language.
-
- The logical name `AWK_LIBRARY' can designate a default location for
-`awk' program files. For the `-f' option, if the specified file name
-has no device or directory path information in it, `gawk' looks in the
-current directory first, then in the directory specified by the
-translation of `AWK_LIBRARY' if the file is not found. If, after
-searching in both directories, the file still is not found, `gawk'
-appends the suffix `.awk' to the filename and retries the file search.
-If `AWK_LIBRARY' has no definition, a default value of `SYS$LIBRARY:'
-is used for it.
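-
- The lookup order just described can be sketched in C by listing the
-file specifications in the order they would be tried. The function is
-hypothetical. Two points are simplifying assumptions: a name containing
-`:' or `]' is treated as already carrying device or directory
-information, and such a name is used as given.
```c
#include <stdio.h>
#include <string.h>

#define CAND_MAX 4
#define CAND_LEN 256

/* Hypothetical: fill CAND with the specifications tried for the -f
   argument NAME, in order.  LIB is the translation of AWK_LIBRARY,
   or NULL if it has no definition.  Returns the number of entries. */
int
vms_awk_candidates(const char *name, const char *lib,
                   char cand[CAND_MAX][CAND_LEN])
{
	static const char *const suffix[2] = { "", ".awk" };
	int i, n = 0;

	if (lib == NULL)
		lib = "SYS$LIBRARY:";           /* documented default */

	/* Assumption: device or directory present, use the name as is. */
	if (strchr(name, ':') != NULL || strchr(name, ']') != NULL) {
		snprintf(cand[n++], CAND_LEN, "%s", name);
		return n;
	}

	/* First the plain name, then the retry with ".awk" appended;
	   each time the current directory precedes AWK_LIBRARY. */
	for (i = 0; i < 2; i++) {
		snprintf(cand[n++], CAND_LEN, "%s%s", name, suffix[i]);
		snprintf(cand[n++], CAND_LEN, "%s%s%s", lib, name, suffix[i]);
	}
	return n;
}
```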
-
-
-File: gawk.info, Node: VMS Running, Next: VMS Old Gawk, Prev: VMS Installation Details, Up: VMS Installation
-
-B.3.2.3 Running `gawk' on VMS
-.............................
-
-Command-line parsing and quoting conventions are significantly different
-on VMS, so examples in this Info file or from other sources often need
-minor changes. They _are_ minor though, and all `awk' programs should
-run correctly.
-
- Here are a couple of trivial tests:
-
- $ gawk -- "BEGIN {print ""Hello, World!""}"
- $ gawk -"W" version
- ! could also be -"W version" or "-W version"
-
-Note that uppercase and mixed-case text must be quoted.
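-
- The doubled quotes in the first test follow the `DCL' rule that `""'
-inside a quoted string stands for one literal double quote. The
-following minimal sketch produces such an argument from an `awk'
-program text. The function name is hypothetical.
```c
/* Hypothetical: wrap TEXT in double quotes for a DCL command line,
   doubling each embedded double quote.  OUT must hold at least
   2 * strlen(TEXT) + 3 bytes. */
void
dcl_quote(const char *text, char *out)
{
	*out++ = '"';
	for (; *text != '\0'; text++) {
		if (*text == '"')
			*out++ = '"';   /* "" stands for one literal quote */
		*out++ = *text;
	}
	*out++ = '"';
	*out = '\0';
}
```
-Applied to `BEGIN {print "Hello, World!"}', this yields exactly the
-quoted program used in the first test above.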
-
- The VMS port of `gawk' includes a `DCL'-style interface in addition
-to the original shell-style interface (see the help entry for details).
-One side effect of dual command-line parsing is that if there is only a
-single parameter (as in the quoted string program above), the command
-becomes ambiguous. To work around this, the normally optional `--'
-flag is required to force Unix-style parsing rather than `DCL' parsing.
-If any other dash-type options (or multiple parameters such as data
-files to process) are present, there is no ambiguity and `--' can be
-omitted.
-
- The default search path, when looking for `awk' program files
-specified by the `-f' option, is `"SYS$DISK:[],AWK_LIBRARY:"'. The
-logical name `AWKPATH' can be used to override this default. The format
-of `AWKPATH' is a comma-separated list of directory specifications.
-When defining it, the value should be quoted so that it retains a single
-translation and not a multitranslation `RMS' searchlist.
-
-
-File: gawk.info, Node: VMS Old Gawk, Prev: VMS Running, Up: VMS Installation
-
-B.3.2.4 Some VMS Systems Have An Old Version of `gawk'
-......................................................
-
-Some versions of VMS have an old version of `gawk'. To access it,
-define a symbol, as follows:
-
- $ gawk :== $sys$common:[syshlp.examples.tcpip.snmp]gawk.exe
-
- This is apparently version 2.15.6, which is extremely old. We
-recommend compiling and using the current version.
-
-
-File: gawk.info, Node: Bugs, Next: Other Versions, Prev: Non-Unix Installation, Up: Installation
-
-B.4 Reporting Problems and Bugs
-===============================
-
- There is nothing more dangerous than a bored archeologist.
- The Hitchhiker's Guide to the Galaxy
-
- If you have problems with `gawk' or think that you have found a bug,
-please report it to the developers; we cannot promise to do anything
-but we might well want to fix it.
-
- Before reporting a bug, make sure you have actually found a real bug.
-Carefully reread the documentation and see if it really says you can do
-what you're trying to do. If it's not clear whether you should be able
-to do something or not, report that too; it's a bug in the
-documentation!
-
- Before reporting a bug or trying to fix it yourself, try to isolate
-it to the smallest possible `awk' program and input data file that
-reproduces the problem. Then send us the program and data file, some
-idea of what kind of Unix system you're using, the compiler you used to
-compile `gawk', and the exact results `gawk' gave you. Also say what
-you expected to occur; this helps us decide whether the problem is
-really in the documentation.
-
- Please include the version number of `gawk' you are using. You can
-get this information with the command `gawk --version'.
-
- Once you have a precise problem, send email to <bug-gawk@gnu.org>.
-
- Using this address automatically sends a copy of your mail to me.
-If necessary, I can be reached directly at <arnold@skeeve.com>. The
-bug reporting address is preferred since the email list is archived at
-the GNU Project. _All email should be in English, since that is my
-native language._
-
- CAUTION: Do _not_ try to report bugs in `gawk' by posting to the
- Usenet/Internet newsgroup `comp.lang.awk'. While the `gawk'
- developers do occasionally read this newsgroup, there is no
- guarantee that we will see your posting. The steps described
- above are the officially recognized ways to report bugs. Really.
-
- NOTE: Many distributions of GNU/Linux and the various BSD-based
- operating systems have their own bug reporting systems. If you
- report a bug using your distribution's bug reporting system,
- _please_ also send a copy to <bug-gawk@gnu.org>.
-
- This is for two reasons. First, while some distributions forward
- bug reports "upstream" to the GNU mailing list, many don't, so
- there is a good chance that the `gawk' maintainer won't even see
- the bug report! Second, mail to the GNU list is archived, and
- having everything at the GNU project keeps things self-contained
- and not dependent on other web sites.
-
- Non-bug suggestions are always welcome as well. If you have
-questions about things that are unclear in the documentation or are
-just obscure features, ask me; I will try to help you out, although I
-may not have the time to fix the problem. You can send me electronic
-mail at the Internet address noted previously.
-
- If you find bugs in one of the non-Unix ports of `gawk', please send
-an electronic mail message to the person who maintains that port. They
-are named in the following list, as well as in the `README' file in the
-`gawk' distribution. Information in the `README' file should be
-considered authoritative if it conflicts with this Info file.
-
- The people maintaining the non-Unix ports of `gawk' are as follows:
-
-MS-DOS with DJGPP Scott Deifik, <scottd.mail@sbcglobal.net>.
-MS-Windows with MINGW Eli Zaretskii, <eliz@gnu.org>.
-OS/2 Andreas Buening, <andreas.buening@nexgo.de>.
-VMS Pat Rankin, <r.pat.rankin@gmail.com>.
-z/OS (OS/390) Dave Pitts, <dpitts@cozx.com>.
-
- If your bug is also reproducible under Unix, please send a copy of
-your report to the <bug-gawk@gnu.org> email list as well.
-
-
-File: gawk.info, Node: Other Versions, Prev: Bugs, Up: Installation
-
-B.5 Other Freely Available `awk' Implementations
-================================================
-
- It's kind of fun to put comments like this in your awk code.
- `// Do C++ comments work? answer: yes! of course'
- Michael Brennan
-
- There are a number of other freely available `awk' implementations.
-This minor node briefly describes where to get them:
-
-Unix `awk'
- Brian Kernighan, one of the original designers of Unix `awk', has
- made his implementation of `awk' freely available. You can
- retrieve this version via the World Wide Web from his home page
- (http://www.cs.princeton.edu/~bwk). It is available in several
- archive formats:
-
- Shell archive
- `http://www.cs.princeton.edu/~bwk/btl.mirror/awk.shar'
-
- Compressed `tar' file
- `http://www.cs.princeton.edu/~bwk/btl.mirror/awk.tar.gz'
-
- Zip file
- `http://www.cs.princeton.edu/~bwk/btl.mirror/awk.zip'
-
- This version requires an ISO C (1990 standard) compiler; the C
- compiler from GCC (the GNU Compiler Collection) works quite nicely.
-
- *Note Common Extensions::, for a list of extensions in this `awk'
- that are not in POSIX `awk'.
-
-`mawk'
- Michael Brennan wrote an independent implementation of `awk',
- called `mawk'. It is available under the GPL (*note Copying::),
- just as `gawk' is.
-
- The original distribution site for the `mawk' source code no
- longer has it. A copy is available at
- `http://www.skeeve.com/gawk/mawk1.3.3.tar.gz'.
-
- In 2009, Thomas Dickey took on `mawk' maintenance. Basic
- information is available on the project's web page
- (http://www.invisible-island.net/mawk/mawk.html). The download
- URL is `http://invisible-island.net/datafiles/release/mawk.tar.gz'.
-
- Once you have it, `gunzip' may be used to decompress this file.
- Installation is similar to `gawk''s (*note Unix Installation::).
-
- *Note Common Extensions::, for a list of extensions in `mawk' that
- are not in POSIX `awk'.
-
-`awka'
- Written by Andrew Sumner, `awka' translates `awk' programs into C,
- compiles them, and links them with a library of functions that
- provides the core `awk' functionality. It also has a number of
- extensions.
-
- The `awk' translator is released under the GPL, and the library is
- under the LGPL.
-
- To get `awka', go to `http://sourceforge.net/projects/awka'.
-
- The project seems to be frozen; no new code changes have been made
- since approximately 2003.
-
-`pawk'
- Nelson H.F. Beebe at the University of Utah has modified Brian
- Kernighan's `awk' to provide timing and profiling information. It
- is different from `gawk' with the `--profile' option (*note
- Profiling::) in that it uses CPU-based profiling, not line-count
- profiling. You may find it at either
- `ftp://ftp.math.utah.edu/pub/pawk/pawk-20030606.tar.gz' or
- `http://www.math.utah.edu/pub/pawk/pawk-20030606.tar.gz'.
-
-Busybox Awk
- Busybox is a GPL-licensed program providing small versions of many
- applications within a single executable. It is aimed at embedded
- systems. It includes a full implementation of POSIX `awk'. When
- building it, be careful not to do `make install' as it will
- overwrite copies of other applications in your `/usr/local/bin'.
- For more information, see the project's home page
- (http://busybox.net).
-
-The OpenSolaris POSIX `awk'
- The version of `awk' in `/usr/xpg4/bin' on Solaris is more-or-less
- POSIX-compliant. It is based on the `awk' from Mortice Kern
- Systems for PCs. The source code can be downloaded from the
- OpenSolaris web site (http://www.opensolaris.org). This author
- was able to make it compile and work under GNU/Linux with 1-2
- hours of work. Making it more generally portable (using GNU
- Autoconf and/or Automake) would take more work, and this has not
- been done, at least to our knowledge.
-
-`jawk'
- This is an interpreter for `awk' written in Java. It claims to be
- a full interpreter, although because it uses Java facilities for
- I/O and for regexp matching, the language it supports is different
- from POSIX `awk'. More information is available on the project's
- home page (http://jawk.sourceforge.net).
-
-Libmawk
- This is an embeddable `awk' interpreter derived from `mawk'. For
- more information see `http://repo.hu/projects/libmawk/'.
-
-QSE Awk
- This is an embeddable `awk' interpreter. For more information see
- `http://code.google.com/p/qse/' and `http://awk.info/?tools/qse'.
-
-`QTawk'
- This is an independent implementation of `awk' distributed under
- the GPL. It has a large number of extensions over standard `awk'
- and may not be 100% syntactically compatible with it. See
- `http://www.quiktrim.org/QTawk.html' for more information,
- including the manual and a download link.
-
-`xgawk'
- XML `gawk'. This is a fork of the `gawk' 3.1.6 source base to
- support processing XML files. It has a number of interesting
- extensions which should one day be integrated into the main `gawk'
- code base. For more information, see the XMLgawk project web site
- (http://xmlgawk.sourceforge.net).
-
-
-
-File: gawk.info, Node: Notes, Next: Basic Concepts, Prev: Installation, Up: Top
-
-Appendix C Implementation Notes
-*******************************
-
-This appendix contains information mainly of interest to implementers
-and maintainers of `gawk'. Everything in it applies specifically to
-`gawk' and not to other implementations.
-
-* Menu:
-
-* Compatibility Mode:: How to disable certain `gawk'
- extensions.
-* Additions:: Making Additions To `gawk'.
-* Dynamic Extensions:: Adding new built-in functions to
- `gawk'.
-* Future Extensions:: New features that may be implemented one day.
-
-
-File: gawk.info, Node: Compatibility Mode, Next: Additions, Up: Notes
-
-C.1 Downward Compatibility and Debugging
-========================================
-
-*Note POSIX/GNU::, for a summary of the GNU extensions to the `awk'
-language and program. All of these features can be turned off by
-invoking `gawk' with the `--traditional' option or with the `--posix'
-option.
-
- If `gawk' is compiled for debugging with `-DDEBUG', then there is
-one more option available on the command line:
-
-`-Y'
-`--parsedebug'
- Prints out the parse stack information as the program is being
- parsed.
-
- This option is intended only for serious `gawk' developers and not
-for the casual user. It probably has not even been compiled into your
-version of `gawk', since it slows down execution.
-
-
-File: gawk.info, Node: Additions, Next: Dynamic Extensions, Prev: Compatibility Mode, Up: Notes
-
-C.2 Making Additions to `gawk'
-==============================
-
-If you find that you want to enhance `gawk' in a significant fashion,
-you are perfectly free to do so. That is the point of having free
-software; the source code is available and you are free to change it as
-you want (*note Copying::).
-
- This minor node discusses the ways you might want to change `gawk'
-as well as any considerations you should bear in mind.
-
-* Menu:
-
-* Accessing The Source:: Accessing the Git repository.
-* Adding Code:: Adding code to the main body of
- `gawk'.
-* New Ports:: Porting `gawk' to a new operating
- system.
-
-
-File: gawk.info, Node: Accessing The Source, Next: Adding Code, Up: Additions
-
-C.2.1 Accessing The `gawk' Git Repository
------------------------------------------
-
-As `gawk' is Free Software, the source code is always available. *Note
-Gawk Distribution::, for how to get and build the formal, released
-versions of `gawk'.
-
- However, if you want to modify `gawk' and contribute back your
-changes, you will probably wish to work with the development version.
-To do so, you will need to access the `gawk' source code repository.
-The code is maintained using the Git distributed version control system
-(http://git-scm.com/). You will need to install it if your system
-doesn't have it. Once you have done so, use the command:
-
- git clone git://git.savannah.gnu.org/gawk.git
-
-This will clone the `gawk' repository. If you are behind a firewall
-that will not allow you to use the Git native protocol, you can still
-access the repository using:
-
- git clone http://git.savannah.gnu.org/r/gawk.git
-
- Once you have made changes, you can use `git diff' to produce a
-patch, and send that to the `gawk' maintainer; see *note Bugs:: for how
-to do that.
-
- Finally, if you cannot install Git (e.g., if it hasn't been ported
-yet to your operating system), you can use the Git-CVS gateway to check
-out a copy using CVS, as follows:
-
- cvs -d:pserver:anonymous@pserver.git.sv.gnu.org:/gawk.git co -d gawk master
-
-
-File: gawk.info, Node: Adding Code, Next: New Ports, Prev: Accessing The Source, Up: Additions
-
-C.2.2 Adding New Features
--------------------------
-
-You are free to add any new features you like to `gawk'. However, if
-you want your changes to be incorporated into the `gawk' distribution,
-there are several steps that you need to take in order to make it
-possible to include your changes:
-
- 1. Before building the new feature into `gawk' itself, consider
- writing it as an extension module (*note Dynamic Extensions::).
- If that's not possible, continue with the rest of the steps in
- this list.
-
- 2. Be prepared to sign the appropriate paperwork. In order for the
- FSF to distribute your changes, you must either place those
- changes in the public domain and submit a signed statement to that
- effect, or assign the copyright in your changes to the FSF. Both
- of these actions are easy to do and _many_ people have done so
- already. If you have questions, please contact me (*note Bugs::),
- or <assign@gnu.org>.
-
- 3. Get the latest version. It is much easier for me to integrate
- changes if they are relative to the most recent distributed
- version of `gawk'. If your version of `gawk' is very old, I may
- not be able to integrate them at all. (*Note Getting::, for
- information on getting the latest version of `gawk'.)
-
- 4. Read the `GNU Coding Standards' (*note (standards)Top::). This
- document describes how GNU software should be written. If you
- haven't read it, please do so, preferably _before_ starting to
- modify `gawk'. (The `GNU Coding Standards' are available from the
- GNU Project's web site
- (http://www.gnu.org/prep/standards_toc.html). Texinfo, Info, and
- DVI versions are also available.)
-
- 5. Use the `gawk' coding style. The C code for `gawk' follows the
- instructions in the `GNU Coding Standards', with minor exceptions.
- The code is formatted using the traditional "K&R" style,
- particularly as regards the placement of braces and the use of
- TABs. In brief, the coding rules for `gawk' are as follows:
-
- * Use ANSI/ISO style (prototype) function headers when defining
- functions.
-
- * Put the name of the function at the beginning of its own line.
-
- * Put the return type of the function, even if it is `int', on
- the line above the line with the name and arguments of the
- function.
-
- * Put spaces around parentheses used in control structures
- (`if', `while', `for', `do', `switch', and `return').
-
- * Do not put spaces in front of parentheses used in function
- calls.
-
- * Put spaces around all C operators and after commas in
- function calls.
-
- * Do not use the comma operator to produce multiple side
- effects, except in `for' loop initialization and increment
- parts, and in macro bodies.
-
- * Use real TABs for indenting, not spaces.
-
- * Use the "K&R" brace layout style.
-
- * Use comparisons against `NULL' and `'\0'' in the conditions of
- `if', `while', and `for' statements, as well as in the `case's
- of `switch' statements, instead of just the plain pointer or
- character value.
-
- * Use the `TRUE', `FALSE' and `NULL' symbolic constants and the
- character constant `'\0'' where appropriate, instead of `1'
- and `0'.
-
- * Provide one-line descriptive comments for each function.
-
- * Do not use the `alloca()' function for allocating memory off
- the stack. Its use causes more portability trouble than the
- minor benefit of not having to free the storage is worth.
- Instead, use `malloc()' and `free()'.
-
- * Do not use comparisons of the form `! strcmp(a, b)' or
- similar. As Henry Spencer once said, "`strcmp()' is not a
- boolean!" Instead, use `strcmp(a, b) == 0'.
-
- * If adding new bit flag values, use explicit hexadecimal
- constants (`0x001', `0x002', `0x004', and so on) instead of
- shifting one left by successive amounts (`(1<<0)', `(1<<1)',
- and so on).
-
- NOTE: If I have to reformat your code to follow the coding
- style used in `gawk', I may not bother to integrate your
- changes at all.
-
- 6. Update the documentation. Along with your new code, please supply
- new sections and/or chapters for this Info file. If at all
- possible, please use real Texinfo, instead of just supplying
- unformatted ASCII text (although even that is better than no
- documentation at all). Conventions to be followed in `GAWK:
- Effective AWK Programming' are provided after the `@bye' at the
- end of the Texinfo source file. If possible, please update the
- `man' page as well.
-
- You will also have to sign paperwork for your documentation
- changes.
-
- 7. Submit changes as unified diffs. Use `diff -u -r -N' to compare
- the original `gawk' source tree with your version. I recommend
- using the GNU version of `diff'. Send the output produced by
- `diff' to me when you submit your changes. (*Note Bugs::, for
- the electronic mail information.)
-
- Using this format makes it easy for me to apply your changes to the
- master version of the `gawk' source code (using `patch'). If I
- have to apply the changes manually, using a text editor, I may not
- do so, particularly if there are lots of changes.
-
- 8. Include an entry for the `ChangeLog' file with your submission.
- This helps further minimize the amount of work I have to do,
- making it easier for me to accept patches.
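-
- The coding rules in step 5 can be illustrated with a short C
-fragment. This is a sketch written for this note, not code from
-`gawk'. The option names and both functions are hypothetical, and the
-fragment defines its own `TRUE' and `FALSE' because it does not
-include `awk.h'. It shows prototype-style headers with the return type
-on its own line, one-line function comments, explicit comparisons
-against `NULL' and `'\0'', `strcmp() == 0' tests, and hexadecimal flag
-constants. Its indentation uses real TABs.
-
```c
#include <stddef.h>
#include <string.h>

#define TRUE	1
#define FALSE	0

/* hypothetical option flags: explicit hex constants, not (1<<n) */
#define OPT_QUIET	0x001
#define OPT_LINT	0x002
#define OPT_POSIX	0x004

/* find_option --- return the flag bit for an option name, or 0 if unknown */

static int
find_option(const char *name)
{
	if (name == NULL || name[0] == '\0')
		return 0;

	if (strcmp(name, "quiet") == 0)
		return OPT_QUIET;
	else if (strcmp(name, "lint") == 0)
		return OPT_LINT;
	else if (strcmp(name, "posix") == 0)
		return OPT_POSIX;

	return 0;
}

/* is_known_option --- return TRUE if find_option() recognizes name */

static int
is_known_option(const char *name)
{
	return (find_option(name) != 0) ? TRUE : FALSE;
}
```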
-
- Although this sounds like a lot of work, please remember that while
-you may write the new code, I have to maintain it and support it. If it
-isn't possible for me to do that with a minimum of extra work, then I
-probably will not.
-
-
-File: gawk.info, Node: New Ports, Prev: Adding Code, Up: Additions
-
-C.2.3 Porting `gawk' to a New Operating System
-----------------------------------------------
-
-If you want to port `gawk' to a new operating system, there are several
-steps:
-
- 1. Follow the guidelines in *note Adding Code::, concerning coding
- style, submission of diffs, and so on.
-
- 2. Be prepared to sign the appropriate paperwork. In order for the
- FSF to distribute your code, you must either place your code in
- the public domain and submit a signed statement to that effect, or
- assign the copyright in your code to the FSF. Both of these
- actions are easy to do and _many_ people have done so already. If
- you have questions, please contact me, or <gnu@gnu.org>.
-
- 3. When doing a port, bear in mind that your code must coexist
- peacefully with the rest of `gawk' and the other ports. Avoid
- gratuitous changes to the system-independent parts of the code. If
- at all possible, avoid sprinkling `#ifdef's just for your port
- throughout the code.
-
- If the changes needed for a particular system affect too much of
- the code, I probably will not accept them. In such a case, you
- can, of course, distribute your changes on your own, as long as
- you comply with the GPL (*note Copying::).
-
- 4. A number of the files that come with `gawk' are maintained by other
- people. Thus, you should not change them unless it is for a very
- good reason; i.e., changes are not out of the question, but
- changes to these files are scrutinized extra carefully. The files
- are `dfa.c', `dfa.h', `getopt1.c', `getopt.c', `getopt.h',
- `install-sh', `mkinstalldirs', `regcomp.c', `regex.c',
- `regexec.c', `regexex.c', `regex.h', `regex_internal.c', and
- `regex_internal.h'.
-
- 5. Be willing to continue to maintain the port. Non-Unix operating
- systems are supported by volunteers who maintain the code needed
- to compile and run `gawk' on their systems. If no one volunteers to
- maintain a port, it becomes unsupported and it may be necessary to
- remove it from the distribution.
-
- 6. Supply an appropriate `gawkmisc.???' file. Each port has its own
- `gawkmisc.???' that implements certain operating-system-specific
- functions. This is cleaner than a plethora of `#ifdef's scattered
- throughout the code. The `gawkmisc.c' in the main source
- directory includes the appropriate `gawkmisc.???' file from each
- subdirectory. Be sure to update it as well.
-
- Each port's `gawkmisc.???' file has a suffix reminiscent of the
- machine or operating system for the port--for example,
- `pc/gawkmisc.pc' and `vms/gawkmisc.vms'. The use of separate
- suffixes, instead of plain `gawkmisc.c', makes it possible to move
- files from a port's subdirectory into the main subdirectory,
- without accidentally destroying the real `gawkmisc.c' file.
- (Currently, this is only an issue for the PC operating system
- ports.)
-
- 7. Supply a `Makefile' as well as any other C source and header files
- that are necessary for your operating system. All your code
- should be in a separate subdirectory, with a name that is the same
- as, or reminiscent of, either your operating system or the
- computer system. If possible, try to structure things so that it
- is not necessary to move files out of the subdirectory into the
- main source directory. If that is not possible, then be sure to
- avoid using names for your files that duplicate the names of files
- in the main source directory.
-
- 8. Update the documentation. Please write a section (or sections)
- for this Info file describing the installation and compilation
- steps needed to compile and/or install `gawk' for your system.
-
- Following these steps makes it much easier to integrate your changes
-into `gawk' and have them coexist happily with other operating systems'
-code that is already there.
-
- In the code that you supply and maintain, feel free to use a coding
-style and brace layout that suits your taste.
-
-
-File: gawk.info, Node: Dynamic Extensions, Next: Future Extensions, Prev: Additions, Up: Notes
-
-C.3 Adding New Built-in Functions to `gawk'
-===========================================
-
- Danger Will Robinson! Danger!!
- Warning! Warning!
- The Robot
-
- It is possible to add new built-in functions to `gawk' using
-dynamically loaded libraries. This facility is available on systems
-(such as GNU/Linux) that support the C `dlopen()' and `dlsym()'
-functions. This minor node describes how to write and use dynamically
-loaded extensions for `gawk'. Experience with programming in C or C++
-is necessary when reading this minor node.
-
- CAUTION: The facilities described in this minor node are very much
- subject to change in a future `gawk' release. Be aware that you
- may have to redo everything at some future time.
-
- If you have written your own dynamic extensions, be sure to
- recompile them for each new `gawk' release. There is no guarantee
- of binary compatibility between different releases, nor will there
- ever be such a guarantee.
-
- NOTE: When `--sandbox' is specified, extensions are disabled
- (*note Options::).
-
-* Menu:
-
-* Internals:: A brief look at some `gawk' internals.
-* Plugin License:: A note about licensing.
-* Loading Extensions:: How to load dynamic extensions.
-* Sample Library:: An example of new functions.
-
-
-File: gawk.info, Node: Internals, Next: Plugin License, Up: Dynamic Extensions
-
-C.3.1 A Minimal Introduction to `gawk' Internals
-------------------------------------------------
-
-The truth is that `gawk' was not designed for simple extensibility.
-The facilities for adding functions using shared libraries work, but
-are something of a "bag on the side." Thus, this tour is brief and
-simplistic; would-be `gawk' hackers are encouraged to spend some time
-reading the source code before trying to write extensions based on the
-material presented here. Of particular note are the files `awk.h',
-`builtin.c', and `eval.c'. Reading `awkgram.y' in order to see how the
-parse tree is built would also be of use.
-
- With the disclaimers out of the way, the following types, structure
-members, functions, and macros are declared in `awk.h' and are of use
-when writing extensions. The next minor node shows how they are used:
-
-`AWKNUM'
- An `AWKNUM' is the internal type of `awk' floating-point numbers.
- Typically, it is a C `double'.
-
-`NODE'
- Just about everything is done using objects of type `NODE'. These
- contain both strings and numbers, as well as variables and arrays.
-
-`AWKNUM force_number(NODE *n)'
- This macro forces a value to be numeric. It returns the actual
- numeric value contained in the node. It may end up calling an
- internal `gawk' function.
-
-`void force_string(NODE *n)'
- This macro guarantees that a `NODE''s string value is current. It
- may end up calling an internal `gawk' function. It also
- guarantees that the string is zero-terminated.
-
-`void force_wstring(NODE *n)'
- Similarly, this macro guarantees that a `NODE''s wide-string value
- is current. It may end up calling an internal `gawk' function.
- It also guarantees that the wide string is zero-terminated.
-
-`nargs'
- Inside an extension function, this is the actual number of
- parameters passed to the current function.
-
-`n->stptr'
-`n->stlen'
- The data and length of a `NODE''s string value, respectively. The
- string is _not_ guaranteed to be zero-terminated. If you need to
- pass the string value to a C library function, save the value in
- `n->stptr[n->stlen]', assign `'\0'' to it, call the routine, and
- then restore the value.
-
-`n->wstptr'
-`n->wstlen'
- The data and length of a `NODE''s wide-string value, respectively.
- Use `force_wstring()' to make sure these values are current.
-
-`n->type'
- The type of the `NODE'. This is a C `enum'. Values should be one
- of `Node_var', `Node_var_new', or `Node_var_array' for function
- parameters.
-
-`n->vname'
- The "variable name" of a node. This is not of much use inside
- externally written extensions.
-
-`void assoc_clear(NODE *n)'
- Clears the associative array pointed to by `n'. Make sure that
- `n->type == Node_var_array' first.
-
-`NODE **assoc_lookup(NODE *symbol, NODE *subs)'
- Finds, and installs if necessary, array elements. `symbol' is the
- array, `subs' is the subscript. This is usually a value created
- with `make_string()' (see below).
-
-`NODE *make_string(char *s, size_t len)'
- Take a C string and turn it into a pointer to a `NODE' that can be
- stored appropriately. This is permanent storage; understanding of
- `gawk' memory management is helpful.
-
-`NODE *make_number(AWKNUM val)'
- Take an `AWKNUM' and turn it into a pointer to a `NODE' that can
- be stored appropriately. This is permanent storage; understanding
- of `gawk' memory management is helpful.
-
-`NODE *dupnode(NODE *n)'
- Duplicate a node. In most cases, this increments an internal
- reference count instead of actually duplicating the entire `NODE';
- understanding of `gawk' memory management is helpful.
-
-`void unref(NODE *n)'
- This macro releases the memory associated with a `NODE' allocated
- with `make_string()' or `make_number()'. Understanding of `gawk'
- memory management is helpful.
-
-`void make_builtin(const char *name, NODE *(*func)(int), int count)'
- Register the C function pointed to by `func' as a new built-in
- function `name'. `name' is a regular C string. `count' is the
- maximum number of arguments that the function takes. The function
- receives the number of arguments actually passed as its only
- parameter and should be written in the following manner:
-
- /* do_xxx --- do xxx function for gawk */
-
- NODE *
- do_xxx(int nargs)
- {
- ...
- }
-
-`NODE *get_argument(int i)'
- This function is called from within a C extension function to get
- the `i'-th argument from the function call. The first argument is
- argument zero.
-
-`NODE *get_actual_argument(int i,'
-` int optional, int wantarray);'
- This function retrieves a particular argument `i'. `wantarray' is
- `TRUE' if the argument should be an array, `FALSE' otherwise. If
- `optional' is `TRUE', the argument need not have been supplied.
- If it wasn't, the return value is `NULL'. It is a fatal error if
- `optional' is `FALSE' but the argument was not provided.
-
-`get_scalar_argument(i, opt)'
- This is a convenience macro that calls `get_actual_argument()'
- with `wantarray' set to `FALSE'.
-
-`get_array_argument(i, opt)'
- This is a convenience macro that calls `get_actual_argument()'
- with `wantarray' set to `TRUE'.
-
-`void update_ERRNO_int(int errno_saved)'
- This function is called from within a C extension function to set
- the value of `gawk''s `ERRNO' variable, based on the error value
- provided as the argument. It is provided as a convenience.
-
-`void update_ERRNO_string(const char *string, enum errno_translate)'
- This function is called from within a C extension function to set
- the value of `gawk''s `ERRNO' variable to a given string. The
- second argument determines whether the string is translated before
- being installed into `ERRNO'. It is provided as a convenience.
-
-`void unset_ERRNO(void)'
- This function is called from within a C extension function to set
- the value of `gawk''s `ERRNO' variable to a null string. It is
- provided as a convenience.
-
-`void register_deferred_variable(const char *name, NODE *(*load_func)(void))'
- This function is called to register a function to be called when a
- reference to an undefined variable with the given name is
- encountered. The callback function will never be called if the
- variable exists already, so, unless the calling code is running at
- program startup, it should first check whether a variable of the
- given name already exists. The argument function must return a
- pointer to a `NODE' containing the newly created variable. This
- function is used to implement the built-in `ENVIRON' and `PROCINFO'
- arrays, so you can refer to them for examples.
-
-`void register_open_hook(void *(*open_func)(IOBUF *))'
- This function is called to register a function to be called
- whenever a new data file is opened, leading to the creation of an
- `IOBUF' structure in `iop_alloc()'. After creating the new
- `IOBUF', `iop_alloc()' will call (in reverse order of
- registration, so the last function registered is called first)
- each open hook until one returns non-`NULL'. If any hook returns
- a non-`NULL' value, that value is assigned to the `IOBUF''s
- `opaque' field (which will presumably point to a structure
- containing additional state associated with the input processing),
- and no further open hooks are called.
-
- The function called will most likely want to set the `IOBUF''s
- `get_record' method to indicate that future input records should
- be retrieved by calling that method instead of using the standard
- `gawk' input processing.
-
- And the function will also probably want to set the `IOBUF''s
- `close_func' method to be called when the file is closed to clean
- up any state associated with the input.
-
- Finally, hook functions should be prepared to receive an `IOBUF'
- structure where the `fd' field is set to `INVALID_HANDLE', meaning
- that `gawk' was not able to open the file itself. In this case,
- the hook function must be able to successfully open the file and
- place a valid file descriptor there.
-
- Currently, for example, the hook function facility is used to
- implement the XML parser shared library extension. For more info,
- please look in `awk.h' and in `io.c'.
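-
- The calling order described for `register_open_hook()' (most
-recently registered hook first, stopping at the first non-`NULL'
-result) can be modeled in plain C. This is a hypothetical sketch, not
-`gawk''s `iop_alloc()'. The `struct iobuf_sketch' type, the fixed-size
-hook table, and all function names are invented for illustration, and
-the real `IOBUF' has many more fields.
-
```c
#include <stddef.h>
#include <string.h>

/* hypothetical stand-in for gawk's IOBUF, reduced to what the sketch needs */
struct iobuf_sketch {
	const char *name;	/* name of the file being opened */
	int fd;			/* file descriptor */
	void *opaque;		/* state returned by the hook that claimed the file */
};

typedef void *(*open_hook_t)(struct iobuf_sketch *);

#define MAX_HOOKS	10

static open_hook_t hooks[MAX_HOOKS];
static int num_hooks = 0;

/* register_hook_sketch --- remember a hook; return -1 if the table is full */

int
register_hook_sketch(open_hook_t func)
{
	if (num_hooks >= MAX_HOOKS)
		return -1;
	hooks[num_hooks++] = func;
	return 0;
}

/* run_open_hooks --- call hooks newest first; the first non-NULL result wins */

void
run_open_hooks(struct iobuf_sketch *iop)
{
	int i;

	iop->opaque = NULL;
	for (i = num_hooks - 1; i >= 0; i--) {
		void *state = hooks[i](iop);

		if (state != NULL) {
			iop->opaque = state;
			break;		/* no further hooks are called */
		}
	}
}

/* two example hooks; their "state" is just the address of a static int */

static int xml_state, plain_state;

void *
xml_hook(struct iobuf_sketch *iop)	/* claims names ending in ".xml" */
{
	size_t len = strlen(iop->name);

	if (len >= 4 && strcmp(iop->name + len - 4, ".xml") == 0)
		return & xml_state;
	return NULL;
}

void *
plain_hook(struct iobuf_sketch *iop)	/* claims every file */
{
	(void) iop;
	return & plain_state;
}
```
-
- With `plain_hook' registered before `xml_hook', a file named
-`data.xml' is claimed by `xml_hook' (registered last, so tried first),
-while `data.txt' falls through to `plain_hook'.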
-
- An argument that is supposed to be an array needs to be handled with
-some extra code, in case the array being passed in is actually from a
-function parameter.
-
- The following boilerplate code shows how to do this:
-
- NODE *the_arg;
-
- /* assume need 3rd arg, 0-based */
- the_arg = get_array_argument(2, FALSE);
-
- Again, you should spend time studying the `gawk' internals; don't
-just blindly copy this code.
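-
- The `n->stptr' and `n->stlen' entry in the list above describes a
-save-and-restore technique for passing a string that is not
-zero-terminated to a C library function. The sketch below applies that
-technique to `strtol()'. `struct str_sketch' and `str_to_long()' are
-hypothetical stand-ins for a `NODE''s string fields and for extension
-code. Like the technique itself, the sketch assumes that the byte at
-`ptr[len]' exists and is writable.
-
```c
#include <stddef.h>
#include <stdlib.h>

/* hypothetical stand-in for the stptr/stlen fields of a NODE */
struct str_sketch {
	char *ptr;	/* string data, not necessarily zero-terminated */
	size_t len;	/* number of valid bytes at ptr */
};

/* str_to_long --- convert a counted string with strtol(), which needs a '\0' */

long
str_to_long(struct str_sketch *s)
{
	char save = s->ptr[s->len];	/* remember the byte past the end */
	long val;

	s->ptr[s->len] = '\0';		/* terminate temporarily */
	val = strtol(s->ptr, NULL, 10);
	s->ptr[s->len] = save;		/* restore the original byte */

	return val;
}
```
-
- For example, with a buffer holding `"12345"' and a length of 3,
-`str_to_long()' returns 123 and leaves the buffer unchanged.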
-
-
-File: gawk.info, Node: Plugin License, Next: Loading Extensions, Prev: Internals, Up: Dynamic Extensions
-
-C.3.2 Extension Licensing
--------------------------
-
-Every dynamic extension should define the global symbol
-`plugin_is_GPL_compatible' to assert that it has been licensed under a
-GPL-compatible license. If this symbol does not exist, `gawk' will
-emit a fatal error and exit.
-
- The declared type of the symbol should be `int'. It does not need
-to be in any allocated section, though. The code merely asserts that
-the symbol exists in the global scope. Something like this is enough:
-
- int plugin_is_GPL_compatible;
-
-
-File: gawk.info, Node: Loading Extensions, Next: Sample Library, Prev: Plugin License, Up: Dynamic Extensions
-
-C.3.3 Loading a Dynamic Extension
----------------------------------
-
-There are two ways to load a dynamically linked library. The first is
-to use the built-in `extension()' function:
-
- extension(libname, init_func)
-
- where `libname' is the library to load, and `init_func' is the name
-of the initialization or bootstrap routine to run once loaded.
-
- The second method for dynamic loading of a library is to use the
-command line option `-l':
-
- $ gawk -l libname -f myprog
-
- This will work only if the initialization routine is named
-`dlload()'.
-
- If you use `extension()', the library will be loaded at run time.
-This means that the functions are available only to the rest of your
-script. If you use the command line option `-l' instead, the library
-will be loaded before `gawk' starts compiling the actual program. The
-net effect is that you can use those functions anywhere in the program.
-
- `gawk' has a list of directories where it searches for libraries.
-By default, the list includes directories that depend upon how `gawk'
-was built and installed (*note AWKLIBPATH Variable::). If you want
-`gawk' to look for libraries in your private directory, you have to
-tell it by setting the `AWKLIBPATH' environment variable (*note
-AWKLIBPATH Variable::). `gawk' supplies the default shared library
-platform suffix if it is not present in the name of the library. If
-the name of your library is `mylib.so', you can simply type
-
- $ gawk -l mylib -f myprog
-
- and `gawk' will do everything necessary to load in your library, and
-then call your `dlload()' routine.
-
- You can always specify the library using an absolute pathname, in
-which case `gawk' will not use `AWKLIBPATH' to search for it.
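-
- The lookup rules just described can be sketched in C as follows.
-This is not `gawk''s code. The function names, the fixed `.so'
-suffix, and the choice to use an absolute name exactly as given are
-assumptions made for illustration. A caller-supplied callback decides
-whether a candidate path is acceptable. In practice it might call
-`access()' or attempt a `dlopen()'.
-
```c
#include <stdio.h>
#include <string.h>

#define LIB_SUFFIX	".so"	/* assumed platform suffix (GNU/Linux) */

/* try_func_t --- callback: return nonzero if path is acceptable */
typedef int (*try_func_t)(const char *path, void *data);

/* has_suffix --- return 1 if name already ends in LIB_SUFFIX */

static int
has_suffix(const char *name)
{
	size_t n = strlen(name);
	size_t s = strlen(LIB_SUFFIX);

	return n >= s && strcmp(name + n - s, LIB_SUFFIX) == 0;
}

/*
 * find_library --- an absolute name is tried as given (an assumption of
 * this sketch); otherwise each directory in the colon-separated libpath
 * is tried in order.  On success, copy the accepted path into result
 * and return 0; otherwise return -1.
 */

int
find_library(const char *name, const char *libpath, try_func_t try_it,
	void *data, char *result, size_t size)
{
	const char *suffix = has_suffix(name) ? "" : LIB_SUFFIX;
	const char *dir = libpath;

	if (name[0] == '/') {
		if (try_it(name, data) == 0)
			return -1;
		snprintf(result, size, "%s", name);
		return 0;
	}

	while (dir != NULL && *dir != '\0') {
		const char *colon = strchr(dir, ':');
		int dirlen = (colon != NULL) ? (int) (colon - dir) : (int) strlen(dir);

		if (dirlen > 0) {	/* skip empty components such as "a::b" */
			snprintf(result, size, "%.*s/%s%s", dirlen, dir, name, suffix);
			if (try_it(result, data) != 0)
				return 0;
		}
		dir = (colon != NULL) ? colon + 1 : NULL;
	}
	return -1;
}

/* matches_name --- example callback: accept only the path given in data */

int
matches_name(const char *path, void *data)
{
	return strcmp(path, (const char *) data) == 0;
}
```
-
- For instance, searching for `mylib' along the hypothetical path
-`/usr/lib/gawk:/home/me/lib' tries `/usr/lib/gawk/mylib.so' and then
-`/home/me/lib/mylib.so'.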
-
-
-File: gawk.info, Node: Sample Library, Prev: Loading Extensions, Up: Dynamic Extensions
-
-C.3.4 Example: Directory and File Operation Built-ins
------------------------------------------------------
-
-Two useful functions that are not in `awk' are `chdir()' (so that an
-`awk' program can change its directory) and `stat()' (so that an `awk'
-program can gather information about a file). This minor node
-implements these functions for `gawk' in an external extension library.
-
-* Menu:
-
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-
-
-File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Sample Library
-
-C.3.4.1 Using `chdir()' and `stat()'
-....................................
-
-This minor node shows how to use the new functions at the `awk' level
-once they've been integrated into the running `gawk' interpreter.
-Using `chdir()' is very straightforward. It takes one argument, the new
-directory to change to:
-
- ...
- newdir = "/home/arnold/funstuff"
- ret = chdir(newdir)
- if (ret < 0) {
- printf("could not change to %s: %s\n",
- newdir, ERRNO) > "/dev/stderr"
- exit 1
- }
- ...
-
- The return value is negative if the `chdir()' failed, and `ERRNO'
-(*note Built-in Variables::) is set to a string indicating the error.
-
- Using `stat()' is a bit more complicated. The C `stat()' function
-fills in a structure that has a fair amount of information. The right
-way to model this in `awk' is to fill in an associative array with the
-appropriate information:
-
- file = "/home/arnold/.profile"
- fdata[1] = "x" # force `fdata' to be an array
- ret = stat(file, fdata)
- if (ret < 0) {
- printf("could not stat %s: %s\n",
- file, ERRNO) > "/dev/stderr"
- exit 1
- }
- printf("size of %s is %d bytes\n", file, fdata["size"])
-
- The `stat()' function always clears the data array, even if the
-`stat()' fails. It fills in the following elements:
-
-`"name"'
- The name of the file that was `stat()''ed.
-
-`"dev"'
-`"ino"'
- The file's device and inode numbers, respectively.
-
-`"mode"'
- The file's mode, as a numeric value. This includes both the file's
- type and its permissions.
-
-`"nlink"'
- The number of hard links (directory entries) the file has.
-
-`"uid"'
-`"gid"'
- The numeric user and group ID numbers of the file's owner.
-
-`"size"'
- The size in bytes of the file.
-
-`"blocks"'
- The number of disk blocks the file actually occupies. This may not
- be a function of the file's size if the file has holes.
-
-`"atime"'
-`"mtime"'
-`"ctime"'
- The file's last access, modification, and inode update times,
- respectively. These are numeric timestamps, suitable for
- formatting with `strftime()' (*note Built-in::).
-
-`"pmode"'
- The file's "printable mode." This is a string representation of
- the file's type and permissions, such as what is produced by `ls
- -l'--for example, `"drwxr-xr-x"'.
-
-`"type"'
- A printable string representation of the file's type. The value
- is one of the following:
-
- `"blockdev"'
- `"chardev"'
- The file is a block or character device ("special file").
-
- `"directory"'
- The file is a directory.
-
- `"fifo"'
- The file is a named pipe (also known as a FIFO).
-
- `"file"'
- The file is just a regular file.
-
- `"socket"'
- The file is an `AF_UNIX' ("Unix domain") socket in the
- filesystem.
-
- `"symlink"'
- The file is a symbolic link.
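-
- The `"type"' values above correspond to the POSIX file-type test
-macros applied to the `st_mode' field returned by `stat()' or
-`lstat()'. The following sketch shows one way to perform that mapping.
-It is an illustration, not the code in `extension/filefuncs.c', and
-the function name is hypothetical. It assumes a system that provides
-`S_ISLNK()' and `S_ISSOCK()', as GNU/Linux does.
-
```c
#define _GNU_SOURCE		/* expose S_ISLNK(), S_ISSOCK(), S_IF* on glibc */
#include <sys/types.h>
#include <sys/stat.h>

/* file_type_name --- map a stat() st_mode value to the "type" strings above */

const char *
file_type_name(mode_t mode)
{
	if (S_ISBLK(mode))
		return "blockdev";
	else if (S_ISCHR(mode))
		return "chardev";
	else if (S_ISDIR(mode))
		return "directory";
	else if (S_ISFIFO(mode))
		return "fifo";
	else if (S_ISREG(mode))
		return "file";
	else if (S_ISSOCK(mode))
		return "socket";
	else if (S_ISLNK(mode))
		return "symlink";

	return "unknown";	/* same fallback value that do_stat() starts with */
}
```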
-
- Several additional elements may be present depending upon the
-operating system and the type of the file. You can test for them in
-your `awk' program by using the `in' operator (*note Reference to
-Elements::):
-
-`"blksize"'
- The preferred block size for I/O to the file. This field is not
- present on all POSIX-like systems in the C `stat' structure.
-
-`"linkval"'
- If the file is a symbolic link, this element is the name of the
- file the link points to (i.e., the value of the link).
-
-`"rdev"'
-`"major"'
-`"minor"'
- If the file is a block or character device file, then these values
- represent the numeric device number and the major and minor
- components of that number, respectively.
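-
- On GNU/Linux, the `"major"' and `"minor"' values can be derived
-from `st_rdev' with the `major()' and `minor()' macros from
-`<sys/sysmacros.h>', which is the header the C code for these
-extensions includes for that purpose. The sketch below wraps those
-macros in a hypothetical helper that, following the description above,
-reports device numbers only for block and character devices. The
-header location is glibc-specific.
-
```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>	/* major(), minor(), makedev() on glibc */

/*
 * device_numbers --- for block and character devices, split st_rdev
 * into its major and minor parts; return -1 for other file types
 */

int
device_numbers(mode_t mode, dev_t rdev, unsigned int *majp, unsigned int *minp)
{
	if (! S_ISBLK(mode) && ! S_ISCHR(mode))
		return -1;	/* st_rdev is only meaningful for devices */

	*majp = major(rdev);
	*minp = minor(rdev);
	return 0;
}
```
-
- For example, a character device whose `st_rdev' was built with
-`makedev(1, 3)' yields a major number of 1 and a minor number of 3.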
-
-
-File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Sample Library
-
-C.3.4.2 C Code for `chdir()' and `stat()'
-.........................................
-
-Here is the C code for these extensions. They were written for
-GNU/Linux. The code needs some more work for complete portability to
-other POSIX-compliant systems:(1)
-
- #include "awk.h"
-
- #include <sys/sysmacros.h>
-
- int plugin_is_GPL_compatible;
-
- /* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
-
- static NODE *
- do_chdir(int nargs)
- {
- NODE *newdir;
- int ret = -1;
-
- if (do_lint && nargs != 1)
- lintwarn("chdir: called with incorrect number of arguments");
-
- newdir = get_scalar_argument(0, FALSE);
-
- The file includes the `"awk.h"' header file for definitions for the
-`gawk' internals. It includes `<sys/sysmacros.h>' for access to the
-`major()' and `minor()' macros.
-
- By convention, for an `awk' function `foo', the function that
-implements it is called `do_foo'. The function should take an `int'
-argument, usually called `nargs', that represents the number of
-arguments actually passed to the function. The `newdir' variable
-represents the new directory to change to, retrieved with
-`get_scalar_argument()'. Note that the first argument is numbered zero.
-
- This code actually accomplishes the `chdir()'. It first forces the
-argument to be a string and passes the string value to the `chdir()'
-system call. If the `chdir()' fails, `ERRNO' is updated.
-
- (void) force_string(newdir);
- ret = chdir(newdir->stptr);
- if (ret < 0)
- update_ERRNO_int(errno);
-
- Finally, the function returns the return value to the `awk' level:
-
- return make_number((AWKNUM) ret);
- }
-
- The `stat()' built-in is more involved. First comes a function that
-turns a numeric mode into a printable representation (e.g., octal 0644 becomes
-`-rw-r--r--'). This is omitted here for brevity:
-
- /* format_mode --- turn a stat mode field into something readable */
-
- static char *
- format_mode(unsigned long fmode)
- {
- ...
- }
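-
- The body of `format_mode()' is omitted above. One possible
-implementation, producing strings in the style of `ls -l', is sketched
-below. It is not the code from `extension/filefuncs.c'. It keeps the
-signature shown above and returns a pointer to a static buffer. That
-is an assumption that suits `do_stat()', which copies the result at
-once with `make_string()', but it makes the function non-reentrant.
-
```c
#define _GNU_SOURCE		/* expose S_ISLNK(), S_ISSOCK(), S_ISVTX on glibc */
#include <sys/types.h>
#include <sys/stat.h>

/* format_mode --- turn a stat mode field into something readable */

char *
format_mode(unsigned long fmode)
{
	static char outbuf[11];		/* type char + 9 permission chars + '\0' */
	static const char rwx[] = "rwxrwxrwx";
	int i;

	/* first character: the file type, as shown by ls -l */
	if (S_ISDIR(fmode))
		outbuf[0] = 'd';
	else if (S_ISLNK(fmode))
		outbuf[0] = 'l';
	else if (S_ISBLK(fmode))
		outbuf[0] = 'b';
	else if (S_ISCHR(fmode))
		outbuf[0] = 'c';
	else if (S_ISFIFO(fmode))
		outbuf[0] = 'p';
	else if (S_ISSOCK(fmode))
		outbuf[0] = 's';
	else
		outbuf[0] = '-';

	/* next nine: owner, group, other permission bits, highest bit first */
	for (i = 0; i < 9; i++)
		outbuf[i + 1] = ((fmode & (0400 >> i)) != 0) ? rwx[i] : '-';

	/* setuid, setgid, and sticky bits show in the execute positions */
	if ((fmode & S_ISUID) != 0)
		outbuf[3] = (outbuf[3] == 'x') ? 's' : 'S';
	if ((fmode & S_ISGID) != 0)
		outbuf[6] = (outbuf[6] == 'x') ? 's' : 'S';
	if ((fmode & S_ISVTX) != 0)
		outbuf[9] = (outbuf[9] == 'x') ? 't' : 'T';

	outbuf[10] = '\0';
	return outbuf;
}
```
-
- Applied to the mode 33204 (octal 0100664) that appears in the sample
-output in *note Using Internal File Ops::, this sketch yields
-`-rw-rw-r--', matching the `"pmode"' value shown there.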
-
- Next comes the `do_stat()' function. It starts with variable
-declarations and argument checking:
-
- /* do_stat --- provide a stat() function for gawk */
-
- static NODE *
- do_stat(int nargs)
- {
- NODE *file, *array, *tmp;
- struct stat sbuf;
- int ret;
- NODE **aptr;
- char *pmode; /* printable mode */
- char *type = "unknown";
-
- if (do_lint && nargs > 2)
- lintwarn("stat: called with too many arguments");
-
- Then comes the actual work. First, the function gets the arguments.
-Then, it always clears the array. The code uses `lstat()' (instead of
-`stat()') to get the file information, in case the file is a symbolic
-link. If there's an error, it sets `ERRNO' and returns:
-
- /* file is first arg, array to hold results is second */
- file = get_scalar_argument(0, FALSE);
- array = get_array_argument(1, FALSE);
-
- /* empty out the array */
- assoc_clear(array);
-
- /* lstat the file, if error, set ERRNO and return */
- (void) force_string(file);
- ret = lstat(file->stptr, & sbuf);
- if (ret < 0) {
- update_ERRNO_int(errno);
- return make_number((AWKNUM) ret);
- }
-
- Now comes the tedious part: filling in the array. Only a few of the
-calls are shown here, since they all follow the same pattern:
-
- /* fill in the array */
- aptr = assoc_lookup(array, tmp = make_string("name", 4));
- *aptr = dupnode(file);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("mode", 4));
- *aptr = make_number((AWKNUM) sbuf.st_mode);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("pmode", 5));
- pmode = format_mode(sbuf.st_mode);
- *aptr = make_string(pmode, strlen(pmode));
- unref(tmp);
-
- When done, return the `lstat()' return value:
-
- return make_number((AWKNUM) ret);
- }
-
- Finally, it's necessary to provide the "glue" that loads the new
-function(s) into `gawk'. By convention, each library has a routine
-named `dlload()' that does the job:
-
- /* dlload --- load new builtins in this library */
-
- NODE *
- dlload(NODE *tree, void *dl)
- {
- make_builtin("chdir", do_chdir, 1);
- make_builtin("stat", do_stat, 2);
- return make_number((AWKNUM) 0);
- }
-
- And that's it! As an exercise, consider adding functions to
-implement system calls such as `chown()', `chmod()', and `umask()'.
-
- ---------- Footnotes ----------
-
- (1) This version is edited slightly for presentation. See
-`extension/filefuncs.c' in the `gawk' distribution for the complete
-version.
-
-
-File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Sample Library
-
-C.3.4.3 Integrating the Extensions
-..................................
-
-Now that the code is written, it must be possible to add it at runtime
-to the running `gawk' interpreter. First, the code must be compiled.
-Assuming that the functions are in a file named `filefuncs.c', and IDIR
-is the location of the `gawk' include files, the following steps create
-a GNU/Linux shared library:
-
- $ gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -IIDIR filefuncs.c
- $ ld -o filefuncs.so -shared filefuncs.o
-
- Once the library exists, it is loaded by calling the `extension()'
-built-in function. This function takes two arguments: the name of the
-library to load and the name of a function to call when the library is
-first loaded. That initialization function adds the new functions to
-`gawk'. `extension()' returns the value returned by the initialization
-function within the shared library:
-
- # file testff.awk
- BEGIN {
- extension("./filefuncs.so", "dlload")
-
- chdir(".") # no-op
-
- data[1] = 1 # force `data' to be an array
- print "Info for testff.awk"
- ret = stat("testff.awk", data)
- print "ret =", ret
- for (i in data)
- printf "data[\"%s\"] = %s\n", i, data[i]
- print "testff.awk modified:",
- strftime("%m %d %y %H:%M:%S", data["mtime"])
-
- print "\nInfo for JUNK"
- ret = stat("JUNK", data)
- print "ret =", ret
- for (i in data)
- printf "data[\"%s\"] = %s\n", i, data[i]
- print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
- }
-
- Here are the results of running the program:
-
- $ gawk -f testff.awk
- -| Info for testff.awk
- -| ret = 0
- -| data["size"] = 607
- -| data["ino"] = 14945891
- -| data["name"] = testff.awk
- -| data["pmode"] = -rw-rw-r--
- -| data["nlink"] = 1
- -| data["atime"] = 1293993369
- -| data["mtime"] = 1288520752
- -| data["mode"] = 33204
- -| data["blksize"] = 4096
- -| data["dev"] = 2054
- -| data["type"] = file
- -| data["gid"] = 500
- -| data["uid"] = 500
- -| data["blocks"] = 8
- -| data["ctime"] = 1290113572
- -| testff.awk modified: 10 31 10 12:25:52
- -|
- -| Info for JUNK
- -| ret = -1
- -| JUNK modified: 01 01 70 02:00:00
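- In the `JUNK' case, `stat()' failed and `data' was left empty. As a
-result, `data["mtime"]' converted to zero and `strftime()' printed the
-epoch, 1970-01-01 00:00:00 UTC. The time shows as `02:00:00' because
-`strftime()' formats the time in the local time zone, which here was
-evidently two hours ahead of UTC. The C sketch below formats a
-timestamp with the same format string as `testff.awk', but in UTC via
-the standard `gmtime()' function, so the time-zone offset disappears.
-The `format_utc()' name is hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Format WHEN (seconds since the epoch) into BUF using the format
   string from testff.awk, interpreted as UTC rather than local time.
   Returns 0 on success, -1 on failure. */
int format_utc(time_t when, char *buf, size_t size)
{
    struct tm *tm = gmtime(&when);   /* broken-down time in UTC */

    if (tm == NULL)
        return -1;
    /* strftime() returns 0 if the result did not fit in BUF. */
    return strftime(buf, size, "%m %d %y %H:%M:%S", tm) == 0 ? -1 : 0;
}
```

-With this function, timestamp 0 formats as `01 01 70 00:00:00'. The
-`mtime' value 1288520752 shown above formats as `10 31 10 10:25:52',
-two hours earlier than the local time that `gawk' printed.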
-
-
-File: gawk.info, Node: Future Extensions, Prev: Dynamic Extensions, Up: Notes
-
-C.4 Probable Future Extensions
-==============================
-
- AWK is a language similar to PERL, only considerably more elegant.
- Arnold Robbins
-
- Hey!
- Larry Wall
-
- This minor node briefly lists extensions and possible improvements
-that indicate the directions we are currently considering for `gawk'.
-The file `FUTURES' in the `gawk' distribution lists these extensions as
-well.
-
- Following is a list of probable future changes visible at the `awk'
-language level:
-
-Loadable module interface
- It is not clear that the `awk'-level interface to the modules
- facility is as good as it should be. The interface needs to be
- redesigned, particularly taking namespace issues into account, as
- well as possibly including issues such as library search path order
- and versioning.
-
-`RECLEN' variable for fixed-length records
- Along with `FIELDWIDTHS', this would speed up the processing of
- fixed-length records. `PROCINFO["RS"]' would be `"RS"' or
- `"RECLEN"', depending upon which kind of record processing is in
- effect.
-
-Databases
- It may be possible to map a GDBM/NDBM/SDBM file into an `awk'
- array.
-
-More `lint' warnings
- There are more things that could be checked for portability.
-
- Following is a list of probable improvements that will make `gawk''s
-source code easier to work with:
-
-Loadable module mechanics
- The current extension mechanism works (*note Dynamic Extensions::),
- but is rather primitive. It requires a fair amount of manual work
- to create and integrate a loadable module. Nor is the current
- mechanism as portable as might be desired. The GNU `libtool'
- package provides a number of features that would make using
- loadable modules much easier. `gawk' should be changed to use
- `libtool'.
-
-Loadable module internals
- The API that `gawk' "exports" to give modules access to its
- internals should be revised. Too many things are needlessly
- exposed. A new API should be designed and implemented to make
- module writing easier.
-
-Better array subscript management
- `gawk''s management of array subscript storage could use revamping,
- so that using the same value to index multiple arrays only stores
- one copy of the index value.
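- One common way to keep a single copy of each distinct string is
-"string interning". Every string goes through a table that hands back
-the same stored copy for equal contents. The C sketch below shows only
-the idea. The `intern()' name, the fixed-size pool, and the linear
-search are simplifications (a real table would use hashing), and it is
-not a description of how `gawk' stores subscripts.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_STRINGS 64   /* hypothetical fixed capacity for the sketch */

static char *pool[MAX_STRINGS];
static size_t pool_len;

/* Return a shared copy of S: equal strings always yield the same
   pointer, so several arrays indexed by the same value could share
   one stored copy.  Returns NULL if the pool is full or memory runs
   out. */
const char *intern(const char *s)
{
    for (size_t i = 0; i < pool_len; i++)
        if (strcmp(pool[i], s) == 0)
            return pool[i];          /* already stored: reuse it */

    if (pool_len == MAX_STRINGS)
        return NULL;

    size_t n = strlen(s) + 1;        /* include the terminating NUL */
    char *copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    pool[pool_len++] = copy;
    return copy;
}
```

-Comparing two interned subscripts then reduces to comparing pointers.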
-
- Finally, the programs in the test suite could use documenting in
-this Info file.
-
- *Note Additions::, if you are interested in tackling any of these
-projects.
-
-
-File: gawk.info, Node: Basic Concepts, Next: Glossary, Prev: Notes, Up: Top
-
-Appendix D Basic Programming Concepts
-*************************************
-
-This major node attempts to define some of the basic concepts and terms
-that are used throughout the rest of this Info file. As this Info file
-is specifically about `awk', and not about computer programming in
-general, the coverage here is by necessity fairly cursory and
-simplistic. (If you need more background, there are many other
-introductory texts that you should refer to instead.)
-
-* Menu:
-
-* Basic High Level:: The high level view.
-* Basic Data Typing:: A very quick intro to data types.
-
-
-File: gawk.info, Node: Basic High Level, Next: Basic Data Typing, Up: Basic Concepts
-
-D.1 What a Program Does
-=======================
-
-At the most basic level, the job of a program is to process some input
-data and produce results.
-
- _______
- +------+ / \ +---------+
- | Data | -----> < Program > -----> | Results |
- +------+ \_______/ +---------+
-
- The "program" in the figure can be either a compiled program(1)
-(such as `ls') or an "interpreted" one. In the latter case, a
-machine-executable program such as `awk' reads your program, and then
-uses the instructions in your program to process the data.
-
- When you write a program, it usually consists of the following, very
-basic set of steps:
-
- ______
- +----------------+ / More \ No +----------+
- | Initialization | -------> < Data > -------> | Clean Up |
- +----------------+ ^ \ ? / +----------+
- | +--+-+
- | | Yes
- | |
- | V
- | +---------+
- +-----+ Process |
- +---------+
-
-Initialization
- These are the things you do before actually starting to process
- data, such as checking arguments, initializing any data you need
- to work with, and so on. This step corresponds to `awk''s `BEGIN'
- rule (*note BEGIN/END::).
-
- If you were baking a cake, this might consist of laying out all the
- mixing bowls and the baking pan, and making sure you have all the
- ingredients that you need.
-
-Processing
- This is where the actual work is done. Your program reads data,
- one logical chunk at a time, and processes it as appropriate.
-
- In most programming languages, you have to manually manage the
- reading of data, checking to see if there is more each time you
- read a chunk. `awk''s pattern-action paradigm (*note Getting
- Started::) handles the mechanics of this for you.
-
- In baking a cake, the processing corresponds to the actual labor:
- breaking eggs, mixing the flour, water, and other ingredients, and
- then putting the cake into the oven.
-
-Clean Up
- Once you've processed all the data, you may have things you need to
- do before exiting. This step corresponds to `awk''s `END' rule
- (*note BEGIN/END::).
-
- After the cake comes out of the oven, you still have to wrap it in
- plastic wrap to keep anyone from tasting it, as well as wash the
- mixing bowls and utensils.
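- The three steps map directly onto code. The following C sketch
-follows the flowchart: it initializes a total, loops while there is
-more data, processing one record per pass, and finishes with a
-clean-up step. The `run()' function and `struct summary' are
-hypothetical. In `awk', the `BEGIN' rule, the other rules, and the
-`END' rule play these roles, and `awk' supplies the loop itself.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type: how many records were processed and the
   sum of their numeric values. */
struct summary {
    long records;
    double total;
};

/* Sum the numeric value of each input string, following the
   initialize / "more data?" / process / clean-up flowchart. */
struct summary run(const char *const input[], size_t n)
{
    /* Initialization (awk: the BEGIN rule). */
    struct summary s = { 0, 0.0 };
    size_t next = 0;

    /* "More data?"  While the answer is yes, process one record. */
    while (next < n) {
        /* Processing.  strtod() returns 0.0 for text that does not
           start with a number. */
        s.total += strtod(input[next], NULL);
        s.records++;
        next++;
    }

    /* Clean up (awk: the END rule).  Nothing was allocated, so the
       only remaining work is to hand back the result. */
    return s;
}
```

-For the input strings `"1"', `"2.5"', and `"x"', `run()' reports 3
-records and a total of 3.5. `strtod()' yields 0 for text that is not a
-number, much as `awk' treats a non-numeric string as zero.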
-
- An "algorithm" is a detailed set of instructions necessary to
-accomplish a task, or process data. It is much the same as a recipe
-for baking a cake. Programs implement algorithms. Often, it is up to
-you to design the algorithm and implement it, simultaneously.
-
- The "logical chunks" we talked about previously are called "records",
-similar to the records a company keeps on employees, a school keeps for
-students, or a doctor keeps for patients. Each record has many
-component parts, such as first and last names, date of birth, address,
-and so on. The component parts are referred to as the "fields" of the
-record.
-
- The act of reading data is termed "input", and that of generating
-results, not too surprisingly, is termed "output". They are often
-referred to together as "input/output," and even more often, as "I/O"
-for short. (You will also see "input" and "output" used as verbs.)
-
- `awk' manages the reading of data for you, as well as breaking it
-up into records and fields. Your program's job is to tell `awk' what
-to do with the data. You do this by describing "patterns" in the data
-to look for, and "actions" to execute when those patterns are seen.
-This "data-driven" nature of `awk' programs usually makes them both
-easier to write and easier to read.
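- To make the record and field vocabulary concrete, here is a C sketch
-of the splitting that `awk' does for you by default. Fields are
-separated by whitespace, leading and trailing blanks are ignored, and a
-run of blanks counts as one separator. The `split_fields()' name is
-hypothetical, and `awk''s real splitting also honors `FS' and other
-settings.

```c
#include <string.h>

/* Split RECORD in place into whitespace-separated fields, storing
   pointers to them in FIELDS (at most MAX of them).  Leading and
   trailing blanks are ignored, and a run of blanks counts as one
   separator.  Returns the number of fields found (like awk's NF). */
int split_fields(char *record, char *fields[], int max)
{
    int nf = 0;

    for (char *f = strtok(record, " \t\n"); f != NULL && nf < max;
         f = strtok(NULL, " \t\n"))
        fields[nf++] = f;
    return nf;
}
```

-Calling `split_fields()' on the record `"  Jane  Doe\t1990 "' finds
-three fields: `Jane', `Doe', and `1990'. Note that `strtok()' modifies
-the record in place and is not reentrant.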
-
- ---------- Footnotes ----------
-
- (1) Compiled programs are typically written in lower-level languages
-such as C, C++, or Ada, and then translated, or "compiled", into a form
-that the computer can execute directly.
-
-
-File: gawk.info, Node: Basic Data Typing, Prev: Basic High Level, Up: Basic Concepts
-
-D.2 Data Values in a Computer
-=============================
-
-In a program, you keep track of information and values in things called
-"variables". A variable is just a name for a given value, such as
-`first_name', `last_name', `address', and so on. `awk' has several
-predefined variables, and it has special names to refer to the current
-input record and the fields of the record. You may also group multiple
-associated values under one name, as an array.
-
- Data, particularly in `awk', consists of either numeric values, such
-as 42 or 3.1415927, or string values. String values are essentially
-anything that's not a number, such as a name. Strings are sometimes
-referred to as "character data", since they store the individual
-characters that comprise them. Single numeric or string values, and
-the ordinary variables that hold them, are referred to as "scalar"
-values. Groups of values, such as arrays, are not scalars.
-
- A basic introduction to numeric types (integer and floating-point)
-and how they are used in a computer appears in *note General
-Arithmetic::. Please review that information, including the caveats
-presented there.
-
- While you are probably used to the idea of a number without a value
-(i.e., zero), the idea of zero-length character data takes a bit more
-getting used to. Nevertheless, such a thing exists. It is called the
-"null string". The null string is character data that has no value.
-In other words, it is empty. It is written in `awk' programs like
-this: `""'.
-
- Humans are used to working in decimal; i.e., base 10. In base 10,
-digits go from 0 to 9, and then "roll over" into the next column.
-(Remember grade school? 42 is 4 times 10 plus 2.)
-
- There are other number bases though. Computers commonly use base 2
-or "binary", base 8 or "octal", and base 16 or "hexadecimal". In
-binary, each column represents two times the value in the column to its
-right. Each column may contain either a 0 or a 1. Thus, binary 1010
-represents 1 times 8, plus 0 times 4, plus 1 times 2, plus 0 times 1,
-or decimal 10. Octal and hexadecimal are discussed more in *note
-Nondecimal-numbers::.
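- The column rule generalizes to any base B. Reading digits from left
-to right, multiply the running total by B and add the next digit. This
-C sketch applies that rule. The `digits_to_value()' name is
-hypothetical, and in real programs the standard C function `strtol()'
-performs the same conversion.

```c
/* Value of the digit string S in base BASE (2 to 16), built column by
   column: each step multiplies the running total by BASE (moving the
   earlier digits one column left) and adds the new digit.
   Returns -1 if S contains a digit that is invalid for BASE. */
long digits_to_value(const char *s, int base)
{
    long value = 0;

    for (; *s != '\0'; s++) {
        int d;

        if (*s >= '0' && *s <= '9')
            d = *s - '0';
        else if (*s >= 'a' && *s <= 'f')
            d = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'F')
            d = *s - 'A' + 10;
        else
            return -1;
        if (d >= base)
            return -1;
        value = value * base + d;
    }
    return value;
}
```

-For example, `digits_to_value("1010", 2)' is 10,
-`digits_to_value("13", 8)' is 11, and `digits_to_value("12", 16)' is
-18.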
-
- At the very lowest level, computers store values as groups of binary
-digits, or "bits". Modern computers organize bits into groups of
-eight, called "bytes". Advanced applications sometimes have to
-manipulate bits directly, and `gawk' provides functions for doing so.
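- As an illustration of working with the bits inside a byte, this C
-sketch prints each bit of a byte using the shift (`>>') and bitwise AND
-(`&') operators. `gawk''s `rshift()' and `and()' functions perform the
-corresponding operations on numbers. The `byte_to_bits()' name is
-hypothetical.

```c
#include <limits.h>

/* Write the CHAR_BIT bits of byte B into OUT as '0'/'1' characters,
   most significant bit first.  OUT must hold CHAR_BIT + 1 chars. */
void byte_to_bits(unsigned char b, char out[])
{
    for (int i = 0; i < CHAR_BIT; i++) {
        int shift = CHAR_BIT - 1 - i;
        out[i] = ((b >> shift) & 1) ? '1' : '0';   /* isolate one bit */
    }
    out[CHAR_BIT] = '\0';
}
```

-With eight-bit bytes, `byte_to_bits(10, buf)' stores `00001010',
-matching binary 1010 (decimal 10) described above.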
-
- Programs are written in programming languages. Hundreds, if not
-thousands, of programming languages exist. One of the most popular is
-the C programming language. The C language had a very strong influence
-on the design of the `awk' language.
-
- There have been several versions of C. The first is often referred
-to as "K&R" C, after the initials of Brian Kernighan and Dennis Ritchie,
-the authors of the first book on C. (Dennis Ritchie created the
-language, and Brian Kernighan was one of the creators of `awk'.)
-
- In the mid-1980s, an effort began to produce a standard for C. This
-work culminated in 1989, with the production of the ANSI standard for
-C. This standard became an ISO standard in 1990. In 1999, a revised
-ISO C standard was approved and released. Where it makes sense, POSIX
-`awk' is compatible with 1999 ISO C.
-
-
-File: gawk.info, Node: Glossary, Next: Copying, Prev: Basic Concepts, Up: Top
-
-Glossary
-********
-
-Action
- A series of `awk' statements attached to a rule. If the rule's
- pattern matches an input record, `awk' executes the rule's action.
- Actions are always enclosed in curly braces. (*Note Action
- Overview::.)
-
-Amazing `awk' Assembler
- Henry Spencer at the University of Toronto wrote a retargetable
- assembler completely as `sed' and `awk' scripts. It is thousands
- of lines long, including machine descriptions for several eight-bit
- microcomputers. It is a good example of a program that would have
- been better written in another language. You can get it from
- `http://awk.info/?awk100/aaa'.
-
-Ada
- A programming language originally defined by the U.S. Department of
- Defense for embedded programming. It was designed to enforce good
- software engineering practices.
-
-Amazingly Workable Formatter (`awf')
- Henry Spencer at the University of Toronto wrote a formatter that
- accepts a large subset of the `nroff -ms' and `nroff -man'
- formatting commands, using `awk' and `sh'. It is available from
- `http://awk.info/?tools/awf'.
-
-Anchor
- The regexp metacharacters `^' and `$', which force the match to
- the beginning or end of the string, respectively.
-
-ANSI
- The American National Standards Institute. This organization
- produces many standards, among them the standards for the C and
- C++ programming languages. These standards often become
- international standards as well. See also "ISO."
-
-Array
- A grouping of multiple values under the same name. Most languages
- just provide sequential arrays. `awk' provides associative arrays.
-
-Assertion
- A statement in a program that a condition is true at this point in
- the program. Useful for reasoning about how a program is supposed
- to behave.
-
-Assignment
- An `awk' expression that changes the value of some `awk' variable
- or data object. An object that you can assign to is called an
- "lvalue". The assigned values are called "rvalues". *Note
- Assignment Ops::.
-
-Associative Array
- Arrays in which the indices may be numbers or strings, not just
- sequential integers in a fixed range.
-
-`awk' Language
- The language in which `awk' programs are written.
-
-`awk' Program
- An `awk' program consists of a series of "patterns" and "actions",
- collectively known as "rules". For each input record given to the
- program, the program's rules are all processed in turn. `awk'
- programs may also contain function definitions.
-
-`awk' Script
- Another name for an `awk' program.
-
-Bash
- The GNU version of the standard shell (the Bourne-Again SHell).
- See also "Bourne Shell."
-
-BBS
- See "Bulletin Board System."
-
-Bit
- Short for "Binary Digit." All values in computer memory
- ultimately reduce to binary digits: values that are either zero or
- one. Groups of bits may be interpreted differently--as integers,
- floating-point numbers, character data, addresses of other memory
- objects, or other data. `awk' lets you work with floating-point
- numbers and strings. `gawk' lets you manipulate bit values with
- the built-in functions described in *note Bitwise Functions::.
-
- Computers are often defined by how many bits they use to represent
- integer values. Typical systems are 32-bit systems, but 64-bit
- systems are becoming increasingly popular, and 16-bit systems have
- essentially disappeared.
-
-Boolean Expression
- Named after the English mathematician Boole. See also "Logical
- Expression."
-
-Bourne Shell
- The standard shell (`/bin/sh') on Unix and Unix-like systems,
- originally written by Stephen R. Bourne. Many shells (Bash, `ksh',
- `pdksh', `zsh') are generally upwardly compatible with the Bourne
- shell.
-
-Built-in Function
- The `awk' language provides built-in functions that perform various
- numerical, I/O-related, and string computations. Examples are
- `sqrt()' (for the square root of a number) and `substr()' (for a
- substring of a string). `gawk' provides functions for timestamp
- management, bit manipulation, array sorting, type checking, and
- runtime string translation. (*Note Built-in::.)
-
-Built-in Variable
- `ARGC', `ARGV', `CONVFMT', `ENVIRON', `FILENAME', `FNR', `FS',
- `NF', `NR', `OFMT', `OFS', `ORS', `RLENGTH', `RSTART', `RS', and
- `SUBSEP' are the variables that have special meaning to `awk'. In
- addition, `ARGIND', `BINMODE', `ERRNO', `FIELDWIDTHS', `FPAT',
- `IGNORECASE', `LINT', `PROCINFO', `RT', and `TEXTDOMAIN' are the
- variables that have special meaning to `gawk'. Changing some of
- them affects `awk''s running environment. (*Note Built-in
- Variables::.)
-
-Braces
- See "Curly Braces."
-
-Bulletin Board System
- A computer system allowing users to log in and read and/or leave
- messages for other users of the system, much like leaving paper
- notes on a bulletin board.
-
-C
- The system programming language that most GNU software is written
- in. The `awk' programming language has C-like syntax, and this
- Info file points out similarities between `awk' and C when
- appropriate.
-
- In general, `gawk' attempts to be as similar to the 1990 version
- of ISO C as makes sense.
-
-C++
- A popular object-oriented programming language derived from C.
-
-Character Set
- The set of numeric codes used by a computer system to represent the
- characters (letters, numbers, punctuation, etc.) of a particular
- country or place. The most common character set in use today is
- ASCII (American Standard Code for Information Interchange). Many
- European countries use an extension of ASCII known as ISO-8859-1
- (ISO Latin-1). The Unicode character set (http://www.unicode.org)
- is becoming increasingly popular and standard, and is particularly
- widely used on GNU/Linux systems.
-
-CHEM
- A preprocessor for `pic' that reads descriptions of molecules and
- produces `pic' input for drawing them. It was written in `awk' by
- Brian Kernighan and Jon Bentley, and is available from
- `http://netlib.sandia.gov/netlib/typesetting/chem.gz'.
-
-Coprocess
- A subordinate program with which two-way communication is
- possible.
-
-Compiler
- A program that translates human-readable source code into
- machine-executable object code. The object code is then executed
- directly by the computer. See also "Interpreter."
-
-Compound Statement
- A series of `awk' statements, enclosed in curly braces. Compound
- statements may be nested. (*Note Statements::.)
-
-Concatenation
- Concatenating two strings means sticking them together, one after
- another, producing a new string. For example, the string `foo'
- concatenated with the string `bar' gives the string `foobar'.
- (*Note Concatenation::.)
-
-Conditional Expression
- An expression using the `?:' ternary operator, such as `EXPR1 ?
- EXPR2 : EXPR3'. The expression EXPR1 is evaluated; if the result
- is true, the value of the whole expression is the value of EXPR2;
- otherwise the value is EXPR3. In either case, only one of EXPR2
- and EXPR3 is evaluated. (*Note Conditional Exp::.)
-
-Comparison Expression
- A relation that is either true or false, such as `a < b'.
- Comparison expressions are used in `if', `while', `do', and `for'
- statements, and in patterns to select which input records to
- process. (*Note Typing and Comparison::.)
-
-Curly Braces
- The characters `{' and `}'. Curly braces are used in `awk' for
- delimiting actions, compound statements, and function bodies.
-
-Dark Corner
- An area in the language where specifications often were (or still
- are) not clear, leading to unexpected or undesirable behavior.
- Such areas are marked in this Info file with "(d.c.)" in the text
- and are indexed under the heading "dark corner."
-
-Data Driven
- A description of `awk' programs, where you specify the data you
- are interested in processing, and what to do when that data is
- seen.
-
-Data Objects
- These are numbers and strings of characters. Numbers are
- converted into strings and vice versa, as needed. (*Note
- Conversion::.)
-
-Deadlock
- The situation in which two communicating processes are each waiting
- for the other to perform an action.
-
-Debugger
- A program used to help developers remove "bugs" from (de-bug)
- their programs.
-
-Double Precision
- An internal representation of numbers that can have fractional
- parts. Double precision numbers keep track of more digits than do
- single precision numbers, but operations on them are sometimes
- more expensive. This is the way `awk' stores numeric values. It
- is the C type `double'.
-
-Dynamic Regular Expression
- A dynamic regular expression is a regular expression written as an
- ordinary expression. It could be a string constant, such as
- `"foo"', but it may also be an expression whose value can vary.
- (*Note Computed Regexps::.)
-
-Environment
- A collection of strings, of the form NAME`='VAL, that each program
- has available to it. Users generally place values into the
- environment in order to provide information to various programs.
- Typical examples are the environment variables `HOME' and `PATH'.
-
-Empty String
- See "Null String."
-
-Epoch
- The date used as the "beginning of time" for timestamps. Time
- values in most systems are represented as seconds since the epoch,
- with library functions available for converting these values into
- standard date and time formats.
-
- The epoch on Unix and POSIX systems is 1970-01-01 00:00:00 UTC.
- See also "GMT" and "UTC."
-
-Escape Sequences
- A special sequence of characters used for describing nonprinting
- characters, such as `\n' for newline or `\033' for the ASCII ESC
- (Escape) character. (*Note Escape Sequences::.)
-
-Extension
- An additional feature or change to a programming language or
- utility not defined by that language's or utility's standard.
- `gawk' has (too) many extensions over POSIX `awk'.
-
-FDL
- See "Free Documentation License."
-
-Field
- When `awk' reads an input record, it splits the record into pieces
- separated by whitespace (or by a separator regexp that you can
- change by setting the built-in variable `FS'). Such pieces are
- called fields. If the pieces are of fixed length, you can use the
- built-in variable `FIELDWIDTHS' to describe their lengths. If you
- wish to specify the contents of fields instead of the field
- separator, you can use the built-in variable `FPAT' to do so.
- (*Note Field Separators::, *note Constant Size::, and *note
- Splitting By Content::.)
-
-Flag
- A variable whose truth value indicates the existence or
- nonexistence of some condition.
-
-Floating-Point Number
- Often referred to in mathematical terms as a "rational" or real
- number, this is just a number that can have a fractional part.
- See also "Double Precision" and "Single Precision."
-
-Format
- Format strings are used to control the appearance of output in the
- `strftime()' and `sprintf()' functions, and are used in the
- `printf' statement as well. Also, data conversions from numbers
- to strings are controlled by the format strings contained in the
- built-in variables `CONVFMT' and `OFMT'. (*Note Control Letters::.)
-
-Free Documentation License
- This document describes the terms under which this Info file is
- published and may be copied. (*Note GNU Free Documentation
- License::.)
-
-Function
- A specialized group of statements used to encapsulate general or
- program-specific tasks. `awk' has a number of built-in functions,
- and also allows you to define your own. (*Note Functions::.)
-
-FSF
- See "Free Software Foundation."
-
-Free Software Foundation
- A nonprofit organization dedicated to the production and
- distribution of freely distributable software. It was founded by
- Richard M. Stallman, the author of the original Emacs editor. GNU
- Emacs is the most widely used version of Emacs today.
-
-`gawk'
- The GNU implementation of `awk'.
-
-General Public License
- This document describes the terms under which `gawk' and its source
- code may be distributed. (*Note Copying::.)
-
-GMT
- "Greenwich Mean Time." This is the old term for UTC. It is the
- time of day used internally for Unix and POSIX systems. See also
- "Epoch" and "UTC."
-
-GNU
- "GNU's not Unix". An on-going project of the Free Software
- Foundation to create a complete, freely distributable,
- POSIX-compliant computing environment.
-
-GNU/Linux
- A variant of the GNU system using the Linux kernel, instead of the
- Free Software Foundation's Hurd kernel. The Linux kernel is a
- stable, efficient, full-featured clone of Unix that has been
- ported to a variety of architectures. It is most popular on
- PC-class systems, but runs well on a variety of other systems too.
- The Linux kernel source code is available under the terms of the
- GNU General Public License, which is perhaps its most important
- aspect.
-
-GPL
- See "General Public License."
-
-Hexadecimal
- Base 16 notation, where the digits are `0'-`9' and `A'-`F', with
- `A' representing 10, `B' representing 11, and so on, up to `F' for
- 15. Hexadecimal numbers are written in C using a leading `0x', to
- indicate their base. Thus, `0x12' is 18 (1 times 16 plus 2).
- *Note Nondecimal-numbers::.
-
-I/O
- Abbreviation for "Input/Output," the act of moving data into and/or
- out of a running program.
-
-Input Record
- A single chunk of data that is read in by `awk'. Usually, an
- `awk' input record consists of one line of text. (*Note
- Records::.)
-
-Integer
- A whole number, i.e., a number that does not have a fractional
- part.
-
-Internationalization
- The process of writing or modifying a program so that it can use
- multiple languages without requiring further source code changes.
-
-Interpreter
- A program that reads human-readable source code directly, and uses
- the instructions in it to process data and produce results. `awk'
- is typically (but not always) implemented as an interpreter. See
- also "Compiler."
-
-Interval Expression
- A component of a regular expression that lets you specify repeated
- matches of some part of the regexp. Interval expressions were not
- originally available in `awk' programs.
-
-ISO
- The International Organization for Standardization. This organization
- produces international standards for many things, including
- programming languages, such as C and C++. In the computer arena,
- important standards like those for C, C++, and POSIX become both
- American national and ISO international standards simultaneously.
- This Info file refers to Standard C as "ISO C" throughout.
-
-Java
- A modern programming language originally developed by Sun
- Microsystems (now Oracle) that supports object-oriented programming.
- Although usually implemented by compiling to the instructions for
- a standard virtual machine (the JVM), the language can be compiled
- to native code.
-
-Keyword
- In the `awk' language, a keyword is a word that has special
- meaning. Keywords are reserved and may not be used as variable
- names.
-
- `gawk''s keywords are: `BEGIN', `BEGINFILE', `END', `ENDFILE',
- `break', `case', `continue', `default', `delete', `do...while',
- `else', `exit', `for...in', `for', `function', `func', `if',
- `nextfile', `next', `switch', and `while'.
-
-Lesser General Public License
- This document describes the terms under which binary library
- archives or shared objects, and their source code, may be
- distributed.
-
-Linux
- See "GNU/Linux."
-
-LGPL
- See "Lesser General Public License."
-
-Localization
- The process of providing the data necessary for an
- internationalized program to work in a particular language.
-
-Logical Expression
- An expression using the operators for logic, AND, OR, and NOT,
- written `&&', `||', and `!' in `awk'. Often called Boolean
- expressions, after the mathematician who pioneered this kind of
- mathematical logic.
-
-Lvalue
- An expression that can appear on the left side of an assignment
- operator. In most languages, lvalues can be variables or array
- elements. In `awk', a field designator can also be used as an
- lvalue.
-
-Matching
- The act of testing a string against a regular expression. If the
- regexp describes the contents of the string, it is said to "match"
- it.
-
-Metacharacters
- Characters used within a regexp that do not stand for themselves.
- Instead, they denote regular expression operations, such as
- repetition, grouping, or alternation.
-
-No-op
- An operation that does nothing.
-
-Null String
- A string with no characters in it. It is represented explicitly in
- `awk' programs by placing two double quote characters next to each
- other (`""'). It can appear in input data by having two successive
- occurrences of the field separator appear next to each other.
-
-Number
- A numeric-valued data object. Modern `awk' implementations use
- double precision floating-point to represent numbers. Ancient
- `awk' implementations used single precision floating-point.
-
-Octal
- Base-eight notation, where the digits are `0'-`7'. Octal numbers
- are written in C using a leading `0', to indicate their base.
- Thus, `013' is 11 (one times 8 plus 3). *Note
- Nondecimal-numbers::.
-
-P1003.1, P1003.2
- See "POSIX."
-
-Pattern
- Patterns tell `awk' which input records are interesting to which
- rules.
-
- A pattern is an arbitrary conditional expression against which
- input is tested. If the condition is satisfied, the pattern is
- said to "match" the input record. A typical pattern might compare
- the input record against a regular expression. (*Note Pattern
- Overview::.)
-
-POSIX
- The name for a series of standards that specify a Portable
- Operating System interface. The "IX" denotes the Unix heritage of
- these standards. The main standard of interest for `awk' users is
- `IEEE Standard for Information Technology, Standard 1003.1-2008'.
- The 2008 POSIX standard can be found online at
- `http://www.opengroup.org/onlinepubs/9699919799/'.
-
-Precedence
- The order in which operations are performed when operators are used
- without explicit parentheses.
-
-Private
- Variables and/or functions that are meant for use exclusively by
- library functions and not for the main `awk' program. Special care
- must be taken when naming such variables and functions. (*Note
- Library Names::.)
-
-Range (of input lines)
- A sequence of consecutive lines from the input file(s). A pattern
- can specify ranges of input lines for `awk' to process or it can
- specify single lines. (*Note Pattern Overview::.)
-
-Recursion
- When a function calls itself, either directly or indirectly. As
- long as this is not clear, refer to the entry for "recursion." If
- this is clear, stop, and proceed to the next entry.
-
-Redirection
- Redirection means performing input from something other than the
- standard input stream, or performing output to something other
- than the standard output stream.
-
- You can redirect input to the `getline' statement using the `<',
- `|', and `|&' operators. You can redirect the output of the
- `print' and `printf' statements to a file or a system command,
- using the `>', `>>', `|', and `|&' operators. (*Note Getline::,
- and *note Redirection::.)
-
-Regexp
- See "Regular Expression."
-
-Regular Expression
- A regular expression ("regexp" for short) is a pattern that
- denotes a set of strings, possibly an infinite set. For example,
- the regular expression `R.*xp' matches any string containing the
- letter `R' followed, somewhere later, by the letters `xp'. In
- `awk', regular expressions are used in patterns and in conditional
- expressions. Regular expressions may contain escape sequences.
- (*Note Regexp::.)
-
-Regular Expression Constant
- A regular expression constant is a regular expression written
- within slashes, such as `/foo/'. This regular expression is chosen
- when you write the `awk' program and cannot be changed during its
- execution. (*Note Regexp Usage::.)
-
-Rule
- A segment of an `awk' program that specifies how to process single
- input records. A rule consists of a "pattern" and an "action".
- `awk' reads an input record; then, for each rule, if the input
- record satisfies the rule's pattern, `awk' executes the rule's
- action. Otherwise, the rule does nothing for that input record.
-
-Rvalue
- A value that can appear on the right side of an assignment
- operator. In `awk', essentially every expression has a value.
- These values are rvalues.
-
-Scalar
- A single value, be it a number or a string. Regular variables are
- scalars; arrays and functions are not.
-
-Search Path
- In `gawk', a list of directories to search for `awk' program
- source files. In the shell, a list of directories to search for
- executable programs.
-
-Seed
- The initial value, or starting point, for a sequence of random
- numbers.
-
-`sed'
- See "Stream Editor."
-
-Shell
- The command interpreter for Unix and POSIX-compliant systems. The
- shell works both interactively and as a programming language for
- batch files, or shell scripts.
-
-Short-Circuit
- The nature of the `awk' logical operators `&&' and `||'. If the
- value of the entire expression is determinable from evaluating just
- the lefthand side of these operators, the righthand side is not
- evaluated. (*Note Boolean Ops::.)
-
-Side Effect
- A side effect occurs when an expression has an effect aside from
- merely producing a value. Assignment expressions, increment and
- decrement expressions, and function calls have side effects.
- (*Note Assignment Ops::.)
-
-Single Precision
- An internal representation of numbers that can have fractional
- parts. Single precision numbers keep track of fewer digits than
- do double precision numbers, but operations on them are sometimes
- less expensive in terms of CPU time. This is the type used by
- some very old versions of `awk' to store numeric values. It is
- the C type `float'.
-
-Space
- The character generated by hitting the space bar on the keyboard.
-
-Special File
- A file name interpreted internally by `gawk', instead of being
- handed directly to the underlying operating system--for example,
- `/dev/stderr'. (*Note Special Files::.)
-
-Stream Editor
- A program that reads records from an input stream and processes
- them one or more at a time. This is in contrast with batch
- programs, which may expect to read their input files in entirety
- before starting to do anything, as well as with interactive
- programs which require input from the user.
-
-String
- A datum consisting of a sequence of characters, such as `I am a
- string'. Constant strings are written with double quotes in the
- `awk' language and may contain escape sequences. (*Note Escape
- Sequences::.)
-
-Tab
- The character generated by hitting the `TAB' key on the keyboard.
- It usually expands to up to eight spaces upon output.
-
-Text Domain
- A unique name that identifies an application. Used for grouping
- messages that are translated at runtime into the local language.
-
-Timestamp
- A value in the "seconds since the epoch" format used by Unix and
- POSIX systems. Used for the `gawk' functions `mktime()',
- `strftime()', and `systime()'. See also "Epoch" and "UTC."
-
-Unix
- A computer operating system originally developed in the early
- 1970's at AT&T Bell Laboratories. It initially became popular in
- universities around the world and later moved into commercial
- environments as a software development system and network server
- system. There are many commercial versions of Unix, as well as
- several work-alike systems whose source code is freely available
- (such as GNU/Linux, NetBSD (http://www.netbsd.org), FreeBSD
- (http://www.freebsd.org), and OpenBSD (http://www.openbsd.org)).
-
-UTC
- The accepted abbreviation for "Universal Coordinated Time." This
- is standard time in Greenwich, England, which is used as a
- reference time for day and date calculations. See also "Epoch"
- and "GMT."
-
-Whitespace
- A sequence of space, TAB, or newline characters occurring inside
- an input record or a string.
-
-
-File: gawk.info, Node: Copying, Next: GNU Free Documentation License, Prev: Glossary, Up: Top
-
-GNU General Public License
-**************************
-
- Version 3, 29 June 2007
-
- Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/'
-
- Everyone is permitted to copy and distribute verbatim copies of this
- license document, but changing it is not allowed.
-
-Preamble
-========
-
-The GNU General Public License is a free, copyleft license for software
-and other kinds of works.
-
- The licenses for most software and other practical works are designed
-to take away your freedom to share and change the works. By contrast,
-the GNU General Public License is intended to guarantee your freedom to
-share and change all versions of a program--to make sure it remains
-free software for all its users. We, the Free Software Foundation, use
-the GNU General Public License for most of our software; it applies
-also to any other work released this way by its authors. You can apply
-it to your programs, too.
-
- When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-them if you wish), that you receive source code or can get it if you
-want it, that you can change the software or use pieces of it in new
-free programs, and that you know you can do these things.
-
- To protect your rights, we need to prevent others from denying you
-these rights or asking you to surrender the rights. Therefore, you
-have certain responsibilities if you distribute copies of the software,
-or if you modify it: responsibilities to respect the freedom of others.
-
- For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must pass on to the recipients the same
-freedoms that you received. You must make sure that they, too, receive
-or can get the source code. And you must show them these terms so they
-know their rights.
-
- Developers that use the GNU GPL protect your rights with two steps:
-(1) assert copyright on the software, and (2) offer you this License
-giving you legal permission to copy, distribute and/or modify it.
-
- For the developers' and authors' protection, the GPL clearly explains
-that there is no warranty for this free software. For both users' and
-authors' sake, the GPL requires that modified versions be marked as
-changed, so that their problems will not be attributed erroneously to
-authors of previous versions.
-
- Some devices are designed to deny users access to install or run
-modified versions of the software inside them, although the
-manufacturer can do so. This is fundamentally incompatible with the
-aim of protecting users' freedom to change the software. The
-systematic pattern of such abuse occurs in the area of products for
-individuals to use, which is precisely where it is most unacceptable.
-Therefore, we have designed this version of the GPL to prohibit the
-practice for those products. If such problems arise substantially in
-other domains, we stand ready to extend this provision to those domains
-in future versions of the GPL, as needed to protect the freedom of
-users.
-
- Finally, every program is threatened constantly by software patents.
-States should not allow patents to restrict development and use of
-software on general-purpose computers, but in those that do, we wish to
-avoid the special danger that patents applied to a free program could
-make it effectively proprietary. To prevent this, the GPL assures that
-patents cannot be used to render the program non-free.
-
- The precise terms and conditions for copying, distribution and
-modification follow.
-
-TERMS AND CONDITIONS
-====================
-
- 0. Definitions.
-
- "This License" refers to version 3 of the GNU General Public
- License.
-
- "Copyright" also means copyright-like laws that apply to other
- kinds of works, such as semiconductor masks.
-
- "The Program" refers to any copyrightable work licensed under this
- License. Each licensee is addressed as "you". "Licensees" and
- "recipients" may be individuals or organizations.
-
- To "modify" a work means to copy from or adapt all or part of the
- work in a fashion requiring copyright permission, other than the
- making of an exact copy. The resulting work is called a "modified
- version" of the earlier work or a work "based on" the earlier work.
-
- A "covered work" means either the unmodified Program or a work
- based on the Program.
-
- To "propagate" a work means to do anything with it that, without
- permission, would make you directly or secondarily liable for
- infringement under applicable copyright law, except executing it
- on a computer or modifying a private copy. Propagation includes
- copying, distribution (with or without modification), making
- available to the public, and in some countries other activities as
- well.
-
- To "convey" a work means any kind of propagation that enables other
- parties to make or receive copies. Mere interaction with a user
- through a computer network, with no transfer of a copy, is not
- conveying.
-
- An interactive user interface displays "Appropriate Legal Notices"
- to the extent that it includes a convenient and prominently visible
- feature that (1) displays an appropriate copyright notice, and (2)
- tells the user that there is no warranty for the work (except to
- the extent that warranties are provided), that licensees may
- convey the work under this License, and how to view a copy of this
- License. If the interface presents a list of user commands or
- options, such as a menu, a prominent item in the list meets this
- criterion.
-
- 1. Source Code.
-
- The "source code" for a work means the preferred form of the work
- for making modifications to it. "Object code" means any
- non-source form of a work.
-
- A "Standard Interface" means an interface that either is an
- official standard defined by a recognized standards body, or, in
- the case of interfaces specified for a particular programming
- language, one that is widely used among developers working in that
- language.
-
- The "System Libraries" of an executable work include anything,
- other than the work as a whole, that (a) is included in the normal
- form of packaging a Major Component, but which is not part of that
- Major Component, and (b) serves only to enable use of the work
- with that Major Component, or to implement a Standard Interface
- for which an implementation is available to the public in source
- code form. A "Major Component", in this context, means a major
- essential component (kernel, window system, and so on) of the
- specific operating system (if any) on which the executable work
- runs, or a compiler used to produce the work, or an object code
- interpreter used to run it.
-
- The "Corresponding Source" for a work in object code form means all
- the source code needed to generate, install, and (for an executable
- work) run the object code and to modify the work, including
- scripts to control those activities. However, it does not include
- the work's System Libraries, or general-purpose tools or generally
- available free programs which are used unmodified in performing
- those activities but which are not part of the work. For example,
- Corresponding Source includes interface definition files
- associated with source files for the work, and the source code for
- shared libraries and dynamically linked subprograms that the work
- is specifically designed to require, such as by intimate data
- communication or control flow between those subprograms and other
- parts of the work.
-
- The Corresponding Source need not include anything that users can
- regenerate automatically from other parts of the Corresponding
- Source.
-
- The Corresponding Source for a work in source code form is that
- same work.
-
- 2. Basic Permissions.
-
- All rights granted under this License are granted for the term of
- copyright on the Program, and are irrevocable provided the stated
- conditions are met. This License explicitly affirms your unlimited
- permission to run the unmodified Program. The output from running
- a covered work is covered by this License only if the output,
- given its content, constitutes a covered work. This License
- acknowledges your rights of fair use or other equivalent, as
- provided by copyright law.
-
- You may make, run and propagate covered works that you do not
- convey, without conditions so long as your license otherwise
- remains in force. You may convey covered works to others for the
- sole purpose of having them make modifications exclusively for
- you, or provide you with facilities for running those works,
- provided that you comply with the terms of this License in
- conveying all material for which you do not control copyright.
- Those thus making or running the covered works for you must do so
- exclusively on your behalf, under your direction and control, on
- terms that prohibit them from making any copies of your
- copyrighted material outside their relationship with you.
-
- Conveying under any other circumstances is permitted solely under
- the conditions stated below. Sublicensing is not allowed; section
- 10 makes it unnecessary.
-
- 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
-
- No covered work shall be deemed part of an effective technological
- measure under any applicable law fulfilling obligations under
- article 11 of the WIPO copyright treaty adopted on 20 December
- 1996, or similar laws prohibiting or restricting circumvention of
- such measures.
-
- When you convey a covered work, you waive any legal power to forbid
- circumvention of technological measures to the extent such
- circumvention is effected by exercising rights under this License
- with respect to the covered work, and you disclaim any intention
- to limit operation or modification of the work as a means of
- enforcing, against the work's users, your or third parties' legal
- rights to forbid circumvention of technological measures.
-
- 4. Conveying Verbatim Copies.
-
- You may convey verbatim copies of the Program's source code as you
- receive it, in any medium, provided that you conspicuously and
- appropriately publish on each copy an appropriate copyright notice;
- keep intact all notices stating that this License and any
- non-permissive terms added in accord with section 7 apply to the
- code; keep intact all notices of the absence of any warranty; and
- give all recipients a copy of this License along with the Program.
-
- You may charge any price or no price for each copy that you convey,
- and you may offer support or warranty protection for a fee.
-
- 5. Conveying Modified Source Versions.
-
- You may convey a work based on the Program, or the modifications to
- produce it from the Program, in the form of source code under the
- terms of section 4, provided that you also meet all of these
- conditions:
-
- a. The work must carry prominent notices stating that you
- modified it, and giving a relevant date.
-
- b. The work must carry prominent notices stating that it is
- released under this License and any conditions added under
- section 7. This requirement modifies the requirement in
- section 4 to "keep intact all notices".
-
- c. You must license the entire work, as a whole, under this
- License to anyone who comes into possession of a copy. This
- License will therefore apply, along with any applicable
- section 7 additional terms, to the whole of the work, and all
- its parts, regardless of how they are packaged. This License
- gives no permission to license the work in any other way, but
- it does not invalidate such permission if you have separately
- received it.
-
- d. If the work has interactive user interfaces, each must display
- Appropriate Legal Notices; however, if the Program has
- interactive interfaces that do not display Appropriate Legal
- Notices, your work need not make them do so.
-
- A compilation of a covered work with other separate and independent
- works, which are not by their nature extensions of the covered
- work, and which are not combined with it such as to form a larger
- program, in or on a volume of a storage or distribution medium, is
- called an "aggregate" if the compilation and its resulting
- copyright are not used to limit the access or legal rights of the
- compilation's users beyond what the individual works permit.
- Inclusion of a covered work in an aggregate does not cause this
- License to apply to the other parts of the aggregate.
-
- 6. Conveying Non-Source Forms.
-
- You may convey a covered work in object code form under the terms
- of sections 4 and 5, provided that you also convey the
- machine-readable Corresponding Source under the terms of this
- License, in one of these ways:
-
- a. Convey the object code in, or embodied in, a physical product
- (including a physical distribution medium), accompanied by the
- Corresponding Source fixed on a durable physical medium
- customarily used for software interchange.
-
- b. Convey the object code in, or embodied in, a physical product
- (including a physical distribution medium), accompanied by a
- written offer, valid for at least three years and valid for
- as long as you offer spare parts or customer support for that
- product model, to give anyone who possesses the object code
- either (1) a copy of the Corresponding Source for all the
- software in the product that is covered by this License, on a
- durable physical medium customarily used for software
- interchange, for a price no more than your reasonable cost of
- physically performing this conveying of source, or (2) access
- to copy the Corresponding Source from a network server at no
- charge.
-
- c. Convey individual copies of the object code with a copy of
- the written offer to provide the Corresponding Source. This
- alternative is allowed only occasionally and noncommercially,
- and only if you received the object code with such an offer,
- in accord with subsection 6b.
-
- d. Convey the object code by offering access from a designated
- place (gratis or for a charge), and offer equivalent access
- to the Corresponding Source in the same way through the same
- place at no further charge. You need not require recipients
- to copy the Corresponding Source along with the object code.
- If the place to copy the object code is a network server, the
- Corresponding Source may be on a different server (operated
- by you or a third party) that supports equivalent copying
- facilities, provided you maintain clear directions next to
- the object code saying where to find the Corresponding Source.
- Regardless of what server hosts the Corresponding Source, you
- remain obligated to ensure that it is available for as long
- as needed to satisfy these requirements.
-
- e. Convey the object code using peer-to-peer transmission,
- provided you inform other peers where the object code and
- Corresponding Source of the work are being offered to the
- general public at no charge under subsection 6d.
-
-
- A separable portion of the object code, whose source code is
- excluded from the Corresponding Source as a System Library, need
- not be included in conveying the object code work.
-
- A "User Product" is either (1) a "consumer product", which means
- any tangible personal property which is normally used for personal,
- family, or household purposes, or (2) anything designed or sold for
- incorporation into a dwelling. In determining whether a product
- is a consumer product, doubtful cases shall be resolved in favor of
- coverage. For a particular product received by a particular user,
- "normally used" refers to a typical or common use of that class of
- product, regardless of the status of the particular user or of the
- way in which the particular user actually uses, or expects or is
- expected to use, the product. A product is a consumer product
- regardless of whether the product has substantial commercial,
- industrial or non-consumer uses, unless such uses represent the
- only significant mode of use of the product.
-
- "Installation Information" for a User Product means any methods,
- procedures, authorization keys, or other information required to
- install and execute modified versions of a covered work in that
- User Product from a modified version of its Corresponding Source.
- The information must suffice to ensure that the continued
- functioning of the modified object code is in no case prevented or
- interfered with solely because modification has been made.
-
- If you convey an object code work under this section in, or with,
- or specifically for use in, a User Product, and the conveying
- occurs as part of a transaction in which the right of possession
- and use of the User Product is transferred to the recipient in
- perpetuity or for a fixed term (regardless of how the transaction
- is characterized), the Corresponding Source conveyed under this
- section must be accompanied by the Installation Information. But
- this requirement does not apply if neither you nor any third party
- retains the ability to install modified object code on the User
- Product (for example, the work has been installed in ROM).
-
- The requirement to provide Installation Information does not
- include a requirement to continue to provide support service,
- warranty, or updates for a work that has been modified or
- installed by the recipient, or for the User Product in which it
- has been modified or installed. Access to a network may be denied
- when the modification itself materially and adversely affects the
- operation of the network or violates the rules and protocols for
- communication across the network.
-
- Corresponding Source conveyed, and Installation Information
- provided, in accord with this section must be in a format that is
- publicly documented (and with an implementation available to the
- public in source code form), and must require no special password
- or key for unpacking, reading or copying.
-
- 7. Additional Terms.
-
- "Additional permissions" are terms that supplement the terms of
- this License by making exceptions from one or more of its
- conditions. Additional permissions that are applicable to the
- entire Program shall be treated as though they were included in
- this License, to the extent that they are valid under applicable
- law. If additional permissions apply only to part of the Program,
- that part may be used separately under those permissions, but the
- entire Program remains governed by this License without regard to
- the additional permissions.
-
- When you convey a copy of a covered work, you may at your option
- remove any additional permissions from that copy, or from any part
- of it. (Additional permissions may be written to require their own
- removal in certain cases when you modify the work.) You may place
- additional permissions on material, added by you to a covered work,
- for which you have or can give appropriate copyright permission.
-
- Notwithstanding any other provision of this License, for material
- you add to a covered work, you may (if authorized by the copyright
- holders of that material) supplement the terms of this License
- with terms:
-
- a. Disclaiming warranty or limiting liability differently from
- the terms of sections 15 and 16 of this License; or
-
- b. Requiring preservation of specified reasonable legal notices
- or author attributions in that material or in the Appropriate
- Legal Notices displayed by works containing it; or
-
- c. Prohibiting misrepresentation of the origin of that material,
- or requiring that modified versions of such material be
- marked in reasonable ways as different from the original
- version; or
-
- d. Limiting the use for publicity purposes of names of licensors
- or authors of the material; or
-
- e. Declining to grant rights under trademark law for use of some
- trade names, trademarks, or service marks; or
-
- f. Requiring indemnification of licensors and authors of that
- material by anyone who conveys the material (or modified
- versions of it) with contractual assumptions of liability to
- the recipient, for any liability that these contractual
- assumptions directly impose on those licensors and authors.
-
- All other non-permissive additional terms are considered "further
- restrictions" within the meaning of section 10. If the Program as
- you received it, or any part of it, contains a notice stating that
- it is governed by this License along with a term that is a further
- restriction, you may remove that term. If a license document
- contains a further restriction but permits relicensing or
- conveying under this License, you may add to a covered work
- material governed by the terms of that license document, provided
- that the further restriction does not survive such relicensing or
- conveying.
-
- If you add terms to a covered work in accord with this section, you
- must place, in the relevant source files, a statement of the
- additional terms that apply to those files, or a notice indicating
- where to find the applicable terms.
-
- Additional terms, permissive or non-permissive, may be stated in
- the form of a separately written license, or stated as exceptions;
- the above requirements apply either way.
-
- 8. Termination.
-
- You may not propagate or modify a covered work except as expressly
- provided under this License. Any attempt otherwise to propagate or
- modify it is void, and will automatically terminate your rights
- under this License (including any patent licenses granted under
- the third paragraph of section 11).
-
- However, if you cease all violation of this License, then your
- license from a particular copyright holder is reinstated (a)
- provisionally, unless and until the copyright holder explicitly
- and finally terminates your license, and (b) permanently, if the
- copyright holder fails to notify you of the violation by some
- reasonable means prior to 60 days after the cessation.
-
- Moreover, your license from a particular copyright holder is
- reinstated permanently if the copyright holder notifies you of the
- violation by some reasonable means, this is the first time you have
- received notice of violation of this License (for any work) from
- that copyright holder, and you cure the violation prior to 30 days
- after your receipt of the notice.
-
- Termination of your rights under this section does not terminate
- the licenses of parties who have received copies or rights from
- you under this License. If your rights have been terminated and
- not permanently reinstated, you do not qualify to receive new
- licenses for the same material under section 10.
-
- 9. Acceptance Not Required for Having Copies.
-
- You are not required to accept this License in order to receive or
- run a copy of the Program. Ancillary propagation of a covered work
- occurring solely as a consequence of using peer-to-peer
- transmission to receive a copy likewise does not require
- acceptance. However, nothing other than this License grants you
- permission to propagate or modify any covered work. These actions
- infringe copyright if you do not accept this License. Therefore,
- by modifying or propagating a covered work, you indicate your
- acceptance of this License to do so.
-
- 10. Automatic Licensing of Downstream Recipients.
-
- Each time you convey a covered work, the recipient automatically
- receives a license from the original licensors, to run, modify and
- propagate that work, subject to this License. You are not
- responsible for enforcing compliance by third parties with this
- License.
-
- An "entity transaction" is a transaction transferring control of an
- organization, or substantially all assets of one, or subdividing an
- organization, or merging organizations. If propagation of a
- covered work results from an entity transaction, each party to that
- transaction who receives a copy of the work also receives whatever
- licenses to the work the party's predecessor in interest had or
- could give under the previous paragraph, plus a right to
- possession of the Corresponding Source of the work from the
- predecessor in interest, if the predecessor has it or can get it
- with reasonable efforts.
-
- You may not impose any further restrictions on the exercise of the
- rights granted or affirmed under this License. For example, you
- may not impose a license fee, royalty, or other charge for
- exercise of rights granted under this License, and you may not
- initiate litigation (including a cross-claim or counterclaim in a
- lawsuit) alleging that any patent claim is infringed by making,
- using, selling, offering for sale, or importing the Program or any
- portion of it.
-
- 11. Patents.
-
- A "contributor" is a copyright holder who authorizes use under this
- License of the Program or a work on which the Program is based.
- The work thus licensed is called the contributor's "contributor
- version".
-
- A contributor's "essential patent claims" are all patent claims
- owned or controlled by the contributor, whether already acquired or
- hereafter acquired, that would be infringed by some manner,
- permitted by this License, of making, using, or selling its
- contributor version, but do not include claims that would be
- infringed only as a consequence of further modification of the
- contributor version. For purposes of this definition, "control"
- includes the right to grant patent sublicenses in a manner
- consistent with the requirements of this License.
-
- Each contributor grants you a non-exclusive, worldwide,
- royalty-free patent license under the contributor's essential
- patent claims, to make, use, sell, offer for sale, import and
- otherwise run, modify and propagate the contents of its
- contributor version.
-
- In the following three paragraphs, a "patent license" is any
- express agreement or commitment, however denominated, not to
- enforce a patent (such as an express permission to practice a
- patent or covenant not to sue for patent infringement). To
- "grant" such a patent license to a party means to make such an
- agreement or commitment not to enforce a patent against the party.
-
- If you convey a covered work, knowingly relying on a patent
- license, and the Corresponding Source of the work is not available
- for anyone to copy, free of charge and under the terms of this
- License, through a publicly available network server or other
- readily accessible means, then you must either (1) cause the
- Corresponding Source to be so available, or (2) arrange to deprive
- yourself of the benefit of the patent license for this particular
- work, or (3) arrange, in a manner consistent with the requirements
- of this License, to extend the patent license to downstream
- recipients. "Knowingly relying" means you have actual knowledge
- that, but for the patent license, your conveying the covered work
- in a country, or your recipient's use of the covered work in a
- country, would infringe one or more identifiable patents in that
- country that you have reason to believe are valid.
-
- If, pursuant to or in connection with a single transaction or
- arrangement, you convey, or propagate by procuring conveyance of, a
- covered work, and grant a patent license to some of the parties
- receiving the covered work authorizing them to use, propagate,
- modify or convey a specific copy of the covered work, then the
- patent license you grant is automatically extended to all
- recipients of the covered work and works based on it.
-
- A patent license is "discriminatory" if it does not include within
- the scope of its coverage, prohibits the exercise of, or is
- conditioned on the non-exercise of one or more of the rights that
- are specifically granted under this License. You may not convey a
- covered work if you are a party to an arrangement with a third
- party that is in the business of distributing software, under
- which you make payment to the third party based on the extent of
- your activity of conveying the work, and under which the third
- party grants, to any of the parties who would receive the covered
- work from you, a discriminatory patent license (a) in connection
- with copies of the covered work conveyed by you (or copies made
- from those copies), or (b) primarily for and in connection with
- specific products or compilations that contain the covered work,
- unless you entered into that arrangement, or that patent license
- was granted, prior to 28 March 2007.
-
- Nothing in this License shall be construed as excluding or limiting
- any implied license or other defenses to infringement that may
- otherwise be available to you under applicable patent law.
-
- 12. No Surrender of Others' Freedom.
-
- If conditions are imposed on you (whether by court order,
- agreement or otherwise) that contradict the conditions of this
- License, they do not excuse you from the conditions of this
- License. If you cannot convey a covered work so as to satisfy
- simultaneously your obligations under this License and any other
- pertinent obligations, then as a consequence you may not convey it
- at all. For example, if you agree to terms that obligate you to
- collect a royalty for further conveying from those to whom you
- convey the Program, the only way you could satisfy both those
- terms and this License would be to refrain entirely from conveying
- the Program.
-
- 13. Use with the GNU Affero General Public License.
-
- Notwithstanding any other provision of this License, you have
- permission to link or combine any covered work with a work licensed
- under version 3 of the GNU Affero General Public License into a
- single combined work, and to convey the resulting work. The terms
- of this License will continue to apply to the part which is the
- covered work, but the special requirements of the GNU Affero
- General Public License, section 13, concerning interaction through
- a network will apply to the combination as such.
-
- 14. Revised Versions of this License.
-
- The Free Software Foundation may publish revised and/or new
- versions of the GNU General Public License from time to time.
- Such new versions will be similar in spirit to the present
- version, but may differ in detail to address new problems or
- concerns.
-
- Each version is given a distinguishing version number. If the
- Program specifies that a certain numbered version of the GNU
- General Public License "or any later version" applies to it, you
- have the option of following the terms and conditions either of
- that numbered version or of any later version published by the
- Free Software Foundation. If the Program does not specify a
- version number of the GNU General Public License, you may choose
- any version ever published by the Free Software Foundation.
-
- If the Program specifies that a proxy can decide which future
- versions of the GNU General Public License can be used, that
- proxy's public statement of acceptance of a version permanently
- authorizes you to choose that version for the Program.
-
- Later license versions may give you additional or different
- permissions. However, no additional obligations are imposed on any
- author or copyright holder as a result of your choosing to follow a
- later version.
-
- 15. Disclaimer of Warranty.
-
- THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
- APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE
- COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS"
- WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
- INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
- MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE
- RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
- SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
- NECESSARY SERVICING, REPAIR OR CORRECTION.
-
- 16. Limitation of Liability.
-
- IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
- WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES
- AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU
- FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
- CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE
- THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA
- BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
- PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
- PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF
- THE POSSIBILITY OF SUCH DAMAGES.
-
- 17. Interpretation of Sections 15 and 16.
-
- If the disclaimer of warranty and limitation of liability provided
- above cannot be given local legal effect according to their terms,
- reviewing courts shall apply local law that most closely
- approximates an absolute waiver of all civil liability in
- connection with the Program, unless a warranty or assumption of
- liability accompanies a copy of the Program in return for a fee.
-
-
-END OF TERMS AND CONDITIONS
-===========================
-
-How to Apply These Terms to Your New Programs
-=============================================
-
-If you develop a new program, and you want it to be of the greatest
-possible use to the public, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these
-terms.
-
- To do so, attach the following notices to the program. It is safest
-to attach them to the start of each source file to most effectively
-state the exclusion of warranty; and each file should have at least the
-"copyright" line and a pointer to where the full notice is found.
-
- ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
- Copyright (C) YEAR NAME OF AUTHOR
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or (at
- your option) any later version.
-
- This program is distributed in the hope that it will be useful, but
- WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see `http://www.gnu.org/licenses/'.
-
- Also add information on how to contact you by electronic and paper
-mail.
-
- If the program does terminal interaction, make it output a short
-notice like this when it starts in an interactive mode:
-
- PROGRAM Copyright (C) YEAR NAME OF AUTHOR
- This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
- This is free software, and you are welcome to redistribute it
- under certain conditions; type `show c' for details.
-
- The hypothetical commands `show w' and `show c' should show the
-appropriate parts of the General Public License. Of course, your
-program's commands might be different; for a GUI interface, you would
-use an "about box".
-
- You should also get your employer (if you work as a programmer) or
-school, if any, to sign a "copyright disclaimer" for the program, if
-necessary. For more information on this, and how to apply and follow
-the GNU GPL, see `http://www.gnu.org/licenses/'.
-
- The GNU General Public License does not permit incorporating your
-program into proprietary programs. If your program is a subroutine
-library, you may consider it more useful to permit linking proprietary
-applications with the library. If this is what you want to do, use the
-GNU Lesser General Public License instead of this License. But first,
-please read `http://www.gnu.org/philosophy/why-not-lgpl.html'.
-
-
-File: gawk.info, Node: GNU Free Documentation License, Next: Index, Prev: Copying, Up: Top
-
-GNU Free Documentation License
-******************************
-
- Version 1.3, 3 November 2008
-
- Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
- `http://fsf.org/'
-
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
- 0. PREAMBLE
-
- The purpose of this License is to make a manual, textbook, or other
- functional and useful document "free" in the sense of freedom: to
- assure everyone the effective freedom to copy and redistribute it,
- with or without modifying it, either commercially or
- noncommercially. Secondarily, this License preserves for the
- author and publisher a way to get credit for their work, while not
- being considered responsible for modifications made by others.
-
- This License is a kind of "copyleft", which means that derivative
- works of the document must themselves be free in the same sense.
- It complements the GNU General Public License, which is a copyleft
- license designed for free software.
-
- We have designed this License in order to use it for manuals for
- free software, because free software needs free documentation: a
- free program should come with manuals providing the same freedoms
- that the software does. But this License is not limited to
- software manuals; it can be used for any textual work, regardless
- of subject matter or whether it is published as a printed book.
- We recommend this License principally for works whose purpose is
- instruction or reference.
-
- 1. APPLICABILITY AND DEFINITIONS
-
- This License applies to any manual or other work, in any medium,
- that contains a notice placed by the copyright holder saying it
- can be distributed under the terms of this License. Such a notice
- grants a world-wide, royalty-free license, unlimited in duration,
- to use that work under the conditions stated herein. The
- "Document", below, refers to any such manual or work. Any member
- of the public is a licensee, and is addressed as "you". You
- accept the license if you copy, modify or distribute the work in a
- way requiring permission under copyright law.
-
- A "Modified Version" of the Document means any work containing the
- Document or a portion of it, either copied verbatim, or with
- modifications and/or translated into another language.
-
- A "Secondary Section" is a named appendix or a front-matter section
- of the Document that deals exclusively with the relationship of the
- publishers or authors of the Document to the Document's overall
- subject (or to related matters) and contains nothing that could
- fall directly within that overall subject. (Thus, if the Document
- is in part a textbook of mathematics, a Secondary Section may not
- explain any mathematics.) The relationship could be a matter of
- historical connection with the subject or with related matters, or
- of legal, commercial, philosophical, ethical or political position
- regarding them.
-
- The "Invariant Sections" are certain Secondary Sections whose
- titles are designated, as being those of Invariant Sections, in
- the notice that says that the Document is released under this
- License. If a section does not fit the above definition of
- Secondary then it is not allowed to be designated as Invariant.
- The Document may contain zero Invariant Sections. If the Document
- does not identify any Invariant Sections then there are none.
-
- The "Cover Texts" are certain short passages of text that are
- listed, as Front-Cover Texts or Back-Cover Texts, in the notice
- that says that the Document is released under this License. A
- Front-Cover Text may be at most 5 words, and a Back-Cover Text may
- be at most 25 words.
-
- A "Transparent" copy of the Document means a machine-readable copy,
- represented in a format whose specification is available to the
- general public, that is suitable for revising the document
- straightforwardly with generic text editors or (for images
- composed of pixels) generic paint programs or (for drawings) some
- widely available drawing editor, and that is suitable for input to
- text formatters or for automatic translation to a variety of
- formats suitable for input to text formatters. A copy made in an
- otherwise Transparent file format whose markup, or absence of
- markup, has been arranged to thwart or discourage subsequent
- modification by readers is not Transparent. An image format is
- not Transparent if used for any substantial amount of text. A
- copy that is not "Transparent" is called "Opaque".
-
- Examples of suitable formats for Transparent copies include plain
- ASCII without markup, Texinfo input format, LaTeX input format,
- SGML or XML using a publicly available DTD, and
- standard-conforming simple HTML, PostScript or PDF designed for
- human modification. Examples of transparent image formats include
- PNG, XCF and JPG. Opaque formats include proprietary formats that
- can be read and edited only by proprietary word processors, SGML or
- XML for which the DTD and/or processing tools are not generally
- available, and the machine-generated HTML, PostScript or PDF
- produced by some word processors for output purposes only.
-
- The "Title Page" means, for a printed book, the title page itself,
- plus such following pages as are needed to hold, legibly, the
- material this License requires to appear in the title page. For
- works in formats which do not have any title page as such, "Title
- Page" means the text near the most prominent appearance of the
- work's title, preceding the beginning of the body of the text.
-
- The "publisher" means any person or entity that distributes copies
- of the Document to the public.
-
- A section "Entitled XYZ" means a named subunit of the Document
- whose title either is precisely XYZ or contains XYZ in parentheses
- following text that translates XYZ in another language. (Here XYZ
- stands for a specific section name mentioned below, such as
- "Acknowledgements", "Dedications", "Endorsements", or "History".)
- To "Preserve the Title" of such a section when you modify the
- Document means that it remains a section "Entitled XYZ" according
- to this definition.
-
- The Document may include Warranty Disclaimers next to the notice
- which states that this License applies to the Document. These
- Warranty Disclaimers are considered to be included by reference in
- this License, but only as regards disclaiming warranties: any other
- implication that these Warranty Disclaimers may have is void and
- has no effect on the meaning of this License.
-
- 2. VERBATIM COPYING
-
- You may copy and distribute the Document in any medium, either
- commercially or noncommercially, provided that this License, the
- copyright notices, and the license notice saying this License
- applies to the Document are reproduced in all copies, and that you
- add no other conditions whatsoever to those of this License. You
- may not use technical measures to obstruct or control the reading
- or further copying of the copies you make or distribute. However,
- you may accept compensation in exchange for copies. If you
- distribute a large enough number of copies you must also follow
- the conditions in section 3.
-
- You may also lend copies, under the same conditions stated above,
- and you may publicly display copies.
-
- 3. COPYING IN QUANTITY
-
- If you publish printed copies (or copies in media that commonly
- have printed covers) of the Document, numbering more than 100, and
- the Document's license notice requires Cover Texts, you must
- enclose the copies in covers that carry, clearly and legibly, all
- these Cover Texts: Front-Cover Texts on the front cover, and
- Back-Cover Texts on the back cover. Both covers must also clearly
- and legibly identify you as the publisher of these copies. The
- front cover must present the full title with all words of the
- title equally prominent and visible. You may add other material
- on the covers in addition. Copying with changes limited to the
- covers, as long as they preserve the title of the Document and
- satisfy these conditions, can be treated as verbatim copying in
- other respects.
-
- If the required texts for either cover are too voluminous to fit
- legibly, you should put the first ones listed (as many as fit
- reasonably) on the actual cover, and continue the rest onto
- adjacent pages.
-
- If you publish or distribute Opaque copies of the Document
- numbering more than 100, you must either include a
- machine-readable Transparent copy along with each Opaque copy, or
- state in or with each Opaque copy a computer-network location from
- which the general network-using public has access to download
- using public-standard network protocols a complete Transparent
- copy of the Document, free of added material. If you use the
- latter option, you must take reasonably prudent steps, when you
- begin distribution of Opaque copies in quantity, to ensure that
- this Transparent copy will remain thus accessible at the stated
- location until at least one year after the last time you
- distribute an Opaque copy (directly or through your agents or
- retailers) of that edition to the public.
-
- It is requested, but not required, that you contact the authors of
- the Document well before redistributing any large number of
- copies, to give them a chance to provide you with an updated
- version of the Document.
-
- 4. MODIFICATIONS
-
- You may copy and distribute a Modified Version of the Document
- under the conditions of sections 2 and 3 above, provided that you
- release the Modified Version under precisely this License, with
- the Modified Version filling the role of the Document, thus
- licensing distribution and modification of the Modified Version to
- whoever possesses a copy of it. In addition, you must do these
- things in the Modified Version:
-
- A. Use in the Title Page (and on the covers, if any) a title
- distinct from that of the Document, and from those of
- previous versions (which should, if there were any, be listed
- in the History section of the Document). You may use the
- same title as a previous version if the original publisher of
- that version gives permission.
-
- B. List on the Title Page, as authors, one or more persons or
- entities responsible for authorship of the modifications in
- the Modified Version, together with at least five of the
- principal authors of the Document (all of its principal
- authors, if it has fewer than five), unless they release you
- from this requirement.
-
- C. State on the Title page the name of the publisher of the
- Modified Version, as the publisher.
-
- D. Preserve all the copyright notices of the Document.
-
- E. Add an appropriate copyright notice for your modifications
- adjacent to the other copyright notices.
-
- F. Include, immediately after the copyright notices, a license
- notice giving the public permission to use the Modified
- Version under the terms of this License, in the form shown in
- the Addendum below.
-
- G. Preserve in that license notice the full lists of Invariant
- Sections and required Cover Texts given in the Document's
- license notice.
-
- H. Include an unaltered copy of this License.
-
- I. Preserve the section Entitled "History", Preserve its Title,
- and add to it an item stating at least the title, year, new
- authors, and publisher of the Modified Version as given on
- the Title Page. If there is no section Entitled "History" in
- the Document, create one stating the title, year, authors,
- and publisher of the Document as given on its Title Page,
- then add an item describing the Modified Version as stated in
- the previous sentence.
-
- J. Preserve the network location, if any, given in the Document
- for public access to a Transparent copy of the Document, and
- likewise the network locations given in the Document for
- previous versions it was based on. These may be placed in
- the "History" section. You may omit a network location for a
- work that was published at least four years before the
- Document itself, or if the original publisher of the version
- it refers to gives permission.
-
- K. For any section Entitled "Acknowledgements" or "Dedications",
- Preserve the Title of the section, and preserve in the
- section all the substance and tone of each of the contributor
- acknowledgements and/or dedications given therein.
-
- L. Preserve all the Invariant Sections of the Document,
- unaltered in their text and in their titles. Section numbers
- or the equivalent are not considered part of the section
- titles.
-
- M. Delete any section Entitled "Endorsements". Such a section
- may not be included in the Modified Version.
-
- N. Do not retitle any existing section to be Entitled
- "Endorsements" or to conflict in title with any Invariant
- Section.
-
- O. Preserve any Warranty Disclaimers.
-
- If the Modified Version includes new front-matter sections or
- appendices that qualify as Secondary Sections and contain no
- material copied from the Document, you may at your option
- designate some or all of these sections as invariant. To do this,
- add their titles to the list of Invariant Sections in the Modified
- Version's license notice. These titles must be distinct from any
- other section titles.
-
- You may add a section Entitled "Endorsements", provided it contains
- nothing but endorsements of your Modified Version by various
- parties--for example, statements of peer review or that the text
- has been approved by an organization as the authoritative
- definition of a standard.
-
- You may add a passage of up to five words as a Front-Cover Text,
- and a passage of up to 25 words as a Back-Cover Text, to the end
- of the list of Cover Texts in the Modified Version. Only one
- passage of Front-Cover Text and one of Back-Cover Text may be
- added by (or through arrangements made by) any one entity. If the
- Document already includes a cover text for the same cover,
- previously added by you or by arrangement made by the same entity
- you are acting on behalf of, you may not add another; but you may
- replace the old one, on explicit permission from the previous
- publisher that added the old one.
-
- The author(s) and publisher(s) of the Document do not by this
- License give permission to use their names for publicity for or to
- assert or imply endorsement of any Modified Version.
-
- 5. COMBINING DOCUMENTS
-
- You may combine the Document with other documents released under
- this License, under the terms defined in section 4 above for
- modified versions, provided that you include in the combination
- all of the Invariant Sections of all of the original documents,
- unmodified, and list them all as Invariant Sections of your
- combined work in its license notice, and that you preserve all
- their Warranty Disclaimers.
-
- The combined work need only contain one copy of this License, and
- multiple identical Invariant Sections may be replaced with a single
- copy. If there are multiple Invariant Sections with the same name
- but different contents, make the title of each such section unique
- by adding at the end of it, in parentheses, the name of the
- original author or publisher of that section if known, or else a
- unique number. Make the same adjustment to the section titles in
- the list of Invariant Sections in the license notice of the
- combined work.
-
- In the combination, you must combine any sections Entitled
- "History" in the various original documents, forming one section
- Entitled "History"; likewise combine any sections Entitled
- "Acknowledgements", and any sections Entitled "Dedications". You
- must delete all sections Entitled "Endorsements."
-
- 6. COLLECTIONS OF DOCUMENTS
-
- You may make a collection consisting of the Document and other
- documents released under this License, and replace the individual
- copies of this License in the various documents with a single copy
- that is included in the collection, provided that you follow the
- rules of this License for verbatim copying of each of the
- documents in all other respects.
-
- You may extract a single document from such a collection, and
- distribute it individually under this License, provided you insert
- a copy of this License into the extracted document, and follow
- this License in all other respects regarding verbatim copying of
- that document.
-
- 7. AGGREGATION WITH INDEPENDENT WORKS
-
- A compilation of the Document or its derivatives with other
- separate and independent documents or works, in or on a volume of
- a storage or distribution medium, is called an "aggregate" if the
- copyright resulting from the compilation is not used to limit the
- legal rights of the compilation's users beyond what the individual
- works permit. When the Document is included in an aggregate, this
- License does not apply to the other works in the aggregate which
- are not themselves derivative works of the Document.
-
- If the Cover Text requirement of section 3 is applicable to these
- copies of the Document, then if the Document is less than one half
- of the entire aggregate, the Document's Cover Texts may be placed
- on covers that bracket the Document within the aggregate, or the
- electronic equivalent of covers if the Document is in electronic
- form. Otherwise they must appear on printed covers that bracket
- the whole aggregate.
-
- 8. TRANSLATION
-
- Translation is considered a kind of modification, so you may
- distribute translations of the Document under the terms of section
- 4. Replacing Invariant Sections with translations requires special
- permission from their copyright holders, but you may include
- translations of some or all Invariant Sections in addition to the
- original versions of these Invariant Sections. You may include a
- translation of this License, and all the license notices in the
- Document, and any Warranty Disclaimers, provided that you also
- include the original English version of this License and the
- original versions of those notices and disclaimers. In case of a
- disagreement between the translation and the original version of
- this License or a notice or disclaimer, the original version will
- prevail.
-
- If a section in the Document is Entitled "Acknowledgements",
- "Dedications", or "History", the requirement (section 4) to
- Preserve its Title (section 1) will typically require changing the
- actual title.
-
- 9. TERMINATION
-
- You may not copy, modify, sublicense, or distribute the Document
- except as expressly provided under this License. Any attempt
- otherwise to copy, modify, sublicense, or distribute it is void,
- and will automatically terminate your rights under this License.
-
- However, if you cease all violation of this License, then your
- license from a particular copyright holder is reinstated (a)
- provisionally, unless and until the copyright holder explicitly
- and finally terminates your license, and (b) permanently, if the
- copyright holder fails to notify you of the violation by some
- reasonable means prior to 60 days after the cessation.
-
- Moreover, your license from a particular copyright holder is
- reinstated permanently if the copyright holder notifies you of the
- violation by some reasonable means, this is the first time you have
- received notice of violation of this License (for any work) from
- that copyright holder, and you cure the violation prior to 30 days
- after your receipt of the notice.
-
- Termination of your rights under this section does not terminate
- the licenses of parties who have received copies or rights from
- you under this License. If your rights have been terminated and
- not permanently reinstated, receipt of a copy of some or all of
- the same material does not give you any rights to use it.
-
- 10. FUTURE REVISIONS OF THIS LICENSE
-
- The Free Software Foundation may publish new, revised versions of
- the GNU Free Documentation License from time to time. Such new
- versions will be similar in spirit to the present version, but may
- differ in detail to address new problems or concerns. See
- `http://www.gnu.org/copyleft/'.
-
- Each version of the License is given a distinguishing version
- number. If the Document specifies that a particular numbered
- version of this License "or any later version" applies to it, you
- have the option of following the terms and conditions either of
- that specified version or of any later version that has been
- published (not as a draft) by the Free Software Foundation. If
- the Document does not specify a version number of this License,
- you may choose any version ever published (not as a draft) by the
- Free Software Foundation. If the Document specifies that a proxy
- can decide which future versions of this License can be used, that
- proxy's public statement of acceptance of a version permanently
- authorizes you to choose that version for the Document.
-
- 11. RELICENSING
-
- "Massive Multiauthor Collaboration Site" (or "MMC Site") means any
- World Wide Web server that publishes copyrightable works and also
- provides prominent facilities for anybody to edit those works. A
- public wiki that anybody can edit is an example of such a server.
- A "Massive Multiauthor Collaboration" (or "MMC") contained in the
- site means any set of copyrightable works thus published on the MMC
- site.
-
- "CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0
- license published by Creative Commons Corporation, a not-for-profit
- corporation with a principal place of business in San Francisco,
- California, as well as future copyleft versions of that license
- published by that same organization.
-
- "Incorporate" means to publish or republish a Document, in whole or
- in part, as part of another Document.
-
- An MMC is "eligible for relicensing" if it is licensed under this
- License, and if all works that were first published under this
- License somewhere other than this MMC, and subsequently
- incorporated in whole or in part into the MMC, (1) had no cover
- texts or invariant sections, and (2) were thus incorporated prior
- to November 1, 2008.
-
- The operator of an MMC Site may republish an MMC contained in the
- site under CC-BY-SA on the same site at any time before August 1,
- 2009, provided the MMC is eligible for relicensing.
-
-
-ADDENDUM: How to use this License for your documents
-====================================================
-
-To use this License in a document you have written, include a copy of
-the License in the document and put the following copyright and license
-notices just after the title page:
-
- Copyright (C) YEAR YOUR NAME.
- Permission is granted to copy, distribute and/or modify this document
- under the terms of the GNU Free Documentation License, Version 1.3
- or any later version published by the Free Software Foundation;
- with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
- Texts. A copy of the license is included in the section entitled ``GNU
- Free Documentation License''.
-
- If you have Invariant Sections, Front-Cover Texts and Back-Cover
-Texts, replace the "with...Texts." line with this:
-
- with the Invariant Sections being LIST THEIR TITLES, with
- the Front-Cover Texts being LIST, and with the Back-Cover Texts
- being LIST.
-
- If you have Invariant Sections without Cover Texts, or some other
-combination of the three, merge those two alternatives to suit the
-situation.
-
- If your document contains nontrivial examples of program code, we
-recommend releasing these examples in parallel under your choice of
-free software license, such as the GNU General Public License, to
-permit their use in free software.
-
-
-File: gawk.info, Node: Index, Prev: GNU Free Documentation License, Up: Top
-
-Index
-*****
-
-
-* Menu:
-
-* ! (exclamation point), ! operator: Boolean Ops. (line 67)
-* ! (exclamation point), ! operator <1>: Egrep Program. (line 170)
-* ! (exclamation point), ! operator <2>: Ranges. (line 48)
-* ! (exclamation point), ! operator: Precedence. (line 52)
-* ! (exclamation point), != operator <1>: Precedence. (line 65)
-* ! (exclamation point), != operator: Comparison Operators.
- (line 11)
-* ! (exclamation point), !~ operator <1>: Expression Patterns.
- (line 24)
-* ! (exclamation point), !~ operator <2>: Precedence. (line 80)
-* ! (exclamation point), !~ operator <3>: Comparison Operators.
- (line 11)
-* ! (exclamation point), !~ operator <4>: Regexp Constants. (line 6)
-* ! (exclamation point), !~ operator <5>: Computed Regexps. (line 6)
-* ! (exclamation point), !~ operator <6>: Case-sensitivity. (line 26)
-* ! (exclamation point), !~ operator: Regexp Usage. (line 19)
-* " (double quote) <1>: Quoting. (line 37)
-* " (double quote): Read Terminal. (line 25)
-* " (double quote), regexp constants: Computed Regexps. (line 28)
-* # (number sign), #! (executable scripts): Executable Scripts.
- (line 6)
-* # (number sign), #! (executable scripts), portability issues with: Executable Scripts.
- (line 6)
-* # (number sign), commenting: Comments. (line 6)
-* $ (dollar sign): Regexp Operators. (line 35)
-* $ (dollar sign), $ field operator <1>: Precedence. (line 43)
-* $ (dollar sign), $ field operator: Fields. (line 19)
-* $ (dollar sign), incrementing fields and arrays: Increment Ops.
- (line 30)
-* % (percent sign), % operator: Precedence. (line 55)
-* % (percent sign), %= operator <1>: Precedence. (line 95)
-* % (percent sign), %= operator: Assignment Ops. (line 129)
-* & (ampersand), && operator <1>: Precedence. (line 86)
-* & (ampersand), && operator: Boolean Ops. (line 57)
-* & (ampersand), gsub()/gensub()/sub() functions and: Gory Details.
- (line 6)
-* ' (single quote) <1>: Quoting. (line 31)
-* ' (single quote) <2>: Long. (line 33)
-* ' (single quote): One-shot. (line 15)
-* ' (single quote), vs. apostrophe: Comments. (line 27)
-* ' (single quote), with double quotes: Quoting. (line 53)
-* () (parentheses) <1>: Profiling. (line 138)
-* () (parentheses): Regexp Operators. (line 79)
-* * (asterisk), * operator, as multiplication operator: Precedence.
- (line 55)
-* * (asterisk), * operator, as regexp operator: Regexp Operators.
- (line 87)
-* * (asterisk), * operator, null strings, matching: Gory Details.
- (line 164)
-* * (asterisk), ** operator <1>: Precedence. (line 49)
-* * (asterisk), ** operator: Arithmetic Ops. (line 81)
-* * (asterisk), **= operator <1>: Precedence. (line 95)
-* * (asterisk), **= operator: Assignment Ops. (line 129)
-* * (asterisk), *= operator <1>: Precedence. (line 95)
-* * (asterisk), *= operator: Assignment Ops. (line 129)
-* + (plus sign): Regexp Operators. (line 102)
-* + (plus sign), + operator: Precedence. (line 52)
-* + (plus sign), ++ (decrement/increment operators): Increment Ops.
- (line 11)
-* + (plus sign), ++ operator <1>: Precedence. (line 46)
-* + (plus sign), ++ operator: Increment Ops. (line 40)
-* + (plus sign), += operator <1>: Precedence. (line 95)
-* + (plus sign), += operator: Assignment Ops. (line 82)
-* , (comma), in range patterns: Ranges. (line 6)
-* - (hyphen), - operator: Precedence. (line 52)
-* - (hyphen), -- (decrement/increment) operator: Precedence. (line 46)
-* - (hyphen), -- operator: Increment Ops. (line 48)
-* - (hyphen), -= operator <1>: Precedence. (line 95)
-* - (hyphen), -= operator: Assignment Ops. (line 129)
-* - (hyphen), filenames beginning with: Options. (line 59)
-* - (hyphen), in bracket expressions: Bracket Expressions. (line 17)
-* --assign option: Options. (line 32)
-* --bignum option: Options. (line 187)
-* --c option: Options. (line 81)
-* --characters-as-bytes option: Options. (line 68)
-* --copyright option: Options. (line 88)
-* --debug option: Options. (line 108)
-* --disable-lint configuration option: Additional Configuration Options.
- (line 9)
-* --disable-nls configuration option: Additional Configuration Options.
- (line 24)
-* --dump-variables option <1>: Library Names. (line 45)
-* --dump-variables option: Options. (line 93)
-* --exec option: Options. (line 125)
-* --field-separator option: Options. (line 21)
-* --file option: Options. (line 25)
-* --gen-pot option <1>: String Extraction. (line 6)
-* --gen-pot option: Options. (line 147)
-* --help option: Options. (line 154)
-* --L option: Options. (line 274)
-* --lint option <1>: Options. (line 168)
-* --lint option: Command Line. (line 20)
-* --lint-old option: Options. (line 274)
-* --load option: Options. (line 159)
-* --non-decimal-data option <1>: Nondecimal Data. (line 6)
-* --non-decimal-data option: Options. (line 193)
-* --non-decimal-data option, strtonum() function and: Nondecimal Data.
- (line 36)
-* --optimize option: Options. (line 214)
-* --posix option: Options. (line 233)
-* --posix option, --traditional option and: Options. (line 252)
-* --pretty-print option: Options. (line 206)
-* --profile option <1>: Profiling. (line 12)
-* --profile option: Options. (line 221)
-* --re-interval option: Options. (line 258)
-* --sandbox option: Options. (line 265)
-* --sandbox option, disabling system() function: I/O Functions.
- (line 85)
-* --sandbox option, input redirection with getline: Getline. (line 19)
-* --sandbox option, output redirection with print, printf: Redirection.
- (line 6)
-* --source option: Options. (line 117)
-* --traditional option: Options. (line 81)
-* --traditional option, --posix option and: Options. (line 252)
-* --use-lc-numeric option: Options. (line 201)
-* --version option: Options. (line 279)
-* --with-whiny-user-strftime configuration option: Additional Configuration Options.
- (line 29)
-* -b option: Options. (line 68)
-* -C option: Options. (line 88)
-* -D option: Options. (line 108)
-* -d option: Options. (line 93)
-* -E option: Options. (line 125)
-* -e option: Options. (line 117)
-* -F option: Command Line Field Separator.
- (line 6)
-* -f option: Options. (line 25)
-* -F option: Options. (line 21)
-* -f option: Long. (line 12)
-* -F option, -Ft sets FS to TAB: Options. (line 287)
-* -f option, on command line: Options. (line 292)
-* -g option: Options. (line 147)
-* -h option: Options. (line 154)
-* -l option: Options. (line 159)
-* -M option: Options. (line 187)
-* -N option: Options. (line 201)
-* -n option: Options. (line 193)
-* -O option: Options. (line 214)
-* -o option: Options. (line 206)
-* -P option: Options. (line 233)
-* -p option: Options. (line 221)
-* -r option: Options. (line 258)
-* -S option: Options. (line 265)
-* -V option: Options. (line 279)
-* -v option: Options. (line 32)
-* -v option, variables, assigning: Assignment Options. (line 12)
-* -W option: Options. (line 46)
-* . (period): Regexp Operators. (line 43)
-* .mo files: Explaining gettext. (line 41)
-* .mo files, converting from .po: I18N Example. (line 62)
-* .mo files, specifying directory of <1>: Programmer i18n. (line 47)
-* .mo files, specifying directory of: Explaining gettext. (line 53)
-* .po files <1>: Translator i18n. (line 6)
-* .po files: Explaining gettext. (line 36)
-* .po files, converting to .mo: I18N Example. (line 62)
-* .pot files: Explaining gettext. (line 30)
-* / (forward slash): Regexp. (line 10)
-* / (forward slash), / operator: Precedence. (line 55)
-* / (forward slash), /= operator <1>: Precedence. (line 95)
-* / (forward slash), /= operator: Assignment Ops. (line 129)
-* / (forward slash), /= operator, vs. /=.../ regexp constant: Assignment Ops.
- (line 148)
-* / (forward slash), patterns and: Expression Patterns. (line 24)
-* /= operator vs. /=.../ regexp constant: Assignment Ops. (line 148)
-* /dev/... special files (gawk): Special FD. (line 46)
-* /dev/fd/N special files: Special FD. (line 46)
-* /inet/... special files (gawk): TCP/IP Networking. (line 6)
-* /inet4/... special files (gawk): TCP/IP Networking. (line 6)
-* /inet6/... special files (gawk): TCP/IP Networking. (line 6)
-* ; (semicolon): Statements/Lines. (line 91)
-* ; (semicolon), AWKPATH variable and: PC Using. (line 11)
-* ; (semicolon), separating statements in actions <1>: Statements.
- (line 10)
-* ; (semicolon), separating statements in actions: Action Overview.
- (line 19)
-* < (left angle bracket), < operator <1>: Precedence. (line 65)
-* < (left angle bracket), < operator: Comparison Operators.
- (line 11)
-* < (left angle bracket), < operator (I/O): Getline/File. (line 6)
-* < (left angle bracket), <= operator <1>: Precedence. (line 65)
-* < (left angle bracket), <= operator: Comparison Operators.
- (line 11)
-* = (equals sign), = operator: Assignment Ops. (line 6)
-* = (equals sign), == operator <1>: Precedence. (line 65)
-* = (equals sign), == operator: Comparison Operators.
- (line 11)
-* > (right angle bracket), > operator <1>: Precedence. (line 65)
-* > (right angle bracket), > operator: Comparison Operators.
- (line 11)
-* > (right angle bracket), > operator (I/O): Redirection. (line 22)
-* > (right angle bracket), >= operator <1>: Precedence. (line 65)
-* > (right angle bracket), >= operator: Comparison Operators.
- (line 11)
-* > (right angle bracket), >> operator (I/O) <1>: Precedence. (line 65)
-* > (right angle bracket), >> operator (I/O): Redirection. (line 50)
-* ? (question mark) regexp operator <1>: GNU Regexp Operators.
- (line 59)
-* ? (question mark) regexp operator: Regexp Operators. (line 111)
-* ? (question mark), ?: operator: Precedence. (line 92)
-* [] (square brackets): Regexp Operators. (line 55)
-* \ (backslash) <1>: Regexp Operators. (line 18)
-* \ (backslash) <2>: Quoting. (line 31)
-* \ (backslash) <3>: Comments. (line 50)
-* \ (backslash): Read Terminal. (line 25)
-* \ (backslash), \" escape sequence: Escape Sequences. (line 76)
-* \ (backslash), \' operator (gawk): GNU Regexp Operators.
- (line 56)
-* \ (backslash), \/ escape sequence: Escape Sequences. (line 69)
-* \ (backslash), \< operator (gawk): GNU Regexp Operators.
- (line 30)
-* \ (backslash), \> operator (gawk): GNU Regexp Operators.
- (line 34)
-* \ (backslash), \` operator (gawk): GNU Regexp Operators.
- (line 54)
-* \ (backslash), \a escape sequence: Escape Sequences. (line 34)
-* \ (backslash), \b escape sequence: Escape Sequences. (line 38)
-* \ (backslash), \B operator (gawk): GNU Regexp Operators.
- (line 43)
-* \ (backslash), \f escape sequence: Escape Sequences. (line 41)
-* \ (backslash), \n escape sequence: Escape Sequences. (line 44)
-* \ (backslash), \NNN escape sequence: Escape Sequences. (line 56)
-* \ (backslash), \r escape sequence: Escape Sequences. (line 47)
-* \ (backslash), \S operator (gawk): GNU Regexp Operators.
- (line 17)
-* \ (backslash), \s operator (gawk): GNU Regexp Operators.
- (line 13)
-* \ (backslash), \t escape sequence: Escape Sequences. (line 50)
-* \ (backslash), \v escape sequence: Escape Sequences. (line 53)
-* \ (backslash), \W operator (gawk): GNU Regexp Operators.
- (line 26)
-* \ (backslash), \w operator (gawk): GNU Regexp Operators.
- (line 21)
-* \ (backslash), \x escape sequence: Escape Sequences. (line 61)
-* \ (backslash), \y operator (gawk): GNU Regexp Operators.
- (line 38)
-* \ (backslash), as field separators: Command Line Field Separator.
- (line 27)
-* \ (backslash), continuing lines and <1>: Egrep Program. (line 220)
-* \ (backslash), continuing lines and: Statements/Lines. (line 19)
-* \ (backslash), continuing lines and, comments and: Statements/Lines.
- (line 76)
-* \ (backslash), continuing lines and, in csh: Statements/Lines.
- (line 44)
-* \ (backslash), gsub()/gensub()/sub() functions and: Gory Details.
- (line 6)
-* \ (backslash), in bracket expressions: Bracket Expressions. (line 17)
-* \ (backslash), in escape sequences: Escape Sequences. (line 6)
-* \ (backslash), in escape sequences, POSIX and: Escape Sequences.
- (line 113)
-* \ (backslash), regexp constants: Computed Regexps. (line 28)
-* ^ (caret) <1>: GNU Regexp Operators.
- (line 59)
-* ^ (caret): Regexp Operators. (line 22)
-* ^ (caret), ^ operator: Precedence. (line 49)
-* ^ (caret), ^= operator <1>: Precedence. (line 95)
-* ^ (caret), ^= operator: Assignment Ops. (line 129)
-* ^ (caret), in bracket expressions: Bracket Expressions. (line 17)
-* ^, in FS: Regexp Field Splitting.
- (line 59)
-* _ (underscore), _ C macro: Explaining gettext. (line 70)
-* _ (underscore), in names of private variables: Library Names.
- (line 29)
-* _ (underscore), translatable string: Programmer i18n. (line 69)
-* _gr_init() user-defined function: Group Functions. (line 82)
-* _pw_init() user-defined function: Passwd Functions. (line 105)
-* accessing fields: Fields. (line 6)
-* account information <1>: Group Functions. (line 6)
-* account information: Passwd Functions. (line 16)
-* actions: Action Overview. (line 6)
-* actions, control statements in: Statements. (line 6)
-* actions, default: Very Simple. (line 34)
-* actions, empty: Very Simple. (line 39)
-* Ada programming language: Glossary. (line 20)
-* adding, features to gawk: Adding Code. (line 6)
-* adding, fields: Changing Fields. (line 53)
-* adding, functions to gawk: Dynamic Extensions. (line 10)
-* advanced features, buffering: I/O Functions. (line 98)
-* advanced features, close() function: Close Files And Pipes.
- (line 131)
-* advanced features, constants, values of: Nondecimal-numbers.
- (line 67)
-* advanced features, data files as single record: Records. (line 175)
-* advanced features, fixed-width data: Constant Size. (line 9)
-* advanced features, FNR/NR variables: Auto-set. (line 225)
-* advanced features, gawk: Advanced Features. (line 6)
-* advanced features, gawk, network programming: TCP/IP Networking.
- (line 6)
-* advanced features, gawk, nondecimal input data: Nondecimal Data.
- (line 6)
-* advanced features, gawk, processes, communicating with: Two-way I/O.
- (line 23)
-* advanced features, network connections, See Also networks, connections: Advanced Features.
- (line 6)
-* advanced features, null strings, matching: Gory Details. (line 164)
-* advanced features, operators, precedence: Increment Ops. (line 61)
-* advanced features, piping into sh: Redirection. (line 143)
-* advanced features, regexp constants: Assignment Ops. (line 148)
-* advanced features, specifying field content: Splitting By Content.
- (line 9)
-* Aho, Alfred <1>: Contributors. (line 12)
-* Aho, Alfred: History. (line 17)
-* alarm clock example program: Alarm Program. (line 9)
-* alarm.awk program: Alarm Program. (line 29)
-* algorithms: Basic High Level. (line 66)
-* Alpha (DEC): Manual History. (line 28)
-* amazing awk assembler (aaa): Glossary. (line 12)
-* amazingly workable formatter (awf): Glossary. (line 25)
-* ambiguity, syntactic: /= operator vs. /=.../ regexp constant: Assignment Ops.
- (line 148)
-* ampersand (&), && operator <1>: Precedence. (line 86)
-* ampersand (&), && operator: Boolean Ops. (line 57)
-* ampersand (&), gsub()/gensub()/sub() functions and: Gory Details.
- (line 6)
-* anagram.awk program: Anagram Program. (line 22)
-* AND bitwise operation: Bitwise Functions. (line 6)
-* and Boolean-logic operator: Boolean Ops. (line 6)
-* and() function (gawk): Bitwise Functions. (line 39)
-* ANSI: Glossary. (line 35)
-* arbitrary precision: Arbitrary Precision Arithmetic.
- (line 6)
-* archeologists: Bugs. (line 6)
-* ARGC/ARGV variables <1>: ARGC and ARGV. (line 6)
-* ARGC/ARGV variables: Auto-set. (line 11)
-* ARGC/ARGV variables, command-line arguments: Other Arguments.
- (line 12)
-* ARGC/ARGV variables, portability and: Executable Scripts. (line 43)
-* ARGIND variable: Auto-set. (line 40)
-* ARGIND variable, command-line arguments: Other Arguments. (line 12)
-* arguments, command-line <1>: ARGC and ARGV. (line 6)
-* arguments, command-line <2>: Auto-set. (line 11)
-* arguments, command-line: Other Arguments. (line 6)
-* arguments, command-line, invoking awk: Command Line. (line 6)
-* arguments, in function calls: Function Calls. (line 16)
-* arguments, processing: Getopt Function. (line 6)
-* arguments, retrieving: Internals. (line 111)
-* arithmetic operators: Arithmetic Ops. (line 6)
-* arrays: Arrays. (line 6)
-* arrays, as parameters to functions: Pass By Value/Reference.
- (line 47)
-* arrays, associative: Array Intro. (line 50)
-* arrays, associative, clearing: Internals. (line 68)
-* arrays, associative, library functions and: Library Names. (line 57)
-* arrays, deleting entire contents: Delete. (line 39)
-* arrays, elements, assigning: Assigning Elements. (line 6)
-* arrays, elements, deleting: Delete. (line 6)
-* arrays, elements, installing: Internals. (line 72)
-* arrays, elements, order of: Scanning an Array. (line 48)
-* arrays, elements, referencing: Reference to Elements.
- (line 6)
-* arrays, elements, retrieving number of: String Functions. (line 29)
-* arrays, for statement and: Scanning an Array. (line 20)
-* arrays, IGNORECASE variable and: Array Intro. (line 92)
-* arrays, indexing: Array Intro. (line 50)
-* arrays, merging into strings: Join Function. (line 6)
-* arrays, multidimensional: Multi-dimensional. (line 10)
-* arrays, multidimensional, scanning: Multi-scanning. (line 11)
-* arrays, names of: Arrays. (line 18)
-* arrays, scanning: Scanning an Array. (line 6)
-* arrays, sorting: Array Sorting Functions.
- (line 6)
-* arrays, sorting, IGNORECASE variable and: Array Sorting Functions.
- (line 81)
-* arrays, sparse: Array Intro. (line 71)
-* arrays, subscripts: Numeric Array Subscripts.
- (line 6)
-* arrays, subscripts, uninitialized variables as: Uninitialized Subscripts.
- (line 6)
-* artificial intelligence, gawk and: Distribution contents.
- (line 55)
-* ASCII <1>: Glossary. (line 141)
-* ASCII: Ordinal Functions. (line 45)
-* asort() function (gawk) <1>: Array Sorting Functions.
- (line 6)
-* asort() function (gawk): String Functions. (line 29)
-* asort() function (gawk), arrays, sorting: Array Sorting Functions.
- (line 6)
-* asorti() function (gawk): String Functions. (line 77)
-* assert() function (C library): Assert Function. (line 6)
-* assert() user-defined function: Assert Function. (line 28)
-* assertions: Assert Function. (line 6)
-* assignment operators: Assignment Ops. (line 6)
-* assignment operators, evaluation order: Assignment Ops. (line 111)
-* assignment operators, lvalues/rvalues: Assignment Ops. (line 32)
-* assignments as filenames: Ignoring Assigns. (line 6)
-* assoc_clear() internal function: Internals. (line 68)
-* assoc_lookup() internal function: Internals. (line 72)
-* associative arrays: Array Intro. (line 50)
-* asterisk (*), * operator, as multiplication operator: Precedence.
- (line 55)
-* asterisk (*), * operator, as regexp operator: Regexp Operators.
- (line 87)
-* asterisk (*), * operator, null strings, matching: Gory Details.
- (line 164)
-* asterisk (*), ** operator <1>: Precedence. (line 49)
-* asterisk (*), ** operator: Arithmetic Ops. (line 81)
-* asterisk (*), **= operator <1>: Precedence. (line 95)
-* asterisk (*), **= operator: Assignment Ops. (line 129)
-* asterisk (*), *= operator <1>: Precedence. (line 95)
-* asterisk (*), *= operator: Assignment Ops. (line 129)
-* atan2() function: Numeric Functions. (line 11)
-* awf (amazingly workable formatter) program: Glossary. (line 25)
-* awk debugging, enabling: Options. (line 108)
-* awk enabling: Options. (line 206)
-* awk language, POSIX version: Assignment Ops. (line 136)
-* awk profiling, enabling: Options. (line 221)
-* awk programs <1>: Two Rules. (line 6)
-* awk programs <2>: Executable Scripts. (line 6)
-* awk programs: Getting Started. (line 12)
-* awk programs, complex: When. (line 29)
-* awk programs, documenting <1>: Library Names. (line 6)
-* awk programs, documenting: Comments. (line 6)
-* awk programs, examples of: Sample Programs. (line 6)
-* awk programs, execution of: Next Statement. (line 16)
-* awk programs, internationalizing <1>: Programmer i18n. (line 6)
-* awk programs, internationalizing: I18N Functions. (line 6)
-* awk programs, lengthy: Long. (line 6)
-* awk programs, lengthy, assertions: Assert Function. (line 6)
-* awk programs, location of: Options. (line 25)
-* awk programs, one-line examples: Very Simple. (line 45)
-* awk programs, profiling: Profiling. (line 6)
-* awk programs, running <1>: Long. (line 6)
-* awk programs, running: Running gawk. (line 6)
-* awk programs, running, from shell scripts: One-shot. (line 22)
-* awk programs, running, without input files: Read Terminal. (line 17)
-* awk programs, shell variables in: Using Shell Variables.
- (line 6)
-* awk, function of: Getting Started. (line 6)
-* awk, gawk and <1>: This Manual. (line 14)
-* awk, gawk and: Preface. (line 23)
-* awk, history of: History. (line 17)
-* awk, implementation issues, pipes: Redirection. (line 135)
-* awk, implementations: Other Versions. (line 6)
-* awk, implementations, limits: Getline Notes. (line 14)
-* awk, invoking: Command Line. (line 6)
-* awk, new vs. old: Names. (line 6)
-* awk, new vs. old, OFMT variable: Conversion. (line 55)
-* awk, POSIX and: Preface. (line 23)
-* awk, POSIX and, See Also POSIX awk: Preface. (line 23)
-* awk, regexp constants and: Comparison Operators.
- (line 103)
-* awk, See Also gawk: Preface. (line 36)
-* awk, terms describing: This Manual. (line 6)
-* awk, uses for <1>: When. (line 6)
-* awk, uses for <2>: Getting Started. (line 12)
-* awk, uses for: Preface. (line 23)
-* awk, versions of <1>: V7/SVR3.1. (line 6)
-* awk, versions of: Names. (line 10)
-* awk, versions of, changes between SVR3.1 and SVR4: SVR4. (line 6)
-* awk, versions of, changes between SVR4 and POSIX awk: POSIX.
- (line 6)
-* awk, versions of, changes between V7 and SVR3.1: V7/SVR3.1. (line 6)
-* awk, versions of, See Also Brian Kernighan's awk <1>: Other Versions.
- (line 13)
-* awk, versions of, See Also Brian Kernighan's awk: BTL. (line 6)
-* awk.h file (internal): Internals. (line 15)
-* awka compiler for awk: Other Versions. (line 55)
-* AWKLIBPATH environment variable: AWKLIBPATH Variable. (line 6)
-* AWKNUM internal type: Internals. (line 19)
-* AWKPATH environment variable <1>: PC Using. (line 11)
-* AWKPATH environment variable: AWKPATH Variable. (line 6)
-* awkprof.out file: Profiling. (line 6)
-* awksed.awk program: Simple Sed. (line 25)
-* awkvars.out file: Options. (line 93)
-* b debugger command (alias for break): Breakpoint Control. (line 11)
-* backslash (\) <1>: Regexp Operators. (line 18)
-* backslash (\) <2>: Quoting. (line 31)
-* backslash (\) <3>: Comments. (line 50)
-* backslash (\): Read Terminal. (line 25)
-* backslash (\), \" escape sequence: Escape Sequences. (line 76)
-* backslash (\), \' operator (gawk): GNU Regexp Operators.
- (line 56)
-* backslash (\), \/ escape sequence: Escape Sequences. (line 69)
-* backslash (\), \< operator (gawk): GNU Regexp Operators.
- (line 30)
-* backslash (\), \> operator (gawk): GNU Regexp Operators.
- (line 34)
-* backslash (\), \` operator (gawk): GNU Regexp Operators.
- (line 54)
-* backslash (\), \a escape sequence: Escape Sequences. (line 34)
-* backslash (\), \b escape sequence: Escape Sequences. (line 38)
-* backslash (\), \B operator (gawk): GNU Regexp Operators.
- (line 43)
-* backslash (\), \f escape sequence: Escape Sequences. (line 41)
-* backslash (\), \n escape sequence: Escape Sequences. (line 44)
-* backslash (\), \NNN escape sequence: Escape Sequences. (line 56)
-* backslash (\), \r escape sequence: Escape Sequences. (line 47)
-* backslash (\), \S operator (gawk): GNU Regexp Operators.
- (line 17)
-* backslash (\), \s operator (gawk): GNU Regexp Operators.
- (line 13)
-* backslash (\), \t escape sequence: Escape Sequences. (line 50)
-* backslash (\), \v escape sequence: Escape Sequences. (line 53)
-* backslash (\), \W operator (gawk): GNU Regexp Operators.
- (line 26)
-* backslash (\), \w operator (gawk): GNU Regexp Operators.
- (line 21)
-* backslash (\), \x escape sequence: Escape Sequences. (line 61)
-* backslash (\), \y operator (gawk): GNU Regexp Operators.
- (line 38)
-* backslash (\), as field separators: Command Line Field Separator.
- (line 27)
-* backslash (\), continuing lines and <1>: Egrep Program. (line 220)
-* backslash (\), continuing lines and: Statements/Lines. (line 19)
-* backslash (\), continuing lines and, comments and: Statements/Lines.
- (line 76)
-* backslash (\), continuing lines and, in csh: Statements/Lines.
- (line 44)
-* backslash (\), gsub()/gensub()/sub() functions and: Gory Details.
- (line 6)
-* backslash (\), in bracket expressions: Bracket Expressions. (line 17)
-* backslash (\), in escape sequences: Escape Sequences. (line 6)
-* backslash (\), in escape sequences, POSIX and: Escape Sequences.
- (line 113)
-* backslash (\), regexp constants: Computed Regexps. (line 28)
-* backtrace debugger command: Execution Stack. (line 13)
-* BBS-list file: Sample Data Files. (line 6)
-* Beebe, Nelson <1>: Other Versions. (line 69)
-* Beebe, Nelson: Acknowledgments. (line 60)
-* BEGIN pattern <1>: Profiling. (line 62)
-* BEGIN pattern <2>: BEGIN/END. (line 6)
-* BEGIN pattern <3>: Field Separators. (line 44)
-* BEGIN pattern: Records. (line 29)
-* BEGIN pattern, assert() user-defined function and: Assert Function.
- (line 83)
-* BEGIN pattern, Boolean patterns and: Expression Patterns. (line 73)
-* BEGIN pattern, exit statement and: Exit Statement. (line 12)
-* BEGIN pattern, getline and: Getline Notes. (line 19)
-* BEGIN pattern, headings, adding: Print Examples. (line 43)
-* BEGIN pattern, next/nextfile statements and <1>: Next Statement.
- (line 45)
-* BEGIN pattern, next/nextfile statements and: I/O And BEGIN/END.
- (line 37)
-* BEGIN pattern, OFS/ORS variables, assigning values to: Output Separators.
- (line 20)
-* BEGIN pattern, operators and: Using BEGIN/END. (line 17)
-* BEGIN pattern, print statement and: I/O And BEGIN/END. (line 16)
-* BEGIN pattern, pwcat program: Passwd Functions. (line 143)
-* BEGIN pattern, running awk programs and: Cut Program. (line 68)
-* BEGIN pattern, TEXTDOMAIN variable and: Programmer i18n. (line 60)
-* BEGINFILE pattern: BEGINFILE/ENDFILE. (line 6)
-* BEGINFILE pattern, Boolean patterns and: Expression Patterns.
- (line 73)
-* beginfile() user-defined function: Filetrans Function. (line 62)
-* Benzinger, Michael: Contributors. (line 97)
-* Berry, Karl: Acknowledgments. (line 33)
-* binary input/output: User-modified. (line 10)
-* bindtextdomain() function (C library): Explaining gettext. (line 49)
-* bindtextdomain() function (gawk) <1>: Programmer i18n. (line 47)
-* bindtextdomain() function (gawk): I18N Functions. (line 12)
-* bindtextdomain() function (gawk), portability and: I18N Portability.
- (line 33)
-* BINMODE variable <1>: PC Using. (line 34)
-* BINMODE variable: User-modified. (line 10)
-* bits2str() user-defined function: Bitwise Functions. (line 68)
-* bitwise, complement: Bitwise Functions. (line 25)
-* bitwise, operations: Bitwise Functions. (line 6)
-* bitwise, shift: Bitwise Functions. (line 32)
-* body, in actions: Statements. (line 10)
-* body, in loops: While Statement. (line 14)
-* Boolean expressions: Boolean Ops. (line 6)
-* Boolean expressions, as patterns: Expression Patterns. (line 41)
-* Boolean operators, See Boolean expressions: Boolean Ops. (line 6)
-* Bourne shell, quoting rules for: Quoting. (line 18)
-* braces ({}): Profiling. (line 134)
-* braces ({}), actions and: Action Overview. (line 19)
-* braces ({}), statements, grouping: Statements. (line 10)
-* bracket expressions <1>: Bracket Expressions. (line 6)
-* bracket expressions: Regexp Operators. (line 55)
-* bracket expressions, character classes: Bracket Expressions.
- (line 30)
-* bracket expressions, collating elements: Bracket Expressions.
- (line 69)
-* bracket expressions, collating symbols: Bracket Expressions.
- (line 76)
-* bracket expressions, complemented: Regexp Operators. (line 63)
-* bracket expressions, equivalence classes: Bracket Expressions.
- (line 82)
-* bracket expressions, non-ASCII: Bracket Expressions. (line 69)
-* bracket expressions, range expressions: Bracket Expressions.
- (line 6)
-* break debugger command: Breakpoint Control. (line 11)
-* break statement: Break Statement. (line 6)
-* Brennan, Michael <1>: Other Versions. (line 6)
-* Brennan, Michael <2>: Simple Sed. (line 25)
-* Brennan, Michael <3>: Two-way I/O. (line 6)
-* Brennan, Michael: Delete. (line 52)
-* Brian Kernighan's awk, extensions <1>: Other Versions. (line 13)
-* Brian Kernighan's awk, extensions: BTL. (line 6)
-* Broder, Alan J.: Contributors. (line 88)
-* Brown, Martin: Contributors. (line 82)
-* BSD-based operating systems: Glossary. (line 611)
-* bt debugger command (alias for backtrace): Execution Stack. (line 13)
-* Buening, Andreas <1>: Bugs. (line 71)
-* Buening, Andreas <2>: Contributors. (line 92)
-* Buening, Andreas: Acknowledgments. (line 60)
-* buffering, input/output <1>: Two-way I/O. (line 70)
-* buffering, input/output: I/O Functions. (line 130)
-* buffering, interactive vs. noninteractive: I/O Functions. (line 98)
-* buffers, flushing: I/O Functions. (line 29)
-* buffers, operators for: GNU Regexp Operators.
- (line 48)
-* bug reports, email address, bug-gawk@gnu.org: Bugs. (line 30)
-* bug-gawk@gnu.org bug reporting address: Bugs. (line 30)
-* built-in functions: Functions. (line 6)
-* built-in functions, evaluation order: Calling Built-in. (line 30)
-* built-in variables: Built-in Variables. (line 6)
-* built-in variables, -v option, setting with: Options. (line 40)
-* built-in variables, conveying information: Auto-set. (line 6)
-* built-in variables, user-modifiable: User-modified. (line 6)
-* Busybox Awk: Other Versions. (line 79)
-* call by reference: Pass By Value/Reference.
- (line 47)
-* call by value: Pass By Value/Reference.
- (line 18)
-* caret (^) <1>: GNU Regexp Operators.
- (line 59)
-* caret (^): Regexp Operators. (line 22)
-* caret (^), ^ operator: Precedence. (line 49)
-* caret (^), ^= operator <1>: Precedence. (line 95)
-* caret (^), ^= operator: Assignment Ops. (line 129)
-* caret (^), in bracket expressions: Bracket Expressions. (line 17)
-* case keyword: Switch Statement. (line 6)
-* case sensitivity, array indices and: Array Intro. (line 92)
-* case sensitivity, converting case: String Functions. (line 522)
-* case sensitivity, example programs: Library Functions. (line 42)
-* case sensitivity, gawk: Case-sensitivity. (line 26)
-* case sensitivity, regexps and <1>: User-modified. (line 82)
-* case sensitivity, regexps and: Case-sensitivity. (line 6)
-* case sensitivity, string comparisons and: User-modified. (line 82)
-* CGI, awk scripts for: Options. (line 125)
-* character lists, See bracket expressions: Regexp Operators. (line 55)
-* character sets (machine character encodings) <1>: Glossary. (line 141)
-* character sets (machine character encodings): Ordinal Functions.
- (line 45)
-* character sets, See Also bracket expressions: Regexp Operators.
- (line 55)
-* characters, counting: Wc Program. (line 6)
-* characters, transliterating: Translate Program. (line 6)
-* characters, values of as numbers: Ordinal Functions. (line 6)
-* Chassell, Robert J.: Acknowledgments. (line 33)
-* chdir() function, implementing in gawk: Sample Library. (line 6)
-* chem utility: Glossary. (line 151)
-* chr() user-defined function: Ordinal Functions. (line 16)
-* clear debugger command: Breakpoint Control. (line 36)
-* Cliff random numbers: Cliff Random Function.
- (line 6)
-* cliff_rand() user-defined function: Cliff Random Function.
- (line 12)
-* close() function <1>: I/O Functions. (line 10)
-* close() function <2>: Close Files And Pipes.
- (line 18)
-* close() function <3>: Getline/Pipe. (line 24)
-* close() function: Getline/Variable/File.
- (line 30)
-* close() function, return values: Close Files And Pipes.
- (line 131)
-* close() function, two-way pipes and: Two-way I/O. (line 77)
-* Close, Diane <1>: Contributors. (line 21)
-* Close, Diane: Manual History. (line 41)
-* close_func() input method: Internals. (line 157)
-* collating elements: Bracket Expressions. (line 69)
-* collating symbols: Bracket Expressions. (line 76)
-* Colombo, Antonio: Acknowledgments. (line 60)
-* columns, aligning: Print Examples. (line 70)
-* columns, cutting: Cut Program. (line 6)
-* comma (,), in range patterns: Ranges. (line 6)
-* command line, arguments <1>: ARGC and ARGV. (line 6)
-* command line, arguments <2>: Auto-set. (line 11)
-* command line, arguments: Other Arguments. (line 6)
-* command line, directories on: Command line directories.
- (line 6)
-* command line, formats: Running gawk. (line 12)
-* command line, FS on, setting: Command Line Field Separator.
- (line 6)
-* command line, invoking awk from: Command Line. (line 6)
-* command line, options <1>: Command Line Field Separator.
- (line 6)
-* command line, options <2>: Options. (line 6)
-* command line, options: Long. (line 12)
-* command line, options, end of: Options. (line 54)
-* command line, variables, assigning on: Assignment Options. (line 6)
-* command-line options, processing: Getopt Function. (line 6)
-* command-line options, string extraction: String Extraction. (line 6)
-* commands debugger command: Debugger Execution Control.
- (line 10)
-* commenting: Comments. (line 6)
-* commenting, backslash continuation and: Statements/Lines. (line 76)
-* common extensions, ** operator: Arithmetic Ops. (line 36)
-* common extensions, **= operator: Assignment Ops. (line 136)
-* common extensions, /dev/stderr special file: Special FD. (line 46)
-* common extensions, /dev/stdin special file: Special FD. (line 46)
-* common extensions, /dev/stdout special file: Special FD. (line 46)
-* common extensions, \x escape sequence: Escape Sequences. (line 61)
-* common extensions, BINMODE variable: PC Using. (line 34)
-* common extensions, delete to delete entire arrays: Delete. (line 39)
-* common extensions, fflush() function: I/O Functions. (line 25)
-* common extensions, func keyword: Definition Syntax. (line 83)
-* common extensions, length() applied to an array: String Functions.
- (line 196)
-* common extensions, nextfile statement: Nextfile Statement. (line 6)
-* common extensions, RS as a regexp: Records. (line 115)
-* common extensions, single character fields: Single Character Fields.
- (line 6)
-* comp.lang.awk newsgroup: Bugs. (line 38)
-* comparison expressions: Typing and Comparison.
- (line 9)
-* comparison expressions, as patterns: Expression Patterns. (line 14)
-* comparison expressions, string vs. regexp: Comparison Operators.
- (line 79)
-* compatibility mode (gawk), extensions: POSIX/GNU. (line 6)
-* compatibility mode (gawk), file names: Special Caveats. (line 9)
-* compatibility mode (gawk), hexadecimal numbers: Nondecimal-numbers.
- (line 60)
-* compatibility mode (gawk), octal numbers: Nondecimal-numbers.
- (line 60)
-* compatibility mode (gawk), specifying: Options. (line 81)
-* compiled programs <1>: Glossary. (line 161)
-* compiled programs: Basic High Level. (line 14)
-* compiling gawk for Cygwin: Cygwin. (line 6)
-* compiling gawk for MS-DOS and MS-Windows: PC Compiling. (line 13)
-* compiling gawk for VMS: VMS Compilation. (line 6)
-* compiling gawk with EMX for OS/2: PC Compiling. (line 28)
-* compl() function (gawk): Bitwise Functions. (line 42)
-* complement, bitwise: Bitwise Functions. (line 25)
-* compound statements, control statements and: Statements. (line 10)
-* concatenating: Concatenation. (line 9)
-* condition debugger command: Breakpoint Control. (line 54)
-* conditional expressions: Conditional Exp. (line 6)
-* configuration option, --disable-lint: Additional Configuration Options.
- (line 9)
-* configuration option, --disable-nls: Additional Configuration Options.
- (line 24)
-* configuration option, --with-whiny-user-strftime: Additional Configuration Options.
- (line 29)
-* configuration options, gawk: Additional Configuration Options.
- (line 6)
-* constants, floating-point: Floating-point Constants.
- (line 6)
-* constants, nondecimal: Nondecimal Data. (line 6)
-* constants, types of: Constants. (line 6)
-* context, floating-point: Floating-point Context.
- (line 6)
-* continue statement: Continue Statement. (line 6)
-* control statements: Statements. (line 6)
-* converting, case: String Functions. (line 522)
-* converting, dates to timestamps: Time Functions. (line 74)
-* converting, during subscripting: Numeric Array Subscripts.
- (line 31)
-* converting, numbers to strings <1>: Bitwise Functions. (line 107)
-* converting, numbers to strings: Conversion. (line 6)
-* converting, strings to numbers <1>: Bitwise Functions. (line 107)
-* converting, strings to numbers: Conversion. (line 6)
-* CONVFMT variable <1>: User-modified. (line 28)
-* CONVFMT variable: Conversion. (line 29)
-* CONVFMT variable, array subscripts and: Numeric Array Subscripts.
- (line 6)
-* coprocesses <1>: Two-way I/O. (line 44)
-* coprocesses: Redirection. (line 102)
-* coprocesses, closing: Close Files And Pipes.
- (line 6)
-* coprocesses, getline from: Getline/Coprocess. (line 6)
-* cos() function: Numeric Functions. (line 15)
-* counting: Wc Program. (line 6)
-* csh utility: Statements/Lines. (line 44)
-* csh utility, POSIXLY_CORRECT environment variable: Options. (line 334)
-* csh utility, |& operator, comparison with: Two-way I/O. (line 44)
-* ctime() user-defined function: Function Example. (line 72)
-* currency symbols, localization: Explaining gettext. (line 103)
-* custom.h file: Configuration Philosophy.
- (line 30)
-* cut utility: Cut Program. (line 6)
-* cut.awk program: Cut Program. (line 45)
-* d debugger command (alias for delete): Breakpoint Control. (line 64)
-* d.c., See dark corner: Conventions. (line 38)
-* dark corner <1>: Glossary. (line 193)
-* dark corner <2>: Truth Values. (line 24)
-* dark corner <3>: Assignment Ops. (line 148)
-* dark corner: Conventions. (line 38)
-* dark corner, ^, in FS: Regexp Field Splitting.
- (line 59)
-* dark corner, array subscripts: Uninitialized Subscripts.
- (line 43)
-* dark corner, break statement: Break Statement. (line 51)
-* dark corner, close() function: Close Files And Pipes.
- (line 131)
-* dark corner, command-line arguments: Assignment Options. (line 43)
-* dark corner, continue statement: Continue Statement. (line 43)
-* dark corner, CONVFMT variable: Conversion. (line 40)
-* dark corner, escape sequences: Other Arguments. (line 31)
-* dark corner, escape sequences, for metacharacters: Escape Sequences.
- (line 136)
-* dark corner, exit statement: Exit Statement. (line 30)
-* dark corner, field separators: Field Splitting Summary.
- (line 47)
-* dark corner, FILENAME variable <1>: Auto-set. (line 93)
-* dark corner, FILENAME variable: Getline Notes. (line 19)
-* dark corner, FNR/NR variables: Auto-set. (line 225)
-* dark corner, format-control characters: Control Letters. (line 18)
-* dark corner, FS as null string: Single Character Fields.
- (line 20)
-* dark corner, input files: Records. (line 98)
-* dark corner, invoking awk: Command Line. (line 16)
-* dark corner, length() function: String Functions. (line 182)
-* dark corner, multiline records: Multiple Line. (line 35)
-* dark corner, NF variable, decrementing: Changing Fields. (line 107)
-* dark corner, OFMT variable: OFMT. (line 27)
-* dark corner, regexp constants: Using Constant Regexps.
- (line 6)
-* dark corner, regexp constants, /= operator and: Assignment Ops.
- (line 148)
-* dark corner, regexp constants, as arguments to user-defined functions: Using Constant Regexps.
- (line 43)
-* dark corner, split() function: String Functions. (line 361)
-* dark corner, strings, storing: Records. (line 191)
-* dark corner, value of ARGV[0]: Auto-set. (line 35)
-* data, fixed-width: Constant Size. (line 9)
-* data-driven languages: Basic High Level. (line 83)
-* database, group, reading: Group Functions. (line 6)
-* database, users, reading: Passwd Functions. (line 6)
-* date utility, GNU: Time Functions. (line 17)
-* date utility, POSIX: Time Functions. (line 261)
-* dates, converting to timestamps: Time Functions. (line 74)
-* dates, information related to, localization: Explaining gettext.
- (line 115)
-* Davies, Stephen <1>: Contributors. (line 74)
-* Davies, Stephen: Acknowledgments. (line 60)
-* dcgettext() function (gawk) <1>: Programmer i18n. (line 19)
-* dcgettext() function (gawk): I18N Functions. (line 22)
-* dcgettext() function (gawk), portability and: I18N Portability.
- (line 33)
-* dcngettext() function (gawk) <1>: Programmer i18n. (line 36)
-* dcngettext() function (gawk): I18N Functions. (line 28)
-* dcngettext() function (gawk), portability and: I18N Portability.
- (line 33)
-* deadlocks: Two-way I/O. (line 70)
-* debugger commands, b (break): Breakpoint Control. (line 11)
-* debugger commands, backtrace: Execution Stack. (line 13)
-* debugger commands, break: Breakpoint Control. (line 11)
-* debugger commands, bt (backtrace): Execution Stack. (line 13)
-* debugger commands, c (continue): Debugger Execution Control.
- (line 33)
-* debugger commands, clear: Breakpoint Control. (line 36)
-* debugger commands, commands: Debugger Execution Control.
- (line 10)
-* debugger commands, condition: Breakpoint Control. (line 54)
-* debugger commands, continue: Debugger Execution Control.
- (line 33)
-* debugger commands, d (delete): Breakpoint Control. (line 64)
-* debugger commands, delete: Breakpoint Control. (line 64)
-* debugger commands, disable: Breakpoint Control. (line 69)
-* debugger commands, display: Viewing And Changing Data.
- (line 8)
-* debugger commands, down: Execution Stack. (line 21)
-* debugger commands, dump: Miscellaneous Debugger Commands.
- (line 9)
-* debugger commands, e (enable): Breakpoint Control. (line 73)
-* debugger commands, enable: Breakpoint Control. (line 73)
-* debugger commands, end: Debugger Execution Control.
- (line 10)
-* debugger commands, eval: Viewing And Changing Data.
- (line 23)
-* debugger commands, f (frame): Execution Stack. (line 25)
-* debugger commands, finish: Debugger Execution Control.
- (line 39)
-* debugger commands, frame: Execution Stack. (line 25)
-* debugger commands, h (help): Miscellaneous Debugger Commands.
- (line 68)
-* debugger commands, help: Miscellaneous Debugger Commands.
- (line 68)
-* debugger commands, i (info): Debugger Info. (line 13)
-* debugger commands, ignore: Breakpoint Control. (line 87)
-* debugger commands, info: Debugger Info. (line 13)
-* debugger commands, l (list): Miscellaneous Debugger Commands.
- (line 74)
-* debugger commands, list: Miscellaneous Debugger Commands.
- (line 74)
-* debugger commands, n (next): Debugger Execution Control.
- (line 43)
-* debugger commands, next: Debugger Execution Control.
- (line 43)
-* debugger commands, nexti: Debugger Execution Control.
- (line 49)
-* debugger commands, ni (nexti): Debugger Execution Control.
- (line 49)
-* debugger commands, o (option): Debugger Info. (line 57)
-* debugger commands, option: Debugger Info. (line 57)
-* debugger commands, p (print): Viewing And Changing Data.
- (line 36)
-* debugger commands, print: Viewing And Changing Data.
- (line 36)
-* debugger commands, printf: Viewing And Changing Data.
- (line 54)
-* debugger commands, q (quit): Miscellaneous Debugger Commands.
- (line 101)
-* debugger commands, quit: Miscellaneous Debugger Commands.
- (line 101)
-* debugger commands, r (run): Debugger Execution Control.
- (line 62)
-* debugger commands, return: Debugger Execution Control.
- (line 54)
-* debugger commands, run: Debugger Execution Control.
- (line 62)
-* debugger commands, s (step): Debugger Execution Control.
- (line 68)
-* debugger commands, set: Viewing And Changing Data.
- (line 59)
-* debugger commands, si (stepi): Debugger Execution Control.
- (line 76)
-* debugger commands, silent: Debugger Execution Control.
- (line 10)
-* debugger commands, step: Debugger Execution Control.
- (line 68)
-* debugger commands, stepi: Debugger Execution Control.
- (line 76)
-* debugger commands, t (tbreak): Breakpoint Control. (line 90)
-* debugger commands, tbreak: Breakpoint Control. (line 90)
-* debugger commands, trace: Miscellaneous Debugger Commands.
- (line 110)
-* debugger commands, u (until): Debugger Execution Control.
- (line 83)
-* debugger commands, undisplay: Viewing And Changing Data.
- (line 80)
-* debugger commands, until: Debugger Execution Control.
- (line 83)
-* debugger commands, unwatch: Viewing And Changing Data.
- (line 84)
-* debugger commands, up: Execution Stack. (line 33)
-* debugger commands, w (watch): Viewing And Changing Data.
- (line 67)
-* debugger commands, watch: Viewing And Changing Data.
- (line 67)
-* debugging awk programs: Debugger. (line 6)
-* debugging gawk, bug reports: Bugs. (line 9)
-* decimal point character, locale specific: Options. (line 249)
-* decrement operators: Increment Ops. (line 35)
-* default keyword: Switch Statement. (line 6)
-* Deifik, Scott <1>: Bugs. (line 70)
-* Deifik, Scott <2>: Contributors. (line 54)
-* Deifik, Scott: Acknowledgments. (line 60)
-* delete debugger command: Breakpoint Control. (line 64)
-* delete statement: Delete. (line 6)
-* deleting elements in arrays: Delete. (line 6)
-* deleting entire arrays: Delete. (line 39)
-* differences between gawk and awk: String Functions. (line 196)
-* differences in awk and gawk, ARGC/ARGV variables: ARGC and ARGV.
- (line 88)
-* differences in awk and gawk, ARGIND variable: Auto-set. (line 40)
-* differences in awk and gawk, array elements, deleting: Delete.
- (line 39)
-* differences in awk and gawk, AWKLIBPATH environment variable: AWKLIBPATH Variable.
- (line 6)
-* differences in awk and gawk, AWKPATH environment variable: AWKPATH Variable.
- (line 6)
-* differences in awk and gawk, BEGIN/END patterns: I/O And BEGIN/END.
- (line 16)
-* differences in awk and gawk, BINMODE variable <1>: PC Using.
- (line 34)
-* differences in awk and gawk, BINMODE variable: User-modified.
- (line 23)
-* differences in awk and gawk, close() function: Close Files And Pipes.
- (line 81)
-* differences in awk and gawk, ERRNO variable: Auto-set. (line 73)
-* differences in awk and gawk, error messages: Special FD. (line 16)
-* differences in awk and gawk, FIELDWIDTHS variable: User-modified.
- (line 35)
-* differences in awk and gawk, FPAT variable: User-modified. (line 45)
-* differences in awk and gawk, function arguments (gawk): Calling Built-in.
- (line 16)
-* differences in awk and gawk, getline command: Getline. (line 19)
-* differences in awk and gawk, IGNORECASE variable: User-modified.
- (line 82)
-* differences in awk and gawk, implementation limitations <1>: Redirection.
- (line 135)
-* differences in awk and gawk, implementation limitations: Getline Notes.
- (line 14)
-* differences in awk and gawk, indirect function calls: Indirect Calls.
- (line 6)
-* differences in awk and gawk, input/output operators <1>: Redirection.
- (line 102)
-* differences in awk and gawk, input/output operators: Getline/Coprocess.
- (line 6)
-* differences in awk and gawk, line continuations: Conditional Exp.
- (line 34)
-* differences in awk and gawk, LINT variable: User-modified. (line 98)
-* differences in awk and gawk, match() function: String Functions.
- (line 259)
-* differences in awk and gawk, next/nextfile statements: Nextfile Statement.
- (line 6)
-* differences in awk and gawk, print/printf statements: Format Modifiers.
- (line 13)
-* differences in awk and gawk, PROCINFO array: Auto-set. (line 124)
-* differences in awk and gawk, record separators: Records. (line 112)
-* differences in awk and gawk, regexp constants: Using Constant Regexps.
- (line 43)
-* differences in awk and gawk, regular expressions: Case-sensitivity.
- (line 26)
-* differences in awk and gawk, RS/RT variables: Records. (line 167)
-* differences in awk and gawk, RT variable: Auto-set. (line 214)
-* differences in awk and gawk, single-character fields: Single Character Fields.
- (line 6)
-* differences in awk and gawk, split() function: String Functions.
- (line 349)
-* differences in awk and gawk, strings: Scalar Constants. (line 20)
-* differences in awk and gawk, strings, storing: Records. (line 187)
-* differences in awk and gawk, strtonum() function (gawk): String Functions.
- (line 404)
-* differences in awk and gawk, TEXTDOMAIN variable: User-modified.
- (line 162)
-* differences in awk and gawk, trunc-mod operation: Arithmetic Ops.
- (line 66)
-* directories, changing: Sample Library. (line 6)
-* directories, command line: Command line directories.
- (line 6)
-* directories, searching <1>: Igawk Program. (line 368)
-* directories, searching <2>: AWKLIBPATH Variable. (line 6)
-* directories, searching: AWKPATH Variable. (line 6)
-* disable debugger command: Breakpoint Control. (line 69)
-* display debugger command: Viewing And Changing Data.
- (line 8)
-* division: Arithmetic Ops. (line 44)
-* do-while statement <1>: Do Statement. (line 6)
-* do-while statement: Regexp Usage. (line 19)
-* documentation, of awk programs: Library Names. (line 6)
-* documentation, online: Manual History. (line 11)
-* documents, searching: Dupword Program. (line 6)
-* dollar sign ($): Regexp Operators. (line 35)
-* dollar sign ($), $ field operator <1>: Precedence. (line 43)
-* dollar sign ($), $ field operator: Fields. (line 19)
-* dollar sign ($), incrementing fields and arrays: Increment Ops.
- (line 30)
-* double precision floating-point: General Arithmetic. (line 21)
-* double quote (") <1>: Quoting. (line 37)
-* double quote ("): Read Terminal. (line 25)
-* double quote ("), regexp constants: Computed Regexps. (line 28)
-* down debugger command: Execution Stack. (line 21)
-* Drepper, Ulrich: Acknowledgments. (line 52)
-* DuBois, John: Acknowledgments. (line 60)
-* dump debugger command: Miscellaneous Debugger Commands.
- (line 9)
-* dupnode() internal function: Internals. (line 87)
-* dupword.awk program: Dupword Program. (line 31)
-* e debugger command (alias for enable): Breakpoint Control. (line 73)
-* EBCDIC: Ordinal Functions. (line 45)
-* egrep utility <1>: Egrep Program. (line 6)
-* egrep utility: Bracket Expressions. (line 24)
-* egrep.awk program: Egrep Program. (line 54)
-* elements in arrays: Reference to Elements.
- (line 6)
-* elements in arrays, assigning: Assigning Elements. (line 6)
-* elements in arrays, deleting: Delete. (line 6)
-* elements in arrays, order of: Scanning an Array. (line 48)
-* elements in arrays, scanning: Scanning an Array. (line 6)
-* email address for bug reports, bug-gawk@gnu.org: Bugs. (line 30)
-* EMISTERED: TCP/IP Networking. (line 6)
-* empty pattern: Empty. (line 6)
-* empty strings, See null strings: Regexp Field Splitting.
- (line 43)
-* enable debugger command: Breakpoint Control. (line 73)
-* end debugger command: Debugger Execution Control.
- (line 10)
-* END pattern <1>: Profiling. (line 62)
-* END pattern: BEGIN/END. (line 6)
-* END pattern, assert() user-defined function and: Assert Function.
- (line 75)
-* END pattern, backslash continuation and: Egrep Program. (line 220)
-* END pattern, Boolean patterns and: Expression Patterns. (line 73)
-* END pattern, exit statement and: Exit Statement. (line 12)
-* END pattern, next/nextfile statements and <1>: Next Statement.
- (line 45)
-* END pattern, next/nextfile statements and: I/O And BEGIN/END.
- (line 37)
-* END pattern, operators and: Using BEGIN/END. (line 17)
-* END pattern, print statement and: I/O And BEGIN/END. (line 16)
-* ENDFILE pattern: BEGINFILE/ENDFILE. (line 6)
-* ENDFILE pattern, Boolean patterns and: Expression Patterns. (line 73)
-* endfile() user-defined function: Filetrans Function. (line 62)
-* endgrent() function (C library): Group Functions. (line 215)
-* endgrent() user-defined function: Group Functions. (line 218)
-* endpwent() function (C library): Passwd Functions. (line 210)
-* endpwent() user-defined function: Passwd Functions. (line 213)
-* ENVIRON array <1>: Internals. (line 146)
-* ENVIRON array: Auto-set. (line 60)
-* environment variables: Auto-set. (line 60)
-* epoch, definition of: Glossary. (line 239)
-* equals sign (=), = operator: Assignment Ops. (line 6)
-* equals sign (=), == operator <1>: Precedence. (line 65)
-* equals sign (=), == operator: Comparison Operators.
- (line 11)
-* EREs (Extended Regular Expressions): Bracket Expressions. (line 24)
-* ERRNO variable <1>: Internals. (line 130)
-* ERRNO variable <2>: TCP/IP Networking. (line 54)
-* ERRNO variable <3>: Auto-set. (line 73)
-* ERRNO variable <4>: BEGINFILE/ENDFILE. (line 26)
-* ERRNO variable <5>: Close Files And Pipes.
- (line 139)
-* ERRNO variable: Getline. (line 19)
-* error handling: Special FD. (line 16)
-* error handling, ERRNO variable and: Auto-set. (line 73)
-* error output: Special FD. (line 6)
-* escape processing, gsub()/gensub()/sub() functions: Gory Details.
- (line 6)
-* escape sequences: Escape Sequences. (line 6)
-* eval debugger command: Viewing And Changing Data.
- (line 23)
-* evaluation order: Increment Ops. (line 61)
-* evaluation order, concatenation: Concatenation. (line 42)
-* evaluation order, functions: Calling Built-in. (line 30)
-* examining fields: Fields. (line 6)
-* exclamation point (!), ! operator <1>: Egrep Program. (line 170)
-* exclamation point (!), ! operator <2>: Precedence. (line 52)
-* exclamation point (!), ! operator: Boolean Ops. (line 67)
-* exclamation point (!), != operator <1>: Precedence. (line 65)
-* exclamation point (!), != operator: Comparison Operators.
- (line 11)
-* exclamation point (!), !~ operator <1>: Expression Patterns.
- (line 24)
-* exclamation point (!), !~ operator <2>: Precedence. (line 80)
-* exclamation point (!), !~ operator <3>: Comparison Operators.
- (line 11)
-* exclamation point (!), !~ operator <4>: Regexp Constants. (line 6)
-* exclamation point (!), !~ operator <5>: Computed Regexps. (line 6)
-* exclamation point (!), !~ operator <6>: Case-sensitivity. (line 26)
-* exclamation point (!), !~ operator: Regexp Usage. (line 19)
-* exit statement: Exit Statement. (line 6)
-* exit status, of gawk: Exit Status. (line 6)
-* exp() function: Numeric Functions. (line 18)
-* expand utility: Very Simple. (line 69)
-* expressions: Expressions. (line 6)
-* expressions, as patterns: Expression Patterns. (line 6)
-* expressions, assignment: Assignment Ops. (line 6)
-* expressions, Boolean: Boolean Ops. (line 6)
-* expressions, comparison: Typing and Comparison.
- (line 9)
-* expressions, conditional: Conditional Exp. (line 6)
-* expressions, matching, See comparison expressions: Typing and Comparison.
- (line 9)
-* expressions, selecting: Conditional Exp. (line 6)
-* Extended Regular Expressions (EREs): Bracket Expressions. (line 24)
-* eXtensible Markup Language (XML): Internals. (line 157)
-* extension() function (gawk): Using Internal File Ops.
- (line 15)
-* extensions, Brian Kernighan's awk <1>: Other Versions. (line 13)
-* extensions, Brian Kernighan's awk: BTL. (line 6)
-* extensions, common, ** operator: Arithmetic Ops. (line 36)
-* extensions, common, **= operator: Assignment Ops. (line 136)
-* extensions, common, /dev/stderr special file: Special FD. (line 46)
-* extensions, common, /dev/stdin special file: Special FD. (line 46)
-* extensions, common, /dev/stdout special file: Special FD. (line 46)
-* extensions, common, \x escape sequence: Escape Sequences. (line 61)
-* extensions, common, BINMODE variable: PC Using. (line 34)
-* extensions, common, delete to delete entire arrays: Delete. (line 39)
-* extensions, common, fflush() function: I/O Functions. (line 25)
-* extensions, common, func keyword: Definition Syntax. (line 83)
-* extensions, common, length() applied to an array: String Functions.
- (line 196)
-* extensions, common, nextfile statement: Nextfile Statement. (line 6)
-* extensions, common, RS as a regexp: Records. (line 115)
-* extensions, common, single character fields: Single Character Fields.
- (line 6)
-* extensions, in gawk, not in POSIX awk: POSIX/GNU. (line 6)
-* extract.awk program: Extract Program. (line 78)
-* extraction, of marked strings (internationalization): String Extraction.
- (line 6)
-* f debugger command (alias for frame): Execution Stack. (line 25)
-* false, logical: Truth Values. (line 6)
-* FDL (Free Documentation License): GNU Free Documentation License.
- (line 6)
-* features, adding to gawk: Adding Code. (line 6)
-* features, advanced, See advanced features: Obsolete. (line 6)
-* features, deprecated: Obsolete. (line 6)
-* features, undocumented: Undocumented. (line 6)
-* Fenlason, Jay <1>: Contributors. (line 19)
-* Fenlason, Jay: History. (line 30)
-* fflush() function: I/O Functions. (line 25)
-* field numbers: Nonconstant Fields. (line 6)
-* field operator $: Fields. (line 19)
-* field operators, dollar sign as: Fields. (line 19)
-* field separators <1>: User-modified. (line 56)
-* field separators: Field Separators. (line 14)
-* field separators, choice of: Field Separators. (line 50)
-* field separators, FIELDWIDTHS variable and: User-modified. (line 35)
-* field separators, FPAT variable and: User-modified. (line 45)
-* field separators, in multiline records: Multiple Line. (line 41)
-* field separators, on command line: Command Line Field Separator.
- (line 6)
-* field separators, POSIX and <1>: Field Splitting Summary.
- (line 41)
-* field separators, POSIX and: Fields. (line 6)
-* field separators, regular expressions as <1>: Regexp Field Splitting.
- (line 6)
-* field separators, regular expressions as: Field Separators. (line 50)
-* field separators, See Also OFS: Changing Fields. (line 64)
-* field separators, spaces as: Cut Program. (line 109)
-* fields <1>: Basic High Level. (line 71)
-* fields <2>: Fields. (line 6)
-* fields: Reading Files. (line 14)
-* fields, adding: Changing Fields. (line 53)
-* fields, changing contents of: Changing Fields. (line 6)
-* fields, cutting: Cut Program. (line 6)
-* fields, examining: Fields. (line 6)
-* fields, number of: Fields. (line 33)
-* fields, numbers: Nonconstant Fields. (line 6)
-* fields, printing: Print Examples. (line 21)
-* fields, separating: Field Separators. (line 14)
-* fields, single-character: Single Character Fields.
- (line 6)
-* FIELDWIDTHS variable <1>: User-modified. (line 35)
-* FIELDWIDTHS variable: Constant Size. (line 22)
-* file descriptors: Special FD. (line 6)
-* file names, distinguishing: Auto-set. (line 52)
-* file names, in compatibility mode: Special Caveats. (line 9)
-* file names, standard streams in gawk: Special FD. (line 46)
-* FILENAME variable <1>: Auto-set. (line 93)
-* FILENAME variable: Reading Files. (line 6)
-* FILENAME variable, getline, setting with: Getline Notes. (line 19)
-* filenames, assignments as: Ignoring Assigns. (line 6)
-* files, .mo: Explaining gettext. (line 41)
-* files, .mo, converting from .po: I18N Example. (line 62)
-* files, .mo, specifying directory of <1>: Programmer i18n. (line 47)
-* files, .mo, specifying directory of: Explaining gettext. (line 53)
-* files, .po <1>: Translator i18n. (line 6)
-* files, .po: Explaining gettext. (line 36)
-* files, .po, converting to .mo: I18N Example. (line 62)
-* files, .pot: Explaining gettext. (line 30)
-* files, /dev/... special files: Special FD. (line 46)
-* files, /inet/... (gawk): TCP/IP Networking. (line 6)
-* files, /inet4/... (gawk): TCP/IP Networking. (line 6)
-* files, /inet6/... (gawk): TCP/IP Networking. (line 6)
-* files, as single records: Records. (line 196)
-* files, awk programs in: Long. (line 6)
-* files, awkprof.out: Profiling. (line 6)
-* files, awkvars.out: Options. (line 93)
-* files, closing: I/O Functions. (line 10)
-* files, descriptors, See file descriptors: Special FD. (line 6)
-* files, group: Group Functions. (line 6)
-* files, information about, retrieving: Sample Library. (line 6)
-* files, initialization and cleanup: Filetrans Function. (line 6)
-* files, input, See input files: Read Terminal. (line 17)
-* files, log, timestamps in: Time Functions. (line 6)
-* files, managing: Data File Management.
- (line 6)
-* files, managing, data file boundaries: Filetrans Function. (line 6)
-* files, message object: Explaining gettext. (line 41)
-* files, message object, converting from portable object files: I18N Example.
- (line 62)
-* files, message object, specifying directory of <1>: Programmer i18n.
- (line 47)
-* files, message object, specifying directory of: Explaining gettext.
- (line 53)
-* files, multiple passes over: Other Arguments. (line 49)
-* files, multiple, duplicating output into: Tee Program. (line 6)
-* files, output, See output files: Close Files And Pipes.
- (line 6)
-* files, password: Passwd Functions. (line 16)
-* files, portable object <1>: Translator i18n. (line 6)
-* files, portable object: Explaining gettext. (line 36)
-* files, portable object template: Explaining gettext. (line 30)
-* files, portable object, converting to message object files: I18N Example.
- (line 62)
-* files, portable object, generating: Options. (line 147)
-* files, processing, ARGIND variable and: Auto-set. (line 47)
-* files, reading: Rewind Function. (line 6)
-* files, reading, multiline records: Multiple Line. (line 6)
-* files, searching for regular expressions: Egrep Program. (line 6)
-* files, skipping: File Checking. (line 6)
-* files, source, search path for: Igawk Program. (line 368)
-* files, splitting: Split Program. (line 6)
-* files, Texinfo, extracting programs from: Extract Program. (line 6)
-* finish debugger command: Debugger Execution Control.
- (line 39)
-* Fish, Fred: Contributors. (line 51)
-* fixed-width data: Constant Size. (line 9)
-* flag variables <1>: Tee Program. (line 20)
-* flag variables: Boolean Ops. (line 67)
-* floating-point numbers, arbitrary precision: Arbitrary Precision Arithmetic.
- (line 6)
-* floating-point, numbers <1>: Unexpected Results. (line 6)
-* floating-point, numbers: General Arithmetic. (line 6)
-* floating-point, numbers, AWKNUM internal type: Internals. (line 19)
-* FNR variable <1>: Auto-set. (line 103)
-* FNR variable: Records. (line 6)
-* FNR variable, changing: Auto-set. (line 225)
-* for statement: For Statement. (line 6)
-* for statement, in arrays: Scanning an Array. (line 20)
-* force_number() internal function: Internals. (line 27)
-* force_string() internal function: Internals. (line 32)
-* force_wstring() internal function: Internals. (line 37)
-* format specifiers, mixing regular with positional specifiers: Printf Ordering.
- (line 57)
-* format specifiers, printf statement: Control Letters. (line 6)
-* format specifiers, strftime() function (gawk): Time Functions.
- (line 87)
-* format strings: Basic Printf. (line 15)
-* formats, numeric output: OFMT. (line 6)
-* formatting output: Printf. (line 6)
-* forward slash (/): Regexp. (line 10)
-* forward slash (/), / operator: Precedence. (line 55)
-* forward slash (/), /= operator <1>: Precedence. (line 95)
-* forward slash (/), /= operator: Assignment Ops. (line 129)
-* forward slash (/), /= operator, vs. /=.../ regexp constant: Assignment Ops.
- (line 148)
-* forward slash (/), patterns and: Expression Patterns. (line 24)
-* FPAT variable <1>: User-modified. (line 45)
-* FPAT variable: Splitting By Content.
- (line 26)
-* frame debugger command: Execution Stack. (line 25)
-* Free Documentation License (FDL): GNU Free Documentation License.
- (line 6)
-* Free Software Foundation (FSF) <1>: Glossary. (line 301)
-* Free Software Foundation (FSF) <2>: Getting. (line 10)
-* Free Software Foundation (FSF): Manual History. (line 6)
-* FreeBSD: Glossary. (line 611)
-* FS variable <1>: User-modified. (line 56)
-* FS variable: Field Separators. (line 14)
-* FS variable, --field-separator option and: Options. (line 21)
-* FS variable, as null string: Single Character Fields.
- (line 20)
-* FS variable, as TAB character: Options. (line 245)
-* FS variable, changing value of: Field Separators. (line 34)
-* FS variable, running awk programs and: Cut Program. (line 68)
-* FS variable, setting from command line: Command Line Field Separator.
- (line 6)
-* FS, containing ^: Regexp Field Splitting.
- (line 59)
-* FSF (Free Software Foundation) <1>: Glossary. (line 301)
-* FSF (Free Software Foundation) <2>: Getting. (line 10)
-* FSF (Free Software Foundation): Manual History. (line 6)
-* function calls: Function Calls. (line 6)
-* function calls, indirect: Indirect Calls. (line 6)
-* function pointers: Indirect Calls. (line 6)
-* functions, arrays as parameters to: Pass By Value/Reference.
- (line 47)
-* functions, built-in <1>: Functions. (line 6)
-* functions, built-in: Function Calls. (line 10)
-* functions, built-in, adding to gawk: Dynamic Extensions. (line 10)
-* functions, built-in, evaluation order: Calling Built-in. (line 30)
-* functions, defining: Definition Syntax. (line 6)
-* functions, library: Library Functions. (line 6)
-* functions, library, assertions: Assert Function. (line 6)
-* functions, library, associative arrays and: Library Names. (line 57)
-* functions, library, C library: Getopt Function. (line 6)
-* functions, library, character values as numbers: Ordinal Functions.
- (line 6)
-* functions, library, Cliff random numbers: Cliff Random Function.
- (line 6)
-* functions, library, command-line options: Getopt Function. (line 6)
-* functions, library, example program for using: Igawk Program.
- (line 6)
-* functions, library, group database, reading: Group Functions.
- (line 6)
-* functions, library, managing data files: Data File Management.
- (line 6)
-* functions, library, managing time: Gettimeofday Function.
- (line 6)
-* functions, library, merging arrays into strings: Join Function.
- (line 6)
-* functions, library, rounding numbers: Round Function. (line 6)
-* functions, library, user database, reading: Passwd Functions.
- (line 6)
-* functions, names of <1>: Definition Syntax. (line 20)
-* functions, names of: Arrays. (line 18)
-* functions, recursive: Definition Syntax. (line 73)
-* functions, return values, setting: Internals. (line 130)
-* functions, string-translation: I18N Functions. (line 6)
-* functions, undefined: Pass By Value/Reference.
- (line 71)
-* functions, user-defined: User-defined. (line 6)
-* functions, user-defined, calling: Calling A Function. (line 6)
-* functions, user-defined, counts: Profiling. (line 129)
-* functions, user-defined, library of: Library Functions. (line 6)
-* functions, user-defined, next/nextfile statements and <1>: Nextfile Statement.
- (line 44)
-* functions, user-defined, next/nextfile statements and: Next Statement.
- (line 45)
-* G-d: Acknowledgments. (line 83)
-* Garfinkle, Scott: Contributors. (line 35)
-* gawk program, dynamic profiling: Profiling. (line 171)
-* gawk, ARGIND variable in: Other Arguments. (line 12)
-* gawk, awk and <1>: This Manual. (line 14)
-* gawk, awk and: Preface. (line 23)
-* gawk, bitwise operations in: Bitwise Functions. (line 39)
-* gawk, break statement in: Break Statement. (line 51)
-* gawk, built-in variables and: Built-in Variables. (line 14)
-* gawk, character classes and: Bracket Expressions. (line 90)
-* gawk, coding style in: Adding Code. (line 38)
-* gawk, command-line options: GNU Regexp Operators.
- (line 70)
-* gawk, comparison operators and: Comparison Operators.
- (line 50)
-* gawk, configuring: Configuration Philosophy.
- (line 6)
-* gawk, configuring, options: Additional Configuration Options.
- (line 6)
-* gawk, continue statement in: Continue Statement. (line 43)
-* gawk, distribution: Distribution contents.
- (line 6)
-* gawk, ERRNO variable in <1>: TCP/IP Networking. (line 54)
-* gawk, ERRNO variable in <2>: Auto-set. (line 73)
-* gawk, ERRNO variable in <3>: BEGINFILE/ENDFILE. (line 26)
-* gawk, ERRNO variable in <4>: Close Files And Pipes.
- (line 139)
-* gawk, ERRNO variable in: Getline. (line 19)
-* gawk, escape sequences: Escape Sequences. (line 125)
-* gawk, extensions, disabling: Options. (line 233)
-* gawk, features, adding: Adding Code. (line 6)
-* gawk, features, advanced: Advanced Features. (line 6)
-* gawk, fflush() function in: I/O Functions. (line 44)
-* gawk, field separators and: User-modified. (line 77)
-* gawk, FIELDWIDTHS variable in <1>: User-modified. (line 35)
-* gawk, FIELDWIDTHS variable in: Constant Size. (line 22)
-* gawk, file names in: Special Files. (line 6)
-* gawk, format-control characters: Control Letters. (line 18)
-* gawk, FPAT variable in <1>: User-modified. (line 45)
-* gawk, FPAT variable in: Splitting By Content.
- (line 26)
-* gawk, function arguments and: Calling Built-in. (line 16)
-* gawk, functions, adding: Dynamic Extensions. (line 10)
-* gawk, functions, loading: Loading Extensions. (line 6)
-* gawk, hexadecimal numbers and: Nondecimal-numbers. (line 42)
-* gawk, IGNORECASE variable in <1>: Array Sorting Functions.
- (line 81)
-* gawk, IGNORECASE variable in <2>: String Functions. (line 29)
-* gawk, IGNORECASE variable in <3>: Array Intro. (line 92)
-* gawk, IGNORECASE variable in <4>: User-modified. (line 82)
-* gawk, IGNORECASE variable in: Case-sensitivity. (line 26)
-* gawk, implementation issues: Notes. (line 6)
-* gawk, implementation issues, debugging: Compatibility Mode. (line 6)
-* gawk, implementation issues, downward compatibility: Compatibility Mode.
- (line 6)
-* gawk, implementation issues, limits: Getline Notes. (line 14)
-* gawk, implementation issues, pipes: Redirection. (line 135)
-* gawk, installing: Installation. (line 6)
-* gawk, internals: Internals. (line 6)
-* gawk, internationalization and, See internationalization: Internationalization.
- (line 13)
-* gawk, interpreter, adding code to: Using Internal File Ops.
- (line 6)
-* gawk, interval expressions and: Regexp Operators. (line 139)
-* gawk, line continuation in: Conditional Exp. (line 34)
-* gawk, LINT variable in: User-modified. (line 98)
-* gawk, list of contributors to: Contributors. (line 6)
-* gawk, MS-DOS version of: PC Using. (line 11)
-* gawk, MS-Windows version of: PC Using. (line 11)
-* gawk, newlines in: Statements/Lines. (line 12)
-* gawk, octal numbers and: Nondecimal-numbers. (line 42)
-* gawk, OS/2 version of: PC Using. (line 11)
-* gawk, PROCINFO array in <1>: Two-way I/O. (line 116)
-* gawk, PROCINFO array in <2>: Time Functions. (line 46)
-* gawk, PROCINFO array in: Auto-set. (line 124)
-* gawk, regexp constants and: Using Constant Regexps.
- (line 28)
-* gawk, regular expressions, case sensitivity: Case-sensitivity.
- (line 26)
-* gawk, regular expressions, operators: GNU Regexp Operators.
- (line 6)
-* gawk, regular expressions, precedence: Regexp Operators. (line 161)
-* gawk, RT variable in <1>: Auto-set. (line 214)
-* gawk, RT variable in <2>: Getline/Variable/File.
- (line 10)
-* gawk, RT variable in <3>: Multiple Line. (line 129)
-* gawk, RT variable in: Records. (line 112)
-* gawk, See Also awk: Preface. (line 36)
-* gawk, source code, obtaining: Getting. (line 6)
-* gawk, splitting fields and: Constant Size. (line 87)
-* gawk, string-translation functions: I18N Functions. (line 6)
-* gawk, TEXTDOMAIN variable in: User-modified. (line 162)
-* gawk, timestamps: Time Functions. (line 6)
-* gawk, uses for: Preface. (line 36)
-* gawk, versions of, information about, printing: Options. (line 279)
-* gawk, VMS version of: VMS Installation. (line 6)
-* gawk, word-boundary operator: GNU Regexp Operators.
- (line 63)
-* General Public License (GPL): Glossary. (line 310)
-* General Public License, See GPL: Manual History. (line 11)
-* gensub() function (gawk) <1>: String Functions. (line 86)
-* gensub() function (gawk): Using Constant Regexps.
- (line 43)
-* gensub() function (gawk), escape processing: Gory Details. (line 6)
-* get_actual_argument() internal function: Internals. (line 116)
-* get_argument() internal function: Internals. (line 111)
-* get_array_argument() internal macro: Internals. (line 127)
-* get_record() input method: Internals. (line 157)
-* get_scalar_argument() internal macro: Internals. (line 124)
-* getaddrinfo() function (C library): TCP/IP Networking. (line 38)
-* getgrent() function (C library): Group Functions. (line 6)
-* getgrent() user-defined function: Group Functions. (line 6)
-* getgrgid() function (C library): Group Functions. (line 186)
-* getgrgid() user-defined function: Group Functions. (line 189)
-* getgrnam() function (C library): Group Functions. (line 175)
-* getgrnam() user-defined function: Group Functions. (line 180)
-* getgruser() function (C library): Group Functions. (line 195)
-* getgruser() function, user-defined: Group Functions. (line 198)
-* getline command: Reading Files. (line 20)
-* getline command, _gr_init() user-defined function: Group Functions.
- (line 82)
-* getline command, _pw_init() function: Passwd Functions. (line 154)
-* getline command, coprocesses, using from <1>: Close Files And Pipes.
- (line 6)
-* getline command, coprocesses, using from: Getline/Coprocess.
- (line 6)
-* getline command, deadlock and: Two-way I/O. (line 70)
-* getline command, explicit input with: Getline. (line 6)
-* getline command, FILENAME variable and: Getline Notes. (line 19)
-* getline command, return values: Getline. (line 19)
-* getline command, variants: Getline Summary. (line 6)
-* getline statement, BEGINFILE/ENDFILE patterns and: BEGINFILE/ENDFILE.
- (line 54)
-* getopt() function (C library): Getopt Function. (line 15)
-* getopt() user-defined function: Getopt Function. (line 108)
-* getpwent() function (C library): Passwd Functions. (line 16)
-* getpwent() user-defined function: Passwd Functions. (line 16)
-* getpwnam() function (C library): Passwd Functions. (line 177)
-* getpwnam() user-defined function: Passwd Functions. (line 182)
-* getpwuid() function (C library): Passwd Functions. (line 188)
-* getpwuid() user-defined function: Passwd Functions. (line 192)
-* gettext library: Explaining gettext. (line 6)
-* gettext library, locale categories: Explaining gettext. (line 80)
-* gettext() function (C library): Explaining gettext. (line 62)
-* gettimeofday() user-defined function: Gettimeofday Function.
- (line 16)
-* GMP: Arbitrary Precision Arithmetic.
- (line 6)
-* GNITS mailing list: Acknowledgments. (line 52)
-* GNU awk, See gawk: Preface. (line 49)
-* GNU Free Documentation License: GNU Free Documentation License.
- (line 6)
-* GNU General Public License: Glossary. (line 310)
-* GNU Lesser General Public License: Glossary. (line 397)
-* GNU long options <1>: Options. (line 6)
-* GNU long options: Command Line. (line 13)
-* GNU long options, printing list of: Options. (line 154)
-* GNU Project <1>: Glossary. (line 319)
-* GNU Project: Manual History. (line 11)
-* GNU/Linux <1>: Glossary. (line 611)
-* GNU/Linux <2>: I18N Example. (line 55)
-* GNU/Linux: Manual History. (line 28)
-* GPL (General Public License) <1>: Glossary. (line 310)
-* GPL (General Public License): Manual History. (line 11)
-* GPL (General Public License), printing: Options. (line 88)
-* grcat program: Group Functions. (line 16)
-* Grigera, Juan: Contributors. (line 58)
-* group database, reading: Group Functions. (line 6)
-* group file: Group Functions. (line 6)
-* groups, information about: Group Functions. (line 6)
-* gsub() function <1>: String Functions. (line 139)
-* gsub() function: Using Constant Regexps.
- (line 43)
-* gsub() function, arguments of: String Functions. (line 462)
-* gsub() function, escape processing: Gory Details. (line 6)
-* h debugger command (alias for help): Miscellaneous Debugger Commands.
- (line 68)
-* Hankerson, Darrel <1>: Contributors. (line 61)
-* Hankerson, Darrel: Acknowledgments. (line 60)
-* Haque, John <1>: Contributors. (line 103)
-* Haque, John: Acknowledgments. (line 60)
-* Hartholz, Elaine: Acknowledgments. (line 38)
-* Hartholz, Marshall: Acknowledgments. (line 38)
-* Hasegawa, Isamu: Contributors. (line 94)
-* help debugger command: Miscellaneous Debugger Commands.
- (line 68)
-* hexadecimal numbers: Nondecimal-numbers. (line 6)
-* hexadecimal values, enabling interpretation of: Options. (line 193)
-* histsort.awk program: History Sorting. (line 25)
-* Hughes, Phil: Acknowledgments. (line 43)
-* HUP signal: Profiling. (line 203)
-* hyphen (-), - operator: Precedence. (line 52)
-* hyphen (-), -- (decrement/increment) operators: Precedence. (line 46)
-* hyphen (-), -- operator: Increment Ops. (line 48)
-* hyphen (-), -= operator <1>: Precedence. (line 95)
-* hyphen (-), -= operator: Assignment Ops. (line 129)
-* hyphen (-), filenames beginning with: Options. (line 59)
-* hyphen (-), in bracket expressions: Bracket Expressions. (line 17)
-* i debugger command (alias for info): Debugger Info. (line 13)
-* id utility: Id Program. (line 6)
-* id.awk program: Id Program. (line 30)
-* IEEE-754 format: Floating-point Representation.
- (line 6)
-* if statement <1>: If Statement. (line 6)
-* if statement: Regexp Usage. (line 19)
-* if statement, actions, changing: Ranges. (line 25)
-* igawk.sh program: Igawk Program. (line 124)
-* ignore debugger command: Breakpoint Control. (line 87)
-* IGNORECASE variable <1>: Array Sorting Functions.
- (line 81)
-* IGNORECASE variable <2>: String Functions. (line 29)
-* IGNORECASE variable <3>: Array Intro. (line 92)
-* IGNORECASE variable <4>: User-modified. (line 82)
-* IGNORECASE variable: Case-sensitivity. (line 26)
-* IGNORECASE variable, array sorting and: Array Sorting Functions.
- (line 81)
-* IGNORECASE variable, array subscripts and: Array Intro. (line 92)
-* IGNORECASE variable, in example programs: Library Functions.
- (line 42)
-* implementation issues, gawk: Notes. (line 6)
-* implementation issues, gawk, debugging: Compatibility Mode. (line 6)
-* implementation issues, gawk, limits <1>: Redirection. (line 135)
-* implementation issues, gawk, limits: Getline Notes. (line 14)
-* in operator <1>: Id Program. (line 93)
-* in operator <2>: For Statement. (line 75)
-* in operator <3>: Precedence. (line 83)
-* in operator: Comparison Operators.
- (line 11)
-* in operator, arrays and <1>: Scanning an Array. (line 17)
-* in operator, arrays and: Reference to Elements.
- (line 37)
-* increment operators: Increment Ops. (line 6)
-* index() function: String Functions. (line 155)
-* indexing arrays: Array Intro. (line 50)
-* indirect function calls: Indirect Calls. (line 6)
-* infinite precision: Arbitrary Precision Arithmetic.
- (line 6)
-* info debugger command: Debugger Info. (line 13)
-* initialization, automatic: More Complex. (line 38)
-* input files: Reading Files. (line 6)
-* input files, closing: Close Files And Pipes.
- (line 6)
-* input files, counting elements in: Wc Program. (line 6)
-* input files, examples: Sample Data Files. (line 6)
-* input files, reading: Reading Files. (line 6)
-* input files, running awk without: Read Terminal. (line 6)
-* input files, variable assignments and: Other Arguments. (line 19)
-* input pipeline: Getline/Pipe. (line 6)
-* input redirection: Getline/File. (line 6)
-* input, data, nondecimal: Nondecimal Data. (line 6)
-* input, explicit: Getline. (line 6)
-* input, files, See input files: Multiple Line. (line 6)
-* input, multiline records: Multiple Line. (line 6)
-* input, splitting into records: Records. (line 6)
-* input, standard <1>: Special FD. (line 6)
-* input, standard: Read Terminal. (line 6)
-* input/output, binary: User-modified. (line 10)
-* input/output, from BEGIN and END: I/O And BEGIN/END. (line 6)
-* input/output, two-way: Two-way I/O. (line 44)
-* insomnia, cure for: Alarm Program. (line 6)
-* installation, VMS: VMS Installation. (line 6)
-* installing gawk: Installation. (line 6)
-* INT signal (MS-Windows): Profiling. (line 206)
-* int() function: Numeric Functions. (line 23)
-* integer, arbitrary precision: Arbitrary Precision Integers.
- (line 6)
-* integers: General Arithmetic. (line 6)
-* integers, unsigned: General Arithmetic. (line 15)
-* interacting with other programs: I/O Functions. (line 63)
-* internal constant, INVALID_HANDLE: Internals. (line 157)
-* internal function, assoc_clear(): Internals. (line 68)
-* internal function, assoc_lookup(): Internals. (line 72)
-* internal function, dupnode(): Internals. (line 87)
-* internal function, force_number(): Internals. (line 27)
-* internal function, force_string(): Internals. (line 32)
-* internal function, force_wstring(): Internals. (line 37)
-* internal function, get_actual_argument(): Internals. (line 116)
-* internal function, get_argument(): Internals. (line 111)
-* internal function, iop_alloc(): Internals. (line 157)
-* internal function, make_builtin(): Internals. (line 97)
-* internal function, make_number(): Internals. (line 82)
-* internal function, make_string(): Internals. (line 77)
-* internal function, register_deferred_variable(): Internals. (line 146)
-* internal function, register_open_hook(): Internals. (line 157)
-* internal function, unref(): Internals. (line 92)
-* internal function, unset_ERRNO(): Internals. (line 141)
-* internal function, update_ERRNO_int(): Internals. (line 130)
-* internal function, update_ERRNO_string(): Internals. (line 135)
-* internal macro, get_array_argument(): Internals. (line 127)
-* internal macro, get_scalar_argument(): Internals. (line 124)
-* internal structure, IOBUF: Internals. (line 157)
-* internal type, AWKNUM: Internals. (line 19)
-* internal type, NODE: Internals. (line 23)
-* internal variable, nargs: Internals. (line 42)
-* internal variable, stlen: Internals. (line 46)
-* internal variable, stptr: Internals. (line 46)
-* internal variable, type: Internals. (line 59)
-* internal variable, vname: Internals. (line 64)
-* internal variable, wstlen: Internals. (line 54)
-* internal variable, wstptr: Internals. (line 54)
-* internationalization <1>: I18N and L10N. (line 6)
-* internationalization: I18N Functions. (line 6)
-* internationalization, localization <1>: Internationalization.
- (line 13)
-* internationalization, localization: User-modified. (line 162)
-* internationalization, localization, character classes: Bracket Expressions.
- (line 90)
-* internationalization, localization, gawk and: Internationalization.
- (line 13)
-* internationalization, localization, locale categories: Explaining gettext.
- (line 80)
-* internationalization, localization, marked strings: Programmer i18n.
- (line 14)
-* internationalization, localization, portability and: I18N Portability.
- (line 6)
-* internationalizing a program: Explaining gettext. (line 6)
-* interpreted programs <1>: Glossary. (line 361)
-* interpreted programs: Basic High Level. (line 14)
-* interval expressions: Regexp Operators. (line 116)
-* INVALID_HANDLE internal constant: Internals. (line 157)
-* inventory-shipped file: Sample Data Files. (line 32)
-* IOBUF internal structure: Internals. (line 157)
-* iop_alloc() internal function: Internals. (line 157)
-* isarray() function (gawk): Type Functions. (line 11)
-* ISO: Glossary. (line 372)
-* ISO 8859-1: Glossary. (line 141)
-* ISO Latin-1: Glossary. (line 141)
-* Jacobs, Andrew: Passwd Functions. (line 90)
-* Jaegermann, Michal <1>: Contributors. (line 46)
-* Jaegermann, Michal: Acknowledgments. (line 60)
-* Java implementation of awk: Other Versions. (line 97)
-* Java programming language: Glossary. (line 380)
-* jawk: Other Versions. (line 97)
-* Jedi knights: Undocumented. (line 6)
-* join() user-defined function: Join Function. (line 18)
-* Kahrs, Ju"rgen <1>: Contributors. (line 70)
-* Kahrs, Ju"rgen: Acknowledgments. (line 60)
-* Kasal, Stepan: Acknowledgments. (line 60)
-* Kenobi, Obi-Wan: Undocumented. (line 6)
-* Kernighan, Brian <1>: Basic Data Typing. (line 55)
-* Kernighan, Brian <2>: Other Versions. (line 13)
-* Kernighan, Brian <3>: Contributors. (line 12)
-* Kernighan, Brian <4>: BTL. (line 6)
-* Kernighan, Brian <5>: Concatenation. (line 6)
-* Kernighan, Brian <6>: Acknowledgments. (line 77)
-* Kernighan, Brian <7>: Conventions. (line 34)
-* Kernighan, Brian: History. (line 17)
-* kill command, dynamic profiling: Profiling. (line 180)
-* Knights, jedi: Undocumented. (line 6)
-* Knuth, Donald: Arbitrary Precision Arithmetic.
- (line 6)
-* Kwok, Conrad: Contributors. (line 35)
-* l debugger command (alias for list): Miscellaneous Debugger Commands.
- (line 74)
-* labels.awk program: Labels Program. (line 51)
-* languages, data-driven: Basic High Level. (line 83)
-* Laurie, Dirk: Changing Precision. (line 6)
-* LC_ALL locale category: Explaining gettext. (line 120)
-* LC_COLLATE locale category: Explaining gettext. (line 93)
-* LC_CTYPE locale category: Explaining gettext. (line 97)
-* LC_MESSAGES locale category: Explaining gettext. (line 87)
-* LC_MESSAGES locale category, bindtextdomain() function (gawk): Programmer i18n.
- (line 88)
-* LC_MONETARY locale category: Explaining gettext. (line 103)
-* LC_NUMERIC locale category: Explaining gettext. (line 107)
-* LC_RESPONSE locale category: Explaining gettext. (line 111)
-* LC_TIME locale category: Explaining gettext. (line 115)
-* left angle bracket (<), < operator <1>: Precedence. (line 65)
-* left angle bracket (<), < operator: Comparison Operators.
- (line 11)
-* left angle bracket (<), < operator (I/O): Getline/File. (line 6)
-* left angle bracket (<), <= operator <1>: Precedence. (line 65)
-* left angle bracket (<), <= operator: Comparison Operators.
- (line 11)
-* left shift, bitwise: Bitwise Functions. (line 32)
-* leftmost longest match: Multiple Line. (line 26)
-* length() function: String Functions. (line 166)
-* Lesser General Public License (LGPL): Glossary. (line 397)
-* LGPL (Lesser General Public License): Glossary. (line 397)
-* libmawk: Other Versions. (line 105)
-* libraries of awk functions: Library Functions. (line 6)
-* libraries of awk functions, assertions: Assert Function. (line 6)
-* libraries of awk functions, associative arrays and: Library Names.
- (line 57)
-* libraries of awk functions, character values as numbers: Ordinal Functions.
- (line 6)
-* libraries of awk functions, command-line options: Getopt Function.
- (line 6)
-* libraries of awk functions, example program for using: Igawk Program.
- (line 6)
-* libraries of awk functions, group database, reading: Group Functions.
- (line 6)
-* libraries of awk functions, managing, data files: Data File Management.
- (line 6)
-* libraries of awk functions, managing, time: Gettimeofday Function.
- (line 6)
-* libraries of awk functions, merging arrays into strings: Join Function.
- (line 6)
-* libraries of awk functions, rounding numbers: Round Function.
- (line 6)
-* libraries of awk functions, user database, reading: Passwd Functions.
- (line 6)
-* line breaks: Statements/Lines. (line 6)
-* line continuations: Boolean Ops. (line 62)
-* line continuations, gawk: Conditional Exp. (line 34)
-* line continuations, in print statement: Print Examples. (line 76)
-* line continuations, with C shell: More Complex. (line 30)
-* lines, blank, printing: Print. (line 22)
-* lines, counting: Wc Program. (line 6)
-* lines, duplicate, removing: History Sorting. (line 6)
-* lines, matching ranges of: Ranges. (line 6)
-* lines, skipping between markers: Ranges. (line 43)
-* lint checking: User-modified. (line 98)
-* lint checking, array elements: Delete. (line 34)
-* lint checking, array subscripts: Uninitialized Subscripts.
- (line 43)
-* lint checking, empty programs: Command Line. (line 16)
-* lint checking, issuing warnings: Options. (line 168)
-* lint checking, POSIXLY_CORRECT environment variable: Options.
- (line 318)
-* lint checking, undefined functions: Pass By Value/Reference.
- (line 88)
-* LINT variable: User-modified. (line 98)
-* Linux <1>: Glossary. (line 611)
-* Linux <2>: I18N Example. (line 55)
-* Linux: Manual History. (line 28)
-* list debugger command: Miscellaneous Debugger Commands.
- (line 74)
-* loading extension: Loading Extensions. (line 6)
-* loading, library: Options. (line 159)
-* local variables: Variable Scope. (line 6)
-* locale categories: Explaining gettext. (line 80)
-* locale decimal point character: Options. (line 249)
-* locale, definition of: Locales. (line 6)
-* localization: I18N and L10N. (line 6)
-* localization, See internationalization, localization: I18N and L10N.
- (line 6)
-* log files, timestamps in: Time Functions. (line 6)
-* log() function: Numeric Functions. (line 30)
-* logical false/true: Truth Values. (line 6)
-* logical operators, See Boolean expressions: Boolean Ops. (line 6)
-* login information: Passwd Functions. (line 16)
-* long options: Command Line. (line 13)
-* loops: While Statement. (line 6)
-* loops, continue statements and: For Statement. (line 64)
-* loops, count for header: Profiling. (line 123)
-* loops, exiting: Break Statement. (line 6)
-* loops, See Also while statement: While Statement. (line 6)
-* Lost In Space: Dynamic Extensions. (line 6)
-* ls utility: More Complex. (line 15)
-* lshift() function (gawk): Bitwise Functions. (line 45)
-* lvalues/rvalues: Assignment Ops. (line 32)
-* mailing labels, printing: Labels Program. (line 6)
-* mailing list, GNITS: Acknowledgments. (line 52)
-* make_builtin() internal function: Internals. (line 97)
-* make_number() internal function: Internals. (line 82)
-* make_string() internal function: Internals. (line 77)
-* mark parity: Ordinal Functions. (line 45)
-* marked string extraction (internationalization): String Extraction.
- (line 6)
-* marked strings, extracting: String Extraction. (line 6)
-* Marx, Groucho: Increment Ops. (line 61)
-* match() function: String Functions. (line 206)
-* match() function, RSTART/RLENGTH variables: String Functions.
- (line 223)
-* matching, expressions, See comparison expressions: Typing and Comparison.
- (line 9)
-* matching, leftmost longest: Multiple Line. (line 26)
-* matching, null strings: Gory Details. (line 164)
-* mawk program: Other Versions. (line 35)
-* McPhee, Patrick: Contributors. (line 100)
-* memory, releasing: Internals. (line 92)
-* message object files: Explaining gettext. (line 41)
-* message object files, converting from portable object files: I18N Example.
- (line 62)
-* message object files, specifying directory of <1>: Programmer i18n.
- (line 47)
-* message object files, specifying directory of: Explaining gettext.
- (line 53)
-* metacharacters, escape sequences for: Escape Sequences. (line 132)
-* mktime() function (gawk): Time Functions. (line 24)
-* modifiers, in format specifiers: Format Modifiers. (line 6)
-* monetary information, localization: Explaining gettext. (line 103)
-* MPFR: Arbitrary Precision Arithmetic.
- (line 6)
-* msgfmt utility: I18N Example. (line 62)
-* multiple precision: Arbitrary Precision Arithmetic.
- (line 6)
-* n debugger command (alias for next): Debugger Execution Control.
- (line 43)
-* names, arrays/variables <1>: Library Names. (line 6)
-* names, arrays/variables: Arrays. (line 18)
-* names, functions <1>: Library Names. (line 6)
-* names, functions: Definition Syntax. (line 20)
-* namespace issues <1>: Library Names. (line 6)
-* namespace issues: Arrays. (line 18)
-* namespace issues, functions: Definition Syntax. (line 20)
-* nargs internal variable: Internals. (line 42)
-* nawk utility: Names. (line 17)
-* negative zero: Unexpected Results. (line 28)
-* NetBSD: Glossary. (line 611)
-* networks, programming: TCP/IP Networking. (line 6)
-* networks, support for: Special Network. (line 6)
-* newlines <1>: Boolean Ops. (line 67)
-* newlines <2>: Options. (line 239)
-* newlines: Statements/Lines. (line 6)
-* newlines, as field separators: Default Field Splitting.
- (line 6)
-* newlines, as record separators: Records. (line 20)
-* newlines, in dynamic regexps: Computed Regexps. (line 59)
-* newlines, in regexp constants: Computed Regexps. (line 69)
-* newlines, printing: Print Examples. (line 12)
-* newlines, separating statements in actions <1>: Statements. (line 10)
-* newlines, separating statements in actions: Action Overview.
- (line 19)
-* next debugger command: Debugger Execution Control.
- (line 43)
-* next statement <1>: Next Statement. (line 6)
-* next statement: Boolean Ops. (line 85)
-* next statement, BEGIN/END patterns and: I/O And BEGIN/END. (line 37)
-* next statement, BEGINFILE/ENDFILE patterns and: BEGINFILE/ENDFILE.
- (line 49)
-* next statement, user-defined functions and: Next Statement. (line 45)
-* nextfile statement: Nextfile Statement. (line 6)
-* nextfile statement, BEGIN/END patterns and: I/O And BEGIN/END.
- (line 37)
-* nextfile statement, BEGINFILE/ENDFILE patterns and: BEGINFILE/ENDFILE.
- (line 26)
-* nextfile statement, user-defined functions and: Nextfile Statement.
- (line 44)
-* nexti debugger command: Debugger Execution Control.
- (line 49)
-* NF variable <1>: Auto-set. (line 108)
-* NF variable: Fields. (line 33)
-* NF variable, decrementing: Changing Fields. (line 107)
-* ni debugger command (alias for nexti): Debugger Execution Control.
- (line 49)
-* noassign.awk program: Ignoring Assigns. (line 15)
-* NODE internal type: Internals. (line 23)
-* nodes, duplicating: Internals. (line 87)
-* not Boolean-logic operator: Boolean Ops. (line 6)
-* NR variable <1>: Auto-set. (line 119)
-* NR variable: Records. (line 6)
-* NR variable, changing: Auto-set. (line 225)
-* null strings <1>: Basic Data Typing. (line 26)
-* null strings <2>: Truth Values. (line 6)
-* null strings <3>: Regexp Field Splitting.
- (line 43)
-* null strings: Records. (line 102)
-* null strings, array elements and: Delete. (line 27)
-* null strings, as array subscripts: Uninitialized Subscripts.
- (line 43)
-* null strings, converting numbers to strings: Conversion. (line 21)
-* null strings, matching: Gory Details. (line 164)
-* null strings, quoting and: Quoting. (line 62)
-* number sign (#), #! (executable scripts): Executable Scripts.
- (line 6)
-* number sign (#), #! (executable scripts), portability issues with: Executable Scripts.
- (line 6)
-* number sign (#), commenting: Comments. (line 6)
-* numbers: Internals. (line 82)
-* numbers, as array subscripts: Numeric Array Subscripts.
- (line 6)
-* numbers, as values of characters: Ordinal Functions. (line 6)
-* numbers, Cliff random: Cliff Random Function.
- (line 6)
-* numbers, converting <1>: Bitwise Functions. (line 107)
-* numbers, converting: Conversion. (line 6)
-* numbers, converting, to strings: User-modified. (line 28)
-* numbers, floating-point: General Arithmetic. (line 6)
-* numbers, floating-point, AWKNUM internal type: Internals. (line 19)
-* numbers, hexadecimal: Nondecimal-numbers. (line 6)
-* numbers, NODE internal type: Internals. (line 23)
-* numbers, octal: Nondecimal-numbers. (line 6)
-* numbers, random: Numeric Functions. (line 64)
-* numbers, rounding: Round Function. (line 6)
-* numeric, constants: Scalar Constants. (line 6)
-* numeric, output format: OFMT. (line 6)
-* numeric, strings: Variable Typing. (line 6)
-* numeric, values: Internals. (line 27)
-* o debugger command (alias for option): Debugger Info. (line 57)
-* oawk utility: Names. (line 17)
-* obsolete features: Obsolete. (line 6)
-* octal numbers: Nondecimal-numbers. (line 6)
-* octal values, enabling interpretation of: Options. (line 193)
-* OFMT variable <1>: User-modified. (line 115)
-* OFMT variable <2>: Conversion. (line 55)
-* OFMT variable: OFMT. (line 15)
-* OFMT variable, POSIX awk and: OFMT. (line 27)
-* OFS variable <1>: User-modified. (line 124)
-* OFS variable <2>: Output Separators. (line 6)
-* OFS variable: Changing Fields. (line 64)
-* OpenBSD: Glossary. (line 611)
-* OpenSolaris: Other Versions. (line 87)
-* operating systems, BSD-based: Manual History. (line 28)
-* operating systems, PC, gawk on: PC Using. (line 6)
-* operating systems, PC, gawk on, installing: PC Installation.
- (line 6)
-* operating systems, porting gawk to: New Ports. (line 6)
-* operating systems, See Also GNU/Linux, PC operating systems, Unix: Installation.
- (line 6)
-* operations, bitwise: Bitwise Functions. (line 6)
-* operators, arithmetic: Arithmetic Ops. (line 6)
-* operators, assignment: Assignment Ops. (line 6)
-* operators, assignment, evaluation order: Assignment Ops. (line 111)
-* operators, Boolean, See Boolean expressions: Boolean Ops. (line 6)
-* operators, decrement/increment: Increment Ops. (line 6)
-* operators, GNU-specific: GNU Regexp Operators.
- (line 6)
-* operators, input/output <1>: Precedence. (line 65)
-* operators, input/output <2>: Redirection. (line 22)
-* operators, input/output <3>: Getline/Coprocess. (line 6)
-* operators, input/output <4>: Getline/Pipe. (line 6)
-* operators, input/output: Getline/File. (line 6)
-* operators, logical, See Boolean expressions: Boolean Ops. (line 6)
-* operators, precedence <1>: Precedence. (line 6)
-* operators, precedence: Increment Ops. (line 61)
-* operators, relational, See operators, comparison: Typing and Comparison.
- (line 9)
-* operators, short-circuit: Boolean Ops. (line 57)
-* operators, string: Concatenation. (line 9)
-* operators, string-matching: Regexp Usage. (line 19)
-* operators, string-matching, for buffers: GNU Regexp Operators.
- (line 48)
-* operators, word-boundary (gawk): GNU Regexp Operators.
- (line 63)
-* option debugger command: Debugger Info. (line 57)
-* options, command-line <1>: Command Line Field Separator.
- (line 6)
-* options, command-line <2>: Options. (line 6)
-* options, command-line: Long. (line 12)
-* options, command-line, end of: Options. (line 54)
-* options, command-line, invoking awk: Command Line. (line 6)
-* options, command-line, processing: Getopt Function. (line 6)
-* options, deprecated: Obsolete. (line 6)
-* options, long <1>: Options. (line 6)
-* options, long: Command Line. (line 13)
-* options, printing list of: Options. (line 154)
-* OR bitwise operation: Bitwise Functions. (line 6)
-* or Boolean-logic operator: Boolean Ops. (line 6)
-* or() function (gawk): Bitwise Functions. (line 48)
-* ord() user-defined function: Ordinal Functions. (line 16)
-* order of evaluation, concatenation: Concatenation. (line 42)
-* ORS variable <1>: User-modified. (line 129)
-* ORS variable: Output Separators. (line 20)
-* output field separator, See OFS variable: Changing Fields. (line 64)
-* output record separator, See ORS variable: Output Separators.
- (line 20)
-* output redirection: Redirection. (line 6)
-* output, buffering: I/O Functions. (line 29)
-* output, duplicating into files: Tee Program. (line 6)
-* output, files, closing: Close Files And Pipes.
- (line 6)
-* output, format specifier, OFMT: OFMT. (line 15)
-* output, formatted: Printf. (line 6)
-* output, pipes: Redirection. (line 57)
-* output, printing, See printing: Printing. (line 6)
-* output, records: Output Separators. (line 20)
-* output, standard: Special FD. (line 6)
-* p debugger command (alias for print): Viewing And Changing Data.
- (line 36)
-* P1003.1 POSIX standard: Glossary. (line 454)
-* P1003.2 POSIX standard: Glossary. (line 454)
-* parameters, number of: Internals. (line 42)
-* parentheses () <1>: Profiling. (line 138)
-* parentheses (): Regexp Operators. (line 79)
-* password file: Passwd Functions. (line 16)
-* patsplit() function: String Functions. (line 293)
-* patterns: Patterns and Actions.
- (line 6)
-* patterns, comparison expressions as: Expression Patterns. (line 14)
-* patterns, counts: Profiling. (line 110)
-* patterns, default: Very Simple. (line 34)
-* patterns, empty: Empty. (line 6)
-* patterns, expressions as: Regexp Patterns. (line 6)
-* patterns, ranges in: Ranges. (line 6)
-* patterns, regexp constants as: Expression Patterns. (line 36)
-* patterns, types of: Pattern Overview. (line 15)
-* pawk (profiling version of Brian Kernighan's awk): Other Versions.
- (line 69)
-* PC operating systems, gawk on: PC Using. (line 6)
-* PC operating systems, gawk on, installing: PC Installation. (line 6)
-* percent sign (%), % operator: Precedence. (line 55)
-* percent sign (%), %= operator <1>: Precedence. (line 95)
-* percent sign (%), %= operator: Assignment Ops. (line 129)
-* period (.): Regexp Operators. (line 43)
-* Perl: Future Extensions. (line 6)
-* Peters, Arno: Contributors. (line 85)
-* Peterson, Hal: Contributors. (line 40)
-* pipes, closing: Close Files And Pipes.
- (line 6)
-* pipes, input: Getline/Pipe. (line 6)
-* pipes, output: Redirection. (line 57)
-* Pitts, Dave <1>: Bugs. (line 73)
-* Pitts, Dave: Acknowledgments. (line 60)
-* plus sign (+): Regexp Operators. (line 102)
-* plus sign (+), + operator: Precedence. (line 52)
-* plus sign (+), ++ (decrement/increment operators): Increment Ops.
- (line 11)
-* plus sign (+), ++ operator <1>: Precedence. (line 46)
-* plus sign (+), ++ operator: Increment Ops. (line 40)
-* plus sign (+), += operator <1>: Precedence. (line 95)
-* plus sign (+), += operator: Assignment Ops. (line 82)
-* pointers to functions: Indirect Calls. (line 6)
-* portability: Escape Sequences. (line 94)
-* portability, #! (executable scripts): Executable Scripts. (line 34)
-* portability, ** operator and: Arithmetic Ops. (line 81)
-* portability, **= operator and: Assignment Ops. (line 142)
-* portability, ARGV variable: Executable Scripts. (line 43)
-* portability, backslash continuation and: Statements/Lines. (line 30)
-* portability, backslash in escape sequences: Escape Sequences.
- (line 113)
-* portability, close() function and: Close Files And Pipes.
- (line 81)
-* portability, data files as single record: Records. (line 175)
-* portability, deleting array elements: Delete. (line 52)
-* portability, example programs: Library Functions. (line 31)
-* portability, fflush() function and: I/O Functions. (line 29)
-* portability, functions, defining: Definition Syntax. (line 99)
-* portability, gawk: New Ports. (line 6)
-* portability, gettext library and: Explaining gettext. (line 10)
-* portability, internationalization and: I18N Portability. (line 6)
-* portability, length() function: String Functions. (line 175)
-* portability, new awk vs. old awk: Conversion. (line 55)
-* portability, next statement in user-defined functions: Pass By Value/Reference.
- (line 91)
-* portability, NF variable, decrementing: Changing Fields. (line 115)
-* portability, operators: Increment Ops. (line 61)
-* portability, operators, not in POSIX awk: Precedence. (line 98)
-* portability, POSIXLY_CORRECT environment variable: Options. (line 339)
-* portability, substr() function: String Functions. (line 512)
-* portable object files <1>: Translator i18n. (line 6)
-* portable object files: Explaining gettext. (line 36)
-* portable object files, converting to message object files: I18N Example.
- (line 62)
-* portable object files, generating: Options. (line 147)
-* portable object template files: Explaining gettext. (line 30)
-* porting gawk: New Ports. (line 6)
-* positional specifiers, printf statement <1>: Printf Ordering.
- (line 6)
-* positional specifiers, printf statement: Format Modifiers. (line 13)
-* positional specifiers, printf statement, mixing with regular formats: Printf Ordering.
- (line 57)
-* positive zero: Unexpected Results. (line 28)
-* POSIX awk <1>: Assignment Ops. (line 136)
-* POSIX awk: This Manual. (line 14)
-* POSIX awk, ** operator and: Precedence. (line 98)
-* POSIX awk, **= operator and: Assignment Ops. (line 142)
-* POSIX awk, < operator and: Getline/File. (line 26)
-* POSIX awk, arithmetic operators and: Arithmetic Ops. (line 36)
-* POSIX awk, backslashes in string constants: Escape Sequences.
- (line 113)
-* POSIX awk, BEGIN/END patterns: I/O And BEGIN/END. (line 16)
-* POSIX awk, bracket expressions and: Bracket Expressions. (line 24)
-* POSIX awk, bracket expressions and, character classes: Bracket Expressions.
- (line 30)
-* POSIX awk, break statement and: Break Statement. (line 51)
-* POSIX awk, changes in awk versions: POSIX. (line 6)
-* POSIX awk, continue statement and: Continue Statement. (line 43)
-* POSIX awk, CONVFMT variable and: User-modified. (line 28)
-* POSIX awk, date utility and: Time Functions. (line 261)
-* POSIX awk, field separators and <1>: Field Splitting Summary.
- (line 41)
-* POSIX awk, field separators and: Fields. (line 6)
-* POSIX awk, FS variable and: User-modified. (line 66)
-* POSIX awk, function keyword in: Definition Syntax. (line 83)
-* POSIX awk, functions and, gsub()/sub(): Gory Details. (line 54)
-* POSIX awk, functions and, length(): String Functions. (line 175)
-* POSIX awk, GNU long options and: Options. (line 15)
-* POSIX awk, interval expressions in: Regexp Operators. (line 135)
-* POSIX awk, next/nextfile statements and: Next Statement. (line 45)
-* POSIX awk, numeric strings and: Variable Typing. (line 6)
-* POSIX awk, OFMT variable and <1>: Conversion. (line 55)
-* POSIX awk, OFMT variable and: OFMT. (line 27)
-* POSIX awk, period (.), using: Regexp Operators. (line 50)
-* POSIX awk, printf format strings and: Format Modifiers. (line 159)
-* POSIX awk, regular expressions and: Regexp Operators. (line 161)
-* POSIX awk, timestamps and: Time Functions. (line 6)
-* POSIX awk, | I/O operator and: Getline/Pipe. (line 52)
-* POSIX mode: Options. (line 233)
-* POSIX, awk and: Preface. (line 23)
-* POSIX, gawk extensions not included in: POSIX/GNU. (line 6)
-* POSIX, programs, implementing in awk: Clones. (line 6)
-* POSIXLY_CORRECT environment variable: Options. (line 318)
-* PREC variable <1>: Setting Precision. (line 6)
-* PREC variable: User-modified. (line 134)
-* precedence <1>: Precedence. (line 6)
-* precedence: Increment Ops. (line 61)
-* precedence, regexp operators: Regexp Operators. (line 156)
-* print debugger command: Viewing And Changing Data.
- (line 36)
-* print statement: Printing. (line 16)
-* print statement, BEGIN/END patterns and: I/O And BEGIN/END. (line 16)
-* print statement, commas, omitting: Print Examples. (line 31)
-* print statement, I/O operators in: Precedence. (line 71)
-* print statement, line continuations and: Print Examples. (line 76)
-* print statement, OFMT variable and: User-modified. (line 124)
-* print statement, See Also redirection, of output: Redirection.
- (line 17)
-* print statement, sprintf() function and: Round Function. (line 6)
-* printf debugger command: Viewing And Changing Data.
- (line 54)
-* printf statement <1>: Printf. (line 6)
-* printf statement: Printing. (line 16)
-* printf statement, columns, aligning: Print Examples. (line 70)
-* printf statement, format-control characters: Control Letters.
- (line 6)
-* printf statement, I/O operators in: Precedence. (line 71)
-* printf statement, modifiers: Format Modifiers. (line 6)
-* printf statement, positional specifiers <1>: Printf Ordering.
- (line 6)
-* printf statement, positional specifiers: Format Modifiers. (line 13)
-* printf statement, positional specifiers, mixing with regular formats: Printf Ordering.
- (line 57)
-* printf statement, See Also redirection, of output: Redirection.
- (line 17)
-* printf statement, sprintf() function and: Round Function. (line 6)
-* printf statement, syntax of: Basic Printf. (line 6)
-* printing: Printing. (line 6)
-* printing, list of options: Options. (line 154)
-* printing, mailing labels: Labels Program. (line 6)
-* printing, unduplicated lines of text: Uniq Program. (line 6)
-* printing, user information: Id Program. (line 6)
-* private variables: Library Names. (line 11)
-* processes, two-way communications with: Two-way I/O. (line 23)
-* processing data: Basic High Level. (line 6)
-* PROCINFO array <1>: Internals. (line 146)
-* PROCINFO array <2>: Id Program. (line 15)
-* PROCINFO array <3>: Group Functions. (line 6)
-* PROCINFO array <4>: Passwd Functions. (line 6)
-* PROCINFO array <5>: Two-way I/O. (line 116)
-* PROCINFO array <6>: Time Functions. (line 46)
-* PROCINFO array <7>: Auto-set. (line 124)
-* PROCINFO array: Obsolete. (line 11)
-* profiling awk programs: Profiling. (line 6)
-* profiling awk programs, dynamically: Profiling. (line 171)
-* profiling gawk: Profiling. (line 6)
-* program, definition of: Getting Started. (line 21)
-* programmers, attractiveness of: Two-way I/O. (line 6)
-* programming conventions, --non-decimal-data option: Nondecimal Data.
- (line 36)
-* programming conventions, ARGC/ARGV variables: Auto-set. (line 31)
-* programming conventions, exit statement: Exit Statement. (line 38)
-* programming conventions, function parameters: Return Statement.
- (line 45)
-* programming conventions, functions, calling: Calling Built-in.
- (line 10)
-* programming conventions, functions, writing: Definition Syntax.
- (line 55)
-* programming conventions, gawk internals: Internal File Ops. (line 33)
-* programming conventions, private variable names: Library Names.
- (line 23)
-* programming language, recipe for: History. (line 6)
-* Programming languages, Ada: Glossary. (line 20)
-* programming languages, data-driven vs. procedural: Getting Started.
- (line 12)
-* Programming languages, Java: Glossary. (line 380)
-* programming, basic steps: Basic High Level. (line 19)
-* programming, concepts: Basic Concepts. (line 6)
-* pwcat program: Passwd Functions. (line 23)
-* q debugger command (alias for quit): Miscellaneous Debugger Commands.
- (line 101)
-* QSE Awk: Other Versions. (line 109)
-* question mark (?) regexp operator <1>: GNU Regexp Operators.
- (line 59)
-* question mark (?) regexp operator: Regexp Operators. (line 111)
-* question mark (?), ?: operator: Precedence. (line 92)
-* QuikTrim Awk: Other Versions. (line 113)
-* quit debugger command: Miscellaneous Debugger Commands.
- (line 101)
-* QUIT signal (MS-Windows): Profiling. (line 206)
-* quoting <1>: Comments. (line 27)
-* quoting <2>: Long. (line 26)
-* quoting: Read Terminal. (line 25)
-* quoting, rules for: Quoting. (line 6)
-* quoting, tricks for: Quoting. (line 71)
-* r debugger command (alias for run): Debugger Execution Control.
- (line 62)
-* Rakitzis, Byron: History Sorting. (line 25)
-* rand() function: Numeric Functions. (line 34)
-* random numbers, Cliff: Cliff Random Function.
- (line 6)
-* random numbers, rand()/srand() functions: Numeric Functions.
- (line 34)
-* random numbers, seed of: Numeric Functions. (line 64)
-* range expressions (regexps): Bracket Expressions. (line 6)
-* range patterns: Ranges. (line 6)
-* Rankin, Pat <1>: Bugs. (line 72)
-* Rankin, Pat <2>: Contributors. (line 38)
-* Rankin, Pat <3>: Assignment Ops. (line 100)
-* Rankin, Pat: Acknowledgments. (line 60)
-* readable data files, checking: File Checking. (line 6)
-* readable.awk program: File Checking. (line 11)
-* recipe for a programming language: History. (line 6)
-* record separators <1>: User-modified. (line 143)
-* record separators: Records. (line 14)
-* record separators, changing: Records. (line 81)
-* record separators, regular expressions as: Records. (line 112)
-* record separators, with multiline records: Multiple Line. (line 10)
-* records <1>: Basic High Level. (line 71)
-* records: Reading Files. (line 14)
-* records, multiline: Multiple Line. (line 6)
-* records, printing: Print. (line 22)
-* records, splitting input into: Records. (line 6)
-* records, terminating: Records. (line 112)
-* records, treating files as: Records. (line 196)
-* recursive functions: Definition Syntax. (line 73)
-* redirection of input: Getline/File. (line 6)
-* redirection of output: Redirection. (line 6)
-* reference counting, sorting arrays: Array Sorting Functions.
- (line 75)
-* regexp constants <1>: Comparison Operators.
- (line 103)
-* regexp constants <2>: Regexp Constants. (line 6)
-* regexp constants: Regexp Usage. (line 57)
-* regexp constants, /=.../, /= operator and: Assignment Ops. (line 148)
-* regexp constants, as patterns: Expression Patterns. (line 36)
-* regexp constants, in gawk: Using Constant Regexps.
- (line 28)
-* regexp constants, slashes vs. quotes: Computed Regexps. (line 28)
-* regexp constants, vs. string constants: Computed Regexps. (line 38)
-* regexp, See regular expressions: Regexp. (line 6)
-* register_deferred_variable() internal function: Internals. (line 146)
-* register_open_hook() internal function: Internals. (line 157)
-* regular expressions: Regexp. (line 6)
-* regular expressions as field separators: Field Separators. (line 50)
-* regular expressions, anchors in: Regexp Operators. (line 22)
-* regular expressions, as field separators: Regexp Field Splitting.
- (line 6)
-* regular expressions, as patterns <1>: Regexp Patterns. (line 6)
-* regular expressions, as patterns: Regexp Usage. (line 6)
-* regular expressions, as record separators: Records. (line 112)
-* regular expressions, case sensitivity <1>: User-modified. (line 82)
-* regular expressions, case sensitivity: Case-sensitivity. (line 6)
-* regular expressions, computed: Computed Regexps. (line 6)
-* regular expressions, constants, See regexp constants: Regexp Usage.
- (line 57)
-* regular expressions, dynamic: Computed Regexps. (line 6)
-* regular expressions, dynamic, with embedded newlines: Computed Regexps.
- (line 59)
-* regular expressions, gawk, command-line options: GNU Regexp Operators.
- (line 70)
-* regular expressions, interval expressions and: Options. (line 258)
-* regular expressions, leftmost longest match: Leftmost Longest.
- (line 6)
-* regular expressions, operators <1>: Regexp Operators. (line 6)
-* regular expressions, operators: Regexp Usage. (line 19)
-* regular expressions, operators, for buffers: GNU Regexp Operators.
- (line 48)
-* regular expressions, operators, for words: GNU Regexp Operators.
- (line 6)
-* regular expressions, operators, gawk: GNU Regexp Operators.
- (line 6)
-* regular expressions, operators, precedence of: Regexp Operators.
- (line 156)
-* regular expressions, searching for: Egrep Program. (line 6)
-* relational operators, See comparison operators: Typing and Comparison.
- (line 9)
-* return debugger command: Debugger Execution Control.
- (line 54)
-* return statement, user-defined functions: Return Statement. (line 6)
-* return values, close() function: Close Files And Pipes.
- (line 131)
-* rev() user-defined function: Function Example. (line 52)
-* rewind() user-defined function: Rewind Function. (line 16)
-* right angle bracket (>), > operator <1>: Precedence. (line 65)
-* right angle bracket (>), > operator: Comparison Operators.
- (line 11)
-* right angle bracket (>), > operator (I/O): Redirection. (line 22)
-* right angle bracket (>), >= operator <1>: Precedence. (line 65)
-* right angle bracket (>), >= operator: Comparison Operators.
- (line 11)
-* right angle bracket (>), >> operator (I/O) <1>: Precedence. (line 65)
-* right angle bracket (>), >> operator (I/O): Redirection. (line 50)
-* right shift, bitwise: Bitwise Functions. (line 32)
-* Ritchie, Dennis: Basic Data Typing. (line 55)
-* RLENGTH variable: Auto-set. (line 201)
-* RLENGTH variable, match() function and: String Functions. (line 223)
-* Robbins, Arnold <1>: Future Extensions. (line 6)
-* Robbins, Arnold <2>: Bugs. (line 32)
-* Robbins, Arnold <3>: Contributors. (line 108)
-* Robbins, Arnold <4>: Alarm Program. (line 6)
-* Robbins, Arnold <5>: Passwd Functions. (line 90)
-* Robbins, Arnold <6>: Getline/Pipe. (line 36)
-* Robbins, Arnold: Command Line Field Separator.
- (line 80)
-* Robbins, Bill: Getline/Pipe. (line 36)
-* Robbins, Harry: Acknowledgments. (line 83)
-* Robbins, Jean: Acknowledgments. (line 83)
-* Robbins, Miriam <1>: Passwd Functions. (line 90)
-* Robbins, Miriam <2>: Getline/Pipe. (line 36)
-* Robbins, Miriam: Acknowledgments. (line 83)
-* Robinson, Will: Dynamic Extensions. (line 6)
-* robot, the: Dynamic Extensions. (line 6)
-* Rommel, Kai Uwe: Contributors. (line 43)
-* round() user-defined function: Round Function. (line 16)
-* rounding mode, floating-point: Rounding Mode. (line 6)
-* rounding numbers: Round Function. (line 6)
-* ROUNDMODE variable <1>: Setting Rounding Mode.
- (line 6)
-* ROUNDMODE variable: User-modified. (line 138)
-* RS variable <1>: User-modified. (line 143)
-* RS variable: Records. (line 20)
-* RS variable, multiline records and: Multiple Line. (line 17)
-* rshift() function (gawk): Bitwise Functions. (line 51)
-* RSTART variable: Auto-set. (line 207)
-* RSTART variable, match() function and: String Functions. (line 223)
-* RT variable <1>: Auto-set. (line 214)
-* RT variable <2>: Getline/Variable/File.
- (line 10)
-* RT variable <3>: Multiple Line. (line 129)
-* RT variable: Records. (line 112)
-* Rubin, Paul <1>: Contributors. (line 16)
-* Rubin, Paul: History. (line 30)
-* rule, definition of: Getting Started. (line 21)
-* run debugger command: Debugger Execution Control.
- (line 62)
-* rvalues/lvalues: Assignment Ops. (line 32)
-* s debugger command (alias for step): Debugger Execution Control.
- (line 68)
-* sandbox mode: Options. (line 265)
-* scalar values: Basic Data Typing. (line 13)
-* Schorr, Andrew: Acknowledgments. (line 60)
-* Schreiber, Bert: Acknowledgments. (line 38)
-* Schreiber, Rita: Acknowledgments. (line 38)
-* search paths <1>: VMS Running. (line 29)
-* search paths <2>: PC Using. (line 11)
-* search paths <3>: Igawk Program. (line 368)
-* search paths <4>: AWKLIBPATH Variable. (line 6)
-* search paths: AWKPATH Variable. (line 6)
-* search paths, for shared libraries: AWKLIBPATH Variable. (line 6)
-* search paths, for source files <1>: VMS Running. (line 29)
-* search paths, for source files <2>: PC Using. (line 11)
-* search paths, for source files <3>: Igawk Program. (line 368)
-* search paths, for source files: AWKPATH Variable. (line 6)
-* searching: String Functions. (line 155)
-* searching, files for regular expressions: Egrep Program. (line 6)
-* searching, for words: Dupword Program. (line 6)
-* sed utility <1>: Glossary. (line 12)
-* sed utility <2>: Simple Sed. (line 6)
-* sed utility: Field Splitting Summary.
- (line 47)
-* semicolon (;): Statements/Lines. (line 91)
-* semicolon (;), AWKPATH variable and: PC Using. (line 11)
-* semicolon (;), separating statements in actions <1>: Statements.
- (line 10)
-* semicolon (;), separating statements in actions: Action Overview.
- (line 19)
-* separators, field: User-modified. (line 56)
-* separators, field, FIELDWIDTHS variable and: User-modified. (line 35)
-* separators, field, FPAT variable and: User-modified. (line 45)
-* separators, field, POSIX and: Fields. (line 6)
-* separators, for records <1>: User-modified. (line 143)
-* separators, for records: Records. (line 14)
-* separators, for records, regular expressions as: Records. (line 112)
-* separators, for statements in actions: Action Overview. (line 19)
-* separators, subscript: User-modified. (line 156)
-* set debugger command: Viewing And Changing Data.
- (line 59)
-* shells, piping commands into: Redirection. (line 143)
-* shells, quoting: Using Shell Variables.
- (line 12)
-* shells, quoting, rules for: Quoting. (line 18)
-* shells, scripts: One-shot. (line 22)
-* shells, variables: Using Shell Variables.
- (line 6)
-* shift, bitwise: Bitwise Functions. (line 32)
-* short-circuit operators: Boolean Ops. (line 57)
-* si debugger command (alias for stepi): Debugger Execution Control.
- (line 76)
-* side effects <1>: Increment Ops. (line 11)
-* side effects: Concatenation. (line 42)
-* side effects, array indexing: Reference to Elements.
- (line 42)
-* side effects, asort() function: Array Sorting Functions.
- (line 24)
-* side effects, assignment expressions: Assignment Ops. (line 23)
-* side effects, Boolean operators: Boolean Ops. (line 30)
-* side effects, conditional expressions: Conditional Exp. (line 22)
-* side effects, decrement/increment operators: Increment Ops. (line 11)
-* side effects, FILENAME variable: Getline Notes. (line 19)
-* side effects, function calls: Function Calls. (line 54)
-* side effects, statements: Action Overview. (line 32)
-* SIGHUP signal: Profiling. (line 203)
-* SIGINT signal (MS-Windows): Profiling. (line 206)
-* signals, HUP/SIGHUP: Profiling. (line 203)
-* signals, INT/SIGINT (MS-Windows): Profiling. (line 206)
-* signals, QUIT/SIGQUIT (MS-Windows): Profiling. (line 206)
-* signals, USR1/SIGUSR1: Profiling. (line 180)
-* SIGQUIT signal (MS-Windows): Profiling. (line 206)
-* SIGUSR1 signal: Profiling. (line 180)
-* silent debugger command: Debugger Execution Control.
- (line 10)
-* sin() function: Numeric Functions. (line 75)
-* single precision floating-point: General Arithmetic. (line 21)
-* single quote (') <1>: Quoting. (line 31)
-* single quote (') <2>: Long. (line 33)
-* single quote ('): One-shot. (line 15)
-* single quote ('), vs. apostrophe: Comments. (line 27)
-* single quote ('), with double quotes: Quoting. (line 53)
-* single-character fields: Single Character Fields.
- (line 6)
-* Skywalker, Luke: Undocumented. (line 6)
-* sleep utility: Alarm Program. (line 109)
-* Solaris, POSIX-compliant awk: Other Versions. (line 87)
-* sort function, arrays, sorting: Array Sorting Functions.
- (line 6)
-* sort utility: Word Sorting. (line 50)
-* sort utility, coprocesses and: Two-way I/O. (line 83)
-* sorting characters in different languages: Explaining gettext.
- (line 93)
-* source code, awka: Other Versions. (line 55)
-* source code, Brian Kernighan's awk: Other Versions. (line 13)
-* source code, Busybox Awk: Other Versions. (line 79)
-* source code, gawk: Gawk Distribution. (line 6)
-* source code, jawk: Other Versions. (line 97)
-* source code, libmawk: Other Versions. (line 105)
-* source code, mawk: Other Versions. (line 35)
-* source code, mixing: Options. (line 117)
-* source code, pawk: Other Versions. (line 69)
-* source code, QSE Awk: Other Versions. (line 109)
-* source code, QuikTrim Awk: Other Versions. (line 113)
-* source code, Solaris awk: Other Versions. (line 87)
-* source code, xgawk: Other Versions. (line 120)
-* source files, search path for: Igawk Program. (line 368)
-* sparse arrays: Array Intro. (line 71)
-* Spencer, Henry: Glossary. (line 12)
-* split utility: Split Program. (line 6)
-* split() function: String Functions. (line 315)
-* split() function, array elements, deleting: Delete. (line 57)
-* split.awk program: Split Program. (line 30)
-* sprintf() function <1>: String Functions. (line 380)
-* sprintf() function: OFMT. (line 15)
-* sprintf() function, OFMT variable and: User-modified. (line 124)
-* sprintf() function, print/printf statements and: Round Function.
- (line 6)
-* sqrt() function: Numeric Functions. (line 78)
-* square brackets ([]): Regexp Operators. (line 55)
-* srand() function: Numeric Functions. (line 82)
-* Stallman, Richard <1>: Glossary. (line 301)
-* Stallman, Richard <2>: Contributors. (line 24)
-* Stallman, Richard <3>: Acknowledgments. (line 18)
-* Stallman, Richard: Manual History. (line 6)
-* standard error: Special FD. (line 6)
-* standard input <1>: Special FD. (line 6)
-* standard input: Read Terminal. (line 6)
-* standard output: Special FD. (line 6)
-* stat() function, implementing in gawk: Sample Library. (line 6)
-* statements, compound, control statements and: Statements. (line 10)
-* statements, control, in actions: Statements. (line 6)
-* statements, multiple: Statements/Lines. (line 91)
-* step debugger command: Debugger Execution Control.
- (line 68)
-* stepi debugger command: Debugger Execution Control.
- (line 76)
-* stlen internal variable: Internals. (line 46)
-* stptr internal variable: Internals. (line 46)
-* stream editors <1>: Simple Sed. (line 6)
-* stream editors: Field Splitting Summary.
- (line 47)
-* strftime() function (gawk): Time Functions. (line 47)
-* string constants: Scalar Constants. (line 15)
-* string constants, vs. regexp constants: Computed Regexps. (line 38)
-* string extraction (internationalization): String Extraction.
- (line 6)
-* string operators: Concatenation. (line 9)
-* string-matching operators: Regexp Usage. (line 19)
-* strings: Internals. (line 77)
-* strings, converting <1>: Bitwise Functions. (line 107)
-* strings, converting: Conversion. (line 6)
-* strings, converting, numbers to: User-modified. (line 28)
-* strings, empty, See null strings: Records. (line 102)
-* strings, extracting: String Extraction. (line 6)
-* strings, for localization: Programmer i18n. (line 14)
-* strings, length of: Scalar Constants. (line 20)
-* strings, merging arrays into: Join Function. (line 6)
-* strings, NODE internal type: Internals. (line 23)
-* strings, null: Regexp Field Splitting.
- (line 43)
-* strings, numeric: Variable Typing. (line 6)
-* strings, splitting: String Functions. (line 335)
-* strtonum() function (gawk): String Functions. (line 387)
-* strtonum() function (gawk), --non-decimal-data option and: Nondecimal Data.
- (line 36)
-* sub() function <1>: String Functions. (line 408)
-* sub() function: Using Constant Regexps.
- (line 43)
-* sub() function, arguments of: String Functions. (line 462)
-* sub() function, escape processing: Gory Details. (line 6)
-* subscript separators: User-modified. (line 156)
-* subscripts in arrays, multidimensional: Multi-dimensional. (line 10)
-* subscripts in arrays, multidimensional, scanning: Multi-scanning.
- (line 11)
-* subscripts in arrays, numbers as: Numeric Array Subscripts.
- (line 6)
-* subscripts in arrays, uninitialized variables as: Uninitialized Subscripts.
- (line 6)
-* SUBSEP variable: User-modified. (line 156)
-* SUBSEP variable, multidimensional arrays: Multi-dimensional.
- (line 16)
-* substr() function: String Functions. (line 481)
-* Sumner, Andrew: Other Versions. (line 55)
-* switch statement: Switch Statement. (line 6)
-* syntactic ambiguity: /= operator vs. /=.../ regexp constant: Assignment Ops.
- (line 148)
-* system() function: I/O Functions. (line 63)
-* systime() function (gawk): Time Functions. (line 64)
-* t debugger command (alias for tbreak): Breakpoint Control. (line 90)
-* tbreak debugger command: Breakpoint Control. (line 90)
-* Tcl: Library Names. (line 57)
-* TCP/IP: TCP/IP Networking. (line 6)
-* TCP/IP, support for: Special Network. (line 6)
-* tee utility: Tee Program. (line 6)
-* tee.awk program: Tee Program. (line 26)
-* terminating records: Records. (line 112)
-* testbits.awk program: Bitwise Functions. (line 68)
-* Texinfo <1>: Adding Code. (line 99)
-* Texinfo <2>: Distribution contents.
- (line 79)
-* Texinfo <3>: Extract Program. (line 12)
-* Texinfo <4>: Dupword Program. (line 17)
-* Texinfo <5>: Library Functions. (line 22)
-* Texinfo <6>: Sample Data Files. (line 66)
-* Texinfo: Conventions. (line 6)
-* Texinfo, chapter beginnings in files: Regexp Operators. (line 22)
-* Texinfo, extracting programs from source files: Extract Program.
- (line 6)
-* text, printing: Print. (line 22)
-* text, printing, unduplicated lines of: Uniq Program. (line 6)
-* TEXTDOMAIN variable <1>: Programmer i18n. (line 9)
-* TEXTDOMAIN variable: User-modified. (line 162)
-* TEXTDOMAIN variable, BEGIN pattern and: Programmer i18n. (line 60)
-* TEXTDOMAIN variable, portability and: I18N Portability. (line 20)
-* textdomain() function (C library): Explaining gettext. (line 27)
-* tilde (~), ~ operator <1>: Expression Patterns. (line 24)
-* tilde (~), ~ operator <2>: Precedence. (line 80)
-* tilde (~), ~ operator <3>: Comparison Operators.
- (line 11)
-* tilde (~), ~ operator <4>: Regexp Constants. (line 6)
-* tilde (~), ~ operator <5>: Computed Regexps. (line 6)
-* tilde (~), ~ operator <6>: Case-sensitivity. (line 26)
-* tilde (~), ~ operator: Regexp Usage. (line 19)
-* time, alarm clock example program: Alarm Program. (line 9)
-* time, localization and: Explaining gettext. (line 115)
-* time, managing: Gettimeofday Function.
- (line 6)
-* time, retrieving: Time Functions. (line 17)
-* timeout, reading input: Read Timeout. (line 6)
-* timestamps: Time Functions. (line 6)
-* timestamps, converting dates to: Time Functions. (line 74)
-* timestamps, formatted: Gettimeofday Function.
- (line 6)
-* tolower() function: String Functions. (line 523)
-* toupper() function: String Functions. (line 529)
-* tr utility: Translate Program. (line 6)
-* trace debugger command: Miscellaneous Debugger Commands.
- (line 110)
-* translate.awk program: Translate Program. (line 55)
-* troubleshooting, --non-decimal-data option: Options. (line 193)
-* troubleshooting, == operator: Comparison Operators.
- (line 37)
-* troubleshooting, awk uses FS not IFS: Field Separators. (line 29)
-* troubleshooting, backslash before nonspecial character: Escape Sequences.
- (line 113)
-* troubleshooting, division: Arithmetic Ops. (line 44)
-* troubleshooting, fatal errors, field widths, specifying: Constant Size.
- (line 22)
-* troubleshooting, fatal errors, printf format strings: Format Modifiers.
- (line 159)
-* troubleshooting, fflush() function: I/O Functions. (line 51)
-* troubleshooting, function call syntax: Function Calls. (line 28)
-* troubleshooting, gawk: Compatibility Mode. (line 6)
-* troubleshooting, gawk, bug reports: Bugs. (line 9)
-* troubleshooting, gawk, fatal errors, function arguments: Calling Built-in.
- (line 16)
-* troubleshooting, getline function: File Checking. (line 25)
-* troubleshooting, gsub()/sub() functions: String Functions. (line 472)
-* troubleshooting, match() function: String Functions. (line 288)
-* troubleshooting, patsplit() function: String Functions. (line 311)
-* troubleshooting, print statement, omitting commas: Print Examples.
- (line 31)
-* troubleshooting, printing: Redirection. (line 118)
-* troubleshooting, quotes with file names: Special FD. (line 68)
-* troubleshooting, readable data files: File Checking. (line 6)
-* troubleshooting, regexp constants vs. string constants: Computed Regexps.
- (line 38)
-* troubleshooting, string concatenation: Concatenation. (line 27)
-* troubleshooting, substr() function: String Functions. (line 499)
-* troubleshooting, system() function: I/O Functions. (line 85)
-* troubleshooting, typographical errors, global variables: Options.
- (line 98)
-* true, logical: Truth Values. (line 6)
-* Trueman, David <1>: Contributors. (line 31)
-* Trueman, David <2>: Acknowledgments. (line 47)
-* Trueman, David: History. (line 30)
-* trunc-mod operation: Arithmetic Ops. (line 66)
-* truth values: Truth Values. (line 6)
-* type conversion: Conversion. (line 21)
-* type internal variable: Internals. (line 59)
-* u debugger command (alias for until): Debugger Execution Control.
- (line 83)
-* undefined functions: Pass By Value/Reference.
- (line 71)
-* underscore (_), _ C macro: Explaining gettext. (line 70)
-* underscore (_), in names of private variables: Library Names.
- (line 29)
-* underscore (_), translatable string: Programmer i18n. (line 69)
-* undisplay debugger command: Viewing And Changing Data.
- (line 80)
-* undocumented features: Undocumented. (line 6)
-* Unicode: Glossary. (line 141)
-* uninitialized variables, as array subscripts: Uninitialized Subscripts.
- (line 6)
-* uniq utility: Uniq Program. (line 6)
-* uniq.awk program: Uniq Program. (line 65)
-* Unix: Glossary. (line 611)
-* Unix awk, backslashes in escape sequences: Escape Sequences.
- (line 125)
-* Unix awk, close() function and: Close Files And Pipes.
- (line 131)
-* Unix awk, password files, field separators and: Command Line Field Separator.
- (line 72)
-* Unix, awk scripts and: Executable Scripts. (line 6)
-* UNIXROOT variable, on OS/2 systems: PC Using. (line 17)
-* unref() internal function: Internals. (line 92)
-* unset_ERRNO() internal function: Internals. (line 141)
-* unsigned integers: General Arithmetic. (line 15)
-* until debugger command: Debugger Execution Control.
- (line 83)
-* unwatch debugger command: Viewing And Changing Data.
- (line 84)
-* up debugger command: Execution Stack. (line 33)
-* update_ERRNO_int() internal function: Internals. (line 130)
-* update_ERRNO_string() internal function: Internals. (line 135)
-* user database, reading: Passwd Functions. (line 6)
-* user-defined, functions: User-defined. (line 6)
-* user-defined, functions, counts: Profiling. (line 129)
-* user-defined, variables: Variables. (line 6)
-* user-modifiable variables: User-modified. (line 6)
-* users, information about, printing: Id Program. (line 6)
-* users, information about, retrieving: Passwd Functions. (line 16)
-* USR1 signal: Profiling. (line 180)
-* values, numeric: Basic Data Typing. (line 13)
-* values, string: Basic Data Typing. (line 13)
-* variable typing: Typing and Comparison.
- (line 9)
-* variables <1>: Basic Data Typing. (line 6)
-* variables: Other Features. (line 6)
-* variables, assigning on command line: Assignment Options. (line 6)
-* variables, built-in <1>: Built-in Variables. (line 6)
-* variables, built-in: Using Variables. (line 20)
-* variables, built-in, -v option, setting with: Options. (line 40)
-* variables, built-in, conveying information: Auto-set. (line 6)
-* variables, flag: Boolean Ops. (line 67)
-* variables, getline command into, using <1>: Getline/Variable/Coprocess.
- (line 6)
-* variables, getline command into, using <2>: Getline/Variable/Pipe.
- (line 6)
-* variables, getline command into, using <3>: Getline/Variable/File.
- (line 6)
-* variables, getline command into, using: Getline/Variable. (line 6)
-* variables, global, for library functions: Library Names. (line 11)
-* variables, global, printing list of: Options. (line 93)
-* variables, initializing: Using Variables. (line 20)
-* variables, local: Variable Scope. (line 6)
-* variables, names of: Arrays. (line 18)
-* variables, private: Library Names. (line 11)
-* variables, setting: Options. (line 32)
-* variables, shadowing: Definition Syntax. (line 61)
-* variables, types of: Assignment Ops. (line 40)
-* variables, types of, comparison expressions and: Typing and Comparison.
- (line 9)
-* variables, uninitialized, as array subscripts: Uninitialized Subscripts.
- (line 6)
-* variables, user-defined: Variables. (line 6)
-* vertical bar (|): Regexp Operators. (line 69)
-* vertical bar (|), | operator (I/O) <1>: Precedence. (line 65)
-* vertical bar (|), | operator (I/O): Getline/Pipe. (line 6)
-* vertical bar (|), |& operator (I/O) <1>: Two-way I/O. (line 44)
-* vertical bar (|), |& operator (I/O) <2>: Precedence. (line 65)
-* vertical bar (|), |& operator (I/O): Getline/Coprocess. (line 6)
-* vertical bar (|), || operator <1>: Precedence. (line 89)
-* vertical bar (|), || operator: Boolean Ops. (line 57)
-* Vinschen, Corinna: Acknowledgments. (line 60)
-* vname internal variable: Internals. (line 64)
-* w debugger command (alias for watch): Viewing And Changing Data.
- (line 67)
-* w utility: Constant Size. (line 22)
-* walk_array() user-defined function: Walking Arrays. (line 14)
-* Wall, Larry <1>: Future Extensions. (line 6)
-* Wall, Larry: Array Intro. (line 6)
-* Wallin, Anders: Acknowledgments. (line 60)
-* warnings, issuing: Options. (line 168)
-* watch debugger command: Viewing And Changing Data.
- (line 67)
-* wc utility: Wc Program. (line 6)
-* wc.awk program: Wc Program. (line 46)
-* Weinberger, Peter <1>: Contributors. (line 12)
-* Weinberger, Peter: History. (line 17)
-* while statement <1>: While Statement. (line 6)
-* while statement: Regexp Usage. (line 19)
-* whitespace, as field separators: Default Field Splitting.
- (line 6)
-* whitespace, functions, calling: Calling Built-in. (line 10)
-* whitespace, newlines as: Options. (line 239)
-* Williams, Kent: Contributors. (line 35)
-* Woehlke, Matthew: Contributors. (line 79)
-* Woods, John: Contributors. (line 28)
-* word boundaries, matching: GNU Regexp Operators.
- (line 38)
-* word, regexp definition of: GNU Regexp Operators.
- (line 6)
-* word-boundary operator (gawk): GNU Regexp Operators.
- (line 63)
-* wordfreq.awk program: Word Sorting. (line 56)
-* words, counting: Wc Program. (line 6)
-* words, duplicate, searching for: Dupword Program. (line 6)
-* words, usage counts, generating: Word Sorting. (line 6)
-* wstlen internal variable: Internals. (line 54)
-* wstptr internal variable: Internals. (line 54)
-* xgawk: Other Versions. (line 120)
-* xgettext utility: String Extraction. (line 13)
-* XML (eXtensible Markup Language): Internals. (line 157)
-* XOR bitwise operation: Bitwise Functions. (line 6)
-* xor() function (gawk): Bitwise Functions. (line 54)
-* Yawitz, Efraim: Contributors. (line 106)
-* Zaretskii, Eli <1>: Bugs. (line 70)
-* Zaretskii, Eli <2>: Contributors. (line 56)
-* Zaretskii, Eli: Acknowledgments. (line 60)
-* zero, negative vs. positive: Unexpected Results. (line 28)
-* zerofile.awk program: Empty Files. (line 21)
-* Zoulas, Christos: Contributors. (line 67)
-* {} (braces): Profiling. (line 134)
-* {} (braces), actions and: Action Overview. (line 19)
-* {} (braces), statements, grouping: Statements. (line 10)
-* | (vertical bar): Regexp Operators. (line 69)
-* | (vertical bar), | operator (I/O) <1>: Precedence. (line 65)
-* | (vertical bar), | operator (I/O) <2>: Redirection. (line 57)
-* | (vertical bar), | operator (I/O): Getline/Pipe. (line 6)
-* | (vertical bar), |& operator (I/O) <1>: Two-way I/O. (line 44)
-* | (vertical bar), |& operator (I/O) <2>: Precedence. (line 65)
-* | (vertical bar), |& operator (I/O) <3>: Redirection. (line 102)
-* | (vertical bar), |& operator (I/O): Getline/Coprocess. (line 6)
-* | (vertical bar), |& operator (I/O), pipes, closing: Close Files And Pipes.
- (line 118)
-* | (vertical bar), || operator <1>: Precedence. (line 89)
-* | (vertical bar), || operator: Boolean Ops. (line 57)
-* ~ (tilde), ~ operator <1>: Expression Patterns. (line 24)
-* ~ (tilde), ~ operator <2>: Precedence. (line 80)
-* ~ (tilde), ~ operator <3>: Comparison Operators.
- (line 11)
-* ~ (tilde), ~ operator <4>: Regexp Constants. (line 6)
-* ~ (tilde), ~ operator <5>: Computed Regexps. (line 6)
-* ~ (tilde), ~ operator <6>: Case-sensitivity. (line 26)
-* ~ (tilde), ~ operator: Regexp Usage. (line 19)
-
-
+Indirect:
+gawk.info-1: 1351
+gawk.info-2: 297599
+gawk.info-3: 596405
+gawk.info-4: 896127
+gawk.info-5: 1043758

Tag Table:
-Node: Top1352
-Node: Foreword31919
-Node: Preface36264
-Ref: Preface-Footnote-139317
-Ref: Preface-Footnote-239423
-Node: History39655
-Node: Names42046
-Ref: Names-Footnote-143523
-Node: This Manual43595
-Ref: This Manual-Footnote-148533
-Node: Conventions48633
-Node: Manual History50767
-Ref: Manual History-Footnote-154037
-Ref: Manual History-Footnote-254078
-Node: How To Contribute54152
-Node: Acknowledgments55296
-Node: Getting Started59792
-Node: Running gawk62171
-Node: One-shot63357
-Node: Read Terminal64582
-Ref: Read Terminal-Footnote-166232
-Ref: Read Terminal-Footnote-266508
-Node: Long66679
-Node: Executable Scripts68055
-Ref: Executable Scripts-Footnote-169924
-Ref: Executable Scripts-Footnote-270026
-Node: Comments70573
-Node: Quoting73040
-Node: DOS Quoting77663
-Node: Sample Data Files78338
-Node: Very Simple81370
-Node: Two Rules85969
-Node: More Complex88116
-Ref: More Complex-Footnote-191046
-Node: Statements/Lines91131
-Ref: Statements/Lines-Footnote-195593
-Node: Other Features95858
-Node: When96786
-Node: Invoking Gawk98933
-Node: Command Line100394
-Node: Options101177
-Ref: Options-Footnote-1115819
-Node: Other Arguments115844
-Node: Naming Standard Input118502
-Node: Environment Variables119596
-Node: AWKPATH Variable120154
-Ref: AWKPATH Variable-Footnote-1122743
-Node: AWKLIBPATH Variable123003
-Node: Other Environment Variables123600
-Node: Exit Status126095
-Node: Include Files126770
-Node: Loading Shared Libraries130271
-Node: Obsolete131496
-Node: Undocumented132193
-Node: Regexp132436
-Node: Regexp Usage133825
-Node: Escape Sequences135851
-Node: Regexp Operators141614
-Ref: Regexp Operators-Footnote-1148994
-Ref: Regexp Operators-Footnote-2149141
-Node: Bracket Expressions149239
-Ref: table-char-classes151129
-Node: GNU Regexp Operators153652
-Node: Case-sensitivity157375
-Ref: Case-sensitivity-Footnote-1160343
-Ref: Case-sensitivity-Footnote-2160578
-Node: Leftmost Longest160686
-Node: Computed Regexps161887
-Node: Reading Files165297
-Node: Records167300
-Ref: Records-Footnote-1175974
-Node: Fields176011
-Ref: Fields-Footnote-1179044
-Node: Nonconstant Fields179130
-Node: Changing Fields181332
-Node: Field Separators187313
-Node: Default Field Splitting189942
-Node: Regexp Field Splitting191059
-Node: Single Character Fields194401
-Node: Command Line Field Separator195460
-Node: Field Splitting Summary198901
-Ref: Field Splitting Summary-Footnote-1202093
-Node: Constant Size202194
-Node: Splitting By Content206778
-Ref: Splitting By Content-Footnote-1210504
-Node: Multiple Line210544
-Ref: Multiple Line-Footnote-1216391
-Node: Getline216570
-Node: Plain Getline218786
-Node: Getline/Variable220875
-Node: Getline/File222016
-Node: Getline/Variable/File223338
-Ref: Getline/Variable/File-Footnote-1224937
-Node: Getline/Pipe225024
-Node: Getline/Variable/Pipe227584
-Node: Getline/Coprocess228691
-Node: Getline/Variable/Coprocess229934
-Node: Getline Notes230648
-Node: Getline Summary232590
-Ref: table-getline-variants232933
-Node: Read Timeout233789
-Ref: Read Timeout-Footnote-1237534
-Node: Command line directories237591
-Node: Printing238221
-Node: Print239852
-Node: Print Examples241189
-Node: Output Separators243973
-Node: OFMT245733
-Node: Printf247091
-Node: Basic Printf247997
-Node: Control Letters249536
-Node: Format Modifiers253348
-Node: Printf Examples259357
-Node: Redirection262072
-Node: Special Files269056
-Node: Special FD269589
-Ref: Special FD-Footnote-1273214
-Node: Special Network273288
-Node: Special Caveats274138
-Node: Close Files And Pipes274934
-Ref: Close Files And Pipes-Footnote-1281957
-Ref: Close Files And Pipes-Footnote-2282105
-Node: Expressions282255
-Node: Values283387
-Node: Constants284063
-Node: Scalar Constants284743
-Ref: Scalar Constants-Footnote-1285602
-Node: Nondecimal-numbers285784
-Node: Regexp Constants288843
-Node: Using Constant Regexps289318
-Node: Variables292373
-Node: Using Variables293028
-Node: Assignment Options294752
-Node: Conversion296624
-Ref: table-locale-affects302000
-Ref: Conversion-Footnote-1302624
-Node: All Operators302733
-Node: Arithmetic Ops303363
-Node: Concatenation305868
-Ref: Concatenation-Footnote-1308661
-Node: Assignment Ops308781
-Ref: table-assign-ops313769
-Node: Increment Ops315177
-Node: Truth Values and Conditions318647
-Node: Truth Values319730
-Node: Typing and Comparison320779
-Node: Variable Typing321568
-Ref: Variable Typing-Footnote-1325465
-Node: Comparison Operators325587
-Ref: table-relational-ops325997
-Node: POSIX String Comparison329546
-Ref: POSIX String Comparison-Footnote-1330502
-Node: Boolean Ops330640
-Ref: Boolean Ops-Footnote-1334718
-Node: Conditional Exp334809
-Node: Function Calls336541
-Node: Precedence340135
-Node: Locales343804
-Node: Patterns and Actions344893
-Node: Pattern Overview345947
-Node: Regexp Patterns347616
-Node: Expression Patterns348159
-Node: Ranges351844
-Node: BEGIN/END354810
-Node: Using BEGIN/END355572
-Ref: Using BEGIN/END-Footnote-1358303
-Node: I/O And BEGIN/END358409
-Node: BEGINFILE/ENDFILE360691
-Node: Empty363584
-Node: Using Shell Variables363900
-Node: Action Overview366185
-Node: Statements368542
-Node: If Statement370396
-Node: While Statement371895
-Node: Do Statement373939
-Node: For Statement375095
-Node: Switch Statement378247
-Node: Break Statement380344
-Node: Continue Statement382334
-Node: Next Statement384127
-Node: Nextfile Statement386517
-Node: Exit Statement389062
-Node: Built-in Variables391478
-Node: User-modified392573
-Ref: User-modified-Footnote-1400928
-Node: Auto-set400990
-Ref: Auto-set-Footnote-1410898
-Node: ARGC and ARGV411103
-Node: Arrays414954
-Node: Array Basics416459
-Node: Array Intro417285
-Node: Reference to Elements421603
-Node: Assigning Elements423873
-Node: Array Example424364
-Node: Scanning an Array426096
-Node: Controlling Scanning428410
-Ref: Controlling Scanning-Footnote-1433343
-Node: Delete433659
-Ref: Delete-Footnote-1436094
-Node: Numeric Array Subscripts436151
-Node: Uninitialized Subscripts438334
-Node: Multi-dimensional439962
-Node: Multi-scanning443056
-Node: Arrays of Arrays444647
-Node: Functions449292
-Node: Built-in450114
-Node: Calling Built-in451192
-Node: Numeric Functions453180
-Ref: Numeric Functions-Footnote-1457012
-Ref: Numeric Functions-Footnote-2457369
-Ref: Numeric Functions-Footnote-3457417
-Node: String Functions457686
-Ref: String Functions-Footnote-1481183
-Ref: String Functions-Footnote-2481312
-Ref: String Functions-Footnote-3481560
-Node: Gory Details481647
-Ref: table-sub-escapes483326
-Ref: table-sub-posix-92484680
-Ref: table-sub-proposed486023
-Ref: table-posix-sub487373
-Ref: table-gensub-escapes488919
-Ref: Gory Details-Footnote-1490126
-Ref: Gory Details-Footnote-2490177
-Node: I/O Functions490328
-Ref: I/O Functions-Footnote-1496983
-Node: Time Functions497130
-Ref: Time Functions-Footnote-1508022
-Ref: Time Functions-Footnote-2508090
-Ref: Time Functions-Footnote-3508248
-Ref: Time Functions-Footnote-4508359
-Ref: Time Functions-Footnote-5508471
-Ref: Time Functions-Footnote-6508698
-Node: Bitwise Functions508964
-Ref: table-bitwise-ops509522
-Ref: Bitwise Functions-Footnote-1513682
-Node: Type Functions513866
-Node: I18N Functions514336
-Node: User-defined515963
-Node: Definition Syntax516767
-Ref: Definition Syntax-Footnote-1521677
-Node: Function Example521746
-Node: Function Caveats524340
-Node: Calling A Function524761
-Node: Variable Scope525876
-Node: Pass By Value/Reference527851
-Node: Return Statement531291
-Node: Dynamic Typing534272
-Node: Indirect Calls535007
-Node: Internationalization544692
-Node: I18N and L10N546131
-Node: Explaining gettext546817
-Ref: Explaining gettext-Footnote-1551883
-Ref: Explaining gettext-Footnote-2552067
-Node: Programmer i18n552232
-Node: Translator i18n556432
-Node: String Extraction557225
-Ref: String Extraction-Footnote-1558186
-Node: Printf Ordering558272
-Ref: Printf Ordering-Footnote-1561056
-Node: I18N Portability561120
-Ref: I18N Portability-Footnote-1563569
-Node: I18N Example563632
-Ref: I18N Example-Footnote-1566267
-Node: Gawk I18N566339
-Node: Arbitrary Precision Arithmetic566956
-Ref: Arbitrary Precision Arithmetic-Footnote-1568608
-Node: General Arithmetic568756
-Node: Floating Point Issues570476
-Node: String Conversion Precision571571
-Ref: String Conversion Precision-Footnote-1573277
-Node: Unexpected Results573386
-Node: POSIX Floating Point Problems575224
-Ref: POSIX Floating Point Problems-Footnote-1579049
-Node: Integer Programming579087
-Node: Floating-point Programming580835
-Node: Floating-point Representation587060
-Node: Floating-point Context588227
-Ref: table-ieee-formats589069
-Node: Rounding Mode590453
-Ref: table-rounding-modes590932
-Ref: Rounding Mode-Footnote-1593936
-Node: Gawk and MPFR594117
-Node: Arbitrary Precision Floats595358
-Ref: Arbitrary Precision Floats-Footnote-1597780
-Node: Setting Precision598091
-Node: Setting Rounding Mode600818
-Ref: table-gawk-rounding-modes601222
-Node: Floating-point Constants602419
-Node: Changing Precision603841
-Ref: Changing Precision-Footnote-1605241
-Node: Exact Arithmetic605415
-Node: Arbitrary Precision Integers608513
-Ref: Arbitrary Precision Integers-Footnote-1611595
-Node: Advanced Features611742
-Node: Nondecimal Data613265
-Node: Array Sorting614848
-Node: Controlling Array Traversal615545
-Node: Array Sorting Functions623782
-Ref: Array Sorting Functions-Footnote-1627456
-Ref: Array Sorting Functions-Footnote-2627549
-Node: Two-way I/O627743
-Ref: Two-way I/O-Footnote-1633175
-Node: TCP/IP Networking633245
-Node: Profiling636089
-Node: Library Functions643543
-Ref: Library Functions-Footnote-1646550
-Node: Library Names646721
-Ref: Library Names-Footnote-1650192
-Ref: Library Names-Footnote-2650412
-Node: General Functions650498
-Node: Strtonum Function651451
-Node: Assert Function654381
-Node: Round Function657707
-Node: Cliff Random Function659250
-Node: Ordinal Functions660266
-Ref: Ordinal Functions-Footnote-1663336
-Ref: Ordinal Functions-Footnote-2663588
-Node: Join Function663797
-Ref: Join Function-Footnote-1665568
-Node: Gettimeofday Function665768
-Node: Data File Management669483
-Node: Filetrans Function670115
-Node: Rewind Function674254
-Node: File Checking675641
-Node: Empty Files676735
-Node: Ignoring Assigns678965
-Node: Getopt Function680518
-Ref: Getopt Function-Footnote-1691822
-Node: Passwd Functions692025
-Ref: Passwd Functions-Footnote-1701000
-Node: Group Functions701088
-Node: Walking Arrays709172
-Node: Sample Programs710741
-Node: Running Examples711406
-Node: Clones712134
-Node: Cut Program713358
-Node: Egrep Program723203
-Ref: Egrep Program-Footnote-1730976
-Node: Id Program731086
-Node: Split Program734702
-Ref: Split Program-Footnote-1738221
-Node: Tee Program738349
-Node: Uniq Program741152
-Node: Wc Program748581
-Ref: Wc Program-Footnote-1752847
-Ref: Wc Program-Footnote-2753047
-Node: Miscellaneous Programs753139
-Node: Dupword Program754327
-Node: Alarm Program756358
-Node: Translate Program761107
-Ref: Translate Program-Footnote-1765494
-Ref: Translate Program-Footnote-2765722
-Node: Labels Program765856
-Ref: Labels Program-Footnote-1769227
-Node: Word Sorting769311
-Node: History Sorting773195
-Node: Extract Program775034
-Ref: Extract Program-Footnote-1782517
-Node: Simple Sed782645
-Node: Igawk Program785707
-Ref: Igawk Program-Footnote-1800864
-Ref: Igawk Program-Footnote-2801065
-Node: Anagram Program801203
-Node: Signature Program804271
-Node: Debugger805371
-Node: Debugging806323
-Node: Debugging Concepts806756
-Node: Debugging Terms808612
-Node: Awk Debugging811209
-Node: Sample Debugging Session812101
-Node: Debugger Invocation812621
-Node: Finding The Bug813950
-Node: List of Debugger Commands820438
-Node: Breakpoint Control821772
-Node: Debugger Execution Control825436
-Node: Viewing And Changing Data828796
-Node: Execution Stack832152
-Node: Debugger Info833619
-Node: Miscellaneous Debugger Commands837600
-Node: Readline Support843045
-Node: Limitations843876
-Node: Language History846128
-Node: V7/SVR3.1847640
-Node: SVR4849961
-Node: POSIX851403
-Node: BTL852411
-Node: POSIX/GNU853145
-Node: Common Extensions858436
-Node: Ranges and Locales859543
-Ref: Ranges and Locales-Footnote-1864147
-Node: Contributors864368
-Node: Installation868629
-Node: Gawk Distribution869523
-Node: Getting870007
-Node: Extracting870833
-Node: Distribution contents872525
-Node: Unix Installation877747
-Node: Quick Installation878364
-Node: Additional Configuration Options880326
-Node: Configuration Philosophy881803
-Node: Non-Unix Installation884145
-Node: PC Installation884603
-Node: PC Binary Installation885902
-Node: PC Compiling887750
-Node: PC Testing890694
-Node: PC Using891870
-Node: Cygwin896055
-Node: MSYS897055
-Node: VMS Installation897569
-Node: VMS Compilation898172
-Ref: VMS Compilation-Footnote-1899179
-Node: VMS Installation Details899237
-Node: VMS Running900872
-Node: VMS Old Gawk902479
-Node: Bugs902953
-Node: Other Versions906805
-Node: Notes912120
-Node: Compatibility Mode912812
-Node: Additions913595
-Node: Accessing The Source914407
-Node: Adding Code915832
-Node: New Ports921799
-Node: Dynamic Extensions925912
-Node: Internals927352
-Node: Plugin License936174
-Node: Loading Extensions936812
-Node: Sample Library938651
-Node: Internal File Description939341
-Node: Internal File Ops943056
-Ref: Internal File Ops-Footnote-1947798
-Node: Using Internal File Ops947938
-Node: Future Extensions950315
-Node: Basic Concepts952819
-Node: Basic High Level953500
-Ref: Basic High Level-Footnote-1957535
-Node: Basic Data Typing957720
-Node: Glossary961075
-Node: Copying986051
-Node: GNU Free Documentation License1023608
-Node: Index1048745
+(Indirect)
+Node: Top1351
+Node: Foreword31870
+Node: Preface36215
+Ref: Preface-Footnote-139268
+Ref: Preface-Footnote-239374
+Node: History39606
+Node: Names41997
+Ref: Names-Footnote-143474
+Node: This Manual43546
+Ref: This Manual-Footnote-148450
+Node: Conventions48550
+Node: Manual History50684
+Ref: Manual History-Footnote-153954
+Ref: Manual History-Footnote-253995
+Node: How To Contribute54069
+Node: Acknowledgments55213
+Node: Getting Started59709
+Node: Running gawk62088
+Node: One-shot63274
+Node: Read Terminal64499
+Ref: Read Terminal-Footnote-166149
+Ref: Read Terminal-Footnote-266425
+Node: Long66596
+Node: Executable Scripts67972
+Ref: Executable Scripts-Footnote-169841
+Ref: Executable Scripts-Footnote-269943
+Node: Comments70490
+Node: Quoting72957
+Node: DOS Quoting77580
+Node: Sample Data Files78255
+Node: Very Simple81287
+Node: Two Rules85886
+Node: More Complex88033
+Ref: More Complex-Footnote-190963
+Node: Statements/Lines91048
+Ref: Statements/Lines-Footnote-195510
+Node: Other Features95775
+Node: When96703
+Node: Invoking Gawk98850
+Node: Command Line100311
+Node: Options101094
+Ref: Options-Footnote-1116492
+Node: Other Arguments116517
+Node: Naming Standard Input119175
+Node: Environment Variables120269
+Node: AWKPATH Variable120827
+Ref: AWKPATH Variable-Footnote-1123585
+Node: AWKLIBPATH Variable123845
+Node: Other Environment Variables124442
+Node: Exit Status126937
+Node: Include Files127612
+Node: Loading Shared Libraries131181
+Node: Obsolete132406
+Node: Undocumented133103
+Node: Regexp133346
+Node: Regexp Usage134735
+Node: Escape Sequences136761
+Node: Regexp Operators142524
+Ref: Regexp Operators-Footnote-1149904
+Ref: Regexp Operators-Footnote-2150051
+Node: Bracket Expressions150149
+Ref: table-char-classes152039
+Node: GNU Regexp Operators154562
+Node: Case-sensitivity158285
+Ref: Case-sensitivity-Footnote-1161253
+Ref: Case-sensitivity-Footnote-2161488
+Node: Leftmost Longest161596
+Node: Computed Regexps162797
+Node: Reading Files166207
+Node: Records168210
+Ref: Records-Footnote-1176884
+Node: Fields176921
+Ref: Fields-Footnote-1179954
+Node: Nonconstant Fields180040
+Node: Changing Fields182242
+Node: Field Separators188223
+Node: Default Field Splitting190852
+Node: Regexp Field Splitting191969
+Node: Single Character Fields195311
+Node: Command Line Field Separator196370
+Node: Field Splitting Summary199811
+Ref: Field Splitting Summary-Footnote-1203003
+Node: Constant Size203104
+Node: Splitting By Content207688
+Ref: Splitting By Content-Footnote-1211414
+Node: Multiple Line211454
+Ref: Multiple Line-Footnote-1217301
+Node: Getline217480
+Node: Plain Getline219696
+Node: Getline/Variable221785
+Node: Getline/File222926
+Node: Getline/Variable/File224248
+Ref: Getline/Variable/File-Footnote-1225847
+Node: Getline/Pipe225934
+Node: Getline/Variable/Pipe228494
+Node: Getline/Coprocess229601
+Node: Getline/Variable/Coprocess230844
+Node: Getline Notes231558
+Node: Getline Summary233500
+Ref: table-getline-variants233908
+Node: Read Timeout234764
+Ref: Read Timeout-Footnote-1238509
+Node: Command line directories238566
+Node: Printing239196
+Node: Print240827
+Node: Print Examples242164
+Node: Output Separators244948
+Node: OFMT246708
+Node: Printf248066
+Node: Basic Printf248972
+Node: Control Letters250511
+Node: Format Modifiers254323
+Node: Printf Examples260332
+Node: Redirection263047
+Node: Special Files270031
+Node: Special FD270564
+Ref: Special FD-Footnote-1274189
+Node: Special Network274263
+Node: Special Caveats275113
+Node: Close Files And Pipes275909
+Ref: Close Files And Pipes-Footnote-1282932
+Ref: Close Files And Pipes-Footnote-2283080
+Node: Expressions283230
+Node: Values284362
+Node: Constants285038
+Node: Scalar Constants285718
+Ref: Scalar Constants-Footnote-1286577
+Node: Nondecimal-numbers286759
+Node: Regexp Constants289818
+Node: Using Constant Regexps290293
+Node: Variables293348
+Node: Using Variables294003
+Node: Assignment Options295727
+Node: Conversion297599
+Ref: table-locale-affects302975
+Ref: Conversion-Footnote-1303599
+Node: All Operators303708
+Node: Arithmetic Ops304338
+Node: Concatenation306843
+Ref: Concatenation-Footnote-1309636
+Node: Assignment Ops309756
+Ref: table-assign-ops314744
+Node: Increment Ops316152
+Node: Truth Values and Conditions319622
+Node: Truth Values320705
+Node: Typing and Comparison321754
+Node: Variable Typing322543
+Ref: Variable Typing-Footnote-1326440
+Node: Comparison Operators326562
+Ref: table-relational-ops326972
+Node: POSIX String Comparison330521
+Ref: POSIX String Comparison-Footnote-1331477
+Node: Boolean Ops331615
+Ref: Boolean Ops-Footnote-1335693
+Node: Conditional Exp335784
+Node: Function Calls337516
+Node: Precedence341110
+Node: Locales344779
+Node: Patterns and Actions345868
+Node: Pattern Overview346922
+Node: Regexp Patterns348591
+Node: Expression Patterns349134
+Node: Ranges352819
+Node: BEGIN/END355785
+Node: Using BEGIN/END356547
+Ref: Using BEGIN/END-Footnote-1359278
+Node: I/O And BEGIN/END359384
+Node: BEGINFILE/ENDFILE361666
+Node: Empty364570
+Node: Using Shell Variables364886
+Node: Action Overview367171
+Node: Statements369528
+Node: If Statement371382
+Node: While Statement372881
+Node: Do Statement374925
+Node: For Statement376081
+Node: Switch Statement379233
+Node: Break Statement381330
+Node: Continue Statement383320
+Node: Next Statement385113
+Node: Nextfile Statement387503
+Node: Exit Statement390048
+Node: Built-in Variables392464
+Node: User-modified393559
+Ref: User-modified-Footnote-1401914
+Node: Auto-set401976
+Ref: Auto-set-Footnote-1411884
+Node: ARGC and ARGV412089
+Node: Arrays415940
+Node: Array Basics417445
+Node: Array Intro418271
+Node: Reference to Elements422589
+Node: Assigning Elements424859
+Node: Array Example425350
+Node: Scanning an Array427082
+Node: Controlling Scanning429396
+Ref: Controlling Scanning-Footnote-1434329
+Node: Delete434645
+Ref: Delete-Footnote-1437080
+Node: Numeric Array Subscripts437137
+Node: Uninitialized Subscripts439320
+Node: Multi-dimensional440948
+Node: Multi-scanning444042
+Node: Arrays of Arrays445633
+Node: Functions450278
+Node: Built-in451100
+Node: Calling Built-in452178
+Node: Numeric Functions454166
+Ref: Numeric Functions-Footnote-1457998
+Ref: Numeric Functions-Footnote-2458355
+Ref: Numeric Functions-Footnote-3458403
+Node: String Functions458672
+Ref: String Functions-Footnote-1482169
+Ref: String Functions-Footnote-2482298
+Ref: String Functions-Footnote-3482546
+Node: Gory Details482633
+Ref: table-sub-escapes484312
+Ref: table-sub-posix-92485666
+Ref: table-sub-proposed487009
+Ref: table-posix-sub488359
+Ref: table-gensub-escapes489905
+Ref: Gory Details-Footnote-1491112
+Ref: Gory Details-Footnote-2491163
+Node: I/O Functions491314
+Ref: I/O Functions-Footnote-1497969
+Node: Time Functions498116
+Ref: Time Functions-Footnote-1509008
+Ref: Time Functions-Footnote-2509076
+Ref: Time Functions-Footnote-3509234
+Ref: Time Functions-Footnote-4509345
+Ref: Time Functions-Footnote-5509457
+Ref: Time Functions-Footnote-6509684
+Node: Bitwise Functions509950
+Ref: table-bitwise-ops510508
+Ref: Bitwise Functions-Footnote-1514729
+Node: Type Functions514913
+Node: I18N Functions515383
+Node: User-defined517010
+Node: Definition Syntax517814
+Ref: Definition Syntax-Footnote-1522724
+Node: Function Example522793
+Node: Function Caveats525387
+Node: Calling A Function525808
+Node: Variable Scope526923
+Node: Pass By Value/Reference528898
+Node: Return Statement532338
+Node: Dynamic Typing535319
+Node: Indirect Calls536054
+Node: Internationalization545739
+Node: I18N and L10N547178
+Node: Explaining gettext547864
+Ref: Explaining gettext-Footnote-1552930
+Ref: Explaining gettext-Footnote-2553114
+Node: Programmer i18n553279
+Node: Translator i18n557479
+Node: String Extraction558272
+Ref: String Extraction-Footnote-1559233
+Node: Printf Ordering559319
+Ref: Printf Ordering-Footnote-1562103
+Node: I18N Portability562167
+Ref: I18N Portability-Footnote-1564616
+Node: I18N Example564679
+Ref: I18N Example-Footnote-1567314
+Node: Gawk I18N567386
+Node: Arbitrary Precision Arithmetic568003
+Ref: Arbitrary Precision Arithmetic-Footnote-1569655
+Node: General Arithmetic569803
+Node: Floating Point Issues571523
+Node: String Conversion Precision572618
+Ref: String Conversion Precision-Footnote-1574324
+Node: Unexpected Results574433
+Node: POSIX Floating Point Problems576271
+Ref: POSIX Floating Point Problems-Footnote-1580096
+Node: Integer Programming580134
+Node: Floating-point Programming581882
+Node: Floating-point Representation588107
+Node: Floating-point Context589274
+Ref: table-ieee-formats590116
+Node: Rounding Mode591500
+Ref: table-rounding-modes591979
+Ref: Rounding Mode-Footnote-1594983
+Node: Gawk and MPFR595164
+Node: Arbitrary Precision Floats596405
+Ref: Arbitrary Precision Floats-Footnote-1598827
+Node: Setting Precision599138
+Node: Setting Rounding Mode601865
+Ref: table-gawk-rounding-modes602269
+Node: Floating-point Constants603466
+Node: Changing Precision604888
+Ref: Changing Precision-Footnote-1606288
+Node: Exact Arithmetic606462
+Node: Arbitrary Precision Integers609560
+Ref: Arbitrary Precision Integers-Footnote-1612642
+Node: Advanced Features612789
+Node: Nondecimal Data614312
+Node: Array Sorting615895
+Node: Controlling Array Traversal616592
+Node: Array Sorting Functions624829
+Ref: Array Sorting Functions-Footnote-1628503
+Ref: Array Sorting Functions-Footnote-2628596
+Node: Two-way I/O628790
+Ref: Two-way I/O-Footnote-1634222
+Node: TCP/IP Networking634292
+Node: Profiling637136
+Node: Library Functions644590
+Ref: Library Functions-Footnote-1647597
+Node: Library Names647768
+Ref: Library Names-Footnote-1651239
+Ref: Library Names-Footnote-2651459
+Node: General Functions651545
+Node: Strtonum Function652498
+Node: Assert Function655428
+Node: Round Function658754
+Node: Cliff Random Function660297
+Node: Ordinal Functions661313
+Ref: Ordinal Functions-Footnote-1664383
+Ref: Ordinal Functions-Footnote-2664635
+Node: Join Function664844
+Ref: Join Function-Footnote-1666615
+Node: Getlocaltime Function666815
+Node: Data File Management670530
+Node: Filetrans Function671162
+Node: Rewind Function675301
+Node: File Checking676688
+Node: Empty Files677782
+Node: Ignoring Assigns680012
+Node: Getopt Function681565
+Ref: Getopt Function-Footnote-1692869
+Node: Passwd Functions693072
+Ref: Passwd Functions-Footnote-1702047
+Node: Group Functions702135
+Node: Walking Arrays710219
+Node: Sample Programs711788
+Node: Running Examples712453
+Node: Clones713181
+Node: Cut Program714405
+Node: Egrep Program724250
+Ref: Egrep Program-Footnote-1732023
+Node: Id Program732133
+Node: Split Program735749
+Ref: Split Program-Footnote-1739268
+Node: Tee Program739396
+Node: Uniq Program742199
+Node: Wc Program749628
+Ref: Wc Program-Footnote-1753894
+Ref: Wc Program-Footnote-2754094
+Node: Miscellaneous Programs754186
+Node: Dupword Program755374
+Node: Alarm Program757405
+Node: Translate Program762154
+Ref: Translate Program-Footnote-1766541
+Ref: Translate Program-Footnote-2766769
+Node: Labels Program766903
+Ref: Labels Program-Footnote-1770274
+Node: Word Sorting770358
+Node: History Sorting774242
+Node: Extract Program776081
+Ref: Extract Program-Footnote-1783564
+Node: Simple Sed783692
+Node: Igawk Program786754
+Ref: Igawk Program-Footnote-1801911
+Ref: Igawk Program-Footnote-2802112
+Node: Anagram Program802250
+Node: Signature Program805318
+Node: Debugger806418
+Node: Debugging807372
+Node: Debugging Concepts807805
+Node: Debugging Terms809661
+Node: Awk Debugging812258
+Node: Sample Debugging Session813150
+Node: Debugger Invocation813670
+Node: Finding The Bug814999
+Node: List of Debugger Commands821487
+Node: Breakpoint Control822821
+Node: Debugger Execution Control826485
+Node: Viewing And Changing Data829845
+Node: Execution Stack833201
+Node: Debugger Info834668
+Node: Miscellaneous Debugger Commands838649
+Node: Readline Support844094
+Node: Limitations844925
+Node: Dynamic Extensions847177
+Node: Plugin License848073
+Node: Sample Library848687
+Node: Internal File Description849371
+Node: Internal File Ops853084
+Ref: Internal File Ops-Footnote-1857647
+Node: Using Internal File Ops857787
+Node: Language History860163
+Node: V7/SVR3.1861685
+Node: SVR4864006
+Node: POSIX865448
+Node: BTL866456
+Node: POSIX/GNU867190
+Node: Common Extensions872725
+Node: Ranges and Locales873832
+Ref: Ranges and Locales-Footnote-1878436
+Node: Contributors878657
+Node: Installation882953
+Node: Gawk Distribution883847
+Node: Getting884331
+Node: Extracting885157
+Node: Distribution contents886849
+Node: Unix Installation892071
+Node: Quick Installation892688
+Node: Additional Configuration Options894650
+Node: Configuration Philosophy896127
+Node: Non-Unix Installation898469
+Node: PC Installation898927
+Node: PC Binary Installation900226
+Node: PC Compiling902074
+Node: PC Testing905018
+Node: PC Using906194
+Node: Cygwin910379
+Node: MSYS911379
+Node: VMS Installation911893
+Node: VMS Compilation912496
+Ref: VMS Compilation-Footnote-1913503
+Node: VMS Installation Details913561
+Node: VMS Running915196
+Node: VMS Old Gawk916803
+Node: Bugs917277
+Node: Other Versions921129
+Node: Notes926444
+Node: Compatibility Mode927031
+Node: Additions927814
+Node: Accessing The Source928741
+Node: Adding Code930166
+Node: New Ports936174
+Node: Derived Files940309
+Ref: Derived Files-Footnote-1945613
+Ref: Derived Files-Footnote-2945647
+Ref: Derived Files-Footnote-3946247
+Node: Future Extensions946345
+Node: Basic Concepts947832
+Node: Basic High Level948513
+Ref: Basic High Level-Footnote-1952548
+Node: Basic Data Typing952733
+Node: Glossary956088
+Node: Copying981064
+Node: GNU Free Documentation License1018621
+Node: Index1043758

End Tag Table