Diffstat (limited to 'gawk.info-2')
-rw-r--r--  gawk.info-2  1236
1 files changed, 1236 insertions, 0 deletions
diff --git a/gawk.info-2 b/gawk.info-2
new file mode 100644
index 00000000..954b4309
--- /dev/null
+++ b/gawk.info-2
@@ -0,0 +1,1236 @@
+This is Info file gawk.info, produced by Makeinfo-1.54 from the input
+file gawk.texi.
+
+ This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them.
+
+ This is Edition 0.15 of `The GAWK Manual',
+for the 2.15 version of the GNU implementation
+of AWK.
+
+ Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
+
+ Permission is granted to make and distribute verbatim copies of this
+manual provided the copyright notice and this permission notice are
+preserved on all copies.
+
+ Permission is granted to copy and distribute modified versions of
+this manual under the conditions for verbatim copying, provided that
+the entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+
+ Permission is granted to copy and distribute translations of this
+manual into another language, under the above conditions for modified
+versions, except that this permission notice may be stated in a
+translation approved by the Foundation.
+
+
+File: gawk.info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started
+
+`awk' Statements versus Lines
+=============================
+
+ Most often, each line in an `awk' program is a separate statement or
+separate rule, like this:
+
+ awk '/12/ { print $0 }
+ /21/ { print $0 }' BBS-list inventory-shipped
+
+ But sometimes statements can be more than one line, and lines can
+contain several statements. You can split a statement into multiple
+lines by inserting a newline after any of the following:
+
+ , { ? : || && do else
+
+A newline at any other point is considered the end of the statement.
+(Splitting lines after `?' and `:' is a minor `gawk' extension. The
+`?' and `:' referred to here is the three operand conditional
+expression described in *Note Conditional Expressions: Conditional Exp.)
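+
+   For instance, a pattern can be continued after `&&', and an action
+can be spread over several lines by splitting it after the `{':
+
+     awk '/2400/ &&
+          /foo/ {
+              print $1
+          }' BBS-list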
+
+ If you would like to split a single statement into two lines at a
+point where a newline would terminate it, you can "continue" it by
+ending the first line with a backslash character, `\'. This is allowed
+absolutely anywhere in the statement, even in the middle of a string or
+regular expression. For example:
+
+ awk '/This program is too long, so continue it\
+ on the next line/ { print $1 }'
+
+We have generally not used backslash continuation in the sample
+programs in this manual. Since in `gawk' there is no limit on the
+length of a line, it is never strictly necessary; it just makes
+programs prettier. We have preferred to make them even more pretty by
+keeping the statements short. Backslash continuation is most useful
+when your `awk' program is in a separate source file, instead of typed
+in on the command line. You should also note that many `awk'
+implementations are more picky about where you may use backslash
+continuation. For maximal portability of your `awk' programs, it is
+best not to split your lines in the middle of a regular expression or a
+string.
+
+ *Warning: backslash continuation does not work as described above
+with the C shell.* Continuation with backslash works for `awk'
+programs in files, and also for one-shot programs *provided* you are
+using a POSIX-compliant shell, such as the Bourne shell or the
+Bourne-again shell. But the C shell used on Berkeley Unix behaves
+differently! There, you must use two backslashes in a row, followed by
+a newline.
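+
+   For instance, to enter the one-shot example above from the C shell,
+you would have to double the backslash, roughly like this:
+
+     awk '/This program is too long, so continue it\\
+     on the next line/ { print $1 }'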
+
+ When `awk' statements within one rule are short, you might want to
+put more than one of them on a line. You do this by separating the
+statements with a semicolon, `;'. This also applies to the rules
+themselves. Thus, the previous program could have been written:
+
+ /12/ { print $0 } ; /21/ { print $0 }
+
+*Note:* the requirement that rules on the same line must be separated
+with a semicolon is a recent change in the `awk' language; it was done
+for consistency with the treatment of statements within an action.
+
+
+File: gawk.info, Node: When, Prev: Statements/Lines, Up: Getting Started
+
+When to Use `awk'
+=================
+
+   You might wonder how `awk' could be useful for you. Using additional
+utility programs, more advanced patterns, field separators, arithmetic
+statements, and other selection criteria, you can produce much more
+complex output. The `awk' language is very useful for producing
+reports from large amounts of raw data, such as summarizing information
+from the output of other utility programs like `ls'. (*Note A More
+Complex Example: More Complex.)
+
+ Programs written with `awk' are usually much smaller than they would
+be in other languages. This makes `awk' programs easy to compose and
+use. Often `awk' programs can be quickly composed at your terminal,
+used once, and thrown away. Since `awk' programs are interpreted, you
+can avoid the usually lengthy edit-compile-test-debug cycle of software
+development.
+
+ Complex programs have been written in `awk', including a complete
+retargetable assembler for 8-bit microprocessors (*note Glossary::., for
+more information) and a microcode assembler for a special purpose Prolog
+computer. However, `awk''s capabilities are strained by tasks of such
+complexity.
+
+ If you find yourself writing `awk' scripts of more than, say, a few
+hundred lines, you might consider using a different programming
+language. Emacs Lisp is a good choice if you need sophisticated string
+or pattern matching capabilities. The shell is also good at string and
+pattern matching; in addition, it allows powerful use of the system
+utilities. More conventional languages, such as C, C++, and Lisp, offer
+better facilities for system programming and for managing the complexity
+of large programs. Programs in these languages may require more lines
+of source code than the equivalent `awk' programs, but they are easier
+to maintain and usually run more efficiently.
+
+
+File: gawk.info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top
+
+Reading Input Files
+*******************
+
+ In the typical `awk' program, all input is read either from the
+standard input (by default the keyboard, but often a pipe from another
+command) or from files whose names you specify on the `awk' command
+line. If you specify input files, `awk' reads them in order, reading
+all the data from one before going on to the next. The name of the
+current input file can be found in the built-in variable `FILENAME'
+(*note Built-in Variables::.).
+
+ The input is read in units called records, and processed by the
+rules one record at a time. By default, each record is one line. Each
+record is split automatically into fields, to make it more convenient
+for a rule to work on its parts.
+
+ On rare occasions you will need to use the `getline' command, which
+can do explicit input from any number of files (*note Explicit Input
+with `getline': Getline.).
+
+* Menu:
+
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Non-Constant Fields:: Non-constant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change it.
+* Constant Size:: Reading constant width data.
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program control
+ using the `getline' function.
+* Close Input:: Closing an input file (so you can read from
+ the beginning once more).
+
+
+File: gawk.info, Node: Records, Next: Fields, Prev: Reading Files, Up: Reading Files
+
+How Input is Split into Records
+===============================
+
+ The `awk' language divides its input into records and fields.
+Records are separated by a character called the "record separator". By
+default, the record separator is the newline character, defining a
+record to be a single line of text.
+
+ Sometimes you may want to use a different character to separate your
+records. You can use a different character by changing the built-in
+variable `RS'. The value of `RS' is a string that says how to separate
+records; the default value is `"\n"', the string containing just a
+newline character. This is why records are, by default, single lines.
+
+ `RS' can have any string as its value, but only the first character
+of the string is used as the record separator. The other characters are
+ignored. `RS' is exceptional in this regard; `awk' uses the full value
+of all its other built-in variables.
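+
+   For example, after the assignment:
+
+     RS = "ab"
+
+records are separated by each occurrence of `a'; the `b' is simply
+ignored.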
+
+ You can change the value of `RS' in the `awk' program with the
+assignment operator, `=' (*note Assignment Expressions: Assignment
+Ops.). The new record-separator character should be enclosed in
+quotation marks to make a string constant. Often the right time to do
+this is at the beginning of execution, before any input has been
+processed, so that the very first record will be read with the proper
+separator. To do this, use the special `BEGIN' pattern (*note `BEGIN'
+and `END' Special Patterns: BEGIN/END.). For example:
+
+ awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
+
+changes the value of `RS' to `"/"', before reading any input. This is
+a string whose first character is a slash; as a result, records are
+separated by slashes. Then the input file is read, and the second rule
+in the `awk' program (the action with no pattern) prints each record.
+Since each `print' statement adds a newline at the end of its output,
+the effect of this `awk' program is to copy the input with each slash
+changed to a newline.
+
+ Another way to change the record separator is on the command line,
+using the variable-assignment feature (*note Invoking `awk': Command
+Line.).
+
+ awk '{ print $0 }' RS="/" BBS-list
+
+This sets `RS' to `/' before processing `BBS-list'.
+
+ Reaching the end of an input file terminates the current input
+record, even if the last character in the file is not the character in
+`RS'.
+
+ The empty string, `""' (a string of no characters), has a special
+meaning as the value of `RS': it means that records are separated only
+by blank lines. *Note Multiple-Line Records: Multiple Line, for more
+details.
+
+ The `awk' utility keeps track of the number of records that have
+been read so far from the current input file. This value is stored in a
+built-in variable called `FNR'. It is reset to zero when a new file is
+started. Another built-in variable, `NR', is the total number of input
+records read so far from all files. It starts at zero but is never
+automatically reset to zero.
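+
+   For example, this program prints the current file name along with
+both counters for every record it reads:
+
+     awk '{ print FILENAME, FNR, NR }' BBS-list inventory-shipped
+
+While `BBS-list' is being read, `FNR' and `NR' are equal; once
+`inventory-shipped' is started, `FNR' begins again at one while `NR'
+keeps counting upward.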
+
+ If you change the value of `RS' in the middle of an `awk' run, the
+new value is used to delimit subsequent records, but the record
+currently being processed (and records already processed) are not
+affected.
+
+
+File: gawk.info, Node: Fields, Next: Non-Constant Fields, Prev: Records, Up: Reading Files
+
+Examining Fields
+================
+
+ When `awk' reads an input record, the record is automatically
+separated or "parsed" by the interpreter into chunks called "fields".
+By default, fields are separated by whitespace, like words in a line.
+Whitespace in `awk' means any string of one or more spaces and/or tabs;
+other characters such as newline, formfeed, and so on, that are
+considered whitespace by other languages are *not* considered
+whitespace by `awk'.
+
+ The purpose of fields is to make it more convenient for you to refer
+to these pieces of the record. You don't have to use them--you can
+operate on the whole record if you wish--but fields are what make
+simple `awk' programs so powerful.
+
+ To refer to a field in an `awk' program, you use a dollar-sign, `$',
+followed by the number of the field you want. Thus, `$1' refers to the
+first field, `$2' to the second, and so on. For example, suppose the
+following is a line of input:
+
+ This seems like a pretty nice example.
+
+Here the first field, or `$1', is `This'; the second field, or `$2', is
+`seems'; and so on. Note that the last field, `$7', is `example.'.
+Because there is no space between the `e' and the `.', the period is
+considered part of the seventh field.
+
+ No matter how many fields there are, the last field in a record can
+be represented by `$NF'. So, in the example above, `$NF' would be the
+same as `$7', which is `example.'. Why this works is explained below
+(*note Non-constant Field Numbers: Non-Constant Fields.). If you try
+to refer to a field beyond the last one, such as `$8' when the record
+has only 7 fields, you get the empty string.
+
+ Plain `NF', with no `$', is a built-in variable whose value is the
+number of fields in the current record.
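+
+   For example, on the sample line above, this command:
+
+     echo 'This seems like a pretty nice example.' | awk '{ print NF, $NF }'
+
+prints `7 example.', the number of fields followed by the last field.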
+
+ `$0', which looks like an attempt to refer to the zeroth field, is a
+special case: it represents the whole input record. This is what you
+would use if you weren't interested in fields.
+
+ Here are some more examples:
+
+ awk '$1 ~ /foo/ { print $0 }' BBS-list
+
+This example prints each record in the file `BBS-list' whose first
+field contains the string `foo'. The operator `~' is called a
+"matching operator" (*note Comparison Expressions: Comparison Ops.); it
+tests whether a string (here, the field `$1') matches a given regular
+expression.
+
+ By contrast, the following example:
+
+ awk '/foo/ { print $1, $NF }' BBS-list
+
+looks for `foo' in *the entire record* and prints the first field and
+the last field for each input record containing a match.
+
+
+File: gawk.info, Node: Non-Constant Fields, Next: Changing Fields, Prev: Fields, Up: Reading Files
+
+Non-constant Field Numbers
+==========================
+
+ The number of a field does not need to be a constant. Any
+expression in the `awk' language can be used after a `$' to refer to a
+field. The value of the expression specifies the field number. If the
+value is a string, rather than a number, it is converted to a number.
+Consider this example:
+
+ awk '{ print $NR }'
+
+Recall that `NR' is the number of records read so far: 1 in the first
+record, 2 in the second, etc. So this example prints the first field
+of the first record, the second field of the second record, and so on.
+For the twentieth record, field number 20 is printed; most likely, the
+record has fewer than 20 fields, so this prints a blank line.
+
+ Here is another example of using expressions as field numbers:
+
+ awk '{ print $(2*2) }' BBS-list
+
+ The `awk' language must evaluate the expression `(2*2)' and use its
+value as the number of the field to print. The `*' sign represents
+multiplication, so the expression `2*2' evaluates to 4. The
+parentheses are used so that the multiplication is done before the `$'
+operation; they are necessary whenever there is a binary operator in
+the field-number expression. This example, then, prints the hours of
+operation (the fourth field) for every line of the file `BBS-list'.
+
+ If the field number you compute is zero, you get the entire record.
+Thus, `$(2-2)' has the same value as `$0'. Negative field numbers are
+not allowed.
+
+ The number of fields in the current record is stored in the built-in
+variable `NF' (*note Built-in Variables::.). The expression `$NF' is
+not a special feature: it is the direct consequence of evaluating `NF'
+and using its value as a field number.
+
+
+File: gawk.info, Node: Changing Fields, Next: Field Separators, Prev: Non-Constant Fields, Up: Reading Files
+
+Changing the Contents of a Field
+================================
+
+ You can change the contents of a field as seen by `awk' within an
+`awk' program; this changes what `awk' perceives as the current input
+record. (The actual input is untouched: `awk' never modifies the input
+file.)
+
+ Consider this example:
+
+ awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped
+
+The `-' sign represents subtraction, so this program reassigns field
+three, `$3', to be the value of field two minus ten, `$2 - 10'. (*Note
+Arithmetic Operators: Arithmetic Ops.) Then field two, and the new
+value for field three, are printed.
+
+ In order for this to work, the text in field `$2' must make sense as
+a number; the string of characters must be converted to a number in
+order for the computer to do arithmetic on it. The number resulting
+from the subtraction is converted back to a string of characters which
+then becomes field three. *Note Conversion of Strings and Numbers:
+Conversion.
+
+ When you change the value of a field (as perceived by `awk'), the
+text of the input record is recalculated to contain the new field where
+the old one was. Therefore, `$0' changes to reflect the altered field.
+Thus,
+
+ awk '{ $2 = $2 - 10; print $0 }' inventory-shipped
+
+prints a copy of the input file, with 10 subtracted from the second
+field of each line.
+
+ You can also assign contents to fields that are out of range. For
+example:
+
+ awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped
+
+We've just created `$6', whose value is the sum of fields `$2', `$3',
+`$4', and `$5'. The `+' sign represents addition. For the file
+`inventory-shipped', `$6' represents the total number of parcels
+shipped for a particular month.
+
+ Creating a new field changes the internal `awk' copy of the current
+input record--the value of `$0'. Thus, if you do `print $0' after
+adding a field, the record printed includes the new field, with the
+appropriate number of field separators between it and the previously
+existing fields.
+
+ This recomputation affects and is affected by several features not
+yet discussed, in particular, the "output field separator", `OFS',
+which is used to separate the fields (*note Output Separators::.), and
+`NF' (the number of fields; *note Examining Fields: Fields.). For
+example, the value of `NF' is set to the number of the highest field
+you create.
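+
+   For example:
+
+     echo a b c d | awk '{ $6 = "new"; print NF }'
+
+prints `6', since assigning to `$6' creates fields five and six (field
+five is empty).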
+
+ Note, however, that merely *referencing* an out-of-range field does
+*not* change the value of either `$0' or `NF'. Referencing an
+out-of-range field merely produces a null string. For example:
+
+ if ($(NF+1) != "")
+ print "can't happen"
+ else
+ print "everything is normal"
+
+should print `everything is normal', because `NF+1' is certain to be
+out of range. (*Note The `if' Statement: If Statement, for more
+information about `awk''s `if-else' statements.)
+
+ It is important to note that assigning to a field will change the
+value of `$0', but will not change the value of `NF', even when you
+assign the null string to a field. For example:
+
+ echo a b c d | awk '{ OFS = ":"; $2 = "" ; print ; print NF }'
+
+prints
+
+ a::c:d
+ 4
+
+The field is still there; it just has an empty value. You can tell
+because there are two colons in a row.
+
+
+File: gawk.info, Node: Field Separators, Next: Constant Size, Prev: Changing Fields, Up: Reading Files
+
+Specifying how Fields are Separated
+===================================
+
+ (This section is rather long; it describes one of the most
+fundamental operations in `awk'. If you are a novice with `awk', we
+recommend that you re-read this section after you have studied the
+section on regular expressions, *Note Regular Expressions as Patterns:
+Regexp.)
+
+ The way `awk' splits an input record into fields is controlled by
+the "field separator", which is a single character or a regular
+expression. `awk' scans the input record for matches for the
+separator; the fields themselves are the text between the matches. For
+example, if the field separator is `oo', then the following line:
+
+ moo goo gai pan
+
+would be split into three fields: `m', ` g' and ` gai pan'.
+
+ The field separator is represented by the built-in variable `FS'.
+Shell programmers take note! `awk' does not use the name `IFS' which
+is used by the shell.
+
+ You can change the value of `FS' in the `awk' program with the
+assignment operator, `=' (*note Assignment Expressions: Assignment
+Ops.). Often the right time to do this is at the beginning of
+execution, before any input has been processed, so that the very first
+record will be read with the proper separator. To do this, use the
+special `BEGIN' pattern (*note `BEGIN' and `END' Special Patterns:
+BEGIN/END.). For example, here we set the value of `FS' to the string
+`","':
+
+ awk 'BEGIN { FS = "," } ; { print $2 }'
+
+Given the input line,
+
+ John Q. Smith, 29 Oak St., Walamazoo, MI 42139
+
+this `awk' program extracts the string ` 29 Oak St.'.
+
+ Sometimes your input data will contain separator characters that
+don't separate fields the way you thought they would. For instance, the
+person's name in the example we've been using might have a title or
+suffix attached, such as `John Q. Smith, LXIX'. From input containing
+such a name:
+
+ John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
+
+the previous sample program would extract ` LXIX', instead of ` 29 Oak
+St.'. If you were expecting the program to print the address, you
+would be surprised. So choose your data layout and separator
+characters carefully to prevent such problems.
+
+ As you know, by default, fields are separated by whitespace sequences
+(spaces and tabs), not by single spaces: two spaces in a row do not
+delimit an empty field. The default value of the field separator is a
+string `" "' containing a single space. If this value were interpreted
+in the usual way, each space character would separate fields, so two
+spaces in a row would make an empty field between them. The reason
+this does not happen is that a single space as the value of `FS' is a
+special case: it is taken to specify the default manner of delimiting
+fields.
+
+ If `FS' is any other single character, such as `","', then each
+occurrence of that character separates two fields. Two consecutive
+occurrences delimit an empty field. If the character occurs at the
+beginning or the end of the line, that too delimits an empty field. The
+space character is the only single character which does not follow these
+rules.
+
+ More generally, the value of `FS' may be a string containing any
+regular expression. Then each match in the record for the regular
+expression separates fields. For example, the assignment:
+
+ FS = ", \t"
+
+makes every area of an input line that consists of a comma followed by a
+space and a tab, into a field separator. (`\t' stands for a tab.)
+
+ For a less trivial example of a regular expression, suppose you want
+single spaces to separate fields the way single commas were used above.
+You can set `FS' to `"[ ]"'. This regular expression matches a single
+space and nothing else.
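+
+   With this value of `FS', two spaces in a row do delimit an empty
+field. For example:
+
+     echo 'a  b' | awk 'BEGIN { FS = "[ ]" } ; { print NF }'
+
+prints `3', whereas the default field separator would give `2'.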
+
+ `FS' can be set on the command line. You use the `-F' argument to
+do so. For example:
+
+ awk -F, 'PROGRAM' INPUT-FILES
+
+sets `FS' to be the `,' character. Notice that the argument uses a
+capital `F'. Contrast this with `-f', which specifies a file
+containing an `awk' program. Case is significant in command options:
+the `-F' and `-f' options have nothing to do with each other. You can
+use both options at the same time to set the `FS' variable *and* get an
+`awk' program from a file.
+
+ The value used for the argument to `-F' is processed in exactly the
+same way as assignments to the built-in variable `FS'. This means that
+if the field separator contains special characters, they must be escaped
+appropriately. For example, to use a `\' as the field separator, you
+would have to type:
+
+ # same as FS = "\\"
+ awk -F\\\\ '...' files ...
+
+Since `\' is used for quoting in the shell, `awk' will see `-F\\'.
+Then `awk' processes the `\\' for escape characters (*note Constant
+Expressions: Constants.), finally yielding a single `\' to be used for
+the field separator.
+
+ As a special case, in compatibility mode (*note Invoking `awk':
+Command Line.), if the argument to `-F' is `t', then `FS' is set to the
+tab character. (This is because if you type `-F\t', without the quotes,
+at the shell, the `\' gets deleted, so `awk' figures that you really
+want your fields to be separated with tabs, and not `t's. Use `-v
+FS="t"' on the command line if you really do want to separate your
+fields with `t's.)
+
+ For example, let's use an `awk' program file called `baud.awk' that
+contains the pattern `/300/', and the action `print $1'. Here is the
+program:
+
+ /300/ { print $1 }
+
+ Let's also set `FS' to be the `-' character, and run the program on
+the file `BBS-list'. The following command prints a list of the names
+of the bulletin boards that operate at 300 baud and the first three
+digits of their phone numbers:
+
+ awk -F- -f baud.awk BBS-list
+
+It produces this output:
+
+ aardvark 555
+ alpo
+ barfly 555
+ bites 555
+ camelot 555
+ core 555
+ fooey 555
+ foot 555
+ macfoo 555
+ sdace 555
+ sabafoo 555
+
+Note the second line of output. If you check the original file, you
+will see that the second line looked like this:
+
+ alpo-net 555-3412 2400/1200/300 A
+
+ The `-' as part of the system's name was used as the field
+separator, instead of the `-' in the phone number that was originally
+intended. This demonstrates why you have to be careful in choosing
+your field and record separators.
+
+ The following program searches the system password file, and prints
+the entries for users who have no password:
+
+ awk -F: '$2 == ""' /etc/passwd
+
+Here we use the `-F' option on the command line to set the field
+separator. Note that fields in `/etc/passwd' are separated by colons.
+The second field represents a user's encrypted password, but if the
+field is empty, that user has no password.
+
+ According to the POSIX standard, `awk' is supposed to behave as if
+each record is split into fields at the time that it is read. In
+particular, this means that you can change the value of `FS' after a
+record is read, but before any of the fields are referenced. The value
+of the fields (i.e. how they were split) should reflect the old value
+of `FS', not the new one.
+
+ However, many implementations of `awk' do not do this. Instead,
+they defer splitting the fields until a field reference actually
+happens, using the *current* value of `FS'! This behavior can be
+difficult to diagnose. The following example illustrates the results of
+the two methods. (The `sed' command prints just the first line of
+`/etc/passwd'.)
+
+ sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'
+
+will usually print
+
+ root
+
+on an incorrect implementation of `awk', while `gawk' will print
+something like
+
+ root:nSijPlPhZZwgE:0:0:Root:/:
+
+ There is an important difference between the two cases of `FS = " "'
+(a single blank) and `FS = "[ \t]+"' (which is a regular expression
+matching one or more blanks or tabs). For both values of `FS', fields
+are separated by runs of blanks and/or tabs. However, when the value of
+`FS' is `" "', `awk' will strip leading and trailing whitespace from
+the record, and then decide where the fields are.
+
+ For example, the following expression prints `b':
+
+ echo ' a b c d ' | awk '{ print $2 }'
+
+However, the following prints `a':
+
+ echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" } ; { print $2 }'
+
+In this case, the first field is null.
+
+ The stripping of leading and trailing whitespace also comes into
+play whenever `$0' is recomputed. For instance, this pipeline
+
+ echo ' a b c d' | awk '{ print; $2 = $2; print }'
+
+produces this output:
+
+ a b c d
+ a b c d
+
+The first `print' statement prints the record as it was read, with
+leading whitespace intact. The assignment to `$2' rebuilds `$0' by
+concatenating `$1' through `$NF' together, separated by the value of
+`OFS'. Since the leading whitespace was ignored when finding `$1', it
+is not part of the new `$0'. Finally, the last `print' statement
+prints the new `$0'.
+
+ The following table summarizes how fields are split, based on the
+value of `FS'.
+
+`FS == " "'
+ Fields are separated by runs of whitespace. Leading and trailing
+ whitespace are ignored. This is the default.
+
+`FS == ANY SINGLE CHARACTER'
+ Fields are separated by each occurrence of the character. Multiple
+ successive occurrences delimit empty fields, as do leading and
+ trailing occurrences.
+
+`FS == REGEXP'
+ Fields are separated by occurrences of characters that match
+ REGEXP. Leading and trailing matches of REGEXP delimit empty
+ fields.
+
+
+File: gawk.info, Node: Constant Size, Next: Multiple Line, Prev: Field Separators, Up: Reading Files
+
+Reading Fixed-width Data
+========================
+
+ (This section discusses an advanced, experimental feature. If you
+are a novice `awk' user, you may wish to skip it on the first reading.)
+
+ `gawk' 2.13 introduced a new facility for dealing with fixed-width
+fields with no distinctive field separator. Data of this nature
+typically arises in at least two ways: as input for old FORTRAN
+programs where numbers are run together, and as output from programs
+that did not anticipate that their output would be used as input for
+other programs.
+
+ An example of the latter is a table where all the columns are lined
+up by the use of a variable number of spaces and *empty fields are just
+spaces*. Clearly, `awk''s normal field splitting based on `FS' will
+not work well in this case. (Although a portable `awk' program can use
+a series of `substr' calls on `$0', this is awkward and inefficient for
+a large number of fields.)
+
+ The splitting of an input record into fixed-width fields is
+specified by assigning a string containing space-separated numbers to
+the built-in variable `FIELDWIDTHS'. Each number specifies the width
+of the field *including* columns between fields. If you want to ignore
+the columns between fields, you can specify the width as a separate
+field that is subsequently ignored.
+
+ The following data is the output of the `w' utility. It is useful
+to illustrate the use of `FIELDWIDTHS'.
+
+ 10:06pm up 21 days, 14:04, 23 users
+ User tty login idle JCPU PCPU what
+ hzuo ttyV0 8:58pm 9 5 vi p24.tex
+ hzang ttyV3 6:37pm 50 -csh
+ eklye ttyV5 9:53pm 7 1 em thes.tex
+ dportein ttyV6 8:17pm 1:47 -csh
+ gierd ttyD3 10:00pm 1 elm
+ dave ttyD4 9:47pm 4 4 w
+ brent ttyp0 26Jun91 4:46 26:46 4:41 bash
+ dave ttyq4 26Jun9115days 46 46 wnewmail
+
+ The following program takes the above input, converts the idle time
+to number of seconds and prints out the first two fields and the
+calculated idle time. (This program uses a number of `awk' features
+that haven't been introduced yet.)
+
+ BEGIN { FIELDWIDTHS = "9 6 10 6 7 7 35" }
+ NR > 2 {
+ idle = $4
+ sub(/^ */, "", idle) # strip leading spaces
+ if (idle == "") idle = 0
+ if (idle ~ /:/) { split(idle, t, ":"); idle = t[1] * 60 + t[2] }
+ if (idle ~ /days/) { idle *= 24 * 60 * 60 }
+
+ print $1, $2, idle
+ }
+
+ Here is the result of running the program on the data:
+
+ hzuo ttyV0 0
+ hzang ttyV3 50
+ eklye ttyV5 0
+ dportein ttyV6 107
+ gierd ttyD3 1
+ dave ttyD4 0
+ brent ttyp0 286
+ dave ttyq4 1296000
+
+ Another (possibly more practical) example of fixed-width input data
+would be the input from a deck of balloting cards. In some parts of
+the United States, voters make their choices by punching holes in
+computer cards. These cards are then processed to count the votes for
+any particular candidate or on any particular issue. Since a voter may
+choose not to vote on some issue, any column on the card may be empty.
+An `awk' program for processing such data could use the `FIELDWIDTHS'
+feature to simplify reading the data.
+
+ This feature is still experimental, and will likely evolve over time.
+
+
+File: gawk.info, Node: Multiple Line, Next: Getline, Prev: Constant Size, Up: Reading Files
+
+Multiple-Line Records
+=====================
+
+ In some data bases, a single line cannot conveniently hold all the
+information in one entry. In such cases, you can use multi-line
+records.
+
+ The first step in doing this is to choose your data format: when
+records are not defined as single lines, how do you want to define them?
+What should separate records?
+
+ One technique is to use an unusual character or string to separate
+records. For example, you could use the formfeed character (written
+`\f' in `awk', as in C) to separate them, making each record a page of
+the file. To do this, just set the variable `RS' to `"\f"' (a string
+containing the formfeed character). Any other character could equally
+well be used, as long as it won't be part of the data in a record.
+
+ Another technique is to have blank lines separate records. By a
+special dispensation, a null string as the value of `RS' indicates that
+records are separated by one or more blank lines. If you set `RS' to
+the null string, a record always ends at the first blank line
+encountered. And the next record doesn't start until the first nonblank
+line that follows--no matter how many blank lines appear in a row, they
+are considered one record-separator. (End of file is also considered a
+record separator.)
+
+ The second step is to separate the fields in the record. One way to
+do this is to put each field on a separate line: to do this, just set
+the variable `FS' to the string `"\n"'. (This simple regular
+expression matches a single newline.)
+
+ Another way to separate fields is to divide each of the lines into
+fields in the normal manner. This happens by default as a result of a
+special feature: when `RS' is set to the null string, the newline
+character *always* acts as a field separator. This is in addition to
+whatever field separations result from `FS'.
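+
+   Here is a small example. Suppose we have a file of addresses (the
+file name `addresses' and its layout are only for illustration), in
+which each entry occupies several lines, such as the name, the street,
+and the city, and entries are separated by blank lines. The following
+program prints the name and the city from each entry:
+
+     awk 'BEGIN { RS = "" ; FS = "\n" }
+          { print "Name is:", $1
+            print "City is:", $3
+            print "" }' addresses
+
+Because `RS' is the null string, each block of nonblank lines is one
+record; because `FS' is a newline, each line of an entry is a separate
+field, so `$1' is the name and `$3' is the city.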
+
+ The original motivation for this special exception was probably so
+that you get useful behavior in the default case (i.e., `FS == " "').
+This feature can be a problem if you really don't want the newline
+character to separate fields, since there is no way to prevent it.
+However, you can work around this by using the `split' function to
+break up the record manually (*note Built-in Functions for String
+Manipulation: String Functions.).
+
+
+File: gawk.info, Node: Getline, Next: Close Input, Prev: Multiple Line, Up: Reading Files
+
+Explicit Input with `getline'
+=============================
+
+ So far we have been getting our input files from `awk''s main input
+stream--either the standard input (usually your terminal) or the files
+specified on the command line. The `awk' language has a special
+built-in command called `getline' that can be used to read input under
+your explicit control.
+
+ This command is quite complex and should *not* be used by beginners.
+It is covered here because this is the chapter on input. The examples
+that follow the explanation of the `getline' command include material
+that has not been covered yet. Therefore, come back and study the
+`getline' command *after* you have reviewed the rest of this manual and
+have a good knowledge of how `awk' works.
+
+ `getline' returns 1 if it finds a record, and 0 if the end of the
+file is encountered. If there is some error in getting a record, such
+as a file that cannot be opened, then `getline' returns -1. In this
+case, `gawk' sets the variable `ERRNO' to a string describing the error
+that occurred.
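+
+   Because of the -1 return value, a loop that uses `getline' to read a
+file should test explicitly for a return value greater than zero;
+otherwise, a file that cannot be opened causes an infinite loop. For
+example (the file name `data' is only for illustration):
+
+     while ((getline line < "data") > 0)
+         print line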
+
+ In the following examples, COMMAND stands for a string value that
+represents a shell command.
+
+`getline'
+ The `getline' command can be used without arguments to read input
+ from the current input file. All it does in this case is read the
+ next input record and split it up into fields. This is useful if
+ you've finished processing the current record, but you want to do
+ some special processing *right now* on the next record. Here's an
+ example:
+
+ awk '{
+ if (t = index($0, "/*")) {
+ if (t > 1)
+ tmp = substr($0, 1, t - 1)
+ else
+ tmp = ""
+ u = index(substr($0, t + 2), "*/")
+ while (u == 0) {
+ getline
+ t = -1
+ u = index($0, "*/")
+ }
+ if (u <= length($0) - 2)
+ $0 = tmp substr($0, t + u + 3)
+ else
+ $0 = tmp
+ }
+ print $0
+ }'
+
+ This `awk' program deletes all C-style comments, `/* ... */',
+ from the input. By replacing the `print $0' with other
+ statements, you could perform more complicated processing on the
+ decommented input, like searching for matches of a regular
+ expression. (This program has a subtle problem--can you spot it?)
+
+ This form of the `getline' command sets `NF' (the number of
+ fields; *note Examining Fields: Fields.), `NR' (the number of
+ records read so far; *note How Input is Split into Records:
+ Records.), `FNR' (the number of records read from this input
+ file), and the value of `$0'.
+
+ *Note:* the new value of `$0' is used in testing the patterns of
+ any subsequent rules. The original value of `$0' that triggered
+ the rule which executed `getline' is lost. By contrast, the
+ `next' statement reads a new record but immediately begins
+ processing it normally, starting with the first rule in the
+ program. *Note The `next' Statement: Next Statement.
+
+`getline VAR'
+ This form of `getline' reads a record into the variable VAR. This
+ is useful when you want your program to read the next record from
+ the current input file, but you don't want to subject the record
+ to the normal input processing.
+
+ For example, suppose the next line is a comment, or a special
+ string, and you want to read it, but you must make certain that it
+ won't trigger any rules. This version of `getline' allows you to
+ read that line and store it in a variable so that the main
+ read-a-line-and-check-each-rule loop of `awk' never sees it.
+
+ The following example swaps every two lines of input. For
+ example, given:
+
+ wan
+ tew
+ free
+ phore
+
+ it outputs:
+
+ tew
+ wan
+ phore
+ free
+
+ Here's the program:
+
+ awk '{
+ if ((getline tmp) > 0) {
+ print tmp
+ print $0
+ } else
+ print $0
+ }'
+
+ The `getline' function used in this way sets only the variables
+ `NR' and `FNR' (and of course, VAR). The record is not split into
+ fields, so the values of the fields (including `$0') and the value
+ of `NF' do not change.
+
+`getline < FILE'
+ This form of the `getline' function takes its input from the file
+ FILE. Here FILE is a string-valued expression that specifies the
+ file name. `< FILE' is called a "redirection" since it directs
+ input to come from a different place.
+
+ This form is useful if you want to read your input from a
+ particular file, instead of from the main input stream. For
+ example, the following program reads its input record from the
+ file `foo.input' when it encounters a first field with a value
+ equal to 10 in the current input file.
+
+ awk '{
+ if ($1 == 10) {
+ getline < "foo.input"
+ print
+ } else
+ print
+ }'
+
+ Since the main input stream is not used, the values of `NR' and
+ `FNR' are not changed. But the record read is split into fields in
+ the normal manner, so the values of `$0' and other fields are
+ changed. So is the value of `NF'.
+
+ This does not cause the record to be tested against all the
+ patterns in the `awk' program, in the way that would happen if the
+ record were read normally by the main processing loop of `awk'.
+ However the new record is tested against any subsequent rules,
+ just as when `getline' is used without a redirection.
+
+`getline VAR < FILE'
+ This form of the `getline' function takes its input from the file
+ FILE and puts it in the variable VAR. As above, FILE is a
+ string-valued expression that specifies the file from which to
+ read.
+
+ In this version of `getline', none of the built-in variables are
+ changed, and the record is not split into fields. The only
+ variable changed is VAR.
+
+ For example, the following program copies all the input files to
+ the output, except for records that say `@include FILENAME'. Such
+ a record is replaced by the contents of the file FILENAME.
+
+ awk '{
+ if (NF == 2 && $1 == "@include") {
+ while ((getline line < $2) > 0)
+ print line
+ close($2)
+ } else
+ print
+ }'
+
+ Note here how the name of the extra input file is not built into
+ the program; it is taken from the data, from the second field on
+ the `@include' line.
+
+ The `close' function is called to ensure that if two identical
+ `@include' lines appear in the input, the entire specified file is
+ included twice. *Note Closing Input Files and Pipes: Close Input.
+
+ One deficiency of this program is that it does not process nested
+ `@include' statements the way a true macro preprocessor would.
+
+`COMMAND | getline'
+ You can "pipe" the output of a command into `getline'. A pipe is
+ simply a way to link the output of one program to the input of
+ another. In this case, the string COMMAND is run as a shell
+ command and its output is piped into `awk' to be used as input.
+ This form of `getline' reads one record from the pipe.
+
+ For example, the following program copies input to output, except
+ for lines that begin with `@execute', which are replaced by the
+ output produced by running the rest of the line as a shell command:
+
+ awk '{
+ if ($1 == "@execute") {
+ tmp = substr($0, 10)
+ while ((tmp | getline) > 0)
+ print
+ close(tmp)
+ } else
+ print
+ }'
+
+ The `close' function is called to ensure that if two identical
+ `@execute' lines appear in the input, the command is run for each
+ one. *Note Closing Input Files and Pipes: Close Input.
+
+ Given the input:
+
+ foo
+ bar
+ baz
+ @execute who
+ bletch
+
+ the program might produce:
+
+ foo
+ bar
+ baz
+ hack ttyv0 Jul 13 14:22
+ hack ttyp0 Jul 13 14:23 (gnu:0)
+ hack ttyp1 Jul 13 14:23 (gnu:0)
+ hack ttyp2 Jul 13 14:23 (gnu:0)
+ hack ttyp3 Jul 13 14:23 (gnu:0)
+ bletch
+
+ Notice that this program ran the command `who' and printed the
+ result. (If you try this program yourself, you will get different
+ results, showing you who is logged in on your system.)
+
+ This variation of `getline' splits the record into fields, sets the
+ value of `NF' and recomputes the value of `$0'. The values of
+ `NR' and `FNR' are not changed.
+
+`COMMAND | getline VAR'
+ The output of the command COMMAND is sent through a pipe to
+ `getline' and into the variable VAR. For example, the following
+ program reads the current date and time into the variable
+ `current_time', using the `date' utility, and then prints it.
+
+ awk 'BEGIN {
+ "date" | getline current_time
+ close("date")
+ print "Report printed on " current_time
+ }'
+
+ In this version of `getline', none of the built-in variables are
+ changed, and the record is not split into fields.
+
+
+File: gawk.info, Node: Close Input, Prev: Getline, Up: Reading Files
+
+Closing Input Files and Pipes
+=============================
+
+ If the same file name or the same shell command is used with
+`getline' more than once during the execution of an `awk' program, the
+file is opened (or the command is executed) only the first time. At
+that time, the first record of input is read from that file or command.
+The next time the same file or command is used in `getline', another
+record is read from it, and so on.
+
+ This implies that if you want to start reading the same file again
+from the beginning, or if you want to rerun a shell command (rather than
+reading more output from the command), you must take special steps.
+What you must do is use the `close' function, as follows:
+
+ close(FILENAME)
+
+or
+
+ close(COMMAND)
+
+ The argument FILENAME or COMMAND can be any expression. Its value
+must exactly equal the string that was used to open the file or start
+the command--for example, if you open a pipe with this:
+
+ "sort -r names" | getline foo
+
+then you must close it with this:
+
+ close("sort -r names")
+
+ Once this function call is executed, the next `getline' from that
+file or command will reopen the file or rerun the command.
+
+ `close' returns a value of zero if the close succeeded. Otherwise,
+the value will be non-zero. In this case, `gawk' sets the variable
+`ERRNO' to a string describing the error that occurred.
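+
+   For example, after reading from the pipe shown above, you might
+check for a problem like this:
+
+     if (close("sort -r names") != 0)
+         print "close of sort pipe failed:", ERRNO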
+
+
+File: gawk.info, Node: Printing, Next: One-liners, Prev: Reading Files, Up: Top
+
+Printing Output
+***************
+
+ One of the most common things that actions do is to output or "print"
+some or all of the input. For simple output, use the `print'
+statement. For fancier formatting use the `printf' statement. Both
+are described in this chapter.
+
+* Menu:
+
+* Print:: The `print' statement.
+* Print Examples:: Simple examples of `print' statements.
+* Output Separators:: The output separators and how to change them.
+* OFMT:: Controlling Numeric Output With `print'.
+* Printf:: The `printf' statement.
+* Redirection:: How to redirect output to multiple
+ files and pipes.
+* Special Files:: File name interpretation in `gawk'.
+ `gawk' allows access to
+ inherited file descriptors.
+
+
+File: gawk.info, Node: Print, Next: Print Examples, Prev: Printing, Up: Printing
+
+The `print' Statement
+=====================
+
+ The `print' statement does output with simple, standardized
+formatting. You specify only the strings or numbers to be printed, in a
+list separated by commas. They are output, separated by single spaces,
+followed by a newline. The statement looks like this:
+
+ print ITEM1, ITEM2, ...
+
+The entire list of items may optionally be enclosed in parentheses. The
+parentheses are necessary if any of the item expressions uses a
+relational operator; otherwise it could be confused with a redirection
+(*note Redirecting Output of `print' and `printf': Redirection.). The
+relational operators are `==', `!=', `<', `>', `>=', `<=', `~' and `!~'
+(*note Comparison Expressions: Comparison Ops.).
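+
+   For example, in the statement:
+
+     print $1 > $2
+
+the `>' is taken as a redirection, so the value of `$1' is written to a
+file whose name is the value of `$2'. By contrast,
+
+     print ($1 > $2)
+
+prints the result of the comparison, which is either 1 or 0.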
+
+ The items printed can be constant strings or numbers, fields of the
+current record (such as `$1'), variables, or any `awk' expressions.
+The `print' statement is completely general for computing *what* values
+to print. With two exceptions, you cannot specify *how* to print
+them--how many columns, whether to use exponential notation or not, and
+so on. (*Note Output Separators::, and *Note Controlling Numeric
+Output with `print': OFMT.) For that, you need the `printf' statement
+(*note Using `printf' Statements for Fancier Printing: Printf.).
+
+ The simple statement `print' with no items is equivalent to `print
+$0': it prints the entire current record. To print a blank line, use
+`print ""', where `""' is the null, or empty, string.
+
+ To print a fixed piece of text, use a string constant such as
+`"Hello there"' as one item. If you forget to use the double-quote
+characters, your text will be taken as an `awk' expression, and you
+will probably get an error. Keep in mind that a space is printed
+between any two items.
+
+ Most often, each `print' statement makes one line of output. But it
+isn't limited to one line. If an item value is a string that contains a
+newline, the newline is output along with the rest of the string. A
+single `print' can make any number of lines this way.
+
+
+File: gawk.info, Node: Print Examples, Next: Output Separators, Prev: Print, Up: Printing
+
+Examples of `print' Statements
+==============================
+
+ Here is an example of printing a string that contains embedded
+newlines:
+
+ awk 'BEGIN { print "line one\nline two\nline three" }'
+
+produces output like this:
+
+ line one
+ line two
+ line three
+
+ Here is an example that prints the first two fields of each input
+record, with a space between them:
+
+ awk '{ print $1, $2 }' inventory-shipped
+
+Its output looks like this:
+
+ Jan 13
+ Feb 15
+ Mar 15
+ ...
+
+ A common mistake in using the `print' statement is to omit the comma
+between two items. This often has the effect of making the items run
+together in the output, with no space. The reason for this is that
+juxtaposing two string expressions in `awk' means to concatenate them.
+For example, without the comma:
+
+ awk '{ print $1 $2 }' inventory-shipped
+
+prints:
+
+ Jan13
+ Feb15
+ Mar15
+ ...
+
+ Neither example's output makes much sense to someone unfamiliar with
+the file `inventory-shipped'. A heading line at the beginning would
+make it clearer. Let's add some headings to our table of months (`$1')
+and green crates shipped (`$2'). We do this using the `BEGIN' pattern
+(*note `BEGIN' and `END' Special Patterns: BEGIN/END.) to force the
+headings to be printed only once:
+
+ awk 'BEGIN { print "Month Crates"
+ print "----- ------" }
+ { print $1, $2 }' inventory-shipped
+
+Did you already guess what happens? This program prints the following:
+
+ Month Crates
+ ----- ------
+ Jan 13
+ Feb 15
+ Mar 15
+ ...
+
+The headings and the table data don't line up! We can fix this by
+printing some spaces between the two fields:
+
+ awk 'BEGIN { print "Month Crates"
+ print "----- ------" }
+ { print $1, " ", $2 }' inventory-shipped
+
+ You can imagine that this way of lining up columns can get pretty
+complicated when you have many columns to fix. Counting spaces for two
+or three columns can be simple, but with more than that it is easy to
+get "lost". This is why the `printf' statement was created
+(*note Using `printf' Statements for Fancier Printing: Printf.); one of
+its specialties is lining up columns of data.
+