Diffstat (limited to 'gawk.info-2')
-rw-r--r-- gawk.info-2 1236
1 files changed, 0 insertions, 1236 deletions
diff --git a/gawk.info-2 b/gawk.info-2
deleted file mode 100644
index 954b4309..00000000
--- a/gawk.info-2
+++ /dev/null
@@ -1,1236 +0,0 @@
-This is Info file gawk.info, produced by Makeinfo-1.54 from the input
-file gawk.texi.
-
- This file documents `awk', a program that you can use to select
-particular records in a file and perform operations upon them.
-
- This is Edition 0.15 of `The GAWK Manual',
-for the 2.15 version of the GNU implementation
-of AWK.
-
- Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
-
- Permission is granted to make and distribute verbatim copies of this
-manual provided the copyright notice and this permission notice are
-preserved on all copies.
-
- Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided that
-the entire resulting derived work is distributed under the terms of a
-permission notice identical to this one.
-
- Permission is granted to copy and distribute translations of this
-manual into another language, under the above conditions for modified
-versions, except that this permission notice may be stated in a
-translation approved by the Foundation.
-
-
-File: gawk.info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started
-
-`awk' Statements versus Lines
-=============================
-
- Most often, each line in an `awk' program is a separate statement or
-separate rule, like this:
-
- awk '/12/ { print $0 }
-      /21/ { print $0 }' BBS-list inventory-shipped
-
- But sometimes statements can be more than one line, and lines can
-contain several statements. You can split a statement into multiple
-lines by inserting a newline after any of the following:
-
- , { ? : || && do else
-
-A newline at any other point is considered the end of the statement.
-(Splitting lines after `?' and `:' is a minor `gawk' extension. The
-`?' and `:' referred to here are the operators of the three-operand
-conditional expression described in *note Conditional Expressions: Conditional Exp.)
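A minimal sketch of this rule, assuming a POSIX-compatible `awk' on the PATH; the three input lines are made up for illustration. The condition is broken after `&&' and the `print' list after `,', so no backslash is needed:

```shell
# The newlines after "&&" and after "," do not end the statement, so awk
# reads this as one rule whose condition and print list span several lines.
out=$(printf 'a 1\nb 2\nc 3\n' | awk '$2 > 1 &&
    $1 != "c" { print $1,
                      $2 }')
printf '%s\n' "$out"    # prints: b 2
```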
-
- If you would like to split a single statement into two lines at a
-point where a newline would terminate it, you can "continue" it by
-ending the first line with a backslash character, `\'. This is allowed
-absolutely anywhere in the statement, even in the middle of a string or
-regular expression. For example:
-
- awk '/This program is too long, so continue it\
- on the next line/ { print $1 }'
-
-We have generally not used backslash continuation in the sample
-programs in this manual. Since in `gawk' there is no limit on the
-length of a line, it is never strictly necessary; it just makes
-programs prettier. We have preferred to make them even more pretty by
-keeping the statements short. Backslash continuation is most useful
-when your `awk' program is in a separate source file, instead of typed
-in on the command line. You should also note that many `awk'
-implementations are more picky about where you may use backslash
-continuation. For maximal portability of your `awk' programs, it is
-best not to split your lines in the middle of a regular expression or a
-string.
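A sketch of backslash continuation in a separate program file, where it is most useful. The temporary file stands in for a hypothetical program file, and `mktemp' is assumed to be available:

```shell
# Create a temporary program file (a stand-in for a real .awk file).
prog=$(mktemp)
cat > "$prog" <<'EOF'
{ total = $1 + \
          $2
  print total }
EOF
# The trailing backslash continues the assignment onto the next line.
out=$(printf '3 4\n10 20\n' | awk -f "$prog")
rm -f "$prog"
printf '%s\n' "$out"    # prints 7, then 30
```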
-
- *Warning: backslash continuation does not work as described above
-with the C shell.* Continuation with backslash works for `awk'
-programs in files, and also for one-shot programs *provided* you are
-using a POSIX-compliant shell, such as the Bourne shell or the
-Bourne-again shell. But the C shell used on Berkeley Unix behaves
-differently! There, you must use two backslashes in a row, followed by
-a newline.
-
- When `awk' statements within one rule are short, you might want to
-put more than one of them on a line. You do this by separating the
-statements with a semicolon, `;'. This also applies to the rules
-themselves. Thus, the previous program could have been written:
-
- /12/ { print $0 } ; /21/ { print $0 }
-
-*Note:* the requirement that rules on the same line must be separated
-with a semicolon is a recent change in the `awk' language; it was done
-for consistency with the treatment of statements within an action.
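A short check of both uses of the semicolon on made-up input: two rules on one line, and several statements inside one action. A record matching both patterns is printed once by each rule:

```shell
# Two rules on one line, separated by ";".
out=$(printf '12 ab\n21 cd\n33 ef\n1221 gh\n' |
      awk '/12/ { print $0 } ; /21/ { print $0 }')
# Several statements in one action, also separated by ";".
out2=$(echo 'x y' | awk '{ a = $1; b = $2; print b, a }')
printf '%s\n' "$out" "$out2"
```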
-
-
-File: gawk.info, Node: When, Prev: Statements/Lines, Up: Getting Started
-
-When to Use `awk'
-=================
-
- You might wonder how `awk' can be useful for you. Using additional
-utility programs, more advanced patterns, field separators, arithmetic
-statements, and other selection criteria, you can produce much more
-complex output. The `awk' language is very useful for producing
-reports from large amounts of raw data, such as summarizing information
-from the output of other utility programs like `ls'. (*Note A More
-Complex Example: More Complex.)
-
- Programs written with `awk' are usually much smaller than they would
-be in other languages. This makes `awk' programs easy to compose and
-use. Often `awk' programs can be quickly composed at your terminal,
-used once, and thrown away. Since `awk' programs are interpreted, you
-can avoid the usually lengthy edit-compile-test-debug cycle of software
-development.
-
- Complex programs have been written in `awk', including a complete
-retargetable assembler for 8-bit microprocessors (*note Glossary::., for
-more information) and a microcode assembler for a special purpose Prolog
-computer. However, `awk''s capabilities are strained by tasks of such
-complexity.
-
- If you find yourself writing `awk' scripts of more than, say, a few
-hundred lines, you might consider using a different programming
-language. Emacs Lisp is a good choice if you need sophisticated string
-or pattern matching capabilities. The shell is also good at string and
-pattern matching; in addition, it allows powerful use of the system
-utilities. More conventional languages, such as C, C++, and Lisp, offer
-better facilities for system programming and for managing the complexity
-of large programs. Programs in these languages may require more lines
-of source code than the equivalent `awk' programs, but they are easier
-to maintain and usually run more efficiently.
-
-
-File: gawk.info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top
-
-Reading Input Files
-*******************
-
- In the typical `awk' program, all input is read either from the
-standard input (by default the keyboard, but often a pipe from another
-command) or from files whose names you specify on the `awk' command
-line. If you specify input files, `awk' reads them in order, reading
-all the data from one before going on to the next. The name of the
-current input file can be found in the built-in variable `FILENAME'
-(*note Built-in Variables::.).
-
- The input is read in units called records, and processed by the
-rules one record at a time. By default, each record is one line. Each
-record is split automatically into fields, to make it more convenient
-for a rule to work on its parts.
-
- On rare occasions you will need to use the `getline' command, which
-can do explicit input from any number of files (*note Explicit Input
-with `getline': Getline.).
-
-* Menu:
-
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Non-Constant Fields:: Non-constant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Constant Size:: Reading constant width data.
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program control
- using the `getline' function.
-* Close Input:: Closing an input file (so you can read from
- the beginning once more).
-
-
-File: gawk.info, Node: Records, Next: Fields, Prev: Reading Files, Up: Reading Files
-
-How Input is Split into Records
-===============================
-
- The `awk' language divides its input into records and fields.
-Records are separated by a character called the "record separator". By
-default, the record separator is the newline character, defining a
-record to be a single line of text.
-
- Sometimes you may want to use a different character to separate your
-records. You can use a different character by changing the built-in
-variable `RS'. The value of `RS' is a string that says how to separate
-records; the default value is `"\n"', the string containing just a
-newline character. This is why records are, by default, single lines.
-
- `RS' can have any string as its value, but only the first character
-of the string is used as the record separator. The other characters are
-ignored. `RS' is exceptional in this regard; `awk' uses the full value
-of all its other built-in variables.
-
- You can change the value of `RS' in the `awk' program with the
-assignment operator, `=' (*note Assignment Expressions: Assignment
-Ops.). The new record-separator character should be enclosed in
-quotation marks to make a string constant. Often the right time to do
-this is at the beginning of execution, before any input has been
-processed, so that the very first record will be read with the proper
-separator. To do this, use the special `BEGIN' pattern (*note `BEGIN'
-and `END' Special Patterns: BEGIN/END.). For example:
-
- awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
-
-changes the value of `RS' to `"/"', before reading any input. This is
-a string whose first character is a slash; as a result, records are
-separated by slashes. Then the input file is read, and the second rule
-in the `awk' program (the action with no pattern) prints each record.
-Since each `print' statement adds a newline at the end of its output,
-the effect of this `awk' program is to copy the input with each slash
-changed to a newline.
-
- Another way to change the record separator is on the command line,
-using the variable-assignment feature (*note Invoking `awk': Command
-Line.).
-
- awk '{ print $0 }' RS="/" BBS-list
-
-This sets `RS' to `/' before processing `BBS-list'.
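Both ways of setting `RS' can be compared on a small sample file (the data is hypothetical). Because the file has no trailing newline, the last record `c' is ended by end of file:

```shell
tmp=$(mktemp)
printf 'a/b/c' > "$tmp"
# Set RS in a BEGIN rule ...
out1=$(awk 'BEGIN { RS = "/" } ; { print $0 }' "$tmp")
# ... or with a command-line assignment placed before the file name.
out2=$(awk '{ print $0 }' RS="/" "$tmp")
rm -f "$tmp"
printf '%s\n' "$out1" "$out2"    # each prints a, b and c on separate lines
```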
-
- Reaching the end of an input file terminates the current input
-record, even if the last character in the file is not the character in
-`RS'.
-
- The empty string, `""' (a string of no characters), has a special
-meaning as the value of `RS': it means that records are separated only
-by blank lines. *Note Multiple-Line Records: Multiple Line, for more
-details.
-
- The `awk' utility keeps track of the number of records that have
-been read so far from the current input file. This value is stored in a
-built-in variable called `FNR'. It is reset to zero when a new file is
-started. Another built-in variable, `NR', is the total number of input
-records read so far from all files. It starts at zero but is never
-automatically reset to zero.
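A sketch of the difference between `FNR' and `NR', using two small hypothetical files read in command-line order:

```shell
dir=$(mktemp -d)
printf 'one\n' > "$dir/a.txt"
printf 'two\nthree\n' > "$dir/b.txt"
# FNR restarts for each file; NR keeps counting across files.
out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' a.txt b.txt)
rm -rf "$dir"
printf '%s\n' "$out"
```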
-
- If you change the value of `RS' in the middle of an `awk' run, the
-new value is used to delimit subsequent records, but neither the
-record currently being processed nor any records already processed
-are affected.
-
-
-File: gawk.info, Node: Fields, Next: Non-Constant Fields, Prev: Records, Up: Reading Files
-
-Examining Fields
-================
-
- When `awk' reads an input record, the record is automatically
-separated or "parsed" by the interpreter into chunks called "fields".
-By default, fields are separated by whitespace, like words in a line.
-Whitespace in `awk' means any string of one or more spaces and/or tabs;
-other characters such as newline, formfeed, and so on, that are
-considered whitespace by other languages are *not* considered
-whitespace by `awk'.
-
- The purpose of fields is to make it more convenient for you to refer
-to these pieces of the record. You don't have to use them--you can
-operate on the whole record if you wish--but fields are what make
-simple `awk' programs so powerful.
-
- To refer to a field in an `awk' program, you use a dollar-sign, `$',
-followed by the number of the field you want. Thus, `$1' refers to the
-first field, `$2' to the second, and so on. For example, suppose the
-following is a line of input:
-
- This seems like a pretty nice example.
-
-Here the first field, or `$1', is `This'; the second field, or `$2', is
-`seems'; and so on. Note that the last field, `$7', is `example.'.
-Because there is no space between the `e' and the `.', the period is
-considered part of the seventh field.
-
- No matter how many fields there are, the last field in a record can
-be represented by `$NF'. So, in the example above, `$NF' would be the
-same as `$7', which is `example.'. Why this works is explained below
-(*note Non-constant Field Numbers: Non-Constant Fields.). If you try
-to refer to a field beyond the last one, such as `$8' when the record
-has only 7 fields, you get the empty string.
-
- Plain `NF', with no `$', is a built-in variable whose value is the
-number of fields in the current record.
-
- `$0', which looks like an attempt to refer to the zeroth field, is a
-special case: it represents the whole input record. This is what you
-would use if you weren't interested in fields.
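The example line above can be checked directly; this also shows that a field past the last one is empty:

```shell
line='This seems like a pretty nice example.'
# NF is the field count; $NF is the last field; $8 does not exist here.
out=$(printf '%s\n' "$line" |
      awk '{ print NF; print $1; print $NF; print "[" $8 "]" }')
printf '%s\n' "$out"
```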
-
- Here are some more examples:
-
- awk '$1 ~ /foo/ { print $0 }' BBS-list
-
-This example prints each record in the file `BBS-list' whose first
-field contains the string `foo'. The operator `~' is called a
-"matching operator" (*note Comparison Expressions: Comparison Ops.); it
-tests whether a string (here, the field `$1') matches a given regular
-expression.
-
- By contrast, the following example:
-
- awk '/foo/ { print $1, $NF }' BBS-list
-
-looks for `foo' in *the entire record* and prints the first field and
-the last field for each input record containing a match.
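The two programs can be compared on a few made-up records; only the first record has `foo' in its first field:

```shell
data='foo bar
baz foo
qux quux'
# Match against field 1 only ...
out1=$(printf '%s\n' "$data" | awk '$1 ~ /foo/ { print $0 }')
# ... versus matching anywhere in the record.
out2=$(printf '%s\n' "$data" | awk '/foo/ { print $1, $NF }')
printf '%s\n' "$out1" "$out2"
```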
-
-
-File: gawk.info, Node: Non-Constant Fields, Next: Changing Fields, Prev: Fields, Up: Reading Files
-
-Non-constant Field Numbers
-==========================
-
- The number of a field does not need to be a constant. Any
-expression in the `awk' language can be used after a `$' to refer to a
-field. The value of the expression specifies the field number. If the
-value is a string, rather than a number, it is converted to a number.
-Consider this example:
-
- awk '{ print $NR }'
-
-Recall that `NR' is the number of records read so far: 1 in the first
-record, 2 in the second, etc. So this example prints the first field
-of the first record, the second field of the second record, and so on.
-For the twentieth record, field number 20 is printed; most likely, the
-record has fewer than 20 fields, so this prints a blank line.
-
- Here is another example of using expressions as field numbers:
-
- awk '{ print $(2*2) }' BBS-list
-
- The `awk' language must evaluate the expression `(2*2)' and use its
-value as the number of the field to print. The `*' sign represents
-multiplication, so the expression `2*2' evaluates to 4. The
-parentheses are used so that the multiplication is done before the `$'
-operation; they are necessary whenever there is a binary operator in
-the field-number expression. This example, then, prints the hours of
-operation (the fourth field) for every line of the file `BBS-list'.
-
- If the field number you compute is zero, you get the entire record.
-Thus, `$(2-2)' has the same value as `$0'. Negative field numbers are
-not allowed.
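A quick check of computed field numbers on made-up input: `$NR' selects a different field on each line, and `$(2-2)' yields the whole record:

```shell
# Line 1 prints field 1, line 2 prints field 2, line 3 prints field 3.
out1=$(printf 'a b c\nd e f\ng h i\n' | awk '{ print $NR }')
# Parenthesized expressions as field numbers.
out2=$(echo 'w x y z' | awk '{ print $(2*2); print $(2-2) }')
printf '%s\n' "$out1" "$out2"
```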
-
- The number of fields in the current record is stored in the built-in
-variable `NF' (*note Built-in Variables::.). The expression `$NF' is
-not a special feature: it is the direct consequence of evaluating `NF'
-and using its value as a field number.
-
-
-File: gawk.info, Node: Changing Fields, Next: Field Separators, Prev: Non-Constant Fields, Up: Reading Files
-
-Changing the Contents of a Field
-================================
-
- You can change the contents of a field as seen by `awk' within an
-`awk' program; this changes what `awk' perceives as the current input
-record. (The actual input is untouched: `awk' never modifies the input
-file.)
-
- Consider this example:
-
- awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped
-
-The `-' sign represents subtraction, so this program reassigns field
-three, `$3', to be the value of field two minus ten, `$2 - 10'. (*Note
-Arithmetic Operators: Arithmetic Ops.) Then field two, and the new
-value for field three, are printed.
-
- In order for this to work, the text in field `$2' must make sense as
-a number; the string of characters must be converted to a number in
-order for the computer to do arithmetic on it. The number resulting
-from the subtraction is converted back to a string of characters which
-then becomes field three. *Note Conversion of Strings and Numbers:
-Conversion.
-
- When you change the value of a field (as perceived by `awk'), the
-text of the input record is recalculated to contain the new field where
-the old one was. Therefore, `$0' changes to reflect the altered field.
-Thus,
-
- awk '{ $2 = $2 - 10; print $0 }' inventory-shipped
-
-prints a copy of the input file, with 10 subtracted from the second
-field of each line.
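Both assignments can be tried on one sample line in the layout of `inventory-shipped' (a month name followed by four counts; the numbers here are only illustrative):

```shell
rec='Jan 13 25 15 115'
# Assign to $3 and print two fields.
out1=$(echo "$rec" | awk '{ $3 = $2 - 10; print $2, $3 }')
# Assign to $2; $0 is rebuilt with the new value in place.
out2=$(echo "$rec" | awk '{ $2 = $2 - 10; print $0 }')
printf '%s\n' "$out1" "$out2"
```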
-
- You can also assign contents to fields that are out of range. For
-example:
-
- awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped
-
-We've just created `$6', whose value is the sum of fields `$2', `$3',
-`$4', and `$5'. The `+' sign represents addition. For the file
-`inventory-shipped', `$6' represents the total number of parcels
-shipped for a particular month.
-
- Creating a new field changes the internal `awk' copy of the current
-input record--the value of `$0'. Thus, if you do `print $0' after
-adding a field, the record printed includes the new field, with the
-appropriate number of field separators between it and the previously
-existing fields.
-
- This recomputation affects and is affected by several features not
-yet discussed, in particular, the "output field separator", `OFS',
-which is used to separate the fields (*note Output Separators::.), and
-`NF' (the number of fields; *note Examining Fields: Fields.). For
-example, the value of `NF' is set to the number of the highest field
-you create.
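A sketch of creating fields beyond `NF', on illustrative data. In the second command `OFS' is set to `:' so that the empty fields created between `$2' and `$8' are visible:

```shell
rec='Jan 13 25 15 115'
# Creating $6 raises NF to 6 and appends the sum to $0.
out1=$(echo "$rec" | awk '{ $6 = ($5 + $4 + $3 + $2); print $6; print NF; print $0 }')
# Creating $8 on a two-field record also creates empty fields 3 through 7.
out2=$(echo 'a b' | awk '{ OFS = ":"; $8 = "x"; print NF; print $0 }')
printf '%s\n' "$out1" "$out2"
```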
-
- Note, however, that merely *referencing* an out-of-range field does
-*not* change the value of either `$0' or `NF'. Referencing an
-out-of-range field merely produces a null string. For example:
-
- if ($(NF+1) != "")
-     print "can't happen"
- else
-     print "everything is normal"
-
-should print `everything is normal', because `NF+1' is certain to be
-out of range. (*Note The `if' Statement: If Statement, for more
-information about `awk''s `if-else' statements.)
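This behavior can be checked by referencing a field well past the end of the record and then printing `NF' and `$0', which are unchanged:

```shell
out=$(echo 'a b c' | awk '{
    x = $(NF + 5)          # a reference only; no field is created
    print NF
    print $0
    print (x == "" ? "empty" : "non-empty")
}')
printf '%s\n' "$out"
```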
-
- It is important to note that assigning to an existing field will
-change the value of `$0', but will not change the value of `NF', even
-when you assign the null string to that field. For example:
-
- echo a b c d | awk '{ OFS = ":"; $2 = "" ; print ; print NF }'
-
-prints
-
- a::c:d
- 4
-
-The field is still there; it just has an empty value. You can tell
-because there are two colons in a row.
-
-
-File: gawk.info, Node: Field Separators, Next: Constant Size, Prev: Changing Fields, Up: Reading Files
-
-Specifying how Fields are Separated
-===================================
-
- (This section is rather long; it describes one of the most
-fundamental operations in `awk'. If you are a novice with `awk', we
-recommend that you re-read this section after you have studied the
-section on regular expressions, *Note Regular Expressions as Patterns:
-Regexp.)
-
- The way `awk' splits an input record into fields is controlled by
-the "field separator", which is a single character or a regular
-expression. `awk' scans the input record for matches for the
-separator; the fields themselves are the text between the matches. For
-example, if the field separator is `oo', then the following line:
-
- moo goo gai pan
-
-would be split into three fields: `m', ` g' and ` gai pan'.
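The `oo' example can be verified by printing each field between brackets, which makes the leading spaces visible:

```shell
out=$(echo 'moo goo gai pan' | awk 'BEGIN { FS = "oo" }
    { print NF; for (i = 1; i <= NF; i++) print "[" $i "]" }')
printf '%s\n' "$out"
```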
-
- The field separator is represented by the built-in variable `FS'.
-Shell programmers take note! `awk' does not use the name `IFS' which
-is used by the shell.
-
- You can change the value of `FS' in the `awk' program with the
-assignment operator, `=' (*note Assignment Expressions: Assignment
-Ops.). Often the right time to do this is at the beginning of
-execution, before any input has been processed, so that the very first
-record will be read with the proper separator. To do this, use the
-special `BEGIN' pattern (*note `BEGIN' and `END' Special Patterns:
-BEGIN/END.). For example, here we set the value of `FS' to the string
-`","':
-
- awk 'BEGIN { FS = "," } ; { print $2 }'
-
-Given the input line,
-
- John Q. Smith, 29 Oak St., Walamazoo, MI 42139
-
-this `awk' program extracts the string ` 29 Oak St.'.
-
- Sometimes your input data will contain separator characters that
-don't separate fields the way you thought they would. For instance, the
-person's name in the example we've been using might have a title or
-suffix attached, such as `John Q. Smith, LXIX'. From input containing
-such a name:
-
- John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
-
-the previous sample program would extract ` LXIX', instead of ` 29 Oak
-St.'. If you were expecting the program to print the address, you
-would be surprised. So choose your data layout and separator
-characters carefully to prevent such problems.
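Both address lines from the text can be run through the same program; the brackets in the final `printf' only make the leading spaces visible:

```shell
prog='BEGIN { FS = "," } ; { print $2 }'
out1=$(echo 'John Q. Smith, 29 Oak St., Walamazoo, MI 42139' | awk "$prog")
out2=$(echo 'John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139' | awk "$prog")
printf '[%s]\n' "$out1" "$out2"    # [ 29 Oak St.] then [ LXIX]
```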
-
- As you know, by default, fields are separated by whitespace sequences
-(spaces and tabs), not by single spaces: two spaces in a row do not
-delimit an empty field. The default value of the field separator is a
-string `" "' containing a single space. If this value were interpreted
-in the usual way, each space character would separate fields, so two
-spaces in a row would make an empty field between them. The reason
-this does not happen is that a single space as the value of `FS' is a
-special case: it is taken to specify the default manner of delimiting
-fields.
-
- If `FS' is any other single character, such as `","', then each
-occurrence of that character separates two fields. Two consecutive
-occurrences delimit an empty field. If the character occurs at the
-beginning or the end of the line, that too delimits an empty field. The
-space character is the only single character which does not follow these
-rules.
-
- More generally, the value of `FS' may be a string containing any
-regular expression. Then each match in the record for the regular
-expression separates fields. For example, the assignment:
-
- FS = ", \t"
-
-makes every area of an input line that consists of a comma followed by a
-space and a tab into a field separator. (`\t' stands for a tab.)
-
- For a less trivial example of a regular expression, suppose you want
-single spaces to separate fields the way single commas were used above.
-You can set `FS' to `"[ ]"'. This regular expression matches a single
-space and nothing else.
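A sketch of both regular-expression separators on made-up input. The first treats comma, space, tab as a single separator; the second makes two adjacent spaces delimit an empty field:

```shell
# ", \t" only matches the three characters together, so the lone comma stays.
out1=$(printf 'a, \tb,c\n' | awk 'BEGIN { FS = ", \t" } { print NF; print $2 }')
# "[ ]" matches exactly one space.
out2=$(printf 'a  b\n' | awk 'BEGIN { FS = "[ ]" } { print NF; print "[" $2 "]" }')
printf '%s\n' "$out1" "$out2"
```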
-
- `FS' can be set on the command line. You use the `-F' argument to
-do so. For example:
-
- awk -F, 'PROGRAM' INPUT-FILES
-
-sets `FS' to be the `,' character. Notice that the argument uses a
-capital `F'. Contrast this with `-f', which specifies a file
-containing an `awk' program. Case is significant in command options:
-the `-F' and `-f' options have nothing to do with each other. You can
-use both options at the same time to set the `FS' variable *and* get an
-`awk' program from a file.
-
- The value used for the argument to `-F' is processed in exactly the
-same way as assignments to the built-in variable `FS'. This means that
-if the field separator contains special characters, they must be escaped
-appropriately. For example, to use a `\' as the field separator, you
-would have to type:
-
- # same as FS = "\\"
- awk -F\\\\ '...' files ...
-
-Since `\' is used for quoting in the shell, `awk' will see `-F\\'.
-Then `awk' processes the `\\' for escape characters (*note Constant
-Expressions: Constants.), finally yielding a single `\' to be used for
-the field separator.
-
- As a special case, in compatibility mode (*note Invoking `awk':
-Command Line.), if the argument to `-F' is `t', then `FS' is set to the
-tab character. (This is because if you type `-F\t', without the quotes,
-at the shell, the `\' gets deleted, so `awk' figures that you really
-want your fields to be separated with tabs, and not `t's. Use `-v
-FS="t"' on the command line if you really do want to separate your
-fields with `t's.)
-
- For example, let's use an `awk' program file called `baud.awk' that
-contains the pattern `/300/', and the action `print $1'. Here is the
-program:
-
- /300/ { print $1 }
-
- Let's also set `FS' to be the `-' character, and run the program on
-the file `BBS-list'. The following command prints a list of the names
-of the bulletin boards that operate at 300 baud and the first three
-digits of their phone numbers:
-
- awk -F- -f baud.awk BBS-list
-
-It produces this output:
-
- aardvark 555
- alpo
- barfly 555
- bites 555
- camelot 555
- core 555
- fooey 555
- foot 555
- macfoo 555
- sdace 555
- sabafoo 555
-
-Note the second line of output. If you check the original file, you
-will see that the second line looked like this:
-
- alpo-net 555-3412 2400/1200/300 A
-
- The `-' as part of the system's name was used as the field
-separator, instead of the `-' in the phone number that was originally
-intended. This demonstrates why you have to be careful in choosing
-your field and record separators.
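The effect can be reproduced with a temporary file standing in for `baud.awk'. The `alpo-net' line is taken from the text; the `aardvark' line is a hypothetical entry in the same layout:

```shell
prog=$(mktemp)
printf '%s\n' '/300/ { print $1 }' > "$prog"    # the baud.awk program
out=$(printf '%s\n' 'aardvark 555-5553 1200/300 B' \
                    'alpo-net 555-3412 2400/1200/300 A' |
      awk -F- -f "$prog")
rm -f "$prog"
printf '%s\n' "$out"    # the dash in "alpo-net" cuts the first field short
```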
-
- The following program searches the system password file, and prints
-the entries for users who have no password:
-
- awk -F: '$2 == ""' /etc/passwd
-
-Here we use the `-F' option on the command line to set the field
-separator. Note that fields in `/etc/passwd' are separated by colons.
-The second field represents a user's encrypted password, but if the
-field is empty, that user has no password.
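To try this without touching the real password file, the sketch below feeds two hypothetical entries in `/etc/passwd' format; only `guest' has an empty second field:

```shell
out=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
                    'guest::1000:1000::/home/guest:/bin/sh' |
      awk -F: '$2 == ""')
printf '%s\n' "$out"
```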
-
- According to the POSIX standard, `awk' is supposed to behave as if
-each record is split into fields at the time that it is read. In
-particular, this means that you can change the value of `FS' after a
-record is read, but before any of the fields are referenced. The value
-of the fields (i.e., how they were split) should reflect the old value
-of `FS', not the new one.
-
- However, many implementations of `awk' do not do this. Instead,
-they defer splitting the fields until a field reference actually
-happens, using the *current* value of `FS'! This behavior can be
-difficult to diagnose. The following example illustrates the results of
-the two methods. (The `sed' command prints just the first line of
-`/etc/passwd'.)
-
- sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'
-
-will usually print
-
- root
-
-on an incorrect implementation of `awk', while `gawk' will print
-something like
-
- root:nSijPlPhZZwgE:0:0:Root:/:
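A portable way to avoid depending on which behavior a given `awk' has is to set `FS' before the record is read, as in a `BEGIN' rule. Assigning `$0' to itself also forces the record to be split again with the current `FS'. The sample line is illustrative:

```shell
rec='root:x:0:0:Root:/:'
# FS is already ":" when the first record is read.
out1=$(echo "$rec" | awk 'BEGIN { FS = ":" } { print $1 }')
# Re-assigning $0 re-splits the record using the new FS.
out2=$(echo "$rec" | awk '{ FS = ":"; $0 = $0; print $1 }')
printf '%s\n' "$out1" "$out2"
```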
-
- There is an important difference between the two cases of `FS = " "'
-(a single blank) and `FS = "[ \t]+"' (which is a regular expression
-matching one or more blanks or tabs). For both values of `FS', fields
-are separated by runs of blanks and/or tabs. However, when the value of
-`FS' is `" "', `awk' will strip leading and trailing whitespace from
-the record, and then decide where the fields are.
-
- For example, the following expression prints `b':
-
- echo ' a b c d ' | awk '{ print $2 }'
-
-However, the following prints `a':
-
- echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" } ; { print $2 }'
-
-In this case, the first field is null.
-
- The stripping of leading and trailing whitespace also comes into
-play whenever `$0' is recomputed. For instance, this pipeline
-
- echo '   a b c d' | awk '{ print; $2 = $2; print }'
-
-produces this output:
-
-    a b c d
- a b c d
-
-The first `print' statement prints the record as it was read, with
-leading whitespace intact. The assignment to `$2' rebuilds `$0' by
-concatenating `$1' through `$NF' together, separated by the value of
-`OFS'. Since the leading whitespace was ignored when finding `$1', it
-is not part of the new `$0'. Finally, the last `print' statement
-prints the new `$0'.
-
- The following table summarizes how fields are split, based on the
-value of `FS'.
-
-`FS == " "'
- Fields are separated by runs of whitespace. Leading and trailing
- whitespace are ignored. This is the default.
-
-`FS == ANY SINGLE CHARACTER'
- Fields are separated by each occurrence of the character. Multiple
- successive occurrences delimit empty fields, as do leading and
- trailing occurrences.
-
-`FS == REGEXP'
- Fields are separated by occurrences of characters that match
- REGEXP. Leading and trailing matches of REGEXP delimit empty
- fields.
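The three rows of the table can be compared by printing `NF'. The inputs are made up; the first and third commands use the same line, so the effect of stripping leading and trailing whitespace is visible:

```shell
n1=$(printf '  a  b \n' | awk '{ print NF }')                          # FS == " "
n2=$(printf ',a,,b,\n'  | awk -F, '{ print NF }')                      # single character
n3=$(printf '  a  b \n' | awk 'BEGIN { FS = "[ \t]+" } { print NF }')  # regexp
printf '%s %s %s\n' "$n1" "$n2" "$n3"    # prints: 2 5 4
```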
-
-
-File: gawk.info, Node: Constant Size, Next: Multiple Line, Prev: Field Separators, Up: Reading Files
-
-Reading Fixed-width Data
-========================
-
- (This section discusses an advanced, experimental feature. If you
-are a novice `awk' user, you may wish to skip it on the first reading.)
-
- `gawk' 2.13 introduced a new facility for dealing with fixed-width
-fields with no distinctive field separator. Data of this nature arises
-typically in one of at least two ways: the input for old FORTRAN
-programs where numbers are run together, and the output of programs
-that did not anticipate the use of their output as input for other
-programs.
-
- An example of the latter is a table where all the columns are lined
-up by the use of a variable number of spaces and *empty fields are just
-spaces*. Clearly, `awk''s normal field splitting based on `FS' will
-not work well in this case. (Although a portable `awk' program can use
-a series of `substr' calls on `$0', this is awkward and inefficient for
-a large number of fields.)
-
- The splitting of an input record into fixed-width fields is
-specified by assigning a string containing space-separated numbers to
-the built-in variable `FIELDWIDTHS'. Each number specifies the width
-of the field *including* columns between fields. If you want to ignore
-the columns between fields, you can specify the width as a separate
-field that is subsequently ignored.
-
- The following data is the output of the `w' utility. It is useful
-to illustrate the use of `FIELDWIDTHS'.
-
- 10:06pm up 21 days, 14:04, 23 users
- User     tty        login  idle   JCPU   PCPU  what
- hzuo     ttyV0     8:58pm            9      5  vi p24.tex
- hzang    ttyV3     6:37pm    50                -csh
- eklye    ttyV5     9:53pm            7      1  em thes.tex
- dportein ttyV6     8:17pm  1:47                -csh
- gierd    ttyD3    10:00pm     1                elm
- dave     ttyD4     9:47pm            4      4  w
- brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
- dave     ttyq4    26Jun9115days     46     46  wnewmail
-
- The following program takes the above input, converts the idle time
-to number of seconds and prints out the first two fields and the
-calculated idle time. (This program uses a number of `awk' features
-that haven't been introduced yet.)
-
- BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }
- NR > 2 {
-     idle = $4
-     sub(/^ */, "", idle)   # strip leading spaces
-     if (idle == "") idle = 0
-     if (idle ~ /:/) { split(idle, t, ":"); idle = t[1] * 60 + t[2] }
-     if (idle ~ /days/) { idle *= 24 * 60 * 60 }
-
-     print $1, $2, idle
- }
-
- Here is the result of running the program on the data:
-
- hzuo ttyV0 0
- hzang ttyV3 50
- eklye ttyV5 0
- dportein ttyV6 107
- gierd ttyD3 1
- dave ttyD4 0
- brent ttyp0 286
- dave ttyq4 1296000
-
- Another (possibly more practical) example of fixed-width input data
-would be the input from a deck of balloting cards. In some parts of
-the United States, voters make their choices by punching holes in
-computer cards. These cards are then processed to count the votes for
-any particular candidate or on any particular issue. Since a voter may
-choose not to vote on some issue, any column on the card may be empty.
-An `awk' program for processing such data could use the `FIELDWIDTHS'
-feature to simplify reading the data.
-
- This feature is still experimental, and will likely evolve over time.
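`FIELDWIDTHS' is specific to `gawk'. The portable alternative mentioned above, a series of `substr' calls, can be sketched as a small shell function. The name `split_fixed' and the `|' output format are hypothetical; the widths are those of the `w' example, and the two input lines are rebuilt with `printf' in that layout. Surrounding spaces are trimmed, so an empty column comes out as an empty field:

```shell
# split_fixed WIDTHS: cut each input line into fixed-width columns with
# substr(), trim the padding, and join the columns with "|".
split_fixed() {
  awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }
    {
      start = 1
      line = ""
      for (i = 1; i <= n; i++) {
        f = substr($0, start, w[i])          # i-th column, padding included
        start += w[i]
        sub(/^ +/, "", f); sub(/ +$/, "", f)  # trim surrounding spaces
        line = line (i > 1 ? "|" : "") f
      }
      print line
    }'
}

out=$(
  {
    printf '%-9s%-6s%10s%6s%7s%7s  %s\n' hzuo ttyV0 8:58pm '' 9 5 'vi p24.tex'
    printf '%-9s%-6s%10s%6s%7s%7s  %s\n' brent ttyp0 26Jun91 4:46 26:46 4:41 bash
  } | split_fixed "9 6 10 6 7 7 35"
)
printf '%s\n' "$out"
```

The empty idle column of `hzuo' survives as `||', which is exactly what `FS'-based splitting cannot express for this kind of data.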
-
-
-File: gawk.info, Node: Multiple Line, Next: Getline, Prev: Constant Size, Up: Reading Files
-
-Multiple-Line Records
-=====================
-
- In some data bases, a single line cannot conveniently hold all the
-information in one entry. In such cases, you can use multi-line
-records.
-
- The first step in doing this is to choose your data format: when
-records are not defined as single lines, how do you want to define them?
-What should separate records?
-
- One technique is to use an unusual character or string to separate
-records. For example, you could use the formfeed character (written
-`\f' in `awk', as in C) to separate them, making each record a page of
-the file. To do this, just set the variable `RS' to `"\f"' (a string
-containing the formfeed character). Any other character could equally
-well be used, as long as it won't be part of the data in a record.
-
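- Here is a minimal sketch of formfeed-separated records. It assumes a
-POSIX shell and `awk'; the function name `first_word_per_page' and the
-sample input are made up. The newlines inside each page are ordinary
-whitespace, so with the default `FS' they separate fields:
-
```shell
# Treat each formfeed-separated page as one record and print the record
# number, the field count, and the first word of each page.
first_word_per_page() {
    awk 'BEGIN { RS = "\f" } { print NR ": " NF " fields, starts with " $1 }'
}

printf 'alpha beta\ngamma\fdelta\nepsilon\n' | first_word_per_page
```
-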
- Another technique is to have blank lines separate records. By a
-special dispensation, a null string as the value of `RS' indicates that
-records are separated by one or more blank lines. If you set `RS' to
-the null string, a record always ends at the first blank line
-encountered. And the next record doesn't start until the first nonblank
-line that follows--no matter how many blank lines appear in a row, they
-are considered one record-separator. (End of file is also considered a
-record separator.)
-
- The second step is to separate the fields in the record. One way to
-do this is to put each field on a separate line: to do this, just set
-the variable `FS' to the string `"\n"'. (This simple regular
-expression matches a single newline.)
-
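- The next sketch sets `RS' to the null string and `FS' to `"\n"'
-together. It reads blank-line-separated records and treats each line as
-one field. The function name `names_and_streets' and the address data
-are made up. The run of several blank lines in the sample input still
-counts as a single record separator:
-
```shell
# Paragraph mode: blank lines separate records, and each line is one
# field.  Prints each record's first line (a name) and second (a street).
names_and_streets() {
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " / " $2 }'
}

printf 'Jane Doe\n12 Main St\n\n\n\nJohn Roe\n9 Elm Rd\n' | names_and_streets
```
-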
- Another way to separate fields is to divide each of the lines into
-fields in the normal manner. This happens by default as a result of a
-special feature: when `RS' is set to the null string, the newline
-character *always* acts as a field separator. This is in addition to
-whatever field separations result from `FS'.
-
- The original motivation for this special exception was probably to
-provide useful behavior in the default case (i.e., `FS == " "').
-This feature can be a problem if you really don't want the newline
-character to separate fields, since there is no way to prevent it.
-However, you can work around this by using the `split' function to
-break up the record manually (*note Built-in Functions for String
-Manipulation: String Functions.).
-
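- Here is a minimal sketch of the `split' workaround. It assumes a POSIX
-shell and `awk'; the function name `split_pieces' is made up. With `RS'
-set to the null string, `gawk' would report `NF' as 4 for the record
-`a:b\nc:d' even if `FS' were `":"', because the newline also separates
-fields. Calling `split' with an explicit `":"' separator leaves the
-newline inside the second piece, which gives three pieces:
-
```shell
# With RS = "" (paragraph mode) the record below is "a:b\nc:d".  split()
# with an explicit ":" separator does not treat the newline specially,
# so the pieces are "a", "b\nc" and "d".
split_pieces() {
    awk 'BEGIN { RS = "" }
         { n = split($0, parts, ":")
           print n, (parts[2] == "b\nc") }'
}

printf 'a:b\nc:d\n' | split_pieces   # prints: 3 1
```
-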
-
-File: gawk.info, Node: Getline, Next: Close Input, Prev: Multiple Line, Up: Reading Files
-
-Explicit Input with `getline'
-=============================
-
- So far we have been getting our input files from `awk''s main input
-stream--either the standard input (usually your terminal) or the files
-specified on the command line. The `awk' language has a special
-built-in command called `getline' that can be used to read input under
-your explicit control.
-
- This command is quite complex and should *not* be used by beginners.
-It is covered here because this is the chapter on input. The examples
-that follow the explanation of the `getline' command include material
-that has not been covered yet. Therefore, come back and study the
-`getline' command *after* you have reviewed the rest of this manual and
-have a good knowledge of how `awk' works.
-
- `getline' returns 1 if it finds a record, and 0 if the end of the
-file is encountered. If there is some error in getting a record, such
-as a file that cannot be opened, then `getline' returns -1. In this
-case, `gawk' sets the variable `ERRNO' to a string describing the error
-that occurred.
-
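- You can observe the return values directly. In this minimal sketch the
-path `/nonexistent/dir/file' is assumed not to exist, and `/dev/null' is
-an empty file on Unix-like systems. The function name `getline_status'
-is made up:
-
```shell
# getline returns 1 when it reads a record, 0 at end of file, and -1 on
# an error such as a file that cannot be opened.
getline_status() {
    awk -v f="$1" 'BEGIN { print (getline line < f) }'
}

getline_status /nonexistent/dir/file   # prints -1
getline_status /dev/null               # prints 0
```
-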
- In the following examples, COMMAND stands for a string value that
-represents a shell command.
-
-`getline'
- The `getline' command can be used without arguments to read input
- from the current input file. All it does in this case is read the
- next input record and split it up into fields. This is useful if
- you've finished processing the current record, but you want to do
- some special processing *right now* on the next record. Here's an
- example:
-
-     awk '{
-         if (t = index($0, "/*")) {
-             if (t > 1)
-                 tmp = substr($0, 1, t - 1)
-             else
-                 tmp = ""
-             u = index(substr($0, t + 2), "*/")
-             while (u == 0) {
-                 if ((getline) <= 0)    # end of file or error: give up
-                     exit
-                 t = -1
-                 u = index($0, "*/")
-             }
-             if (u <= length($0) - 2)
-                 $0 = tmp substr($0, t + u + 3)
-             else
-                 $0 = tmp
-         }
-         print $0
-     }'
-
- This `awk' program deletes all C-style comments, `/* ... */',
- from the input. By replacing the `print $0' with other
- statements, you could perform more complicated processing on the
- decommented input, like searching for matches of a regular
- expression. (This program has a subtle problem--can you spot it?)
-
- This form of the `getline' command sets `NF' (the number of
- fields; *note Examining Fields: Fields.), `NR' (the number of
- records read so far; *note How Input is Split into Records:
- Records.), `FNR' (the number of records read from this input
- file), and the value of `$0'.
-
- *Note:* the new value of `$0' is used in testing the patterns of
- any subsequent rules. The original value of `$0' that triggered
- the rule which executed `getline' is lost. By contrast, the
- `next' statement reads a new record but immediately begins
- processing it normally, starting with the first rule in the
- program. *Note The `next' Statement: Next Statement.
-
-`getline VAR'
- This form of `getline' reads a record into the variable VAR. This
- is useful when you want your program to read the next record from
- the current input file, but you don't want to subject the record
- to the normal input processing.
-
- For example, suppose the next line is a comment, or a special
- string, and you want to read it, but you must make certain that it
- won't trigger any rules. This version of `getline' allows you to
- read that line and store it in a variable so that the main
- read-a-line-and-check-each-rule loop of `awk' never sees it.
-
- The following example swaps every two lines of input. For
- example, given:
-
- wan
- tew
- free
- phore
-
- it outputs:
-
- tew
- wan
- phore
- free
-
- Here's the program:
-
-     awk '{
-         if ((getline tmp) > 0) {
-             print tmp
-             print $0
-         } else
-             print $0
-     }'
-
- The `getline' function used in this way sets only the variables
- `NR' and `FNR' (and of course, VAR). The record is not split into
- fields, so the values of the fields (including `$0') and the value
- of `NF' do not change.
-
-`getline < FILE'
- This form of the `getline' function takes its input from the file
- FILE. Here FILE is a string-valued expression that specifies the
- file name. `< FILE' is called a "redirection" since it directs
- input to come from a different place.
-
- This form is useful if you want to read your input from a
- particular file, instead of from the main input stream. For
- example, the following program reads its input record from the
- file `foo.input' when it encounters a first field with a value
- equal to 10 in the current input file.
-
-     awk '{
-         if ($1 == 10) {
-             getline < "foo.input"
-             print
-         } else
-             print
-     }'
-
- Since the main input stream is not used, the values of `NR' and
- `FNR' are not changed. But the record read is split into fields in
- the normal manner, so the values of `$0' and other fields are
- changed. So is the value of `NF'.
-
- This does not cause the record to be tested against all the
- patterns in the `awk' program, in the way that would happen if the
- record were read normally by the main processing loop of `awk'.
- However, the new record is tested against any subsequent rules,
- just as when `getline' is used without a redirection.
-
-`getline VAR < FILE'
- This form of the `getline' function takes its input from the file
- FILE and puts it in the variable VAR. As above, FILE is a
- string-valued expression that specifies the file from which to
- read.
-
- In this version of `getline', none of the built-in variables are
- changed, and the record is not split into fields. The only
- variable changed is VAR.
-
- For example, the following program copies all the input files to
- the output, except for records that say `@include FILENAME'. Such
- a record is replaced by the contents of the file FILENAME.
-
-     awk '{
-         if (NF == 2 && $1 == "@include") {
-             while ((getline line < $2) > 0)
-                 print line
-             close($2)
-         } else
-             print
-     }'
-
- Note here how the name of the extra input file is not built into
- the program; it is taken from the data, from the second field on
- the `@include' line.
-
- The `close' function is called to ensure that if two identical
- `@include' lines appear in the input, the entire specified file is
- included twice. *Note Closing Input Files and Pipes: Close Input.
-
- One deficiency of this program is that it does not process nested
- `@include' statements the way a true macro preprocessor would.
-
-`COMMAND | getline'
- You can "pipe" the output of a command into `getline'. A pipe is
- simply a way to link the output of one program to the input of
- another. In this case, the string COMMAND is run as a shell
- command and its output is piped into `awk' to be used as input.
- This form of `getline' reads one record from the pipe.
-
- For example, the following program copies input to output, except
- for lines that begin with `@execute', which are replaced by the
- output produced by running the rest of the line as a shell command:
-
-     awk '{
-         if ($1 == "@execute") {
-             tmp = substr($0, 10)
-             while ((tmp | getline) > 0)
-                 print
-             close(tmp)
-         } else
-             print
-     }'
-
- The `close' function is called to ensure that if two identical
- `@execute' lines appear in the input, the command is run for each
- one. *Note Closing Input Files and Pipes: Close Input.
-
- Given the input:
-
- foo
- bar
- baz
- @execute who
- bletch
-
- the program might produce:
-
- foo
- bar
- baz
- hack ttyv0 Jul 13 14:22
- hack ttyp0 Jul 13 14:23 (gnu:0)
- hack ttyp1 Jul 13 14:23 (gnu:0)
- hack ttyp2 Jul 13 14:23 (gnu:0)
- hack ttyp3 Jul 13 14:23 (gnu:0)
- bletch
-
- Notice that this program ran the command `who' and printed the
- result. (If you try this program yourself, you will get different
- results, showing you who is logged in on your system.)
-
- This variation of `getline' sets `$0' to the record read, splits it
- into fields, and sets the value of `NF'. The values of `NR' and
- `FNR' are not changed.
-
-`COMMAND | getline VAR'
- The output of the command COMMAND is sent through a pipe to
- `getline' and into the variable VAR. For example, the following
- program reads the current date and time into the variable
- `current_time', using the `date' utility, and then prints it.
-
-     awk 'BEGIN {
-         "date" | getline current_time
-         close("date")
-         print "Report printed on " current_time
-     }'
-
- In this version of `getline', none of the built-in variables are
- changed, and the record is not split into fields.
-
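- You can compare the effects of the first three forms on the built-in
-variables side by side. This is a minimal sketch. It assumes a POSIX
-shell, a POSIX `awk', and the common `mktemp' utility for the side file;
-the function name `getline_effects' is made up. Each `print' shows `NR',
-`NF', and `$0' right after one form of `getline':
-
```shell
# Print NR, NF and $0 after each of three getline forms.  The side file
# comes from mktemp (assumed to be available).
getline_effects() {
    side=$(mktemp) || return 1
    printf 'one two three\n' > "$side"
    printf 'a b\nc d e f\ng\n' | awk -v side="$side" 'NR == 1 {
        getline                  # plain getline: new $0 and NF, NR advances
        print "plain:", NR, NF, $0
        getline line             # getline VAR: NR advances, $0 and NF kept
        print "var:", NR, NF, $0
        getline < side           # getline < FILE: new $0 and NF, NR kept
        print "file:", NR, NF, $0
    }'
    rm -f "$side"
}

getline_effects
```
-
- Run as shown, it prints `plain: 2 4 c d e f', `var: 3 4 c d e f', and
-`file: 3 3 one two three', one per line.
-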
-
-File: gawk.info, Node: Close Input, Prev: Getline, Up: Reading Files
-
-Closing Input Files and Pipes
-=============================
-
- If the same file name or the same shell command is used with
-`getline' more than once during the execution of an `awk' program, the
-file is opened (or the command is executed) only the first time. At
-that time, the first record of input is read from that file or command.
-The next time the same file or command is used in `getline', another
-record is read from it, and so on.
-
- This implies that if you want to start reading the same file again
-from the beginning, or if you want to rerun a shell command (rather than
-reading more output from the command), you must take special steps.
-What you must do is use the `close' function, as follows:
-
- close(FILENAME)
-
-or
-
- close(COMMAND)
-
- The argument FILENAME or COMMAND can be any expression. Its value
-must exactly equal the string that was used to open the file or start
-the command--for example, if you open a pipe with this:
-
- "sort -r names" | getline foo
-
-then you must close it with this:
-
- close("sort -r names")
-
- Once this function call is executed, the next `getline' from that
-file or command will reopen the file or rerun the command.
-
- `close' returns a value of zero if the close succeeded. Otherwise,
-the value will be non-zero. In this case, `gawk' sets the variable
-`ERRNO' to a string describing the error that occurred.
-
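- Here is a minimal sketch of `close' with a command. It assumes a POSIX
-shell and `awk'; the function name `rerun_demo' is made up. The second
-`getline' continues with the command's next output line. The `getline'
-after `close' reruns the command from the start:
-
```shell
# Reading the same command string twice continues its output; close()
# ends the pipe, so the next getline reruns the command from the start.
rerun_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        cmd | getline a        # "one": the command starts here
        cmd | getline b        # "two": same pipe, next line
        close(cmd)             # must match the string used to open it
        cmd | getline c        # "one" again: the command was rerun
        print a, b, c
    }'
}

rerun_demo   # prints: one two one
```
-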
-
-File: gawk.info, Node: Printing, Next: One-liners, Prev: Reading Files, Up: Top
-
-Printing Output
-***************
-
- One of the most common things that actions do is to output or "print"
-some or all of the input. For simple output, use the `print'
-statement. For fancier formatting use the `printf' statement. Both
-are described in this chapter.
-
-* Menu:
-
-* Print:: The `print' statement.
-* Print Examples:: Simple examples of `print' statements.
-* Output Separators:: The output separators and how to change them.
-* OFMT:: Controlling Numeric Output With `print'.
-* Printf:: The `printf' statement.
-* Redirection:: How to redirect output to multiple
- files and pipes.
-* Special Files:: File name interpretation in `gawk'.
- `gawk' allows access to
- inherited file descriptors.
-
-
-File: gawk.info, Node: Print, Next: Print Examples, Prev: Printing, Up: Printing
-
-The `print' Statement
-=====================
-
- The `print' statement does output with simple, standardized
-formatting. You specify only the strings or numbers to be printed, in a
-list separated by commas. They are output, separated by single spaces,
-followed by a newline. The statement looks like this:
-
- print ITEM1, ITEM2, ...
-
-The entire list of items may optionally be enclosed in parentheses. The
-parentheses are necessary if any of the item expressions uses a
-relational operator; otherwise it could be confused with a redirection
-(*note Redirecting Output of `print' and `printf': Redirection.). The
-relational operators are `==', `!=', `<', `>', `>=', `<=', `~' and `!~'
-(*note Comparison Expressions: Comparison Ops.).
-
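- Here is a minimal sketch of the difference. It assumes a POSIX shell,
-`awk', and `mktemp -d' for a scratch directory; the function name
-`print_forms' is made up. The parenthesized form prints the result of a
-comparison. The bare form writes `b' to a file named `a':
-
```shell
# In print, a bare ">" starts an output redirection; parentheses around
# the item turn it back into a comparison.  Runs in a scratch directory
# because the unparenthesized form creates a file named "a".
print_forms() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        awk 'BEGIN { print ("b" > "a") }'   # comparison: prints 1
        awk 'BEGIN { print "b" > "a" }'     # redirection: writes b to file a
        cat a
    )
    rm -rf "$dir"
}

print_forms
```
-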
- The items printed can be constant strings or numbers, fields of the
-current record (such as `$1'), variables, or any `awk' expressions.
-The `print' statement is completely general for computing *what* values
-to print. With two exceptions, you cannot specify *how* to print
-them--how many columns, whether to use exponential notation or not, and
-so on. (*Note Output Separators::, and *Note Controlling Numeric
-Output with `print': OFMT.) For that, you need the `printf' statement
-(*note Using `printf' Statements for Fancier Printing: Printf.).
-
- The simple statement `print' with no items is equivalent to `print
-$0': it prints the entire current record. To print a blank line, use
-`print ""', where `""' is the null, or empty, string.
-
- To print a fixed piece of text, use a string constant such as
-`"Hello there"' as one item. If you forget to use the double-quote
-characters, your text will be taken as an `awk' expression, and you
-will probably get an error. Keep in mind that a space is printed
-between any two items that are separated by a comma.
-
- Most often, each `print' statement makes one line of output. But it
-isn't limited to one line. If an item value is a string that contains a
-newline, the newline is output along with the rest of the string. A
-single `print' can make any number of lines this way.
-
-
-File: gawk.info, Node: Print Examples, Next: Output Separators, Prev: Print, Up: Printing
-
-Examples of `print' Statements
-==============================
-
- Here is an example of printing a string that contains embedded
-newlines:
-
- awk 'BEGIN { print "line one\nline two\nline three" }'
-
-produces output like this:
-
- line one
- line two
- line three
-
- Here is an example that prints the first two fields of each input
-record, with a space between them:
-
- awk '{ print $1, $2 }' inventory-shipped
-
-Its output looks like this:
-
- Jan 13
- Feb 15
- Mar 15
- ...
-
- A common mistake in using the `print' statement is to omit the comma
-between two items. This often has the effect of making the items run
-together in the output, with no space. The reason for this is that
-juxtaposing two string expressions in `awk' means to concatenate them.
-For example, without the comma:
-
- awk '{ print $1 $2 }' inventory-shipped
-
-prints:
-
- Jan13
- Feb15
- Mar15
- ...
-
- Neither example's output makes much sense to someone unfamiliar with
-the file `inventory-shipped'. A heading line at the beginning would
-make it clearer. Let's add some headings to our table of months (`$1')
-and green crates shipped (`$2'). We do this using the `BEGIN' pattern
-(*note `BEGIN' and `END' Special Patterns: BEGIN/END.) to force the
-headings to be printed only once:
-
-     awk 'BEGIN { print "Month Crates"
-                  print "----- ------" }
-                { print $1, $2 }' inventory-shipped
-
-Did you already guess what happens? This program prints the following:
-
- Month Crates
- ----- ------
- Jan 13
- Feb 15
- Mar 15
- ...
-
-The headings and the table data don't line up! We can fix this by
-printing some spaces between the two fields:
-
-     awk 'BEGIN { print "Month Crates"
-                  print "----- ------" }
-                { print $1, " ", $2 }' inventory-shipped
-
- You can imagine that this way of lining up columns can get pretty
-complicated when you have many columns to fix. Counting spaces for two
-or three columns can be simple, but more than this and you can get
-"lost" quite easily. This is why the `printf' statement was created
-(*note Using `printf' Statements for Fancier Printing: Printf.); one of
-its specialties is lining up columns of data.
-
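- As a preview of `printf', here is a minimal sketch of the same table
-with the first column left-justified in a fixed width. The function
-name `crate_table' is made up, and the input is only the two sample rows
-shown above. Each month name is padded to six columns, so the data line
-up under the headings without counting spaces by hand:
-
```shell
# Left-justify the first column in six characters with the printf format
# %-6s so the headings and the data line up; 6 fits "Month" plus a space.
crate_table() {
    awk 'BEGIN { printf "%-6s %s\n", "Month", "Crates"
                 printf "%-6s %s\n", "-----", "------" }
               { printf "%-6s %s\n", $1, $2 }'
}

printf 'Jan 13\nFeb 15\n' | crate_table
```
-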