Diffstat (limited to 'gawk.info-2')
-rw-r--r-- | gawk.info-2 | 1236 |
1 files changed, 0 insertions, 1236 deletions
diff --git a/gawk.info-2 b/gawk.info-2 deleted file mode 100644 index 954b4309..00000000 --- a/gawk.info-2 +++ /dev/null @@ -1,1236 +0,0 @@ -This is Info file gawk.info, produced by Makeinfo-1.54 from the input -file gawk.texi. - - This file documents `awk', a program that you can use to select -particular records in a file and perform operations upon them. - - This is Edition 0.15 of `The GAWK Manual', -for the 2.15 version of the GNU implementation -of AWK. - - Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc. - - Permission is granted to make and distribute verbatim copies of this -manual provided the copyright notice and this permission notice are -preserved on all copies. - - Permission is granted to copy and distribute modified versions of -this manual under the conditions for verbatim copying, provided that -the entire resulting derived work is distributed under the terms of a -permission notice identical to this one. - - Permission is granted to copy and distribute translations of this -manual into another language, under the above conditions for modified -versions, except that this permission notice may be stated in a -translation approved by the Foundation. - - -File: gawk.info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started - -`awk' Statements versus Lines -============================= - - Most often, each line in an `awk' program is a separate statement or -separate rule, like this: - - awk '/12/ { print $0 } - /21/ { print $0 }' BBS-list inventory-shipped - - But sometimes statements can be more than one line, and lines can -contain several statements. You can split a statement into multiple -lines by inserting a newline after any of the following: - - , { ? : || && do else - -A newline at any other point is considered the end of the statement. -(Splitting lines after `?' and `:' is a minor `gawk' extension. The -`?' 
and `:' referred to here is the three operand conditional -expression described in *Note Conditional Expressions: Conditional Exp.) - - If you would like to split a single statement into two lines at a -point where a newline would terminate it, you can "continue" it by -ending the first line with a backslash character, `\'. This is allowed -absolutely anywhere in the statement, even in the middle of a string or -regular expression. For example: - - awk '/This program is too long, so continue it\ - on the next line/ { print $1 }' - -We have generally not used backslash continuation in the sample -programs in this manual. Since in `gawk' there is no limit on the -length of a line, it is never strictly necessary; it just makes -programs prettier. We have preferred to make them even more pretty by -keeping the statements short. Backslash continuation is most useful -when your `awk' program is in a separate source file, instead of typed -in on the command line. You should also note that many `awk' -implementations are more picky about where you may use backslash -continuation. For maximal portability of your `awk' programs, it is -best not to split your lines in the middle of a regular expression or a -string. - - *Warning: backslash continuation does not work as described above -with the C shell.* Continuation with backslash works for `awk' -programs in files, and also for one-shot programs *provided* you are -using a POSIX-compliant shell, such as the Bourne shell or the -Bourne-again shell. But the C shell used on Berkeley Unix behaves -differently! There, you must use two backslashes in a row, followed by -a newline. - - When `awk' statements within one rule are short, you might want to -put more than one of them on a line. You do this by separating the -statements with a semicolon, `;'. This also applies to the rules -themselves. 
Thus, the previous program could have been written: - - /12/ { print $0 } ; /21/ { print $0 } - -*Note:* the requirement that rules on the same line must be separated -with a semicolon is a recent change in the `awk' language; it was done -for consistency with the treatment of statements within an action. - - -File: gawk.info, Node: When, Prev: Statements/Lines, Up: Getting Started - -When to Use `awk' -================= - - You might wonder how `awk' might be useful for you. Using additional -utility programs, more advanced patterns, field separators, arithmetic -statements, and other selection criteria, you can produce much more -complex output. The `awk' language is very useful for producing -reports from large amounts of raw data, such as summarizing information -from the output of other utility programs like `ls'. (*Note A More -Complex Example: More Complex.) - - Programs written with `awk' are usually much smaller than they would -be in other languages. This makes `awk' programs easy to compose and -use. Often `awk' programs can be quickly composed at your terminal, -used once, and thrown away. Since `awk' programs are interpreted, you -can avoid the usually lengthy edit-compile-test-debug cycle of software -development. - - Complex programs have been written in `awk', including a complete -retargetable assembler for 8-bit microprocessors (*note Glossary::., for -more information) and a microcode assembler for a special purpose Prolog -computer. However, `awk''s capabilities are strained by tasks of such -complexity. - - If you find yourself writing `awk' scripts of more than, say, a few -hundred lines, you might consider using a different programming -language. Emacs Lisp is a good choice if you need sophisticated string -or pattern matching capabilities. The shell is also good at string and -pattern matching; in addition, it allows powerful use of the system -utilities. 
More conventional languages, such as C, C++, and Lisp, offer -better facilities for system programming and for managing the complexity -of large programs. Programs in these languages may require more lines -of source code than the equivalent `awk' programs, but they are easier -to maintain and usually run more efficiently. - - -File: gawk.info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top - -Reading Input Files -******************* - - In the typical `awk' program, all input is read either from the -standard input (by default the keyboard, but often a pipe from another -command) or from files whose names you specify on the `awk' command -line. If you specify input files, `awk' reads them in order, reading -all the data from one before going on to the next. The name of the -current input file can be found in the built-in variable `FILENAME' -(*note Built-in Variables::.). - - The input is read in units called records, and processed by the -rules one record at a time. By default, each record is one line. Each -record is split automatically into fields, to make it more convenient -for a rule to work on its parts. - - On rare occasions you will need to use the `getline' command, which -can do explicit input from any number of files (*note Explicit Input -with `getline': Getline.). - -* Menu: - -* Records:: Controlling how data is split into records. -* Fields:: An introduction to fields. -* Non-Constant Fields:: Non-constant Field Numbers. -* Changing Fields:: Changing the Contents of a Field. -* Field Separators:: The field separator and how to change it. -* Constant Size:: Reading constant width data. -* Multiple Line:: Reading multi-line records. -* Getline:: Reading files under explicit program control - using the `getline' function. -* Close Input:: Closing an input file (so you can read from - the beginning once more). 
- - -File: gawk.info, Node: Records, Next: Fields, Prev: Reading Files, Up: Reading Files - -How Input is Split into Records -=============================== - - The `awk' language divides its input into records and fields. -Records are separated by a character called the "record separator". By -default, the record separator is the newline character, defining a -record to be a single line of text. - - Sometimes you may want to use a different character to separate your -records. You can use a different character by changing the built-in -variable `RS'. The value of `RS' is a string that says how to separate -records; the default value is `"\n"', the string containing just a -newline character. This is why records are, by default, single lines. - - `RS' can have any string as its value, but only the first character -of the string is used as the record separator. The other characters are -ignored. `RS' is exceptional in this regard; `awk' uses the full value -of all its other built-in variables. - - You can change the value of `RS' in the `awk' program with the -assignment operator, `=' (*note Assignment Expressions: Assignment -Ops.). The new record-separator character should be enclosed in -quotation marks to make a string constant. Often the right time to do -this is at the beginning of execution, before any input has been -processed, so that the very first record will be read with the proper -separator. To do this, use the special `BEGIN' pattern (*note `BEGIN' -and `END' Special Patterns: BEGIN/END.). For example: - - awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list - -changes the value of `RS' to `"/"', before reading any input. This is -a string whose first character is a slash; as a result, records are -separated by slashes. Then the input file is read, and the second rule -in the `awk' program (the action with no pattern) prints each record. 
-Since each `print' statement adds a newline at the end of its output, -the effect of this `awk' program is to copy the input with each slash -changed to a newline. - - Another way to change the record separator is on the command line, -using the variable-assignment feature (*note Invoking `awk': Command -Line.). - - awk '{ print $0 }' RS="/" BBS-list - -This sets `RS' to `/' before processing `BBS-list'. - - Reaching the end of an input file terminates the current input -record, even if the last character in the file is not the character in -`RS'. - - The empty string, `""' (a string of no characters), has a special -meaning as the value of `RS': it means that records are separated only -by blank lines. *Note Multiple-Line Records: Multiple Line, for more -details. - - The `awk' utility keeps track of the number of records that have -been read so far from the current input file. This value is stored in a -built-in variable called `FNR'. It is reset to zero when a new file is -started. Another built-in variable, `NR', is the total number of input -records read so far from all files. It starts at zero but is never -automatically reset to zero. - - If you change the value of `RS' in the middle of an `awk' run, the -new value is used to delimit subsequent records, but the record -currently being processed (and records already processed) are not -affected. - - -File: gawk.info, Node: Fields, Next: Non-Constant Fields, Prev: Records, Up: Reading Files - -Examining Fields -================ - - When `awk' reads an input record, the record is automatically -separated or "parsed" by the interpreter into chunks called "fields". -By default, fields are separated by whitespace, like words in a line. -Whitespace in `awk' means any string of one or more spaces and/or tabs; -other characters such as newline, formfeed, and so on, that are -considered whitespace by other languages are *not* considered -whitespace by `awk'. 
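The default splitting just described can be tried directly. The following is a minimal sketch, assuming a POSIX shell and any `awk' on your path; the helper name `count_fields' is hypothetical and exists only for this illustration. It prints the number of fields found in each record (`NF', explained later in this section) along with the first and last fields.

```shell
# count_fields is a hypothetical helper: print the field count, the first
# field, and the last field of each record, using default field splitting.
count_fields() {
    awk '{ print NF ": " $1 " ... " $NF }'
}

# Runs of spaces and tabs between words each count as one separator.
printf 'This  seems\tlike a   pretty nice example.\n' | count_fields
```

The doubled spaces and the tab do not produce empty fields; the record still has seven fields.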
- - The purpose of fields is to make it more convenient for you to refer -to these pieces of the record. You don't have to use them--you can -operate on the whole record if you wish--but fields are what make -simple `awk' programs so powerful. - - To refer to a field in an `awk' program, you use a dollar-sign, `$', -followed by the number of the field you want. Thus, `$1' refers to the -first field, `$2' to the second, and so on. For example, suppose the -following is a line of input: - - This seems like a pretty nice example. - -Here the first field, or `$1', is `This'; the second field, or `$2', is -`seems'; and so on. Note that the last field, `$7', is `example.'. -Because there is no space between the `e' and the `.', the period is -considered part of the seventh field. - - No matter how many fields there are, the last field in a record can -be represented by `$NF'. So, in the example above, `$NF' would be the -same as `$7', which is `example.'. Why this works is explained below -(*note Non-constant Field Numbers: Non-Constant Fields.). If you try -to refer to a field beyond the last one, such as `$8' when the record -has only 7 fields, you get the empty string. - - Plain `NF', with no `$', is a built-in variable whose value is the -number of fields in the current record. - - `$0', which looks like an attempt to refer to the zeroth field, is a -special case: it represents the whole input record. This is what you -would use if you weren't interested in fields. - - Here are some more examples: - - awk '$1 ~ /foo/ { print $0 }' BBS-list - -This example prints each record in the file `BBS-list' whose first -field contains the string `foo'. The operator `~' is called a -"matching operator" (*note Comparison Expressions: Comparison Ops.); it -tests whether a string (here, the field `$1') matches a given regular -expression. 
- - By contrast, the following example: - - awk '/foo/ { print $1, $NF }' BBS-list - -looks for `foo' in *the entire record* and prints the first field and -the last field for each input record containing a match. - - -File: gawk.info, Node: Non-Constant Fields, Next: Changing Fields, Prev: Fields, Up: Reading Files - -Non-constant Field Numbers -========================== - - The number of a field does not need to be a constant. Any -expression in the `awk' language can be used after a `$' to refer to a -field. The value of the expression specifies the field number. If the -value is a string, rather than a number, it is converted to a number. -Consider this example: - - awk '{ print $NR }' - -Recall that `NR' is the number of records read so far: 1 in the first -record, 2 in the second, etc. So this example prints the first field -of the first record, the second field of the second record, and so on. -For the twentieth record, field number 20 is printed; most likely, the -record has fewer than 20 fields, so this prints a blank line. - - Here is another example of using expressions as field numbers: - - awk '{ print $(2*2) }' BBS-list - - The `awk' language must evaluate the expression `(2*2)' and use its -value as the number of the field to print. The `*' sign represents -multiplication, so the expression `2*2' evaluates to 4. The -parentheses are used so that the multiplication is done before the `$' -operation; they are necessary whenever there is a binary operator in -the field-number expression. This example, then, prints the hours of -operation (the fourth field) for every line of the file `BBS-list'. - - If the field number you compute is zero, you get the entire record. -Thus, `$(2-2)' has the same value as `$0'. Negative field numbers are -not allowed. - - The number of fields in the current record is stored in the built-in -variable `NF' (*note Built-in Variables::.). 
The expression `$NF' is -not a special feature: it is the direct consequence of evaluating `NF' -and using its value as a field number. - - -File: gawk.info, Node: Changing Fields, Next: Field Separators, Prev: Non-Constant Fields, Up: Reading Files - -Changing the Contents of a Field -================================ - - You can change the contents of a field as seen by `awk' within an -`awk' program; this changes what `awk' perceives as the current input -record. (The actual input is untouched: `awk' never modifies the input -file.) - - Consider this example: - - awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped - -The `-' sign represents subtraction, so this program reassigns field -three, `$3', to be the value of field two minus ten, `$2 - 10'. (*Note -Arithmetic Operators: Arithmetic Ops.) Then field two, and the new -value for field three, are printed. - - In order for this to work, the text in field `$2' must make sense as -a number; the string of characters must be converted to a number in -order for the computer to do arithmetic on it. The number resulting -from the subtraction is converted back to a string of characters which -then becomes field three. *Note Conversion of Strings and Numbers: -Conversion. - - When you change the value of a field (as perceived by `awk'), the -text of the input record is recalculated to contain the new field where -the old one was. Therefore, `$0' changes to reflect the altered field. -Thus, - - awk '{ $2 = $2 - 10; print $0 }' inventory-shipped - -prints a copy of the input file, with 10 subtracted from the second -field of each line. - - You can also assign contents to fields that are out of range. For -example: - - awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped - -We've just created `$6', whose value is the sum of fields `$2', `$3', -`$4', and `$5'. The `+' sign represents addition. For the file -`inventory-shipped', `$6' represents the total number of parcels -shipped for a particular month. 
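Assigning to an out-of-range field can be tried on a single record. This is a minimal sketch, assuming a POSIX shell; the helper name `add_total' and the one-line sample input are made up for illustration and are not taken from `inventory-shipped' itself. It also prints the rebuilt record and the field count so you can see what the assignment did.

```shell
# add_total is a hypothetical helper: store the sum of fields 2 through 5
# in a new sixth field, then print the rebuilt record and the field count.
add_total() {
    awk '{ $6 = $2 + $3 + $4 + $5; print $0; print NF }'
}

# Illustrative input record with five fields.
printf 'Jan 13 25 15 115\n' | add_total
```

The first line of output is the record with the new total appended; the second is the number of fields after the assignment.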
- - Creating a new field changes the internal `awk' copy of the current -input record--the value of `$0'. Thus, if you do `print $0' after -adding a field, the record printed includes the new field, with the -appropriate number of field separators between it and the previously -existing fields. - - This recomputation affects and is affected by several features not -yet discussed, in particular, the "output field separator", `OFS', -which is used to separate the fields (*note Output Separators::.), and -`NF' (the number of fields; *note Examining Fields: Fields.). For -example, the value of `NF' is set to the number of the highest field -you create. - - Note, however, that merely *referencing* an out-of-range field does -*not* change the value of either `$0' or `NF'. Referencing an -out-of-range field merely produces a null string. For example: - - if ($(NF+1) != "") - print "can't happen" - else - print "everything is normal" - -should print `everything is normal', because `NF+1' is certain to be -out of range. (*Note The `if' Statement: If Statement, for more -information about `awk''s `if-else' statements.) - - It is important to note that assigning to a field will change the -value of `$0', but will not change the value of `NF', even when you -assign the null string to a field. For example: - - echo a b c d | awk '{ OFS = ":"; $2 = "" ; print ; print NF }' - -prints - - a::c:d - 4 - -The field is still there, it just has an empty value. You can tell -because there are two colons in a row. - - -File: gawk.info, Node: Field Separators, Next: Constant Size, Prev: Changing Fields, Up: Reading Files - -Specifying how Fields are Separated -=================================== - - (This section is rather long; it describes one of the most -fundamental operations in `awk'. If you are a novice with `awk', we -recommend that you re-read this section after you have studied the -section on regular expressions, *Note Regular Expressions as Patterns: -Regexp.) 
- - The way `awk' splits an input record into fields is controlled by -the "field separator", which is a single character or a regular -expression. `awk' scans the input record for matches for the -separator; the fields themselves are the text between the matches. For -example, if the field separator is `oo', then the following line: - - moo goo gai pan - -would be split into three fields: `m', ` g' and ` gai pan'. - - The field separator is represented by the built-in variable `FS'. -Shell programmers take note! `awk' does not use the name `IFS' which -is used by the shell. - - You can change the value of `FS' in the `awk' program with the -assignment operator, `=' (*note Assignment Expressions: Assignment -Ops.). Often the right time to do this is at the beginning of -execution, before any input has been processed, so that the very first -record will be read with the proper separator. To do this, use the -special `BEGIN' pattern (*note `BEGIN' and `END' Special Patterns: -BEGIN/END.). For example, here we set the value of `FS' to the string -`","': - - awk 'BEGIN { FS = "," } ; { print $2 }' - -Given the input line, - - John Q. Smith, 29 Oak St., Walamazoo, MI 42139 - -this `awk' program extracts the string ` 29 Oak St.'. - - Sometimes your input data will contain separator characters that -don't separate fields the way you thought they would. For instance, the -person's name in the example we've been using might have a title or -suffix attached, such as `John Q. Smith, LXIX'. From input containing -such a name: - - John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139 - -the previous sample program would extract ` LXIX', instead of ` 29 Oak -St.'. If you were expecting the program to print the address, you -would be surprised. So choose your data layout and separator -characters carefully to prevent such problems. 
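One way to notice records like the one above is to check the field count before trusting a field's position. The following is only a sketch under stated assumptions: the helper name `second_field' is hypothetical, and the expected count of four fields is an assumption about this particular address layout, not a rule of `awk'.

```shell
# second_field is a hypothetical helper: split records on commas and print
# the second field, but flag any record that does not have exactly the
# four fields this (assumed) address layout expects.
second_field() {
    awk 'BEGIN { FS = "," }
         NF == 4 { print $2 }
         NF != 4 { print "unexpected field count " NF ": " $0 }'
}

printf 'John Q. Smith, 29 Oak St., Walamazoo, MI 42139\n' | second_field
printf 'John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139\n' | second_field
```

The first record yields the address (with its leading space, since the space is not part of the separator); the second is reported instead of silently printing ` LXIX'.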
- - As you know, by default, fields are separated by whitespace sequences -(spaces and tabs), not by single spaces: two spaces in a row do not -delimit an empty field. The default value of the field separator is a -string `" "' containing a single space. If this value were interpreted -in the usual way, each space character would separate fields, so two -spaces in a row would make an empty field between them. The reason -this does not happen is that a single space as the value of `FS' is a -special case: it is taken to specify the default manner of delimiting -fields. - - If `FS' is any other single character, such as `","', then each -occurrence of that character separates two fields. Two consecutive -occurrences delimit an empty field. If the character occurs at the -beginning or the end of the line, that too delimits an empty field. The -space character is the only single character which does not follow these -rules. - - More generally, the value of `FS' may be a string containing any -regular expression. Then each match in the record for the regular -expression separates fields. For example, the assignment: - - FS = ", \t" - -makes every area of an input line that consists of a comma followed by a -space and a tab, into a field separator. (`\t' stands for a tab.) - - For a less trivial example of a regular expression, suppose you want -single spaces to separate fields the way single commas were used above. -You can set `FS' to `"[ ]"'. This regular expression matches a single -space and nothing else. - - `FS' can be set on the command line. You use the `-F' argument to -do so. For example: - - awk -F, 'PROGRAM' INPUT-FILES - -sets `FS' to be the `,' character. Notice that the argument uses a -capital `F'. Contrast this with `-f', which specifies a file -containing an `awk' program. Case is significant in command options: -the `-F' and `-f' options have nothing to do with each other. 
You can -use both options at the same time to set the `FS' argument *and* get an -`awk' program from a file. - - The value used for the argument to `-F' is processed in exactly the -same way as assignments to the built-in variable `FS'. This means that -if the field separator contains special characters, they must be escaped -appropriately. For example, to use a `\' as the field separator, you -would have to type: - - # same as FS = "\\" - awk -F\\\\ '...' files ... - -Since `\' is used for quoting in the shell, `awk' will see `-F\\'. -Then `awk' processes the `\\' for escape characters (*note Constant -Expressions: Constants.), finally yielding a single `\' to be used for -the field separator. - - As a special case, in compatibility mode (*note Invoking `awk': -Command Line.), if the argument to `-F' is `t', then `FS' is set to the -tab character. (This is because if you type `-F\t', without the quotes, -at the shell, the `\' gets deleted, so `awk' figures that you really -want your fields to be separated with tabs, and not `t's. Use `-v -FS="t"' on the command line if you really do want to separate your -fields with `t's.) - - For example, let's use an `awk' program file called `baud.awk' that -contains the pattern `/300/', and the action `print $1'. Here is the -program: - - /300/ { print $1 } - - Let's also set `FS' to be the `-' character, and run the program on -the file `BBS-list'. The following command prints a list of the names -of the bulletin boards that operate at 300 baud and the first three -digits of their phone numbers: - - awk -F- -f baud.awk BBS-list - -It produces this output: - - aardvark 555 - alpo - barfly 555 - bites 555 - camelot 555 - core 555 - fooey 555 - foot 555 - macfoo 555 - sdace 555 - sabafoo 555 - -Note the second line of output. 
If you check the original file, you -will see that the second line looked like this: - - alpo-net 555-3412 2400/1200/300 A - - The `-' as part of the system's name was used as the field -separator, instead of the `-' in the phone number that was originally -intended. This demonstrates why you have to be careful in choosing -your field and record separators. - - The following program searches the system password file, and prints -the entries for users who have no password: - - awk -F: '$2 == ""' /etc/passwd - -Here we use the `-F' option on the command line to set the field -separator. Note that fields in `/etc/passwd' are separated by colons. -The second field represents a user's encrypted password, but if the -field is empty, that user has no password. - - According to the POSIX standard, `awk' is supposed to behave as if -each record is split into fields at the time that it is read. In -particular, this means that you can change the value of `FS' after a -record is read, but before any of the fields are referenced. The value -of the fields (i.e. how they were split) should reflect the old value -of `FS', not the new one. - - However, many implementations of `awk' do not do this. Instead, -they defer splitting the fields until a field reference actually -happens, using the *current* value of `FS'! This behavior can be -difficult to diagnose. The following example illustrates the results of -the two methods. (The `sed' command prints just the first line of -`/etc/passwd'.) - - sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }' - -will usually print - - root - -on an incorrect implementation of `awk', while `gawk' will print -something like - - root:nSijPlPhZZwgE:0:0:Root:/: - - There is an important difference between the two cases of `FS = " "' -(a single blank) and `FS = "[ \t]+"' (which is a regular expression -matching one or more blanks or tabs). For both values of `FS', fields -are separated by runs of blanks and/or tabs. 
However, when the value of -`FS' is `" "', `awk' will strip leading and trailing whitespace from -the record, and then decide where the fields are. - - For example, the following expression prints `b': - - echo ' a b c d ' | awk '{ print $2 }' - -However, the following prints `a': - - echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" } ; { print $2 }' - -In this case, the first field is null. - - The stripping of leading and trailing whitespace also comes into -play whenever `$0' is recomputed. For instance, this pipeline - - echo ' a b c d' | awk '{ print; $2 = $2; print }' - -produces this output: - - a b c d - a b c d - -The first `print' statement prints the record as it was read, with -leading whitespace intact. The assignment to `$2' rebuilds `$0' by -concatenating `$1' through `$NF' together, separated by the value of -`OFS'. Since the leading whitespace was ignored when finding `$1', it -is not part of the new `$0'. Finally, the last `print' statement -prints the new `$0'. - - The following table summarizes how fields are split, based on the -value of `FS'. - -`FS == " "' - Fields are separated by runs of whitespace. Leading and trailing - whitespace are ignored. This is the default. - -`FS == ANY SINGLE CHARACTER' - Fields are separated by each occurrence of the character. Multiple - successive occurrences delimit empty fields, as do leading and - trailing occurrences. - -`FS == REGEXP' - Fields are separated by occurrences of characters that match - REGEXP. Leading and trailing matches of REGEXP delimit empty - fields. - - -File: gawk.info, Node: Constant Size, Next: Multiple Line, Prev: Field Separators, Up: Reading Files - -Reading Fixed-width Data -======================== - - (This section discusses an advanced, experimental feature. If you -are a novice `awk' user, you may wish to skip it on the first reading.) - - `gawk' 2.13 introduced a new facility for dealing with fixed-width -fields with no distinctive field separator. 
Data of this nature arises -typically in one of at least two ways: the input for old FORTRAN -programs where numbers are run together, and the output of programs -that did not anticipate the use of their output as input for other -programs. - - An example of the latter is a table where all the columns are lined -up by the use of a variable number of spaces and *empty fields are just -spaces*. Clearly, `awk''s normal field splitting based on `FS' will -not work well in this case. (Although a portable `awk' program can use -a series of `substr' calls on `$0', this is awkward and inefficient for -a large number of fields.) - - The splitting of an input record into fixed-width fields is -specified by assigning a string containing space-separated numbers to -the built-in variable `FIELDWIDTHS'. Each number specifies the width -of the field *including* columns between fields. If you want to ignore -the columns between fields, you can specify the width as a separate -field that is subsequently ignored. - - The following data is the output of the `w' utility. It is useful -to illustrate the use of `FIELDWIDTHS'. - - 10:06pm up 21 days, 14:04, 23 users - User tty login idle JCPU PCPU what - hzuo ttyV0 8:58pm 9 5 vi p24.tex - hzang ttyV3 6:37pm 50 -csh - eklye ttyV5 9:53pm 7 1 em thes.tex - dportein ttyV6 8:17pm 1:47 -csh - gierd ttyD3 10:00pm 1 elm - dave ttyD4 9:47pm 4 4 w - brent ttyp0 26Jun91 4:46 26:46 4:41 bash - dave ttyq4 26Jun9115days 46 46 wnewmail - - The following program takes the above input, converts the idle time -to number of seconds and prints out the first two fields and the -calculated idle time. (This program uses a number of `awk' features -that haven't been introduced yet.) 
-
-     BEGIN { FIELDWIDTHS = "9 6 10 6 7 7 35" }
-     NR > 2 {
-         idle = $4
-         sub(/^ */, "", idle) # strip leading spaces
-         if (idle == "") idle = 0
-         if (idle ~ /:/) { split(idle, t, ":"); idle = t[1] * 60 + t[2] }
-         if (idle ~ /days/) { idle *= 24 * 60 * 60 }
-
-         print $1, $2, idle
-     }
-
-   Here is the result of running the program on the data:
-
-     hzuo ttyV0 0
-     hzang ttyV3 50
-     eklye ttyV5 0
-     dportein ttyV6 107
-     gierd ttyD3 1
-     dave ttyD4 0
-     brent ttyp0 286
-     dave ttyq4 1296000
-
-   Another (possibly more practical) example of fixed-width input data
-would be the input from a deck of balloting cards. In some parts of
-the United States, voters make their choices by punching holes in
-computer cards. These cards are then processed to count the votes for
-any particular candidate or on any particular issue. Since a voter may
-choose not to vote on some issue, any column on the card may be empty.
-An `awk' program for processing such data could use the `FIELDWIDTHS'
-feature to simplify reading the data.
-
-   This feature is still experimental, and will likely evolve over time.
-
-
-File: gawk.info, Node: Multiple Line, Next: Getline, Prev: Constant Size, Up: Reading Files
-
-Multiple-Line Records
-=====================
-
-   In some data bases, a single line cannot conveniently hold all the
-information in one entry. In such cases, you can use multi-line
-records.
-
-   The first step in doing this is to choose your data format: when
-records are not defined as single lines, how do you want to define them?
-What should separate records?
-
-   One technique is to use an unusual character or string to separate
-records. For example, you could use the formfeed character (written
-`\f' in `awk', as in C) to separate them, making each record a page of
-the file. To do this, just set the variable `RS' to `"\f"' (a string
-containing the formfeed character). Any other character could equally
-well be used, as long as it won't be part of the data in a record.
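The formfeed technique can be sketched as follows, assuming a POSIX shell whose `printf' understands `\f'; the helper name `number_pages' is hypothetical and used only here. Each formfeed-separated page becomes one record, so a record may contain several lines.

```shell
# number_pages is a hypothetical helper: treat each formfeed-separated
# page as a single record and print it with its record number.
number_pages() {
    awk 'BEGIN { RS = "\f" } { print "page " NR ": " $0 }'
}

# Two pages: the first holds two lines, the second holds one.
printf 'line a\nline b\fline c' | number_pages
```

Note that the newline inside the first page is kept as part of that record; only the formfeed ends it.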
-
-   Another technique is to have blank lines separate records. By a
-special dispensation, a null string as the value of `RS' indicates that
-records are separated by one or more blank lines. If you set `RS' to
-the null string, a record always ends at the first blank line
-encountered. And the next record doesn't start until the first nonblank
-line that follows--no matter how many blank lines appear in a row, they
-are considered one record-separator. (End of file is also considered a
-record separator.)
-
-   The second step is to separate the fields in the record. One way to
-do this is to put each field on a separate line: to do this, just set
-the variable `FS' to the string `"\n"'. (This simple regular
-expression matches a single newline.)
-
-   Another way to separate fields is to divide each of the lines into
-fields in the normal manner. This happens by default as a result of a
-special feature: when `RS' is set to the null string, the newline
-character *always* acts as a field separator. This is in addition to
-whatever field separations result from `FS'.
-
-   The original motivation for this special exception was probably so
-that you get useful behavior in the default case (i.e., `FS == " "').
-This feature can be a problem if you really don't want the newline
-character to separate fields, since there is no way to prevent it.
-However, you can work around this by using the `split' function to
-break up the record manually (*note Built-in Functions for String
-Manipulation: String Functions.).
-
-
-File: gawk.info, Node: Getline, Next: Close Input, Prev: Multiple Line, Up: Reading Files
-
-Explicit Input with `getline'
-=============================
-
-   So far we have been getting our input files from `awk''s main input
-stream--either the standard input (usually your terminal) or the files
-specified on the command line. The `awk' language has a special
-built-in command called `getline' that can be used to read input under
-your explicit control.
-
-   This command is quite complex and should *not* be used by beginners.
-It is covered here because this is the chapter on input. The examples
-that follow the explanation of the `getline' command include material
-that has not been covered yet. Therefore, come back and study the
-`getline' command *after* you have reviewed the rest of this manual and
-have a good knowledge of how `awk' works.
-
-   `getline' returns 1 if it finds a record, and 0 if the end of the
-file is encountered. If there is some error in getting a record, such
-as a file that cannot be opened, then `getline' returns -1. In this
-case, `gawk' sets the variable `ERRNO' to a string describing the error
-that occurred.
-
-   In the following examples, COMMAND stands for a string value that
-represents a shell command.
-
-`getline'
-     The `getline' command can be used without arguments to read input
-     from the current input file. All it does in this case is read the
-     next input record and split it up into fields. This is useful if
-     you've finished processing the current record, but you want to do
-     some special processing *right now* on the next record. Here's an
-     example:
-
-          awk '{
-              if (t = index($0, "/*")) {
-                  if (t > 1)
-                      tmp = substr($0, 1, t - 1)
-                  else
-                      tmp = ""
-                  u = index(substr($0, t + 2), "*/")
-                  while (u == 0) {
-                      getline
-                      t = -1
-                      u = index($0, "*/")
-                  }
-                  if (u <= length($0) - 2)
-                      $0 = tmp substr($0, t + u + 3)
-                  else
-                      $0 = tmp
-              }
-              print $0
-          }'
-
-     This `awk' program deletes all C-style comments, `/* ... */',
-     from the input. By replacing the `print $0' with other
-     statements, you could perform more complicated processing on the
-     decommented input, like searching for matches of a regular
-     expression. (This program has a subtle problem--can you spot it?)
-
-     This form of the `getline' command sets `NF' (the number of
-     fields; *note Examining Fields: Fields.), `NR' (the number of
-     records read so far; *note How Input is Split into Records:
-     Records.), `FNR' (the number of records read from this input
-     file), and the value of `$0'.
-
-     *Note:* the new value of `$0' is used in testing the patterns of
-     any subsequent rules. The original value of `$0' that triggered
-     the rule which executed `getline' is lost. By contrast, the
-     `next' statement reads a new record but immediately begins
-     processing it normally, starting with the first rule in the
-     program. *Note The `next' Statement: Next Statement.
-
-`getline VAR'
-     This form of `getline' reads a record into the variable VAR. This
-     is useful when you want your program to read the next record from
-     the current input file, but you don't want to subject the record
-     to the normal input processing.
-
-     For example, suppose the next line is a comment, or a special
-     string, and you want to read it, but you must make certain that it
-     won't trigger any rules. This version of `getline' allows you to
-     read that line and store it in a variable so that the main
-     read-a-line-and-check-each-rule loop of `awk' never sees it.
-
-     The following example swaps every two lines of input. For
-     example, given:
-
-          wan
-          tew
-          free
-          phore
-
-     it outputs:
-
-          tew
-          wan
-          phore
-          free
-
-     Here's the program:
-
-          awk '{
-              if ((getline tmp) > 0) {
-                  print tmp
-                  print $0
-              } else
-                  print $0
-          }'
-
-     The `getline' function used in this way sets only the variables
-     `NR' and `FNR' (and of course, VAR). The record is not split into
-     fields, so the values of the fields (including `$0') and the value
-     of `NF' do not change.
-
-`getline < FILE'
-     This form of the `getline' function takes its input from the file
-     FILE. Here FILE is a string-valued expression that specifies the
-     file name. `< FILE' is called a "redirection" since it directs
-     input to come from a different place.
-
-     This form is useful if you want to read your input from a
-     particular file, instead of from the main input stream. For
-     example, the following program reads its input record from the
-     file `foo.input' when it encounters a first field with a value
-     equal to 10 in the current input file.
-
-          awk '{
-              if ($1 == 10) {
-                  getline < "foo.input"
-                  print
-              } else
-                  print
-          }'
-
-     Since the main input stream is not used, the values of `NR' and
-     `FNR' are not changed. But the record read is split into fields in
-     the normal manner, so the values of `$0' and other fields are
-     changed. So is the value of `NF'.
-
-     This does not cause the record to be tested against all the
-     patterns in the `awk' program, in the way that would happen if the
-     record were read normally by the main processing loop of `awk'.
-     However the new record is tested against any subsequent rules,
-     just as when `getline' is used without a redirection.
-
-`getline VAR < FILE'
-     This form of the `getline' function takes its input from the file
-     FILE and puts it in the variable VAR. As above, FILE is a
-     string-valued expression that specifies the file from which to
-     read.
-
-     In this version of `getline', none of the built-in variables are
-     changed, and the record is not split into fields. The only
-     variable changed is VAR.
-
-     For example, the following program copies all the input files to
-     the output, except for records that say `@include FILENAME'. Such
-     a record is replaced by the contents of the file FILENAME.
-
-          awk '{
-              if (NF == 2 && $1 == "@include") {
-                  while ((getline line < $2) > 0)
-                      print line
-                  close($2)
-              } else
-                  print
-          }'
-
-     Note here how the name of the extra input file is not built into
-     the program; it is taken from the data, from the second field on
-     the `@include' line.
-
-     The `close' function is called to ensure that if two identical
-     `@include' lines appear in the input, the entire specified file is
-     included twice.
*Note Closing Input Files and Pipes: Close Input.
-
-     One deficiency of this program is that it does not process nested
-     `@include' statements the way a true macro preprocessor would.
-
-`COMMAND | getline'
-     You can "pipe" the output of a command into `getline'. A pipe is
-     simply a way to link the output of one program to the input of
-     another. In this case, the string COMMAND is run as a shell
-     command and its output is piped into `awk' to be used as input.
-     This form of `getline' reads one record from the pipe.
-
-     For example, the following program copies input to output, except
-     for lines that begin with `@execute', which are replaced by the
-     output produced by running the rest of the line as a shell command:
-
-          awk '{
-              if ($1 == "@execute") {
-                  tmp = substr($0, 10)
-                  while ((tmp | getline) > 0)
-                      print
-                  close(tmp)
-              } else
-                  print
-          }'
-
-     The `close' function is called to ensure that if two identical
-     `@execute' lines appear in the input, the command is run for each
-     one. *Note Closing Input Files and Pipes: Close Input.
-
-     Given the input:
-
-          foo
-          bar
-          baz
-          @execute who
-          bletch
-
-     the program might produce:
-
-          foo
-          bar
-          baz
-          hack ttyv0 Jul 13 14:22
-          hack ttyp0 Jul 13 14:23 (gnu:0)
-          hack ttyp1 Jul 13 14:23 (gnu:0)
-          hack ttyp2 Jul 13 14:23 (gnu:0)
-          hack ttyp3 Jul 13 14:23 (gnu:0)
-          bletch
-
-     Notice that this program ran the command `who' and printed the
-     result. (If you try this program yourself, you will get different
-     results, showing you who is logged in on your system.)
-
-     This variation of `getline' splits the record into fields, sets the
-     value of `NF' and recomputes the value of `$0'. The values of
-     `NR' and `FNR' are not changed.
-
-`COMMAND | getline VAR'
-     The output of the command COMMAND is sent through a pipe to
-     `getline' and into the variable VAR. For example, the following
-     program reads the current date and time into the variable
-     `current_time', using the `date' utility, and then prints it.
-
-          awk 'BEGIN {
-              "date" | getline current_time
-              close("date")
-              print "Report printed on " current_time
-          }'
-
-     In this version of `getline', none of the built-in variables are
-     changed, and the record is not split into fields.
-
-
-File: gawk.info, Node: Close Input, Prev: Getline, Up: Reading Files
-
-Closing Input Files and Pipes
-=============================
-
-   If the same file name or the same shell command is used with
-`getline' more than once during the execution of an `awk' program, the
-file is opened (or the command is executed) only the first time. At
-that time, the first record of input is read from that file or command.
-The next time the same file or command is used in `getline', another
-record is read from it, and so on.
-
-   This implies that if you want to start reading the same file again
-from the beginning, or if you want to rerun a shell command (rather than
-reading more output from the command), you must take special steps.
-What you must do is use the `close' function, as follows:
-
-     close(FILENAME)
-
-or
-
-     close(COMMAND)
-
-   The argument FILENAME or COMMAND can be any expression. Its value
-must exactly equal the string that was used to open the file or start
-the command--for example, if you open a pipe with this:
-
-     "sort -r names" | getline foo
-
-then you must close it with this:
-
-     close("sort -r names")
-
-   Once this function call is executed, the next `getline' from that
-file or command will reopen the file or rerun the command.
-
-   `close' returns a value of zero if the close succeeded. Otherwise,
-the value will be non-zero. In this case, `gawk' sets the variable
-`ERRNO' to a string describing the error that occurred.
-
-
-File: gawk.info, Node: Printing, Next: One-liners, Prev: Reading Files, Up: Top
-
-Printing Output
-***************
-
-   One of the most common things that actions do is to output or "print"
-some or all of the input. For simple output, use the `print'
-statement.
For fancier formatting use the `printf' statement. Both
-are described in this chapter.
-
-* Menu:
-
-* Print::                       The `print' statement.
-* Print Examples::              Simple examples of `print' statements.
-* Output Separators::           The output separators and how to change them.
-* OFMT::                        Controlling Numeric Output With `print'.
-* Printf::                      The `printf' statement.
-* Redirection::                 How to redirect output to multiple
-                                files and pipes.
-* Special Files::               File name interpretation in `gawk'.
-                                `gawk' allows access to
-                                inherited file descriptors.
-
-
-File: gawk.info, Node: Print, Next: Print Examples, Prev: Printing, Up: Printing
-
-The `print' Statement
-=====================
-
-   The `print' statement does output with simple, standardized
-formatting. You specify only the strings or numbers to be printed, in a
-list separated by commas. They are output, separated by single spaces,
-followed by a newline. The statement looks like this:
-
-     print ITEM1, ITEM2, ...
-
-The entire list of items may optionally be enclosed in parentheses. The
-parentheses are necessary if any of the item expressions uses a
-relational operator; otherwise it could be confused with a redirection
-(*note Redirecting Output of `print' and `printf': Redirection.). The
-relational operators are `==', `!=', `<', `>', `>=', `<=', `~' and `!~'
-(*note Comparison Expressions: Comparison Ops.).
-
-   The items printed can be constant strings or numbers, fields of the
-current record (such as `$1'), variables, or any `awk' expressions.
-The `print' statement is completely general for computing *what* values
-to print. With two exceptions, you cannot specify *how* to print
-them--how many columns, whether to use exponential notation or not, and
-so on. (*Note Output Separators::, and *Note Controlling Numeric
-Output with `print': OFMT.) For that, you need the `printf' statement
-(*note Using `printf' Statements for Fancier Printing: Printf.).
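The earlier point about parentheses and relational operators can be sketched as follows. The helper name `first_is_larger` and the sample numbers are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print 1 when the first field is numerically greater
# than the second field, and 0 otherwise.
# The parentheses keep `>' from being parsed as an output redirection;
# without them, `print $1 > $2' would write $1 to a file named by $2.
first_is_larger() {
    awk '{ print($1 > $2) }' "$@"
}

printf '3 2\n1 5\n' | first_is_larger
```

Leaving the parentheses out would not raise an error; it would silently create files named after the second field, which is why the parenthesized form is the safer habit.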
-
-   The simple statement `print' with no items is equivalent to `print
-$0': it prints the entire current record. To print a blank line, use
-`print ""', where `""' is the null, or empty, string.
-
-   To print a fixed piece of text, use a string constant such as
-`"Hello there"' as one item. If you forget to use the double-quote
-characters, your text will be taken as an `awk' expression, and you
-will probably get an error. Keep in mind that a space is printed
-between any two items.
-
-   Most often, each `print' statement makes one line of output. But it
-isn't limited to one line. If an item value is a string that contains a
-newline, the newline is output along with the rest of the string. A
-single `print' can make any number of lines this way.
-
-
-File: gawk.info, Node: Print Examples, Next: Output Separators, Prev: Print, Up: Printing
-
-Examples of `print' Statements
-==============================
-
-   Here is an example of printing a string that contains embedded
-newlines:
-
-     awk 'BEGIN { print "line one\nline two\nline three" }'
-
-produces output like this:
-
-     line one
-     line two
-     line three
-
-   Here is an example that prints the first two fields of each input
-record, with a space between them:
-
-     awk '{ print $1, $2 }' inventory-shipped
-
-Its output looks like this:
-
-     Jan 13
-     Feb 15
-     Mar 15
-     ...
-
-   A common mistake in using the `print' statement is to omit the comma
-between two items. This often has the effect of making the items run
-together in the output, with no space. The reason for this is that
-juxtaposing two string expressions in `awk' means to concatenate them.
-For example, without the comma:
-
-     awk '{ print $1 $2 }' inventory-shipped
-
-prints:
-
-     Jan13
-     Feb15
-     Mar15
-     ...
-
-   Neither example's output makes much sense to someone unfamiliar with
-the file `inventory-shipped'. A heading line at the beginning would
-make it clearer. Let's add some headings to our table of months (`$1')
-and green crates shipped (`$2').
We do this using the `BEGIN' pattern
-(*note `BEGIN' and `END' Special Patterns: BEGIN/END.) to force the
-headings to be printed only once:
-
-     awk 'BEGIN { print "Month Crates"
-                  print "----- ------" }
-                { print $1, $2 }' inventory-shipped
-
-Did you already guess what happens? This program prints the following:
-
-     Month Crates
-     ----- ------
-     Jan 13
-     Feb 15
-     Mar 15
-     ...
-
-The headings and the table data don't line up! We can fix this by
-printing some spaces between the two fields:
-
-     awk 'BEGIN { print "Month Crates"
-                  print "----- ------" }
-                { print $1, " ", $2 }' inventory-shipped
-
-   You can imagine that this way of lining up columns can get pretty
-complicated when you have many columns to fix. Counting spaces for two
-or three columns can be simple, but more than this and you can get
-"lost" quite easily. This is why the `printf' statement was created
-(*note Using `printf' Statements for Fancier Printing: Printf.); one of
-its specialties is lining up columns of data.
-
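To give a flavor of how `printf' lines up columns, here is a minimal sketch. It assumes the two-field month and crate data shown above; the helper name `crate_table` and the chosen column widths are illustrative assumptions, not the manual's own example.

```shell
# Hypothetical helper: print the first two fields in aligned columns.
# %-5s left-justifies the month in 5 columns; %6s and %6d right-justify
# the heading and the crate count in 6 columns.
crate_table() {
    awk 'BEGIN { printf "%-5s %6s\n", "Month", "Crates"
                 printf "%-5s %6s\n", "-----", "------" }
               { printf "%-5s %6d\n", $1, $2 }' "$@"
}

printf 'Jan 13\nFeb 15\nMar 15\n' | crate_table
```

The widths live in the format string rather than in hand-counted runs of spaces, so adding a column means adding one more conversion instead of recounting every row.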