Diffstat (limited to 'gawk.info-2')
-rw-r--r-- | gawk.info-2 | 1236
1 files changed, 1236 insertions, 0 deletions
diff --git a/gawk.info-2 b/gawk.info-2 new file mode 100644 index 00000000..954b4309 --- /dev/null +++ b/gawk.info-2 @@ -0,0 +1,1236 @@ +This is Info file gawk.info, produced by Makeinfo-1.54 from the input +file gawk.texi. + + This file documents `awk', a program that you can use to select +particular records in a file and perform operations upon them. + + This is Edition 0.15 of `The GAWK Manual', +for the 2.15 version of the GNU implementation +of AWK. + + Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc. + + Permission is granted to make and distribute verbatim copies of this +manual provided the copyright notice and this permission notice are +preserved on all copies. + + Permission is granted to copy and distribute modified versions of +this manual under the conditions for verbatim copying, provided that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. + + Permission is granted to copy and distribute translations of this +manual into another language, under the above conditions for modified +versions, except that this permission notice may be stated in a +translation approved by the Foundation. + + +File: gawk.info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started + +`awk' Statements versus Lines +============================= + + Most often, each line in an `awk' program is a separate statement or +separate rule, like this: + + awk '/12/ { print $0 } + /21/ { print $0 }' BBS-list inventory-shipped + + But sometimes statements can be more than one line, and lines can +contain several statements. You can split a statement into multiple +lines by inserting a newline after any of the following: + + , { ? : || && do else + +A newline at any other point is considered the end of the statement. +(Splitting lines after `?' and `:' is a minor `gawk' extension. The +`?' and `:' referred to here is the three operand conditional +expression described in *Note Conditional Expressions: Conditional Exp.) + + If you would like to split a single statement into two lines at a +point where a newline would terminate it, you can "continue" it by +ending the first line with a backslash character, `\'. This is allowed +absolutely anywhere in the statement, even in the middle of a string or +regular expression. For example: + + awk '/This program is too long, so continue it\ + on the next line/ { print $1 }' + +We have generally not used backslash continuation in the sample +programs in this manual. Since in `gawk' there is no limit on the +length of a line, it is never strictly necessary; it just makes +programs prettier. We have preferred to make them even more pretty by +keeping the statements short. Backslash continuation is most useful +when your `awk' program is in a separate source file, instead of typed +in on the command line. You should also note that many `awk' +implementations are more picky about where you may use backslash +continuation. For maximal portability of your `awk' programs, it is +best not to split your lines in the middle of a regular expression or a +string. + + *Warning: backslash continuation does not work as described above +with the C shell.* Continuation with backslash works for `awk' +programs in files, and also for one-shot programs *provided* you are +using a POSIX-compliant shell, such as the Bourne shell or the +Bourne-again shell. But the C shell used on Berkeley Unix behaves +differently! There, you must use two backslashes in a row, followed by +a newline. 
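+
+   For instance, under the C shell the earlier one-shot program would
+have to be typed roughly like this (a sketch only; the `%' is the C
+shell prompt, and POSIX-compliant shells need just the single backslash
+shown above):
+
+     % awk '/This program is too long, so continue it\\
+     on the next line/ { print $1 }'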
+ + When `awk' statements within one rule are short, you might want to +put more than one of them on a line. You do this by separating the +statements with a semicolon, `;'. This also applies to the rules +themselves. Thus, the previous program could have been written: + + /12/ { print $0 } ; /21/ { print $0 } + +*Note:* the requirement that rules on the same line must be separated +with a semicolon is a recent change in the `awk' language; it was done +for consistency with the treatment of statements within an action. + + +File: gawk.info, Node: When, Prev: Statements/Lines, Up: Getting Started + +When to Use `awk' +================= + + You might wonder how `awk' might be useful for you. Using additional +utility programs, more advanced patterns, field separators, arithmetic +statements, and other selection criteria, you can produce much more +complex output. The `awk' language is very useful for producing +reports from large amounts of raw data, such as summarizing information +from the output of other utility programs like `ls'. (*Note A More +Complex Example: More Complex.) + + Programs written with `awk' are usually much smaller than they would +be in other languages. This makes `awk' programs easy to compose and +use. Often `awk' programs can be quickly composed at your terminal, +used once, and thrown away. Since `awk' programs are interpreted, you +can avoid the usually lengthy edit-compile-test-debug cycle of software +development. + + Complex programs have been written in `awk', including a complete +retargetable assembler for 8-bit microprocessors (*note Glossary::., for +more information) and a microcode assembler for a special purpose Prolog +computer. However, `awk''s capabilities are strained by tasks of such +complexity. + + If you find yourself writing `awk' scripts of more than, say, a few +hundred lines, you might consider using a different programming +language. Emacs Lisp is a good choice if you need sophisticated string +or pattern matching capabilities. The shell is also good at string and +pattern matching; in addition, it allows powerful use of the system +utilities. More conventional languages, such as C, C++, and Lisp, offer +better facilities for system programming and for managing the complexity +of large programs. Programs in these languages may require more lines +of source code than the equivalent `awk' programs, but they are easier +to maintain and usually run more efficiently. + + +File: gawk.info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top + +Reading Input Files +******************* + + In the typical `awk' program, all input is read either from the +standard input (by default the keyboard, but often a pipe from another +command) or from files whose names you specify on the `awk' command +line. If you specify input files, `awk' reads them in order, reading +all the data from one before going on to the next. The name of the +current input file can be found in the built-in variable `FILENAME' +(*note Built-in Variables::.). + + The input is read in units called records, and processed by the +rules one record at a time. By default, each record is one line. Each +record is split automatically into fields, to make it more convenient +for a rule to work on its parts. + + On rare occasions you will need to use the `getline' command, which +can do explicit input from any number of files (*note Explicit Input +with `getline': Getline.). + +* Menu: + +* Records:: Controlling how data is split into records. 
+* Fields:: An introduction to fields. +* Non-Constant Fields:: Non-constant Field Numbers. +* Changing Fields:: Changing the Contents of a Field. +* Field Separators:: The field separator and how to change it. +* Constant Size:: Reading constant width data. +* Multiple Line:: Reading multi-line records. +* Getline:: Reading files under explicit program control + using the `getline' function. +* Close Input:: Closing an input file (so you can read from + the beginning once more). + + +File: gawk.info, Node: Records, Next: Fields, Prev: Reading Files, Up: Reading Files + +How Input is Split into Records +=============================== + + The `awk' language divides its input into records and fields. +Records are separated by a character called the "record separator". By +default, the record separator is the newline character, defining a +record to be a single line of text. + + Sometimes you may want to use a different character to separate your +records. You can use a different character by changing the built-in +variable `RS'. The value of `RS' is a string that says how to separate +records; the default value is `"\n"', the string containing just a +newline character. This is why records are, by default, single lines. + + `RS' can have any string as its value, but only the first character +of the string is used as the record separator. The other characters are +ignored. `RS' is exceptional in this regard; `awk' uses the full value +of all its other built-in variables. + + You can change the value of `RS' in the `awk' program with the +assignment operator, `=' (*note Assignment Expressions: Assignment +Ops.). The new record-separator character should be enclosed in +quotation marks to make a string constant. Often the right time to do +this is at the beginning of execution, before any input has been +processed, so that the very first record will be read with the proper +separator. To do this, use the special `BEGIN' pattern (*note `BEGIN' +and `END' Special Patterns: BEGIN/END.). For example: + + awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list + +changes the value of `RS' to `"/"', before reading any input. This is +a string whose first character is a slash; as a result, records are +separated by slashes. Then the input file is read, and the second rule +in the `awk' program (the action with no pattern) prints each record. +Since each `print' statement adds a newline at the end of its output, +the effect of this `awk' program is to copy the input with each slash +changed to a newline. + + Another way to change the record separator is on the command line, +using the variable-assignment feature (*note Invoking `awk': Command +Line.). + + awk '{ print $0 }' RS="/" BBS-list + +This sets `RS' to `/' before processing `BBS-list'. + + Reaching the end of an input file terminates the current input +record, even if the last character in the file is not the character in +`RS'. + + The empty string, `""' (a string of no characters), has a special +meaning as the value of `RS': it means that records are separated only +by blank lines. *Note Multiple-Line Records: Multiple Line, for more +details. + + The `awk' utility keeps track of the number of records that have +been read so far from the current input file. This value is stored in a +built-in variable called `FNR'. It is reset to zero when a new file is +started. Another built-in variable, `NR', is the total number of input +records read so far from all files. It starts at zero but is never +automatically reset to zero. 
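+
+   As a sketch of the difference between the two counters, the
+following program prints both of them for every record of the manual's
+two sample files:
+
+     awk '{ print FILENAME, FNR, NR }' BBS-list inventory-shipped
+
+Within `BBS-list', `FNR' and `NR' are equal; once `awk' moves on to
+`inventory-shipped', `FNR' starts over from one while `NR' keeps
+counting.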
+ + If you change the value of `RS' in the middle of an `awk' run, the +new value is used to delimit subsequent records, but the record +currently being processed (and records already processed) are not +affected. + + +File: gawk.info, Node: Fields, Next: Non-Constant Fields, Prev: Records, Up: Reading Files + +Examining Fields +================ + + When `awk' reads an input record, the record is automatically +separated or "parsed" by the interpreter into chunks called "fields". +By default, fields are separated by whitespace, like words in a line. +Whitespace in `awk' means any string of one or more spaces and/or tabs; +other characters such as newline, formfeed, and so on, that are +considered whitespace by other languages are *not* considered +whitespace by `awk'. + + The purpose of fields is to make it more convenient for you to refer +to these pieces of the record. You don't have to use them--you can +operate on the whole record if you wish--but fields are what make +simple `awk' programs so powerful. + + To refer to a field in an `awk' program, you use a dollar-sign, `$', +followed by the number of the field you want. Thus, `$1' refers to the +first field, `$2' to the second, and so on. For example, suppose the +following is a line of input: + + This seems like a pretty nice example. + +Here the first field, or `$1', is `This'; the second field, or `$2', is +`seems'; and so on. Note that the last field, `$7', is `example.'. +Because there is no space between the `e' and the `.', the period is +considered part of the seventh field. + + No matter how many fields there are, the last field in a record can +be represented by `$NF'. So, in the example above, `$NF' would be the +same as `$7', which is `example.'. Why this works is explained below +(*note Non-constant Field Numbers: Non-Constant Fields.). If you try +to refer to a field beyond the last one, such as `$8' when the record +has only 7 fields, you get the empty string. + + Plain `NF', with no `$', is a built-in variable whose value is the +number of fields in the current record. + + `$0', which looks like an attempt to refer to the zeroth field, is a +special case: it represents the whole input record. This is what you +would use if you weren't interested in fields. + + Here are some more examples: + + awk '$1 ~ /foo/ { print $0 }' BBS-list + +This example prints each record in the file `BBS-list' whose first +field contains the string `foo'. The operator `~' is called a +"matching operator" (*note Comparison Expressions: Comparison Ops.); it +tests whether a string (here, the field `$1') matches a given regular +expression. + + By contrast, the following example: + + awk '/foo/ { print $1, $NF }' BBS-list + +looks for `foo' in *the entire record* and prints the first field and +the last field for each input record containing a match. + + +File: gawk.info, Node: Non-Constant Fields, Next: Changing Fields, Prev: Fields, Up: Reading Files + +Non-constant Field Numbers +========================== + + The number of a field does not need to be a constant. Any +expression in the `awk' language can be used after a `$' to refer to a +field. The value of the expression specifies the field number. If the +value is a string, rather than a number, it is converted to a number. +Consider this example: + + awk '{ print $NR }' + +Recall that `NR' is the number of records read so far: 1 in the first +record, 2 in the second, etc. So this example prints the first field +of the first record, the second field of the second record, and so on. 
+For the twentieth record, field number 20 is printed; most likely, the +record has fewer than 20 fields, so this prints a blank line. + + Here is another example of using expressions as field numbers: + + awk '{ print $(2*2) }' BBS-list + + The `awk' language must evaluate the expression `(2*2)' and use its +value as the number of the field to print. The `*' sign represents +multiplication, so the expression `2*2' evaluates to 4. The +parentheses are used so that the multiplication is done before the `$' +operation; they are necessary whenever there is a binary operator in +the field-number expression. This example, then, prints the hours of +operation (the fourth field) for every line of the file `BBS-list'. + + If the field number you compute is zero, you get the entire record. +Thus, `$(2-2)' has the same value as `$0'. Negative field numbers are +not allowed. + + The number of fields in the current record is stored in the built-in +variable `NF' (*note Built-in Variables::.). The expression `$NF' is +not a special feature: it is the direct consequence of evaluating `NF' +and using its value as a field number. + + +File: gawk.info, Node: Changing Fields, Next: Field Separators, Prev: Non-Constant Fields, Up: Reading Files + +Changing the Contents of a Field +================================ + + You can change the contents of a field as seen by `awk' within an +`awk' program; this changes what `awk' perceives as the current input +record. (The actual input is untouched: `awk' never modifies the input +file.) + + Consider this example: + + awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped + +The `-' sign represents subtraction, so this program reassigns field +three, `$3', to be the value of field two minus ten, `$2 - 10'. (*Note +Arithmetic Operators: Arithmetic Ops.) Then field two, and the new +value for field three, are printed. + + In order for this to work, the text in field `$2' must make sense as +a number; the string of characters must be converted to a number in +order for the computer to do arithmetic on it. The number resulting +from the subtraction is converted back to a string of characters which +then becomes field three. *Note Conversion of Strings and Numbers: +Conversion. + + When you change the value of a field (as perceived by `awk'), the +text of the input record is recalculated to contain the new field where +the old one was. Therefore, `$0' changes to reflect the altered field. +Thus, + + awk '{ $2 = $2 - 10; print $0 }' inventory-shipped + +prints a copy of the input file, with 10 subtracted from the second +field of each line. + + You can also assign contents to fields that are out of range. For +example: + + awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped + +We've just created `$6', whose value is the sum of fields `$2', `$3', +`$4', and `$5'. The `+' sign represents addition. For the file +`inventory-shipped', `$6' represents the total number of parcels +shipped for a particular month. + + Creating a new field changes the internal `awk' copy of the current +input record--the value of `$0'. Thus, if you do `print $0' after +adding a field, the record printed includes the new field, with the +appropriate number of field separators between it and the previously +existing fields. 
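+
+   A one-line sketch of this, built on the previous example:
+
+     awk '{ $6 = ($5 + $4 + $3 + $2) ; print $0 }' inventory-shipped
+
+Each line of output is the record rebuilt to include the computed total
+as a sixth field.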
+ + This recomputation affects and is affected by several features not +yet discussed, in particular, the "output field separator", `OFS', +which is used to separate the fields (*note Output Separators::.), and +`NF' (the number of fields; *note Examining Fields: Fields.). For +example, the value of `NF' is set to the number of the highest field +you create. + + Note, however, that merely *referencing* an out-of-range field does +*not* change the value of either `$0' or `NF'. Referencing an +out-of-range field merely produces a null string. For example: + + if ($(NF+1) != "") + print "can't happen" + else + print "everything is normal" + +should print `everything is normal', because `NF+1' is certain to be +out of range. (*Note The `if' Statement: If Statement, for more +information about `awk''s `if-else' statements.) + + It is important to note that assigning to a field will change the +value of `$0', but will not change the value of `NF', even when you +assign the null string to a field. For example: + + echo a b c d | awk '{ OFS = ":"; $2 = "" ; print ; print NF }' + +prints + + a::c:d + 4 + +The field is still there, it just has an empty value. You can tell +because there are two colons in a row. + + +File: gawk.info, Node: Field Separators, Next: Constant Size, Prev: Changing Fields, Up: Reading Files + +Specifying how Fields are Separated +=================================== + + (This section is rather long; it describes one of the most +fundamental operations in `awk'. If you are a novice with `awk', we +recommend that you re-read this section after you have studied the +section on regular expressions, *Note Regular Expressions as Patterns: +Regexp.) + + The way `awk' splits an input record into fields is controlled by +the "field separator", which is a single character or a regular +expression. `awk' scans the input record for matches for the +separator; the fields themselves are the text between the matches. For +example, if the field separator is `oo', then the following line: + + moo goo gai pan + +would be split into three fields: `m', ` g' and ` gai pan'. + + The field separator is represented by the built-in variable `FS'. +Shell programmers take note! `awk' does not use the name `IFS' which +is used by the shell. + + You can change the value of `FS' in the `awk' program with the +assignment operator, `=' (*note Assignment Expressions: Assignment +Ops.). Often the right time to do this is at the beginning of +execution, before any input has been processed, so that the very first +record will be read with the proper separator. To do this, use the +special `BEGIN' pattern (*note `BEGIN' and `END' Special Patterns: +BEGIN/END.). For example, here we set the value of `FS' to the string +`","': + + awk 'BEGIN { FS = "," } ; { print $2 }' + +Given the input line, + + John Q. Smith, 29 Oak St., Walamazoo, MI 42139 + +this `awk' program extracts the string ` 29 Oak St.'. + + Sometimes your input data will contain separator characters that +don't separate fields the way you thought they would. For instance, the +person's name in the example we've been using might have a title or +suffix attached, such as `John Q. Smith, LXIX'. From input containing +such a name: + + John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139 + +the previous sample program would extract ` LXIX', instead of ` 29 Oak +St.'. If you were expecting the program to print the address, you +would be surprised. So choose your data layout and separator +characters carefully to prevent such problems. 
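+
+   You can see the problem directly by feeding that sample line to the
+program with `echo' (a sketch; any POSIX shell will do):
+
+     echo 'John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139' |
+       awk 'BEGIN { FS = "," } ; { print $2 }'
+
+This prints ` LXIX', complete with its leading space, because the
+commas now delimit five fields instead of four.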
+ + As you know, by default, fields are separated by whitespace sequences +(spaces and tabs), not by single spaces: two spaces in a row do not +delimit an empty field. The default value of the field separator is a +string `" "' containing a single space. If this value were interpreted +in the usual way, each space character would separate fields, so two +spaces in a row would make an empty field between them. The reason +this does not happen is that a single space as the value of `FS' is a +special case: it is taken to specify the default manner of delimiting +fields. + + If `FS' is any other single character, such as `","', then each +occurrence of that character separates two fields. Two consecutive +occurrences delimit an empty field. If the character occurs at the +beginning or the end of the line, that too delimits an empty field. The +space character is the only single character which does not follow these +rules. + + More generally, the value of `FS' may be a string containing any +regular expression. Then each match in the record for the regular +expression separates fields. For example, the assignment: + + FS = ", \t" + +makes every area of an input line that consists of a comma followed by a +space and a tab, into a field separator. (`\t' stands for a tab.) + + For a less trivial example of a regular expression, suppose you want +single spaces to separate fields the way single commas were used above. +You can set `FS' to `"[ ]"'. This regular expression matches a single +space and nothing else. + + `FS' can be set on the command line. You use the `-F' argument to +do so. For example: + + awk -F, 'PROGRAM' INPUT-FILES + +sets `FS' to be the `,' character. Notice that the argument uses a +capital `F'. Contrast this with `-f', which specifies a file +containing an `awk' program. Case is significant in command options: +the `-F' and `-f' options have nothing to do with each other. You can +use both options at the same time to set the `FS' argument *and* get an +`awk' program from a file. + + The value used for the argument to `-F' is processed in exactly the +same way as assignments to the built-in variable `FS'. This means that +if the field separator contains special characters, they must be escaped +appropriately. For example, to use a `\' as the field separator, you +would have to type: + + # same as FS = "\\" + awk -F\\\\ '...' files ... + +Since `\' is used for quoting in the shell, `awk' will see `-F\\'. +Then `awk' processes the `\\' for escape characters (*note Constant +Expressions: Constants.), finally yielding a single `\' to be used for +the field separator. + + As a special case, in compatibility mode (*note Invoking `awk': +Command Line.), if the argument to `-F' is `t', then `FS' is set to the +tab character. (This is because if you type `-F\t', without the quotes, +at the shell, the `\' gets deleted, so `awk' figures that you really +want your fields to be separated with tabs, and not `t's. Use `-v +FS="t"' on the command line if you really do want to separate your +fields with `t's.) + + For example, let's use an `awk' program file called `baud.awk' that +contains the pattern `/300/', and the action `print $1'. Here is the +program: + + /300/ { print $1 } + + Let's also set `FS' to be the `-' character, and run the program on +the file `BBS-list'. 
The following command prints a list of the names +of the bulletin boards that operate at 300 baud and the first three +digits of their phone numbers: + + awk -F- -f baud.awk BBS-list + +It produces this output: + + aardvark 555 + alpo + barfly 555 + bites 555 + camelot 555 + core 555 + fooey 555 + foot 555 + macfoo 555 + sdace 555 + sabafoo 555 + +Note the second line of output. If you check the original file, you +will see that the second line looked like this: + + alpo-net 555-3412 2400/1200/300 A + + The `-' as part of the system's name was used as the field +separator, instead of the `-' in the phone number that was originally +intended. This demonstrates why you have to be careful in choosing +your field and record separators. + + The following program searches the system password file, and prints +the entries for users who have no password: + + awk -F: '$2 == ""' /etc/passwd + +Here we use the `-F' option on the command line to set the field +separator. Note that fields in `/etc/passwd' are separated by colons. +The second field represents a user's encrypted password, but if the +field is empty, that user has no password. + + According to the POSIX standard, `awk' is supposed to behave as if +each record is split into fields at the time that it is read. In +particular, this means that you can change the value of `FS' after a +record is read, but before any of the fields are referenced. The value +of the fields (i.e. how they were split) should reflect the old value +of `FS', not the new one. + + However, many implementations of `awk' do not do this. Instead, +they defer splitting the fields until a field reference actually +happens, using the *current* value of `FS'! This behavior can be +difficult to diagnose. The following example illustrates the results of +the two methods. (The `sed' command prints just the first line of +`/etc/passwd'.) + + sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }' + +will usually print + + root + +on an incorrect implementation of `awk', while `gawk' will print +something like + + root:nSijPlPhZZwgE:0:0:Root:/: + + There is an important difference between the two cases of `FS = " "' +(a single blank) and `FS = "[ \t]+"' (which is a regular expression +matching one or more blanks or tabs). For both values of `FS', fields +are separated by runs of blanks and/or tabs. However, when the value of +`FS' is `" "', `awk' will strip leading and trailing whitespace from +the record, and then decide where the fields are. + + For example, the following expression prints `b': + + echo ' a b c d ' | awk '{ print $2 }' + +However, the following prints `a': + + echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" } ; { print $2 }' + +In this case, the first field is null. + + The stripping of leading and trailing whitespace also comes into +play whenever `$0' is recomputed. For instance, this pipeline + + echo ' a b c d' | awk '{ print; $2 = $2; print }' + +produces this output: + + a b c d + a b c d + +The first `print' statement prints the record as it was read, with +leading whitespace intact. The assignment to `$2' rebuilds `$0' by +concatenating `$1' through `$NF' together, separated by the value of +`OFS'. Since the leading whitespace was ignored when finding `$1', it +is not part of the new `$0'. Finally, the last `print' statement +prints the new `$0'. + + The following table summarizes how fields are split, based on the +value of `FS'. + +`FS == " "' + Fields are separated by runs of whitespace. Leading and trailing + whitespace are ignored. This is the default. 
+ +`FS == ANY SINGLE CHARACTER' + Fields are separated by each occurrence of the character. Multiple + successive occurrences delimit empty fields, as do leading and + trailing occurrences. + +`FS == REGEXP' + Fields are separated by occurrences of characters that match + REGEXP. Leading and trailing matches of REGEXP delimit empty + fields. + + +File: gawk.info, Node: Constant Size, Next: Multiple Line, Prev: Field Separators, Up: Reading Files + +Reading Fixed-width Data +======================== + + (This section discusses an advanced, experimental feature. If you +are a novice `awk' user, you may wish to skip it on the first reading.) + + `gawk' 2.13 introduced a new facility for dealing with fixed-width +fields with no distinctive field separator. Data of this nature arises +typically in one of at least two ways: the input for old FORTRAN +programs where numbers are run together, and the output of programs +that did not anticipate the use of their output as input for other +programs. + + An example of the latter is a table where all the columns are lined +up by the use of a variable number of spaces and *empty fields are just +spaces*. Clearly, `awk''s normal field splitting based on `FS' will +not work well in this case. (Although a portable `awk' program can use +a series of `substr' calls on `$0', this is awkward and inefficient for +a large number of fields.) + + The splitting of an input record into fixed-width fields is +specified by assigning a string containing space-separated numbers to +the built-in variable `FIELDWIDTHS'. Each number specifies the width +of the field *including* columns between fields. If you want to ignore +the columns between fields, you can specify the width as a separate +field that is subsequently ignored. + + The following data is the output of the `w' utility. It is useful +to illustrate the use of `FIELDWIDTHS'. + + 10:06pm up 21 days, 14:04, 23 users + User tty login idle JCPU PCPU what + hzuo ttyV0 8:58pm 9 5 vi p24.tex + hzang ttyV3 6:37pm 50 -csh + eklye ttyV5 9:53pm 7 1 em thes.tex + dportein ttyV6 8:17pm 1:47 -csh + gierd ttyD3 10:00pm 1 elm + dave ttyD4 9:47pm 4 4 w + brent ttyp0 26Jun91 4:46 26:46 4:41 bash + dave ttyq4 26Jun9115days 46 46 wnewmail + + The following program takes the above input, converts the idle time +to number of seconds and prints out the first two fields and the +calculated idle time. (This program uses a number of `awk' features +that haven't been introduced yet.) + + BEGIN { FIELDWIDTHS = "9 6 10 6 7 7 35" } + NR > 2 { + idle = $4 + sub(/^ */, "", idle) # strip leading spaces + if (idle == "") idle = 0 + if (idle ~ /:/) { split(idle, t, ":"); idle = t[1] * 60 + t[2] } + if (idle ~ /days/) { idle *= 24 * 60 * 60 } + + print $1, $2, idle + } + + Here is the result of running the program on the data: + + hzuo ttyV0 0 + hzang ttyV3 50 + eklye ttyV5 0 + dportein ttyV6 107 + gierd ttyD3 1 + dave ttyD4 0 + brent ttyp0 286 + dave ttyq4 1296000 + + Another (possibly more practical) example of fixed-width input data +would be the input from a deck of balloting cards. In some parts of +the United States, voters make their choices by punching holes in +computer cards. These cards are then processed to count the votes for +any particular candidate or on any particular issue. Since a voter may +choose not to vote on some issue, any column on the card may be empty. +An `awk' program for processing such data could use the `FIELDWIDTHS' +feature to simplify reading the data. 
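+
+   A minimal sketch of such a tally follows.  The one-column-per-issue
+layout, the `*' marking a punched hole, and the three-issue ballot are
+all invented for illustration:
+
+     BEGIN { FIELDWIDTHS = "1 1 1" }   # one card column per issue
+     {
+         for (i = 1; i <= 3; i++)
+             if ($i == "*")            # a punched (voted) column
+                 votes[i]++
+     }
+     END {
+         for (i = 1; i <= 3; i++)
+             print "issue", i, "received", votes[i] + 0, "votes"
+     }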
+ + This feature is still experimental, and will likely evolve over time. + + +File: gawk.info, Node: Multiple Line, Next: Getline, Prev: Constant Size, Up: Reading Files + +Multiple-Line Records +===================== + + In some data bases, a single line cannot conveniently hold all the +information in one entry. In such cases, you can use multi-line +records. + + The first step in doing this is to choose your data format: when +records are not defined as single lines, how do you want to define them? +What should separate records? + + One technique is to use an unusual character or string to separate +records. For example, you could use the formfeed character (written +`\f' in `awk', as in C) to separate them, making each record a page of +the file. To do this, just set the variable `RS' to `"\f"' (a string +containing the formfeed character). Any other character could equally +well be used, as long as it won't be part of the data in a record. + + Another technique is to have blank lines separate records. By a +special dispensation, a null string as the value of `RS' indicates that +records are separated by one or more blank lines. If you set `RS' to +the null string, a record always ends at the first blank line +encountered. And the next record doesn't start until the first nonblank +line that follows--no matter how many blank lines appear in a row, they +are considered one record-separator. (End of file is also considered a +record separator.) + + The second step is to separate the fields in the record. One way to +do this is to put each field on a separate line: to do this, just set +the variable `FS' to the string `"\n"'. (This simple regular +expression matches a single newline.) + + Another way to separate fields is to divide each of the lines into +fields in the normal manner. This happens by default as a result of a +special feature: when `RS' is set to the null string, the newline +character *always* acts as a field separator. This is in addition to +whatever field separations result from `FS'. + + The original motivation for this special exception was probably so +that you get useful behavior in the default case (i.e., `FS == " "'). +This feature can be a problem if you really don't want the newline +character to separate fields, since there is no way to prevent it. +However, you can work around this by using the `split' function to +break up the record manually (*note Built-in Functions for String +Manipulation: String Functions.). + + +File: gawk.info, Node: Getline, Next: Close Input, Prev: Multiple Line, Up: Reading Files + +Explicit Input with `getline' +============================= + + So far we have been getting our input files from `awk''s main input +stream--either the standard input (usually your terminal) or the files +specified on the command line. The `awk' language has a special +built-in command called `getline' that can be used to read input under +your explicit control. + + This command is quite complex and should *not* be used by beginners. +It is covered here because this is the chapter on input. The examples +that follow the explanation of the `getline' command include material +that has not been covered yet. Therefore, come back and study the +`getline' command *after* you have reviewed the rest of this manual and +have a good knowledge of how `awk' works. + + `getline' returns 1 if it finds a record, and 0 if the end of the +file is encountered. If there is some error in getting a record, such +as a file that cannot be opened, then `getline' returns -1. 
In this +case, `gawk' sets the variable `ERRNO' to a string describing the error +that occurred. + + In the following examples, COMMAND stands for a string value that +represents a shell command. + +`getline' + The `getline' command can be used without arguments to read input + from the current input file. All it does in this case is read the + next input record and split it up into fields. This is useful if + you've finished processing the current record, but you want to do + some special processing *right now* on the next record. Here's an + example: + + awk '{ + if (t = index($0, "/*")) { + if (t > 1) + tmp = substr($0, 1, t - 1) + else + tmp = "" + u = index(substr($0, t + 2), "*/") + while (u == 0) { + getline + t = -1 + u = index($0, "*/") + } + if (u <= length($0) - 2) + $0 = tmp substr($0, t + u + 3) + else + $0 = tmp + } + print $0 + }' + + This `awk' program deletes all C-style comments, `/* ... */', + from the input. By replacing the `print $0' with other + statements, you could perform more complicated processing on the + decommented input, like searching for matches of a regular + expression. (This program has a subtle problem--can you spot it?) + + This form of the `getline' command sets `NF' (the number of + fields; *note Examining Fields: Fields.), `NR' (the number of + records read so far; *note How Input is Split into Records: + Records.), `FNR' (the number of records read from this input + file), and the value of `$0'. + + *Note:* the new value of `$0' is used in testing the patterns of + any subsequent rules. The original value of `$0' that triggered + the rule which executed `getline' is lost. By contrast, the + `next' statement reads a new record but immediately begins + processing it normally, starting with the first rule in the + program. *Note The `next' Statement: Next Statement. + +`getline VAR' + This form of `getline' reads a record into the variable VAR. This + is useful when you want your program to read the next record from + the current input file, but you don't want to subject the record + to the normal input processing. + + For example, suppose the next line is a comment, or a special + string, and you want to read it, but you must make certain that it + won't trigger any rules. This version of `getline' allows you to + read that line and store it in a variable so that the main + read-a-line-and-check-each-rule loop of `awk' never sees it. + + The following example swaps every two lines of input. For + example, given: + + wan + tew + free + phore + + it outputs: + + tew + wan + phore + free + + Here's the program: + + awk '{ + if ((getline tmp) > 0) { + print tmp + print $0 + } else + print $0 + }' + + The `getline' function used in this way sets only the variables + `NR' and `FNR' (and of course, VAR). The record is not split into + fields, so the values of the fields (including `$0') and the value + of `NF' do not change. + +`getline < FILE' + This form of the `getline' function takes its input from the file + FILE. Here FILE is a string-valued expression that specifies the + file name. `< FILE' is called a "redirection" since it directs + input to come from a different place. + + This form is useful if you want to read your input from a + particular file, instead of from the main input stream. For + example, the following program reads its input record from the + file `foo.input' when it encounters a first field with a value + equal to 10 in the current input file. 
+ + awk '{ + if ($1 == 10) { + getline < "foo.input" + print + } else + print + }' + + Since the main input stream is not used, the values of `NR' and + `FNR' are not changed. But the record read is split into fields in + the normal manner, so the values of `$0' and other fields are + changed. So is the value of `NF'. + + This does not cause the record to be tested against all the + patterns in the `awk' program, in the way that would happen if the + record were read normally by the main processing loop of `awk'. + However the new record is tested against any subsequent rules, + just as when `getline' is used without a redirection. + +`getline VAR < FILE' + This form of the `getline' function takes its input from the file + FILE and puts it in the variable VAR. As above, FILE is a + string-valued expression that specifies the file from which to + read. + + In this version of `getline', none of the built-in variables are + changed, and the record is not split into fields. The only + variable changed is VAR. + + For example, the following program copies all the input files to + the output, except for records that say `@include FILENAME'. Such + a record is replaced by the contents of the file FILENAME. + + awk '{ + if (NF == 2 && $1 == "@include") { + while ((getline line < $2) > 0) + print line + close($2) + } else + print + }' + + Note here how the name of the extra input file is not built into + the program; it is taken from the data, from the second field on + the `@include' line. + + The `close' function is called to ensure that if two identical + `@include' lines appear in the input, the entire specified file is + included twice. *Note Closing Input Files and Pipes: Close Input. + + One deficiency of this program is that it does not process nested + `@include' statements the way a true macro preprocessor would. + +`COMMAND | getline' + You can "pipe" the output of a command into `getline'. A pipe is + simply a way to link the output of one program to the input of + another. In this case, the string COMMAND is run as a shell + command and its output is piped into `awk' to be used as input. + This form of `getline' reads one record from the pipe. + + For example, the following program copies input to output, except + for lines that begin with `@execute', which are replaced by the + output produced by running the rest of the line as a shell command: + + awk '{ + if ($1 == "@execute") { + tmp = substr($0, 10) + while ((tmp | getline) > 0) + print + close(tmp) + } else + print + }' + + The `close' function is called to ensure that if two identical + `@execute' lines appear in the input, the command is run for each + one. *Note Closing Input Files and Pipes: Close Input. + + Given the input: + + foo + bar + baz + @execute who + bletch + + the program might produce: + + foo + bar + baz + hack ttyv0 Jul 13 14:22 + hack ttyp0 Jul 13 14:23 (gnu:0) + hack ttyp1 Jul 13 14:23 (gnu:0) + hack ttyp2 Jul 13 14:23 (gnu:0) + hack ttyp3 Jul 13 14:23 (gnu:0) + bletch + + Notice that this program ran the command `who' and printed the + result. (If you try this program yourself, you will get different + results, showing you who is logged in on your system.) + + This variation of `getline' splits the record into fields, sets the + value of `NF' and recomputes the value of `$0'. The values of + `NR' and `FNR' are not changed. + +`COMMAND | getline VAR' + The output of the command COMMAND is sent through a pipe to + `getline' and into the variable VAR. 
For example, the following + program reads the current date and time into the variable + `current_time', using the `date' utility, and then prints it. + + awk 'BEGIN { + "date" | getline current_time + close("date") + print "Report printed on " current_time + }' + + In this version of `getline', none of the built-in variables are + changed, and the record is not split into fields. + + +File: gawk.info, Node: Close Input, Prev: Getline, Up: Reading Files + +Closing Input Files and Pipes +============================= + + If the same file name or the same shell command is used with +`getline' more than once during the execution of an `awk' program, the +file is opened (or the command is executed) only the first time. At +that time, the first record of input is read from that file or command. +The next time the same file or command is used in `getline', another +record is read from it, and so on. + + This implies that if you want to start reading the same file again +from the beginning, or if you want to rerun a shell command (rather than +reading more output from the command), you must take special steps. +What you must do is use the `close' function, as follows: + + close(FILENAME) + +or + + close(COMMAND) + + The argument FILENAME or COMMAND can be any expression. Its value +must exactly equal the string that was used to open the file or start +the command--for example, if you open a pipe with this: + + "sort -r names" | getline foo + +then you must close it with this: + + close("sort -r names") + + Once this function call is executed, the next `getline' from that +file or command will reopen the file or rerun the command. + + `close' returns a value of zero if the close succeeded. Otherwise, +the value will be non-zero. In this case, `gawk' sets the variable +`ERRNO' to a string describing the error that occurred. + + +File: gawk.info, Node: Printing, Next: One-liners, Prev: Reading Files, Up: Top + +Printing Output +*************** + + One of the most common things that actions do is to output or "print" +some or all of the input. For simple output, use the `print' +statement. For fancier formatting use the `printf' statement. Both +are described in this chapter. + +* Menu: + +* Print:: The `print' statement. +* Print Examples:: Simple examples of `print' statements. +* Output Separators:: The output separators and how to change them. +* OFMT:: Controlling Numeric Output With `print'. +* Printf:: The `printf' statement. +* Redirection:: How to redirect output to multiple + files and pipes. +* Special Files:: File name interpretation in `gawk'. + `gawk' allows access to + inherited file descriptors. + + +File: gawk.info, Node: Print, Next: Print Examples, Prev: Printing, Up: Printing + +The `print' Statement +===================== + + The `print' statement does output with simple, standardized +formatting. You specify only the strings or numbers to be printed, in a +list separated by commas. They are output, separated by single spaces, +followed by a newline. The statement looks like this: + + print ITEM1, ITEM2, ... + +The entire list of items may optionally be enclosed in parentheses. The +parentheses are necessary if any of the item expressions uses a +relational operator; otherwise it could be confused with a redirection +(*note Redirecting Output of `print' and `printf': Redirection.). The +relational operators are `==', `!=', `<', `>', `>=', `<=', `~' and `!~' +(*note Comparison Expressions: Comparison Ops.). 
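+
+   For instance (a sketch; the file name is only a placeholder), the
+following two statements look alike but do very different things:
+
+     print $1 > "outfile"     # redirection: write $1 to `outfile'
+     print ($1 > "outfile")   # comparison: print 1 or 0
+
+Without the parentheses the `>' is taken as a redirection; with them it
+is an ordinary comparison whose result is printed.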
+ + The items printed can be constant strings or numbers, fields of the +current record (such as `$1'), variables, or any `awk' expressions. +The `print' statement is completely general for computing *what* values +to print. With two exceptions, you cannot specify *how* to print +them--how many columns, whether to use exponential notation or not, and +so on. (*Note Output Separators::, and *Note Controlling Numeric +Output with `print': OFMT.) For that, you need the `printf' statement +(*note Using `printf' Statements for Fancier Printing: Printf.). + + The simple statement `print' with no items is equivalent to `print +$0': it prints the entire current record. To print a blank line, use +`print ""', where `""' is the null, or empty, string. + + To print a fixed piece of text, use a string constant such as +`"Hello there"' as one item. If you forget to use the double-quote +characters, your text will be taken as an `awk' expression, and you +will probably get an error. Keep in mind that a space is printed +between any two items. + + Most often, each `print' statement makes one line of output. But it +isn't limited to one line. If an item value is a string that contains a +newline, the newline is output along with the rest of the string. A +single `print' can make any number of lines this way. + + +File: gawk.info, Node: Print Examples, Next: Output Separators, Prev: Print, Up: Printing + +Examples of `print' Statements +============================== + + Here is an example of printing a string that contains embedded +newlines: + + awk 'BEGIN { print "line one\nline two\nline three" }' + +produces output like this: + + line one + line two + line three + + Here is an example that prints the first two fields of each input +record, with a space between them: + + awk '{ print $1, $2 }' inventory-shipped + +Its output looks like this: + + Jan 13 + Feb 15 + Mar 15 + ... + + A common mistake in using the `print' statement is to omit the comma +between two items. This often has the effect of making the items run +together in the output, with no space. The reason for this is that +juxtaposing two string expressions in `awk' means to concatenate them. +For example, without the comma: + + awk '{ print $1 $2 }' inventory-shipped + +prints: + + Jan13 + Feb15 + Mar15 + ... + + Neither example's output makes much sense to someone unfamiliar with +the file `inventory-shipped'. A heading line at the beginning would +make it clearer. Let's add some headings to our table of months (`$1') +and green crates shipped (`$2'). We do this using the `BEGIN' pattern +(*note `BEGIN' and `END' Special Patterns: BEGIN/END.) to force the +headings to be printed only once: + + awk 'BEGIN { print "Month Crates" + print "----- ------" } + { print $1, $2 }' inventory-shipped + +Did you already guess what happens? This program prints the following: + + Month Crates + ----- ------ + Jan 13 + Feb 15 + Mar 15 + ... + +The headings and the table data don't line up! We can fix this by +printing some spaces between the two fields: + + awk 'BEGIN { print "Month Crates" + print "----- ------" } + { print $1, " ", $2 }' inventory-shipped + + You can imagine that this way of lining up columns can get pretty +complicated when you have many columns to fix. Counting spaces for two +or three columns can be simple, but more than this and you can get +"lost" quite easily. This is why the `printf' statement was created +(*note Using `printf' Statements for Fancier Printing: Printf.); one of +its specialties is lining up columns of data. + |
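+
+   As a preview of `printf', here is a sketch of the same report with
+the columns padded automatically (the field width of six is arbitrary):
+
+     awk 'BEGIN { printf "%-6s %s\n", "Month", "Crates"
+                  printf "%-6s %s\n", "-----", "------" }
+          { printf "%-6s %s\n", $1, $2 }' inventory-shipped
+
+Each month name is padded on the right to six characters, so the two
+columns line up regardless of how long the first field is.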