Diffstat (limited to 'gawk.info-4')
-rw-r--r-- | gawk.info-4 | 1305 |
1 files changed, 1305 insertions, 0 deletions
diff --git a/gawk.info-4 b/gawk.info-4 new file mode 100644 index 00000000..100e1d25 --- /dev/null +++ b/gawk.info-4 @@ -0,0 +1,1305 @@ +This is Info file gawk.info, produced by Makeinfo-1.54 from the input +file gawk.texi. + + This file documents `awk', a program that you can use to select +particular records in a file and perform operations upon them. + + This is Edition 0.15 of `The GAWK Manual', +for the 2.15 version of the GNU implementation +of AWK. + + Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc. + + Permission is granted to make and distribute verbatim copies of this +manual provided the copyright notice and this permission notice are +preserved on all copies. + + Permission is granted to copy and distribute modified versions of +this manual under the conditions for verbatim copying, provided that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. + + Permission is granted to copy and distribute translations of this +manual into another language, under the above conditions for modified +versions, except that this permission notice may be stated in a +translation approved by the Foundation. + + +File: gawk.info, Node: Actions, Next: Expressions, Prev: Patterns, Up: Top + +Overview of Actions +******************* + + An `awk' program or script consists of a series of rules and +function definitions, interspersed. (Functions are described later. +*Note User-defined Functions: User-defined.) + + A rule contains a pattern and an action, either of which may be +omitted. The purpose of the "action" is to tell `awk' what to do once +a match for the pattern is found. Thus, the entire program looks +somewhat like this: + + [PATTERN] [{ ACTION }] + [PATTERN] [{ ACTION }] + ... + function NAME (ARGS) { ... } + ... + + An action consists of one or more `awk' "statements", enclosed in +curly braces (`{' and `}'). Each statement specifies one thing to be +done. 
The statements are separated by newlines or semicolons. + + The curly braces around an action must be used even if the action +contains only one statement, or even if it contains no statements at +all. However, if you omit the action entirely, omit the curly braces as +well. (An omitted action is equivalent to `{ print $0 }'.) + + Here are the kinds of statements supported in `awk': + + * Expressions, which can call functions or assign values to variables + (*note Expressions as Action Statements: Expressions.). Executing + this kind of statement simply computes the value of the expression + and then ignores it. This is useful when the expression has side + effects (*note Assignment Expressions: Assignment Ops.). + + * Control statements, which specify the control flow of `awk' + programs. The `awk' language gives you C-like constructs (`if', + `for', `while', and so on) as well as a few special ones (*note + Control Statements in Actions: Statements.). + + * Compound statements, which consist of one or more statements + enclosed in curly braces. A compound statement is used in order + to put several statements together in the body of an `if', + `while', `do' or `for' statement. + + * Input control, using the `getline' command (*note Explicit Input + with `getline': Getline.), and the `next' statement (*note The + `next' Statement: Next Statement.). + + * Output statements, `print' and `printf'. *Note Printing Output: + Printing. + + * Deletion statements, for deleting array elements. *Note The + `delete' Statement: Delete. + + +File: gawk.info, Node: Expressions, Next: Statements, Prev: Actions, Up: Top + +Expressions as Action Statements +******************************** + + Expressions are the basic building block of `awk' actions. An +expression evaluates to a value, which you can print, test, store in a +variable or pass to a function. But beyond that, an expression can +assign a new value to a variable or a field, with an assignment +operator. 
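As a minimal sketch of the point above (an expression's value can be stored, printed, or assigned to a field), the following one-liner may help. The input record `apple 3 0.50` and the variable name `total` are assumptions made up for this illustration; they do not come from the manual's sample files.

```shell
# Hypothetical input record: item name, quantity, unit price.
out=$(echo "apple 3 0.50" | awk '{
    total = $2 * $3      # store the value of an expression in a variable
    $1 = toupper($1)     # assign a new value to a field
    print $1, total      # print the values of two expressions
}')
echo "$out"
```

Each of the three statements in the action is built from expressions; the two assignments matter only for their side effects.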
+ + An expression can serve as a statement on its own. Most other kinds +of statements contain one or more expressions which specify data to be +operated on. As in other languages, expressions in `awk' include +variables, array references, constants, and function calls, as well as +combinations of these with various operators. + +* Menu: + +* Constants:: String, numeric, and regexp constants. +* Variables:: Variables give names to values for later use. +* Arithmetic Ops:: Arithmetic operations (`+', `-', etc.) +* Concatenation:: Concatenating strings. +* Comparison Ops:: Comparison of numbers and strings + with `<', etc. +* Boolean Ops:: Combining comparison expressions + using boolean operators + `||' ("or"), `&&' ("and") and `!' ("not"). + +* Assignment Ops:: Changing the value of a variable or a field. +* Increment Ops:: Incrementing the numeric value of a variable. + +* Conversion:: The conversion of strings to numbers + and vice versa. +* Values:: The whole truth about numbers and strings. +* Conditional Exp:: Conditional expressions select + between two subexpressions under control + of a third subexpression. +* Function Calls:: A function call is an expression. +* Precedence:: How various operators nest. + + +File: gawk.info, Node: Constants, Next: Variables, Prev: Expressions, Up: Expressions + +Constant Expressions +==================== + + The simplest type of expression is the "constant", which always has +the same value. There are three types of constants: numeric constants, +string constants, and regular expression constants. + + A "numeric constant" stands for a number. This number can be an +integer, a decimal fraction, or a number in scientific (exponential) +notation. Note that all numeric values are represented within `awk' in +double-precision floating point. 
Here are some examples of numeric +constants, which all have the same value: + + 105 + 1.05e+2 + 1050e-1 + + A string constant consists of a sequence of characters enclosed in +double-quote marks. For example: + + "parrot" + +represents the string whose contents are `parrot'. Strings in `gawk' +can be of any length and they can contain all the possible 8-bit ASCII +characters including ASCII NUL. Other `awk' implementations may have +difficulty with some character codes. + + Some characters cannot be included literally in a string constant. +You represent them instead with "escape sequences", which are character +sequences beginning with a backslash (`\'). + + One use of an escape sequence is to include a double-quote character +in a string constant. Since a plain double-quote would end the string, +you must use `\"' to represent a single double-quote character as a +part of the string. The backslash character itself is another +character that cannot be included normally; you write `\\' to put one +backslash in the string. Thus, the string whose contents are the two +characters `"\' must be written `"\"\\"'. + + Another use of backslash is to represent unprintable characters such +as newline. While there is nothing to stop you from writing most of +these characters directly in a string constant, they may look ugly. + + Here is a table of all the escape sequences used in `awk': + +`\\' + Represents a literal backslash, `\'. + +`\a' + Represents the "alert" character, control-g, ASCII code 7. + +`\b' + Represents a backspace, control-h, ASCII code 8. + +`\f' + Represents a formfeed, control-l, ASCII code 12. + +`\n' + Represents a newline, control-j, ASCII code 10. + +`\r' + Represents a carriage return, control-m, ASCII code 13. + +`\t' + Represents a horizontal tab, control-i, ASCII code 9. + +`\v' + Represents a vertical tab, control-k, ASCII code 11. + +`\NNN' + Represents the octal value NNN, where NNN are one to three digits + between 0 and 7. 
For example, the code for the ASCII ESC (escape) + character is `\033'. + +`\xHH...' + Represents the hexadecimal value HH, where HH are hexadecimal + digits (`0' through `9' and either `A' through `F' or `a' through + `f'). Like the same construct in ANSI C, the escape sequence + continues until the first non-hexadecimal digit is seen. However, + using more than two hexadecimal digits produces undefined results. + (The `\x' escape sequence is not allowed in POSIX `awk'.) + + A "constant regexp" is a regular expression description enclosed in +slashes, such as `/^beginning and end$/'. Most regexps used in `awk' +programs are constant, but the `~' and `!~' operators can also match +computed or "dynamic" regexps (*note How to Use Regular Expressions: +Regexp Usage.). + + Constant regexps may be used like simple expressions. When a +constant regexp is not on the right hand side of the `~' or `!~' +operators, it has the same meaning as if it appeared in a pattern, i.e. +`($0 ~ /foo/)' (*note Expressions as Patterns: Expression Patterns.). +This means that the two code segments, + + if ($0 ~ /barfly/ || $0 ~ /camelot/) + print "found" + +and + + if (/barfly/ || /camelot/) + print "found" + +are exactly equivalent. One rather bizarre consequence of this rule is +that the following boolean expression is legal, but does not do what +the user intended: + + if (/foo/ ~ $1) print "found foo" + + This code is "obviously" testing `$1' for a match against the regexp +`/foo/'. But in fact, the expression `(/foo/ ~ $1)' actually means +`(($0 ~ /foo/) ~ $1)'. In other words, first match the input record +against the regexp `/foo/'. The result will be either a 0 or a 1, +depending upon the success or failure of the match. Then match that +result against the first field in the record. + + Since it is unlikely that you would ever really wish to make this +kind of test, `gawk' will issue a warning when it sees this construct in +a program. 
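The following sketch spells out the expansion described above by hand, so it does not rely on how any particular `awk' parses `/foo/ ~ $1'. The input record `1 foo` and the variable names `a` and `b` are assumptions chosen so that the two tests give different answers.

```shell
# Hypothetical input: field 1 is "1", and the record contains "foo".
out=$(echo "1 foo" | awk '{
    a = ($1 ~ /foo/)          # what the user probably intended
    b = (($0 ~ /foo/) ~ $1)   # what /foo/ ~ $1 is described as meaning
    print a, b
}')
echo "$out"
```

Here `a` is 0 because the first field does not contain `foo`, while `b` is 1: the record match yields 1, and that 1 is then matched against the dynamic regexp taken from `$1`, which is also `1`.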
+ + Another consequence of this rule is that the assignment statement + + matches = /foo/ + +will assign either 0 or 1 to the variable `matches', depending upon the +contents of the current input record. + + Constant regular expressions are also used as the first argument for +the `sub' and `gsub' functions (*note Built-in Functions for String +Manipulation: String Functions.). + + This feature of the language was never well documented until the +POSIX specification. + + You may be wondering, when is + + $1 ~ /foo/ { ... } + +preferable to + + $1 ~ "foo" { ... } + + Since the right-hand sides of both `~' operators are constants, it +is more efficient to use the `/foo/' form: `awk' can note that you have +supplied a regexp and store it internally in a form that makes pattern +matching more efficient. In the second form, `awk' must first convert +the string into this internal form, and then perform the pattern +matching. The first form is also better style; it shows clearly that +you intend a regexp match. + + +File: gawk.info, Node: Variables, Next: Arithmetic Ops, Prev: Constants, Up: Expressions + +Variables +========= + + Variables let you give names to values and refer to them later. You +have already seen variables in many of the examples. The name of a +variable must be a sequence of letters, digits and underscores, but it +may not begin with a digit. Case is significant in variable names; `a' +and `A' are distinct variables. + + A variable name is a valid expression by itself; it represents the +variable's current value. Variables are given new values with +"assignment operators" and "increment operators". *Note Assignment +Expressions: Assignment Ops. + + A few variables have special built-in meanings, such as `FS', the +field separator, and `NF', the number of fields in the current input +record. *Note Built-in Variables::, for a list of them. 
These +built-in variables can be used and assigned just like all other +variables, but their values are also used or changed automatically by +`awk'. Each built-in variable's name is made entirely of upper case +letters. + + Variables in `awk' can be assigned either numeric or string values. +By default, variables are initialized to the null string, which is +effectively zero if converted to a number. There is no need to +"initialize" each variable explicitly in `awk', the way you would in C +or most other traditional languages. + +* Menu: + +* Assignment Options:: Setting variables on the command line + and a summary of command line syntax. + This is an advanced method of input. + + +File: gawk.info, Node: Assignment Options, Prev: Variables, Up: Variables + +Assigning Variables on the Command Line +--------------------------------------- + + You can set any `awk' variable by including a "variable assignment" +among the arguments on the command line when you invoke `awk' (*note +Invoking `awk': Command Line.). Such an assignment has this form: + + VARIABLE=TEXT + +With it, you can set a variable either at the beginning of the `awk' +run or in between input files. + + If you precede the assignment with the `-v' option, like this: + + -v VARIABLE=TEXT + +then the variable is set at the very beginning, before even the `BEGIN' +rules are run. The `-v' option and its assignment must precede all the +file name arguments, as well as the program text. + + Otherwise, the variable assignment is performed at a time determined +by its position among the input file arguments: after the processing of +the preceding input file argument. For example: + + awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list + +prints the value of field number `n' for all input records. Before the +first file is read, the command line sets the variable `n' equal to 4. +This causes the fourth field to be printed in lines from the file +`inventory-shipped'. 
After the first file has finished, but before the +second file is started, `n' is set to 2, so that the second field is +printed in lines from `BBS-list'. + + Command line arguments are made available for explicit examination by +the `awk' program in an array named `ARGV' (*note Built-in +Variables::.). + + `awk' processes the values of command line assignments for escape +sequences (*note Constant Expressions: Constants.). + + +File: gawk.info, Node: Arithmetic Ops, Next: Concatenation, Prev: Variables, Up: Expressions + +Arithmetic Operators +==================== + + The `awk' language uses the common arithmetic operators when +evaluating expressions. All of these arithmetic operators follow normal +precedence rules, and work as you would expect them to. This example +divides field three by field four, adds field two, stores the result +into field one, and prints the resulting altered input record: + + awk '{ $1 = $2 + $3 / $4; print }' inventory-shipped + + The arithmetic operators in `awk' are: + +`X + Y' + Addition. + +`X - Y' + Subtraction. + +`- X' + Negation. + +`+ X' + Unary plus. No real effect on the expression. + +`X * Y' + Multiplication. + +`X / Y' + Division. Since all numbers in `awk' are double-precision + floating point, the result is not rounded to an integer: `3 / 4' + has the value 0.75. + +`X % Y' + Remainder. The quotient is rounded toward zero to an integer, + multiplied by Y and this result is subtracted from X. This + operation is sometimes known as "trunc-mod." The following + relation always holds: + + b * int(a / b) + (a % b) == a + + One possibly undesirable effect of this definition of remainder is + that `X % Y' is negative if X is negative. Thus, + + -17 % 8 = -1 + + In other `awk' implementations, the signedness of the remainder + may be machine dependent. + +`X ^ Y' +`X ** Y' + Exponentiation: X raised to the Y power. `2 ^ 3' has the value 8. + The character sequence `**' is equivalent to `^'. 
(The POSIX + standard only specifies the use of `^' for exponentiation.) + + +File: gawk.info, Node: Concatenation, Next: Comparison Ops, Prev: Arithmetic Ops, Up: Expressions + +String Concatenation +==================== + + There is only one string operation: concatenation. It does not have +a specific operator to represent it. Instead, concatenation is +performed by writing expressions next to one another, with no operator. +For example: + + awk '{ print "Field number one: " $1 }' BBS-list + +produces, for the first record in `BBS-list': + + Field number one: aardvark + + Without the space in the string constant after the `:', the line +would run together. For example: + + awk '{ print "Field number one:" $1 }' BBS-list + +produces, for the first record in `BBS-list': + + Field number one:aardvark + + Since string concatenation does not have an explicit operator, it is +often necessary to insure that it happens where you want it to by +enclosing the items to be concatenated in parentheses. For example, the +following code fragment does not concatenate `file' and `name' as you +might expect: + + file = "file" + name = "name" + print "something meaningful" > file name + +It is necessary to use the following: + + print "something meaningful" > (file name) + + We recommend you use parentheses around concatenation in all but the +most common contexts (such as in the right-hand operand of `='). + + +File: gawk.info, Node: Comparison Ops, Next: Boolean Ops, Prev: Concatenation, Up: Expressions + +Comparison Expressions +====================== + + "Comparison expressions" compare strings or numbers for +relationships such as equality. They are written using "relational +operators", which are a superset of those in C. Here is a table of +them: + +`X < Y' + True if X is less than Y. + +`X <= Y' + True if X is less than or equal to Y. + +`X > Y' + True if X is greater than Y. + +`X >= Y' + True if X is greater than or equal to Y. + +`X == Y' + True if X is equal to Y. 
+ +`X != Y' + True if X is not equal to Y. + +`X ~ Y' + True if the string X matches the regexp denoted by Y. + +`X !~ Y' + True if the string X does not match the regexp denoted by Y. + +`SUBSCRIPT in ARRAY' + True if array ARRAY has an element with the subscript SUBSCRIPT. + + Comparison expressions have the value 1 if true and 0 if false. + + The rules `gawk' uses for performing comparisons are based on those +in draft 11.2 of the POSIX standard. The POSIX standard introduced the +concept of a "numeric string", which is simply a string that looks like +a number, for example, `" +2"'. + + When performing a relational operation, `gawk' considers the type of +an operand to be the type it received on its last *assignment*, rather +than the type of its last *use* (*note Numeric and String Values: +Values.). This type is *unknown* when the operand is from an +"external" source: field variables, command line arguments, array +elements resulting from a `split' operation, and the value of an +`ENVIRON' element. In this case only, if the operand is a numeric +string, then it is considered to be of both string type and numeric +type. If at least one operand of a comparison is of string type only, +then a string comparison is performed. Any numeric operand will be +converted to a string using the value of `CONVFMT' (*note Conversion of +Strings and Numbers: Conversion.). If one operand of a comparison is +numeric, and the other operand is either numeric or both numeric and +string, then `gawk' does a numeric comparison. If both operands have +both types, then the comparison is numeric. Strings are compared by +comparing the first character of each, then the second character of +each, and so on. Thus `"10"' is less than `"9"'. If there are two +strings where one is a prefix of the other, the shorter string is less +than the longer one. Thus `"abc"' is less than `"abcd"'. 
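A short sketch of these rules, assuming a hypothetical input record `10 9`: the fields are numeric strings and compare as numbers, while the string constants compare character by character.

```shell
# Hypothetical input: two fields that look like numbers.
out=$(echo "10 9" | awk '{
    print ($1 < $2)          # fields are numeric strings: numeric comparison
    print ("10" < "9")       # string constants: string comparison
    print ("abc" < "abcd")   # a prefix is less than the longer string
}')
echo "$out"
```

The first comparison is false (10 is not less than 9), while the other two are true.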
+ + Here are some sample expressions, how `gawk' compares them, and what +the result of the comparison is. + +`1.5 <= 2.0' + numeric comparison (true) + +`"abc" >= "xyz"' + string comparison (false) + +`1.5 != " +2"' + string comparison (true) + +`"1e2" < "3"' + string comparison (true) + +`a = 2; b = "2"' +`a == b' + string comparison (true) + + echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }' + +prints `false' since both `$1' and `$2' are numeric strings and thus +have both string and numeric types, thus dictating a numeric comparison. + + The purpose of the comparison rules and the use of numeric strings is +to attempt to produce the behavior that is "least surprising," while +still "doing the right thing." + + String comparisons and regular expression comparisons are very +different. For example, + + $1 == "foo" + +has the value of 1, or is true, if the first field of the current input +record is precisely `foo'. By contrast, + + $1 ~ /foo/ + +has the value 1 if the first field contains `foo', such as `foobar'. + + The right hand operand of the `~' and `!~' operators may be either a +constant regexp (`/.../'), or it may be an ordinary expression, in +which case the value of the expression as a string is a dynamic regexp +(*note How to Use Regular Expressions: Regexp Usage.). + + In very recent implementations of `awk', a constant regular +expression in slashes by itself is also an expression. The regexp +`/REGEXP/' is an abbreviation for this comparison expression: + + $0 ~ /REGEXP/ + + In some contexts it may be necessary to write parentheses around the +regexp to avoid confusing the `gawk' parser. For example, `(/x/ - /y/) +> threshold' is not allowed, but `((/x/) - (/y/)) > threshold' parses +properly. + + One special place where `/foo/' is *not* an abbreviation for `$0 ~ +/foo/' is when it is the right-hand operand of `~' or `!~'! *Note +Constant Expressions: Constants, where this is discussed in more detail. 
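The difference between string equality and a regexp match can be sketched as follows. The input `foobar` echoes the manual's own example; the variable names `exact` and `partial` are assumptions introduced for readability.

```shell
# Input record whose first field contains, but is not equal to, "foo".
out=$(echo "foobar" | awk '{
    exact = ($1 == "foo")    # true only if field 1 is precisely foo
    partial = ($1 ~ /foo/)   # true if field 1 contains foo anywhere
    print exact, partial
}')
echo "$out"
```

The values are stored in variables before printing to keep the `print` statement free of parenthesized comparisons.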
+ + +File: gawk.info, Node: Boolean Ops, Next: Assignment Ops, Prev: Comparison Ops, Up: Expressions + +Boolean Expressions +=================== + + A "boolean expression" is a combination of comparison expressions or +matching expressions, using the boolean operators "or" (`||'), "and" +(`&&'), and "not" (`!'), along with parentheses to control nesting. +The truth of the boolean expression is computed by combining the truth +values of the component expressions. + + Boolean expressions can be used wherever comparison and matching +expressions can be used. They can be used in `if', `while', `do' and +`for' statements. They have numeric values (1 if true, 0 if false), +which come into play if the result of the boolean expression is stored +in a variable, or used in arithmetic. + + In addition, every boolean expression is also a valid boolean +pattern, so you can use it as a pattern to control the execution of +rules. + + Here are descriptions of the three boolean operators, with an +example of each. It may be instructive to compare these examples with +the analogous examples of boolean patterns (*note Boolean Operators and +Patterns: Boolean Patterns.), which use the same boolean operators in +patterns instead of expressions. + +`BOOLEAN1 && BOOLEAN2' + True if both BOOLEAN1 and BOOLEAN2 are true. For example, the + following statement prints the current input record if it contains + both `2400' and `foo'. + + if ($0 ~ /2400/ && $0 ~ /foo/) print + + The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is true. + This can make a difference when BOOLEAN2 contains expressions that + have side effects: in the case of `$0 ~ /foo/ && ($2 == bar++)', + the variable `bar' is not incremented if there is no `foo' in the + record. + +`BOOLEAN1 || BOOLEAN2' + True if at least one of BOOLEAN1 or BOOLEAN2 is true. For + example, the following command prints all records in the input + file `BBS-list' that contain *either* `2400' or `foo', or both. 
+ + awk '{ if ($0 ~ /2400/ || $0 ~ /foo/) print }' BBS-list + + The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is false. + This can make a difference when BOOLEAN2 contains expressions + that have side effects. + +`!BOOLEAN' + True if BOOLEAN is false. For example, the following program + prints all records in the input file `BBS-list' that do *not* + contain the string `foo'. + + awk '{ if (! ($0 ~ /foo/)) print }' BBS-list + + +File: gawk.info, Node: Assignment Ops, Next: Increment Ops, Prev: Boolean Ops, Up: Expressions + +Assignment Expressions +====================== + + An "assignment" is an expression that stores a new value into a +variable. For example, let's assign the value 1 to the variable `z': + + z = 1 + + After this expression is executed, the variable `z' has the value 1. +Whatever old value `z' had before the assignment is forgotten. + + Assignments can store string values also. For example, this would +store the value `"this food is good"' in the variable `message': + + thing = "food" + predicate = "good" + message = "this " thing " is " predicate + +(This also illustrates concatenation of strings.) + + The `=' sign is called an "assignment operator". It is the simplest +assignment operator because the value of the right-hand operand is +stored unchanged. + + Most operators (addition, concatenation, and so on) have no effect +except to compute a value. If you ignore the value, you might as well +not use the operator. An assignment operator is different; it does +produce a value, but even if you ignore the value, the assignment still +makes itself felt through the alteration of the variable. We call this +a "side effect". + + The left-hand operand of an assignment need not be a variable (*note +Variables::.); it can also be a field (*note Changing the Contents of a +Field: Changing Fields.) or an array element (*note Arrays in `awk': +Arrays.). 
These are all called "lvalues", which means they can appear +on the left-hand side of an assignment operator. The right-hand +operand may be any expression; it produces the new value which the +assignment stores in the specified variable, field or array element. + + It is important to note that variables do *not* have permanent types. +The type of a variable is simply the type of whatever value it happens +to hold at the moment. In the following program fragment, the variable +`foo' has a numeric value at first, and a string value later on: + + foo = 1 + print foo + foo = "bar" + print foo + +When the second assignment gives `foo' a string value, the fact that it +previously had a numeric value is forgotten. + + An assignment is an expression, so it has a value: the same value +that is assigned. Thus, `z = 1' as an expression has the value 1. One +consequence of this is that you can write multiple assignments together: + + x = y = z = 0 + +stores the value 0 in all three variables. It does this because the +value of `z = 0', which is 0, is stored into `y', and then the value of +`y = z = 0', which is 0, is stored into `x'. + + You can use an assignment anywhere an expression is called for. For +example, it is valid to write `x != (y = 1)' to set `y' to 1 and then +test whether `x' equals 1. But this style tends to make programs hard +to read; except in a one-shot program, you should rewrite it to get rid +of such nesting of assignments. This is never very hard. + + Aside from `=', there are several other assignment operators that do +arithmetic with the old value of the variable. For example, the +operator `+=' computes a new value by adding the right-hand value to +the old value of the variable. Thus, the following assignment adds 5 +to the value of `foo': + + foo += 5 + +This is precisely equivalent to the following: + + foo = foo + 5 + +Use whichever one makes the meaning of your program clearer. + + Here is a table of the arithmetic assignment operators. 
In each +case, the right-hand operand is an expression whose value is converted +to a number. + +`LVALUE += INCREMENT' + Adds INCREMENT to the value of LVALUE to make the new value of + LVALUE. + +`LVALUE -= DECREMENT' + Subtracts DECREMENT from the value of LVALUE. + +`LVALUE *= COEFFICIENT' + Multiplies the value of LVALUE by COEFFICIENT. + +`LVALUE /= DIVISOR' + Divides the value of LVALUE by DIVISOR. + +`LVALUE %= MODULUS' + Sets LVALUE to its remainder by MODULUS. + +`LVALUE ^= POWER' +`LVALUE **= POWER' + Raises LVALUE to the power POWER. (Only the `^=' operator is + specified by POSIX.) + + +File: gawk.info, Node: Increment Ops, Next: Conversion, Prev: Assignment Ops, Up: Expressions + +Increment Operators +=================== + + "Increment operators" increase or decrease the value of a variable +by 1. You could do the same thing with an assignment operator, so the +increment operators add no power to the `awk' language; but they are +convenient abbreviations for something very common. + + The operator to add 1 is written `++'. It can be used to increment +a variable either before or after taking its value. + + To pre-increment a variable V, write `++V'. This adds 1 to the +value of V and that new value is also the value of this expression. +The assignment expression `V += 1' is completely equivalent. + + Writing the `++' after the variable specifies post-increment. This +increments the variable value just the same; the difference is that the +value of the increment expression itself is the variable's *old* value. +Thus, if `foo' has the value 4, then the expression `foo++' has the +value 4, but it changes the value of `foo' to 5. + + The post-increment `foo++' is nearly equivalent to writing `(foo += +1) - 1'. It is not perfectly equivalent because all numbers in `awk' +are floating point: in floating point, `foo + 1 - 1' does not +necessarily equal `foo'. 
But the difference is minute as long as you +stick to numbers that are fairly small (less than a trillion). + + Any lvalue can be incremented. Fields and array elements are +incremented just like variables. (Use `$(i++)' when you wish to do a +field reference and a variable increment at the same time. The +parentheses are necessary because of the precedence of the field +reference operator, `$'.) + + The decrement operator `--' works just like `++' except that it +subtracts 1 instead of adding. Like `++', it can be used before the +lvalue to pre-decrement or after it to post-decrement. + + Here is a summary of increment and decrement expressions. + +`++LVALUE' + This expression increments LVALUE and the new value becomes the + value of this expression. + +`LVALUE++' + This expression causes the contents of LVALUE to be incremented. + The value of the expression is the *old* value of LVALUE. + +`--LVALUE' + Like `++LVALUE', but instead of adding, it subtracts. It + decrements LVALUE and delivers the value that results. + +`LVALUE--' + Like `LVALUE++', but instead of adding, it subtracts. It + decrements LVALUE. The value of the expression is the *old* value + of LVALUE. + + +File: gawk.info, Node: Conversion, Next: Values, Prev: Increment Ops, Up: Expressions + +Conversion of Strings and Numbers +================================= + + Strings are converted to numbers, and numbers to strings, if the +context of the `awk' program demands it. For example, if the value of +either `foo' or `bar' in the expression `foo + bar' happens to be a +string, it is converted to a number before the addition is performed. +If numeric values appear in string concatenation, they are converted to +strings. Consider this: + + two = 2; three = 3 + print (two three) + 4 + +This eventually prints the (numeric) value 27. 
The numeric values of +the variables `two' and `three' are converted to strings and +concatenated together, and the resulting string is converted back to the +number 23, to which 4 is then added. + + If, for some reason, you need to force a number to be converted to a +string, concatenate the null string with that number. To force a string +to be converted to a number, add zero to that string. + + A string is converted to a number by interpreting a numeric prefix +of the string as numerals: `"2.5"' converts to 2.5, `"1e3"' converts to +1000, and `"25fix"' has a numeric value of 25. Strings that can't be +interpreted as valid numbers are converted to zero. + + The exact manner in which numbers are converted into strings is +controlled by the `awk' built-in variable `CONVFMT' (*note Built-in +Variables::.). Numbers are converted using a special version of the +`sprintf' function (*note Built-in Functions: Built-in.) with `CONVFMT' +as the format specifier. + + `CONVFMT''s default value is `"%.6g"', which prints a value with at +most six significant digits. For some applications you will want to +change it to specify more precision. Double precision on most modern +machines gives you 16 or 17 decimal digits of precision. + + Strange results can happen if you set `CONVFMT' to a string that +doesn't tell `sprintf' how to format floating point numbers in a useful +way. For example, if you forget the `%' in the format, all numbers +will be converted to the same constant string. + + As a special case, if a number is an integer, then the result of +converting it to a string is *always* an integer, no matter what the +value of `CONVFMT' may be. Given the following code fragment: + + CONVFMT = "%2.2f" + a = 12 + b = a "" + +`b' has the value `"12"', not `"12.00"'. + + Prior to the POSIX standard, `awk' specified that the value of +`OFMT' was used for converting numbers to strings. `OFMT' specifies +the output format to use when printing numbers with `print'. 
`CONVFMT' +was introduced in order to separate the semantics of conversions from +the semantics of printing. Both `CONVFMT' and `OFMT' have the same +default value: `"%.6g"'. In the vast majority of cases, old `awk' +programs will not change their behavior. However, this use of `OFMT' +is something to keep in mind if you must port your program to other +implementations of `awk'; we recommend that instead of changing your +programs, you just port `gawk' itself! + + +File: gawk.info, Node: Values, Next: Conditional Exp, Prev: Conversion, Up: Expressions + +Numeric and String Values +========================= + + Through most of this manual, we present `awk' values (such as +constants, fields, or variables) as *either* numbers *or* strings. +This is a convenient way to think about them, since typically they are +used in only one way, or the other. + + In truth though, `awk' values can be *both* string and numeric, at +the same time. Internally, `awk' represents values with a string, a +(floating point) number, and an indication that one, the other, or both +representations of the value are valid. + + Keeping track of both kinds of values is important for execution +efficiency: a variable can acquire a string value the first time it is +used as a string, and then that string value can be used until the +variable is assigned a new value. Thus, if a variable with only a +numeric value is used in several concatenations in a row, it only has +to be given a string representation once. The numeric value remains +valid, so that no conversion back to a number is necessary if the +variable is later used in an arithmetic expression. + + Tracking both kinds of values is also important for precise numerical +calculations. Consider the following: + + a = 123.321 + CONVFMT = "%3.1f" + b = a " is a number" + c = a + 1.654 + +The variable `a' receives a string value in the concatenation and +assignment to `b'. The string value of `a' is `"123.3"'. 
If the +numeric value was lost when it was converted to a string, then the +numeric use of `a' in the last statement would lose information. `c' +would be assigned the value 124.954 instead of 124.975. Such errors +accumulate rapidly, and very adversely affect numeric computations. + + Once a numeric value acquires a corresponding string value, it stays +valid until a new assignment is made. If `CONVFMT' (*note Conversion +of Strings and Numbers: Conversion.) changes in the meantime, the old +string value will still be used. For example: + + BEGIN { + CONVFMT = "%2.2f" + a = 123.456 + b = a "" # force `a' to have string value too + printf "a = %s\n", a + CONVFMT = "%.6g" + printf "a = %s\n", a + a += 0 # make `a' numeric only again + printf "a = %s\n", a # use `a' as string + } + +This program prints `a = 123.46' twice, and then prints `a = 123.456'. + + *Note Conversion of Strings and Numbers: Conversion, for the rules +that specify how string values are made from numeric values. + + +File: gawk.info, Node: Conditional Exp, Next: Function Calls, Prev: Values, Up: Expressions + +Conditional Expressions +======================= + + A "conditional expression" is a special kind of expression with +three operands. It allows you to use one expression's value to select +one of two other expressions. + + The conditional expression looks the same as in the C language: + + SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP + +There are three subexpressions. The first, SELECTOR, is always +computed first. If it is "true" (not zero and not null) then +IF-TRUE-EXP is computed next and its value becomes the value of the +whole expression. Otherwise, IF-FALSE-EXP is computed next and its +value becomes the value of the whole expression. + + For example, this expression produces the absolute value of `x': + + x > 0 ? x : -x + + Each time the conditional expression is computed, exactly one of +IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored. 
This +is important when the expressions contain side effects. For example, +this conditional expression examines element `i' of either array `a' or +array `b', and increments `i'. + + x == y ? a[i++] : b[i++] + +This is guaranteed to increment `i' exactly once, because each time one +or the other of the two increment expressions is executed, and the +other is not. + + +File: gawk.info, Node: Function Calls, Next: Precedence, Prev: Conditional Exp, Up: Expressions + +Function Calls +============== + + A "function" is a name for a particular calculation. Because it has +a name, you can ask for it by name at any point in the program. For +example, the function `sqrt' computes the square root of a number. + + A fixed set of functions are "built-in", which means they are +available in every `awk' program. The `sqrt' function is one of these. +*Note Built-in Functions: Built-in, for a list of built-in functions +and their descriptions. In addition, you can define your own functions +in the program for use elsewhere in the same program. *Note +User-defined Functions: User-defined, for how to do this. + + The way to use a function is with a "function call" expression, +which consists of the function name followed by a list of "arguments" +in parentheses. The arguments are expressions which give the raw +materials for the calculation that the function will do. When there is +more than one argument, they are separated by commas. If there are no +arguments, write just `()' after the function name. Here are some +examples: + + sqrt(x^2 + y^2) # One argument + atan2(y, x) # Two arguments + rand() # No arguments + + *Do not put any space between the function name and the +open-parenthesis!* A user-defined function name looks just like the +name of a variable, and space would make the expression look like +concatenation of a variable with an expression inside parentheses. 
+Space before the parenthesis is harmless with built-in functions, but +it is best not to get into the habit of using space to avoid mistakes +with user-defined functions. + + Each function expects a particular number of arguments. For +example, the `sqrt' function must be called with a single argument, the +number to take the square root of: + + sqrt(ARGUMENT) + + Some of the built-in functions allow you to omit the final argument. +If you do so, they use a reasonable default. *Note Built-in Functions: +Built-in, for full details. If arguments are omitted in calls to +user-defined functions, then those arguments are treated as local +variables, initialized to the null string (*note User-defined +Functions: User-defined.). + + Like every other expression, the function call has a value, which is +computed by the function based on the arguments you give it. In this +example, the value of `sqrt(ARGUMENT)' is the square root of the +argument. A function can also have side effects, such as assigning the +values of certain variables or doing I/O. + + Here is a command to read numbers, one number per line, and print the +square root of each one: + + awk '{ print "The square root of", $1, "is", sqrt($1) }' + + +File: gawk.info, Node: Precedence, Prev: Function Calls, Up: Expressions + +Operator Precedence (How Operators Nest) +======================================== + + "Operator precedence" determines how operators are grouped, when +different operators appear close by in one expression. For example, +`*' has higher precedence than `+'; thus, `a + b * c' means to multiply +`b' and `c', and then add `a' to the product (i.e., `a + (b * c)'). + + You can overrule the precedence of the operators by using +parentheses. You can think of the precedence rules as saying where the +parentheses are assumed if you do not write parentheses yourself. 
In +fact, it is wise to always use parentheses whenever you have an unusual +combination of operators, because other people who read the program may +not remember what the precedence is in this case. You might forget, +too; then you could make a mistake. Explicit parentheses will help +prevent any such mistake. + + When operators of equal precedence are used together, the leftmost +operator groups first, except for the assignment, conditional and +exponentiation operators, which group in the opposite order. Thus, `a +- b + c' groups as `(a - b) + c'; `a = b = c' groups as `a = (b = c)'. + + The precedence of prefix unary operators does not matter as long as +only unary operators are involved, because there is only one way to +parse them--innermost first. Thus, `$++i' means `$(++i)' and `++$x' +means `++($x)'. However, when another operator follows the operand, +then the precedence of the unary operators can matter. Thus, `$x^2' +means `($x)^2', but `-x^2' means `-(x^2)', because `-' has lower +precedence than `^' while `$' has higher precedence. + + Here is a table of the operators of `awk', in order of increasing +precedence: + +assignment + `=', `+=', `-=', `*=', `/=', `%=', `^=', `**='. These operators + group right-to-left. (The `**=' operator is not specified by + POSIX.) + +conditional + `?:'. This operator groups right-to-left. + +logical "or". + `||'. + +logical "and". + `&&'. + +array membership + `in'. + +matching + `~', `!~'. + +relational, and redirection + The relational operators and the redirections have the same + precedence level. Characters such as `>' serve both as + relationals and as redirections; the context distinguishes between + the two meanings. + + The relational operators are `<', `<=', `==', `!=', `>=' and `>'. + + The I/O redirection operators are `<', `>', `>>' and `|'. + + Note that I/O redirection operators in `print' and `printf' + statements belong to the statement level, not to expressions. 
The + redirection does not produce an expression which could be the + operand of another operator. As a result, it does not make sense + to use a redirection operator near another operator of lower + precedence, without parentheses. Such combinations, for example + `print foo > a ? b : c', result in syntax errors. + +concatenation + No special token is used to indicate concatenation. The operands + are simply written side by side. + +add, subtract + `+', `-'. + +multiply, divide, mod + `*', `/', `%'. + +unary plus, minus, "not" + `+', `-', `!'. + +exponentiation + `^', `**'. These operators group right-to-left. (The `**' + operator is not specified by POSIX.) + +increment, decrement + `++', `--'. + +field + `$'. + + +File: gawk.info, Node: Statements, Next: Arrays, Prev: Expressions, Up: Top + +Control Statements in Actions +***************************** + + "Control statements" such as `if', `while', and so on control the +flow of execution in `awk' programs. Most of the control statements in +`awk' are patterned on similar statements in C. + + All the control statements start with special keywords such as `if' +and `while', to distinguish them from simple expressions. + + Many control statements contain other statements; for example, the +`if' statement contains another statement which may or may not be +executed. The contained statement is called the "body". If you want +to include more than one statement in the body, group them into a +single compound statement with curly braces, separating them with +newlines or semicolons. + +* Menu: + +* If Statement:: Conditionally execute + some `awk' statements. +* While Statement:: Loop until some condition is satisfied. +* Do Statement:: Do specified action while looping until some + condition is satisfied. +* For Statement:: Another looping statement, that provides + initialization and increment clauses. +* Break Statement:: Immediately exit the innermost enclosing loop. 
+* Continue Statement:: Skip to the end of the innermost + enclosing loop. +* Next Statement:: Stop processing the current input record. +* Next File Statement:: Stop processing the current file. +* Exit Statement:: Stop execution of `awk'. + + +File: gawk.info, Node: If Statement, Next: While Statement, Prev: Statements, Up: Statements + +The `if' Statement +================== + + The `if'-`else' statement is `awk''s decision-making statement. It +looks like this: + + if (CONDITION) THEN-BODY [else ELSE-BODY] + +CONDITION is an expression that controls what the rest of the statement +will do. If CONDITION is true, THEN-BODY is executed; otherwise, +ELSE-BODY is executed (assuming that the `else' clause is present). +The `else' part of the statement is optional. The condition is +considered false if its value is zero or the null string, and true +otherwise. + + Here is an example: + + if (x % 2 == 0) + print "x is even" + else + print "x is odd" + + In this example, if the expression `x % 2 == 0' is true (that is, +the value of `x' is divisible by 2), then the first `print' statement +is executed, otherwise the second `print' statement is performed. + + If the `else' appears on the same line as THEN-BODY, and THEN-BODY +is not a compound statement (i.e., not surrounded by curly braces), +then a semicolon must separate THEN-BODY from `else'. To illustrate +this, let's rewrite the previous example: + + awk '{ if (x % 2 == 0) print "x is even"; else + print "x is odd" }' + +If you forget the `;', `awk' won't be able to parse the statement, and +you will get a syntax error. + + We would not actually write this example this way, because a human +reader might fail to see the `else' if it were not the first thing on +its line. 
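The semicolon rule applies only when THEN-BODY is a simple statement. As a minimal sketch, assuming you prefer to keep the whole `if'-`else' on one line, each body can be wrapped in curly braces instead; a compound statement needs no semicolon before `else'. The shell function name `parity' and the sample input are hypothetical, and the test uses the first field `$1' rather than the variable `x' so that the result depends on the input.

```shell
# Hypothetical helper: report whether the first field of each record
# is even or odd.  Both bodies are compound statements, so no `;'
# is needed before else.
parity() {
    awk '{ if ($1 % 2 == 0) { print $1, "is even" } else { print $1, "is odd" } }'
}

# Sample input (an assumption for this sketch): one number per line.
printf '4\n7\n' | parity
```

Given the two sample records, this should report that 4 is even and 7 is odd.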
+ + +File: gawk.info, Node: While Statement, Next: Do Statement, Prev: If Statement, Up: Statements + +The `while' Statement +===================== + + In programming, a "loop" means a part of a program that is (or at +least can be) executed two or more times in succession. + + The `while' statement is the simplest looping statement in `awk'. +It repeatedly executes a statement as long as a condition is true. It +looks like this: + + while (CONDITION) + BODY + +Here BODY is a statement that we call the "body" of the loop, and +CONDITION is an expression that controls how long the loop keeps +running. + + The first thing the `while' statement does is test CONDITION. If +CONDITION is true, it executes the statement BODY. (CONDITION is true +when the value is not zero and not a null string.) After BODY has been +executed, CONDITION is tested again, and if it is still true, BODY is +executed again. This process repeats until CONDITION is no longer +true. If CONDITION is initially false, the body of the loop is never +executed. + + This example prints the first three fields of each record, one per +line. + + awk '{ i = 1 + while (i <= 3) { + print $i + i++ + } + }' + +Here the body of the loop is a compound statement enclosed in braces, +containing two statements. + + The loop works like this: first, the value of `i' is set to 1. +Then, the `while' tests whether `i' is less than or equal to three. +This is the case when `i' equals one, so the `i'-th field is printed. +Then the `i++' increments the value of `i' and the loop repeats. The +loop terminates when `i' reaches 4. + + As you can see, a newline is not required between the condition and +the body; but using one makes the program clearer unless the body is a +compound statement or is very simple. The newline after the open-brace +that begins the compound statement is not required either, but the +program would be hard to read without it. 
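The three-field loop above can be generalized. Here is a minimal sketch, assuming you want every field of the record rather than only the first three: the condition compares `i' against the built-in variable `NF', the number of fields in the current record. The shell function name `fields' and the sample input are hypothetical.

```shell
# Hypothetical helper: print every field of each record, one per line.
fields() {
    awk '{ i = 1
           while (i <= NF) {   # NF is the number of fields in this record
               print $i
               i++
           }
         }'
}

# Sample input (an assumption for this sketch): one record, four fields.
printf 'a b c d\n' | fields
```

For an empty input record `NF' is zero, so CONDITION is false on the first test and the body is never executed.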
+ + +File: gawk.info, Node: Do Statement, Next: For Statement, Prev: While Statement, Up: Statements + +The `do'-`while' Statement +========================== + + The `do' loop is a variation of the `while' looping statement. The +`do' loop executes the BODY once, then repeats BODY as long as +CONDITION is true. It looks like this: + + do + BODY + while (CONDITION) + + Even if CONDITION is false at the start, BODY is executed at least +once (and only once, unless executing BODY makes CONDITION true). +Contrast this with the corresponding `while' statement: + + while (CONDITION) + BODY + +This statement does not execute BODY even once if CONDITION is false to +begin with. + + Here is an example of a `do' statement: + + awk '{ i = 1 + do { + print $0 + i++ + } while (i <= 10) + }' + +This program prints each input record ten times. It isn't a very realistic example, +since in this case an ordinary `while' would do just as well. But this +reflects actual experience; there is only occasionally a real use for a +`do' statement. +
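The difference between the two loops shows up only when CONDITION is false to begin with. The following sketch, under the assumption that `i' starts at 5 and the condition is `i < 3', runs both forms from a `BEGIN' rule so that no input is needed. The shell function names `do_loop' and `while_loop' are hypothetical.

```shell
# Hypothetical helpers contrasting the two loops.  The condition
# (i < 3) is already false when each loop starts, because i is 5.
do_loop() {
    awk 'BEGIN { i = 5; do { print "do body, i =", i; i++ } while (i < 3) }'
}

while_loop() {
    awk 'BEGIN { i = 5; while (i < 3) { print "while body, i =", i; i++ } }'
}

do_loop      # the body runs once before the condition is tested
while_loop   # the condition is tested first, so the body never runs
```

Only `do_loop' produces output, because its body is executed once before CONDITION is ever examined.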