Diffstat (limited to 'gawk.info-6')
-rw-r--r-- | gawk.info-6 | 1234 |
1 files changed, 1234 insertions, 0 deletions
diff --git a/gawk.info-6 b/gawk.info-6 new file mode 100644 index 00000000..2dfef35e --- /dev/null +++ b/gawk.info-6 @@ -0,0 +1,1234 @@ +This is Info file gawk.info, produced by Makeinfo-1.54 from the input +file gawk.texi. + + This file documents `awk', a program that you can use to select +particular records in a file and perform operations upon them. + + This is Edition 0.15 of `The GAWK Manual', +for the 2.15 version of the GNU implementation +of AWK. + + Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc. + + Permission is granted to make and distribute verbatim copies of this +manual provided the copyright notice and this permission notice are +preserved on all copies. + + Permission is granted to copy and distribute modified versions of +this manual under the conditions for verbatim copying, provided that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. + + Permission is granted to copy and distribute translations of this +manual into another language, under the above conditions for modified +versions, except that this permission notice may be stated in a +translation approved by the Foundation. + + +File: gawk.info, Node: I/O Functions, Next: Time Functions, Prev: String Functions, Up: Built-in + +Built-in Functions for Input/Output +=================================== + +`close(FILENAME)' + Close the file FILENAME, for input or output. The argument may + alternatively be a shell command that was used for redirecting to + or from a pipe; then the pipe is closed. + + *Note Closing Input Files and Pipes: Close Input, regarding closing + input files and pipes. *Note Closing Output Files and Pipes: + Close Output, regarding closing output files and pipes. + +`system(COMMAND)' + The system function allows the user to execute operating system + commands and then return to the `awk' program. The `system' + function executes the command given by the string COMMAND. 
It + returns, as its value, the status returned by the command that was + executed. + + For example, if the following fragment of code is put in your `awk' + program: + + END { + system("mail -s 'awk run done' operator < /dev/null") + } + + the system operator will be sent mail when the `awk' program + finishes processing input and begins its end-of-input processing. + + Note that much the same result can be obtained by redirecting + `print' or `printf' into a pipe. However, if your `awk' program + is interactive, `system' is useful for cranking up large + self-contained programs, such as a shell or an editor. + + Some operating systems cannot implement the `system' function. + `system' causes a fatal error if it is not supported. + +Controlling Output Buffering with `system' +------------------------------------------ + + Many utility programs will "buffer" their output; they save +information to be written to a disk file or terminal in memory, until +there is enough to be written in one operation. This is often more +efficient than writing every little bit of information as soon as it is +ready. However, sometimes it is necessary to force a program to +"flush" its buffers; that is, write the information to its destination, +even if a buffer is not full. You can do this from your `awk' program +by calling `system' with a null string as its argument: + + system("") # flush output + +`gawk' treats this use of the `system' function as a special case, and +is smart enough not to run a shell (or other command interpreter) with +the empty command. Therefore, with `gawk', this idiom is not only +useful, it is efficient. While this idiom should work with other `awk' +implementations, it will not necessarily avoid starting an unnecessary +shell. + + +File: gawk.info, Node: Time Functions, Prev: I/O Functions, Up: Built-in + +Functions for Dealing with Time Stamps +====================================== + + A common use for `awk' programs is the processing of log files. 
Log +files often contain time stamp information, indicating when a +particular log record was written. Many programs log their time stamp +in the form returned by the `time' system call, which is the number of +seconds since a particular epoch. On POSIX systems, it is the number +of seconds since Midnight, January 1, 1970, UTC. + + In order to make it easier to process such log files, and to easily +produce useful reports, `gawk' provides two functions for working with +time stamps. Both of these are `gawk' extensions; they are not +specified in the POSIX standard, nor are they in any other known version +of `awk'. + +`systime()' + This function returns the current time as the number of seconds + since the system epoch. On POSIX systems, this is the number of + seconds since Midnight, January 1, 1970, UTC. It may be a + different number on other systems. + +`strftime(FORMAT, TIMESTAMP)' + This function returns a string. It is similar to the function of + the same name in the ANSI C standard library. The time specified + by TIMESTAMP is used to produce a string, based on the contents of + the FORMAT string. + + The `systime' function allows you to compare a time stamp from a log +file with the current time of day. In particular, it is easy to +determine how long ago a particular record was logged. It also allows +you to produce log records using the "seconds since the epoch" format. + + The `strftime' function allows you to easily turn a time stamp into +human-readable information. It is similar in nature to the `sprintf' +function, copying non-format specification characters verbatim to the +returned string, and substituting date and time values for format +specifications in the FORMAT string. If no TIMESTAMP argument is +supplied, `gawk' will use the current time of day as the time stamp. + + `strftime' is guaranteed by the ANSI C standard to support the +following date format specifications: + +`%a' + The locale's abbreviated weekday name. 
+ +`%A' + The locale's full weekday name. + +`%b' + The locale's abbreviated month name. + +`%B' + The locale's full month name. + +`%c' + The locale's "appropriate" date and time representation. + +`%d' + The day of the month as a decimal number (01-31). + +`%H' + The hour (24-hour clock) as a decimal number (00-23). + +`%I' + The hour (12-hour clock) as a decimal number (01-12). + +`%j' + The day of the year as a decimal number (001-366). + +`%m' + The month as a decimal number (01-12). + +`%M' + The minute as a decimal number (00-59). + +`%p' + The locale's equivalent of the AM/PM designations associated with + a 12-hour clock. + +`%S' + The second as a decimal number (00-61). (Occasionally there are + minutes in a year with one or two leap seconds, which is why the + seconds can go from 0 all the way to 61.) + +`%U' + The week number of the year (the first Sunday as the first day of + week 1) as a decimal number (00-53). + +`%w' + The weekday as a decimal number (0-6). Sunday is day 0. + +`%W' + The week number of the year (the first Monday as the first day of + week 1) as a decimal number (00-53). + +`%x' + The locale's "appropriate" date representation. + +`%X' + The locale's "appropriate" time representation. + +`%y' + The year without century as a decimal number (00-99). + +`%Y' + The year with century as a decimal number. + +`%Z' + The time zone name or abbreviation, or no characters if no time + zone is determinable. + +`%%' + A literal `%'. + + If a conversion specifier is not one of the above, the behavior is +undefined. (This is because the ANSI standard for C leaves the +behavior of the C version of `strftime' undefined, and `gawk' will use +the system's version of `strftime' if it's there. Typically, the +conversion specifier will either not appear in the returned string, or +it will appear literally.) + + Informally, a "locale" is the geographic place in which a program is +meant to run. 
For example, a common way to abbreviate the date +September 4, 1991 in the United States would be "9/4/91". In many +countries in Europe, however, it would be abbreviated "4.9.91". Thus, +the `%x' specification in a `"US"' locale might produce `9/4/91', while +in a `"EUROPE"' locale, it might produce `4.9.91'. The ANSI C standard +defines a default `"C"' locale, which is an environment that is typical +of what most C programmers are used to. + + A public-domain C version of `strftime' is shipped with `gawk' for +systems that are not yet fully ANSI-compliant. If that version is used +to compile `gawk' (*note Installing `gawk': Installation.), then the +following additional format specifications are available: + +`%D' + Equivalent to specifying `%m/%d/%y'. + +`%e' + The day of the month, padded with a blank if it is only one digit. + +`%h' + Equivalent to `%b', above. + +`%n' + A newline character (ASCII LF). + +`%r' + Equivalent to specifying `%I:%M:%S %p'. + +`%R' + Equivalent to specifying `%H:%M'. + +`%T' + Equivalent to specifying `%H:%M:%S'. + +`%t' + A TAB character. + +`%k' + is replaced by the hour (24-hour clock) as a decimal number (0-23). + Single digit numbers are padded with a blank. + +`%l' + is replaced by the hour (12-hour clock) as a decimal number (1-12). + Single digit numbers are padded with a blank. + +`%C' + The century, as a number between 00 and 99. + +`%u' + is replaced by the weekday as a decimal number [1 (Monday)-7]. + +`%V' + is replaced by the week number of the year (the first Monday as + the first day of week 1) as a decimal number (01-53). The method + for determining the week number is as specified by ISO 8601 (to + wit: if the week containing January 1 has four or more days in the + new year, then it is week 1, otherwise it is week 53 of the + previous year and the next week is week 1). 
+ +`%Ec %EC %Ex %Ey %EY %Od %Oe %OH %OI' +`%Om %OM %OS %Ou %OU %OV %Ow %OW %Oy' + These are "alternate representations" for the specifications that + use only the second letter (`%c', `%C', and so on). They are + recognized, but their normal representations are used. (These + facilitate compliance with the POSIX `date' utility.) + +`%v' + The date in VMS format (e.g. 20-JUN-1991). + + Here are two examples that use `strftime'. The first is an `awk' +version of the C `ctime' function. (This is a user defined function, +which we have not discussed yet. *Note User-defined Functions: +User-defined, for more information.) + + # ctime.awk + # + # awk version of C ctime(3) function + + function ctime(ts, format) + { + format = "%a %b %e %H:%M:%S %Z %Y" + if (ts == 0) + ts = systime() # use current time as default + return strftime(format, ts) + } + + This next example is an `awk' implementation of the POSIX `date' +utility. Normally, the `date' utility prints the current date and time +of day in a well known format. However, if you provide an argument to +it that begins with a `+', `date' will copy non-format specifier +characters to the standard output, and will interpret the current time +according to the format specifiers in the string. For example: + + date '+Today is %A, %B %d, %Y.' + +might print + + Today is Thursday, July 11, 1991. + + Here is the `awk' version of the `date' utility. + + #! /usr/bin/gawk -f + # + # date --- implement the P1003.2 Draft 11 'date' command + # + # Bug: does not recognize the -u argument. 
+ + BEGIN \ + { + format = "%a %b %e %H:%M:%S %Z %Y" + exitval = 0 + + if (ARGC > 2) + exitval = 1 + else if (ARGC == 2) { + format = ARGV[1] + if (format ~ /^\+/) + format = substr(format, 2) # remove leading + + } + print strftime(format) + exit exitval + } + + +File: gawk.info, Node: User-defined, Next: Built-in Variables, Prev: Built-in, Up: Top + +User-defined Functions +********************** + + Complicated `awk' programs can often be simplified by defining your +own functions. User-defined functions can be called just like built-in +ones (*note Function Calls::.), but it is up to you to define them--to +tell `awk' what they should do. + +* Menu: + +* Definition Syntax:: How to write definitions and what they mean. +* Function Example:: An example function definition and + what it does. +* Function Caveats:: Things to watch out for. +* Return Statement:: Specifying the value a function returns. + + +File: gawk.info, Node: Definition Syntax, Next: Function Example, Prev: User-defined, Up: User-defined + +Syntax of Function Definitions +============================== + + Definitions of functions can appear anywhere between the rules of the +`awk' program. Thus, the general form of an `awk' program is extended +to include sequences of rules *and* user-defined function definitions. + + The definition of a function named NAME looks like this: + + function NAME (PARAMETER-LIST) { + BODY-OF-FUNCTION + } + +NAME is the name of the function to be defined. A valid function name +is like a valid variable name: a sequence of letters, digits and +underscores, not starting with a digit. Functions share the same pool +of names as variables and arrays. + + PARAMETER-LIST is a list of the function's arguments and local +variable names, separated by commas. When the function is called, the +argument names are used to hold the argument values given in the call. +The local variables are initialized to the null string. + + The BODY-OF-FUNCTION consists of `awk' statements. 
It is the most +important part of the definition, because it says what the function +should actually *do*. The argument names exist to give the body a way +to talk about the arguments; local variables, to give the body places +to keep temporary values. + + Argument names are not distinguished syntactically from local +variable names; instead, the number of arguments supplied when the +function is called determines how many argument variables there are. +Thus, if three argument values are given, the first three names in +PARAMETER-LIST are arguments, and the rest are local variables. + + It follows that if the number of arguments is not the same in all +calls to the function, some of the names in PARAMETER-LIST may be +arguments on some occasions and local variables on others. Another way +to think of this is that omitted arguments default to the null string. + + Usually when you write a function you know how many names you intend +to use for arguments and how many you intend to use as locals. By +convention, you should write an extra space between the arguments and +the locals, so other people can follow how your function is supposed to +be used. + + During execution of the function body, the arguments and local +variable values hide or "shadow" any variables of the same names used +in the rest of the program. The shadowed variables are not accessible +in the function definition, because there is no way to name them while +their names have been taken away for the local variables. All other +variables used in the `awk' program can be referenced or set normally +in the function definition. + + The arguments and local variables last only as long as the function +body is executing. Once the body finishes, the shadowed variables come +back. + + The function body can contain expressions which call functions. They +can even call this function, either directly or by way of another +function. When this happens, we say the function is "recursive". 
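As a rough sketch of the shadowing and recursion rules described above, the following shell snippet runs a small `awk' program. The names `sum_to', `sum' and `rest' are made up for illustration and do not come from the manual; `sum' adds the integers from 1 to its argument by calling itself, and the extra spaces in its parameter list mark `rest' as a local variable.

```shell
# Hypothetical helper (not from the manual): add the integers 1..n
# using a recursive awk function.
sum_to() {
    awk -v n="$1" '
    # "k" is the argument; "rest" is a local, set off by extra spaces.
    function sum(k,    rest) {
        if (k <= 0)
            return 0
        rest = sum(k - 1)    # each call has its own private "rest"
        return k + rest
    }
    BEGIN {
        rest = "global"      # a global that shares the name of the local
        print sum(n)
        print rest           # the global is untouched by the calls
    }'
}

sum_to 4
```

The global `rest' should still hold the string it was given before the call, since every invocation of `sum' works only on its own local `rest', which shadows the global while the function body runs.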
+ + There is no need in `awk' to put the definition of a function before +all uses of the function. This is because `awk' reads the entire +program before starting to execute any of it. + + In many `awk' implementations, the keyword `function' may be +abbreviated `func'. However, POSIX only specifies the use of the +keyword `function'. This actually has some practical implications. If +`gawk' is in POSIX-compatibility mode (*note Invoking `awk': Command +Line.), then the following statement will *not* define a function: + + func foo() { a = sqrt($1) ; print a } + +Instead it defines a rule that, for each record, concatenates the value +of the variable `func' with the return value of the function `foo', and +based on the truth value of the result, executes the corresponding +action. This is probably not what was desired. (`awk' accepts this +input as syntactically valid, since functions may be used before they +are defined in `awk' programs.) + + +File: gawk.info, Node: Function Example, Next: Function Caveats, Prev: Definition Syntax, Up: User-defined + +Function Definition Example +=========================== + + Here is an example of a user-defined function, called `myprint', that +takes a number and prints it in a specific format. + + function myprint(num) + { + printf "%6.3g\n", num + } + +To illustrate, here is an `awk' rule which uses our `myprint' function: + + $3 > 0 { myprint($3) } + +This program prints, in our special format, all the third fields that +contain a positive number in our input. Therefore, when given: + + 1.2 3.4 5.6 7.8 + 9.10 11.12 -13.14 15.16 + 17.18 19.20 21.22 23.24 + +this program, using our function to format the results, prints: + + 5.6 + 21.2 + + Here is a rather contrived example of a recursive function. 
It +prints a string backwards: + + function rev (str, len) { + if (len == 0) { + printf "\n" + return + } + printf "%c", substr(str, len, 1) + rev(str, len - 1) + } + + +File: gawk.info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined + +Calling User-defined Functions +============================== + + "Calling a function" means causing the function to run and do its +job. A function call is an expression, and its value is the value +returned by the function. + + A function call consists of the function name followed by the +arguments in parentheses. What you write in the call for the arguments +are `awk' expressions; each time the call is executed, these +expressions are evaluated, and the values are the actual arguments. For +example, here is a call to `foo' with three arguments (the first being +a string concatenation): + + foo(x y, "lose", 4 * z) + + *Caution:* whitespace characters (spaces and tabs) are not allowed + between the function name and the open-parenthesis of the argument + list. If you write whitespace by mistake, `awk' might think that + you mean to concatenate a variable with an expression in + parentheses. However, it notices that you used a function name + and not a variable name, and reports an error. + + When a function is called, it is given a *copy* of the values of its +arguments. This is called "call by value". The caller may use a +variable as the expression for the argument, but the called function +does not know this: it only knows what value the argument had. For +example, if you write this code: + + foo = "bar" + z = myfunc(foo) + +then you should not think of the argument to `myfunc' as being "the +variable `foo'." Instead, think of the argument as the string value, +`"bar"'. + + If the function `myfunc' alters the values of its local variables, +this has no effect on any other variables. 
In particular, if `myfunc' +does this: + + function myfunc (win) { + print win + win = "zzz" + print win + } + +to change its first argument variable `win', this *does not* change the +value of `foo' in the caller. The role of `foo' in calling `myfunc' +ended when its value, `"bar"', was computed. If `win' also exists +outside of `myfunc', the function body cannot alter this outer value, +because it is shadowed during the execution of `myfunc' and cannot be +seen or changed from there. + + However, when arrays are the parameters to functions, they are *not* +copied. Instead, the array itself is made available for direct +manipulation by the function. This is usually called "call by +reference". Changes made to an array parameter inside the body of a +function *are* visible outside that function. This can be *very* +dangerous if you do not watch what you are doing. For example: + + function changeit (array, ind, nvalue) { + array[ind] = nvalue + } + + BEGIN { + a[1] = 1 ; a[2] = 2 ; a[3] = 3 + changeit(a, 2, "two") + printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3] + } + +prints `a[1] = 1, a[2] = two, a[3] = 3', because calling `changeit' +stores `"two"' in the second element of `a'. + + +File: gawk.info, Node: Return Statement, Prev: Function Caveats, Up: User-defined + +The `return' Statement +====================== + + The body of a user-defined function can contain a `return' statement. +This statement returns control to the rest of the `awk' program. It +can also be used to return a value for use in the rest of the `awk' +program. It looks like this: + + return EXPRESSION + + The EXPRESSION part is optional. If it is omitted, then the returned +value is undefined and, therefore, unpredictable. + + A `return' statement with no value expression is assumed at the end +of every function definition. So if control reaches the end of the +function body, then the function returns an unpredictable value. 
`awk' +will not warn you if you use the return value of such a function; you +will simply get unpredictable or unexpected results. + + Here is an example of a user-defined function that returns a value +for the largest number among the elements of an array: + + function maxelt (vec, i, ret) { + for (i in vec) { + if (ret == "" || vec[i] > ret) + ret = vec[i] + } + return ret + } + +You call `maxelt' with one argument, which is an array name. The local +variables `i' and `ret' are not intended to be arguments; while there +is nothing to stop you from passing two or three arguments to `maxelt', +the results would be strange. The extra space before `i' in the +function parameter list is to indicate that `i' and `ret' are not +supposed to be arguments. This is a convention which you should follow +when you define functions. + + Here is a program that uses our `maxelt' function. It loads an +array, calls `maxelt', and then reports the maximum number in that +array: + + awk ' + function maxelt (vec, i, ret) { + for (i in vec) { + if (ret == "" || vec[i] > ret) + ret = vec[i] + } + return ret + } + + # Load all fields of each record into nums. + { + for(i = 1; i <= NF; i++) + nums[NR, i] = $i + } + + END { + print maxelt(nums) + }' + + Given the following input: + + 1 5 23 8 16 + 44 3 5 2 8 26 + 256 291 1396 2962 100 + -6 467 998 1101 + 99385 11 0 225 + +our program tells us (predictably) that: + + 99385 + +is the largest number in our array. + + +File: gawk.info, Node: Built-in Variables, Next: Command Line, Prev: User-defined, Up: Top + +Built-in Variables +****************** + + Most `awk' variables are available for you to use for your own +purposes; they never change except when your program assigns values to +them, and never affect anything except when your program examines them. + + A few variables have special built-in meanings. Some of them `awk' +examines automatically, so that they enable you to tell `awk' how to do +certain things. 
Others are set automatically by `awk', so that they +carry information from the internal workings of `awk' to your program. + + This chapter documents all the built-in variables of `gawk'. Most +of them are also documented in the chapters where their areas of +activity are described. + +* Menu: + +* User-modified:: Built-in variables that you change + to control `awk'. +* Auto-set:: Built-in variables where `awk' + gives you information. + + +File: gawk.info, Node: User-modified, Next: Auto-set, Prev: Built-in Variables, Up: Built-in Variables + +Built-in Variables that Control `awk' +===================================== + + This is a list of the variables which you can change to control how +`awk' does certain things. + +`CONVFMT' + This string is used by `awk' to control conversion of numbers to + strings (*note Conversion of Strings and Numbers: Conversion.). + It works by being passed, in effect, as the first argument to the + `sprintf' function. Its default value is `"%.6g"'. `CONVFMT' was + introduced by the POSIX standard. + +`FIELDWIDTHS' + This is a space separated list of columns that tells `gawk' how to + manage input with fixed, columnar boundaries. It is an + experimental feature that is still evolving. Assigning to + `FIELDWIDTHS' overrides the use of `FS' for field splitting. + *Note Reading Fixed-width Data: Constant Size, for more + information. + + If `gawk' is in compatibility mode (*note Invoking `awk': Command + Line.), then `FIELDWIDTHS' has no special meaning, and field + splitting operations are done based exclusively on the value of + `FS'. + +`FS' + `FS' is the input field separator (*note Specifying how Fields are + Separated: Field Separators.). The value is a single-character + string or a multi-character regular expression that matches the + separations between fields in an input record. + + The default value is `" "', a string consisting of a single space. 
+ As a special exception, this value actually means that any + sequence of spaces and tabs is a single separator. It also causes + spaces and tabs at the beginning or end of a line to be ignored. + + You can set the value of `FS' on the command line using the `-F' + option: + + awk -F, 'PROGRAM' INPUT-FILES + + If `gawk' is using `FIELDWIDTHS' for field-splitting, assigning a + value to `FS' will cause `gawk' to return to the normal, + regexp-based, field splitting. + +`IGNORECASE' + If `IGNORECASE' is nonzero, then *all* regular expression matching + is done in a case-independent fashion. In particular, regexp + matching with `~' and `!~', and the `gsub', `index', `match', + `split' and `sub' functions all ignore case when doing their + particular regexp operations. *Note:* since field splitting with + the value of the `FS' variable is also a regular expression + operation, that too is done with case ignored. *Note + Case-sensitivity in Matching: Case-sensitivity. + + If `gawk' is in compatibility mode (*note Invoking `awk': Command + Line.), then `IGNORECASE' has no special meaning, and regexp + operations are always case-sensitive. + +`OFMT' + This string is used by `awk' to control conversion of numbers to + strings (*note Conversion of Strings and Numbers: Conversion.) for + printing with the `print' statement. It works by being passed, in + effect, as the first argument to the `sprintf' function. Its + default value is `"%.6g"'. Earlier versions of `awk' also used + `OFMT' to specify the format for converting numbers to strings in + general expressions; this has been taken over by `CONVFMT'. + +`OFS' + This is the output field separator (*note Output Separators::.). + It is output between the fields output by a `print' statement. Its + default value is `" "', a string consisting of a single space. + +`ORS' + This is the output record separator. It is output at the end of + every `print' statement.
Its default value is a string containing + a single newline character, which could be written as `"\n"'. + (*Note Output Separators::.) + +`RS' + This is `awk''s input record separator. Its default value is a + string containing a single newline character, which means that an + input record consists of a single line of text. (*Note How Input + is Split into Records: Records.) + +`SUBSEP' + `SUBSEP' is the subscript separator. It has the default value of + `"\034"', and is used to separate the parts of the name of a + multi-dimensional array. Thus, if you access `foo[12,3]', it + really accesses `foo["12\0343"]' (*note Multi-dimensional Arrays: + Multi-dimensional.). + + +File: gawk.info, Node: Auto-set, Prev: User-modified, Up: Built-in Variables + +Built-in Variables that Convey Information +========================================== + + This is a list of the variables that are set automatically by `awk' +on certain occasions so as to provide information to your program. + +`ARGC' +`ARGV' + The command-line arguments available to `awk' programs are stored + in an array called `ARGV'. `ARGC' is the number of command-line + arguments present. *Note Invoking `awk': Command Line. `ARGV' is + indexed from zero to `ARGC - 1'. For example: + + awk 'BEGIN { + for (i = 0; i < ARGC; i++) + print ARGV[i] + }' inventory-shipped BBS-list + + In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains + `"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'. The + value of `ARGC' is 3, one more than the index of the last element + in `ARGV' since the elements are numbered from zero. + + The names `ARGC' and `ARGV', as well as the convention of indexing + the array from 0 to `ARGC - 1', are derived from the C language's + method of accessing command line arguments. + + Notice that the `awk' program is not entered in `ARGV'. The other + special command line options, with their arguments, are also not + entered.
But variable assignments on the command line *are* + treated as arguments, and do show up in the `ARGV' array. + + Your program can alter `ARGC' and the elements of `ARGV'. Each + time `awk' reaches the end of an input file, it uses the next + element of `ARGV' as the name of the next input file. By storing a + different string there, your program can change which files are + read. You can use `"-"' to represent the standard input. By + storing additional elements and incrementing `ARGC' you can cause + additional files to be read. + + If you decrease the value of `ARGC', that eliminates input files + from the end of the list. By recording the old value of `ARGC' + elsewhere, your program can treat the eliminated arguments as + something other than file names. + + To eliminate a file from the middle of the list, store the null + string (`""') into `ARGV' in place of the file's name. As a + special feature, `awk' ignores file names that have been replaced + with the null string. + +`ARGIND' + The index in `ARGV' of the current file being processed. Every + time `gawk' opens a new data file for processing, it sets `ARGIND' + to the index in `ARGV' of the file name. Thus, the condition + `FILENAME == ARGV[ARGIND]' is always true. + + This variable is useful in file processing; it allows you to tell + how far along you are in the list of data files, and to + distinguish between multiple successive instances of the same + filename on the command line. + + While you can change the value of `ARGIND' within your `awk' + program, `gawk' will automatically set it to a new value when the + next file is opened. + + This variable is a `gawk' extension; in other `awk' implementations + it is not special. + +`ENVIRON' + This is an array that contains the values of the environment. The + array indices are the environment variable names; the values are + the values of the particular environment variables. For example, + `ENVIRON["HOME"]' might be `/u/close'. 
Changing this array does + not affect the environment passed on to any programs that `awk' + may spawn via redirection or the `system' function. (In a future + version of `gawk', it may do so.) + + Some operating systems may not have environment variables. On + such systems, the array `ENVIRON' is empty. + +`ERRNO' + If a system error occurs either doing a redirection for `getline', + during a read for `getline', or during a `close' operation, then + `ERRNO' will contain a string describing the error. + + This variable is a `gawk' extension; in other `awk' implementations + it is not special. + +`FILENAME' + This is the name of the file that `awk' is currently reading. If + `awk' is reading from the standard input (in other words, there + are no files listed on the command line), `FILENAME' is set to + `"-"'. `FILENAME' is changed each time a new file is read (*note + Reading Input Files: Reading Files.). + +`FNR' + `FNR' is the current record number in the current file. `FNR' is + incremented each time a new record is read (*note Explicit Input + with `getline': Getline.). It is reinitialized to 0 each time a + new input file is started. + +`NF' + `NF' is the number of fields in the current input record. `NF' is + set each time a new record is read, when a new field is created, + or when `$0' changes (*note Examining Fields: Fields.). + +`NR' + This is the number of input records `awk' has processed since the + beginning of the program's execution. (*note How Input is Split + into Records: Records.). `NR' is set each time a new record is + read. + +`RLENGTH' + `RLENGTH' is the length of the substring matched by the `match' + function (*note Built-in Functions for String Manipulation: String + Functions.). `RLENGTH' is set by invoking the `match' function. + Its value is the length of the matched string, or -1 if no match + was found. 
+ +`RSTART' + `RSTART' is the start-index in characters of the substring matched + by the `match' function (*note Built-in Functions for String + Manipulation: String Functions.). `RSTART' is set by invoking the + `match' function. Its value is the position of the string where + the matched substring starts, or 0 if no match was found. + + +File: gawk.info, Node: Command Line, Next: Language History, Prev: Built-in Variables, Up: Top + +Invoking `awk' +************** + + There are two ways to run `awk': with an explicit program, or with +one or more program files. Here are templates for both of them; items +enclosed in `[...]' in these templates are optional. + + Besides traditional one-letter POSIX-style options, `gawk' also +supports GNU long named options. + + awk [POSIX OR GNU STYLE OPTIONS] -f progfile [`--'] FILE ... + awk [POSIX OR GNU STYLE OPTIONS] [`--'] 'PROGRAM' FILE ... + +* Menu: + +* Options:: Command line options and their meanings. +* Other Arguments:: Input file names and variable assignments. +* AWKPATH Variable:: Searching directories for `awk' programs. +* Obsolete:: Obsolete Options and/or features. +* Undocumented:: Undocumented Options and Features. + + +File: gawk.info, Node: Options, Next: Other Arguments, Prev: Command Line, Up: Command Line + +Command Line Options +==================== + + Options begin with a minus sign, and consist of a single character. +GNU style long named options consist of two minus signs and a keyword +that can be abbreviated if the abbreviation allows the option to be +uniquely identified. If the option takes an argument, then the keyword +is immediately followed by an equals sign (`=') and the argument's +value. For brevity, the discussion below only refers to the +traditional short options; however the long and short options are +interchangeable in all contexts. + + Each long named option for `gawk' has a corresponding POSIX-style +option. 
The options and their meanings are as follows: + +`-F FS' +`--field-separator=FS' + Sets the `FS' variable to FS (*note Specifying how Fields are + Separated: Field Separators.). + +`-f SOURCE-FILE' +`--file=SOURCE-FILE' + Indicates that the `awk' program is to be found in SOURCE-FILE + instead of in the first non-option argument. + +`-v VAR=VAL' +`--assign=VAR=VAL' + Sets the variable VAR to the value VAL *before* execution of the + program begins. Such variable values are available inside the + `BEGIN' rule (see below for a fuller explanation). + + The `-v' option can only set one variable, but you can use it more + than once, setting another variable each time, like this: + `-v foo=1 -v bar=2'. + +`-W GAWK-OPT' + Following the POSIX standard, options that are implementation + specific are supplied as arguments to the `-W' option. With + `gawk', these arguments may be separated by commas, or quoted and + separated by whitespace. Case is ignored when processing these + options. These options also have corresponding GNU style long + named options. The following `gawk'-specific options are + available: + + `-W compat' + `--compat' + Specifies "compatibility mode", in which the GNU extensions in + `gawk' are disabled, so that `gawk' behaves just like Unix + `awk'. *Note Extensions in `gawk' not in POSIX `awk': + POSIX/GNU, which summarizes the extensions. Also see *Note + Downward Compatibility and Debugging: Compatibility Mode. + + `-W copyleft' + `-W copyright' + `--copyleft' + `--copyright' + Print the short version of the General Public License. This + option may disappear in a future version of `gawk'. + + `-W help' + `-W usage' + `--help' + `--usage' + Print a "usage" message summarizing the short and long style + options that `gawk' accepts, and then exit. + + `-W lint' + `--lint' + Provide warnings about constructs that are dubious or + non-portable to other `awk' implementations. Some warnings + are issued when `gawk' first reads your program. 
Others are + issued at run-time, as your program executes. + + `-W posix' + `--posix' + Operate in strict POSIX mode. This disables all `gawk' + extensions (just like `-W compat'), and adds the following + additional restrictions: + + * `\x' escape sequences are not recognized (*note Constant + Expressions: Constants.). + + * The synonym `func' for the keyword `function' is not + recognized (*note Syntax of Function Definitions: + Definition Syntax.). + + * The operators `**' and `**=' cannot be used in place of + `^' and `^=' (*note Arithmetic Operators: Arithmetic + Ops., and also *note Assignment Expressions: Assignment + Ops.). + + * Specifying `-Ft' on the command line does not set the + value of `FS' to be a single tab character (*note + Specifying how Fields are Separated: Field Separators.). + + Although you can supply both `-W compat' and `-W posix' on the + command line, `-W posix' will take precedence. + + `-W source=PROGRAM-TEXT' + `--source=PROGRAM-TEXT' + Program source code is taken from the PROGRAM-TEXT. This + option allows you to mix `awk' source code in files with + program source code that you would enter on the command line. + This is particularly useful when you have library functions + that you wish to use from your command line programs (*note + The `AWKPATH' Environment Variable: AWKPATH Variable.). + + `-W version' + `--version' + Prints version information for this particular copy of `gawk'. + This is so you can determine if your copy of `gawk' is up to + date with respect to whatever the Free Software Foundation is + currently distributing. This option may disappear in a + future version of `gawk'. + +`--' + Signals the end of the command line options. The following + arguments are not treated as options even if they begin with `-'. + This interpretation of `--' follows the POSIX argument parsing + conventions. 
+ + This is useful if you have file names that start with `-', or in + shell scripts, if you have file names that will be specified by + the user which could start with `-'. + + Any other options are flagged as invalid with a warning message, but +are otherwise ignored. + + In compatibility mode, as a special case, if the value of FS supplied +to the `-F' option is `t', then `FS' is set to the tab character +(`"\t"'). This is only true for `-W compat', and not for `-W posix' +(*note Specifying how Fields are Separated: Field Separators.). + + If the `-f' option is *not* used, then the first non-option command +line argument is expected to be the program text. + + The `-f' option may be used more than once on the command line. If +it is, `awk' reads its program source from all of the named files, as +if they had been concatenated together into one big file. This is +useful for creating libraries of `awk' functions. Useful functions can +be written once, and then retrieved from a standard place, instead of +having to be included into each individual program. You can still type +in a program at the terminal and use library functions, by specifying +`-f /dev/tty'. `awk' will read a file from the terminal to use as part +of the `awk' program. After typing your program, type `Control-d' (the +end-of-file character) to terminate it. (You may also use `-f -' to +read program source from the standard input, but then you will not be +able to also use the standard input as a source of data.) + + Because it is clumsy using the standard `awk' mechanisms to mix +source file and command line `awk' programs, `gawk' provides the +`--source' option. This does not require you to pre-empt the standard +input for your source code, and allows you to easily mix command line +and library source code (*note The `AWKPATH' Environment Variable: +AWKPATH Variable.). 
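As a rough illustration of the library mechanism described above, the following shell sketch builds a tiny library file and uses it in two ways: once with two `-f' options, and once with `-f' combined with `--source'. The file names `lib.awk' and `main.awk' and the function `twice' are hypothetical, chosen only for this example. The last command assumes that `gawk' is installed and is skipped otherwise.

```shell
# Scratch directory for the hypothetical example files.
workdir=$(mktemp -d)

# A one-function "library" file (the name lib.awk is an assumption).
cat > "$workdir/lib.awk" <<'EOF'
function twice(x) { return 2 * x }
EOF

# A main program that calls the library function.
cat > "$workdir/main.awk" <<'EOF'
BEGIN { print twice(21) }
EOF

# Multiple -f options: awk reads both files as if they had been
# concatenated into one program.
portable_out=$(awk -f "$workdir/lib.awk" -f "$workdir/main.awk")
echo "$portable_out"    # prints 42

# gawk only: library from a file, main program from the command line.
if command -v gawk >/dev/null 2>&1; then
    gawk -f "$workdir/lib.awk" --source='BEGIN { print twice(21) }'
fi

rm -rf "$workdir"
```

Both invocations should print the same value, since in each case the function definition and the `BEGIN' rule end up in a single program.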
+ + If no `-f' or `--source' option is specified, then `gawk' will use +the first non-option command line argument as the text of the program +source code. + + +File: gawk.info, Node: Other Arguments, Next: AWKPATH Variable, Prev: Options, Up: Command Line + +Other Command Line Arguments +============================ + + Any additional arguments on the command line are normally treated as +input files to be processed in the order specified. However, an +argument that has the form `VAR=VALUE', means to assign the value VALUE +to the variable VAR--it does not specify a file at all. + + All these arguments are made available to your `awk' program in the +`ARGV' array (*note Built-in Variables::.). Command line options and +the program text (if present) are omitted from the `ARGV' array. All +other arguments, including variable assignments, are included. + + The distinction between file name arguments and variable-assignment +arguments is made when `awk' is about to open the next input file. At +that point in execution, it checks the "file name" to see whether it is +really a variable assignment; if so, `awk' sets the variable instead of +reading a file. + + Therefore, the variables actually receive the specified values after +all previously specified files have been read. In particular, the +values of variables assigned in this fashion are *not* available inside +a `BEGIN' rule (*note `BEGIN' and `END' Special Patterns: BEGIN/END.), +since such rules are run before `awk' begins scanning the argument list. +The values given on the command line are processed for escape sequences +(*note Constant Expressions: Constants.). + + In some earlier implementations of `awk', when a variable assignment +occurred before any file names, the assignment would happen *before* +the `BEGIN' rule was executed. Some applications came to depend upon +this "feature." 
When `awk' was changed to be more consistent, the `-v' +option was added to accommodate applications that depended upon this +old behavior. + + The variable assignment feature is most useful for assigning to +variables such as `RS', `OFS', and `ORS', which control input and +output formats, before scanning the data files. It is also useful for +controlling state if multiple passes are needed over a data file. For +example: + + awk 'pass == 1 { PASS 1 STUFF } + pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile + + Given the variable assignment feature, the `-F' option is not +strictly necessary. It remains for historical compatibility. + + +File: gawk.info, Node: AWKPATH Variable, Next: Obsolete, Prev: Other Arguments, Up: Command Line + +The `AWKPATH' Environment Variable +================================== + + The previous section described how `awk' program files can be named +on the command line with the `-f' option. In some `awk' +implementations, you must supply a precise path name for each program +file, unless the file is in the current directory. + + But in `gawk', if the file name supplied in the `-f' option does not +contain a `/', then `gawk' searches a list of directories (called the +"search path"), one by one, looking for a file with the specified name. + + The search path is actually a string consisting of directory names +separated by colons. `gawk' gets its search path from the `AWKPATH' +environment variable. If that variable does not exist, `gawk' uses the +default path, which is `.:/usr/lib/awk:/usr/local/lib/awk'. (Programs +written by system administrators should use an `AWKPATH' variable that +does not include the current directory, `.'.) + + The search path feature is particularly useful for building up +libraries of useful `awk' functions. The library files can be placed +in a standard directory that is in the default path, and then specified +on the command line with a short file name. 
Otherwise, the full file +name would have to be typed for each file. + + By combining the `--source' and `-f' options, your command line +`awk' programs can use facilities in `awk' library files. + + Path searching is not done if `gawk' is in compatibility mode. This +is true for both `-W compat' and `-W posix'. *Note Command Line +Options: Options. + + *Note:* if you want files in the current directory to be found, you +must include the current directory in the path, either by writing `.' +as an entry in the path, or by writing a null entry in the path. (A +null entry is indicated by starting or ending the path with a colon, or +by placing two colons next to each other (`::').) If the current +directory is not included in the path, then files cannot be found in +the current directory. This path search mechanism is identical to the +shell's. + + +File: gawk.info, Node: Obsolete, Next: Undocumented, Prev: AWKPATH Variable, Up: Command Line + +Obsolete Options and/or Features +================================ + + This section describes features and/or command line options from the +previous release of `gawk' that are either not available in the current +version, or that are still supported but deprecated (meaning that they +will *not* be in the next release). + + For version 2.15 of `gawk', the following command line options from +version 2.11.1 are no longer recognized. + +`-c' + Use `-W compat' instead. + +`-V' + Use `-W version' instead. + +`-C' + Use `-W copyright' instead. + +`-a' +`-e' + These options produce an "unrecognized option" error message but + have no effect on the execution of `gawk'. The POSIX standard now + specifies traditional `awk' regular expressions for the `awk' + utility. + + The public-domain version of `strftime' that is distributed with +`gawk' changed for the 2.14 release. The `%V' conversion specifier +that used to generate the date in VMS format was changed to `%v'. 
This +is because the POSIX standard for the `date' utility now specifies a +`%V' conversion specifier. *Note Functions for Dealing with Time +Stamps: Time Functions, for details. + + +File: gawk.info, Node: Undocumented, Prev: Obsolete, Up: Command Line + +Undocumented Options and Features +================================= + + This section intentionally left blank. + + +File: gawk.info, Node: Language History, Next: Installation, Prev: Command Line, Up: Top + +The Evolution of the `awk' Language +*********************************** + + This manual describes the GNU implementation of `awk', which is +patterned after the POSIX specification. Many `awk' users are only +familiar with the original `awk' implementation in Version 7 Unix, +which is also the basis for the version in Berkeley Unix (through +4.3-Reno). This chapter briefly describes the evolution of the `awk' +language. + +* Menu: + +* V7/S5R3.1:: The major changes between V7 and + System V Release 3.1. +* S5R4:: Minor changes between System V + Releases 3.1 and 4. +* POSIX:: New features from the POSIX standard. +* POSIX/GNU:: The extensions in `gawk' + not in POSIX `awk'. +