path: root/gawk.info-8
Diffstat (limited to 'gawk.info-8')
-rw-r--r-- gawk.info-8 1173
1 files changed, 1173 insertions, 0 deletions
diff --git a/gawk.info-8 b/gawk.info-8
new file mode 100644
index 00000000..d0d693ff
--- /dev/null
+++ b/gawk.info-8
@@ -0,0 +1,1173 @@
+This is Info file gawk.info, produced by Makeinfo-1.54 from the input
+file gawk.texi.
+
+ This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them.
+
+ This is Edition 0.15 of `The GAWK Manual',
+for the 2.15 version of the GNU implementation
+of AWK.
+
+ Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
+
+ Permission is granted to make and distribute verbatim copies of this
+manual provided the copyright notice and this permission notice are
+preserved on all copies.
+
+ Permission is granted to copy and distribute modified versions of
+this manual under the conditions for verbatim copying, provided that
+the entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+
+ Permission is granted to copy and distribute translations of this
+manual into another language, under the above conditions for modified
+versions, except that this permission notice may be stated in a
+translation approved by the Foundation.
+
+
+File: gawk.info, Node: Regexp Summary, Next: Actions Summary, Prev: Pattern Summary, Up: Rules Summary
+
+Regular Expressions
+-------------------
+
+ Regular expressions are the extended kind found in `egrep'. They
+are composed of characters as follows:
+
+`C'
+ matches the character C (assuming C is a character with no special
+ meaning in regexps).
+
+`\C'
+ matches the literal character C.
+
+`.'
+ matches any character except newline.
+
+`^'
+ matches the beginning of a line or a string.
+
+`$'
+ matches the end of a line or a string.
+
+`[ABC...]'
+ matches any of the characters ABC... (character class).
+
+`[^ABC...]'
+ matches any character except ABC... and newline (negated character
+ class).
+
+`R1|R2'
+ matches either R1 or R2 (alternation).
+
+`R1R2'
+ matches R1, and then R2 (concatenation).
+
+`R+'
+ matches one or more R's.
+
+`R*'
+ matches zero or more R's.
+
+`R?'
+ matches zero or one R's.
+
+`(R)'
+ matches R (grouping).
+
+ *Note Regular Expressions as Patterns: Regexp, for a more detailed
+explanation of regular expressions.
+
+ The escape sequences allowed in string constants are also valid in
+regular expressions (*note Constant Expressions: Constants.).
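A small sketch may help make the operators above concrete. It assumes a POSIX `awk' is on the search path; the function name `match_lines' is hypothetical and not part of `gawk'. The regexp is passed in as a string and applied with the `~' operator.

```sh
# match_lines is a hypothetical helper: print the input lines that
# match the regexp given as the first argument.
match_lines() {
    awk -v re="$1" '$0 ~ re'
}

# Anchors, grouping, `+' and alternation together: a whole line made of
# one or more "ab", or a whole line that is just "c".
printf 'ab\nabab\nc\nabc\n' | match_lines '^((ab)+|c)$'
```

The last input line, `abc', is not printed, because neither alternative can cover the whole line between `^' and `$'.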
+
+
+File: gawk.info, Node: Actions Summary, Prev: Regexp Summary, Up: Rules Summary
+
+Actions
+-------
+
+ Action statements are enclosed in braces, `{' and `}'. Action
+statements consist of the usual assignment, conditional, and looping
+statements found in most languages. The operators, control statements,
+and input/output statements available are patterned after those in C.
+
+* Menu:
+
+* Operator Summary:: `awk' operators.
+* Control Flow Summary:: The control statements.
+* I/O Summary:: The I/O statements.
+* Printf Summary:: A summary of `printf'.
+* Special File Summary:: Special file names interpreted internally.
+* Numeric Functions Summary:: Built-in numeric functions.
+* String Functions Summary:: Built-in string functions.
+* Time Functions Summary:: Built-in time functions.
+* String Constants Summary:: Escape sequences in strings.
+
+
+File: gawk.info, Node: Operator Summary, Next: Control Flow Summary, Prev: Actions Summary, Up: Actions Summary
+
+Operators
+.........
+
+ The operators in `awk', in order of increasing precedence, are:
+
+`= += -= *= /= %= ^='
+ Assignment. Both absolute assignment (`VAR=VALUE') and operator
+ assignment (the other forms) are supported.
+
+`?:'
+ A conditional expression, as in C. This has the form `EXPR1 ?
+ EXPR2 : EXPR3'. If EXPR1 is true, the value of the expression is
+ EXPR2; otherwise it is EXPR3. Only one of EXPR2 and EXPR3 is
+ evaluated.
+
+`||'
+ Logical "or".
+
+`&&'
+ Logical "and".
+
+`~ !~'
+ Regular expression match, negated match.
+
+`< <= > >= != =='
+ The usual relational operators.
+
+`BLANK'
+ String concatenation.
+
+`+ -'
+ Addition and subtraction.
+
+`* / %'
+ Multiplication, division, and modulus.
+
+`+ - !'
+ Unary plus, unary minus, and logical negation.
+
+`^'
+ Exponentiation (`**' may also be used, and `**=' for the assignment
+ operator, but they are not specified in the POSIX standard).
+
+`++ --'
+ Increment and decrement, both prefix and postfix.
+
+`$'
+ Field reference.
+
+ *Note Expressions as Action Statements: Expressions, for a full
+description of all the operators listed above. *Note Examining Fields:
+Fields, for a description of the field reference operator.
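The precedence table can be checked with a few one-line expressions. This is only an illustrative sketch, assuming a POSIX `awk'; the wrapper name `precedence_demo' is made up for the example.

```sh
# precedence_demo is a hypothetical wrapper around a BEGIN-only program.
precedence_demo() {
    awk 'BEGIN {
        print -2 ^ 2                      # ^ binds tighter than unary minus
        print 1 + 2 " " 3                 # addition happens before concatenation
        x = 5
        print (x > 3 ? "big" : "small")   # conditional expression
    }'
}

precedence_demo
```

The conditional expression is parenthesized because a bare `>' in a `print' statement would be taken as an output redirection.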
+
+
+File: gawk.info, Node: Control Flow Summary, Next: I/O Summary, Prev: Operator Summary, Up: Actions Summary
+
+Control Statements
+..................
+
+ The control statements are as follows:
+
+ if (CONDITION) STATEMENT [ else STATEMENT ]
+ while (CONDITION) STATEMENT
+ do STATEMENT while (CONDITION)
+ for (EXPR1; EXPR2; EXPR3) STATEMENT
+ for (VAR in ARRAY) STATEMENT
+ break
+ continue
+ delete ARRAY[INDEX]
+ exit [ EXPRESSION ]
+ { STATEMENTS }
+
+ *Note Control Statements in Actions: Statements, for a full
+description of all the control statements listed above.
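As a brief sketch of how `do' differs from `while', the following hypothetical `count_down' function (assuming a POSIX `awk') runs the loop body before testing the condition, so the body executes once even when the condition is false from the start.

```sh
# count_down is a hypothetical helper: print N, N-1, ... down to 1.
count_down() {
    awk -v n="$1" 'BEGIN {
        i = n
        do {                      # the body runs at least once
            printf "%d ", i
            i--
        } while (i > 0)
        print ""
    }'
}

count_down 3
```

Calling `count_down 0' still prints a single `0', which a plain `while' loop with the same condition would not do.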
+
+
+File: gawk.info, Node: I/O Summary, Next: Printf Summary, Prev: Control Flow Summary, Up: Actions Summary
+
+I/O Statements
+..............
+
+ The input/output statements are as follows:
+
+`getline'
+ Set `$0' from next input record; set `NF', `NR', `FNR'.
+
+`getline <FILE'
+ Set `$0' from next record of FILE; set `NF'.
+
+`getline VAR'
+ Set VAR from next input record; set `NR', `FNR'.
+
+`getline VAR <FILE'
+ Set VAR from next record of FILE.
+
+`next'
+ Stop processing the current input record. The next input record
+ is read and processing starts over with the first pattern in the
+ `awk' program. If the end of the input data is reached, the `END'
+ rule(s), if any, are executed.
+
+`next file'
+ Stop processing the current input file. The next input record
+ read comes from the next input file. `FILENAME' is updated, `FNR'
+ is set to 1, and processing starts over with the first pattern in
+ the `awk' program. If the end of the input data is reached, the
+ `END' rule(s), if any, are executed.
+
+`print'
+ Prints the current record.
+
+`print EXPR-LIST'
+ Prints expressions.
+
+`print EXPR-LIST > FILE'
+ Prints expressions on FILE.
+
+`printf FMT, EXPR-LIST'
+ Format and print.
+
+`printf FMT, EXPR-LIST > FILE'
+ Format and print on FILE.
+
+ Other input/output redirections are also allowed. For `print' and
+`printf', `>> FILE' appends output to the FILE, and `| COMMAND' writes
+on a pipe. In a similar fashion, `COMMAND | getline' pipes input into
+`getline'. `getline' returns 1 when it reads a record, 0 on end of
+file, and -1 on an error.
+
+ *Note Explicit Input with `getline': Getline, for a full description
+of the `getline' statement. *Note Printing Output: Printing, for a
+full description of `print' and `printf'. Finally, *note The `next'
+Statement: Next Statement., for a description of how the `next'
+statement works.
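Because `getline' can return -1, a loop that reads a file should test for a result greater than zero rather than for a merely nonzero one. The sketch below assumes a POSIX `awk'; the function name `count_records' and the error message are invented for the example.

```sh
# count_records is a hypothetical helper: count the records in a file
# using the `getline VAR <FILE' form.
count_records() {
    awk -v file="$1" 'BEGIN {
        n = 0
        # getline gives 1 per record read, 0 at end of file, -1 on error
        while ((status = (getline line < file)) > 0)
            n++
        if (status < 0) {
            print "cannot read " file
            exit 1
        }
        close(file)
        print n
    }'
}

tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
count_records "$tmp"
rm -f "$tmp"
```

Writing the test as `while (getline line < file)' would loop forever on an unreadable file, since -1 counts as true.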
+
+
+File: gawk.info, Node: Printf Summary, Next: Special File Summary, Prev: I/O Summary, Up: Actions Summary
+
+`printf' Summary
+................
+
+ The `awk' `printf' statement and `sprintf' function accept the
+following conversion specification formats:
+
+`%c'
+ An ASCII character. If the argument used for `%c' is numeric, it
+ is treated as a character and printed. Otherwise, the argument is
+ assumed to be a string, and only the first character of that
+ string is printed.
+
+`%d'
+`%i'
+ A decimal number (the integer part).
+
+`%e'
+ A floating point number of the form `[-]d.ddddddE[+-]dd'.
+
+`%f'
+ A floating point number of the form [`-']`ddd.dddddd'.
+
+`%g'
+ Use `%e' or `%f' conversion, whichever produces a shorter string,
+ with nonsignificant zeros suppressed.
+
+`%o'
+ An unsigned octal number (again, an integer).
+
+`%s'
+ A character string.
+
+`%x'
+ An unsigned hexadecimal number (an integer).
+
+`%X'
+ Like `%x', except use `A' through `F' instead of `a' through `f'
+ for decimal 10 through 15.
+
+`%%'
+ A single `%' character; no argument is converted.
+
+ There are optional, additional parameters that may lie between the
+`%' and the control letter:
+
+`-'
+ The expression should be left-justified within its field.
+
+`WIDTH'
+ The field should be padded to this width. If WIDTH has a leading
+ zero, then the field is padded with zeros. Otherwise it is padded
+ with blanks.
+
+`.PREC'
+ A number indicating the maximum width of strings or digits to the
+ right of the decimal point.
+
+ Either or both of the WIDTH and PREC values may be specified as `*'.
+In that case, the particular value is taken from the argument list.
+
+ *Note Using `printf' Statements for Fancier Printing: Printf, for
+examples and for a more detailed description.
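The modifiers are easiest to see side by side. The following sketch assumes a POSIX `awk' (the `*' form is shown last, since not every small `awk' implementation may accept it); `printf_demo' is a hypothetical name.

```sh
# printf_demo is a hypothetical wrapper; brackets mark the field edges.
printf_demo() {
    awk 'BEGIN {
        printf "[%5d]\n", 42          # padded with blanks to width 5
        printf "[%-5d]\n", 42         # left-justified within the field
        printf "[%05d]\n", 42         # leading zero in WIDTH pads with zeros
        printf "[%.2f]\n", 3.14159    # two digits after the decimal point
        printf "[%.3s]\n", "abcdef"   # at most three characters of the string
        printf "[%c]\n", 65           # numeric argument printed as a character
        printf "[%*d]\n", 4, 7        # WIDTH taken from the argument list
    }'
}

printf_demo
```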
+
+
+File: gawk.info, Node: Special File Summary, Next: Numeric Functions Summary, Prev: Printf Summary, Up: Actions Summary
+
+Special File Names
+..................
+
+ When doing I/O redirection from either `print' or `printf' into a
+file, or via `getline' from a file, `gawk' recognizes certain special
+file names internally. These file names allow access to open file
+descriptors inherited from `gawk''s parent process (usually the shell).
+The file names are:
+
+`/dev/stdin'
+ The standard input.
+
+`/dev/stdout'
+ The standard output.
+
+`/dev/stderr'
+ The standard error output.
+
+`/dev/fd/N'
+ The file denoted by the open file descriptor N.
+
+ In addition the following files provide process related information
+about the running `gawk' program.
+
+`/dev/pid'
+ Reading this file returns the process ID of the current process,
+ in decimal, terminated with a newline.
+
+`/dev/ppid'
+ Reading this file returns the parent process ID of the current
+ process, in decimal, terminated with a newline.
+
+`/dev/pgrpid'
+ Reading this file returns the process group ID of the current
+ process, in decimal, terminated with a newline.
+
+`/dev/user'
+ Reading this file returns a single record terminated with a
+ newline. The fields are separated with blanks. The fields
+ represent the following information:
+
+ `$1'
+ The value of the `getuid' system call.
+
+ `$2'
+ The value of the `geteuid' system call.
+
+ `$3'
+ The value of the `getgid' system call.
+
+ `$4'
+ The value of the `getegid' system call.
+
+ If there are any additional fields, they are the group IDs
+ returned by the `getgroups' system call. (Multiple groups may not be
+ supported on all systems.)
+
+These file names may also be used on the command line to name data
+files. These file names are only recognized internally if you do not
+actually have files by these names on your system.
+
+ *Note Standard I/O Streams: Special Files, for a longer description
+that provides the motivation for this feature.
+
+
+File: gawk.info, Node: Numeric Functions Summary, Next: String Functions Summary, Prev: Special File Summary, Up: Actions Summary
+
+Numeric Functions
+.................
+
+ `awk' has the following predefined arithmetic functions:
+
+`atan2(Y, X)'
+ returns the arctangent of Y/X in radians.
+
+`cos(EXPR)'
+ returns the cosine of EXPR, where EXPR is in radians.
+
+`exp(EXPR)'
+ the exponential function.
+
+`int(EXPR)'
+ truncates to integer.
+
+`log(EXPR)'
+ the natural logarithm function.
+
+`rand()'
+ returns a random number between 0 and 1.
+
+`sin(EXPR)'
+ returns the sine of EXPR, where EXPR is in radians.
+
+`sqrt(EXPR)'
+ the square root function.
+
+`srand(EXPR)'
+ use EXPR as a new seed for the random number generator. If no EXPR
+ is provided, the time of day is used. The return value is the
+ previous seed for the random number generator.
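A few calls illustrate the less obvious points: `int' truncates toward zero, `atan2' takes its arguments in the order Y, X, and reseeding with the same value repeats the random sequence. This is a sketch assuming a POSIX `awk'; `numeric_demo' is a hypothetical name.

```sh
# numeric_demo is a hypothetical wrapper around a BEGIN-only program.
numeric_demo() {
    awk 'BEGIN {
        print int(3.9), int(-3.9)        # truncation, not rounding
        printf "%.4f\n", atan2(0, -1)    # the angle of the point (-1, 0) is pi
        print exp(0), sqrt(16)
        srand(1); a = rand()
        srand(1); b = rand()
        print (a == b)                   # same seed, same first value
    }'
}

numeric_demo
```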
+
+
+File: gawk.info, Node: String Functions Summary, Next: Time Functions Summary, Prev: Numeric Functions Summary, Up: Actions Summary
+
+String Functions
+................
+
+ `awk' has the following predefined string functions:
+
+`gsub(R, S, T)'
+ for each substring matching the regular expression R in the string
+ T, substitute the string S, and return the number of substitutions.
+ If T is not supplied, use `$0'.
+
+`index(S, T)'
+ returns the index of the string T in the string S, or 0 if T is
+ not present.
+
+`length(S)'
+ returns the length of the string S. The length of `$0' is
+ returned if no argument is supplied.
+
+`match(S, R)'
+ returns the position in S where the regular expression R occurs,
+ or 0 if R is not present, and sets the values of `RSTART' and
+ `RLENGTH'.
+
+`split(S, A, R)'
+ splits the string S into the array A on the regular expression R,
+ and returns the number of fields. If R is omitted, `FS' is used
+ instead.
+
+`sprintf(FMT, EXPR-LIST)'
+ prints EXPR-LIST according to FMT, and returns the resulting
+ string.
+
+`sub(R, S, T)'
+ this is just like `gsub', but only the first matching substring is
+ replaced.
+
+`substr(S, I, N)'
+ returns the N-character substring of S starting at I. If N is
+ omitted, the rest of S is used.
+
+`tolower(STR)'
+ returns a copy of the string STR, with all the upper-case
+ characters in STR translated to their corresponding lower-case
+ counterparts. Nonalphabetic characters are left unchanged.
+
+`toupper(STR)'
+ returns a copy of the string STR, with all the lower-case
+ characters in STR translated to their corresponding upper-case
+ counterparts. Nonalphabetic characters are left unchanged.
+
+`system(CMD-LINE)'
+ Execute the command CMD-LINE, and return the exit status.
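The string functions combine naturally. The sketch below, assuming a POSIX `awk', exercises several of them on one string; `string_demo' is a hypothetical name. Note that `gsub' modifies its third argument in place and returns only the count.

```sh
# string_demo is a hypothetical wrapper around a BEGIN-only program.
string_demo() {
    awk 'BEGIN {
        s = "hello world"
        print index(s, "wor")            # position of the substring
        print substr(s, 1, 5)            # first five characters
        print toupper(substr(s, 7))      # N omitted: rest of the string
        n = gsub(/o/, "0", s)            # s is changed in place
        print n, s
        if (match(s, /w[0-9a-z]+/))
            print RSTART, RLENGTH        # set as a side effect of match
        k = split("a:b:c", parts, ":")
        print k, parts[3]
    }'
}

string_demo
```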
+
+
+File: gawk.info, Node: Time Functions Summary, Next: String Constants Summary, Prev: String Functions Summary, Up: Actions Summary
+
+Built-in Time Functions
+.......................
+
+ The following two functions are available for getting the current
+time of day, and for formatting time stamps.
+
+`systime()'
+ returns the current time of day as the number of seconds since a
+ particular epoch (Midnight, January 1, 1970 UTC, on POSIX systems).
+
+`strftime(FORMAT, TIMESTAMP)'
+ formats TIMESTAMP according to the specification in FORMAT. The
+ current time of day is used if no TIMESTAMP is supplied. *Note
+ Functions for Dealing with Time Stamps: Time Functions, for the
+ details on the conversion specifiers that `strftime' accepts.
+
+
+File: gawk.info, Node: String Constants Summary, Prev: Time Functions Summary, Up: Actions Summary
+
+String Constants
+................
+
+ String constants in `awk' are sequences of characters enclosed
+between double quotes (`"'). Within strings, certain "escape sequences"
+are recognized, as in C. These are:
+
+`\\'
+ A literal backslash.
+
+`\a'
+ The "alert" character; usually the ASCII BEL character.
+
+`\b'
+ Backspace.
+
+`\f'
+ Formfeed.
+
+`\n'
+ Newline.
+
+`\r'
+ Carriage return.
+
+`\t'
+ Horizontal tab.
+
+`\v'
+ Vertical tab.
+
+`\xHEX DIGITS'
+ The character represented by the string of hexadecimal digits
+ following the `\x'. As in ANSI C, all following hexadecimal
+ digits are considered part of the escape sequence. (This feature
+ should tell us something about language design by committee.)
+ E.g., `"\x1B"' is a string containing the ASCII ESC (escape)
+ character. (The `\x' escape sequence is not in POSIX `awk'.)
+
+`\DDD'
+ The character represented by the 1-, 2-, or 3-digit sequence of
+ octal digits. Thus, `"\033"' is also a string containing the
+ ASCII ESC (escape) character.
+
+`\C'
+ The literal character C.
+
+ The escape sequences may also be used inside constant regular
+expressions (e.g., the regexp `/[ \t\f\n\r\v]/' matches whitespace
+characters).
+
+ *Note Constant Expressions: Constants.
+
+
+File: gawk.info, Node: Functions Summary, Next: Historical Features, Prev: Rules Summary, Up: Gawk Summary
+
+Functions
+=========
+
+ Functions in `awk' are defined as follows:
+
+ function NAME(PARAMETER-LIST) { STATEMENTS }
+
+ Actual parameters supplied in the function call are used to
+instantiate the formal parameters declared in the function. Arrays are
+passed by reference, other variables are passed by value.
+
+ If there are fewer arguments passed than there are names in
+PARAMETER-LIST, the extra names are given the null string as value.
+Extra names have the effect of local variables.
+
+ The open-parenthesis in a function call of a user-defined function
+must immediately follow the function name, without any intervening
+white space. This is to avoid a syntactic ambiguity with the
+concatenation operator.
+
+ The word `func' may be used in place of `function' (but not in POSIX
+`awk').
+
+ Use the `return' statement to return a value from a function.
+
+ *Note User-defined Functions: User-defined, for a more complete
+description.
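The two parameter-passing rules, and the use of extra names as local variables, can be sketched as follows. The functions `sum' and `clear_first' and the wrapper `function_demo' are hypothetical, and a POSIX `awk' is assumed. The extra spaces in the parameter list are only a common convention for marking the locals.

```sh
# function_demo is a hypothetical wrapper around the awk program.
function_demo() {
    awk '
    # i and total are extra parameters, used here as local variables
    function sum(arr, n,    i, total) {
        for (i = 1; i <= n; i++)
            total += arr[i]
        return total
    }
    # arrays are passed by reference, so the caller sees this change
    function clear_first(arr) { arr[1] = 0 }
    BEGIN {
        n = split("10 20 30", v, " ")
        print sum(v, n)
        clear_first(v)
        print sum(v, n)
        print (i == "")    # the global i was never touched
    }'
}

function_demo
```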
+
+
+File: gawk.info, Node: Historical Features, Prev: Functions Summary, Up: Gawk Summary
+
+Historical Features
+===================
+
+ There are two features of historical `awk' implementations that
+`gawk' supports. First, it is possible to call the `length' built-in
+function not only with no arguments, but even without parentheses!
+
+ a = length
+
+is the same as either of
+
+ a = length()
+ a = length($0)
+
+This feature is marked as "deprecated" in the POSIX standard, and
+`gawk' will issue a warning about its use if `-W lint' is specified on
+the command line.
+
+ The other feature is the use of the `continue' statement outside the
+body of a `while', `for', or `do' loop. Traditional `awk'
+implementations have treated such usage as equivalent to the `next'
+statement. `gawk' will support this usage if `-W posix' has not been
+specified.
+
+
+File: gawk.info, Node: Sample Program, Next: Bugs, Prev: Gawk Summary, Up: Top
+
+Sample Program
+**************
+
+ The following example is a complete `awk' program, which prints the
+number of occurrences of each word in its input. It illustrates the
+associative nature of `awk' arrays by using strings as subscripts. It
+also demonstrates the `for X in ARRAY' construction. Finally, it shows
+how `awk' can be used in conjunction with other utility programs to do
+a useful task of some complexity with a minimum of effort. Some
+explanations follow the program listing.
+
+ awk '
+ # Print list of word frequencies
+ {
+ for (i = 1; i <= NF; i++)
+ freq[$i]++
+ }
+
+ END {
+ for (word in freq)
+ printf "%s\t%d\n", word, freq[word]
+ }'
+
+ The first thing to notice about this program is that it has two
+rules. The first rule, because it has an empty pattern, is executed on
+every line of the input. It uses `awk''s field-accessing mechanism
+(*note Examining Fields: Fields.) to pick out the individual words from
+the line, and the built-in variable `NF' (*note Built-in Variables::.)
+to know how many fields are available.
+
+ For each input word, an element of the array `freq' is incremented to
+reflect that the word has been seen an additional time.
+
+ The second rule, because it has the pattern `END', is not executed
+until the input has been exhausted. It prints out the contents of the
+`freq' table that has been built up inside the first action.
+
+ Note that this program has several problems that would prevent it
+from being useful by itself on real text files:
+
+ * Words are detected using the `awk' convention that fields are
+ separated by whitespace and that other characters in the input
+ (except newlines) don't have any special meaning to `awk'. This
+ means that punctuation characters count as part of words.
+
+ * The `awk' language considers upper and lower case characters to be
+ distinct. Therefore, `foo' and `Foo' are not treated by this
+ program as the same word. This is undesirable since in normal
+ text, words are capitalized if they begin sentences, and a
+ frequency analyzer should not be sensitive to that.
+
+ * The output does not come out in any useful order. You're more
+ likely to be interested in which words occur most frequently, or
+ having an alphabetized table of how frequently each word occurs.
+
+ The way to solve these problems is to use some of the more advanced
+features of the `awk' language. First, we use `tolower' to remove case
+distinctions. Next, we use `gsub' to remove punctuation characters.
+Finally, we use the system `sort' utility to process the output of the
+`awk' script. First, here is the new version of the program:
+
+ awk '
+ # Print list of word frequencies
+ {
+ $0 = tolower($0) # remove case distinctions
+ gsub(/[^a-z0-9_ \t]/, "", $0) # remove punctuation
+ for (i = 1; i <= NF; i++)
+ freq[$i]++
+ }
+
+ END {
+ for (word in freq)
+ printf "%s\t%d\n", word, freq[word]
+ }'
+
+ Assuming we have saved this program in a file named `frequency.awk',
+and that the data is in `file1', the following pipeline
+
+ awk -f frequency.awk file1 | sort +1 -nr
+
+produces a table of the words appearing in `file1' in order of
+decreasing frequency.
+
+ The `awk' program suitably massages the data and produces a word
+frequency table, which is not ordered.
+
+ The `awk' script's output is then sorted by the `sort' command and
+printed on the terminal. The options given to `sort' in this example
+specify to sort using the second field of each input line (skipping one
+field), that the sort keys should be treated as numeric quantities
+(otherwise `15' would come before `5'), and that the sorting should be
+done in descending (reverse) order.
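The `+1' key syntax is obsolescent and may be rejected by current `sort' implementations. The POSIX `-k' option expresses the same key. The following is a sketch only, assuming tab-separated word and count lines like those the program produces; `by_frequency' is a hypothetical name.

```sh
# by_frequency is a hypothetical helper: sort "word<TAB>count" lines
# by the second field, numerically, in descending order.
by_frequency() {
    sort -k2,2nr
}

printf 'the\t15\nawk\t5\nof\t9\n' | by_frequency
```

With this form the pipeline would read `awk -f frequency.awk file1 | sort -k2,2nr'.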
+
+ We could have even done the `sort' from within the program, by
+changing the `END' action to:
+
+ END {
+ sort = "sort +1 -nr"
+ for (word in freq)
+ printf "%s\t%d\n", word, freq[word] | sort
+ close(sort)
+ }'
+
+ See the general operating system documentation for more information
+on how to use the `sort' command.
+
+
+File: gawk.info, Node: Bugs, Next: Notes, Prev: Sample Program, Up: Top
+
+Reporting Problems and Bugs
+***************************
+
+ If you have problems with `gawk' or think that you have found a bug,
+please report it to the developers; we cannot promise to do anything
+but we might well want to fix it.
+
+ Before reporting a bug, make sure you have actually found a real bug.
+Carefully reread the documentation and see if it really says you can do
+what you're trying to do. If it's not clear whether you should be able
+to do something or not, report that too; it's a bug in the
+documentation!
+
+ Before reporting a bug or trying to fix it yourself, try to isolate
+it to the smallest possible `awk' program and input data file that
+reproduces the problem. Then send us the program and data file, some
+idea of what kind of Unix system you're using, and the exact results
+`gawk' gave you. Also say what you expected to occur; this will help
+us decide whether the problem was really in the documentation.
+
+ Once you have a precise problem, send e-mail to (Internet)
+`bug-gnu-utils@prep.ai.mit.edu' or (UUCP)
+`mit-eddie!prep.ai.mit.edu!bug-gnu-utils'. Please include the version
+number of `gawk' you are using. You can get this information with the
+command `gawk -W version '{}' /dev/null'. You should send carbon
+copies of your mail to David Trueman at `david@cs.dal.ca', and to
+Arnold Robbins, who can be reached at `arnold@skeeve.atl.ga.us'. David
+is most likely to fix code problems, while Arnold is most likely to fix
+documentation problems.
+
+ Non-bug suggestions are always welcome as well. If you have
+questions about things that are unclear in the documentation or are
+just obscure features, ask Arnold Robbins; he will try to help you out,
+although he may not have the time to fix the problem. You can send him
+electronic mail at the Internet address above.
+
+ If you find bugs in one of the non-Unix ports of `gawk', please send
+an electronic mail message to the person who maintains that port. They
+are listed below, and also in the `README' file in the `gawk'
+distribution. Information in the `README' file should be considered
+authoritative if it conflicts with this manual.
+
+ The people maintaining the non-Unix ports of `gawk' are:
+
+MS-DOS
+ The port to MS-DOS is maintained by Scott Deifik. His electronic
+ mail address is `scottd@amgen.com'.
+
+VMS
+ The port to VAX VMS is maintained by Pat Rankin. His electronic
+ mail address is `rankin@eql.caltech.edu'.
+
+Atari ST
+ The port to the Atari ST is maintained by Michal Jaegermann. His
+ electronic mail address is `ntomczak@vm.ucs.ualberta.ca'.
+
+ If your bug is also reproducible under Unix, please send copies of
+your report to the general GNU bug list, as well as to Arnold Robbins
+and David Trueman, at the addresses listed above.
+
+
+File: gawk.info, Node: Notes, Next: Glossary, Prev: Bugs, Up: Top
+
+Implementation Notes
+********************
+
+ This appendix contains information mainly of interest to
+implementors and maintainers of `gawk'. Everything in it applies
+specifically to `gawk', and not to other implementations.
+
+* Menu:
+
+* Compatibility Mode:: How to disable certain `gawk' extensions.
+* Future Extensions:: New features we may implement soon.
+* Improvements:: Suggestions for improvements by volunteers.
+
+
+File: gawk.info, Node: Compatibility Mode, Next: Future Extensions, Prev: Notes, Up: Notes
+
+Downward Compatibility and Debugging
+====================================
+
+ *Note Extensions in `gawk' not in POSIX `awk': POSIX/GNU, for a
+summary of the GNU extensions to the `awk' language and program. All
+of these features can be turned off by invoking `gawk' with the `-W
+compat' option, or with the `-W posix' option.
+
+ If `gawk' is compiled for debugging with `-DDEBUG', then there is
+one more option available on the command line:
+
+`-W parsedebug'
+ Print out the parse stack information as the program is being
+ parsed.
+
+ This option is intended only for serious `gawk' developers, and not
+for the casual user. It probably has not even been compiled into your
+version of `gawk', since it slows down execution.
+
+
+File: gawk.info, Node: Future Extensions, Next: Improvements, Prev: Compatibility Mode, Up: Notes
+
+Probable Future Extensions
+==========================
+
+ This section briefly lists extensions that indicate the directions
+we are currently considering for `gawk'. The file `FUTURES' in the
+`gawk' distributions lists these extensions, as well as several others.
+
+`RS' as a regexp
+ The meaning of `RS' may be generalized along the lines of `FS'.
+
+Control of subprocess environment
+ Changes made in `gawk' to the array `ENVIRON' may be propagated to
+ subprocesses run by `gawk'.
+
+Databases
+ It may be possible to map a GDBM/NDBM/SDBM file into an `awk'
+ array.
+
+Single-character fields
+ The null string, `""', as a field separator, will cause field
+ splitting and the `split' function to separate individual
+ characters. Thus, `split("abcd", a, "")' would yield `a[1] ==
+ "a"', `a[2] == "b"', and so on.
+
+More `lint' warnings
+ There are more things that could be checked for portability.
+
+`RECLEN' variable for fixed length records
+ Along with `FIELDWIDTHS', this would speed up the processing of
+ fixed-length records.
+
+`RT' variable to hold the record terminator
+ It is occasionally useful to have access to the actual string of
+ characters that matched the `RS' variable. The `RT' variable
+ would hold these characters.
+
+A `restart' keyword
+ After modifying `$0', `restart' would restart the pattern matching
+ loop, without reading a new record from the input.
+
+A `|&' redirection
+ The `|&' redirection, in place of `|', would open a two-way
+ pipeline for communication with a sub-process (via `getline' and
+ `print' and `printf').
+
+`IGNORECASE' affecting all comparisons
+ The effects of the `IGNORECASE' variable may be generalized to all
+ string comparisons, and not just regular expression operations.
+
+A way to mix command line source code and library files
+ There may be a new option that would make it possible to easily
+ use library functions from a program entered on the command line.
+
+GNU-style long options
+ We will add GNU-style long options to `gawk' for compatibility
+ with other GNU programs. (For example, `--field-separator=:'
+ would be equivalent to `-F:'.)
+
+
+File: gawk.info, Node: Improvements, Prev: Future Extensions, Up: Notes
+
+Suggestions for Improvements
+============================
+
+ Here are some projects that would-be `gawk' hackers might like to
+take on. They vary in size from a few days to a few weeks of
+programming, depending on which one you choose and how fast a
+programmer you are. Please send any improvements you write to the
+maintainers at the GNU project.
+
+ 1. Compilation of `awk' programs: `gawk' uses a Bison (YACC-like)
+ parser to convert the script given it into a syntax tree; the
+ syntax tree is then executed by a simple recursive evaluator.
+ This method incurs a lot of overhead, since the recursive
+ evaluator performs many procedure calls to do even the simplest
+ things.
+
+ It should be possible for `gawk' to convert the script's parse tree
+ into a C program which the user would then compile, using the
+ normal C compiler and a special `gawk' library to provide all the
+ needed functions (regexps, fields, associative arrays, type
+ coercion, and so on).
+
+ An easier possibility might be for an intermediate phase of `awk'
+ to convert the parse tree into a linear byte code form like the
+ one used in GNU Emacs Lisp. The recursive evaluator would then be
+ replaced by a straight line byte code interpreter that would be
+ intermediate in speed between running a compiled program and doing
+ what `gawk' does now.
+
+ This may actually happen for the 3.0 version of `gawk'.
+
+ 2. An error message section has not been included in this version of
+ the manual. Perhaps some nice beta testers will document some of
+ the messages for the future.
+
+ 3. The programs in the test suite could use documenting in this
+ manual.
+
+ 4. The programs and data files in the manual should be available in
+ separate files to facilitate experimentation.
+
+ 5. See the `FUTURES' file for more ideas. Contact us if you would
+ seriously like to tackle any of the items listed there.
+
+
+File: gawk.info, Node: Glossary, Next: Index, Prev: Notes, Up: Top
+
+Glossary
+********
+
+Action
+ A series of `awk' statements attached to a rule. If the rule's
+ pattern matches an input record, the `awk' language executes the
+ rule's action. Actions are always enclosed in curly braces.
+ *Note Overview of Actions: Actions.
+
+Amazing `awk' Assembler
+ Henry Spencer at the University of Toronto wrote a retargetable
+ assembler completely as `awk' scripts. It is thousands of lines
+ long, including machine descriptions for several 8-bit
+ microcomputers. It is a good example of a program that would have
+ been better written in another language.
+
+ANSI
+ The American National Standards Institute. This organization
+ produces many standards, among them the standard for the C
+ programming language.
+
+Assignment
+ An `awk' expression that changes the value of some `awk' variable
+ or data object. An object that you can assign to is called an
+ "lvalue". *Note Assignment Expressions: Assignment Ops.
+
+`awk' Language
+ The language in which `awk' programs are written.
+
+`awk' Program
+ An `awk' program consists of a series of "patterns" and "actions",
+ collectively known as "rules". For each input record given to the
+ program, the program's rules are all processed in turn. `awk'
+ programs may also contain function definitions.
+
+`awk' Script
+ Another name for an `awk' program.
+
+Built-in Function
+ The `awk' language provides built-in functions that perform various
+ numerical, time stamp related, and string computations. Examples
+ are `sqrt' (for the square root of a number) and `substr' (for a
+ substring of a string). *Note Built-in Functions: Built-in.
+
+Built-in Variable
+ `ARGC', `ARGIND', `ARGV', `CONVFMT', `ENVIRON', `ERRNO',
+ `FIELDWIDTHS', `FILENAME', `FNR', `FS', `IGNORECASE', `NF', `NR',
+ `OFMT', `OFS', `ORS', `RLENGTH', `RSTART', `RS', and `SUBSEP' are
+ the variables that have special meaning to `awk'. Changing some
+ of them affects `awk''s running environment. *Note Built-in
+ Variables::.
+
+Braces
+ See "Curly Braces."
+
+C
+ The system programming language that most GNU software is written
+ in. The `awk' programming language has C-like syntax, and this
+ manual points out similarities between `awk' and C when
+ appropriate.
+
+CHEM
+ A preprocessor for `pic' that reads descriptions of molecules and
+ produces `pic' input for drawing them. It was written by Brian
+ Kernighan, and is available from `netlib@research.att.com'.
+
+Compound Statement
+ A series of `awk' statements, enclosed in curly braces. Compound
+ statements may be nested. *Note Control Statements in Actions:
+ Statements.
+
+Concatenation
+ Concatenating two strings means sticking them together, one after
+ another, giving a new string. For example, the string `foo'
+ concatenated with the string `bar' gives the string `foobar'.
+ *Note String Concatenation: Concatenation.
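+
+ A minimal sketch of the example above, assuming a POSIX `awk' is
+ on the PATH; the shell variable `result' is illustrative only.
+ Writing two strings side by side concatenates them:
+
```sh
# Concatenate "foo" and "bar"; awk has no explicit concatenation operator.
result=$(awk 'BEGIN { s = "foo" "bar"; print s, length(s) }')
printf '%s\n' "$result"
```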
+
+Conditional Expression
+ An expression using the `?:' ternary operator, such as `EXPR1 ?
+ EXPR2 : EXPR3'. The expression EXPR1 is evaluated; if the result
+ is true, the value of the whole expression is the value of EXPR2;
+ otherwise the value is EXPR3. In either case, only one of EXPR2
+ and EXPR3 is evaluated. *Note Conditional Expressions:
+ Conditional Exp.
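+
+ As a small illustration (a sketch, assuming a POSIX `awk'; the
+ names `n', `v', and `result' are made up for the example), the
+ unselected branch is never evaluated, so `n' keeps its value:
+
```sh
# Only the selected branch runs, so n++ is never evaluated here.
result=$(awk 'BEGIN { n = 0; v = (1 ? "yes" : n++); print v, n }')
printf '%s\n' "$result"
```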
+
+Constant Regular Expression
+ A constant regular expression is a regular expression written
+ within slashes, such as `/foo/'. This regular expression is chosen
+ when you write the `awk' program, and cannot be changed during its
+ execution. *Note How to Use Regular Expressions: Regexp Usage.
+
+Comparison Expression
+ A relation that is either true or false, such as `(a < b)'.
+ Comparison expressions are used in `if', `while', and `for'
+ statements, and in patterns to select which input records to
+ process. *Note Comparison Expressions: Comparison Ops.
+
+Curly Braces
+ The characters `{' and `}'. Curly braces are used in `awk' for
+ delimiting actions, compound statements, and function bodies.
+
+Data Objects
+ These are numbers and strings of characters. Numbers are
+ converted into strings and vice versa, as needed. *Note
+ Conversion of Strings and Numbers: Conversion.
+
+Dynamic Regular Expression
+ A dynamic regular expression is a regular expression written as an
+ ordinary expression. It could be a string constant, such as
+ `"foo"', but it may also be an expression whose value may vary.
+ *Note How to Use Regular Expressions: Regexp Usage.
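+
+ A sketch of a dynamic regexp, assuming a POSIX `awk'; the variable
+ `pat' and the sample input are hypothetical. The pattern is passed
+ in as a string and used with the `~' operator:
+
```sh
# The pattern arrives as a string at run time and is matched with ~.
result=$(printf 'foo\nfo\nbar\n' |
    awk -v pat='^fo+$' '$0 ~ pat { n++ } END { print n + 0 }')
printf '%s\n' "$result"
```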
+
+Escape Sequences
+ A special sequence of characters used for describing nonprinting
+ characters, such as `\n' for newline, or `\033' for the ASCII ESC
+ (escape) character. *Note Constant Expressions: Constants.
+
+Field
+ When `awk' reads an input record, it splits the record into pieces
+ separated by whitespace (or by a separator regexp which you can
+ change by setting the built-in variable `FS'). Such pieces are
+ called fields. If the pieces are of fixed length, you can use the
+ built-in variable `FIELDWIDTHS' to describe their lengths. *Note
+ How Input is Split into Records: Records.
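+
+ A brief sketch of field splitting, assuming a POSIX `awk'; the
+ sample records and the shell variables `by_space' and `by_colon'
+ are illustrative. (`FIELDWIDTHS' is specific to `gawk' and is not
+ used here.)
+
```sh
# Default splitting on runs of blanks and tabs.
by_space=$(printf 'a  b\tc\n' | awk '{ print $2, NF }')
# Splitting on a colon by setting FS with the -F option.
by_colon=$(printf 'root:x:0\n' | awk -F: '{ print $1, NF }')
printf '%s\n%s\n' "$by_space" "$by_colon"
```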
+
+Format
+ Format strings are used to control the appearance of output in the
+ `printf' statement. Also, data conversions from numbers to strings
+ are controlled by the format string contained in the built-in
+ variable `CONVFMT'. *Note Format-Control Letters: Control Letters.
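+
+ The two uses of format strings can be sketched as follows,
+ assuming a POSIX `awk'; the values and the names `x', `s', and
+ `result' are chosen only for illustration:
+
```sh
# printf formats explicitly; CONVFMT governs number-to-string conversion.
result=$(awk 'BEGIN {
    printf "%5.2f|%-4s|\n", 3.14159, "ab"
    CONVFMT = "%.2f"
    x = 3.14159
    s = x ""        # concatenation forces a conversion through CONVFMT
    print s
}')
printf '%s\n' "$result"
```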
+
+Function
+ A specialized group of statements often used to encapsulate general
+ or program-specific tasks. `awk' has a number of built-in
+ functions, and also allows you to define your own. *Note Built-in
+ Functions: Built-in. Also, see *Note User-defined Functions:
+ User-defined.
+
+`gawk'
+ The GNU implementation of `awk'.
+
+GNU
+ "GNU's not Unix". An on-going project of the Free Software
+ Foundation to create a complete, freely distributable,
+ POSIX-compliant computing environment.
+
+Input Record
+ A single chunk of data read in by `awk'. Usually, an `awk' input
+ record consists of one line of text. *Note How Input is Split
+ into Records: Records.
+
+Keyword
+ In the `awk' language, a keyword is a word that has special
+ meaning. Keywords are reserved and may not be used as variable
+ names.
+
+ `awk''s keywords are: `if', `else', `while', `do...while', `for',
+ `for...in', `break', `continue', `delete', `next', `function',
+ `func', and `exit'.
+
+Lvalue
+ An expression that can appear on the left side of an assignment
+ operator. In most languages, lvalues can be variables or array
+ elements. In `awk', a field designator can also be used as an
+ lvalue.
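+
+ A short sketch of a field designator used as an lvalue, assuming a
+ POSIX `awk'; the input record is made up for the example:
+
```sh
# Assigning to $2 changes the field and rebuilds the record.
result=$(printf 'a b c\n' | awk '{ $2 = "X"; print }')
printf '%s\n' "$result"
```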
+
+Number
+ A numeric valued data object. The `gawk' implementation uses
+ double precision floating point to represent numbers.
+
+Pattern
+ Patterns tell `awk' which input records are interesting to which
+ rules.
+
+ A pattern is an arbitrary conditional expression against which
+ input is tested. If the condition is satisfied, the pattern is
+ said to "match" the input record. A typical pattern might compare
+ the input record against a regular expression. *Note Patterns::.
+
+POSIX
+ The name for a series of standards being developed by the IEEE
+ that specify a Portable Operating System interface. The "IX"
+ denotes the Unix heritage of these standards. The main standard
+ of interest for `awk' users is P1003.2, the Command Language and
+ Utilities standard.
+
+Range (of input lines)
+ A sequence of consecutive lines from the input file. A pattern
+ can specify ranges of input lines for `awk' to process, or it can
+ specify single lines. *Note Patterns::.
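+
+ A minimal sketch of a range pattern, assuming a POSIX `awk'; the
+ marker words `start' and `end' are hypothetical. With no action
+ given, each line in the range is printed:
+
```sh
# Select every line from the first match of /start/ through /end/.
result=$(printf 'a\nstart\nb\nend\nc\n' | awk '/start/,/end/')
printf '%s\n' "$result"
```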
+
+Recursion
+ When a function calls itself, either directly or indirectly. If
+ this isn't clear, refer to the entry for "recursion."
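+
+ For a concrete case, here is a sketch of direct recursion,
+ assuming a POSIX `awk'; the function name `fact' is invented for
+ this example:
+
```sh
# fact calls itself until it reaches the base case n <= 1.
result=$(awk '
function fact(n) {
    return (n <= 1) ? 1 : n * fact(n - 1)
}
BEGIN { print fact(5) }')
printf '%s\n' "$result"
```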
+
+Redirection
+ Redirection means performing input from other than the standard
+ input stream, or output to other than the standard output stream.
+
+ You can redirect the output of the `print' and `printf' statements
+ to a file or a system command, using the `>', `>>', and `|'
+ operators. You can redirect input to the `getline' statement using
+ the `<' and `|' operators. *Note Redirecting Output of `print'
+ and `printf': Redirection.
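+
+ The operators above can be sketched together as follows, assuming
+ a POSIX `awk' plus the standard `mktemp' and `sort' utilities; the
+ names `tmp', `out', `line', and `result' are illustrative only:
+
```sh
tmp=$(mktemp)
result=$(awk -v out="$tmp" 'BEGIN {
    print "banana" > out              # > sends output to a file
    print "apple" > out               # same open file, so this appends
    close(out)
    while ((getline line < out) > 0)  # < reads records from the file
        print line | "sort"           # | pipes output to a command
    close("sort")
}')
rm -f "$tmp"
printf '%s\n' "$result"
```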
+
+Regular Expression
+ See "regexp."
+
+Regexp
+ Short for "regular expression". A regexp is a pattern that
+ denotes a set of strings, possibly an infinite set. For example,
+ the regexp `R.*xp' matches any string starting with the letter `R'
+ and ending with the letters `xp'. In `awk', regexps are used in
+ patterns and in conditional expressions. Regexps may contain
+ escape sequences. *Note Regular Expressions as Patterns: Regexp.
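+
+ A sketch of the example regexp in use, assuming a POSIX `awk'; the
+ input lines and the counters `loose' and `strict' are made up.
+ When `awk' tests a record, the match may occur anywhere in it, so
+ the anchors `^' and `$' are needed to test the whole record:
+
```sh
# Count records containing a match versus records matched in full.
result=$(printf 'Regexp\nmy Regexp here\nregexp\n' |
    awk '/R.*xp/ { loose++ } /^R.*xp$/ { strict++ } END { print loose, strict }')
printf '%s\n' "$result"
```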
+
+Rule
+ A segment of an `awk' program that specifies how to process single
+ input records. A rule consists of a "pattern" and an "action".
+ `awk' reads an input record; then, for each rule, if the input
+ record satisfies the rule's pattern, `awk' executes the rule's
+ action. Otherwise, the rule does nothing for that input record.
+
+Side Effect
+ A side effect occurs when an expression has an effect aside from
+ merely producing a value. Assignment expressions, increment
+ expressions and function calls have side effects. *Note
+ Assignment Expressions: Assignment Ops.
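+
+ A tiny sketch of a side effect, assuming a POSIX `awk'; the
+ variables `i' and `j' are illustrative. The increment expression
+ yields the old value of `i' and also changes `i':
+
```sh
# i++ produces 5 as its value and, as a side effect, sets i to 6.
result=$(awk 'BEGIN { i = 5; j = i++; print i, j }')
printf '%s\n' "$result"
```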
+
+Special File
+ A file name interpreted internally by `gawk', instead of being
+ handed directly to the underlying operating system. For example,
+ `/dev/stdin'. *Note Standard I/O Streams: Special Files.
+
+Stream Editor
+ A program that reads records from an input stream and processes
+ them one or more at a time. This is in contrast with batch
+ programs, which may expect to read their input files in entirety
+ before starting to do anything, and with interactive programs,
+ which require input from the user.
+
+String
+ A datum consisting of a sequence of characters, such as `I am a
+ string'. Constant strings are written with double-quotes in the
+ `awk' language, and may contain escape sequences. *Note Constant
+ Expressions: Constants.
+
+Whitespace
+ A sequence of blank or tab characters occurring inside an input
+ record or a string.
+