Diffstat (limited to 'gawk.info-8')
-rw-r--r-- gawk.info-8 1173
1 files changed, 0 insertions, 1173 deletions
diff --git a/gawk.info-8 b/gawk.info-8
deleted file mode 100644
index d0d693ff..00000000
--- a/gawk.info-8
+++ /dev/null
@@ -1,1173 +0,0 @@
-This is Info file gawk.info, produced by Makeinfo-1.54 from the input
-file gawk.texi.
-
- This file documents `awk', a program that you can use to select
-particular records in a file and perform operations upon them.
-
- This is Edition 0.15 of `The GAWK Manual',
-for the 2.15 version of the GNU implementation
-of AWK.
-
- Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
-
- Permission is granted to make and distribute verbatim copies of this
-manual provided the copyright notice and this permission notice are
-preserved on all copies.
-
- Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided that
-the entire resulting derived work is distributed under the terms of a
-permission notice identical to this one.
-
- Permission is granted to copy and distribute translations of this
-manual into another language, under the above conditions for modified
-versions, except that this permission notice may be stated in a
-translation approved by the Foundation.
-
-
-File: gawk.info, Node: Regexp Summary, Next: Actions Summary, Prev: Pattern Summary, Up: Rules Summary
-
-Regular Expressions
--------------------
-
- Regular expressions are the extended kind found in `egrep'. They
-are composed of characters as follows:
-
-`C'
- matches the character C (assuming C is a character with no special
- meaning in regexps).
-
-`\C'
- matches the literal character C.
-
-`.'
- matches any character except newline.
-
-`^'
- matches the beginning of a line or a string.
-
-`$'
- matches the end of a line or a string.
-
-`[ABC...]'
- matches any of the characters ABC... (character class).
-
-`[^ABC...]'
- matches any character except ABC... and newline (negated character
- class).
-
-`R1|R2'
- matches either R1 or R2 (alternation).
-
-`R1R2'
- matches R1, and then R2 (concatenation).
-
-`R+'
- matches one or more R's.
-
-`R*'
- matches zero or more R's.
-
-`R?'
- matches zero or one R's.
-
-`(R)'
- matches R (grouping).
-
- *Note Regular Expressions as Patterns: Regexp, for a more detailed
-explanation of regular expressions.
-
- The escape sequences allowed in string constants are also valid in
-regular expressions (*note Constant Expressions: Constants.).
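As a quick illustration of the operators listed above, the following shell sketch runs two constant regexps over a few made-up input lines. The first combines anchors, grouping, alternation, `+', a character class, and `?'. The second uses a negated character class and a backslash-escaped `.'. It assumes a POSIX shell and an `awk' that follows the `egrep'-style syntax described here.

```shell
# Lines made only of "ab" or "cd" groups, optionally ending in one digit.
result=$(printf '%s\n' abab cd7 abcd12 xyz ab | awk '/^(ab|cd)+[0-9]?$/')
printf '%s\n' "$result"

# Names that start with a non-digit and end in a literal ".txt";
# the backslash makes "." match only a period.
result2=$(printf '%s\n' notes.txt 1.txt notesXtxt | awk '/^[^0-9]+\.txt$/')
printf '%s\n' "$result2"
```

`abcd12' is rejected because `[0-9]?' allows at most one trailing digit.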
-
-
-File: gawk.info, Node: Actions Summary, Prev: Regexp Summary, Up: Rules Summary
-
-Actions
--------
-
- Action statements are enclosed in braces, `{' and `}'. Action
-statements consist of the usual assignment, conditional, and looping
-statements found in most languages. The operators, control statements,
-and input/output statements available are patterned after those in C.
-
-* Menu:
-
-* Operator Summary:: `awk' operators.
-* Control Flow Summary:: The control statements.
-* I/O Summary:: The I/O statements.
-* Printf Summary:: A summary of `printf'.
-* Special File Summary:: Special file names interpreted internally.
-* Numeric Functions Summary:: Built-in numeric functions.
-* String Functions Summary:: Built-in string functions.
-* Time Functions Summary:: Built-in time functions.
-* String Constants Summary:: Escape sequences in strings.
-
-
-File: gawk.info, Node: Operator Summary, Next: Control Flow Summary, Prev: Actions Summary, Up: Actions Summary
-
-Operators
-.........
-
- The operators in `awk', in order of increasing precedence, are:
-
-`= += -= *= /= %= ^='
- Assignment. Both absolute assignment (`VAR=VALUE') and operator
- assignment (the other forms) are supported.
-
-`?:'
- A conditional expression, as in C. This has the form `EXPR1 ?
- EXPR2 : EXPR3'. If EXPR1 is true, the value of the expression is
- EXPR2; otherwise it is EXPR3. Only one of EXPR2 and EXPR3 is
- evaluated.
-
-`||'
- Logical "or".
-
-`&&'
- Logical "and".
-
-`~ !~'
- Regular expression match, negated match.
-
-`< <= > >= != =='
- The usual relational operators.
-
-`BLANK'
- String concatenation.
-
-`+ -'
- Addition and subtraction.
-
-`* / %'
- Multiplication, division, and modulus.
-
-`+ - !'
- Unary plus, unary minus, and logical negation.
-
-`^'
- Exponentiation (`**' may also be used, and `**=' for the assignment
- operator, but they are not specified in the POSIX standard).
-
-`++ --'
- Increment and decrement, both prefix and postfix.
-
-`$'
- Field reference.
-
- *Note Expressions as Action Statements: Expressions, for a full
-description of all the operators listed above. *Note Examining Fields:
-Fields, for a description of the field reference operator.
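You can check the precedence rules above directly with a small `BEGIN' rule, which needs no input. This sketch assumes a POSIX shell with `awk' on the `PATH'. Each comment names the rule from the table that decides the result.

```shell
result=$(awk 'BEGIN {
    x = 2 + 3 * 4                    # * binds tighter than +: 14
    y = -2 ^ 2                       # ^ binds tighter than unary minus: -4
    s = 1 " " 2 + 3                  # + binds tighter than concatenation: "1 5"
    c = (x > 10) ? "big" : "small"   # conditional expression
    print x, y, s, c
}')
printf '%s\n' "$result"
```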
-
-
-File: gawk.info, Node: Control Flow Summary, Next: I/O Summary, Prev: Operator Summary, Up: Actions Summary
-
-Control Statements
-..................
-
- The control statements are as follows:
-
- if (CONDITION) STATEMENT [ else STATEMENT ]
- while (CONDITION) STATEMENT
- do STATEMENT while (CONDITION)
- for (EXPR1; EXPR2; EXPR3) STATEMENT
- for (VAR in ARRAY) STATEMENT
- break
- continue
- delete ARRAY[INDEX]
- exit [ EXPRESSION ]
- { STATEMENTS }
-
- *Note Control Statements in Actions: Statements, for a full
-description of all the control statements listed above.
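The following sketch uses most of the control statements listed above inside a single `BEGIN' rule. The numbers and variable names are arbitrary and only illustrate each statement.

```shell
result=$(awk 'BEGIN {
    # for, if, and continue: add up the odd numbers from 1 to 9
    sum = 0
    for (i = 1; i <= 9; i++) {
        if (i % 2 == 0)
            continue
        sum += i
    }

    # while and break: find the first power of 2 above 100
    p = 1
    while (1) {
        p *= 2
        if (p > 100)
            break
    }

    # do: the body runs once even though the condition is false
    n = 0
    do {
        n++
    } while (n < 0)

    # delete and for-in: remove one element, then count the rest
    a["x"] = 1
    a["y"] = 2
    delete a["x"]
    count = 0
    for (k in a)
        count++

    print sum, p, n, count
}')
printf '%s\n' "$result"
```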
-
-
-File: gawk.info, Node: I/O Summary, Next: Printf Summary, Prev: Control Flow Summary, Up: Actions Summary
-
-I/O Statements
-..............
-
- The input/output statements are as follows:
-
-`getline'
- Set `$0' from next input record; set `NF', `NR', `FNR'.
-
-`getline <FILE'
- Set `$0' from next record of FILE; set `NF'.
-
-`getline VAR'
- Set VAR from next input record; set `NR', `FNR'.
-
-`getline VAR <FILE'
- Set VAR from next record of FILE.
-
-`next'
- Stop processing the current input record. The next input record
- is read and processing starts over with the first pattern in the
- `awk' program. If the end of the input data is reached, the `END'
- rule(s), if any, are executed.
-
-`next file'
- Stop processing the current input file. The next input record
- read comes from the next input file. `FILENAME' is updated, `FNR'
- is set to 1, and processing starts over with the first pattern in
- the `awk' program. If the end of the input data is reached, the
- `END' rule(s), if any, are executed.
-
-`print'
- Prints the current record.
-
-`print EXPR-LIST'
- Prints expressions.
-
-`print EXPR-LIST > FILE'
- Prints expressions on FILE.
-
-`printf FMT, EXPR-LIST'
- Format and print.
-
-`printf FMT, EXPR-LIST > FILE'
- Format and print on FILE.
-
- Other input/output redirections are also allowed. For `print' and
-`printf', `>> FILE' appends output to the FILE, and `| COMMAND' writes
-on a pipe. In a similar fashion, `COMMAND | getline' pipes input into
-`getline'. `getline' returns 1 if it successfully reads a record, 0
-on end of file, and -1 on an error.
-
- *Note Explicit Input with `getline': Getline, for a full description
-of the `getline' statement. *Note Printing Output: Printing, for a
-full description of `print' and `printf'. Finally, *note The `next'
-Statement: Next Statement., for a description of how the `next'
-statement works.
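The following minimal sketch shows two of the `getline' forms described above: reading from a named file and reading from a command. The scratch file name is hypothetical. The loop continues only while `getline' returns a value greater than zero, so both end of file (0) and an error (-1) stop it.

```shell
tmp="${TMPDIR:-/tmp}/getline-demo.$$"   # hypothetical scratch file name
printf 'alpha\nbeta\n' > "$tmp"

result=$(awk -v f="$tmp" 'BEGIN {
    # "getline VAR <FILE": count the records in a named file
    n = 0
    while ((getline line < f) > 0)
        n++
    close(f)

    # "COMMAND | getline VAR": read one line of the command output
    cmd = "echo hello"
    cmd | getline greeting
    close(cmd)

    print n, greeting
}')

rm -f "$tmp"
printf '%s\n' "$result"
```

Calling `close' after each source is finished lets the same file or command be read again from the beginning later in the program.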
-
-
-File: gawk.info, Node: Printf Summary, Next: Special File Summary, Prev: I/O Summary, Up: Actions Summary
-
-`printf' Summary
-................
-
- The `awk' `printf' statement and `sprintf' function accept the
-following conversion specification formats:
-
-`%c'
- An ASCII character. If the argument used for `%c' is numeric, it
- is treated as a character and printed. Otherwise, the argument is
- assumed to be a string, and only the first character of that
- string is printed.
-
-`%d'
-`%i'
- A decimal number (the integer part).
-
-`%e'
- A floating point number of the form `[-]d.dddddde[+-]dd'.
-
-`%f'
- A floating point number of the form `[-]ddd.dddddd'.
-
-`%g'
- Use `%e' or `%f' conversion, whichever produces a shorter string,
- with nonsignificant zeros suppressed.
-
-`%o'
- An unsigned octal number (again, an integer).
-
-`%s'
- A character string.
-
-`%x'
- An unsigned hexadecimal number (an integer).
-
-`%X'
- Like `%x', except use `A' through `F' instead of `a' through `f'
- for decimal 10 through 15.
-
-`%%'
- A single `%' character; no argument is converted.
-
- There are optional, additional parameters that may lie between the
-`%' and the control letter:
-
-`-'
- The expression should be left-justified within its field.
-
-`WIDTH'
- The field should be padded to this width. If WIDTH has a leading
- zero, then the field is padded with zeros. Otherwise it is padded
- with blanks.
-
-`.PREC'
- A number indicating the maximum width of strings or digits to the
- right of the decimal point.
-
- Either or both of the WIDTH and PREC values may be specified as `*'.
-In that case, the particular value is taken from the argument list.
-
- *Note Using `printf' Statements for Fancier Printing: Printf, for
-examples and for a more detailed description.
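The following sketch shows how the modifiers above change the output of `sprintf'. The `printf' statement accepts the same formats. The values are arbitrary, and the brackets are there only to make the padding visible.

```shell
result=$(awk 'BEGIN {
    a = sprintf("[%5d]", 42)          # padded with blanks to width 5
    b = sprintf("[%-5d]", 42)         # "-": left-justified
    c = sprintf("[%05d]", 42)         # leading zero in WIDTH: zero-padded
    d = sprintf("[%.2f]", 3.14159)    # PREC: 2 digits after the decimal point
    e = sprintf("[%.3s]", "abcdef")   # PREC: at most 3 characters of a string
    f = sprintf("[%*d]", 4, 7)        # "*": WIDTH taken from the argument list
    g = sprintf("%x %X %o %c %%", 255, 255, 8, 65)
    print a b c d e f, g
}')
printf '%s\n' "$result"
```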
-
-
-File: gawk.info, Node: Special File Summary, Next: Numeric Functions Summary, Prev: Printf Summary, Up: Actions Summary
-
-Special File Names
-..................
-
- When doing I/O redirection from either `print' or `printf' into a
-file, or via `getline' from a file, `gawk' recognizes certain special
-file names internally. These file names allow access to open file
-descriptors inherited from `gawk''s parent process (usually the shell).
-The file names are:
-
-`/dev/stdin'
- The standard input.
-
-`/dev/stdout'
- The standard output.
-
-`/dev/stderr'
- The standard error output.
-
-`/dev/fd/N'
- The file denoted by the open file descriptor N.
-
- In addition, the following files provide process-related information
-about the running `gawk' program.
-
-`/dev/pid'
- Reading this file returns the process ID of the current process,
- in decimal, terminated with a newline.
-
-`/dev/ppid'
- Reading this file returns the parent process ID of the current
- process, in decimal, terminated with a newline.
-
-`/dev/pgrpid'
- Reading this file returns the process group ID of the current
- process, in decimal, terminated with a newline.
-
-`/dev/user'
- Reading this file returns a single record terminated with a
- newline. The fields are separated with blanks. The fields
- represent the following information:
-
- `$1'
- The value of the `getuid' system call.
-
- `$2'
- The value of the `geteuid' system call.
-
- `$3'
- The value of the `getgid' system call.
-
- `$4'
- The value of the `getegid' system call.
-
- If there are any additional fields, they are the group IDs
- returned by the `getgroups' system call. (Multiple groups may not be
- supported on all systems.)
-
-These file names may also be used on the command line to name data
-files. These file names are only recognized internally if you do not
-actually have files by these names on your system.
-
- *Note Standard I/O Streams: Special Files, for a longer description
-that provides the motivation for this feature.
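The sketch below uses `/dev/stdin' and `/dev/stderr' to keep a diagnostic message separate from normal output. On systems such as GNU/Linux, these names also exist as real device files, so the example should behave the same even when the `awk' in use does not handle them internally. The process-information files such as `/dev/pid' are `gawk'-specific and are not used here. The scratch file name is hypothetical.

```shell
err="${TMPDIR:-/tmp}/stderr-demo.$$"   # hypothetical scratch file name

out=$(echo "from stdin" | awk 'BEGIN {
    getline line < "/dev/stdin"          # read one line from standard input
    print "got: " line > "/dev/stderr"   # diagnostic goes to standard error
    print "normal output"                # ordinary standard output
}' 2> "$err")

diag=$(cat "$err")
rm -f "$err"
printf 'stdout: %s\nstderr: %s\n' "$out" "$diag"
```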
-
-
-File: gawk.info, Node: Numeric Functions Summary, Next: String Functions Summary, Prev: Special File Summary, Up: Actions Summary
-
-Numeric Functions
-.................
-
- `awk' has the following predefined arithmetic functions:
-
-`atan2(Y, X)'
- returns the arctangent of Y/X in radians.
-
-`cos(EXPR)'
- returns the cosine of EXPR, which is in radians.
-
-`exp(EXPR)'
- the exponential function.
-
-`int(EXPR)'
- truncates EXPR to an integer (toward zero).
-
-`log(EXPR)'
- the natural logarithm function.
-
-`rand()'
- returns a random number between 0 and 1; the value may be 0 but
- is never 1.
-
-`sin(EXPR)'
- returns the sine of EXPR, which is in radians.
-
-`sqrt(EXPR)'
- the square root function.
-
-`srand(EXPR)'
- use EXPR as a new seed for the random number generator. If no EXPR
- is provided, the time of day is used. The return value is the
- previous seed for the random number generator.
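The following sketch calls most of the numeric functions above. It uses the common trick `4 * atan2(1, 1)' to obtain pi. It also calls `srand' with a fixed seed (42, an arbitrary choice) so that the sequence from `rand' is repeatable. The exact random value differs between implementations, so the sketch checks only its range.

```shell
result=$(awk 'BEGIN {
    pi = 4 * atan2(1, 1)          # atan2(1, 1) is pi/4
    srand(42)                     # fixed seed: repeatable sequence
    r = rand()
    in_range = (r >= 0 && r < 1)  # r satisfies 0 <= r < 1
    printf "%d %g %g %g %.5f %g %d\n", int(-3.7), sqrt(16), exp(0), log(1), pi, cos(0), in_range
}')
printf '%s\n' "$result"
```

Note that `int(-3.7)' yields -3, not -4, because `int' truncates toward zero.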
-
-
-File: gawk.info, Node: String Functions Summary, Next: Time Functions Summary, Prev: Numeric Functions Summary, Up: Actions Summary
-
-String Functions
-................
-
- `awk' has the following predefined string functions:
-
-`gsub(R, S, T)'
- for each substring matching the regular expression R in the string
- T, substitute the string S, and return the number of substitutions.
- If T is not supplied, use `$0'.
-
-`index(S, T)'
- returns the position (counting from 1) at which the string T first
- occurs in the string S, or 0 if T is not present.
-
-`length(S)'
- returns the length of the string S. The length of `$0' is
- returned if no argument is supplied.
-
-`match(S, R)'
- returns the position in S where the regular expression R occurs,
- or 0 if R is not present, and sets the values of `RSTART' and
- `RLENGTH'.
-
-`split(S, A, R)'
- splits the string S into the array A on the regular expression R,
- and returns the number of fields. If R is omitted, `FS' is used
- instead.
-
-`sprintf(FMT, EXPR-LIST)'
- formats EXPR-LIST according to FMT, and returns the resulting
- string without printing it.
-
-`sub(R, S, T)'
- this is just like `gsub', but only the first matching substring is
- replaced.
-
-`substr(S, I, N)'
- returns the N-character substring of S starting at I. If N is
- omitted, the rest of S is used.
-
-`tolower(STR)'
- returns a copy of the string STR, with all the upper-case
- characters in STR translated to their corresponding lower-case
- counterparts. Nonalphabetic characters are left unchanged.
-
-`toupper(STR)'
- returns a copy of the string STR, with all the lower-case
- characters in STR translated to their corresponding upper-case
- counterparts. Nonalphabetic characters are left unchanged.
-
-`system(CMD-LINE)'
- executes the command CMD-LINE, and returns its exit status.
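The following sketch applies the string functions above to a made-up string. Positions count from 1, and `gsub' and `sub' modify their third argument in place.

```shell
result=$(awk 'BEGIN {
    s = "Hello, World"
    # positions count from 1; substr(s, 1, 5) is the first 5 characters
    print index(s, "World"), length(s), substr(s, 1, 5), toupper(s)

    # match sets RSTART and RLENGTH as a side effect
    if (match(s, /o+/))
        print RSTART, RLENGTH

    # split returns the number of pieces it stored in the array
    n = split("a:b:c", parts, ":")
    print n, parts[1], parts[3]

    # gsub replaces every match and returns the count; sub replaces one
    t = "foo bar foo"
    count = gsub(/foo/, "baz", t)
    print count, t
    u = "foo bar foo"
    sub(/foo/, "baz", u)
    print u
}')
printf '%s\n' "$result"
```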
-
-
-File: gawk.info, Node: Time Functions Summary, Next: String Constants Summary, Prev: String Functions Summary, Up: Actions Summary
-
-Built-in time functions
-.......................
-
- The following two functions are available for getting the current
-time of day, and for formatting time stamps.
-
-`systime()'
- returns the current time of day as the number of seconds since a
- particular epoch (Midnight, January 1, 1970 UTC, on POSIX systems).
-
-`strftime(FORMAT, TIMESTAMP)'
- formats TIMESTAMP according to the specification in FORMAT. The
- current time of day is used if no TIMESTAMP is supplied. *Note
- Functions for Dealing with Time Stamps: Time Functions, for the
- details on the conversion specifiers that `strftime' accepts.
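`systime' and `strftime' are `gawk' extensions, so this sketch assumes an `awk' that provides them. `gawk' does, and some other implementations have added them as well. Setting `TZ=UTC' in the environment makes the formatted result for time stamp 0 independent of the local time zone.

```shell
result=$(TZ=UTC awk 'BEGIN {
    now = systime()                            # seconds since the epoch
    ok = (now > 0)
    epoch = strftime("%Y-%m-%d %H:%M:%S", 0)   # time stamp 0 is the epoch
    print ok, epoch
}')
printf '%s\n' "$result"
```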
-
-
-File: gawk.info, Node: String Constants Summary, Prev: Time Functions Summary, Up: Actions Summary
-
-String Constants
-................
-
- String constants in `awk' are sequences of characters enclosed
-between double quotes (`"'). Within strings, certain "escape sequences"
-are recognized, as in C. These are:
-
-`\\'
- A literal backslash.
-
-`\a'
- The "alert" character; usually the ASCII BEL character.
-
-`\b'
- Backspace.
-
-`\f'
- Formfeed.
-
-`\n'
- Newline.
-
-`\r'
- Carriage return.
-
-`\t'
- Horizontal tab.
-
-`\v'
- Vertical tab.
-
-`\xHEX DIGITS'
- The character represented by the string of hexadecimal digits
- following the `\x'. As in ANSI C, all following hexadecimal
- digits are considered part of the escape sequence. (This feature
- should tell us something about language design by committee.)
- E.g., `"\x1B"' is a string containing the ASCII ESC (escape)
- character. (The `\x' escape sequence is not in POSIX `awk'.)
-
-`\DDD'
- The character represented by the 1-, 2-, or 3-digit sequence of
- octal digits. Thus, `"\033"' is also a string containing the
- ASCII ESC (escape) character.
-
-`\C'
- The literal character C.
-
- The escape sequences may also be used inside constant regular
-expressions (e.g., the regexp `/[ \t\f\n\r\v]/' matches whitespace
-characters).
-
- *Note Constant Expressions: Constants, for more information.
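The following short sketch shows escape sequences in string constants and in a regexp constant. It uses octal escapes rather than `\x', since `\x' is not in POSIX `awk'.

```shell
result=$(awk 'BEGIN {
    s = "\101\102"              # octal escapes: \101 is "A", \102 is "B"
    has_tab = ("x\ty" ~ /\t/)   # \t also works inside a regexp constant
    print s, length("a\tb"), length("\\"), has_tab
}')
printf '%s\n' "$result"
```

`"a\tb"' has length 3 and `"\\"' has length 1, because each escape sequence stands for a single character.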
-
-
-File: gawk.info, Node: Functions Summary, Next: Historical Features, Prev: Rules Summary, Up: Gawk Summary
-
-Functions
-=========
-
- Functions in `awk' are defined as follows:
-
- function NAME(PARAMETER LIST) { STATEMENTS }
-
- Actual parameters supplied in the function call are used to
-instantiate the formal parameters declared in the function. Arrays are
-passed by reference; other variables are passed by value.
-
- If there are fewer arguments passed than there are names in
-PARAMETER-LIST, the extra names are given the null string as value.
-Extra names have the effect of local variables.
-
- The open-parenthesis in a function call of a user-defined function
-must immediately follow the function name, without any intervening
-white space. This is to avoid a syntactic ambiguity with the
-concatenation operator.
-
- The word `func' may be used in place of `function' (but not in POSIX
-`awk').
-
- Use the `return' statement to return a value from a function.
-
- *Note User-defined Functions: User-defined, for a more complete
-description.
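The following sketch shows a user-defined function that returns a value and another that fills an array passed by reference. Following the convention described above, the caller never supplies the extra parameter `i' in `fill', so it acts as a local variable. The extra spaces before it are only a readability habit. The function names are illustrative, and `split("", sq)' is used only to make `sq' an empty array before the call.

```shell
result=$(awk '
# max: return the larger of two values
function max(a, b) {
    return a > b ? a : b
}

# fill: arrays are passed by reference, so the caller sees the new
# elements; i is an extra parameter that serves as a local variable
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i
    return n
}

BEGIN {
    split("", sq)      # make sq an empty array before passing it
    fill(sq, 3)
    print max(3, 7), sq[1], sq[2], sq[3]
}')
printf '%s\n' "$result"
```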
-
-
-File: gawk.info, Node: Historical Features, Prev: Functions Summary, Up: Gawk Summary
-
-Historical Features
-===================
-
- There are two features of historical `awk' implementations that
-`gawk' supports. First, it is possible to call the `length' built-in
-function not only with no arguments, but even without parentheses!
-
- a = length
-
-is the same as either of
-
- a = length()
- a = length($0)
-
-This feature is marked as "deprecated" in the POSIX standard, and
-`gawk' will issue a warning about its use if `-W lint' is specified on
-the command line.
-
- The other feature is the use of the `continue' statement outside the
-body of a `while', `for', or `do' loop. Traditional `awk'
-implementations have treated such usage as equivalent to the `next'
-statement. `gawk' will support this usage if `-W posix' has not been
-specified.
-
-
-File: gawk.info, Node: Sample Program, Next: Bugs, Prev: Gawk Summary, Up: Top
-
-Sample Program
-**************
-
- The following example is a complete `awk' program, which prints the
-number of occurrences of each word in its input. It illustrates the
-associative nature of `awk' arrays by using strings as subscripts. It
-also demonstrates the `for X in ARRAY' construction. Finally, it shows
-how `awk' can be used in conjunction with other utility programs to do
-a useful task of some complexity with a minimum of effort. Some
-explanations follow the program listing.
-
- awk '
- # Print list of word frequencies
- {
- for (i = 1; i <= NF; i++)
- freq[$i]++
- }
-
- END {
- for (word in freq)
- printf "%s\t%d\n", word, freq[word]
- }'
-
- The first thing to notice about this program is that it has two
-rules. The first rule, because it has an empty pattern, is executed on
-every line of the input. It uses `awk''s field-accessing mechanism
-(*note Examining Fields: Fields.) to pick out the individual words from
-the line, and the built-in variable `NF' (*note Built-in Variables::.)
-to know how many fields are available.
-
- For each input word, an element of the array `freq' is incremented to
-reflect that the word has been seen an additional time.
-
- The second rule, because it has the pattern `END', is not executed
-until the input has been exhausted. It prints out the contents of the
-`freq' table that has been built up inside the first action.
-
- Note that this program has several problems that would prevent it
-from being useful by itself on real text files:
-
- * Words are detected using the `awk' convention that fields are
- separated by whitespace and that other characters in the input
- (except newlines) don't have any special meaning to `awk'. This
- means that punctuation characters count as part of words.
-
- * The `awk' language considers upper and lower case characters to be
- distinct. Therefore, `foo' and `Foo' are not treated by this
- program as the same word. This is undesirable since in normal
- text, words are capitalized if they begin sentences, and a
- frequency analyzer should not be sensitive to that.
-
- * The output does not come out in any useful order. You're more
- likely to be interested in which words occur most frequently, or
- having an alphabetized table of how frequently each word occurs.
-
- The way to solve these problems is to use some of the more advanced
-features of the `awk' language. First, we use `tolower' to remove case
-distinctions. Next, we use `gsub' to remove punctuation characters.
-Finally, we use the system `sort' utility to process the output of the
-`awk' script. First, here is the new version of the program:
-
- awk '
- # Print list of word frequencies
- {
- $0 = tolower($0) # remove case distinctions
- gsub(/[^a-z0-9_ \t]/, "", $0) # remove punctuation
- for (i = 1; i <= NF; i++)
- freq[$i]++
- }
-
- END {
- for (word in freq)
- printf "%s\t%d\n", word, freq[word]
- }'
-
- Assuming we have saved this program in a file named `frequency.awk',
-and that the data is in `file1', the following pipeline
-
- awk -f frequency.awk file1 | sort +1 -nr
-
-produces a table of the words appearing in `file1' in order of
-decreasing frequency.
-
- The `awk' program suitably massages the data and produces a word
-frequency table, which is not ordered.
-
- The `awk' script's output is then sorted by the `sort' command and
-printed on the terminal. The options given to `sort' in this example
-specify to sort using the second field of each input line (skipping one
-field), that the sort keys should be treated as numeric quantities
-(otherwise `15' would come before `5'), and that the sorting should be
-done in descending (reverse) order.
-
- We could have even done the `sort' from within the program, by
-changing the `END' action to:
-
- END {
- sort = "sort +1 -nr"
- for (word in freq)
- printf "%s\t%d\n", word, freq[word] | sort
- close(sort)
- }'
-
- See the general operating system documentation for more information
-on how to use the `sort' command.
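You can try the improved program on a made-up line of text without creating `frequency.awk'. The `+1' key syntax used above is the traditional form. Current POSIX `sort' specifies keys with `-k', so this sketch writes the same descending numeric sort on the second field as `-k2,2nr'. The `-k1,1' key is added only to make the order of ties predictable, and `head -n 2' keeps just the two most frequent words.

```shell
text='The cat saw the dog. The dog ran!'

result=$(printf '%s\n' "$text" | awk '
{
    $0 = tolower($0)                  # remove case distinctions
    gsub(/[^a-z0-9_ \t]/, "", $0)     # remove punctuation
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}' | sort -k2,2nr -k1,1 | head -n 2)

printf '%s\n' "$result"
```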
-
-
-File: gawk.info, Node: Bugs, Next: Notes, Prev: Sample Program, Up: Top
-
-Reporting Problems and Bugs
-***************************
-
- If you have problems with `gawk' or think that you have found a bug,
-please report it to the developers; we cannot promise to do anything
-but we might well want to fix it.
-
- Before reporting a bug, make sure you have actually found a real bug.
-Carefully reread the documentation and see if it really says you can do
-what you're trying to do. If it's not clear whether you should be able
-to do something or not, report that too; it's a bug in the
-documentation!
-
- Before reporting a bug or trying to fix it yourself, try to isolate
-it to the smallest possible `awk' program and input data file that
-reproduces the problem. Then send us the program and data file, some
-idea of what kind of Unix system you're using, and the exact results
-`gawk' gave you. Also say what you expected to occur; this will help
-us decide whether the problem was really in the documentation.
-
- Once you have a precise problem, send e-mail to (Internet)
-`bug-gnu-utils@prep.ai.mit.edu' or (UUCP)
-`mit-eddie!prep.ai.mit.edu!bug-gnu-utils'. Please include the version
-number of `gawk' you are using. You can get this information with the
-command `gawk -W version '{}' /dev/null'. You should send carbon
-copies of your mail to David Trueman at `david@cs.dal.ca', and to
-Arnold Robbins, who can be reached at `arnold@skeeve.atl.ga.us'. David
-is most likely to fix code problems, while Arnold is most likely to fix
-documentation problems.
-
- Non-bug suggestions are always welcome as well. If you have
-questions about things that are unclear in the documentation or are
-just obscure features, ask Arnold Robbins; he will try to help you out,
-although he may not have the time to fix the problem. You can send him
-electronic mail at the Internet address above.
-
- If you find bugs in one of the non-Unix ports of `gawk', please send
-an electronic mail message to the person who maintains that port. They
-are listed below, and also in the `README' file in the `gawk'
-distribution. Information in the `README' file should be considered
-authoritative if it conflicts with this manual.
-
- The people maintaining the non-Unix ports of `gawk' are:
-
-MS-DOS
- The port to MS-DOS is maintained by Scott Deifik. His electronic
- mail address is `scottd@amgen.com'.
-
-VMS
- The port to VAX VMS is maintained by Pat Rankin. His electronic
- mail address is `rankin@eql.caltech.edu'.
-
-Atari ST
- The port to the Atari ST is maintained by Michal Jaegermann. His
- electronic mail address is `ntomczak@vm.ucs.ualberta.ca'.
-
- If your bug is also reproducible under Unix, please send copies of
-your report to the general GNU bug list, as well as to Arnold Robbins
-and David Trueman, at the addresses listed above.
-
-
-File: gawk.info, Node: Notes, Next: Glossary, Prev: Bugs, Up: Top
-
-Implementation Notes
-********************
-
- This appendix contains information mainly of interest to
-implementors and maintainers of `gawk'. Everything in it applies
-specifically to `gawk', and not to other implementations.
-
-* Menu:
-
-* Compatibility Mode:: How to disable certain `gawk' extensions.
-* Future Extensions:: New features we may implement soon.
-* Improvements:: Suggestions for improvements by volunteers.
-
-
-File: gawk.info, Node: Compatibility Mode, Next: Future Extensions, Prev: Notes, Up: Notes
-
-Downward Compatibility and Debugging
-====================================
-
- *Note Extensions in `gawk' not in POSIX `awk': POSIX/GNU, for a
-summary of the GNU extensions to the `awk' language and program. All
-of these features can be turned off by invoking `gawk' with the `-W
-compat' option, or with the `-W posix' option.
-
- If `gawk' is compiled for debugging with `-DDEBUG', then there is
-one more option available on the command line:
-
-`-W parsedebug'
- Print out the parse stack information as the program is being
- parsed.
-
- This option is intended only for serious `gawk' developers, and not
-for the casual user. It probably has not even been compiled into your
-version of `gawk', since it slows down execution.
-
-
-File: gawk.info, Node: Future Extensions, Next: Improvements, Prev: Compatibility Mode, Up: Notes
-
-Probable Future Extensions
-==========================
-
- This section briefly lists extensions that indicate the directions
-we are currently considering for `gawk'. The file `FUTURES' in the
-`gawk' distributions lists these extensions, as well as several others.
-
-`RS' as a regexp
- The meaning of `RS' may be generalized along the lines of `FS'.
-
-Control of subprocess environment
- Changes made in `gawk' to the array `ENVIRON' may be propagated to
- subprocesses run by `gawk'.
-
-Databases
- It may be possible to map a GDBM/NDBM/SDBM file into an `awk'
- array.
-
-Single-character fields
- The null string, `""', as a field separator, will cause field
- splitting and the `split' function to separate individual
- characters. Thus, `split("abcd", a, "")' would yield `a[1] ==
- "a"', `a[2] == "b"', and so on.
-
-More `lint' warnings
- There are more things that could be checked for portability.
-
-`RECLEN' variable for fixed length records
- Along with `FIELDWIDTHS', this would speed up the processing of
- fixed-length records.
-
-`RT' variable to hold the record terminator
- It is occasionally useful to have access to the actual string of
- characters that matched the `RS' variable. The `RT' variable
- would hold these characters.
-
-A `restart' keyword
- After modifying `$0', `restart' would restart the pattern matching
- loop, without reading a new record from the input.
-
-A `|&' redirection
- The `|&' redirection, in place of `|', would open a two-way
- pipeline for communication with a sub-process (via `getline' and
- `print' and `printf').
-
-`IGNORECASE' affecting all comparisons
- The effects of the `IGNORECASE' variable may be generalized to all
- string comparisons, and not just regular expression operations.
-
-A way to mix command line source code and library files
- There may be a new option that would make it possible to easily
- use library functions from a program entered on the command line.
-
-GNU-style long options
- We will add GNU-style long options to `gawk' for compatibility
- with other GNU programs. (For example, `--field-separator=:'
- would be equivalent to `-F:'.)
-
-
-File: gawk.info, Node: Improvements, Prev: Future Extensions, Up: Notes
-
-Suggestions for Improvements
-============================
-
- Here are some projects that would-be `gawk' hackers might like to
-take on. They vary in size from a few days to a few weeks of
-programming, depending on which one you choose and how fast a
-programmer you are. Please send any improvements you write to the
-maintainers at the GNU project.
-
- 1. Compilation of `awk' programs: `gawk' uses a Bison (YACC-like)
- parser to convert the script given it into a syntax tree; the
- syntax tree is then executed by a simple recursive evaluator.
- This method incurs a lot of overhead, since the recursive
- evaluator performs many procedure calls to do even the simplest
- things.
-
- It should be possible for `gawk' to convert the script's parse tree
- into a C program which the user would then compile, using the
- normal C compiler and a special `gawk' library to provide all the
- needed functions (regexps, fields, associative arrays, type
- coercion, and so on).
-
- An easier possibility might be for an intermediate phase of `awk'
- to convert the parse tree into a linear byte code form like the
- one used in GNU Emacs Lisp. The recursive evaluator would then be
- replaced by a straight-line byte code interpreter that would be
- intermediate in speed between running a compiled program and doing
- what `gawk' does now.
-
- This may actually happen for the 3.0 version of `gawk'.
-
- 2. An error message section has not been included in this version of
- the manual. Perhaps some nice beta testers will document some of
- the messages for the future.
-
- 3. The programs in the test suite could use documenting in this
- manual.
-
- 4. The programs and data files in the manual should be available in
- separate files to facilitate experimentation.
-
- 5. See the `FUTURES' file for more ideas. Contact us if you would
- seriously like to tackle any of the items listed there.
-
-
-File: gawk.info, Node: Glossary, Next: Index, Prev: Notes, Up: Top
-
-Glossary
-********
-
-Action
- A series of `awk' statements attached to a rule. If the rule's
- pattern matches an input record, the `awk' language executes the
- rule's action. Actions are always enclosed in curly braces.
- *Note Overview of Actions: Actions.
-
-Amazing `awk' Assembler
- Henry Spencer at the University of Toronto wrote a retargetable
- assembler completely as `awk' scripts. It is thousands of lines
- long, including machine descriptions for several 8-bit
- microcomputers. It is a good example of a program that would have
- been better written in another language.
-
-ANSI
- The American National Standards Institute. This organization
- produces many standards, among them the standard for the C
- programming language.
-
-Assignment
- An `awk' expression that changes the value of some `awk' variable
- or data object. An object that you can assign to is called an
- "lvalue". *Note Assignment Expressions: Assignment Ops.
-
-`awk' Language
- The language in which `awk' programs are written.
-
-`awk' Program
- An `awk' program consists of a series of "patterns" and "actions",
- collectively known as "rules". For each input record given to the
- program, the program's rules are all processed in turn. `awk'
- programs may also contain function definitions.
-
-`awk' Script
- Another name for an `awk' program.
-
-Built-in Function
- The `awk' language provides built-in functions that perform various
- numerical, time stamp related, and string computations. Examples
- are `sqrt' (for the square root of a number) and `substr' (for a
- substring of a string). *Note Built-in Functions: Built-in.
-
-Built-in Variable
- `ARGC', `ARGIND', `ARGV', `CONVFMT', `ENVIRON', `ERRNO',
- `FIELDWIDTHS', `FILENAME', `FNR', `FS', `IGNORECASE', `NF', `NR',
- `OFMT', `OFS', `ORS', `RLENGTH', `RSTART', `RS', and `SUBSEP', are
- the variables that have special meaning to `awk'. Changing some
- of them affects `awk''s running environment. *Note Built-in
- Variables::.
-
-Braces
- See "Curly Braces."
-
-C
- The system programming language that most GNU software is written
- in. The `awk' programming language has C-like syntax, and this
- manual points out similarities between `awk' and C when
- appropriate.
-
-CHEM
- A preprocessor for `pic' that reads descriptions of molecules and
- produces `pic' input for drawing them. It was written by Brian
- Kernighan, and is available from `netlib@research.att.com'.
-
-Comparison Expression
- A relation that is either true or false, such as `(a < b)'.
- Comparison expressions are used in `if', `while', and `for'
- statements, and in patterns to select which input records to
- process. *Note Comparison Expressions: Comparison Ops.
-
-Compound Statement
- A series of `awk' statements enclosed in curly braces. Compound
- statements may be nested. *Note Control Statements in Actions:
- Statements.
-
-Concatenation
- Concatenating two strings means sticking them together, one after
- another, giving a new string. For example, the string `foo'
- concatenated with the string `bar' gives the string `foobar'.
- *Note String Concatenation: Concatenation.
-
-Conditional Expression
- An expression using the `?:' ternary operator, such as `EXPR1 ?
- EXPR2 : EXPR3'. The expression EXPR1 is evaluated; if the result
- is true, the value of the whole expression is the value of EXPR2;
- otherwise, the value is EXPR3. In either case, only one of EXPR2
- and EXPR3 is evaluated. *Note Conditional Expressions:
- Conditional Exp.
-
-Constant Regular Expression
- A constant regular expression is a regular expression written
- within slashes, such as `/foo/'. This regular expression is chosen
- when you write the `awk' program, and cannot be changed during its
- execution. *Note How to Use Regular Expressions: Regexp Usage.
-
-Curly Braces
- The characters `{' and `}'. Curly braces are used in `awk' for
- delimiting actions, compound statements, and function bodies.
-
-Data Objects
- These are numbers and strings of characters. Numbers are
- converted into strings and vice versa, as needed. *Note
- Conversion of Strings and Numbers: Conversion.
-
-Dynamic Regular Expression
- A dynamic regular expression is a regular expression written as an
- ordinary expression. It could be a string constant, such as
- `"foo"', but it may also be an expression whose value can vary.
- *Note How to Use Regular Expressions: Regexp Usage.
-
-Escape Sequences
- A special sequence of characters used for describing nonprinting
- characters, such as `\n' for newline, or `\033' for the ASCII ESC
- (escape) character. *Note Constant Expressions: Constants.
-
-Field
- When `awk' reads an input record, it splits the record into pieces
- separated by whitespace (or by a separator regexp which you can
- change by setting the built-in variable `FS'). Such pieces are
- called fields. If the pieces are of fixed length, you can use the
- built-in variable `FIELDWIDTHS' to describe their lengths. *Note
- How Input is Split into Records: Records.
-
-Format
- Format strings are used to control the appearance of output in the
- `printf' statement. Also, data conversions from numbers to strings
- are controlled by the format string contained in the built-in
- variable `CONVFMT'. *Note Format-Control Letters: Control Letters.
-
-Function
- A specialized group of statements often used to encapsulate general
- or program-specific tasks. `awk' has a number of built-in
- functions, and also allows you to define your own. *Note Built-in
- Functions: Built-in. Also, see *Note User-defined Functions:
- User-defined.
-
-`gawk'
- The GNU implementation of `awk'.
-
-GNU
- "GNU's not Unix". An ongoing project of the Free Software
- Foundation to create a complete, freely distributable,
- POSIX-compliant computing environment.
-
-Input Record
- A single chunk of data read in by `awk'. Usually, an `awk' input
- record consists of one line of text. *Note How Input is Split
- into Records: Records.
-
-Keyword
- In the `awk' language, a keyword is a word that has special
- meaning. Keywords are reserved and may not be used as variable
- names.
-
- `awk''s keywords are: `if', `else', `while', `do...while', `for',
- `for...in', `break', `continue', `delete', `next', `function',
- `func', and `exit'.
-
-Lvalue
- An expression that can appear on the left side of an assignment
- operator. In most languages, lvalues can be variables or array
- elements. In `awk', a field designator can also be used as an
- lvalue.
-
-Number
- A numeric valued data object. The `gawk' implementation uses
- double precision floating point to represent numbers.
-
-Pattern
- Patterns tell `awk' which input records are interesting to which
- rules.
-
- A pattern is an arbitrary conditional expression against which
- input is tested. If the condition is satisfied, the pattern is
- said to "match" the input record. A typical pattern might compare
- the input record against a regular expression. *Note Patterns::.
-
-POSIX
- The name for a series of standards being developed by the IEEE
- that specify a Portable Operating System interface. The "IX"
- denotes the Unix heritage of these standards. The main standard
- of interest for `awk' users is P1003.2, the Command Language and
- Utilities standard.
-
-Range (of input lines)
- A sequence of consecutive lines from the input file. A pattern
- can specify ranges of input lines for `awk' to process, or it can
- specify single lines. *Note Patterns::.
-
-Recursion
- When a function calls itself, either directly or indirectly. If
- this isn't clear, refer to the entry for "recursion."
-
-Redirection
- Redirection means performing input from other than the standard
- input stream, or output to other than the standard output stream.
-
- You can redirect the output of the `print' and `printf' statements
- to a file or a system command, using the `>', `>>', and `|'
- operators. You can redirect input to the `getline' statement using
- the `<' and `|' operators. *Note Redirecting Output of `print'
- and `printf': Redirection.
-
-Regular Expression
- See "regexp."
-
-Regexp
- Short for "regular expression". A regexp is a pattern that
- denotes a set of strings, possibly an infinite set. For example,
- the regexp `R.*xp' matches any string starting with the letter `R'
- and ending with the letters `xp'. In `awk', regexps are used in
- patterns and in conditional expressions. Regexps may contain
- escape sequences. *Note Regular Expressions as Patterns: Regexp.
-
-Rule
- A segment of an `awk' program that specifies how to process single
- input records. A rule consists of a "pattern" and an "action".
- `awk' reads an input record; then, for each rule, if the input
- record satisfies the rule's pattern, `awk' executes the rule's
- action. Otherwise, the rule does nothing for that input record.
-
-Side Effect
- A side effect occurs when an expression has an effect aside from
- merely producing a value. Assignment expressions, increment
- expressions, and function calls have side effects. *Note
- Assignment Expressions: Assignment Ops.
-
-Special File
- A file name interpreted internally by `gawk', instead of being
- handed directly to the underlying operating system. For example,
- `/dev/stdin'. *Note Standard I/O Streams: Special Files.
-
-Stream Editor
- A program that reads records from an input stream and processes
- them one or more at a time. This is in contrast with batch
- programs, which may expect to read their input files in entirety
- before starting to do anything, and with interactive programs,
- which require input from the user.
-
-String
- A datum consisting of a sequence of characters, such as `I am a
- string'. Constant strings are written with double-quotes in the
- `awk' language, and may contain escape sequences. *Note Constant
- Expressions: Constants.
-
-Whitespace
- A sequence of blank or tab characters occurring inside an input
- record or a string.
-