author | Arnold D. Robbins <arnold@skeeve.com> | 2010-07-16 12:27:41 +0300 |
---|---|---|
committer | Arnold D. Robbins <arnold@skeeve.com> | 2010-07-16 12:27:41 +0300 |
commit | 61bb57af53ebe916d2db6e3585d4fc7ac1d99b92 (patch) | |
tree | 2bfc4e5b127618d286f57a87d416702131b1b01d /gawk.info-8 | |
parent | 0a9ae0c89481db540e1b817a63cc6c793a62c90d (diff) | |
download | egawk-61bb57af53ebe916d2db6e3585d4fc7ac1d99b92.tar.gz egawk-61bb57af53ebe916d2db6e3585d4fc7ac1d99b92.tar.bz2 egawk-61bb57af53ebe916d2db6e3585d4fc7ac1d99b92.zip |
Move to gawk-2.15.3.
Diffstat (limited to 'gawk.info-8')
-rw-r--r-- | gawk.info-8 | 1173 |
1 files changed, 0 insertions, 1173 deletions
diff --git a/gawk.info-8 b/gawk.info-8 deleted file mode 100644 index d0d693ff..00000000 --- a/gawk.info-8 +++ /dev/null @@ -1,1173 +0,0 @@ -This is Info file gawk.info, produced by Makeinfo-1.54 from the input -file gawk.texi. - - This file documents `awk', a program that you can use to select -particular records in a file and perform operations upon them. - - This is Edition 0.15 of `The GAWK Manual', -for the 2.15 version of the GNU implementation -of AWK. - - Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc. - - Permission is granted to make and distribute verbatim copies of this -manual provided the copyright notice and this permission notice are -preserved on all copies. - - Permission is granted to copy and distribute modified versions of -this manual under the conditions for verbatim copying, provided that -the entire resulting derived work is distributed under the terms of a -permission notice identical to this one. - - Permission is granted to copy and distribute translations of this -manual into another language, under the above conditions for modified -versions, except that this permission notice may be stated in a -translation approved by the Foundation. - - -File: gawk.info, Node: Regexp Summary, Next: Actions Summary, Prev: Pattern Summary, Up: Rules Summary - -Regular Expressions -------------------- - - Regular expressions are the extended kind found in `egrep'. They -are composed of characters as follows: - -`C' - matches the character C (assuming C is a character with no special - meaning in regexps). - -`\C' - matches the literal character C. - -`.' - matches any character except newline. - -`^' - matches the beginning of a line or a string. - -`$' - matches the end of a line or a string. - -`[ABC...]' - matches any of the characters ABC... (character class). - -`[^ABC...]' - matches any character except ABC... and newline (negated character - class). - -`R1|R2' - matches either R1 or R2 (alternation). 
- -`R1R2' - matches R1, and then R2 (concatenation). - -`R+' - matches one or more R's. - -`R*' - matches zero or more R's. - -`R?' - matches zero or one R's. - -`(R)' - matches R (grouping). - - *Note Regular Expressions as Patterns: Regexp, for a more detailed -explanation of regular expressions. - - The escape sequences allowed in string constants are also valid in -regular expressions (*note Constant Expressions: Constants.). - - -File: gawk.info, Node: Actions Summary, Prev: Regexp Summary, Up: Rules Summary - -Actions -------- - - Action statements are enclosed in braces, `{' and `}'. Action -statements consist of the usual assignment, conditional, and looping -statements found in most languages. The operators, control statements, -and input/output statements available are patterned after those in C. - -* Menu: - -* Operator Summary:: `awk' operators. -* Control Flow Summary:: The control statements. -* I/O Summary:: The I/O statements. -* Printf Summary:: A summary of `printf'. -* Special File Summary:: Special file names interpreted internally. -* Numeric Functions Summary:: Built-in numeric functions. -* String Functions Summary:: Built-in string functions. -* Time Functions Summary:: Built-in time functions. -* String Constants Summary:: Escape sequences in strings. - - -File: gawk.info, Node: Operator Summary, Next: Control Flow Summary, Prev: Actions Summary, Up: Actions Summary - -Operators -......... - - The operators in `awk', in order of increasing precedence, are: - -`= += -= *= /= %= ^=' - Assignment. Both absolute assignment (`VAR=VALUE') and operator - assignment (the other forms) are supported. - -`?:' - A conditional expression, as in C. This has the form `EXPR1 ? - eXPR2 : EXPR3'. If EXPR1 is true, the value of the expression is - EXPR2; otherwise it is EXPR3. Only one of EXPR2 and EXPR3 is - evaluated. - -`||' - Logical "or". - -`&&' - Logical "and". - -`~ !~' - Regular expression match, negated match. 
- -`< <= > >= != ==' - The usual relational operators. - -`BLANK' - String concatenation. - -`+ -' - Addition and subtraction. - -`* / %' - Multiplication, division, and modulus. - -`+ - !' - Unary plus, unary minus, and logical negation. - -`^' - Exponentiation (`**' may also be used, and `**=' for the assignment - operator, but they are not specified in the POSIX standard). - -`++ --' - Increment and decrement, both prefix and postfix. - -`$' - Field reference. - - *Note Expressions as Action Statements: Expressions, for a full -description of all the operators listed above. *Note Examining Fields: -Fields, for a description of the field reference operator. - - -File: gawk.info, Node: Control Flow Summary, Next: I/O Summary, Prev: Operator Summary, Up: Actions Summary - -Control Statements -.................. - - The control statements are as follows: - - if (CONDITION) STATEMENT [ else STATEMENT ] - while (CONDITION) STATEMENT - do STATEMENT while (CONDITION) - for (EXPR1; EXPR2; EXPR3) STATEMENT - for (VAR in ARRAY) STATEMENT - break - continue - delete ARRAY[INDEX] - exit [ EXPRESSION ] - { STATEMENTS } - - *Note Control Statements in Actions: Statements, for a full -description of all the control statements listed above. - - -File: gawk.info, Node: I/O Summary, Next: Printf Summary, Prev: Control Flow Summary, Up: Actions Summary - -I/O Statements -.............. - - The input/output statements are as follows: - -`getline' - Set `$0' from next input record; set `NF', `NR', `FNR'. - -`getline <FILE' - Set `$0' from next record of FILE; set `NF'. - -`getline VAR' - Set VAR from next input record; set `NF', `FNR'. - -`getline VAR <FILE' - Set VAR from next record of FILE. - -`next' - Stop processing the current input record. The next input record - is read and processing starts over with the first pattern in the - `awk' program. If the end of the input data is reached, the `END' - rule(s), if any, are executed. 
- -`next file' - Stop processing the current input file. The next input record - read comes from the next input file. `FILENAME' is updated, `FNR' - is set to 1, and processing starts over with the first pattern in - the `awk' program. If the end of the input data is reached, the - `END' rule(s), if any, are executed. - -`print' - Prints the current record. - -`print EXPR-LIST' - Prints expressions. - -`print EXPR-LIST > FILE' - Prints expressions on FILE. - -`printf FMT, EXPR-LIST' - Format and print. - -`printf FMT, EXPR-LIST > file' - Format and print on FILE. - - Other input/output redirections are also allowed. For `print' and -`printf', `>> FILE' appends output to the FILE, and `| COMMAND' writes -on a pipe. In a similar fashion, `COMMAND | getline' pipes input into -`getline'. `getline' returns 0 on end of file, and -1 on an error. - - *Note Explicit Input with `getline': Getline, for a full description -of the `getline' statement. *Note Printing Output: Printing, for a -full description of `print' and `printf'. Finally, *note The `next' -Statement: Next Statement., for a description of how the `next' -statement works. - - -File: gawk.info, Node: Printf Summary, Next: Special File Summary, Prev: I/O Summary, Up: Actions Summary - -`printf' Summary -................ - - The `awk' `printf' statement and `sprintf' function accept the -following conversion specification formats: - -`%c' - An ASCII character. If the argument used for `%c' is numeric, it - is treated as a character and printed. Otherwise, the argument is - assumed to be a string, and the only first character of that - string is printed. - -`%d' -`%i' - A decimal number (the integer part). - -`%e' - A floating point number of the form `[-]d.ddddddE[+-]dd'. - -`%f' - A floating point number of the form [`-']`ddd.dddddd'. - -`%g' - Use `%e' or `%f' conversion, whichever produces a shorter string, - with nonsignificant zeros suppressed. - -`%o' - An unsigned octal number (again, an integer). 
- -`%s' - A character string. - -`%x' - An unsigned hexadecimal number (an integer). - -`%X' - Like `%x', except use `A' through `F' instead of `a' through `f' - for decimal 10 through 15. - -`%%' - A single `%' character; no argument is converted. - - There are optional, additional parameters that may lie between the -`%' and the control letter: - -`-' - The expression should be left-justified within its field. - -`WIDTH' - The field should be padded to this width. If WIDTH has a leading - zero, then the field is padded with zeros. Otherwise it is padded - with blanks. - -`.PREC' - A number indicating the maximum width of strings or digits to the - right of the decimal point. - - Either or both of the WIDTH and PREC values may be specified as `*'. -In that case, the particular value is taken from the argument list. - - *Note Using `printf' Statements for Fancier Printing: Printf, for -examples and for a more detailed description. - - -File: gawk.info, Node: Special File Summary, Next: Numeric Functions Summary, Prev: Printf Summary, Up: Actions Summary - -Special File Names -.................. - - When doing I/O redirection from either `print' or `printf' into a -file, or via `getline' from a file, `gawk' recognizes certain special -file names internally. These file names allow access to open file -descriptors inherited from `gawk''s parent process (usually the shell). -The file names are: - -`/dev/stdin' - The standard input. - -`/dev/stdout' - The standard output. - -`/dev/stderr' - The standard error output. - -`/dev/fd/N' - The file denoted by the open file descriptor N. - - In addition the following files provide process related information -about the running `gawk' program. - -`/dev/pid' - Reading this file returns the process ID of the current process, - in decimal, terminated with a newline. - -`/dev/ppid' - Reading this file returns the parent process ID of the current - process, in decimal, terminated with a newline. 
- -`/dev/pgrpid' - Reading this file returns the process group ID of the current - process, in decimal, terminated with a newline. - -`/dev/user' - Reading this file returns a single record terminated with a - newline. The fields are separated with blanks. The fields - represent the following information: - - `$1' - The value of the `getuid' system call. - - `$2' - The value of the `geteuid' system call. - - `$3' - The value of the `getgid' system call. - - `$4' - The value of the `getegid' system call. - - If there are any additional fields, they are the group IDs - returned by `getgroups' system call. (Multiple groups may not be - supported on all systems.) - -These file names may also be used on the command line to name data -files. These file names are only recognized internally if you do not -actually have files by these names on your system. - - *Note Standard I/O Streams: Special Files, for a longer description -that provides the motivation for this feature. - - -File: gawk.info, Node: Numeric Functions Summary, Next: String Functions Summary, Prev: Special File Summary, Up: Actions Summary - -Numeric Functions -................. - - `awk' has the following predefined arithmetic functions: - -`atan2(Y, X)' - returns the arctangent of Y/X in radians. - -`cos(EXPR)' - returns the cosine in radians. - -`exp(EXPR)' - the exponential function. - -`int(EXPR)' - truncates to integer. - -`log(EXPR)' - the natural logarithm function. - -`rand()' - returns a random number between 0 and 1. - -`sin(EXPR)' - returns the sine in radians. - -`sqrt(EXPR)' - the square root function. - -`srand(EXPR)' - use EXPR as a new seed for the random number generator. If no EXPR - is provided, the time of day is used. The return value is the - previous seed for the random number generator. - - -File: gawk.info, Node: String Functions Summary, Next: Time Functions Summary, Prev: Numeric Functions Summary, Up: Actions Summary - -String Functions -................ 
- - `awk' has the following predefined string functions: - -`gsub(R, S, T)' - for each substring matching the regular expression R in the string - T, substitute the string S, and return the number of substitutions. - If T is not supplied, use `$0'. - -`index(S, T)' - returns the index of the string T in the string S, or 0 if T is - not present. - -`length(S)' - returns the length of the string S. The length of `$0' is - returned if no argument is supplied. - -`match(S, R)' - returns the position in S where the regular expression R occurs, - or 0 if R is not present, and sets the values of `RSTART' and - `RLENGTH'. - -`split(S, A, R)' - splits the string S into the array A on the regular expression R, - and returns the number of fields. If R is omitted, `FS' is used - instead. - -`sprintf(FMT, EXPR-LIST)' - prints EXPR-LIST according to FMT, and returns the resulting - string. - -`sub(R, S, T)' - this is just like `gsub', but only the first matching substring is - replaced. - -`substr(S, I, N)' - returns the N-character substring of S starting at I. If N is - omitted, the rest of S is used. - -`tolower(STR)' - returns a copy of the string STR, with all the upper-case - characters in STR translated to their corresponding lower-case - counterparts. Nonalphabetic characters are left unchanged. - -`toupper(STR)' - returns a copy of the string STR, with all the lower-case - characters in STR translated to their corresponding upper-case - counterparts. Nonalphabetic characters are left unchanged. - -`system(CMD-LINE)' - Execute the command CMD-LINE, and return the exit status. - - -File: gawk.info, Node: Time Functions Summary, Next: String Constants Summary, Prev: String Functions Summary, Up: Actions Summary - -Built-in time functions -....................... - - The following two functions are available for getting the current -time of day, and for formatting time stamps. 
- -`systime()' - returns the current time of day as the number of seconds since a - particular epoch (Midnight, January 1, 1970 UTC, on POSIX systems). - -`strftime(FORMAT, TIMESTAMP)' - formats TIMESTAMP according to the specification in FORMAT. The - current time of day is used if no TIMESTAMP is supplied. *Note - Functions for Dealing with Time Stamps: Time Functions, for the - details on the conversion specifiers that `strftime' accepts. - - -File: gawk.info, Node: String Constants Summary, Prev: Time Functions Summary, Up: Actions Summary - -String Constants -................ - - String constants in `awk' are sequences of characters enclosed -between double quotes (`"'). Within strings, certain "escape sequences" -are recognized, as in C. These are: - -`\\' - A literal backslash. - -`\a' - The "alert" character; usually the ASCII BEL character. - -`\b' - Backspace. - -`\f' - Formfeed. - -`\n' - Newline. - -`\r' - Carriage return. - -`\t' - Horizontal tab. - -`\v' - Vertical tab. - -`\xHEX DIGITS' - The character represented by the string of hexadecimal digits - following the `\x'. As in ANSI C, all following hexadecimal - digits are considered part of the escape sequence. (This feature - should tell us something about language design by committee.) - E.g., `"\x1B"' is a string containing the ASCII ESC (escape) - character. (The `\x' escape sequence is not in POSIX `awk'.) - -`\DDD' - The character represented by the 1-, 2-, or 3-digit sequence of - octal digits. Thus, `"\033"' is also a string containing the - ASCII ESC (escape) character. - -`\C' - The literal character C. - - The escape sequences may also be used inside constant regular -expressions (e.g., the regexp `/[ \t\f\n\r\v]/' matches whitespace -characters). - - *Note Constant Expressions: Constants. 
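Editorial illustration (not part of the deleted file): the escape sequences summarized above can be tried out with a short sketch like the one below. The wrapper name `escape_demo` is hypothetical, and only escapes that the summary lists as standard are used (`\x` is left out because the text notes it is not in POSIX `awk`).

```shell
# Hypothetical helper: counts characters to show how awk reads escapes.
escape_demo() {
    awk 'BEGIN {
        s = "a\tb\\c"            # a, TAB, b, one backslash, c
        print length(s)          # the escapes each count as one character
        esc = "\033"             # octal escape for the ASCII ESC character
        print length(esc)
        # The same escapes work inside a constant regexp.
        print split("one\ttwo three", parts, /[ \t]/)
    }'
}

escape_demo
```

Each `print` reports a count, so the sketch shows that an escape sequence occupies a single character in the resulting string and that `\t` inside a constant regexp matches a real tab.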
- - -File: gawk.info, Node: Functions Summary, Next: Historical Features, Prev: Rules Summary, Up: Gawk Summary - -Functions -========= - - Functions in `awk' are defined as follows: - - function NAME(PARAMETER LIST) { STATEMENTS } - - Actual parameters supplied in the function call are used to -instantiate the formal parameters declared in the function. Arrays are -passed by reference, other variables are passed by value. - - If there are fewer arguments passed than there are names in -PARAMETER-LIST, the extra names are given the null string as value. -Extra names have the effect of local variables. - - The open-parenthesis in a function call of a user-defined function -must immediately follow the function name, without any intervening -white space. This is to avoid a syntactic ambiguity with the -concatenation operator. - - The word `func' may be used in place of `function' (but not in POSIX -`awk'). - - Use the `return' statement to return a value from a function. - - *Note User-defined Functions: User-defined, for a more complete -description. - - -File: gawk.info, Node: Historical Features, Prev: Functions Summary, Up: Gawk Summary - -Historical Features -=================== - - There are two features of historical `awk' implementations that -`gawk' supports. First, it is possible to call the `length' built-in -function not only with no arguments, but even without parentheses! - - a = length - -is the same as either of - - a = length() - a = length($0) - -This feature is marked as "deprecated" in the POSIX standard, and -`gawk' will issue a warning about its use if `-W lint' is specified on -the command line. - - The other feature is the use of the `continue' statement outside the -body of a `while', `for', or `do' loop. Traditional `awk' -implementations have treated such usage as equivalent to the `next' -statement. `gawk' will support this usage if `-W posix' has not been -specified. 
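Editorial illustration (not part of the deleted file): a minimal sketch of the function rules described in the Functions summary above. The names `func_demo`, `sum`, and `double_all` are made up for the example. It assumes only what the summary states: extra parameter names behave as local variables, arrays are passed by reference, and the open-parenthesis of a call directly follows the function name.

```shell
# Hypothetical helper wrapping a small awk program.
func_demo() {
    awk '
    # "i" and "total" are extra parameters, so they act as local variables.
    function sum(arr, n,    i, total) {
        for (i = 1; i <= n; i++)
            total += arr[i]
        return total
    }
    # Arrays are passed by reference, so the caller sees these changes.
    function double_all(arr, n,    i) {
        for (i = 1; i <= n; i++)
            arr[i] *= 2
    }
    BEGIN {
        n = split("1 2 3", v, " ")
        double_all(v, n)         # no space between the name and "("
        print sum(v, n)
    }'
}

func_demo
```

The extra spaces before `i` and `total` in the parameter lists are only a common convention for marking locals; `awk` itself does not require them.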
- - -File: gawk.info, Node: Sample Program, Next: Bugs, Prev: Gawk Summary, Up: Top - -Sample Program -************** - - The following example is a complete `awk' program, which prints the -number of occurrences of each word in its input. It illustrates the -associative nature of `awk' arrays by using strings as subscripts. It -also demonstrates the `for X in ARRAY' construction. Finally, it shows -how `awk' can be used in conjunction with other utility programs to do -a useful task of some complexity with a minimum of effort. Some -explanations follow the program listing. - - awk ' - # Print list of word frequencies - { - for (i = 1; i <= NF; i++) - freq[$i]++ - } - - END { - for (word in freq) - printf "%s\t%d\n", word, freq[word] - }' - - The first thing to notice about this program is that it has two -rules. The first rule, because it has an empty pattern, is executed on -every line of the input. It uses `awk''s field-accessing mechanism -(*note Examining Fields: Fields.) to pick out the individual words from -the line, and the built-in variable `NF' (*note Built-in Variables::.) -to know how many fields are available. - - For each input word, an element of the array `freq' is incremented to -reflect that the word has been seen an additional time. - - The second rule, because it has the pattern `END', is not executed -until the input has been exhausted. It prints out the contents of the -`freq' table that has been built up inside the first action. - - Note that this program has several problems that would prevent it -from being useful by itself on real text files: - - * Words are detected using the `awk' convention that fields are - separated by whitespace and that other characters in the input - (except newlines) don't have any special meaning to `awk'. This - means that punctuation characters count as part of words. - - * The `awk' language considers upper and lower case characters to be - distinct. 
Therefore, `foo' and `Foo' are not treated by this - program as the same word. This is undesirable since in normal - text, words are capitalized if they begin sentences, and a - frequency analyzer should not be sensitive to that. - - * The output does not come out in any useful order. You're more - likely to be interested in which words occur most frequently, or - having an alphabetized table of how frequently each word occurs. - - The way to solve these problems is to use some of the more advanced -features of the `awk' language. First, we use `tolower' to remove case -distinctions. Next, we use `gsub' to remove punctuation characters. -Finally, we use the system `sort' utility to process the output of the -`awk' script. First, here is the new version of the program: - - awk ' - # Print list of word frequencies - { - $0 = tolower($0) # remove case distinctions - gsub(/[^a-z0-9_ \t]/, "", $0) # remove punctuation - for (i = 1; i <= NF; i++) - freq[$i]++ - } - - END { - for (word in freq) - printf "%s\t%d\n", word, freq[word] - }' - - Assuming we have saved this program in a file named `frequency.awk', -and that the data is in `file1', the following pipeline - - awk -f frequency.awk file1 | sort +1 -nr - -produces a table of the words appearing in `file1' in order of -decreasing frequency. - - The `awk' program suitably massages the data and produces a word -frequency table, which is not ordered. - - The `awk' script's output is then sorted by the `sort' command and -printed on the terminal. The options given to `sort' in this example -specify to sort using the second field of each input line (skipping one -field), that the sort keys should be treated as numeric quantities -(otherwise `15' would come before `5'), and that the sorting should be -done in descending (reverse) order. 
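Editorial illustration (not part of the deleted file): the pipeline above uses the older `sort +1 -nr` key syntax, which some current `sort` implementations may reject. The sketch below is one way to express the same idea with the `-k` key option, assuming a POSIX-style `sort`. The wrapper name `word_freq` is hypothetical; the `awk` program is the improved one from the text.

```shell
# Hypothetical helper: word frequencies, most frequent first.
word_freq() {
    awk '
    {
        $0 = tolower($0)                  # remove case distinctions
        gsub(/[^a-z0-9_ \t]/, "", $0)     # remove punctuation
        for (i = 1; i <= NF; i++)
            freq[$i]++
    }
    END {
        for (word in freq)
            printf "%s\t%d\n", word, freq[word]
    }' "$@" | sort -k 2,2nr               # field 2, numeric, reversed
}

printf 'The cat. the dog the\ncat\n' | word_freq
```

Here `-k 2,2nr` plays the role that `+1 -nr` plays in the original: it selects the second field as the key, compares it numerically, and reverses the order. Words with equal counts come out in whatever order `sort` uses for ties.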
- - We could have even done the `sort' from within the program, by -changing the `END' action to: - - END { - sort = "sort +1 -nr" - for (word in freq) - printf "%s\t%d\n", word, freq[word] | sort - close(sort) - }' - - See the general operating system documentation for more information -on how to use the `sort' command. - - -File: gawk.info, Node: Bugs, Next: Notes, Prev: Sample Program, Up: Top - -Reporting Problems and Bugs -*************************** - - If you have problems with `gawk' or think that you have found a bug, -please report it to the developers; we cannot promise to do anything -but we might well want to fix it. - - Before reporting a bug, make sure you have actually found a real bug. -Carefully reread the documentation and see if it really says you can do -what you're trying to do. If it's not clear whether you should be able -to do something or not, report that too; it's a bug in the -documentation! - - Before reporting a bug or trying to fix it yourself, try to isolate -it to the smallest possible `awk' program and input data file that -reproduces the problem. Then send us the program and data file, some -idea of what kind of Unix system you're using, and the exact results -`gawk' gave you. Also say what you expected to occur; this will help -us decide whether the problem was really in the documentation. - - Once you have a precise problem, send e-mail to (Internet) -`bug-gnu-utils@prep.ai.mit.edu' or (UUCP) -`mit-eddie!prep.ai.mit.edu!bug-gnu-utils'. Please include the version -number of `gawk' you are using. You can get this information with the -command `gawk -W version '{}' /dev/null'. You should send carbon -copies of your mail to David Trueman at `david@cs.dal.ca', and to -Arnold Robbins, who can be reached at `arnold@skeeve.atl.ga.us'. David -is most likely to fix code problems, while Arnold is most likely to fix -documentation problems. - - Non-bug suggestions are always welcome as well. 
If you have -questions about things that are unclear in the documentation or are -just obscure features, ask Arnold Robbins; he will try to help you out, -although he may not have the time to fix the problem. You can send him -electronic mail at the Internet address above. - - If you find bugs in one of the non-Unix ports of `gawk', please send -an electronic mail message to the person who maintains that port. They -are listed below, and also in the `README' file in the `gawk' -distribution. Information in the `README' file should be considered -authoritative if it conflicts with this manual. - - The people maintaining the non-Unix ports of `gawk' are: - -MS-DOS - The port to MS-DOS is maintained by Scott Deifik. His electronic - mail address is `scottd@amgen.com'. - -VMS - The port to VAX VMS is maintained by Pat Rankin. His electronic - mail address is `rankin@eql.caltech.edu'. - -Atari ST - The port to the Atari ST is maintained by Michal Jaegermann. His - electronic mail address is `ntomczak@vm.ucs.ualberta.ca'. - - If your bug is also reproducible under Unix, please send copies of -your report to the general GNU bug list, as well as to Arnold Robbins -and David Trueman, at the addresses listed above. - - -File: gawk.info, Node: Notes, Next: Glossary, Prev: Bugs, Up: Top - -Implementation Notes -******************** - - This appendix contains information mainly of interest to -implementors and maintainers of `gawk'. Everything in it applies -specifically to `gawk', and not to other implementations. - -* Menu: - -* Compatibility Mode:: How to disable certain `gawk' extensions. -* Future Extensions:: New features we may implement soon. -* Improvements:: Suggestions for improvements by volunteers. 
- - -File: gawk.info, Node: Compatibility Mode, Next: Future Extensions, Prev: Notes, Up: Notes - -Downward Compatibility and Debugging -==================================== - - *Note Extensions in `gawk' not in POSIX `awk': POSIX/GNU, for a -summary of the GNU extensions to the `awk' language and program. All -of these features can be turned off by invoking `gawk' with the `-W -compat' option, or with the `-W posix' option. - - If `gawk' is compiled for debugging with `-DDEBUG', then there is -one more option available on the command line: - -`-W parsedebug' - Print out the parse stack information as the program is being - parsed. - - This option is intended only for serious `gawk' developers, and not -for the casual user. It probably has not even been compiled into your -version of `gawk', since it slows down execution. - - -File: gawk.info, Node: Future Extensions, Next: Improvements, Prev: Compatibility Mode, Up: Notes - -Probable Future Extensions -========================== - - This section briefly lists extensions that indicate the directions -we are currently considering for `gawk'. The file `FUTURES' in the -`gawk' distributions lists these extensions, as well as several others. - -`RS' as a regexp - The meaning of `RS' may be generalized along the lines of `FS'. - -Control of subprocess environment - Changes made in `gawk' to the array `ENVIRON' may be propagated to - subprocesses run by `gawk'. - -Databases - It may be possible to map a GDBM/NDBM/SDBM file into an `awk' - array. - -Single-character fields - The null string, `""', as a field separator, will cause field - splitting and the `split' function to separate individual - characters. Thus, `split(a, "abcd", "")' would yield `a[1] == - "a"', `a[2] == "b"', and so on. - -More `lint' warnings - There are more things that could be checked for portability. - -`RECLEN' variable for fixed length records - Along with `FIELDWIDTHS', this would speed up the processing of - fixed-length records. 
- -`RT' variable to hold the record terminator - It is occasionally useful to have access to the actual string of - characters that matched the `RS' variable. The `RT' variable - would hold these characters. - -A `restart' keyword - After modifying `$0', `restart' would restart the pattern matching - loop, without reading a new record from the input. - -A `|&' redirection - The `|&' redirection, in place of `|', would open a two-way - pipeline for communication with a sub-process (via `getline' and - `print' and `printf'). - -`IGNORECASE' affecting all comparisons - The effects of the `IGNORECASE' variable may be generalized to all - string comparisons, and not just regular expression operations. - -A way to mix command line source code and library files - There may be a new option that would make it possible to easily - use library functions from a program entered on the command line. - -GNU-style long options - We will add GNU-style long options to `gawk' for compatibility - with other GNU programs. (For example, `--field-separator=:' - would be equivalent to `-F:'.) - - -File: gawk.info, Node: Improvements, Prev: Future Extensions, Up: Notes - -Suggestions for Improvements -============================ - - Here are some projects that would-be `gawk' hackers might like to -take on. They vary in size from a few days to a few weeks of -programming, depending on which one you choose and how fast a -programmer you are. Please send any improvements you write to the -maintainers at the GNU project. - - 1. Compilation of `awk' programs: `gawk' uses a Bison (YACC-like) - parser to convert the script given it into a syntax tree; the - syntax tree is then executed by a simple recursive evaluator. - This method incurs a lot of overhead, since the recursive - evaluator performs many procedure calls to do even the simplest - things. 
- - It should be possible for `gawk' to convert the script's parse tree - into a C program which the user would then compile, using the - normal C compiler and a special `gawk' library to provide all the - needed functions (regexps, fields, associative arrays, type - coercion, and so on). - - An easier possibility might be for an intermediate phase of `awk' - to convert the parse tree into a linear byte code form like the - one used in GNU Emacs Lisp. The recursive evaluator would then be - replaced by a straight line byte code interpreter that would be - intermediate in speed between running a compiled program and doing - what `gawk' does now. - - This may actually happen for the 3.0 version of `gawk'. - - 2. An error message section has not been included in this version of - the manual. Perhaps some nice beta testers will document some of - the messages for the future. - - 3. The programs in the test suite could use documenting in this - manual. - - 4. The programs and data files in the manual should be available in - separate files to facilitate experimentation. - - 5. See the `FUTURES' file for more ideas. Contact us if you would - seriously like to tackle any of the items listed there. - - -File: gawk.info, Node: Glossary, Next: Index, Prev: Notes, Up: Top - -Glossary -******** - -Action - A series of `awk' statements attached to a rule. If the rule's - pattern matches an input record, the `awk' language executes the - rule's action. Actions are always enclosed in curly braces. - *Note Overview of Actions: Actions. - -Amazing `awk' Assembler - Henry Spencer at the University of Toronto wrote a retargetable - assembler completely as `awk' scripts. It is thousands of lines - long, including machine descriptions for several 8-bit - microcomputers. It is a good example of a program that would have - been better written in another language. - -ANSI - The American National Standards Institute. 
This organization - produces many standards, among them the standard for the C - programming language. - -Assignment - An `awk' expression that changes the value of some `awk' variable - or data object. An object that you can assign to is called an - "lvalue". *Note Assignment Expressions: Assignment Ops. - -`awk' Language - The language in which `awk' programs are written. - -`awk' Program - An `awk' program consists of a series of "patterns" and "actions", - collectively known as "rules". For each input record given to the - program, the program's rules are all processed in turn. `awk' - programs may also contain function definitions. - -`awk' Script - Another name for an `awk' program. - -Built-in Function - The `awk' language provides built-in functions that perform various - numerical, time stamp related, and string computations. Examples - are `sqrt' (for the square root of a number) and `substr' (for a - substring of a string). *Note Built-in Functions: Built-in. - -Built-in Variable - `ARGC', `ARGIND', `ARGV', `CONVFMT', `ENVIRON', `ERRNO', - `FIELDWIDTHS', `FILENAME', `FNR', `FS', `IGNORECASE', `NF', `NR', - `OFMT', `OFS', `ORS', `RLENGTH', `RSTART', `RS', and `SUBSEP', are - the variables that have special meaning to `awk'. Changing some - of them affects `awk''s running environment. *Note Built-in - Variables::. - -Braces - See "Curly Braces." - -C - The system programming language that most GNU software is written - in. The `awk' programming language has C-like syntax, and this - manual points out similarities between `awk' and C when - appropriate. - -CHEM - A preprocessor for `pic' that reads descriptions of molecules and - produces `pic' input for drawing them. It was written by Brian - Kernighan, and is available from `netlib@research.att.com'. - -Compound Statement - A series of `awk' statements, enclosed in curly braces. Compound - statements may be nested. *Note Control Statements in Actions: - Statements. 
- -Concatenation - Concatenating two strings means sticking them together, one after - another, giving a new string. For example, the string `foo' - concatenated with the string `bar' gives the string `foobar'. - *Note String Concatenation: Concatenation. - -Conditional Expression - An expression using the `?:' ternary operator, such as `EXPR1 ? - EXPR2 : EXPR3'. The expression EXPR1 is evaluated; if the result - is true, the value of the whole expression is the value of EXPR2; - otherwise the value is EXPR3. In either case, only one of EXPR2 - and EXPR3 is evaluated. *Note Conditional Expressions: - Conditional Exp. - -Constant Regular Expression - A constant regular expression is a regular expression written - within slashes, such as `/foo/'. This regular expression is chosen - when you write the `awk' program, and cannot be changed during its - execution. *Note How to Use Regular Expressions: Regexp Usage. - -Comparison Expression - A relation that is either true or false, such as `(a < b)'. - Comparison expressions are used in `if', `while', and `for' - statements, and in patterns to select which input records to - process. *Note Comparison Expressions: Comparison Ops. - -Curly Braces - The characters `{' and `}'. Curly braces are used in `awk' for - delimiting actions, compound statements, and function bodies. - -Data Objects - These are numbers and strings of characters. Numbers are - converted into strings and vice versa, as needed. *Note - Conversion of Strings and Numbers: Conversion. - -Dynamic Regular Expression - A dynamic regular expression is a regular expression written as an - ordinary expression. It could be a string constant, such as - `"foo"', but it may also be an expression whose value may vary. - *Note How to Use Regular Expressions: Regexp Usage. - -Escape Sequences - A special sequence of characters used for describing nonprinting - characters, such as `\n' for newline, or `\033' for the ASCII ESC - (escape) character.
*Note Constant Expressions: Constants. - -Field - When `awk' reads an input record, it splits the record into pieces - separated by whitespace (or by a separator regexp which you can - change by setting the built-in variable `FS'). Such pieces are - called fields. If the pieces are of fixed length, you can use the - built-in variable `FIELDWIDTHS' to describe their lengths. *Note - How Input is Split into Records: Records. - -Format - Format strings are used to control the appearance of output in the - `printf' statement. Also, data conversions from numbers to strings - are controlled by the format string contained in the built-in - variable `CONVFMT'. *Note Format-Control Letters: Control Letters. - -Function - A specialized group of statements often used to encapsulate general - or program-specific tasks. `awk' has a number of built-in - functions, and also allows you to define your own. *Note Built-in - Functions: Built-in. Also, see *Note User-defined Functions: - User-defined. - -`gawk' - The GNU implementation of `awk'. - -GNU - "GNU's not Unix". An on-going project of the Free Software - Foundation to create a complete, freely distributable, - POSIX-compliant computing environment. - -Input Record - A single chunk of data read in by `awk'. Usually, an `awk' input - record consists of one line of text. *Note How Input is Split - into Records: Records. - -Keyword - In the `awk' language, a keyword is a word that has special - meaning. Keywords are reserved and may not be used as variable - names. - - `awk''s keywords are: `if', `else', `while', `do...while', `for', - `for...in', `break', `continue', `delete', `next', `function', - `func', and `exit'. - -Lvalue - An expression that can appear on the left side of an assignment - operator. In most languages, lvalues can be variables or array - elements. In `awk', a field designator can also be used as an - lvalue. - -Number - A numeric valued data object. 
The `gawk' implementation uses - double precision floating point to represent numbers. - -Pattern - Patterns tell `awk' which input records are interesting to which - rules. - - A pattern is an arbitrary conditional expression against which - input is tested. If the condition is satisfied, the pattern is - said to "match" the input record. A typical pattern might compare - the input record against a regular expression. *Note Patterns::. - -POSIX - The name for a series of standards being developed by the IEEE - that specify a Portable Operating System interface. The "IX" - denotes the Unix heritage of these standards. The main standard - of interest for `awk' users is P1003.2, the Command Language and - Utilities standard. - -Range (of input lines) - A sequence of consecutive lines from the input file. A pattern - can specify ranges of input lines for `awk' to process, or it can - specify single lines. *Note Patterns::. - -Recursion - When a function calls itself, either directly or indirectly. If - this isn't clear, refer to the entry for "recursion." - -Redirection - Redirection means performing input from other than the standard - input stream, or output to other than the standard output stream. - - You can redirect the output of the `print' and `printf' statements - to a file or a system command, using the `>', `>>', and `|' - operators. You can redirect input to the `getline' statement using - the `<' and `|' operators. *Note Redirecting Output of `print' - and `printf': Redirection. - -Regular Expression - See "regexp." - -Regexp - Short for "regular expression". A regexp is a pattern that - denotes a set of strings, possibly an infinite set. For example, - the regexp `R.*xp' matches any string starting with the letter `R' - and ending with the letters `xp'. In `awk', regexps are used in - patterns and in conditional expressions. Regexps may contain - escape sequences. *Note Regular Expressions as Patterns: Regexp. 
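The `R.*xp' example from the Regexp entry can be checked directly. The sketch below assumes a POSIX `awk'; the function name `demo_regexp' and the four input lines are made up for illustration. It also adds the anchors `^' and `$', which are not part of the glossary's example, so that the whole record has to start with `R' and end with `xp'. Without them the regexp would also match records that merely contain such a substring.

```shell
# Hypothetical helper: prints the records that start with R and end
# with xp. The regexp is used as the pattern of a single rule.
demo_regexp() {
    printf 'Regexp\nRegular exp\nregexp\nRxp\n' |
        awk '/^R.*xp$/ { print "match:", $0 }'
}

demo_regexp
```

The third record is skipped because regexp matching is case-sensitive unless `IGNORECASE' is set in `gawk'.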
- -Rule - A segment of an `awk' program that specifies how to process single - input records. A rule consists of a "pattern" and an "action". - `awk' reads an input record; then, for each rule, if the input - record satisfies the rule's pattern, `awk' executes the rule's - action. Otherwise, the rule does nothing for that input record. - -Side Effect - A side effect occurs when an expression has an effect aside from - merely producing a value. Assignment expressions, increment - expressions and function calls have side effects. *Note - Assignment Expressions: Assignment Ops. - -Special File - A file name interpreted internally by `gawk', instead of being - handed directly to the underlying operating system. For example, - `/dev/stdin'. *Note Standard I/O Streams: Special Files. - -Stream Editor - A program that reads records from an input stream and processes - them one or more at a time. This is in contrast with batch - programs, which may expect to read their input files in their entirety - before starting to do anything, and with interactive programs, - which require input from the user. - -String - A datum consisting of a sequence of characters, such as `I am a - string'. Constant strings are written with double-quotes in the - `awk' language, and may contain escape sequences. *Note Constant - Expressions: Constants. - -Whitespace - A sequence of blank or tab characters occurring inside an input - record or a string. - |
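The entries for Rule, Side Effect, and Whitespace fit together in a two-rule program. This is a rough sketch under the assumption of a POSIX `awk'; the function name `demo_rules' and the input text are hypothetical. The first rule pairs a pattern with an action whose only purpose is its side effect. The second runs once, after the last input record.

```shell
# Hypothetical helper: counts records that have at least one field.
# Fields are separated by whitespace (blanks and tabs) by default.
demo_rules() {
    printf 'a  b\tc\n\nd e\n' | awk '
        NF > 0 { count++ }    # pattern NF > 0; the action is a side effect
        END    { print count, "non-empty records" }'
}

demo_rules
```

The empty second record has no fields, so its pattern is false and the first rule does nothing for it.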