-rw-r--r-- | ChangeLog | 7
-rw-r--r-- | doc/ChangeLog | 6
-rw-r--r-- | doc/gawk.info | 1413
-rw-r--r-- | doc/gawk.texi | 215
-rw-r--r-- | doc/gawktexi.in | 215
-rw-r--r-- | field.c | 32
-rw-r--r-- | support/ChangeLog | 4
-rw-r--r-- | support/dfa.c | 24
-rw-r--r-- | support/dfa.h | 2
-rw-r--r-- | support/intprops.h | 79
-rw-r--r-- | support/verify.h | 9
-rw-r--r-- | test/ChangeLog | 6
-rw-r--r-- | test/Makefile.am | 12
-rw-r--r-- | test/Makefile.in | 27
-rw-r--r-- | test/Maketests | 15
-rw-r--r-- | test/fwtest5.awk | 2
-rw-r--r-- | test/fwtest5.in | 4
-rw-r--r-- | test/fwtest5.ok | 4
-rw-r--r-- | test/fwtest6.awk | 4
-rw-r--r-- | test/fwtest6.in | 1
-rw-r--r-- | test/fwtest6.ok | 3
-rw-r--r-- | test/fwtest7.awk | 2
-rw-r--r-- | test/fwtest7.in | 1
-rw-r--r-- | test/fwtest7.ok | 1
24 files changed, 1239 insertions, 849 deletions
@@ -1,3 +1,10 @@
+2017-05-23 Arnold D. Robbins <arnold@skeeve.com>
+
+	* field.c (fw_parse_field): Stop upon hitting the end of the
+	record; this enables correct counting of the number of fields.
+	(set_FIELDWIDTHS): Add `*' at end as meaning ``all the rest
+	of the data on the line.'' Allow skip:* as well.
+
 2017-05-20 Arnold D. Robbins <arnold@skeeve.com>
 
 	* awkgram.y (add_lint): Make ``no effect'' check smarter about
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 567fa990..d351a3f1 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,9 @@
+2017-05-22 Arnold D. Robbins <arnold@skeeve.com>
+
+	* gawktexi.in: Document FIELDWIDTHS much better, including how
+	it works in corner cases. Some general organizational improvements
+	in this chunk of text.
+
 2017-04-23 Arnold D. Robbins <arnold@skeeve.com>
 
 	* gawktexi.in: Improve documentation of --source option.
diff --git a/doc/gawk.info b/doc/gawk.info
index 3779fa95..b57abbd3 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -196,7 +196,13 @@ in (a) below. A copy of the license is included in the section entitled
                                         field.
 * Field Splitting Summary::             Some final points and a summary table.
 * Constant Size::                       Reading constant width data.
+* Fixed width data::                    Processing fixed-width data.
+* Skipping intervening::                Skipping intervening fields.
+* Allowing trailing data::              Capturing optional trailing data.
+* Fields with fixed data::              Field values with fixed-width data.
 * Splitting By Content::                Defining Fields By Content
+* Testing field creation::              Checking how 'gawk' is
+                                        splitting records.
 * Multiple Line::                       Reading multiline records.
 * Getline::                             Reading files under explicit
                                         program control using the 'getline'
@@ -4228,6 +4234,8 @@ be named on the 'awk' command line (*note Getline::).
 * Field Separators::            The field separator and how to change it.
 * Constant Size::               Reading constant width data.
 * Splitting By Content::        Defining Fields By Content
+* Testing field creation::      Checking how 'gawk' is splitting
+                                records.
 * Multiple Line::               Reading multiline records.
 * Getline::                     Reading files under explicit program
                                 control using the 'getline' function.
@@ -5124,10 +5132,25 @@ This minor node discusses an advanced feature of 'gawk'. If you are a
 novice 'awk' user, you might want to skip it on the first reading.
 
    'gawk' provides a facility for dealing with fixed-width fields with
-no distinctive field separator. For example, data of this nature arises
-in the input for old Fortran programs where numbers are run together, or
-in the output of programs that did not anticipate the use of their
-output as input for other programs.
+no distinctive field separator. We discuss this feature in the
+following nodes.
+
+* Menu:
+
+* Fixed width data::          Processing fixed-width data.
+* Skipping intervening::      Skipping intervening fields.
+* Allowing trailing data::    Capturing optional trailing data.
+* Fields with fixed data::    Field values with fixed-width data.
+
+
+File: gawk.info, Node: Fixed width data, Next: Skipping intervening, Up: Constant Size
+
+4.6.1 Processing Fixed-Width Data
+---------------------------------
+
+An example of fixed-width data would be the input for old Fortran
+programs where numbers are run together, or the output of programs that
+did not anticipate the use of their output as input for other programs.
 
    An example of the latter is a table where all the columns are lined up
 by the use of a variable number of spaces and _empty fields are just
@@ -5141,12 +5164,11 @@ by assigning a string containing space-separated numbers to the built-in
 variable 'FIELDWIDTHS'. Each number specifies the width of the field,
 _including_ columns between fields. If you want to ignore the columns
 between fields, you can specify the width as a separate field that is
-subsequently ignored. Or, starting in version 4.2, each field width may
-optionally be preceded by a colon-separated value specifying the number
-of characters to skip before the field starts. It is a fatal error to
-supply a field width that has a negative value. The following data is
-the output of the Unix 'w' utility. It is useful to illustrate the use
-of 'FIELDWIDTHS':
+subsequently ignored. It is a fatal error to supply a field width that
+has a negative value.
+
+   The following data is the output of the Unix 'w' utility. It is
+useful to illustrate the use of 'FIELDWIDTHS':
 
      10:06pm up 21 days, 14:04, 23 users
      User tty login idle JCPU PCPU what
@@ -5169,7 +5191,7 @@ calculated idle time:
          sub(/^ +/, "", idle) # strip leading spaces
          if (idle == "")
              idle = 0
-         if (idle ~ /:/) {
+         if (idle ~ /:/) { # hh:mm
              split(idle, t, ":")
              idle = t[1] * 60 + t[2]
          }
@@ -5193,11 +5215,31 @@ calculated idle time:
      brent ttyp0 286
      dave ttyq4 1296000
 
-   Starting in version 4.2, this program could be rewritten to specify
-'FIELDWIDTHS' like so:
+   Another (possibly more practical) example of fixed-width input data
+is the input from a deck of balloting cards. In some parts of the
+United States, voters mark their choices by punching holes in computer
+cards. These cards are then processed to count the votes for any
+particular candidate or on any particular issue. Because a voter may
+choose not to vote on some issue, any column on the card may be empty.
+An 'awk' program for processing such data could use the 'FIELDWIDTHS'
+feature to simplify reading the data. (Of course, getting 'gawk' to run
+on a system with card readers is another story!)
+
+
+File: gawk.info, Node: Skipping intervening, Next: Allowing trailing data, Prev: Fixed width data, Up: Constant Size
+
+4.6.2 Skipping Intervening Fields
+---------------------------------
+
+Starting in version 4.2, each field width may optionally be preceded by
+a colon-separated value specifying the number of characters to skip
+before the field starts. Thus, the preceding program could be rewritten
+to specify 'FIELDWIDTHS' like so:
 
+     BEGIN { FIELDWIDTHS = "8 1:5 4:7 6 1:6 1:6 2:33" }
+
    This strips away some of the white space separating the fields. With
-such a change, the program would produce the following results:
+such a change, the program produces the following results:
 
      hzang ttyV3 50
      eklye ttyV5 0
@@ -5207,39 +5249,68 @@ such a change, the program would produce the following results:
      brent ttyp0 286
      dave ttyq4 1296000
 
-   Another (possibly more practical) example of fixed-width input data
-is the input from a deck of balloting cards. In some parts of the
-United States, voters mark their choices by punching holes in computer
-cards. These cards are then processed to count the votes for any
-particular candidate or on any particular issue. Because a voter may
-choose not to vote on some issue, any column on the card may be empty.
-An 'awk' program for processing such data could use the 'FIELDWIDTHS'
-feature to simplify reading the data. (Of course, getting 'gawk' to run
-on a system with card readers is another story!)
+
+File: gawk.info, Node: Allowing trailing data, Next: Fields with fixed data, Prev: Skipping intervening, Up: Constant Size
 
-   Assigning a value to 'FS' causes 'gawk' to use 'FS' for field
-splitting again. Use 'FS = FS' to make this happen, without having to
-know the current value of 'FS'. In order to tell which kind of field
-splitting is in effect, use 'PROCINFO["FS"]' (*note Auto-set::). The
-value is '"FS"' if regular field splitting is being used, or
-'"FIELDWIDTHS"' if fixed-width field splitting is being used:
+4.6.3 Capturing Optional Trailing Data
+--------------------------------------
 
-     if (PROCINFO["FS"] == "FS")
-         REGULAR FIELD SPLITTING ...
-     else if (PROCINFO["FS"] == "FIELDWIDTHS")
-         FIXED-WIDTH FIELD SPLITTING ...
-     else if (PROCINFO["FS"] == "FPAT")
-         CONTENT-BASED FIELD SPLITTING ... (see next minor node)
-     else
-         API INPUT PARSER FIELD SPLITTING ... (advanced feature)
+There are times when fixed-width data may be followed by additional data
+that has no fixed length. Such data may or may not be present, but if
+it is, it should be possible to get at it from an 'awk' program.
 
-   This information is useful when writing a function that needs to
-temporarily change 'FS' or 'FIELDWIDTHS', read some records, and then
-restore the original settings (*note Passwd Functions:: for an example
-of such a function).
+   Starting with version 4.2, in order to provide a way to say "anything
+else in the record after the defined fields," 'gawk' allows you to add a
+final '*' character to the value of 'FIELDWIDTHS'. There can only be
+one such character, and it must be the final non-whitespace character in
+'FIELDWIDTHS'. For example:
+
+     $ cat fw.awk                         Show the program
+     -| BEGIN { FIELDWIDTHS = "2 2 *" }
+     -| { print NF, $1, $2, $3 }
+     $ cat fw.in                          Show sample input
+     -| 1234abcdefghi
+     $ gawk -f fw.awk fw.in               Run the program
+     -| 3 12 34 abcdefghi
 
 
-File: gawk.info, Node: Splitting By Content, Next: Multiple Line, Prev: Constant Size, Up: Reading Files
+File: gawk.info, Node: Fields with fixed data, Prev: Allowing trailing data, Up: Constant Size
+
+4.6.4 Field Values With Fixed-Width Data
+----------------------------------------
+
+So far, so good. But what happens if there isn't as much data as there
+should be based on the contents of 'FIELDWIDTHS'? Or, what happens if
+there is more data than expected?
+
+   For many years, what happens in these cases was not well defined.
+Starting with version 4.2, the rules are as follows:
+
+Enough data for some fields
+     For example, if 'FIELDWIDTHS' is set to '"2 3 4"' and the input
+     record is 'aabbb'. In this case, 'NF' is set to two.
+
+Not enough data for a field
+     For example, if 'FIELDWIDTHS' is set to '"2 3 4"' and the input
+     record is 'aab'. In this case, 'NF' is set to two and '$2' has the
+     value '"b"'. The idea is that even though there aren't as many
+     characters as were expected, there are some, so the data should be
+     made available to the program.
+
+Too much data
+     For example, if 'FIELDWIDTHS' is set to '"2 3 4"' and the input
+     record is 'aabbbccccddd'. In this case, 'NF' is set to three and
+     the extra characters ('ddd') are ignored. If you want 'gawk' to
+     capture the extra characters, supply a final '*' in the value of
+     'FIELDWIDTHS'.
+
+Too much data, but with '*' supplied
+     For example, if 'FIELDWIDTHS' is set to '"2 3 4 *"' and the input
+     record is 'aabbbccccddd'. In this case, 'NF' is set to four, and
+     '$4' has the value '"ddd"'.
+
+
+File: gawk.info, Node: Splitting By Content, Next: Testing field creation, Prev: Constant Size, Up: Reading Files
 
 4.7 Defining Fields by Content
 ==============================
@@ -5315,9 +5386,7 @@ would be to remove the quotes when they occur, with something like this:
 affects field splitting with 'FPAT'.
 
    Assigning a value to 'FPAT' overrides field splitting with 'FS' and
-with 'FIELDWIDTHS'. Similar to 'FIELDWIDTHS', the value of
-'PROCINFO["FS"]' will be '"FPAT"' if content-based field splitting is
-being used.
+with 'FIELDWIDTHS'.
 
      NOTE: Some programs export CSV data that contains embedded
      newlines between the double quotes. 'gawk' provides no way to deal with
@@ -5335,23 +5404,53 @@ contain at least one character. A straightforward modification
    Finally, the 'patsplit()' function makes the same functionality
 available for splitting regular strings (*note String Functions::).
 
-   To recap, 'gawk' provides three independent methods to split input
+   ---------- Footnotes ----------
+
+   (1) The CSV format lacked a formal standard definition for many
+years. RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt) standardizes the
+most common practices.
+
+
+File: gawk.info, Node: Testing field creation, Next: Multiple Line, Prev: Splitting By Content, Up: Reading Files
+
+4.8 Checking How 'gawk' Is Splitting Records
+============================================
+
+As we've seen, 'gawk' provides three independent methods to split input
 records into fields. The mechanism used is based on which of the three
 variables--'FS', 'FIELDWIDTHS', or 'FPAT'--was last assigned to. In
 addition, an API input parser may choose to override the record parsing
 mechanism; please refer to *note Input Parsers:: for further information
 about this feature.
 
-   ---------- Footnotes ----------
+   To restore normal field splitting after using 'FIELDWIDTHS' and/or
+'FPAT', simply assign a value to 'FS'. You can use 'FS = FS' to do
+this, without having to know the current value of 'FS'.
 
-   (1) The CSV format lacked a formal standard definition for many
-years. RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt) standardizes the
-most common practices.
+   In order to tell which kind of field splitting is in effect, use
+'PROCINFO["FS"]' (*note Auto-set::). The value is '"FS"' if regular
+field splitting is being used, '"FIELDWIDTHS"' if fixed-width field
+splitting is being used, or '"FPAT"' if content-based field splitting is
+being used:
+
+     if (PROCINFO["FS"] == "FS")
+         REGULAR FIELD SPLITTING ...
+     else if (PROCINFO["FS"] == "FIELDWIDTHS")
+         FIXED-WIDTH FIELD SPLITTING ...
+     else if (PROCINFO["FS"] == "FPAT")
+         CONTENT-BASED FIELD SPLITTING
+     else
+         API INPUT PARSER FIELD SPLITTING ... (advanced feature)
+
+   This information is useful when writing a function that needs to
+temporarily change 'FS' or 'FIELDWIDTHS', read some records, and then
+restore the original settings (*note Passwd Functions:: for an example
+of such a function).
 
 
-File: gawk.info, Node: Multiple Line, Next: Getline, Prev: Splitting By Content, Up: Reading Files
+File: gawk.info, Node: Multiple Line, Next: Getline, Prev: Testing field creation, Up: Reading Files
 
-4.8 Multiple-Line Records
+4.9 Multiple-Line Records
 =========================
 
 In some databases, a single line cannot conveniently hold all the
@@ -5491,8 +5590,8 @@ separator of a single space: 'FS = " "'.
 
 File: gawk.info, Node: Getline, Next: Read Timeout, Prev: Multiple Line, Up: Reading Files
 
-4.9 Explicit Input with 'getline'
-=================================
+4.10 Explicit Input with 'getline'
+==================================
 
 So far we have been getting our input data from 'awk''s main input
 stream--either the standard input (usually your keyboard, sometimes the
@@ -5543,8 +5642,8 @@ represents a shell command.
 
 File: gawk.info, Node: Plain Getline, Next: Getline/Variable, Up: Getline
 
-4.9.1 Using 'getline' with No Arguments
----------------------------------------
+4.10.1 Using 'getline' with No Arguments
+----------------------------------------
 
 The 'getline' command can be used without arguments to read input from
 the current input file. All it does in this case is read the next input
@@ -5604,8 +5703,8 @@ the value of '$0'.
 
 File: gawk.info, Node: Getline/Variable, Next: Getline/File, Prev: Plain Getline, Up: Getline
 
-4.9.2 Using 'getline' into a Variable
--------------------------------------
+4.10.2 Using 'getline' into a Variable
+--------------------------------------
 
 You can use 'getline VAR' to read the next record from 'awk''s input
 into the variable VAR. No other processing is done. For example,
@@ -5645,8 +5744,8 @@ fields, so the values of the fields (including '$0') and the value of
 
 File: gawk.info, Node: Getline/File, Next: Getline/Variable/File, Prev: Getline/Variable, Up: Getline
 
-4.9.3 Using 'getline' from a File
----------------------------------
+4.10.3 Using 'getline' from a File
+----------------------------------
 
 Use 'getline < FILE' to read the next record from FILE. Here, FILE is a
 string-valued expression that specifies the file name. '< FILE' is
@@ -5678,8 +5777,8 @@ portable to all 'awk' implementations.
 
 File: gawk.info, Node: Getline/Variable/File, Next: Getline/Pipe, Prev: Getline/File, Up: Getline
 
-4.9.4 Using 'getline' into a Variable from a File
--------------------------------------------------
+4.10.4 Using 'getline' into a Variable from a File
+--------------------------------------------------
 
 Use 'getline VAR < FILE' to read input from the file FILE, and put it in
 the variable VAR. As earlier, FILE is a string-valued expression that
@@ -5722,8 +5821,8 @@ regular expression.
 
 File: gawk.info, Node: Getline/Pipe, Next: Getline/Variable/Pipe, Prev: Getline/Variable/File, Up: Getline
 
-4.9.5 Using 'getline' from a Pipe
----------------------------------
+4.10.5 Using 'getline' from a Pipe
+----------------------------------
 
      Omniscience has much to recommend it. Failing that, attention to
      details would be useful.
@@ -5792,8 +5891,8 @@ you want your program to be portable to all 'awk' implementations.
 
 File: gawk.info, Node: Getline/Variable/Pipe, Next: Getline/Coprocess, Prev: Getline/Pipe, Up: Getline
 
-4.9.6 Using 'getline' into a Variable from a Pipe
--------------------------------------------------
+4.10.6 Using 'getline' into a Variable from a Pipe
+--------------------------------------------------
 
 When you use 'COMMAND | getline VAR', the output of COMMAND is sent
 through a pipe to 'getline' and into the variable VAR. For example, the
@@ -5819,8 +5918,8 @@ to other 'awk' implementations.
 
 File: gawk.info, Node: Getline/Coprocess, Next: Getline/Variable/Coprocess, Prev: Getline/Variable/Pipe, Up: Getline
 
-4.9.7 Using 'getline' from a Coprocess
---------------------------------------
+4.10.7 Using 'getline' from a Coprocess
+---------------------------------------
 
 Reading input into 'getline' from a pipe is a one-way operation. The
 command that is started with 'COMMAND | getline' only sends data _to_
@@ -5849,8 +5948,8 @@ coprocesses are discussed in more detail.
 
 File: gawk.info, Node: Getline/Variable/Coprocess, Next: Getline Notes, Prev: Getline/Coprocess, Up: Getline
 
-4.9.8 Using 'getline' into a Variable from a Coprocess
-------------------------------------------------------
+4.10.8 Using 'getline' into a Variable from a Coprocess
+-------------------------------------------------------
 
 When you use 'COMMAND |& getline VAR', the output from the coprocess
 COMMAND is sent through a two-way pipe to 'getline' and into the
@@ -5867,8 +5966,8 @@ coprocesses are discussed in more detail.
 
 File: gawk.info, Node: Getline Notes, Next: Getline Summary, Prev: Getline/Variable/Coprocess, Up: Getline
 
-4.9.9 Points to Remember About 'getline'
-----------------------------------------
+4.10.9 Points to Remember About 'getline'
+-----------------------------------------
 
 Here are some miscellaneous points about 'getline' that you should bear
 in mind:
@@ -5927,8 +6026,8 @@ in mind:
 
 File: gawk.info, Node: Getline Summary, Prev: Getline Notes, Up: Getline
 
-4.9.10 Summary of 'getline' Variants
-------------------------------------
+4.10.10 Summary of 'getline' Variants
+-------------------------------------
 
 *note Table 4.1: table-getline-variants. summarizes the eight variants
 of 'getline', listing which predefined variables are set by each one,
@@ -5955,7 +6054,7 @@ Table 4.1: 'getline' variants and what they set
 
 File: gawk.info, Node: Read Timeout, Next: Retrying Input, Prev: Getline, Up: Reading Files
 
-4.10 Reading Input with a Timeout
+4.11 Reading Input with a Timeout
 =================================
 
 This minor node describes a feature that is specific to 'gawk'.
@@ -6049,7 +6148,7 @@ can block indefinitely until some other process opens it for writing.
 
 File: gawk.info, Node: Retrying Input, Next: Command-line directories, Prev: Read Timeout, Up: Reading Files
 
-4.11 Retrying Reads After Certain Input Errors
+4.12 Retrying Reads After Certain Input Errors
 ==============================================
 
 This minor node describes a feature that is specific to 'gawk'.
@@ -6076,7 +6175,7 @@ configured to behave in a non-blocking fashion.
 
 File: gawk.info, Node: Command-line directories, Next: Input Summary, Prev: Retrying Input, Up: Reading Files
 
-4.12 Directories on the Command Line
+4.13 Directories on the Command Line
 ====================================
 
 According to the POSIX standard, files named on the 'awk' command line
@@ -6099,7 +6198,7 @@ usable data from an 'awk' program.
 
 File: gawk.info, Node: Input Summary, Next: Input Exercises, Prev: Command-line directories, Up: Reading Files
 
-4.13 Summary
+4.14 Summary
 ============
 
    * Input is split into records based on the value of 'RS'. The
@@ -6171,7 +6270,7 @@ File: gawk.info, Node: Input Summary, Next: Input Exercises, Prev: Command-li
 
 File: gawk.info, Node: Input Exercises, Prev: Input Summary, Up: Reading Files
 
-4.14 Exercises
+4.15 Exercises
 ==============
 
   1. Using the 'FIELDWIDTHS' variable (*note Constant Size::), write a
@@ -33758,7 +33857,7 @@ Index
 * fields, separating <1>: Field Separators. (line 15)
 * fields, single-character: Single Character Fields.
                                                               (line 6)
-* FIELDWIDTHS variable: Constant Size. (line 22)
+* FIELDWIDTHS variable: Fixed width data. (line 17)
 * FIELDWIDTHS variable <1>: User-modified. (line 37)
 * file descriptors: Special FD. (line 6)
 * file inclusion, @include directive: Include Files. (line 8)
@@ -33967,7 +34066,7 @@ Index
 * gawk, features, adding: Adding Code. (line 6)
 * gawk, features, advanced: Advanced Features. (line 6)
 * gawk, field separators and: User-modified. (line 74)
-* gawk, FIELDWIDTHS variable in: Constant Size. (line 22)
+* gawk, FIELDWIDTHS variable in: Fixed width data. (line 17)
 * gawk, FIELDWIDTHS variable in <1>: User-modified. (line 37)
 * gawk, file names in: Special Files. (line 6)
 * gawk, format-control characters: Control Letters. (line 18)
@@ -34018,7 +34117,8 @@ Index
 * gawk, RT variable in <2>: Auto-set. (line 296)
 * gawk, See Also awk: Preface. (line 34)
 * gawk, source code, obtaining: Getting. (line 6)
-* gawk, splitting fields and: Constant Size. (line 103)
+* gawk, splitting fields and: Testing field creation.
+                                                              (line 6)
 * gawk, string-translation functions: I18N Functions. (line 6)
 * gawk, SYMTAB array in: Auto-set. (line 300)
 * gawk, TEXTDOMAIN variable in: User-modified. (line 155)
@@ -35323,8 +35423,8 @@ Index
 * troubleshooting, backslash before nonspecial character: Escape Sequences.
                                                               (line 108)
 * troubleshooting, division: Arithmetic Ops. (line 44)
-* troubleshooting, fatal errors, field widths, specifying: Constant Size.
-                                                              (line 22)
+* troubleshooting, fatal errors, field widths, specifying: Fixed width data.
+                                                              (line 17)
 * troubleshooting, fatal errors, printf format strings: Format Modifiers.
                                                               (line 157)
 * troubleshooting, fflush() function: I/O Functions. (line 63)
@@ -35448,7 +35548,7 @@ Index
 * Vinschen, Corinna: Acknowledgments. (line 60)
 * w debugger command (alias for watch): Viewing And Changing Data.
                                                               (line 66)
-* w utility: Constant Size. (line 22)
+* w utility: Fixed width data. (line 17)
 * wait() extension function: Extension Sample Fork.
(line 22) * waitpid() extension function: Extension Sample Fork. @@ -35503,574 +35603,579 @@ Index Tag Table: Node: Top1200 -Node: Foreword342794 -Node: Foreword447236 -Node: Preface48768 -Ref: Preface-Footnote-151627 -Ref: Preface-Footnote-251734 -Ref: Preface-Footnote-351968 -Node: History52110 -Node: Names54462 -Ref: Names-Footnote-155556 -Node: This Manual55703 -Ref: This Manual-Footnote-162188 -Node: Conventions62288 -Node: Manual History64642 -Ref: Manual History-Footnote-167637 -Ref: Manual History-Footnote-267678 -Node: How To Contribute67752 -Node: Acknowledgments68403 -Node: Getting Started73289 -Node: Running gawk75728 -Node: One-shot76918 -Node: Read Terminal78181 -Node: Long80174 -Node: Executable Scripts81687 -Ref: Executable Scripts-Footnote-184482 -Node: Comments84585 -Node: Quoting87069 -Node: DOS Quoting92586 -Node: Sample Data Files94641 -Node: Very Simple97236 -Node: Two Rules102138 -Node: More Complex104023 -Node: Statements/Lines106889 -Ref: Statements/Lines-Footnote-1111348 -Node: Other Features111613 -Node: When112549 -Ref: When-Footnote-1114303 -Node: Intro Summary114368 -Node: Invoking Gawk115252 -Node: Command Line116766 -Node: Options117564 -Ref: Options-Footnote-1134183 -Ref: Options-Footnote-2134413 -Node: Other Arguments134438 -Node: Naming Standard Input137385 -Node: Environment Variables138478 -Node: AWKPATH Variable139036 -Ref: AWKPATH Variable-Footnote-1142447 -Ref: AWKPATH Variable-Footnote-2142481 -Node: AWKLIBPATH Variable142742 -Node: Other Environment Variables143999 -Node: Exit Status147820 -Node: Include Files148497 -Node: Loading Shared Libraries152092 -Node: Obsolete153520 -Node: Undocumented154212 -Node: Invoking Summary154509 -Node: Regexp156169 -Node: Regexp Usage157623 -Node: Escape Sequences159660 -Node: Regexp Operators165892 -Ref: Regexp Operators-Footnote-1173308 -Ref: Regexp Operators-Footnote-2173455 -Node: Bracket Expressions173553 -Ref: table-char-classes176029 -Node: Leftmost Longest179166 -Node: Computed 
Regexps180469 -Node: GNU Regexp Operators183896 -Node: Case-sensitivity187575 -Ref: Case-sensitivity-Footnote-1190462 -Ref: Case-sensitivity-Footnote-2190697 -Node: Regexp Summary190805 -Node: Reading Files192271 -Node: Records194434 -Node: awk split records195167 -Node: gawk split records200098 -Ref: gawk split records-Footnote-1204638 -Node: Fields204675 -Node: Nonconstant Fields207416 -Ref: Nonconstant Fields-Footnote-1209652 -Node: Changing Fields209856 -Node: Field Separators215784 -Node: Default Field Splitting218482 -Node: Regexp Field Splitting219600 -Node: Single Character Fields222953 -Node: Command Line Field Separator224013 -Node: Full Line Fields227231 -Ref: Full Line Fields-Footnote-1228753 -Ref: Full Line Fields-Footnote-2228799 -Node: Field Splitting Summary228900 -Node: Constant Size230974 -Node: Splitting By Content236283 -Ref: Splitting By Content-Footnote-1240423 -Node: Multiple Line240586 -Ref: Multiple Line-Footnote-1246468 -Node: Getline246647 -Node: Plain Getline249114 -Node: Getline/Variable251753 -Node: Getline/File252902 -Node: Getline/Variable/File254288 -Ref: Getline/Variable/File-Footnote-1255891 -Node: Getline/Pipe255979 -Node: Getline/Variable/Pipe258684 -Node: Getline/Coprocess259817 -Node: Getline/Variable/Coprocess261082 -Node: Getline Notes261822 -Node: Getline Summary264617 -Ref: table-getline-variants265039 -Node: Read Timeout265787 -Ref: Read Timeout-Footnote-1269693 -Node: Retrying Input269751 -Node: Command-line directories270950 -Node: Input Summary271856 -Node: Input Exercises275028 -Node: Printing275756 -Node: Print277590 -Node: Print Examples279047 -Node: Output Separators281827 -Node: OFMT283844 -Node: Printf285200 -Node: Basic Printf285985 -Node: Control Letters287559 -Node: Format Modifiers291547 -Node: Printf Examples297562 -Node: Redirection300048 -Node: Special FD306889 -Ref: Special FD-Footnote-1310057 -Node: Special Files310131 -Node: Other Inherited Files310748 -Node: Special Network311749 -Node: Special 
Caveats312609 -Node: Close Files And Pipes313558 -Ref: table-close-pipe-return-values320465 -Ref: Close Files And Pipes-Footnote-1321248 -Ref: Close Files And Pipes-Footnote-2321396 -Node: Nonfatal321548 -Node: Output Summary323873 -Node: Output Exercises325095 -Node: Expressions325774 -Node: Values326962 -Node: Constants327640 -Node: Scalar Constants328331 -Ref: Scalar Constants-Footnote-1329195 -Node: Nondecimal-numbers329445 -Node: Regexp Constants332446 -Node: Using Constant Regexps332972 -Node: Standard Regexp Constants333594 -Node: Strong Regexp Constants336782 -Node: Variables339740 -Node: Using Variables340397 -Node: Assignment Options342307 -Node: Conversion344180 -Node: Strings And Numbers344704 -Ref: Strings And Numbers-Footnote-1347767 -Node: Locale influences conversions347876 -Ref: table-locale-affects350634 -Node: All Operators351252 -Node: Arithmetic Ops351881 -Node: Concatenation354387 -Ref: Concatenation-Footnote-1357234 -Node: Assignment Ops357341 -Ref: table-assign-ops362332 -Node: Increment Ops363645 -Node: Truth Values and Conditions367105 -Node: Truth Values368179 -Node: Typing and Comparison369227 -Node: Variable Typing370047 -Ref: Variable Typing-Footnote-1376510 -Ref: Variable Typing-Footnote-2376582 -Node: Comparison Operators376659 -Ref: table-relational-ops377078 -Node: POSIX String Comparison380573 -Ref: POSIX String Comparison-Footnote-1382268 -Ref: POSIX String Comparison-Footnote-2382407 -Node: Boolean Ops382491 -Ref: Boolean Ops-Footnote-1386973 -Node: Conditional Exp387065 -Node: Function Calls388801 -Node: Precedence392678 -Node: Locales396337 -Node: Expressions Summary397969 -Node: Patterns and Actions400542 -Node: Pattern Overview401662 -Node: Regexp Patterns403339 -Node: Expression Patterns403881 -Node: Ranges407662 -Node: BEGIN/END410770 -Node: Using BEGIN/END411531 -Ref: Using BEGIN/END-Footnote-1414267 -Node: I/O And BEGIN/END414373 -Node: BEGINFILE/ENDFILE416687 -Node: Empty419594 -Node: Using Shell Variables419911 -Node: 
Action Overview422185 -Node: Statements424510 -Node: If Statement426358 -Node: While Statement427853 -Node: Do Statement429881 -Node: For Statement431029 -Node: Switch Statement434187 -Node: Break Statement436573 -Node: Continue Statement438665 -Node: Next Statement440492 -Node: Nextfile Statement442875 -Node: Exit Statement445527 -Node: Built-in Variables447930 -Node: User-modified449063 -Node: Auto-set456830 -Ref: Auto-set-Footnote-1471558 -Ref: Auto-set-Footnote-2471764 -Node: ARGC and ARGV471820 -Node: Pattern Action Summary476033 -Node: Arrays478463 -Node: Array Basics479792 -Node: Array Intro480636 -Ref: figure-array-elements482611 -Ref: Array Intro-Footnote-1485315 -Node: Reference to Elements485443 -Node: Assigning Elements487907 -Node: Array Example488398 -Node: Scanning an Array490157 -Node: Controlling Scanning493179 -Ref: Controlling Scanning-Footnote-1498578 -Node: Numeric Array Subscripts498894 -Node: Uninitialized Subscripts501078 -Node: Delete502697 -Ref: Delete-Footnote-1505449 -Node: Multidimensional505506 -Node: Multiscanning508601 -Node: Arrays of Arrays510192 -Node: Arrays Summary514959 -Node: Functions517052 -Node: Built-in518090 -Node: Calling Built-in519171 -Node: Numeric Functions521167 -Ref: Numeric Functions-Footnote-1525195 -Ref: Numeric Functions-Footnote-2525552 -Ref: Numeric Functions-Footnote-3525600 -Node: String Functions525872 -Ref: String Functions-Footnote-1549530 -Ref: String Functions-Footnote-2549658 -Ref: String Functions-Footnote-3549906 -Node: Gory Details549993 -Ref: table-sub-escapes551784 -Ref: table-sub-proposed553303 -Ref: table-posix-sub554666 -Ref: table-gensub-escapes556207 -Ref: Gory Details-Footnote-1557030 -Node: I/O Functions557184 -Ref: table-system-return-values563766 -Ref: I/O Functions-Footnote-1565746 -Ref: I/O Functions-Footnote-2565894 -Node: Time Functions566014 -Ref: Time Functions-Footnote-1576681 -Ref: Time Functions-Footnote-2576749 -Ref: Time Functions-Footnote-3576907 -Ref: Time 
Functions-Footnote-4577018 -Ref: Time Functions-Footnote-5577130 -Ref: Time Functions-Footnote-6577357 -Node: Bitwise Functions577623 -Ref: table-bitwise-ops578217 -Ref: Bitwise Functions-Footnote-1584250 -Ref: Bitwise Functions-Footnote-2584423 -Node: Type Functions584614 -Node: I18N Functions587289 -Node: User-defined588940 -Node: Definition Syntax589745 -Ref: Definition Syntax-Footnote-1595432 -Node: Function Example595503 -Ref: Function Example-Footnote-1598425 -Node: Function Caveats598447 -Node: Calling A Function598965 -Node: Variable Scope599923 -Node: Pass By Value/Reference602917 -Node: Return Statement606416 -Node: Dynamic Typing609395 -Node: Indirect Calls610325 -Ref: Indirect Calls-Footnote-1620576 -Node: Functions Summary620704 -Node: Library Functions623409 -Ref: Library Functions-Footnote-1627016 -Ref: Library Functions-Footnote-2627159 -Node: Library Names627330 -Ref: Library Names-Footnote-1630790 -Ref: Library Names-Footnote-2631013 -Node: General Functions631099 -Node: Strtonum Function632202 -Node: Assert Function635224 -Node: Round Function638550 -Node: Cliff Random Function640091 -Node: Ordinal Functions641107 -Ref: Ordinal Functions-Footnote-1644170 -Ref: Ordinal Functions-Footnote-2644422 -Node: Join Function644632 -Ref: Join Function-Footnote-1646402 -Node: Getlocaltime Function646602 -Node: Readfile Function650344 -Node: Shell Quoting652316 -Node: Data File Management653717 -Node: Filetrans Function654349 -Node: Rewind Function658445 -Node: File Checking660351 -Ref: File Checking-Footnote-1661685 -Node: Empty Files661886 -Node: Ignoring Assigns663865 -Node: Getopt Function665415 -Ref: Getopt Function-Footnote-1676884 -Node: Passwd Functions677084 -Ref: Passwd Functions-Footnote-1685923 -Node: Group Functions686011 -Ref: Group Functions-Footnote-1693909 -Node: Walking Arrays694116 -Node: Library Functions Summary697124 -Node: Library Exercises698530 -Node: Sample Programs698995 -Node: Running Examples699765 -Node: Clones700493 -Node: Cut 
Program701717 -Node: Egrep Program711646 -Ref: Egrep Program-Footnote-1719158 -Node: Id Program719268 -Node: Split Program722948 -Ref: Split Program-Footnote-1726407 -Node: Tee Program726536 -Node: Uniq Program729326 -Node: Wc Program736752 -Ref: Wc Program-Footnote-1741007 -Node: Miscellaneous Programs741101 -Node: Dupword Program742314 -Node: Alarm Program744344 -Node: Translate Program749199 -Ref: Translate Program-Footnote-1753764 -Node: Labels Program754034 -Ref: Labels Program-Footnote-1757385 -Node: Word Sorting757469 -Node: History Sorting761541 -Node: Extract Program763376 -Node: Simple Sed770905 -Node: Igawk Program773979 -Ref: Igawk Program-Footnote-1788310 -Ref: Igawk Program-Footnote-2788512 -Ref: Igawk Program-Footnote-3788634 -Node: Anagram Program788749 -Node: Signature Program791811 -Node: Programs Summary793058 -Node: Programs Exercises794272 -Ref: Programs Exercises-Footnote-1798401 -Node: Advanced Features798492 -Node: Nondecimal Data800482 -Node: Array Sorting802073 -Node: Controlling Array Traversal802773 -Ref: Controlling Array Traversal-Footnote-1811140 -Node: Array Sorting Functions811258 -Ref: Array Sorting Functions-Footnote-1816349 -Node: Two-way I/O816545 -Ref: Two-way I/O-Footnote-1823096 -Ref: Two-way I/O-Footnote-2823283 -Node: TCP/IP Networking823365 -Node: Profiling826483 -Ref: Profiling-Footnote-1835155 -Node: Advanced Features Summary835478 -Node: Internationalization837322 -Node: I18N and L10N838802 -Node: Explaining gettext839489 -Ref: Explaining gettext-Footnote-1845381 -Ref: Explaining gettext-Footnote-2845566 -Node: Programmer i18n845731 -Ref: Programmer i18n-Footnote-1850680 -Node: Translator i18n850729 -Node: String Extraction851523 -Ref: String Extraction-Footnote-1852655 -Node: Printf Ordering852741 -Ref: Printf Ordering-Footnote-1855527 -Node: I18N Portability855591 -Ref: I18N Portability-Footnote-1858047 -Node: I18N Example858110 -Ref: I18N Example-Footnote-1860916 -Node: Gawk I18N860989 -Node: I18N Summary861634 
-Node: Debugger862975 -Node: Debugging863997 -Node: Debugging Concepts864438 -Node: Debugging Terms866247 -Node: Awk Debugging868822 -Node: Sample Debugging Session869728 -Node: Debugger Invocation870262 -Node: Finding The Bug871648 -Node: List of Debugger Commands878126 -Node: Breakpoint Control879459 -Node: Debugger Execution Control883153 -Node: Viewing And Changing Data886515 -Node: Execution Stack889889 -Node: Debugger Info891526 -Node: Miscellaneous Debugger Commands895597 -Node: Readline Support900685 -Node: Limitations901581 -Node: Debugging Summary903690 -Node: Arbitrary Precision Arithmetic904969 -Node: Computer Arithmetic906385 -Ref: table-numeric-ranges909976 -Ref: Computer Arithmetic-Footnote-1910698 -Node: Math Definitions910755 -Ref: table-ieee-formats914069 -Ref: Math Definitions-Footnote-1914672 -Node: MPFR features914777 -Node: FP Math Caution916494 -Ref: FP Math Caution-Footnote-1917566 -Node: Inexactness of computations917935 -Node: Inexact representation918895 -Node: Comparing FP Values920255 -Node: Errors accumulate921337 -Node: Getting Accuracy922770 -Node: Try To Round925480 -Node: Setting precision926379 -Ref: table-predefined-precision-strings927076 -Node: Setting the rounding mode928906 -Ref: table-gawk-rounding-modes929280 -Ref: Setting the rounding mode-Footnote-1932688 -Node: Arbitrary Precision Integers932867 -Ref: Arbitrary Precision Integers-Footnote-1936054 -Node: POSIX Floating Point Problems936203 -Ref: POSIX Floating Point Problems-Footnote-1940085 -Node: Floating point summary940123 -Node: Dynamic Extensions942313 -Node: Extension Intro943866 -Node: Plugin License945132 -Node: Extension Mechanism Outline945929 -Ref: figure-load-extension946368 -Ref: figure-register-new-function947933 -Ref: figure-call-new-function949025 -Node: Extension API Description951087 -Node: Extension API Functions Introduction952729 -Node: General Data Types958063 -Ref: General Data Types-Footnote-1965268 -Node: Memory Allocation Functions965567 -Ref: 
Memory Allocation Functions-Footnote-1968412 -Node: Constructor Functions968511 -Node: Registration Functions971510 -Node: Extension Functions972195 -Node: Exit Callback Functions977408 -Node: Extension Version String978658 -Node: Input Parsers979321 -Node: Output Wrappers992028 -Node: Two-way processors996540 -Node: Printing Messages998805 -Ref: Printing Messages-Footnote-1999976 -Node: Updating ERRNO1000129 -Node: Requesting Values1000868 -Ref: table-value-types-returned1001605 -Node: Accessing Parameters1002541 -Node: Symbol Table Access1003776 -Node: Symbol table by name1004288 -Node: Symbol table by cookie1006077 -Ref: Symbol table by cookie-Footnote-11010262 -Node: Cached values1010326 -Ref: Cached values-Footnote-11013862 -Node: Array Manipulation1013953 -Ref: Array Manipulation-Footnote-11015044 -Node: Array Data Types1015081 -Ref: Array Data Types-Footnote-11017739 -Node: Array Functions1017831 -Node: Flattening Arrays1022230 -Node: Creating Arrays1029171 -Node: Redirection API1033940 -Node: Extension API Variables1036782 -Node: Extension Versioning1037415 -Ref: gawk-api-version1037852 -Node: Extension API Informational Variables1039580 -Node: Extension API Boilerplate1040644 -Node: Changes from API V11044506 -Node: Finding Extensions1045166 -Node: Extension Example1045725 -Node: Internal File Description1046523 -Node: Internal File Ops1050603 -Ref: Internal File Ops-Footnote-11062003 -Node: Using Internal File Ops1062143 -Ref: Using Internal File Ops-Footnote-11064526 -Node: Extension Samples1064800 -Node: Extension Sample File Functions1066329 -Node: Extension Sample Fnmatch1073978 -Node: Extension Sample Fork1075465 -Node: Extension Sample Inplace1076683 -Node: Extension Sample Ord1079893 -Node: Extension Sample Readdir1080729 -Ref: table-readdir-file-types1081618 -Node: Extension Sample Revout1082423 -Node: Extension Sample Rev2way1083012 -Node: Extension Sample Read write array1083752 -Node: Extension Sample Readfile1085694 -Node: Extension Sample 
Time1086789 -Node: Extension Sample API Tests1088137 -Node: gawkextlib1088629 -Node: Extension summary1091076 -Node: Extension Exercises1094778 -Node: Language History1096276 -Node: V7/SVR3.11097932 -Node: SVR41100084 -Node: POSIX1101518 -Node: BTL1102897 -Node: POSIX/GNU1103626 -Node: Feature History1109404 -Node: Common Extensions1123715 -Node: Ranges and Locales1124998 -Ref: Ranges and Locales-Footnote-11129614 -Ref: Ranges and Locales-Footnote-21129641 -Ref: Ranges and Locales-Footnote-31129876 -Node: Contributors1130097 -Node: History summary1135657 -Node: Installation1137037 -Node: Gawk Distribution1137981 -Node: Getting1138465 -Node: Extracting1139426 -Node: Distribution contents1141064 -Node: Unix Installation1147406 -Node: Quick Installation1148088 -Node: Shell Startup Files1150502 -Node: Additional Configuration Options1151591 -Node: Configuration Philosophy1153580 -Node: Non-Unix Installation1155949 -Node: PC Installation1156409 -Node: PC Binary Installation1157247 -Node: PC Compiling1157682 -Node: PC Using1158799 -Node: Cygwin1161844 -Node: MSYS1162614 -Node: VMS Installation1163115 -Node: VMS Compilation1163906 -Ref: VMS Compilation-Footnote-11165135 -Node: VMS Dynamic Extensions1165193 -Node: VMS Installation Details1166878 -Node: VMS Running1169131 -Node: VMS GNV1173410 -Node: VMS Old Gawk1174145 -Node: Bugs1174616 -Node: Bug address1175279 -Node: Usenet1177676 -Node: Maintainers1178453 -Node: Other Versions1179829 -Node: Installation summary1186413 -Node: Notes1187448 -Node: Compatibility Mode1188313 -Node: Additions1189095 -Node: Accessing The Source1190020 -Node: Adding Code1191455 -Node: New Ports1197673 -Node: Derived Files1202161 -Ref: Derived Files-Footnote-11207646 -Ref: Derived Files-Footnote-21207681 -Ref: Derived Files-Footnote-31208279 -Node: Future Extensions1208393 -Node: Implementation Limitations1209051 -Node: Extension Design1210234 -Node: Old Extension Problems1211388 -Ref: Old Extension Problems-Footnote-11212906 -Node: Extension 
New Mechanism Goals1212963 -Ref: Extension New Mechanism Goals-Footnote-11216327 -Node: Extension Other Design Decisions1216516 -Node: Extension Future Growth1218629 -Node: Old Extension Mechanism1219465 -Node: Notes summary1221228 -Node: Basic Concepts1222410 -Node: Basic High Level1223091 -Ref: figure-general-flow1223373 -Ref: figure-process-flow1224058 -Ref: Basic High Level-Footnote-11227359 -Node: Basic Data Typing1227544 -Node: Glossary1230872 -Node: Copying1262819 -Node: GNU Free Documentation License1300358 -Node: Index1325476 +Node: Foreword343204 +Node: Foreword447646 +Node: Preface49178 +Ref: Preface-Footnote-152037 +Ref: Preface-Footnote-252144 +Ref: Preface-Footnote-352378 +Node: History52520 +Node: Names54872 +Ref: Names-Footnote-155966 +Node: This Manual56113 +Ref: This Manual-Footnote-162598 +Node: Conventions62698 +Node: Manual History65052 +Ref: Manual History-Footnote-168047 +Ref: Manual History-Footnote-268088 +Node: How To Contribute68162 +Node: Acknowledgments68813 +Node: Getting Started73699 +Node: Running gawk76138 +Node: One-shot77328 +Node: Read Terminal78591 +Node: Long80584 +Node: Executable Scripts82097 +Ref: Executable Scripts-Footnote-184892 +Node: Comments84995 +Node: Quoting87479 +Node: DOS Quoting92996 +Node: Sample Data Files95051 +Node: Very Simple97646 +Node: Two Rules102548 +Node: More Complex104433 +Node: Statements/Lines107299 +Ref: Statements/Lines-Footnote-1111758 +Node: Other Features112023 +Node: When112959 +Ref: When-Footnote-1114713 +Node: Intro Summary114778 +Node: Invoking Gawk115662 +Node: Command Line117176 +Node: Options117974 +Ref: Options-Footnote-1134593 +Ref: Options-Footnote-2134823 +Node: Other Arguments134848 +Node: Naming Standard Input137795 +Node: Environment Variables138888 +Node: AWKPATH Variable139446 +Ref: AWKPATH Variable-Footnote-1142857 +Ref: AWKPATH Variable-Footnote-2142891 +Node: AWKLIBPATH Variable143152 +Node: Other Environment Variables144409 +Node: Exit Status148230 +Node: Include 
Files148907 +Node: Loading Shared Libraries152502 +Node: Obsolete153930 +Node: Undocumented154622 +Node: Invoking Summary154919 +Node: Regexp156579 +Node: Regexp Usage158033 +Node: Escape Sequences160070 +Node: Regexp Operators166302 +Ref: Regexp Operators-Footnote-1173718 +Ref: Regexp Operators-Footnote-2173865 +Node: Bracket Expressions173963 +Ref: table-char-classes176439 +Node: Leftmost Longest179576 +Node: Computed Regexps180879 +Node: GNU Regexp Operators184306 +Node: Case-sensitivity187985 +Ref: Case-sensitivity-Footnote-1190872 +Ref: Case-sensitivity-Footnote-2191107 +Node: Regexp Summary191215 +Node: Reading Files192681 +Node: Records194950 +Node: awk split records195683 +Node: gawk split records200614 +Ref: gawk split records-Footnote-1205154 +Node: Fields205191 +Node: Nonconstant Fields207932 +Ref: Nonconstant Fields-Footnote-1210168 +Node: Changing Fields210372 +Node: Field Separators216300 +Node: Default Field Splitting218998 +Node: Regexp Field Splitting220116 +Node: Single Character Fields223469 +Node: Command Line Field Separator224529 +Node: Full Line Fields227747 +Ref: Full Line Fields-Footnote-1229269 +Ref: Full Line Fields-Footnote-2229315 +Node: Field Splitting Summary229416 +Node: Constant Size231490 +Node: Fixed width data232222 +Node: Skipping intervening235689 +Node: Allowing trailing data236487 +Node: Fields with fixed data237524 +Node: Splitting By Content239042 +Ref: Splitting By Content-Footnote-1242692 +Node: Testing field creation242855 +Node: Multiple Line244476 +Ref: Multiple Line-Footnote-1250360 +Node: Getline250539 +Node: Plain Getline253008 +Node: Getline/Variable255649 +Node: Getline/File256800 +Node: Getline/Variable/File258188 +Ref: Getline/Variable/File-Footnote-1259793 +Node: Getline/Pipe259881 +Node: Getline/Variable/Pipe262588 +Node: Getline/Coprocess263723 +Node: Getline/Variable/Coprocess264990 +Node: Getline Notes265732 +Node: Getline Summary268529 +Ref: table-getline-variants268953 +Node: Read Timeout269701 +Ref: Read 
Timeout-Footnote-1273607 +Node: Retrying Input273665 +Node: Command-line directories274864 +Node: Input Summary275770 +Node: Input Exercises278942 +Node: Printing279670 +Node: Print281504 +Node: Print Examples282961 +Node: Output Separators285741 +Node: OFMT287758 +Node: Printf289114 +Node: Basic Printf289899 +Node: Control Letters291473 +Node: Format Modifiers295461 +Node: Printf Examples301476 +Node: Redirection303962 +Node: Special FD310803 +Ref: Special FD-Footnote-1313971 +Node: Special Files314045 +Node: Other Inherited Files314662 +Node: Special Network315663 +Node: Special Caveats316523 +Node: Close Files And Pipes317472 +Ref: table-close-pipe-return-values324379 +Ref: Close Files And Pipes-Footnote-1325162 +Ref: Close Files And Pipes-Footnote-2325310 +Node: Nonfatal325462 +Node: Output Summary327787 +Node: Output Exercises329009 +Node: Expressions329688 +Node: Values330876 +Node: Constants331554 +Node: Scalar Constants332245 +Ref: Scalar Constants-Footnote-1333109 +Node: Nondecimal-numbers333359 +Node: Regexp Constants336360 +Node: Using Constant Regexps336886 +Node: Standard Regexp Constants337508 +Node: Strong Regexp Constants340696 +Node: Variables343654 +Node: Using Variables344311 +Node: Assignment Options346221 +Node: Conversion348094 +Node: Strings And Numbers348618 +Ref: Strings And Numbers-Footnote-1351681 +Node: Locale influences conversions351790 +Ref: table-locale-affects354548 +Node: All Operators355166 +Node: Arithmetic Ops355795 +Node: Concatenation358301 +Ref: Concatenation-Footnote-1361148 +Node: Assignment Ops361255 +Ref: table-assign-ops366246 +Node: Increment Ops367559 +Node: Truth Values and Conditions371019 +Node: Truth Values372093 +Node: Typing and Comparison373141 +Node: Variable Typing373961 +Ref: Variable Typing-Footnote-1380424 +Ref: Variable Typing-Footnote-2380496 +Node: Comparison Operators380573 +Ref: table-relational-ops380992 +Node: POSIX String Comparison384487 +Ref: POSIX String Comparison-Footnote-1386182 +Ref: POSIX 
String Comparison-Footnote-2386321 +Node: Boolean Ops386405 +Ref: Boolean Ops-Footnote-1390887 +Node: Conditional Exp390979 +Node: Function Calls392715 +Node: Precedence396592 +Node: Locales400251 +Node: Expressions Summary401883 +Node: Patterns and Actions404456 +Node: Pattern Overview405576 +Node: Regexp Patterns407253 +Node: Expression Patterns407795 +Node: Ranges411576 +Node: BEGIN/END414684 +Node: Using BEGIN/END415445 +Ref: Using BEGIN/END-Footnote-1418181 +Node: I/O And BEGIN/END418287 +Node: BEGINFILE/ENDFILE420601 +Node: Empty423508 +Node: Using Shell Variables423825 +Node: Action Overview426099 +Node: Statements428424 +Node: If Statement430272 +Node: While Statement431767 +Node: Do Statement433795 +Node: For Statement434943 +Node: Switch Statement438101 +Node: Break Statement440487 +Node: Continue Statement442579 +Node: Next Statement444406 +Node: Nextfile Statement446789 +Node: Exit Statement449441 +Node: Built-in Variables451844 +Node: User-modified452977 +Node: Auto-set460744 +Ref: Auto-set-Footnote-1475472 +Ref: Auto-set-Footnote-2475678 +Node: ARGC and ARGV475734 +Node: Pattern Action Summary479947 +Node: Arrays482377 +Node: Array Basics483706 +Node: Array Intro484550 +Ref: figure-array-elements486525 +Ref: Array Intro-Footnote-1489229 +Node: Reference to Elements489357 +Node: Assigning Elements491821 +Node: Array Example492312 +Node: Scanning an Array494071 +Node: Controlling Scanning497093 +Ref: Controlling Scanning-Footnote-1502492 +Node: Numeric Array Subscripts502808 +Node: Uninitialized Subscripts504992 +Node: Delete506611 +Ref: Delete-Footnote-1509363 +Node: Multidimensional509420 +Node: Multiscanning512515 +Node: Arrays of Arrays514106 +Node: Arrays Summary518873 +Node: Functions520966 +Node: Built-in522004 +Node: Calling Built-in523085 +Node: Numeric Functions525081 +Ref: Numeric Functions-Footnote-1529109 +Ref: Numeric Functions-Footnote-2529466 +Ref: Numeric Functions-Footnote-3529514 +Node: String Functions529786 +Ref: String 
Functions-Footnote-1553444 +Ref: String Functions-Footnote-2553572 +Ref: String Functions-Footnote-3553820 +Node: Gory Details553907 +Ref: table-sub-escapes555698 +Ref: table-sub-proposed557217 +Ref: table-posix-sub558580 +Ref: table-gensub-escapes560121 +Ref: Gory Details-Footnote-1560944 +Node: I/O Functions561098 +Ref: table-system-return-values567680 +Ref: I/O Functions-Footnote-1569660 +Ref: I/O Functions-Footnote-2569808 +Node: Time Functions569928 +Ref: Time Functions-Footnote-1580595 +Ref: Time Functions-Footnote-2580663 +Ref: Time Functions-Footnote-3580821 +Ref: Time Functions-Footnote-4580932 +Ref: Time Functions-Footnote-5581044 +Ref: Time Functions-Footnote-6581271 +Node: Bitwise Functions581537 +Ref: table-bitwise-ops582131 +Ref: Bitwise Functions-Footnote-1588164 +Ref: Bitwise Functions-Footnote-2588337 +Node: Type Functions588528 +Node: I18N Functions591203 +Node: User-defined592854 +Node: Definition Syntax593659 +Ref: Definition Syntax-Footnote-1599346 +Node: Function Example599417 +Ref: Function Example-Footnote-1602339 +Node: Function Caveats602361 +Node: Calling A Function602879 +Node: Variable Scope603837 +Node: Pass By Value/Reference606831 +Node: Return Statement610330 +Node: Dynamic Typing613309 +Node: Indirect Calls614239 +Ref: Indirect Calls-Footnote-1624490 +Node: Functions Summary624618 +Node: Library Functions627323 +Ref: Library Functions-Footnote-1630930 +Ref: Library Functions-Footnote-2631073 +Node: Library Names631244 +Ref: Library Names-Footnote-1634704 +Ref: Library Names-Footnote-2634927 +Node: General Functions635013 +Node: Strtonum Function636116 +Node: Assert Function639138 +Node: Round Function642464 +Node: Cliff Random Function644005 +Node: Ordinal Functions645021 +Ref: Ordinal Functions-Footnote-1648084 +Ref: Ordinal Functions-Footnote-2648336 +Node: Join Function648546 +Ref: Join Function-Footnote-1650316 +Node: Getlocaltime Function650516 +Node: Readfile Function654258 +Node: Shell Quoting656230 +Node: Data File 
Management657631 +Node: Filetrans Function658263 +Node: Rewind Function662359 +Node: File Checking664265 +Ref: File Checking-Footnote-1665599 +Node: Empty Files665800 +Node: Ignoring Assigns667779 +Node: Getopt Function669329 +Ref: Getopt Function-Footnote-1680798 +Node: Passwd Functions680998 +Ref: Passwd Functions-Footnote-1689837 +Node: Group Functions689925 +Ref: Group Functions-Footnote-1697823 +Node: Walking Arrays698030 +Node: Library Functions Summary701038 +Node: Library Exercises702444 +Node: Sample Programs702909 +Node: Running Examples703679 +Node: Clones704407 +Node: Cut Program705631 +Node: Egrep Program715560 +Ref: Egrep Program-Footnote-1723072 +Node: Id Program723182 +Node: Split Program726862 +Ref: Split Program-Footnote-1730321 +Node: Tee Program730450 +Node: Uniq Program733240 +Node: Wc Program740666 +Ref: Wc Program-Footnote-1744921 +Node: Miscellaneous Programs745015 +Node: Dupword Program746228 +Node: Alarm Program748258 +Node: Translate Program753113 +Ref: Translate Program-Footnote-1757678 +Node: Labels Program757948 +Ref: Labels Program-Footnote-1761299 +Node: Word Sorting761383 +Node: History Sorting765455 +Node: Extract Program767290 +Node: Simple Sed774819 +Node: Igawk Program777893 +Ref: Igawk Program-Footnote-1792224 +Ref: Igawk Program-Footnote-2792426 +Ref: Igawk Program-Footnote-3792548 +Node: Anagram Program792663 +Node: Signature Program795725 +Node: Programs Summary796972 +Node: Programs Exercises798186 +Ref: Programs Exercises-Footnote-1802315 +Node: Advanced Features802406 +Node: Nondecimal Data804396 +Node: Array Sorting805987 +Node: Controlling Array Traversal806687 +Ref: Controlling Array Traversal-Footnote-1815054 +Node: Array Sorting Functions815172 +Ref: Array Sorting Functions-Footnote-1820263 +Node: Two-way I/O820459 +Ref: Two-way I/O-Footnote-1827010 +Ref: Two-way I/O-Footnote-2827197 +Node: TCP/IP Networking827279 +Node: Profiling830397 +Ref: Profiling-Footnote-1839069 +Node: Advanced Features Summary839392 +Node: 
Internationalization841236 +Node: I18N and L10N842716 +Node: Explaining gettext843403 +Ref: Explaining gettext-Footnote-1849295 +Ref: Explaining gettext-Footnote-2849480 +Node: Programmer i18n849645 +Ref: Programmer i18n-Footnote-1854594 +Node: Translator i18n854643 +Node: String Extraction855437 +Ref: String Extraction-Footnote-1856569 +Node: Printf Ordering856655 +Ref: Printf Ordering-Footnote-1859441 +Node: I18N Portability859505 +Ref: I18N Portability-Footnote-1861961 +Node: I18N Example862024 +Ref: I18N Example-Footnote-1864830 +Node: Gawk I18N864903 +Node: I18N Summary865548 +Node: Debugger866889 +Node: Debugging867911 +Node: Debugging Concepts868352 +Node: Debugging Terms870161 +Node: Awk Debugging872736 +Node: Sample Debugging Session873642 +Node: Debugger Invocation874176 +Node: Finding The Bug875562 +Node: List of Debugger Commands882040 +Node: Breakpoint Control883373 +Node: Debugger Execution Control887067 +Node: Viewing And Changing Data890429 +Node: Execution Stack893803 +Node: Debugger Info895440 +Node: Miscellaneous Debugger Commands899511 +Node: Readline Support904599 +Node: Limitations905495 +Node: Debugging Summary907604 +Node: Arbitrary Precision Arithmetic908883 +Node: Computer Arithmetic910299 +Ref: table-numeric-ranges913890 +Ref: Computer Arithmetic-Footnote-1914612 +Node: Math Definitions914669 +Ref: table-ieee-formats917983 +Ref: Math Definitions-Footnote-1918586 +Node: MPFR features918691 +Node: FP Math Caution920408 +Ref: FP Math Caution-Footnote-1921480 +Node: Inexactness of computations921849 +Node: Inexact representation922809 +Node: Comparing FP Values924169 +Node: Errors accumulate925251 +Node: Getting Accuracy926684 +Node: Try To Round929394 +Node: Setting precision930293 +Ref: table-predefined-precision-strings930990 +Node: Setting the rounding mode932820 +Ref: table-gawk-rounding-modes933194 +Ref: Setting the rounding mode-Footnote-1936602 +Node: Arbitrary Precision Integers936781 +Ref: Arbitrary Precision 
Integers-Footnote-1939968 +Node: POSIX Floating Point Problems940117 +Ref: POSIX Floating Point Problems-Footnote-1943999 +Node: Floating point summary944037 +Node: Dynamic Extensions946227 +Node: Extension Intro947780 +Node: Plugin License949046 +Node: Extension Mechanism Outline949843 +Ref: figure-load-extension950282 +Ref: figure-register-new-function951847 +Ref: figure-call-new-function952939 +Node: Extension API Description955001 +Node: Extension API Functions Introduction956643 +Node: General Data Types961977 +Ref: General Data Types-Footnote-1969182 +Node: Memory Allocation Functions969481 +Ref: Memory Allocation Functions-Footnote-1972326 +Node: Constructor Functions972425 +Node: Registration Functions975424 +Node: Extension Functions976109 +Node: Exit Callback Functions981322 +Node: Extension Version String982572 +Node: Input Parsers983235 +Node: Output Wrappers995942 +Node: Two-way processors1000454 +Node: Printing Messages1002719 +Ref: Printing Messages-Footnote-11003890 +Node: Updating ERRNO1004043 +Node: Requesting Values1004782 +Ref: table-value-types-returned1005519 +Node: Accessing Parameters1006455 +Node: Symbol Table Access1007690 +Node: Symbol table by name1008202 +Node: Symbol table by cookie1009991 +Ref: Symbol table by cookie-Footnote-11014176 +Node: Cached values1014240 +Ref: Cached values-Footnote-11017776 +Node: Array Manipulation1017867 +Ref: Array Manipulation-Footnote-11018958 +Node: Array Data Types1018995 +Ref: Array Data Types-Footnote-11021653 +Node: Array Functions1021745 +Node: Flattening Arrays1026144 +Node: Creating Arrays1033085 +Node: Redirection API1037854 +Node: Extension API Variables1040696 +Node: Extension Versioning1041329 +Ref: gawk-api-version1041766 +Node: Extension API Informational Variables1043494 +Node: Extension API Boilerplate1044558 +Node: Changes from API V11048420 +Node: Finding Extensions1049080 +Node: Extension Example1049639 +Node: Internal File Description1050437 +Node: Internal File Ops1054517 +Ref: 
Internal File Ops-Footnote-11065917 +Node: Using Internal File Ops1066057 +Ref: Using Internal File Ops-Footnote-11068440 +Node: Extension Samples1068714 +Node: Extension Sample File Functions1070243 +Node: Extension Sample Fnmatch1077892 +Node: Extension Sample Fork1079379 +Node: Extension Sample Inplace1080597 +Node: Extension Sample Ord1083807 +Node: Extension Sample Readdir1084643 +Ref: table-readdir-file-types1085532 +Node: Extension Sample Revout1086337 +Node: Extension Sample Rev2way1086926 +Node: Extension Sample Read write array1087666 +Node: Extension Sample Readfile1089608 +Node: Extension Sample Time1090703 +Node: Extension Sample API Tests1092051 +Node: gawkextlib1092543 +Node: Extension summary1094990 +Node: Extension Exercises1098692 +Node: Language History1100190 +Node: V7/SVR3.11101846 +Node: SVR41103998 +Node: POSIX1105432 +Node: BTL1106811 +Node: POSIX/GNU1107540 +Node: Feature History1113318 +Node: Common Extensions1127629 +Node: Ranges and Locales1128912 +Ref: Ranges and Locales-Footnote-11133528 +Ref: Ranges and Locales-Footnote-21133555 +Ref: Ranges and Locales-Footnote-31133790 +Node: Contributors1134011 +Node: History summary1139571 +Node: Installation1140951 +Node: Gawk Distribution1141895 +Node: Getting1142379 +Node: Extracting1143340 +Node: Distribution contents1144978 +Node: Unix Installation1151320 +Node: Quick Installation1152002 +Node: Shell Startup Files1154416 +Node: Additional Configuration Options1155505 +Node: Configuration Philosophy1157494 +Node: Non-Unix Installation1159863 +Node: PC Installation1160323 +Node: PC Binary Installation1161161 +Node: PC Compiling1161596 +Node: PC Using1162713 +Node: Cygwin1165758 +Node: MSYS1166528 +Node: VMS Installation1167029 +Node: VMS Compilation1167820 +Ref: VMS Compilation-Footnote-11169049 +Node: VMS Dynamic Extensions1169107 +Node: VMS Installation Details1170792 +Node: VMS Running1173045 +Node: VMS GNV1177324 +Node: VMS Old Gawk1178059 +Node: Bugs1178530 +Node: Bug address1179193 +Node: 
Usenet1181590 +Node: Maintainers1182367 +Node: Other Versions1183743 +Node: Installation summary1190327 +Node: Notes1191362 +Node: Compatibility Mode1192227 +Node: Additions1193009 +Node: Accessing The Source1193934 +Node: Adding Code1195369 +Node: New Ports1201587 +Node: Derived Files1206075 +Ref: Derived Files-Footnote-11211560 +Ref: Derived Files-Footnote-21211595 +Ref: Derived Files-Footnote-31212193 +Node: Future Extensions1212307 +Node: Implementation Limitations1212965 +Node: Extension Design1214148 +Node: Old Extension Problems1215302 +Ref: Old Extension Problems-Footnote-11216820 +Node: Extension New Mechanism Goals1216877 +Ref: Extension New Mechanism Goals-Footnote-11220241 +Node: Extension Other Design Decisions1220430 +Node: Extension Future Growth1222543 +Node: Old Extension Mechanism1223379 +Node: Notes summary1225142 +Node: Basic Concepts1226324 +Node: Basic High Level1227005 +Ref: figure-general-flow1227287 +Ref: figure-process-flow1227972 +Ref: Basic High Level-Footnote-11231273 +Node: Basic Data Typing1231458 +Node: Glossary1234786 +Node: Copying1266733 +Node: GNU Free Documentation License1304272 +Node: Index1329390
 End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 06add9d1..6dd00c5f 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -568,7 +568,13 @@ particular records in a file and perform operations upon them.
                                         field.
 * Field Splitting Summary::             Some final points and a summary table.
 * Constant Size::                       Reading constant width data.
+* Fixed width data::                    Processing fixed-width data.
+* Skipping intervening::                Skipping intervening fields.
+* Allowing trailing data::              Capturing optional trailing data.
+* Fields with fixed data::              Field values with fixed-width data.
 * Splitting By Content::                Defining Fields By Content
+* Testing field creation::              Checking how @command{gawk} is
+                                        splitting records.
 * Multiple Line::                       Reading multiline records.
 * Getline::                             Reading files under explicit program
                                         control using the @code{getline}
@@ -6431,6 +6437,8 @@ used with it do not have to be named on the @command{awk} command line
 * Field Separators::                    The field separator and how to change it.
 * Constant Size::                       Reading constant width data.
 * Splitting By Content::                Defining Fields By Content
+* Testing field creation::              Checking how @command{gawk} is splitting
+                                        records.
 * Multiple Line::                       Reading multiline records.
 * Getline::                             Reading files under explicit program
                                         control using the @code{getline} function.
@@ -7756,18 +7764,30 @@ feature of @command{gawk}. If you are a novice @command{awk} user,
 you might want to skip it on the first reading.
 
 @command{gawk} provides a facility for dealing with fixed-width fields
-with no distinctive field separator. For example, data of this nature
-arises in the input for old Fortran programs where numbers are run
-together, or in the output of programs that did not anticipate the use
-of their output as input for other programs.
-
-An example of the latter is a table where all the columns are lined up by
-the use of a variable number of spaces and @emph{empty fields are just
-spaces}. Clearly, @command{awk}'s normal field splitting based on @code{FS}
-does not work well in this case. Although a portable @command{awk} program
-can use a series of @code{substr()} calls on @code{$0}
-(@pxref{String Functions}),
-this is awkward and inefficient for a large number of fields.
+with no distinctive field separator. We discuss this feature in
+the following @value{SUBSECTION}s.
+
+@menu
+* Fixed width data::                    Processing fixed-width data.
+* Skipping intervening::                Skipping intervening fields.
+* Allowing trailing data::              Capturing optional trailing data.
+* Fields with fixed data::              Field values with fixed-width data.
+@end menu
+
+@node Fixed width data
+@subsection Processing Fixed-Width Data
+
+An example of fixed-width data would be the input for old Fortran programs
+where numbers are run together, or the output of programs that did not
+anticipate the use of their output as input for other programs.
+
+An example of the latter is a table where all the columns are lined up
+by the use of a variable number of spaces and @emph{empty fields are
+just spaces}. Clearly, @command{awk}'s normal field splitting based
+on @code{FS} does not work well in this case. Although a portable
+@command{awk} program can use a series of @code{substr()} calls on
+@code{$0} (@pxref{String Functions}), this is awkward and inefficient
+for a large number of fields.
 
 @cindex troubleshooting, fatal errors, field widths@comma{} specifying
 @cindex @command{w} utility
@@ -7775,14 +7795,12 @@ this is awkward and inefficient for a large number of fields.
 @cindex @command{gawk}, @code{FIELDWIDTHS} variable in
 The splitting of an input record into fixed-width fields is specified by
 assigning a string containing space-separated numbers to the built-in
-variable @code{FIELDWIDTHS}. Each number specifies the width of the field,
-@emph{including} columns between fields. If you want to ignore the columns
-between fields, you can specify the width as a separate field that is
-subsequently ignored.
-Or, starting in @value{PVERSION} 4.2, each field width may optionally be
-preceded by a colon-separated value specifying the number of characters to skip
-before the field starts.
-It is a fatal error to supply a field width that has a negative value.
+variable @code{FIELDWIDTHS}. Each number specifies the width of the
+field, @emph{including} columns between fields. If you want to ignore
+the columns between fields, you can specify the width as a separate
+field that is subsequently ignored. It is a fatal error to supply a
+field width that has a negative value.
+
 The following data is the output of the Unix @command{w} utility. It is useful
 to illustrate the use of @code{FIELDWIDTHS}:
 
@@ -7812,7 +7830,7 @@ NR > 2 @{
     sub(/^ +/, "", idle)   # strip leading spaces
     if (idle == "")
         idle = 0
-    if (idle ~ /:/) @{
+    if (idle ~ /:/) @{      # hh:mm
         split(idle, t, ":")
         idle = t[1] * 60 + t[2]
     @}
@@ -7841,13 +7859,30 @@ brent ttyp0 286
 dave ttyq4 1296000
 @end example
 
-Starting in @value{PVERSION} 4.2, this program could be rewritten to
-specify @code{FIELDWIDTHS} like so:
+Another (possibly more practical) example of fixed-width input data
+is the input from a deck of balloting cards. In some parts of
+the United States, voters mark their choices by punching holes in computer
+cards. These cards are then processed to count the votes for any particular
+candidate or on any particular issue. Because a voter may choose not to
+vote on some issue, any column on the card may be empty. An @command{awk}
+program for processing such data could use the @code{FIELDWIDTHS} feature
+to simplify reading the data. (Of course, getting @command{gawk} to run on
+a system with card readers is another story!)
+
+@node Skipping intervening
+@subsection Skipping Intervening Fields
+
+Starting in @value{PVERSION} 4.2, each field width may optionally be
+preceded by a colon-separated value specifying the number of characters
+to skip before the field starts. Thus, the preceding program could be
+rewritten to specify @code{FIELDWIDTHS} like so:
+
 @example
 BEGIN @{ FIELDWIDTHS = "8 1:5 4:7 6 1:6 1:6 2:33" @}
 @end example
+
 This strips away some of the white space separating the fields. With such
-a change, the program would produce the following results:
+a change, the program produces the following results:
 
 @example
 hzang ttyV3 50
@@ -7859,42 +7894,65 @@ brent ttyp0 286
 dave ttyq4 1296000
 @end example
 
-Another (possibly more practical) example of fixed-width input data
-is the input from a deck of balloting cards. In some parts of
-the United States, voters mark their choices by punching holes in computer
-cards. These cards are then processed to count the votes for any particular
-candidate or on any particular issue. Because a voter may choose not to
-vote on some issue, any column on the card may be empty. An @command{awk}
-program for processing such data could use the @code{FIELDWIDTHS} feature
-to simplify reading the data. (Of course, getting @command{gawk} to run on
-a system with card readers is another story!)
+@node Allowing trailing data
+@subsection Capturing Optional Trailing Data
 
-@cindex @command{gawk}, splitting fields and
-Assigning a value to @code{FS} causes @command{gawk} to use
-@code{FS} for field splitting again. Use @samp{FS = FS} to make this happen,
-without having to know the current value of @code{FS}.
-In order to tell which kind of field splitting is in effect,
-use @code{PROCINFO["FS"]}
-(@pxref{Auto-set}).
-The value is @code{"FS"} if regular field splitting is being used,
-or @code{"FIELDWIDTHS"} if fixed-width field splitting is being used:
+There are times when fixed-width data may be followed by additional data
+that has no fixed length. Such data may or may not be present, but if
+it is, it should be possible to get at it from an @command{awk} program.
+
+Starting with version 4.2, in order to provide a way to say ``anything
+else in the record after the defined fields,'' @command{gawk}
+allows you to add a final @samp{*} character to the value of
+@code{FIELDWIDTHS}. There can only be one such character, and it must
+be the final non-whitespace character in @code{FIELDWIDTHS}.
+For example: @example -if (PROCINFO["FS"] == "FS") - @var{regular field splitting} @dots{} -else if (PROCINFO["FS"] == "FIELDWIDTHS") - @var{fixed-width field splitting} @dots{} -else if (PROCINFO["FS"] == "FPAT") - @var{content-based field splitting} @dots{} @ii{(see next @value{SECTION})} -else - @var{API input parser field splitting} @dots{} @ii{(advanced feature)} +$ @kbd{cat fw.awk} @ii{Show the program} +@print{} BEGIN @{ FIELDWIDTHS = "2 2 *" @} +@print{} @{ print NF, $1, $2, $3 @} +$ @kbd{cat fw.in} @ii{Show sample input} +@print{} 1234abcdefghi +$ @kbd{gawk -f fw.awk fw.in} @ii{Run the program} +@print{} 3 12 34 abcdefghi @end example -This information is useful when writing a function -that needs to temporarily change @code{FS} or @code{FIELDWIDTHS}, -read some records, and then restore the original settings -(@pxref{Passwd Functions} -for an example of such a function). +@node Fields with fixed data +@subsection Field Values With Fixed-Width Data + +So far, so good. But what happens if there isn't as much data as there +should be based on the contents of @code{FIELDWIDTHS}? Or, what happens +if there is more data than expected? + +For many years, what happens in these cases was not well defined. Starting +with version 4.2, the rules are as follows: + +@table @asis +@item Enough data for some fields +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aabbb}. In this case, @code{NF} is set to two. + +@item Not enough data for a field +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aab}. In this case, @code{NF} is set to two and +@code{$2} has the value @code{"b"}. The idea is that even though there +aren't as many characters as were expected, there are some, so the data +should be made available to the program. + +@item Too much data +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aabbbccccddd}. 
In this case, @code{NF} is set to +three and the extra characters (@samp{ddd}) are ignored. If you want +@command{gawk} to capture the extra characters, supply a final @samp{*} +in the value of @code{FIELDWIDTHS}. + +@item Too much data, but with @samp{*} supplied +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4 *"} and the +input record is @samp{aabbbccccddd}. In this case, @code{NF} is set to +four, and @code{$4} has the value @code{"ddd"}. + +@end table @node Splitting By Content @section Defining Fields by Content @@ -7995,8 +8053,6 @@ affects field splitting with @code{FPAT}. Assigning a value to @code{FPAT} overrides field splitting with @code{FS} and with @code{FIELDWIDTHS}. -Similar to @code{FIELDWIDTHS}, the value of @code{PROCINFO["FS"]} -will be @code{"FPAT"} if content-based field splitting is being used. @quotation NOTE Some programs export CSV data that contains embedded newlines between @@ -8023,13 +8079,44 @@ FPAT = "([^,]*)|(\"[^\"]+\")" Finally, the @code{patsplit()} function makes the same functionality available for splitting regular strings (@pxref{String Functions}). -To recap, @command{gawk} provides three independent methods -to split input records into fields. -The mechanism used is based on which of the three -variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was -last assigned to. In addition, an API input parser may choose to -override the record parsing mechanism; please refer to @ref{Input Parsers} -for further information about this feature. + +@node Testing field creation +@section Checking How @command{gawk} Is Splitting Records + +@cindex @command{gawk}, splitting fields and +As we've seen, @command{gawk} provides three independent methods to split +input records into fields. The mechanism used is based on which of the +three variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was +last assigned to. 
In addition, an API input parser may choose to override +the record parsing mechanism; please refer to @ref{Input Parsers} for +further information about this feature. + +To restore normal field splitting after using @code{FIELDWIDTHS} +and/or @code{FPAT}, simply assign a value to @code{FS}. +You can use @samp{FS = FS} to do this, +without having to know the current value of @code{FS}. + +In order to tell which kind of field splitting is in effect, +use @code{PROCINFO["FS"]} (@pxref{Auto-set}). +The value is @code{"FS"} if regular field splitting is being used, +@code{"FIELDWIDTHS"} if fixed-width field splitting is being used, +or @code{"FPAT"} if content-based field splitting is being used: + +@example +if (PROCINFO["FS"] == "FS") + @var{regular field splitting} @dots{} +else if (PROCINFO["FS"] == "FIELDWIDTHS") + @var{fixed-width field splitting} @dots{} +else if (PROCINFO["FS"] == "FPAT") + @var{content-based field splitting} +else + @var{API input parser field splitting} @dots{} @ii{(advanced feature)} +@end example + +This information is useful when writing a function that needs to +temporarily change @code{FS} or @code{FIELDWIDTHS}, read some records, +and then restore the original settings (@pxref{Passwd Functions} for an +example of such a function). @node Multiple Line @section Multiple-Line Records diff --git a/doc/gawktexi.in b/doc/gawktexi.in index 4f6d5bec..b913ab56 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -563,7 +563,13 @@ particular records in a file and perform operations upon them. field. * Field Splitting Summary:: Some final points and a summary table. * Constant Size:: Reading constant width data. +* Fixed width data:: Processing fixed-width data. +* Skipping intervening:: Skipping intervening fields. +* Allowing trailing data:: Capturing optional trailing data. +* Fields with fixed data:: Field values with fixed-width data. 
* Splitting By Content:: Defining Fields By Content +* Testing field creation:: Checking how @command{gawk} is + splitting records. * Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the @code{getline} @@ -6215,6 +6221,8 @@ used with it do not have to be named on the @command{awk} command line * Field Separators:: The field separator and how to change it. * Constant Size:: Reading constant width data. * Splitting By Content:: Defining Fields By Content +* Testing field creation:: Checking how @command{gawk} is splitting + records. * Multiple Line:: Reading multiline records. * Getline:: Reading files under explicit program control using the @code{getline} function. @@ -7356,18 +7364,30 @@ feature of @command{gawk}. If you are a novice @command{awk} user, you might want to skip it on the first reading. @command{gawk} provides a facility for dealing with fixed-width fields -with no distinctive field separator. For example, data of this nature -arises in the input for old Fortran programs where numbers are run -together, or in the output of programs that did not anticipate the use -of their output as input for other programs. - -An example of the latter is a table where all the columns are lined up by -the use of a variable number of spaces and @emph{empty fields are just -spaces}. Clearly, @command{awk}'s normal field splitting based on @code{FS} -does not work well in this case. Although a portable @command{awk} program -can use a series of @code{substr()} calls on @code{$0} -(@pxref{String Functions}), -this is awkward and inefficient for a large number of fields. +with no distinctive field separator. We discuss this feature in +the following @value{SUBSECTION}s. + +@menu +* Fixed width data:: Processing fixed-width data. +* Skipping intervening:: Skipping intervening fields. +* Allowing trailing data:: Capturing optional trailing data. +* Fields with fixed data:: Field values with fixed-width data. 
+@end menu + +@node Fixed width data +@subsection Processing Fixed-Width Data + +An example of fixed-width data would be the input for old Fortran programs +where numbers are run together, or the output of programs that did not +anticipate the use of their output as input for other programs. + +An example of the latter is a table where all the columns are lined up +by the use of a variable number of spaces and @emph{empty fields are +just spaces}. Clearly, @command{awk}'s normal field splitting based +on @code{FS} does not work well in this case. Although a portable +@command{awk} program can use a series of @code{substr()} calls on +@code{$0} (@pxref{String Functions}), this is awkward and inefficient +for a large number of fields. @cindex troubleshooting, fatal errors, field widths@comma{} specifying @cindex @command{w} utility @@ -7375,14 +7395,12 @@ this is awkward and inefficient for a large number of fields. @cindex @command{gawk}, @code{FIELDWIDTHS} variable in The splitting of an input record into fixed-width fields is specified by assigning a string containing space-separated numbers to the built-in -variable @code{FIELDWIDTHS}. Each number specifies the width of the field, -@emph{including} columns between fields. If you want to ignore the columns -between fields, you can specify the width as a separate field that is -subsequently ignored. -Or, starting in @value{PVERSION} 4.2, each field width may optionally be -preceded by a colon-separated value specifying the number of characters to skip -before the field starts. -It is a fatal error to supply a field width that has a negative value. +variable @code{FIELDWIDTHS}. Each number specifies the width of the +field, @emph{including} columns between fields. If you want to ignore +the columns between fields, you can specify the width as a separate +field that is subsequently ignored. It is a fatal error to supply a +field width that has a negative value. 
+ The following data is the output of the Unix @command{w} utility. It is useful to illustrate the use of @code{FIELDWIDTHS}: @@ -7412,7 +7430,7 @@ NR > 2 @{ sub(/^ +/, "", idle) # strip leading spaces if (idle == "") idle = 0 - if (idle ~ /:/) @{ + if (idle ~ /:/) @{ # hh:mm split(idle, t, ":") idle = t[1] * 60 + t[2] @} @@ -7441,13 +7459,30 @@ brent ttyp0 286 dave ttyq4 1296000 @end example -Starting in @value{PVERSION} 4.2, this program could be rewritten to -specify @code{FIELDWIDTHS} like so: +Another (possibly more practical) example of fixed-width input data +is the input from a deck of balloting cards. In some parts of +the United States, voters mark their choices by punching holes in computer +cards. These cards are then processed to count the votes for any particular +candidate or on any particular issue. Because a voter may choose not to +vote on some issue, any column on the card may be empty. An @command{awk} +program for processing such data could use the @code{FIELDWIDTHS} feature +to simplify reading the data. (Of course, getting @command{gawk} to run on +a system with card readers is another story!) + +@node Skipping intervening +@subsection Skipping Intervening Fields + +Starting in @value{PVERSION} 4.2, each field width may optionally be +preceded by a colon-separated value specifying the number of characters +to skip before the field starts. Thus, the preceding program could be +rewritten to specify @code{FIELDWIDTHS} like so: + @example BEGIN @{ FIELDWIDTHS = "8 1:5 4:7 6 1:6 1:6 2:33" @} @end example + This strips away some of the white space separating the fields. With such -a change, the program would produce the following results: +a change, the program produces the following results: @example hzang ttyV3 50 @@ -7459,42 +7494,65 @@ brent ttyp0 286 dave ttyq4 1296000 @end example -Another (possibly more practical) example of fixed-width input data -is the input from a deck of balloting cards. 
In some parts of -the United States, voters mark their choices by punching holes in computer -cards. These cards are then processed to count the votes for any particular -candidate or on any particular issue. Because a voter may choose not to -vote on some issue, any column on the card may be empty. An @command{awk} -program for processing such data could use the @code{FIELDWIDTHS} feature -to simplify reading the data. (Of course, getting @command{gawk} to run on -a system with card readers is another story!) +@node Allowing trailing data +@subsection Capturing Optional Trailing Data -@cindex @command{gawk}, splitting fields and -Assigning a value to @code{FS} causes @command{gawk} to use -@code{FS} for field splitting again. Use @samp{FS = FS} to make this happen, -without having to know the current value of @code{FS}. -In order to tell which kind of field splitting is in effect, -use @code{PROCINFO["FS"]} -(@pxref{Auto-set}). -The value is @code{"FS"} if regular field splitting is being used, -or @code{"FIELDWIDTHS"} if fixed-width field splitting is being used: +There are times when fixed-width data may be followed by additional data +that has no fixed length. Such data may or may not be present, but if +it is, it should be possible to get at it from an @command{awk} program. + +Starting with version 4.2, in order to provide a way to say ``anything +else in the record after the defined fields,'' @command{gawk} +allows you to add a final @samp{*} character to the value of +@code{FIELDWIDTHS}. There can only be one such character, and it must +be the final non-whitespace character in @code{FIELDWIDTHS}. 
+For example: @example -if (PROCINFO["FS"] == "FS") - @var{regular field splitting} @dots{} -else if (PROCINFO["FS"] == "FIELDWIDTHS") - @var{fixed-width field splitting} @dots{} -else if (PROCINFO["FS"] == "FPAT") - @var{content-based field splitting} @dots{} @ii{(see next @value{SECTION})} -else - @var{API input parser field splitting} @dots{} @ii{(advanced feature)} +$ @kbd{cat fw.awk} @ii{Show the program} +@print{} BEGIN @{ FIELDWIDTHS = "2 2 *" @} +@print{} @{ print NF, $1, $2, $3 @} +$ @kbd{cat fw.in} @ii{Show sample input} +@print{} 1234abcdefghi +$ @kbd{gawk -f fw.awk fw.in} @ii{Run the program} +@print{} 3 12 34 abcdefghi @end example -This information is useful when writing a function -that needs to temporarily change @code{FS} or @code{FIELDWIDTHS}, -read some records, and then restore the original settings -(@pxref{Passwd Functions} -for an example of such a function). +@node Fields with fixed data +@subsection Field Values With Fixed-Width Data + +So far, so good. But what happens if there isn't as much data as there +should be based on the contents of @code{FIELDWIDTHS}? Or, what happens +if there is more data than expected? + +For many years, what happens in these cases was not well defined. Starting +with version 4.2, the rules are as follows: + +@table @asis +@item Enough data for some fields +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aabbb}. In this case, @code{NF} is set to two. + +@item Not enough data for a field +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aab}. In this case, @code{NF} is set to two and +@code{$2} has the value @code{"b"}. The idea is that even though there +aren't as many characters as were expected, there are some, so the data +should be made available to the program. + +@item Too much data +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4"} and the +input record is @samp{aabbbccccddd}. 
In this case, @code{NF} is set to +three and the extra characters (@samp{ddd}) are ignored. If you want +@command{gawk} to capture the extra characters, supply a final @samp{*} +in the value of @code{FIELDWIDTHS}. + +@item Too much data, but with @samp{*} supplied +For example, if @code{FIELDWIDTHS} is set to @code{"2 3 4 *"} and the +input record is @samp{aabbbccccddd}. In this case, @code{NF} is set to +four, and @code{$4} has the value @code{"ddd"}. + +@end table @node Splitting By Content @section Defining Fields by Content @@ -7595,8 +7653,6 @@ affects field splitting with @code{FPAT}. Assigning a value to @code{FPAT} overrides field splitting with @code{FS} and with @code{FIELDWIDTHS}. -Similar to @code{FIELDWIDTHS}, the value of @code{PROCINFO["FS"]} -will be @code{"FPAT"} if content-based field splitting is being used. @quotation NOTE Some programs export CSV data that contains embedded newlines between @@ -7623,13 +7679,44 @@ FPAT = "([^,]*)|(\"[^\"]+\")" Finally, the @code{patsplit()} function makes the same functionality available for splitting regular strings (@pxref{String Functions}). -To recap, @command{gawk} provides three independent methods -to split input records into fields. -The mechanism used is based on which of the three -variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was -last assigned to. In addition, an API input parser may choose to -override the record parsing mechanism; please refer to @ref{Input Parsers} -for further information about this feature. + +@node Testing field creation +@section Checking How @command{gawk} Is Splitting Records + +@cindex @command{gawk}, splitting fields and +As we've seen, @command{gawk} provides three independent methods to split +input records into fields. The mechanism used is based on which of the +three variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was +last assigned to. 
In addition, an API input parser may choose to override +the record parsing mechanism; please refer to @ref{Input Parsers} for +further information about this feature. + +To restore normal field splitting after using @code{FIELDWIDTHS} +and/or @code{FPAT}, simply assign a value to @code{FS}. +You can use @samp{FS = FS} to do this, +without having to know the current value of @code{FS}. + +In order to tell which kind of field splitting is in effect, +use @code{PROCINFO["FS"]} (@pxref{Auto-set}). +The value is @code{"FS"} if regular field splitting is being used, +@code{"FIELDWIDTHS"} if fixed-width field splitting is being used, +or @code{"FPAT"} if content-based field splitting is being used: + +@example +if (PROCINFO["FS"] == "FS") + @var{regular field splitting} @dots{} +else if (PROCINFO["FS"] == "FIELDWIDTHS") + @var{fixed-width field splitting} @dots{} +else if (PROCINFO["FS"] == "FPAT") + @var{content-based field splitting} +else + @var{API input parser field splitting} @dots{} @ii{(advanced feature)} +@end example + +This information is useful when writing a function that needs to +temporarily change @code{FS} or @code{FIELDWIDTHS}, read some records, +and then restore the original settings (@pxref{Passwd Functions} for an +example of such a function). @node Multiple Line @section Multiple-Line Records @@ -777,7 +777,7 @@ fw_parse_field(long up_to, /* parse only up to this field number */ * in practice. 
*/ memset(&mbs, 0, sizeof(mbstate_t)); - while (nf < up_to) { + while (nf < up_to && scan < end) { if (nf >= fw->nf) { *buf = end; return nf; @@ -788,7 +788,7 @@ fw_parse_field(long up_to, /* parse only up to this field number */ scan += flen; } } else { - while (nf < up_to) { + while (nf < up_to && scan < end) { if (nf >= fw->nf) { *buf = end; return nf; @@ -1171,18 +1171,38 @@ set_FIELDWIDTHS() if (*scan == '\0') break; - /* Detect an invalid base-10 integer, a valid value that - is followed by something other than a blank or '\0', - or a value that is not in the range [1..INT_MAX]. */ + // Look for skip value. We allow N:M and N:*. + /* + * Detect an invalid base-10 integer, a valid value that + * is followed by something other than a blank or '\0', + * or a value that is not in the range [1..INT_MAX]. + */ errno = 0; tmp = strtoul(scan, &end, 10); - if (errno == 0 && *end == ':' && (0 < tmp && tmp <= INT_MAX)) { + if (errno == 0 && *end == ':' && (0 < tmp && tmp <= UINT_MAX)) { FIELDWIDTHS->fields[i].skip = tmp; scan = end + 1; + if (*scan == '*') + goto got_star; + // try scanning for field width tmp = strtoul(scan, &end, 10); } else FIELDWIDTHS->fields[i].skip = 0; + + if (*scan == '*') { + got_star: + for (scan++; is_blank(*scan); scan++) + continue; + + if (*scan != '\0') + fatal(_("`*' must be the last designator in FIELDWIDTHS")); + + FIELDWIDTHS->fields[i].len = UINT_MAX; + FIELDWIDTHS->nf = i+1; + break; + } + if (errno != 0 || (*end != '\0' && ! is_blank(*end)) || !(0 < tmp && tmp <= INT_MAX) diff --git a/support/ChangeLog b/support/ChangeLog index 1c6015f3..e2077bd0 100644 --- a/support/ChangeLog +++ b/support/ChangeLog @@ -1,3 +1,7 @@ +2017-05-22 Arnold D. Robbins <arnold@skeeve.com> + + * dfa.c, dfa.h, intprops.h, verify.h: Sync with GNULIB. + 2017-03-23 Arnold D. Robbins <arnold@skeeve.com> * dfa.c: Sync with GNULIB. 
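The `set_FIELDWIDTHS()` hunk in field.c accepts three shapes for each item: a plain width `N`, a skip-prefixed `S:N`, and a final `*` or `S:*` that must be the last non-blank designator. As a hedged illustration only, here is a standalone C sketch of that grammar. It is not gawk's code: `parse_fieldwidths()`, `fw_count()`, `struct fw_spec`, `MAX_FW`, `FW_REST` and `is_blank_ch()` are hypothetical names, errors are reported by returning -1 where gawk calls `fatal()`, and only space and tab are treated as blanks (an assumption standing in for gawk's `is_blank()`).

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_FW  64          /* hypothetical: arbitrary cap for this sketch */
#define FW_REST UINT_MAX    /* width meaning "rest of the record" ("*") */

/* Hypothetical spec type, loosely modeled on the skip/len pairs that
   field.c stores in FIELDWIDTHS->fields[i]. */
struct fw_spec {
    int nf;
    struct { unsigned skip, len; } fields[MAX_FW];
};

static int is_blank_ch(char c) { return c == ' ' || c == '\t'; }

/* Parse a FIELDWIDTHS-style string into OUT.
   Returns 0 on success, -1 where gawk would report a fatal error. */
static int parse_fieldwidths(const char *s, struct fw_spec *out)
{
    assert(s != NULL && out != NULL);
    out->nf = 0;

    for (;;) {
        char *end;
        unsigned long skip = 0, len;

        while (is_blank_ch(*s))
            s++;
        if (*s == '\0')
            return 0;                   /* clean end of the spec */
        if (out->nf == MAX_FW)
            return -1;

        /* A positive number followed by ':' is a skip count ("S:N", "S:*"). */
        if (isdigit((unsigned char) *s)) {
            unsigned long n = strtoul(s, &end, 10);
            if (*end == ':' && n > 0 && n <= UINT_MAX) {
                skip = n;
                s = end + 1;
            }
        }

        if (*s == '*') {
            /* "*" takes the rest of the record; only blanks may follow. */
            for (s++; is_blank_ch(*s); s++)
                continue;
            if (*s != '\0')
                return -1;              /* "`*' must be the last designator" */
            out->fields[out->nf].skip = (unsigned) skip;
            out->fields[out->nf].len = FW_REST;
            out->nf++;
            return 0;
        }

        /* Otherwise expect a width in [1..INT_MAX] that ends at a blank or
           NUL; this also rejects a leading '-' sign. */
        if (!isdigit((unsigned char) *s))
            return -1;
        len = strtoul(s, &end, 10);
        if (len == 0 || len > INT_MAX || !(*end == '\0' || is_blank_ch(*end)))
            return -1;
        out->fields[out->nf].skip = (unsigned) skip;
        out->fields[out->nf].len = (unsigned) len;
        out->nf++;
        s = end;
    }
}

/* Convenience wrapper: number of fields a spec defines, or -1 if invalid. */
static int fw_count(const char *spec)
{
    struct fw_spec sp;
    return parse_fieldwidths(spec, &sp) == 0 ? sp.nf : -1;
}
```

Under these assumptions, `fw_count("2 2 *")` is 3, `fw_count("2 1:*")` is 2, and `fw_count("2 * 2")` is -1, which corresponds to the fatal error that fwtest6 expects from its `END` rule.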
diff --git a/support/dfa.c b/support/dfa.c index 2003ac55..18c17a5d 100644 --- a/support/dfa.c +++ b/support/dfa.c @@ -72,6 +72,14 @@ isasciidigit (char c) #include "xalloc.h" #include "localeinfo.h" +#ifndef FALLTHROUGH +# if __GNUC__ < 7 +# define FALLTHROUGH ((void) 0) +# else +# define FALLTHROUGH __attribute__ ((__fallthrough__)) +# endif +#endif + #ifndef MIN # define MIN(a,b) ((a) < (b) ? (a) : (b)) #endif @@ -1661,10 +1669,10 @@ addtok_mb (struct dfa *dfa, token t, char mbprop) case BACKREF: dfa->fast = false; - /* fallthrough */ + FALLTHROUGH; default: dfa->nleaves++; - /* fallthrough */ + FALLTHROUGH; case EMPTY: dfa->parse.depth++; break; @@ -2461,8 +2469,7 @@ dfaanalyze (struct dfa *d, bool searchflag) copy (&merged, &d->follows[pos[j].index]); } } - /* fallthrough */ - + FALLTHROUGH; case QMARK: /* A QMARK or STAR node is automatically nullable. */ if (d->tokens[i] != PLUS) @@ -2752,13 +2759,13 @@ build_state (state_num s, struct dfa *d, unsigned char uc) else if (d->tokens[pos.index] >= CSET) { matches = d->charclasses[d->tokens[pos.index] - CSET]; - if (tstbit (uc, &d->charclasses[d->tokens[pos.index] - CSET])) + if (tstbit (uc, &matches)) matched = true; } else if (d->tokens[pos.index] == ANYCHAR) { matches = d->charclasses[d->canychar]; - if (tstbit (uc, &d->charclasses[d->canychar])) + if (tstbit (uc, &matches)) matched = true; /* ANYCHAR must match with a single character, so we must put @@ -3378,8 +3385,7 @@ dfa_supported (struct dfa const *d) case NOTLIMWORD: if (!d->localeinfo.multibyte) continue; - /* fallthrough */ - + FALLTHROUGH; case BACKREF: case MBCSET: return false; @@ -3488,7 +3494,7 @@ dfassbuild (struct dfa *d) sup->tokens[j++] = EMPTY; break; } - /* fallthrough */ + FALLTHROUGH; default: sup->tokens[j++] = d->tokens[i]; if ((0 <= d->tokens[i] && d->tokens[i] < NOTCHAR) diff --git a/support/dfa.h b/support/dfa.h index c68b4df7..7d11f05d 100644 --- a/support/dfa.h +++ b/support/dfa.h @@ -1,5 +1,5 @@ /* dfa.h - declarations for GNU 
deterministic regexp compiler - Copyright (C) 1988, 1998, 2007, 2009-2016 Free Software Foundation, Inc. + Copyright (C) 1988, 1998, 2007, 2009-2017 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by diff --git a/support/intprops.h b/support/intprops.h index 716741ad..d8c71206 100644 --- a/support/intprops.h +++ b/support/intprops.h @@ -1,6 +1,6 @@ /* intprops.h -- properties of integer types - Copyright (C) 2001-2016 Free Software Foundation, Inc. + Copyright (C) 2001-2017 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published @@ -21,11 +21,6 @@ #define _GL_INTPROPS_H #include <limits.h> -#include <verify.h> - -#ifndef __has_builtin -# define __has_builtin(x) 0 -#endif /* Return a value with the common real type of E and V and the value of V. */ #define _GL_INT_CONVERT(e, v) (0 * (e) + (v)) @@ -84,24 +79,7 @@ /* This include file assumes that signed types are two's complement without padding bits; the above macros have undefined behavior otherwise. If this is a problem for you, please let us know how to fix it for your host. - As a sanity check, test the assumption for some signed types that - <limits.h> bounds. */ -verify (TYPE_MINIMUM (signed char) == SCHAR_MIN); -verify (TYPE_MAXIMUM (signed char) == SCHAR_MAX); -verify (TYPE_MINIMUM (short int) == SHRT_MIN); -verify (TYPE_MAXIMUM (short int) == SHRT_MAX); -verify (TYPE_MINIMUM (int) == INT_MIN); -verify (TYPE_MAXIMUM (int) == INT_MAX); -verify (TYPE_MINIMUM (long int) == LONG_MIN); -verify (TYPE_MAXIMUM (long int) == LONG_MAX); -#ifdef LLONG_MAX -verify (TYPE_MINIMUM (long long int) == LLONG_MIN); -verify (TYPE_MAXIMUM (long long int) == LLONG_MAX); -#endif -/* Similarly, sanity-check one ISO/IEC TS 18661-1:2014 macro if defined. 
*/ -#ifdef UINT_WIDTH -verify (TYPE_WIDTH (unsigned int) == UINT_WIDTH); -#endif + This assumption is tested by the intprops-tests module. */ /* Does the __typeof__ keyword work? This could be done by 'configure', but for now it's easier to do it by hand. */ @@ -241,12 +219,10 @@ verify (TYPE_WIDTH (unsigned int) == UINT_WIDTH); : (max) >> (b) < (a)) /* True if __builtin_add_overflow (A, B, P) works when P is non-null. */ -#define _GL_HAS_BUILTIN_OVERFLOW \ - (5 <= __GNUC__ || __has_builtin (__builtin_add_overflow)) +#define _GL_HAS_BUILTIN_OVERFLOW (5 <= __GNUC__) /* True if __builtin_add_overflow_p (A, B, C) works. */ -#define _GL_HAS_BUILTIN_OVERFLOW_P \ - (7 <= __GNUC__ || __has_builtin (__builtin_add_overflow_p)) +#define _GL_HAS_BUILTIN_OVERFLOW_P (7 <= __GNUC__) /* The _GL*_OVERFLOW macros have the same restrictions as the *_RANGE_OVERFLOW macros, except that they do not assume that operands @@ -395,10 +371,10 @@ verify (TYPE_WIDTH (unsigned int) == UINT_WIDTH); (_Generic \ (*(r), \ signed char: \ - _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned char, \ + _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \ signed char, SCHAR_MIN, SCHAR_MAX), \ short int: \ - _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned short int, \ + _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \ short int, SHRT_MIN, SHRT_MAX), \ int: \ _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \ @@ -412,10 +388,10 @@ verify (TYPE_WIDTH (unsigned int) == UINT_WIDTH); #else # define _GL_INT_OP_WRAPV(a, b, r, op, builtin, overflow) \ (sizeof *(r) == sizeof (signed char) \ - ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned char, \ + ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \ signed char, SCHAR_MIN, SCHAR_MAX) \ : sizeof *(r) == sizeof (short int) \ - ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned short int, \ + ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \ short int, SHRT_MIN, SHRT_MAX) \ : sizeof *(r) == sizeof (int) \ ? 
_GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \ @@ -437,9 +413,8 @@ verify (TYPE_WIDTH (unsigned int) == UINT_WIDTH); /* Store the low-order bits of A <op> B into *R, where the operation is given by OP. Use the unsigned type UT for calculation to avoid - overflow problems. *R's type is T, with extremal values TMIN and - TMAX. T must be a signed integer type. Return 1 if the result - overflows. */ + overflow problems. *R's type is T, with extrema TMIN and TMAX. + T must be a signed integer type. Return 1 if the result overflows. */ #define _GL_INT_OP_CALC(a, b, r, op, overflow, ut, t, tmin, tmax) \ (sizeof ((a) op (b)) < sizeof (t) \ ? _GL_INT_OP_CALC1 ((t) (a), (t) (b), r, op, overflow, ut, t, tmin, tmax) \ @@ -448,17 +423,27 @@ verify (TYPE_WIDTH (unsigned int) == UINT_WIDTH); ((overflow (a, b) \ || (EXPR_SIGNED ((a) op (b)) && ((a) op (b)) < (tmin)) \ || (tmax) < ((a) op (b))) \ - ? (*(r) = _GL_INT_OP_WRAPV_VIA_UNSIGNED (a, b, op, ut, t, tmin, tmax), 1) \ - : (*(r) = _GL_INT_OP_WRAPV_VIA_UNSIGNED (a, b, op, ut, t, tmin, tmax), 0)) - -/* Return A <op> B, where the operation is given by OP. Use the - unsigned type UT for calculation to avoid overflow problems. - Convert the result to type T without overflow by subtracting TMIN - from large values before converting, and adding it afterwards. - Compilers can optimize all the operations except OP. */ -#define _GL_INT_OP_WRAPV_VIA_UNSIGNED(a, b, op, ut, t, tmin, tmax) \ - (((ut) (a) op (ut) (b)) <= (tmax) \ - ? (t) ((ut) (a) op (ut) (b)) \ - : ((t) (((ut) (a) op (ut) (b)) - (tmin)) + (tmin))) + ? (*(r) = _GL_INT_OP_WRAPV_VIA_UNSIGNED (a, b, op, ut, t), 1) \ + : (*(r) = _GL_INT_OP_WRAPV_VIA_UNSIGNED (a, b, op, ut, t), 0)) + +/* Return the low-order bits of A <op> B, where the operation is given + by OP. Use the unsigned type UT for calculation to avoid undefined + behavior on signed integer overflow, and convert the result to type T. 
+ UT is at least as wide as T and is no narrower than unsigned int, + T is two's complement, and there is no padding or trap representations. + Assume that converting UT to T yields the low-order bits, as is + done in all known two's-complement C compilers. E.g., see: + https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html + + According to the C standard, converting UT to T yields an + implementation-defined result or signal for values outside T's + range. However, code that works around this theoretical problem + runs afoul of a compiler bug in Oracle Studio 12.3 x86. See: + http://lists.gnu.org/archive/html/bug-gnulib/2017-04/msg00049.html + As the compiler bug is real, don't try to work around the + theoretical problem. */ + +#define _GL_INT_OP_WRAPV_VIA_UNSIGNED(a, b, op, ut, t) \ + ((t) ((ut) (a) op (ut) (b))) #endif /* _GL_INTPROPS_H */ diff --git a/support/verify.h b/support/verify.h index 5c8381d2..dcba9c8c 100644 --- a/support/verify.h +++ b/support/verify.h @@ -1,6 +1,6 @@ /* Compile-time assert-like macros. - Copyright (C) 2005-2006, 2009-2016 Free Software Foundation, Inc. + Copyright (C) 2005-2006, 2009-2017 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -248,7 +248,12 @@ template <int w> /* Verify requirement R at compile-time, as a declaration without a trailing ';'. */ -#define verify(R) _GL_VERIFY (R, "verify (" #R ")") +#ifdef __GNUC__ +# define verify(R) _GL_VERIFY (R, "verify (" #R ")") +#else +/* PGI barfs if R is long. Play it safe. */ +# define verify(R) _GL_VERIFY (R, "verify (...)") +#endif #ifndef __has_builtin # define __has_builtin(x) 0 diff --git a/test/ChangeLog b/test/ChangeLog index b7f18d9b..c402ed60 100644 --- a/test/ChangeLog +++ b/test/ChangeLog @@ -1,3 +1,9 @@ +2017-05-23 Arnold D. Robbins <arnold@skeeve.com> + + * Makefile.am (fwtest5, fwtest6): New tests. 
+	* fwtest5.awk, fwtest5.in, fwtest5.ok, fwtest6.awk, fwtest6.in,
+	fwtest6.ok, fwtest7.awk, fwtest7.in, fwtest7.ok: New files.
+
 2017-05-20 Arnold D. Robbins <arnold@skeeve.com>
 
 	* noeffect.awk, noeffect.ok: Updated after code change.
diff --git a/test/Makefile.am b/test/Makefile.am
index 29de0b18..dd5f2369 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -393,6 +393,15 @@ EXTRA_DIST = \
 	fwtest4.awk \
 	fwtest4.in \
 	fwtest4.ok \
+	fwtest5.awk \
+	fwtest5.in \
+	fwtest5.ok \
+	fwtest6.awk \
+	fwtest6.in \
+	fwtest6.ok \
+	fwtest7.awk \
+	fwtest7.in \
+	fwtest7.ok \
 	genpot.awk \
 	genpot.ok \
 	gensub.awk \
@@ -1232,7 +1241,8 @@ GAWK_EXT_TESTS = \
 	crlf dbugeval dbugeval2 dbugtypedre1 dbugtypedre2 delsub \
 	devfd devfd1 devfd2 dumpvars errno exit \
 	fieldwdth forcenum fpat1 fpat2 fpat3 fpat4 fpat5 fpat6 fpatnull \
-	fsfwfs funlen functab1 functab2 functab3 fwtest fwtest2 fwtest3 fwtest4 \
+	fsfwfs funlen functab1 functab2 functab3 \
+	fwtest fwtest2 fwtest3 fwtest4 fwtest5 fwtest6 fwtest7 \
 	genpot gensub gensub2 gensub3 getlndir gnuops2 gnuops3 gnureops gsubind \
 	icasefs icasers id igncdym igncfs ignrcas2 ignrcas4 ignrcase \
 	incdupe incdupe2 incdupe3 incdupe4 incdupe5 incdupe6 incdupe7 \
diff --git a/test/Makefile.in b/test/Makefile.in
index 91fb8d8e..9d27170a 100644
--- a/test/Makefile.in
+++ b/test/Makefile.in
@@ -651,6 +651,15 @@ EXTRA_DIST = \
 	fwtest4.awk \
 	fwtest4.in \
 	fwtest4.ok \
+	fwtest5.awk \
+	fwtest5.in \
+	fwtest5.ok \
+	fwtest6.awk \
+	fwtest6.in \
+	fwtest6.ok \
+	fwtest7.awk \
+	fwtest7.in \
+	fwtest7.ok \
 	genpot.awk \
 	genpot.ok \
 	gensub.awk \
@@ -1489,7 +1498,8 @@ GAWK_EXT_TESTS = \
 	crlf dbugeval dbugeval2 dbugtypedre1 dbugtypedre2 delsub \
 	devfd devfd1 devfd2 dumpvars errno exit \
 	fieldwdth forcenum fpat1 fpat2 fpat3 fpat4 fpat5 fpat6 fpatnull \
-	fsfwfs funlen functab1 functab2 functab3 fwtest fwtest2 fwtest3 fwtest4 \
+	fsfwfs funlen functab1 functab2 functab3 \
+	fwtest fwtest2 fwtest3 fwtest4 fwtest5 fwtest6 fwtest7 \
 	genpot gensub gensub2 gensub3 getlndir gnuops2 gnuops3 gnureops gsubind \
 	icasefs icasers id igncdym igncfs ignrcas2 ignrcas4 ignrcase \
 	incdupe incdupe2 incdupe3 incdupe4 incdupe5 incdupe6 incdupe7 \
@@ -4025,6 +4035,21 @@ fwtest4:
 	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
 	@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
 
+fwtest5:
+	@echo $@
+	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+	@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
+fwtest6:
+	@echo $@
+	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+	@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
+fwtest7:
+	@echo $@
+	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+	@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
 gensub:
 	@echo $@
 	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
diff --git a/test/Maketests b/test/Maketests
index 0c77f98a..20b659ef 100644
--- a/test/Maketests
+++ b/test/Maketests
@@ -1192,6 +1192,21 @@ fwtest4:
 	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
 	@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
 
+fwtest5:
+	@echo $@
+	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+	@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
+fwtest6:
+	@echo $@
+	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+	@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
+fwtest7:
+	@echo $@
+	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+	@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
 gensub:
 	@echo $@
 	@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
diff --git a/test/fwtest5.awk b/test/fwtest5.awk
new file mode 100644
index 00000000..be030eab
--- /dev/null
+++ b/test/fwtest5.awk
@@ -0,0 +1,2 @@
+BEGIN { FIELDWIDTHS = "2 3 4" }
+{ print NF }
diff --git a/test/fwtest5.in b/test/fwtest5.in
new file mode 100644
index 00000000..c24c70ed
--- /dev/null
+++ b/test/fwtest5.in
@@ -0,0 +1,4 @@
+12
+12345
+123456789
+123456789abcd
diff --git a/test/fwtest5.ok b/test/fwtest5.ok
new file mode 100644
index 00000000..7d8164bf
--- /dev/null
+++ b/test/fwtest5.ok
@@ -0,0 +1,4 @@
+1
+2
+3
+3
diff --git a/test/fwtest6.awk b/test/fwtest6.awk
new file mode 100644
index 00000000..b36d75a2
--- /dev/null
+++ b/test/fwtest6.awk
@@ -0,0 +1,4 @@
+# BEGIN { FIELDWIDTHS = "2 2 *" }
+BEGIN { FIELDWIDTHS = "2 2 * " }
+{ print NF, $1, $2, $3 }
+END { FIELDWIDTHS = "2 * 2" }
diff --git a/test/fwtest6.in b/test/fwtest6.in
new file mode 100644
index 00000000..fea8d647
--- /dev/null
+++ b/test/fwtest6.in
@@ -0,0 +1 @@
+1234abcdefghi
diff --git a/test/fwtest6.ok b/test/fwtest6.ok
new file mode 100644
index 00000000..9ba87f2a
--- /dev/null
+++ b/test/fwtest6.ok
@@ -0,0 +1,3 @@
+3 12 34 abcdefghi
+gawk: fwtest6.awk:4: (FILENAME=- FNR=1) fatal: `*' must be the last designator in FIELDWIDTHS
+EXIT CODE: 2
diff --git a/test/fwtest7.awk b/test/fwtest7.awk
new file mode 100644
index 00000000..af424d94
--- /dev/null
+++ b/test/fwtest7.awk
@@ -0,0 +1,2 @@
+BEGIN { FIELDWIDTHS = "2 1:*" }
+{ print $1, $2 }
diff --git a/test/fwtest7.in b/test/fwtest7.in
new file mode 100644
index 00000000..1accfe88
--- /dev/null
+++ b/test/fwtest7.in
@@ -0,0 +1 @@
+abcdefghijklmn
diff --git a/test/fwtest7.ok b/test/fwtest7.ok
new file mode 100644
index 00000000..c321c19c
--- /dev/null
+++ b/test/fwtest7.ok
@@ -0,0 +1 @@
+ab defghijklmn
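The three new tests pin down the FIELDWIDTHS corner cases this commit addresses. fwtest5 checks that splitting stops at the end of the record, so NF counts only the fields that actually received data. fwtest6 checks that a trailing `*` takes the rest of the line and that a `*` anywhere but last is fatal. fwtest7 checks the `skip:*` form. The C sketch below models those rules so the expected .ok output can be traced by hand. It is not the field.c implementation: `struct fw_spec`, `struct field`, `parse_width`, `parse_widths`, `split_fixed` and `field_copy` are hypothetical names. It also assumes, consistent with fwtest5, that a short final field still counts as a field, and that a skip running past the end of the record simply ends the split.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One designator (hypothetical model): skip `skip` characters, then take
   `width` characters; width == -1 stands for `*', "the rest of the record". */
struct fw_spec {
    int skip;
    int width;
};

/* Location of one field inside the record. */
struct field {
    size_t start;
    size_t len;
};

/* Parse `*' (returns -1) or a decimal number at *sp (returns it);
   returns -2 if neither is present. Advances *sp past what it read. */
static int parse_width(const char **sp)
{
    char *end;

    if (**sp == '*') {
        (*sp)++;
        return -1;
    }
    if (!isdigit((unsigned char) **sp))
        return -2;
    long v = strtol(*sp, &end, 10);
    *sp = end;
    return (int) v;
}

/* Parse a FIELDWIDTHS-style string such as "2 3 4", "2 2 * " or "2 1:*".
   Returns the number of designators, or -1 on a syntax error, when there
   are more than `max' designators, or when `*' is not the last one. */
static int parse_widths(const char *s, struct fw_spec *specs, int max)
{
    int n = 0;
    int saw_star = 0;

    for (;;) {
        while (isspace((unsigned char) *s))
            s++;
        if (*s == '\0')
            break;                      /* trailing blanks are accepted */
        if (saw_star || n == max)
            return -1;                  /* `*' not last, or out of room */

        struct fw_spec sp = { 0, 0 };
        sp.width = parse_width(&s);
        if (sp.width >= 0 && *s == ':') {   /* "skip:width" or "skip:*" */
            sp.skip = sp.width;
            s++;
            sp.width = parse_width(&s);
        }
        if (sp.width == -2 || (*s != '\0' && !isspace((unsigned char) *s)))
            return -1;                  /* malformed designator */
        if (sp.width == -1)
            saw_star = 1;
        specs[n++] = sp;
    }
    return n;
}

/* Split `rec' using `specs', storing field locations in `out' (room for
   nspecs entries). Splitting stops as soon as the record is exhausted, so
   designators with no data left create no field; the return value plays
   the role of NF. */
static int split_fixed(const char *rec, const struct fw_spec *specs,
                       int nspecs, struct field *out)
{
    size_t len = strlen(rec);
    size_t pos = 0;
    int nf = 0;

    for (int i = 0; i < nspecs; i++) {
        pos += (size_t) specs[i].skip;
        if (pos >= len)
            break;                      /* end of record: stop here */
        size_t avail = len - pos;
        size_t w = (specs[i].width < 0 || (size_t) specs[i].width > avail)
                   ? avail              /* `*' or a short final field */
                   : (size_t) specs[i].width;
        out[nf].start = pos;
        out[nf].len = w;
        nf++;
        pos += w;
    }
    return nf;
}

/* Copy field `f' of `rec' into `buf' as a NUL-terminated string. */
static void field_copy(const char *rec, struct field f, char *buf, size_t bufsize)
{
    assert(f.len < bufsize);            /* caller supplies a large enough buffer */
    memcpy(buf, rec + f.start, f.len);
    buf[f.len] = '\0';
}
```

With "2 3 4", split_fixed returns 1, 2, 3 and 3 for the four lines of fwtest5.in, matching fwtest5.ok; with "2 1:*" on fwtest7.in it yields the fields ab and defghijklmn, and parse_widths rejects "2 * 2" just as gawk does in the END rule of fwtest6.awk.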