-rw-r--r-- ChangeLog 7
-rw-r--r-- doc/ChangeLog 7
-rw-r--r-- doc/gawk.info 1102
-rw-r--r-- doc/gawk.texi 256
-rw-r--r-- re.c 19
5 files changed, 753 insertions, 638 deletions
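
For orientation, the documentation changes in this commit describe range expressions such as `[a-dx-z]` being interpreted by the numeric value of each character in the native character set, not by locale collation order. The following is a minimal sketch of that interpretation only. It is not gawk's code: `expand_range` and `expand_bracket` are hypothetical helper names, Unicode code points stand in for "the machine's native character set", and escapes, negation, and POSIX character classes are deliberately ignored.

```python
def expand_range(first: str, last: str) -> str:
    """Expand a range such as a-d by numeric character value (code point)."""
    if ord(first) > ord(last):
        raise ValueError(f"invalid range {first}-{last}")
    return "".join(chr(c) for c in range(ord(first), ord(last) + 1))


def expand_bracket(body: str) -> str:
    """Expand the body of a simple bracket expression, e.g. 'a-dx-z'.

    Assumption: only literal characters and X-Y ranges appear in `body`.
    """
    out = []
    i = 0
    while i < len(body):
        # A range is three characters long: start, hyphen, end.
        if i + 2 < len(body) and body[i + 1] == "-":
            out.append(expand_range(body[i], body[i + 2]))
            i += 3
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(expand_bracket("a-dx-z"))  # abcdxyz
```

Under this interpretation `[a-z]` never picks up uppercase letters, whatever the locale, which is the behavior the ChangeLog entries below say gawk now applies in all cases.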
diff --git a/ChangeLog b/ChangeLog
index 8eb3b701..8507b985 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+Sun Jun 12 23:43:06 2011 Arnold D. Robbins <arnold@skeeve.com>
+
+ * re.c (resetup): Always turn on RE_RANGES_IGNORE_LOCALES.
+ Add justifying comment with URLs for the relevant portions of
+ POSIX. Thanks to Paul Eggert for pointing out the happy change
+ to the rules and supplying the URLs.
+
Wed Jun 8 22:41:30 2011 Arnold D. Robbins <arnold@skeeve.com>
* regcomp.c (build_range_exp): Add check for RE_NO_EMPTY_RANGES
diff --git a/doc/ChangeLog b/doc/ChangeLog
index eb54d5cf..8b32325b 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,10 @@
+Mon Jun 13 22:28:02 2011 Arnold D. Robbins <arnold@skeeve.com>
+
+ * gawk.texi: Document that POSIX now says [a-z] is undefined outside
+ the C and POSIX locales, so gawk treats it as the Good Lord intended
+ in all cases. Thanks to Paul Eggert for letting me know about this
+ and providing URLs to cite.
+
Fri May 27 09:59:38 2011 Arnold D. Robbins <arnold@skeeve.com>
* gawk.1, gawk.texi: Minor edits w.r.t. the bug reporting address.
diff --git a/doc/gawk.info b/doc/gawk.info
index 39e2f76e..3dd9d731 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -171,7 +171,6 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
* Case-sensitivity:: How to do case-insensitive matching.
* Leftmost Longest:: How much text matches.
* Computed Regexps:: Using Dynamic Regexps.
-* Locales:: How the locale affects things.
* Records:: Controlling how data is split into
records.
* Fields:: An introduction to fields.
@@ -270,6 +269,7 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
third subexpression.
* Function Calls:: A function call is an expression.
* Precedence:: How various operators nest.
+* Locales:: How the locale affects things.
* Pattern Overview:: What goes into a pattern.
* Regexp Patterns:: Using regexps as patterns.
* Expression Patterns:: Any expression can be used as a
@@ -476,6 +476,8 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
* POSIX/GNU:: The extensions in `gawk' not in
POSIX `awk'.
* Common Extensions:: Common Extensions Summary.
+* Ranges and Locales:: How locales used to affect regexp
+ ranges.
* Contributors:: The major contributors to
`gawk'.
* Gawk Distribution:: What is in the `gawk'
@@ -2849,7 +2851,6 @@ you specify more complicated classes of strings.
* Case-sensitivity:: How to do case-insensitive matching.
* Leftmost Longest:: How much text matches.
* Computed Regexps:: Using Dynamic Regexps.
-* Locales:: How the locale affects things.

File: gawk.info, Node: Regexp Usage, Next: Escape Sequences, Up: Regexp
@@ -3235,15 +3236,16 @@ File: gawk.info, Node: Bracket Expressions, Next: GNU Regexp Operators, Prev:
3.4 Using Bracket Expressions
=============================
-Within a bracket expression, a "range expression" consists of two
-characters separated by a hyphen. It matches any single character that
-sorts between the two characters, using the locale's collating sequence
-and character set. For example, `[0-9]' is equivalent to
-`[0123456789]'.
+As mentioned earlier, a bracket expression matches any character amongst
+those listed between the opening and closing square brackets.
- Unfortunately, providing simple character ranges such as `[a-z]'
-usually does not work like you might expect, due to locale-related
-issues. This is discussed more fully, in *note Locales::.
+ Within a bracket expression, a "range expression" consists of two
+characters separated by a hyphen. It matches any single character that
+sorts between the two characters, based upon the system's native
+character set. For example, `[0-9]' is equivalent to `[0123456789]'.
+(See *note Ranges and Locales::, for an explanation of how the POSIX
+standard and `gawk' have changed over time. This is mainly of
+historical interest.)
To include one of the characters `\', `]', `-', or `^' in a bracket
expression, put a `\' in front of it. For example:
@@ -3293,11 +3295,9 @@ Table 3.1: POSIX Character Classes
For example, before the POSIX standard, you had to write
`/[A-Za-z0-9]/' to match alphanumeric characters. If your character
-set had other alphabetic characters in it, this would not match them,
-and if your character set collated differently from ASCII, this might
-not even match the ASCII alphanumeric characters. With the POSIX
-character classes, you can write `/[[:alnum:]]/' to match the alphabetic
-and numeric characters in your character set.
+set had other alphabetic characters in it, this would not match them.
+With the POSIX character classes, you can write `/[[:alnum:]]/' to
+match the alphabetic and numeric characters in your character set.
Two additional special sequences can appear in bracket expressions.
These apply to non-ASCII character sets, which can have single symbols
@@ -3528,7 +3528,7 @@ this principle is also important for regexp-based record and field
splitting (*note Records::, and also *note Field Separators::).

-File: gawk.info, Node: Computed Regexps, Next: Locales, Prev: Leftmost Longest, Up: Regexp
+File: gawk.info, Node: Computed Regexps, Prev: Leftmost Longest, Up: Regexp
3.8 Using Dynamic Regexps
=========================
@@ -3607,86 +3607,6 @@ be used inside a bracket expression for a dynamic regexp:
often in practice, but it's worth noting for future reference.

-File: gawk.info, Node: Locales, Prev: Computed Regexps, Up: Regexp
-
-3.9 Where You Are Makes A Difference
-====================================
-
-Modern systems support the notion of "locales": a way to tell the
-system about the local character set and language. The current locale
-setting can affect the way regexp matching works, often in surprising
-ways.
-
- For example, in the default `"C"' locale, `[a-dx-z]' is equivalent to
-`[abcdxyz]'. Many locales sort characters in dictionary order, and in
-these locales, `[a-dx-z]' is typically not equivalent to `[abcdxyz]';
-instead it might be equivalent to `[aBbCcdXxYyz]', for example.
-
- This point needs to be emphasized: Much literature teaches that you
-should use `[a-z]' to match a lowercase character. But on systems with
-non-ASCII locales, this also matches all of the uppercase characters
-except `Z'! This is a continuous cause of confusion, even well into
-the twenty-first century.
-
- NOTE: In an attempt to end the confusion once and for all, when
- not in POSIX mode (*note Options::), `gawk' expands ranges into
- the characters they include, based only on the machine character
- set. This restores the traditional, pre-POSIX, pre-locales
- behavior. However, you should read the rest of this section so
- that you can write portable scripts, instead of relying on
- behavior specific to `gawk'.
-
- To obtain the traditional interpretation of bracket expressions, you
-can use the `"C"' locale by setting the `LC_ALL' environment variable
-to the value `C'. However, it is best to just use POSIX character
-classes, such as `[[:lower:]]' to match specific classes of characters.
-
- To demonstrate these issues, the following example uses the `sub()'
-function, which does text replacement (*note String Functions::). Here,
-the intent is to remove trailing uppercase characters:
-
- $ echo something1234abc | gawk --posix '{ sub("[A-Z]*$", ""); print }'
- -| something1234a
-
-This output is unexpected, since the `bc' at the end of
-`something1234abc' should not normally match `[A-Z]*'. This result is
-due to the locale setting (and thus you may not see it on your system).
-There are two fixes. The first is to use the POSIX character class
-`[[:upper:]]', instead of `[A-Z]'. (This is preferred, since then your
-program will work everywhere.)
-
- The second is to change the locale setting in the environment, before
-running `gawk', by using the shell statements:
-
- LANG=C LC_ALL=C
- export LANG LC_ALL
-
- The setting `C' forces `gawk' to behave in the traditional Unix
-manner, where case distinctions do matter. You may wish to put these
-statements into your shell startup file, e.g., `$HOME/.profile'.
-
- Similar considerations apply to other ranges. For example, `["-/]'
-is perfectly valid in ASCII, but is not valid in many Unicode locales,
-such as `en_US.UTF-8'. (In general, such ranges should be avoided;
-either list the characters individually, or use a POSIX character class
-such as `[[:punct:]]'.)
-
- An additional factor relates to splitting records. For the normal
-case of `RS = "\n"', the locale is largely irrelevant. For other
-single-character record separators, using `LC_ALL=C' will give you much
-better performance when reading records. Otherwise, `gawk' has to make
-several function calls, _per input character_, to find the record
-terminator.
-
- According to POSIX, string comparison is also affected by locales
-(similar to regular expressions). The details are presented in *note
-POSIX String Comparison::.
-
- Finally, the locale affects the value of the decimal point character
-used when `gawk' parses input data. This is discussed in detail in
-*note Conversion::.
-
-
File: gawk.info, Node: Reading Files, Next: Printing, Prev: Regexp, Up: Top
4 Reading Input Files
@@ -6451,6 +6371,7 @@ operators.
* Truth Values and Conditions:: Testing for true and false.
* Function Calls:: A function call is an expression.
* Precedence:: How various operators nest.
+* Locales:: How the locale affects things.

File: gawk.info, Node: Values, Next: All Operators, Up: Expressions
@@ -7897,7 +7818,7 @@ Here is a sample run:
-| 5 1

-File: gawk.info, Node: Precedence, Prev: Function Calls, Up: Expressions
+File: gawk.info, Node: Precedence, Next: Locales, Prev: Function Calls, Up: Expressions
6.5 Operator Precedence (How Operators Nest)
============================================
@@ -7998,6 +7919,33 @@ precedence:
POSIX. For maximum portability, do not use them.

+File: gawk.info, Node: Locales, Prev: Precedence, Up: Expressions
+
+6.6 Where You Are Makes A Difference
+====================================
+
+Modern systems support the notion of "locales": a way to tell the
+system about the local character set and language.
+
+ Once upon a time, the locale setting used to affect regexp matching
+(*note Ranges and Locales::), but this is no longer true.
+
+ Locales can affect record splitting. For the normal case of `RS =
+"\n"', the locale is largely irrelevant. For other single-character
+record separators, setting `LC_ALL=C' in the environment will give you
+much better performance when reading records. Otherwise, `gawk' has to
+make several function calls, _per input character_, to find the record
+terminator.
+
+ According to POSIX, string comparison is also affected by locales
+(similar to regular expressions). The details are presented in *note
+POSIX String Comparison::.
+
+ Finally, the locale affects the value of the decimal point character
+used when `gawk' parses input data. This is discussed in detail in
+*note Conversion::.
+
+
File: gawk.info, Node: Patterns and Actions, Next: Arrays, Prev: Expressions, Up: Top
7 Patterns, Actions, and Variables
@@ -19753,6 +19701,7 @@ you can find more information.
* POSIX/GNU:: The extensions in `gawk' not in POSIX
`awk'.
* Common Extensions:: Common Extensions Summary.
+* Ranges and Locales:: How locales used to affect regexp ranges.
* Contributors:: The major contributors to `gawk'.

@@ -20066,7 +20015,7 @@ the current version of `gawk'.

-File: gawk.info, Node: Common Extensions, Next: Contributors, Prev: POSIX/GNU, Up: Language History
+File: gawk.info, Node: Common Extensions, Next: Ranges and Locales, Prev: POSIX/GNU, Up: Language History
A.6 Common Extensions Summary
=============================
@@ -20092,9 +20041,108 @@ Feature BWK Awk Mawk GNU Awk
`BINMODE' variable X X

-File: gawk.info, Node: Contributors, Prev: Common Extensions, Up: Language History
+File: gawk.info, Node: Ranges and Locales, Next: Contributors, Prev: Common Extensions, Up: Language History
+
+A.7 Regexp Ranges and Locales: A Long Sad Story
+===============================================
+
+This minor node describes the confusing history of ranges within
+regular expressions and their interactions with locales, and how this
+affected different versions of `gawk'.
+
+ The original Unix tools that worked with regular expressions defined
+character ranges (such as `[a-z]') to match any character between the
+first character in the range and the last character in the range,
+inclusive. Ordering was based on the numeric value of each character
+in the machine's native character set. Thus, on ASCII-based systems,
+`[a-z]' matched all the lowercase letters, and only the lowercase
+letters, since the numeric values for the letters from `a' through `z'
+were contiguous. (On an EBCDIC system, the range `[a-z]' includes
+additional, non-alphabetic characters as well.)
+
+ Almost all introductory Unix literature explained range expressions
+as working in this fashion, and in particular, would teach that the
+"correct" way to match lowercase letters was with `[a-z]', and that
+`[A-Z]' was the "correct" way to match uppercase letters. And
+indeed, this was true.
+
+ The 1993 POSIX standard introduced the idea of locales (*note
+Locales::). Since many locales include other letters besides the plain
+twenty-six letters of the American English alphabet, the POSIX standard
+added character classes (*note Bracket Expressions::) as a way to match
+different kinds of characters besides the traditional ones in the ASCII
+character set.
+
+ However, the standard _changed_ the interpretation of range
+expressions. In the `"C"' and `"POSIX"' locales, a range expression
+like `[a-dx-z]' is still equivalent to `[abcdxyz]', as in ASCII. But
+outside those locales, the ordering was defined to be based on
+"collation order".
+
+ In many locales, `A' and `a' are both less than `B'. In other
+words, these locales sort characters in dictionary order, and
+`[a-dx-z]' is typically not equivalent to `[abcdxyz]'; instead it might
+be equivalent to `[aBbCcdXxYyz]', for example.
+
+ This point needs to be emphasized: Much literature teaches that you
+should use `[a-z]' to match a lowercase character. But on systems with
+non-ASCII locales, this also matched all of the uppercase characters
+except `Z'! This was a continuous cause of confusion, even well into
+the twenty-first century.
+
+ To demonstrate these issues, the following example uses the `sub()'
+function, which does text replacement (*note String Functions::). Here,
+the intent is to remove trailing uppercase characters:
+
+ $ echo something1234abc | gawk-3.1.8 '{ sub("[A-Z]*$", ""); print }'
+ -| something1234a
+
+This output is unexpected, since the `bc' at the end of
+`something1234abc' should not normally match `[A-Z]*'. This result is
+due to the locale setting (and thus you may not see it on your system).
+
+ Similar considerations apply to other ranges. For example, `["-/]'
+is perfectly valid in ASCII, but is not valid in many Unicode locales,
+such as `en_US.UTF-8'.
+
+ Early versions of `gawk' used regexp matching code that was not
+locale aware, so ranges had their traditional interpretation.
+
+ When `gawk' switched to using locale-aware regexp matchers, the
+problems began; especially as both GNU/Linux and commercial Unix
+vendors started implementing non-ASCII locales, _and making them the
+default_. Perhaps the most frequently asked question became something
+like "why does `[A-Z]' match lowercase letters?!?"
+
+ This situation existed for close to 10 years, if not more, and the
+`gawk' maintainer grew weary of trying to explain that `gawk' was being
+nicely standards-compliant, and that the issue was in the user's
+locale. During the development of version 4.0, he modified `gawk' to
+always treat ranges in the original, pre-POSIX fashion, unless
+`--posix' was used (*note Options::).
+
+ Fortunately, shortly before the final release of `gawk' 4.0, the
+maintainer learned that the 2008 standard had changed the definition of
+ranges, such that outside the `"C"' and `"POSIX"' locales, the meaning
+of range expressions was _undefined_.(1)
+
+ By using this lovely technical term, the standard gives license to
+implementors to implement ranges in whatever way they choose. The
+`gawk' maintainer chose to apply the pre-POSIX meaning in all cases:
+the default regexp matching; with `--traditional', and with `--posix';
+in all cases, `gawk' remains POSIX compliant.
+
+ ---------- Footnotes ----------
+
+ (1) See the standard
+(http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05)
+and its rationale
+(http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05).
+
+
+File: gawk.info, Node: Contributors, Prev: Ranges and Locales, Up: Language History
-A.7 Major Contributors to `gawk'
+A.8 Major Contributors to `gawk'
================================
Always give credit where credit is due.
@@ -24595,7 +24643,7 @@ Index
* - (hyphen), -= operator <1>: Precedence. (line 95)
* - (hyphen), -= operator: Assignment Ops. (line 129)
* - (hyphen), filenames beginning with: Options. (line 59)
-* - (hyphen), in bracket expressions: Bracket Expressions. (line 16)
+* - (hyphen), in bracket expressions: Bracket Expressions. (line 17)
* --assign option: Options. (line 32)
* --c option: Options. (line 78)
* --characters-as-bytes option: Options. (line 68)
@@ -24765,7 +24813,7 @@ Index
(line 44)
* \ (backslash), gsub()/gensub()/sub() functions and: Gory Details.
(line 6)
-* \ (backslash), in bracket expressions: Bracket Expressions. (line 16)
+* \ (backslash), in bracket expressions: Bracket Expressions. (line 17)
* \ (backslash), in escape sequences: Escape Sequences. (line 6)
* \ (backslash), in escape sequences, POSIX and: Escape Sequences.
(line 113)
@@ -24776,7 +24824,7 @@ Index
* ^ (caret), ^ operator: Precedence. (line 49)
* ^ (caret), ^= operator <1>: Precedence. (line 95)
* ^ (caret), ^= operator: Assignment Ops. (line 129)
-* ^ (caret), in bracket expressions: Bracket Expressions. (line 16)
+* ^ (caret), in bracket expressions: Bracket Expressions. (line 17)
* ^, in FS: Regexp Field Splitting.
(line 59)
* _ (underscore), _ C macro: Explaining gettext. (line 70)
@@ -25024,7 +25072,7 @@ Index
(line 44)
* backslash (\), gsub()/gensub()/sub() functions and: Gory Details.
(line 6)
-* backslash (\), in bracket expressions: Bracket Expressions. (line 16)
+* backslash (\), in bracket expressions: Bracket Expressions. (line 17)
* backslash (\), in escape sequences: Escape Sequences. (line 6)
* backslash (\), in escape sequences, POSIX and: Escape Sequences.
(line 113)
@@ -25084,15 +25132,15 @@ Index
* bracket expressions <1>: Bracket Expressions. (line 6)
* bracket expressions: Regexp Operators. (line 55)
* bracket expressions, character classes: Bracket Expressions.
- (line 29)
+ (line 30)
* bracket expressions, collating elements: Bracket Expressions.
- (line 70)
+ (line 69)
* bracket expressions, collating symbols: Bracket Expressions.
- (line 77)
+ (line 76)
* bracket expressions, complemented: Regexp Operators. (line 63)
* bracket expressions, equivalence classes: Bracket Expressions.
- (line 83)
-* bracket expressions, non-ASCII: Bracket Expressions. (line 70)
+ (line 82)
+* bracket expressions, non-ASCII: Bracket Expressions. (line 69)
* bracket expressions, range expressions: Bracket Expressions.
(line 6)
* break debugger command: Breakpoint Control. (line 11)
@@ -25135,7 +25183,7 @@ Index
* caret (^), ^ operator: Precedence. (line 49)
* caret (^), ^= operator <1>: Precedence. (line 95)
* caret (^), ^= operator: Assignment Ops. (line 129)
-* caret (^), in bracket expressions: Bracket Expressions. (line 16)
+* caret (^), in bracket expressions: Bracket Expressions. (line 17)
* case keyword: Switch Statement. (line 6)
* case sensitivity, array indices and: Array Intro. (line 92)
* case sensitivity, converting case: String Functions. (line 522)
@@ -25175,8 +25223,8 @@ Index
* Close, Diane <1>: Contributors. (line 21)
* Close, Diane: Manual History. (line 41)
* close_func() input method: Internals. (line 160)
-* collating elements: Bracket Expressions. (line 70)
-* collating symbols: Bracket Expressions. (line 77)
+* collating elements: Bracket Expressions. (line 69)
+* collating symbols: Bracket Expressions. (line 76)
* Colombo, Antonio: Acknowledgments. (line 60)
* columns, aligning: Print Examples. (line 70)
* columns, cutting: Cut Program. (line 6)
@@ -25552,7 +25600,7 @@ Index
* e debugger command (alias for enable): Breakpoint Control. (line 72)
* EBCDIC: Ordinal Functions. (line 45)
* egrep utility <1>: Egrep Program. (line 6)
-* egrep utility: Bracket Expressions. (line 23)
+* egrep utility: Bracket Expressions. (line 24)
* egrep.awk program: Egrep Program. (line 54)
* elements in arrays: Reference to Elements.
(line 6)
@@ -25596,7 +25644,7 @@ Index
* equals sign (=), == operator <1>: Precedence. (line 65)
* equals sign (=), == operator: Comparison Operators.
(line 11)
-* EREs (Extended Regular Expressions): Bracket Expressions. (line 23)
+* EREs (Extended Regular Expressions): Bracket Expressions. (line 24)
* ERRNO variable <1>: Internals. (line 139)
* ERRNO variable <2>: TCP/IP Networking. (line 54)
* ERRNO variable <3>: Auto-set. (line 72)
@@ -25645,7 +25693,7 @@ Index
* expressions, matching, See comparison expressions: Typing and Comparison.
(line 9)
* expressions, selecting: Conditional Exp. (line 6)
-* Extended Regular Expressions (EREs): Bracket Expressions. (line 23)
+* Extended Regular Expressions (EREs): Bracket Expressions. (line 24)
* eXtensible Markup Language (XML): Internals. (line 160)
* extension() function (gawk): Using Internal File Ops.
(line 15)
@@ -25888,7 +25936,7 @@ Index
* gawk, bitwise operations in: Bitwise Functions. (line 39)
* gawk, break statement in: Break Statement. (line 51)
* gawk, built-in variables and: Built-in Variables. (line 14)
-* gawk, character classes and: Bracket Expressions. (line 91)
+* gawk, character classes and: Bracket Expressions. (line 90)
* gawk, coding style in: Adding Code. (line 38)
* gawk, command-line options: GNU Regexp Operators.
(line 70)
@@ -26074,7 +26122,7 @@ Index
* hyphen (-), -= operator <1>: Precedence. (line 95)
* hyphen (-), -= operator: Assignment Ops. (line 129)
* hyphen (-), filenames beginning with: Options. (line 59)
-* hyphen (-), in bracket expressions: Bracket Expressions. (line 16)
+* hyphen (-), in bracket expressions: Bracket Expressions. (line 17)
* i debugger command (alias for info): Dgawk Info. (line 12)
* id utility: Id Program. (line 6)
* id.awk program: Id Program. (line 30)
@@ -26177,7 +26225,7 @@ Index
(line 13)
* internationalization, localization: User-modified. (line 153)
* internationalization, localization, character classes: Bracket Expressions.
- (line 91)
+ (line 90)
* internationalization, localization, gawk and: Internationalization.
(line 13)
* internationalization, localization, locale categories: Explaining gettext.
@@ -26618,9 +26666,9 @@ Index
* POSIX awk, backslashes in string constants: Escape Sequences.
(line 113)
* POSIX awk, BEGIN/END patterns: I/O And BEGIN/END. (line 16)
-* POSIX awk, bracket expressions and: Bracket Expressions. (line 23)
+* POSIX awk, bracket expressions and: Bracket Expressions. (line 24)
* POSIX awk, bracket expressions and, character classes: Bracket Expressions.
- (line 29)
+ (line 30)
* POSIX awk, break statement and: Break Statement. (line 51)
* POSIX awk, changes in awk versions: POSIX. (line 6)
* POSIX awk, continue statement and: Continue Statement. (line 43)
@@ -27314,411 +27362,413 @@ Index

Tag Table:
Node: Top1346
-Node: Foreword33320
-Node: Preface37665
-Ref: Preface-Footnote-140632
-Ref: Preface-Footnote-240738
-Node: History40970
-Node: Names43361
-Ref: Names-Footnote-144838
-Node: This Manual44910
-Ref: This Manual-Footnote-149857
-Node: Conventions49957
-Node: Manual History52091
-Ref: Manual History-Footnote-155361
-Ref: Manual History-Footnote-255402
-Node: How To Contribute55476
-Node: Acknowledgments56620
-Node: Getting Started60951
-Node: Running gawk63330
-Node: One-shot64516
-Node: Read Terminal65741
-Ref: Read Terminal-Footnote-167391
-Ref: Read Terminal-Footnote-267667
-Node: Long67838
-Node: Executable Scripts69214
-Ref: Executable Scripts-Footnote-171083
-Ref: Executable Scripts-Footnote-271185
-Node: Comments71636
-Node: Quoting74103
-Node: DOS Quoting78726
-Node: Sample Data Files79401
-Node: Very Simple82433
-Node: Two Rules87032
-Node: More Complex89179
-Ref: More Complex-Footnote-192109
-Node: Statements/Lines92194
-Ref: Statements/Lines-Footnote-196656
-Node: Other Features96921
-Node: When97849
-Node: Invoking Gawk99996
-Node: Command Line101381
-Node: Options102164
-Ref: Options-Footnote-1115442
-Node: Other Arguments115467
-Node: Naming Standard Input118125
-Node: Environment Variables119219
-Node: AWKPATH Variable119663
-Ref: AWKPATH Variable-Footnote-1122260
-Node: Other Environment Variables122520
-Node: Exit Status124860
-Node: Include Files125535
-Node: Obsolete129020
-Node: Undocumented129706
-Node: Regexp129947
-Node: Regexp Usage131399
-Node: Escape Sequences133425
-Node: Regexp Operators139188
-Ref: Regexp Operators-Footnote-1146385
-Ref: Regexp Operators-Footnote-2146532
-Node: Bracket Expressions146630
-Ref: table-char-classes148433
-Node: GNU Regexp Operators151077
-Node: Case-sensitivity154800
-Ref: Case-sensitivity-Footnote-1157768
-Ref: Case-sensitivity-Footnote-2158003
-Node: Leftmost Longest158111
-Node: Computed Regexps159312
-Node: Locales162738
-Node: Reading Files166445
-Node: Records168386
-Ref: Records-Footnote-1177060
-Node: Fields177097
-Ref: Fields-Footnote-1180130
-Node: Nonconstant Fields180216
-Node: Changing Fields182418
-Node: Field Separators188396
-Node: Default Field Splitting191025
-Node: Regexp Field Splitting192142
-Node: Single Character Fields195484
-Node: Command Line Field Separator196543
-Node: Field Splitting Summary199984
-Ref: Field Splitting Summary-Footnote-1203176
-Node: Constant Size203277
-Node: Splitting By Content207861
-Ref: Splitting By Content-Footnote-1211587
-Node: Multiple Line211627
-Ref: Multiple Line-Footnote-1217474
-Node: Getline217653
-Node: Plain Getline219881
-Node: Getline/Variable221970
-Node: Getline/File223111
-Node: Getline/Variable/File224433
-Ref: Getline/Variable/File-Footnote-1226032
-Node: Getline/Pipe226119
-Node: Getline/Variable/Pipe228679
-Node: Getline/Coprocess229786
-Node: Getline/Variable/Coprocess231029
-Node: Getline Notes231743
-Node: Getline Summary233685
-Ref: table-getline-variants234028
-Node: Command line directories234884
-Node: Printing235509
-Node: Print237140
-Node: Print Examples238477
-Node: Output Separators241261
-Node: OFMT243021
-Node: Printf244379
-Node: Basic Printf245285
-Node: Control Letters246824
-Node: Format Modifiers250636
-Node: Printf Examples256645
-Node: Redirection259360
-Node: Special Files266344
-Node: Special FD266877
-Ref: Special FD-Footnote-1270502
-Node: Special Network270576
-Node: Special Caveats271426
-Node: Close Files And Pipes272222
-Ref: Close Files And Pipes-Footnote-1279245
-Ref: Close Files And Pipes-Footnote-2279393
-Node: Expressions279543
-Node: Values280612
-Node: Constants281288
-Node: Scalar Constants281968
-Ref: Scalar Constants-Footnote-1282827
-Node: Nondecimal-numbers283009
-Node: Regexp Constants286068
-Node: Using Constant Regexps286543
-Node: Variables289598
-Node: Using Variables290253
-Node: Assignment Options291977
-Node: Conversion293849
-Ref: table-locale-affects299225
-Ref: Conversion-Footnote-1299849
-Node: All Operators299958
-Node: Arithmetic Ops300588
-Node: Concatenation303093
-Ref: Concatenation-Footnote-1305886
-Node: Assignment Ops306006
-Ref: table-assign-ops310994
-Node: Increment Ops312402
-Node: Truth Values and Conditions315872
-Node: Truth Values316955
-Node: Typing and Comparison318004
-Node: Variable Typing318793
-Ref: Variable Typing-Footnote-1322690
-Node: Comparison Operators322812
-Ref: table-relational-ops323222
-Node: POSIX String Comparison326771
-Ref: POSIX String Comparison-Footnote-1327727
-Node: Boolean Ops327865
-Ref: Boolean Ops-Footnote-1331943
-Node: Conditional Exp332034
-Node: Function Calls333766
-Node: Precedence337360
-Node: Patterns and Actions341013
-Node: Pattern Overview342067
-Node: Regexp Patterns343733
-Node: Expression Patterns344276
-Node: Ranges347850
-Node: BEGIN/END350816
-Node: Using BEGIN/END351578
-Ref: Using BEGIN/END-Footnote-1354309
-Node: I/O And BEGIN/END354415
-Node: BEGINFILE/ENDFILE356697
-Node: Empty359530
-Node: Using Shell Variables359846
-Node: Action Overview362131
-Node: Statements364488
-Node: If Statement366342
-Node: While Statement367841
-Node: Do Statement369885
-Node: For Statement371041
-Node: Switch Statement374193
-Node: Break Statement376290
-Node: Continue Statement378280
-Node: Next Statement380067
-Node: Nextfile Statement382457
-Node: Exit Statement384754
-Node: Built-in Variables387170
-Node: User-modified388265
-Ref: User-modified-Footnote-1396291
-Node: Auto-set396353
-Ref: Auto-set-Footnote-1405644
-Node: ARGC and ARGV405849
-Node: Arrays409700
-Node: Array Basics411205
-Node: Array Intro411916
-Node: Reference to Elements416234
-Node: Assigning Elements418504
-Node: Array Example418995
-Node: Scanning an Array420727
-Node: Delete423393
-Ref: Delete-Footnote-1425828
-Node: Numeric Array Subscripts425885
-Node: Uninitialized Subscripts428068
-Node: Multi-dimensional429696
-Node: Multi-scanning432790
-Node: Arrays of Arrays434374
-Node: Functions438951
-Node: Built-in439773
-Node: Calling Built-in440851
-Node: Numeric Functions442839
-Ref: Numeric Functions-Footnote-1446604
-Ref: Numeric Functions-Footnote-2446961
-Ref: Numeric Functions-Footnote-3447009
-Node: String Functions447278
-Ref: String Functions-Footnote-1470775
-Ref: String Functions-Footnote-2470904
-Ref: String Functions-Footnote-3471152
-Node: Gory Details471239
-Ref: table-sub-escapes472918
-Ref: table-posix-sub474232
-Ref: table-gensub-escapes475145
-Node: I/O Functions476316
-Ref: I/O Functions-Footnote-1482971
-Node: Time Functions483118
-Ref: Time Functions-Footnote-1494010
-Ref: Time Functions-Footnote-2494078
-Ref: Time Functions-Footnote-3494236
-Ref: Time Functions-Footnote-4494347
-Ref: Time Functions-Footnote-5494459
-Ref: Time Functions-Footnote-6494686
-Node: Bitwise Functions494952
-Ref: table-bitwise-ops495510
-Ref: Bitwise Functions-Footnote-1499670
-Node: Type Functions499854
-Node: I18N Functions500324
-Node: User-defined501951
-Node: Definition Syntax502755
-Ref: Definition Syntax-Footnote-1507665
-Node: Function Example507734
-Node: Function Caveats510328
-Node: Calling A Function510749
-Node: Variable Scope511864
-Node: Pass By Value/Reference513839
-Node: Return Statement517279
-Node: Dynamic Typing520260
-Node: Indirect Calls520995
-Node: Internationalization530680
-Node: I18N and L10N532106
-Node: Explaining gettext532792
-Ref: Explaining gettext-Footnote-1537858
-Ref: Explaining gettext-Footnote-2538042
-Node: Programmer i18n538207
-Node: Translator i18n542407
-Node: String Extraction543200
-Ref: String Extraction-Footnote-1544161
-Node: Printf Ordering544247
-Ref: Printf Ordering-Footnote-1547031
-Node: I18N Portability547095
-Ref: I18N Portability-Footnote-1549544
-Node: I18N Example549607
-Ref: I18N Example-Footnote-1552242
-Node: Gawk I18N552314
-Node: Advanced Features552931
-Node: Nondecimal Data554444
-Node: Array Sorting556027
-Node: Controlling Array Traversal556727
-Node: Controlling Scanning With A Function557474
-Node: Controlling Scanning565177
-Ref: Controlling Scanning-Footnote-1568978
-Node: Array Sorting Functions569294
-Ref: Array Sorting Functions-Footnote-1572810
-Ref: Array Sorting Functions-Footnote-2572903
-Node: Two-way I/O573097
-Ref: Two-way I/O-Footnote-1578529
-Node: TCP/IP Networking578599
-Node: Profiling581443
-Node: Library Functions588917
-Ref: Library Functions-Footnote-1591924
-Node: Library Names592095
-Ref: Library Names-Footnote-1595566
-Ref: Library Names-Footnote-2595786
-Node: General Functions595872
-Node: Strtonum Function596825
-Node: Assert Function599755
-Node: Round Function603081
-Node: Cliff Random Function604624
-Node: Ordinal Functions605640
-Ref: Ordinal Functions-Footnote-1608710
-Ref: Ordinal Functions-Footnote-2608962
-Node: Join Function609171
-Ref: Join Function-Footnote-1610942
-Node: Gettimeofday Function611142
-Node: Data File Management614857
-Node: Filetrans Function615489
-Node: Rewind Function619628
-Node: File Checking621015
-Node: Empty Files622109
-Node: Ignoring Assigns624339
-Node: Getopt Function625892
-Ref: Getopt Function-Footnote-1637196
-Node: Passwd Functions637399
-Ref: Passwd Functions-Footnote-1646374
-Node: Group Functions646462
-Node: Walking Arrays654546
-Node: Sample Programs656115
-Node: Running Examples656780
-Node: Clones657508
-Node: Cut Program658732
-Node: Egrep Program668577
-Ref: Egrep Program-Footnote-1676350
-Node: Id Program676460
-Node: Split Program680076
-Ref: Split Program-Footnote-1683595
-Node: Tee Program683723
-Node: Uniq Program686526
-Node: Wc Program693955
-Ref: Wc Program-Footnote-1698221
-Ref: Wc Program-Footnote-2698421
-Node: Miscellaneous Programs698513
-Node: Dupword Program699701
-Node: Alarm Program701732
-Node: Translate Program706481
-Ref: Translate Program-Footnote-1710868
-Ref: Translate Program-Footnote-2711096
-Node: Labels Program711230
-Ref: Labels Program-Footnote-1714601
-Node: Word Sorting714685
-Node: History Sorting718569
-Node: Extract Program720408
-Ref: Extract Program-Footnote-1727891
-Node: Simple Sed728019
-Node: Igawk Program731081
-Ref: Igawk Program-Footnote-1746238
-Ref: Igawk Program-Footnote-2746439
-Node: Anagram Program746577
-Node: Signature Program749645
-Node: Debugger750745
-Node: Debugging751656
-Node: Debugging Concepts752069
-Node: Debugging Terms753925
-Node: Awk Debugging756548
-Node: Sample dgawk session757440
-Node: dgawk invocation757932
-Node: Finding The Bug759114
-Node: List of Debugger Commands765600
-Node: Breakpoint Control766911
-Node: Dgawk Execution Control770547
-Node: Viewing And Changing Data773898
-Node: Dgawk Stack777235
-Node: Dgawk Info778695
-Node: Miscellaneous Dgawk Commands782643
-Node: Readline Support788071
-Node: Dgawk Limitations788909
-Node: Language History791098
-Node: V7/SVR3.1792536
-Node: SVR4794857
-Node: POSIX796299
-Node: BTL797307
-Node: POSIX/GNU798041
-Node: Common Extensions803192
-Node: Contributors804293
-Node: Installation808554
-Node: Gawk Distribution809448
-Node: Getting809932
-Node: Extracting810758
-Node: Distribution contents812450
-Node: Unix Installation817672
-Node: Quick Installation818289
-Node: Additional Configuration Options820251
-Node: Configuration Philosophy821728
-Node: Non-Unix Installation824070
-Node: PC Installation824528
-Node: PC Binary Installation825827
-Node: PC Compiling827675
-Node: PC Testing830619
-Node: PC Using831795
-Node: Cygwin835980
-Node: MSYS836980
-Node: VMS Installation837494
-Node: VMS Compilation838097
-Ref: VMS Compilation-Footnote-1839104
-Node: VMS Installation Details839162
-Node: VMS Running840797
-Node: VMS Old Gawk842404
-Node: Bugs842878
-Node: Other Versions846731
-Node: Notes852012
-Node: Compatibility Mode852704
-Node: Additions853487
-Node: Accessing The Source854299
-Node: Adding Code855724
-Node: New Ports861691
-Node: Dynamic Extensions865804
-Node: Internals867180
-Node: Plugin License876283
-Node: Sample Library876917
-Node: Internal File Description877603
-Node: Internal File Ops881318
-Ref: Internal File Ops-Footnote-1886099
-Node: Using Internal File Ops886239
-Node: Future Extensions888616
-Node: Basic Concepts891120
-Node: Basic High Level891877
-Ref: Basic High Level-Footnote-1895912
-Node: Basic Data Typing896097
-Node: Floating Point Issues900622
-Node: String Conversion Precision901705
-Ref: String Conversion Precision-Footnote-1903405
-Node: Unexpected Results903514
-Node: POSIX Floating Point Problems905340
-Ref: POSIX Floating Point Problems-Footnote-1909045
-Node: Glossary909083
-Node: Copying934059
-Node: GNU Free Documentation License971616
-Node: Index996753
+Node: Foreword33440
+Node: Preface37785
+Ref: Preface-Footnote-140752
+Ref: Preface-Footnote-240858
+Node: History41090
+Node: Names43481
+Ref: Names-Footnote-144958
+Node: This Manual45030
+Ref: This Manual-Footnote-149977
+Node: Conventions50077
+Node: Manual History52211
+Ref: Manual History-Footnote-155481
+Ref: Manual History-Footnote-255522
+Node: How To Contribute55596
+Node: Acknowledgments56740
+Node: Getting Started61071
+Node: Running gawk63450
+Node: One-shot64636
+Node: Read Terminal65861
+Ref: Read Terminal-Footnote-167511
+Ref: Read Terminal-Footnote-267787
+Node: Long67958
+Node: Executable Scripts69334
+Ref: Executable Scripts-Footnote-171203
+Ref: Executable Scripts-Footnote-271305
+Node: Comments71756
+Node: Quoting74223
+Node: DOS Quoting78846
+Node: Sample Data Files79521
+Node: Very Simple82553
+Node: Two Rules87152
+Node: More Complex89299
+Ref: More Complex-Footnote-192229
+Node: Statements/Lines92314
+Ref: Statements/Lines-Footnote-196776
+Node: Other Features97041
+Node: When97969
+Node: Invoking Gawk100116
+Node: Command Line101501
+Node: Options102284
+Ref: Options-Footnote-1115562
+Node: Other Arguments115587
+Node: Naming Standard Input118245
+Node: Environment Variables119339
+Node: AWKPATH Variable119783
+Ref: AWKPATH Variable-Footnote-1122380
+Node: Other Environment Variables122640
+Node: Exit Status124980
+Node: Include Files125655
+Node: Obsolete129140
+Node: Undocumented129826
+Node: Regexp130067
+Node: Regexp Usage131456
+Node: Escape Sequences133482
+Node: Regexp Operators139245
+Ref: Regexp Operators-Footnote-1146442
+Ref: Regexp Operators-Footnote-2146589
+Node: Bracket Expressions146687
+Ref: table-char-classes148577
+Node: GNU Regexp Operators151100
+Node: Case-sensitivity154823
+Ref: Case-sensitivity-Footnote-1157791
+Ref: Case-sensitivity-Footnote-2158026
+Node: Leftmost Longest158134
+Node: Computed Regexps159335
+Node: Reading Files162745
+Node: Records164686
+Ref: Records-Footnote-1173360
+Node: Fields173397
+Ref: Fields-Footnote-1176430
+Node: Nonconstant Fields176516
+Node: Changing Fields178718
+Node: Field Separators184696
+Node: Default Field Splitting187325
+Node: Regexp Field Splitting188442
+Node: Single Character Fields191784
+Node: Command Line Field Separator192843
+Node: Field Splitting Summary196284
+Ref: Field Splitting Summary-Footnote-1199476
+Node: Constant Size199577
+Node: Splitting By Content204161
+Ref: Splitting By Content-Footnote-1207887
+Node: Multiple Line207927
+Ref: Multiple Line-Footnote-1213774
+Node: Getline213953
+Node: Plain Getline216181
+Node: Getline/Variable218270
+Node: Getline/File219411
+Node: Getline/Variable/File220733
+Ref: Getline/Variable/File-Footnote-1222332
+Node: Getline/Pipe222419
+Node: Getline/Variable/Pipe224979
+Node: Getline/Coprocess226086
+Node: Getline/Variable/Coprocess227329
+Node: Getline Notes228043
+Node: Getline Summary229985
+Ref: table-getline-variants230328
+Node: Command line directories231184
+Node: Printing231809
+Node: Print233440
+Node: Print Examples234777
+Node: Output Separators237561
+Node: OFMT239321
+Node: Printf240679
+Node: Basic Printf241585
+Node: Control Letters243124
+Node: Format Modifiers246936
+Node: Printf Examples252945
+Node: Redirection255660
+Node: Special Files262644
+Node: Special FD263177
+Ref: Special FD-Footnote-1266802
+Node: Special Network266876
+Node: Special Caveats267726
+Node: Close Files And Pipes268522
+Ref: Close Files And Pipes-Footnote-1275545
+Ref: Close Files And Pipes-Footnote-2275693
+Node: Expressions275843
+Node: Values276975
+Node: Constants277651
+Node: Scalar Constants278331
+Ref: Scalar Constants-Footnote-1279190
+Node: Nondecimal-numbers279372
+Node: Regexp Constants282431
+Node: Using Constant Regexps282906
+Node: Variables285961
+Node: Using Variables286616
+Node: Assignment Options288340
+Node: Conversion290212
+Ref: table-locale-affects295588
+Ref: Conversion-Footnote-1296212
+Node: All Operators296321
+Node: Arithmetic Ops296951
+Node: Concatenation299456
+Ref: Concatenation-Footnote-1302249
+Node: Assignment Ops302369
+Ref: table-assign-ops307357
+Node: Increment Ops308765
+Node: Truth Values and Conditions312235
+Node: Truth Values313318
+Node: Typing and Comparison314367
+Node: Variable Typing315156
+Ref: Variable Typing-Footnote-1319053
+Node: Comparison Operators319175
+Ref: table-relational-ops319585
+Node: POSIX String Comparison323134
+Ref: POSIX String Comparison-Footnote-1324090
+Node: Boolean Ops324228
+Ref: Boolean Ops-Footnote-1328306
+Node: Conditional Exp328397
+Node: Function Calls330129
+Node: Precedence333723
+Node: Locales337392
+Node: Patterns and Actions338481
+Node: Pattern Overview339535
+Node: Regexp Patterns341201
+Node: Expression Patterns341744
+Node: Ranges345318
+Node: BEGIN/END348284
+Node: Using BEGIN/END349046
+Ref: Using BEGIN/END-Footnote-1351777
+Node: I/O And BEGIN/END351883
+Node: BEGINFILE/ENDFILE354165
+Node: Empty356998
+Node: Using Shell Variables357314
+Node: Action Overview359599
+Node: Statements361956
+Node: If Statement363810
+Node: While Statement365309
+Node: Do Statement367353
+Node: For Statement368509
+Node: Switch Statement371661
+Node: Break Statement373758
+Node: Continue Statement375748
+Node: Next Statement377535
+Node: Nextfile Statement379925
+Node: Exit Statement382222
+Node: Built-in Variables384638
+Node: User-modified385733
+Ref: User-modified-Footnote-1393759
+Node: Auto-set393821
+Ref: Auto-set-Footnote-1403112
+Node: ARGC and ARGV403317
+Node: Arrays407168
+Node: Array Basics408673
+Node: Array Intro409384
+Node: Reference to Elements413702
+Node: Assigning Elements415972
+Node: Array Example416463
+Node: Scanning an Array418195
+Node: Delete420861
+Ref: Delete-Footnote-1423296
+Node: Numeric Array Subscripts423353
+Node: Uninitialized Subscripts425536
+Node: Multi-dimensional427164
+Node: Multi-scanning430258
+Node: Arrays of Arrays431842
+Node: Functions436419
+Node: Built-in437241
+Node: Calling Built-in438319
+Node: Numeric Functions440307
+Ref: Numeric Functions-Footnote-1444072
+Ref: Numeric Functions-Footnote-2444429
+Ref: Numeric Functions-Footnote-3444477
+Node: String Functions444746
+Ref: String Functions-Footnote-1468243
+Ref: String Functions-Footnote-2468372
+Ref: String Functions-Footnote-3468620
+Node: Gory Details468707
+Ref: table-sub-escapes470386
+Ref: table-posix-sub471700
+Ref: table-gensub-escapes472613
+Node: I/O Functions473784
+Ref: I/O Functions-Footnote-1480439
+Node: Time Functions480586
+Ref: Time Functions-Footnote-1491478
+Ref: Time Functions-Footnote-2491546
+Ref: Time Functions-Footnote-3491704
+Ref: Time Functions-Footnote-4491815
+Ref: Time Functions-Footnote-5491927
+Ref: Time Functions-Footnote-6492154
+Node: Bitwise Functions492420
+Ref: table-bitwise-ops492978
+Ref: Bitwise Functions-Footnote-1497138
+Node: Type Functions497322
+Node: I18N Functions497792
+Node: User-defined499419
+Node: Definition Syntax500223
+Ref: Definition Syntax-Footnote-1505133
+Node: Function Example505202
+Node: Function Caveats507796
+Node: Calling A Function508217
+Node: Variable Scope509332
+Node: Pass By Value/Reference511307
+Node: Return Statement514747
+Node: Dynamic Typing517728
+Node: Indirect Calls518463
+Node: Internationalization528148
+Node: I18N and L10N529574
+Node: Explaining gettext530260
+Ref: Explaining gettext-Footnote-1535326
+Ref: Explaining gettext-Footnote-2535510
+Node: Programmer i18n535675
+Node: Translator i18n539875
+Node: String Extraction540668
+Ref: String Extraction-Footnote-1541629
+Node: Printf Ordering541715
+Ref: Printf Ordering-Footnote-1544499
+Node: I18N Portability544563
+Ref: I18N Portability-Footnote-1547012
+Node: I18N Example547075
+Ref: I18N Example-Footnote-1549710
+Node: Gawk I18N549782
+Node: Advanced Features550399
+Node: Nondecimal Data551912
+Node: Array Sorting553495
+Node: Controlling Array Traversal554195
+Node: Controlling Scanning With A Function554942
+Node: Controlling Scanning562645
+Ref: Controlling Scanning-Footnote-1566446
+Node: Array Sorting Functions566762
+Ref: Array Sorting Functions-Footnote-1570278
+Ref: Array Sorting Functions-Footnote-2570371
+Node: Two-way I/O570565
+Ref: Two-way I/O-Footnote-1575997
+Node: TCP/IP Networking576067
+Node: Profiling578911
+Node: Library Functions586385
+Ref: Library Functions-Footnote-1589392
+Node: Library Names589563
+Ref: Library Names-Footnote-1593034
+Ref: Library Names-Footnote-2593254
+Node: General Functions593340
+Node: Strtonum Function594293
+Node: Assert Function597223
+Node: Round Function600549
+Node: Cliff Random Function602092
+Node: Ordinal Functions603108
+Ref: Ordinal Functions-Footnote-1606178
+Ref: Ordinal Functions-Footnote-2606430
+Node: Join Function606639
+Ref: Join Function-Footnote-1608410
+Node: Gettimeofday Function608610
+Node: Data File Management612325
+Node: Filetrans Function612957
+Node: Rewind Function617096
+Node: File Checking618483
+Node: Empty Files619577
+Node: Ignoring Assigns621807
+Node: Getopt Function623360
+Ref: Getopt Function-Footnote-1634664
+Node: Passwd Functions634867
+Ref: Passwd Functions-Footnote-1643842
+Node: Group Functions643930
+Node: Walking Arrays652014
+Node: Sample Programs653583
+Node: Running Examples654248
+Node: Clones654976
+Node: Cut Program656200
+Node: Egrep Program666045
+Ref: Egrep Program-Footnote-1673818
+Node: Id Program673928
+Node: Split Program677544
+Ref: Split Program-Footnote-1681063
+Node: Tee Program681191
+Node: Uniq Program683994
+Node: Wc Program691423
+Ref: Wc Program-Footnote-1695689
+Ref: Wc Program-Footnote-2695889
+Node: Miscellaneous Programs695981
+Node: Dupword Program697169
+Node: Alarm Program699200
+Node: Translate Program703949
+Ref: Translate Program-Footnote-1708336
+Ref: Translate Program-Footnote-2708564
+Node: Labels Program708698
+Ref: Labels Program-Footnote-1712069
+Node: Word Sorting712153
+Node: History Sorting716037
+Node: Extract Program717876
+Ref: Extract Program-Footnote-1725359
+Node: Simple Sed725487
+Node: Igawk Program728549
+Ref: Igawk Program-Footnote-1743706
+Ref: Igawk Program-Footnote-2743907
+Node: Anagram Program744045
+Node: Signature Program747113
+Node: Debugger748213
+Node: Debugging749124
+Node: Debugging Concepts749537
+Node: Debugging Terms751393
+Node: Awk Debugging754016
+Node: Sample dgawk session754908
+Node: dgawk invocation755400
+Node: Finding The Bug756582
+Node: List of Debugger Commands763068
+Node: Breakpoint Control764379
+Node: Dgawk Execution Control768015
+Node: Viewing And Changing Data771366
+Node: Dgawk Stack774703
+Node: Dgawk Info776163
+Node: Miscellaneous Dgawk Commands780111
+Node: Readline Support785539
+Node: Dgawk Limitations786377
+Node: Language History788566
+Node: V7/SVR3.1790078
+Node: SVR4792399
+Node: POSIX793841
+Node: BTL794849
+Node: POSIX/GNU795583
+Node: Common Extensions800734
+Node: Ranges and Locales801841
+Ref: Ranges and Locales-Footnote-1806448
+Node: Contributors806669
+Node: Installation810931
+Node: Gawk Distribution811825
+Node: Getting812309
+Node: Extracting813135
+Node: Distribution contents814827
+Node: Unix Installation820049
+Node: Quick Installation820666
+Node: Additional Configuration Options822628
+Node: Configuration Philosophy824105
+Node: Non-Unix Installation826447
+Node: PC Installation826905
+Node: PC Binary Installation828204
+Node: PC Compiling830052
+Node: PC Testing832996
+Node: PC Using834172
+Node: Cygwin838357
+Node: MSYS839357
+Node: VMS Installation839871
+Node: VMS Compilation840474
+Ref: VMS Compilation-Footnote-1841481
+Node: VMS Installation Details841539
+Node: VMS Running843174
+Node: VMS Old Gawk844781
+Node: Bugs845255
+Node: Other Versions849108
+Node: Notes854389
+Node: Compatibility Mode855081
+Node: Additions855864
+Node: Accessing The Source856676
+Node: Adding Code858101
+Node: New Ports864068
+Node: Dynamic Extensions868181
+Node: Internals869557
+Node: Plugin License878660
+Node: Sample Library879294
+Node: Internal File Description879980
+Node: Internal File Ops883695
+Ref: Internal File Ops-Footnote-1888476
+Node: Using Internal File Ops888616
+Node: Future Extensions890993
+Node: Basic Concepts893497
+Node: Basic High Level894254
+Ref: Basic High Level-Footnote-1898289
+Node: Basic Data Typing898474
+Node: Floating Point Issues902999
+Node: String Conversion Precision904082
+Ref: String Conversion Precision-Footnote-1905782
+Node: Unexpected Results905891
+Node: POSIX Floating Point Problems907717
+Ref: POSIX Floating Point Problems-Footnote-1911422
+Node: Glossary911460
+Node: Copying936436
+Node: GNU Free Documentation License973993
+Node: Index999130

End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index b9190a62..a74773ca 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -20,7 +20,7 @@
@c applies to and all the info about who's publishing this edition
@c These apply across the board.
-@set UPDATE-MONTH May, 2011
+@set UPDATE-MONTH June, 2011
@set VERSION 4.0
@set PATCHLEVEL 0
@@ -368,7 +368,6 @@ particular records in a file and perform operations upon them.
* Case-sensitivity:: How to do case-insensitive matching.
* Leftmost Longest:: How much text matches.
* Computed Regexps:: Using Dynamic Regexps.
-* Locales:: How the locale affects things.
* Records:: Controlling how data is split into
records.
* Fields:: An introduction to fields.
@@ -467,6 +466,7 @@ particular records in a file and perform operations upon them.
third subexpression.
* Function Calls:: A function call is an expression.
* Precedence:: How various operators nest.
+* Locales:: How the locale affects things.
* Pattern Overview:: What goes into a pattern.
* Regexp Patterns:: Using regexps as patterns.
* Expression Patterns:: Any expression can be used as a
@@ -673,6 +673,8 @@ particular records in a file and perform operations upon them.
* POSIX/GNU:: The extensions in @command{gawk} not in
POSIX @command{awk}.
* Common Extensions:: Common Extensions Summary.
+* Ranges and Locales:: How locales used to affect regexp
+ ranges.
* Contributors:: The major contributors to
@command{gawk}.
* Gawk Distribution:: What is in the @command{gawk}
@@ -4003,7 +4005,6 @@ regular expressions work, we present more complicated instances.
* Case-sensitivity:: How to do case-insensitive matching.
* Leftmost Longest:: How much text matches.
* Computed Regexps:: Using Dynamic Regexps.
-* Locales:: How the locale affects things.
@end menu
@node Regexp Usage
@@ -4530,15 +4531,14 @@ As in arithmetic, parentheses can change how operators are grouped.
@cindex POSIX @command{awk}, regular expressions and
@cindex @command{gawk}, regular expressions, precedence
-In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and @samp{?} operators
-stand for themselves when there is nothing in the regexp that precedes them.
-For example, @code{/+/} matches a literal plus sign. However, many other versions of
-@command{awk} treat such a usage as a syntax error.
-
-If @command{gawk} is in compatibility mode
-(@pxref{Options}),
-interval expressions are not available in
-regular expressions.
+In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and
+@samp{?} operators stand for themselves when there is nothing in the
+regexp that precedes them. For example, @code{/+/} matches a literal
+plus sign. However, many other versions of @command{awk} treat such a
+usage as a syntax error.
+
+If @command{gawk} is in compatibility mode (@pxref{Options}), interval
+expressions are not available in regular expressions.
@c ENDOFRANGE regexpo
@node Bracket Expressions
@@ -4548,15 +4548,16 @@ regular expressions.
@cindex bracket expressions, range expressions
@cindex range expressions (regexps)
+As mentioned earlier, a bracket expression matches any character amongst
+those listed between the opening and closing square brackets.
+
Within a bracket expression, a @dfn{range expression} consists of two
characters separated by a hyphen. It matches any single character that
-sorts between the two characters, using the locale's
-collating sequence and character set.
-For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
-
-Unfortunately, providing simple character ranges such as @samp{[a-z]}
-usually does not work like you might expect, due to locale-related issues.
-This is discussed more fully, in @ref{Locales}.
+sorts between the two characters, based upon the system's native character
+set. For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
+(See @ref{Ranges and Locales}, for an explanation of how the POSIX
+standard and @command{gawk} have changed over time. This is mainly
+of historical interest.)
@cindex @code{\} (backslash), in bracket expressions
@cindex backslash (@code{\}), in bracket expressions
@@ -4625,8 +4626,7 @@ control characters, or space characters).
For example, before the POSIX standard, you had to write @code{/[A-Za-z0-9]/}
to match alphanumeric characters. If your
character set had other alphabetic characters in it, this would not
-match them, and if your character set collated differently from
-ASCII, this might not even match the ASCII alphanumeric characters.
+match them.
With the POSIX character classes, you can write
@code{/[[:alnum:]]/} to match the alphabetic
and numeric characters in your character set.
@@ -5105,94 +5105,6 @@ occur often in practice, but it's worth noting for future reference.
@c ENDOFRANGE regexpd
@c ENDOFRANGE regexp
-@node Locales
-@section Where You Are Makes A Difference
-@cindex locale, definition of
-
-Modern systems support the notion of @dfn{locales}: a way to tell
-the system about the local character set and language. The current
-locale setting can affect the way regexp matching works, often
-in surprising ways.
-
-For example, in the default @code{"C"} locale, @samp{[a-dx-z]} is equivalent to
-@samp{[abcdxyz]}. Many locales sort characters in dictionary order,
-and in these locales, @samp{[a-dx-z]} is typically not equivalent to
-@samp{[abcdxyz]}; instead it might be equivalent to @samp{[aBbCcdXxYyz]},
-for example.
-
-This point needs to be emphasized: Much literature teaches that you should
-use @samp{[a-z]} to match a lowercase character. But on systems with
-non-ASCII locales, this also matches all of the uppercase characters
-except @samp{Z}! This is a continuous cause of confusion, even well
-into the twenty-first century.
-
-@quotation NOTE
-In an attempt to end the confusion once and for all,
-when not in POSIX mode (@pxref{Options}),
-@command{gawk} expands ranges into the characters they
-include, based only on the machine character set.
-This restores the traditional, pre-POSIX, pre-locales
-behavior. However, you should read the rest of this section
-so that you can write portable scripts, instead of relying
-on behavior specific to @command{gawk}.
-@end quotation
-
-To obtain the traditional interpretation of bracket expressions, you can
-use the @code{"C"} locale by setting the @env{LC_ALL} environment variable to the
-value @samp{C}. However, it is best to just use POSIX character classes,
-such as @samp{[[:lower:]]} to match specific classes of characters.
-
-To demonstrate these issues, the following example uses the @code{sub()}
-function, which does text replacement (@pxref{String Functions}). Here,
-the intent is to remove trailing uppercase characters:
-
-@example
-$ @kbd{echo something1234abc | gawk --posix '@{ sub("[A-Z]*$", ""); print @}'}
-@print{} something1234a
-@end example
-
-@noindent
-This output is unexpected, since the @samp{bc} at the end of
-@samp{something1234abc} should not normally match @samp{[A-Z]*}.
-This result is due to the locale setting (and thus you may not see
-it on your system). There are two fixes. The first is to use the
-POSIX character class @samp{[[:upper:]]}, instead of @samp{[A-Z]}.
-(This is preferred, since then your program will work everywhere.)
-
-The second is to change the locale setting in the environment, before
-running @command{gawk}, by using the shell statements:
-
-@example
-LANG=C LC_ALL=C
-export LANG LC_ALL
-@end example
-
-The setting @samp{C} forces @command{gawk} to behave in the traditional
-Unix manner, where case distinctions do matter.
-You may wish to put these statements into your shell startup file,
-e.g., @file{$HOME/.profile}.
-
-Similar considerations apply to other ranges. For example,
-@samp{["-/]} is perfectly valid in ASCII, but is not valid in many
-Unicode locales, such as @samp{en_US.UTF-8}. (In general, such
-ranges should be avoided; either list the characters individually,
-or use a POSIX character class such as @samp{[[:punct:]]}.)
-
-An additional factor relates to splitting records.
-For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant.
-For other single-character record separators, using @samp{LC_ALL=C}
-will give you much better performance when reading records. Otherwise,
-@command{gawk} has to make several function calls, @emph{per input
-character}, to find the record terminator.
-
-According to POSIX, string comparison is also affected by locales
-(similar to regular expressions). The details are presented in
-@ref{POSIX String Comparison}.
-
-Finally, the locale affects the value of the decimal point character
-used when @command{gawk} parses input data. This is discussed in
-detail in @ref{Conversion}.
-
@node Reading Files
@chapter Reading Input Files
@@ -8773,6 +8685,7 @@ combinations of these with various operators.
* Truth Values and Conditions:: Testing for true and false.
* Function Calls:: A function call is an expression.
* Precedence:: How various operators nest.
+* Locales:: How the locale affects things.
@end menu
@node Values
@@ -10933,6 +10846,33 @@ For maximum portability, do not use them.
@end quotation
@c ENDOFRANGE prec
@c ENDOFRANGE oppr
+
+@node Locales
+@section Where You Are Makes A Difference
+@cindex locale, definition of
+
+Modern systems support the notion of @dfn{locales}: a way to tell
+the system about the local character set and language.
+
+Once upon a time, the locale setting used to affect regexp matching
+(@pxref{Ranges and Locales}), but this is no longer true.
+
+Locales can affect record splitting.
+For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant.
+For other single-character record separators, setting @samp{LC_ALL=C}
+in the environment
+will give you much better performance when reading records. Otherwise,
+@command{gawk} has to make several function calls, @emph{per input
+character}, to find the record terminator.
+
+According to POSIX, string comparison is also affected by locales
+(similar to regular expressions). The details are presented in
+@ref{POSIX String Comparison}.
+
+Finally, the locale affects the value of the decimal point character
+used when @command{gawk} parses input data. This is discussed in
+detail in @ref{Conversion}.
+
@c ENDOFRANGE exps
@node Patterns and Actions
@@ -26434,6 +26374,7 @@ of the @value{DOCUMENT} where you can find more information.
* POSIX/GNU:: The extensions in @command{gawk} not in POSIX
@command{awk}.
* Common Extensions:: Common Extensions Summary.
+* Ranges and Locales:: How locales used to affect regexp ranges.
* Contributors:: The major contributors to @command{gawk}.
@end menu
@@ -26977,6 +26918,103 @@ the three most widely-used freely available versions of @command{awk}
@item @code{BINMODE} variable @tab @tab X @tab X
@end multitable
+@node Ranges and Locales
+@appendixsec Regexp Ranges and Locales: A Long Sad Story
+
+This @value{SECTION} describes the confusing history of ranges within
+regular expressions and their interactions with locales, and how this
+affected different versions of @command{gawk}.
+
+The original Unix tools that worked with regular expressions defined
+character ranges (such as @samp{[a-z]}) to match any character between
+the first character in the range and the last character in the range,
+inclusive. Ordering was based on the numeric value of each character
+in the machine's native character set. Thus, on ASCII-based systems,
+@code{[a-z]} matched all the lowercase letters, and only the lowercase
+letters, since the numeric values for the letters from @samp{a} through
+@samp{z} were contiguous. (On an EBCDIC system, the range @samp{[a-z]}
+includes additional, non-alphabetic characters as well.)
+
+Almost all introductory Unix literature explained range expressions
+as working in this fashion, and in particular, would teach that the
+``correct'' way to match lowercase letters was with @samp{[a-z]}, and
+that @samp{[A-Z]} was the ``correct'' way to match uppercase letters.
+And indeed, this was true.
+
+The 1993 POSIX standard introduced the idea of locales (@pxref{Locales}).
+Since many locales include other letters besides the plain twenty-six
+letters of the American English alphabet, the POSIX standard added
+character classes (@pxref{Bracket Expressions}) as a way to match
+different kinds of characters besides the traditional ones in the ASCII
+character set.
+
+However, the standard @emph{changed} the interpretation of range expressions.
+In the @code{"C"} and @code{"POSIX"} locales, a range expression like
+@samp{[a-dx-z]} is still equivalent to @samp{[abcdxyz]}, as in ASCII.
+But outside those locales, the ordering was defined to be based on
+@dfn{collation order}.
+
+In many locales, @samp{A} and @samp{a} are both less than @samp{B}.
+In other words, these locales sort characters in dictionary order,
+and @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]};
+instead it might be equivalent to @samp{[aBbCcdXxYyz]}, for example.
+
+This point needs to be emphasized: Much literature teaches that you should
+use @samp{[a-z]} to match a lowercase character. But on systems with
+non-ASCII locales, this also matched all of the uppercase characters
+except @samp{Z}! This was a continuous cause of confusion, even well
+into the twenty-first century.
+
+To demonstrate these issues, the following example uses the @code{sub()}
+function, which does text replacement (@pxref{String Functions}). Here,
+the intent is to remove trailing uppercase characters:
+
+@example
+$ @kbd{echo something1234abc | gawk-3.1.8 '@{ sub("[A-Z]*$", ""); print @}'}
+@print{} something1234a
+@end example
+
+@noindent
+This output is unexpected, since the @samp{bc} at the end of
+@samp{something1234abc} should not normally match @samp{[A-Z]*}.
+This result is due to the locale setting (and thus you may not see
+it on your system).
+
+Similar considerations apply to other ranges. For example, @samp{["-/]}
+is perfectly valid in ASCII, but is not valid in many Unicode locales,
+such as @samp{en_US.UTF-8}.
+
+Early versions of @command{gawk} used regexp matching code that was not
+locale aware, so ranges had their traditional interpretation.
+
+When @command{gawk} switched to using locale-aware regexp matchers,
+the problems began; especially as both GNU/Linux and commercial Unix
+vendors started implementing non-ASCII locales, @emph{and making them
+the default}. Perhaps the most frequently asked question became something
+like ``why does @code{[A-Z]} match lowercase letters?!?''
+
+This situation existed for close to 10 years, if not more, and
+the @command{gawk} maintainer grew weary of trying to explain that
+@command{gawk} was being nicely standards-compliant, and that the issue
+was in the user's locale. During the development of version 4.0,
+he modified @command{gawk} to always treat ranges in the original,
+pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).
+
+Fortunately, shortly before the final release of @command{gawk} 4.0,
+the maintainer learned that the 2008 standard had changed the
+definition of ranges, such that outside the @code{"C"} and @code{"POSIX"}
+locales, the meaning of range expressions was
+@emph{undefined}.@footnote{See
+@uref{http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05, the standard}
+and
+@uref{http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05, its rationale}.}
+
+By using this lovely technical term, the standard gives license
+to implementors to implement ranges in whatever way they choose.
+The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all
+cases: the default regexp matching, with @option{--traditional}, and with
+@option{--posix}. In all cases, @command{gawk} remains POSIX compliant.
+
@node Contributors
@appendixsec Major Contributors to @command{gawk}
@cindex @command{gawk}, list of contributors to
diff --git a/re.c b/re.c
index 3dce1d52..2e1a37e7 100644
--- a/re.c
+++ b/re.c
@@ -382,13 +382,26 @@ resetup()
{
if (do_posix)
syn = RE_SYNTAX_POSIX_AWK; /* strict POSIX re's */
- else if (do_traditional) {
+ else if (do_traditional)
syn = RE_SYNTAX_AWK; /* traditional Unix awk re's */
- syn |= RE_RANGES_IGNORE_LOCALES;
- } else
+ else
syn = RE_SYNTAX_GNU_AWK; /* POSIX re's + GNU ops */
/*
+ * As of POSIX 1003.1-2008 (see rule 7 of
+ * http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
+ * and the rationale, at http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05)
+ * POSIX changed ranges outside the POSIX locale from requiring
+ * Collation Element Order to being "undefined". This gives an
+ * implementation, like gawk, the freedom to do ranges as it
+ * pleases.
+ *
+ * We very much please to always use numeric ordering, as
+ * the Good Lord intended.
+ */
+ syn |= RE_RANGES_IGNORE_LOCALES;
+
+ /*
* Interval expressions are now on by default, as POSIX is
* wide-spread enough that people want it. The do_intervals
* variable remains for use with --traditional.