Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 78 |
1 files changed, 43 insertions, 35 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 19f5d3d5..7f3c3d91 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -4327,23 +4327,23 @@ may not be able to match the @sc{nul} character.
 @cindex @code{[]} (square brackets)
 @cindex square brackets (@code{[]})
-@cindex character lists
-@cindex character sets, See Also character lists
-@cindex bracket expressions, See character lists
+@cindex bracket expressions
+@cindex character sets, See Also bracket expressions
+@cindex character lists, See bracket expressions
 @item [@dots{}]
-This is called a @dfn{character list}.@footnote{In other literature,
-you may see a character list referred to as either a
-@dfn{character set}, a @dfn{character class}, or a @dfn{bracket expression}.}
+This is called a @dfn{bracket expression}.@footnote{In other literature,
+you may see a bracket expression referred to as either a
+@dfn{character set}, a @dfn{character class}, or a @dfn{character list}.}
 It matches any @emph{one} of the characters that are enclosed in
 the square brackets. For example, @samp{[MVX]} matches any one of
 the characters @samp{M}, @samp{V}, or @samp{X} in a string. A full
-discussion of what can be inside the square brackets of a character list
+discussion of what can be inside the square brackets of a bracket expression
 is given in
 @ref{Character Lists}.
-@cindex character lists, complemented
+@cindex bracket expressions, complemented
 @item [^ @dots{}]
-This is a @dfn{complemented character list}. The first character after
+This is a @dfn{complemented bracket expression}. The first character after
 the @samp{[} @emph{must} be a @samp{^}. It matches any characters
 @emph{except} those in the square brackets. For example, @samp{[^awk]}
 matches any character that is not an @samp{a}, @samp{w},
@@ -4483,11 +4483,11 @@ regular expressions.
 @node Character Lists
 @section Using Character Lists
 @c STARTOFRANGE charlist
-@cindex character lists
-@cindex character lists, range expressions
+@cindex bracket expressions
+@cindex bracket expressions, range expressions
 @cindex range expressions (regexps)
-Within a character list, a @dfn{range expression} consists of two
+Within a bracket expression, a @dfn{range expression} consists of two
 characters separated by a hyphen. It matches any single character that
 sorts between the two characters, using the locale's collating sequence
 and character set.
@@ -4497,14 +4497,14 @@ Unfortunately, providing simple character ranges such as @samp{[a-z]}
 usually does not work like you might expect, due to locale-related issues.
 This is discussed more fully, in @ref{Locales}.
-@cindex @code{\} (backslash), in character lists
-@cindex backslash (@code{\}), in character lists
-@cindex @code{^} (caret), in character lists
-@cindex caret (@code{^}), in character lists
-@cindex @code{-} (hyphen), in character lists
-@cindex hyphen (@code{-}), in character lists
+@cindex @code{\} (backslash), in bracket expressions
+@cindex backslash (@code{\}), in bracket expressions
+@cindex @code{^} (caret), in bracket expressions
+@cindex caret (@code{^}), in bracket expressions
+@cindex @code{-} (hyphen), in bracket expressions
+@cindex hyphen (@code{-}), in bracket expressions
 To include one of the characters @samp{\}, @samp{]}, @samp{-}, or @samp{^} in a
-character list, put a @samp{\} in front of it. For example:
+bracket expression, put a @samp{\} in front of it. For example:
 @example
 [d\]]
@@ -4513,11 +4513,11 @@ character list, put a @samp{\} in front of it. For example:
 @noindent
 matches either @samp{d} or @samp{]}.
-@cindex POSIX @command{awk}, character lists and
+@cindex POSIX @command{awk}, bracket expressions and
 @cindex Extended Regular Expressions (EREs)
 @cindex EREs (Extended Regular Expressions)
 @cindex @command{egrep} utility
-This treatment of @samp{\} in character lists
+This treatment of @samp{\} in bracket expressions
 is compatible with other @command{awk}
 implementations and is also mandated by POSIX.
 The regular expressions in @command{awk} are a superset
@@ -4525,8 +4525,8 @@ of the POSIX specification for Extended Regular Expressions (EREs).
 POSIX EREs are based on the regular expressions accepted by the
 traditional @command{egrep} utility.
-@cindex character lists, character classes
-@cindex POSIX @command{awk}, character lists and, character classes
+@cindex bracket expressions, character classes
+@cindex POSIX @command{awk}, bracket expressions and, character classes
 @dfn{Character classes} are a feature introduced in the POSIX standard.
 A character class is a special notation for describing
 lists of characters that have a specific attribute, but the
@@ -4535,7 +4535,7 @@ from character set to character set. For example, the notion of what is an
 alphabetic character differs between the United States and France.
 A character class is only valid in a regexp @emph{inside} the
-brackets of a character list. Character classes consist of @samp{[:},
+brackets of a bracket expression. Character classes consist of @samp{[:},
 a keyword denoting the class, and @samp{:]}.
 @ref{table-char-classes} lists the character classes defined by the
 POSIX standard.
@@ -4570,10 +4570,10 @@ With the POSIX character classes, you can write
 @code{/[[:alnum:]]/} to match the alphabetic
 and numeric characters in your character set.
-@cindex character lists, collating elements
-@cindex character lists, non-ASCII
+@cindex bracket expressions, collating elements
+@cindex bracket expressions, non-ASCII
 @cindex collating elements
-Two additional special sequences can appear in character lists.
+Two additional special sequences can appear in bracket expressions.
 These apply to non-ASCII character sets, which can have single symbols
 (called @dfn{collating elements}) that are represented with more than one
 character. They can also have several characters that are equivalent for
@@ -4582,7 +4582,7 @@ and a grave-accented ``@`e'' are equivalent.)
 These sequences are:
 @table @asis
-@cindex character lists, collating symbols
+@cindex bracket expressions, collating symbols
 @cindex collating symbols
 @item Collating symbols
 Multicharacter collating elements enclosed between
@@ -4590,7 +4590,7 @@ Multicharacter collating elements enclosed between
 then @code{[[.ch.]]} is a regexp that matches this collating element, whereas
 @code{[ch]} is a regexp that matches either @samp{c} or @samp{h}.
-@cindex character lists, equivalence classes
+@cindex bracket expressions, equivalence classes
 @item Equivalence classes
 Locale-specific names for a list of
 characters that are equal. The name is enclosed between
@@ -4604,7 +4604,7 @@ These features are very valuable in non-English-speaking locales.
 @cindex internationalization, localization, character classes
 @cindex @command{gawk}, character classes and
-@cindex POSIX @command{awk}, character lists and, character classes
+@cindex POSIX @command{awk}, bracket expressions and, character classes
 @quotation CAUTION
 The library functions that @command{gawk} uses for regular
 expression matching currently recognize only POSIX character classes;
@@ -4856,7 +4856,7 @@ that it is possible, using something like
 and
 @samp{IGNORECASE = 0 || /foobar/ @{ @dots{} @}}.
 However, this is somewhat obscure and we don't recommend it.}
-To do this, use either character lists or @code{tolower()}. However, one
+To do this, use either bracket expressions or @code{tolower()}. However, one
 thing you can do with @code{IGNORECASE} only is dynamically turn
 case-sensitivity on or off for all the rules at once.
@@ -5017,7 +5017,7 @@ intend a regexp match.
 @cindex newlines, in dynamic regexps
 Some commercial versions of @command{awk} do not allow the newline
-character to be used inside a character list for a dynamic regexp:
+character to be used inside a bracket expression for a dynamic regexp:
 @example
 $ @kbd{awk '$0 ~ "[ \t\n]"'}
@@ -5122,7 +5122,7 @@ For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant.
 For other single-character record separators, using @samp{LC_ALL=C}
 will give you much better performance when reading records. Otherwise,
 @command{gawk} has to make several function calls, @emph{per input
-character} to find the record terminator.
+character}, to find the record terminator.
 Finally, the locale affects the value of the decimal point character
 used when @command{gawk} parses input data. This is discussed in
@@ -8657,6 +8657,8 @@ returns zero on success and non-zero otherwise.
 In general, different implementations vary in what they report
 when closing pipes; thus the return value cannot be used portably.
 @value{DARKCORNER}
+In POSIX mode (@pxref{Options}), @command{gawk} just returns zero
+when closing a pipe.
 @c ENDOFRANGE ifc
 @c ENDOFRANGE ofc
@@ -21643,7 +21645,7 @@ The @code{set_charlist()} function is more complicated than
 @code{set_fieldlist()}.
 The idea here is to use @command{gawk}'s @code{FIELDWIDTHS} variable
 (@pxref{Constant Size}),
-which describes constant-width input. When using a character list, that is
+which describes constant-width input. When using a bracket expression, that is
 exactly what we have.
 Setting up @code{FIELDWIDTHS} is more complicated than simply listing the
@@ -21671,7 +21673,7 @@ function set_charlist( field, i, j, f, g, t,
         if (index(f[i], "-") != 0) @{ # range
             m = split(f[i], g, "-")
             if (m != 2 || g[1] >= g[2]) @{
-                printf("bad character list: %s\n",
+                printf("bad bracket expression: %s\n",
                        f[i]) > "/dev/stderr"
                 exit 1
             @}
@@ -22189,6 +22191,8 @@ arguments and perform in the same way.
 @node Split Program
 @subsection Splitting a Large File into Pieces
+@c FIXME: One day, update to current POSIX version of split
+
 @c STARTOFRANGE filspl
 @cindex files, splitting
 @cindex @code{split} utility
@@ -22458,6 +22462,8 @@ END \
 @node Uniq Program
 @subsection Printing Nonduplicated Lines of Text
+@c FIXME: One day, update to current POSIX version of uniq
+
 @c STARTOFRANGE prunt
 @cindex printing, unduplicated lines of text
 @c STARTOFRANGE tpul
@@ -22717,6 +22723,8 @@ END @{
 @node Wc Program
 @subsection Counting Things
+@c FIXME: One day, update to current POSIX version of wc
+
 @c STARTOFRANGE count
 @cindex counting
 @c STARTOFRANGE infco