Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 61 |
1 files changed, 35 insertions, 26 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index a809bd0d..ae0d728a 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -37,11 +37,13 @@
 @ifnotdocbook
 @set BULLET @bullet{}
 @set MINUS @minus{}
+@set NUL @sc{nul}
 @end ifnotdocbook
 
 @ifdocbook
 @set BULLET
 @set MINUS
+@set NUL NUL
 @end ifdocbook
 
 @set xref-automatic-section-title
@@ -5277,10 +5279,10 @@ with @samp{A}.
 
 @cindex POSIX @command{awk}, period (@code{.})@comma{} using
 In strict POSIX mode (@pxref{Options}),
-@samp{.} does not match the @sc{nul}
+@samp{.} does not match the @value{NUL}
 character, which is a character with all bits equal to zero.
-Otherwise, @sc{nul} is just another character. Other versions of @command{awk}
-may not be able to match the @sc{nul} character.
+Otherwise, @value{NUL} is just another character. Other versions of @command{awk}
+may not be able to match the @value{NUL} character.
 
 @cindex @code{[]} (square brackets), regexp operator
 @cindex square brackets (@code{[]}), regexp operator
@@ -6429,7 +6431,7 @@ a value that you know doesn't occur in the input file.
 This is hard to do in a general way, such that a program always works
 for arbitrary input files.
 
-You might think that for text files, the @sc{nul} character, which
+You might think that for text files, the @value{NUL} character, which
 consists of a character with all bits equal to zero, is a good
 value to use for @code{RS} in this case:
 
@@ -6438,23 +6440,23 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
 @end example
 
 @cindex differences in @command{awk} and @command{gawk}, strings, storing
-@command{gawk} in fact accepts this, and uses the @sc{nul}
+@command{gawk} in fact accepts this, and uses the @value{NUL}
 character for the record separator.
 This works for certain special files, such as @file{/proc/environ} on
-GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
+GNU/Linux systems, where the @value{NUL} character is in fact the record separator.
 However, this usage is @emph{not} portable
 to most other @command{awk} implementations.
 
 @cindex dark corner, strings, storing
 Almost all other @command{awk} implementations@footnote{At least that we know about.}
 store strings internally as C-style strings. C strings use the
-@sc{nul} character as the string terminator. In effect, this means that
+@value{NUL} character as the string terminator. In effect, this means that
 @samp{RS = "\0"} is the same as @samp{RS = ""}.
 @value{DARKCORNER}
 
-It happens that recent versions of @command{mawk} can use the @sc{nul}
+It happens that recent versions of @command{mawk} can use the @value{NUL}
 character as a record separator. However, this is a special case:
-@command{mawk} does not allow embedded @sc{nul} characters in strings.
+@command{mawk} does not allow embedded @value{NUL} characters in strings.
 
 @cindex records, treating files as
 @cindex treating files, as single records
@@ -6479,7 +6481,7 @@ a value that you know doesn't occur in the input file.
 This is hard to do in a general way, such that a program always works
 for arbitrary input files.
 
-You might think that for text files, the @sc{nul} character, which
+You might think that for text files, the @value{NUL} character, which
 consists of a character with all bits equal to zero, is a good
 value to use for @code{RS} in this case:
 
@@ -6488,23 +6490,23 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
 @end example
 
 @cindex differences in @command{awk} and @command{gawk}, strings, storing
-@command{gawk} in fact accepts this, and uses the @sc{nul}
+@command{gawk} in fact accepts this, and uses the @value{NUL}
 character for the record separator.
 This works for certain special files, such as @file{/proc/environ} on
-GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
+GNU/Linux systems, where the @value{NUL} character is in fact the record separator.
 However, this usage is @emph{not} portable
 to most other @command{awk} implementations.
 
 @cindex dark corner, strings, storing
 Almost all other @command{awk} implementations@footnote{At least that we know about.}
 store strings internally as C-style strings. C strings use the
-@sc{nul} character as the string terminator. In effect, this means that
+@value{NUL} character as the string terminator. In effect, this means that
 @samp{RS = "\0"} is the same as @samp{RS = ""}.
 @value{DARKCORNER}
 
-It happens that recent versions of @command{mawk} can use the @sc{nul}
+It happens that recent versions of @command{mawk} can use the @value{NUL}
 character as a record separator. However, this is a special case:
-@command{mawk} does not allow embedded @sc{nul} characters in strings.
+@command{mawk} does not allow embedded @value{NUL} characters in strings.
 
 @cindex records, treating files as
 @cindex treating files, as single records
@@ -10425,7 +10427,7 @@ double-quotation marks. For example:
 @cindex strings, length limitations
 represents the string whose contents are @samp{parrot}. Strings in
 @command{gawk} can be of any length, and they can contain any of the possible
-eight-bit ASCII characters including ASCII @sc{nul} (character code zero).
+eight-bit ASCII characters including ASCII @value{NUL} (character code zero).
 Other @command{awk} implementations may have difficulty with some character
 codes.
 
@@ -14128,9 +14130,8 @@ the beginning, in the following manner:
 
 @example
 NF != 4 @{
-err = sprintf("%s:%d: skipped: NF != 4\n", FILENAME, FNR)
-print err > "/dev/stderr"
-next
+printf("%s:%d: skipped: NF != 4\n", FILENAME, FNR) > "/dev/stderr"
+next
 @}
 @end example
 
@@ -20532,8 +20533,8 @@ function mystrtonum(str, ret, n, i, k, c)
 ret = 0
 for (i = 1; i <= n; i++) @{
 c = substr(str, i, 1)
-# index() returns 0 if c not in string,
-# includes c == "0"
+# index() returns 0 if c not in string,
+# includes c == "0"
 k = index("1234567", c)
 
 ret = ret * 8 + k
@@ -20546,8 +20547,8 @@ function mystrtonum(str, ret, n, i, k, c)
 for (i = 1; i <= n; i++) @{
 c = substr(str, i, 1)
 c = tolower(c)
-# index() returns 0 if c not in string,
-# includes c == "0"
+# index() returns 0 if c not in string,
+# includes c == "0"
 k = index("123456789abcdef", c)
 
 ret = ret * 16 + k
@@ -31300,7 +31301,7 @@ and is managed by @command{gawk} from then on.
 The API defines several simple @code{struct}s that map values as seen
 from @command{awk}. A value can be a @code{double}, a string, or an
 array (as in multidimensional arrays, or when creating a new array).
-String values maintain both pointer and length since embedded @sc{nul}
+String values maintain both pointer and length since embedded @value{NUL}
 characters are allowed.
 
 @quotation NOTE
@@ -31432,7 +31433,7 @@ Scalar values in @command{awk} are either numbers or strings. The
 indicates what is in the @code{union}.
 
 Representing numbers is easy---the API uses a C @code{double}. Strings
-require more work. Since @command{gawk} allows embedded @sc{nul} bytes
+require more work. Since @command{gawk} allows embedded @value{NUL} bytes
 in string values, a string must be represented as a pair containing a
 data-pointer and length. This is the @code{awk_string_t} type.
 
@@ -38678,7 +38679,7 @@ the derived files, because that keeps the repository less cluttered,
 and it is easier to see the substantive changes when comparing versions
 and trying to understand what changed between commits.
 
-However, there are two reasons why the @command{gawk} maintainer
+However, there are several reasons why the @command{gawk} maintainer
 likes to have everything in the repository.
 
 First, because it is then easy to reproduce any given version completely,
@@ -38747,6 +38748,14 @@ the maintainer is no different than Jane User who wants to try to build
 Thus, the maintainer thinks that it's not just important, but critical,
 that for any given branch, the above incantation @emph{just works}.
 
+@c Added 9/2014:
+A third reason to have all the files is that without them, using @samp{git
+bisect} to try to find the commit that introduced a bug is exceedingly
+difficult. The maintainer tried to do that on another project that
+requires running bootstrapping scripts just to create @command{configure}
+and so on; it was really painful. When the repository is self-contained,
+using @command{git bisect} in it is very easy.
+
 @c So - that's my reasoning and philosophy.
 
 What are some of the consequences and/or actions to take?