Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi | 376
1 files changed, 170 insertions, 206 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index bec760b1..3b9a1bdd 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -6,6 +6,7 @@ TODO:
Document common extensions with COMMONEXT marking & index entry.
Pick a reasonable name for BWK awk and use it everywhere (search
for Bell Laboratories)
+ Review use of "Modern xxx systems..."
DONE:
@end ignore
@c %**start of header (This is for running Texinfo on a region.)
@@ -2947,7 +2948,7 @@ last value that counts.
@cindex POSIX @command{awk}, GNU long options and
Each long option for @command{gawk} has a corresponding
-POSIX-style option.
+POSIX-style short option.
The long and short options are
interchangeable in all contexts.
The following list describes options mandated by the POSIX standard:
@@ -18713,7 +18714,7 @@ a specific function). There is no intermediate state analogous to
@cindex variables, private
Library functions often need to have global variables that they can use to
preserve state information between calls to the function---for example,
-@code{getopt}'s variable @code{_opti}
+@code{getopt()}'s variable @code{_opti}
(@pxref{Getopt Function}).
Such variables are called @dfn{private}, since the only functions that need to
use them are the ones in the library.
@@ -18748,7 +18749,7 @@ provide some basis for this discussion.}
As a final note on variable naming, if a function makes global variables
available for use by a main program, it is a good convention to start that
variable's name with a capital letter---for
-example, @code{getopt}'s @code{Opterr} and @code{Optind} variables
+example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables
(@pxref{Getopt Function}).
The leading capital letter indicates that it is global, while the fact that
the variable name is not all capital letters indicates that the variable is
@@ -19698,13 +19699,23 @@ how it simplifies writing the main program.
@c fakenode --- for prepinfo
@subheading Advanced Notes: So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?
-@strong{FIXME:} Write this section.
+You are probably wondering, if @code{beginfile()} and @code{endfile()}
+functions can do the job, why does @command{gawk} have
+@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
+
+Good question. Normally, if @command{awk} cannot open a file, this
+causes an immediate fatal error. In this case, there is no way for a
+user-defined function to deal with the problem, since the mechanism for
+calling it relies on the file being open and at the first record. Thus,
+the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
+files that cannot be processed. @code{ENDFILE} exists for symmetry,
+and because it provides an easy way to do per-file clean-up processing.
@node Rewind Function
@subsection Rereading the Current File
@cindex files, reading
-Another request for a new built-in function was for a @code{rewind}
+Another request for a new built-in function was for a @code{rewind()}
function that would make it possible to reread the current file.
The requesting user didn't want to have to use @code{getline}
(@pxref{Getline})
@@ -19713,9 +19724,9 @@ inside a loop.
However, as long as you are not in the @code{END} rule, it is
quite easy to arrange to immediately close the current input file
and then start over with it from the top.
-For lack of a better name, we'll call it @code{rewind}:
+For lack of a better name, we'll call it @code{rewind()}:
-@cindex @code{rewind} user-defined function
+@cindex @code{rewind()} user-defined function
@example
@c file eg/lib/rewind.awk
# rewind.awk --- rewind the current file and start over
@@ -19725,10 +19736,10 @@ For lack of a better name, we'll call it @code{rewind}:
#
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# September 2000
-
@c endfile
@end ignore
@c file eg/lib/rewind.awk
+
function rewind( i)
@{
# shift remaining arguments up
@@ -19761,7 +19772,7 @@ the previous @value{SECTION}
to either update @code{ARGIND} on your own
or modify this code as appropriate.
-The @code{rewind} function also relies on the @code{nextfile} keyword
+The @code{rewind()} function also relies on the @code{nextfile} keyword
(@pxref{Nextfile Statement}).
@xref{Nextfile Function},
for a function version of @code{nextfile}.
@@ -19788,14 +19799,15 @@ program:
#
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# October 2000
-
+# December 2010
@c endfile
@end ignore
@c file eg/lib/readable.awk
+
BEGIN @{
for (i = 1; i < ARGC; i++) @{
- if (ARGV[i] ~ /^[A-Za-z_][A-Za-z0-9_]*=.*/ \
- || ARGV[i] == "-")
+ if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
+ || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
continue # assignment or standard input
else if ((getline junk < ARGV[i]) < 0) # unreadable
delete ARGV[i]
@@ -19810,8 +19822,7 @@ BEGIN @{
This works, because the @code{getline} won't be fatal.
Removing the element from @code{ARGV} with @code{delete}
skips the file (since it's no longer in the list).
-
-@c This doesn't handle /dev/stdin etc. Not worth the hassle to mention or fix.
+See also @ref{ARGC and ARGV}.
@node Empty Files
@subsection Checking For Zero-length Files
@@ -19828,7 +19839,7 @@ Using @command{gawk}'s @code{ARGIND} variable
(@pxref{Built-in Variables}), it is possible to detect when an empty
@value{DF} has been skipped. Similar to the library file presented
in @ref{Filetrans Function}, the following library file calls a function named
-@code{zerofile} that the user must provide. The arguments passed are
+@code{zerofile()} that the user must provide. The arguments passed are
the @value{FN} and the position in @code{ARGV} where it was found:
@cindex @code{zerofile.awk} program
@@ -19841,10 +19852,10 @@ the @value{FN} and the position in @code{ARGV} where it was found:
#
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# June 2003
-
@c endfile
@end ignore
@c file eg/lib/zerofile.awk
+
BEGIN @{ Argind = 0 @}
ARGIND > Argind + 1 @{
@@ -19865,7 +19876,7 @@ END @{
The user-level variable @code{Argind} allows the @command{awk} program
to track its progress through @code{ARGV}. Whenever the program detects
that @code{ARGIND} is greater than @samp{Argind + 1}, it means that one or
-more empty files were skipped. The action then calls @code{zerofile} for
+more empty files were skipped. The action then calls @code{zerofile()} for
each such file, incrementing @code{Argind} along the way.
The @samp{Argind != ARGIND} rule simply keeps @code{Argind} up to date
@@ -19874,7 +19885,7 @@ in the normal case.
Finally, the @code{END} rule catches the case of any empty files at
the end of the command-line arguments. Note that the test in the
condition of the @code{for} loop uses the @samp{<=} operator,
-not @code{<}.
+not @samp{<}.
As an exercise, you might consider whether this same problem can
be solved without relying on @command{gawk}'s @code{ARGIND} variable.
@@ -19884,6 +19895,7 @@ an intervening value in @code{ARGV} is a variable assignment.
@ignore
# zerofile2.awk --- same thing, portably
+
BEGIN @{
ARGIND = Argind = 0
for (i = 1; i < ARGC; i++)
@@ -19923,7 +19935,7 @@ END @{
Occasionally, you might not want @command{awk} to process command-line
variable assignments
(@pxref{Assignment Options}).
-In particular, if you have @value{FN}s that contain an @samp{=} character,
+In particular, if you have a @value{FN} that contains an @samp{=} character,
@command{awk} treats the @value{FN} as an assignment, and does not process it.
Some users have suggested an additional command-line option for @command{gawk}
@@ -19941,14 +19953,14 @@ a library file does the trick:
#
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# October 1999
-
@c endfile
@end ignore
@c file eg/lib/noassign.awk
+
function disable_assigns(argc, argv, i)
@{
for (i = 1; i < argc; i++)
- if (argv[i] ~ /^[A-Za-z_][A-Za-z_0-9]*=.*/)
+ if (argv[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/)
argv[i] = ("./" argv[i])
@}
@@ -19992,7 +20004,7 @@ are left alone.
@c STARTOFRANGE clibf
@cindex functions, library, C library
@cindex arguments, processing
-Most utilities on POSIX compatible systems take options, or ``switches,'' on
+Most utilities on POSIX-compatible systems take options on
the command line that can be used to change the way a program behaves.
@command{awk} is an example of such a program
(@pxref{Options}).
@@ -20002,20 +20014,20 @@ correctly obey the command-line option. For example, @command{awk}'s
The first occurrence on the command line of either @option{--} or a
string that does not begin with @samp{-} ends the options.
-@cindex @code{getopt} function (C library)
-Modern Unix systems provide a C function named @code{getopt} for processing
+@cindex @code{getopt()} function (C library)
+Modern Unix systems provide a C function named @code{getopt()} for processing
command-line arguments. The programmer provides a string describing the
one-letter options. If an option requires an argument, it is followed in the
-string with a colon. @code{getopt} is also passed the
+string with a colon. @code{getopt()} is also passed the
count and values of the command-line arguments and is called in a loop.
-@code{getopt} processes the command-line arguments for option letters.
+@code{getopt()} processes the command-line arguments for option letters.
Each time around the loop, it returns a single character representing the
next option letter that it finds, or @samp{?} if it finds an invalid option.
When it returns @minus{}1, there are no options left on the command line.
-When using @code{getopt}, options that do not take arguments can be
+When using @code{getopt()}, options that do not take arguments can be
grouped together. Furthermore, options that take arguments require that the
-argument is present. The argument can immediately follow the option letter,
+argument be present. The argument can immediately follow the option letter,
or it can be a separate command-line argument.
Given a hypothetical program that takes
@@ -20035,7 +20047,7 @@ In this example, @option{-acbfoo} indicates that all of the
@option{-a}, @option{-b}, and @option{-c} options were supplied,
and that @samp{foo} is the argument to the @option{-b} option.
-@code{getopt} provides four external variables that the programmer can use:
+@code{getopt()} provides four external variables that the programmer can use:
@table @code
@item optind
@@ -20046,7 +20058,7 @@ nonoption command-line argument can be found.
The string value of the argument to an option.
@item opterr
-Usually @code{getopt} prints an error message when it finds an invalid
+Usually @code{getopt()} prints an error message when it finds an invalid
option. Setting @code{opterr} to zero disables this feature. (An
application might want to print its own error message.)
@@ -20055,7 +20067,7 @@ The letter representing the command-line option.
@c While not usually documented, most versions supply this variable.
@end table
-The following C fragment shows how @code{getopt} might process command-line
+The following C fragment shows how @code{getopt()} might process command-line
arguments for @command{awk}:
@example
@@ -20093,9 +20105,9 @@ As a side point, @command{gawk} actually uses the GNU @code{getopt_long}
function to process both normal and GNU-style long options
(@pxref{Options}).
-The abstraction provided by @code{getopt} is very useful and is quite
+The abstraction provided by @code{getopt()} is very useful and is quite
handy in @command{awk} programs as well. Following is an @command{awk}
-version of @code{getopt}. This function highlights one of the
+version of @code{getopt()}. This function highlights one of the
greatest weaknesses in @command{awk}, which is that it is very poor at
manipulating single characters. Repeated calls to @code{substr()} are
necessary for accessing individual characters
@@ -20106,10 +20118,10 @@ We have left it alone, since using @code{substr()} is more portable.}
The discussion that follows walks through the code a bit at a time:
-@cindex @code{getopt} user-defined function
+@cindex @code{getopt()} user-defined function
@example
@c file eg/lib/getopt.awk
-# getopt.awk --- do C library getopt(3) function in awk
+# getopt.awk --- Do C library getopt(3) function in awk
@c endfile
@ignore
@c file eg/lib/getopt.awk
@@ -20118,10 +20130,10 @@ The discussion that follows walks through the code a bit at a time:
#
# Initial version: March, 1991
# Revised: May, 1993
-
@c endfile
@end ignore
@c file eg/lib/getopt.awk
+
# External variables:
# Optind -- index in ARGV of first nonoption argument
# Optarg -- string value of argument to current option
@@ -20130,7 +20142,7 @@ The discussion that follows walks through the code a bit at a time:
# Returns:
# -1 at end of options
-# ? for unrecognized option
+# "?" for unrecognized option
# <c> a character representing the current option
# Private Data:
@@ -20144,11 +20156,11 @@ what the return values are, what they mean, and any global variables that
are ``private'' to this library function. Such documentation is essential
for any program, and particularly for library functions.
-The @code{getopt} function first checks that it was indeed called with a string of options
-(the @code{options} parameter). If @code{options} has a zero length,
-@code{getopt} immediately returns @minus{}1:
+The @code{getopt()} function first checks that it was indeed called with
+a string of options (the @code{options} parameter). If @code{options}
+has a zero length, @code{getopt()} immediately returns @minus{}1:
-@cindex @code{getopt} user-defined function
+@cindex @code{getopt()} user-defined function
@example
@c file eg/lib/getopt.awk
function getopt(argc, argv, options, thisopt, i)
@@ -20173,13 +20185,13 @@ The next thing to check for is the end of the options. A @option{--}
ends the command-line options, as does any command-line argument that
does not begin with a @samp{-}. @code{Optind} is used to step through
the array of command-line arguments; it retains its value across calls
-to @code{getopt}, because it is a global variable.
+to @code{getopt()}, because it is a global variable.
The regular expression that is used, @code{@w{/^-[^: \t\n\f\r\v\b]/}}, is
perhaps a bit of overkill; it checks for a @samp{-} followed by anything
that is not whitespace and not a colon.
If the current command-line argument does not match this pattern,
-it is not an option, and it ends option processing:
+it is not an option, and it ends option processing. Continuing on:
@example
@c file eg/lib/getopt.awk
@@ -20214,9 +20226,9 @@ obtained with @code{substr()}. It is saved in @code{Optopt} for the main
program to use.
If @code{thisopt} is not in the @code{options} string, then it is an
-invalid option. If @code{Opterr} is nonzero, @code{getopt} prints an error
+invalid option. If @code{Opterr} is nonzero, @code{getopt()} prints an error
message on the standard error that is similar to the message from the C
-version of @code{getopt}.
+version of @code{getopt()}.
Because the option is invalid, it is necessary to skip it and move on to the
next option character. If @code{_opti} is greater than or equal to the
@@ -20225,7 +20237,7 @@ to the next argument, so @code{Optind} is incremented and @code{_opti} is reset
to zero. Otherwise, @code{Optind} is left alone and @code{_opti} is merely
incremented.
-In any case, because the option is invalid, @code{getopt} returns @samp{?}.
+In any case, because the option is invalid, @code{getopt()} returns @code{"?"}.
The main program can examine @code{Optopt} if it needs to know what the
invalid option letter actually is. Continuing on:
@@ -20268,10 +20280,10 @@ current command-line argument, it means this element in @code{argv} is
through being processed, so @code{Optind} is incremented to point to the
next element in @code{argv}. If neither condition is true, then only
@code{_opti} is incremented, so that the next option letter can be processed
-on the next call to @code{getopt}.
+on the next call to @code{getopt()}.
The @code{BEGIN} rule initializes both @code{Opterr} and @code{Optind} to one.
-@code{Opterr} is set to one, since the default behavior is for @code{getopt}
+@code{Opterr} is set to one, since the default behavior is for @code{getopt()}
to print a diagnostic message upon seeing an invalid option. @code{Optind}
is set to one, since there's no reason to look at the program name, which is
in @code{ARGV[0]}:
@@ -20300,7 +20312,7 @@ The rest of the @code{BEGIN} rule is a simple test program. Here is the
result of two sample runs of the test program:
@example
-$ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
+$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x}
@print{} c = <a>, optarg = <>
@print{} c = <c>, optarg = <>
@print{} c = <b>, optarg = <ARG>
@@ -20308,7 +20320,7 @@ $ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
@print{} ARGV[3] = <bax>
@print{} ARGV[4] = <-x>
-$ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
+$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc}
@print{} c = <a>, optarg = <>
@error{} x -- invalid option
@print{} c = <?>, optarg = <>
@@ -20322,7 +20334,7 @@ the first @option{--} terminates the arguments to @command{awk}, so that it does
not try to interpret the @option{-a}, etc., as its own options.
@quotation NOTE
-After @code{getopt} is through, it is the responsibility of the user level
+After @code{getopt()} is through, it is the responsibility of the user level
code to
clear out all the elements of @code{ARGV} from 1 to @code{Optind},
so that @command{awk} does not try to process the command-line options
@@ -20331,7 +20343,7 @@ as @value{FN}s.
Several of the sample programs presented in
@ref{Sample Programs},
-use @code{getopt} to process their arguments.
+use @code{getopt()} to process their arguments.
@c ENDOFRANGE libfclo
@c ENDOFRANGE flibclo
@c ENDOFRANGE clop
@@ -20360,8 +20372,8 @@ user information associated with the user and group ID numbers. This
user database. @xref{Group Functions},
for a similar suite that retrieves information from the group database.
-@cindex @code{getpwent} function (C library)
-@cindex @code{getpwent} user-defined function
+@cindex @code{getpwent()} function (C library)
+@cindex @code{getpwent()} user-defined function
@cindex users, information about, retrieving
@cindex login information
@cindex account information
@@ -20370,7 +20382,7 @@ for a similar suite that retrieves information from the group database.
The POSIX standard does not define the file where user information is
kept. Instead, it provides the @code{<pwd.h>} header file
and several C language subroutines for obtaining user information.
-The primary function is @code{getpwent}, for ``get password entry.''
+The primary function is @code{getpwent()}, for ``get password entry.''
The ``password'' comes from the original user database file,
@file{/etc/passwd}, which stores user information, along with the
encrypted passwords (hence the name).
@@ -20381,11 +20393,11 @@ directly, this file may not contain complete information about the
system's set of users.@footnote{It is often the case that password
information is stored in a network database.} To be sure you are able to
produce a readable and complete version of the user database, it is necessary
-to write a small C program that calls @code{getpwent}. @code{getpwent}
+to write a small C program that calls @code{getpwent()}. @code{getpwent()}
is defined as returning a pointer to a @code{struct passwd}. Each time it
is called, it returns the next entry in the database. When there are
no more entries, it returns @code{NULL}, the null pointer. When this
-happens, the C program should call @code{endpwent} to close the database.
+happens, the C program should call @code{endpwent()} to close the database.
Following is @command{pwcat}, a C program that ``cats'' the password database:
@c Use old style function header for portability to old systems (SunOS, HP/UX).
@@ -20403,6 +20415,7 @@ Following is @command{pwcat}, a C program that ``cats'' the password database:
/*
* Arnold Robbins, arnold@@skeeve.com, May 1993
* Public Domain
+ * December 2010, move to ANSI C definition for main().
*/
#if HAVE_CONFIG_H
@@ -20426,9 +20439,7 @@ Following is @command{pwcat}, a C program that ``cats'' the password database:
@end ignore
@c file eg/lib/pwcat.c
int
-main(argc, argv)
-int argc;
-char **argv;
+main(int argc, char **argv)
@{
struct passwd *p;
@@ -20465,7 +20476,6 @@ If you don't understand C, don't worry about it.
The output from @command{pwcat} is the user database, in the traditional
@file{/etc/passwd} format of colon-separated fields. The fields are:
-@ignore
@table @asis
@item Login name
The user's login name.
@@ -20475,12 +20485,12 @@ The user's encrypted password. This may not be available on some systems.
@item User-ID
The user's numeric user ID number.
-(On some systems it's a C @code{long}, and not an @code{int()}. Thus
+(On some systems it's a C @code{long}, and not an @code{int}. Thus
we cast it to @code{long} for all cases.)
@item Group-ID
The user's numeric group ID number.
-(Similar comments about @code{long} vs.@: @code{int()} apply here.)
+(Similar comments about @code{long} vs.@: @code{int} apply here.)
@item Full name
The user's full name, and perhaps other information associated with the
@@ -20494,26 +20504,6 @@ The user's login (or ``home'') directory (familiar to shell programmers as
The program that is run when the user logs in. This is usually a
shell, such as Bash.
@end table
-@end ignore
-
-@multitable {Encrypted password} {1234567890123456789012345678901234567890123456}
-@item Login name @tab The user's login name.
-
-@item Encrypted password @tab The user's encrypted password. This may not be available on some systems.
-
-@item User-ID @tab The user's numeric user ID number.
-
-@item Group-ID @tab The user's numeric group ID number.
-
-@item Full name @tab The user's full name, and perhaps other information associated with the
-user.
-
-@item Home directory @tab The user's login (or ``home'') directory (familiar to shell programmers as
-@code{$HOME}).
-
-@item Login shell @tab The program that is run when the user logs in. This is usually a
-shell, such as Bash.
-@end multitable
A few lines representative of @command{pwcat}'s output are as follows:
@@ -20521,7 +20511,7 @@ A few lines representative of @command{pwcat}'s output are as follows:
@cindex Robbins, Arnold
@cindex Robbins, Miriam
@example
-$ pwcat
+$ @kbd{pwcat}
@print{} root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
@print{} nobody:*:65534:65534::/:
@print{} daemon:*:1:1::/:
@@ -20537,10 +20527,7 @@ With that introduction, following is a group of functions for getting user
information. There are several functions here, corresponding to the C
functions of the same names:
-@c Exercise: simplify all these functions that return values.
-@c Answer: return foo[key] returns "" if key not there, no need to check with `in'.
-
-@cindex @code{_pw_init} user-defined function
+@cindex @code{_pw_init()} user-defined function
@example
@c file eg/lib/passwdawk.in
# passwd.awk --- access password file information
@@ -20551,16 +20538,17 @@ functions of the same names:
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# May 1993
# Revised October 2000
-
+# Revised December 2010
@c endfile
@end ignore
@c file eg/lib/passwdawk.in
+
BEGIN @{
# tailor this to suit your system
_pw_awklib = "/usr/local/libexec/awk/"
@}
-function _pw_init( oldfs, oldrs, olddol0, pwcat, using_fw)
+function _pw_init( oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
@{
if (_pw_inited)
return
@@ -20582,11 +20570,12 @@ function _pw_init( oldfs, oldrs, olddol0, pwcat, using_fw)
close(pwcat)
_pw_count = 0
_pw_inited = 1
- FS = oldfs
if (using_fw)
FIELDWIDTHS = FIELDWIDTHS
else if (using_fpat)
FPAT = FPAT
+ else
+ FS = oldfs
RS = oldrs
$0 = olddol0
@}
@@ -20599,14 +20588,14 @@ The @code{BEGIN} rule sets a private variable to the directory where
routine, we have chosen to put it in @file{/usr/local/libexec/awk};
however, you might want it to be in a different directory on your system.
-The function @code{_pw_init} keeps three copies of the user information
+The function @code{_pw_init()} keeps three copies of the user information
in three associative arrays. The arrays are indexed by username
(@code{_pw_byname}), by user ID number (@code{_pw_byuid}), and by order of
occurrence (@code{_pw_bycount}).
-The variable @code{_pw_inited} is used for efficiency; @code{_pw_init}
+The variable @code{_pw_inited} is used for efficiency; @code{_pw_init()}
needs only to be called once.
-@cindex @code{getline} command, @code{_pw_init} function
+@cindex @code{getline} command, @code{_pw_init()} function
Because this function uses @code{getline} to read information from
@command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}.
It notes in the variable @code{using_fw} whether field splitting
@@ -20620,66 +20609,62 @@ The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which
is @code{"FIELDWIDTHS"} if field splitting is being done with
@code{FIELDWIDTHS}. This makes it possible to restore the correct
field-splitting mechanism later. The test can only be true for
-@command{gawk}. It is false if using @code{FS} or on some other
-@command{awk} implementation.
+@command{gawk}. It is false if using @code{FS} or @code{FPAT},
+or on some other @command{awk} implementation.
-The code that checks for using @code{FPAT} is similar.
+The code that checks for using @code{FPAT}, using @code{using_fpat}
+and @code{PROCINFO["FS"]}, is similar.
The main part of the function uses a loop to read database lines, split
the line into fields, and then store the line into each array as necessary.
-When the loop is done, @code{@w{_pw_init}} cleans up by closing the pipeline,
+When the loop is done, @code{@w{_pw_init()}} cleans up by closing the pipeline,
setting @code{@w{_pw_inited}} to one, and restoring @code{FS}
(and @code{FIELDWIDTHS} or @code{FPAT}
if necessary), @code{RS}, and @code{$0}.
The use of @code{@w{_pw_count}} is explained shortly.
-@strong{FIXME: NEXT ED:} All of these functions don't need the ... in ... test. Just
-return the array element, which will be "" if not already there. Duh.
-@cindex @code{getpwnam} function (C library)
-The @code{getpwnam} function takes a username as a string argument. If that
+@cindex @code{getpwnam()} function (C library)
+The @code{getpwnam()} function takes a username as a string argument. If that
user is in the database, it returns the appropriate line. Otherwise, it
-returns the null string:
+relies on the array reference to a nonexistent
+element to create the element with the null string as its value:
-@cindex @code{getpwnam} user-defined function
+@cindex @code{getpwnam()} user-defined function
@example
@group
@c file eg/lib/passwdawk.in
function getpwnam(name)
@{
_pw_init()
- if (name in _pw_byname)
- return _pw_byname[name]
- return ""
+ return _pw_byname[name]
@}
@c endfile
@end group
@end example
-@cindex @code{getpwuid} function (C library)
+@cindex @code{getpwuid()} function (C library)
Similarly,
the @code{getpwuid} function takes a user ID number argument. If that
user number is in the database, it returns the appropriate line. Otherwise, it
returns the null string:
-@cindex @code{getpwuid} user-defined function
+@cindex @code{getpwuid()} user-defined function
@example
@c file eg/lib/passwdawk.in
function getpwuid(uid)
@{
_pw_init()
- if (uid in _pw_byuid)
- return _pw_byuid[uid]
- return ""
+ return _pw_byuid[uid]
@}
@c endfile
@end example
-@cindex @code{getpwent} function (C library)
-The @code{getpwent} function simply steps through the database, one entry at
+@cindex @code{getpwent()} function (C library)
+The @code{getpwent()} function simply steps through the database, one entry at
a time. It uses @code{_pw_count} to track its current position in the
@code{_pw_bycount} array:
-@cindex @code{getpwent} user-defined function
+@cindex @code{getpwent()} user-defined function
@example
@c file eg/lib/passwdawk.in
function getpwent()
@@ -20692,11 +20677,11 @@ function getpwent()
@c endfile
@end example
-@cindex @code{endpwent} function (C library)
-The @code{@w{endpwent}} function resets @code{@w{_pw_count}} to zero, so that
-subsequent calls to @code{getpwent} start over again:
+@cindex @code{endpwent()} function (C library)
+The @code{@w{endpwent()}} function resets @code{@w{_pw_count}} to zero, so that
+subsequent calls to @code{getpwent()} start over again:
-@cindex @code{endpwent} user-defined function
+@cindex @code{endpwent()} user-defined function
@example
@c file eg/lib/passwdawk.in
function endpwent()
@@ -20706,23 +20691,24 @@ function endpwent()
@c endfile
@end example
-A conscious design decision in this suite was made that each subroutine calls
-@code{@w{_pw_init}} to initialize the database arrays. The overhead of running
+A conscious design decision in this suite is that each subroutine calls
+@code{@w{_pw_init()}} to initialize the database arrays.
+The overhead of running
a separate process to generate the user database, and the I/O to scan it,
are only incurred if the user's main program actually calls one of these
functions. If this library file is loaded along with a user's program, but
none of the routines are ever called, then there is no extra runtime overhead.
-(The alternative is move the body of @code{@w{_pw_init}} into a
+(The alternative is to move the body of @code{@w{_pw_init()}} into a
@code{BEGIN} rule, which always runs @command{pwcat}. This simplifies the
code but runs an extra process that may never be needed.)
-In turn, calling @code{_pw_init} is not too expensive, because the
+In turn, calling @code{_pw_init()} is not too expensive, because the
@code{_pw_inited} variable keeps the program from reading the data more than
once. If you are worried about squeezing every last cycle out of your
@command{awk} program, the check of @code{_pw_inited} could be moved out of
-@code{_pw_init} and duplicated in all the other functions. In practice,
-this is not necessary, since most @command{awk} programs are I/O-bound, and it
-clutters up the code.
+@code{_pw_init()} and duplicated in all the other functions. In practice,
+this is not necessary, since most @command{awk} programs are I/O-bound,
+and such a change would clutter up the code.
The @command{id} program in @ref{Id Program},
uses these functions.
@@ -20743,8 +20729,8 @@ uses these functions.
@c STARTOFRANGE datagr
@cindex database, group, reading
@cindex @code{PROCINFO} array
-@cindex @code{getgrent} function (C library)
-@cindex @code{getgrent} user-defined function
+@cindex @code{getgrent()} function (C library)
+@cindex @code{getgrent()} user-defined function
@cindex groups@comma{} information about
@cindex account information
@cindex group file
@@ -20754,16 +20740,15 @@ Much of the discussion presented in
applies to the group database as well. Although there has traditionally
been a well-known file (@file{/etc/group}) in a well-known format, the POSIX
standard only provides a set of C library routines
-(@code{<grp.h>} and @code{getgrent})
+(@code{<grp.h>} and @code{getgrent()})
for accessing the information.
-Even though this file may exist, it likely does not have
+Even though this file may exist, it may not have
complete information. Therefore, as with the user database, it is necessary
to have a small C program that generates the group database as its output.
-
-@cindex @command{grcat} program
@command{grcat}, a C program that ``cats'' the group database,
is as follows:
+@cindex @command{grcat} program
@example
@c file eg/lib/grcat.c
/*
@@ -20777,6 +20762,7 @@ is as follows:
/*
* Arnold Robbins, arnold@@skeeve.com, May 1993
* Public Domain
+ * December 2010, move to ANSI C definition for main().
*/
/* For OS/2, do nothing. */
@@ -20798,9 +20784,7 @@ int main() { return 0; }
#include <grp.h>
int
-main(argc, argv)
-int argc;
-char **argv;
+main(int argc, char **argv)
@{
struct group *g;
int i;
@@ -20847,18 +20831,18 @@ char **argv;
Each line in the group database represents one group. The fields are
separated with colons and represent the following information:
-@ignore
@table @asis
@item Group Name
-The name of the group.
+The group's name.
@item Group Password
-The encrypted group password. In practice, this field is never used. It is
-usually empty or set to @samp{*}.
+The group's encrypted password. In practice, this field is never used;
+it is usually empty or set to @samp{*}.
@item Group ID Number
-The numeric group ID number. This number is unique within the file.
-(On some systems it's a C @code{long}, and not an @code{int()}. Thus
+The group's numeric group ID number;
+this number must be unique within the file.
+(On some systems it's a C @code{long}, and not an @code{int}. Thus
we cast it to @code{long} for all cases.)
@item Group Member List
@@ -20870,31 +20854,11 @@ for those group ID numbers.
(Note that @code{PROCINFO} is a @command{gawk} extension;
@pxref{Built-in Variables}.)
@end table
-@end ignore
-
-@multitable {Encrypted password} {1234567890123456789012345678901234567890123456}
-@item Group name @tab The group's name.
-
-@item Group password @tab The group's encrypted password. In practice, this field is never used;
-it is usually empty or set to @samp{*}.
-
-@item Group-ID @tab
-The group's numeric group ID number; this number should be unique within the file.
-
-@item Group member list @tab
-A comma-separated list of user names. These users are members of the group.
-Modern Unix systems allow users to be members of several groups
-simultaneously. If your system does, then there are elements
-@code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO}
-for those group ID numbers.
-(Note that @code{PROCINFO} is a @command{gawk} extension;
-@pxref{Built-in Variables}.)
-@end multitable
Here is what running @command{grcat} might produce:
@example
-$ grcat
+$ @kbd{grcat}
@print{} wheel:*:0:arnold
@print{} nogroup:*:65534:
@print{} daemon:*:1:
@@ -20907,8 +20871,8 @@ $ grcat
Here are the functions for obtaining information from the group database.
There are several, modeled after the C library functions of the same names:
-@cindex @code{getline} command, @code{_gr_init} user-defined function
-@cindex @code{_gr_init} user-defined function
+@cindex @code{getline} command, @code{_gr_init()} user-defined function
+@cindex @code{_gr_init()} user-defined function
@example
@c file eg/lib/groupawk.in
# group.awk --- functions for dealing with the group file
@@ -20919,11 +20883,12 @@ There are several, modeled after the C library functions of the same names:
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# May 1993
# Revised October 2000
-
+# Revised December 2010
@c endfile
@end ignore
@c line break on _gr_init for smallbook
@c file eg/lib/groupawk.in
+
BEGIN \
@{
# Change to suit your system
@@ -20931,7 +20896,7 @@ BEGIN \
@}
function _gr_init( oldfs, oldrs, olddol0, grcat,
- using_fw, n, a, i)
+ using_fw, using_fpat, n, a, i)
@{
if (_gr_inited)
return
@@ -20968,11 +20933,12 @@ function _gr_init( oldfs, oldrs, olddol0, grcat,
close(grcat)
_gr_count = 0
_gr_inited++
- FS = oldfs
if (using_fw)
FIELDWIDTHS = FIELDWIDTHS
else if (using_fpat)
FPAT = FPAT
+ else
+ FS = oldfs
RS = oldrs
$0 = olddol0
@}
@@ -20988,10 +20954,12 @@ These routines follow the same general outline as the user database routines
(@pxref{Passwd Functions}).
The @code{@w{_gr_inited}} variable is used to
ensure that the database is scanned no more than once.
-The @code{@w{_gr_init}} function first saves @code{FS},
+The @code{@w{_gr_init()}} function first saves @code{FS},
@code{RS}, and
@code{$0}, and then sets @code{FS} and @code{RS} to the correct values for
scanning the group information.
+It also takes care to note whether @code{FIELDWIDTHS} or @code{FPAT}
+is being used, and to restore the appropriate field splitting mechanism.
The group information is stored in several associative arrays.
The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number
@@ -21008,75 +20976,71 @@ tvpeople:*:101:johnny,jay,arsenio
tvpeople:*:101:david,conan,tom,joan
@end example
-For this reason, @code{_gr_init} looks to see if a group name or
+For this reason, @code{_gr_init()} looks to see if a group name or
group ID number has already been seen. If it has, then the user names are
simply concatenated onto the previous list of users. (There is actually a
subtle problem with the code just presented. Suppose that
the first time there were no names. This code adds the names with
a leading comma. It also doesn't check that there is a @code{$4}.)
-Finally, @code{_gr_init} closes the pipeline to @command{grcat}, restores
+Finally, @code{_gr_init()} closes the pipeline to @command{grcat}, restores
@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} if necessary), @code{RS}, and @code{$0},
initializes @code{_gr_count} to zero
(it is used later), and makes @code{_gr_inited} nonzero.
-@cindex @code{getgrnam} function (C library)
-The @code{getgrnam} function takes a group name as its argument, and if that
-group exists, it is returned. Otherwise, @code{getgrnam} returns the null
-string:
+@cindex @code{getgrnam()} function (C library)
+The @code{getgrnam()} function takes a group name as its argument, and if that
+group exists, it is returned.
+Otherwise, it
+relies on the array reference to a nonexistent
+element to create the element with the null string as its value:
-@cindex @code{getgrnam} user-defined function
+@cindex @code{getgrnam()} user-defined function
@example
@c file eg/lib/groupawk.in
function getgrnam(group)
@{
_gr_init()
- if (group in _gr_byname)
- return _gr_byname[group]
- return ""
+ return _gr_byname[group]
@}
@c endfile
@end example
-@cindex @code{getgrgid} function (C library)
-The @code{getgrgid} function is similar, it takes a numeric group ID and
+@cindex @code{getgrgid()} function (C library)
+The @code{getgrgid()} function is similar; it takes a numeric group ID and
looks up the information associated with that group ID:
-@cindex @code{getgrgid} user-defined function
+@cindex @code{getgrgid()} user-defined function
@example
@c file eg/lib/groupawk.in
function getgrgid(gid)
@{
_gr_init()
- if (gid in _gr_bygid)
- return _gr_bygid[gid]
- return ""
+ return _gr_bygid[gid]
@}
@c endfile
@end example
-@cindex @code{getgruser} function (C library)
-The @code{getgruser} function does not have a C counterpart. It takes a
+@cindex @code{getgruser()} function (C library)
+The @code{getgruser()} function does not have a C counterpart. It takes a
user name and returns the list of groups that have the user as a member:
-@cindex @code{getgruser} function, user-defined
+@cindex @code{getgruser()} function, user-defined
@example
@c file eg/lib/groupawk.in
function getgruser(user)
@{
_gr_init()
- if (user in _gr_groupsbyuser)
- return _gr_groupsbyuser[user]
- return ""
+ return _gr_groupsbyuser[user]
@}
@c endfile
@end example
-@cindex @code{getgrent} function (C library)
-The @code{getgrent} function steps through the database one entry at a time.
+@cindex @code{getgrent()} function (C library)
+The @code{getgrent()} function steps through the database one entry at a time.
It uses @code{_gr_count} to track its position in the list:
-@cindex @code{getgrent} user-defined function
+@cindex @code{getgrent()} user-defined function
@example
@c file eg/lib/groupawk.in
function getgrent()
@@ -21090,11 +21054,11 @@ function getgrent()
@end example
@c ENDOFRANGE clibf
-@cindex @code{endgrent} function (C library)
-The @code{endgrent} function resets @code{_gr_count} to zero so that @code{getgrent} can
+@cindex @code{endgrent()} function (C library)
+The @code{endgrent()} function resets @code{_gr_count} to zero so that @code{getgrent()} can
start over again:
-@cindex @code{endgrent} user-defined function
+@cindex @code{endgrent()} user-defined function
@example
@c file eg/lib/groupawk.in
function endgrent()
@@ -21104,10 +21068,10 @@ function endgrent()
@c endfile
@end example
-As with the user database routines, each function calls @code{_gr_init} to
+As with the user database routines, each function calls @code{_gr_init()} to
initialize the arrays. Doing so only incurs the extra overhead of running
@command{grcat} if these functions are used (as opposed to moving the body of
-@code{_gr_init} into a @code{BEGIN} rule).
+@code{_gr_init()} into a @code{BEGIN} rule).
Most of the work is in scanning the database and building the various
associative arrays. The functions that the user calls are themselves very
@@ -21261,7 +21225,7 @@ character.
Suppress printing of lines that do not contain the field delimiter.
@end table
-The @command{awk} implementation of @command{cut} uses the @code{getopt} library
+The @command{awk} implementation of @command{cut} uses the @code{getopt()} library
function (@pxref{Getopt Function})
and the @code{join()} library function
(@pxref{Join Function}).
@@ -21322,7 +21286,7 @@ screen.
Next comes a @code{BEGIN} rule that parses the command-line options.
It sets @code{FS} to a single TAB character, because that is @command{cut}'s
default field separator. The output field separator is also set to be the
-same as the input field separator. Then @code{getopt} is used to step
+same as the input field separator. Then @code{getopt()} is used to step
through the command-line options. Exactly one of the variables
@code{by_fields} or @code{by_chars} is set to true, to indicate that
processing should be done by fields or by characters, respectively.
@@ -21369,7 +21333,7 @@ Special care is taken when the field delimiter is a space. Using
a single space (@code{@w{" "}}) for the value of @code{FS} is
incorrect---@command{awk} would separate fields with runs of spaces,
tabs, and/or newlines, and we want them to be separated with individual
-spaces. Also remember that after @code{getopt} is through
+spaces. Also remember that after @code{getopt()} is through
(as described in @ref{Getopt Function}),
we have to
clear out all the elements of @code{ARGV} from 1 to @code{Optind},
@@ -21600,13 +21564,13 @@ Use @var{pattern} as the regexp to match. The purpose of the @option{-e}
option is to allow patterns that start with a @samp{-}.
@end table
-This version uses the @code{getopt} library function
+This version uses the @code{getopt()} library function
(@pxref{Getopt Function})
and the file transition library program
(@pxref{Filetrans Function}).
The program begins with a descriptive comment and then a @code{BEGIN} rule
-that processes the command-line arguments with @code{getopt}. The @option{-i}
+that processes the command-line arguments with @code{getopt()}. The @option{-i}
(ignore case) option is particularly easy with @command{gawk}; we just use the
@code{IGNORECASE} built-in variable
(@pxref{Built-in Variables}):
@@ -22306,14 +22270,14 @@ and the @code{join()} library function
The program begins with a @code{usage} function and then a brief outline of
the options and their meanings in a comment.
The @code{BEGIN} rule deals with the command-line arguments and options. It
-uses a trick to get @code{getopt} to handle options of the form @samp{-25},
+uses a trick to get @code{getopt()} to handle options of the form @samp{-25},
treating such an option as the option letter @samp{2} with an argument of
@samp{5}. If indeed two or more digits are supplied (@code{Optarg} looks
like a number), @code{Optarg} is
concatenated with the option digit and then the result is added to zero to make
it into a number. If there is only one digit in the option, then
@code{Optarg} is not needed. In this case, @code{Optind} must be decremented so that
-@code{getopt} processes it next time. This code is admittedly a bit
+@code{getopt()} processes it next time. This code is admittedly a bit
tricky.
If no options are supplied, then the default is taken, to print both
@@ -22548,7 +22512,7 @@ since @command{awk} does a lot of the work for us; it splits lines into
words (i.e., fields) and counts them, it counts lines (i.e., records),
and it can easily tell us how long a line is.
-This uses the @code{getopt} library function
+This uses the @code{getopt()} library function
(@pxref{Getopt Function})
and the file-transition functions
(@pxref{Filetrans Function}).