author | Arnold D. Robbins <arnold@skeeve.com> | 2011-05-04 23:39:43 +0300 |
---|---|---|
committer | Arnold D. Robbins <arnold@skeeve.com> | 2011-05-04 23:39:43 +0300 |
commit | 1387c9a6046ba3a3e9ce8343daac42e1086efa6b (patch) | |
tree | d203a0f14aae778c0e907f1fca66c12fee3346e1 /doc/gawk.texi | |
parent | f2b825c82aa6b0b2eabed734244148206f3c01a5 (diff) | |
Revamp array sorting.
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 1710 |
1 files changed, 916 insertions, 794 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 60cfd1d7..49229d19 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -306,394 +306,437 @@ particular records in a file and perform operations upon them.
 * Index:: Concept and Variable Index.
 @detailmenu
-* History:: The history of @command{gawk} and
-        @command{awk}.
-* Names:: What name to use to find @command{awk}.
-* This Manual:: Using this @value{DOCUMENT}. Includes
-        sample input files that you can use.
-* Conventions:: Typographical Conventions.
-* Manual History:: Brief history of the GNU project and this
-        @value{DOCUMENT}.
-* How To Contribute:: Helping to save the world.
-* Acknowledgments:: Acknowledgments.
-* Running gawk:: How to run @command{gawk} programs;
-        includes command-line syntax.
-* One-shot:: Running a short throwaway @command{awk}
-        program.
-* Read Terminal:: Using no input files (input from terminal
-        instead).
-* Long:: Putting permanent @command{awk} programs in
-        files.
-* Executable Scripts:: Making self-contained @command{awk}
-        programs.
-* Comments:: Adding documentation to @command{gawk}
-        programs.
-* Quoting:: More discussion of shell quoting issues.
-* DOS Quoting:: Quoting in Windows Batch Files.
-* Sample Data Files:: Sample data files for use in the
-        @command{awk} programs illustrated in this
-        @value{DOCUMENT}.
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example using two
-        rules.
-* More Complex:: A more complex example.
-* Statements/Lines:: Subdividing or combining statements into
-        lines.
-* Other Features:: Other Features of @command{awk}.
-* When:: When to use @command{gawk} and when to use
-        other things.
-* Command Line:: How to run @command{awk}.
-* Options:: Command-line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* Naming Standard Input:: How to specify standard input with other
-        files.
-* Environment Variables:: The environment variables @command{gawk}
-        uses.
-* AWKPATH Variable:: Searching directories for @command{awk}
-        programs.
-* Other Environment Variables:: The environment variables.
-* Exit Status:: @command{gawk}'s exit status.
-* Include Files:: Including other files into your program.
-* Obsolete:: Obsolete Options and/or features.
-* Undocumented:: Undocumented Options and Features.
-* Regexp Usage:: How to Use Regular Expressions.
-* Escape Sequences:: How to write nonprinting characters.
-* Regexp Operators:: Regular Expression Operators.
-* Bracket Expressions:: What can go between @samp{[...]}.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
-* Leftmost Longest:: How much text matches.
-* Computed Regexps:: Using Dynamic Regexps.
-* Locales:: How the locale affects things.
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Nonconstant Fields:: Nonconstant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Default Field Splitting:: How fields are normally separated.
-* Regexp Field Splitting:: Using regexps as the field separator.
-* Single Character Fields:: Making each character a separate field.
-* Command Line Field Separator:: Setting @code{FS} from the command-line.
-* Field Splitting Summary:: Some final points and a summary table.
-* Constant Size:: Reading constant width data.
-* Splitting By Content:: Defining Fields By Content
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program
-        control using the @code{getline} function.
-* Plain Getline:: Using @code{getline} with no arguments.
-* Getline/Variable:: Using @code{getline} into a variable.
-* Getline/File:: Using @code{getline} from a file.
-* Getline/Variable/File:: Using @code{getline} into a variable from a
-        file.
-* Getline/Pipe:: Using @code{getline} from a pipe.
-* Getline/Variable/Pipe:: Using @code{getline} into a variable from a
-        pipe.
-* Getline/Coprocess:: Using @code{getline} from a coprocess.
-* Getline/Variable/Coprocess:: Using @code{getline} into a variable from a
-        coprocess.
-* Getline Notes:: Important things to know about
-        @code{getline}.
-* Getline Summary:: Summary of @code{getline} Variants.
-* Command line directories:: What happens if you put a directory on the
-        command line.
-* Print:: The @code{print} statement.
-* Print Examples:: Simple examples of @code{print} statements.
-* Output Separators:: The output separators and how to change
-        them.
-* OFMT:: Controlling Numeric Output With
-        @code{print}.
-* Printf:: The @code{printf} statement.
-* Basic Printf:: Syntax of the @code{printf} statement.
-* Control Letters:: Format-control letters.
-* Format Modifiers:: Format-specification modifiers.
-* Printf Examples:: Several examples.
-* Redirection:: How to redirect output to multiple files
-        and pipes.
-* Special Files:: File name interpretation in @command{gawk}.
-        @command{gawk} allows access to inherited
-        file descriptors.
-* Special FD:: Special files for I/O.
-* Special Network:: Special files for network communications.
-* Special Caveats:: Things to watch out for.
-* Close Files And Pipes:: Closing Input and Output Files and Pipes.
-* Values:: Constants, Variables, and Regular
-        Expressions.
-* Constants:: String, numeric and regexp constants.
-* Scalar Constants:: Numeric and string constants.
-* Nondecimal-numbers:: What are octal and hex numbers.
-* Regexp Constants:: Regular Expression constants.
-* Using Constant Regexps:: When and how to use a regexp constant.
-* Variables:: Variables give names to values for later
-        use.
-* Using Variables:: Using variables in your programs.
-* Assignment Options:: Setting variables on the command-line and a
-        summary of command-line syntax. This is an
-        advanced method of input.
-* Conversion:: The conversion of strings to numbers and
-        vice versa.
-* All Operators:: @command{gawk}'s operators.
-* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-},
-        etc.)
-* Concatenation:: Concatenating strings.
-* Assignment Ops:: Changing the value of a variable or a
-        field.
-* Increment Ops:: Incrementing the numeric value of a
-        variable.
-* Truth Values and Conditions:: Testing for true and false.
-* Truth Values:: What is ``true'' and what is ``false''.
-* Typing and Comparison:: How variables acquire types and how this
-        affects comparison of numbers and strings
-        with @samp{<}, etc.
-* Variable Typing:: String type versus numeric type.
-* Comparison Operators:: The comparison operators.
-* POSIX String Comparison:: String comparison with POSIX rules.
-* Boolean Ops:: Combining comparison expressions using
-        boolean operators @samp{||} (``or''),
-        @samp{&&} (``and'') and @samp{!} (``not'').
-* Conditional Exp:: Conditional expressions select between two
-        subexpressions under control of a third
-        subexpression.
-* Function Calls:: A function call is an expression.
-* Precedence:: How various operators nest.
-* Pattern Overview:: What goes into a pattern.
-* Regexp Patterns:: Using regexps as patterns.
-* Expression Patterns:: Any expression can be used as a pattern.
-* Ranges:: Pairs of patterns specify record ranges.
-* BEGIN/END:: Specifying initialization and cleanup
-        rules.
-* Using BEGIN/END:: How and why to use BEGIN/END rules.
-* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
-* Empty:: The empty pattern, which matches every
-        record.
-* BEGINFILE/ENDFILE:: Two special patterns for advanced control.
-* Using Shell Variables:: How to use shell variables with
-        @command{awk}.
-* Action Overview:: What goes into an action.
-* Statements:: Describes the various control statements in
-        detail.
-* If Statement:: Conditionally execute some @command{awk}
-        statements.
-* While Statement:: Loop until some condition is satisfied.
-* Do Statement:: Do specified action while looping until
-        some condition is satisfied.
-* For Statement:: Another looping statement, that provides
-        initialization and increment clauses.
-* Switch Statement:: Switch/case evaluation for conditional
-        execution of statements based on a value.
-* Break Statement:: Immediately exit the innermost enclosing
-        loop.
-* Continue Statement:: Skip to the end of the innermost enclosing
-        loop.
-* Next Statement:: Stop processing the current input record.
-* Nextfile Statement:: Stop processing the current file.
-* Exit Statement:: Stop execution of @command{awk}.
-* Built-in Variables:: Summarizes the built-in variables.
-* User-modified:: Built-in variables that you change to
-        control @command{awk}.
-* Auto-set:: Built-in variables where @command{awk}
-        gives you information.
-* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}.
-* Array Basics:: The basics of arrays.
-* Array Intro:: Introduction to Arrays
-* Reference to Elements:: How to examine one element of an array.
-* Assigning Elements:: How to change an element of an array.
-* Array Example:: Basic Example of an Array
-* Scanning an Array:: A variation of the @code{for} statement. It
-        loops through the indices of an array's
-        existing elements.
-* Controlling Scanning:: Controlling the order in which arrays
-        are scanned.
-* Delete:: The @code{delete} statement removes an
-        element from an array.
-* Numeric Array Subscripts:: How to use numbers as subscripts in
-        @command{awk}.
-* Uninitialized Subscripts:: Using Uninitialized variables as
-        subscripts.
-* Multi-dimensional:: Emulating multidimensional arrays in
-        @command{awk}.
-* Multi-scanning:: Scanning multidimensional arrays.
-* Array Sorting:: Sorting array values and indices.
-* Arrays of Arrays:: True multidimensional arrays.
-* Built-in:: Summarizes the built-in functions.
-* Calling Built-in:: How to call built-in functions.
-* Numeric Functions:: Functions that work with numbers, including
-        @code{int()}, @code{sin()} and
-        @code{rand()}.
-* String Functions:: Functions for string manipulation, such as
-        @code{split()}, @code{match()} and
-        @code{sprintf()}.
-* Gory Details:: More than you want to know about @samp{\}
-        and @samp{&} with @code{sub()},
-        @code{gsub()}, and @code{gensub()}.
-* I/O Functions:: Functions for files and shell commands.
-* Time Functions:: Functions for dealing with timestamps.
-* Bitwise Functions:: Functions for bitwise operations.
-* Type Functions:: Functions for type information.
-* I18N Functions:: Functions for string translation.
-* User-defined:: Describes User-defined functions in detail.
-* Definition Syntax:: How to write definitions and what they
-        mean.
-* Function Example:: An example function definition and what it
-        does.
-* Function Caveats:: Things to watch out for.
-* Calling A Function:: Don't use spaces.
-* Variable Scope:: Controlling variable scope.
-* Pass By Value/Reference:: Passing parameters.
-* Return Statement:: Specifying the value a function returns.
-* Dynamic Typing:: How variable types can change at runtime.
-* Indirect Calls:: Choosing the function to call at runtime.
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU @code{gettext} works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging @code{printf} arguments.
-* I18N Portability:: @command{awk}-level portability issues.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: @command{gawk} is also internationalized.
-* Nondecimal Data:: Allowing nondecimal input data.
-* Two-way I/O:: Two-way communications with another
-        process.
-* TCP/IP Networking:: Using @command{gawk} for network
-        programming.
-* Profiling:: Profiling your @command{awk} programs.
-* Library Names:: How to best name private global variables
-        in library functions.
-* General Functions:: Functions that are of general use.
-* Strtonum Function:: A replacement for the built-in
-        @code{strtonum()} function.
-* Assert Function:: A function for assertions in @command{awk}
-        programs.
-* Round Function:: A function for rounding if @code{sprintf()}
-        does not do it correctly.
-* Cliff Random Function:: The Cliff Random Number Generator.
-* Ordinal Functions:: Functions for using characters as numbers
-        and vice versa.
-* Join Function:: A function to join an array into a string.
-* Gettimeofday Function:: A function to get formatted times.
-* Data File Management:: Functions for managing command-line data
-        files.
-* Filetrans Function:: A function for handling data file
-        transitions.
-* Rewind Function:: A function for rereading the current file.
-* File Checking:: Checking that data files are readable.
-* Empty Files:: Checking for zero-length files.
-* Ignoring Assigns:: Treating assignments as file names.
-* Getopt Function:: A function for processing command-line
-        arguments.
-* Passwd Functions:: Functions for getting user information.
-* Group Functions:: Functions for getting group information.
-* Running Examples:: How to run these examples.
-* Clones:: Clones of common utilities.
-* Cut Program:: The @command{cut} utility.
-* Egrep Program:: The @command{egrep} utility.
-* Id Program:: The @command{id} utility.
-* Split Program:: The @command{split} utility.
-* Tee Program:: The @command{tee} utility.
-* Uniq Program:: The @command{uniq} utility.
-* Wc Program:: The @command{wc} utility.
-* Miscellaneous Programs:: Some interesting @command{awk} programs.
-* Dupword Program:: Finding duplicated words in a document.
-* Alarm Program:: An alarm clock.
-* Translate Program:: A program similar to the @command{tr}
-        utility.
-* Labels Program:: Printing mailing labels.
-* Word Sorting:: A program to produce a word usage count.
-* History Sorting:: Eliminating duplicate entries from a
-        history file.
-* Extract Program:: Pulling out programs from Texinfo source
-        files.
-* Simple Sed:: A Simple Stream Editor.
-* Igawk Program:: A wrapper for @command{awk} that includes
-        files.
-* Anagram Program:: Finding anagrams from a dictionary.
-* Signature Program:: People do amazing things with too much time
-        on their hands.
-* Debugging:: Introduction to @command{dgawk}.
-* Debugging Concepts:: Debugging In General.
-* Debugging Terms:: Additional Debugging Concepts.
-* Awk Debugging:: Awk Debugging.
-* Sample dgawk session:: Sample @command{dgawk} session.
-* dgawk invocation:: @command{dgawk} Invocation.
-* Finding The Bug:: Finding The Bug.
-* List of Debugger Commands:: Main @command{dgawk} Commands.
-* Breakpoint Control:: Control of breakpoints.
-* Dgawk Execution Control:: Control of execution.
-* Viewing And Changing Data:: Viewing and changing data.
-* Dgawk Stack:: Dealing with the stack.
-* Dgawk Info:: Obtaining information about the program and
-        the debugger state.
-* Miscellaneous Dgawk Commands:: Miscellaneous Commands.
-* Readline Support:: Readline Support.
-* Dgawk Limitations:: Limitations and future plans.
-* V7/SVR3.1:: The major changes between V7 and System V
-        Release 3.1.
-* SVR4:: Minor changes between System V Releases 3.1
-        and 4.
-* POSIX:: New features from the POSIX standard.
-* BTL:: New features from Brian Kernighan's
-        version of @command{awk}.
-* POSIX/GNU:: The extensions in @command{gawk} not in
-        POSIX @command{awk}.
-* Contributors:: The major contributors to @command{gawk}.
-* Common Extensions:: Common Extensions Summary.
-* Gawk Distribution:: What is in the @command{gawk} distribution.
-* Getting:: How to get the distribution.
-* Extracting:: How to extract the distribution.
-* Distribution contents:: What is in the distribution.
-* Unix Installation:: Installing @command{gawk} under various
-        versions of Unix.
-* Quick Installation:: Compiling @command{gawk} under Unix.
-* Additional Configuration Options:: Other compile-time options.
-* Configuration Philosophy:: How it's all supposed to work.
-* Non-Unix Installation:: Installation on Other Operating Systems.
-* PC Installation:: Installing and Compiling @command{gawk} on
-        MS-DOS and OS/2.
-* PC Binary Installation:: Installing a prepared distribution.
-* PC Compiling:: Compiling @command{gawk} for MS-DOS,
-        Windows32, and OS/2.
-* PC Testing:: Testing @command{gawk} on PC
-        Operating Systems.
-* PC Using:: Running @command{gawk} on MS-DOS, Windows32
-        and OS/2.
-* Cygwin:: Building and running @command{gawk} for
-        Cygwin.
-* MSYS:: Using @command{gawk} In The MSYS
-        Environment.
-* VMS Installation:: Installing @command{gawk} on VMS.
-* VMS Compilation:: How to compile @command{gawk} under VMS.
-* VMS Installation Details:: How to install @command{gawk} under VMS.
-* VMS Running:: How to run @command{gawk} under VMS.
-* VMS Old Gawk:: An old version comes with some VMS systems.
-* Bugs:: Reporting Problems and Bugs.
-* Other Versions:: Other freely available @command{awk}
-        implementations.
-* Compatibility Mode:: How to disable certain @command{gawk}
-        extensions.
-* Additions:: Making Additions To @command{gawk}.
-* Accessing The Source:: Accessing the Git repository.
-* Adding Code:: Adding code to the main body of
-        @command{gawk}.
-* New Ports:: Porting @command{gawk} to a new operating
-        system.
-* Dynamic Extensions:: Adding new built-in functions to
-        @command{gawk}.
-* Internals:: A brief look at some @command{gawk}
-        internals.
-* Plugin License:: A note about licensing.
-* Sample Library:: A example of new functions.
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-* Future Extensions:: New features that may be implemented one
-        day.
-* Basic High Level:: The high level view.
-* Basic Data Typing:: A very quick intro to data types.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
-        Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+* History:: The history of @command{gawk} and
+        @command{awk}.
+* Names:: What name to use to find @command{awk}.
+* This Manual:: Using this @value{DOCUMENT}. Includes
+        sample input files that you can use.
+* Conventions:: Typographical Conventions.
+* Manual History:: Brief history of the GNU project and
+        this @value{DOCUMENT}.
+* How To Contribute:: Helping to save the world.
+* Acknowledgments:: Acknowledgments.
+* Running gawk:: How to run @command{gawk} programs;
+        includes command-line syntax.
+* One-shot:: Running a short throwaway @command{awk}
+        program.
+* Read Terminal:: Using no input files (input from
+        terminal instead).
+* Long:: Putting permanent @command{awk}
+        programs in files.
+* Executable Scripts:: Making self-contained @command{awk}
+        programs.
+* Comments:: Adding documentation to @command{gawk}
+        programs.
+* Quoting:: More discussion of shell quoting
+        issues.
+* DOS Quoting:: Quoting in Windows Batch Files.
+* Sample Data Files:: Sample data files for use in the
+        @command{awk} programs illustrated in
+        this @value{DOCUMENT}.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example using
+        two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements
+        into lines.
+* Other Features:: Other Features of @command{awk}.
+* When:: When to use @command{gawk} and when to
+        use other things.
+* Command Line:: How to run @command{awk}.
+* Options:: Command-line options and their
+        meanings.
+* Other Arguments:: Input file names and variable
+        assignments.
+* Naming Standard Input:: How to specify standard input with
+        other files.
+* Environment Variables:: The environment variables
+        @command{gawk} uses.
+* AWKPATH Variable:: Searching directories for @command{awk}
+        programs.
+* Other Environment Variables:: The environment variables.
+* Exit Status:: @command{gawk}'s exit status.
+* Include Files:: Including other files into your
+        program.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write nonprinting characters.
+* Regexp Operators:: Regular Expression Operators.
+* Bracket Expressions:: What can go between @samp{[...]}.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+* Locales:: How the locale affects things.
+* Records:: Controlling how data is split into
+        records.
+* Fields:: An introduction to fields.
+* Nonconstant Fields:: Nonconstant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change
+        it.
+* Default Field Splitting:: How fields are normally separated.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate field.
+* Command Line Field Separator:: Setting @code{FS} from the
+        command-line.
+* Field Splitting Summary:: Some final points and a summary table.
+* Constant Size:: Reading constant width data.
+* Splitting By Content:: Defining Fields By Content
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program
+        control using the @code{getline}
+        function.
+* Plain Getline:: Using @code{getline} with no arguments.
+* Getline/Variable:: Using @code{getline} into a variable.
+* Getline/File:: Using @code{getline} from a file.
+* Getline/Variable/File:: Using @code{getline} into a variable
+        from a file.
+* Getline/Pipe:: Using @code{getline} from a pipe.
+* Getline/Variable/Pipe:: Using @code{getline} into a variable
+        from a pipe.
+* Getline/Coprocess:: Using @code{getline} from a coprocess.
+* Getline/Variable/Coprocess:: Using @code{getline} into a variable
+        from a coprocess.
+* Getline Notes:: Important things to know about
+        @code{getline}.
+* Getline Summary:: Summary of @code{getline} Variants.
+* Command line directories:: What happens if you put a directory on
+        the command line.
+* Print:: The @code{print} statement.
+* Print Examples:: Simple examples of @code{print}
+        statements.
+* Output Separators:: The output separators and how to change
+        them.
+* OFMT:: Controlling Numeric Output With
+        @code{print}.
+* Printf:: The @code{printf} statement.
+* Basic Printf:: Syntax of the @code{printf} statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+* Redirection:: How to redirect output to multiple
+        files and pipes.
+* Special Files:: File name interpretation in
+        @command{gawk}. @command{gawk} allows
+        access to inherited file descriptors.
+* Special FD:: Special files for I/O.
+* Special Network:: Special files for network
+        communications.
+* Special Caveats:: Things to watch out for.
+* Close Files And Pipes:: Closing Input and Output Files and
+        Pipes.
+* Values:: Constants, Variables, and Regular
+        Expressions.
+* Constants:: String, numeric and regexp constants.
+* Scalar Constants:: Numeric and string constants.
+* Nondecimal-numbers:: What are octal and hex numbers.
+* Regexp Constants:: Regular Expression constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for
+        later use.
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command-line
+        and a summary of command-line syntax.
+        This is an advanced method of input.
+* Conversion:: The conversion of strings to numbers
+        and vice versa.
+* All Operators:: @command{gawk}'s operators.
+* Arithmetic Ops:: Arithmetic operations (@samp{+},
+        @samp{-}, etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a
+        field.
+* Increment Ops:: Incrementing the numeric value of a
+        variable.
+* Truth Values and Conditions:: Testing for true and false.
+* Truth Values:: What is ``true'' and what is ``false''.
+* Typing and Comparison:: How variables acquire types and how
+        this affects comparison of numbers and
+        strings with @samp{<}, etc.
+* Variable Typing:: String type versus numeric type.
+* Comparison Operators:: The comparison operators.
+* POSIX String Comparison:: String comparison with POSIX rules.
+* Boolean Ops:: Combining comparison expressions using
+        boolean operators @samp{||} (``or''),
+        @samp{&&} (``and'') and @samp{!}
+        (``not'').
+* Conditional Exp:: Conditional expressions select between
+        two subexpressions under control of a
+        third subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+* Pattern Overview:: What goes into a pattern.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a
+        pattern.
+* Ranges:: Pairs of patterns specify record
+        ranges.
+* BEGIN/END:: Specifying initialization and cleanup
+        rules.
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+* BEGINFILE/ENDFILE:: Two special patterns for advanced
+        control.
+* Empty:: The empty pattern, which matches every
+        record.
+* Using Shell Variables:: How to use shell variables with
+        @command{awk}.
+* Action Overview:: What goes into an action.
+* Statements:: Describes the various control
+        statements in detail.
+* If Statement:: Conditionally execute some
+        @command{awk} statements.
+* While Statement:: Loop until some condition is satisfied.
+* Do Statement:: Do specified action while looping until
+        some condition is satisfied.
+* For Statement:: Another looping statement, that
+        provides initialization and increment
+        clauses.
+* Switch Statement:: Switch/case evaluation for conditional
+        execution of statements based on a
+        value.
+* Break Statement:: Immediately exit the innermost
+        enclosing loop.
+* Continue Statement:: Skip to the end of the innermost
+        enclosing loop.
+* Next Statement:: Stop processing the current input
+        record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of @command{awk}.
+* Built-in Variables:: Summarizes the built-in variables.
+* User-modified:: Built-in variables that you change to
+        control @command{awk}.
+* Auto-set:: Built-in variables where @command{awk}
+        gives you information.
+* ARGC and ARGV:: Ways to use @code{ARGC} and
+        @code{ARGV}.
+* Array Basics:: The basics of arrays.
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the @code{for}
+        statement. It loops through the indices
+        of an array's existing elements.
+* Delete:: The @code{delete} statement removes an
+        element from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+        @command{awk}.
+* Uninitialized Subscripts:: Using Uninitialized variables as
+        subscripts.
+* Multi-dimensional:: Emulating multidimensional arrays in
+        @command{awk}.
+* Multi-scanning:: Scanning multidimensional arrays.
+* Arrays of Arrays:: True multidimensional arrays.
+* Built-in:: Summarizes the built-in functions.
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers,
+        including @code{int()}, @code{sin()}
+        and @code{rand()}.
+* String Functions:: Functions for string manipulation, such
+        as @code{split()}, @code{match()} and
+        @code{sprintf()}.
+* Gory Details:: More than you want to know about
+        @samp{\} and @samp{&} with
+        @code{sub()}, @code{gsub()}, and
+        @code{gensub()}.
+* I/O Functions:: Functions for files and shell commands.
+* Time Functions:: Functions for dealing with timestamps.
+* Bitwise Functions:: Functions for bitwise operations.
+* Type Functions:: Functions for type information.
+* I18N Functions:: Functions for string translation.
+* User-defined:: Describes User-defined functions in
+        detail.
+* Definition Syntax:: How to write definitions and what they
+        mean.
+* Function Example:: An example function definition and what
+        it does.
+* Function Caveats:: Things to watch out for.
+* Calling A Function:: Don't use spaces.
+* Variable Scope:: Controlling variable scope.
+* Pass By Value/Reference:: Passing parameters.
+* Return Statement:: Specifying the value a function
+        returns.
+* Dynamic Typing:: How variable types can change at
+        runtime.
+* Indirect Calls:: Choosing the function to call at
+        runtime.
+* I18N and L10N:: Internationalization and Localization.
+* Explaining gettext:: How GNU @code{gettext} works.
+* Programmer i18n:: Features for the programmer.
+* Translator i18n:: Features for the translator.
+* String Extraction:: Extracting marked strings.
+* Printf Ordering:: Rearranging @code{printf} arguments.
+* I18N Portability:: @command{awk}-level portability issues.
+* I18N Example:: A simple i18n example.
+* Gawk I18N:: @command{gawk} is also
+        internationalized.
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array
+        traversal and sorting arrays.
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Controlling Scanning With A Function:: Using a function to control scanning.
+* Controlling Scanning:: Controlling the order in which arrays
+        are scanned.
+* Array Sorting Functions:: How to use @code{asort()} and
+        @code{asorti()}.
+* Two-way I/O:: Two-way communications with another
+        process.
+* TCP/IP Networking:: Using @command{gawk} for network
+        programming.
+* Profiling:: Profiling your @command{awk} programs.
+* Library Names:: How to best name private global
+        variables in library functions.
+* General Functions:: Functions that are of general use.
+* Strtonum Function:: A replacement for the built-in
+        @code{strtonum()} function.
+* Assert Function:: A function for assertions in
+        @command{awk} programs.
+* Round Function:: A function for rounding if
+        @code{sprintf()} does not do it
+        correctly.
+* Cliff Random Function:: The Cliff Random Number Generator.
+* Ordinal Functions:: Functions for using characters as
+        numbers and vice versa.
+* Join Function:: A function to join an array into a
+        string.
+* Gettimeofday Function:: A function to get formatted times.
+* Data File Management:: Functions for managing command-line
+        data files.
+* Filetrans Function:: A function for handling data file
+        transitions.
+* Rewind Function:: A function for rereading the current
+        file.
+* File Checking:: Checking that data files are readable.
+* Empty Files:: Checking for zero-length files.
+* Ignoring Assigns:: Treating assignments as file names.
+* Getopt Function:: A function for processing command-line
+        arguments.
+* Passwd Functions:: Functions for getting user information.
+* Group Functions:: Functions for getting group
+        information.
+* Walking Arrays:: A function to walk arrays of arrays.
+* Running Examples:: How to run these examples.
+* Clones:: Clones of common utilities.
+* Cut Program:: The @command{cut} utility.
+* Egrep Program:: The @command{egrep} utility.
+* Id Program:: The @command{id} utility.
+* Split Program:: The @command{split} utility. +* Tee Program:: The @command{tee} utility. +* Uniq Program:: The @command{uniq} utility. +* Wc Program:: The @command{wc} utility. +* Miscellaneous Programs:: Some interesting @command{awk} + programs. +* Dupword Program:: Finding duplicated words in a document. +* Alarm Program:: An alarm clock. +* Translate Program:: A program similar to the @command{tr} + utility. +* Labels Program:: Printing mailing labels. +* Word Sorting:: A program to produce a word usage + count. +* History Sorting:: Eliminating duplicate entries from a + history file. +* Extract Program:: Pulling out programs from Texinfo + source files. +* Simple Sed:: A Simple Stream Editor. +* Igawk Program:: A wrapper for @command{awk} that + includes files. +* Anagram Program:: Finding anagrams from a dictionary. +* Signature Program:: People do amazing things with too much + time on their hands. +* Debugging:: Introduction to @command{dgawk}. +* Debugging Concepts:: Debugging In General. +* Debugging Terms:: Additional Debugging Concepts. +* Awk Debugging:: Awk Debugging. +* Sample dgawk session:: Sample @command{dgawk} session. +* dgawk invocation:: @command{dgawk} Invocation. +* Finding The Bug:: Finding The Bug. +* List of Debugger Commands:: Main @command{dgawk} Commands. +* Breakpoint Control:: Control of breakpoints. +* Dgawk Execution Control:: Control of execution. +* Viewing And Changing Data:: Viewing and changing data. +* Dgawk Stack:: Dealing with the stack. +* Dgawk Info:: Obtaining information about the program + and the debugger state. +* Miscellaneous Dgawk Commands:: Miscellaneous Commands. +* Readline Support:: Readline Support. +* Dgawk Limitations:: Limitations and future plans. +* V7/SVR3.1:: The major changes between V7 and System + V Release 3.1. +* SVR4:: Minor changes between System V Releases + 3.1 and 4. +* POSIX:: New features from the POSIX standard. +* BTL:: New features from Brian Kernighan's + version of @command{awk}. 
+* POSIX/GNU:: The extensions in @command{gawk} not in + POSIX @command{awk}. +* Common Extensions:: Common Extensions Summary. +* Contributors:: The major contributors to + @command{gawk}. +* Gawk Distribution:: What is in the @command{gawk} + distribution. +* Getting:: How to get the distribution. +* Extracting:: How to extract the distribution. +* Distribution contents:: What is in the distribution. +* Unix Installation:: Installing @command{gawk} under various + versions of Unix. +* Quick Installation:: Compiling @command{gawk} under Unix. +* Additional Configuration Options:: Other compile-time options. +* Configuration Philosophy:: How it's all supposed to work. +* Non-Unix Installation:: Installation on Other Operating + Systems. +* PC Installation:: Installing and Compiling @command{gawk} + on MS-DOS and OS/2. +* PC Binary Installation:: Installing a prepared distribution. +* PC Compiling:: Compiling @command{gawk} for MS-DOS, + Windows32, and OS/2. +* PC Testing:: Testing @command{gawk} on PC systems. +* PC Using:: Running @command{gawk} on MS-DOS, + Windows32 and OS/2. +* Cygwin:: Building and running @command{gawk} for + Cygwin. +* MSYS:: Using @command{gawk} In The MSYS + Environment. +* VMS Installation:: Installing @command{gawk} on VMS. +* VMS Compilation:: How to compile @command{gawk} under + VMS. +* VMS Installation Details:: How to install @command{gawk} under + VMS. +* VMS Running:: How to run @command{gawk} under VMS. +* VMS Old Gawk:: An old version comes with some VMS + systems. +* Bugs:: Reporting Problems and Bugs. +* Other Versions:: Other freely available @command{awk} + implementations. +* Compatibility Mode:: How to disable certain @command{gawk} + extensions. +* Additions:: Making Additions To @command{gawk}. +* Accessing The Source:: Accessing the Git repository. +* Adding Code:: Adding code to the main body of + @command{gawk}. +* New Ports:: Porting @command{gawk} to a new + operating system. 
+* Dynamic Extensions:: Adding new built-in functions to + @command{gawk}. +* Internals:: A brief look at some @command{gawk} + internals. +* Plugin License:: A note about licensing. +* Sample Library:: A example of new functions. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. +* Future Extensions:: New features that may be implemented + one day. +* Basic High Level:: The high level view. +* Basic Data Typing:: A very quick intro to data types. +* Floating Point Issues:: Stuff to know about floating-point + numbers. +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not Abstract + Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. @end detailmenu @end menu @@ -13036,7 +13079,6 @@ same @command{awk} program. * Uninitialized Subscripts:: Using Uninitialized variables as subscripts. * Multi-dimensional:: Emulating multidimensional arrays in @command{awk}. -* Array Sorting:: Sorting array values and indices. * Arrays of Arrays:: True multidimensional arrays. @end menu @@ -13378,11 +13420,6 @@ END @{ @cindex elements in arrays, scanning @cindex arrays, scanning -@menu -* Controlling Scanning:: Controlling the order in which arrays are scanned. -* Controlling Scanning With A Function:: Using a function to control scanning. -@end menu - In programs that use arrays, it is often necessary to use a loop that executes once for each element of an array. In other languages, where arrays are contiguous and indices are limited to positive integers, @@ -13447,286 +13484,14 @@ the loop body; it is not predictable whether the @code{for} loop will reach them. Similarly, changing @var{var} inside the loop may produce strange results. It is best to avoid such things. 
-@node Controlling Scanning -@subsubsection Controlling Array Scanning Order - As an extension, @command{gawk} makes it possible for you to loop over the elements of an array in order, based on the value of @code{PROCINFO["sorted_in"]} (@pxref{Auto-set}). -Several sorting options are available: - -@table @samp -@item ascending index string -Order by indices compared as strings; this is the most basic sort. -(Internally, array indices are always strings, so with @samp{a[2*5] = 1} -the index is actually @code{"10"} rather than numeric 10.) - -@item ascending index number -Order by indices but force them to be treated as numbers in the process. -Any index with non-numeric value will end up positioned as if it were zero. - -@item ascending value string -Order by element values rather than by indices. Scalar values are -compared as strings. Subarrays, if present, come out last. - -@item ascending value number -Order by values but force scalar values to be treated as numbers -for the purpose of comparison. If there are subarrays, those appear -at the end of the sorted list. - -@item descending index string -Reverse order from the most basic sort. - -@item descending index number -Numeric indices ordered from high to low. - -@item descending value string -Element values, treated as strings, ordered from high to low. Subarrays, if present, -come out first. - -@item descending value number -Element values, treated as numbers, ordered from high to low. Subarrays, if present, -come out first. - -@item unsorted -Array elements are processed in arbitrary order, the normal @command{awk} -behavior. You can also get the normal behavior by just -deleting the @code{"sorted_in"} item from the @code{PROCINFO} array, if -it previously had a value assigned to it. -@end table - -The array traversal order is determined before the @code{for} loop -starts to run. Changing @code{PROCINFO["sorted_in"]} in the loop body -will not affect the loop. 
- -Portions of the sort specification string may be truncated or omitted. -The default is @samp{ascending} for direction, @samp{index} for sort key type, -and @samp{string} for comparison mode. This implies that one can -simply assign the empty string, "", instead of "ascending index string" to -@code{PROCINFO["sorted_in"]} for the same effect. - -For example: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{ a[4] = 4} -> @kbd{ a[3] = 3} -> @kbd{ for (i in a)} -> @kbd{ print i, a[i]} -> @kbd{@}'} -@print{} 4 4 -@print{} 3 3 -$ @kbd{gawk 'BEGIN @{} -> @kbd{ PROCINFO["sorted_in"] = "asc index"} -> @kbd{ a[4] = 4} -> @kbd{ a[3] = 3} -> @kbd{ for (i in a)} -> @kbd{ print i, a[i]} -> @kbd{@}'} -@print{} 3 3 -@print{} 4 4 -@end example - -When sorting an array by element values, if a value happens to be -a subarray then it is considered to be greater than any string or -numeric value, regardless of what the subarray itself contains, -and all subarrays are treated as being equal to each other. Their -order relative to each other is determined by their index strings. +This is an advanced feature, so discussion of it is delayed +until @ref{Controlling Array Traversal}. -@node Controlling Scanning With A Function -@subsubsection Controlling Array Scanning Order With a User-defined Function - -The value of @code{PROCINFO["sorted_in"]} can also be a function name. -This lets you traverse an array based on any custom criterion. -The array elements are ordered according to the return value of this -function. This comparison function should be defined with at least -four arguments: - -@example -function comp_func(i1, v1, i2, v2) -@{ - @var{compare elements 1 and 2 in some fashion} - @var{return < 0; 0; or > 0} -@} -@end example - -Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2} -are the corresponding values of the two elements being compared. -Either @var{v1} or @var{v2}, or both, can be arrays if the array being -traversed contains subarrays as values. 
The three possible return values -are interpreted this way: - -@itemize @bullet -@item -If the return value of @code{comp_func(i1, v1, i2, v2)} is less than zero, -index @var{i1} comes before index @var{i2} during loop traversal. - -@item -If @code{comp_func(i1, v1, i2, v2)} returns zero, @var{i1} and @var{i2} -come together but the relative order with respect to each other is undefined. - -@item -If the return value of @code{comp_func(i1, v1, i2, v2)} is greater than zero, -@var{i1} comes after @var{i2}. -@end itemize - -The following comparison function can be used to scan an array in -numerical order of the indices: - -@example -function cmp_num_idx(i1, v1, i2, v2) -@{ - # numerical index comparison, ascending order - return (i1 - i2) -@} -@end example - -This function traverses an array based on an order by element values -rather than by indices: - -@example -function cmp_str_val(i1, v1, i2, v2) -@{ - # string value comparison, ascending order - v1 = v1 "" - v2 = v2 "" - if (v1 < v2) - return -1 - return (v1 != v2) -@} -@end example - -Here is a -comparison function to make all numbers, and numeric strings without -any leading or trailing spaces, come out first during loop traversal: - -@example -function cmp_num_str_val(i1, v1, i2, v2, n1, n2) -@{ - # numbers before string value comparison, ascending order - n1 = v1 + 0 - n2 = v2 + 0 - if (n1 == v1) - return (n2 == v2) ? (n1 - n2) : -1 - else if (n2 == v2) - return 1 - return (v1 < v2) ? -1 : (v1 != v2) -@} -@end example - -Consider sorting the entries of a GNU/Linux system password file -according to login names. The following program which sorts records -by a specific field position can be used for this purpose: - -@example -# sort.awk --- simple program to sort by field position -# field position is specified by the global variable POS - -function cmp_field(i1, v1, i2, v2) -@{ - # comparison by value, as string, and ascending order - return v1[POS] < v2[POS] ? 
-1 : (v1[POS] != v2[POS]) -@} - -@{ - for (i = 1; i <= NF; i++) - a[NR][i] = $i -@} - -END @{ - PROCINFO["sorted_in"] = "cmp_field" - if (POS < 1 || POS > NF) - POS = 1 - for (i in a) @{ - for (j = 1; j <= NF; j++) - printf("%s%c", a[i][j], j < NF ? ":" : "") - print "" - @} -@} -@end example - -The first field in each entry of the password file is the user's login name, -and the fields are seperated by colons. Running the program produces the -following output: - -@example -$ @kbd{gawk -vPOS=1 -F: -f sort.awk /etc/passwd} -@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin -@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin -@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin -@dots{} -@end example - -The comparison normally should always return the same value when given a -specific pair of array elements as its arguments. If inconsistent -results are returned then the order is undefined. This behavior is -sometimes exploited to introduce random order in otherwise seemingly -ordered data: - -@example -function cmp_randomize(i1, v1, i2, v2) -@{ - # random order - return (2 - 4 * rand()) -@} -@end example - -As mentioned above, the order of the indices is arbitrary if two -elements compare equal. This is usually not a problem, but letting -the tied elements come out in arbitrary order can be an issue, especially -when comparing item values. The partial ordering of the equal elements -may change during the next loop traversal, if other elements are added or -removed from the array. One way to resolve ties when comparing elements -with otherwise equal values is to include the indices in the comparison -rules. Note that doing this may make the loop traversal less efficient, -so consider it only if necessary. 
The following comparison functions -force a deterministic order, and are based on the fact that the -indices of two elements are never equal: - -@example -function cmp_numeric(i1, v1, i2, v2) -@{ - # numerical value (and index) comparison, descending order - return (v1 != v2) ? (v2 - v1) : (i2 - i1) -@} - -function cmp_string(i1, v1, i2, v2) -@{ - # string value (and index) comparison, descending order - v1 = v1 i1 - v2 = v2 i2 - return (v1 > v2) ? -1 : (v1 != v2) -@} -@end example - -@c Avoid using the term ``stable'' when describing the unpredictable behavior -@c if two items compare equal. Usually, the goal of a "stable algorithm" -@c is to maintain the original order of the items, which is a meaningless -@c concept for a list constructed from a hash. - -A custom comparison function can often simplify ordered loop -traversal, and the the sky is really the limit when it comes to -designing such a function. - -When string comparisons are made during a sort, either for element -values where one or both aren't numbers, or for element indices -handled as strings, the value of @code{IGNORECASE} -(@pxref{Built-in Variables}) controls whether -the comparisons treat corresponding uppercase and lowercase letters as -equivalent or distinct. - -All sorting based on @code{PROCINFO["sorted_in"]} -is disabled in POSIX mode, -since the @code{PROCINFO} array is not special in that case. - -As a side note, sorting the array indices before traversing -the array has been reported to add 15% to 20% overhead to the -execution time of @command{awk} programs. For this reason, -sorted array traversal is not the default. - -@c The @command{gawk} -@c maintainers believe that only the people who wish to use a -@c feature should have to pay for it. +In addition, @command{gawk} provides built-in functions for +sorting arrays; see @ref{Array Sorting Functions}. 
@node Delete @section The @code{delete} Statement @@ -14107,124 +13872,6 @@ The result is to set @code{separate[1]} to @code{"1"} and @code{separate[2]} to @code{"foo"}. Presto! The original sequence of separate indices is recovered. -@node Array Sorting -@section Sorting Array Values and Indices with @command{gawk} - -@cindex arrays, sorting -@cindex @code{asort()} function (@command{gawk}) -@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting -@cindex sort function, arrays, sorting -The order in which an array is scanned with a @samp{for (i in array)} -loop is essentially arbitrary. -In most @command{awk} implementations, sorting an array requires -writing a @code{sort} function. -While this can be educational for exploring different sorting algorithms, -usually that's not the point of the program. -@command{gawk} provides the built-in @code{asort()} -and @code{asorti()} functions -(@pxref{String Functions}) -for sorting arrays. For example: - -@example -@var{populate the array} data -n = asort(data) -for (i = 1; i <= n; i++) - @var{do something with} data[i] -@end example - -After the call to @code{asort()}, the array @code{data} is indexed from 1 -to some number @var{n}, the total number of elements in @code{data}. -(This count is @code{asort()}'s return value.) -@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on. -The array elements are compared as strings. - -@cindex side effects, @code{asort()} function -An important side effect of calling @code{asort()} is that -@emph{the array's original indices are irrevocably lost}. -As this isn't always desirable, @code{asort()} accepts a -second argument: - -@example -@var{populate the array} source -n = asort(source, dest) -for (i = 1; i <= n; i++) - @var{do something with} dest[i] -@end example - -In this case, @command{gawk} copies the @code{source} array into the -@code{dest} array and then sorts @code{dest}, destroying its indices. 
-However, the @code{source} array is not affected. - -@code{asort()} and @code{asorti()} accept a third string argument -to control the comparison rule for the array elements, and the direction -of the sorted results. The valid comparison modes are @samp{string} and @samp{number}, -and the direction can be either @samp{ascending} or @samp{descending}. -Either mode or direction, or both, can be omitted in which -case the defaults, @samp{string} or @samp{ascending} is assumed -for the comparison mode and the direction, respectively. Seperate comparison -mode from direction with a single space, and they can appear in any -order. To compare the elements as numbers, and to reverse the elements -of the @code{dest} array, the call to asort in the above example can be -replaced with: - -@example -asort(source, dest, "descending number") -@end example - -The third argument to @code{asort()} can also be a user-defined -function name which is used to order the array elements before -constructing the result array. -@xref{Scanning an Array}, for more information. - - -Often, what's needed is to sort on the values of the @emph{indices} -instead of the values of the elements. -To do that, use the -@code{asorti()} function. The interface is identical to that of -@code{asort()}, except that the index values are used for sorting, and -become the values of the result array: - -@example -@{ source[$0] = some_func($0) @} - -END @{ - n = asorti(source, dest) - for (i = 1; i <= n; i++) @{ - @ii{Work with sorted indices directly:} - @var{do something with} dest[i] - @dots{} - @ii{Access original array via sorted indices:} - @var{do something with} source[dest[i]] - @} -@} -@end example - -Sorting the array by replacing the indices provides maximal flexibility. -To traverse the elements in decreasing order, use a loop that goes from -@var{n} down to 1, either over the elements or over the indices. 
This -is an alternative to specifying @samp{descending} for the sorting order -using the optional third argument. - -@cindex reference counting, sorting arrays -Copying array indices and elements isn't expensive in terms of memory. -Internally, @command{gawk} maintains @dfn{reference counts} to data. -For example, when @code{asort()} copies the first array to the second one, -there is only one copy of the original array elements' data, even though -both arrays use the values. - -@c Document It And Call It A Feature. Sigh. -@cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex @code{IGNORECASE} variable -@cindex arrays, sorting, @code{IGNORECASE} variable and -@cindex @code{IGNORECASE} variable, array sorting and -Because @code{IGNORECASE} affects string comparisons, the value -of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}. -Note also that the locale's sorting order does @emph{not} -come into play; comparisons are based on character values only.@footnote{This -is true because locale-based comparison occurs only when in POSIX -compatibility mode, and since @code{asort()} and @code{asorti()} are -@command{gawk} extensions, they are not available in that case.} -Caveat Emptor. @node Arrays of Arrays @section Arrays of Arrays @@ -14667,7 +14314,7 @@ order specification. The value of @code{IGNORECASE} affects the sorting. The third argument can also be a user-defined function name in which case the value returned by the function is used to order the array elements before constructing the result array. -@xref{Scanning an Array}, for more information. +@xref{Array Sorting Functions}, for more information. For example, if the contents of @code{a} are as follows: @@ -14701,7 +14348,7 @@ asort(a, a, "descending") @end example The @code{asort()} function is described in more detail in -@ref{Array Sorting}. +@ref{Array Sorting Functions}. 
@code{asort()} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). @@ -14713,7 +14360,7 @@ are sorted, instead of the values. (Here too, @code{IGNORECASE} affects the sorting.) The @code{asorti()} function is described in more detail in -@ref{Array Sorting}. +@ref{Array Sorting Functions}. @code{asorti()} is a @command{gawk} extension; it is not available in compatibility mode (@pxref{Options}). @@ -18474,7 +18121,9 @@ It's a bit of a ``grab bag'' of items that are otherwise unrelated to each other. First, a command-line option allows @command{gawk} to recognize nondecimal numbers in input data, not just in @command{awk} -programs. Next, two-way I/O, discussed briefly in earlier parts of this +programs. +Then, @command{gawk}'s special features for sorting arrays are presented. +Next, two-way I/O, discussed briefly in earlier parts of this @value{DOCUMENT}, is described in full detail, along with the basics of TCP/IP networking. Finally, @command{gawk} can @dfn{profile} an @command{awk} program, making it possible to tune @@ -18487,6 +18136,8 @@ its description is relegated to an appendix. @menu * Nondecimal Data:: Allowing nondecimal input data. +* Array Sorting:: Facilities for controlling array traversal and + sorting arrays. * Two-way I/O:: Two-way communications with another process. * TCP/IP Networking:: Using @command{gawk} for network programming. * Profiling:: Profiling your @command{awk} programs. @@ -18549,6 +18200,473 @@ This makes your programs easier to write and easier to read, and leads to less surprising results. @end quotation +@node Array Sorting +@section Controlling Array Traversal and Array Sorting + +@command{gawk} lets you control the order in which @samp{for (i in array)} loops +will traverse an array. + +In addition, two built-in functions, @code{asort()} and @code{asorti()}, +let you sort arrays based on the array values and indices, respectively. 
+These two functions also provide control over the sorting criteria used +to order the elements during sorting. + +@menu +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}. +@end menu + +@node Controlling Array Traversal +@subsection Controlling Array Traversal + +By default, the order in which a @samp{for (i in array)} loop +will scan an array is not defined; it is generally based upon +the internal implementation of arrays inside @command{awk}. + +Often, though, it is desirable to be able to loop over the elements +in a particular order that you, the programmer, choose. @command{gawk} +lets you do this; this @value{SUBSECTION} describes how. + +@menu +* Controlling Scanning With A Function:: Using a function to control scanning. +* Controlling Scanning:: Controlling the order in which arrays + are scanned. +@end menu + +@node Controlling Scanning With A Function +@subsubsection Controlling Array Scanning Order With a User-defined Function + +The value of @code{PROCINFO["sorted_in"]} can be a function name. +This lets you traverse an array based on any custom criterion. +The array elements are ordered according to the return value of this +function. This comparison function should be defined with at least +four arguments: + +@example +function comp_func(i1, v1, i2, v2) +@{ + @var{compare elements 1 and 2 in some fashion} + @var{return < 0; 0; or > 0} +@} +@end example + +Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2} +are the corresponding values of the two elements being compared. +Either @var{v1} or @var{v2}, or both, can be arrays if the array being +traversed contains subarrays as values. The three possible return values +are interpreted this way: + +@itemize @bullet +@item +If the return value of @code{comp_func(i1, v1, i2, v2)} is less than zero, +index @var{i1} comes before index @var{i2} during loop traversal. 
+ +@item +If @code{comp_func(i1, v1, i2, v2)} returns zero, @var{i1} and @var{i2} +come together but the relative order with respect to each other is undefined. + +@item +If the return value of @code{comp_func(i1, v1, i2, v2)} is greater than zero, +@var{i1} comes after @var{i2}. +@end itemize + +The following comparison function can be used to scan an array in +numerical order of the indices: + +@example +function cmp_num_idx(i1, v1, i2, v2) +@{ + # numerical index comparison, ascending order + return (i1 - i2) +@} +@end example + +This function traverses an array based on the string order of the element values +rather than by indices: + +@example +function cmp_str_val(i1, v1, i2, v2) +@{ + # string value comparison, ascending order + v1 = v1 "" + v2 = v2 "" + if (v1 < v2) + return -1 + return (v1 != v2) +@} +@end example + +Here is a +comparison function to make all numbers, and numeric strings without +any leading or trailing spaces, come out first during loop traversal: + +@example +function cmp_num_str_val(i1, v1, i2, v2, n1, n2) +@{ + # numbers before string value comparison, ascending order + n1 = v1 + 0 + n2 = v2 + 0 + if (n1 == v1) + return (n2 == v2) ? (n1 - n2) : -1 + else if (n2 == v2) + return 1 + return (v1 < v2) ? -1 : (v1 != v2) +@} +@end example + +@strong{FIXME}: Put in a fuller example here of some data +and show the different results when traversing. + +Consider sorting the entries of a GNU/Linux system password file +according to login names. The following program which sorts records +by a specific field position can be used for this purpose: + +@example +# sort.awk --- simple program to sort by field position +# field position is specified by the global variable POS + +function cmp_field(i1, v1, i2, v2) +@{ + # comparison by value, as string, and ascending order + return v1[POS] < v2[POS] ? 
-1 : (v1[POS] != v2[POS]) +@} + +@{ + for (i = 1; i <= NF; i++) + a[NR][i] = $i +@} + +END @{ + PROCINFO["sorted_in"] = "cmp_field" + if (POS < 1 || POS > NF) + POS = 1 + for (i in a) @{ + for (j = 1; j <= NF; j++) + printf("%s%c", a[i][j], j < NF ? ":" : "") + print "" + @} +@} +@end example + +The first field in each entry of the password file is the user's login name, +and the fields are separated by colons. +Each record defines a subarray, with each field as an element in the subarray. +Running the program produces the +following output: + +@example +$ @kbd{gawk -vPOS=1 -F: -f sort.awk /etc/passwd} +@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin +@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin +@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin +@dots{} +@end example + +The comparison normally should always return the same value when given a +specific pair of array elements as its arguments. If inconsistent +results are returned then the order is undefined. This behavior is +sometimes exploited to introduce random order in otherwise seemingly +ordered data: + +@example +function cmp_randomize(i1, v1, i2, v2) +@{ + # random order + return (2 - 4 * rand()) +@} +@end example + +As mentioned above, the order of the indices is arbitrary if two +elements compare equal. This is usually not a problem, but letting +the tied elements come out in arbitrary order can be an issue, especially +when comparing item values. The partial ordering of the equal elements +may change during the next loop traversal, if other elements are added or +removed from the array. One way to resolve ties when comparing elements +with otherwise equal values is to include the indices in the comparison +rules. Note that doing this may make the loop traversal less efficient, +so consider it only if necessary.
The following comparison functions +force a deterministic order, and are based on the fact that the +indices of two elements are never equal: + +@example +function cmp_numeric(i1, v1, i2, v2) +@{ + # numerical value (and index) comparison, descending order + return (v1 != v2) ? (v2 - v1) : (i2 - i1) +@} + +function cmp_string(i1, v1, i2, v2) +@{ + # string value (and index) comparison, descending order + v1 = v1 i1 + v2 = v2 i2 + return (v1 > v2) ? -1 : (v1 != v2) +@} +@end example + +@c Avoid using the term ``stable'' when describing the unpredictable behavior +@c if two items compare equal. Usually, the goal of a "stable algorithm" +@c is to maintain the original order of the items, which is a meaningless +@c concept for a list constructed from a hash. + +A custom comparison function can often simplify ordered loop +traversal, and the sky is really the limit when it comes to +designing such a function. + +When string comparisons are made during a sort, either for element +values where one or both aren't numbers, or for element indices +handled as strings, the value of @code{IGNORECASE} +(@pxref{Built-in Variables}) controls whether +the comparisons treat corresponding uppercase and lowercase letters as +equivalent or distinct. + +Another point to keep in mind is that in the case of subarrays +the element values can themselves be arrays; a production comparison +function should use the @code{isarray()} function +(@pxref{Type Functions}), +to check for this, and choose a defined sorting order for subarrays. + +All sorting based on @code{PROCINFO["sorted_in"]} +is disabled in POSIX mode, +since the @code{PROCINFO} array is not special in that case. + +As a side note, sorting the array indices before traversing +the array has been reported to add 15% to 20% overhead to the +execution time of @command{awk} programs. For this reason, +sorted array traversal is not the default.
+ +@c The @command{gawk} +@c maintainers believe that only the people who wish to use a +@c feature should have to pay for it. + +@node Controlling Scanning +@subsubsection Controlling Array Scanning Order + +As described in +@iftex +the previous subsubsection, +@end iftex +@ref{Controlling Scanning With A Function}, +@ifnottex +@end ifnottex +you can provide the name of a function as the value of +@code{PROCINFO["sorted_in"]} to specify custom sorting criteria. + +Often, though, you may wish to do something simple, such as +``sort based on comparing the indices in ascending order,'' +or ``sort based on comparing the values in descending order.'' +Having to write a simple comparison function for this purpose +for use in all of your programs becomes tedious. +For the most likely simple cases @command{gawk} provides +the option of supplying special names that do the requested +sorting for you. +You can think of them as ``predefined'' sorting functions, +if you like, although the names purposely include characters +that are not valid in real @command{awk} function names. + +The following special values are available: + +@table @code +@item "@@ind_str_asc" +Order by indices compared as strings; this is the most basic sort. +(Internally, array indices are always strings, so with @samp{a[2*5] = 1} +the index is actually @code{"10"} rather than numeric 10.) + +@item "@@ind_num_asc" +Order by indices but force them to be treated as numbers in the process. +Any index with non-numeric value will end up positioned as if it were zero. + +@item "@@val_type_asc" +Order by element values rather than indices. +Ordering is by the type assigned to the element +(@pxref{Typing and Comparison}). +All numeric values come before all string values, +which in turn come before all subarrays. + +@item "@@val_str_asc" +Order by element values rather than by indices. Scalar values are +compared as strings. Subarrays, if present, come out last. 
+
+@item "@@val_num_asc"
+Order by values but force scalar values to be treated as numbers
+for the purpose of comparison. If there are subarrays, those appear
+at the end of the sorted list.
+
+@item "@@ind_str_desc"
+Reverse order from the most basic sort.
+
+@item "@@ind_num_desc"
+Numeric indices ordered from high to low.
+
+@item "@@val_type_desc"
+Element values, based on type, in descending order.
+
+@item "@@val_str_desc"
+Element values, treated as strings, ordered from high to low. Subarrays, if present,
+come out first.
+
+@item "@@val_num_desc"
+Element values, treated as numbers, ordered from high to low. Subarrays, if present,
+come out first.
+
+@item "@@unsorted"
+Array elements are processed in arbitrary order, which is the normal @command{awk}
+behavior. You can also get the normal behavior by just
+deleting the @code{"sorted_in"} element from the @code{PROCINFO} array, if
+it previously had a value assigned to it.
+@end table
+
+The array traversal order is determined before the @code{for} loop
+starts to run. Changing @code{PROCINFO["sorted_in"]} in the loop body
+will not affect the loop.
+
+For example:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{    a[4] = 4}
+> @kbd{    a[3] = 3}
+> @kbd{    for (i in a)}
+> @kbd{        print i, a[i]}
+> @kbd{@}'}
+@print{} 4 4
+@print{} 3 3
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{    PROCINFO["sorted_in"] = "@@ind_str_asc"}
+> @kbd{    a[4] = 4}
+> @kbd{    a[3] = 3}
+> @kbd{    for (i in a)}
+> @kbd{        print i, a[i]}
+> @kbd{@}'}
+@print{} 3 3
+@print{} 4 4
+@end example
+
+When sorting an array by element values, if a value happens to be
+a subarray then it is considered to be greater than any string or
+numeric value, regardless of what the subarray itself contains,
+and all subarrays are treated as being equal to each other. Their
+order relative to each other is determined by their index strings.
+
+@node Array Sorting Functions
+@subsection Sorting Array Values and Indices with @command{gawk}
+
+@cindex arrays, sorting
+@cindex @code{asort()} function (@command{gawk})
+@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting
+@cindex sort function, arrays, sorting
+The order in which an array is scanned with a @samp{for (i in array)}
+loop is essentially arbitrary.
+In most @command{awk} implementations, sorting an array requires
+writing a @code{sort} function.
+While this can be educational for exploring different sorting algorithms,
+usually that's not the point of the program.
+@command{gawk} provides the built-in @code{asort()}
+and @code{asorti()} functions
+(@pxref{String Functions})
+for sorting arrays. For example:
+
+@example
+@var{populate the array} data
+n = asort(data)
+for (i = 1; i <= n; i++)
+    @var{do something with} data[i]
+@end example
+
+After the call to @code{asort()}, the array @code{data} is indexed from 1
+to some number @var{n}, the total number of elements in @code{data}.
+(This count is @code{asort()}'s return value.)
+@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
+The array elements are compared as strings.
+
+@cindex side effects, @code{asort()} function
+An important side effect of calling @code{asort()} is that
+@emph{the array's original indices are irrevocably lost}.
+As this isn't always desirable, @code{asort()} accepts a
+second argument:
+
+@example
+@var{populate the array} source
+n = asort(source, dest)
+for (i = 1; i <= n; i++)
+    @var{do something with} dest[i]
+@end example
+
+In this case, @command{gawk} copies the @code{source} array into the
+@code{dest} array and then sorts @code{dest}, destroying its indices.
+However, the @code{source} array is not affected.
+
+@code{asort()} and @code{asorti()} accept a third string argument
+to control the comparison rule for the array elements, and the direction
+of the sorted results.
+The valid comparison modes are @samp{string} and @samp{number},
+and the direction can be either @samp{ascending} or @samp{descending}.
+Either mode or direction, or both, can be omitted, in which
+case the defaults, @samp{string} and @samp{ascending}, are assumed
+for the comparison mode and the direction, respectively. Separate the comparison
+mode from the direction with a single space; they can appear in either
+order. To compare the elements as numbers and to reverse the order of the elements
+in the @code{dest} array, replace the call to @code{asort()} in the previous example
+with:
+
+@example
+asort(source, dest, "descending number")
+@end example
+
+The third argument to @code{asort()} can also be a user-defined
+function name which is used to order the array elements before
+constructing the result array.
+@xref{Scanning an Array}, for more information.
+
+
+Often, what's needed is to sort on the values of the @emph{indices}
+instead of the values of the elements.
+To do that, use the
+@code{asorti()} function. The interface is identical to that of
+@code{asort()}, except that the index values are used for sorting, and
+become the values of the result array:
+
+@example
+@{ source[$0] = some_func($0) @}
+
+END @{
+    n = asorti(source, dest)
+    for (i = 1; i <= n; i++) @{
+        @ii{Work with sorted indices directly:}
+        @var{do something with} dest[i]
+        @dots{}
+        @ii{Access original array via sorted indices:}
+        @var{do something with} source[dest[i]]
+    @}
+@}
+@end example
+
+Sorting the array by replacing the indices provides maximal flexibility.
+To traverse the elements in decreasing order, use a loop that goes from
+@var{n} down to 1, either over the elements or over the indices. This
+is an alternative to specifying @samp{descending} for the sorting order
+using the optional third argument.
+
+@cindex reference counting, sorting arrays
+Copying array indices and elements isn't expensive in terms of memory.
+Internally, @command{gawk} maintains @dfn{reference counts} to data.
+For example, when @code{asort()} copies the first array to the second one,
+there is only one copy of the original array elements' data, even though
+both arrays use the values.
+
+@c Document It And Call It A Feature. Sigh.
+@cindex @command{gawk}, @code{IGNORECASE} variable in
+@cindex @code{IGNORECASE} variable
+@cindex arrays, sorting, @code{IGNORECASE} variable and
+@cindex @code{IGNORECASE} variable, array sorting and
+Because @code{IGNORECASE} affects string comparisons, the value
+of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
+Note also that the locale's sorting order does @emph{not}
+come into play; comparisons are based on character values only.@footnote{This
+is true because locale-based comparison occurs only when in POSIX
+compatibility mode, and since @code{asort()} and @code{asorti()} are
+@command{gawk} extensions, they are not available in that case.}
+Caveat Emptor.
+
 @node Two-way I/O
 @section Two-Way Communications with Another Process
 @cindex Brennan, Michael
@@ -26252,8 +26370,8 @@ of the @value{DOCUMENT} where you can find more information.
 * SVR4::                        Minor changes between System V Releases 3.1
                                 and 4.
 * POSIX::                       New features from the POSIX standard.
-* BTL::                         New features from Brian Kernighan's
-                                version of @command{awk}.
+* BTL::                         New features from Brian Kernighan's version of
+                                @command{awk}.
 * POSIX/GNU::                   The extensions in @command{gawk} not in
                                 POSIX @command{awk}.
 * Common Extensions::           Common Extensions Summary.
@@ -26762,6 +26880,9 @@ SunOS 3.x, Sun 386 (Road Runner)
 @item
 Tandem (non-POSIX)
 
+@item
+Prestandard VAX C compiler for VAX/VMS
+
 @end itemize
 
 @end itemize
@@ -26887,6 +27008,7 @@ provided the initial port to OS/2 and its documentation.
 
 @cindex Jaegermann, Michal
 Michal Jaegermann provided the port to Atari systems and its documentation.
+(This port is no longer supported.)
He continues to provide portability checking with DEC Alpha systems, and has done a lot of work to make sure @command{gawk} works on non-32-bit systems. |