-rw-r--r-- | doc/ChangeLog | 5
-rw-r--r-- | doc/gawk.info | 4047
-rw-r--r-- | doc/gawk.texi | 3961
3 files changed, 4038 insertions, 3975 deletions
diff --git a/doc/ChangeLog b/doc/ChangeLog index 61cce0f9..4616137b 100644 --- a/doc/ChangeLog +++ b/doc/ChangeLog @@ -1,3 +1,8 @@ +2012-11-06 Arnold D. Robbins <arnold@skeeve.com> + + * gawk.texi: Rearrange chapter order and separate into parts + using @part for TeX. + 2012-11-05 Arnold D. Robbins <arnold@skeeve.com> * gawk.texi: Semi-rationalize invocations of @image. diff --git a/doc/gawk.info b/doc/gawk.info index f093d4fe..cc0c2598 100644 --- a/doc/gawk.info +++ b/doc/gawk.info @@ -87,13 +87,13 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * Arrays:: The description and use of arrays. Also includes array-oriented control statements. * Functions:: Built-in and user-defined functions. +* Library Functions:: A Library of `awk' Functions. +* Sample Programs:: Many `awk' programs with complete + explanations. * Internationalization:: Getting `gawk' to speak your language. * Advanced Features:: Stuff for advanced users, specific to `gawk'. -* Library Functions:: A Library of `awk' Functions. -* Sample Programs:: Many `awk' programs with complete - explanations. * Debugger:: The `gawk' debugger. * Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with `gawk'. @@ -385,28 +385,6 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) runtime. * Indirect Calls:: Choosing the function to call at runtime. -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU `gettext' works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging `printf' arguments. -* I18N Portability:: `awk'-level portability - issues. -* I18N Example:: A simple i18n example. -* Gawk I18N:: `gawk' is also - internationalized. -* Nondecimal Data:: Allowing nondecimal input data. -* Array Sorting:: Facilities for controlling array - traversal and sorting arrays. 
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. -* Array Sorting Functions:: How to use `asort()' and - `asorti()'. -* Two-way I/O:: Two-way communications with another - process. -* TCP/IP Networking:: Using `gawk' for network - programming. -* Profiling:: Profiling your `awk' programs. * Library Names:: How to best name private global variables in library functions. * General Functions:: Functions that are of general use. @@ -468,6 +446,28 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU `gettext' works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging `printf' arguments. +* I18N Portability:: `awk'-level portability + issues. +* I18N Example:: A simple i18n example. +* Gawk I18N:: `gawk' is also + internationalized. +* Nondecimal Data:: Allowing nondecimal input data. +* Array Sorting:: Facilities for controlling array + traversal and sorting arrays. +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use `asort()' and + `asorti()'. +* Two-way I/O:: Two-way communications with another + process. +* TCP/IP Networking:: Using `gawk' for network + programming. +* Profiling:: Profiling your `awk' programs. * Debugging:: Introduction to `gawk' debugger. * Debugging Concepts:: Debugging in General. @@ -941,6 +941,12 @@ expert should find useful. In particular, the description of POSIX `awk' and the example programs in *note Library Functions::, and in *note Sample Programs::, should be of interest. 
+ This Info file is split into several parts, as follows: + + Part I describes the `awk' language and `gawk' program in detail. +It starts with the basics, and continues through all of the features of +`awk'. It contains the following chapters: + *note Getting Started::, provides the essentials you need to know to begin using `awk'. @@ -973,6 +979,21 @@ described, as well as sorting arrays in `gawk'. It also describes how *note Functions::, describes the built-in functions `awk' and `gawk' provide, as well as how to define your own functions. + Part II shows how to use `awk' and `gawk' for problem solving. +There is lots of code here for you to read and learn from. It contains +the following chapters: + + *note Library Functions::, which provides a number of functions +meant to be used from main `awk' programs. + + *note Sample Programs::, which provides many sample `awk' programs. + + Reading these two chapters allows you to see `awk' solving real +problems. + + Part III focuses on features specific to `gawk'. It contains the +following chapters: + *note Internationalization::, describes special features in `gawk' for translating program messages into different languages at runtime. @@ -981,10 +1002,6 @@ advanced features. Of particular note are the abilities to have two-way communications with another process, perform TCP/IP networking, and profile your `awk' programs. - *note Library Functions::, and *note Sample Programs::, provide many -sample `awk' programs. Reading them allows you to see `awk' solving -real problems. - *note Debugger::, describes the `awk' debugger. *note Arbitrary Precision Arithmetic::, describes advanced @@ -993,6 +1010,10 @@ arithmetic facilities provided by `gawk'. *note Dynamic Extensions::, describes how to add new variables and functions to `gawk' by writing extensions in C. + Part IV provides the appendices, the Glossary, and two licenses that +cover the `gawk' source code and this Info file, respectively. 
It +contains the following appendices: + *note Language History::, describes how the `awk' language has evolved since its first release to present. It also describes how `gawk' has acquired features over time. @@ -10928,7 +10949,7 @@ by creating an arbitrary index: -| a -File: gawk.info, Node: Functions, Next: Internationalization, Prev: Arrays, Up: Top +File: gawk.info, Node: Functions, Next: Library Functions, Prev: Arrays, Up: Top 9 Functions *********** @@ -13374,1439 +13395,9 @@ example, in the following case: `gawk' will look up the actual function to call only once. -File: gawk.info, Node: Internationalization, Next: Advanced Features, Prev: Functions, Up: Top - -10 Internationalization with `gawk' -*********************************** - -Once upon a time, computer makers wrote software that worked only in -English. Eventually, hardware and software vendors noticed that if -their systems worked in the native languages of non-English-speaking -countries, they were able to sell more systems. As a result, -internationalization and localization of programs and software systems -became a common practice. - - For many years, the ability to provide internationalization was -largely restricted to programs written in C and C++. This major node -describes the underlying library `gawk' uses for internationalization, -as well as how `gawk' makes internationalization features available at -the `awk' program level. Having internationalization available at the -`awk' level gives software developers additional flexibility--they are -no longer forced to write in C or C++ when internationalization is a -requirement. - -* Menu: - -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU `gettext' works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* I18N Example:: A simple i18n example. -* Gawk I18N:: `gawk' is also internationalized. 
- - -File: gawk.info, Node: I18N and L10N, Next: Explaining gettext, Up: Internationalization - -10.1 Internationalization and Localization -========================================== - -"Internationalization" means writing (or modifying) a program once, in -such a way that it can use multiple languages without requiring further -source-code changes. "Localization" means providing the data necessary -for an internationalized program to work in a particular language. -Most typically, these terms refer to features such as the language used -for printing error messages, the language used to read responses, and -information related to how numerical and monetary values are printed -and read. - - -File: gawk.info, Node: Explaining gettext, Next: Programmer i18n, Prev: I18N and L10N, Up: Internationalization - -10.2 GNU `gettext' -================== - -The facilities in GNU `gettext' focus on messages; strings printed by a -program, either directly or via formatting with `printf' or -`sprintf()'.(1) - - When using GNU `gettext', each application has its own "text -domain". This is a unique name, such as `kpilot' or `gawk', that -identifies the application. A complete application may have multiple -components--programs written in C or C++, as well as scripts written in -`sh' or `awk'. All of the components use the same text domain. - - To make the discussion concrete, assume we're writing an application -named `guide'. Internationalization consists of the following steps, -in this order: - - 1. The programmer goes through the source for all of `guide''s - components and marks each string that is a candidate for - translation. For example, `"`-F': option required"' is a good - candidate for translation. A table with strings of option names - is not (e.g., `gawk''s `--profile' option should remain the same, - no matter what the local language). - - 2. 
The programmer indicates the application's text domain (`"guide"') - to the `gettext' library, by calling the `textdomain()' function. - - 3. Messages from the application are extracted from the source code - and collected into a portable object template file (`guide.pot'), - which lists the strings and their translations. The translations - are initially empty. The original (usually English) messages - serve as the key for lookup of the translations. - - 4. For each language with a translator, `guide.pot' is copied to a - portable object file (`.po') and translations are created and - shipped with the application. For example, there might be a - `fr.po' for a French translation. - - 5. Each language's `.po' file is converted into a binary message - object (`.mo') file. A message object file contains the original - messages and their translations in a binary format that allows - fast lookup of translations at runtime. - - 6. When `guide' is built and installed, the binary translation files - are installed in a standard place. - - 7. For testing and development, it is possible to tell `gettext' to - use `.mo' files in a different directory than the standard one by - using the `bindtextdomain()' function. - - 8. At runtime, `guide' looks up each string via a call to - `gettext()'. The returned string is the translated string if - available, or the original string if not. - - 9. If necessary, it is possible to access messages from a different - text domain than the one belonging to the application, without - having to switch the application's default text domain back and - forth. - - In C (or C++), the string marking and dynamic translation lookup are -accomplished by wrapping each string in a call to `gettext()': - - printf("%s", gettext("Don't Panic!\n")); - - The tools that extract messages from source code pull out all -strings enclosed in calls to `gettext()'. 
- - The GNU `gettext' developers, recognizing that typing `gettext(...)' -over and over again is both painful and ugly to look at, use the macro -`_' (an underscore) to make things easier: - - /* In the standard header file: */ - #define _(str) gettext(str) - - /* In the program text: */ - printf("%s", _("Don't Panic!\n")); - -This reduces the typing overhead to just three extra characters per -string and is considerably easier to read as well. - - There are locale "categories" for different types of locale-related -information. The defined locale categories that `gettext' knows about -are: - -`LC_MESSAGES' - Text messages. This is the default category for `gettext' - operations, but it is possible to supply a different one - explicitly, if necessary. (It is almost never necessary to supply - a different category.) - -`LC_COLLATE' - Text-collation information; i.e., how different characters and/or - groups of characters sort in a given language. - -`LC_CTYPE' - Character-type information (alphabetic, digit, upper- or - lowercase, and so on). This information is accessed via the POSIX - character classes in regular expressions, such as `/[[:alnum:]]/' - (*note Regexp Operators::). - -`LC_MONETARY' - Monetary information, such as the currency symbol, and whether the - symbol goes before or after a number. - -`LC_NUMERIC' - Numeric information, such as which characters to use for the - decimal point and the thousands separator.(2) - -`LC_RESPONSE' - Response information, such as how "yes" and "no" appear in the - local language, and possibly other information as well. - -`LC_TIME' - Time- and date-related information, such as 12- or 24-hour clock, - month printed before or after the day in a date, local month - abbreviations, and so on. - -`LC_ALL' - All of the above. (Not too useful in the context of `gettext'.) - - ---------- Footnotes ---------- - - (1) For some operating systems, the `gawk' port doesn't support GNU -`gettext'. 
Therefore, these features are not available if you are -using one of those operating systems. Sorry. - - (2) Americans use a comma every three decimal places and a period -for the decimal point, while many Europeans do exactly the opposite: -1,234.56 versus 1.234,56. - - -File: gawk.info, Node: Programmer i18n, Next: Translator i18n, Prev: Explaining gettext, Up: Internationalization - -10.3 Internationalizing `awk' Programs -====================================== - -`gawk' provides the following variables and functions for -internationalization: - -`TEXTDOMAIN' - This variable indicates the application's text domain. For - compatibility with GNU `gettext', the default value is - `"messages"'. - -`_"your message here"' - String constants marked with a leading underscore are candidates - for translation at runtime. String constants without a leading - underscore are not translated. - -`dcgettext(STRING [, DOMAIN [, CATEGORY]])' - Return the translation of STRING in text domain DOMAIN for locale - category CATEGORY. The default value for DOMAIN is the current - value of `TEXTDOMAIN'. The default value for CATEGORY is - `"LC_MESSAGES"'. - - If you supply a value for CATEGORY, it must be a string equal to - one of the known locale categories described in *note Explaining - gettext::. You must also supply a text domain. Use `TEXTDOMAIN' - if you want to use the current domain. - - CAUTION: The order of arguments to the `awk' version of the - `dcgettext()' function is purposely different from the order - for the C version. The `awk' version's order was chosen to - be simple and to allow for reasonable `awk'-style default - arguments. - -`dcngettext(STRING1, STRING2, NUMBER [, DOMAIN [, CATEGORY]])' - Return the plural form used for NUMBER of the translation of - STRING1 and STRING2 in text domain DOMAIN for locale category - CATEGORY. STRING1 is the English singular variant of a message, - and STRING2 the English plural variant of the same message. 
The - default value for DOMAIN is the current value of `TEXTDOMAIN'. - The default value for CATEGORY is `"LC_MESSAGES"'. - - The same remarks about argument order as for the `dcgettext()' - function apply. - -`bindtextdomain(DIRECTORY [, DOMAIN])' - Change the directory in which `gettext' looks for `.mo' files, in - case they will not or cannot be placed in the standard locations - (e.g., during testing). Return the directory in which DOMAIN is - "bound." - - The default DOMAIN is the value of `TEXTDOMAIN'. If DIRECTORY is - the null string (`""'), then `bindtextdomain()' returns the - current binding for the given DOMAIN. - - To use these facilities in your `awk' program, follow the steps -outlined in *note Explaining gettext::, like so: - - 1. Set the variable `TEXTDOMAIN' to the text domain of your program. - This is best done in a `BEGIN' rule (*note BEGIN/END::), or it can - also be done via the `-v' command-line option (*note Options::): - - BEGIN { - TEXTDOMAIN = "guide" - ... - } - - 2. Mark all translatable strings with a leading underscore (`_') - character. It _must_ be adjacent to the opening quote of the - string. For example: - - print _"hello, world" - x = _"you goofed" - printf(_"Number of users is %d\n", nusers) - - 3. If you are creating strings dynamically, you can still translate - them, using the `dcgettext()' built-in function: - - message = nusers " users logged in" - message = dcgettext(message, "adminprog") - print message - - Here, the call to `dcgettext()' supplies a different text domain - (`"adminprog"') in which to find the message, but it uses the - default `"LC_MESSAGES"' category. - - 4. During development, you might want to put the `.mo' file in a - private directory for testing. 
This is done with the - `bindtextdomain()' built-in function: - - BEGIN { - TEXTDOMAIN = "guide" # our text domain - if (Testing) { - # where to find our files - bindtextdomain("testdir") - # joe is in charge of adminprog - bindtextdomain("../joe/testdir", "adminprog") - } - ... - } - - - *Note I18N Example::, for an example program showing the steps to -create and use translations from `awk'. - - -File: gawk.info, Node: Translator i18n, Next: I18N Example, Prev: Programmer i18n, Up: Internationalization - -10.4 Translating `awk' Programs -=============================== - -Once a program's translatable strings have been marked, they must be -extracted to create the initial `.po' file. As part of translation, it -is often helpful to rearrange the order in which arguments to `printf' -are output. - - `gawk''s `--gen-pot' command-line option extracts the messages and -is discussed next. After that, `printf''s ability to rearrange the -order for `printf' arguments at runtime is covered. - -* Menu: - -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging `printf' arguments. -* I18N Portability:: `awk'-level portability issues. - - -File: gawk.info, Node: String Extraction, Next: Printf Ordering, Up: Translator i18n - -10.4.1 Extracting Marked Strings --------------------------------- - -Once your `awk' program is working, and all the strings have been -marked and you've set (and perhaps bound) the text domain, it is time -to produce translations. First, use the `--gen-pot' command-line -option to create the initial `.pot' file: - - $ gawk --gen-pot -f guide.awk > guide.pot - - When run with `--gen-pot', `gawk' does not execute your program. -Instead, it parses it as usual and prints all marked strings to -standard output in the format of a GNU `gettext' Portable Object file. 
-Also included in the output are any constant strings that appear as the -first argument to `dcgettext()' or as the first and second argument to -`dcngettext()'.(1) *Note I18N Example::, for the full list of steps to -go through to create and test translations for `guide'. - - ---------- Footnotes ---------- - - (1) The `xgettext' utility that comes with GNU `gettext' can handle -`.awk' files. - - -File: gawk.info, Node: Printf Ordering, Next: I18N Portability, Prev: String Extraction, Up: Translator i18n - -10.4.2 Rearranging `printf' Arguments -------------------------------------- - -Format strings for `printf' and `sprintf()' (*note Printf::) present a -special problem for translation. Consider the following:(1) - - printf(_"String `%s' has %d characters\n", - string, length(string))) - - A possible German translation for this might be: - - "%d Zeichen lang ist die Zeichenkette `%s'\n" - - The problem should be obvious: the order of the format -specifications is different from the original! Even though `gettext()' -can return the translated string at runtime, it cannot change the -argument order in the call to `printf'. - - To solve this problem, `printf' format specifiers may have an -additional optional element, which we call a "positional specifier". -For example: - - "%2$d Zeichen lang ist die Zeichenkette `%1$s'\n" - - Here, the positional specifier consists of an integer count, which -indicates which argument to use, and a `$'. Counts are one-based, and -the format string itself is _not_ included. Thus, in the following -example, `string' is the first argument and `length(string)' is the -second: - - $ gawk 'BEGIN { - > string = "Dont Panic" - > printf _"%2$d characters live in \"%1$s\"\n", - > string, length(string) - > }' - -| 10 characters live in "Dont Panic" - - If present, positional specifiers come first in the format -specification, before the flags, the field width, and/or the precision. 
- - Positional specifiers can be used with the dynamic field width and -precision capability: - - $ gawk 'BEGIN { - > printf("%*.*s\n", 10, 20, "hello") - > printf("%3$*2$.*1$s\n", 20, 10, "hello") - > }' - -| hello - -| hello - - NOTE: When using `*' with a positional specifier, the `*' comes - first, then the integer position, and then the `$'. This is - somewhat counterintuitive. - - `gawk' does not allow you to mix regular format specifiers and those -with positional specifiers in the same string: - - $ gawk 'BEGIN { printf _"%d %3$s\n", 1, 2, "hi" }' - error--> gawk: cmd. line:1: fatal: must use `count$' on all formats or none - - NOTE: There are some pathological cases that `gawk' may fail to - diagnose. In such cases, the output may not be what you expect. - It's still a bad idea to try mixing them, even if `gawk' doesn't - detect it. - - Although positional specifiers can be used directly in `awk' -programs, their primary purpose is to help in producing correct -translations of format strings into languages different from the one in -which the program is first written. - - ---------- Footnotes ---------- - - (1) This example is borrowed from the GNU `gettext' manual. - - -File: gawk.info, Node: I18N Portability, Prev: Printf Ordering, Up: Translator i18n - -10.4.3 `awk' Portability Issues -------------------------------- - -`gawk''s internationalization features were purposely chosen to have as -little impact as possible on the portability of `awk' programs that use -them to other versions of `awk'. Consider this program: - - BEGIN { - TEXTDOMAIN = "guide" - if (Test_Guide) # set with -v - bindtextdomain("/test/guide/messages") - print _"don't panic!" - } - -As written, it won't work on other versions of `awk'. However, it is -actually almost portable, requiring very little change: - - * Assignments to `TEXTDOMAIN' won't have any effect, since - `TEXTDOMAIN' is not special in other `awk' implementations. 
- - * Non-GNU versions of `awk' treat marked strings as the - concatenation of a variable named `_' with the string following - it.(1) Typically, the variable `_' has the null string (`""') as - its value, leaving the original string constant as the result. - - * By defining "dummy" functions to replace `dcgettext()', - `dcngettext()' and `bindtextdomain()', the `awk' program can be - made to run, but all the messages are output in the original - language. For example: - - function bindtextdomain(dir, domain) - { - return dir - } - - function dcgettext(string, domain, category) - { - return string - } - - function dcngettext(string1, string2, number, domain, category) - { - return (number == 1 ? string1 : string2) - } - - * The use of positional specifications in `printf' or `sprintf()' is - _not_ portable. To support `gettext()' at the C level, many - systems' C versions of `sprintf()' do support positional - specifiers. But it works only if enough arguments are supplied in - the function call. Many versions of `awk' pass `printf' formats - and arguments unchanged to the underlying C library version of - `sprintf()', but only one format and argument at a time. What - happens if a positional specification is used is anybody's guess. - However, since the positional specifications are primarily for use - in _translated_ format strings, and since non-GNU `awk's never - retrieve the translated string, this should not be a problem in - practice. - - ---------- Footnotes ---------- - - (1) This is good fodder for an "Obfuscated `awk'" contest. 
- - -File: gawk.info, Node: I18N Example, Next: Gawk I18N, Prev: Translator i18n, Up: Internationalization - -10.5 A Simple Internationalization Example -========================================== - -Now let's look at a step-by-step example of how to internationalize and -localize a simple `awk' program, using `guide.awk' as our original -source: - - BEGIN { - TEXTDOMAIN = "guide" - bindtextdomain(".") # for testing - print _"Don't Panic" - print _"The Answer Is", 42 - print "Pardon me, Zaphod who?" - } - -Run `gawk --gen-pot' to create the `.pot' file: - - $ gawk --gen-pot -f guide.awk > guide.pot - -This produces: - - #: guide.awk:4 - msgid "Don't Panic" - msgstr "" - - #: guide.awk:5 - msgid "The Answer Is" - msgstr "" - - This original portable object template file is saved and reused for -each language into which the application is translated. The `msgid' is -the original string and the `msgstr' is the translation. - - NOTE: Strings not marked with a leading underscore do not appear - in the `guide.pot' file. - - Next, the messages must be translated. Here is a translation to a -hypothetical dialect of English, called "Mellow":(1) - - $ cp guide.pot guide-mellow.po - ADD TRANSLATIONS TO guide-mellow.po ... - -Following are the translations: - - #: guide.awk:4 - msgid "Don't Panic" - msgstr "Hey man, relax!" - - #: guide.awk:5 - msgid "The Answer Is" - msgstr "Like, the scoop is" - - The next step is to make the directory to hold the binary message -object file and then to create the `guide.mo' file. The directory -layout shown here is standard for GNU `gettext' on GNU/Linux systems. -Other versions of `gettext' may use a different layout: - - $ mkdir en_US en_US/LC_MESSAGES - - The `msgfmt' utility does the conversion from human-readable `.po' -file to machine-readable `.mo' file. By default, `msgfmt' creates a -file named `messages'. 
This file must be renamed and placed in the -proper directory so that `gawk' can find it: - - $ msgfmt guide-mellow.po - $ mv messages en_US/LC_MESSAGES/guide.mo - - Finally, we run the program to test it: - - $ gawk -f guide.awk - -| Hey man, relax! - -| Like, the scoop is 42 - -| Pardon me, Zaphod who? - - If the three replacement functions for `dcgettext()', `dcngettext()' -and `bindtextdomain()' (*note I18N Portability::) are in a file named -`libintl.awk', then we can run `guide.awk' unchanged as follows: - - $ gawk --posix -f guide.awk -f libintl.awk - -| Don't Panic - -| The Answer Is 42 - -| Pardon me, Zaphod who? - - ---------- Footnotes ---------- - - (1) Perhaps it would be better if it were called "Hippy." Ah, well. - - -File: gawk.info, Node: Gawk I18N, Prev: I18N Example, Up: Internationalization - -10.6 `gawk' Can Speak Your Language -=================================== - -`gawk' itself has been internationalized using the GNU `gettext' -package. (GNU `gettext' is described in complete detail in *note (GNU -`gettext' utilities)Top:: gettext, GNU gettext tools.) As of this -writing, the latest version of GNU `gettext' is version 0.18.1 -(ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz). - - If a translation of `gawk''s messages exists, then `gawk' produces -usage messages, warnings, and fatal errors in the local language. - - -File: gawk.info, Node: Advanced Features, Next: Library Functions, Prev: Internationalization, Up: Top - -11 Advanced Features of `gawk' -****************************** - - Write documentation as if whoever reads it is a violent psychopath - who knows where you live. - Steve English, as quoted by Peter Langston - - This major node discusses advanced features in `gawk'. It's a bit -of a "grab bag" of items that are otherwise unrelated to each other. -First, a command-line option allows `gawk' to recognize nondecimal -numbers in input data, not just in `awk' programs. 
Then, `gawk''s -special features for sorting arrays are presented. Next, two-way I/O, -discussed briefly in earlier parts of this Info file, is described in -full detail, along with the basics of TCP/IP networking. Finally, -`gawk' can "profile" an `awk' program, making it possible to tune it -for performance. - - *note Dynamic Extensions::, discusses the ability to dynamically add -new built-in functions to `gawk'. As this feature is still immature -and likely to change, its description is relegated to an appendix. - -* Menu: - -* Nondecimal Data:: Allowing nondecimal input data. -* Array Sorting:: Facilities for controlling array traversal and - sorting arrays. -* Two-way I/O:: Two-way communications with another process. -* TCP/IP Networking:: Using `gawk' for network programming. -* Profiling:: Profiling your `awk' programs. - - -File: gawk.info, Node: Nondecimal Data, Next: Array Sorting, Up: Advanced Features - -11.1 Allowing Nondecimal Input Data -=================================== - -If you run `gawk' with the `--non-decimal-data' option, you can have -nondecimal constants in your input data: - - $ echo 0123 123 0x123 | - > gawk --non-decimal-data '{ printf "%d, %d, %d\n", - > $1, $2, $3 }' - -| 83, 123, 291 - - For this feature to work, write your program so that `gawk' treats -your data as numeric: - - $ echo 0123 123 0x123 | gawk '{ print $1, $2, $3 }' - -| 0123 123 0x123 - -The `print' statement treats its expressions as strings. Although the -fields can act as numbers when necessary, they are still strings, so -`print' does not try to treat them numerically. You may need to add -zero to a field to force it to be treated as a number. 
For example: - - $ echo 0123 123 0x123 | gawk --non-decimal-data ' - > { print $1, $2, $3 - > print $1 + 0, $2 + 0, $3 + 0 }' - -| 0123 123 0x123 - -| 83 123 291 - - Because it is common to have decimal data with leading zeros, and -because using this facility could lead to surprising results, the -default is to leave it disabled. If you want it, you must explicitly -request it. - - CAUTION: _Use of this option is not recommended._ It can break old - programs very badly. Instead, use the `strtonum()' function to - convert your data (*note Nondecimal-numbers::). This makes your - programs easier to write and easier to read, and leads to less - surprising results. - - -File: gawk.info, Node: Array Sorting, Next: Two-way I/O, Prev: Nondecimal Data, Up: Advanced Features - -11.2 Controlling Array Traversal and Array Sorting -================================================== - -`gawk' lets you control the order in which a `for (i in array)' loop -traverses an array. - - In addition, two built-in functions, `asort()' and `asorti()', let -you sort arrays based on the array values and indices, respectively. -These two functions also provide control over the sorting criteria used -to order the elements during sorting. - -* Menu: - -* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. -* Array Sorting Functions:: How to use `asort()' and `asorti()'. - - -File: gawk.info, Node: Controlling Array Traversal, Next: Array Sorting Functions, Up: Array Sorting - -11.2.1 Controlling Array Traversal ----------------------------------- - -By default, the order in which a `for (i in array)' loop scans an array -is not defined; it is generally based upon the internal implementation -of arrays inside `awk'. - - Often, though, it is desirable to be able to loop over the elements -in a particular order that you, the programmer, choose. `gawk' lets -you do this. 
- - *note Controlling Scanning::, describes how you can assign special, -pre-defined values to `PROCINFO["sorted_in"]' in order to control the -order in which `gawk' will traverse an array during a `for' loop. - - In addition, the value of `PROCINFO["sorted_in"]' can be a function -name. This lets you traverse an array based on any custom criterion. -The array elements are ordered according to the return value of this -function. The comparison function should be defined with at least four -arguments: - - function comp_func(i1, v1, i2, v2) - { - COMPARE ELEMENTS 1 AND 2 IN SOME FASHION - RETURN < 0; 0; OR > 0 - } - - Here, I1 and I2 are the indices, and V1 and V2 are the corresponding -values of the two elements being compared. Either V1 or V2, or both, -can be arrays if the array being traversed contains subarrays as values. -(*Note Arrays of Arrays::, for more information about subarrays.) The -three possible return values are interpreted as follows: - -`comp_func(i1, v1, i2, v2) < 0' - Index I1 comes before index I2 during loop traversal. - -`comp_func(i1, v1, i2, v2) == 0' - Indices I1 and I2 come together but the relative order with - respect to each other is undefined. - -`comp_func(i1, v1, i2, v2) > 0' - Index I1 comes after index I2 during loop traversal. 
- - Our first comparison function can be used to scan an array in -numerical order of the indices: - - function cmp_num_idx(i1, v1, i2, v2) - { - # numerical index comparison, ascending order - return (i1 - i2) - } - - Our second function traverses an array based on the string order of -the element values rather than by indices: - - function cmp_str_val(i1, v1, i2, v2) - { - # string value comparison, ascending order - v1 = v1 "" - v2 = v2 "" - if (v1 < v2) - return -1 - return (v1 != v2) - } - - The third comparison function makes all numbers, and numeric strings -without any leading or trailing spaces, come out first during loop -traversal: - - function cmp_num_str_val(i1, v1, i2, v2, n1, n2) - { - # numbers before string value comparison, ascending order - n1 = v1 + 0 - n2 = v2 + 0 - if (n1 == v1) - return (n2 == v2) ? (n1 - n2) : -1 - else if (n2 == v2) - return 1 - return (v1 < v2) ? -1 : (v1 != v2) - } - - Here is a main program to demonstrate how `gawk' behaves using each -of the previous functions: - - BEGIN { - data["one"] = 10 - data["two"] = 20 - data[10] = "one" - data[100] = 100 - data[20] = "two" - - f[1] = "cmp_num_idx" - f[2] = "cmp_str_val" - f[3] = "cmp_num_str_val" - for (i = 1; i <= 3; i++) { - printf("Sort function: %s\n", f[i]) - PROCINFO["sorted_in"] = f[i] - for (j in data) - printf("\tdata[%s] = %s\n", j, data[j]) - print "" - } - } - - Here are the results when the program is run: - - $ gawk -f compdemo.awk - -| Sort function: cmp_num_idx Sort by numeric index - -| data[two] = 20 - -| data[one] = 10 Both strings are numerically zero - -| data[10] = one - -| data[20] = two - -| data[100] = 100 - -| - -| Sort function: cmp_str_val Sort by element values as strings - -| data[one] = 10 - -| data[100] = 100 String 100 is less than string 20 - -| data[two] = 20 - -| data[10] = one - -| data[20] = two - -| - -| Sort function: cmp_num_str_val Sort all numeric values before all strings - -| data[one] = 10 - -| data[two] = 20 - -| data[100] = 100 - 
-| data[10] = one - -| data[20] = two - - Consider sorting the entries of a GNU/Linux system password file -according to login name. The following program sorts records by a -specific field position and can be used for this purpose: - - # sort.awk --- simple program to sort by field position - # field position is specified by the global variable POS - - function cmp_field(i1, v1, i2, v2) - { - # comparison by value, as string, and ascending order - return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS]) - } - - { - for (i = 1; i <= NF; i++) - a[NR][i] = $i - } - - END { - PROCINFO["sorted_in"] = "cmp_field" - if (POS < 1 || POS > NF) - POS = 1 - for (i in a) { - for (j = 1; j <= NF; j++) - printf("%s%c", a[i][j], j < NF ? ":" : "") - print "" - } - } - - The first field in each entry of the password file is the user's -login name, and the fields are separated by colons. Each record -defines a subarray, with each field as an element in the subarray. -Running the program produces the following output: - - $ gawk -v POS=1 -F: -f sort.awk /etc/passwd - -| adm:x:3:4:adm:/var/adm:/sbin/nologin - -| apache:x:48:48:Apache:/var/www:/sbin/nologin - -| avahi:x:70:70:Avahi daemon:/:/sbin/nologin - ... - - The comparison should normally always return the same value when -given a specific pair of array elements as its arguments. If -inconsistent results are returned then the order is undefined. This -behavior can be exploited to introduce random order into otherwise -seemingly ordered data: - - function cmp_randomize(i1, v1, i2, v2) - { - # random order - return (2 - 4 * rand()) - } - - As mentioned above, the order of the indices is arbitrary if two -elements compare equal. This is usually not a problem, but letting the -tied elements come out in arbitrary order can be an issue, especially -when comparing item values. The partial ordering of the equal elements -may change during the next loop traversal, if other elements are added -or removed from the array. 
One way to resolve ties when comparing -elements with otherwise equal values is to include the indices in the -comparison rules. Note that doing this may make the loop traversal -less efficient, so consider it only if necessary. The following -comparison functions force a deterministic order, and are based on the -fact that the indices of two elements are never equal: - - function cmp_numeric(i1, v1, i2, v2) - { - # numerical value (and index) comparison, descending order - return (v1 != v2) ? (v2 - v1) : (i2 - i1) - } - - function cmp_string(i1, v1, i2, v2) - { - # string value (and index) comparison, descending order - v1 = v1 i1 - v2 = v2 i2 - return (v1 > v2) ? -1 : (v1 != v2) - } - - A custom comparison function can often simplify ordered loop -traversal, and the sky is really the limit when it comes to designing -such a function. - - When string comparisons are made during a sort, either for element -values where one or both aren't numbers, or for element indices handled -as strings, the value of `IGNORECASE' (*note Built-in Variables::) -controls whether the comparisons treat corresponding uppercase and -lowercase letters as equivalent or distinct. - - Another point to keep in mind is that in the case of subarrays the -element values can themselves be arrays; a production comparison -function should use the `isarray()' function (*note Type Functions::), -to check for this, and choose a defined sorting order for subarrays. - - All sorting based on `PROCINFO["sorted_in"]' is disabled in POSIX -mode, since the `PROCINFO' array is not special in that case. - - As a side note, sorting the array indices before traversing the -array has been reported to add 15% to 20% overhead to the execution -time of `awk' programs. For this reason, sorted array traversal is not -the default. 
- - -File: gawk.info, Node: Array Sorting Functions, Prev: Controlling Array Traversal, Up: Array Sorting - -11.2.2 Sorting Array Values and Indices with `gawk' ---------------------------------------------------- - -In most `awk' implementations, sorting an array requires writing a -`sort()' function. While this can be educational for exploring -different sorting algorithms, usually that's not the point of the -program. `gawk' provides the built-in `asort()' and `asorti()' -functions (*note String Functions::) for sorting arrays. For example: - - POPULATE THE ARRAY data - n = asort(data) - for (i = 1; i <= n; i++) - DO SOMETHING WITH data[i] - - After the call to `asort()', the array `data' is indexed from 1 to -some number N, the total number of elements in `data'. (This count is -`asort()''s return value.) `data[1]' <= `data[2]' <= `data[3]', and so -on. The comparison is based on the type of the elements (*note Typing -and Comparison::). All numeric values come before all string values, -which in turn come before all subarrays. - - An important side effect of calling `asort()' is that _the array's -original indices are irrevocably lost_. As this isn't always -desirable, `asort()' accepts a second argument: - - POPULATE THE ARRAY source - n = asort(source, dest) - for (i = 1; i <= n; i++) - DO SOMETHING WITH dest[i] - - In this case, `gawk' copies the `source' array into the `dest' array -and then sorts `dest', destroying its indices. However, the `source' -array is not affected. - - `asort()' accepts a third string argument to control comparison of -array elements. As with `PROCINFO["sorted_in"]', this argument may be -one of the predefined names that `gawk' provides (*note Controlling -Scanning::), or the name of a user-defined function (*note Controlling -Array Traversal::). - - NOTE: In all cases, the sorted element values consist of the - original array's element values. The ability to control - comparison merely affects the way in which they are sorted. 
- - Often, what's needed is to sort on the values of the _indices_ -instead of the values of the elements. To do that, use the `asorti()' -function. The interface is identical to that of `asort()', except that -the index values are used for sorting, and become the values of the -result array: - - { source[$0] = some_func($0) } - - END { - n = asorti(source, dest) - for (i = 1; i <= n; i++) { - Work with sorted indices directly: - DO SOMETHING WITH dest[i] - ... - Access original array via sorted indices: - DO SOMETHING WITH source[dest[i]] - } - } - - Similar to `asort()', in all cases, the sorted element values -consist of the original array's indices. The ability to control -comparison merely affects the way in which they are sorted. - - Sorting the array by replacing the indices provides maximal -flexibility. To traverse the elements in decreasing order, use a loop -that goes from N down to 1, either over the elements or over the -indices.(1) - - Copying array indices and elements isn't expensive in terms of -memory. Internally, `gawk' maintains "reference counts" to data. For -example, when `asort()' copies the first array to the second one, there -is only one copy of the original array elements' data, even though both -arrays use the values. - - Because `IGNORECASE' affects string comparisons, the value of -`IGNORECASE' also affects sorting for both `asort()' and `asorti()'. -Note also that the locale's sorting order does _not_ come into play; -comparisons are based on character values only.(2) Caveat Emptor. - - ---------- Footnotes ---------- - - (1) You may also use one of the predefined sorting names that sorts -in decreasing order. - - (2) This is true because locale-based comparison occurs only when in -POSIX compatibility mode, and since `asort()' and `asorti()' are `gawk' -extensions, they are not available in that case. 
- - -File: gawk.info, Node: Two-way I/O, Next: TCP/IP Networking, Prev: Array Sorting, Up: Advanced Features - -11.3 Two-Way Communications with Another Process -================================================ - - From: brennan@whidbey.com (Mike Brennan) - Newsgroups: comp.lang.awk - Subject: Re: Learn the SECRET to Attract Women Easily - Date: 4 Aug 1997 17:34:46 GMT - Message-ID: <5s53rm$eca@news.whidbey.com> - - On 3 Aug 1997 13:17:43 GMT, Want More Dates??? - <tracy78@kilgrona.com> wrote: - >Learn the SECRET to Attract Women Easily - > - >The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women - - The scent of awk programmers is a lot more attractive to women than - the scent of perl programmers. - -- - Mike Brennan - - It is often useful to be able to send data to a separate program for -processing and then read the result. This can always be done with -temporary files: - - # Write the data for processing - tempfile = ("mydata." PROCINFO["pid"]) - while (NOT DONE WITH DATA) - print DATA | ("subprogram > " tempfile) - close("subprogram > " tempfile) - - # Read the results, remove tempfile when done - while ((getline newdata < tempfile) > 0) - PROCESS newdata APPROPRIATELY - close(tempfile) - system("rm " tempfile) - -This works, but not elegantly. Among other things, it requires that -the program be run in a directory that cannot be shared among users; -for example, `/tmp' will not do, as another user might happen to be -using a temporary file with the same name. - - However, with `gawk', it is possible to open a _two-way_ pipe to -another process. The second process is termed a "coprocess", since it -runs in parallel with `gawk'. 
The two-way connection is created using -the `|&' operator (borrowed from the Korn shell, `ksh'):(1) - - do { - print DATA |& "subprogram" - "subprogram" |& getline results - } while (DATA LEFT TO PROCESS) - close("subprogram") - - The first time an I/O operation is executed using the `|&' operator, -`gawk' creates a two-way pipeline to a child process that runs the -other program. Output created with `print' or `printf' is written to -the program's standard input, and output from the program's standard -output can be read by the `gawk' program using `getline'. As is the -case with processes started by `|', the subprogram can be any program, -or pipeline of programs, that can be started by the shell. - - There are some cautionary items to be aware of: - - * As the code inside `gawk' currently stands, the coprocess's - standard error goes to the same place that the parent `gawk''s - standard error goes. It is not possible to read the child's - standard error separately. - - * I/O buffering may be a problem. `gawk' automatically flushes all - output down the pipe to the coprocess. However, if the coprocess - does not flush its output, `gawk' may hang when doing a `getline' - in order to read the coprocess's results. This could lead to a - situation known as "deadlock", where each process is waiting for - the other one to do something. - - It is possible to close just one end of the two-way pipe to a -coprocess, by supplying a second argument to the `close()' function of -either `"to"' or `"from"' (*note Close Files And Pipes::). These -strings tell `gawk' to close the end of the pipe that sends data to the -coprocess or the end that reads from it, respectively. - - This is particularly necessary in order to use the system `sort' -utility as part of a coprocess; `sort' must read _all_ of its input -data before it can produce any output. The `sort' program does not -receive an end-of-file indication until `gawk' closes the write end of -the pipe. 
- - When you have finished writing data to the `sort' utility, you can -close the `"to"' end of the pipe, and then start reading sorted data -via `getline'. For example: - - BEGIN { - command = "LC_ALL=C sort" - n = split("abcdefghijklmnopqrstuvwxyz", a, "") - - for (i = n; i > 0; i--) - print a[i] |& command - close(command, "to") - - while ((command |& getline line) > 0) - print "got", line - close(command) - } - - This program writes the letters of the alphabet in reverse order, one -per line, down the two-way pipe to `sort'. It then closes the write -end of the pipe, so that `sort' receives an end-of-file indication. -This causes `sort' to sort the data and write the sorted data back to -the `gawk' program. Once all of the data has been read, `gawk' -terminates the coprocess and exits. - - As a side note, the assignment `LC_ALL=C' in the `sort' command -ensures traditional Unix (ASCII) sorting from `sort'. - - You may also use pseudo-ttys (ptys) for two-way communication -instead of pipes, if your system supports them. This is done on a -per-command basis, by setting a special element in the `PROCINFO' array -(*note Auto-set::), like so: - - command = "sort -nr" # command, save in convenience variable - PROCINFO[command, "pty"] = 1 # update PROCINFO - print ... |& command # start two-way pipe - ... - -Using ptys avoids the buffer deadlock issues described earlier, at some -loss in performance. If your system does not have ptys, or if all the -system's ptys are in use, `gawk' automatically falls back to using -regular pipes. - - ---------- Footnotes ---------- - - (1) This is very different from the same operator in the C shell. 
- - -File: gawk.info, Node: TCP/IP Networking, Next: Profiling, Prev: Two-way I/O, Up: Advanced Features - -11.4 Using `gawk' for Network Programming -========================================= - - `EMISTERED': - A host is a host from coast to coast, - and no-one can talk to host that's close, - unless the host that isn't close - is busy hung or dead. - - In addition to being able to open a two-way pipeline to a coprocess -on the same system (*note Two-way I/O::), it is possible to make a -two-way connection to another process on another system across an IP -network connection. - - You can think of this as just a _very long_ two-way pipeline to a -coprocess. The way `gawk' decides that you want to use TCP/IP -networking is by recognizing special file names that begin with one of -`/inet/', `/inet4/' or `/inet6'. - - The full syntax of the special file name is -`/NET-TYPE/PROTOCOL/LOCAL-PORT/REMOTE-HOST/REMOTE-PORT'. The -components are: - -NET-TYPE - Specifies the kind of Internet connection to make. Use `/inet4/' - to force IPv4, and `/inet6/' to force IPv6. Plain `/inet/' (which - used to be the only option) uses the system default, most likely - IPv4. - -PROTOCOL - The protocol to use over IP. This must be either `tcp', or `udp', - for a TCP or UDP IP connection, respectively. The use of TCP is - recommended for most applications. - -LOCAL-PORT - The local TCP or UDP port number to use. Use a port number of `0' - when you want the system to pick a port. This is what you should do - when writing a TCP or UDP client. You may also use a well-known - service name, such as `smtp' or `http', in which case `gawk' - attempts to determine the predefined port number using the C - `getaddrinfo()' function. - -REMOTE-HOST - The IP address or fully-qualified domain name of the Internet host - to which you want to connect. - -REMOTE-PORT - The TCP or UDP port number to use on the given REMOTE-HOST. - Again, use `0' if you don't care, or else a well-known service - name. 
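To make the five components concrete, here is a minimal sketch that assembles a special file name from its parts and rejects a net-type or protocol other than those listed above. The helper name `inet_name' and its argument order are assumptions for illustration; it only builds the string and does not open a connection.

```shell
# inet_name: hypothetical helper, not part of gawk itself.
# Usage: inet_name NET-TYPE PROTOCOL LOCAL-PORT REMOTE-HOST REMOTE-PORT
inet_name() {
    gawk -v type="$1" -v proto="$2" -v lport="$3" -v host="$4" -v rport="$5" '
    BEGIN {
        # NET-TYPE must be inet, inet4 or inet6; PROTOCOL must be tcp or udp.
        if (type !~ /^inet[46]?$/ || (proto != "tcp" && proto != "udp")) {
            print "inet_name: bad net-type or protocol" > "/dev/stderr"
            exit 1
        }
        printf "/%s/%s/%s/%s/%s\n", type, proto, lport, host, rport
    }'
}
```

For instance, `inet_name inet4 tcp 0 localhost daytime' prints `/inet4/tcp/0/localhost/daytime'. A string of this form is what a `gawk' program would place on either side of the `|&' operator.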
- - NOTE: Failure in opening a two-way socket will result in a - non-fatal error being returned to the calling code. The value of - `ERRNO' indicates the error (*note Auto-set::). - - Consider the following very simple example: - - BEGIN { - Service = "/inet/tcp/0/localhost/daytime" - Service |& getline - print $0 - close(Service) - } - - This program reads the current date and time from the local system's -TCP `daytime' server. It then prints the results and closes the -connection. - - Because this topic is extensive, the use of `gawk' for TCP/IP -programming is documented separately. See *note (General -Introduction)Top:: gawkinet, TCP/IP Internetworking with `gawk', for a -much more complete introduction and discussion, as well as extensive -examples. - - -File: gawk.info, Node: Profiling, Prev: TCP/IP Networking, Up: Advanced Features - -11.5 Profiling Your `awk' Programs -================================== - -You may produce execution traces of your `awk' programs. This is done -by passing the option `--profile' to `gawk'. When `gawk' has finished -running, it creates a profile of your program in a file named -`awkprof.out'. Because it is profiling, it also executes up to 45% -slower than `gawk' normally does. - - As shown in the following example, the `--profile' option can be -used to change the name of the file where `gawk' will write the profile: - - gawk --profile=myprog.prof -f myprog.awk data1 data2 - -In the above example, `gawk' places the profile in `myprog.prof' -instead of in `awkprof.out'. - - Here is a sample session showing a simple `awk' program, its input -data, and the results from running `gawk' with the `--profile' option. 
-First, the `awk' program: - - BEGIN { print "First BEGIN rule" } - - END { print "First END rule" } - - /foo/ { - print "matched /foo/, gosh" - for (i = 1; i <= 3; i++) - sing() - } - - { - if (/foo/) - print "if is true" - else - print "else is true" - } - - BEGIN { print "Second BEGIN rule" } - - END { print "Second END rule" } - - function sing( dummy) - { - print "I gotta be me!" - } - - Following is the input data: - - foo - bar - baz - foo - junk - - Here is the `awkprof.out' that results from running the `gawk' -profiler on this program and data (this example also illustrates that -`awk' programmers sometimes have to work late): - - # gawk profile, created Sun Aug 13 00:00:15 2000 - - # BEGIN block(s) - - BEGIN { - 1 print "First BEGIN rule" - 1 print "Second BEGIN rule" - } - - # Rule(s) - - 5 /foo/ { # 2 - 2 print "matched /foo/, gosh" - 6 for (i = 1; i <= 3; i++) { - 6 sing() - } - } - - 5 { - 5 if (/foo/) { # 2 - 2 print "if is true" - 3 } else { - 3 print "else is true" - } - } - - # END block(s) - - END { - 1 print "First END rule" - 1 print "Second END rule" - } - - # Functions, listed alphabetically - - 6 function sing(dummy) - { - 6 print "I gotta be me!" - } - - This example illustrates many of the basic features of profiling -output. They are as follows: - - * The program is printed in the order `BEGIN' rule, `BEGINFILE' rule, - pattern/action rules, `ENDFILE' rule, `END' rule and functions, - listed alphabetically. Multiple `BEGIN' and `END' rules are - merged together, as are multiple `BEGINFILE' and `ENDFILE' rules. - - * Pattern-action rules have two counts. The first count, to the - left of the rule, shows how many times the rule's pattern was - _tested_. The second count, to the right of the rule's opening - left brace in a comment, shows how many times the rule's action - was _executed_. The difference between the two indicates how many - times the rule's pattern evaluated to false. 
- - * Similarly, the count for an `if'-`else' statement shows how many - times the condition was tested. To the right of the opening left - brace for the `if''s body is a count showing how many times the - condition was true. The count for the `else' indicates how many - times the test failed. - - * The count for a loop header (such as `for' or `while') shows how - many times the loop test was executed. (Because of this, you - can't just look at the count on the first statement in a rule to - determine how many times the rule was executed. If the first - statement is a loop, the count is misleading.) - - * For user-defined functions, the count next to the `function' - keyword indicates how many times the function was called. The - counts next to the statements in the body show how many times - those statements were executed. - - * The layout uses "K&R" style with TABs. Braces are used - everywhere, even when the body of an `if', `else', or loop is only - a single statement. - - * Parentheses are used only where needed, as indicated by the - structure of the program and the precedence rules. For example, - `(3 + 5) * 4' means add three plus five, then multiply the total - by four. However, `3 + 5 * 4' has no parentheses, and means `3 + - (5 * 4)'. - - * Parentheses are used around the arguments to `print' and `printf' - only when the `print' or `printf' statement is followed by a - redirection. Similarly, if the target of a redirection isn't a - scalar, it gets parenthesized. - - * `gawk' supplies leading comments in front of the `BEGIN' and `END' - rules, the pattern/action rules, and the functions. - - - The profiled version of your program may not look exactly like what -you typed when you wrote it. This is because `gawk' creates the -profiled version by "pretty printing" its internal representation of -the program. The advantage to this is that `gawk' can produce a -standard representation. 
The disadvantage is that all source-code -comments are lost, as are the distinctions among multiple `BEGIN', -`END', `BEGINFILE', and `ENDFILE' rules. Also, things such as: - - /foo/ - -come out as: - - /foo/ { - print $0 - } +File: gawk.info, Node: Library Functions, Next: Sample Programs, Prev: Functions, Up: Top -which is correct, but possibly surprising. - - Besides creating profiles when a program has completed, `gawk' can -produce a profile while it is running. This is useful if your `awk' -program goes into an infinite loop and you want to see what has been -executed. To use this feature, run `gawk' with the `--profile' option -in the background: - - $ gawk --profile -f myprog & - [1] 13992 - -The shell prints a job number and process ID number; in this case, -13992. Use the `kill' command to send the `USR1' signal to `gawk': - - $ kill -USR1 13992 - -As usual, the profiled version of the program is written to -`awkprof.out', or to a different file if one specified with the -`--profile' option. - - Along with the regular profile, as shown earlier, the profile -includes a trace of any active functions: - - # Function Call Stack: - - # 3. baz - # 2. bar - # 1. foo - # -- main -- - - You may send `gawk' the `USR1' signal as many times as you like. -Each time, the profile and function call trace are appended to the -output profile file. - - If you use the `HUP' signal instead of the `USR1' signal, `gawk' -produces the profile and the function call trace and then exits. - - When `gawk' runs on MS-Windows systems, it uses the `INT' and `QUIT' -signals for producing the profile and, in the case of the `INT' signal, -`gawk' exits. This is because these systems don't support the `kill' -command, so the only signals you can deliver to a program are those -generated by the keyboard. The `INT' signal is generated by the -`Ctrl-<C>' or `Ctrl-<BREAK>' key, while the `QUIT' signal is generated -by the `Ctrl-<\>' key. 
- - Finally, `gawk' also accepts another option `--pretty-print'. When -called this way, `gawk' "pretty prints" the program into `awkprof.out', -without any execution counts. - - -File: gawk.info, Node: Library Functions, Next: Sample Programs, Prev: Advanced Features, Up: Top - -12 A Library of `awk' Functions +10 A Library of `awk' Functions ******************************* *note User-defined::, describes how to write your own `awk' functions. @@ -14878,7 +13469,7 @@ contents of the input record. File: gawk.info, Node: Library Names, Next: General Functions, Up: Library Functions -12.1 Naming Library Function Global Variables +10.1 Naming Library Function Global Variables ============================================= Due to the way the `awk' language evolved, variables are either @@ -14958,7 +13549,7 @@ verifying this. File: gawk.info, Node: General Functions, Next: Data File Management, Prev: Library Names, Up: Library Functions -12.2 General Programming +10.2 General Programming ======================== This minor node presents a number of functions that are of general @@ -14981,7 +13572,7 @@ programming use. File: gawk.info, Node: Strtonum Function, Next: Assert Function, Up: General Functions -12.2.1 Converting Strings To Numbers +10.2.1 Converting Strings To Numbers ------------------------------------ The `strtonum()' function (*note String Functions::) is a `gawk' @@ -15065,7 +13656,7 @@ be tested with `gawk' and the results compared to the built-in File: gawk.info, Node: Assert Function, Next: Round Function, Prev: Strtonum Function, Up: General Functions -12.2.2 Assertions +10.2.2 Assertions ----------------- When writing large programs, it is often useful to know that a @@ -15151,7 +13742,7 @@ rule always ends with an `exit' statement. 
File: gawk.info, Node: Round Function, Next: Cliff Random Function, Prev: Assert Function, Up: General Functions -12.2.3 Rounding Numbers +10.2.3 Rounding Numbers ----------------------- The way `printf' and `sprintf()' (*note Printf::) perform rounding @@ -15197,7 +13788,7 @@ might be useful if your `awk''s `printf' does unbiased rounding: File: gawk.info, Node: Cliff Random Function, Next: Ordinal Functions, Prev: Round Function, Up: General Functions -12.2.4 The Cliff Random Number Generator +10.2.4 The Cliff Random Number Generator ---------------------------------------- The Cliff random number generator @@ -15226,7 +13817,7 @@ might try using this function instead. File: gawk.info, Node: Ordinal Functions, Next: Join Function, Prev: Cliff Random Function, Up: General Functions -12.2.5 Translating Between Characters and Numbers +10.2.5 Translating Between Characters and Numbers ------------------------------------------------- One commercial implementation of `awk' supplies a built-in function, @@ -15324,7 +13915,7 @@ extensions, you can simplify `_ord_init' to loop from 0 to 255. File: gawk.info, Node: Join Function, Next: Getlocaltime Function, Prev: Ordinal Functions, Up: General Functions -12.2.6 Merging an Array into a String +10.2.6 Merging an Array into a String ------------------------------------- When doing string processing, it is often useful to be able to join all @@ -15371,7 +13962,7 @@ makes string operations more difficult than they really need to be. File: gawk.info, Node: Getlocaltime Function, Prev: Join Function, Up: General Functions -12.2.7 Managing the Time of Day +10.2.7 Managing the Time of Day ------------------------------- The `systime()' and `strftime()' functions described in *note Time @@ -15453,7 +14044,7 @@ optional timestamp value to use instead of the current time. 
File: gawk.info, Node: Data File Management, Next: Getopt Function, Prev: General Functions, Up: Library Functions -12.3 Data File Management +10.3 Data File Management ========================= This minor node presents functions that are useful for managing @@ -15470,7 +14061,7 @@ command-line data files. File: gawk.info, Node: Filetrans Function, Next: Rewind Function, Up: Data File Management -12.3.1 Noting Data File Boundaries +10.3.1 Noting Data File Boundaries ---------------------------------- The `BEGIN' and `END' rules are each executed exactly once at the @@ -15568,7 +14159,7 @@ it provides an easy way to do per-file cleanup processing. File: gawk.info, Node: Rewind Function, Next: File Checking, Prev: Filetrans Function, Up: Data File Management -12.3.2 Rereading the Current File +10.3.2 Rereading the Current File --------------------------------- Another request for a new built-in function was for a `rewind()' @@ -15610,7 +14201,7 @@ Nextfile Statement::). File: gawk.info, Node: File Checking, Next: Empty Files, Prev: Rewind Function, Up: Data File Management -12.3.3 Checking for Readable Data Files +10.3.3 Checking for Readable Data Files --------------------------------------- Normally, if you give `awk' a data file that isn't readable, it stops @@ -15639,7 +14230,7 @@ in the list). See also *note ARGC and ARGV::. File: gawk.info, Node: Empty Files, Next: Ignoring Assigns, Prev: File Checking, Up: Data File Management -12.3.4 Checking For Zero-length Files +10.3.4 Checking For Zero-length Files ------------------------------------- All known `awk' implementations silently skip over zero-length files. @@ -15696,7 +14287,7 @@ intervening value in `ARGV' is a variable assignment. 
File: gawk.info, Node: Ignoring Assigns, Prev: Empty Files, Up: Data File Management -12.3.5 Treating Assignments as File Names +10.3.5 Treating Assignments as File Names ----------------------------------------- Occasionally, you might not want `awk' to process command-line variable @@ -15739,7 +14330,7 @@ arguments are left alone. File: gawk.info, Node: Getopt Function, Next: Passwd Functions, Prev: Data File Management, Up: Library Functions -12.4 Processing Command-Line Options +10.4 Processing Command-Line Options ==================================== Most utilities on POSIX compatible systems take options on the command @@ -16032,7 +14623,7 @@ have left it alone, since using `substr()' is more portable. File: gawk.info, Node: Passwd Functions, Next: Group Functions, Prev: Getopt Function, Up: Library Functions -12.5 Reading the User Database +10.5 Reading the User Database ============================== The `PROCINFO' array (*note Built-in Variables::) provides access to @@ -16275,7 +14866,7 @@ network database. File: gawk.info, Node: Group Functions, Next: Walking Arrays, Prev: Passwd Functions, Up: Library Functions -12.6 Reading the Group Database +10.6 Reading the Group Database =============================== Much of the discussion presented in *note Passwd Functions::, applies @@ -16509,7 +15100,7 @@ very simple, relying on `awk''s associative arrays to do work. File: gawk.info, Node: Walking Arrays, Prev: Group Functions, Up: Library Functions -12.7 Traversing Arrays of Arrays +10.7 Traversing Arrays of Arrays ================================ *note Arrays of Arrays::, described how `gawk' provides arrays of @@ -16558,9 +15149,9 @@ value. 
Here is a main program to demonstrate: -| a[3] = 3 -File: gawk.info, Node: Sample Programs, Next: Debugger, Prev: Library Functions, Up: Top +File: gawk.info, Node: Sample Programs, Next: Internationalization, Prev: Library Functions, Up: Top -13 Practical `awk' Programs +11 Practical `awk' Programs *************************** *note Library Functions::, presents the idea that reading programs in a @@ -16580,7 +15171,7 @@ Library Functions::. File: gawk.info, Node: Running Examples, Next: Clones, Up: Sample Programs -13.1 Running the Example Programs +11.1 Running the Example Programs ================================= To run a given program, you would typically do something like this: @@ -16603,7 +15194,7 @@ OPTIONS are any command-line options for the program that start with a File: gawk.info, Node: Clones, Next: Miscellaneous Programs, Prev: Running Examples, Up: Sample Programs -13.2 Reinventing Wheels for Fun and Profit +11.2 Reinventing Wheels for Fun and Profit ========================================== This minor node presents a number of POSIX utilities implemented in @@ -16633,7 +15224,7 @@ programming for "real world" tasks. File: gawk.info, Node: Cut Program, Next: Egrep Program, Up: Clones -13.2.1 Cutting out Fields and Columns +11.2.1 Cutting out Fields and Columns ------------------------------------- The `cut' utility selects, or "cuts," characters or fields from its @@ -16892,7 +15483,7 @@ solution to the problem of picking the input line apart by characters. File: gawk.info, Node: Egrep Program, Next: Id Program, Prev: Cut Program, Up: Clones -13.2.2 Searching for Regular Expressions in Files +11.2.2 Searching for Regular Expressions in Files ------------------------------------------------- The `egrep' utility searches files for patterns. It uses regular @@ -17124,7 +15715,7 @@ the translated line, not the original. 
File: gawk.info, Node: Id Program, Next: Split Program, Prev: Egrep Program, Up: Clones -13.2.3 Printing out User Information +11.2.3 Printing out User Information ------------------------------------ The `id' utility lists a user's real and effective user ID numbers, @@ -17231,7 +15822,7 @@ body never executes. File: gawk.info, Node: Split Program, Next: Tee Program, Prev: Id Program, Up: Clones -13.2.4 Splitting a Large File into Pieces +11.2.4 Splitting a Large File into Pieces ----------------------------------------- The `split' program splits large text files into smaller pieces. Usage @@ -17339,7 +15930,7 @@ not relevant for what the program aims to demonstrate. File: gawk.info, Node: Tee Program, Next: Uniq Program, Prev: Split Program, Up: Clones -13.2.5 Duplicating Output into Multiple Files +11.2.5 Duplicating Output into Multiple Files --------------------------------------------- The `tee' program is known as a "pipe fitting." `tee' copies its @@ -17427,7 +16018,7 @@ N input records and M output files, the first method only executes N File: gawk.info, Node: Uniq Program, Next: Wc Program, Prev: Tee Program, Up: Clones -13.2.6 Printing Nonduplicated Lines of Text +11.2.6 Printing Nonduplicated Lines of Text ------------------------------------------- The `uniq' utility reads sorted lines of data on its standard input, @@ -17646,7 +16237,7 @@ line of input data: File: gawk.info, Node: Wc Program, Prev: Uniq Program, Up: Clones -13.2.7 Counting Things +11.2.7 Counting Things ---------------------- The `wc' (word count) utility counts lines, words, and characters in @@ -17791,7 +16382,7 @@ characters, not bytes. File: gawk.info, Node: Miscellaneous Programs, Prev: Clones, Up: Sample Programs -13.3 A Grab Bag of `awk' Programs +11.3 A Grab Bag of `awk' Programs ================================= This minor node is a large "grab bag" of miscellaneous programs. We @@ -17818,7 +16409,7 @@ hope you find them both interesting and enjoyable. 
File: gawk.info, Node: Dupword Program, Next: Alarm Program, Up: Miscellaneous Programs -13.3.1 Finding Duplicated Words in a Document +11.3.1 Finding Duplicated Words in a Document --------------------------------------------- A common error when writing large amounts of prose is to accidentally @@ -17866,7 +16457,7 @@ word, comparing it to the previous one: File: gawk.info, Node: Alarm Program, Next: Translate Program, Prev: Dupword Program, Up: Miscellaneous Programs -13.3.2 An Alarm Clock Program +11.3.2 An Alarm Clock Program ----------------------------- Nothing cures insomnia like a ringing alarm clock. @@ -17999,7 +16590,7 @@ necessary: File: gawk.info, Node: Translate Program, Next: Labels Program, Prev: Alarm Program, Up: Miscellaneous Programs -13.3.3 Transliterating Characters +11.3.3 Transliterating Characters --------------------------------- The system `tr' utility transliterates characters. For example, it is @@ -18125,7 +16716,7 @@ split each character in a string into separate array elements. File: gawk.info, Node: Labels Program, Next: Word Sorting, Prev: Translate Program, Up: Miscellaneous Programs -13.3.4 Printing Mailing Labels +11.3.4 Printing Mailing Labels ------------------------------ Here is a "real world"(1) program. This script reads lists of names and @@ -18232,7 +16823,7 @@ something done." 
File: gawk.info, Node: Word Sorting, Next: History Sorting, Prev: Labels Program, Up: Miscellaneous Programs -13.3.5 Generating Word-Usage Counts +11.3.5 Generating Word-Usage Counts ----------------------------------- When working with large amounts of text, it can be interesting to know @@ -18336,7 +16927,7 @@ operating system documentation for more information on how to use the File: gawk.info, Node: History Sorting, Next: Extract Program, Prev: Word Sorting, Up: Miscellaneous Programs -13.3.6 Removing Duplicates from Unsorted Text +11.3.6 Removing Duplicates from Unsorted Text --------------------------------------------- The `uniq' program (*note Uniq Program::), removes duplicate lines from @@ -18383,7 +16974,7 @@ seen. File: gawk.info, Node: Extract Program, Next: Simple Sed, Prev: History Sorting, Up: Miscellaneous Programs -13.3.7 Extracting Programs from Texinfo Source Files +11.3.7 Extracting Programs from Texinfo Source Files ---------------------------------------------------- The nodes *note Library Functions::, and *note Sample Programs::, are @@ -18583,7 +17174,7 @@ function. Consider how you might use it to simplify the code. File: gawk.info, Node: Simple Sed, Next: Igawk Program, Prev: Extract Program, Up: Miscellaneous Programs -13.3.8 A Simple Stream Editor +11.3.8 A Simple Stream Editor ----------------------------- The `sed' utility is a stream editor, a program that reads a stream of @@ -18664,7 +17255,7 @@ the single rule handles the printing scheme outlined above, using File: gawk.info, Node: Igawk Program, Next: Anagram Program, Prev: Simple Sed, Up: Miscellaneous Programs -13.3.9 An Easy Way to Use Library Functions +11.3.9 An Easy Way to Use Library Functions ------------------------------------------- In *note Include Files::, we saw how `gawk' provides a built-in @@ -19061,7 +17652,7 @@ can loop forever if the file exists but is empty. Caveat emptor. 
File: gawk.info, Node: Anagram Program, Next: Signature Program, Prev: Igawk Program, Up: Miscellaneous Programs -13.3.10 Finding Anagrams From A Dictionary +11.3.10 Finding Anagrams From A Dictionary ------------------------------------------ An interesting programming challenge is to search for "anagrams" in a @@ -19151,7 +17742,7 @@ otherwise the anagrams would appear in arbitrary order: File: gawk.info, Node: Signature Program, Prev: Anagram Program, Up: Miscellaneous Programs -13.3.11 And Now For Something Completely Different +11.3.11 And Now For Something Completely Different -------------------------------------------------- The following program was written by Davide Brini and is published on @@ -19176,7 +17767,1437 @@ supplies the following copyright terms: We leave it to you to determine what the program does. -File: gawk.info, Node: Debugger, Next: Arbitrary Precision Arithmetic, Prev: Sample Programs, Up: Top +File: gawk.info, Node: Internationalization, Next: Advanced Features, Prev: Sample Programs, Up: Top + +12 Internationalization with `gawk' +*********************************** + +Once upon a time, computer makers wrote software that worked only in +English. Eventually, hardware and software vendors noticed that if +their systems worked in the native languages of non-English-speaking +countries, they were able to sell more systems. As a result, +internationalization and localization of programs and software systems +became a common practice. + + For many years, the ability to provide internationalization was +largely restricted to programs written in C and C++. This major node +describes the underlying library `gawk' uses for internationalization, +as well as how `gawk' makes internationalization features available at +the `awk' program level. Having internationalization available at the +`awk' level gives software developers additional flexibility--they are +no longer forced to write in C or C++ when internationalization is a +requirement. 
+ +* Menu: + +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU `gettext' works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* I18N Example:: A simple i18n example. +* Gawk I18N:: `gawk' is also internationalized. + + +File: gawk.info, Node: I18N and L10N, Next: Explaining gettext, Up: Internationalization + +12.1 Internationalization and Localization +========================================== + +"Internationalization" means writing (or modifying) a program once, in +such a way that it can use multiple languages without requiring further +source-code changes. "Localization" means providing the data necessary +for an internationalized program to work in a particular language. +Most typically, these terms refer to features such as the language used +for printing error messages, the language used to read responses, and +information related to how numerical and monetary values are printed +and read. + + +File: gawk.info, Node: Explaining gettext, Next: Programmer i18n, Prev: I18N and L10N, Up: Internationalization + +12.2 GNU `gettext' +================== + +The facilities in GNU `gettext' focus on messages; strings printed by a +program, either directly or via formatting with `printf' or +`sprintf()'.(1) + + When using GNU `gettext', each application has its own "text +domain". This is a unique name, such as `kpilot' or `gawk', that +identifies the application. A complete application may have multiple +components--programs written in C or C++, as well as scripts written in +`sh' or `awk'. All of the components use the same text domain. + + To make the discussion concrete, assume we're writing an application +named `guide'. Internationalization consists of the following steps, +in this order: + + 1. The programmer goes through the source for all of `guide''s + components and marks each string that is a candidate for + translation. 
For example, `"`-F': option required"' is a good + candidate for translation. A table with strings of option names + is not (e.g., `gawk''s `--profile' option should remain the same, + no matter what the local language). + + 2. The programmer indicates the application's text domain (`"guide"') + to the `gettext' library, by calling the `textdomain()' function. + + 3. Messages from the application are extracted from the source code + and collected into a portable object template file (`guide.pot'), + which lists the strings and their translations. The translations + are initially empty. The original (usually English) messages + serve as the key for lookup of the translations. + + 4. For each language with a translator, `guide.pot' is copied to a + portable object file (`.po') and translations are created and + shipped with the application. For example, there might be a + `fr.po' for a French translation. + + 5. Each language's `.po' file is converted into a binary message + object (`.mo') file. A message object file contains the original + messages and their translations in a binary format that allows + fast lookup of translations at runtime. + + 6. When `guide' is built and installed, the binary translation files + are installed in a standard place. + + 7. For testing and development, it is possible to tell `gettext' to + use `.mo' files in a different directory than the standard one by + using the `bindtextdomain()' function. + + 8. At runtime, `guide' looks up each string via a call to + `gettext()'. The returned string is the translated string if + available, or the original string if not. + + 9. If necessary, it is possible to access messages from a different + text domain than the one belonging to the application, without + having to switch the application's default text domain back and + forth. 
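The lookup rule in step 8 (return the translation if one exists, otherwise the original string) can be modeled with a small sketch. This is only an illustration under stated assumptions: `lookup()` and the `catalog` array are hypothetical names standing in for `gettext()' and a `.mo' file, and the code runs under any `awk' rather than relying on GNU `gettext' itself.

```shell
# Sketch only: models the "translation if available, else original" rule.
# lookup() and catalog are hypothetical; they are not gettext or gawk APIs.
lookup_demo() {
    awk '
    # Return the catalog entry for msgid, or msgid itself when none exists.
    function lookup(msgid) {
        return (msgid in catalog) ? catalog[msgid] : msgid
    }
    BEGIN {
        # One translated message; the original string is the lookup key.
        catalog["The Answer Is"] = "Like, the scoop is"
        print lookup("The Answer Is"), 42
        print lookup("Pardon me, Zaphod who?")
    }'
}
lookup_demo
```

The first message has a catalog entry and comes back translated; the second has none and comes back unchanged, which is why an application with an incomplete translation still prints something sensible.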
+ + In C (or C++), the string marking and dynamic translation lookup are +accomplished by wrapping each string in a call to `gettext()': + + printf("%s", gettext("Don't Panic!\n")); + + The tools that extract messages from source code pull out all +strings enclosed in calls to `gettext()'. + + The GNU `gettext' developers, recognizing that typing `gettext(...)' +over and over again is both painful and ugly to look at, use the macro +`_' (an underscore) to make things easier: + + /* In the standard header file: */ + #define _(str) gettext(str) + + /* In the program text: */ + printf("%s", _("Don't Panic!\n")); + +This reduces the typing overhead to just three extra characters per +string and is considerably easier to read as well. + + There are locale "categories" for different types of locale-related +information. The defined locale categories that `gettext' knows about +are: + +`LC_MESSAGES' + Text messages. This is the default category for `gettext' + operations, but it is possible to supply a different one + explicitly, if necessary. (It is almost never necessary to supply + a different category.) + +`LC_COLLATE' + Text-collation information; i.e., how different characters and/or + groups of characters sort in a given language. + +`LC_CTYPE' + Character-type information (alphabetic, digit, upper- or + lowercase, and so on). This information is accessed via the POSIX + character classes in regular expressions, such as `/[[:alnum:]]/' + (*note Regexp Operators::). + +`LC_MONETARY' + Monetary information, such as the currency symbol, and whether the + symbol goes before or after a number. + +`LC_NUMERIC' + Numeric information, such as which characters to use for the + decimal point and the thousands separator.(2) + +`LC_RESPONSE' + Response information, such as how "yes" and "no" appear in the + local language, and possibly other information as well. 
+ +`LC_TIME' + Time- and date-related information, such as 12- or 24-hour clock, + month printed before or after the day in a date, local month + abbreviations, and so on. + +`LC_ALL' + All of the above. (Not too useful in the context of `gettext'.) + + ---------- Footnotes ---------- + + (1) For some operating systems, the `gawk' port doesn't support GNU +`gettext'. Therefore, these features are not available if you are +using one of those operating systems. Sorry. + + (2) Americans use a comma every three decimal places and a period +for the decimal point, while many Europeans do exactly the opposite: +1,234.56 versus 1.234,56. + + +File: gawk.info, Node: Programmer i18n, Next: Translator i18n, Prev: Explaining gettext, Up: Internationalization + +12.3 Internationalizing `awk' Programs +====================================== + +`gawk' provides the following variables and functions for +internationalization: + +`TEXTDOMAIN' + This variable indicates the application's text domain. For + compatibility with GNU `gettext', the default value is + `"messages"'. + +`_"your message here"' + String constants marked with a leading underscore are candidates + for translation at runtime. String constants without a leading + underscore are not translated. + +`dcgettext(STRING [, DOMAIN [, CATEGORY]])' + Return the translation of STRING in text domain DOMAIN for locale + category CATEGORY. The default value for DOMAIN is the current + value of `TEXTDOMAIN'. The default value for CATEGORY is + `"LC_MESSAGES"'. + + If you supply a value for CATEGORY, it must be a string equal to + one of the known locale categories described in *note Explaining + gettext::. You must also supply a text domain. Use `TEXTDOMAIN' + if you want to use the current domain. + + CAUTION: The order of arguments to the `awk' version of the + `dcgettext()' function is purposely different from the order + for the C version. 
The `awk' version's order was chosen to + be simple and to allow for reasonable `awk'-style default + arguments. + +`dcngettext(STRING1, STRING2, NUMBER [, DOMAIN [, CATEGORY]])' + Return the plural form used for NUMBER of the translation of + STRING1 and STRING2 in text domain DOMAIN for locale category + CATEGORY. STRING1 is the English singular variant of a message, + and STRING2 the English plural variant of the same message. The + default value for DOMAIN is the current value of `TEXTDOMAIN'. + The default value for CATEGORY is `"LC_MESSAGES"'. + + The same remarks about argument order as for the `dcgettext()' + function apply. + +`bindtextdomain(DIRECTORY [, DOMAIN])' + Change the directory in which `gettext' looks for `.mo' files, in + case they will not or cannot be placed in the standard locations + (e.g., during testing). Return the directory in which DOMAIN is + "bound." + + The default DOMAIN is the value of `TEXTDOMAIN'. If DIRECTORY is + the null string (`""'), then `bindtextdomain()' returns the + current binding for the given DOMAIN. + + To use these facilities in your `awk' program, follow the steps +outlined in *note Explaining gettext::, like so: + + 1. Set the variable `TEXTDOMAIN' to the text domain of your program. + This is best done in a `BEGIN' rule (*note BEGIN/END::), or it can + also be done via the `-v' command-line option (*note Options::): + + BEGIN { + TEXTDOMAIN = "guide" + ... + } + + 2. Mark all translatable strings with a leading underscore (`_') + character. It _must_ be adjacent to the opening quote of the + string. For example: + + print _"hello, world" + x = _"you goofed" + printf(_"Number of users is %d\n", nusers) + + 3. 
If you are creating strings dynamically, you can still translate + them, using the `dcgettext()' built-in function: + + message = nusers " users logged in" + message = dcgettext(message, "adminprog") + print message + + Here, the call to `dcgettext()' supplies a different text domain + (`"adminprog"') in which to find the message, but it uses the + default `"LC_MESSAGES"' category. + + 4. During development, you might want to put the `.mo' file in a + private directory for testing. This is done with the + `bindtextdomain()' built-in function: + + BEGIN { + TEXTDOMAIN = "guide" # our text domain + if (Testing) { + # where to find our files + bindtextdomain("testdir") + # joe is in charge of adminprog + bindtextdomain("../joe/testdir", "adminprog") + } + ... + } + + + *Note I18N Example::, for an example program showing the steps to +create and use translations from `awk'. + + +File: gawk.info, Node: Translator i18n, Next: I18N Example, Prev: Programmer i18n, Up: Internationalization + +12.4 Translating `awk' Programs +=============================== + +Once a program's translatable strings have been marked, they must be +extracted to create the initial `.po' file. As part of translation, it +is often helpful to rearrange the order in which arguments to `printf' +are output. + + `gawk''s `--gen-pot' command-line option extracts the messages and +is discussed next. After that, `printf''s ability to rearrange the +order for `printf' arguments at runtime is covered. + +* Menu: + +* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging `printf' arguments. +* I18N Portability:: `awk'-level portability issues. + + +File: gawk.info, Node: String Extraction, Next: Printf Ordering, Up: Translator i18n + +12.4.1 Extracting Marked Strings +-------------------------------- + +Once your `awk' program is working, and all the strings have been +marked and you've set (and perhaps bound) the text domain, it is time +to produce translations. 
First, use the `--gen-pot' command-line +option to create the initial `.pot' file: + + $ gawk --gen-pot -f guide.awk > guide.pot + + When run with `--gen-pot', `gawk' does not execute your program. +Instead, it parses it as usual and prints all marked strings to +standard output in the format of a GNU `gettext' Portable Object file. +Also included in the output are any constant strings that appear as the +first argument to `dcgettext()' or as the first and second arguments to +`dcngettext()'.(1) *Note I18N Example::, for the full list of steps to +go through to create and test translations for `guide'. + + ---------- Footnotes ---------- + + (1) The `xgettext' utility that comes with GNU `gettext' can handle +`.awk' files. + + +File: gawk.info, Node: Printf Ordering, Next: I18N Portability, Prev: String Extraction, Up: Translator i18n + +12.4.2 Rearranging `printf' Arguments +------------------------------------- + +Format strings for `printf' and `sprintf()' (*note Printf::) present a +special problem for translation. Consider the following:(1) + + printf(_"String `%s' has %d characters\n", + string, length(string)) + + A possible German translation for this might be: + + "%d Zeichen lang ist die Zeichenkette `%s'\n" + + The problem should be obvious: the order of the format +specifications is different from the original! Even though `gettext()' +can return the translated string at runtime, it cannot change the +argument order in the call to `printf'. + + To solve this problem, `printf' format specifiers may have an +additional optional element, which we call a "positional specifier". +For example: + + "%2$d Zeichen lang ist die Zeichenkette `%1$s'\n" + + Here, the positional specifier consists of an integer count, which +indicates which argument to use, and a `$'. Counts are one-based, and +the format string itself is _not_ included. 
Thus, in the following +example, `string' is the first argument and `length(string)' is the +second: + + $ gawk 'BEGIN { + > string = "Dont Panic" + > printf _"%2$d characters live in \"%1$s\"\n", + > string, length(string) + > }' + -| 10 characters live in "Dont Panic" + + If present, positional specifiers come first in the format +specification, before the flags, the field width, and/or the precision. + + Positional specifiers can be used with the dynamic field width and +precision capability: + + $ gawk 'BEGIN { + > printf("%*.*s\n", 10, 20, "hello") + > printf("%3$*2$.*1$s\n", 20, 10, "hello") + > }' + -| hello + -| hello + + NOTE: When using `*' with a positional specifier, the `*' comes + first, then the integer position, and then the `$'. This is + somewhat counterintuitive. + + `gawk' does not allow you to mix regular format specifiers and those +with positional specifiers in the same string: + + $ gawk 'BEGIN { printf _"%d %3$s\n", 1, 2, "hi" }' + error--> gawk: cmd. line:1: fatal: must use `count$' on all formats or none + + NOTE: There are some pathological cases that `gawk' may fail to + diagnose. In such cases, the output may not be what you expect. + It's still a bad idea to try mixing them, even if `gawk' doesn't + detect it. + + Although positional specifiers can be used directly in `awk' +programs, their primary purpose is to help in producing correct +translations of format strings into languages different from the one in +which the program is first written. + + ---------- Footnotes ---------- + + (1) This example is borrowed from the GNU `gettext' manual. + + +File: gawk.info, Node: I18N Portability, Prev: Printf Ordering, Up: Translator i18n + +12.4.3 `awk' Portability Issues +------------------------------- + +`gawk''s internationalization features were purposely chosen to have as +little impact as possible on the portability of `awk' programs that use +them to other versions of `awk'. 
Consider this program: + + BEGIN { + TEXTDOMAIN = "guide" + if (Test_Guide) # set with -v + bindtextdomain("/test/guide/messages") + print _"don't panic!" + } + +As written, it won't work on other versions of `awk'. However, it is +actually almost portable, requiring very little change: + + * Assignments to `TEXTDOMAIN' won't have any effect, since + `TEXTDOMAIN' is not special in other `awk' implementations. + + * Non-GNU versions of `awk' treat marked strings as the + concatenation of a variable named `_' with the string following + it.(1) Typically, the variable `_' has the null string (`""') as + its value, leaving the original string constant as the result. + + * By defining "dummy" functions to replace `dcgettext()', + `dcngettext()' and `bindtextdomain()', the `awk' program can be + made to run, but all the messages are output in the original + language. For example: + + function bindtextdomain(dir, domain) + { + return dir + } + + function dcgettext(string, domain, category) + { + return string + } + + function dcngettext(string1, string2, number, domain, category) + { + return (number == 1 ? string1 : string2) + } + + * The use of positional specifications in `printf' or `sprintf()' is + _not_ portable. To support `gettext()' at the C level, many + systems' C versions of `sprintf()' do support positional + specifiers. But it works only if enough arguments are supplied in + the function call. Many versions of `awk' pass `printf' formats + and arguments unchanged to the underlying C library version of + `sprintf()', but only one format and argument at a time. What + happens if a positional specification is used is anybody's guess. + However, since the positional specifications are primarily for use + in _translated_ format strings, and since non-GNU `awk's never + retrieve the translated string, this should not be a problem in + practice. + + ---------- Footnotes ---------- + + (1) This is good fodder for an "Obfuscated `awk'" contest. 
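The first two portability points can be tried directly. The following is a minimal sketch, assuming only that some `awk' is on the search path; `portable_demo' is a hypothetical shell function name. It leaves out `bindtextdomain()' on purpose, since that call needs the replacement functions described above when the `awk' in use is not `gawk'.

```shell
# Sketch only: a marked string and a TEXTDOMAIN assignment under any awk.
portable_demo() {
    awk 'BEGIN {
        TEXTDOMAIN = "guide"   # special to gawk, an ordinary variable elsewhere
        print _"Dont Panic"    # elsewhere: empty variable _ joined to the string
    }'
}
portable_demo
```

With no message catalog installed for the `guide' text domain, `gawk' returns the original string, and other versions of `awk' concatenate the empty variable `_' with it, so both print the same line.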
+ + +File: gawk.info, Node: I18N Example, Next: Gawk I18N, Prev: Translator i18n, Up: Internationalization + +12.5 A Simple Internationalization Example +========================================== + +Now let's look at a step-by-step example of how to internationalize and +localize a simple `awk' program, using `guide.awk' as our original +source: + + BEGIN { + TEXTDOMAIN = "guide" + bindtextdomain(".") # for testing + print _"Don't Panic" + print _"The Answer Is", 42 + print "Pardon me, Zaphod who?" + } + +Run `gawk --gen-pot' to create the `.pot' file: + + $ gawk --gen-pot -f guide.awk > guide.pot + +This produces: + + #: guide.awk:4 + msgid "Don't Panic" + msgstr "" + + #: guide.awk:5 + msgid "The Answer Is" + msgstr "" + + This original portable object template file is saved and reused for +each language into which the application is translated. The `msgid' is +the original string and the `msgstr' is the translation. + + NOTE: Strings not marked with a leading underscore do not appear + in the `guide.pot' file. + + Next, the messages must be translated. Here is a translation to a +hypothetical dialect of English, called "Mellow":(1) + + $ cp guide.pot guide-mellow.po + ADD TRANSLATIONS TO guide-mellow.po ... + +Following are the translations: + + #: guide.awk:4 + msgid "Don't Panic" + msgstr "Hey man, relax!" + + #: guide.awk:5 + msgid "The Answer Is" + msgstr "Like, the scoop is" + + The next step is to make the directory to hold the binary message +object file and then to create the `guide.mo' file. The directory +layout shown here is standard for GNU `gettext' on GNU/Linux systems. +Other versions of `gettext' may use a different layout: + + $ mkdir en_US en_US/LC_MESSAGES + + The `msgfmt' utility does the conversion from human-readable `.po' +file to machine-readable `.mo' file. By default, `msgfmt' creates a +file named `messages'. 
This file must be renamed and placed in the +proper directory so that `gawk' can find it: + + $ msgfmt guide-mellow.po + $ mv messages en_US/LC_MESSAGES/guide.mo + + Finally, we run the program to test it: + + $ gawk -f guide.awk + -| Hey man, relax! + -| Like, the scoop is 42 + -| Pardon me, Zaphod who? + + If the three replacement functions for `dcgettext()', `dcngettext()' +and `bindtextdomain()' (*note I18N Portability::) are in a file named +`libintl.awk', then we can run `guide.awk' unchanged as follows: + + $ gawk --posix -f guide.awk -f libintl.awk + -| Don't Panic + -| The Answer Is 42 + -| Pardon me, Zaphod who? + + ---------- Footnotes ---------- + + (1) Perhaps it would be better if it were called "Hippy." Ah, well. + + +File: gawk.info, Node: Gawk I18N, Prev: I18N Example, Up: Internationalization + +12.6 `gawk' Can Speak Your Language +=================================== + +`gawk' itself has been internationalized using the GNU `gettext' +package. (GNU `gettext' is described in complete detail in *note (GNU +`gettext' utilities)Top:: gettext, GNU gettext tools.) As of this +writing, the latest version of GNU `gettext' is version 0.18.1 +(ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz). + + If a translation of `gawk''s messages exists, then `gawk' produces +usage messages, warnings, and fatal errors in the local language. + + +File: gawk.info, Node: Advanced Features, Next: Debugger, Prev: Internationalization, Up: Top + +13 Advanced Features of `gawk' +****************************** + + Write documentation as if whoever reads it is a violent psychopath + who knows where you live. + Steve English, as quoted by Peter Langston + + This major node discusses advanced features in `gawk'. It's a bit +of a "grab bag" of items that are otherwise unrelated to each other. +First, a command-line option allows `gawk' to recognize nondecimal +numbers in input data, not just in `awk' programs. Then, `gawk''s +special features for sorting arrays are presented. 
Next, two-way I/O, +discussed briefly in earlier parts of this Info file, is described in +full detail, along with the basics of TCP/IP networking. Finally, +`gawk' can "profile" an `awk' program, making it possible to tune it +for performance. + + *note Dynamic Extensions::, discusses the ability to dynamically add +new built-in functions to `gawk'. As this feature is still immature +and likely to change, its description is relegated to an appendix. + +* Menu: + +* Nondecimal Data:: Allowing nondecimal input data. +* Array Sorting:: Facilities for controlling array traversal and + sorting arrays. +* Two-way I/O:: Two-way communications with another process. +* TCP/IP Networking:: Using `gawk' for network programming. +* Profiling:: Profiling your `awk' programs. + + +File: gawk.info, Node: Nondecimal Data, Next: Array Sorting, Up: Advanced Features + +13.1 Allowing Nondecimal Input Data +=================================== + +If you run `gawk' with the `--non-decimal-data' option, you can have +nondecimal constants in your input data: + + $ echo 0123 123 0x123 | + > gawk --non-decimal-data '{ printf "%d, %d, %d\n", + > $1, $2, $3 }' + -| 83, 123, 291 + + For this feature to work, write your program so that `gawk' treats +your data as numeric: + + $ echo 0123 123 0x123 | gawk '{ print $1, $2, $3 }' + -| 0123 123 0x123 + +The `print' statement treats its expressions as strings. Although the +fields can act as numbers when necessary, they are still strings, so +`print' does not try to treat them numerically. You may need to add +zero to a field to force it to be treated as a number. For example: + + $ echo 0123 123 0x123 | gawk --non-decimal-data ' + > { print $1, $2, $3 + > print $1 + 0, $2 + 0, $3 + 0 }' + -| 0123 123 0x123 + -| 83 123 291 + + Because it is common to have decimal data with leading zeros, and +because using this facility could lead to surprising results, the +default is to leave it disabled. If you want it, you must explicitly +request it. 
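The values 83 and 291 in the examples above follow from reading `0123' as octal and `0x123' as hexadecimal. The sketch below reproduces that arithmetic in portable `awk'; `to_num()' is a hypothetical helper written for illustration, not a `gawk' built-in, and it assumes well-formed input.

```shell
# Sketch only: to_num() is a hypothetical helper, not part of gawk.
nondecimal_demo() {
    echo 0123 123 0x123 | awk '
    # Convert a C-style octal, decimal, or hexadecimal string to a number.
    function to_num(s,    base, digits, i, n) {
        digits = "0123456789abcdef"
        s = tolower(s)
        if (s ~ /^0x[0-9a-f]+$/) {
            base = 16
            s = substr(s, 3)       # drop the leading "0x"
        } else if (s ~ /^0[0-7]+$/) {
            base = 8
            s = substr(s, 2)       # drop the leading "0"
        } else
            return s + 0           # plain decimal
        n = 0
        for (i = 1; i <= length(s); i++)
            n = n * base + index(digits, substr(s, i, 1)) - 1
        return n
    }
    { print to_num($1), to_num($2), to_num($3) }'
}
nondecimal_demo
```

Worked by hand, 0123 is 1*64 + 2*8 + 3 = 83 and 0x123 is 1*256 + 2*16 + 3 = 291, matching the output shown for `--non-decimal-data'.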
+ + CAUTION: _Use of this option is not recommended._ It can break old + programs very badly. Instead, use the `strtonum()' function to + convert your data (*note Nondecimal-numbers::). This makes your + programs easier to write and easier to read, and leads to less + surprising results. + + +File: gawk.info, Node: Array Sorting, Next: Two-way I/O, Prev: Nondecimal Data, Up: Advanced Features + +13.2 Controlling Array Traversal and Array Sorting +================================================== + +`gawk' lets you control the order in which a `for (i in array)' loop +traverses an array. + + In addition, two built-in functions, `asort()' and `asorti()', let +you sort arrays based on the array values and indices, respectively. +These two functions also provide control over the sorting criteria used +to order the elements during sorting. + +* Menu: + +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use `asort()' and `asorti()'. + + +File: gawk.info, Node: Controlling Array Traversal, Next: Array Sorting Functions, Up: Array Sorting + +13.2.1 Controlling Array Traversal +---------------------------------- + +By default, the order in which a `for (i in array)' loop scans an array +is not defined; it is generally based upon the internal implementation +of arrays inside `awk'. + + Often, though, it is desirable to be able to loop over the elements +in a particular order that you, the programmer, choose. `gawk' lets +you do this. + + *note Controlling Scanning::, describes how you can assign special, +pre-defined values to `PROCINFO["sorted_in"]' in order to control the +order in which `gawk' will traverse an array during a `for' loop. + + In addition, the value of `PROCINFO["sorted_in"]' can be a function +name. This lets you traverse an array based on any custom criterion. +The array elements are ordered according to the return value of this +function. 
The comparison function should be defined with at least four +arguments: + + function comp_func(i1, v1, i2, v2) + { + COMPARE ELEMENTS 1 AND 2 IN SOME FASHION + RETURN < 0; 0; OR > 0 + } + + Here, I1 and I2 are the indices, and V1 and V2 are the corresponding +values of the two elements being compared. Either V1 or V2, or both, +can be arrays if the array being traversed contains subarrays as values. +(*Note Arrays of Arrays::, for more information about subarrays.) The +three possible return values are interpreted as follows: + +`comp_func(i1, v1, i2, v2) < 0' + Index I1 comes before index I2 during loop traversal. + +`comp_func(i1, v1, i2, v2) == 0' + Indices I1 and I2 come together but the relative order with + respect to each other is undefined. + +`comp_func(i1, v1, i2, v2) > 0' + Index I1 comes after index I2 during loop traversal. + + Our first comparison function can be used to scan an array in +numerical order of the indices: + + function cmp_num_idx(i1, v1, i2, v2) + { + # numerical index comparison, ascending order + return (i1 - i2) + } + + Our second function traverses an array based on the string order of +the element values rather than by indices: + + function cmp_str_val(i1, v1, i2, v2) + { + # string value comparison, ascending order + v1 = v1 "" + v2 = v2 "" + if (v1 < v2) + return -1 + return (v1 != v2) + } + + The third comparison function makes all numbers, and numeric strings +without any leading or trailing spaces, come out first during loop +traversal: + + function cmp_num_str_val(i1, v1, i2, v2, n1, n2) + { + # numbers before string value comparison, ascending order + n1 = v1 + 0 + n2 = v2 + 0 + if (n1 == v1) + return (n2 == v2) ? (n1 - n2) : -1 + else if (n2 == v2) + return 1 + return (v1 < v2) ? 
-1 : (v1 != v2) + } + + Here is a main program to demonstrate how `gawk' behaves using each +of the previous functions: + + BEGIN { + data["one"] = 10 + data["two"] = 20 + data[10] = "one" + data[100] = 100 + data[20] = "two" + + f[1] = "cmp_num_idx" + f[2] = "cmp_str_val" + f[3] = "cmp_num_str_val" + for (i = 1; i <= 3; i++) { + printf("Sort function: %s\n", f[i]) + PROCINFO["sorted_in"] = f[i] + for (j in data) + printf("\tdata[%s] = %s\n", j, data[j]) + print "" + } + } + + Here are the results when the program is run: + + $ gawk -f compdemo.awk + -| Sort function: cmp_num_idx Sort by numeric index + -| data[two] = 20 + -| data[one] = 10 Both strings are numerically zero + -| data[10] = one + -| data[20] = two + -| data[100] = 100 + -| + -| Sort function: cmp_str_val Sort by element values as strings + -| data[one] = 10 + -| data[100] = 100 String 100 is less than string 20 + -| data[two] = 20 + -| data[10] = one + -| data[20] = two + -| + -| Sort function: cmp_num_str_val Sort all numeric values before all strings + -| data[one] = 10 + -| data[two] = 20 + -| data[100] = 100 + -| data[10] = one + -| data[20] = two + + Consider sorting the entries of a GNU/Linux system password file +according to login name. The following program sorts records by a +specific field position and can be used for this purpose: + + # sort.awk --- simple program to sort by field position + # field position is specified by the global variable POS + + function cmp_field(i1, v1, i2, v2) + { + # comparison by value, as string, and ascending order + return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS]) + } + + { + for (i = 1; i <= NF; i++) + a[NR][i] = $i + } + + END { + PROCINFO["sorted_in"] = "cmp_field" + if (POS < 1 || POS > NF) + POS = 1 + for (i in a) { + for (j = 1; j <= NF; j++) + printf("%s%c", a[i][j], j < NF ? ":" : "") + print "" + } + } + + The first field in each entry of the password file is the user's +login name, and the fields are separated by colons. 
Each record +defines a subarray, with each field as an element in the subarray. +Running the program produces the following output: + + $ gawk -v POS=1 -F: -f sort.awk /etc/passwd + -| adm:x:3:4:adm:/var/adm:/sbin/nologin + -| apache:x:48:48:Apache:/var/www:/sbin/nologin + -| avahi:x:70:70:Avahi daemon:/:/sbin/nologin + ... + + The comparison should normally always return the same value when +given a specific pair of array elements as its arguments. If +inconsistent results are returned then the order is undefined. This +behavior can be exploited to introduce random order into otherwise +seemingly ordered data: + + function cmp_randomize(i1, v1, i2, v2) + { + # random order + return (2 - 4 * rand()) + } + + As mentioned above, the order of the indices is arbitrary if two +elements compare equal. This is usually not a problem, but letting the +tied elements come out in arbitrary order can be an issue, especially +when comparing item values. The partial ordering of the equal elements +may change during the next loop traversal, if other elements are added +or removed from the array. One way to resolve ties when comparing +elements with otherwise equal values is to include the indices in the +comparison rules. Note that doing this may make the loop traversal +less efficient, so consider it only if necessary. The following +comparison functions force a deterministic order, and are based on the +fact that the indices of two elements are never equal: + + function cmp_numeric(i1, v1, i2, v2) + { + # numerical value (and index) comparison, descending order + return (v1 != v2) ? (v2 - v1) : (i2 - i1) + } + + function cmp_string(i1, v1, i2, v2) + { + # string value (and index) comparison, descending order + v1 = v1 i1 + v2 = v2 i2 + return (v1 > v2) ? -1 : (v1 != v2) + } + + A custom comparison function can often simplify ordered loop +traversal, and the sky is really the limit when it comes to designing +such a function. 
+ + When string comparisons are made during a sort, either for element +values where one or both aren't numbers, or for element indices handled +as strings, the value of `IGNORECASE' (*note Built-in Variables::) +controls whether the comparisons treat corresponding uppercase and +lowercase letters as equivalent or distinct. + + Another point to keep in mind is that in the case of subarrays the +element values can themselves be arrays; a production comparison +function should use the `isarray()' function (*note Type Functions::), +to check for this, and choose a defined sorting order for subarrays. + + All sorting based on `PROCINFO["sorted_in"]' is disabled in POSIX +mode, since the `PROCINFO' array is not special in that case. + + As a side note, sorting the array indices before traversing the +array has been reported to add 15% to 20% overhead to the execution +time of `awk' programs. For this reason, sorted array traversal is not +the default. + + +File: gawk.info, Node: Array Sorting Functions, Prev: Controlling Array Traversal, Up: Array Sorting + +13.2.2 Sorting Array Values and Indices with `gawk' +--------------------------------------------------- + +In most `awk' implementations, sorting an array requires writing a +`sort()' function. While this can be educational for exploring +different sorting algorithms, usually that's not the point of the +program. `gawk' provides the built-in `asort()' and `asorti()' +functions (*note String Functions::) for sorting arrays. For example: + + POPULATE THE ARRAY data + n = asort(data) + for (i = 1; i <= n; i++) + DO SOMETHING WITH data[i] + + After the call to `asort()', the array `data' is indexed from 1 to +some number N, the total number of elements in `data'. (This count is +`asort()''s return value.) `data[1]' <= `data[2]' <= `data[3]', and so +on. The comparison is based on the type of the elements (*note Typing +and Comparison::). 
All numeric values come before all string values, +which in turn come before all subarrays. + + An important side effect of calling `asort()' is that _the array's +original indices are irrevocably lost_. As this isn't always +desirable, `asort()' accepts a second argument: + + POPULATE THE ARRAY source + n = asort(source, dest) + for (i = 1; i <= n; i++) + DO SOMETHING WITH dest[i] + + In this case, `gawk' copies the `source' array into the `dest' array +and then sorts `dest', destroying its indices. However, the `source' +array is not affected. + + `asort()' accepts a third string argument to control comparison of +array elements. As with `PROCINFO["sorted_in"]', this argument may be +one of the predefined names that `gawk' provides (*note Controlling +Scanning::), or the name of a user-defined function (*note Controlling +Array Traversal::). + + NOTE: In all cases, the sorted element values consist of the + original array's element values. The ability to control + comparison merely affects the way in which they are sorted. + + Often, what's needed is to sort on the values of the _indices_ +instead of the values of the elements. To do that, use the `asorti()' +function. The interface is identical to that of `asort()', except that +the index values are used for sorting, and become the values of the +result array: + + { source[$0] = some_func($0) } + + END { + n = asorti(source, dest) + for (i = 1; i <= n; i++) { + Work with sorted indices directly: + DO SOMETHING WITH dest[i] + ... + Access original array via sorted indices: + DO SOMETHING WITH source[dest[i]] + } + } + + Similar to `asort()', in all cases, the sorted element values +consist of the original array's indices. The ability to control +comparison merely affects the way in which they are sorted. + + Sorting the array by replacing the indices provides maximal +flexibility. 
To traverse the elements in decreasing order, use a loop +that goes from N down to 1, either over the elements or over the +indices.(1) + + Copying array indices and elements isn't expensive in terms of +memory. Internally, `gawk' maintains "reference counts" to data. For +example, when `asort()' copies the first array to the second one, there +is only one copy of the original array elements' data, even though both +arrays use the values. + + Because `IGNORECASE' affects string comparisons, the value of +`IGNORECASE' also affects sorting for both `asort()' and `asorti()'. +Note also that the locale's sorting order does _not_ come into play; +comparisons are based on character values only.(2) Caveat Emptor. + + ---------- Footnotes ---------- + + (1) You may also use one of the predefined sorting names that sorts +in decreasing order. + + (2) This is true because locale-based comparison occurs only when in +POSIX compatibility mode, and since `asort()' and `asorti()' are `gawk' +extensions, they are not available in that case. + + +File: gawk.info, Node: Two-way I/O, Next: TCP/IP Networking, Prev: Array Sorting, Up: Advanced Features + +13.3 Two-Way Communications with Another Process +================================================ + + From: brennan@whidbey.com (Mike Brennan) + Newsgroups: comp.lang.awk + Subject: Re: Learn the SECRET to Attract Women Easily + Date: 4 Aug 1997 17:34:46 GMT + Message-ID: <5s53rm$eca@news.whidbey.com> + + On 3 Aug 1997 13:17:43 GMT, Want More Dates??? + <tracy78@kilgrona.com> wrote: + >Learn the SECRET to Attract Women Easily + > + >The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women + + The scent of awk programmers is a lot more attractive to women than + the scent of perl programmers. + -- + Mike Brennan + + It is often useful to be able to send data to a separate program for +processing and then read the result. This can always be done with +temporary files: + + # Write the data for processing + tempfile = ("mydata." 
PROCINFO["pid"]) + while (NOT DONE WITH DATA) + print DATA | ("subprogram > " tempfile) + close("subprogram > " tempfile) + + # Read the results, remove tempfile when done + while ((getline newdata < tempfile) > 0) + PROCESS newdata APPROPRIATELY + close(tempfile) + system("rm " tempfile) + +This works, but not elegantly. Among other things, it requires that +the program be run in a directory that cannot be shared among users; +for example, `/tmp' will not do, as another user might happen to be +using a temporary file with the same name. + + However, with `gawk', it is possible to open a _two-way_ pipe to +another process. The second process is termed a "coprocess", since it +runs in parallel with `gawk'. The two-way connection is created using +the `|&' operator (borrowed from the Korn shell, `ksh'):(1) + + do { + print DATA |& "subprogram" + "subprogram" |& getline results + } while (DATA LEFT TO PROCESS) + close("subprogram") + + The first time an I/O operation is executed using the `|&' operator, +`gawk' creates a two-way pipeline to a child process that runs the +other program. Output created with `print' or `printf' is written to +the program's standard input, and output from the program's standard +output can be read by the `gawk' program using `getline'. As is the +case with processes started by `|', the subprogram can be any program, +or pipeline of programs, that can be started by the shell. + + There are some cautionary items to be aware of: + + * As the code inside `gawk' currently stands, the coprocess's + standard error goes to the same place that the parent `gawk''s + standard error goes. It is not possible to read the child's + standard error separately. + + * I/O buffering may be a problem. `gawk' automatically flushes all + output down the pipe to the coprocess. However, if the coprocess + does not flush its output, `gawk' may hang when doing a `getline' + in order to read the coprocess's results. 
This could lead to a + situation known as "deadlock", where each process is waiting for + the other one to do something. + + It is possible to close just one end of the two-way pipe to a +coprocess, by supplying a second argument to the `close()' function of +either `"to"' or `"from"' (*note Close Files And Pipes::). These +strings tell `gawk' to close the end of the pipe that sends data to the +coprocess or the end that reads from it, respectively. + + This is particularly necessary in order to use the system `sort' +utility as part of a coprocess; `sort' must read _all_ of its input +data before it can produce any output. The `sort' program does not +receive an end-of-file indication until `gawk' closes the write end of +the pipe. + + When you have finished writing data to the `sort' utility, you can +close the `"to"' end of the pipe, and then start reading sorted data +via `getline'. For example: + + BEGIN { + command = "LC_ALL=C sort" + n = split("abcdefghijklmnopqrstuvwxyz", a, "") + + for (i = n; i > 0; i--) + print a[i] |& command + close(command, "to") + + while ((command |& getline line) > 0) + print "got", line + close(command) + } + + This program writes the letters of the alphabet in reverse order, one +per line, down the two-way pipe to `sort'. It then closes the write +end of the pipe, so that `sort' receives an end-of-file indication. +This causes `sort' to sort the data and write the sorted data back to +the `gawk' program. Once all of the data has been read, `gawk' +terminates the coprocess and exits. + + As a side note, the assignment `LC_ALL=C' in the `sort' command +ensures traditional Unix (ASCII) sorting from `sort'. + + You may also use pseudo-ttys (ptys) for two-way communication +instead of pipes, if your system supports them. 
This is done on a +per-command basis, by setting a special element in the `PROCINFO' array +(*note Auto-set::), like so: + + command = "sort -nr" # command, save in convenience variable + PROCINFO[command, "pty"] = 1 # update PROCINFO + print ... |& command # start two-way pipe + ... + +Using ptys avoids the buffer deadlock issues described earlier, at some +loss in performance. If your system does not have ptys, or if all the +system's ptys are in use, `gawk' automatically falls back to using +regular pipes. + + ---------- Footnotes ---------- + + (1) This is very different from the same operator in the C shell. + + +File: gawk.info, Node: TCP/IP Networking, Next: Profiling, Prev: Two-way I/O, Up: Advanced Features + +13.4 Using `gawk' for Network Programming +========================================= + + `EMISTERED': + A host is a host from coast to coast, + and no-one can talk to host that's close, + unless the host that isn't close + is busy hung or dead. + + In addition to being able to open a two-way pipeline to a coprocess +on the same system (*note Two-way I/O::), it is possible to make a +two-way connection to another process on another system across an IP +network connection. + + You can think of this as just a _very long_ two-way pipeline to a +coprocess. The way `gawk' decides that you want to use TCP/IP +networking is by recognizing special file names that begin with one of +`/inet/', `/inet4/' or `/inet6/'. + + The full syntax of the special file name is +`/NET-TYPE/PROTOCOL/LOCAL-PORT/REMOTE-HOST/REMOTE-PORT'. The +components are: + +NET-TYPE + Specifies the kind of Internet connection to make. Use `/inet4/' + to force IPv4, and `/inet6/' to force IPv6. Plain `/inet/' (which + used to be the only option) uses the system default, most likely + IPv4. + +PROTOCOL + The protocol to use over IP. This must be either `tcp', or `udp', + for a TCP or UDP IP connection, respectively. The use of TCP is + recommended for most applications. 
+ +LOCAL-PORT + The local TCP or UDP port number to use. Use a port number of `0' + when you want the system to pick a port. This is what you should do + when writing a TCP or UDP client. You may also use a well-known + service name, such as `smtp' or `http', in which case `gawk' + attempts to determine the predefined port number using the C + `getaddrinfo()' function. + +REMOTE-HOST + The IP address or fully-qualified domain name of the Internet host + to which you want to connect. + +REMOTE-PORT + The TCP or UDP port number to use on the given REMOTE-HOST. + Again, use `0' if you don't care, or else a well-known service + name. + + NOTE: Failure in opening a two-way socket will result in a + non-fatal error being returned to the calling code. The value of + `ERRNO' indicates the error (*note Auto-set::). + + Consider the following very simple example: + + BEGIN { + Service = "/inet/tcp/0/localhost/daytime" + Service |& getline + print $0 + close(Service) + } + + This program reads the current date and time from the local system's +TCP `daytime' server. It then prints the results and closes the +connection. + + Because this topic is extensive, the use of `gawk' for TCP/IP +programming is documented separately. See *note (General +Introduction)Top:: gawkinet, TCP/IP Internetworking with `gawk', for a +much more complete introduction and discussion, as well as extensive +examples. + + +File: gawk.info, Node: Profiling, Prev: TCP/IP Networking, Up: Advanced Features + +13.5 Profiling Your `awk' Programs +================================== + +You may produce execution traces of your `awk' programs. This is done +by passing the option `--profile' to `gawk'. When `gawk' has finished +running, it creates a profile of your program in a file named +`awkprof.out'. Because it is profiling, it also executes up to 45% +slower than `gawk' normally does. 
+ + As shown in the following example, the `--profile' option can be +used to change the name of the file where `gawk' will write the profile: + + gawk --profile=myprog.prof -f myprog.awk data1 data2 + +In the above example, `gawk' places the profile in `myprog.prof' +instead of in `awkprof.out'. + + Here is a sample session showing a simple `awk' program, its input +data, and the results from running `gawk' with the `--profile' option. +First, the `awk' program: + + BEGIN { print "First BEGIN rule" } + + END { print "First END rule" } + + /foo/ { + print "matched /foo/, gosh" + for (i = 1; i <= 3; i++) + sing() + } + + { + if (/foo/) + print "if is true" + else + print "else is true" + } + + BEGIN { print "Second BEGIN rule" } + + END { print "Second END rule" } + + function sing( dummy) + { + print "I gotta be me!" + } + + Following is the input data: + + foo + bar + baz + foo + junk + + Here is the `awkprof.out' that results from running the `gawk' +profiler on this program and data (this example also illustrates that +`awk' programmers sometimes have to work late): + + # gawk profile, created Sun Aug 13 00:00:15 2000 + + # BEGIN block(s) + + BEGIN { + 1 print "First BEGIN rule" + 1 print "Second BEGIN rule" + } + + # Rule(s) + + 5 /foo/ { # 2 + 2 print "matched /foo/, gosh" + 6 for (i = 1; i <= 3; i++) { + 6 sing() + } + } + + 5 { + 5 if (/foo/) { # 2 + 2 print "if is true" + 3 } else { + 3 print "else is true" + } + } + + # END block(s) + + END { + 1 print "First END rule" + 1 print "Second END rule" + } + + # Functions, listed alphabetically + + 6 function sing(dummy) + { + 6 print "I gotta be me!" + } + + This example illustrates many of the basic features of profiling +output. They are as follows: + + * The program is printed in the order `BEGIN' rule, `BEGINFILE' rule, + pattern/action rules, `ENDFILE' rule, `END' rule and functions, + listed alphabetically. 
Multiple `BEGIN' and `END' rules are + merged together, as are multiple `BEGINFILE' and `ENDFILE' rules. + + * Pattern-action rules have two counts. The first count, to the + left of the rule, shows how many times the rule's pattern was + _tested_. The second count, to the right of the rule's opening + left brace in a comment, shows how many times the rule's action + was _executed_. The difference between the two indicates how many + times the rule's pattern evaluated to false. + + * Similarly, the count for an `if'-`else' statement shows how many + times the condition was tested. To the right of the opening left + brace for the `if''s body is a count showing how many times the + condition was true. The count for the `else' indicates how many + times the test failed. + + * The count for a loop header (such as `for' or `while') shows how + many times the loop test was executed. (Because of this, you + can't just look at the count on the first statement in a rule to + determine how many times the rule was executed. If the first + statement is a loop, the count is misleading.) + + * For user-defined functions, the count next to the `function' + keyword indicates how many times the function was called. The + counts next to the statements in the body show how many times + those statements were executed. + + * The layout uses "K&R" style with TABs. Braces are used + everywhere, even when the body of an `if', `else', or loop is only + a single statement. + + * Parentheses are used only where needed, as indicated by the + structure of the program and the precedence rules. For example, + `(3 + 5) * 4' means add three plus five, then multiply the total + by four. However, `3 + 5 * 4' has no parentheses, and means `3 + + (5 * 4)'. + + * Parentheses are used around the arguments to `print' and `printf' + only when the `print' or `printf' statement is followed by a + redirection. Similarly, if the target of a redirection isn't a + scalar, it gets parenthesized. 
+ + * `gawk' supplies leading comments in front of the `BEGIN' and `END' + rules, the pattern/action rules, and the functions. + + + The profiled version of your program may not look exactly like what +you typed when you wrote it. This is because `gawk' creates the +profiled version by "pretty printing" its internal representation of +the program. The advantage to this is that `gawk' can produce a +standard representation. The disadvantage is that all source-code +comments are lost, as are the distinctions among multiple `BEGIN', +`END', `BEGINFILE', and `ENDFILE' rules. Also, things such as: + + /foo/ + +come out as: + + /foo/ { + print $0 + } + +which is correct, but possibly surprising. + + Besides creating profiles when a program has completed, `gawk' can +produce a profile while it is running. This is useful if your `awk' +program goes into an infinite loop and you want to see what has been +executed. To use this feature, run `gawk' with the `--profile' option +in the background: + + $ gawk --profile -f myprog & + [1] 13992 + +The shell prints a job number and process ID number; in this case, +13992. Use the `kill' command to send the `USR1' signal to `gawk': + + $ kill -USR1 13992 + +As usual, the profiled version of the program is written to +`awkprof.out', or to a different file if one is specified with the +`--profile' option. + + Along with the regular profile, as shown earlier, the profile +includes a trace of any active functions: + + # Function Call Stack: + + # 3. baz + # 2. bar + # 1. foo + # -- main -- + + You may send `gawk' the `USR1' signal as many times as you like. +Each time, the profile and function call trace are appended to the +output profile file. + + If you use the `HUP' signal instead of the `USR1' signal, `gawk' +produces the profile and the function call trace and then exits. + + When `gawk' runs on MS-Windows systems, it uses the `INT' and `QUIT' +signals for producing the profile and, in the case of the `INT' signal, +`gawk' exits. 
This is because these systems don't support the `kill' +command, so the only signals you can deliver to a program are those +generated by the keyboard. The `INT' signal is generated by the +`Ctrl-<C>' or `Ctrl-<BREAK>' key, while the `QUIT' signal is generated +by the `Ctrl-<\>' key. + + Finally, `gawk' also accepts another option `--pretty-print'. When +called this way, `gawk' "pretty prints" the program into `awkprof.out', +without any execution counts. + + +File: gawk.info, Node: Debugger, Next: Arbitrary Precision Arithmetic, Prev: Advanced Features, Up: Top 14 Debugging `awk' Programs *************************** @@ -29489,8 +29510,8 @@ Index * break debugger command: Breakpoint Control. (line 11) * break statement: Break Statement. (line 6) * Brennan, Michael <1>: Other Versions. (line 6) -* Brennan, Michael <2>: Simple Sed. (line 25) -* Brennan, Michael <3>: Two-way I/O. (line 6) +* Brennan, Michael <2>: Two-way I/O. (line 6) +* Brennan, Michael <3>: Simple Sed. (line 25) * Brennan, Michael: Delete. (line 56) * Brian Kernighan's awk, extensions <1>: Other Versions. (line 13) * Brian Kernighan's awk, extensions: BTL. (line 6) @@ -31042,10 +31063,10 @@ Index * private variables: Library Names. (line 11) * processes, two-way communications with: Two-way I/O. (line 23) * processing data: Basic High Level. (line 6) -* PROCINFO array <1>: Id Program. (line 15) -* PROCINFO array <2>: Group Functions. (line 6) -* PROCINFO array <3>: Passwd Functions. (line 6) -* PROCINFO array <4>: Two-way I/O. (line 116) +* PROCINFO array <1>: Two-way I/O. (line 116) +* PROCINFO array <2>: Id Program. (line 15) +* PROCINFO array <3>: Group Functions. (line 6) +* PROCINFO array <4>: Passwd Functions. (line 6) * PROCINFO array <5>: Time Functions. (line 46) * PROCINFO array <6>: Auto-set. (line 130) * PROCINFO array: Obsolete. 
(line 11) @@ -31672,507 +31693,507 @@ Node: History47786 Node: Names50177 Ref: Names-Footnote-151654 Node: This Manual51726 -Ref: This Manual-Footnote-156854 -Node: Conventions56954 -Node: Manual History59088 -Ref: Manual History-Footnote-162358 -Ref: Manual History-Footnote-262399 -Node: How To Contribute62473 -Node: Acknowledgments63617 -Node: Getting Started68113 -Node: Running gawk70492 -Node: One-shot71678 -Node: Read Terminal72903 -Ref: Read Terminal-Footnote-174553 -Ref: Read Terminal-Footnote-274829 -Node: Long75000 -Node: Executable Scripts76376 -Ref: Executable Scripts-Footnote-178245 -Ref: Executable Scripts-Footnote-278347 -Node: Comments78894 -Node: Quoting81361 -Node: DOS Quoting85984 -Node: Sample Data Files86659 -Node: Very Simple89691 -Node: Two Rules94290 -Node: More Complex96437 -Ref: More Complex-Footnote-199367 -Node: Statements/Lines99452 -Ref: Statements/Lines-Footnote-1103914 -Node: Other Features104179 -Node: When105107 -Node: Invoking Gawk107254 -Node: Command Line108715 -Node: Options109498 -Ref: Options-Footnote-1124896 -Node: Other Arguments124921 -Node: Naming Standard Input127579 -Node: Environment Variables128673 -Node: AWKPATH Variable129231 -Ref: AWKPATH Variable-Footnote-1131989 -Node: AWKLIBPATH Variable132249 -Node: Other Environment Variables132846 -Node: Exit Status135341 -Node: Include Files136016 -Node: Loading Shared Libraries139585 -Node: Obsolete140810 -Node: Undocumented141507 -Node: Regexp141750 -Node: Regexp Usage143139 -Node: Escape Sequences145165 -Node: Regexp Operators150928 -Ref: Regexp Operators-Footnote-1158308 -Ref: Regexp Operators-Footnote-2158455 -Node: Bracket Expressions158553 -Ref: table-char-classes160443 -Node: GNU Regexp Operators162966 -Node: Case-sensitivity166689 -Ref: Case-sensitivity-Footnote-1169657 -Ref: Case-sensitivity-Footnote-2169892 -Node: Leftmost Longest170000 -Node: Computed Regexps171201 -Node: Reading Files174611 -Node: Records176614 -Ref: Records-Footnote-1185538 -Node: Fields185575 
-Ref: Fields-Footnote-1188608 -Node: Nonconstant Fields188694 -Node: Changing Fields190896 -Node: Field Separators196877 -Node: Default Field Splitting199506 -Node: Regexp Field Splitting200623 -Node: Single Character Fields203965 -Node: Command Line Field Separator205024 -Node: Field Splitting Summary208465 -Ref: Field Splitting Summary-Footnote-1211657 -Node: Constant Size211758 -Node: Splitting By Content216342 -Ref: Splitting By Content-Footnote-1220068 -Node: Multiple Line220108 -Ref: Multiple Line-Footnote-1225955 -Node: Getline226134 -Node: Plain Getline228350 -Node: Getline/Variable230439 -Node: Getline/File231580 -Node: Getline/Variable/File232902 -Ref: Getline/Variable/File-Footnote-1234501 -Node: Getline/Pipe234588 -Node: Getline/Variable/Pipe237148 -Node: Getline/Coprocess238255 -Node: Getline/Variable/Coprocess239498 -Node: Getline Notes240212 -Node: Getline Summary242999 -Ref: table-getline-variants243407 -Node: Read Timeout244263 -Ref: Read Timeout-Footnote-1248008 -Node: Command line directories248065 -Node: Printing248695 -Node: Print250326 -Node: Print Examples251663 -Node: Output Separators254447 -Node: OFMT256207 -Node: Printf257565 -Node: Basic Printf258471 -Node: Control Letters260010 -Node: Format Modifiers263822 -Node: Printf Examples269831 -Node: Redirection272546 -Node: Special Files279530 -Node: Special FD280063 -Ref: Special FD-Footnote-1283688 -Node: Special Network283762 -Node: Special Caveats284612 -Node: Close Files And Pipes285408 -Ref: Close Files And Pipes-Footnote-1292431 -Ref: Close Files And Pipes-Footnote-2292579 -Node: Expressions292729 -Node: Values293861 -Node: Constants294537 -Node: Scalar Constants295217 -Ref: Scalar Constants-Footnote-1296076 -Node: Nondecimal-numbers296258 -Node: Regexp Constants299317 -Node: Using Constant Regexps299792 -Node: Variables302847 -Node: Using Variables303502 -Node: Assignment Options305226 -Node: Conversion307098 -Ref: table-locale-affects312474 -Ref: Conversion-Footnote-1313098 -Node: All 
Operators313207 -Node: Arithmetic Ops313837 -Node: Concatenation316342 -Ref: Concatenation-Footnote-1319135 -Node: Assignment Ops319255 -Ref: table-assign-ops324243 -Node: Increment Ops325651 -Node: Truth Values and Conditions329121 -Node: Truth Values330204 -Node: Typing and Comparison331253 -Node: Variable Typing332042 -Ref: Variable Typing-Footnote-1335939 -Node: Comparison Operators336061 -Ref: table-relational-ops336471 -Node: POSIX String Comparison340020 -Ref: POSIX String Comparison-Footnote-1340976 -Node: Boolean Ops341114 -Ref: Boolean Ops-Footnote-1345192 -Node: Conditional Exp345283 -Node: Function Calls347015 -Node: Precedence350609 -Node: Locales354278 -Node: Patterns and Actions355367 -Node: Pattern Overview356421 -Node: Regexp Patterns358090 -Node: Expression Patterns358633 -Node: Ranges362318 -Node: BEGIN/END365284 -Node: Using BEGIN/END366046 -Ref: Using BEGIN/END-Footnote-1368777 -Node: I/O And BEGIN/END368883 -Node: BEGINFILE/ENDFILE371165 -Node: Empty374069 -Node: Using Shell Variables374385 -Node: Action Overview376670 -Node: Statements379027 -Node: If Statement380881 -Node: While Statement382380 -Node: Do Statement384424 -Node: For Statement385580 -Node: Switch Statement388732 -Node: Break Statement390829 -Node: Continue Statement392819 -Node: Next Statement394612 -Node: Nextfile Statement397002 -Node: Exit Statement399643 -Node: Built-in Variables402059 -Node: User-modified403154 -Ref: User-modified-Footnote-1411509 -Node: Auto-set411571 -Ref: Auto-set-Footnote-1423922 -Ref: Auto-set-Footnote-2424127 -Node: ARGC and ARGV424183 -Node: Arrays428034 -Node: Array Basics429539 -Node: Array Intro430365 -Node: Reference to Elements434683 -Node: Assigning Elements436953 -Node: Array Example437444 -Node: Scanning an Array439176 -Node: Controlling Scanning441490 -Ref: Controlling Scanning-Footnote-1446423 -Node: Delete446739 -Ref: Delete-Footnote-1449504 -Node: Numeric Array Subscripts449561 -Node: Uninitialized Subscripts451744 -Node: 
Multi-dimensional453372 -Node: Multi-scanning456466 -Node: Arrays of Arrays458057 -Node: Functions462702 -Node: Built-in463524 -Node: Calling Built-in464602 -Node: Numeric Functions466590 -Ref: Numeric Functions-Footnote-1470422 -Ref: Numeric Functions-Footnote-2470779 -Ref: Numeric Functions-Footnote-3470827 -Node: String Functions471096 -Ref: String Functions-Footnote-1494593 -Ref: String Functions-Footnote-2494722 -Ref: String Functions-Footnote-3494970 -Node: Gory Details495057 -Ref: table-sub-escapes496736 -Ref: table-sub-posix-92498090 -Ref: table-sub-proposed499433 -Ref: table-posix-sub500783 -Ref: table-gensub-escapes502329 -Ref: Gory Details-Footnote-1503536 -Ref: Gory Details-Footnote-2503587 -Node: I/O Functions503738 -Ref: I/O Functions-Footnote-1510393 -Node: Time Functions510540 -Ref: Time Functions-Footnote-1521432 -Ref: Time Functions-Footnote-2521500 -Ref: Time Functions-Footnote-3521658 -Ref: Time Functions-Footnote-4521769 -Ref: Time Functions-Footnote-5521881 -Ref: Time Functions-Footnote-6522108 -Node: Bitwise Functions522374 -Ref: table-bitwise-ops522932 -Ref: Bitwise Functions-Footnote-1527153 -Node: Type Functions527337 -Node: I18N Functions527807 -Node: User-defined529434 -Node: Definition Syntax530238 -Ref: Definition Syntax-Footnote-1535148 -Node: Function Example535217 -Node: Function Caveats537811 -Node: Calling A Function538232 -Node: Variable Scope539347 -Node: Pass By Value/Reference541322 -Node: Return Statement544762 -Node: Dynamic Typing547743 -Node: Indirect Calls548478 -Node: Internationalization558163 -Node: I18N and L10N559589 -Node: Explaining gettext560275 -Ref: Explaining gettext-Footnote-1565341 -Ref: Explaining gettext-Footnote-2565525 -Node: Programmer i18n565690 -Node: Translator i18n569890 -Node: String Extraction570683 -Ref: String Extraction-Footnote-1571644 -Node: Printf Ordering571730 -Ref: Printf Ordering-Footnote-1574514 -Node: I18N Portability574578 -Ref: I18N Portability-Footnote-1577027 -Node: I18N 
Example577090 -Ref: I18N Example-Footnote-1579725 -Node: Gawk I18N579797 -Node: Advanced Features580414 -Node: Nondecimal Data581927 -Node: Array Sorting583510 -Node: Controlling Array Traversal584207 -Node: Array Sorting Functions592445 -Ref: Array Sorting Functions-Footnote-1596119 -Ref: Array Sorting Functions-Footnote-2596212 -Node: Two-way I/O596406 -Ref: Two-way I/O-Footnote-1601838 -Node: TCP/IP Networking601908 -Node: Profiling604752 -Node: Library Functions612206 -Ref: Library Functions-Footnote-1615213 -Node: Library Names615384 -Ref: Library Names-Footnote-1618855 -Ref: Library Names-Footnote-2619075 -Node: General Functions619161 -Node: Strtonum Function620114 -Node: Assert Function623044 -Node: Round Function626370 -Node: Cliff Random Function627913 -Node: Ordinal Functions628929 -Ref: Ordinal Functions-Footnote-1631999 -Ref: Ordinal Functions-Footnote-2632251 -Node: Join Function632460 -Ref: Join Function-Footnote-1634231 -Node: Getlocaltime Function634431 -Node: Data File Management638146 -Node: Filetrans Function638778 -Node: Rewind Function642917 -Node: File Checking644304 -Node: Empty Files645398 -Node: Ignoring Assigns647628 -Node: Getopt Function649181 -Ref: Getopt Function-Footnote-1660485 -Node: Passwd Functions660688 -Ref: Passwd Functions-Footnote-1669663 -Node: Group Functions669751 -Node: Walking Arrays677835 -Node: Sample Programs679404 -Node: Running Examples680069 -Node: Clones680797 -Node: Cut Program682021 -Node: Egrep Program691866 -Ref: Egrep Program-Footnote-1699639 -Node: Id Program699749 -Node: Split Program703365 -Ref: Split Program-Footnote-1706884 -Node: Tee Program707012 -Node: Uniq Program709815 -Node: Wc Program717244 -Ref: Wc Program-Footnote-1721510 -Ref: Wc Program-Footnote-2721710 -Node: Miscellaneous Programs721802 -Node: Dupword Program722990 -Node: Alarm Program725021 -Node: Translate Program729770 -Ref: Translate Program-Footnote-1734157 -Ref: Translate Program-Footnote-2734385 -Node: Labels Program734519 -Ref: 
Labels Program-Footnote-1737890 -Node: Word Sorting737974 -Node: History Sorting741858 -Node: Extract Program743697 -Ref: Extract Program-Footnote-1751180 -Node: Simple Sed751308 -Node: Igawk Program754370 -Ref: Igawk Program-Footnote-1769527 -Ref: Igawk Program-Footnote-2769728 -Node: Anagram Program769866 -Node: Signature Program772934 -Node: Debugger774034 -Node: Debugging775000 -Node: Debugging Concepts775433 -Node: Debugging Terms777289 -Node: Awk Debugging779886 -Node: Sample Debugging Session780778 -Node: Debugger Invocation781298 -Node: Finding The Bug782627 -Node: List of Debugger Commands789115 -Node: Breakpoint Control790449 -Node: Debugger Execution Control794113 -Node: Viewing And Changing Data797473 -Node: Execution Stack800829 -Node: Debugger Info802296 -Node: Miscellaneous Debugger Commands806277 -Node: Readline Support811722 -Node: Limitations812553 -Node: Arbitrary Precision Arithmetic814805 -Ref: Arbitrary Precision Arithmetic-Footnote-1816447 -Node: General Arithmetic816595 -Node: Floating Point Issues818315 -Node: String Conversion Precision819196 -Ref: String Conversion Precision-Footnote-1820902 -Node: Unexpected Results821011 -Node: POSIX Floating Point Problems823164 -Ref: POSIX Floating Point Problems-Footnote-1826989 -Node: Integer Programming827027 -Node: Floating-point Programming828780 -Ref: Floating-point Programming-Footnote-1835089 -Node: Floating-point Representation835353 -Node: Floating-point Context836518 -Ref: table-ieee-formats837360 -Node: Rounding Mode838744 -Ref: table-rounding-modes839223 -Ref: Rounding Mode-Footnote-1842227 -Node: Gawk and MPFR842408 -Node: Arbitrary Precision Floats843650 -Ref: Arbitrary Precision Floats-Footnote-1846079 -Node: Setting Precision846390 -Node: Setting Rounding Mode849123 -Ref: table-gawk-rounding-modes849527 -Node: Floating-point Constants850707 -Node: Changing Precision852131 -Ref: Changing Precision-Footnote-1853531 -Node: Exact Arithmetic853705 -Node: Arbitrary Precision Integers856813 
-Ref: Arbitrary Precision Integers-Footnote-1859813 -Node: Dynamic Extensions859960 -Node: Extension Intro861283 -Node: Plugin License862486 -Node: Extension Design863160 -Node: Old Extension Problems864231 -Ref: Old Extension Problems-Footnote-1865741 -Node: Extension New Mechanism Goals865798 -Ref: Extension New Mechanism Goals-Footnote-1868510 -Node: Extension Other Design Decisions868696 -Node: Extension Mechanism Outline870443 -Ref: load-extension871468 -Ref: load-new-function872946 -Ref: call-new-function873927 -Node: Extension Future Growth875908 -Node: Extension API Description876650 -Node: Extension API Functions Introduction877970 -Node: General Data Types882045 -Ref: General Data Types-Footnote-1887678 -Node: Requesting Values887977 -Ref: table-value-types-returned888708 -Node: Constructor Functions889662 -Node: Registration Functions892658 -Node: Extension Functions893343 -Node: Exit Callback Functions895162 -Node: Extension Version String896405 -Node: Input Parsers897055 -Node: Output Wrappers905636 -Node: Two-way processors910029 -Node: Printing Messages912151 -Ref: Printing Messages-Footnote-1913228 -Node: Updating `ERRNO'913380 -Node: Accessing Parameters914119 -Node: Symbol Table Access915349 -Node: Symbol table by name915861 -Ref: Symbol table by name-Footnote-1918033 -Node: Symbol table by cookie918113 -Ref: Symbol table by cookie-Footnote-1922242 -Node: Cached values922305 -Ref: Cached values-Footnote-1925506 -Node: Array Manipulation925597 -Ref: Array Manipulation-Footnote-1926695 -Node: Array Data Types926734 -Ref: Array Data Types-Footnote-1929456 -Node: Array Functions929548 -Node: Flattening Arrays933314 -Node: Creating Arrays940145 -Node: Extension API Variables944941 -Node: Extension Versioning945577 -Node: Extension API Informational Variables947478 -Node: Extension API Boilerplate948564 -Node: Finding Extensions952398 -Node: Extension Example952945 -Node: Internal File Description953683 -Node: Internal File Ops957371 -Ref: Internal File 
Ops-Footnote-1968455 -Node: Using Internal File Ops968595 -Ref: Using Internal File Ops-Footnote-1970951 -Node: Extension Samples971217 -Node: Extension Sample File Functions972660 -Node: Extension Sample Fnmatch981029 -Node: Extension Sample Fork982755 -Node: Extension Sample Ord983969 -Node: Extension Sample Readdir984745 -Node: Extension Sample Revout987083 -Node: Extension Sample Rev2way987676 -Node: Extension Sample Read write array988366 -Node: Extension Sample Readfile990249 -Node: Extension Sample API Tests991004 -Node: Extension Sample Time991529 -Node: gawkextlib992838 -Node: Language History995221 -Node: V7/SVR3.1996743 -Node: SVR4999064 -Node: POSIX1000506 -Node: BTL1001514 -Node: POSIX/GNU1002248 -Node: Common Extensions1007783 -Node: Ranges and Locales1008890 -Ref: Ranges and Locales-Footnote-11013508 -Ref: Ranges and Locales-Footnote-21013535 -Ref: Ranges and Locales-Footnote-31013795 -Node: Contributors1014016 -Node: Installation1018312 -Node: Gawk Distribution1019206 -Node: Getting1019690 -Node: Extracting1020516 -Node: Distribution contents1022208 -Node: Unix Installation1027430 -Node: Quick Installation1028047 -Node: Additional Configuration Options1030009 -Node: Configuration Philosophy1031486 -Node: Non-Unix Installation1033828 -Node: PC Installation1034286 -Node: PC Binary Installation1035585 -Node: PC Compiling1037433 -Node: PC Testing1040377 -Node: PC Using1041553 -Node: Cygwin1045738 -Node: MSYS1046738 -Node: VMS Installation1047252 -Node: VMS Compilation1047855 -Ref: VMS Compilation-Footnote-11048862 -Node: VMS Installation Details1048920 -Node: VMS Running1050555 -Node: VMS Old Gawk1052162 -Node: Bugs1052636 -Node: Other Versions1056488 -Node: Notes1061803 -Node: Compatibility Mode1062390 -Node: Additions1063173 -Node: Accessing The Source1064100 -Node: Adding Code1065526 -Node: New Ports1071568 -Node: Derived Files1075703 -Ref: Derived Files-Footnote-11081008 -Ref: Derived Files-Footnote-21081042 -Ref: Derived Files-Footnote-31081642 
-Node: Future Extensions1081740 -Node: Basic Concepts1083227 -Node: Basic High Level1083908 -Ref: figure-general-flow1084179 -Ref: figure-process-flow1084778 -Ref: Basic High Level-Footnote-11088007 -Node: Basic Data Typing1088192 -Node: Glossary1091547 -Node: Copying1116858 -Node: GNU Free Documentation License1154415 -Node: Index1179552 +Ref: This Manual-Footnote-157632 +Node: Conventions57732 +Node: Manual History59866 +Ref: Manual History-Footnote-163136 +Ref: Manual History-Footnote-263177 +Node: How To Contribute63251 +Node: Acknowledgments64395 +Node: Getting Started68891 +Node: Running gawk71270 +Node: One-shot72456 +Node: Read Terminal73681 +Ref: Read Terminal-Footnote-175331 +Ref: Read Terminal-Footnote-275607 +Node: Long75778 +Node: Executable Scripts77154 +Ref: Executable Scripts-Footnote-179023 +Ref: Executable Scripts-Footnote-279125 +Node: Comments79672 +Node: Quoting82139 +Node: DOS Quoting86762 +Node: Sample Data Files87437 +Node: Very Simple90469 +Node: Two Rules95068 +Node: More Complex97215 +Ref: More Complex-Footnote-1100145 +Node: Statements/Lines100230 +Ref: Statements/Lines-Footnote-1104692 +Node: Other Features104957 +Node: When105885 +Node: Invoking Gawk108032 +Node: Command Line109493 +Node: Options110276 +Ref: Options-Footnote-1125674 +Node: Other Arguments125699 +Node: Naming Standard Input128357 +Node: Environment Variables129451 +Node: AWKPATH Variable130009 +Ref: AWKPATH Variable-Footnote-1132767 +Node: AWKLIBPATH Variable133027 +Node: Other Environment Variables133624 +Node: Exit Status136119 +Node: Include Files136794 +Node: Loading Shared Libraries140363 +Node: Obsolete141588 +Node: Undocumented142285 +Node: Regexp142528 +Node: Regexp Usage143917 +Node: Escape Sequences145943 +Node: Regexp Operators151706 +Ref: Regexp Operators-Footnote-1159086 +Ref: Regexp Operators-Footnote-2159233 +Node: Bracket Expressions159331 +Ref: table-char-classes161221 +Node: GNU Regexp Operators163744 +Node: Case-sensitivity167467 +Ref: 
Case-sensitivity-Footnote-1170435 +Ref: Case-sensitivity-Footnote-2170670 +Node: Leftmost Longest170778 +Node: Computed Regexps171979 +Node: Reading Files175389 +Node: Records177392 +Ref: Records-Footnote-1186316 +Node: Fields186353 +Ref: Fields-Footnote-1189386 +Node: Nonconstant Fields189472 +Node: Changing Fields191674 +Node: Field Separators197655 +Node: Default Field Splitting200284 +Node: Regexp Field Splitting201401 +Node: Single Character Fields204743 +Node: Command Line Field Separator205802 +Node: Field Splitting Summary209243 +Ref: Field Splitting Summary-Footnote-1212435 +Node: Constant Size212536 +Node: Splitting By Content217120 +Ref: Splitting By Content-Footnote-1220846 +Node: Multiple Line220886 +Ref: Multiple Line-Footnote-1226733 +Node: Getline226912 +Node: Plain Getline229128 +Node: Getline/Variable231217 +Node: Getline/File232358 +Node: Getline/Variable/File233680 +Ref: Getline/Variable/File-Footnote-1235279 +Node: Getline/Pipe235366 +Node: Getline/Variable/Pipe237926 +Node: Getline/Coprocess239033 +Node: Getline/Variable/Coprocess240276 +Node: Getline Notes240990 +Node: Getline Summary243777 +Ref: table-getline-variants244185 +Node: Read Timeout245041 +Ref: Read Timeout-Footnote-1248786 +Node: Command line directories248843 +Node: Printing249473 +Node: Print251104 +Node: Print Examples252441 +Node: Output Separators255225 +Node: OFMT256985 +Node: Printf258343 +Node: Basic Printf259249 +Node: Control Letters260788 +Node: Format Modifiers264600 +Node: Printf Examples270609 +Node: Redirection273324 +Node: Special Files280308 +Node: Special FD280841 +Ref: Special FD-Footnote-1284466 +Node: Special Network284540 +Node: Special Caveats285390 +Node: Close Files And Pipes286186 +Ref: Close Files And Pipes-Footnote-1293209 +Ref: Close Files And Pipes-Footnote-2293357 +Node: Expressions293507 +Node: Values294639 +Node: Constants295315 +Node: Scalar Constants295995 +Ref: Scalar Constants-Footnote-1296854 +Node: Nondecimal-numbers297036 +Node: Regexp 
Constants300095 +Node: Using Constant Regexps300570 +Node: Variables303625 +Node: Using Variables304280 +Node: Assignment Options306004 +Node: Conversion307876 +Ref: table-locale-affects313252 +Ref: Conversion-Footnote-1313876 +Node: All Operators313985 +Node: Arithmetic Ops314615 +Node: Concatenation317120 +Ref: Concatenation-Footnote-1319913 +Node: Assignment Ops320033 +Ref: table-assign-ops325021 +Node: Increment Ops326429 +Node: Truth Values and Conditions329899 +Node: Truth Values330982 +Node: Typing and Comparison332031 +Node: Variable Typing332820 +Ref: Variable Typing-Footnote-1336717 +Node: Comparison Operators336839 +Ref: table-relational-ops337249 +Node: POSIX String Comparison340798 +Ref: POSIX String Comparison-Footnote-1341754 +Node: Boolean Ops341892 +Ref: Boolean Ops-Footnote-1345970 +Node: Conditional Exp346061 +Node: Function Calls347793 +Node: Precedence351387 +Node: Locales355056 +Node: Patterns and Actions356145 +Node: Pattern Overview357199 +Node: Regexp Patterns358868 +Node: Expression Patterns359411 +Node: Ranges363096 +Node: BEGIN/END366062 +Node: Using BEGIN/END366824 +Ref: Using BEGIN/END-Footnote-1369555 +Node: I/O And BEGIN/END369661 +Node: BEGINFILE/ENDFILE371943 +Node: Empty374847 +Node: Using Shell Variables375163 +Node: Action Overview377448 +Node: Statements379805 +Node: If Statement381659 +Node: While Statement383158 +Node: Do Statement385202 +Node: For Statement386358 +Node: Switch Statement389510 +Node: Break Statement391607 +Node: Continue Statement393597 +Node: Next Statement395390 +Node: Nextfile Statement397780 +Node: Exit Statement400421 +Node: Built-in Variables402837 +Node: User-modified403932 +Ref: User-modified-Footnote-1412287 +Node: Auto-set412349 +Ref: Auto-set-Footnote-1424700 +Ref: Auto-set-Footnote-2424905 +Node: ARGC and ARGV424961 +Node: Arrays428812 +Node: Array Basics430317 +Node: Array Intro431143 +Node: Reference to Elements435461 +Node: Assigning Elements437731 +Node: Array Example438222 +Node: Scanning an 
Array439954 +Node: Controlling Scanning442268 +Ref: Controlling Scanning-Footnote-1447201 +Node: Delete447517 +Ref: Delete-Footnote-1450282 +Node: Numeric Array Subscripts450339 +Node: Uninitialized Subscripts452522 +Node: Multi-dimensional454150 +Node: Multi-scanning457244 +Node: Arrays of Arrays458835 +Node: Functions463480 +Node: Built-in464299 +Node: Calling Built-in465377 +Node: Numeric Functions467365 +Ref: Numeric Functions-Footnote-1471197 +Ref: Numeric Functions-Footnote-2471554 +Ref: Numeric Functions-Footnote-3471602 +Node: String Functions471871 +Ref: String Functions-Footnote-1495368 +Ref: String Functions-Footnote-2495497 +Ref: String Functions-Footnote-3495745 +Node: Gory Details495832 +Ref: table-sub-escapes497511 +Ref: table-sub-posix-92498865 +Ref: table-sub-proposed500208 +Ref: table-posix-sub501558 +Ref: table-gensub-escapes503104 +Ref: Gory Details-Footnote-1504311 +Ref: Gory Details-Footnote-2504362 +Node: I/O Functions504513 +Ref: I/O Functions-Footnote-1511168 +Node: Time Functions511315 +Ref: Time Functions-Footnote-1522207 +Ref: Time Functions-Footnote-2522275 +Ref: Time Functions-Footnote-3522433 +Ref: Time Functions-Footnote-4522544 +Ref: Time Functions-Footnote-5522656 +Ref: Time Functions-Footnote-6522883 +Node: Bitwise Functions523149 +Ref: table-bitwise-ops523707 +Ref: Bitwise Functions-Footnote-1527928 +Node: Type Functions528112 +Node: I18N Functions528582 +Node: User-defined530209 +Node: Definition Syntax531013 +Ref: Definition Syntax-Footnote-1535923 +Node: Function Example535992 +Node: Function Caveats538586 +Node: Calling A Function539007 +Node: Variable Scope540122 +Node: Pass By Value/Reference542097 +Node: Return Statement545537 +Node: Dynamic Typing548518 +Node: Indirect Calls549253 +Node: Library Functions558938 +Ref: Library Functions-Footnote-1561937 +Node: Library Names562108 +Ref: Library Names-Footnote-1565579 +Ref: Library Names-Footnote-2565799 +Node: General Functions565885 +Node: Strtonum Function566838 +Node: 
Assert Function569768 +Node: Round Function573094 +Node: Cliff Random Function574637 +Node: Ordinal Functions575653 +Ref: Ordinal Functions-Footnote-1578723 +Ref: Ordinal Functions-Footnote-2578975 +Node: Join Function579184 +Ref: Join Function-Footnote-1580955 +Node: Getlocaltime Function581155 +Node: Data File Management584870 +Node: Filetrans Function585502 +Node: Rewind Function589641 +Node: File Checking591028 +Node: Empty Files592122 +Node: Ignoring Assigns594352 +Node: Getopt Function595905 +Ref: Getopt Function-Footnote-1607209 +Node: Passwd Functions607412 +Ref: Passwd Functions-Footnote-1616387 +Node: Group Functions616475 +Node: Walking Arrays624559 +Node: Sample Programs626128 +Node: Running Examples626805 +Node: Clones627533 +Node: Cut Program628757 +Node: Egrep Program638602 +Ref: Egrep Program-Footnote-1646375 +Node: Id Program646485 +Node: Split Program650101 +Ref: Split Program-Footnote-1653620 +Node: Tee Program653748 +Node: Uniq Program656551 +Node: Wc Program663980 +Ref: Wc Program-Footnote-1668246 +Ref: Wc Program-Footnote-2668446 +Node: Miscellaneous Programs668538 +Node: Dupword Program669726 +Node: Alarm Program671757 +Node: Translate Program676506 +Ref: Translate Program-Footnote-1680893 +Ref: Translate Program-Footnote-2681121 +Node: Labels Program681255 +Ref: Labels Program-Footnote-1684626 +Node: Word Sorting684710 +Node: History Sorting688594 +Node: Extract Program690433 +Ref: Extract Program-Footnote-1697916 +Node: Simple Sed698044 +Node: Igawk Program701106 +Ref: Igawk Program-Footnote-1716263 +Ref: Igawk Program-Footnote-2716464 +Node: Anagram Program716602 +Node: Signature Program719670 +Node: Internationalization720770 +Node: I18N and L10N722202 +Node: Explaining gettext722888 +Ref: Explaining gettext-Footnote-1727954 +Ref: Explaining gettext-Footnote-2728138 +Node: Programmer i18n728303 +Node: Translator i18n732503 +Node: String Extraction733296 +Ref: String Extraction-Footnote-1734257 +Node: Printf Ordering734343 +Ref: Printf 
Ordering-Footnote-1737127 +Node: I18N Portability737191 +Ref: I18N Portability-Footnote-1739640 +Node: I18N Example739703 +Ref: I18N Example-Footnote-1742338 +Node: Gawk I18N742410 +Node: Advanced Features743027 +Node: Nondecimal Data744531 +Node: Array Sorting746114 +Node: Controlling Array Traversal746811 +Node: Array Sorting Functions755049 +Ref: Array Sorting Functions-Footnote-1758723 +Ref: Array Sorting Functions-Footnote-2758816 +Node: Two-way I/O759010 +Ref: Two-way I/O-Footnote-1764442 +Node: TCP/IP Networking764512 +Node: Profiling767356 +Node: Debugger774810 +Node: Debugging775778 +Node: Debugging Concepts776211 +Node: Debugging Terms778067 +Node: Awk Debugging780664 +Node: Sample Debugging Session781556 +Node: Debugger Invocation782076 +Node: Finding The Bug783405 +Node: List of Debugger Commands789893 +Node: Breakpoint Control791227 +Node: Debugger Execution Control794891 +Node: Viewing And Changing Data798251 +Node: Execution Stack801607 +Node: Debugger Info803074 +Node: Miscellaneous Debugger Commands807055 +Node: Readline Support812500 +Node: Limitations813331 +Node: Arbitrary Precision Arithmetic815583 +Ref: Arbitrary Precision Arithmetic-Footnote-1817225 +Node: General Arithmetic817373 +Node: Floating Point Issues819093 +Node: String Conversion Precision819974 +Ref: String Conversion Precision-Footnote-1821680 +Node: Unexpected Results821789 +Node: POSIX Floating Point Problems823942 +Ref: POSIX Floating Point Problems-Footnote-1827767 +Node: Integer Programming827805 +Node: Floating-point Programming829558 +Ref: Floating-point Programming-Footnote-1835867 +Node: Floating-point Representation836131 +Node: Floating-point Context837296 +Ref: table-ieee-formats838138 +Node: Rounding Mode839522 +Ref: table-rounding-modes840001 +Ref: Rounding Mode-Footnote-1843005 +Node: Gawk and MPFR843186 +Node: Arbitrary Precision Floats844428 +Ref: Arbitrary Precision Floats-Footnote-1846857 +Node: Setting Precision847168 +Node: Setting Rounding Mode849901 +Ref: 
table-gawk-rounding-modes850305 +Node: Floating-point Constants851485 +Node: Changing Precision852909 +Ref: Changing Precision-Footnote-1854309 +Node: Exact Arithmetic854483 +Node: Arbitrary Precision Integers857591 +Ref: Arbitrary Precision Integers-Footnote-1860591 +Node: Dynamic Extensions860738 +Node: Extension Intro862061 +Node: Plugin License863264 +Node: Extension Design863938 +Node: Old Extension Problems865009 +Ref: Old Extension Problems-Footnote-1866519 +Node: Extension New Mechanism Goals866576 +Ref: Extension New Mechanism Goals-Footnote-1869288 +Node: Extension Other Design Decisions869474 +Node: Extension Mechanism Outline871221 +Ref: load-extension872246 +Ref: load-new-function873724 +Ref: call-new-function874705 +Node: Extension Future Growth876686 +Node: Extension API Description877428 +Node: Extension API Functions Introduction878748 +Node: General Data Types882823 +Ref: General Data Types-Footnote-1888456 +Node: Requesting Values888755 +Ref: table-value-types-returned889486 +Node: Constructor Functions890440 +Node: Registration Functions893436 +Node: Extension Functions894121 +Node: Exit Callback Functions895940 +Node: Extension Version String897183 +Node: Input Parsers897833 +Node: Output Wrappers906414 +Node: Two-way processors910807 +Node: Printing Messages912929 +Ref: Printing Messages-Footnote-1914006 +Node: Updating `ERRNO'914158 +Node: Accessing Parameters914897 +Node: Symbol Table Access916127 +Node: Symbol table by name916639 +Ref: Symbol table by name-Footnote-1918811 +Node: Symbol table by cookie918891 +Ref: Symbol table by cookie-Footnote-1923020 +Node: Cached values923083 +Ref: Cached values-Footnote-1926284 +Node: Array Manipulation926375 +Ref: Array Manipulation-Footnote-1927473 +Node: Array Data Types927512 +Ref: Array Data Types-Footnote-1930234 +Node: Array Functions930326 +Node: Flattening Arrays934092 +Node: Creating Arrays940923 +Node: Extension API Variables945719 +Node: Extension Versioning946355 +Node: Extension API 
Informational Variables948256 +Node: Extension API Boilerplate949342 +Node: Finding Extensions953176 +Node: Extension Example953723 +Node: Internal File Description954461 +Node: Internal File Ops958149 +Ref: Internal File Ops-Footnote-1969233 +Node: Using Internal File Ops969373 +Ref: Using Internal File Ops-Footnote-1971729 +Node: Extension Samples971995 +Node: Extension Sample File Functions973438 +Node: Extension Sample Fnmatch981807 +Node: Extension Sample Fork983533 +Node: Extension Sample Ord984747 +Node: Extension Sample Readdir985523 +Node: Extension Sample Revout987861 +Node: Extension Sample Rev2way988454 +Node: Extension Sample Read write array989144 +Node: Extension Sample Readfile991027 +Node: Extension Sample API Tests991782 +Node: Extension Sample Time992307 +Node: gawkextlib993616 +Node: Language History995999 +Node: V7/SVR3.1997521 +Node: SVR4999842 +Node: POSIX1001284 +Node: BTL1002292 +Node: POSIX/GNU1003026 +Node: Common Extensions1008561 +Node: Ranges and Locales1009668 +Ref: Ranges and Locales-Footnote-11014286 +Ref: Ranges and Locales-Footnote-21014313 +Ref: Ranges and Locales-Footnote-31014573 +Node: Contributors1014794 +Node: Installation1019090 +Node: Gawk Distribution1019984 +Node: Getting1020468 +Node: Extracting1021294 +Node: Distribution contents1022986 +Node: Unix Installation1028208 +Node: Quick Installation1028825 +Node: Additional Configuration Options1030787 +Node: Configuration Philosophy1032264 +Node: Non-Unix Installation1034606 +Node: PC Installation1035064 +Node: PC Binary Installation1036363 +Node: PC Compiling1038211 +Node: PC Testing1041155 +Node: PC Using1042331 +Node: Cygwin1046516 +Node: MSYS1047516 +Node: VMS Installation1048030 +Node: VMS Compilation1048633 +Ref: VMS Compilation-Footnote-11049640 +Node: VMS Installation Details1049698 +Node: VMS Running1051333 +Node: VMS Old Gawk1052940 +Node: Bugs1053414 +Node: Other Versions1057266 +Node: Notes1062581 +Node: Compatibility Mode1063168 +Node: Additions1063951 +Node: 
Accessing The Source1064878 +Node: Adding Code1066304 +Node: New Ports1072346 +Node: Derived Files1076481 +Ref: Derived Files-Footnote-11081786 +Ref: Derived Files-Footnote-21081820 +Ref: Derived Files-Footnote-31082420 +Node: Future Extensions1082518 +Node: Basic Concepts1084005 +Node: Basic High Level1084686 +Ref: figure-general-flow1084957 +Ref: figure-process-flow1085556 +Ref: Basic High Level-Footnote-11088785 +Node: Basic Data Typing1088970 +Node: Glossary1092325 +Node: Copying1117636 +Node: GNU Free Documentation License1155193 +Node: Index1180330 End Tag Table diff --git a/doc/gawk.texi b/doc/gawk.texi index 7584f35f..e4ffc222 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -20,7 +20,7 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH October, 2012 +@set UPDATE-MONTH November, 2012 @set VERSION 4.0 @set PATCHLEVEL 1 @@ -294,13 +294,13 @@ particular records in a file and perform operations upon them. * Arrays:: The description and use of arrays. Also includes array-oriented control statements. * Functions:: Built-in and user-defined functions. +* Library Functions:: A Library of @command{awk} Functions. +* Sample Programs:: Many @command{awk} programs with complete + explanations. * Internationalization:: Getting @command{gawk} to speak your language. * Advanced Features:: Stuff for advanced users, specific to @command{gawk}. -* Library Functions:: A Library of @command{awk} Functions. -* Sample Programs:: Many @command{awk} programs with complete - explanations. * Debugger:: The @code{gawk} debugger. * Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with @command{gawk}. @@ -593,28 +593,6 @@ particular records in a file and perform operations upon them. runtime. * Indirect Calls:: Choosing the function to call at runtime. -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU @code{gettext} works. 
-* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging @code{printf} arguments. -* I18N Portability:: @command{awk}-level portability - issues. -* I18N Example:: A simple i18n example. -* Gawk I18N:: @command{gawk} is also - internationalized. -* Nondecimal Data:: Allowing nondecimal input data. -* Array Sorting:: Facilities for controlling array - traversal and sorting arrays. -* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. -* Array Sorting Functions:: How to use @code{asort()} and - @code{asorti()}. -* Two-way I/O:: Two-way communications with another - process. -* TCP/IP Networking:: Using @command{gawk} for network - programming. -* Profiling:: Profiling your @command{awk} programs. * Library Names:: How to best name private global variables in library functions. * General Functions:: Functions that are of general use. @@ -676,6 +654,28 @@ particular records in a file and perform operations upon them. * Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU @code{gettext} works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging @code{printf} arguments. +* I18N Portability:: @command{awk}-level portability + issues. +* I18N Example:: A simple i18n example. +* Gawk I18N:: @command{gawk} is also + internationalized. +* Nondecimal Data:: Allowing nondecimal input data. +* Array Sorting:: Facilities for controlling array + traversal and sorting arrays. +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use @code{asort()} and + @code{asorti()}. 
+* Two-way I/O:: Two-way communications with another + process. +* TCP/IP Networking:: Using @command{gawk} for network + programming. +* Profiling:: Profiling your @command{awk} programs. * Debugging:: Introduction to @command{gawk} debugger. * Debugging Concepts:: Debugging in General. @@ -1251,6 +1251,12 @@ expert should find useful. In particular, the description of POSIX @ref{Sample Programs}, should be of interest. +This @value{DOCUMENT} is split into several parts, as follows: + +Part I describes the @command{awk} language and @command{gawk} program in detail. +It starts with the basics, and continues through all of the features of @command{awk}. +It contains the following chapters: + @ref{Getting Started}, provides the essentials you need to know to begin using @command{awk}. @@ -1294,6 +1300,22 @@ describes the built-in functions @command{awk} and @command{gawk} provide, as well as how to define your own functions. +Part II shows how to use @command{awk} and @command{gawk} for problem solving. +There is lots of code here for you to read and learn from. +It contains the following chapters: + +@ref{Library Functions}, which provides a number of functions meant to +be used from main @command{awk} programs. + +@ref{Sample Programs}, +which provides many sample @command{awk} programs. + +Reading these two chapters allows you to see @command{awk} +solving real problems. + +Part III focuses on features specific to @command{gawk}. +It contains the following chapters: + @ref{Internationalization}, describes special features in @command{gawk} for translating program messages into different languages at runtime. @@ -1305,12 +1327,6 @@ are the abilities to have two-way communications with another process, perform TCP/IP networking, and profile your @command{awk} programs. -@ref{Library Functions}, and -@ref{Sample Programs}, -provide many sample @command{awk} programs. -Reading them allows you to see @command{awk} -solving real problems. 
- @ref{Debugger}, describes the @command{awk} debugger. @ref{Arbitrary Precision Arithmetic}, @@ -1320,6 +1336,10 @@ describes advanced arithmetic facilities provided by @ref{Dynamic Extensions}, describes how to add new variables and functions to @command{gawk} by writing extensions in C. +Part IV provides the appendices, the Glossary, and two licenses that cover +the @command{gawk} source code and this @value{DOCUMENT}, respectively. +It contains the following appendices: + @ref{Language History}, describes how the @command{awk} language has evolved since its first release to present. It also describes how @command{gawk} @@ -1780,12 +1800,14 @@ Nof Ayalon @* ISRAEL @* March, 2011 -@ignore -@c Try this @iftex -@page -@headings off -@majorheading I@ @ @ @ The @command{awk} Language and @command{gawk} +@part Part I:@* The @command{awk} Language +@end iftex + +@ignore +@ifdocbook +@part Part I:@* The @command{awk} Language + Part I describes the @command{awk} language and @command{gawk} program in detail. It starts with the basics, and continues through all of the features of @command{awk} and @command{gawk}. It contains the following chapters: @@ -1795,6 +1817,9 @@ and @command{gawk}. It contains the following chapters: @ref{Getting Started}. @item +@ref{Invoking Gawk}. + +@item @ref{Regexp}. @item @@ -1814,21 +1839,8 @@ and @command{gawk}. It contains the following chapters: @item @ref{Functions}. - -@item -@ref{Internationalization}. - -@item -@ref{Advanced Features}. - -@item -@ref{Invoking Gawk}. @end itemize - -@page -@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| -@oddheading @| @| @strong{@thischapter}@ @ @ @thispage -@end iftex +@end ifdocbook @end ignore @node Getting Started @@ -4181,31 +4193,6 @@ long-undocumented ``feature'' of Unix @code{awk}. 
@end ignore -@ignore -@c Try this -@iftex -@page -@headings off -@majorheading II@ @ @ Using @command{awk} and @command{gawk} -Part II shows how to use @command{awk} and @command{gawk} for problem solving. -There is lots of code here for you to read and learn from. -It contains the following chapters: - -@itemize @bullet -@item -@ref{Library Functions}. - -@item -@ref{Sample Programs}. - -@end itemize - -@page -@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| -@oddheading @| @| @strong{@thischapter}@ @ @ @thispage -@end iftex -@end ignore - @node Regexp @chapter Regular Expressions @cindex regexp, See regular expressions @@ -17932,1891 +17919,28 @@ for (i = 1; i <= n; i++) @c ENDOFRANGE funcud -@node Internationalization -@chapter Internationalization with @command{gawk} - -Once upon a time, computer makers -wrote software that worked only in English. -Eventually, hardware and software vendors noticed that if their -systems worked in the native languages of non-English-speaking -countries, they were able to sell more systems. -As a result, internationalization and localization -of programs and software systems became a common practice. - -@c STARTOFRANGE inloc -@cindex internationalization, localization -@cindex @command{gawk}, internationalization and, See internationalization -@cindex internationalization, localization, @command{gawk} and -For many years, the ability to provide internationalization -was largely restricted to programs written in C and C++. -This @value{CHAPTER} describes the underlying library @command{gawk} -uses for internationalization, as well as how -@command{gawk} makes internationalization -features available at the @command{awk} program level. -Having internationalization available at the @command{awk} level -gives software developers additional flexibility---they are no -longer forced to write in C or C++ when internationalization is -a requirement. - -@menu -* I18N and L10N:: Internationalization and Localization. 
-* Explaining gettext:: How GNU @code{gettext} works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* I18N Example:: A simple i18n example. -* Gawk I18N:: @command{gawk} is also internationalized. -@end menu - -@node I18N and L10N -@section Internationalization and Localization - -@cindex internationalization -@cindex localization, See internationalization@comma{} localization -@cindex localization -@dfn{Internationalization} means writing (or modifying) a program once, -in such a way that it can use multiple languages without requiring -further source-code changes. -@dfn{Localization} means providing the data necessary for an -internationalized program to work in a particular language. -Most typically, these terms refer to features such as the language -used for printing error messages, the language used to read -responses, and information related to how numerical and -monetary values are printed and read. - -@node Explaining gettext -@section GNU @code{gettext} - -@cindex internationalizing a program -@c STARTOFRANGE gettex -@cindex @code{gettext} library -The facilities in GNU @code{gettext} focus on messages; strings printed -by a program, either directly or via formatting with @code{printf} or -@code{sprintf()}.@footnote{For some operating systems, the @command{gawk} -port doesn't support GNU @code{gettext}. -Therefore, these features are not available -if you are using one of those operating systems. Sorry.} - -@cindex portability, @code{gettext} library and -When using GNU @code{gettext}, each application has its own -@dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk}, -that identifies the application. -A complete application may have multiple components---programs written -in C or C++, as well as scripts written in @command{sh} or @command{awk}. -All of the components use the same text domain. 
- -To make the discussion concrete, assume we're writing an application -named @command{guide}. Internationalization consists of the -following steps, in this order: - -@enumerate -@item -The programmer goes -through the source for all of @command{guide}'s components -and marks each string that is a candidate for translation. -For example, @code{"`-F': option required"} is a good candidate for translation. -A table with strings of option names is not (e.g., @command{gawk}'s -@option{--profile} option should remain the same, no matter what the local -language). - -@cindex @code{textdomain()} function (C library) -@item -The programmer indicates the application's text domain -(@code{"guide"}) to the @code{gettext} library, -by calling the @code{textdomain()} function. - -@cindex @code{.pot} files -@cindex files, @code{.pot} -@cindex portable object template files -@cindex files, portable object template -@item -Messages from the application are extracted from the source code and -collected into a portable object template file (@file{guide.pot}), -which lists the strings and their translations. -The translations are initially empty. -The original (usually English) messages serve as the key for -lookup of the translations. - -@cindex @code{.po} files -@cindex files, @code{.po} -@cindex portable object files -@cindex files, portable object -@item -For each language with a translator, @file{guide.pot} -is copied to a portable object file (@code{.po}) -and translations are created and shipped with the application. -For example, there might be a @file{fr.po} for a French translation. - -@cindex @code{.mo} files -@cindex files, @code{.mo} -@cindex message object files -@cindex files, message object -@item -Each language's @file{.po} file is converted into a binary -message object (@file{.mo}) file. -A message object file contains the original messages and their -translations in a binary format that allows fast lookup of translations -at runtime. 
- -@item -When @command{guide} is built and installed, the binary translation files -are installed in a standard place. - -@cindex @code{bindtextdomain()} function (C library) -@item -For testing and development, it is possible to tell @code{gettext} -to use @file{.mo} files in a different directory than the standard -one by using the @code{bindtextdomain()} function. - -@cindex @code{.mo} files, specifying directory of -@cindex files, @code{.mo}, specifying directory of -@cindex message object files, specifying directory of -@cindex files, message object, specifying directory of -@item -At runtime, @command{guide} looks up each string via a call -to @code{gettext()}. The returned string is the translated string -if available, or the original string if not. - -@item -If necessary, it is possible to access messages from a different -text domain than the one belonging to the application, without -having to switch the application's default text domain back -and forth. -@end enumerate - -@cindex @code{gettext()} function (C library) -In C (or C++), the string marking and dynamic translation lookup -are accomplished by wrapping each string in a call to @code{gettext()}: - -@example -printf("%s", gettext("Don't Panic!\n")); -@end example - -The tools that extract messages from source code pull out all -strings enclosed in calls to @code{gettext()}. 
- -@cindex @code{_} (underscore), @code{_} C macro -@cindex underscore (@code{_}), @code{_} C macro -The GNU @code{gettext} developers, recognizing that typing -@samp{gettext(@dots{})} over and over again is both painful and ugly to look -at, use the macro @samp{_} (an underscore) to make things easier: - -@example -/* In the standard header file: */ -#define _(str) gettext(str) - -/* In the program text: */ -printf("%s", _("Don't Panic!\n")); -@end example - -@cindex internationalization, localization, locale categories -@cindex @code{gettext} library, locale categories -@cindex locale categories -@noindent -This reduces the typing overhead to just three extra characters per string -and is considerably easier to read as well. - -There are locale @dfn{categories} -for different types of locale-related information. -The defined locale categories that @code{gettext} knows about are: - -@table @code -@cindex @code{LC_MESSAGES} locale category -@item LC_MESSAGES -Text messages. This is the default category for @code{gettext} -operations, but it is possible to supply a different one explicitly, -if necessary. (It is almost never necessary to supply a different category.) - -@cindex sorting characters in different languages -@cindex @code{LC_COLLATE} locale category -@item LC_COLLATE -Text-collation information; i.e., how different characters -and/or groups of characters sort in a given language. - -@cindex @code{LC_CTYPE} locale category -@item LC_CTYPE -Character-type information (alphabetic, digit, upper- or lowercase, and -so on). -This information is accessed via the -POSIX character classes in regular expressions, -such as @code{/[[:alnum:]]/} -(@pxref{Regexp Operators}). - -@cindex monetary information, localization -@cindex currency symbols, localization -@cindex @code{LC_MONETARY} locale category -@item LC_MONETARY -Monetary information, such as the currency symbol, and whether the -symbol goes before or after a number. 
- -@cindex @code{LC_NUMERIC} locale category -@item LC_NUMERIC -Numeric information, such as which characters to use for the decimal -point and the thousands separator.@footnote{Americans -use a comma every three decimal places and a period for the decimal -point, while many Europeans do exactly the opposite: -1,234.56 versus 1.234,56.} - -@cindex @code{LC_RESPONSE} locale category -@item LC_RESPONSE -Response information, such as how ``yes'' and ``no'' appear in the -local language, and possibly other information as well. - -@cindex time, localization and -@cindex dates, information related to@comma{} localization -@cindex @code{LC_TIME} locale category -@item LC_TIME -Time- and date-related information, such as 12- or 24-hour clock, month printed -before or after the day in a date, local month abbreviations, and so on. - -@cindex @code{LC_ALL} locale category -@item LC_ALL -All of the above. (Not too useful in the context of @code{gettext}.) -@end table -@c ENDOFRANGE gettex - -@node Programmer i18n -@section Internationalizing @command{awk} Programs -@c STARTOFRANGE inap -@cindex @command{awk} programs, internationalizing - -@command{gawk} provides the following variables and functions for -internationalization: - -@table @code -@cindex @code{TEXTDOMAIN} variable -@item TEXTDOMAIN -This variable indicates the application's text domain. -For compatibility with GNU @code{gettext}, the default -value is @code{"messages"}. - -@cindex internationalization, localization, marked strings -@cindex strings, for localization -@item _"your message here" -String constants marked with a leading underscore -are candidates for translation at runtime. -String constants without a leading underscore are not translated. - -@cindex @code{dcgettext()} function (@command{gawk}) -@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) -Return the translation of @var{string} in -text domain @var{domain} for locale category @var{category}. 
-The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. -The default value for @var{category} is @code{"LC_MESSAGES"}. - -If you supply a value for @var{category}, it must be a string equal to -one of the known locale categories described in -@ifnotinfo -the previous @value{SECTION}. -@end ifnotinfo -@ifinfo -@ref{Explaining gettext}. -@end ifinfo -You must also supply a text domain. Use @code{TEXTDOMAIN} if -you want to use the current domain. - -@quotation CAUTION -The order of arguments to the @command{awk} version -of the @code{dcgettext()} function is purposely different from the order for -the C version. The @command{awk} version's order was -chosen to be simple and to allow for reasonable @command{awk}-style -default arguments. -@end quotation - -@cindex @code{dcngettext()} function (@command{gawk}) -@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) -Return the plural form used for @var{number} of the -translation of @var{string1} and @var{string2} in text domain -@var{domain} for locale category @var{category}. @var{string1} is the -English singular variant of a message, and @var{string2} the English plural -variant of the same message. -The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. -The default value for @var{category} is @code{"LC_MESSAGES"}. - -The same remarks about argument order as for the @code{dcgettext()} function apply. - -@cindex @code{.mo} files, specifying directory of -@cindex files, @code{.mo}, specifying directory of -@cindex message object files, specifying directory of -@cindex files, message object, specifying directory of -@cindex @code{bindtextdomain()} function (@command{gawk}) -@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) -Change the directory in which -@code{gettext} looks for @file{.mo} files, in case they -will not or cannot be placed in the standard locations -(e.g., during testing). 
-Return the directory in which @var{domain} is ``bound.'' - -The default @var{domain} is the value of @code{TEXTDOMAIN}. -If @var{directory} is the null string (@code{""}), then -@code{bindtextdomain()} returns the current binding for the -given @var{domain}. -@end table - -To use these facilities in your @command{awk} program, follow the steps -outlined in -@ifnotinfo -the previous @value{SECTION}, -@end ifnotinfo -@ifinfo -@ref{Explaining gettext}, -@end ifinfo -like so: - -@enumerate -@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and -@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and -@item -Set the variable @code{TEXTDOMAIN} to the text domain of -your program. This is best done in a @code{BEGIN} rule -(@pxref{BEGIN/END}), -or it can also be done via the @option{-v} command-line -option (@pxref{Options}): - -@example -BEGIN @{ - TEXTDOMAIN = "guide" - @dots{} -@} -@end example - -@cindex @code{_} (underscore), translatable string -@cindex underscore (@code{_}), translatable string -@item -Mark all translatable strings with a leading underscore (@samp{_}) -character. It @emph{must} be adjacent to the opening -quote of the string. For example: - -@example -print _"hello, world" -x = _"you goofed" -printf(_"Number of users is %d\n", nusers) -@end example - -@item -If you are creating strings dynamically, you can -still translate them, using the @code{dcgettext()} -built-in function: - -@example -message = nusers " users logged in" -message = dcgettext(message, "adminprog") -print message -@end example - -Here, the call to @code{dcgettext()} supplies a different -text domain (@code{"adminprog"}) in which to find the -message, but it uses the default @code{"LC_MESSAGES"} category. - -@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk}) -@item -During development, you might want to put the @file{.mo} -file in a private directory for testing. 
This is done -with the @code{bindtextdomain()} built-in function: - -@example -BEGIN @{ - TEXTDOMAIN = "guide" # our text domain - if (Testing) @{ - # where to find our files - bindtextdomain("testdir") - # joe is in charge of adminprog - bindtextdomain("../joe/testdir", "adminprog") - @} - @dots{} -@} -@end example - -@end enumerate - -@xref{I18N Example}, -for an example program showing the steps to create -and use translations from @command{awk}. - -@node Translator i18n -@section Translating @command{awk} Programs - -@cindex @code{.po} files -@cindex files, @code{.po} -@cindex portable object files -@cindex files, portable object -Once a program's translatable strings have been marked, they must -be extracted to create the initial @file{.po} file. -As part of translation, it is often helpful to rearrange the order -in which arguments to @code{printf} are output. - -@command{gawk}'s @option{--gen-pot} command-line option extracts -the messages and is discussed next. -After that, @code{printf}'s ability to -rearrange the order for @code{printf} arguments at runtime -is covered. - -@menu -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging @code{printf} arguments. -* I18N Portability:: @command{awk}-level portability issues. -@end menu - -@node String Extraction -@subsection Extracting Marked Strings -@cindex strings, extracting -@cindex marked strings@comma{} extracting -@cindex @code{--gen-pot} option -@cindex command-line options, string extraction -@cindex string extraction (internationalization) -@cindex marked string extraction (internationalization) -@cindex extraction, of marked strings (internationalization) - -@cindex @code{--gen-pot} option -Once your @command{awk} program is working, and all the strings have -been marked and you've set (and perhaps bound) the text domain, -it is time to produce translations. 
-First, use the @option{--gen-pot} command-line option to create -the initial @file{.pot} file: - -@example -$ @kbd{gawk --gen-pot -f guide.awk > guide.pot} -@end example - -@cindex @code{xgettext} utility -When run with @option{--gen-pot}, @command{gawk} does not execute your -program. Instead, it parses it as usual and prints all marked strings -to standard output in the format of a GNU @code{gettext} Portable Object -file. Also included in the output are any constant strings that -appear as the first argument to @code{dcgettext()} or as the first and -second argument to @code{dcngettext()}.@footnote{The -@command{xgettext} utility that comes with GNU -@code{gettext} can handle @file{.awk} files.} -@xref{I18N Example}, -for the full list of steps to go through to create and test -translations for @command{guide}. - -@node Printf Ordering -@subsection Rearranging @code{printf} Arguments - -@cindex @code{printf} statement, positional specifiers -@cindex positional specifiers, @code{printf} statement -Format strings for @code{printf} and @code{sprintf()} -(@pxref{Printf}) -present a special problem for translation. -Consider the following:@footnote{This example is borrowed -from the GNU @code{gettext} manual.} - -@c line broken here only for smallbook format -@example -printf(_"String `%s' has %d characters\n", - string, length(string))) -@end example - -A possible German translation for this might be: - -@example -"%d Zeichen lang ist die Zeichenkette `%s'\n" -@end example - -The problem should be obvious: the order of the format -specifications is different from the original! -Even though @code{gettext()} can return the translated string -at runtime, -it cannot change the argument order in the call to @code{printf}. - -To solve this problem, @code{printf} format specifiers may have -an additional optional element, which we call a @dfn{positional specifier}. 
-For example: - -@example -"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n" -@end example - -Here, the positional specifier consists of an integer count, which indicates which -argument to use, and a @samp{$}. Counts are one-based, and the -format string itself is @emph{not} included. Thus, in the following -example, @samp{string} is the first argument and @samp{length(string)} is the second: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{string = "Dont Panic"} -> @kbd{printf _"%2$d characters live in \"%1$s\"\n",} -> @kbd{string, length(string)} -> @kbd{@}'} -@print{} 10 characters live in "Dont Panic" -@end example - -If present, positional specifiers come first in the format specification, -before the flags, the field width, and/or the precision. - -Positional specifiers can be used with the dynamic field width and -precision capability: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{printf("%*.*s\n", 10, 20, "hello")} -> @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")} -> @kbd{@}'} -@print{} hello -@print{} hello -@end example - -@quotation NOTE -When using @samp{*} with a positional specifier, the @samp{*} -comes first, then the integer position, and then the @samp{$}. -This is somewhat counterintuitive. -@end quotation - -@cindex @code{printf} statement, positional specifiers, mixing with regular formats -@cindex positional specifiers, @code{printf} statement, mixing with regular formats -@cindex format specifiers, mixing regular with positional specifiers -@command{gawk} does not allow you to mix regular format specifiers -and those with positional specifiers in the same string: - -@example -$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'} -@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none -@end example - -@quotation NOTE -There are some pathological cases that @command{gawk} may fail to -diagnose. In such cases, the output may not be what you expect. 
-It's still a bad idea to try mixing them, even if @command{gawk} -doesn't detect it. -@end quotation - -Although positional specifiers can be used directly in @command{awk} programs, -their primary purpose is to help in producing correct translations of -format strings into languages different from the one in which the program -is first written. - -@node I18N Portability -@subsection @command{awk} Portability Issues - -@cindex portability, internationalization and -@cindex internationalization, localization, portability and -@command{gawk}'s internationalization features were purposely chosen to -have as little impact as possible on the portability of @command{awk} -programs that use them to other versions of @command{awk}. -Consider this program: - -@example -BEGIN @{ - TEXTDOMAIN = "guide" - if (Test_Guide) # set with -v - bindtextdomain("/test/guide/messages") - print _"don't panic!" -@} -@end example - -@noindent -As written, it won't work on other versions of @command{awk}. -However, it is actually almost portable, requiring very little -change: - -@itemize @bullet -@cindex @code{TEXTDOMAIN} variable, portability and -@item -Assignments to @code{TEXTDOMAIN} won't have any effect, -since @code{TEXTDOMAIN} is not special in other @command{awk} implementations. - -@item -Non-GNU versions of @command{awk} treat marked strings -as the concatenation of a variable named @code{_} with the string -following it.@footnote{This is good fodder for an ``Obfuscated -@command{awk}'' contest.} Typically, the variable @code{_} has -the null string (@code{""}) as its value, leaving the original string constant as -the result. - -@item -By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()} -and @code{bindtextdomain()}, the @command{awk} program can be made to run, but -all the messages are output in the original language. 
-For example: - -@cindex @code{bindtextdomain()} function (@command{gawk}), portability and -@cindex @code{dcgettext()} function (@command{gawk}), portability and -@cindex @code{dcngettext()} function (@command{gawk}), portability and -@example -@c file eg/lib/libintl.awk -function bindtextdomain(dir, domain) -@{ - return dir -@} - -function dcgettext(string, domain, category) -@{ - return string -@} - -function dcngettext(string1, string2, number, domain, category) -@{ - return (number == 1 ? string1 : string2) -@} -@c endfile -@end example - -@item -The use of positional specifications in @code{printf} or -@code{sprintf()} is @emph{not} portable. -To support @code{gettext()} at the C level, many systems' C versions of -@code{sprintf()} do support positional specifiers. But it works only if -enough arguments are supplied in the function call. Many versions of -@command{awk} pass @code{printf} formats and arguments unchanged to the -underlying C library version of @code{sprintf()}, but only one format and -argument at a time. What happens if a positional specification is -used is anybody's guess. -However, since the positional specifications are primarily for use in -@emph{translated} format strings, and since non-GNU @command{awk}s never -retrieve the translated string, this should not be a problem in practice. -@end itemize -@c ENDOFRANGE inap - -@node I18N Example -@section A Simple Internationalization Example - -Now let's look at a step-by-step example of how to internationalize and -localize a simple @command{awk} program, using @file{guide.awk} as our -original source: - -@example -@c file eg/prog/guide.awk -BEGIN @{ - TEXTDOMAIN = "guide" - bindtextdomain(".") # for testing - print _"Don't Panic" - print _"The Answer Is", 42 - print "Pardon me, Zaphod who?" 
-@} -@c endfile -@end example - -@noindent -Run @samp{gawk --gen-pot} to create the @file{.pot} file: - -@example -$ @kbd{gawk --gen-pot -f guide.awk > guide.pot} -@end example - -@noindent -This produces: - -@example -@c file eg/data/guide.po -#: guide.awk:4 -msgid "Don't Panic" -msgstr "" - -#: guide.awk:5 -msgid "The Answer Is" -msgstr "" - -@c endfile -@end example - -This original portable object template file is saved and reused for each language -into which the application is translated. The @code{msgid} -is the original string and the @code{msgstr} is the translation. - -@quotation NOTE -Strings not marked with a leading underscore do not -appear in the @file{guide.pot} file. -@end quotation - -Next, the messages must be translated. -Here is a translation to a hypothetical dialect of English, -called ``Mellow'':@footnote{Perhaps it would be better if it were -called ``Hippy.'' Ah, well.} - -@example -@group -$ cp guide.pot guide-mellow.po -@var{Add translations to} guide-mellow.po @dots{} -@end group -@end example - -@noindent -Following are the translations: - -@example -@c file eg/data/guide-mellow.po -#: guide.awk:4 -msgid "Don't Panic" -msgstr "Hey man, relax!" - -#: guide.awk:5 -msgid "The Answer Is" -msgstr "Like, the scoop is" - -@c endfile -@end example - -@cindex Linux -@cindex GNU/Linux -The next step is to make the directory to hold the binary message object -file and then to create the @file{guide.mo} file. -The directory layout shown here is standard for GNU @code{gettext} on -GNU/Linux systems. 
Other versions of @code{gettext} may use a different -layout: - -@example -$ @kbd{mkdir en_US en_US/LC_MESSAGES} -@end example - -@cindex @code{.po} files, converting to @code{.mo} -@cindex files, @code{.po}, converting to @code{.mo} -@cindex @code{.mo} files, converting from @code{.po} -@cindex files, @code{.mo}, converting from @code{.po} -@cindex portable object files, converting to message object files -@cindex files, portable object, converting to message object files -@cindex message object files, converting from portable object files -@cindex files, message object, converting from portable object files -@cindex @command{msgfmt} utility -The @command{msgfmt} utility does the conversion from human-readable -@file{.po} file to machine-readable @file{.mo} file. -By default, @command{msgfmt} creates a file named @file{messages}. -This file must be renamed and placed in the proper directory so that -@command{gawk} can find it: - -@example -$ @kbd{msgfmt guide-mellow.po} -$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo} -@end example - -Finally, we run the program to test it: - -@example -$ @kbd{gawk -f guide.awk} -@print{} Hey man, relax! -@print{} Like, the scoop is 42 -@print{} Pardon me, Zaphod who? -@end example - -If the three replacement functions for @code{dcgettext()}, @code{dcngettext()} -and @code{bindtextdomain()} -(@pxref{I18N Portability}) -are in a file named @file{libintl.awk}, -then we can run @file{guide.awk} unchanged as follows: - -@example -$ @kbd{gawk --posix -f guide.awk -f libintl.awk} -@print{} Don't Panic -@print{} The Answer Is 42 -@print{} Pardon me, Zaphod who? -@end example - -@node Gawk I18N -@section @command{gawk} Can Speak Your Language - -@command{gawk} itself has been internationalized -using the GNU @code{gettext} package. -(GNU @code{gettext} is described in -complete detail in -@ifinfo -@inforef{Top, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.) -@end ifinfo -@ifnotinfo -@cite{GNU gettext tools}.) 
-@end ifnotinfo -As of this writing, the latest version of GNU @code{gettext} is -@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}. - -If a translation of @command{gawk}'s messages exists, -then @command{gawk} produces usage messages, warnings, -and fatal errors in the local language. -@c ENDOFRANGE inloc +@iftex +@part Part II:@* Problem Solving With @command{awk} +@end iftex -@node Advanced Features -@chapter Advanced Features of @command{gawk} -@cindex advanced features, network connections, See Also networks, connections -@c STARTOFRANGE gawadv -@cindex @command{gawk}, features, advanced -@c STARTOFRANGE advgaw -@cindex advanced features, @command{gawk} @ignore -Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com> - - Found in Steve English's "signature" line: - -"Write documentation as if whoever reads it is a violent psychopath -who knows where you live." -@end ignore -@quotation -@i{Write documentation as if whoever reads it is -a violent psychopath who knows where you live.}@* -Steve English, as quoted by Peter Langston -@end quotation - -This @value{CHAPTER} discusses advanced features in @command{gawk}. -It's a bit of a ``grab bag'' of items that are otherwise unrelated -to each other. -First, a command-line option allows @command{gawk} to recognize -nondecimal numbers in input data, not just in @command{awk} -programs. -Then, @command{gawk}'s special features for sorting arrays are presented. -Next, two-way I/O, discussed briefly in earlier parts of this -@value{DOCUMENT}, is described in full detail, along with the basics -of TCP/IP networking. Finally, @command{gawk} -can @dfn{profile} an @command{awk} program, making it possible to tune -it for performance. - -@ref{Dynamic Extensions}, -discusses the ability to dynamically add new built-in functions to -@command{gawk}. As this feature is still immature and likely to change, -its description is relegated to an appendix. 
- -@menu -* Nondecimal Data:: Allowing nondecimal input data. -* Array Sorting:: Facilities for controlling array traversal and - sorting arrays. -* Two-way I/O:: Two-way communications with another process. -* TCP/IP Networking:: Using @command{gawk} for network programming. -* Profiling:: Profiling your @command{awk} programs. -@end menu - -@node Nondecimal Data -@section Allowing Nondecimal Input Data -@cindex @code{--non-decimal-data} option -@cindex advanced features, @command{gawk}, nondecimal input data -@cindex input, data@comma{} nondecimal -@cindex constants, nondecimal - -If you run @command{gawk} with the @option{--non-decimal-data} option, -you can have nondecimal constants in your input data: - -@c line break here for small book format -@example -$ @kbd{echo 0123 123 0x123 |} -> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",} -> @kbd{$1, $2, $3 @}'} -@print{} 83, 123, 291 -@end example - -For this feature to work, write your program so that -@command{gawk} treats your data as numeric: - -@example -$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'} -@print{} 0123 123 0x123 -@end example - -@noindent -The @code{print} statement treats its expressions as strings. -Although the fields can act as numbers when necessary, -they are still strings, so @code{print} does not try to treat them -numerically. You may need to add zero to a field to force it to -be treated as a number. For example: - -@example -$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '} -> @kbd{@{ print $1, $2, $3} -> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'} -@print{} 0123 123 0x123 -@print{} 83 123 291 -@end example - -Because it is common to have decimal data with leading zeros, and because -using this facility could lead to surprising results, the default is to leave it -disabled. If you want it, you must explicitly request it. 
- -@cindex programming conventions, @code{--non-decimal-data} option -@cindex @code{--non-decimal-data} option, @code{strtonum()} function and -@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and -@quotation CAUTION -@emph{Use of this option is not recommended.} -It can break old programs very badly. -Instead, use the @code{strtonum()} function to convert your data -(@pxref{Nondecimal-numbers}). -This makes your programs easier to write and easier to read, and -leads to less surprising results. -@end quotation - -@node Array Sorting -@section Controlling Array Traversal and Array Sorting - -@command{gawk} lets you control the order in which a @samp{for (i in array)} -loop traverses an array. - -In addition, two built-in functions, @code{asort()} and @code{asorti()}, -let you sort arrays based on the array values and indices, respectively. -These two functions also provide control over the sorting criteria used -to order the elements during sorting. - -@menu -* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. -* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}. -@end menu - -@node Controlling Array Traversal -@subsection Controlling Array Traversal - -By default, the order in which a @samp{for (i in array)} loop -scans an array is not defined; it is generally based upon -the internal implementation of arrays inside @command{awk}. - -Often, though, it is desirable to be able to loop over the elements -in a particular order that you, the programmer, choose. @command{gawk} -lets you do this. - -@ref{Controlling Scanning}, describes how you can assign special, -pre-defined values to @code{PROCINFO["sorted_in"]} in order to -control the order in which @command{gawk} will traverse an array -during a @code{for} loop. - -In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name. -This lets you traverse an array based on any custom criterion. 
-The array elements are ordered according to the return value of this -function. The comparison function should be defined with at least -four arguments: - -@example -function comp_func(i1, v1, i2, v2) -@{ - @var{compare elements 1 and 2 in some fashion} - @var{return < 0; 0; or > 0} -@} -@end example - -Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2} -are the corresponding values of the two elements being compared. -Either @var{v1} or @var{v2}, or both, can be arrays if the array being -traversed contains subarrays as values. -(@xref{Arrays of Arrays}, for more information about subarrays.) -The three possible return values are interpreted as follows: - -@table @code -@item comp_func(i1, v1, i2, v2) < 0 -Index @var{i1} comes before index @var{i2} during loop traversal. - -@item comp_func(i1, v1, i2, v2) == 0 -Indices @var{i1} and @var{i2} -come together but the relative order with respect to each other is undefined. - -@item comp_func(i1, v1, i2, v2) > 0 -Index @var{i1} comes after index @var{i2} during loop traversal. -@end table - -Our first comparison function can be used to scan an array in -numerical order of the indices: - -@example -function cmp_num_idx(i1, v1, i2, v2) -@{ - # numerical index comparison, ascending order - return (i1 - i2) -@} -@end example - -Our second function traverses an array based on the string order of -the element values rather than by indices: - -@example -function cmp_str_val(i1, v1, i2, v2) -@{ - # string value comparison, ascending order - v1 = v1 "" - v2 = v2 "" - if (v1 < v2) - return -1 - return (v1 != v2) -@} -@end example - -The third -comparison function makes all numbers, and numeric strings without -any leading or trailing spaces, come out first during loop traversal: - -@example -function cmp_num_str_val(i1, v1, i2, v2, n1, n2) -@{ - # numbers before string value comparison, ascending order - n1 = v1 + 0 - n2 = v2 + 0 - if (n1 == v1) - return (n2 == v2) ? 
(n1 - n2) : -1 - else if (n2 == v2) - return 1 - return (v1 < v2) ? -1 : (v1 != v2) -@} -@end example - -Here is a main program to demonstrate how @command{gawk} -behaves using each of the previous functions: - -@example -BEGIN @{ - data["one"] = 10 - data["two"] = 20 - data[10] = "one" - data[100] = 100 - data[20] = "two" - - f[1] = "cmp_num_idx" - f[2] = "cmp_str_val" - f[3] = "cmp_num_str_val" - for (i = 1; i <= 3; i++) @{ - printf("Sort function: %s\n", f[i]) - PROCINFO["sorted_in"] = f[i] - for (j in data) - printf("\tdata[%s] = %s\n", j, data[j]) - print "" - @} -@} -@end example - -Here are the results when the program is run: -@page - -@example -$ @kbd{gawk -f compdemo.awk} -@print{} Sort function: cmp_num_idx @ii{Sort by numeric index} -@print{} data[two] = 20 -@print{} data[one] = 10 @ii{Both strings are numerically zero} -@print{} data[10] = one -@print{} data[20] = two -@print{} data[100] = 100 -@print{} -@print{} Sort function: cmp_str_val @ii{Sort by element values as strings} -@print{} data[one] = 10 -@print{} data[100] = 100 @ii{String 100 is less than string 20} -@print{} data[two] = 20 -@print{} data[10] = one -@print{} data[20] = two -@print{} -@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings} -@print{} data[one] = 10 -@print{} data[two] = 20 -@print{} data[100] = 100 -@print{} data[10] = one -@print{} data[20] = two -@end example - -Consider sorting the entries of a GNU/Linux system password file -according to login name. The following program sorts records -by a specific field position and can be used for this purpose: - -@example -# sort.awk --- simple program to sort by field position -# field position is specified by the global variable POS - -function cmp_field(i1, v1, i2, v2) -@{ - # comparison by value, as string, and ascending order - return v1[POS] < v2[POS] ? 
-1 : (v1[POS] != v2[POS]) -@} - -@{ - for (i = 1; i <= NF; i++) - a[NR][i] = $i -@} - -END @{ - PROCINFO["sorted_in"] = "cmp_field" - if (POS < 1 || POS > NF) - POS = 1 - for (i in a) @{ - for (j = 1; j <= NF; j++) - printf("%s%c", a[i][j], j < NF ? ":" : "") - print "" - @} -@} -@end example - -The first field in each entry of the password file is the user's login name, -and the fields are separated by colons. -Each record defines a subarray, -with each field as an element in the subarray. -Running the program produces the -following output: - -@example -$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd} -@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin -@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin -@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin -@dots{} -@end example - -The comparison should normally always return the same value when given a -specific pair of array elements as its arguments. If inconsistent -results are returned then the order is undefined. This behavior can be -exploited to introduce random order into otherwise seemingly -ordered data: - -@example -function cmp_randomize(i1, v1, i2, v2) -@{ - # random order - return (2 - 4 * rand()) -@} -@end example - -As mentioned above, the order of the indices is arbitrary if two -elements compare equal. This is usually not a problem, but letting -the tied elements come out in arbitrary order can be an issue, especially -when comparing item values. The partial ordering of the equal elements -may change during the next loop traversal, if other elements are added or -removed from the array. One way to resolve ties when comparing elements -with otherwise equal values is to include the indices in the comparison -rules. Note that doing this may make the loop traversal less efficient, -so consider it only if necessary. 
The following comparison functions -force a deterministic order, and are based on the fact that the -indices of two elements are never equal: - -@example -function cmp_numeric(i1, v1, i2, v2) -@{ - # numerical value (and index) comparison, descending order - return (v1 != v2) ? (v2 - v1) : (i2 - i1) -@} - -function cmp_string(i1, v1, i2, v2) -@{ - # string value (and index) comparison, descending order - v1 = v1 i1 - v2 = v2 i2 - return (v1 > v2) ? -1 : (v1 != v2) -@} -@end example - -@c Avoid using the term ``stable'' when describing the unpredictable behavior -@c if two items compare equal. Usually, the goal of a "stable algorithm" -@c is to maintain the original order of the items, which is a meaningless -@c concept for a list constructed from a hash. - -A custom comparison function can often simplify ordered loop -traversal, and the sky is really the limit when it comes to -designing such a function. - -When string comparisons are made during a sort, either for element -values where one or both aren't numbers, or for element indices -handled as strings, the value of @code{IGNORECASE} -(@pxref{Built-in Variables}) controls whether -the comparisons treat corresponding uppercase and lowercase letters as -equivalent or distinct. - -Another point to keep in mind is that in the case of subarrays -the element values can themselves be arrays; a production comparison -function should use the @code{isarray()} function -(@pxref{Type Functions}), -to check for this, and choose a defined sorting order for subarrays. - -All sorting based on @code{PROCINFO["sorted_in"]} -is disabled in POSIX mode, -since the @code{PROCINFO} array is not special in that case. - -As a side note, sorting the array indices before traversing -the array has been reported to add 15% to 20% overhead to the -execution time of @command{awk} programs. For this reason, -sorted array traversal is not the default. 
- -@c The @command{gawk} -@c maintainers believe that only the people who wish to use a -@c feature should have to pay for it. - -@node Array Sorting Functions -@subsection Sorting Array Values and Indices with @command{gawk} - -@cindex arrays, sorting -@cindex @code{asort()} function (@command{gawk}) -@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting -@cindex sort function, arrays, sorting -In most @command{awk} implementations, sorting an array requires -writing a @code{sort()} function. -While this can be educational for exploring different sorting algorithms, -usually that's not the point of the program. -@command{gawk} provides the built-in @code{asort()} -and @code{asorti()} functions -(@pxref{String Functions}) -for sorting arrays. For example: - -@example -@var{populate the array} data -n = asort(data) -for (i = 1; i <= n; i++) - @var{do something with} data[i] -@end example - -After the call to @code{asort()}, the array @code{data} is indexed from 1 -to some number @var{n}, the total number of elements in @code{data}. -(This count is @code{asort()}'s return value.) -@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on. -The comparison is based on the type of the elements -(@pxref{Typing and Comparison}). -All numeric values come before all string values, -which in turn come before all subarrays. - -@cindex side effects, @code{asort()} function -An important side effect of calling @code{asort()} is that -@emph{the array's original indices are irrevocably lost}. -As this isn't always desirable, @code{asort()} accepts a -second argument: - -@example -@var{populate the array} source -n = asort(source, dest) -for (i = 1; i <= n; i++) - @var{do something with} dest[i] -@end example - -In this case, @command{gawk} copies the @code{source} array into the -@code{dest} array and then sorts @code{dest}, destroying its indices. -However, the @code{source} array is not affected. 
- -@code{asort()} accepts a third string argument to control comparison of -array elements. As with @code{PROCINFO["sorted_in"]}, this argument -may be one of the predefined names that @command{gawk} provides -(@pxref{Controlling Scanning}), or the name of a user-defined function -(@pxref{Controlling Array Traversal}). - -@quotation NOTE -In all cases, the sorted element values consist of the original -array's element values. The ability to control comparison merely -affects the way in which they are sorted. -@end quotation - -Often, what's needed is to sort on the values of the @emph{indices} -instead of the values of the elements. -To do that, use the -@code{asorti()} function. The interface is identical to that of -@code{asort()}, except that the index values are used for sorting, and -become the values of the result array: - -@example -@{ source[$0] = some_func($0) @} - -END @{ - n = asorti(source, dest) - for (i = 1; i <= n; i++) @{ - @ii{Work with sorted indices directly:} - @var{do something with} dest[i] - @dots{} - @ii{Access original array via sorted indices:} - @var{do something with} source[dest[i]] - @} -@} -@end example - -Similar to @code{asort()}, -in all cases, the sorted element values consist of the original -array's indices. The ability to control comparison merely -affects the way in which they are sorted. - -Sorting the array by replacing the indices provides maximal flexibility. -To traverse the elements in decreasing order, use a loop that goes from -@var{n} down to 1, either over the elements or over the indices.@footnote{You -may also use one of the predefined sorting names that sorts in -decreasing order.} - -@cindex reference counting, sorting arrays -Copying array indices and elements isn't expensive in terms of memory. -Internally, @command{gawk} maintains @dfn{reference counts} to data. 
-For example, when @code{asort()} copies the first array to the second one, -there is only one copy of the original array elements' data, even though -both arrays use the values. - -@c Document It And Call It A Feature. Sigh. -@cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex @code{IGNORECASE} variable -@cindex arrays, sorting, @code{IGNORECASE} variable and -@cindex @code{IGNORECASE} variable, array sorting and -Because @code{IGNORECASE} affects string comparisons, the value -of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}. -Note also that the locale's sorting order does @emph{not} -come into play; comparisons are based on character values only.@footnote{This -is true because locale-based comparison occurs only when in POSIX -compatibility mode, and since @code{asort()} and @code{asorti()} are -@command{gawk} extensions, they are not available in that case.} -Caveat Emptor. - -@node Two-way I/O -@section Two-Way Communications with Another Process -@cindex Brennan, Michael -@cindex programmers, attractiveness of -@smallexample -@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan -From: brennan@@whidbey.com (Mike Brennan) -Newsgroups: comp.lang.awk -Subject: Re: Learn the SECRET to Attract Women Easily -Date: 4 Aug 1997 17:34:46 GMT -@c Organization: WhidbeyNet -@c Lines: 12 -Message-ID: <5s53rm$eca@@news.whidbey.com> -@c References: <5s20dn$2e1@chronicle.concentric.net> -@c Reply-To: brennan@whidbey.com -@c NNTP-Posting-Host: asn202.whidbey.com -@c X-Newsreader: slrn (0.9.4.1 UNIX) -@c Xref: cssun.mathcs.emory.edu comp.lang.awk:5403 - -On 3 Aug 1997 13:17:43 GMT, Want More Dates??? 
-<tracy78@@kilgrona.com> wrote: ->Learn the SECRET to Attract Women Easily -> ->The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women - -The scent of awk programmers is a lot more attractive to women than -the scent of perl programmers. --- -Mike Brennan -@c brennan@@whidbey.com -@end smallexample - -@cindex advanced features, @command{gawk}, processes@comma{} communicating with -@cindex processes, two-way communications with -It is often useful to be able to -send data to a separate program for -processing and then read the result. This can always be -done with temporary files: - -@example -# Write the data for processing -tempfile = ("mydata." PROCINFO["pid"]) -while (@var{not done with data}) - print @var{data} | ("subprogram > " tempfile) -close("subprogram > " tempfile) - -# Read the results, remove tempfile when done -while ((getline newdata < tempfile) > 0) - @var{process} newdata @var{appropriately} -close(tempfile) -system("rm " tempfile) -@end example - -@noindent -This works, but not elegantly. Among other things, it requires that -the program be run in a directory that cannot be shared among users; -for example, @file{/tmp} will not do, as another user might happen -to be using a temporary file with the same name. - -@cindex coprocesses -@cindex input/output, two-way -@cindex @code{|} (vertical bar), @code{|&} operator (I/O) -@cindex vertical bar (@code{|}), @code{|&} operator (I/O) -@cindex @command{csh} utility, @code{|&} operator, comparison with -However, with @command{gawk}, it is possible to -open a @emph{two-way} pipe to another process. The second process is -termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}. 
-The two-way connection is created using the @samp{|&} operator -(borrowed from the Korn shell, @command{ksh}):@footnote{This is very -different from the same operator in the C shell.} - -@example -do @{ - print @var{data} |& "subprogram" - "subprogram" |& getline results -@} while (@var{data left to process}) -close("subprogram") -@end example - -The first time an I/O operation is executed using the @samp{|&} -operator, @command{gawk} creates a two-way pipeline to a child process -that runs the other program. Output created with @code{print} -or @code{printf} is written to the program's standard input, and -output from the program's standard output can be read by the @command{gawk} -program using @code{getline}. -As is the case with processes started by @samp{|}, the subprogram -can be any program, or pipeline of programs, that can be started by -the shell. +@ifdocbook +@part Part II:@* Problem Solving With @command{awk} -There are some cautionary items to be aware of: +Part II shows how to use @command{awk} and @command{gawk} for problem solving. +There is lots of code here for you to read and learn from. +It contains the following chapters: @itemize @bullet @item -As the code inside @command{gawk} currently stands, the coprocess's -standard error goes to the same place that the parent @command{gawk}'s -standard error goes. It is not possible to read the child's -standard error separately. +@ref{Library Functions}. -@cindex deadlocks -@cindex buffering, input/output -@cindex @code{getline} command, deadlock and @item -I/O buffering may be a problem. @command{gawk} automatically -flushes all output down the pipe to the coprocess. -However, if the coprocess does not flush its output, -@command{gawk} may hang when doing a @code{getline} in order to read -the coprocess's results. This could lead to a situation -known as @dfn{deadlock}, where each process is waiting for the -other one to do something. +@ref{Sample Programs}. 
@end itemize - -@cindex @code{close()} function, two-way pipes and -It is possible to close just one end of the two-way pipe to -a coprocess, by supplying a second argument to the @code{close()} -function of either @code{"to"} or @code{"from"} -(@pxref{Close Files And Pipes}). -These strings tell @command{gawk} to close the end of the pipe -that sends data to the coprocess or the end that reads from it, -respectively. - -@cindex @command{sort} utility, coprocesses and -This is particularly necessary in order to use -the system @command{sort} utility as part of a coprocess; -@command{sort} must read @emph{all} of its input -data before it can produce any output. -The @command{sort} program does not receive an end-of-file indication -until @command{gawk} closes the write end of the pipe. - -When you have finished writing data to the @command{sort} -utility, you can close the @code{"to"} end of the pipe, and -then start reading sorted data via @code{getline}. -For example: - -@example -BEGIN @{ - command = "LC_ALL=C sort" - n = split("abcdefghijklmnopqrstuvwxyz", a, "") - - for (i = n; i > 0; i--) - print a[i] |& command - close(command, "to") - - while ((command |& getline line) > 0) - print "got", line - close(command) -@} -@end example - -This program writes the letters of the alphabet in reverse order, one -per line, down the two-way pipe to @command{sort}. It then closes the -write end of the pipe, so that @command{sort} receives an end-of-file -indication. This causes @command{sort} to sort the data and write the -sorted data back to the @command{gawk} program. Once all of the data -has been read, @command{gawk} terminates the coprocess and exits. - -As a side note, the assignment @samp{LC_ALL=C} in the @command{sort} -command ensures traditional Unix (ASCII) sorting from @command{sort}. 
- -@cindex @command{gawk}, @code{PROCINFO} array in -@cindex @code{PROCINFO} array -You may also use pseudo-ttys (ptys) for -two-way communication instead of pipes, if your system supports them. -This is done on a per-command basis, by setting a special element -in the @code{PROCINFO} array -(@pxref{Auto-set}), -like so: - -@example -command = "sort -nr" # command, save in convenience variable -PROCINFO[command, "pty"] = 1 # update PROCINFO -print @dots{} |& command # start two-way pipe -@dots{} -@end example - -@noindent -Using ptys avoids the buffer deadlock issues described earlier, at some -loss in performance. If your system does not have ptys, or if all the -system's ptys are in use, @command{gawk} automatically falls back to -using regular pipes. - -@node TCP/IP Networking -@section Using @command{gawk} for Network Programming -@cindex advanced features, @command{gawk}, network programming -@cindex networks, programming -@c STARTOFRANGE tcpip -@cindex TCP/IP -@cindex @code{/inet/@dots{}} special files (@command{gawk}) -@cindex files, @code{/inet/@dots{}} (@command{gawk}) -@cindex @code{/inet4/@dots{}} special files (@command{gawk}) -@cindex files, @code{/inet4/@dots{}} (@command{gawk}) -@cindex @code{/inet6/@dots{}} special files (@command{gawk}) -@cindex files, @code{/inet6/@dots{}} (@command{gawk}) -@cindex @code{EMISTERED} -@quotation -@code{EMISTERED}:@* -@ @ @ @ @i{A host is a host from coast to coast,@* -@ @ @ @ and no-one can talk to host that's close,@* -@ @ @ @ unless the host that isn't close@* -@ @ @ @ is busy hung or dead.} -@end quotation - -In addition to being able to open a two-way pipeline to a coprocess -on the same system -(@pxref{Two-way I/O}), -it is possible to make a two-way connection to -another process on another system across an IP network connection. - -You can think of this as just a @emph{very long} two-way pipeline to -a coprocess. 
-The way @command{gawk} decides that you want to use TCP/IP networking is -by recognizing special @value{FN}s that begin with one of @samp{/inet/}, -@samp{/inet4/} or @samp{/inet6}. - -The full syntax of the special @value{FN} is -@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}. -The components are: - -@table @var -@item net-type -Specifies the kind of Internet connection to make. -Use @samp{/inet4/} to force IPv4, and -@samp{/inet6/} to force IPv6. -Plain @samp{/inet/} (which used to be the only option) uses -the system default, most likely IPv4. - -@item protocol -The protocol to use over IP. This must be either @samp{tcp}, or -@samp{udp}, for a TCP or UDP IP connection, -respectively. The use of TCP is recommended for most applications. - -@item local-port -@cindex @code{getaddrinfo()} function (C library) -The local TCP or UDP port number to use. Use a port number of @samp{0} -when you want the system to pick a port. This is what you should do -when writing a TCP or UDP client. -You may also use a well-known service name, such as @samp{smtp} -or @samp{http}, in which case @command{gawk} attempts to determine -the predefined port number using the C @code{getaddrinfo()} function. - -@item remote-host -The IP address or fully-qualified domain name of the Internet -host to which you want to connect. - -@item remote-port -The TCP or UDP port number to use on the given @var{remote-host}. -Again, use @samp{0} if you don't care, or else a well-known -service name. -@end table - -@cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable -@quotation NOTE -Failure in opening a two-way socket will result in a non-fatal error -being returned to the calling code. The value of @code{ERRNO} indicates -the error (@pxref{Auto-set}). 
-@end quotation - -Consider the following very simple example: - -@example -BEGIN @{ - Service = "/inet/tcp/0/localhost/daytime" - Service |& getline - print $0 - close(Service) -@} -@end example - -This program reads the current date and time from the local system's -TCP @samp{daytime} server. -It then prints the results and closes the connection. - -Because this topic is extensive, the use of @command{gawk} for -TCP/IP programming is documented separately. -@ifinfo -See -@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}, -@end ifinfo -@ifnotinfo -See @cite{TCP/IP Internetworking with @command{gawk}}, -which comes as part of the @command{gawk} distribution, -@end ifnotinfo -for a much more complete introduction and discussion, as well as -extensive examples. - -@c ENDOFRANGE tcpip - -@node Profiling -@section Profiling Your @command{awk} Programs -@c STARTOFRANGE awkp -@cindex @command{awk} programs, profiling -@c STARTOFRANGE proawk -@cindex profiling @command{awk} programs -@cindex profiling @command{gawk} -@cindex @code{awkprof.out} file -@cindex files, @code{awkprof.out} - -You may produce execution traces of your @command{awk} programs. -This is done by passing the option @option{--profile} to @command{gawk}. -When @command{gawk} has finished running, it creates a profile of your program in a file -named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than -@command{gawk} normally does. - -@cindex @code{--profile} option -As shown in the following example, -the @option{--profile} option can be used to change the name of the file -where @command{gawk} will write the profile: - -@example -gawk --profile=myprog.prof -f myprog.awk data1 data2 -@end example - -@noindent -In the above example, @command{gawk} places the profile in -@file{myprog.prof} instead of in @file{awkprof.out}. 
- -Here is a sample session showing a simple @command{awk} program, its input data, and the -results from running @command{gawk} with the @option{--profile} option. -First, the @command{awk} program: - -@example -BEGIN @{ print "First BEGIN rule" @} - -END @{ print "First END rule" @} - -/foo/ @{ - print "matched /foo/, gosh" - for (i = 1; i <= 3; i++) - sing() -@} - -@{ - if (/foo/) - print "if is true" - else - print "else is true" -@} - -BEGIN @{ print "Second BEGIN rule" @} - -END @{ print "Second END rule" @} - -function sing( dummy) -@{ - print "I gotta be me!" -@} -@end example - -Following is the input data: - -@example -foo -bar -baz -foo -junk -@end example - -Here is the @file{awkprof.out} that results from running the @command{gawk} -profiler on this program and data (this example also illustrates that @command{awk} -programmers sometimes have to work late): - -@cindex @code{BEGIN} pattern -@cindex @code{END} pattern -@example - # gawk profile, created Sun Aug 13 00:00:15 2000 - - # BEGIN block(s) - - BEGIN @{ - 1 print "First BEGIN rule" - 1 print "Second BEGIN rule" - @} - - # Rule(s) - - 5 /foo/ @{ # 2 - 2 print "matched /foo/, gosh" - 6 for (i = 1; i <= 3; i++) @{ - 6 sing() - @} - @} - - 5 @{ - 5 if (/foo/) @{ # 2 - 2 print "if is true" - 3 @} else @{ - 3 print "else is true" - @} - @} - - # END block(s) - - END @{ - 1 print "First END rule" - 1 print "Second END rule" - @} - - # Functions, listed alphabetically - - 6 function sing(dummy) - @{ - 6 print "I gotta be me!" - @} -@end example - -This example illustrates many of the basic features of profiling output. -They are as follows: - -@itemize @bullet -@item -The program is printed in the order @code{BEGIN} rule, -@code{BEGINFILE} rule, -pattern/action rules, -@code{ENDFILE} rule, @code{END} rule and functions, listed -alphabetically. -Multiple @code{BEGIN} and @code{END} rules are merged together, -as are multiple @code{BEGINFILE} and @code{ENDFILE} rules. 
- -@cindex patterns, counts -@item -Pattern-action rules have two counts. -The first count, to the left of the rule, shows how many times -the rule's pattern was @emph{tested}. -The second count, to the right of the rule's opening left brace -in a comment, -shows how many times the rule's action was @emph{executed}. -The difference between the two indicates how many times the rule's -pattern evaluated to false. - -@item -Similarly, -the count for an @code{if}-@code{else} statement shows how many times -the condition was tested. -To the right of the opening left brace for the @code{if}'s body -is a count showing how many times the condition was true. -The count for the @code{else} -indicates how many times the test failed. - -@cindex loops, count for header -@item -The count for a loop header (such as @code{for} -or @code{while}) shows how many times the loop test was executed. -(Because of this, you can't just look at the count on the first -statement in a rule to determine how many times the rule was executed. -If the first statement is a loop, the count is misleading.) - -@cindex functions, user-defined, counts -@cindex user-defined, functions, counts -@item -For user-defined functions, the count next to the @code{function} -keyword indicates how many times the function was called. -The counts next to the statements in the body show how many times -those statements were executed. - -@cindex @code{@{@}} (braces) -@cindex braces (@code{@{@}}) -@item -The layout uses ``K&R'' style with TABs. -Braces are used everywhere, even when -the body of an @code{if}, @code{else}, or loop is only a single statement. - -@cindex @code{()} (parentheses) -@cindex parentheses @code{()} -@item -Parentheses are used only where needed, as indicated by the structure -of the program and the precedence rules. -@c extra verbiage here satisfies the copyeditor. ugh. -For example, @samp{(3 + 5) * 4} means add three plus five, then multiply -the total by four. 
However, @samp{3 + 5 * 4} has no parentheses, and -means @samp{3 + (5 * 4)}. - -@ignore -@item -All string concatenations are parenthesized too. -(This could be made a bit smarter.) +@end ifdocbook @end ignore -@item -Parentheses are used around the arguments to @code{print} -and @code{printf} only when -the @code{print} or @code{printf} statement is followed by a redirection. -Similarly, if -the target of a redirection isn't a scalar, it gets parenthesized. - -@item -@command{gawk} supplies leading comments in -front of the @code{BEGIN} and @code{END} rules, -the pattern/action rules, and the functions. - -@end itemize - -The profiled version of your program may not look exactly like what you -typed when you wrote it. This is because @command{gawk} creates the -profiled version by ``pretty printing'' its internal representation of -the program. The advantage to this is that @command{gawk} can produce -a standard representation. The disadvantage is that all source-code -comments are lost, as are the distinctions among multiple @code{BEGIN}, -@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as: - -@example -/foo/ -@end example - -@noindent -come out as: - -@example -/foo/ @{ - print $0 -@} -@end example - -@noindent -which is correct, but possibly surprising. - -@cindex profiling @command{awk} programs, dynamically -@cindex @command{gawk} program, dynamic profiling -Besides creating profiles when a program has completed, -@command{gawk} can produce a profile while it is running. -This is useful if your @command{awk} program goes into an -infinite loop and you want to see what has been executed. 
-To use this feature, run @command{gawk} with the @option{--profile} -option in the background: - -@example -$ @kbd{gawk --profile -f myprog &} -[1] 13992 -@end example - -@cindex @command{kill} command@comma{} dynamic profiling -@cindex @code{USR1} signal -@cindex @code{SIGUSR1} signal -@cindex signals, @code{USR1}/@code{SIGUSR1} -@noindent -The shell prints a job number and process ID number; in this case, 13992. -Use the @command{kill} command to send the @code{USR1} signal -to @command{gawk}: - -@example -$ @kbd{kill -USR1 13992} -@end example - -@noindent -As usual, the profiled version of the program is written to -@file{awkprof.out}, or to a different file if one specified with -the @option{--profile} option. - -Along with the regular profile, as shown earlier, the profile -includes a trace of any active functions: - -@example -# Function Call Stack: - -# 3. baz -# 2. bar -# 1. foo -# -- main -- -@end example - -You may send @command{gawk} the @code{USR1} signal as many times as you like. -Each time, the profile and function call trace are appended to the output -profile file. - -@cindex @code{HUP} signal -@cindex @code{SIGHUP} signal -@cindex signals, @code{HUP}/@code{SIGHUP} -If you use the @code{HUP} signal instead of the @code{USR1} signal, -@command{gawk} produces the profile and the function call trace and then exits. - -@cindex @code{INT} signal (MS-Windows) -@cindex @code{SIGINT} signal (MS-Windows) -@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows) -@cindex @code{QUIT} signal (MS-Windows) -@cindex @code{SIGQUIT} signal (MS-Windows) -@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows) -When @command{gawk} runs on MS-Windows systems, it uses the -@code{INT} and @code{QUIT} signals for producing the profile and, in -the case of the @code{INT} signal, @command{gawk} exits. This is -because these systems don't support the @command{kill} command, so the -only signals you can deliver to a program are those generated by the -keyboard. 
The @code{INT} signal is generated by the -@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the -@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key. - -Finally, @command{gawk} also accepts another option @option{--pretty-print}. -When called this way, @command{gawk} ``pretty prints'' the program into -@file{awkprof.out}, without any execution counts. -@c ENDOFRANGE advgaw -@c ENDOFRANGE gawadv -@c ENDOFRANGE awkp -@c ENDOFRANGE proawk - @node Library Functions @chapter A Library of @command{awk} Functions @c STARTOFRANGE libf @@ -25691,6 +23815,1921 @@ BEGIN { } @end ignore +@iftex +@part Part III:@* Moving Beyond Standard @command{awk} With @command{gawk} +@end iftex + +@ignore +@ifdocbook + +@part Part III:@* Moving Beyond Standard @command{awk} With @command{gawk} + +Part III focuses on features specific to @command{gawk}. +It contains the following chapters: + +@itemize @bullet +@item +@ref{Internationalization}. + +@item +@ref{Advanced Features}. + +@item +@ref{Debugger}. + +@item +@ref{Arbitrary Precision Arithmetic}. + +@item +@ref{Dynamic Extensions}. +@end ifdocbook +@end ignore + +@node Internationalization +@chapter Internationalization with @command{gawk} + +Once upon a time, computer makers +wrote software that worked only in English. +Eventually, hardware and software vendors noticed that if their +systems worked in the native languages of non-English-speaking +countries, they were able to sell more systems. +As a result, internationalization and localization +of programs and software systems became a common practice. + +@c STARTOFRANGE inloc +@cindex internationalization, localization +@cindex @command{gawk}, internationalization and, See internationalization +@cindex internationalization, localization, @command{gawk} and +For many years, the ability to provide internationalization +was largely restricted to programs written in C and C++. 
+This @value{CHAPTER} describes the underlying library @command{gawk} +uses for internationalization, as well as how +@command{gawk} makes internationalization +features available at the @command{awk} program level. +Having internationalization available at the @command{awk} level +gives software developers additional flexibility---they are no +longer forced to write in C or C++ when internationalization is +a requirement. + +@menu +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU @code{gettext} works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* I18N Example:: A simple i18n example. +* Gawk I18N:: @command{gawk} is also internationalized. +@end menu + +@node I18N and L10N +@section Internationalization and Localization + +@cindex internationalization +@cindex localization, See internationalization@comma{} localization +@cindex localization +@dfn{Internationalization} means writing (or modifying) a program once, +in such a way that it can use multiple languages without requiring +further source-code changes. +@dfn{Localization} means providing the data necessary for an +internationalized program to work in a particular language. +Most typically, these terms refer to features such as the language +used for printing error messages, the language used to read +responses, and information related to how numerical and +monetary values are printed and read. + +@node Explaining gettext +@section GNU @code{gettext} + +@cindex internationalizing a program +@c STARTOFRANGE gettex +@cindex @code{gettext} library +The facilities in GNU @code{gettext} focus on messages; strings printed +by a program, either directly or via formatting with @code{printf} or +@code{sprintf()}.@footnote{For some operating systems, the @command{gawk} +port doesn't support GNU @code{gettext}. +Therefore, these features are not available +if you are using one of those operating systems. 
Sorry.} + +@cindex portability, @code{gettext} library and +When using GNU @code{gettext}, each application has its own +@dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk}, +that identifies the application. +A complete application may have multiple components---programs written +in C or C++, as well as scripts written in @command{sh} or @command{awk}. +All of the components use the same text domain. + +To make the discussion concrete, assume we're writing an application +named @command{guide}. Internationalization consists of the +following steps, in this order: + +@enumerate +@item +The programmer goes +through the source for all of @command{guide}'s components +and marks each string that is a candidate for translation. +For example, @code{"`-F': option required"} is a good candidate for translation. +A table with strings of option names is not (e.g., @command{gawk}'s +@option{--profile} option should remain the same, no matter what the local +language). + +@cindex @code{textdomain()} function (C library) +@item +The programmer indicates the application's text domain +(@code{"guide"}) to the @code{gettext} library, +by calling the @code{textdomain()} function. + +@cindex @code{.pot} files +@cindex files, @code{.pot} +@cindex portable object template files +@cindex files, portable object template +@item +Messages from the application are extracted from the source code and +collected into a portable object template file (@file{guide.pot}), +which lists the strings and their translations. +The translations are initially empty. +The original (usually English) messages serve as the key for +lookup of the translations. + +@cindex @code{.po} files +@cindex files, @code{.po} +@cindex portable object files +@cindex files, portable object +@item +For each language with a translator, @file{guide.pot} +is copied to a portable object file (@code{.po}) +and translations are created and shipped with the application. 
+For example, there might be a @file{fr.po} for a French translation. + +@cindex @code{.mo} files +@cindex files, @code{.mo} +@cindex message object files +@cindex files, message object +@item +Each language's @file{.po} file is converted into a binary +message object (@file{.mo}) file. +A message object file contains the original messages and their +translations in a binary format that allows fast lookup of translations +at runtime. + +@item +When @command{guide} is built and installed, the binary translation files +are installed in a standard place. + +@cindex @code{bindtextdomain()} function (C library) +@item +For testing and development, it is possible to tell @code{gettext} +to use @file{.mo} files in a different directory than the standard +one by using the @code{bindtextdomain()} function. + +@cindex @code{.mo} files, specifying directory of +@cindex files, @code{.mo}, specifying directory of +@cindex message object files, specifying directory of +@cindex files, message object, specifying directory of +@item +At runtime, @command{guide} looks up each string via a call +to @code{gettext()}. The returned string is the translated string +if available, or the original string if not. + +@item +If necessary, it is possible to access messages from a different +text domain than the one belonging to the application, without +having to switch the application's default text domain back +and forth. +@end enumerate + +@cindex @code{gettext()} function (C library) +In C (or C++), the string marking and dynamic translation lookup +are accomplished by wrapping each string in a call to @code{gettext()}: + +@example +printf("%s", gettext("Don't Panic!\n")); +@end example + +The tools that extract messages from source code pull out all +strings enclosed in calls to @code{gettext()}. 
+ +@cindex @code{_} (underscore), @code{_} C macro +@cindex underscore (@code{_}), @code{_} C macro +The GNU @code{gettext} developers, recognizing that typing +@samp{gettext(@dots{})} over and over again is both painful and ugly to look +at, use the macro @samp{_} (an underscore) to make things easier: + +@example +/* In the standard header file: */ +#define _(str) gettext(str) + +/* In the program text: */ +printf("%s", _("Don't Panic!\n")); +@end example + +@cindex internationalization, localization, locale categories +@cindex @code{gettext} library, locale categories +@cindex locale categories +@noindent +This reduces the typing overhead to just three extra characters per string +and is considerably easier to read as well. + +There are locale @dfn{categories} +for different types of locale-related information. +The defined locale categories that @code{gettext} knows about are: + +@table @code +@cindex @code{LC_MESSAGES} locale category +@item LC_MESSAGES +Text messages. This is the default category for @code{gettext} +operations, but it is possible to supply a different one explicitly, +if necessary. (It is almost never necessary to supply a different category.) + +@cindex sorting characters in different languages +@cindex @code{LC_COLLATE} locale category +@item LC_COLLATE +Text-collation information; i.e., how different characters +and/or groups of characters sort in a given language. + +@cindex @code{LC_CTYPE} locale category +@item LC_CTYPE +Character-type information (alphabetic, digit, upper- or lowercase, and +so on). +This information is accessed via the +POSIX character classes in regular expressions, +such as @code{/[[:alnum:]]/} +(@pxref{Regexp Operators}). + +@cindex monetary information, localization +@cindex currency symbols, localization +@cindex @code{LC_MONETARY} locale category +@item LC_MONETARY +Monetary information, such as the currency symbol, and whether the +symbol goes before or after a number. 
+ +@cindex @code{LC_NUMERIC} locale category +@item LC_NUMERIC +Numeric information, such as which characters to use for the decimal +point and the thousands separator.@footnote{Americans +use a comma every three decimal places and a period for the decimal +point, while many Europeans do exactly the opposite: +1,234.56 versus 1.234,56.} + +@cindex @code{LC_RESPONSE} locale category +@item LC_RESPONSE +Response information, such as how ``yes'' and ``no'' appear in the +local language, and possibly other information as well. + +@cindex time, localization and +@cindex dates, information related to@comma{} localization +@cindex @code{LC_TIME} locale category +@item LC_TIME +Time- and date-related information, such as 12- or 24-hour clock, month printed +before or after the day in a date, local month abbreviations, and so on. + +@cindex @code{LC_ALL} locale category +@item LC_ALL +All of the above. (Not too useful in the context of @code{gettext}.) +@end table +@c ENDOFRANGE gettex + +@node Programmer i18n +@section Internationalizing @command{awk} Programs +@c STARTOFRANGE inap +@cindex @command{awk} programs, internationalizing + +@command{gawk} provides the following variables and functions for +internationalization: + +@table @code +@cindex @code{TEXTDOMAIN} variable +@item TEXTDOMAIN +This variable indicates the application's text domain. +For compatibility with GNU @code{gettext}, the default +value is @code{"messages"}. + +@cindex internationalization, localization, marked strings +@cindex strings, for localization +@item _"your message here" +String constants marked with a leading underscore +are candidates for translation at runtime. +String constants without a leading underscore are not translated. + +@cindex @code{dcgettext()} function (@command{gawk}) +@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) +Return the translation of @var{string} in +text domain @var{domain} for locale category @var{category}. 
+The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. +The default value for @var{category} is @code{"LC_MESSAGES"}. + +If you supply a value for @var{category}, it must be a string equal to +one of the known locale categories described in +@ifnotinfo +the previous @value{SECTION}. +@end ifnotinfo +@ifinfo +@ref{Explaining gettext}. +@end ifinfo +You must also supply a text domain. Use @code{TEXTDOMAIN} if +you want to use the current domain. + +@quotation CAUTION +The order of arguments to the @command{awk} version +of the @code{dcgettext()} function is purposely different from the order for +the C version. The @command{awk} version's order was +chosen to be simple and to allow for reasonable @command{awk}-style +default arguments. +@end quotation + +@cindex @code{dcngettext()} function (@command{gawk}) +@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) +Return the plural form used for @var{number} of the +translation of @var{string1} and @var{string2} in text domain +@var{domain} for locale category @var{category}. @var{string1} is the +English singular variant of a message, and @var{string2} the English plural +variant of the same message. +The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. +The default value for @var{category} is @code{"LC_MESSAGES"}. + +The same remarks about argument order as for the @code{dcgettext()} function apply. + +@cindex @code{.mo} files, specifying directory of +@cindex files, @code{.mo}, specifying directory of +@cindex message object files, specifying directory of +@cindex files, message object, specifying directory of +@cindex @code{bindtextdomain()} function (@command{gawk}) +@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) +Change the directory in which +@code{gettext} looks for @file{.mo} files, in case they +will not or cannot be placed in the standard locations +(e.g., during testing). 
+Return the directory in which @var{domain} is ``bound.'' + +The default @var{domain} is the value of @code{TEXTDOMAIN}. +If @var{directory} is the null string (@code{""}), then +@code{bindtextdomain()} returns the current binding for the +given @var{domain}. +@end table + +To use these facilities in your @command{awk} program, follow the steps +outlined in +@ifnotinfo +the previous @value{SECTION}, +@end ifnotinfo +@ifinfo +@ref{Explaining gettext}, +@end ifinfo +like so: + +@enumerate +@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and +@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and +@item +Set the variable @code{TEXTDOMAIN} to the text domain of +your program. This is best done in a @code{BEGIN} rule +(@pxref{BEGIN/END}), +or it can also be done via the @option{-v} command-line +option (@pxref{Options}): + +@example +BEGIN @{ + TEXTDOMAIN = "guide" + @dots{} +@} +@end example + +@cindex @code{_} (underscore), translatable string +@cindex underscore (@code{_}), translatable string +@item +Mark all translatable strings with a leading underscore (@samp{_}) +character. It @emph{must} be adjacent to the opening +quote of the string. For example: + +@example +print _"hello, world" +x = _"you goofed" +printf(_"Number of users is %d\n", nusers) +@end example + +@item +If you are creating strings dynamically, you can +still translate them, using the @code{dcgettext()} +built-in function: + +@example +message = nusers " users logged in" +message = dcgettext(message, "adminprog") +print message +@end example + +Here, the call to @code{dcgettext()} supplies a different +text domain (@code{"adminprog"}) in which to find the +message, but it uses the default @code{"LC_MESSAGES"} category. + +@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk}) +@item +During development, you might want to put the @file{.mo} +file in a private directory for testing. 
This is done +with the @code{bindtextdomain()} built-in function: + +@example +BEGIN @{ + TEXTDOMAIN = "guide" # our text domain + if (Testing) @{ + # where to find our files + bindtextdomain("testdir") + # joe is in charge of adminprog + bindtextdomain("../joe/testdir", "adminprog") + @} + @dots{} +@} +@end example + +@end enumerate + +@xref{I18N Example}, +for an example program showing the steps to create +and use translations from @command{awk}. + +@node Translator i18n +@section Translating @command{awk} Programs + +@cindex @code{.po} files +@cindex files, @code{.po} +@cindex portable object files +@cindex files, portable object +Once a program's translatable strings have been marked, they must +be extracted to create the initial @file{.po} file. +As part of translation, it is often helpful to rearrange the order +in which arguments to @code{printf} are output. + +@command{gawk}'s @option{--gen-pot} command-line option extracts +the messages and is discussed next. +After that, @code{printf}'s ability to +rearrange the order for @code{printf} arguments at runtime +is covered. + +@menu +* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging @code{printf} arguments. +* I18N Portability:: @command{awk}-level portability issues. +@end menu + +@node String Extraction +@subsection Extracting Marked Strings +@cindex strings, extracting +@cindex marked strings@comma{} extracting +@cindex @code{--gen-pot} option +@cindex command-line options, string extraction +@cindex string extraction (internationalization) +@cindex marked string extraction (internationalization) +@cindex extraction, of marked strings (internationalization) + +@cindex @code{--gen-pot} option +Once your @command{awk} program is working, and all the strings have +been marked and you've set (and perhaps bound) the text domain, +it is time to produce translations. 
+First, use the @option{--gen-pot} command-line option to create +the initial @file{.pot} file: + +@example +$ @kbd{gawk --gen-pot -f guide.awk > guide.pot} +@end example + +@cindex @code{xgettext} utility +When run with @option{--gen-pot}, @command{gawk} does not execute your +program. Instead, it parses it as usual and prints all marked strings +to standard output in the format of a GNU @code{gettext} Portable Object +file. Also included in the output are any constant strings that +appear as the first argument to @code{dcgettext()} or as the first and +second argument to @code{dcngettext()}.@footnote{The +@command{xgettext} utility that comes with GNU +@code{gettext} can handle @file{.awk} files.} +@xref{I18N Example}, +for the full list of steps to go through to create and test +translations for @command{guide}. + +@node Printf Ordering +@subsection Rearranging @code{printf} Arguments + +@cindex @code{printf} statement, positional specifiers +@cindex positional specifiers, @code{printf} statement +Format strings for @code{printf} and @code{sprintf()} +(@pxref{Printf}) +present a special problem for translation. +Consider the following:@footnote{This example is borrowed +from the GNU @code{gettext} manual.} + +@c line broken here only for smallbook format +@example +printf(_"String `%s' has %d characters\n", + string, length(string))) +@end example + +A possible German translation for this might be: + +@example +"%d Zeichen lang ist die Zeichenkette `%s'\n" +@end example + +The problem should be obvious: the order of the format +specifications is different from the original! +Even though @code{gettext()} can return the translated string +at runtime, +it cannot change the argument order in the call to @code{printf}. + +To solve this problem, @code{printf} format specifiers may have +an additional optional element, which we call a @dfn{positional specifier}. 
+For example: + +@example +"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n" +@end example + +Here, the positional specifier consists of an integer count, which indicates which +argument to use, and a @samp{$}. Counts are one-based, and the +format string itself is @emph{not} included. Thus, in the following +example, @samp{string} is the first argument and @samp{length(string)} is the second: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{string = "Dont Panic"} +> @kbd{printf _"%2$d characters live in \"%1$s\"\n",} +> @kbd{string, length(string)} +> @kbd{@}'} +@print{} 10 characters live in "Dont Panic" +@end example + +If present, positional specifiers come first in the format specification, +before the flags, the field width, and/or the precision. + +Positional specifiers can be used with the dynamic field width and +precision capability: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{printf("%*.*s\n", 10, 20, "hello")} +> @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")} +> @kbd{@}'} +@print{} hello +@print{} hello +@end example + +@quotation NOTE +When using @samp{*} with a positional specifier, the @samp{*} +comes first, then the integer position, and then the @samp{$}. +This is somewhat counterintuitive. +@end quotation + +@cindex @code{printf} statement, positional specifiers, mixing with regular formats +@cindex positional specifiers, @code{printf} statement, mixing with regular formats +@cindex format specifiers, mixing regular with positional specifiers +@command{gawk} does not allow you to mix regular format specifiers +and those with positional specifiers in the same string: + +@example +$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'} +@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none +@end example + +@quotation NOTE +There are some pathological cases that @command{gawk} may fail to +diagnose. In such cases, the output may not be what you expect. 
+It's still a bad idea to try mixing them, even if @command{gawk} +doesn't detect it. +@end quotation + +Although positional specifiers can be used directly in @command{awk} programs, +their primary purpose is to help in producing correct translations of +format strings into languages different from the one in which the program +is first written. + +@node I18N Portability +@subsection @command{awk} Portability Issues + +@cindex portability, internationalization and +@cindex internationalization, localization, portability and +@command{gawk}'s internationalization features were purposely chosen to +have as little impact as possible on the portability of @command{awk} +programs that use them to other versions of @command{awk}. +Consider this program: + +@example +BEGIN @{ + TEXTDOMAIN = "guide" + if (Test_Guide) # set with -v + bindtextdomain("/test/guide/messages") + print _"don't panic!" +@} +@end example + +@noindent +As written, it won't work on other versions of @command{awk}. +However, it is actually almost portable, requiring very little +change: + +@itemize @bullet +@cindex @code{TEXTDOMAIN} variable, portability and +@item +Assignments to @code{TEXTDOMAIN} won't have any effect, +since @code{TEXTDOMAIN} is not special in other @command{awk} implementations. + +@item +Non-GNU versions of @command{awk} treat marked strings +as the concatenation of a variable named @code{_} with the string +following it.@footnote{This is good fodder for an ``Obfuscated +@command{awk}'' contest.} Typically, the variable @code{_} has +the null string (@code{""}) as its value, leaving the original string constant as +the result. + +@item +By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()} +and @code{bindtextdomain()}, the @command{awk} program can be made to run, but +all the messages are output in the original language. 
+For example: + +@cindex @code{bindtextdomain()} function (@command{gawk}), portability and +@cindex @code{dcgettext()} function (@command{gawk}), portability and +@cindex @code{dcngettext()} function (@command{gawk}), portability and +@example +@c file eg/lib/libintl.awk +function bindtextdomain(dir, domain) +@{ + return dir +@} + +function dcgettext(string, domain, category) +@{ + return string +@} + +function dcngettext(string1, string2, number, domain, category) +@{ + return (number == 1 ? string1 : string2) +@} +@c endfile +@end example + +@item +The use of positional specifications in @code{printf} or +@code{sprintf()} is @emph{not} portable. +To support @code{gettext()} at the C level, many systems' C versions of +@code{sprintf()} do support positional specifiers. But it works only if +enough arguments are supplied in the function call. Many versions of +@command{awk} pass @code{printf} formats and arguments unchanged to the +underlying C library version of @code{sprintf()}, but only one format and +argument at a time. What happens if a positional specification is +used is anybody's guess. +However, since the positional specifications are primarily for use in +@emph{translated} format strings, and since non-GNU @command{awk}s never +retrieve the translated string, this should not be a problem in practice. +@end itemize +@c ENDOFRANGE inap + +@node I18N Example +@section A Simple Internationalization Example + +Now let's look at a step-by-step example of how to internationalize and +localize a simple @command{awk} program, using @file{guide.awk} as our +original source: + +@example +@c file eg/prog/guide.awk +BEGIN @{ + TEXTDOMAIN = "guide" + bindtextdomain(".") # for testing + print _"Don't Panic" + print _"The Answer Is", 42 + print "Pardon me, Zaphod who?" 
+@} +@c endfile +@end example + +@noindent +Run @samp{gawk --gen-pot} to create the @file{.pot} file: + +@example +$ @kbd{gawk --gen-pot -f guide.awk > guide.pot} +@end example + +@noindent +This produces: + +@example +@c file eg/data/guide.po +#: guide.awk:4 +msgid "Don't Panic" +msgstr "" + +#: guide.awk:5 +msgid "The Answer Is" +msgstr "" + +@c endfile +@end example + +This original portable object template file is saved and reused for each language +into which the application is translated. The @code{msgid} +is the original string and the @code{msgstr} is the translation. + +@quotation NOTE +Strings not marked with a leading underscore do not +appear in the @file{guide.pot} file. +@end quotation + +Next, the messages must be translated. +Here is a translation to a hypothetical dialect of English, +called ``Mellow'':@footnote{Perhaps it would be better if it were +called ``Hippy.'' Ah, well.} + +@example +@group +$ cp guide.pot guide-mellow.po +@var{Add translations to} guide-mellow.po @dots{} +@end group +@end example + +@noindent +Following are the translations: + +@example +@c file eg/data/guide-mellow.po +#: guide.awk:4 +msgid "Don't Panic" +msgstr "Hey man, relax!" + +#: guide.awk:5 +msgid "The Answer Is" +msgstr "Like, the scoop is" + +@c endfile +@end example + +@cindex Linux +@cindex GNU/Linux +The next step is to make the directory to hold the binary message object +file and then to create the @file{guide.mo} file. +The directory layout shown here is standard for GNU @code{gettext} on +GNU/Linux systems. 
Other versions of @code{gettext} may use a different +layout: + +@example +$ @kbd{mkdir en_US en_US/LC_MESSAGES} +@end example + +@cindex @code{.po} files, converting to @code{.mo} +@cindex files, @code{.po}, converting to @code{.mo} +@cindex @code{.mo} files, converting from @code{.po} +@cindex files, @code{.mo}, converting from @code{.po} +@cindex portable object files, converting to message object files +@cindex files, portable object, converting to message object files +@cindex message object files, converting from portable object files +@cindex files, message object, converting from portable object files +@cindex @command{msgfmt} utility +The @command{msgfmt} utility does the conversion from human-readable +@file{.po} file to machine-readable @file{.mo} file. +By default, @command{msgfmt} creates a file named @file{messages}. +This file must be renamed and placed in the proper directory so that +@command{gawk} can find it: + +@example +$ @kbd{msgfmt guide-mellow.po} +$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo} +@end example + +Finally, we run the program to test it: + +@example +$ @kbd{gawk -f guide.awk} +@print{} Hey man, relax! +@print{} Like, the scoop is 42 +@print{} Pardon me, Zaphod who? +@end example + +If the three replacement functions for @code{dcgettext()}, @code{dcngettext()} +and @code{bindtextdomain()} +(@pxref{I18N Portability}) +are in a file named @file{libintl.awk}, +then we can run @file{guide.awk} unchanged as follows: + +@example +$ @kbd{gawk --posix -f guide.awk -f libintl.awk} +@print{} Don't Panic +@print{} The Answer Is 42 +@print{} Pardon me, Zaphod who? +@end example + +@node Gawk I18N +@section @command{gawk} Can Speak Your Language + +@command{gawk} itself has been internationalized +using the GNU @code{gettext} package. +(GNU @code{gettext} is described in +complete detail in +@ifinfo +@inforef{Top, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.) +@end ifinfo +@ifnotinfo +@cite{GNU gettext tools}.) 
+@end ifnotinfo +As of this writing, the latest version of GNU @code{gettext} is +@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}. + +If a translation of @command{gawk}'s messages exists, +then @command{gawk} produces usage messages, warnings, +and fatal errors in the local language. +@c ENDOFRANGE inloc + +@node Advanced Features +@chapter Advanced Features of @command{gawk} +@cindex advanced features, network connections, See Also networks, connections +@c STARTOFRANGE gawadv +@cindex @command{gawk}, features, advanced +@c STARTOFRANGE advgaw +@cindex advanced features, @command{gawk} +@ignore +Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com> + + Found in Steve English's "signature" line: + +"Write documentation as if whoever reads it is a violent psychopath +who knows where you live." +@end ignore +@quotation +@i{Write documentation as if whoever reads it is +a violent psychopath who knows where you live.}@* +Steve English, as quoted by Peter Langston +@end quotation + +This @value{CHAPTER} discusses advanced features in @command{gawk}. +It's a bit of a ``grab bag'' of items that are otherwise unrelated +to each other. +First, a command-line option allows @command{gawk} to recognize +nondecimal numbers in input data, not just in @command{awk} +programs. +Then, @command{gawk}'s special features for sorting arrays are presented. +Next, two-way I/O, discussed briefly in earlier parts of this +@value{DOCUMENT}, is described in full detail, along with the basics +of TCP/IP networking. Finally, @command{gawk} +can @dfn{profile} an @command{awk} program, making it possible to tune +it for performance. + +@ref{Dynamic Extensions}, +discusses the ability to dynamically add new built-in functions to +@command{gawk}. As this feature is still immature and likely to change, +its description is relegated to an appendix. + +@menu +* Nondecimal Data:: Allowing nondecimal input data. 
+* Array Sorting:: Facilities for controlling array traversal and + sorting arrays. +* Two-way I/O:: Two-way communications with another process. +* TCP/IP Networking:: Using @command{gawk} for network programming. +* Profiling:: Profiling your @command{awk} programs. +@end menu + +@node Nondecimal Data +@section Allowing Nondecimal Input Data +@cindex @code{--non-decimal-data} option +@cindex advanced features, @command{gawk}, nondecimal input data +@cindex input, data@comma{} nondecimal +@cindex constants, nondecimal + +If you run @command{gawk} with the @option{--non-decimal-data} option, +you can have nondecimal constants in your input data: + +@c line break here for small book format +@example +$ @kbd{echo 0123 123 0x123 |} +> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",} +> @kbd{$1, $2, $3 @}'} +@print{} 83, 123, 291 +@end example + +For this feature to work, write your program so that +@command{gawk} treats your data as numeric: + +@example +$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'} +@print{} 0123 123 0x123 +@end example + +@noindent +The @code{print} statement treats its expressions as strings. +Although the fields can act as numbers when necessary, +they are still strings, so @code{print} does not try to treat them +numerically. You may need to add zero to a field to force it to +be treated as a number. For example: + +@example +$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '} +> @kbd{@{ print $1, $2, $3} +> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'} +@print{} 0123 123 0x123 +@print{} 83 123 291 +@end example + +Because it is common to have decimal data with leading zeros, and because +using this facility could lead to surprising results, the default is to leave it +disabled. If you want it, you must explicitly request it. 
+ +@cindex programming conventions, @code{--non-decimal-data} option +@cindex @code{--non-decimal-data} option, @code{strtonum()} function and +@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and +@quotation CAUTION +@emph{Use of this option is not recommended.} +It can break old programs very badly. +Instead, use the @code{strtonum()} function to convert your data +(@pxref{Nondecimal-numbers}). +This makes your programs easier to write and easier to read, and +leads to less surprising results. +@end quotation + +@node Array Sorting +@section Controlling Array Traversal and Array Sorting + +@command{gawk} lets you control the order in which a @samp{for (i in array)} +loop traverses an array. + +In addition, two built-in functions, @code{asort()} and @code{asorti()}, +let you sort arrays based on the array values and indices, respectively. +These two functions also provide control over the sorting criteria used +to order the elements during sorting. + +@menu +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}. +@end menu + +@node Controlling Array Traversal +@subsection Controlling Array Traversal + +By default, the order in which a @samp{for (i in array)} loop +scans an array is not defined; it is generally based upon +the internal implementation of arrays inside @command{awk}. + +Often, though, it is desirable to be able to loop over the elements +in a particular order that you, the programmer, choose. @command{gawk} +lets you do this. + +@ref{Controlling Scanning}, describes how you can assign special, +pre-defined values to @code{PROCINFO["sorted_in"]} in order to +control the order in which @command{gawk} will traverse an array +during a @code{for} loop. + +In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name. +This lets you traverse an array based on any custom criterion. 
+The array elements are ordered according to the return value of this +function. The comparison function should be defined with at least +four arguments: + +@example +function comp_func(i1, v1, i2, v2) +@{ + @var{compare elements 1 and 2 in some fashion} + @var{return < 0; 0; or > 0} +@} +@end example + +Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2} +are the corresponding values of the two elements being compared. +Either @var{v1} or @var{v2}, or both, can be arrays if the array being +traversed contains subarrays as values. +(@xref{Arrays of Arrays}, for more information about subarrays.) +The three possible return values are interpreted as follows: + +@table @code +@item comp_func(i1, v1, i2, v2) < 0 +Index @var{i1} comes before index @var{i2} during loop traversal. + +@item comp_func(i1, v1, i2, v2) == 0 +Indices @var{i1} and @var{i2} +come together but the relative order with respect to each other is undefined. + +@item comp_func(i1, v1, i2, v2) > 0 +Index @var{i1} comes after index @var{i2} during loop traversal. +@end table + +Our first comparison function can be used to scan an array in +numerical order of the indices: + +@example +function cmp_num_idx(i1, v1, i2, v2) +@{ + # numerical index comparison, ascending order + return (i1 - i2) +@} +@end example + +Our second function traverses an array based on the string order of +the element values rather than by indices: + +@example +function cmp_str_val(i1, v1, i2, v2) +@{ + # string value comparison, ascending order + v1 = v1 "" + v2 = v2 "" + if (v1 < v2) + return -1 + return (v1 != v2) +@} +@end example + +The third +comparison function makes all numbers, and numeric strings without +any leading or trailing spaces, come out first during loop traversal: + +@example +function cmp_num_str_val(i1, v1, i2, v2, n1, n2) +@{ + # numbers before string value comparison, ascending order + n1 = v1 + 0 + n2 = v2 + 0 + if (n1 == v1) + return (n2 == v2) ? 
(n1 - n2) : -1 + else if (n2 == v2) + return 1 + return (v1 < v2) ? -1 : (v1 != v2) +@} +@end example + +Here is a main program to demonstrate how @command{gawk} +behaves using each of the previous functions: + +@example +BEGIN @{ + data["one"] = 10 + data["two"] = 20 + data[10] = "one" + data[100] = 100 + data[20] = "two" + + f[1] = "cmp_num_idx" + f[2] = "cmp_str_val" + f[3] = "cmp_num_str_val" + for (i = 1; i <= 3; i++) @{ + printf("Sort function: %s\n", f[i]) + PROCINFO["sorted_in"] = f[i] + for (j in data) + printf("\tdata[%s] = %s\n", j, data[j]) + print "" + @} +@} +@end example + +Here are the results when the program is run: +@page + +@example +$ @kbd{gawk -f compdemo.awk} +@print{} Sort function: cmp_num_idx @ii{Sort by numeric index} +@print{} data[two] = 20 +@print{} data[one] = 10 @ii{Both strings are numerically zero} +@print{} data[10] = one +@print{} data[20] = two +@print{} data[100] = 100 +@print{} +@print{} Sort function: cmp_str_val @ii{Sort by element values as strings} +@print{} data[one] = 10 +@print{} data[100] = 100 @ii{String 100 is less than string 20} +@print{} data[two] = 20 +@print{} data[10] = one +@print{} data[20] = two +@print{} +@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings} +@print{} data[one] = 10 +@print{} data[two] = 20 +@print{} data[100] = 100 +@print{} data[10] = one +@print{} data[20] = two +@end example + +Consider sorting the entries of a GNU/Linux system password file +according to login name. The following program sorts records +by a specific field position and can be used for this purpose: + +@example +# sort.awk --- simple program to sort by field position +# field position is specified by the global variable POS + +function cmp_field(i1, v1, i2, v2) +@{ + # comparison by value, as string, and ascending order + return v1[POS] < v2[POS] ? 
-1 : (v1[POS] != v2[POS]) +@} + +@{ + for (i = 1; i <= NF; i++) + a[NR][i] = $i +@} + +END @{ + PROCINFO["sorted_in"] = "cmp_field" + if (POS < 1 || POS > NF) + POS = 1 + for (i in a) @{ + for (j = 1; j <= NF; j++) + printf("%s%c", a[i][j], j < NF ? ":" : "") + print "" + @} +@} +@end example + +The first field in each entry of the password file is the user's login name, +and the fields are separated by colons. +Each record defines a subarray, +with each field as an element in the subarray. +Running the program produces the +following output: + +@example +$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd} +@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin +@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin +@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin +@dots{} +@end example + +The comparison should normally always return the same value when given a +specific pair of array elements as its arguments. If inconsistent +results are returned then the order is undefined. This behavior can be +exploited to introduce random order into otherwise seemingly +ordered data: + +@example +function cmp_randomize(i1, v1, i2, v2) +@{ + # random order + return (2 - 4 * rand()) +@} +@end example + +As mentioned above, the order of the indices is arbitrary if two +elements compare equal. This is usually not a problem, but letting +the tied elements come out in arbitrary order can be an issue, especially +when comparing item values. The partial ordering of the equal elements +may change during the next loop traversal, if other elements are added or +removed from the array. One way to resolve ties when comparing elements +with otherwise equal values is to include the indices in the comparison +rules. Note that doing this may make the loop traversal less efficient, +so consider it only if necessary. 
The following comparison functions +force a deterministic order, and are based on the fact that the +indices of two elements are never equal: + +@example +function cmp_numeric(i1, v1, i2, v2) +@{ + # numerical value (and index) comparison, descending order + return (v1 != v2) ? (v2 - v1) : (i2 - i1) +@} + +function cmp_string(i1, v1, i2, v2) +@{ + # string value (and index) comparison, descending order + v1 = v1 i1 + v2 = v2 i2 + return (v1 > v2) ? -1 : (v1 != v2) +@} +@end example + +@c Avoid using the term ``stable'' when describing the unpredictable behavior +@c if two items compare equal. Usually, the goal of a "stable algorithm" +@c is to maintain the original order of the items, which is a meaningless +@c concept for a list constructed from a hash. + +A custom comparison function can often simplify ordered loop +traversal, and the sky is really the limit when it comes to +designing such a function. + +When string comparisons are made during a sort, either for element +values where one or both aren't numbers, or for element indices +handled as strings, the value of @code{IGNORECASE} +(@pxref{Built-in Variables}) controls whether +the comparisons treat corresponding uppercase and lowercase letters as +equivalent or distinct. + +Another point to keep in mind is that in the case of subarrays +the element values can themselves be arrays; a production comparison +function should use the @code{isarray()} function +(@pxref{Type Functions}), +to check for this, and choose a defined sorting order for subarrays. + +All sorting based on @code{PROCINFO["sorted_in"]} +is disabled in POSIX mode, +since the @code{PROCINFO} array is not special in that case. + +As a side note, sorting the array indices before traversing +the array has been reported to add 15% to 20% overhead to the +execution time of @command{awk} programs. For this reason, +sorted array traversal is not the default. 
+ +@c The @command{gawk} +@c maintainers believe that only the people who wish to use a +@c feature should have to pay for it. + +@node Array Sorting Functions +@subsection Sorting Array Values and Indices with @command{gawk} + +@cindex arrays, sorting +@cindex @code{asort()} function (@command{gawk}) +@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting +@cindex sort function, arrays, sorting +In most @command{awk} implementations, sorting an array requires +writing a @code{sort()} function. +While this can be educational for exploring different sorting algorithms, +usually that's not the point of the program. +@command{gawk} provides the built-in @code{asort()} +and @code{asorti()} functions +(@pxref{String Functions}) +for sorting arrays. For example: + +@example +@var{populate the array} data +n = asort(data) +for (i = 1; i <= n; i++) + @var{do something with} data[i] +@end example + +After the call to @code{asort()}, the array @code{data} is indexed from 1 +to some number @var{n}, the total number of elements in @code{data}. +(This count is @code{asort()}'s return value.) +@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on. +The comparison is based on the type of the elements +(@pxref{Typing and Comparison}). +All numeric values come before all string values, +which in turn come before all subarrays. + +@cindex side effects, @code{asort()} function +An important side effect of calling @code{asort()} is that +@emph{the array's original indices are irrevocably lost}. +As this isn't always desirable, @code{asort()} accepts a +second argument: + +@example +@var{populate the array} source +n = asort(source, dest) +for (i = 1; i <= n; i++) + @var{do something with} dest[i] +@end example + +In this case, @command{gawk} copies the @code{source} array into the +@code{dest} array and then sorts @code{dest}, destroying its indices. +However, the @code{source} array is not affected. 
+ +@code{asort()} accepts a third string argument to control comparison of +array elements. As with @code{PROCINFO["sorted_in"]}, this argument +may be one of the predefined names that @command{gawk} provides +(@pxref{Controlling Scanning}), or the name of a user-defined function +(@pxref{Controlling Array Traversal}). + +@quotation NOTE +In all cases, the sorted element values consist of the original +array's element values. The ability to control comparison merely +affects the way in which they are sorted. +@end quotation + +Often, what's needed is to sort on the values of the @emph{indices} +instead of the values of the elements. +To do that, use the +@code{asorti()} function. The interface is identical to that of +@code{asort()}, except that the index values are used for sorting, and +become the values of the result array: + +@example +@{ source[$0] = some_func($0) @} + +END @{ + n = asorti(source, dest) + for (i = 1; i <= n; i++) @{ + @ii{Work with sorted indices directly:} + @var{do something with} dest[i] + @dots{} + @ii{Access original array via sorted indices:} + @var{do something with} source[dest[i]] + @} +@} +@end example + +Similar to @code{asort()}, +in all cases, the sorted element values consist of the original +array's indices. The ability to control comparison merely +affects the way in which they are sorted. + +Sorting the array by replacing the indices provides maximal flexibility. +To traverse the elements in decreasing order, use a loop that goes from +@var{n} down to 1, either over the elements or over the indices.@footnote{You +may also use one of the predefined sorting names that sorts in +decreasing order.} + +@cindex reference counting, sorting arrays +Copying array indices and elements isn't expensive in terms of memory. +Internally, @command{gawk} maintains @dfn{reference counts} to data. 
+For example, when @code{asort()} copies the first array to the second one, +there is only one copy of the original array elements' data, even though +both arrays use the values. + +@c Document It And Call It A Feature. Sigh. +@cindex @command{gawk}, @code{IGNORECASE} variable in +@cindex @code{IGNORECASE} variable +@cindex arrays, sorting, @code{IGNORECASE} variable and +@cindex @code{IGNORECASE} variable, array sorting and +Because @code{IGNORECASE} affects string comparisons, the value +of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}. +Note also that the locale's sorting order does @emph{not} +come into play; comparisons are based on character values only.@footnote{This +is true because locale-based comparison occurs only when in POSIX +compatibility mode, and since @code{asort()} and @code{asorti()} are +@command{gawk} extensions, they are not available in that case.} +Caveat Emptor. + +@node Two-way I/O +@section Two-Way Communications with Another Process +@cindex Brennan, Michael +@cindex programmers, attractiveness of +@smallexample +@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan +From: brennan@@whidbey.com (Mike Brennan) +Newsgroups: comp.lang.awk +Subject: Re: Learn the SECRET to Attract Women Easily +Date: 4 Aug 1997 17:34:46 GMT +@c Organization: WhidbeyNet +@c Lines: 12 +Message-ID: <5s53rm$eca@@news.whidbey.com> +@c References: <5s20dn$2e1@chronicle.concentric.net> +@c Reply-To: brennan@whidbey.com +@c NNTP-Posting-Host: asn202.whidbey.com +@c X-Newsreader: slrn (0.9.4.1 UNIX) +@c Xref: cssun.mathcs.emory.edu comp.lang.awk:5403 + +On 3 Aug 1997 13:17:43 GMT, Want More Dates??? 
+<tracy78@@kilgrona.com> wrote: +>Learn the SECRET to Attract Women Easily +> +>The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women + +The scent of awk programmers is a lot more attractive to women than +the scent of perl programmers. +-- +Mike Brennan +@c brennan@@whidbey.com +@end smallexample + +@cindex advanced features, @command{gawk}, processes@comma{} communicating with +@cindex processes, two-way communications with +It is often useful to be able to +send data to a separate program for +processing and then read the result. This can always be +done with temporary files: + +@example +# Write the data for processing +tempfile = ("mydata." PROCINFO["pid"]) +while (@var{not done with data}) + print @var{data} | ("subprogram > " tempfile) +close("subprogram > " tempfile) + +# Read the results, remove tempfile when done +while ((getline newdata < tempfile) > 0) + @var{process} newdata @var{appropriately} +close(tempfile) +system("rm " tempfile) +@end example + +@noindent +This works, but not elegantly. Among other things, it requires that +the program be run in a directory that cannot be shared among users; +for example, @file{/tmp} will not do, as another user might happen +to be using a temporary file with the same name. + +@cindex coprocesses +@cindex input/output, two-way +@cindex @code{|} (vertical bar), @code{|&} operator (I/O) +@cindex vertical bar (@code{|}), @code{|&} operator (I/O) +@cindex @command{csh} utility, @code{|&} operator, comparison with +However, with @command{gawk}, it is possible to +open a @emph{two-way} pipe to another process. The second process is +termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}. 
+The two-way connection is created using the @samp{|&} operator +(borrowed from the Korn shell, @command{ksh}):@footnote{This is very +different from the same operator in the C shell.} + +@example +do @{ + print @var{data} |& "subprogram" + "subprogram" |& getline results +@} while (@var{data left to process}) +close("subprogram") +@end example + +The first time an I/O operation is executed using the @samp{|&} +operator, @command{gawk} creates a two-way pipeline to a child process +that runs the other program. Output created with @code{print} +or @code{printf} is written to the program's standard input, and +output from the program's standard output can be read by the @command{gawk} +program using @code{getline}. +As is the case with processes started by @samp{|}, the subprogram +can be any program, or pipeline of programs, that can be started by +the shell. + +There are some cautionary items to be aware of: + +@itemize @bullet +@item +As the code inside @command{gawk} currently stands, the coprocess's +standard error goes to the same place that the parent @command{gawk}'s +standard error goes. It is not possible to read the child's +standard error separately. + +@cindex deadlocks +@cindex buffering, input/output +@cindex @code{getline} command, deadlock and +@item +I/O buffering may be a problem. @command{gawk} automatically +flushes all output down the pipe to the coprocess. +However, if the coprocess does not flush its output, +@command{gawk} may hang when doing a @code{getline} in order to read +the coprocess's results. This could lead to a situation +known as @dfn{deadlock}, where each process is waiting for the +other one to do something. +@end itemize + +@cindex @code{close()} function, two-way pipes and +It is possible to close just one end of the two-way pipe to +a coprocess, by supplying a second argument to the @code{close()} +function of either @code{"to"} or @code{"from"} +(@pxref{Close Files And Pipes}). 
+These strings tell @command{gawk} to close the end of the pipe +that sends data to the coprocess or the end that reads from it, +respectively. + +@cindex @command{sort} utility, coprocesses and +This is particularly necessary in order to use +the system @command{sort} utility as part of a coprocess; +@command{sort} must read @emph{all} of its input +data before it can produce any output. +The @command{sort} program does not receive an end-of-file indication +until @command{gawk} closes the write end of the pipe. + +When you have finished writing data to the @command{sort} +utility, you can close the @code{"to"} end of the pipe, and +then start reading sorted data via @code{getline}. +For example: + +@example +BEGIN @{ + command = "LC_ALL=C sort" + n = split("abcdefghijklmnopqrstuvwxyz", a, "") + + for (i = n; i > 0; i--) + print a[i] |& command + close(command, "to") + + while ((command |& getline line) > 0) + print "got", line + close(command) +@} +@end example + +This program writes the letters of the alphabet in reverse order, one +per line, down the two-way pipe to @command{sort}. It then closes the +write end of the pipe, so that @command{sort} receives an end-of-file +indication. This causes @command{sort} to sort the data and write the +sorted data back to the @command{gawk} program. Once all of the data +has been read, @command{gawk} terminates the coprocess and exits. + +As a side note, the assignment @samp{LC_ALL=C} in the @command{sort} +command ensures traditional Unix (ASCII) sorting from @command{sort}. + +@cindex @command{gawk}, @code{PROCINFO} array in +@cindex @code{PROCINFO} array +You may also use pseudo-ttys (ptys) for +two-way communication instead of pipes, if your system supports them. 
+This is done on a per-command basis, by setting a special element +in the @code{PROCINFO} array +(@pxref{Auto-set}), +like so: + +@example +command = "sort -nr" # command, save in convenience variable +PROCINFO[command, "pty"] = 1 # update PROCINFO +print @dots{} |& command # start two-way pipe +@dots{} +@end example + +@noindent +Using ptys avoids the buffer deadlock issues described earlier, at some +loss in performance. If your system does not have ptys, or if all the +system's ptys are in use, @command{gawk} automatically falls back to +using regular pipes. + +@node TCP/IP Networking +@section Using @command{gawk} for Network Programming +@cindex advanced features, @command{gawk}, network programming +@cindex networks, programming +@c STARTOFRANGE tcpip +@cindex TCP/IP +@cindex @code{/inet/@dots{}} special files (@command{gawk}) +@cindex files, @code{/inet/@dots{}} (@command{gawk}) +@cindex @code{/inet4/@dots{}} special files (@command{gawk}) +@cindex files, @code{/inet4/@dots{}} (@command{gawk}) +@cindex @code{/inet6/@dots{}} special files (@command{gawk}) +@cindex files, @code{/inet6/@dots{}} (@command{gawk}) +@cindex @code{EMISTERED} +@quotation +@code{EMISTERED}:@* +@ @ @ @ @i{A host is a host from coast to coast,@* +@ @ @ @ and no-one can talk to host that's close,@* +@ @ @ @ unless the host that isn't close@* +@ @ @ @ is busy hung or dead.} +@end quotation + +In addition to being able to open a two-way pipeline to a coprocess +on the same system +(@pxref{Two-way I/O}), +it is possible to make a two-way connection to +another process on another system across an IP network connection. + +You can think of this as just a @emph{very long} two-way pipeline to +a coprocess. +The way @command{gawk} decides that you want to use TCP/IP networking is +by recognizing special @value{FN}s that begin with one of @samp{/inet/}, +@samp{/inet4/}, or @samp{/inet6/}. 
+ +The full syntax of the special @value{FN} is +@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}. +The components are: + +@table @var +@item net-type +Specifies the kind of Internet connection to make. +Use @samp{/inet4/} to force IPv4, and +@samp{/inet6/} to force IPv6. +Plain @samp{/inet/} (which used to be the only option) uses +the system default, most likely IPv4. + +@item protocol +The protocol to use over IP. This must be either @samp{tcp}, or +@samp{udp}, for a TCP or UDP IP connection, +respectively. The use of TCP is recommended for most applications. + +@item local-port +@cindex @code{getaddrinfo()} function (C library) +The local TCP or UDP port number to use. Use a port number of @samp{0} +when you want the system to pick a port. This is what you should do +when writing a TCP or UDP client. +You may also use a well-known service name, such as @samp{smtp} +or @samp{http}, in which case @command{gawk} attempts to determine +the predefined port number using the C @code{getaddrinfo()} function. + +@item remote-host +The IP address or fully-qualified domain name of the Internet +host to which you want to connect. + +@item remote-port +The TCP or UDP port number to use on the given @var{remote-host}. +Again, use @samp{0} if you don't care, or else a well-known +service name. +@end table + +@cindex @command{gawk}, @code{ERRNO} variable in +@cindex @code{ERRNO} variable +@quotation NOTE +Failure in opening a two-way socket will result in a non-fatal error +being returned to the calling code. The value of @code{ERRNO} indicates +the error (@pxref{Auto-set}). +@end quotation + +Consider the following very simple example: + +@example +BEGIN @{ + Service = "/inet/tcp/0/localhost/daytime" + Service |& getline + print $0 + close(Service) +@} +@end example + +This program reads the current date and time from the local system's +TCP @samp{daytime} server. +It then prints the results and closes the connection. 
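Because a failed open is reported through @code{ERRNO} rather than as a fatal error, a program can test the result of @code{getline} before using the data. The following is only a sketch under assumptions: the shell function name @code{read_daytime} is hypothetical, and whether a @samp{daytime} server is listening on the local host depends entirely on the system:

```shell
# read_daytime: hypothetical demo, not part of gawk.
# Prints the line sent by the local daytime server, or a diagnostic
# built from ERRNO on standard error when nothing could be read.
read_daytime() {
    gawk 'BEGIN {
        Service = "/inet/tcp/0/localhost/daytime"
        if ((Service |& getline line) > 0)
            print line
        else
            print "no data from " Service ": " ERRNO > "/dev/stderr"
        close(Service)
    }'
}

read_daytime
```

Reading into the variable @code{line} instead of @code{$0} is a design choice of the sketch; it keeps the current record untouched when the connection fails.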
+ +Because this topic is extensive, the use of @command{gawk} for +TCP/IP programming is documented separately. +@ifinfo +See +@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}, +@end ifinfo +@ifnotinfo +See @cite{TCP/IP Internetworking with @command{gawk}}, +which comes as part of the @command{gawk} distribution, +@end ifnotinfo +for a much more complete introduction and discussion, as well as +extensive examples. + +@c ENDOFRANGE tcpip + +@node Profiling +@section Profiling Your @command{awk} Programs +@c STARTOFRANGE awkp +@cindex @command{awk} programs, profiling +@c STARTOFRANGE proawk +@cindex profiling @command{awk} programs +@cindex profiling @command{gawk} +@cindex @code{awkprof.out} file +@cindex files, @code{awkprof.out} + +You may produce execution traces of your @command{awk} programs. +This is done by passing the option @option{--profile} to @command{gawk}. +When @command{gawk} has finished running, it creates a profile of your program in a file +named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than +@command{gawk} normally does. + +@cindex @code{--profile} option +As shown in the following example, +the @option{--profile} option can be used to change the name of the file +where @command{gawk} will write the profile: + +@example +gawk --profile=myprog.prof -f myprog.awk data1 data2 +@end example + +@noindent +In the above example, @command{gawk} places the profile in +@file{myprog.prof} instead of in @file{awkprof.out}. + +Here is a sample session showing a simple @command{awk} program, its input data, and the +results from running @command{gawk} with the @option{--profile} option. 
+First, the @command{awk} program: + +@example +BEGIN @{ print "First BEGIN rule" @} + +END @{ print "First END rule" @} + +/foo/ @{ + print "matched /foo/, gosh" + for (i = 1; i <= 3; i++) + sing() +@} + +@{ + if (/foo/) + print "if is true" + else + print "else is true" +@} + +BEGIN @{ print "Second BEGIN rule" @} + +END @{ print "Second END rule" @} + +function sing( dummy) +@{ + print "I gotta be me!" +@} +@end example + +Following is the input data: + +@example +foo +bar +baz +foo +junk +@end example + +Here is the @file{awkprof.out} that results from running the @command{gawk} +profiler on this program and data (this example also illustrates that @command{awk} +programmers sometimes have to work late): + +@cindex @code{BEGIN} pattern +@cindex @code{END} pattern +@example + # gawk profile, created Sun Aug 13 00:00:15 2000 + + # BEGIN block(s) + + BEGIN @{ + 1 print "First BEGIN rule" + 1 print "Second BEGIN rule" + @} + + # Rule(s) + + 5 /foo/ @{ # 2 + 2 print "matched /foo/, gosh" + 6 for (i = 1; i <= 3; i++) @{ + 6 sing() + @} + @} + + 5 @{ + 5 if (/foo/) @{ # 2 + 2 print "if is true" + 3 @} else @{ + 3 print "else is true" + @} + @} + + # END block(s) + + END @{ + 1 print "First END rule" + 1 print "Second END rule" + @} + + # Functions, listed alphabetically + + 6 function sing(dummy) + @{ + 6 print "I gotta be me!" + @} +@end example + +This example illustrates many of the basic features of profiling output. +They are as follows: + +@itemize @bullet +@item +The program is printed in the order @code{BEGIN} rule, +@code{BEGINFILE} rule, +pattern/action rules, +@code{ENDFILE} rule, @code{END} rule and functions, listed +alphabetically. +Multiple @code{BEGIN} and @code{END} rules are merged together, +as are multiple @code{BEGINFILE} and @code{ENDFILE} rules. + +@cindex patterns, counts +@item +Pattern-action rules have two counts. +The first count, to the left of the rule, shows how many times +the rule's pattern was @emph{tested}. 
+The second count, to the right of the rule's opening left brace +in a comment, +shows how many times the rule's action was @emph{executed}. +The difference between the two indicates how many times the rule's +pattern evaluated to false. + +@item +Similarly, +the count for an @code{if}-@code{else} statement shows how many times +the condition was tested. +To the right of the opening left brace for the @code{if}'s body +is a count showing how many times the condition was true. +The count for the @code{else} +indicates how many times the test failed. + +@cindex loops, count for header +@item +The count for a loop header (such as @code{for} +or @code{while}) shows how many times the loop test was executed. +(Because of this, you can't just look at the count on the first +statement in a rule to determine how many times the rule was executed. +If the first statement is a loop, the count is misleading.) + +@cindex functions, user-defined, counts +@cindex user-defined, functions, counts +@item +For user-defined functions, the count next to the @code{function} +keyword indicates how many times the function was called. +The counts next to the statements in the body show how many times +those statements were executed. + +@cindex @code{@{@}} (braces) +@cindex braces (@code{@{@}}) +@item +The layout uses ``K&R'' style with TABs. +Braces are used everywhere, even when +the body of an @code{if}, @code{else}, or loop is only a single statement. + +@cindex @code{()} (parentheses) +@cindex parentheses @code{()} +@item +Parentheses are used only where needed, as indicated by the structure +of the program and the precedence rules. +@c extra verbiage here satisfies the copyeditor. ugh. +For example, @samp{(3 + 5) * 4} means add three plus five, then multiply +the total by four. However, @samp{3 + 5 * 4} has no parentheses, and +means @samp{3 + (5 * 4)}. + +@ignore +@item +All string concatenations are parenthesized too. +(This could be made a bit smarter.) 
+@end ignore + +@item +Parentheses are used around the arguments to @code{print} +and @code{printf} only when +the @code{print} or @code{printf} statement is followed by a redirection. +Similarly, if +the target of a redirection isn't a scalar, it gets parenthesized. + +@item +@command{gawk} supplies leading comments in +front of the @code{BEGIN} and @code{END} rules, +the pattern/action rules, and the functions. + +@end itemize + +The profiled version of your program may not look exactly like what you +typed when you wrote it. This is because @command{gawk} creates the +profiled version by ``pretty printing'' its internal representation of +the program. The advantage to this is that @command{gawk} can produce +a standard representation. The disadvantage is that all source-code +comments are lost, as are the distinctions among multiple @code{BEGIN}, +@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as: + +@example +/foo/ +@end example + +@noindent +come out as: + +@example +/foo/ @{ + print $0 +@} +@end example + +@noindent +which is correct, but possibly surprising. + +@cindex profiling @command{awk} programs, dynamically +@cindex @command{gawk} program, dynamic profiling +Besides creating profiles when a program has completed, +@command{gawk} can produce a profile while it is running. +This is useful if your @command{awk} program goes into an +infinite loop and you want to see what has been executed. +To use this feature, run @command{gawk} with the @option{--profile} +option in the background: + +@example +$ @kbd{gawk --profile -f myprog &} +[1] 13992 +@end example + +@cindex @command{kill} command@comma{} dynamic profiling +@cindex @code{USR1} signal +@cindex @code{SIGUSR1} signal +@cindex signals, @code{USR1}/@code{SIGUSR1} +@noindent +The shell prints a job number and process ID number; in this case, 13992. 
+Use the @command{kill} command to send the @code{USR1} signal +to @command{gawk}: + +@example +$ @kbd{kill -USR1 13992} +@end example + +@noindent +As usual, the profiled version of the program is written to +@file{awkprof.out}, or to a different file if one was specified with +the @option{--profile} option. + +Along with the regular profile, as shown earlier, the profile +includes a trace of any active functions: + +@example +# Function Call Stack: + +# 3. baz +# 2. bar +# 1. foo +# -- main -- +@end example + +You may send @command{gawk} the @code{USR1} signal as many times as you like. +Each time, the profile and function call trace are appended to the output +profile file. + +@cindex @code{HUP} signal +@cindex @code{SIGHUP} signal +@cindex signals, @code{HUP}/@code{SIGHUP} +If you use the @code{HUP} signal instead of the @code{USR1} signal, +@command{gawk} produces the profile and the function call trace and then exits. + +@cindex @code{INT} signal (MS-Windows) +@cindex @code{SIGINT} signal (MS-Windows) +@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows) +@cindex @code{QUIT} signal (MS-Windows) +@cindex @code{SIGQUIT} signal (MS-Windows) +@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows) +When @command{gawk} runs on MS-Windows systems, it uses the +@code{INT} and @code{QUIT} signals for producing the profile and, in +the case of the @code{INT} signal, @command{gawk} exits. This is +because these systems don't support the @command{kill} command, so the +only signals you can deliver to a program are those generated by the +keyboard. The @code{INT} signal is generated by the +@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the +@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key. + +Finally, @command{gawk} also accepts another option, @option{--pretty-print}. +When called this way, @command{gawk} ``pretty prints'' the program into +@file{awkprof.out}, without any execution counts. 
+@c ENDOFRANGE advgaw +@c ENDOFRANGE gawadv +@c ENDOFRANGE awkp +@c ENDOFRANGE proawk + @c The original text for this chapter was contributed by Efraim Yawitz. @c FIXME: Add more indexing. @@ -31858,16 +31897,18 @@ If you write an extension that you wish to share with other @command{gawk} users, please consider doing so through the @code{gawkextlib} project. +@iftex +@part Part IV:@* Appendices +@end iftex @ignore -@c Try this -@iftex -@page -@headings off -@majorheading III@ @ @ Appendixes -Part III provides the appendixes, the Glossary, and two licenses that cover +@ifdocbook + +@part Part IV:@* Appendices + +Part IV provides the appendices, the Glossary, and two licenses that cover the @command{gawk} source code and this @value{DOCUMENT}, respectively. -It contains the following appendixes: +It contains the following appendices: @itemize @bullet @item @@ -31891,11 +31932,7 @@ It contains the following appendixes: @item @ref{GNU Free Documentation License}. @end itemize - -@page -@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| -@oddheading @| @| @strong{@thischapter}@ @ @ @thispage -@end iftex +@end ifdocbook @end ignore @node Language History |