diff options
-rw-r--r-- | awklib/eg/lib/walkarray.awk | 9 | ||||
-rw-r--r-- | awklib/eg/prog/anagram.awk | 50 | ||||
-rw-r--r-- | doc/gawk.info | 1024 | ||||
-rw-r--r-- | doc/gawk.texi | 248 |
4 files changed, 914 insertions, 417 deletions
diff --git a/awklib/eg/lib/walkarray.awk b/awklib/eg/lib/walkarray.awk new file mode 100644 index 00000000..5e36f46f --- /dev/null +++ b/awklib/eg/lib/walkarray.awk @@ -0,0 +1,9 @@ +function walk_array(arr, name, i) +{ + for (i in arr) { + if (isarray(arr[i])) + walk_array(arr[i], (name "[" i "]")) + else + printf("%s[%s] = %s\n", name, i, arr[i]) + } +} diff --git a/awklib/eg/prog/anagram.awk b/awklib/eg/prog/anagram.awk new file mode 100644 index 00000000..550e9580 --- /dev/null +++ b/awklib/eg/prog/anagram.awk @@ -0,0 +1,50 @@ +# anagram.awk --- An implementation of the anagram finding algorithm +# from Jon Bentley's "Programming Pearls", 2nd edition. +# Addison Wesley, 2000, ISBN 0-201-65788-0. +# Column 2, Problem C, section 2.8, pp 18-20. +# +# This program requires gawk 4.0 or newer. +# Required gawk-specific features: +# - True multidimensional arrays +# - split() with "" as separater splits out individual characters +# - asort() and asorti() functions +# +# See http://savannah.gnu.org/projects/gawk. +# +# Arnold Robbins +# arnold@skeeve.com +# Public Domain +# January, 2011 + +/'s$/ { next } # Skip possessives +{ + key = word2key($1) # Build signature + data[key][$1] = $1 # Store word with signature +} +# word2key --- split word apart into letters, sort, joining back together + +function word2key(word, a, i, n, result) +{ + n = split(word, a, "") + asort(a) + + for (i = 1; i <= n; i++) + result = result a[i] + + return result +} +END { + sort = "sort" + for (key in data) { + # Sort words with same key + nwords = asorti(data[key], words) + if (nwords == 1) + continue + + # And print. Minor glitch: trailing space at end of each line + for (j = 1; j <= nwords; j++) + printf("%s ", words[j]) | sort + print "" | sort + } + close(sort) +} diff --git a/doc/gawk.info b/doc/gawk.info index 07b24920..8d95f43b 100644 --- a/doc/gawk.info +++ b/doc/gawk.info @@ -410,6 +410,7 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * Simple Sed:: A Simple Stream Editor. * Igawk Program:: A wrapper for `awk' that includes files. +* Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. * Debugging:: Introduction to `dgawk'. @@ -9374,7 +9375,8 @@ with a pound sign (`#'). `PROCINFO["sorted_in"]' If this element exists in `PROCINFO', _no matter what its value_, then `gawk' will cause `for(i in arr) ...' loops to - traverse the array indices in sorted order. + traverse the array indices in sorted order. *Note Scanning + an Array::, for more information. `PROCINFO["strftime"]' The default time format string for `strftime()'. Assigning a @@ -9887,6 +9889,34 @@ the loop body; it is not predictable whether the `for' loop will reach them. Similarly, changing VAR inside the loop may produce strange results. It is best to avoid such things. + As an extension, `gawk' makes it possible for you to loop over the +elements of an array in order, sorted by index. Sorting is based on +string comparison (since all array indices are strings), and you cannot +control the style of sorting; it is always from lowest to highest +(`"A"' before `"B"'). To enable this feature, create the array element +`PROCINFO["sorted_in"]' (*note Auto-set::). The value of this element +does not matter; `gawk' only tests if the element with this index +exists in `PROCINFO' or not. This extension is disabled in POSIX mode, +since the `PROCINFO' array is not special in that case. For example: + + $ gawk 'BEGIN { + > a[4] = 4 + > a[3] = 3 + > for (i in a) + > print i, a[i] + > }' + -| 4 4 + -| 3 3 + $ gawk 'BEGIN { + > PROCINFO["sorted_in"]++ + > a[4] = 4 + > a[3] = 3 + > for (i in a) + > print i, a[i] + > }' + -| 3 3 + -| 4 4 + File: gawk.info, Node: Delete, Next: Numeric Array Subscripts, Prev: Array Basics, Up: Arrays @@ -10347,6 +10377,17 @@ length) you could use the following code: for (j in array[i]) print array[i][j] + The `isarray()' function (*note Type Functions::) lets you test if +an array element is itself an array: + + for (i in array) { + if (isarray(array[i]) { + for (j in array[i]) { + print array[i][j] + } + } + } + If the structure of a jagged array of arrays is known in advance, you can often devise workarounds using control statements. For example, the following code prints the elements of our main array `a': @@ -10361,6 +10402,9 @@ the following code prints the elements of our main array `a': } } +*Note Walking Arrays::, for a user-defined function that will "walk" an +arbitrarily-dimensioned array of arrays. + Recall that a reference to an uninitialized array element yields a value of `""', the null string. This has one important implication when you intend to use a subarray as an argument to a function, as @@ -13852,6 +13896,7 @@ use only lowercase letters. arguments. * Passwd Functions:: Functions for getting user information. * Group Functions:: Functions for getting group information. +* Walking Arrays:: A function to walk arrays of arrays. ---------- Footnotes ---------- @@ -15349,7 +15394,7 @@ such a change would clutter up the code. network database. -File: gawk.info, Node: Group Functions, Prev: Passwd Functions, Up: Library Functions +File: gawk.info, Node: Group Functions, Next: Walking Arrays, Prev: Passwd Functions, Up: Library Functions 12.6 Reading the Group Database =============================== @@ -15584,6 +15629,57 @@ very simple, relying on `awk''s associative arrays to do work. The `id' program in *note Id Program::, uses these functions. +File: gawk.info, Node: Walking Arrays, Prev: Group Functions, Up: Library Functions + +12.7 Traversing Arrays of Arrays +================================ + +*note Arrays of Arrays::, described how `gawk' provides arrays of +arrays. In particular, any element of an array may be either a scalar, +or another array. The `isarray()' function (*note Type Functions::) +lets you distinguish an array from a scalar. The following function, +`walk_array', recursively traverses an array, printing each element's +indices and value. You call it with the array and a string +representing the name of the array: + + function walk_array(arr, name, i) + { + for (i in arr) { + if (isarray(arr[i])) + walk_array(arr[i], (name "[" i "]")) + else + printf("%s[%s] = %s\n", name, i, arr[i]) + } + } + +It works by looping over each element of the array. If any given +element is itself an array, the function calls itself recursively, +passing the subarray and a new string representing the current index. +Otherwise, the function simply prints the element's name, index, and +value. Here is a main program to demonstrate + + BEGIN { + a[1] = 1 + a[2][1] = 21 + a[2][2] = 22 + a[3] = 3 + a[4][1][1] = 411 + a[4][2] = 42 + + walk_array(a, "a") + } + + When run, the program produces the following output: + + $ gawk -f walk_array.awk + -| a[4][1][1] = 411 + -| a[4][2] = 42 + -| a[1] = 1 + -| a[2][1] = 21 + -| a[2][2] = 22 + -| a[3] = 3 + + File: gawk.info, Node: Sample Programs, Next: Debugger, Prev: Library Functions, Up: Top 13 Practical `awk' Programs @@ -16833,6 +16929,7 @@ hope you find them both interesting and enjoyable. * Simple Sed:: A Simple Stream Editor. * Igawk Program:: A wrapper for `awk' that includes files. +* Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. @@ -17686,7 +17783,7 @@ the single rule handles the printing scheme outlined above, using `print' or `printf' as appropriate, depending upon the value of `RT'. -File: gawk.info, Node: Igawk Program, Next: Signature Program, Prev: Simple Sed, Up: Miscellaneous Programs +File: gawk.info, Node: Igawk Program, Next: Anagram Program, Prev: Simple Sed, Up: Miscellaneous Programs 13.3.9 An Easy Way to Use Library Functions ------------------------------------------- @@ -18079,9 +18176,100 @@ programming book if you wish to understand things in more depth. can loop forever if the file exists but is empty. Caveat emptor. -File: gawk.info, Node: Signature Program, Prev: Igawk Program, Up: Miscellaneous Programs +File: gawk.info, Node: Anagram Program, Next: Signature Program, Prev: Igawk Program, Up: Miscellaneous Programs + +13.3.10 Finding Anagrams From A Dictionary +------------------------------------------ + +An interesting programming challenge is to read a word list (such as +`/usr/share/dict/words' on many GNU/Linux systems) and find words that +are "anagrams" of each other. One word is an anagram of another if +both words contain the same letters (for example, "babbling" and +"blabbing"). + + An elegant algorithm is presented in Column 2, Problem C of Jon +Bentley's `Programming Pearls', second edition. The idea is to give +words that are anagrams a common signature, sort all the words together +by their signature, and then print them. Dr. Bentley observes that +taking the letters in each word and sorting them produces that common +signature. + + The following program uses arrays of arrays to bring together words +with the same signature and array sorting to print the words in sorted +order. + + # anagram.awk --- An implementation of the anagram finding algorithm + # from Jon Bentley's "Programming Pearls", 2nd edition. + # Addison Wesley, 2000, ISBN 0-201-65788-0. + # Column 2, Problem C, section 2.8, pp 18-20. + + /'s$/ { next } # Skip possessives + + The program starts with a header, and then a rule to skip +possessives in the dictionary file. The next rule builds up the data +structure. The first dimension of the array is indexed by the +signature; the second dimension is the word itself: + + { + key = word2key($1) # Build signature + data[key][$1] = $1 # Store word with signature + } + + The `word2key()' function creates the signature. It splits the word +apart into individual letters, sorts the letters, and then joins them +back together: + + # word2key --- split word apart into letters, sort, joining back together + + function word2key(word, a, i, n, result) + { + n = split(word, a, "") + asort(a) + + for (i = 1; i <= n; i++) + result = result a[i] + + return result + } + + Finally, the `END' rule traverses the array and prints out the +anagram lists. It sends the output to the system `sort' command, since +otherwise the anagrams would appear in arbitrary order: + + END { + sort = "sort" + for (key in data) { + # Sort words with same key + nwords = asorti(data[key], words) + if (nwords == 1) + continue + + # And print. Minor glitch: trailing space at end of each line + for (j = 1; j <= nwords; j++) + printf("%s ", words[j]) | sort + print "" | sort + } + close(sort) + } + + Here is some partial output when the program is run: + + $ gawk -f anagram.awk /usr/share/dict/words | grep '^b' + ... + babbled blabbed + babbler blabber brabble + babblers blabbers brabbles + babbling blabbing + babbly blabby + babel bable + babels beslab + babery yabber + ... + + +File: gawk.info, Node: Signature Program, Prev: Anagram Program, Up: Miscellaneous Programs -13.3.10 And Now For Something Completely Different +13.3.11 And Now For Something Completely Different -------------------------------------------------- The following program was written by Davide Brini and is published on @@ -24395,7 +24583,7 @@ Index (line 67) * advanced features, data files as single record: Records. (line 175) * advanced features, fixed-width data: Constant Size. (line 9) -* advanced features, FNR/NR variables: Auto-set. (line 205) +* advanced features, FNR/NR variables: Auto-set. (line 206) * advanced features, gawk: Advanced Features. (line 6) * advanced features, gawk, network programming: TCP/IP Networking. (line 6) @@ -24425,6 +24613,7 @@ Index * ampersand (&), && operator: Boolean Ops. (line 57) * ampersand (&), gsub()/gensub()/sub() functions and: Gory Details. (line 6) +* anagram.awk program: Anagram Program. (line 23) * AND bitwise operation: Bitwise Functions. (line 6) * and Boolean-logic operator: Boolean Ops. (line 6) * and() function (gawk): Bitwise Functions. (line 39) @@ -24889,7 +25078,7 @@ Index (line 47) * dark corner, FILENAME variable <1>: Auto-set. (line 92) * dark corner, FILENAME variable: Getline Notes. (line 19) -* dark corner, FNR/NR variables: Auto-set. (line 205) +* dark corner, FNR/NR variables: Auto-set. (line 206) * dark corner, format-control characters: Control Letters. (line 18) * dark corner, FS as null string: Single Character Fields. (line 20) @@ -25088,7 +25277,7 @@ Index * differences in awk and gawk, regular expressions: Case-sensitivity. (line 26) * differences in awk and gawk, RS/RT variables: Records. (line 167) -* differences in awk and gawk, RT variable: Auto-set. (line 194) +* differences in awk and gawk, RT variable: Auto-set. (line 195) * differences in awk and gawk, single-character fields: Single Character Fields. (line 6) * differences in awk and gawk, split() function: String Functions. @@ -25369,7 +25558,7 @@ Index * floating-point, numbers, AWKNUM internal type: Internals. (line 19) * FNR variable <1>: Auto-set. (line 102) * FNR variable: Records. (line 6) -* FNR variable, changing: Auto-set. (line 205) +* FNR variable, changing: Auto-set. (line 206) * for statement: For Statement. (line 6) * for statement, in arrays: Scanning an Array. (line 20) * force_number() internal function: Internals. (line 27) @@ -25543,7 +25732,7 @@ Index * gawk, regular expressions, operators: GNU Regexp Operators. (line 6) * gawk, regular expressions, precedence: Regexp Operators. (line 156) -* gawk, RT variable in <1>: Auto-set. (line 194) +* gawk, RT variable in <1>: Auto-set. (line 195) * gawk, RT variable in <2>: Getline/Variable/File. (line 10) * gawk, RT variable in <3>: Multiple Line. (line 129) @@ -25990,7 +26179,7 @@ Index * not Boolean-logic operator: Boolean Ops. (line 6) * NR variable <1>: Auto-set. (line 118) * NR variable: Records. (line 6) -* NR variable, changing: Auto-set. (line 205) +* NR variable, changing: Auto-set. (line 206) * null strings <1>: Basic Data Typing. (line 50) * null strings <2>: Truth Values. (line 6) * null strings <3>: Regexp Field Splitting. @@ -26422,7 +26611,7 @@ Index * right angle bracket (>), >> operator (I/O): Redirection. (line 50) * right shift, bitwise: Bitwise Functions. (line 32) * Ritchie, Dennis: Basic Data Typing. (line 74) -* RLENGTH variable: Auto-set. (line 181) +* RLENGTH variable: Auto-set. (line 182) * RLENGTH variable, match() function and: String Functions. (line 194) * Robbins, Arnold <1>: Future Extensions. (line 6) * Robbins, Arnold <2>: Bugs. (line 32) @@ -26447,9 +26636,9 @@ Index * RS variable: Records. (line 20) * RS variable, multiline records and: Multiple Line. (line 17) * rshift() function (gawk): Bitwise Functions. (line 51) -* RSTART variable: Auto-set. (line 187) +* RSTART variable: Auto-set. (line 188) * RSTART variable, match() function and: String Functions. (line 194) -* RT variable <1>: Auto-set. (line 194) +* RT variable <1>: Auto-set. (line 195) * RT variable <2>: Getline/Variable/File. (line 10) * RT variable <3>: Multiple Line. (line 129) @@ -26824,6 +27013,7 @@ Index * w debugger command (alias for watch): Viewing And Changing Data. (line 67) * w utility: Constant Size. (line 22) +* walk_array() user-defined function: Walking Arrays. (line 14) * Wall, Larry <1>: Future Extensions. (line 6) * Wall, Larry: Array Intro. (line 6) * Wallin, Anders: Acknowledgments. (line 59) @@ -26893,407 +27083,409 @@ Index Tag Table: Node: Top1340 -Node: Foreword30100 -Node: Preface34428 -Ref: Preface-Footnote-137380 -Ref: Preface-Footnote-237486 -Node: History37718 -Node: Names39952 -Ref: Names-Footnote-141429 -Node: This Manual41501 -Ref: This Manual-Footnote-146399 -Node: Conventions46499 -Node: Manual History48615 -Ref: Manual History-Footnote-151793 -Ref: Manual History-Footnote-251834 -Node: How To Contribute51908 -Node: Acknowledgments53052 -Node: Getting Started57300 -Node: Running gawk59679 -Node: One-shot60865 -Node: Read Terminal62090 -Ref: Read Terminal-Footnote-163740 -Ref: Read Terminal-Footnote-264014 -Node: Long64185 -Node: Executable Scripts65561 -Ref: Executable Scripts-Footnote-167422 -Ref: Executable Scripts-Footnote-267524 -Node: Comments67975 -Node: Quoting70442 -Node: DOS Quoting75059 -Node: Sample Data Files75734 -Node: Very Simple78766 -Node: Two Rules83363 -Node: More Complex85510 -Ref: More Complex-Footnote-188440 -Node: Statements/Lines88520 -Ref: Statements/Lines-Footnote-192982 -Node: Other Features93247 -Node: When94116 -Node: Invoking Gawk96259 -Node: Command Line97644 -Node: Options98427 -Ref: Options-Footnote-1111481 -Node: Other Arguments111506 -Node: Naming Standard Input114169 -Node: Environment Variables115133 -Node: AWKPATH Variable115577 -Ref: AWKPATH Variable-Footnote-1118314 -Node: Other Environment Variables118574 -Node: Exit Status120922 -Node: Include Files121597 -Node: Obsolete124988 -Node: Undocumented125674 -Node: Regexp125915 -Node: Regexp Usage127367 -Node: Escape Sequences129393 -Node: Regexp Operators135136 -Ref: Regexp Operators-Footnote-1142308 -Ref: Regexp Operators-Footnote-2142455 -Node: Character Lists142553 -Ref: table-char-classes144328 -Node: GNU Regexp Operators146968 -Node: Case-sensitivity150687 -Ref: Case-sensitivity-Footnote-1153642 -Ref: Case-sensitivity-Footnote-2153877 -Node: Leftmost Longest153985 -Node: Computed Regexps155186 -Node: Locales158603 -Node: Reading Files162145 -Node: Records164086 -Ref: Records-Footnote-1172765 -Node: Fields172802 -Ref: Fields-Footnote-1175834 -Node: Nonconstant Fields175920 -Node: Changing Fields178122 -Node: Field Separators183412 -Node: Default Field Splitting186041 -Node: Regexp Field Splitting187158 -Node: Single Character Fields190516 -Node: Command Line Field Separator191575 -Node: Field Splitting Summary195014 -Ref: Field Splitting Summary-Footnote-1198200 -Node: Constant Size198301 -Node: Splitting By Content202863 -Ref: Splitting By Content-Footnote-1206589 -Node: Multiple Line206629 -Ref: Multiple Line-Footnote-1212476 -Node: Getline212655 -Node: Plain Getline214883 -Node: Getline/Variable216972 -Node: Getline/File218113 -Node: Getline/Variable/File219435 -Ref: Getline/Variable/File-Footnote-1221034 -Node: Getline/Pipe221121 -Node: Getline/Variable/Pipe223669 -Node: Getline/Coprocess224776 -Node: Getline/Variable/Coprocess226019 -Node: Getline Notes226733 -Node: Getline Summary228675 -Ref: table-getline-variants228959 -Node: Command line directories229864 -Node: Printing230489 -Node: Print232120 -Node: Print Examples233457 -Node: Output Separators236241 -Node: OFMT238000 -Node: Printf239358 -Node: Basic Printf240264 -Node: Control Letters241801 -Node: Format Modifiers245613 -Node: Printf Examples251624 -Node: Redirection254339 -Node: Special Files261317 -Node: Special FD261850 -Ref: Special FD-Footnote-1265461 -Node: Special Network265535 -Node: Special Caveats266390 -Node: Close Files And Pipes267184 -Ref: Close Files And Pipes-Footnote-1274128 -Ref: Close Files And Pipes-Footnote-2274276 -Node: Expressions274426 -Node: Values275495 -Node: Constants276171 -Node: Scalar Constants276851 -Ref: Scalar Constants-Footnote-1277710 -Node: Nondecimal-numbers277892 -Node: Regexp Constants280951 -Node: Using Constant Regexps281426 -Node: Variables284431 -Node: Using Variables285086 -Node: Assignment Options286813 -Node: Conversion288694 -Ref: table-locale-affects294068 -Ref: Conversion-Footnote-1294692 -Node: All Operators294801 -Node: Arithmetic Ops295431 -Node: Concatenation297937 -Ref: Concatenation-Footnote-1300730 -Node: Assignment Ops300849 -Ref: table-assign-ops305837 -Node: Increment Ops307245 -Node: Truth Values and Conditions310723 -Node: Truth Values311806 -Node: Typing and Comparison312854 -Node: Variable Typing313643 -Ref: Variable Typing-Footnote-1317540 -Node: Comparison Operators317662 -Ref: table-relational-ops318072 -Node: POSIX String Comparison321621 -Ref: POSIX String Comparison-Footnote-1322578 -Node: Boolean Ops322716 -Ref: Boolean Ops-Footnote-1326794 -Node: Conditional Exp326885 -Node: Function Calls328617 -Node: Precedence332207 -Node: Patterns and Actions335860 -Node: Pattern Overview336914 -Node: Regexp Patterns338580 -Node: Expression Patterns339123 -Node: Ranges342697 -Node: BEGIN/END345663 -Node: Using BEGIN/END346413 -Ref: Using BEGIN/END-Footnote-1349144 -Node: I/O And BEGIN/END349258 -Node: Empty351527 -Node: BEGINFILE/ENDFILE351861 -Node: Using Shell Variables354686 -Node: Action Overview356965 -Node: Statements359322 -Node: If Statement361181 -Node: While Statement362680 -Node: Do Statement364724 -Node: For Statement365880 -Node: Switch Statement369032 -Node: Break Statement371129 -Node: Continue Statement373105 -Node: Next Statement374806 -Node: Nextfile Statement377188 -Node: Exit Statement379713 -Node: Built-in Variables382044 -Node: User-modified383139 -Ref: User-modified-Footnote-1391140 -Node: Auto-set391202 -Ref: Auto-set-Footnote-1400406 -Node: ARGC and ARGV400611 -Node: Arrays404370 -Node: Array Basics405941 -Node: Array Intro406652 -Node: Reference to Elements410970 -Node: Assigning Elements413240 -Node: Array Example413731 -Node: Scanning an Array415463 -Node: Delete417740 -Ref: Delete-Footnote-1420171 -Node: Numeric Array Subscripts420228 -Node: Uninitialized Subscripts422411 -Node: Multi-dimensional424039 -Node: Multi-scanning427130 -Node: Array Sorting428714 -Ref: Array Sorting-Footnote-1431912 -Node: Arrays of Arrays432106 -Node: Functions436268 -Node: Built-in437090 -Node: Calling Built-in438168 -Node: Numeric Functions440144 -Ref: Numeric Functions-Footnote-1443901 -Ref: Numeric Functions-Footnote-2444237 -Ref: Numeric Functions-Footnote-3444285 -Node: String Functions444554 -Ref: String Functions-Footnote-1466360 -Ref: String Functions-Footnote-2466489 -Ref: String Functions-Footnote-3466737 -Node: Gory Details466824 -Ref: table-sub-escapes468481 -Ref: table-posix-sub469795 -Ref: table-gensub-escapes470695 -Node: I/O Functions471866 -Ref: I/O Functions-Footnote-1478561 -Node: Time Functions478708 -Ref: Time Functions-Footnote-1489575 -Ref: Time Functions-Footnote-2489643 -Ref: Time Functions-Footnote-3489801 -Ref: Time Functions-Footnote-4489912 -Ref: Time Functions-Footnote-5490024 -Ref: Time Functions-Footnote-6490251 -Node: Bitwise Functions490517 -Ref: table-bitwise-ops491075 -Ref: Bitwise Functions-Footnote-1495235 -Node: Type Functions495419 -Node: I18N Functions495857 -Node: User-defined497484 -Node: Definition Syntax498288 -Ref: Definition Syntax-Footnote-1502925 -Node: Function Example502994 -Node: Function Caveats505588 -Node: Calling A Function506009 -Node: Variable Scope507124 -Node: Pass By Value/Reference509052 -Node: Return Statement512492 -Node: Dynamic Typing515434 -Node: Indirect Calls516171 -Node: Internationalization525856 -Node: I18N and L10N527284 -Node: Explaining gettext527970 -Ref: Explaining gettext-Footnote-1533032 -Ref: Explaining gettext-Footnote-2533215 -Node: Programmer i18n533380 -Node: Translator i18n537671 -Node: String Extraction538464 -Ref: String Extraction-Footnote-1539425 -Node: Printf Ordering539511 -Ref: Printf Ordering-Footnote-1542295 -Node: I18N Portability542359 -Ref: I18N Portability-Footnote-1544808 -Node: I18N Example544871 -Ref: I18N Example-Footnote-1547506 -Node: Gawk I18N547578 -Node: Advanced Features548195 -Node: Nondecimal Data549514 -Node: Two-way I/O551095 -Ref: Two-way I/O-Footnote-1556509 -Node: TCP/IP Networking556586 -Node: Profiling559429 -Node: Library Functions566829 -Ref: Library Functions-Footnote-1569799 -Node: Library Names569970 -Ref: Library Names-Footnote-1573441 -Ref: Library Names-Footnote-2573661 -Node: General Functions573747 -Node: Nextfile Function574810 -Node: Strtonum Function579191 -Node: Assert Function582142 -Node: Round Function585468 -Node: Cliff Random Function587009 -Node: Ordinal Functions588025 -Ref: Ordinal Functions-Footnote-1591095 -Ref: Ordinal Functions-Footnote-2591347 -Node: Join Function591563 -Ref: Join Function-Footnote-1593334 -Node: Gettimeofday Function593534 -Node: Data File Management597249 -Node: Filetrans Function597881 -Node: Rewind Function602118 -Node: File Checking603571 -Node: Empty Files604665 -Node: Ignoring Assigns606895 -Node: Getopt Function608448 -Ref: Getopt Function-Footnote-1619773 -Node: Passwd Functions619976 -Ref: Passwd Functions-Footnote-1628964 -Node: Group Functions629052 -Node: Sample Programs637132 -Node: Running Examples637797 -Node: Clones638525 -Node: Cut Program639648 -Node: Egrep Program649489 -Ref: Egrep Program-Footnote-1657260 -Node: Id Program657370 -Node: Split Program660986 -Ref: Split Program-Footnote-1664505 -Node: Tee Program664633 -Node: Uniq Program667436 -Node: Wc Program674859 -Ref: Wc Program-Footnote-1679123 -Node: Miscellaneous Programs679323 -Node: Dupword Program680443 -Node: Alarm Program682474 -Node: Translate Program687196 -Ref: Translate Program-Footnote-1691575 -Ref: Translate Program-Footnote-2691803 -Node: Labels Program691937 -Ref: Labels Program-Footnote-1695308 -Node: Word Sorting695392 -Node: History Sorting699737 -Node: Extract Program701575 -Ref: Extract Program-Footnote-1709056 -Node: Simple Sed709184 -Node: Igawk Program712246 -Ref: Igawk Program-Footnote-1727280 -Ref: Igawk Program-Footnote-2727481 -Node: Signature Program727619 -Node: Debugger728699 -Node: Debugging729610 -Node: Debugging Concepts729924 -Node: Debugging Terms731780 -Node: Awk Debugging734325 -Node: Sample dgawk session735217 -Node: dgawk invocation735709 -Node: Finding The Bug736891 -Node: List of Debugger Commands743375 -Node: Breakpoint Control744686 -Node: Dgawk Execution Control748162 -Node: Viewing And Changing Data751513 -Node: Dgawk Stack754822 -Node: Dgawk Info756282 -Node: Miscellaneous Dgawk Commands760230 -Node: Readline Support765658 -Node: Dgawk Limitations766485 -Node: Language History768624 -Node: V7/SVR3.1770056 -Node: SVR4772351 -Node: POSIX773793 -Node: BTL774791 -Node: POSIX/GNU775525 -Node: Common Extensions780711 -Node: Contributors781844 -Node: Installation785879 -Node: Gawk Distribution786773 -Node: Getting787257 -Node: Extracting788083 -Node: Distribution contents789761 -Node: Unix Installation794779 -Node: Quick Installation795396 -Node: Additional Configuration Options797358 -Node: Configuration Philosophy798835 -Node: Non-Unix Installation801177 -Node: PC Installation801635 -Node: PC Binary Installation802934 -Node: PC Compiling804782 -Node: PC Testing807928 -Node: PC Using809104 -Node: Cygwin813289 -Node: MSYS814286 -Node: VMS Installation814800 -Node: VMS Compilation815404 -Node: VMS Installation Details816981 -Node: VMS Running818611 -Node: VMS POSIX820208 -Node: VMS Old Gawk821506 -Node: Bugs821978 -Node: Other Versions825843 -Node: Notes831122 -Node: Compatibility Mode831814 -Node: Additions832597 -Node: Accessing The Source833409 -Node: Adding Code834832 -Node: New Ports840380 -Node: Dynamic Extensions844493 -Node: Internals845869 -Node: Plugin License854985 -Node: Sample Library855619 -Node: Internal File Description856305 -Node: Internal File Ops860012 -Ref: Internal File Ops-Footnote-1864780 -Node: Using Internal File Ops864928 -Node: Future Extensions867305 -Node: Basic Concepts869809 -Node: Basic High Level870566 -Ref: Basic High Level-Footnote-1874601 -Node: Basic Data Typing874786 -Node: Floating Point Issues879311 -Node: String Conversion Precision880394 -Ref: String Conversion Precision-Footnote-1882088 -Node: Unexpected Results882197 -Node: POSIX Floating Point Problems884023 -Ref: POSIX Floating Point Problems-Footnote-1887719 -Node: Glossary887757 -Node: Copying911856 -Node: GNU Free Documentation License949413 -Node: next-edition974557 -Node: unresolved974909 -Node: revision975409 -Node: consistency975832 -Node: Index979331 +Node: Foreword30171 +Node: Preface34499 +Ref: Preface-Footnote-137451 +Ref: Preface-Footnote-237557 +Node: History37789 +Node: Names40023 +Ref: Names-Footnote-141500 +Node: This Manual41572 +Ref: This Manual-Footnote-146470 +Node: Conventions46570 +Node: Manual History48686 +Ref: Manual History-Footnote-151864 +Ref: Manual History-Footnote-251905 +Node: How To Contribute51979 +Node: Acknowledgments53123 +Node: Getting Started57371 +Node: Running gawk59750 +Node: One-shot60936 +Node: Read Terminal62161 +Ref: Read Terminal-Footnote-163811 +Ref: Read Terminal-Footnote-264085 +Node: Long64256 +Node: Executable Scripts65632 +Ref: Executable Scripts-Footnote-167493 +Ref: Executable Scripts-Footnote-267595 +Node: Comments68046 +Node: Quoting70513 +Node: DOS Quoting75130 +Node: Sample Data Files75805 +Node: Very Simple78837 +Node: Two Rules83434 +Node: More Complex85581 +Ref: More Complex-Footnote-188511 +Node: Statements/Lines88591 +Ref: Statements/Lines-Footnote-193053 +Node: Other Features93318 +Node: When94187 +Node: Invoking Gawk96330 +Node: Command Line97715 +Node: Options98498 +Ref: Options-Footnote-1111552 +Node: Other Arguments111577 +Node: Naming Standard Input114240 +Node: Environment Variables115204 +Node: AWKPATH Variable115648 +Ref: AWKPATH Variable-Footnote-1118385 +Node: Other Environment Variables118645 +Node: Exit Status120993 +Node: Include Files121668 +Node: Obsolete125059 +Node: Undocumented125745 +Node: Regexp125986 +Node: Regexp Usage127438 +Node: Escape Sequences129464 +Node: Regexp Operators135207 +Ref: Regexp Operators-Footnote-1142379 +Ref: Regexp Operators-Footnote-2142526 +Node: Character Lists142624 +Ref: table-char-classes144399 +Node: GNU Regexp Operators147039 +Node: Case-sensitivity150758 +Ref: Case-sensitivity-Footnote-1153713 +Ref: Case-sensitivity-Footnote-2153948 +Node: Leftmost Longest154056 +Node: Computed Regexps155257 +Node: Locales158674 +Node: Reading Files162216 +Node: Records164157 +Ref: Records-Footnote-1172836 +Node: Fields172873 +Ref: Fields-Footnote-1175905 +Node: Nonconstant Fields175991 +Node: Changing Fields178193 +Node: Field Separators183483 +Node: Default Field Splitting186112 +Node: Regexp Field Splitting187229 +Node: Single Character Fields190587 +Node: Command Line Field Separator191646 +Node: Field Splitting Summary195085 +Ref: Field Splitting Summary-Footnote-1198271 +Node: Constant Size198372 +Node: Splitting By Content202934 +Ref: Splitting By Content-Footnote-1206660 +Node: Multiple Line206700 +Ref: Multiple Line-Footnote-1212547 +Node: Getline212726 +Node: Plain Getline214954 +Node: Getline/Variable217043 +Node: Getline/File218184 +Node: Getline/Variable/File219506 +Ref: Getline/Variable/File-Footnote-1221105 +Node: Getline/Pipe221192 +Node: Getline/Variable/Pipe223740 +Node: Getline/Coprocess224847 +Node: Getline/Variable/Coprocess226090 +Node: Getline Notes226804 +Node: Getline Summary228746 +Ref: table-getline-variants229030 +Node: Command line directories229935 +Node: Printing230560 +Node: Print232191 +Node: Print Examples233528 +Node: Output Separators236312 +Node: OFMT238071 +Node: Printf239429 +Node: Basic Printf240335 +Node: Control Letters241872 +Node: Format Modifiers245684 +Node: Printf Examples251695 +Node: Redirection254410 +Node: Special Files261388 +Node: Special FD261921 +Ref: Special FD-Footnote-1265532 +Node: Special Network265606 +Node: Special Caveats266461 +Node: Close Files And Pipes267255 +Ref: Close Files And Pipes-Footnote-1274199 +Ref: Close Files And Pipes-Footnote-2274347 +Node: Expressions274497 +Node: Values275566 +Node: Constants276242 +Node: Scalar Constants276922 +Ref: Scalar Constants-Footnote-1277781 +Node: Nondecimal-numbers277963 +Node: Regexp Constants281022 +Node: Using Constant Regexps281497 +Node: Variables284502 +Node: Using Variables285157 +Node: Assignment Options286884 +Node: Conversion288765 +Ref: table-locale-affects294139 +Ref: Conversion-Footnote-1294763 +Node: All Operators294872 +Node: Arithmetic Ops295502 +Node: Concatenation298008 +Ref: Concatenation-Footnote-1300801 +Node: Assignment Ops300920 +Ref: table-assign-ops305908 +Node: Increment Ops307316 +Node: Truth Values and Conditions310794 +Node: Truth Values311877 +Node: Typing and Comparison312925 +Node: Variable Typing313714 +Ref: Variable Typing-Footnote-1317611 +Node: Comparison Operators317733 +Ref: table-relational-ops318143 +Node: POSIX String Comparison321692 +Ref: POSIX String Comparison-Footnote-1322649 +Node: Boolean Ops322787 +Ref: Boolean Ops-Footnote-1326865 +Node: Conditional Exp326956 +Node: Function Calls328688 +Node: Precedence332278 +Node: Patterns and Actions335931 +Node: Pattern Overview336985 +Node: Regexp Patterns338651 +Node: Expression Patterns339194 +Node: Ranges342768 +Node: BEGIN/END345734 +Node: Using BEGIN/END346484 +Ref: Using BEGIN/END-Footnote-1349215 +Node: I/O And BEGIN/END349329 +Node: Empty351598 +Node: BEGINFILE/ENDFILE351932 +Node: Using Shell Variables354757 +Node: Action Overview357036 +Node: Statements359393 +Node: If Statement361252 +Node: While Statement362751 +Node: Do Statement364795 +Node: For Statement365951 +Node: Switch Statement369103 +Node: Break Statement371200 +Node: Continue Statement373176 +Node: Next Statement374877 +Node: Nextfile Statement377259 +Node: Exit Statement379784 +Node: Built-in Variables382115 +Node: User-modified383210 +Ref: User-modified-Footnote-1391211 +Node: Auto-set391273 +Ref: Auto-set-Footnote-1400537 +Node: ARGC and ARGV400742 +Node: Arrays404501 +Node: Array Basics406072 +Node: Array Intro406783 +Node: Reference to Elements411101 +Node: Assigning Elements413371 +Node: Array Example413862 +Node: Scanning an Array415594 +Node: Delete418822 +Ref: Delete-Footnote-1421253 +Node: Numeric Array Subscripts421310 +Node: Uninitialized Subscripts423493 +Node: Multi-dimensional425121 +Node: Multi-scanning428212 +Node: Array Sorting429796 +Ref: Array Sorting-Footnote-1432994 +Node: Arrays of Arrays433188 +Node: Functions437726 +Node: Built-in438548 +Node: Calling Built-in439626 +Node: Numeric Functions441602 +Ref: Numeric Functions-Footnote-1445359 +Ref: Numeric Functions-Footnote-2445695 +Ref: Numeric Functions-Footnote-3445743 +Node: String Functions446012 +Ref: String Functions-Footnote-1467818 +Ref: String Functions-Footnote-2467947 +Ref: String Functions-Footnote-3468195 +Node: Gory Details468282 +Ref: table-sub-escapes469939 +Ref: table-posix-sub471253 +Ref: table-gensub-escapes472153 +Node: I/O Functions473324 +Ref: I/O Functions-Footnote-1480019 +Node: Time Functions480166 +Ref: Time Functions-Footnote-1491033 +Ref: Time Functions-Footnote-2491101 +Ref: Time Functions-Footnote-3491259 +Ref: Time Functions-Footnote-4491370 +Ref: Time Functions-Footnote-5491482 +Ref: Time Functions-Footnote-6491709 +Node: Bitwise Functions491975 +Ref: table-bitwise-ops492533 +Ref: Bitwise Functions-Footnote-1496693 +Node: Type Functions496877 +Node: I18N Functions497315 +Node: User-defined498942 +Node: Definition Syntax499746 +Ref: Definition Syntax-Footnote-1504383 +Node: Function Example504452 +Node: Function Caveats507046 +Node: Calling A Function507467 +Node: Variable Scope508582 +Node: Pass By Value/Reference510510 +Node: Return Statement513950 +Node: Dynamic Typing516892 +Node: Indirect Calls517629 +Node: Internationalization527314 +Node: I18N and L10N528742 +Node: Explaining gettext529428 +Ref: Explaining gettext-Footnote-1534490 +Ref: Explaining gettext-Footnote-2534673 +Node: Programmer i18n534838 +Node: Translator i18n539129 +Node: String Extraction539922 +Ref: String Extraction-Footnote-1540883 +Node: Printf Ordering540969 +Ref: Printf Ordering-Footnote-1543753 +Node: I18N Portability543817 +Ref: I18N Portability-Footnote-1546266 +Node: I18N Example546329 +Ref: I18N Example-Footnote-1548964 +Node: Gawk I18N549036 +Node: Advanced Features549653 +Node: Nondecimal Data550972 +Node: Two-way I/O552553 +Ref: Two-way I/O-Footnote-1557967 +Node: TCP/IP Networking558044 +Node: Profiling560887 +Node: Library Functions568287 +Ref: Library Functions-Footnote-1571326 +Node: Library Names571497 +Ref: Library Names-Footnote-1574968 +Ref: Library Names-Footnote-2575188 +Node: General Functions575274 +Node: Nextfile Function576337 +Node: Strtonum Function580718 +Node: Assert Function583669 +Node: Round Function586995 +Node: Cliff Random Function588536 +Node: Ordinal Functions589552 +Ref: Ordinal Functions-Footnote-1592622 +Ref: Ordinal Functions-Footnote-2592874 +Node: Join Function593090 +Ref: Join Function-Footnote-1594861 +Node: Gettimeofday Function595061 +Node: Data File Management598776 +Node: Filetrans Function599408 +Node: Rewind Function603645 +Node: File Checking605098 +Node: Empty Files606192 +Node: Ignoring Assigns608422 +Node: Getopt Function609975 +Ref: Getopt Function-Footnote-1621300 +Node: Passwd Functions621503 +Ref: Passwd Functions-Footnote-1630491 +Node: Group Functions630579 +Node: Walking Arrays638682 +Node: Sample Programs640248 +Node: Running Examples640913 +Node: Clones641641 +Node: Cut Program642764 +Node: Egrep Program652605 +Ref: Egrep Program-Footnote-1660376 +Node: Id Program660486 +Node: Split Program664102 +Ref: Split Program-Footnote-1667621 +Node: Tee Program667749 +Node: Uniq Program670552 +Node: Wc Program677975 +Ref: Wc Program-Footnote-1682239 +Node: Miscellaneous Programs682439 +Node: Dupword Program683627 +Node: Alarm Program685658 +Node: Translate Program690380 +Ref: Translate Program-Footnote-1694759 +Ref: Translate Program-Footnote-2694987 +Node: Labels Program695121 +Ref: Labels Program-Footnote-1698492 +Node: Word Sorting698576 +Node: History Sorting702921 +Node: Extract Program704759 +Ref: Extract Program-Footnote-1712240 +Node: Simple Sed712368 +Node: Igawk Program715430 +Ref: Igawk Program-Footnote-1730462 +Ref: Igawk Program-Footnote-2730663 +Node: Anagram Program730801 +Node: Signature Program733899 +Node: Debugger734981 +Node: Debugging735892 +Node: Debugging Concepts736206 +Node: Debugging Terms738062 +Node: Awk Debugging740607 +Node: Sample dgawk session741499 +Node: dgawk invocation741991 +Node: Finding The Bug743173 +Node: List of Debugger Commands749657 +Node: Breakpoint Control750968 +Node: Dgawk Execution Control754444 +Node: Viewing And Changing Data757795 +Node: Dgawk Stack761104 +Node: Dgawk Info762564 +Node: Miscellaneous Dgawk Commands766512 +Node: Readline Support771940 +Node: Dgawk Limitations772767 +Node: Language History774906 +Node: V7/SVR3.1776338 +Node: SVR4778633 +Node: POSIX780075 +Node: BTL781073 +Node: POSIX/GNU781807 +Node: Common Extensions786993 +Node: Contributors788126 +Node: Installation792161 +Node: Gawk Distribution793055 +Node: Getting793539 +Node: Extracting794365 +Node: Distribution contents796043 +Node: Unix Installation801061 +Node: Quick Installation801678 +Node: Additional Configuration Options803640 +Node: Configuration Philosophy805117 +Node: Non-Unix Installation807459 +Node: PC Installation807917 +Node: PC Binary Installation809216 +Node: PC Compiling811064 +Node: PC Testing814210 +Node: PC Using815386 +Node: Cygwin819571 +Node: MSYS820568 +Node: VMS Installation821082 +Node: VMS Compilation821686 +Node: VMS Installation Details823263 +Node: VMS Running824893 +Node: VMS POSIX826490 +Node: VMS Old Gawk827788 +Node: Bugs828260 +Node: Other Versions832125 +Node: Notes837404 +Node: Compatibility Mode838096 +Node: Additions838879 +Node: Accessing The Source839691 +Node: Adding Code841114 +Node: New Ports846662 +Node: Dynamic Extensions850775 +Node: Internals852151 +Node: Plugin License861267 +Node: Sample Library861901 +Node: Internal File Description862587 +Node: Internal File Ops866294 +Ref: Internal File Ops-Footnote-1871062 +Node: Using Internal File Ops871210 +Node: Future Extensions873587 +Node: Basic Concepts876091 +Node: Basic High Level876848 +Ref: Basic High Level-Footnote-1880883 +Node: Basic Data Typing881068 +Node: Floating Point Issues885593 +Node: String Conversion Precision886676 +Ref: String Conversion Precision-Footnote-1888370 +Node: Unexpected Results888479 +Node: POSIX Floating Point Problems890305 +Ref: POSIX Floating Point Problems-Footnote-1894001 +Node: Glossary894039 +Node: Copying918138 +Node: GNU Free Documentation License955695 +Node: next-edition980839 +Node: unresolved981191 +Node: revision981691 +Node: consistency982114 +Node: Index985613 End Tag Table diff --git a/doc/gawk.texi b/doc/gawk.texi index 08fc7faf..78c4f198 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -20,7 +20,7 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH December, 2010 +@set UPDATE-MONTH January, 2011 @set VERSION 4.0 @set PATCHLEVEL 0 @@ -607,6 +607,7 @@ particular records in a file and perform operations upon them. * Simple Sed:: A Simple Stream Editor. * Igawk Program:: A wrapper for @command{awk} that includes files. +* Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. * Debugging:: Introduction to @command{dgawk}. @@ -12716,6 +12717,7 @@ If this element exists in @code{PROCINFO}, @emph{no matter what its value}, then @command{gawk} will cause @samp{for(i in arr) @dots{}} loops to traverse the array indices in sorted order. +@xref{Scanning an Array}, for more information. @item PROCINFO["strftime"] The default time format string for @code{strftime()}. @@ -13379,6 +13381,39 @@ the loop body; it is not predictable whether the @code{for} loop will reach them. Similarly, changing @var{var} inside the loop may produce strange results. It is best to avoid such things. +As an extension, @command{gawk} makes it possible for you to +loop over the elements of an array in order, sorted by index. +Sorting is based on string comparison (since all array indices are +strings), and you cannot control the style of sorting; it is always +from lowest to highest (@code{"A"} before @code{"B"}). +To enable this feature, create the array element +@code{PROCINFO["sorted_in"]} (@pxref{Auto-set}). +The value of this element does not +matter; @command{gawk} only tests if the element with this index +exists in @code{PROCINFO} or not. +This extension is disabled in POSIX mode, since the @code{PROCINFO} +array is not special in that case. For example: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{ a[4] = 4} +> @kbd{ a[3] = 3} +> @kbd{ for (i in a)} +> @kbd{ print i, a[i]} +> @kbd{@}'} +@print{} 4 4 +@print{} 3 3 +$ @kbd{gawk 'BEGIN @{} +> @kbd{ PROCINFO["sorted_in"]++} +> @kbd{ a[4] = 4} +> @kbd{ a[3] = 3} +> @kbd{ for (i in a)} +> @kbd{ print i, a[i]} +> @kbd{@}'} +@print{} 3 3 +@print{} 4 4 +@end example + @node Delete @section The @code{delete} Statement @cindex @code{delete} statement @@ -13960,6 +13995,19 @@ for (i in array) print array[i][j] @end example +The @code{isarray()} function (@pxref{Type Functions}) +lets you test if an array element is itself an array: + +@example +for (i in array) @{ + if (isarray(array[i]) @{ + for (j in array[i]) @{ + print array[i][j] + @} + @} +@} +@end example + If the structure of a jagged array of arrays is known in advance, you can often devise workarounds using control statements. For example, the following code prints the elements of our main array @code{a}: @@ -13976,6 +14024,10 @@ for (i in a) @{ @} @end example +@noindent +@xref{Walking Arrays}, for a user-defined function that will ``walk'' an +arbitrarily-dimensioned array of arrays. + Recall that a reference to an uninitialized array element yields a value of @code{""}, the null string. This has one important implication when you intend to use a subarray as an argument to a function, as illustrated by @@ -18807,6 +18859,7 @@ comparisons use only lowercase letters. arguments. * Passwd Functions:: Functions for getting user information. * Group Functions:: Functions for getting group information. +* Walking Arrays:: A function to walk arrays of arrays. @end menu @node Library Names @@ -21191,6 +21244,68 @@ simple, relying on @command{awk}'s associative arrays to do work. The @command{id} program in @ref{Id Program}, uses these functions. + +@node Walking Arrays +@section Traversing Arrays of Arrays + +@ref{Arrays of Arrays}, described how @command{gawk} +provides arrays of arrays. In particular, any element of +an array may be either a scalar, or another array. The +@code{isarray()} function (@pxref{Type Functions}) +lets you distinguish an array +from a scalar. +The following function, @code{walk_array}, recursively traverses +an array, printing each element's indices and value. +You call it with the array and a string representing the name +of the array: + +@cindex @code{walk_array()} user-defined function +@example +@c file eg/lib/walkarray.awk +function walk_array(arr, name, i) +@{ + for (i in arr) @{ + if (isarray(arr[i])) + walk_array(arr[i], (name "[" i "]")) + else + printf("%s[%s] = %s\n", name, i, arr[i]) + @} +@} +@c endfile +@end example + +@noindent +It works by looping over each element of the array. If any given +element is itself an array, the function calls itself recursively, +passing the subarray and a new string representing the current index. +Otherwise, the function simply prints the element's name, index, and value. +Here is a main program to demonstrate + +@example +BEGIN @{ + a[1] = 1 + a[2][1] = 21 + a[2][2] = 22 + a[3] = 3 + a[4][1][1] = 411 + a[4][2] = 42 + + walk_array(a, "a") +@} +@end example + +When run, the program produces the following output: + +@example +$ @kbd{gawk -f walk_array.awk} +@print{} a[4][1][1] = 411 +@print{} a[4][2] = 42 +@print{} a[1] = 1 +@print{} a[2][1] = 21 +@print{} a[2][2] = 22 +@print{} a[3] = 3 +@end example + @c ENDOFRANGE libfgdata @c ENDOFRANGE flibgdata @c ENDOFRANGE gdatar @@ -22806,6 +22921,7 @@ We hope you find them both interesting and enjoyable. * Simple Sed:: A Simple Stream Editor. * Igawk Program:: A wrapper for @command{awk} that includes files. +* Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. @end menu @@ -24415,6 +24531,136 @@ statements for the desired library functions. @c ENDOFRANGE flibex @c ENDOFRANGE awkpex +@node Anagram Program +@subsection Finding Anagrams From A Dictionary + +An interesting programming challenge is to +read a word list (such as +@file{/usr/share/dict/words} on many GNU/Linux systems) +and find words that are @dfn{anagrams} of each other. +One word is an anagram of another if both words contain +the same letters +(for example, ``babbling'' and ``blabbing''). + +An elegant algorithm is presented in Column 2, Problem C of +Jon Bentley's @cite{Programming Pearls}, second edition. +The idea is to give words that are anagrams a common signature, +sort all the words together by their signature, and then print them. +Dr.@: Bentley observes that taking the letters in each word and +sorting them produces that common signature. + +The following program uses arrays of arrays to bring together +words with the same signature and array sorting to print the words +in sorted order. + +@cindex @code{anagram.awk} program +@example +@c file eg/prog/anagram.awk +# anagram.awk --- An implementation of the anagram finding algorithm +# from Jon Bentley's "Programming Pearls", 2nd edition. +# Addison Wesley, 2000, ISBN 0-201-65788-0. +# Column 2, Problem C, section 2.8, pp 18-20. +@c endfile +@ignore +@c file eg/prog/anagram.awk +# +# This program requires gawk 4.0 or newer. +# Required gawk-specific features: +# - True multidimensional arrays +# - split() with "" as separater splits out individual characters +# - asort() and asorti() functions +# +# See http://savannah.gnu.org/projects/gawk. +# +# Arnold Robbins +# arnold@@skeeve.com +# Public Domain +# January, 2011 +@c endfile +@end ignore +@c file eg/prog/anagram.awk + +/'s$/ @{ next @} # Skip possessives +@c endfile +@end example + +The program starts with a header, and then a rule to skip +possessives in the dictionary file. The next rule builds +up the data structure. The first dimension of the array +is indexed by the signature; the second dimension is the word +itself: + +@example +@c file eg/prog/anagram.awk +@{ + key = word2key($1) # Build signature + data[key][$1] = $1 # Store word with signature +@} +@c endfile +@end example + +The @code{word2key()} function creates the signature. +It splits the word apart into individual letters, +sorts the letters, and then joins them back together: + +@example +@c file eg/prog/anagram.awk +# word2key --- split word apart into letters, sort, joining back together + +function word2key(word, a, i, n, result) +@{ + n = split(word, a, "") + asort(a) + + for (i = 1; i <= n; i++) + result = result a[i] + + return result +@} +@c endfile +@end example + +Finally, the @code{END} rule traverses the array +and prints out the anagram lists. It sends the output +to the system @command{sort} command, since otherwise +the anagrams would appear in arbitrary order: + +@example +@c file eg/prog/anagram.awk +END @{ + sort = "sort" + for (key in data) @{ + # Sort words with same key + nwords = asorti(data[key], words) + if (nwords == 1) + continue + + # And print. Minor glitch: trailing space at end of each line + for (j = 1; j <= nwords; j++) + printf("%s ", words[j]) | sort + print "" | sort + @} + close(sort) +@} +@c endfile +@end example + +Here is some partial output when the program is run: + +@example +$ @kbd{gawk -f anagram.awk /usr/share/dict/words | grep '^b'} +@dots{} +babbled blabbed +babbler blabber brabble +babblers blabbers brabbles +babbling blabbing +babbly blabby +babel bable +babels beslab +babery yabber +@dots{} +@end example + @node Signature Program @subsection And Now For Something Completely Different |