4 files changed, 574 insertions, 336 deletions
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 0fbba4ee..a6ad9a76 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -2,6 +2,11 @@
 
 	* gawktexi.in: Minor edits.
 
+	Unrelated:
+
+	* gawktexi.in (Wc Program): Update to POSIX, support both bytes
+	and characters via the gawkextlib mbs extension.
+
 2020-10-01         Arnold D. Robbins     <arnold@skeeve.com>
 
 	* gawktexi.in (Split Program): Rewrite split to be POSIX
diff --git a/doc/gawk.info b/doc/gawk.info
index cf00ed11..320e0657 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -19011,10 +19011,76 @@ File: gawk.info,  Node: Wc Program,  Prev: Uniq Program,  Up: Clones
 11.2.7 Counting Things
 ----------------------
 
-The 'wc' (word count) utility counts lines, words, and characters in one
-or more input files.  Its usage is as follows:
+The 'wc' (word count) utility counts lines, words, characters and bytes
+in one or more input files.
 
-     'wc' ['-lwc'] [FILES ...]
+* Menu:
+
+* Bytes vs. Characters::        Modern character sets.
+* Using extensions::            A brief intro to extensions.
+* wc program::                  Code for 'wc.awk'.
+
+
+File: gawk.info,  Node: Bytes vs. Characters,  Next: Using extensions,  Up: Wc Program
+
+11.2.7.1 Modern Character Sets
+..............................
+
+In the early days of computing, single bytes were used for storing
+characters.  The most common character sets were ASCII and EBCDIC, which
+each provided all the English upper- and lowercase letters, the 10
+Hindu-Arabic numerals from 0 through 9, and a number of other standard
+punctuation and control characters.
+
+   Today, the most popular character set in use is Unicode (of which
+ASCII is a pure subset).  Unicode provides tens of thousands of unique
+characters (called "code points") to cover most existing human languages
+(living and dead) and a number of nonhuman ones as well (such as Klingon
+and J.R.R. Tolkien's elvish languages).
+
+   To save space in files, Unicode code points are "encoded", where each
+character takes from one to four bytes in the file.  UTF-8 is possibly
+the most popular of such "multibyte encodings".
+
+   The POSIX standard requires that 'awk' function in terms of
+characters, not bytes.  Thus in 'gawk', 'length()', 'substr()',
+'split()', 'match()' and the other string functions (*note String
+Functions::) all work in terms of characters in the local character set,
+and not in terms of bytes.  (Not all 'awk' implementations do so,
+though.)
+
+   There is no standard, built-in way to distinguish characters from
+bytes in an 'awk' program.  For an 'awk' implementation of 'wc', which
+needs to make such a distinction, we will have to use an external
+extension.
+
+
+File: gawk.info,  Node: Using extensions,  Next: wc program,  Prev: Bytes vs. Characters,  Up: Wc Program
+
+11.2.7.2 A Brief Introduction To Extensions
+...........................................
+
+Loadable extensions are presented in full detail in *note Dynamic
+Extensions::.  They provide a way to add functions to 'gawk' which can
+call out to other facilities written in C or C++.
+
+   For the purposes of 'wc.awk', it's enough to know that the extension
+is loaded with the '@load' directive, and the additional function we
+will use is called 'mbs_length()'.  This function returns the number of
+bytes in a string, and not the number of characters.
+
+   The '"mbs"' extension comes from the 'gawkextlib' project.  *Note
+gawkextlib:: for more information.
+
+
+File: gawk.info,  Node: wc program,  Prev: Using extensions,  Up: Wc Program
+
+11.2.7.3 Code for 'wc.awk'
+..........................
+
+The usage for 'wc' is as follows:
+
+     'wc' ['-lwcm'] [FILES ...]
 
    If no files are specified on the command line, 'wc' reads its
 standard input.  If there are multiple files, it also prints total
@@ -19031,21 +19097,27 @@ follows:
      data.
 
 '-c'
+     Count only bytes.  Once upon a time, the 'c' in this option stood
+     for "characters."  But, as explained earlier, bytes and characters
+     are no longer synonymous with each other.
+
+'-m'
      Count only characters.
 
    Implementing 'wc' in 'awk' is particularly elegant, because 'awk'
 does a lot of the work for us; it splits lines into words (i.e., fields)
 and counts them, it counts lines (i.e., records), and it can easily tell
-us how long a line is.
+us how long a line is in characters.
 
    This program uses the 'getopt()' library function (*note Getopt
 Function::) and the file-transition functions (*note Filetrans
 Function::).
 
-   This version has one notable difference from traditional versions of
-'wc': it always prints the counts in the order lines, words, and
-characters.  Traditional versions note the order of the '-l', '-w', and
-'-c' options on the command line, and print the counts in that order.
+   This version has one notable difference from older versions of 'wc':
+it always prints the counts in the order lines, words, characters and
+bytes.  Older versions note the order of the '-l', '-w', and '-c'
+options on the command line, and print the counts in that order.  POSIX
+does not mandate this behavior, though.
 
    The 'BEGIN' rule does the argument processing.  The variable
 'print_total' is true if more than one file is named on the command
@@ -19056,40 +19128,46 @@ line:
      # Options:
      #    -l    only count lines
      #    -w    only count words
-     #    -c    only count characters
+     #    -c    only count bytes
+     #    -m    only count characters
      #
-     # Default is to count lines, words, characters
+     # Default is to count lines, words, bytes
      #
      # Requires getopt() and file transition library functions
+     # Requires mbs extension from gawkextlib
+
+     @load "mbs"
 
      BEGIN {
          # let getopt() print a message about
          # invalid options. we ignore them
-         while ((c = getopt(ARGC, ARGV, "lwc")) != -1) {
+         while ((c = getopt(ARGC, ARGV, "lwcm")) != -1) {
              if (c == "l")
                  do_lines = 1
              else if (c == "w")
                  do_words = 1
              else if (c == "c")
+                 do_bytes = 1
+             else if (c == "m")
                  do_chars = 1
          }
          for (i = 1; i < Optind; i++)
              ARGV[i] = ""
 
-         # if no options, do all
-         if (! do_lines && ! do_words && ! do_chars)
-             do_lines = do_words = do_chars = 1
+         # if no options, do lines, words, bytes
+         if (! do_lines && ! do_words && ! do_chars && ! do_bytes)
+             do_lines = do_words = do_bytes = 1
 
          print_total = (ARGC - i > 1)
      }
 
    The 'beginfile()' function is simple; it just resets the counts of
-lines, words, and characters to zero, and saves the current file name in
-'fname':
+lines, words, characters and bytes to zero, and saves the current file
+name in 'fname':
 
      function beginfile(file)
      {
-         lines = words = chars = 0
+         lines = words = chars = bytes = 0
          fname = FILENAME
      }
 
@@ -19103,26 +19181,31 @@ those numbers for the file that was just read.  It relies on
          tlines += lines
          twords += words
          tchars += chars
+         tbytes += bytes
          if (do_lines)
              printf "\t%d", lines
          if (do_words)
              printf "\t%d", words
          if (do_chars)
              printf "\t%d", chars
+         if (do_bytes)
+             printf "\t%d", bytes
          printf "\t%s\n", fname
      }
 
    There is one rule that is executed for each line.  It adds the length
-of the record, plus one, to 'chars'.(1)  Adding one plus the record
-length is needed because the newline character separating records (the
-value of 'RS') is not part of the record itself, and thus not included
-in its length.  Next, 'lines' is incremented for each line read, and
+of the record, plus one, to 'chars'.  Adding one plus the record length
+is needed because the newline character separating records (the value of
+'RS') is not part of the record itself, and thus not included in its
+length.  Similarly, it adds the length of the record in bytes, plus one,
+to 'bytes'.  Next, 'lines' is incremented for each line read, and
 'words' is incremented by the value of 'NF', which is the number of
 "words" on this line:
 
      # do per line
      {
          chars += length($0) + 1    # get newline
+         bytes += mbs_length($0) + 1
          lines++
          words += NF
      }
@@ -19137,15 +19220,12 @@ in its length.  Next, 'lines' is incremented for each line read, and
                  printf "\t%d", twords
              if (do_chars)
                  printf "\t%d", tchars
+             if (do_bytes)
+                 printf "\t%d", tbytes
              print "\ttotal"
          }
      }
 
-   ---------- Footnotes ----------
-
-   (1) Because 'gawk' understands multibyte locales, this code counts
-characters, not bytes.
-
 
 File: gawk.info,  Node: Miscellaneous Programs,  Next: Programs Summary,  Prev: Clones,  Up: Sample Programs
 
@@ -35111,6 +35191,7 @@ Index
 * built-in functions:                    Functions.           (line   6)
 * built-in functions, evaluation order:  Calling Built-in.    (line  30)
 * BusyBox Awk:                           Other Versions.      (line  88)
+* bytes, counting:                       Wc Program.          (line   6)
 * C library functions, assert():         Assert Function.     (line   6)
 * C library functions, getopt():         Getopt Function.     (line  15)
 * C library functions, getpwent():       Passwd Functions.    (line  16)
@@ -35303,7 +35384,7 @@ Index
 * coprocesses <1>:                       Two-way I/O.         (line  27)
 * cos:                                   Numeric Functions.   (line  16)
 * cosine:                                Numeric Functions.   (line  16)
-* counting words, lines, and characters: Wc Program.          (line   6)
+* counting words, lines, characters, and bytes: Wc Program.   (line   6)
 * csh utility:                           Statements/Lines.    (line  45)
 * csh utility, POSIXLY_CORRECT environment variable: Options. (line 405)
 * csh utility, |& operator, comparison with: Two-way I/O.     (line  27)
@@ -37765,7 +37846,7 @@ Index
 * watchpoint (debugger):                 Debugging Terms.     (line  42)
 * watchpoints, show in debugger:         Debugger Info.       (line  51)
 * wc utility:                            Wc Program.          (line   6)
-* wc.awk program:                        Wc Program.          (line  46)
+* wc.awk program:                        wc program.          (line  51)
 * Weinberger, Peter:                     History.             (line  17)
 * Weinberger, Peter <1>:                 Contributors.        (line  12)
 * where debugger command (alias for backtrace): Execution Stack.
@@ -38140,266 +38221,268 @@ Ref: Split Program-Footnote-1766907
 Node: Tee Program767080
 Node: Uniq Program769870
 Node: Wc Program777434
-Ref: Wc Program-Footnote-1781689
-Node: Miscellaneous Programs781783
-Node: Dupword Program782996
-Node: Alarm Program785026
-Node: Translate Program789881
-Ref: Translate Program-Footnote-1794446
-Node: Labels Program794716
-Ref: Labels Program-Footnote-1798067
-Node: Word Sorting798151
-Node: History Sorting802223
-Node: Extract Program804448
-Node: Simple Sed812502
-Node: Igawk Program815576
-Ref: Igawk Program-Footnote-1829907
-Ref: Igawk Program-Footnote-2830109
-Ref: Igawk Program-Footnote-3830231
-Node: Anagram Program830346
-Node: Signature Program833408
-Node: Programs Summary834655
-Node: Programs Exercises835869
-Ref: Programs Exercises-Footnote-1839999
-Node: Advanced Features840085
-Node: Nondecimal Data842075
-Node: Array Sorting843666
-Node: Controlling Array Traversal844366
-Ref: Controlling Array Traversal-Footnote-1852734
-Node: Array Sorting Functions852852
-Ref: Array Sorting Functions-Footnote-1857943
-Node: Two-way I/O858139
-Ref: Two-way I/O-Footnote-1865860
-Ref: Two-way I/O-Footnote-2866047
-Node: TCP/IP Networking866129
-Node: Profiling869247
-Node: Advanced Features Summary878561
-Node: Internationalization880405
-Node: I18N and L10N881885
-Node: Explaining gettext882572
-Ref: Explaining gettext-Footnote-1888464
-Ref: Explaining gettext-Footnote-2888649
-Node: Programmer i18n888814
-Ref: Programmer i18n-Footnote-1893763
-Node: Translator i18n893812
-Node: String Extraction894606
-Ref: String Extraction-Footnote-1895738
-Node: Printf Ordering895824
-Ref: Printf Ordering-Footnote-1898610
-Node: I18N Portability898674
-Ref: I18N Portability-Footnote-1901130
-Node: I18N Example901193
-Ref: I18N Example-Footnote-1904468
-Ref: I18N Example-Footnote-2904541
-Node: Gawk I18N904650
-Node: I18N Summary905299
-Node: Debugger906640
-Node: Debugging907640
-Node: Debugging Concepts908081
-Node: Debugging Terms909890
-Node: Awk Debugging912465
-Ref: Awk Debugging-Footnote-1913410
-Node: Sample Debugging Session913542
-Node: Debugger Invocation914076
-Node: Finding The Bug915462
-Node: List of Debugger Commands921936
-Node: Breakpoint Control923269
-Node: Debugger Execution Control926963
-Node: Viewing And Changing Data930325
-Node: Execution Stack933866
-Node: Debugger Info935503
-Node: Miscellaneous Debugger Commands939574
-Node: Readline Support944636
-Node: Limitations945532
-Node: Debugging Summary948086
-Node: Namespaces949365
-Node: Global Namespace950476
-Node: Qualified Names951874
-Node: Default Namespace952873
-Node: Changing The Namespace953614
-Node: Naming Rules955228
-Node: Internal Name Management957076
-Node: Namespace Example958118
-Node: Namespace And Features960680
-Node: Namespace Summary962115
-Node: Arbitrary Precision Arithmetic963592
-Node: Computer Arithmetic965079
-Ref: table-numeric-ranges968845
-Ref: table-floating-point-ranges969338
-Ref: Computer Arithmetic-Footnote-1969996
-Node: Math Definitions970053
-Ref: table-ieee-formats973369
-Ref: Math Definitions-Footnote-1973972
-Node: MPFR features974077
-Node: FP Math Caution975795
-Ref: FP Math Caution-Footnote-1976867
-Node: Inexactness of computations977236
-Node: Inexact representation978196
-Node: Comparing FP Values979556
-Node: Errors accumulate980797
-Node: Getting Accuracy982230
-Node: Try To Round984940
-Node: Setting precision985839
-Ref: table-predefined-precision-strings986536
-Node: Setting the rounding mode988366
-Ref: table-gawk-rounding-modes988740
-Ref: Setting the rounding mode-Footnote-1992671
-Node: Arbitrary Precision Integers992850
-Ref: Arbitrary Precision Integers-Footnote-1996025
-Node: Checking for MPFR996174
-Node: POSIX Floating Point Problems997648
-Ref: POSIX Floating Point Problems-Footnote-11001933
-Node: Floating point summary1001971
-Node: Dynamic Extensions1004161
-Node: Extension Intro1005714
-Node: Plugin License1006980
-Node: Extension Mechanism Outline1007777
-Ref: figure-load-extension1008216
-Ref: figure-register-new-function1009781
-Ref: figure-call-new-function1010873
-Node: Extension API Description1012935
-Node: Extension API Functions Introduction1014648
-Ref: table-api-std-headers1016484
-Node: General Data Types1020733
-Ref: General Data Types-Footnote-11029363
-Node: Memory Allocation Functions1029662
-Ref: Memory Allocation Functions-Footnote-11034163
-Node: Constructor Functions1034262
-Node: API Ownership of MPFR and GMP Values1037728
-Node: Registration Functions1039041
-Node: Extension Functions1039741
-Node: Exit Callback Functions1045063
-Node: Extension Version String1046313
-Node: Input Parsers1046976
-Node: Output Wrappers1059697
-Node: Two-way processors1064209
-Node: Printing Messages1066474
-Ref: Printing Messages-Footnote-11067645
-Node: Updating ERRNO1067798
-Node: Requesting Values1068537
-Ref: table-value-types-returned1069274
-Node: Accessing Parameters1070210
-Node: Symbol Table Access1071447
-Node: Symbol table by name1071959
-Ref: Symbol table by name-Footnote-11074983
-Node: Symbol table by cookie1075111
-Ref: Symbol table by cookie-Footnote-11079296
-Node: Cached values1079360
-Ref: Cached values-Footnote-11082896
-Node: Array Manipulation1083049
-Ref: Array Manipulation-Footnote-11084140
-Node: Array Data Types1084177
-Ref: Array Data Types-Footnote-11086835
-Node: Array Functions1086927
-Node: Flattening Arrays1091425
-Node: Creating Arrays1098401
-Node: Redirection API1103168
-Node: Extension API Variables1106001
-Node: Extension Versioning1106712
-Ref: gawk-api-version1107141
-Node: Extension GMP/MPFR Versioning1108872
-Node: Extension API Informational Variables1110500
-Node: Extension API Boilerplate1111573
-Node: Changes from API V11115547
-Node: Finding Extensions1117119
-Node: Extension Example1117678
-Node: Internal File Description1118476
-Node: Internal File Ops1122556
-Ref: Internal File Ops-Footnote-11133906
-Node: Using Internal File Ops1134046
-Ref: Using Internal File Ops-Footnote-11136429
-Node: Extension Samples1136703
-Node: Extension Sample File Functions1138232
-Node: Extension Sample Fnmatch1145881
-Node: Extension Sample Fork1147368
-Node: Extension Sample Inplace1148586
-Node: Extension Sample Ord1152212
-Node: Extension Sample Readdir1153048
-Ref: table-readdir-file-types1153937
-Node: Extension Sample Revout1155004
-Node: Extension Sample Rev2way1155593
-Node: Extension Sample Read write array1156333
-Node: Extension Sample Readfile1158275
-Node: Extension Sample Time1159370
-Node: Extension Sample API Tests1161122
-Node: gawkextlib1161614
-Node: Extension summary1164532
-Node: Extension Exercises1168234
-Node: Language History1169476
-Node: V7/SVR3.11171132
-Node: SVR41173284
-Node: POSIX1174718
-Node: BTL1176099
-Node: POSIX/GNU1176828
-Node: Feature History1182606
-Node: Common Extensions1198925
-Node: Ranges and Locales1200208
-Ref: Ranges and Locales-Footnote-11204824
-Ref: Ranges and Locales-Footnote-21204851
-Ref: Ranges and Locales-Footnote-31205086
-Node: Contributors1205309
-Node: History summary1211306
-Node: Installation1212686
-Node: Gawk Distribution1213630
-Node: Getting1214114
-Node: Extracting1215077
-Node: Distribution contents1216715
-Node: Unix Installation1223195
-Node: Quick Installation1223877
-Node: Shell Startup Files1226291
-Node: Additional Configuration Options1227380
-Node: Configuration Philosophy1229695
-Node: Non-Unix Installation1232064
-Node: PC Installation1232524
-Node: PC Binary Installation1233362
-Node: PC Compiling1233797
-Node: PC Using1234914
-Node: Cygwin1238467
-Node: MSYS1239691
-Node: VMS Installation1240293
-Node: VMS Compilation1241084
-Ref: VMS Compilation-Footnote-11242313
-Node: VMS Dynamic Extensions1242371
-Node: VMS Installation Details1244056
-Node: VMS Running1246309
-Node: VMS GNV1250588
-Node: VMS Old Gawk1251323
-Node: Bugs1251794
-Node: Bug address1252457
-Node: Usenet1255439
-Node: Maintainers1256443
-Node: Other Versions1257628
-Node: Installation summary1264716
-Node: Notes1265925
-Node: Compatibility Mode1266719
-Node: Additions1267501
-Node: Accessing The Source1268426
-Node: Adding Code1269863
-Node: New Ports1276082
-Node: Derived Files1280457
-Ref: Derived Files-Footnote-11286117
-Ref: Derived Files-Footnote-21286152
-Ref: Derived Files-Footnote-31286750
-Node: Future Extensions1286864
-Node: Implementation Limitations1287522
-Node: Extension Design1288732
-Node: Old Extension Problems1289876
-Ref: Old Extension Problems-Footnote-11291394
-Node: Extension New Mechanism Goals1291451
-Ref: Extension New Mechanism Goals-Footnote-11294815
-Node: Extension Other Design Decisions1295004
-Node: Extension Future Growth1297117
-Node: Notes summary1297723
-Node: Basic Concepts1298881
-Node: Basic High Level1299562
-Ref: figure-general-flow1299844
-Ref: figure-process-flow1300529
-Ref: Basic High Level-Footnote-11303830
-Node: Basic Data Typing1304015
-Node: Glossary1307343
-Node: Copying1339228
-Node: GNU Free Documentation License1376771
-Node: Index1401891
+Node: Bytes vs. Characters777831
+Node: Using extensions779379
+Node: wc program780137
+Node: Miscellaneous Programs784995
+Node: Dupword Program786208
+Node: Alarm Program788238
+Node: Translate Program793093
+Ref: Translate Program-Footnote-1797658
+Node: Labels Program797928
+Ref: Labels Program-Footnote-1801279
+Node: Word Sorting801363
+Node: History Sorting805435
+Node: Extract Program807660
+Node: Simple Sed815714
+Node: Igawk Program818788
+Ref: Igawk Program-Footnote-1833119
+Ref: Igawk Program-Footnote-2833321
+Ref: Igawk Program-Footnote-3833443
+Node: Anagram Program833558
+Node: Signature Program836620
+Node: Programs Summary837867
+Node: Programs Exercises839081
+Ref: Programs Exercises-Footnote-1843211
+Node: Advanced Features843297
+Node: Nondecimal Data845287
+Node: Array Sorting846878
+Node: Controlling Array Traversal847578
+Ref: Controlling Array Traversal-Footnote-1855946
+Node: Array Sorting Functions856064
+Ref: Array Sorting Functions-Footnote-1861155
+Node: Two-way I/O861351
+Ref: Two-way I/O-Footnote-1869072
+Ref: Two-way I/O-Footnote-2869259
+Node: TCP/IP Networking869341
+Node: Profiling872459
+Node: Advanced Features Summary881773
+Node: Internationalization883617
+Node: I18N and L10N885097
+Node: Explaining gettext885784
+Ref: Explaining gettext-Footnote-1891676
+Ref: Explaining gettext-Footnote-2891861
+Node: Programmer i18n892026
+Ref: Programmer i18n-Footnote-1896975
+Node: Translator i18n897024
+Node: String Extraction897818
+Ref: String Extraction-Footnote-1898950
+Node: Printf Ordering899036
+Ref: Printf Ordering-Footnote-1901822
+Node: I18N Portability901886
+Ref: I18N Portability-Footnote-1904342
+Node: I18N Example904405
+Ref: I18N Example-Footnote-1907680
+Ref: I18N Example-Footnote-2907753
+Node: Gawk I18N907862
+Node: I18N Summary908511
+Node: Debugger909852
+Node: Debugging910852
+Node: Debugging Concepts911293
+Node: Debugging Terms913102
+Node: Awk Debugging915677
+Ref: Awk Debugging-Footnote-1916622
+Node: Sample Debugging Session916754
+Node: Debugger Invocation917288
+Node: Finding The Bug918674
+Node: List of Debugger Commands925148
+Node: Breakpoint Control926481
+Node: Debugger Execution Control930175
+Node: Viewing And Changing Data933537
+Node: Execution Stack937078
+Node: Debugger Info938715
+Node: Miscellaneous Debugger Commands942786
+Node: Readline Support947848
+Node: Limitations948744
+Node: Debugging Summary951298
+Node: Namespaces952577
+Node: Global Namespace953688
+Node: Qualified Names955086
+Node: Default Namespace956085
+Node: Changing The Namespace956826
+Node: Naming Rules958440
+Node: Internal Name Management960288
+Node: Namespace Example961330
+Node: Namespace And Features963892
+Node: Namespace Summary965327
+Node: Arbitrary Precision Arithmetic966804
+Node: Computer Arithmetic968291
+Ref: table-numeric-ranges972057
+Ref: table-floating-point-ranges972550
+Ref: Computer Arithmetic-Footnote-1973208
+Node: Math Definitions973265
+Ref: table-ieee-formats976581
+Ref: Math Definitions-Footnote-1977184
+Node: MPFR features977289
+Node: FP Math Caution979007
+Ref: FP Math Caution-Footnote-1980079
+Node: Inexactness of computations980448
+Node: Inexact representation981408
+Node: Comparing FP Values982768
+Node: Errors accumulate984009
+Node: Getting Accuracy985442
+Node: Try To Round988152
+Node: Setting precision989051
+Ref: table-predefined-precision-strings989748
+Node: Setting the rounding mode991578
+Ref: table-gawk-rounding-modes991952
+Ref: Setting the rounding mode-Footnote-1995883
+Node: Arbitrary Precision Integers996062
+Ref: Arbitrary Precision Integers-Footnote-1999237
+Node: Checking for MPFR999386
+Node: POSIX Floating Point Problems1000860
+Ref: POSIX Floating Point Problems-Footnote-11005145
+Node: Floating point summary1005183
+Node: Dynamic Extensions1007373
+Node: Extension Intro1008926
+Node: Plugin License1010192
+Node: Extension Mechanism Outline1010989
+Ref: figure-load-extension1011428
+Ref: figure-register-new-function1012993
+Ref: figure-call-new-function1014085
+Node: Extension API Description1016147
+Node: Extension API Functions Introduction1017860
+Ref: table-api-std-headers1019696
+Node: General Data Types1023945
+Ref: General Data Types-Footnote-11032575
+Node: Memory Allocation Functions1032874
+Ref: Memory Allocation Functions-Footnote-11037375
+Node: Constructor Functions1037474
+Node: API Ownership of MPFR and GMP Values1040940
+Node: Registration Functions1042253
+Node: Extension Functions1042953
+Node: Exit Callback Functions1048275
+Node: Extension Version String1049525
+Node: Input Parsers1050188
+Node: Output Wrappers1062909
+Node: Two-way processors1067421
+Node: Printing Messages1069686
+Ref: Printing Messages-Footnote-11070857
+Node: Updating ERRNO1071010
+Node: Requesting Values1071749
+Ref: table-value-types-returned1072486
+Node: Accessing Parameters1073422
+Node: Symbol Table Access1074659
+Node: Symbol table by name1075171
+Ref: Symbol table by name-Footnote-11078195
+Node: Symbol table by cookie1078323
+Ref: Symbol table by cookie-Footnote-11082508
+Node: Cached values1082572
+Ref: Cached values-Footnote-11086108
+Node: Array Manipulation1086261
+Ref: Array Manipulation-Footnote-11087352
+Node: Array Data Types1087389
+Ref: Array Data Types-Footnote-11090047
+Node: Array Functions1090139
+Node: Flattening Arrays1094637
+Node: Creating Arrays1101613
+Node: Redirection API1106380
+Node: Extension API Variables1109213
+Node: Extension Versioning1109924
+Ref: gawk-api-version1110353
+Node: Extension GMP/MPFR Versioning1112084
+Node: Extension API Informational Variables1113712
+Node: Extension API Boilerplate1114785
+Node: Changes from API V11118759
+Node: Finding Extensions1120331
+Node: Extension Example1120890
+Node: Internal File Description1121688
+Node: Internal File Ops1125768
+Ref: Internal File Ops-Footnote-11137118
+Node: Using Internal File Ops1137258
+Ref: Using Internal File Ops-Footnote-11139641
+Node: Extension Samples1139915
+Node: Extension Sample File Functions1141444
+Node: Extension Sample Fnmatch1149093
+Node: Extension Sample Fork1150580
+Node: Extension Sample Inplace1151798
+Node: Extension Sample Ord1155424
+Node: Extension Sample Readdir1156260
+Ref: table-readdir-file-types1157149
+Node: Extension Sample Revout1158216
+Node: Extension Sample Rev2way1158805
+Node: Extension Sample Read write array1159545
+Node: Extension Sample Readfile1161487
+Node: Extension Sample Time1162582
+Node: Extension Sample API Tests1164334
+Node: gawkextlib1164826
+Node: Extension summary1167744
+Node: Extension Exercises1171446
+Node: Language History1172688
+Node: V7/SVR3.11174344
+Node: SVR41176496
+Node: POSIX1177930
+Node: BTL1179311
+Node: POSIX/GNU1180040
+Node: Feature History1185818
+Node: Common Extensions1202137
+Node: Ranges and Locales1203420
+Ref: Ranges and Locales-Footnote-11208036
+Ref: Ranges and Locales-Footnote-21208063
+Ref: Ranges and Locales-Footnote-31208298
+Node: Contributors1208521
+Node: History summary1214518
+Node: Installation1215898
+Node: Gawk Distribution1216842
+Node: Getting1217326
+Node: Extracting1218289
+Node: Distribution contents1219927
+Node: Unix Installation1226407
+Node: Quick Installation1227089
+Node: Shell Startup Files1229503
+Node: Additional Configuration Options1230592
+Node: Configuration Philosophy1232907
+Node: Non-Unix Installation1235276
+Node: PC Installation1235736
+Node: PC Binary Installation1236574
+Node: PC Compiling1237009
+Node: PC Using1238126
+Node: Cygwin1241679
+Node: MSYS1242903
+Node: VMS Installation1243505
+Node: VMS Compilation1244296
+Ref: VMS Compilation-Footnote-11245525
+Node: VMS Dynamic Extensions1245583
+Node: VMS Installation Details1247268
+Node: VMS Running1249521
+Node: VMS GNV1253800
+Node: VMS Old Gawk1254535
+Node: Bugs1255006
+Node: Bug address1255669
+Node: Usenet1258651
+Node: Maintainers1259655
+Node: Other Versions1260840
+Node: Installation summary1267928
+Node: Notes1269137
+Node: Compatibility Mode1269931
+Node: Additions1270713
+Node: Accessing The Source1271638
+Node: Adding Code1273075
+Node: New Ports1279294
+Node: Derived Files1283669
+Ref: Derived Files-Footnote-11289329
+Ref: Derived Files-Footnote-21289364
+Ref: Derived Files-Footnote-31289962
+Node: Future Extensions1290076
+Node: Implementation Limitations1290734
+Node: Extension Design1291944
+Node: Old Extension Problems1293088
+Ref: Old Extension Problems-Footnote-11294606
+Node: Extension New Mechanism Goals1294663
+Ref: Extension New Mechanism Goals-Footnote-11298027
+Node: Extension Other Design Decisions1298216
+Node: Extension Future Growth1300329
+Node: Notes summary1300935
+Node: Basic Concepts1302093
+Node: Basic High Level1302774
+Ref: figure-general-flow1303056
+Ref: figure-process-flow1303741
+Ref: Basic High Level-Footnote-11307042
+Node: Basic Data Typing1307227
+Node: Glossary1310555
+Node: Copying1342440
+Node: GNU Free Documentation License1379983
+Node: Index1405103
 
 End Tag Table
 
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 9446e696..91625d06 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -26761,19 +26761,76 @@ as fast.''  Consider how to rewrite the logic to follow this suggestion.
 @node Wc Program
 @subsection Counting Things
 
-@c FIXME: One day, update to current POSIX version of wc
-
-@cindex counting words, lines, and characters
+@cindex counting words, lines, characters, and bytes
 @cindex input files @subentry counting elements in
 @cindex words @subentry counting
 @cindex characters @subentry counting
 @cindex lines @subentry counting
+@cindex bytes @subentry counting
 @cindex @command{wc} utility
-The @command{wc} (word count) utility counts lines, words, and characters in
-one or more input files. Its usage is as follows:
+The @command{wc} (word count) utility counts lines, words, characters
+and bytes in one or more input files.
+
+@menu
+* Bytes vs. Characters::        Modern character sets.
+* Using extensions::            A brief intro to extensions.
+* @command{wc} program::                  Code for @file{wc.awk}.
+@end menu
+
+@node Bytes vs. Characters
+@subsubsection Modern Character Sets
+
+In the early days of computing, single bytes were used for storing
+characters.  The most common character sets were ASCII and EBCDIC,
+which each provided all the English upper- and lowercase letters, the 10
+Hindu-Arabic numerals from 0 through 9, and a number of other standard
+punctuation and control characters.
+
+Today, the most popular character set in use is Unicode (of which ASCII
+is a pure subset). Unicode provides tens of thousands of unique characters
+(called @dfn{code points}) to cover most existing human languages (living
+and dead) and a number of nonhuman ones as well (such as Klingon and
+J.R.R.@: Tolkien's elvish languages).
+
+To save space in files, Unicode code points are @dfn{encoded}, where each
+character takes from one to four bytes in the file.  UTF-8 is possibly
+the most popular of such @dfn{multibyte encodings}.
+
+The POSIX standard requires that @command{awk} function in terms
+of characters, not bytes.  Thus in @command{gawk}, @code{length()},
+@code{substr()}, @code{split()}, @code{match()} and the other string
+functions (@pxref{String Functions}) all work in terms of characters in
+the local character set, and not in terms of bytes. (Not all @command{awk}
+implementations do so, though.)
+
+There is no standard, built-in way to distinguish characters from bytes
+in an @command{awk} program.  For an @command{awk} implementation of
+@command{wc}, which needs to make such a distinction, we will have to
+use an external extension.
+
+@node Using extensions
+@subsubsection A Brief Introduction To Extensions
+
+Loadable extensions are presented in full detail in @ref{Dynamic Extensions}.
+They provide a way to add functions to @command{gawk} which can call
+out to other facilities written in C or C++.
+
+For the purposes of
+@file{wc.awk}, it's enough to know that the extension is loaded
+with the @code{@@load} directive, and the additional function we
+will use is called @code{mbs_length()}.  This function returns the
+number of bytes in a string, and not the number of characters.
+
+The @code{"mbs"} extension comes from the @code{gawkextlib}
+project.  @xref{gawkextlib}, for more information.
+
+@node @command{wc} program
+@subsubsection Code for @file{wc.awk}
+
+The usage for @command{wc} is as follows:
 
 @display
-@command{wc} [@option{-lwc}] [@var{files} @dots{}]
+@command{wc} [@option{-lwcm}] [@var{files} @dots{}]
 @end display
 
 If no files are specified on the command line, @command{wc} reads its standard
@@ -26791,24 +26848,30 @@ by spaces and/or TABs.  Luckily, this is the normal way @command{awk} separates
 fields in its input data.
 
 @item -c
+Count only bytes.
+Once upon a time, the @samp{c} in this option stood for ``characters.''
+But, as explained earlier, bytes and characters are no longer synonymous
+with each other.
+
+@item -m
 Count only characters.
 @end table
 
 Implementing @command{wc} in @command{awk} is particularly elegant,
 because @command{awk} does a lot of the work for us; it splits lines into
 words (i.e., fields) and counts them, it counts lines (i.e., records),
-and it can easily tell us how long a line is.
+and it can easily tell us how long a line is in characters.
 
 This program uses the @code{getopt()} library function
 (@pxref{Getopt Function})
 and the file-transition functions
 (@pxref{Filetrans Function}).
 
-This version has one notable difference from traditional versions of
+This version has one notable difference from older versions of
 @command{wc}: it always prints the counts in the order lines, words,
-and characters.  Traditional versions note the order of the @option{-l},
+characters, and bytes.  Older versions note the order of the @option{-l},
 @option{-w}, and @option{-c} options on the command line, and print the
-counts in that order.
+counts in that order.  POSIX does not mandate this behavior, though.
 
 The @code{BEGIN} rule does the argument processing.  The variable
 @code{print_total} is true if more than one file is named on the
@@ -26824,6 +26887,7 @@ command line:
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
 # May 1993
+# Revised September 2020
 @c endfile
 @end ignore
 @c file eg/prog/wc.awk
@@ -26831,29 +26895,35 @@ command line:
 # Options:
 #    -l    only count lines
 #    -w    only count words
-#    -c    only count characters
+#    -c    only count bytes
+#    -m    only count characters
 #
-# Default is to count lines, words, characters
+# Default is to count lines, words, bytes
 #
 # Requires getopt() and file transition library functions
+# Requires mbs extension from gawkextlib
+
+@@load "mbs"
 
 BEGIN @{
     # let getopt() print a message about
     # invalid options. we ignore them
-    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{
+    while ((c = getopt(ARGC, ARGV, "lwcm")) != -1) @{
         if (c == "l")
             do_lines = 1
         else if (c == "w")
             do_words = 1
         else if (c == "c")
+            do_bytes = 1
+        else if (c == "m")
             do_chars = 1
     @}
     for (i = 1; i < Optind; i++)
         ARGV[i] = ""
 
-    # if no options, do all
-    if (! do_lines && ! do_words && ! do_chars)
-        do_lines = do_words = do_chars = 1
+    # if no options, do lines, words, bytes
+    if (! do_lines && ! do_words && ! do_chars && ! do_bytes)
+        do_lines = do_words = do_bytes = 1
 
     print_total = (ARGC - i > 1)
 @}
@@ -26861,14 +26931,14 @@ BEGIN @{
 @end example
 
 The @code{beginfile()} function is simple; it just resets the counts of lines,
-words, and characters to zero, and saves the current @value{FN} in
+words, characters, and bytes to zero, and saves the current @value{FN} in
 @code{fname}:
 
 @example
 @c file eg/prog/wc.awk
 function beginfile(file)
 @{
-    lines = words = chars = 0
+    lines = words = chars = bytes = 0
     fname = FILENAME
 @}
 @c endfile
@@ -26886,6 +26956,7 @@ function endfile(file)
     tlines += lines
     twords += words
     tchars += chars
+    tbytes += bytes
     if (do_lines)
         printf "\t%d", lines
 @group
@@ -26894,26 +26965,28 @@ function endfile(file)
 @end group
     if (do_chars)
         printf "\t%d", chars
+    if (do_bytes)
+        printf "\t%d", bytes
     printf "\t%s\n", fname
 @}
 @c endfile
 @end example
 
 There is one rule that is executed for each line. It adds the length of
-the record, plus one, to @code{chars}.@footnote{Because @command{gawk}
-understands multibyte locales, this code counts characters, not bytes.}
-Adding one plus the record length
+the record, plus one, to @code{chars}.  Adding one plus the record length
 is needed because the newline character separating records (the value
 of @code{RS}) is not part of the record itself, and thus not included
-in its length.  Next, @code{lines} is incremented for each line read,
-and @code{words} is incremented by the value of @code{NF}, which is the
-number of ``words'' on this line:
+in its length.  Similarly, it adds the length of the record in bytes,
+plus one, to @code{bytes}.  Next, @code{lines} is incremented for each
+line read, and @code{words} is incremented by the value of @code{NF},
+which is the number of ``words'' on this line:
 
 @example
 @c file eg/prog/wc.awk
 # do per line
 @{
     chars += length($0) + 1    # get newline
+    bytes += mbs_length($0) + 1
     lines++
     words += NF
 @}
@@ -26932,6 +27005,8 @@ END @{
             printf "\t%d", twords
         if (do_chars)
             printf "\t%d", tchars
+        if (do_bytes)
+            printf "\t%d", tbytes
         print "\ttotal"
     @}
 @}
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index f96ff861..f982ae8b 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -25771,19 +25771,76 @@ as fast.''  Consider how to rewrite the logic to follow this suggestion.
 @node Wc Program
 @subsection Counting Things
 
-@c FIXME: One day, update to current POSIX version of wc
-
-@cindex counting words, lines, and characters
+@cindex counting words, lines, characters, and bytes
 @cindex input files @subentry counting elements in
 @cindex words @subentry counting
 @cindex characters @subentry counting
 @cindex lines @subentry counting
+@cindex bytes @subentry counting
 @cindex @command{wc} utility
-The @command{wc} (word count) utility counts lines, words, and characters in
-one or more input files. Its usage is as follows:
+The @command{wc} (word count) utility counts lines, words, characters,
+and bytes in one or more input files.
+
+@menu
+* Bytes vs. Characters::        Modern character sets.
+* Using extensions::            A brief intro to extensions.
+* @command{wc} program::                  Code for @file{wc.awk}.
+@end menu
+
+@node Bytes vs. Characters
+@subsubsection Modern Character Sets
+
+In the early days of computing, single bytes were used for storing
+characters.  The most common character sets were ASCII and EBCDIC,
+which each provided all the English upper- and lowercase letters, the 10
+Hindu-Arabic numerals from 0 through 9, and a number of other standard
+punctuation and control characters.
+
+Today, the most popular character set in use is Unicode (of which ASCII
+is a pure subset). Unicode provides tens of thousands of unique characters
+(called @dfn{code points}) to cover most existing human languages (living
+and dead) and a number of nonhuman ones as well (such as Klingon and
+J.R.R.@: Tolkien's elvish languages).
+
+To save space in files, Unicode code points are @dfn{encoded}, where each
+character takes from one to four bytes in the file.  UTF-8 is possibly
+the most popular of such @dfn{multibyte encodings}.
+
+The POSIX standard requires that @command{awk} function in terms
+of characters, not bytes.  Thus in @command{gawk}, @code{length()},
+@code{substr()}, @code{split()}, @code{match()}, and the other string
+functions (@pxref{String Functions}) all work in terms of characters in
+the local character set, and not in terms of bytes.  (Not all @command{awk}
+implementations do so, though.)
+
+There is no standard, built-in way to distinguish characters from bytes
+in an @command{awk} program.  For an @command{awk} implementation of
+@command{wc}, which needs to make such a distinction, we will have to
+use an external extension.
+
+@node Using extensions
+@subsubsection A Brief Introduction to Extensions
+
+Loadable extensions are presented in full detail in @ref{Dynamic Extensions}.
+They provide a way to add functions to @command{gawk} which can call
+out to other facilities written in C or C++.
+
+For the purposes of
+@file{wc.awk}, it's enough to know that the extension is loaded
+with the @code{@@load} directive, and the additional function we
+will use is called @code{mbs_length()}.  This function returns the
+number of bytes in a string, and not the number of characters.
+
+The @code{"mbs"} extension comes from the @code{gawkextlib}
+project.  @xref{gawkextlib}, for more information.
+
+@node @command{wc} program
+@subsubsection Code for @file{wc.awk}
+
+The usage for @command{wc} is as follows:
 
 @display
-@command{wc} [@option{-lwc}] [@var{files} @dots{}]
+@command{wc} [@option{-lwcm}] [@var{files} @dots{}]
 @end display
 
 If no files are specified on the command line, @command{wc} reads its standard
@@ -25801,24 +25858,30 @@ by spaces and/or TABs.  Luckily, this is the normal way @command{awk} separates
 fields in its input data.
 
 @item -c
+Count only bytes.
+Once upon a time, the @samp{c} in this option stood for ``characters.''
+But, as explained earlier, bytes and characters are no longer synonymous
+with each other.
+
+@item -m
 Count only characters.
 @end table
 
 Implementing @command{wc} in @command{awk} is particularly elegant,
 because @command{awk} does a lot of the work for us; it splits lines into
 words (i.e., fields) and counts them, it counts lines (i.e., records),
-and it can easily tell us how long a line is.
+and it can easily tell us how long a line is in characters.
 
 This program uses the @code{getopt()} library function
 (@pxref{Getopt Function})
 and the file-transition functions
 (@pxref{Filetrans Function}).
 
-This version has one notable difference from traditional versions of
+This version has one notable difference from older versions of
 @command{wc}: it always prints the counts in the order lines, words,
-and characters.  Traditional versions note the order of the @option{-l},
+characters, and bytes.  Older versions note the order of the @option{-l},
 @option{-w}, and @option{-c} options on the command line, and print the
-counts in that order.
+counts in that order.  POSIX does not mandate this behavior, though.
 
 The @code{BEGIN} rule does the argument processing.  The variable
 @code{print_total} is true if more than one file is named on the
@@ -25834,6 +25897,7 @@ command line:
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
 # May 1993
+# Revised September 2020
 @c endfile
 @end ignore
 @c file eg/prog/wc.awk
@@ -25841,29 +25905,35 @@ command line:
 # Options:
 #    -l    only count lines
 #    -w    only count words
-#    -c    only count characters
+#    -c    only count bytes
+#    -m    only count characters
 #
-# Default is to count lines, words, characters
+# Default is to count lines, words, bytes
 #
 # Requires getopt() and file transition library functions
+# Requires mbs extension from gawkextlib
+
+@@load "mbs"
 
 BEGIN @{
     # let getopt() print a message about
     # invalid options. we ignore them
-    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{
+    while ((c = getopt(ARGC, ARGV, "lwcm")) != -1) @{
         if (c == "l")
             do_lines = 1
         else if (c == "w")
             do_words = 1
         else if (c == "c")
+            do_bytes = 1
+        else if (c == "m")
             do_chars = 1
     @}
     for (i = 1; i < Optind; i++)
         ARGV[i] = ""
 
-    # if no options, do all
-    if (! do_lines && ! do_words && ! do_chars)
-        do_lines = do_words = do_chars = 1
+    # if no options, do lines, words, bytes
+    if (! do_lines && ! do_words && ! do_chars && ! do_bytes)
+        do_lines = do_words = do_bytes = 1
 
     print_total = (ARGC - i > 1)
 @}
@@ -25871,14 +25941,14 @@ BEGIN @{
 @end example
 
 The @code{beginfile()} function is simple; it just resets the counts of lines,
-words, and characters to zero, and saves the current @value{FN} in
+words, characters, and bytes to zero, and saves the current @value{FN} in
 @code{fname}:
 
 @example
 @c file eg/prog/wc.awk
 function beginfile(file)
 @{
-    lines = words = chars = 0
+    lines = words = chars = bytes = 0
     fname = FILENAME
 @}
 @c endfile
@@ -25896,6 +25966,7 @@ function endfile(file)
     tlines += lines
     twords += words
     tchars += chars
+    tbytes += bytes
     if (do_lines)
         printf "\t%d", lines
 @group
@@ -25904,26 +25975,28 @@ function endfile(file)
 @end group
     if (do_chars)
         printf "\t%d", chars
+    if (do_bytes)
+        printf "\t%d", bytes
     printf "\t%s\n", fname
 @}
 @c endfile
 @end example
 
 There is one rule that is executed for each line. It adds the length of
-the record, plus one, to @code{chars}.@footnote{Because @command{gawk}
-understands multibyte locales, this code counts characters, not bytes.}
-Adding one plus the record length
+the record, plus one, to @code{chars}.  Adding one plus the record length
 is needed because the newline character separating records (the value
 of @code{RS}) is not part of the record itself, and thus not included
-in its length.  Next, @code{lines} is incremented for each line read,
-and @code{words} is incremented by the value of @code{NF}, which is the
-number of ``words'' on this line:
+in its length.  Similarly, it adds the length of the record in bytes,
+plus one, to @code{bytes}.  Next, @code{lines} is incremented for each
+line read, and @code{words} is incremented by the value of @code{NF},
+which is the number of ``words'' on this line:
 
 @example
 @c file eg/prog/wc.awk
 # do per line
 @{
     chars += length($0) + 1    # get newline
+    bytes += mbs_length($0) + 1
     lines++
     words += NF
 @}
@@ -25942,6 +26015,8 @@ END @{
             printf "\t%d", twords
         if (do_chars)
             printf "\t%d", tchars
+        if (do_bytes)
+            printf "\t%d", tbytes
         print "\ttotal"
     @}
 @}