Diffstat (limited to 'doc/gawktexi.in')
-rw-r--r-- doc/gawktexi.in 34
1 files changed, 34 insertions, 0 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index 7b3c7ce8..e3b1bd2d 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -574,6 +574,7 @@ particular records in a file and perform operations upon them.
* Fields with fixed data:: Field values with fixed-width data.
* Splitting By Content:: Defining Fields By Content
* More CSV:: More on CSV files.
+* FS versus FPAT:: A subtle difference.
* Testing field creation:: Checking how @command{gawk} is
splitting records.
* Multiple Line:: Reading multiline records.
@@ -7913,6 +7914,7 @@ four, and @code{$4} has the value @code{"ddd"}.
@menu
* More CSV:: More on CSV files.
+* FS versus FPAT:: A subtle difference.
@end menu
@c O'Reilly doesn't like it as a note the first thing in the section.
@@ -8122,6 +8124,28 @@ a bed with a blanket that's not quite big enough. There's always a corner
that isn't covered. We recommend, instead, that you use Manuel Collado's
@uref{http://mcollado.z15.es/xgawk/, @code{CSVMODE} library for @command{gawk}}.
+@node FS versus FPAT
+@subsection @code{FS} Versus @code{FPAT}: A Subtle Difference
+
+As we discussed earlier, @code{FS} describes the data between fields (``what fields are not'')
+and @code{FPAT} describes the fields themselves (``what fields are'').
+This leads to a subtle difference in how fields are found when using regexps as the value
+for @code{FS} or @code{FPAT}.
+
+To tell one field from the next, there must be a non-empty separator between each pair
+of adjacent fields. This makes intuitive sense---otherwise one could not distinguish fields
+from separators.
+
+Thus, when splitting fields with @code{FS}, a regular expression match is not allowed to
+be the null string; each match must consume at least one character, so that the scan can
+proceed through the entire record.
+
+On the other hand, regular expression matching with @code{FPAT} can match the null
+string, and the non-matching intervening characters function as the separators.
+
+This same difference is reflected in how matching is done with the @code{split()}
+and @code{patsplit()} functions (@pxref{String Functions}).
+
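+@c Illustrative sketch only: the sample records and regexps below are
+@c assumptions chosen for demonstration.
+As a minimal sketch of this difference, assume the made-up records
+@samp{axxbc} and @samp{a,,c}, and in each case a regexp that is able to
+match the null string:
+
+@example
+$ @kbd{echo axxbc | gawk -v 'FS=x*' '@{ print NF @}'}
+@print{} 2
+$ @kbd{echo a,,c | gawk -v 'FPAT=[^,]*' '@{ print NF @}'}
+@print{} 3
+@end example
+
+With @code{FS}, the possible null match between @samp{b} and @samp{c} is
+skipped, so @samp{bc} remains a single field. With @code{FPAT}, the null
+match between the two commas should yield an empty second field, and the
+commas themselves act as the separators.
+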
@node Testing field creation
@section Checking How @command{gawk} Is Splitting Records
@@ -18505,6 +18529,16 @@ Nonalphabetic characters are left unchanged. For example,
@code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}.
@end table
+At first glance, the @code{split()} and @code{patsplit()} functions appear to be
+mirror images of each other. But there are differences:
+
+@itemize @bullet
+@item @code{split()} treats its third argument like @code{FS}, with all the
+special rules involved for @code{FS}.
+
+@item Matching of null strings differs. This is discussed in @ref{FS versus FPAT}.
+@end itemize
+
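+@c Illustrative sketch only: the strings and the array names are assumptions.
+As a minimal sketch, using made-up strings and the hypothetical array names
+@code{parts} and @code{flds}, regexps that can match the null string behave
+differently in the two functions:
+
+@example
+$ @kbd{gawk 'BEGIN @{ print split("axxbc", parts, /x*/) @}'}
+@print{} 2
+$ @kbd{gawk 'BEGIN @{ print patsplit("a,,c", flds, /[^,]*/) @}'}
+@print{} 3
+@end example
+
+Here @code{split()} should ignore the null matches and separate only at
+@samp{xx}, whereas @code{patsplit()} should accept the null match between
+the two commas as an empty second element.
+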
@sidebar Matching the Null String
@cindex matching @subentry null strings
@cindex null strings @subentry matching