.\"Copyright (C) 2009, Kaz Kylheku <kkylheku@gmail.com>.
.\"All rights reserved.
.\"
.\"BSD License:
.\"
.\"Redistribution and use in source and binary forms, with or without
.\"modification, are permitted provided that the following conditions
.\"are met:
.\"
.\"  1. Redistributions of source code must retain the above copyright
.\"     notice, this list of conditions and the following disclaimer.
.\"  2. Redistributions in binary form must reproduce the above copyright
.\"     notice, this list of conditions and the following disclaimer in
.\"     the documentation and/or other materials provided with the
.\"     distribution.
.\"  3. The name of the author may not be used to endorse or promote
.\"     products derived from this software without specific prior
.\"     written permission.
.\"
.\"THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
.\"IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
.\"WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

.TH txr 1 2009-09-09 "txr v. 012" "Text Extraction Utility"
.SH NAME
txr \- text extractor
.SH SYNOPSIS
.B txr [ options ] query-file { data-file }*
.sp
.SH DESCRIPTION
.B txr
is a query tool for extracting pieces of text buried in one or more text
files based on pattern matching.  A
.B txr
query specifies a pattern which matches (a prefix of) an entire file, or
multiple files. The pattern is matched against the material in the files, and
free variables occurring in the pattern are bound to the pieces of text
occurring in the corresponding positions. If the overall match is
successful, then
.B txr
can do one of two things: it can report the list of variables which were bound,
in the form of a set of variable assignments which can be evaluated by the
.B eval
command of the POSIX shell language, or generate a custom report according
to special directives in the query.

In addition to embedded variables which implicitly match text, the
.B txr
query language supports a number of directives, for matching text using regular
expressions, for continuing a match in another file, for searching through a
file for the place where an entire sub-query matches, for collecting lists, and
for combining sub-queries using logical conjunction, disjunction and negation.

When
.B txr
finds a match for a variable and binds it, if that variable occurs again
later in the query, the variable's text is substituted, forcing a match for
that exact text. Thus txr supports a rudimentary form of backreferencing
(unification, if you will). For example, the query

  @FOO=@FOO

will match material from the start of the line until the first equal sign,
and bind it to the variable
.IR FOO.
Then, the material which follows the equal sign to the end of the line must
match the contents bound to FOO. Hence the line "abc=abc" will match, but
"abc=xyz" will fail to match.
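
The behavior of this one-line query can be imitated outside of txr.
The following Python sketch is not part of txr; the function name
match_foo_eq_foo is hypothetical, and the sketch only models the
single query shown above, not the general matching algorithm.
.nf
```python
def match_foo_eq_foo(line):
    """Imitate the query @FOO=@FOO against a single line.

    Returns the text bound to FOO, or None if the line does not match.
    """
    # The unbound FOO extends up to the first equal sign.
    pos = line.find("=")
    if pos < 0:
        return None
    foo = line[:pos]
    # The second FOO is already bound, so the rest of the line
    # must be exactly the same text.
    rest = line[pos + 1:]
    return foo if rest == foo else None


print(match_foo_eq_foo("abc=abc"))  # abc
print(match_foo_eq_foo("abc=xyz"))  # None
```
.fi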

Generally, the scope of a variable's binding
extends from its first successful match where the binding is established, to
the end of the query. Unsuccessful subqueries have no effect on the
bindings.  Even if a failed subquery is partially successful, all of its
bindings are thrown away.  Some directives treat the bindings emanating
from their subqueries in special ways.

.SH ARGUMENTS AND OPTIONS

Options other than -D may be combined together into a single argument.
The -v and -q options are mutually exclusive. The one which occurs
in the rightmost position in the argument list dominates.

.IP -Dvar=value
Bind the variable
.IR var
to the value
.IR value
prior to processing the query. The name is in scope over the entire
query, so that all occurrences of the variable are substituted and
match the equivalent text.  If the value contains commas, these
are interpreted as separators, which give rise to a list value.
For instance -Da,b,c creates a list of the strings "a", "b" and "c".
(See Collect Directive below). List variables provide a multiple
match. That is to say, if a list variable occurs in a query, a successful
match occurs if any of its values matches the text. If more than one
value matches the text, the first one is taken.

.IP -Dvar
Binds the variable
.IR var
to an empty string value prior to processing the query.

.IP -q
Quiet operation during matching. Certain error messages are not reported on the
standard error device (but if the situations occur, they still fail the
query). This option does not suppress error generation during the parsing
of the query, only during its execution.

.IP -v
Verbose operation. Detailed logging is enabled.

.IP -b
Suppresses the printing of variable bindings for a successful query, and the
word
.IR false
for a failed query. The program still sets an appropriate
termination status.

.IP -a num
Specifies the maximum number of array dimensions to use for variables
arising out of collect. The default is 1. Additional dimensions are
expressed using numeric suffixes in the generated variable names.
For instance, consider the three-dimensional list arising out of a triply
nested collect: ((("a" "b") ("c" "d")) (("e" "f") ("g" "h"))).
Suppose this is bound to a variable V.  With -a 1, this will be
reported as:

  V_0_0[0]="a"
  V_0_1[0]="b"
  V_1_0[0]="c"
  V_1_1[0]="d"
  V_0_0[1]="e"
  V_0_1[1]="f"
  V_1_0[1]="g"
  V_1_1[1]="h"

The leftmost bracketed index is the most major index. That is to say,
the dimension order is: NAME_m_m+1_..._n[1][2]...[m-1].

.IP --help
Prints usage summary on standard output, and terminates successfully.

.IP --version
Prints program version on standard output, and terminates successfully.

.IP --
Signifies the end of the option list. This option does not combine with others, so for instance -b- does not mean -b --, but is an error.

.IP -
This argument is not interpreted as an option, but treated as a filename
argument. After the first such argument, no more options are recognized. Even
if another argument looks like an option, it is treated as a name.
This special argument - means "read from standard input" instead of a file.
The query file, or any of the data files, may be specified using this argument.
If two or more files are specified as -, the behavior is system-dependent.
It may be possible to indicate EOF from the interactive terminal, and
then specify more input which is interpreted as the second file, and so forth.

.PP
After the options, the remaining arguments are files. The first file argument
specifies the query, and is mandatory.  A file argument consisting of a single
- means to read the standard input instead of opening a file. A file argument
which begins with an exclamation symbol means that the rest of the argument is
a shell command which is to be run as a coprocess, and its output read like a
file.

.PP
.B txr
begins by reading the query. The entire query is scanned and internalized,
and then execution begins.  No file is opened until the query calls for a match
for material from that file, but once opened, a file is always read in its
entirety and stored in memory. A query may complete (successfully or not)
before opening some or all of the files.

If no file arguments are specified on the command line, it is up to the
query to open a file, pipe or standard input via the @(next) directive
prior to attempting to make a match. If a query attempts to match text,
but has run out of files to process, the match fails.

.SH STATUS AND ERROR REPORTING
.B txr
sends errors and verbose logs to the standard error device.  The following paragraphs apply when
.B txr
is run without enabling verbose mode. If verbose mode is enabled, then
.B txr
issues diagnostics on the standard error device even in situations which are
not erroneous.

If the command line arguments are incorrect, or the query has a malformed
syntax,
.B txr
issues an error diagnostic and terminates with a failed status.

If the query is accepted, but fails to execute, either due to a
semantic error or due to a mismatch against the data,
.B txr
terminates with a failed status. It also prints the word
.IR false
on standard output. (See NOTES ON FALSE below).  Printing of false
is suppressed if the query executed one or more @(output) directives
directed to standard output.

If the query is well-formed, and matches, then
.B txr
issues no diagnostics on standard error (except in the case of verbose
reporting enabled by -v).  If no variables were bound in the query, then
nothing is printed on standard output.  If the query has matched one or more
variables, then these variables are printed on standard output, in the form of
a shell script which, when evaluated, will cause shell variables to be
assigned.  Printing of these variables is suppressed if the query executed one
or more @(output) directives directed to standard output.

.SH BASIC QUERY SYNTAX AND SEMANTICS

.SS Comments

A query may contain comments which are delimited by the sequence @# and
extend to the end of the line. No whitespace can occur between the @ and #.
A comment which begins on a line swallows that entire line, as well as the
newline which terminates it. In essence, the entire comment disappears.
If the comment follows some material in a line, then it does not consume
the newline. Thus, the following two queries are equivalent:

 1.  @a@# comment: match whole line against variable @a
     @# this comment disappears entirely
     @b

 2.  @a
     @b

The comment after the @a does not consume the newline, but the
comment which follows does. Without this intuitive behavior,
line comments would give rise to empty lines that must match empty
lines in the data, leading to spurious mismatches.
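
The two comment rules can be modeled with a short Python sketch.
This is an illustration rather than the txr implementation: the
function name strip_comments is hypothetical, the query is represented
as a list of lines, a comment is assumed to "begin a line" only when
@# is in the first column, and the @@ escape is ignored for simplicity.
.nf
```python
def strip_comments(lines):
    """Remove @# comments from a query given as a list of lines."""
    kept = []
    for line in lines:
        pos = line.find("@#")
        if pos == 0:
            # The comment begins the line: the whole line vanishes,
            # together with its newline.
            continue
        if pos > 0:
            # The comment follows other material: keep that material,
            # and keep the line itself.
            line = line[:pos]
        kept.append(line)
    return kept


query = [
    "@a@# comment: match whole line against variable @a",
    "@# this comment disappears entirely",
    "@b",
]
print(strip_comments(query) == ["@a", "@b"])  # True
```
.fi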

.SS Text

Query material which is not escaped by the special character @ is
literal text, which matches input character for character. Text which occurs at
the beginning of a line matches the beginning of a line.  Text which starts in
the middle of a line, other than following a variable, must match exactly at
the current position, where the previous match left off. Moreover, if the text
is the last element in the line, its match is anchored to the end of the line.

The semantics of text matching next to a variable is discussed in the following
section.

A query may not leave unmatched material in a line which is covered by the
query.  However, a query may leave unmatched lines.

In the following example, the query matches the text, even though
the text has an extra line.

 Query:         Four score and seven
                years ago our

 Text:          Four score and seven
                years ago our
                forefathers

In the following example, the query
.B fails
to match the text, because the text has extra material on one
line.

 Query:         I can carry nearly eighty gigs
                in my head

 Text:          I can carry nearly eighty gigs of data
                in my head

Needless to say, if the text has insufficient material relative
to the query, that is a failure also.

To match arbitrary material from the current position to the end
of a line, the "match any sequence of characters, including empty"
regular expression @/.*/ can be used. Example:

 Query:         I can carry nearly eighty gigs@/.*/

 Text:          I can carry nearly eighty gigs of data

In this example, the query matches, since the regular expression
matches the string " of data". (See Regular Expressions section below).

.SS Special Characters in Text

Control characters may be embedded directly in a query (with the exception of
newline characters). An alternative to embedding is to use escape syntax.
The following escapes are supported:

.IP @\ea
Alert character (ASCII 7, BEL).
.IP @\eb
Backspace (ASCII 8, BS).
.IP @\et
Horizontal tab (ASCII 9, HT).
.IP @\en
Line feed (ASCII 10, LF). Serves as abstract newline on POSIX systems.
.IP @\ev
Vertical tab (ASCII 11, VT).
.IP @\ef
Form feed (ASCII 12, FF). This character clears the screen on many
kinds of terminals, or ejects a page of text from a line printer.
.IP @\er
Carriage return (ASCII 13, CR).
.IP @\ee
Escape (ASCII 27, ESC).
.IP @\exHEX
A @\ex followed by a sequence of hex digits is interpreted as a hexadecimal
numeric character code. For instance @\ex41 is the ASCII character A.
.IP @\eOCTAL
A @\e followed by a sequence of octal digits (0 through 7) is interpreted
as an octal character code. For instance @\e010 is character 8, same as @\eb.
.PP

Note that if a newline is embedded into a query line with @\en, this
does not split the line into two; it's embedded into the line and
thus cannot match anything. However, @\en may be useful in the @(cat)
directive and in @(output).

.SS Variables

Much of the query syntax consists of arbitrary text, which matches file data
character for character. Embedded within the query may be variables and
directives which are introduced by a @ character.  Two consecutive @@
characters encode a literal @.

A variable matching or substitution directive is written in one of several
ways:

  @NAME
  @{NAME}
  @*NAME
  @*{NAME}
  @{NAME /RE/}
  @{NAME NUMBER}

The forms with an * indicate a longest match; see Longest Match below.
The last two forms with the embedded regexp /RE/ or number have special
semantics; see Positive Match below.

The name itself may consist of any combination of one or more letters, numbers,
and underscores, and must begin with a letter or underscore.  Case is
sensitive, so that @FOO is different from @foo, which is different from @Foo.
The braces around a name can be used when material which follows would
otherwise be interpreted as being part of the name. For instance @FOO_bar
introduces the name "FOO_bar", whereas @{FOO}_bar means the variable named
"FOO" followed by the text "_bar".   There may be whitespace between the @ and
the name, or opening brace. Whitespace is also allowed in the interior of the
braces. It is not significant.

If a variable has no prior binding, then it specifies a match. The
match is determined from some current position in the data: the
character which immediately follows all that has been matched previously.
If a variable occurs at the start of a line, it matches some text
at the start of the line. If it occurs at the end of a line, it matches
everything from the current position to the end of the line.

The extent of the matched text (the text bound to the variable) is determined
by looking at what follows the variable.  A variable may be followed by a piece
of text, a regular expression directive, another variable, or nothing (i.e.
occurs at the end of a line).

If the variable is followed by nothing, the
match extends from the current position in the data, to the end of the line.
Example:

  pattern:      "a b c @FOO"
  data:         "a b c defghijk"
  result:       FOO="defghijk"

If the variable is followed by text (all non-directive material extending to
the end of the line, or to the start of another directive), then the extent of
the match is determined by searching for the first occurrence of that text
within the line, starting at the current position. The variable matches
everything between the current position and the matching position (not
including the matching position). Any whitespace which follows the
variable (and is not enclosed inside braces that surround the variable
name) is part of the text. For example:

  pattern:      "a b @FOO e f"
  data:         "a b c d e f"
  result:       FOO="c d"

In the above example, the pattern text "a b " matches the
data "a b ". So when the @FOO variable is processed, the data being
matched is the remaining "c d e f". The text which follows @FOO
is " e f". This is found within the data "c d e f" at position 3
(counting from 0).  So positions 0-2 ("c d") constitute the matching
text which is bound to FOO.
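
The search described above amounts to a leftmost substring search.
The following Python sketch reproduces the arithmetic of the example;
the function name bind_closest and its parameters are hypothetical,
and the sketch covers only the case of a variable followed by literal
text.
.nf
```python
def bind_closest(data, pos, trailing):
    """Bind a variable starting at data[pos], delimited by trailing text.

    Returns (bound_text, new_position), or None if the trailing
    text does not occur at or after pos.
    """
    hit = data.find(trailing, pos)
    if hit < 0:
        return None
    return data[pos:hit], hit + len(trailing)


data = "a b c d e f"
start = len("a b ")          # position where @FOO begins matching
print(bind_closest(data, start, " e f"))  # ('c d', 11)
```
.fi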

If the variable is followed by a regular expression directive,
the extent is determined by finding the closest match for the
regular expression. (See Regular Expressions section below).

.SS Consecutive Variables

If an unbound variable is followed by another unbound variable, the
combination is a semantic error which will fail the query. A
diagnostic message will be issued, unless operating in quiet mode via -q.
The reason is that there is no way to bind two consecutive variables to
an extent of text; this is an ambiguous situation, since there is no
matching criterion for dividing the text between two variables.
(In theory, a repetition of the same variable, like @FOO@FOO, could
find a solution by dividing the match extent in half, which would work
only in the case when it contains an even number of characters.
This behavior seems to have dubious value).

An unbound variable may be followed by one which is bound. The bound
variable is replaced by the text which it denotes, and the logic proceeds
accordingly.  Variables are never bound to regular expressions, so
the regular expression match does not arise in this case.
The @* syntax for longest match is available. Example:

  pattern:      "@FOO:@BAR@FOO"
  data:         "xyz:defxyz"
  result:       FOO=xyz, BAR=def

Here, FOO is matched with "xyz", since the match is delimited by the
colon. The colon in the pattern then matches the colon in the data,
so that BAR is considered for matching against "defxyz".
BAR is followed by FOO, which is already bound to "xyz".
Thus "xyz" is located in the "defxyz" data following "def",
and so BAR is bound to "def".

If an unbound variable is followed by a variable which is bound to a list, or
nested list, then each character string in the list is tried in turn to produce
a match. The first match is taken.

.SS Longest Match

The closest-match behavior for text and regular expressions can be
overridden to longest match behavior. A special syntax is provided
for this: an asterisk between the @ and the variable, e.g.:

  pattern:      "a @*{FOO}cd"
  data:         "a b cdcdcdcd"
  result:       FOO="b cdcdcd"

  pattern:      "a @{FOO}cd"
  data:         "a b cdcdcd"
  result:       FOO="b "

In the former example, the match extends to the rightmost occurrence of "cd",
and so FOO receives "b cdcdcd".  In the latter example, the *
syntax isn't used, and so a leftmost match takes place. The extent
covers only the "b ", stopping at the first "cd" occurrence.
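
The difference between the two behaviors corresponds to searching for
the delimiting text from the left or from the right. The Python sketch
below is only an illustration with hypothetical function names; it
computes the extent bound to the variable in each of the two examples.
.nf
```python
def closest_extent(data, pos, trailing):
    """Text bound when the leftmost occurrence of trailing is used."""
    hit = data.find(trailing, pos)
    return None if hit < 0 else data[pos:hit]


def longest_extent(data, pos, trailing):
    """Text bound when the rightmost occurrence of trailing is used."""
    hit = data.rfind(trailing, pos)
    return None if hit < 0 else data[pos:hit]


start = len("a ")            # both patterns begin with the text "a "
print(repr(longest_extent("a b cdcdcdcd", start, "cd")))  # 'b cdcdcd'
print(repr(closest_extent("a b cdcdcd", start, "cd")))    # 'b '
```
.fi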

.SS Positive Match

The syntax variants

 @{NAME /RE/}
 @{NAME NUMBER}

specify a variable binding that is driven by a positive match derived
from a regular expression or character count, rather than from trailing
material (which may be regarded as a "negative" match, since the variable is
bound to material which is
.B skipped
in order to match the trailing material). In the /RE/ form, the match
extends over all characters from the current position which match
the regular expression RE.

In the NUMBER form, the match processes a field of text which
consists of the specified number of characters, which must be a nonnegative
number.  If the data line doesn't have that many characters starting at the
current position, the match fails. A match for zero characters produces an
empty string.  The text which is actually matched by this construct
is all text within the specified field, but excluding leading and
trailing whitespace. If the field contains only spaces, then an empty
string is extracted.

A number is made up of digits, optionally preceded by a + or - sign.

This syntax is processed without consideration of what other
syntax follows.  A positive match may be directly followed by an unbound
variable.
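
The NUMBER form can be pictured as fixed-width field extraction
followed by trimming. The Python sketch below is an illustration under
that reading; the function name match_field is hypothetical and the
code is not taken from txr.
.nf
```python
def match_field(data, pos, width):
    """Extract a fixed-width field of the given width at data[pos].

    Returns (trimmed_text, new_position), or None if fewer than
    width characters remain in the line.
    """
    if width < 0:
        raise ValueError("field width must be nonnegative")
    if len(data) - pos < width:
        return None
    field = data[pos:pos + width]
    # Leading and trailing whitespace is excluded from the binding.
    return field.strip(), pos + width


print(match_field("  42  rest", 0, 6))  # ('42', 6)
print(match_field("    x", 0, 4))       # ('', 4)
print(match_field("abc", 1, 5))         # None
```
.fi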

.SS Regular Expressions

Like text, a regular expression (regexp) must match text in the data.  A regexp
which occurs at the beginning of a line matches the beginning of a line.  A
regexp which occurs elsewhere, other than following a variable, must match
exactly starting at the current position, where the previous match left off. A
regexp which occurs at the end of a line must match from the current position
to the end of the line.

The semantics of a regular expression which follows a variable is
discussed in the preceding section Variables.

A regular expression, as a standalone directive, looks like this:

  @/RE/

where RE is regular expression syntax.
.B txr
contains an original implementation of regular expressions, which
supports the following syntax:
.IP .
matches any character.
.IP []
Character class: matches a single character, from the set specified by
the class. Supports basic regexp character class syntax; no POSIX
notation like [:digit:]. The class [a-zA-Z] means match an uppercase
or lowercase letter; the class [0-9a-f] means match a digit or
a lowercase letter, the class [^0-9] means match a non-digit, et cetera.
A ] or - can be used within a character class, but must be escaped
with a backslash. Two backslashes code for one backslash. So
for instance [\e[\e-] means match a [ or - character, [^^] means match
any character other than ^, and [\e^\e\e] means match either a ^ or a
backslash.
.IP (RE)
If RE is a regular expression, then so is (RE).
The contents of parentheses denote one regular expression unit, so that for
instance in (RE)*, the * operator applies to the entire parenthesized group.
.IP (RE)?
optionally matches the preceding regular expression (RE).
.IP (RE)+
matches the preceding expression one or more times.
.IP (RE)*
matches the preceding expression zero or more times.
.IP (RE1)(RE2)
Two consecutive regular expressions denote catenation:
the left expression must match, and then the right.

.IP (RE1)|(RE2)
matches either the expression RE1 or RE2.

.PP
Any of the special characters, including the delimiting /,  can be escaped with
a backslash to suppress its meaning and denote the character itself.

Furthermore, all of the same escapes are supported as described in the section Special
Characters in Text above---the difference is that in regular expressions, the @
character is not required, so for example a tab is coded as \et rather
than @\et.

Any escaped character which does not fall into the above escaping conventions,
or any unescaped character which is not a regular expression operator, denotes
a one-position match of that character itself.

Character classes and parentheses have the highest precedence.

The postfix operators ?, + and * have the second highest precedence, and
associate left to right, so that in A+?*, the * applies to A+?, and the ?
applies to A+.

Catenation is on the next lower precedence rung, so that AB? means "match A,
and then optionally B" not "match A and B, as one optional unit".  The latter
must be written (AB)?  using parentheses to override precedence.

The disjunction operator | has the lowest precedence, lower than catenation.
Thus abc|def means "match abc, or match def". The meaning "match ab,
then c or d, then ef" must be expressed as ab(c|d)ef, or using
a character class: ab[cd]ef.
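
These precedence rules are shared by most regular expression dialects,
so they can be checked with another engine. The sketch below uses the
Python re module, which is a different implementation from the one in
txr; it is assumed only that the two agree on the precedence of
catenation, ? and | for these simple patterns. The helper name full is
hypothetical.
.nf
```python
import re


def full(pattern, text):
    """True if the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# AB? is "A, then optionally B"; it is not "(AB), optionally".
print(full("AB?", "A"))            # True
print(full("AB?", ""))             # False
print(full("(AB)?", ""))           # True

# | binds more loosely than catenation.
print(full("abc|def", "def"))      # True
print(full("abc|def", "abdef"))    # False
print(full("ab(c|d)ef", "abdef"))  # True
print(full("ab[cd]ef", "abcef"))   # True
```
.fi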

In
.B txr,
regular expression matches do not span multiple lines. There is no way
to match a newline character since it's simply not internally represented in
the data.

It's possible for a regular expression to match an empty string.
For instance, if the next input character is z, facing
the regular expression /a?/, there is a zero-character match:
the regular expression's state machine can reach an acceptance
state without consuming any characters. Examples:

  pattern:      @A@/a?/@/.*/
  data:         zzzzz
  result:       A=""

  pattern:      @{A /a?/}@B
  data:         zzzzz
  result:       A="", B="zzzzz"

  pattern:      @*A@/a?/
  data:         zzzzz
  result:       A="zzzzz"

In the first example, variable @A is followed by a regular expression
which can match an empty string. The expression faces the letter "z"
at position 0 in the data line. A zero-character match occurs there,
therefore the variable A takes on the empty string. The @/.*/ regular
expression then consumes the line.

Similarly, in the second example, the /a?/ regular expression faces
a "z", and thus yields an empty string which is bound to A. Variable
@B consumes the entire line.

The third example requests the longest match for the variable binding.
Thus, a search takes place for the rightmost position where the
regular expression matches. The regular expression matches anywhere,
including the empty string after the last character, which is
the rightmost place. Thus variable A fetches the entire line.

.SS Directives

The general syntax of a directive is:

  @EXPR

where EXPR is a parenthesized list of subexpressions. A subexpression
is a symbol, number, regular expression, or a parenthesized expression.
So, examples of valid directives are:

  @(banana)

  @(a b c (d e f))

  @(  a (b (c d) (e  ) ))

  @(a /[a-z]*/ b)

A symbol is lexically the same thing as a variable and the same rules
apply. Tokens that look like numbers are treated as numbers.

Some directives are involved in structuring the overall syntax of the query.

There are syntactic constraints that depend on the directive.  For instance the
@(next) directive can take argument material, which is everything that follows
on the same line, until the end of the line.  But @(skip) does not take
argument material.  Most directives must be the first item of a line.

A summary of the available directives follows:

.IP @(next)
Continue matching in another file.

.IP @(block)
The remaining query is treated as an anonymous or named block.
Blocks may be referenced by @(accept) and @(fail) directives.
Blocks are discussed in the section Blocks below.

.IP @(skip)
Treat the remaining query as a subquery unit, and search the lines of
the input file until that subquery matches somewhere.
A skip is also an anonymous block.

.IP @(some)
Match some clauses in parallel. At least one has to match.

.IP @(all)
Match some clauses in parallel. Each one must match.

.IP @(none)
Match some clauses in parallel. None must match.

.IP @(maybe)
Match some clauses in parallel. None is required to match.

.IP @(collect)
Search the data for multiple matches of a clause. Collect the
bindings in the clause into lists, which are output as array variables.
The @(collect) directive is line oriented. It works with a multi-line
pattern and scans line by line. A similar directive called @(coll)
works within one line.

A collect is an anonymous block.

.IP @(and)
Separator of clauses for @(some), @(all), and @(none).
Equivalent to @(or). Choice is stylistic.

.IP @(or)
Separator of clauses for @(some), @(all), and @(none).
Equivalent to @(and). Choice is stylistic.

.IP @(end)
Required terminator for @(some), @(all), @(none), @(maybe), @(collect),
@(output), and @(repeat).

.IP @(fail)
Terminate the processing of a block, as if it were a failed match.
Blocks are discussed in the section Blocks below.

.IP @(accept)
Terminate the processing of a block, as if it were a successful match.
What bindings emerge may depend on the kind of block: collect
has special semantics.  Blocks are discussed in the section Blocks below.

.IP @(flatten)
Normalizes a set of specified variables to one-dimensional lists. Those
variables which have scalar value are reduced to lists of that value.
Those which are lists of lists (to an arbitrary level of nesting) are converted
to flat lists of their leaf values.

.IP @(merge)
Binds a new variable which is the result of merging two or more
other variables. Merging has somewhat complicated semantics.

.IP @(cat)
Decimates a list (any number of dimensions) to a string, by catenating its
constituent strings, with an optional separator string between all of the
values.

.IP @(bind)
Binds one or more variables against another variable using a structural
pattern. A limited form of unification takes place which can cause a match to
fail.

.IP @(output)
A directive which encloses an output clause in the query. An output section
does not match text, but produces text. The directives above are not
understood in an output clause.

.IP @(repeat)
A directive understood within an @(output) section, for repeating multi-line
text, with successive substitutions pulled from lists. A version @(rept)
produces repeated text within one line.

.PP

.SS The Next Directive

The next directive comes in two forms. It can occur by itself as the
only element in a query line:

  @(next)

Or it may be followed by material, which may contain variables.
All of the variables must be bound. For example:

  @(next)/path/to/@foo.txt

Both forms indicate that the remainder of the query applies
to a new file. The lone @(next) switches to the next file in the
argument list which was passed to the
.B txr
utility. The second form diverts the remainder of the query to a file whose
name is given by the trailing material, after variable substitutions are
performed.

Note that "remainder of the query" refers to the subquery in which
the next directive appears, not necessarily the entire query.

For example, the following query looks for the line starting with "xyz"
at the top of the file "foo.txt", within a some directive.
After the @(end) which terminates the @(some), the "abc" is matched in the
current file.

  @(some)
  @(next)foo.txt
  xyz@suffix
  @(end)
  abc

However, if the @(some) subquery successfully matched "xyz@suffix" within the
file foo.txt, there is now a binding for the suffix variable, which
is globally visible to the remainder of the entire query.

The @(next) directive supports the same file name conventions as the command
line. The name - means standard input. Text which starts with a ! is
interpreted as a shell command whose output is read like a file.  These
interpretations are applied after variable substitution. If the file is
specified as @a, but the variable a expands to "!echo foo", then the output of
the "echo foo" command will be processed.

.SS The Skip Directive

The skip directive considers the remainder of the query as a search
pattern. The remainder is no longer required to strictly match at the
current line in the current file. Rather, the current file is searched,
starting with the current line, for the first line where the entire remainder
of the query will successfully match. If no such line is found, the skip
directive fails. If a matching position is found, the remainder of
the query is understood to be processed there.

Of course, the remainder of the query can itself contain skip directives.
Each such directive performs a recursive subsearch.

The skip directive has an optional numeric argument. The value of this
argument limits the range of lines scanned for a match. Judicious use
of this feature can improve the performance of queries.

Example: scan until "size: @SIZE" matches, which must happen within
the next 15 lines:

  @(skip 15)
  size: @SIZE

Without the range limitation skip will keep searching until it consumes
the entire input source. While sometimes this is what is intended,
often it is not. Sometimes a skip is nested within a collect, or
following another skip. For instance, consider:

  @(collect)
  begin @BEG_SYMBOL
  @(skip)
  end @BEG_SYMBOL
  @(end)

The collect iterates over the entire input. But, potentially, so does
the skip. Suppose that "begin x" is matched, but the data has no
matching "end x". The skip will search in vain all the way to the end of the
data, and then the collect will try another iteration back at the
beginning, just one line down from the original starting point.  If it is a
reasonable expectation that an "end x" occurs within 15 lines of a "begin x", this can
be written instead:

  @(collect)
  begin @BEG_SYMBOL
  @(skip 15)
  end @BEG_SYMBOL
  @(end)
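
The effect of the range limit can be modeled as a bounded linear
search. The Python sketch below is an illustration only: the names
skip_search and matches_at are hypothetical, and it assumes that a
limit of N means that at most N starting lines are tried, which this
page does not spell out precisely.
.nf
```python
def skip_search(lines, start, matches_at, max_lines=None):
    """Find the first line index at which the remaining query matches.

    matches_at(lines, i) stands in for "the remainder of the query
    matches starting at line i". Returns the index, or None.
    """
    stop = len(lines)
    if max_lines is not None:
        stop = min(stop, start + max_lines)
    for i in range(start, stop):
        if matches_at(lines, i):
            return i
    return None


def is_size_line(lines, i):
    return lines[i].startswith("size: ")


data = ["header", "noise", "size: 10", "tail"]
print(skip_search(data, 0, is_size_line))               # 2
print(skip_search(data, 0, is_size_line, max_lines=2))  # None
```
.fi
Without a limit, a failed search costs a pass over all remaining
lines; inside a collect, that cost is paid again on each iteration.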

.SS The Some, All, None and Maybe directives

These directives combine multiple subqueries, which are applied at the same position in parallel. The syntax of each follows this example:

  @(some)
  subquery1
  .
  .
  .
  @(and)
  subquery2
  .
  .
  .
  @(and)
  subquery3
  .
  .
  .
  @(end)

The @(some), @(all) or @(none) directive must appear as the only element in a
query line. It must be followed by at least one subquery clause, and terminated
by @(end). If there are two or more subqueries, these additional clauses are
indicated by @(and) or @(or), which are interchangeable.  The @(and), @(or) and
@(end) directives also must appear as the only element in a query line.

The syntax supports arbitrary nesting. For example:

  QUERY:            SYNTAX TREE:

  @(all)            all -+
  @  (skip)              +- skip -+
  @  (some)              |        +- some -+
  it                     |        |        +- TEXT
  @  (and)               |        |        +- and
  @    (none)            |        |        +- none -+
  was                    |        |        |        +- TEXT
  @    (end)             |        |        |        +- end
  @  (end)               |        |        +- end
  a dark                 |        +- TEXT
  @(end)                 *- end

Nesting can be indicated using whitespace between @ and the
directive expression. Thus, the above is an @(all) query containing a @(skip)
clause which applies to a @(some) that is followed by the text
line "a dark". The @(some) clause combines the text line "it",
and a @(none) clause which contains just one clause consisting of
the line "was".

The semantics of the some, all, none and maybe directives is:

.IP @(all)
Each of the clauses is matched at the current position. If any of the
clauses fails to match, the directive fails (and thus does not produce
any variable bindings).

.IP @(some)
Each of the clauses is matched at the current position. If any
of the clauses succeed, the directive succeeds. The bindings from
all successful clauses are retained.

.IP @(none)
Each of the clauses is matched at the current position. The
directive succeeds only if all of the clauses fail. If
any clause succeeds, the directive fails. Thus, this
directive never produces variable bindings.

.IP @(maybe)
Each of the clauses is matched at the current position.
The directive succeeds even if all of the clauses fail.
Whatever bindings are found in any of the clauses are
retained.

When a @(some) or @(all) directive matches successfully, or a @(maybe)
directive matches something, the query advances by the greatest number of lines
matched in any of the subclauses. For instance if there are two subclauses, and
one of them matches three lines, but the other one matches five lines, then the
overall clause is considered to have made a five line match at its position. If
more directives follow, they begin matching five lines down from that position.
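
The rules above can be summarized in a short Python model. It is a sketch
under stated assumptions, not the logic of txr itself: the function name
combine and the representation of a clause result as either None or a pair
(bindings, lines_matched) are hypothetical, the clauses are assumed to bind
distinct variables, and the zero-line advance of a successful @(none) is an
assumption, since the text above does not state it.

```python
def combine(kind, results):
    """Combine per-clause results for some/all/none/maybe.

    Each entry of results is None (the clause failed) or a pair
    (bindings, lines_matched). Returns None on failure, otherwise
    (merged bindings, lines to advance).
    """
    hits = [r for r in results if r is not None]
    if kind == "all" and len(hits) != len(results):
        return None
    if kind == "some" and not hits:
        return None
    if kind == "none":
        # Succeeds only when every clause failed; never yields bindings.
        # Advancing by zero lines is an assumption of this sketch.
        return None if hits else ({}, 0)
    merged = {}
    for bindings, _ in hits:
        merged.update(bindings)
    # The longest clause match determines how far the query advances.
    advance = max((n for _, n in hits), default=0)
    return merged, advance


three = ({"a": "x"}, 3)   # a clause which matched three lines
five = ({"b": "y"}, 5)    # a clause which matched five lines
```

With these two clause results, combine("some", [three, five]) retains both
bindings and advances by five lines, matching the description above.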

.SS The Collect Directive

The syntax of the collect directive is:

  @(collect)
  ... lines of subquery
  @(end)

or with an until clause:

  @(collect)
  ... lines of subquery: main clause
  @(until)
  ... lines of subquery: until clause
  @(end)


The subquery is matched repeatedly, starting at the current line.
If it fails to match, it is tried starting at the subsequent line.
If it matches successfully, it is tried at the line following the
entire extent of matched data, if there is one. Thus, the collected regions do
not overlap.

The collect as a whole always succeeds, even if the subquery does not match at
any position, and even if the until clause does not match. That is to say, a
query will never fail for the reason that a collect didn't collect anything.

If no until clause is specified, the collect is unbounded. It consumes the
entire data file. If any query material follows such a collect clause, it will
fail if it tries to match anything in the current file; but of course, it
is possible to continue matching in another file by means of @(next).

If an until clause is specified, the collection stops when that clause matches
at the current position. When an until clause matches at a position,
no bindings are collected at that position, even if the main clause
matches at that position also. Moreover, the position is not advanced.
The remainder of the query begins matching at that position.

Example:

  Query:        @(collect)
                @a
                @(until)
                42
                @(end)

  Data:         1
                2
                3
                42
                5
                6

  Output:       a[0]="1"
                a[1]="2"
                a[2]="3"

The line 42 is not collected, even though it matches @a.
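
The loop described above can be modelled in Python for the special case in
which both the main clause and the until clause match a single line. This is
a sketch, not the implementation used by txr: the function collect and its
parameters are hypothetical, and testing the until clause before the main
clause is a simplification which gives the same result in this example.

```python
def collect(lines, start, match, until=None):
    """Model of a collect with single-line main and until clauses.

    match(line) returns a dict of bindings or None; until(line)
    returns True or False. Returns (lists of collected values,
    position at which the rest of the query resumes).
    """
    collected = {}
    pos = start
    while pos < len(lines):
        if until is not None and until(lines[pos]):
            break                      # stop; this line is not consumed
        bindings = match(lines[pos])
        if bindings is not None:
            for name, value in bindings.items():
                collected.setdefault(name, []).append(value)
        pos += 1                       # matched or not, try the next line
    return collected, pos


data = ["1", "2", "3", "42", "5", "6"]
result, resume = collect(
    data, 0,
    match=lambda line: {"a": line},      # models the line @a
    until=lambda line: line == "42",     # models the line 42
)
```

Here result holds the list "1", "2", "3" under the name a, and resume is the
index of the line 42, where the remainder of the query would begin matching.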

The binding variables within the clause of a collect are treated specially.
The multiple matches for each variable are collected into lists,
which then appear as array variables in the final output.

Example:

  Query:        @(collect)
                @a:@b:@c
                @(end)

  Data:         John:Doe:101
                Mary:Jane:202
                Bob:Coder:313

  Output:
                a[0]="John"
                a[1]="Mary"
                a[2]="Bob"
                b[0]="Doe"
                b[1]="Jane"
                b[2]="Coder"
                c[0]="101"
                c[1]="202"
                c[2]="313"

The query matches the data in three places, so each variable becomes
a list of three elements, reported as an array.

Variables with list bindings may be referenced in a query. They denote a
multiple match. The -D command line option can establish a one-dimensional
list binding.

Collect clauses may be nested.  Variable matches collated into lists in an
inner collect are again collated into nested lists in the outer collect.
Thus an unbound variable wrapped in N nestings of @(collect) will
be an N-dimensional list. A one dimensional list is a list of strings;
a two dimensional list is a list of lists of strings, etc.

It is important to note that the variables which are bound within the main
clause of a collect---i.e. the variables which are subject to
collection---appear, within the collect, as normal one-value bindings. The
collation into lists happens outside of the collect. So for instance in the
query:

 @(collect)
 @x=@x
 @(end)

The left @x establishes a binding for some material preceding an equal sign.
The right @x refers to that binding. The value of @x is different in each
iteration, and these values are collected. What finally comes out of the
collect clause is a list variable called x which holds each value that
was ever instantiated under that name within the collect clause.

Also note that the until clause has visibility over the bindings
established in the main clause. This is true even in the terminating
case when the until clause matches, and the bindings of the main clause
are discarded.

.SS The Coll Directive

The coll directive is a kind of miniature version of the collect directive.
Whereas the collect directive works with multi-line clauses on line-oriented
material, coll works within a single line. With coll, it is possible to
recognize repeating regularities within a line and collect lists.

Regular-expression based Positive Match variables work well with coll.

Example: collect a comma-separated list, terminated by a space.

  pattern:  @(coll)@{A /[^, ]+/}@(until) @(end)@B
  data:     foo,bar,xyzzy blorch
  result:   A[0]="foo"
            A[1]="bar"
            A[2]="xyzzy"
            B="blorch"

Here, the variable A is bound to tokens which match the regular
expression /[^, ]+/: a non-empty sequence of characters other than commas or
spaces.

Like its big cousin, the coll directive searches for matches.  If no match
occurs at the current character position, it tries at the next character
position. Whenever a match occurs, it continues at the character position which
follows the last character of the match, if such a position exists.

If not bounded by an until clause, it will exhaust the entire line.  If the
until clause matches, then the collection stops at that position,
and any bindings from that iteration are discarded.

Coll clauses nest, and variables bound within a coll are available within
the rest of the coll clause, including the until clause, and appear as single
values.  The final list aggregation is only visible after the coll clause.

The behavior of coll is troublesome when delimited variables are used,
because in text file formats, the material which separates items is not
repeated after the last item. For instance, a comma-separated list usually
does not appear as "a,b,c," but rather "a,b,c". There might not be any explicit
termination---the last item might be at the very end of the line.

So for instance, the following result is not satisfactory:

  pattern:      @(coll)@a @(end)
  data:         1 2 3 4 5
  result:       a[0]="1"
                a[1]="2"
                a[2]="3"
                a[3]="4"

What happened to the 5? After matching "4 ", coll continues to look for
matches. It tries "5", which does not match, because it is not followed by a
space. Then the line is consumed.  So in this sequence, a valid item is either
followed by a space, or by nothing. So it is tempting to try this:

  pattern:      @(coll)@a@/ ?/@(end)
  data:         1 2 3 4 5
  result:       a[0]=""
                a[1]=""
                a[2]=""
                a[3]=""
                a[4]=""
                a[5]=""
                a[6]=""
                a[7]=""
                a[8]=""

However, the problem is that the regular expression / ?/ (match either a space
or nothing) matches at any position.  So when it is used as a variable
delimiter, it matches at the current position, which binds the empty string to
the variable, the extent of the match being zero. In this situation, the coll
directive proceeds character by character. The solution is to use
positive matching: specify the regular expression which matches the item,
rather than trying to match whatever follows.  The coll directive will
recognize all items which match the regular expression.

  pattern:      @(coll)@{a /[^ ]+/}@(end)
  data:         1 2 3 4 5
  result:       a[0]="1"
                a[1]="2"
                a[2]="3"
                a[3]="4"
                a[4]="5"
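
The contrast between delimiter matching and positive matching can be
reproduced with Python's re module. This is only an analogy: Python's regular
expressions are not the regex engine of txr, and the lazy quantifier below is
an assumed stand-in for the way a delimited variable takes the text up to the
next delimiter.

```python
import re

line = "1 2 3 4 5"

# Positive matching: describe the item itself, like @{a /[^ ]+/}.
items = [m.group(0) for m in re.finditer(r"[^ ]+", line)]

# Delimiter matching: an item must be followed by a space, like "@a ".
# The last item has no trailing space, so it is lost.
delimited = [m.group(1) for m in re.finditer(r"([^ ]*?) ", line)]
```

The first list contains all five items, while the second stops at "4", just
as in the unsatisfactory coll example above.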

The until clause can specify a pattern which, when recognized, terminates
the collection. So for instance, suppose that the list of items may
or may not be terminated by a semicolon. We must exclude
the semicolon from being a valid character inside an item, and
add an until clause which recognizes a semicolon:

  pattern:      @(coll)@{a /[^ ;]+/}@(until);@(end);

  data:         1 2 3 4 5;
  result:       a[0]="1"
                a[1]="2"
                a[2]="3"
                a[3]="4"
                a[4]="5"

  data:         1 2 3 4 5;
  result:       a[0]="1"
                a[1]="2"
                a[2]="3"
                a[3]="4"
                a[4]="5"

Semicolon or not, the items are collected properly.

Note that the @(end) is followed by a semicolon. That's because
when the @(until) clause meets a match, the matching material
is not consumed.

.SS The Flatten Directive

The flatten directive can be used to convert variables to one dimensional
lists. Variables which have a scalar value are converted to lists containing
that value. Variables which are multidimensional lists are flattened to
one-dimensional lists.

Example (without @(flatten)):

  pattern:      @b
                @(collect)
                @(collect)
                @a
                @(end)
                @(end)

  data:         0
                1
                2
                3
                4
                5

  result:       b="0"
                a_0[0]="1"
                a_1[0]="2"
                a_2[0]="3"
                a_3[0]="4"
                a_4[0]="5"

Example (with flatten):

  pattern:      @b
                @(collect)
                @(collect)
                @a
                @(end)
                @(end)
                @(flatten a b)

  data:         0
                1
                2
                3
                4
                5

  result:       b[0]="0"
                a[0]="1"
                a[1]="2"
                a[2]="3"
                a[3]="4"
                a[4]="5"
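
The conversion performed on each variable can be pictured with the following
Python sketch. The function flatten is a hypothetical model which represents
txr list values as nested Python lists of strings; it is not the code used by
txr.

```python
def flatten(value):
    """Return a one-dimensional list of the strings inside value.

    A scalar string becomes a one-element list; nested lists are
    flattened depth-first, preserving order.
    """
    if isinstance(value, str):
        return [value]
    flat = []
    for item in value:
        flat.extend(flatten(item))
    return flat


b = "0"                                   # scalar, as bound by @b
a = [["1"], ["2"], ["3"], ["4"], ["5"]]   # two-dimensional, as in the example
```

Applying flatten to b gives a list holding "0", and applying it to a gives
the five strings in order, which corresponds to the result shown above.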


.SS The Cat Directive

The @(cat) directive converts a list variable into a single
piece of text. Optionally, a separating piece of text can be inserted
in between the elements. This piece is written to the right of
the @(cat) directive, and spans to the end of the line. It may
contain variable substitutions.

Example:

  pattern:      @(coll)@{a /[^ ]+/}@(end)
                @(cat a):
  data:         1 2 3 4 5
  result:       a="1:2:3:4:5"


.SS The Bind Directive

The @(bind) directive is a kind of pattern match, which matches one or more
variables on the left hand side to the value of a variable on the right hand
side.  The right hand side variable must have a binding, or else the directive
fails. Any variables on the left hand side which are unbound receive a matching
piece of the right hand side value. Any variables on the left which are already
bound must match their corresponding value, or the bind fails. Any variables
which are already bound and which do match their corresponding value remain
unchanged (the match can be inexact).

The simplest bind is of one variable against itself, for instance bind A
against A:

  @(bind A A)

This will fail if A is not bound (and complain loudly). If A is bound, it
succeeds, since A matches A.

The next simplest bind binds one variable to another:

  @(bind A B)

Here, if A is unbound, it takes on the same value as B. If A is bound, it has
to match B, or the bind fails. Matching means that either

- A and B are the same text
- A is text, B is a list, and A occurs within B.
- vice versa: B is text, A is a list, and B occurs within A.
- A and B are lists and are either identical, or one is
  found as substructure within the other.

The left hand side of a bind can be a nested list pattern containing variables.
The last item of a list at any nesting level can be preceded by a dot, which
means that the variable matches the rest of the list from that position.

Example: suppose that the list A contains ("how" "now" "brown" "cow"). Then the
directive @(bind (H N . C) A), assuming that H, N and C are unbound variables,
will bind H to "how", N to "now", and C to the remainder of the list ("brown"
"cow").

Example: suppose that the list A is nested to two dimensions and contains
(("how" "now") ("brown" "cow")). Then @(bind ((H N) (B C)) A)
binds H to "how", N to "now", B to "brown" and C to "cow".

The dot notation may be used at any nesting level. It must be preceded and
followed by a symbol: the forms (.) (. X) and (X .) are invalid.
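
The destructuring of a list by a nested pattern can be modelled in Python as
follows. This sketch covers only the case in which every variable on the left
is unbound. The function destructure, the DOT marker, and the use of Python
lists for patterns are hypothetical, and treating a length mismatch as a
failure is an assumption.

```python
DOT = "."   # marker standing for the dot in a pattern such as (H N . C)


def destructure(pattern, value, bindings=None):
    """Match a nested list pattern of unbound variable names to value.

    pattern is a variable name (str) or a list of patterns in which
    DOT may precede the last item. Returns a dict of bindings, or
    None when the shapes do not correspond.
    """
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str):
        bindings[pattern] = value
        return bindings
    if not isinstance(value, list):
        return None
    if DOT in pattern:
        # The dot must sit between two symbols: (X . Y), not (. X) or (X .).
        if pattern.index(DOT) != len(pattern) - 2 or len(pattern) < 3:
            return None
        head = pattern[:-2]
        if len(value) < len(head):
            return None
        # The item after the dot receives the rest of the list.
        if destructure(pattern[-1], value[len(head):], bindings) is None:
            return None
    else:
        head = pattern
        if len(value) != len(head):
            return None
    for sub, item in zip(head, value):
        if destructure(sub, item, bindings) is None:
            return None
    return bindings


flat = destructure(["H", "N", DOT, "C"], ["how", "now", "brown", "cow"])
nested = destructure([["H", "N"], ["B", "C"]],
                     [["how", "now"], ["brown", "cow"]])
```

The value of flat binds C to the two-element remainder of the list, and
nested binds each of the four variables to a single word, as in the two
examples above.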

.SH BLOCKS

.SS Introduction

Blocks are sections of a query which are denoted by a name. Blocks denoted by
the name nil are understood as anonymous.

The @(block NAME) directive introduces a named block, except when the name is
the word nil.  The @(block) directive introduces an unnamed block, equivalent
to @(block nil).

The @(skip) and @(collect) directives introduce implicit anonymous blocks.

.SS Block Scope

The names of blocks are in a distinct namespace from the variable binding
space. So @(block foo) has no interaction with the variable @foo.

A block extends from the @(block ...) directive which introduces it,
to the end of the subquery in which that directive is contained. For instance:

  @(some)
  abc
  @(block foo)
  xyz
  @(end)

Here, the block foo occurs in a @(some) clause, and so it extends to the @(end)
which terminates that clause.  After that @(end), the name foo is not
associated with a block (is not "in scope"). A block which is not contained in
any subquery extends to the end of the overall query.  Blocks are never
terminated by @(end).

The implicit anonymous block introduced by @(skip) has the same scope
as the @(skip): it extends over all of the material which follows the skip,
to the end of the containing subquery.

The scope of the implicit anonymous block introduced by @(collect) coincides
with the scope of that collect: from the @(collect) to its matching @(end).

.SS Block Nesting

Blocks may nest, and nested blocks may have the same names as blocks in
which they are nested. For instance:

  @(block)
  @(block)
  ...

is a nesting of two anonymous blocks, and

  @(block foo)
  @(block foo)

is a nesting of two named blocks which happen to have the same name.
When a nested block has the same name as an outer block, it creates
a block scope in which the outer block is "shadowed"; that is to say,
directives which refer to that block name within the nested block refer to the
inner block, and not to the outer one.

A more complicated example of nesting is:

  @(skip)
  abc
  @(block)
  @(some)
  @(block foo)
  @(end)

Here, the @(skip) introduces an anonymous block. The explicit anonymous
@(block) is nested within skip's anonymous block and shadows it.
The foo block is nested within both of these.

.SS Block Semantics

A block normally does nothing. The query material in the block is evaluated
normally. However, a block serves as a termination point for @(fail) and
@(accept) directives which are in scope of that block and refer to it.

The precise meaning of these directives is:

.IP @(fail\ NAME)

Immediately terminate the enclosing query block called NAME, as if that block
failed to match anything. If more than one block by that name encloses
the directive, the inner-most block is terminated. No bindings
emerge from a failed block.

.IP @(fail)

Immediately terminate the innermost enclosing anonymous block, as if
that block failed to match.

If the implicit block introduced by @(skip) is terminated in this manner,
this has the effect of causing the skip itself to fail. I.e. the behavior
is as if skip search did not find a match for the trailing material,
except that it takes place prematurely (before the end of the available
data source is reached).

If the implicit block associated with a @(collect) is terminated this way,
then the entire collect fails. This is a special behavior, because a
collect normally does not fail, even if it matches and collects nothing!

To prematurely terminate a collect by means of its anonymous block, without
failing it, use @(accept).

.IP @(accept\ NAME)

Immediately terminate the enclosing query block called NAME, as if that block
successfully matched. If more than one block by that name encloses the
directive, the inner-most block is terminated.  Any bindings established within
that block until this point emerge from that block.

.IP @(accept)

Immediately terminate the innermost enclosing anonymous block, as if
that block successfully matched. Any bindings established within
that block until this point emerge from that block.

If the implicit block introduced by @(skip) is terminated in this manner,
this has the effect of causing the skip itself to succeed, as if
all of the trailing material successfully matched.

If the implicit block associated with a @(collect) is terminated this way,
then the collection stops. All bindings collected in the current iteration of
the collect are discarded. Bindings collected in previous iterations are
retained, and collated into lists in accordance with the semantics of collect.

Example: an alternative to @(until) termination:

  @(collect)
  @  (maybe)
  ---
  @  (accept)
  @  (end)
  @LINE
  @(end)

This query will collect entire lines into a list called LINE. However,
if the line --- is matched (by the embedded @(maybe)), the collection
is terminated. Only the lines up to, and not including the --- line,
are collected. The effect is identical to:

  @(collect)
  @LINE
  @(until)
  ---
  @(end)

The difference (not relevant in these examples) is that the until clause has
visibility into the bindings set up by the main clause.

However, the following example has a different meaning:

  @(collect)
  @LINE
  @  (maybe)
  ---
  @  (accept)
  @  (end)
  @(end)

Now, lines are collected until the end of the data source, or until a line is
found which is followed by a --- line. If such a line is found,
the collection stops, and that line is not included in the collection!
The @(accept) terminates the processing of the collect body, and so the
action of collecting the last @LINE binding into the list is not performed.

.SS Data Extent of Terminated Blocks

A query block may have matched some material prior to being terminated by
accept. In that case, it is deemed to have only matched that material,
and not any material which follows. This may matter, depending on the context
in which the block occurs.

Example:

  Query:        @(some)
                @(block foo)
                @first
                @(accept foo)
                @ignored
                @(end)
                @second

  Data:         1
                2
                3

  Output:       first="1"
                second="2"

At the point where the accept occurs, the foo block has matched the first line,
and bound the text "1" to the variable @first. The block is then terminated.
Not only does the @first binding emerge from this terminated block, but
what also emerges is that the block advanced the data past the first line to
the second line. So next, the @(some) directive ends, and propagates the
bindings and position. Thus the @second which follows then matches the second
line and takes the text "2".

In the following query, the foo block occurs inside a maybe clause.
Inside the foo block there is a @(some) clause. Its first subclause
matches variable @first and then terminates block foo. Since block foo is
outside of the @(some) directive, this has the effect of terminating the
@(some) clause:

  Query:        @(maybe)
                @(block foo)
                @  (some)
                @first
                @  (accept foo)
                @  (or)
                @one
                @two
                @three
                @four
                @  (end)
                @(end)
                @second

  Data:         1
                2
                3
                4
                5

  Output:       first="1"
                second="2"

The second clause of the @(some) directive, namely:

  @one
  @two
  @three
  @four

is never processed. The reason is that subclauses are processed in top
to bottom order, but the processing was aborted within the
first clause by the @(accept foo). The @(some) construct never had the
opportunity to match four lines.

If the @(accept foo) line is removed from the above query, the output
is different:

  Query:        @(maybe)
                @(block foo)
                @  (some)
                @first
                @#          <--  @(accept foo) removed from here!!!
                @  (or)
                @one
                @two
                @three
                @four
                @  (end)
                @(end)
                @second

  Data:         1
                2
                3
                4
                5

  Output:       first="1"
                one="1"
                two="2"
                three="3"
                four="4"
                second="5"

Now, all clauses of the @(some) directive have the opportunity to match.
The second clause grabs four lines, which is the longest match.
And so, the next line of input available for matching is 5, which goes
to the @second variable.

.SH OUTPUT

A
.B txr
query may perform custom output. Output is performed by @(output) clauses,
which may be embedded anywhere in the query, or placed at the end.  Output
occurs as a side effect of processing a part of a query which contains an
@(output) directive, and is executed even if that part of the query ultimately
fails to find a match. Thus output can be useful for debugging.
An output clause specifies that its output goes to a file, pipe, or (by
default) standard output. If any output clause is executed whose destination is
standard output,
.B txr
makes a note of this, and later, just prior to termination, suppresses the
usual printing of the variable bindings or the word false.

.SS The Output Directive

The syntax of the @(output) directive is:

  @(output)...optional destination...
  .
  . one or more output directives or lines
  .
  @(end)

The optional destination is a filename, the special name - which
redirects to standard output, or a shell command preceded by the ! symbol.
Variables are substituted in the directive.

.SS Output Text

Text in an output clause is not matched against anything, but is output
verbatim to the destination file, device or command pipe.

.SS Output Variables

Variables occurring in an output clause do not match anything, but instead their
contents are output. A variable being output must be a simple string, not a
list. Lists may be output within @(repeat) or @(rep) clauses. A list variable
must be wrapped in as many nestings of these clauses as it has dimensions.  For
instance, a two-dimensional list may be mentioned in output if it is inside a
@(rep) or @(repeat) clause which is itself wrapped inside another @(rep) or
@(repeat) clause.

In an output clause, the @{NAME NUMBER} variable syntax generates a fixed-width
field, which contains the variable's text.  The absolute value of the
number specifies the field width. For instance -20 and 20 both specify a field
width of twenty.  If the text is longer than the field, then it overflows the
field. If the text is shorter than the field, then it is left-adjusted within
that field, if the width is specified as a positive number, and right-adjusted
if the width is specified as negative.
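
The field rules can be expressed as a short Python sketch. The function name
field is hypothetical and the code is only a model of the behavior described
above, not the formatting code of txr.

```python
def field(text, width):
    """Format text in a field of abs(width) columns.

    Positive width: left-adjusted; negative width: right-adjusted.
    Text longer than the field is returned whole (it overflows).
    """
    if width >= 0:
        return text.ljust(width)
    return text.rjust(-width)


left = field("abc", 6)        # models @{NAME 6}
right = field("abc", -6)      # models @{NAME -6}
over = field("abcdefgh", 4)   # text longer than the field
```

In this model, left is "abc" followed by three spaces, right is three spaces
followed by "abc", and over is the unchanged eight-character text.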

.SS The Repeat Directive

The repeat directive generates repeated text from a ``boilerplate'',
by taking successive elements from lists. The syntax of repeat is
like this:

  @(repeat)
  .
  .
  main clause material, required
  .
  .
  special clauses, optional
  .
  .
  @(end)

Repeat has four types of special clauses, any of which may be
specified with empty contents, or omitted entirely. They are explained
below.

All of the material in the main clause and optional clauses
is examined for the presence of variables.  If none of the variables
hold lists which contain at least one item, then no output is performed,
(unless the repeat specifies an @(empty) clause, see below).
Otherwise, among those variables which contain non-empty lists, repeat finds
the length of the longest list. The length of this list determines the number
of repetitions, R.

If the repeat contains only a main clause, then the lines of this clause are
output R times. Over the first repetition, all of the variables which, outside
of the repeat, contain lists are locally rebound to just their first item. Over
the second repetition, all of the list variables are bound to their second
item, and so forth. Any variables which hold shorter lists than the longest
list eventually end up with empty values over some repetitions.

Example: if the list A holds "1", "2" and "3"; the list B holds "A", "B";
and the variable C holds "X", then

  @(repeat)
  >> @C
  >> @A @B
  @(end)

will produce three repetitions (since there are two lists, the longest
of which has three items). The output is:

  >> X
  >> 1 A
  >> X
  >> 2 B
  >> X
  >> 3

The last line has a trailing space, since it is produced by "@A @B",
where @B has an empty value. Since C is not a list variable, it
produces the same value in each repetition.
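
The repetition rule for a repeat which has only a main clause can be modelled
in Python. This is a sketch, not the implementation used by txr: the function
repeat is hypothetical, Python's {NAME} placeholders stand in for @NAME, and
the dictionary is assumed to hold exactly the variables which occur in the
clause.

```python
def repeat(template, variables):
    """Model of a repeat that has only a main clause.

    template is a list of lines using {NAME} placeholders (Python
    str.format syntax standing in for @NAME). variables maps names
    to strings or lists of strings.
    """
    lists = [v for v in variables.values() if isinstance(v, list) and v]
    if not lists:
        return []                       # no non-empty list: no output
    reps = max(len(v) for v in lists)   # R, the length of the longest list
    output = []
    for i in range(reps):
        local = {}
        for name, value in variables.items():
            if isinstance(value, list):
                # Shorter lists run out and yield empty values.
                local[name] = value[i] if i < len(value) else ""
            else:
                local[name] = value     # scalars repeat unchanged
        output.extend(line.format(**local) for line in template)
    return output


lines = repeat(
    [">> {C}", ">> {A} {B}"],
    {"A": ["1", "2", "3"], "B": ["A", "B"], "C": "X"},
)
```

The six strings in lines correspond to the six output lines shown above,
including the trailing space on the last one.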

The special clauses are:

.IP @(single)
If the repeat produces exactly one repetition, then the contents of this clause
are processed for that one and only repetition, instead of the main clause
or any other clause which would otherwise be processed.

.IP @(first)
The body of this clause specifies an alternative body to be used for the first
repetition, instead of the material from the main clause.

.IP @(last)
The body of this clause is used instead of the main clause for the last
repetition.

.IP @(empty)
If the repeat produces no repetitions, then the body of this clause is output.
If this clause is absent or empty, the repeat produces no output.

.PP
The precedence among the clauses which take an iteration is:
single > first > last > main.  That is, if two or more of these clauses
can apply to a repetition, then the leftmost one in this precedence list
applies. For instance, if there is just a single repetition, then any of these
special clause types can apply to that repetition, since it is the only
repetition, as well as the first and last one. In this situation, if
there is a single clause present, then the repetition is processed
using that clause. Otherwise, if there is a first clause present, that
clause is used. Failing that, a last clause applies. Only if none of these
clauses are present will the repetition be processed using the main clause.
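
The precedence rule can be written down as a small Python model. The
functions pick_clause and clause_sequence are hypothetical names, and the set
called present is an assumed way of recording which special clauses a repeat
specifies; the sketch is not taken from txr.

```python
def pick_clause(index, count, present):
    """Choose the clause for repetition index (0-based) out of count.

    present is the set of special clause names that the repeat
    specifies. Precedence: single > first > last > main.
    """
    if count == 1 and "single" in present:
        return "single"
    if index == 0 and "first" in present:
        return "first"
    if index == count - 1 and "last" in present:
        return "last"
    return "main"


def clause_sequence(count, present):
    """List the clause used for each repetition, or for no repetitions."""
    if count == 0:
        return ["empty"] if "empty" in present else []
    return [pick_clause(i, count, present) for i in range(count)]
```

For a single repetition with all special clauses present, the model selects
the single clause; for three repetitions it selects first, main and last, in
that order.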

.SS Nested Repeats

If a repeat clause encloses variables which hold multidimensional lists,
those lists require additional nesting levels of repeat (or rep).
It is an error to attempt to output a list variable which has not been
decimated into primary elements via a repeat construct.

Suppose that a variable X is two-dimensional (contains a list of lists).  X
must be twice nested in a repeat. The outer repeat will walk over the lists
contained in X. The inner repeat will walk over the elements of each of these
lists.

A nested repeat may be embedded in any of the clauses of a repeat,
not only the main clause.

.SS The Rep Directive

The @(rep) directive is similar to @(repeat), but whereas @(repeat) is line
oriented, @(rep) generates material within a line. It has all the same clauses,
but everything is specified within one line:

  @(rep)... main material ... .... special clauses ...@(end)

More than one @(rep) can occur within a line, mixed with other material.
A @(rep) can be nested within a @(repeat) or within another @(rep).

.SS Repeat and Rep Examples

Example 1: show the list L in parentheses, with spaces between
the elements, or the symbol NIL if the list is empty:

  @(output)
  @(rep)@L @(single)(@L)@(first)(@L @(last)@L)@(empty)NIL@(end)
  @(end)

Here, the @(empty) clause specifies NIL. So if there are no repetitions,
the text NIL is produced. If there is a single item in the list L,
then @(single)(@L) produces that item between parentheses.  Otherwise
if there are two or more items, the first item is produced with
a leading parenthesis followed by a space by @(first)(@L , and
the last item is produced with a closing parenthesis: @(last)@L).
All items in between are emitted with a trailing space by
the main clause: @(rep)@L .

Example 2: show the list L like Example 1 above, but the empty list is ().

  @(output)
  (@(rep)@L @(last)@L@(end))
  @(end)

This is simpler. The parentheses are part of the text which
surrounds the @(rep) construct, produced unconditionally.
If the list L is empty, then @(rep) produces no output, resulting in ().
If the list L has one or more items, then they are produced with
a space after each one, except the last, which has no space.
If the list has exactly one item, then the @(last) applies to it
instead of the main clause: it is produced with no trailing space.

.SH NOTES ON FALSE

The reason for printing the word
.IR false
on standard output when
a query doesn't match, in addition to returning a failed termination
status, is that the output of
.B txr
may be collected by a shell script, by the application of eval to command
substitution syntax. Printing
.IR false
will cause eval to evaluate the
.IR false
command, and thus failed status will propagate from the eval
itself.   The eval command conceals the termination status of a
program run via command substitution.  That is to say, if a program
fails, without producing output, its output is substituted into the eval
command which then succeeds, masking the failure of the program. For example:

  eval "$(false)"

appears successful: the false utility indicates a failed status, but
produces no output. Eval evaluates an empty script and reports success;
the failed status of the false program is forgotten.
Note the difference between the above and this:

  eval "$(echo false)"

This command has a failed status. The echo prints the word false and succeeds;
this false word is then evaluated as a script, and thus interpreted as the
false command which fails. This failure
.B is
propagated as the result of the eval
command.