TXR - Programming Language (Version 275)
txr [ options ] [ script-file [ arguments ... ]]
TXR is a general-purpose, multi-paradigm programming language. It comprises two languages integrated into a single tool: a text scanning and extraction language referred to as the TXR Pattern Language (sometimes just "TXR"), and a general-purpose dialect of Lisp called TXR Lisp.
TXR can be used for everything from "one liner" data transformation tasks at the command line, to data scanning and extracting scripts, to full application development in a wide range of areas.
A script written in the TXR Pattern Language, also referred to in this document as a query, specifies a pattern which matches one or more sources of inputs, such as text files. Patterns can consist of large chunks of multiline free-form text, which is matched literally against material in the input sources. Free variables occurring in the pattern (denoted by the @ symbol) are bound to the pieces of text occurring in the corresponding positions. Patterns can be arbitrarily complex, and can be broken down into named pattern functions, which may be mutually recursive.
In addition to embedded variables which implicitly match text, the TXR pattern language supports a number of directives, for matching text using regular expressions, for continuing a match in another file, for searching through a file for the place where an entire subquery matches, for collecting lists, and for combining subqueries using logical conjunction, disjunction and negation, and numerous others.
Patterns can contain actions which transform data and generate output. These actions can be embedded anywhere within the pattern-matching logic. A common structure for small TXR scripts is to perform a complete matching session at the top of the script, and then deal with processing and reporting at the bottom.
The TXR Lisp language can be used from within TXR scripts as an embedded language, or completely standalone. It supports functional, imperative and object-oriented programming, and provides numerous data types such as symbols, strings, vectors, hash tables with weak reference support, lazy lists, and arbitrary-precision ("bignum") integers. It has an expressive foreign function interface (FFI) for calling into libraries and other software components that support C-language-style calls.
TXR Lisp source files as well as individual functions can be optionally compiled for execution on a virtual machine that is built into TXR. Compiled files execute and load faster, and resist reverse-engineering. Standalone application delivery is possible.
TXR is free software offered under the two-clause BSD license which places almost no restrictions on redistribution, and allows every conceivable use, of the whole software or any constituent part, royalty-free, free of charge, and free of any restrictions.
If TXR is given no arguments, it will enter into an interactive mode. See the INTERACTIVE LISTENER section for a description of this mode. When TXR enters interactive mode this way, it prints a one-line banner announcing the program name and version, and one line of help text instructing the user how to exit.
Unless the -c or -f options are present, the first non-option argument is treated as a script-file which is executed. This is described after the following descriptions of all of the options. Any additional arguments have no fixed meaning; they are available to the TXR query or TXR Lisp application for specifying input files to be processed, or other meanings under the control of the application.
Options which don't take an argument may be combined together. The -v and -q options are mutually exclusive. Of these two, the one which occurs in the rightmost position in the argument list dominates. The -c and -f options are also mutually exclusive; if both are specified, it is a fatal error.
Normally, if the *stdin* stream is connected to a terminal device, it is automatically marked as having the real-time property when TXR starts up (see the functions stream-set-prop and real-time-stream-p). The -n option suppresses this behavior; the *stdin* stream remains ordinary.
The TXR pattern language reads standard input via a lazy list, created by applying the lazy-stream-cons function to the *stdin* stream. If that stream is marked real-time, then the lazy list which is returned by that function has behaviors that are better suited for scanning interactive input. A more detailed explanation is given under the description of this function.
If the -n option is in effect and TXR enters into the interactive listener, the listener operates in plain mode. The listener reads buffered lines from the operating system without any character-based editing features or history navigation. In plain mode, no prompts appear and no terminal control escape sequences are generated. The only output is the results of evaluation, related diagnostic messages, and any output generated by the evaluated expressions themselves.
For example, suppose that a variable V is bound to the nested list ((("a" "b") ("c" "d")) (("e" "f") ("g" "h"))). With -a 1, the -B output reports it as:
V_0_0[0]="a"
V_0_1[0]="b"
V_1_0[0]="c"
V_1_1[0]="d"
V_0_0[1]="e"
V_0_1[1]="f"
V_1_0[1]="g"
V_1_1[1]="h"
With -a 2, it comes out as:
V_0[0][0]="a"
V_1[0][0]="b"
V_0[0][1]="c"
V_1[0][1]="d"
V_0[1][0]="e"
V_1[1][0]="f"
V_0[1][1]="g"
V_1[1][1]="h"
The leftmost bracketed index is the most major index. That is to say, the dimension order is: NAME_m_m+1_..._n[1][2]...[m-1]. In other words, dimensions 1 through m-1 appear as bracketed indices, while the remaining dimensions m through n appear as underscore-separated suffixes in the variable name.
Example:
Shell script which uses TXR to read two lines "1" and "2" from standard input, binding them to variables a and b. Standard input is specified as - and the data comes from shell "here document" redirection:
txr -B -c "@a
@b" - <<!
1
2
!
The @; comment syntax can be used for better formatting:
txr -B -c "@;
@a
@b"
The requested value of N (the argument of the -C option) can be too low, in which case TXR will complain and exit with an unsuccessful termination status. This indicates that TXR refuses to be compatible with such an old version. Users requiring the behavior of that version will have to install an older version of TXR which supports that behavior, or even that exact version.
If the option is specified more than once, the behavior is not specified.
Compatibility can also be requested via the TXR_COMPAT environment variable instead of the -C option.
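For example, the following two invocations request the same compatibility level (the version number here is illustrative):

txr -C 250 script.txr
TXR_COMPAT=250 txr script.txr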
For more information, see the COMPATIBILITY section.
Note that --lisp and --compiled influence how the argument of the -f option is treated, but only if they precede that option.
If the file has a recognized suffix: ".tl", ".tlo", ".txr" or ".txr_profile", then these options have no effect. The suffix determines the interpretation of the content. Moreover, no suffix search takes place: only the given path name is tried.
After the options, the remaining arguments are treated as follows.
If neither the -f nor the -c option was specified, then the first argument is treated as the script-file. If no arguments are present, then TXR enters interactive mode, unless one of the -e, -p, -P or -t options has been processed, in which case it instead terminates.
The TXR Pattern Language has features for implicitly treating the subsequent command-line arguments as input files. It follows the convention that an argument consisting of a single - (dash) character specifies that standard input is to be used, instead of opening a file. If the query does not use the @(next) directive to select an alternative data source, and a pattern-matching construct is processed which demands data, then the first argument will be opened as a data source. Arguments not opened as data sources can be assigned alternative meanings and uses, or can be ignored entirely, under control of the query.
Specifying standard input as a source with an explicit - argument is unnecessary. If no arguments are present, then TXR scans standard input by default. This was not true in versions of TXR prior to 171; see the COMPATIBILITY section.
TXR begins by reading the script, which is given as the contents of the argument of the -c option, or else as the contents of an input source specified by the -f option or by the script-file argument. If -f or the script-file argument specify - (dash) then the script is read from standard input.
In the case of the TXR pattern language, the entire query is scanned and internalized; then, if it is free of syntax errors, execution begins. (TXR Lisp is processed differently, form by form.) On the other hand, the pattern language reads data files in a lazy manner. A file isn't opened until the query demands material from that file, and then the contents are read on demand, not all at once.
The suffix of the script-file is significant. If the name has no suffix, or if it has a ".txr" suffix, then it is assumed to be in the TXR pattern language. If it has the ".tl" suffix, then it is assumed to be TXR Lisp. The --lisp and --compiled options change the treatment of unsuffixed script file names, causing them to be interpreted as TXR Lisp source or compiled TXR Lisp, respectively.
If a file name is specified which does not have a recognized suffix, and names a file which doesn't exist, then TXR adds the ".txr" suffix and tries again. If that doesn't exist, another attempt is made with the ".tlo" suffix, which will be treated as a TXR Lisp compiled file. Finally, if that doesn't exist, the ".tl" suffix is tried, which will be treated as containing TXR Lisp source. If either the --lisp or --compiled option has been specified, then TXR skips trying the ".txr" suffix, and tries only ".tlo" followed by ".tl".
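For example, if the argument is given as prog and no such file exists, TXR tries prog.txr, then prog.tlo, then prog.tl; under --lisp or --compiled, only the latter two are tried.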
A TXR Lisp file is processed as if by the load macro: forms from the file are read and evaluated. If the forms do not terminate the TXR process or throw an exception, and there are no syntax errors, then TXR terminates successfully after evaluating the last form. If syntax errors are encountered in a form, then TXR terminates unsuccessfully. TXR Lisp is documented in the section TXR LISP.
If a query file is specified, but no file arguments, it is up to the query to open a file, pipe or standard input via the @(next) directive prior to attempting to make a match. If a query attempts to match text, but has run out of files to process, the match fails.
TXR sends errors and verbose logs to the standard error device. The following paragraphs apply when TXR is run without enabling verbose mode with -v, or the printing of variable bindings with -B or -a.
If the command-line arguments are incorrect, TXR issues an error diagnostic and terminates with a failed status.
If the script-file specifies a query, and the query has a malformed syntax, TXR likewise issues error diagnostics and terminates with a failed status.
If the query fails due to a mismatch, TXR terminates with a failed status. No diagnostics are issued.
If the query is well-formed, and matches, then TXR issues no diagnostics, and terminates with a successful status.
In verbose mode (option -v), TXR issues diagnostics on the standard error device even in situations which are not erroneous.
In bindings-printing mode (options -B or -a), TXR prints the word false if the query fails, and exits with a failed termination status. If the query succeeds, the variable bindings, if any, are output on standard output.
If the script-file is TXR Lisp, then it is processed form by form. Each top-level Lisp form is evaluated after it is read. If any form is syntactically malformed, TXR issues diagnostics and terminates unsuccessfully. This is somewhat different from how the pattern language is treated: a script in the pattern language is parsed in its entirety before being executed.
A query may contain comments which are delimited by the sequence @; and extend to the end of the line. Whitespace can occur between the @ and ;. A comment which begins on a line swallows that entire line, as well as the newline which terminates it. In essence, the entire comment line disappears. If the comment follows some material in a line, then it does not consume the newline. Thus, the following two queries are equivalent:
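@a@; matches a line of input against the variable a
@; this entire comment line disappears
@b

and

@a
@b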
The comment after the @a does not consume the newline, but the comment which follows does. Without this intuitive behavior, line comment would give rise to empty lines that must match empty lines in the data, leading to spurious mismatches.
Instead of the ; character, the # character can be used. This is an obsolescent feature.
TXR has several features which support use of the hash-bang convention for creating apparently standalone executable programs.
The first such feature is that a #! (hash-bang) line at the start of a script is treated as a comment and removed from the query. This removal allows TXR queries to be turned into standalone executable programs in the POSIX environment using the hash-bang mechanism. Unlike most interpreters, TXR applies special processing to the #! line, which is described below, in the section Argument Generation with the Null Hack.
Shell session example: create a simple executable program called "hello.txr" and run it. This assumes TXR is installed in /usr/bin.
$ cat > hello.txr
#!/usr/bin/txr
@(bind a "Hey")
@(output)
Hello, world!
@(end)
$ chmod a+x hello.txr
$ ./hello.txr
Hello, world!
When this plain hash-bang line is used, TXR receives the name of the script as an argument. Therefore, it is not possible to pass additional options to TXR. For instance, if the above script is invoked like this
$ ./hello.txr -B
the -B option isn't processed by TXR, but treated as an additional argument, just as if txr script-file -B had been executed directly.
This behavior is useful if the script author does not want to expose the TXR options to the user of the script.
However, the hash-bang line can use the -f option:
#!/usr/bin/txr -f
Now, the name of the script is passed as an argument to the -f option, and TXR will look for more options after that, so that the resulting program appears to accept TXR options. Now we can run
$ ./hello.txr -B
Hello, world!
a="Hey"
The -B option is honored.
On systems which support multiple arguments in the hash-bang line, options can be combined with -f directly:

#!/usr/bin/txr -B -f

However, many operating systems pass everything after the interpreter name as a single argument, in which case a line like the above does not work.
To support systems like this, TXR supports the special argument --args, as well as an extended version, --eargs. With --args, it is possible to encode multiple arguments into one argument. The --args option must be followed by a separator character, chosen by the programmer. The characters after that are split into multiple arguments on the separator character. The --args option is then removed from the argument list and replaced with these arguments, which are processed in its place.
Example:
#!/usr/bin/txr --args:-B:-f
The above has the same behavior as
#!/usr/bin/txr -B -f
on a system which supports multiple arguments in the hash-bang line. The separator character is the colon, and so the remainder of that argument, -B:-f, is split into the two arguments -B -f.
The --eargs option is similar to --args, but must be followed by one more argument. After --eargs performs the argument splitting in the same manner as --args, any of the arguments which it produces which are the two-character sequence {} are replaced with that following argument. Whether or not the replacement occurs, that following argument is then removed.
Example:
#!/usr/bin/txr --eargs:-B:{}:--foo:42
This has an effect which cannot be replicated in any known implementation of the hash-bang mechanism. Suppose that this hash-bang line is placed in a script called script.txr. When this script is invoked with arguments, as in:
script.txr a b c
then TXR is invoked similarly to:
/usr/bin/txr --eargs:-B:{}:--foo:42 script.txr a b c
Then, when --eargs processing takes place, firstly the argument sequence
-B {} --foo 42
is produced by splitting into four fields using the : (colon) character as the separator. Then, within these four fields, all occurrences of {} are replaced with the following argument script.txr, resulting in:
-B script.txr --foo 42
Furthermore, that script.txr argument is removed from the remaining argument list.
The four arguments are then substituted in place of the original --eargs:-B:{}:--foo:42 syntax.
The resulting TXR invocation is, therefore:
/usr/bin/txr -B script.txr --foo 42 a b c
Thus, --eargs allows some arguments to be encoded into the interpreter script, such that the script name is inserted anywhere among them, possibly multiple times. Arguments for the interpreter can be encoded, as well as arguments to be processed by the script.
Another common hash-bang arrangement is to locate TXR via the env utility:

#!/usr/bin/env txr
Here, the env utility searches for the txr program in the directories indicated by the PATH variable, which liberates the script from having to encode the exact location where the program is installed. However, if the operating system allows only one argument in the hash-bang mechanism, then no arguments can be passed to the program.
To mitigate this problem, TXR's hash-bang handling supports a special feature. If the hash-bang line contains a null byte, then the text from after the null byte until the end of the line is split into fields using the space character as a separator, and these fields are inserted into the command line. This manipulation happens during command-line processing, i.e. prior to the execution of the file.

If this processing is applied to a file that is specified using the -f option, then the arguments which arise from the special processing are inserted after that option and its argument. If this processing is applied to the file which is the first non-option argument, then the options are inserted before that argument. However, care is taken not to process that argument a second time.

In either situation, processing of the command-line options continues, and the arguments which are processed next are the ones which were just inserted. This is true even if the options had been inserted as a result of processing the first non-option argument, which would ordinarily signal the termination of option processing.
In the following examples, it is assumed that the script is named, and invoked, as /home/jenny/foo.txr, and is given arguments --bar abc, and that txr resolves to /usr/bin/txr. The <NUL> code indicates a literal ASCII NUL character (the zero byte).
Basic example:
#!/usr/bin/env txr<NUL>-a 3
Here, env searches for txr, finding it in /usr/bin. Thus, including the executable name, TXR receives this full argument list:
/usr/bin/txr /home/jenny/foo.txr --bar abc
The first non-option argument is the name of the script. TXR opens the script, and notices that it begins with a hash-bang line. It consumes the hash-bang line and finds the null byte inside it, retrieving the character string after it, which is "-a 3". This is split into the two arguments -a and 3, which are then inserted into the command line ahead of the script name. The effective command line then becomes:
/usr/bin/txr -a 3 /home/jenny/foo.txr --bar abc
Command-line option processing continues, beginning with the -a option. After the option is processed, /home/jenny/foo.txr is encountered again. This time it is not opened a second time; it signals the end of option processing, exactly as it would immediately do if it hadn't triggered the insertion of any arguments.
Advanced example: use env to invoke txr, passing options to the interpreter and to the script:
#!/usr/bin/env txr<NUL>--eargs:-C:175:{}:--debug
This example shows how --eargs can be used in conjunction with the null hack. When txr begins executing, it receives the arguments
/usr/bin/txr /home/jenny/foo.txr
The script file is opened, and the arguments delimited by the null character in the hash-bang line are inserted, resulting in the effective command line:
/usr/bin/txr --eargs:-C:175:{}:--debug /home/jenny/foo.txr
Next, --eargs is processed in the ordinary way, transforming the command line into:
/usr/bin/txr -C 175 /home/jenny/foo.txr --debug
The name of the script file is encountered, and signals the end of option processing. Thus txr receives the -C option, instructing it to emulate some behaviors from version 175, and the /home/jenny/foo.txr script receives --debug as its argument: it executes with the *args* list containing one element, the character string "--debug".
The hash-bang null-hack feature was introduced in TXR 177. Previous versions ignore the hash-bang line, performing no special processing. Where a risk exists that programs which depend on the feature might be executed by an older version of TXR, care must be taken to detect and handle that situation, either by means of the txr-version variable, or else by some logic which infers that the processing of the hash-bang line hasn't been performed.
It is possible to use the Hash-Bang Null Hack, such that the resulting executable program recognizes TXR options. This is made possible by a special behavior in the processing of the -f option.
For instance, suppose that the effect of the following familiar hash-bang line is required:
#!/path/to/txr -f
However, suppose there is also a requirement to use the env utility to find TXR. Furthermore, the operating system allows only one hash-bang argument. Using the Null Hack, this is rewritten as:
#!/usr/bin/env txr<NUL>-f
then if the script is invoked with arguments -i a b c, the command line will ultimately be transformed into:
/path/to/txr -f /path/to/scriptfile -i a b c
which allows TXR to process the -i option, leaving a, b and c as arguments for the script.
However, note that there is a subtle issue with the -f option that has been inserted via the Null Hack: namely, this insertion happens after TXR has opened the script file and read the hash-bang line from it. This means that when the inserted -f option is being processed, the script file is already open. A special behavior occurs: the -f option processing notices that the argument to -f is identical to the pathname of the script file that TXR has already opened for processing, and so the -f option and its argument are skipped.
Outside of directives, whitespace is significant in TXR queries, and represents a pattern match for whitespace in the input. An extent of text consisting of an undivided mixture of tabs and spaces is a whitespace token.
Whitespace tokens match a precisely identical piece of whitespace in the input, with one exception: a whitespace token consisting of precisely one space has a special meaning. It is equivalent to the regular expression @/[ ]+/: match an extent of one or more spaces (but not tabs!). Multiple consecutive spaces do not have this meaning.
Thus, the query line "a b" (one space between a and b) matches "a b" with any number of spaces between the two letters.
For matching a single space, the syntax @\ can be used (backslash-escaped space).
It is more often necessary to match multiple spaces than to match exactly one space, so this rule simplifies many queries and inconveniences only a few.
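For example, the pattern "a b" (with one space) matches "a b", "a  b", "a   b" and so on, whereas the pattern "a@\ b" matches only "a b".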
In output clauses, string and character literals and quasiliterals, a space token denotes a space.
Query material which is not escaped by the special character @ is literal text, which matches input character for character. Text which occurs at the beginning of a line matches the beginning of a line. Text which starts in the middle of a line, other than following a variable, must match exactly at the current position, where the previous match left off. Moreover, if the text is the last element in the line, its match is anchored to the end of the line.
An empty query line matches an empty line in the input. Note that an empty input stream does not contain any lines, and therefore is not matched by an empty line. An empty line in the input is represented by a newline character which is either the first character of the file, or follows a previous newline-terminated line.
Input streams which end without terminating their last line with a newline are tolerated, and are treated as if they had the terminator.
Text which follows a variable has special semantics, described in the section Variables below.
A query may not leave a line of input partially matched. If any portion of a line of input is matched, it must be entirely matched, otherwise a matching failure results. However, a query may leave unmatched lines. Matching only four lines of a ten-line file is not a matching failure. The eof directive can be used to explicitly match the end of a file.
In the following example, the query matches the text, even though the text has an extra line.
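Query:
line one
line two

Text:
line one
line two
line three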
In the following example, the query fails to match the text, because the text has extra material on one line that is not matched:
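Query:
line one
line two

Text:
line one
line two, plus extra material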
Needless to say, if the text has insufficient material relative to the query, that is a failure also.
To match arbitrary material from the current position to the end of a line, the "match any sequence of characters, including empty" regular expression @/.*/ can be used. Example:
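Query:
a line @/.*/

Text:
a line of data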
In this example, the query matches, since the regular expression matches the string "of data". (See the Regular Expressions section below.)
Another way to do this is:
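Query:
a line @rest

Here, the variable rest (an arbitrary name) occurs as the last element of the line, and so it matches everything from the current position to the end of the line, additionally capturing that text as a binding.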
Control characters may be embedded directly in a query (with the exception of newline characters). An alternative to embedding is to use escape syntax. The following escapes are supported:
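@\a  alarm (bell)
@\b  backspace
@\t  tab
@\n  newline
@\v  vertical tab
@\f  form feed
@\r  carriage return
@\e  escape
@\x followed by hex digits: the character with that hexadecimal code
@\ followed by octal digits: the character with that octal code

Hex and octal escapes may optionally be terminated by a semicolon, which is useful when the escape is followed by a digit. In addition, @\ followed by a space encodes a single space, and @\ occurring at the end of a line indicates a line continuation: the line is spliced together with the following line. For instance, the query: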
abcd@\
  @\ efg
is equivalent to the line
abcd efg
The two spaces before the @\ in the second line are consumed. The spaces after are preserved.
Note that if a newline is embedded into a query line with @\n, this does not split the line into two; it's embedded into the line and thus cannot match anything. However, @\n may be useful in the @(cat) directive and in @(output).
TXR represents text internally using wide characters, which are used to represent Unicode code points. Script source code, as well as all data sources, are assumed to be in the UTF-8 encoding. In TXR and TXR Lisp source, extended characters can be used directly in comments, literal text, string literals, quasiliterals and regular expressions. Extended characters can also be expressed indirectly using hexadecimal or octal escapes. On some platforms, wide characters may be restricted to 16 bits, so that TXR can only work with characters in the BMP (Basic Multilingual Plane) subset of Unicode.
TXR does not use the localization features of the system library; its handling of extended characters is not affected by environment variables like LANG and LC_CTYPE. The program reads and writes only the UTF-8 encoding.
TXR deals with UTF-8 separately in its parser and in its I/O streams implementation.
TXR's text streams perform UTF-8 conversion internally, such that TXR applications use Unicode code points.
In text streams, invalid UTF-8 bytes are treated as follows. When an invalid byte is encountered in the middle of a multibyte character, or if the input ends in the middle of a multibyte character, or if an invalid character is decoded, such as an overlong form, or a code in the range U+DC00 through U+DCFF, the UTF-8 decoder returns to the starting byte of the ill-formed multibyte character, and extracts just one byte, mapping that byte to the Unicode character range U+DC00 through U+DCFF, producing that code point as the decoded result. The decoder is then reset to its initial state and begins decoding at the following byte, where the same algorithm is repeated.
Furthermore, because TXR internally uses a null-terminated character representation of strings which easily interoperates with C language interfaces, when a null character is read from a stream, TXR converts it to the code U+DC00. On output, this code converts back to a null byte, as explained in the previous paragraph. By means of this representational trick, TXR can handle textual data containing null bytes.
In contrast to the above, the TXR parser scans raw UTF-8 bytes from a binary stream, rather than using a text stream. The parser performs its own recognition of UTF-8 sequences in certain language constructs, using a UTF-8 decoder only when processing certain kinds of tokens.
Comments are read without regard for encoding, so invalid encoding bytes in comments are not detected. A comment is simply a sequence of bytes terminated by a newline.
Invalid UTF-8 encountered while scanning identifiers and character names in character literal (hash-backslash) syntax is diagnosed as a syntax error.
UTF-8 in string literals is treated in the same way as UTF-8 in text streams. Invalid UTF-8 bytes are mapped into code points in the U+DC00 through U+DCFF range, and incorporated as such into the resulting string object which the literal denotes. The same remarks apply to regular-expression literals.
In place of a piece of text (see section Text above), a regular-expression directive may be used, which has the following syntax:
@/RE/
where the RE part enclosed in slashes represents regular-expression syntax (described in the section Regular Expressions below).
Long regular expressions can be broken into multiple lines using a backslash-newline sequence. Whitespace before the sequence or after the sequence is not significant, so the following two are equivalent:
@/reg \
ular/
@/regular/
There may not be whitespace between the backslash and newline.
Whereas literal text simply represents itself, a regular expression denotes a (potentially infinite) set of texts. The regular-expression directive matches the longest piece of text (possibly empty) which belongs to the set denoted by the regular expression. The match is anchored to the current position; thus if the directive is the first element of a line, the match is anchored to the start of a line. If the regular-expression directive is the last element of a line, it is anchored to the end of the line also: the regular expression must match the text from the current position to the end of the line.
Even if the regular expression matches the empty string, the match will fail if the input is empty, or has run out of data. For instance suppose the third line of the query is the regular expression @/.*/, but the input is a file which has only two lines. This will fail: the data has no line for the regular expression to match. A line containing no characters is not the same thing as the absence of a line, even though both abstractions imply an absence of characters.
Like text which follows a variable, a regular-expression directive which follows a variable has special semantics, described in the section Variables below.
Much of the query syntax consists of arbitrary text, which matches file data character for character. Embedded within the query may be variables and directives which are introduced by a @ character. The sequence @@ encodes a literal @ character.
A variable-matching or substitution directive is written in one of several ways:
@sident
@{bident}
@*sident
@*{bident}
@{bident /regex/}
@{bident (fun [arg ...])}
@{bident number}
@{bident bident}
The forms with an * indicate a longest match; see Longest Match below. The forms with the embedded regexp /regex/ or function or number have special semantics; see Positive Match below.
The identifier t cannot be used as a name; it is a reserved symbol which denotes the value true. An attempt to use the variable @t will result in an exception. The symbol nil can be used where a variable name is required syntactically, but it has special semantics, described in a section below.
A sident is a "simple identifier" form which is not delimited by braces.
A sident consists of any combination of one or more letters, numbers, and underscores. It may not look like a number, so that for instance 123 is not a valid sident, but 12A is valid. Case is sensitive, so that FOO is different from foo, which is different from Foo.
The braces around an identifier can be used when material which follows would otherwise be interpreted as being part of the identifier. When a name is enclosed in braces it is a bident.
The following additional characters may be used as part of a bident which are not allowed in a sident:
! $ % & * + - < = > ? \ ~
Moreover, most Unicode characters beyond U+007F may appear in a bident, with certain exceptions. A character may not be used if it is any of the Unicode space characters, a member of the high or low surrogate region, a member of any Unicode private-use area, or is either of the two characters U+FFFE and U+FFFF. These situations produce a syntax error. Invalid UTF-8 in an identifier is also a syntax error.
The rule still holds that a name cannot look like a number, so +123 is not a valid bident, but these are valid: a->b, *xyz*, foo-bar.
The syntax @FOO_bar introduces the name FOO_bar, whereas @{FOO}_bar means the variable named "FOO" followed by the text "_bar". There may be whitespace between the @ and the name, or opening brace. Whitespace is also allowed in the interior of the braces. It is not significant.
If a variable has no prior binding, then it specifies a match. The match is determined from some current position in the data: the character which immediately follows all that has been matched previously. If a variable occurs at the start of a line, it matches some text at the start of the line. If it occurs at the end of a line, it matches everything from the current position to the end of the line.
If a variable is one of the plain forms
@sident
@{bident}
@*sident
@*{bident}
then this is a "negative match". The extent of the matched text (the text bound to the variable) is determined by looking at what follows the variable, and ranges from the current position to some position where the following material finds a match. This is why this is called a "negative match": the spanned text which ends up bound to the variable is that in which the match for the trailing material did not occur.
A variable may be followed by a piece of text, a regular-expression directive, a function call, a directive, another variable, or nothing (i.e. occurs at the end of a line). These cases are described in detail below.
For example, in the query line @a:@/foo/bcd e, the variable a is considered to be followed by ":@/foo/bcd e".
If a variable is followed by text, then the extent of the negative match is determined by searching for the first occurrence of that text within the line, starting at the current position.
The variable matches everything between the current position and the matching position (not including the matching position). Any whitespace which follows the variable (and is not enclosed inside braces that surround the variable name) is part of the text. For example:
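a b @FOO e f

matched against the data line:

a b c d e f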
In the above example, the pattern text "a b " matches the data "a b ". So when the @FOO variable is processed, the data being matched is the remaining "c d e f". The text which follows @FOO is " e f". This is found within the data "c d e f" at position 3 (counting from 0). So positions 0–2 ("c d") constitute the matching text which is bound to FOO.
If the variable is followed by a function call, or a directive, the extent is determined by scanning the text for the first position where a match occurs for the entire remainder of the line. (For a description of functions, see Functions.)
For example:
@foo@(bind a "abc")xyz
Here, @foo will match the text from the current position to where "xyz" occurs, even though there is a @(bind) directive. Furthermore, if more material is added after the "xyz", it is part of the search. Note the difference between the following two:
@foo@/abc/@(func)
@foo@(func)@/abc/
In the first example, @foo matches the text from the current position until the match for the regular expression "abc". @(func) is not considered when processing @foo. In the second example, @foo matches the text from the current position until the position which matches the function call, followed by a match for the regular expression. The entire sequence @(func)@/abc/ is considered.
However, what if an unbound variable with no modifier is followed by another variable? The behavior depends on the nature of the other variable.
If the other variable is also unbound, and also has no modifier, this is a semantic error which will cause the query to fail. A diagnostic message will be issued, unless operating in quiet mode via -q. The reason is that there is no way to bind two consecutive variables to an extent of text; this is an ambiguous situation, since there is no matching criterion for dividing the text between two variables. (In theory, a repetition of the same variable, like @FOO@FOO, could find a solution by dividing the match extent in half, which would work only in the case when it contains an even number of characters. This behavior seems to have dubious value.)
An unbound variable may be followed by one which is bound. The bound variable is effectively replaced by the text which it denotes, and the logic proceeds accordingly.
It is possible for a variable to be bound to a regular expression. If x is an unbound variable and y is bound to a regular expression RE, then @x@y means @x@/RE/. A variable v can be bound to a regular expression using, for example, @(bind v #/RE/).
The @* syntax for longest match is available. Example:
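@FOO:@BAR@FOO

matched against the data line:

xyz:defxyz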
Here, FOO is matched with "xyz", based on the delimiting around the colon. The colon in the pattern then matches the colon in the data, so that BAR is considered for matching against "defxyz". BAR is followed by FOO, which is already bound to "xyz". Thus "xyz" is located in the "defxyz" data following "def", and so BAR is bound to "def".
If an unbound variable is followed by a variable which is bound to a list, or nested list, then each character string in the list is tried in turn to produce a match. The first match is taken.
An unbound variable may be followed by another unbound variable which specifies a regular expression or function call match. This is a special case called a "double variable match". What happens is that the text is searched using the regular expression or function. If the search fails, then neither variable is bound: it is a matching failure. If the search succeeds, then the first variable is bound to the text which is skipped by the search. The second variable is bound to the text matched by the regular expression or function. Example:
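@var1@{var2 /.*/}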
This is treated just like the case of a variable followed by a directive. No semantic error is identified, even if both variables are unbound. Here, @var2 matches everything at the current position, and so @var1 ends up bound to the empty string.
Example 1: b matches at position 0 and a binds the empty string:
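@a@{b /c*/}

matched against a data line such as:

cccc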
Example 2: *a specifies longest match (see Longest Match below), and so it takes everything:
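@*a@{b /c*/}

matched against the same data line, so that a takes "cccc" and b matches the empty string at the end of the line.

The contrast between longest and leftmost matching also arises when literal text follows the variable. Consider matching the patterns @*{FOO}cd and @{FOO}cd, in turn, against a data line such as:

b cdcdcdcd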
In the former example, the match extends to the rightmost occurrence of "cd", and so FOO receives "b cdcdcd". In the latter example, the * syntax isn't used, and so a leftmost match takes place. The extent covers only the "b ", stopping at the first "cd" occurrence.
There are syntactic variants of variable syntax which have an embedded expression enclosed with the variable in braces:
@{bident /regex/}
@{bident (fun [args ...])}
@{bident number}
@{bident bident}
These specify a variable binding that is driven by a positive match derived from a regular expression, function or character count, rather than from trailing material (which is regarded as a "negative" match, since the variable is bound to material which is skipped in order to match the trailing material).
The positive match syntax is processed without considering any following syntax, and therefore may be followed by an unbound variable.
In the @{bident /regex/} form, the match extends over all characters from the current position which match the regular expression regex. (See the Regular Expressions section below.) If the variable already has a value, the text extracted by the regular expression must exactly match the variable.
In the @{bident (fun [args ...])} form, the match extends over lines or characters which are matched by the call to the function, if the call succeeds. Thus @{x (y z w)} is just like @(y z w), except that the region of text skipped over by @(y z w) is also bound to the variable x. Except in one special case, the matching takes place horizontally within the current line, and the spanned range of text is treated as a string. The exception is that if the @{bident (fun [args ...])} appears as the only element of a line, and fun has a binding as a vertical function, then the function is invoked in the same manner as it would be by the @(fun [args ...]) syntax. Then the variable indicated by bident is bound to the list of lines matched by the function call. Pattern functions are described in the Functions section below. The function is invoked even if the variable already has a value. The text matched by the function must match the variable.
In the @{bident number} form, the match processes a field of text which consists of the specified number of characters, which must be a nonnegative number. If the data line doesn't have that many characters starting at the current position, the match fails. A match for zero characters produces an empty string. The text which is actually bound to the variable is all text within the specified field, but excluding leading and trailing whitespace. If the field contains only spaces, then an empty string is extracted. This fixed-field extraction takes place whether or not the variable already has a binding. If it already has a binding, then it must match the extracted, trimmed text.
The @{bident bident} syntax allows the number or regex modifier to come from a variable. The variable must be bound and contain a nonnegative integer or regular expression. For example, @{x y} behaves like @{x 3} if y is bound to the integer 3. It is an error if y is unbound.
Just like in the Common Lisp language, the names nil and t are special.
The nil symbol stands for the empty list object, an object which marks the end of a list, and Boolean false. It is synonymous with the syntax () which may be used interchangeably with nil in most constructs.
In TXR Lisp, nil and t cannot be used as variables. When evaluated, they evaluate to themselves.
In the TXR pattern language, nil can be used in the variable binding syntax, but does not create a binding; it has a special meaning. It allows the variable-matching syntax to be used to skip material, in ways similar to the skip directive.
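For example, the query line @a:@nil:@b matches a line containing three colon-separated fields, binding the first field to a and the third to b, while the middle field is skipped without creating a binding.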
The nil symbol is also used as a block name, both in the TXR pattern language and in TXR Lisp. A block named nil is considered to be anonymous.
Names beginning with the : (colon) character are keyword symbols. These also stand for themselves and may not be used as variables. Keywords are useful for labeling information and situations.
Regular expressions are a language for specifying sets of character strings. Through the use of pattern-matching elements, a regular expression is able to denote an infinite set of texts. TXR contains an original implementation of regular expressions, which supports the following syntax:
Any character which is not a regular-expression operator, a backslash escape, or the slash delimiter, denotes a one-position match of that character itself.
Any of the special characters, including the delimiting /, and the backslash, can be escaped with a backslash to suppress its meaning and denote the character itself.
Furthermore, all of the same escapes that are described in the section Special Characters in Text above are supported — the difference is that in regular expressions, the @ character is not required, so for example a tab is coded as \t rather than @\t. Octal and hex character escapes can be optionally terminated by a semicolon, which is useful if the following characters are octal or hex digits not intended to be part of the escape.
Only the above escapes are supported. Unlike in some other regular-expression implementations, if a backslash appears before a character which isn't a regex special character or one of the supported escape sequences, it is an error. This wasn't true of historic versions of TXR. See the COMPATIBILITY section.
Operators          Class          Associativity
(R)  []            primary
R?  R+  R*  R%...  postfix        left-to-right
R1R2               catenation     left-to-right
~R  ...%R          unary          right-to-left
R1&R2              intersection   left-to-right
R1|R2              union          left-to-right
The % operator is like a postfix operator with respect to its left operand, but like a unary operator with respect to its right operand. Thus a~b%c~d is a(~(b%(c(~d)))), demonstrating right-to-left associativity, where all of b% may be regarded as a unary operator being applied to c~d. Similarly, a?*+%b means (((a?)*)+)%b, where the trailing %b behaves like a postfix operator.
In TXR, regular expression matches do not span multiple lines. The regex language has no feature for multiline matching. However, the @(freeform) directive allows the remaining portion of the input to be treated as one string in which line terminators appear as explicit characters. Regular expressions may freely match through this sequence.
It's possible for a regular expression to match an empty string. For instance, if the next input character is z, facing the regular expression /a?/, there is a zero-character match: the regular expression's state machine can reach an acceptance state without consuming any characters. Examples:
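Suppose that each of the following three pattern lines is matched, in turn, against a data line such as "zzzz":

@A@/a?/@/.*/
@{A /a?/}@B
@*{A}@/a?/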
In the first example, variable @A is followed by a regular expression which can match an empty string. The expression faces the letter z at position 0 in the data line. A zero-character match occurs there, therefore the variable A takes on the empty string. The @/.*/ regular expression then consumes the line.
Similarly, in the second example, the /a?/ regular expression faces a z, and thus yields an empty string which is bound to A. Variable @B consumes the entire line.
The third example requests the longest match for the variable binding. Thus, a search takes place for the rightmost position where the regular expression matches. The regular expression matches anywhere, including the empty string after the last character, which is the rightmost place. Thus variable A fetches the entire line.
For additional information about the advanced regular-expression operators, see NOTES ON EXOTIC REGULAR EXPRESSIONS below.
If the @ escape character is followed by an open parenthesis or square bracket, this is taken to be the start of a TXR Lisp compound expression.
The TXR language has the unusual property that its syntactic elements, so-called directives, are Lisp compound expressions. These expressions not only enclose syntax, but expressions which begin with certain symbols de facto behave as tokens in a phrase structure grammar. For instance, the expression @(collect) begins a block which must be terminated by the expression @(end), otherwise there is a syntax error. The collect expression can contain arguments which modify the behavior of the construct, for instance @(collect :gap 0 :vars (a b)). In some ways, this situation might be compared to HTML, in which an element such as <a> must be terminated by </a> and can have attributes such as <a href="...">.
Compound expressions contain subexpressions which are other compound expressions or literal objects of various kinds. Among these are: symbols, numbers, string literals, character literals, quasiliterals and regular expressions. These are described in the following sections. Additional kinds of literal objects exist, which are discussed in the TXR LISP section of the manual.
Some examples of compound expressions are:
(banana)
(a b c (d e f))
( a (b (c d) (e ) ))
("apple" #\b #\space 3)
(a #/[a-z]*/ b)
(_ `@file.txt`)
Symbols occurring in a compound expression follow a slightly more permissive lexical syntax than the bident in the syntax @{bident} introduced earlier. The / (slash) character may be part of an identifier, or even constitute an entire identifier. In fact a symbol inside a directive is a lident. This is described in the Symbol Tokens section under TXR LISP. A symbol must not be a number; tokens that look like numbers are treated as numbers and not symbols.
Character literals are introduced by the #\ (hash-backslash) syntax, which is either followed by a character name, the letter x followed by hex digits, the letter o followed by octal digits, or a single character. Valid character names are:
nul linefeed return
alarm newline esc
backspace vtab space
tab page pnul
For instance #\esc denotes the escape character.
This convention for character literals is similar to that of the Scheme language. Note that #\linefeed and #\newline are the same character. The #\pnul character is specific to TXR and denotes the U+DC00 code in Unicode; the name stands for "pseudo-null", which is related to its special function. For more information about this, see the section "Character Handling and International Characters".
String literals are delimited by double quotes. A double quote within a string literal is encoded using \" and a backslash is encoded as \\. Backslash escapes like \n and \t are recognized, as are hexadecimal escapes like \xFF or \xabc and octal escapes like \123. Ambiguity between an escape and subsequent text can be resolved by adding a semicolon delimiter after the escape: "\xabc;d" is a string consisting of the character U+0ABC followed by "d". The semicolon delimiter disappears. To write a literal semicolon immediately after a hex or octal escape, write two semicolons, the first of which will be interpreted as a delimiter. Thus, "\x21;;" represents "!;".
Note that the source code syntax of TXR string literals is specified in UTF-8, which is decoded into an internal string representation consisting of code points. The numeric escape sequences are an abstract syntax for specifying code points, not for specifying bytes to be inserted into the UTF-8 representation, even if they lie in the 8-bit range. Bytes cannot be directly specified, other than literally. However, when a TXR string object is encoded to UTF-8, every code point lying in the range U+DC00 through U+DCFF is converted to a single byte by taking the low-order eight bits of its value. By manipulating code points in this special range, TXR programs can reproduce arbitrary byte sequences in text streams. Also note that the \u escape sequence for specifying code points found in some languages is unnecessary and absent, since the existing hexadecimal and octal escapes satisfy this requirement. More detailed information is given in the earlier section Character Handling and International Characters.
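For example, when the string "\xDC80;\xDCFF;" is written to a text stream, it produces exactly the two bytes 0x80 and 0xFF.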
If the line ends in the middle of a literal, it is an error, unless the last character is a backslash. This backslash is a special escape which does not denote a character; rather, it indicates that the string literal continues on the next line. The backslash is deleted, along with whitespace which immediately precedes it, as well as leading whitespace in the following line. The escape sequence "\ " (backslash space) can be used to encode a significant space.
Example:
"foo \
bar"
"foo \
\ bar"
"foo\ \
bar"
The first string literal denotes the string "foobar". The second and third both denote "foo bar".
A word list literal (WLL) provides a convenient way to write a list of strings when such a list can be given as whitespace-delimited words.
There are two flavors of the WLL: the regular WLL which begins with #" (hash, double quote) and the splicing list literal which begins with #*" (hash, star, double quote).
Both types are terminated by a double quote, which may be escaped as \" in order to include it as a character. All the escaping conventions used in string literals can be used in word literals.
Unlike in string literals, whitespace (tabs and spaces) is not significant in word literals: it separates words. A whitespace character may be escaped with a backslash in order to include it as a literal character.
Just like in string literals, an unescaped newline character is not allowed. A newline preceded by a backslash is permitted. Such an escaped newline, together with any leading and trailing unescaped whitespace, is removed and replaced with a single space.
Example:
#"abc def ghi" --> notates ("abc" "def" "ghi")
#"abc def \
ghi" --> notates ("abc" "def" "ghi")
#"abc\ def ghi" --> notates ("abc def" "ghi")
#"abc\ def\ \
\ ghi" --> notates ("abc def " " ghi")
A splicing word literal differs from a word literal in that it does not produce a list of string literals, but rather it produces a sequence of string literals that is merged into the surrounding syntax. Thus, the following two notations are equivalent:
(1 2 3 #*"abc def" 4 5 #"abc def")
(1 2 3 "abc" "def" 4 5 ("abc" "def"))
The regular WLL produced a single list object, but the splicing WLL expanded into multiple string literal objects.
Quasiliterals are similar to string literals, except that they may contain variable references denoted by the usual @ syntax. The quasiliteral represents a string formed by substituting the values of those variables into the literal template. If a is bound to "apple" and b to "banana", the quasiliteral `one @a and two @{b}s` represents the string "one apple and two bananas". A backquote escaped by a backslash represents itself. Unlike in directive syntax, two consecutive @ characters do not code for a literal @, but cause a syntax error. The reason for this is that compounding of the @ syntax is meaningful. Instead, there is a \@ escape for encoding a literal @ character. Quasiliterals support the full output variable syntax. Expressions within variable substitutions follow the evaluation rules of TXR Lisp. This hasn't always been the case: see the COMPATIBILITY section.
Quasiliterals can be split into multiple lines in the same way as ordinary string literals.
The quasiword list literals (QLLs) are to quasiliterals what WLLs are to ordinary literals. (See the above section Word List Literals.)
A QLL combines the convenience of the WLL with the power of quasistrings.
Just as in the case of WLLs, there are two flavors of the QLL: the regular QLL which begins with #` (hash, backquote) and the splicing QLL which begins with #*` (hash, star, backquote).
Both types are terminated by a backquote, which may be escaped as \` in order to include it as a character. All the escaping conventions used in quasiliterals can be used in QLLs.
Unlike in quasiliterals, whitespace (tabs and spaces) is not significant in QLLs: it separates words. A whitespace character may be escaped with a backslash in order to include it as a literal character.
A newline is not permitted unless escaped. An escaped newline works exactly the same way as it does in WLLs.
Note that the delimiting into words is done before the variable substitution. If the variable a contains spaces, then #`@a` nevertheless expands into a list of one item: the string derived from a.
Examples:
#`abc @a ghi` --> notates (`abc` `@a` `ghi`)
#`abc @d@e@f \
ghi` --> notates (`abc` `@d@e@f` `ghi`)
#`@a\ @b @c` --> notates (`@a @b` `@c`)
A splicing QLL differs from an ordinary QLL in that it does not produce a list of quasiliterals, but rather it produces a sequence of quasiliterals that is merged into the surrounding syntax.
TXR supports integers and floating-point numbers.
An integer constant is made up of digits 0 through 9, optionally preceded by a + or - sign.
Examples:
123
-34
+0
-0
+234483527304983792384729384723234
An integer constant can also be specified in hexadecimal using the prefix #x followed by an optional sign, followed by hexadecimal digits: 0 through 9 and the uppercase or lowercase letters A through F:
#xFF ;; 255
#x-ABC ;; -2748
Similarly, octal numbers are supported with the prefix #o followed by octal digits:
#o777 ;; 511
and binary numbers can be written with a #b prefix:
#b1110 ;; 14
Note that the #b prefix is also used for buffer literals.
A floating-point constant is marked by the inclusion of a decimal point, the scientific E notation, or both. It is an optional sign, followed by a mantissa consisting of digits, a decimal point, more digits, and then an optional E notation consisting of the letter e or E, an optional + or - sign, and then digits indicating the exponent value. In the mantissa, the digits are not optional. At least one digit must either precede the decimal point or follow it. That is to say, a decimal point by itself is not a floating-point constant.
Examples:
.123
123.
1E-3
20E40
.9E1
9.E19
-.5
+3E+3
1.E5
Examples which are not floating-point constant tokens:
. ;; dot token, not a number
123E ;; the symbol 123E
1.0E- ;; syntax error: invalid floating point constant
1.0E ;; syntax error: invalid floating point constant
1.E ;; syntax error: invalid floating point literal
.e ;; syntax error: dot token followed by symbol
In TXR there is a special "dotdot" token consisting of two consecutive periods. An integer constant followed immediately by dotdot is recognized as such; it is not treated as a floating constant followed by a dot. That is to say, 123.. does not mean 123. . (floating point 123.0 value followed by dot token). It means 123 .. (integer 123 followed by .. token).
Dialect Note: unlike in Common Lisp, 123. is not an integer, but the floating-point number 123.0.
Comments of the form @; were introduced earlier. Inside compound expressions, another convention for comments exists: Lisp comments, which are introduced by the ; (semicolon) character and span to the end of the line.
Example:
@(foo ; this is a comment
bar ; this is another comment
)
This is equivalent to @(foo bar).
When a TXR Lisp compound expression occurs in TXR preceded by a @, it is a directive.
Directives which are based on certain symbols are, additionally, involved in a phrase-structure syntax which uses Lisp expressions as if they were tokens.
For instance, the directive
@(collect)
not only denotes a compound expression with the collect symbol in its head position, but it also introduces a syntactic phrase which requires a matching @(end) directive. In other words, @(collect) is not only an expression, but serves as a kind of token in a higher-level, phrase-structure grammar.
Effectively, collect is a reserved symbol in the TXR language. A TXR program cannot use this symbol as the name of a pattern function due to its role in the syntax. The symbol has no reserved role in TXR Lisp.
Usually if this type of directive occurs alone in a line, not preceded or followed by other material, it is involved in a "vertical" (or line-oriented) syntax.
If such a directive is embedded in a line (has preceding or trailing material) then it is in a horizontal syntactic and semantic context (character-oriented).
There is an exception: the definition of a horizontal function looks like this:
@(define name (arg))body material@(end)
Yet, this is considered one vertical item, which means that it does not match a line of data. (This is necessary because all horizontal syntax matches something within a line of data, which is undesirable for definitions.)
Many directives exhibit both horizontal and vertical syntax, with different but closely related semantics. Some are vertical only, some are horizontal only.
Detailed descriptions of the available directives follow. Two details worth noting in advance: a collect is an anonymous block, and named filters are stored in the hash table held in the Lisp special variable *filters*.
Some directives contain subexpressions which are evaluated. Two distinct styles of evaluations occur in TXR: bind expressions and Lisp expressions. Which semantics applies to an expression depends on the syntactic context in which it occurs: which position in which directive.
The evaluation of TXR Lisp expressions is described in the TXR LISP section of the manual.
Bind expressions are so named because they occur in the @(bind) directive. TXR pattern function invocations also treat argument expressions as bind expressions.
The @(rebind), @(set), @(merge), and @(deffilter) directives also use bind expression evaluation. Bind expression evaluation also occurs in the argument position of the :tlist keyword in the @(next) directive.
Unlike Lisp expressions, bind expressions do not support operators. If a bind expression is a nested list structure, it is a template denoting that structure. Any symbol in any position of that structure is interpreted as a variable. When the bind expression is evaluated, those corresponding positions in the template are replaced by the values of the variables.
Anywhere where a variable can appear in a bind expression's nested list structure, a Lisp expression can appear preceded by the @ character. That Lisp expression is evaluated and its value is substituted into the bind expression's template.
Moreover, a Lisp expression preceded by @ can be used as an entire bind expression. The value of that Lisp expression is then taken as the bind expression value.
Any object in a bind expression which is not a nested list structure containing Lisp expressions or variables denotes itself literally.
In the following examples, the variables a and b are assumed to have the string values "foo" and "bar", respectively.
The -> notation indicates the value of each expression.
a -> "foo"
(a b) -> ("foo" "bar")
((a) ((b) b)) -> (("foo") (("bar") "bar"))
(list a b) -> error: unbound variable list
@(list a b) -> ("foo" "bar") ;; Lisp expression
(a @[b 1..:]) -> ("foo" "ar") ;; Lisp eval of [b 1..:]
(a @(+ 2 2)) -> ("foo" 4) ;; Lisp eval of (+ 2 2)
#(a b) -> #(a b) ;; Vector literal denotes itself; a and b are not treated as variables.
[a b] -> error: unbound variable dwim
The last example above [a b] is a notation equivalent to (dwim a b) and so follows similarly to the example involving list.
The next directive indicates that the remaining directives in the current block are to be applied against a new input source.
It can only occur by itself as the only element in a query line, and takes various arguments, according to these possibilities:
@(next)
@(next source)
@(next source :nothrow)
@(next :args)
@(next :env)
@(next :list lisp-expr)
@(next :tlist bind-expr)
@(next :string lisp-expr)
@(next :var var)
@(next nil)
The lone @(next) without arguments specifies that subsequent directives will match inside the next file in the argument list which was passed to TXR on the command line.
If source is given, it must be a TXR Lisp expression which denotes an input source. Its value may be a string or an input stream. For instance, if variable A contains the text "data", then @(next A) means switch to the file called "data", and @(next `@A.txt`) means to switch to the file "data.txt". The directive @(next (open-command `git log`)) switches to the input stream connected to the output of the git log command.
If the input source cannot be opened for whatever reason, TXR throws an exception (see Exceptions below). An unhandled exception will terminate the program. Often, such a drastic measure is inconvenient; if @(next) is invoked with the :nothrow keyword, then if the input source cannot be opened, the situation is treated as a simple match failure. The :nothrow keyword also ensures that when the stream is later closed, which occurs when the lazy list reads all of the available data, the implicit call to the close-stream function specifies nil as the argument value to that function's throw-on-error-p parameter. This :nothrow mechanism does not suppress all exceptions related to the processing of that stream; unusual conditions encountered during the reading of data from the stream may throw exceptions.
The variant @(next :args) means that the remaining command-line arguments are to be treated as a data source. For this purpose, each argument is considered to be a line of text. The argument list does include that argument which specifies the file that is currently being processed or was most recently processed. As the arguments are matched, they are consumed. This means that if a @(next) directive without arguments is executed in the scope of @(next :args), it opens the file named by the first unconsumed argument.
To process arguments, and then continue with the original file and argument list, wrap the argument processing in a @(block). When the block terminates, the input source and argument list are restored to what they were before the block.
The variant @(next :env) means that the list of process environment variables is treated as a source of data. It looks like a text file stream consisting of lines of the form "name=value". If this feature is not available on a given platform, an exception is thrown.
The syntax @(next :list lisp-expr) treats TXR Lisp expression lisp-expr as a source of text. The value of lisp-expr is flattened to a simple list in a way similar to the @(flatten) directive. The resulting list is treated as if it were the lines of a text file: each element of the list must be a string, which represents a line. If the strings happen to contain embedded newline characters, they are a visible constituent of the line, and do not act as line separators.
The syntax @(next :tlist bind-expr) is similar to @(next :list ...) except that bind-expr is not a TXR Lisp expression, but a TXR bind expression.
The syntax @(next :var var) requires var to be a previously bound variable. The value of the variable is retrieved and treated like a list, in the same manner as under @(next :list ...). Note that @(next :var x) is not always the same as @(next :tlist x), because :var x strictly requires x to be a TXR variable, whereas the x in :tlist x is an expression which can potentially refer to a Lisp variable.
The syntax @(next :string lisp-expr) treats expression lisp-expr as a source of text. The value of the expression must be a string. Newlines in the string are interpreted as line terminators.
A string which is not terminated by a newline is tolerated, so that:
@(next :string "abc")
@a
binds a to "abc". Likewise, this is also the case with input files and other streams whose last line is not terminated by a newline.
However, watch out for empty strings, which are analogous to a correctly formed empty file which contains no lines:
@(next :string "")
@a
This will not bind a to ""; it is a matching failure. The behavior of :list is different. The query
@(next :list "")
@a
binds a to "". The reason is that under :list the string "" is flattened to the list ("") which is not an empty input stream, but a stream consisting of one empty line.
The @(next nil) variant indicates that the following subquery is applied to empty data, and the list of data sources from the command line is considered empty. This directive is useful in front of TXR code which doesn't process data sources from the command line, but takes command-line arguments. The @(next nil) incantation absolutely prevents TXR from trying to open the first command-line argument as a data source.
Note that the @(next) directive only redirects the source of input over the scope of the subquery in which that directive appears. For example, the following query looks for the line starting with "xyz" at the top of the file "foo.txt", within a some directive. After the @(end) which terminates the @(some), the "abc" is matched in the previous input stream which was in effect before the @(next) directive:
@(some)
@(next "foo.txt")
xyz@suffix
@(end)
abc
However, if the @(some) subquery successfully matched "xyz@suffix" within the file foo.txt, there is now a binding for the suffix variable, which is visible to the remainder of the entire query. The variable bindings survive beyond the clause, but the data stream does not.
The skip directive considers the remainder of the query as a search pattern. The remainder is no longer required to strictly match at the current line in the current input stream. Rather, the current stream is searched, starting with the current line, for the first line where the entire remainder of the query will successfully match. If no such line is found, the skip directive fails. If a matching position is found, the remainder of the query is processed from that point.
The remainder of the query can itself contain skip directives. Each such directive performs a recursive subsearch.
Skip comes in vertical and horizontal flavors. For instance, skip and match the last line:
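@(skip)
@last_line
@(eof)

(This reconstruction uses the @(eof) directive, described later, to anchor the match at the end of the input.)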
Skip and match the last character of the line:
@(skip)@{last 1}@(eol)
The skip directive has two optional arguments, which are evaluated as TXR Lisp expressions. If the first argument evaluates to an integer, its value limits the range of lines scanned for a match. Judicious use of this feature can improve the performance of queries.
Example: scan until "size: @SIZE" matches, which must happen within the next 15 lines:
@(skip 15)
size: @SIZE
Without the range limitation, skip will keep searching until it consumes the entire input source. In a horizontal skip, the range-limiting numeric argument is expressed in characters, so that
abc@(skip 5)def
means: there must be a match for "abc" at the start of the line, and then within the next five characters, there must be a match for "def".
Sometimes a skip is nested within a collect, or following another skip. For instance, consider:
@(collect)
begin @BEG_SYMBOL
@(skip)
end @BEG_SYMBOL
@(end)
The above collect iterates over the entire input. But, potentially, so does the embedded skip. Suppose that "begin x" is matched, but the data has no matching "end x". The skip will search in vain all the way to the end of the data, and then the collect will try another iteration back at the beginning, just one line down from the original starting point. If it is a reasonable expectation that an end x occurs within 15 lines of a "begin x", this can be specified instead:
@(collect)
begin @BEG_SYMBOL
@(skip 15)
end @BEG_SYMBOL
@(end)
If the symbol nil is used in place of a number, it means to scan an unlimited range of lines; thus, @(skip nil) is equivalent to @(skip).
If the symbol :greedy is used, it changes the semantics of the skip to longest match semantics. For instance, match the last three space-separated tokens of the line:
@(skip :greedy) @a @b @c
Without :greedy, the variable @c may match multiple tokens, and end up with spaces in it, because nothing follows @c and so it matches from any position which follows a space to the end of the line. Also note the space in front of @a. Without this space, @a will get an empty string.
A line-oriented example of greedy skip: match the last line without using @(eof):
@(skip :greedy)
@last_line
There may be a second numeric argument. This specifies a minimum number of lines to skip before looking for a match. For instance, skip 15 lines and then search indefinitely for begin ...:
@(skip nil 15)
begin @BEG_SYMBOL
The two arguments may be used together. For instance, the following matches if and only if the 15th line of input starts with "begin ":
@(skip 1 15)
begin @BEG_SYMBOL
Essentially, @(skip 1 n) means "hard skip by n lines". @(skip 1 0) is the same as @(skip 1), which is a noop, because it means: "the remainder of the query must match starting on the next line", or, more briefly, "skip exactly zero lines", which is the behavior if the skip directive is omitted altogether.
Here is one trick for grabbing the fourth line from the bottom of the input:
@(skip)
@fourth_from_bottom
@(skip 1 3)
@(eof)
Or using greedy skip:
@(skip :greedy)
@fourth_from_bottom
@(skip 1 3)
Non-greedy skip with the @(eof) directive has a slight advantage because the greedy skip will keep scanning even though it has found the correct match, then backtrack to the last good match once it runs out of data. The regular skip with explicit @(eof) will stop when the @(eof) matches.
The skip directive can consume considerable CPU time when multiple skips are nested. Consider:
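@(skip)
A
@(skip)
B
@(skip)
C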
This is actually nesting: the second and third skips occur within the body of the first one, and thus this creates nested iteration. TXR is searching for the combination of skips which match the pattern of lines A, B and C with backtracking behavior. The outermost skip marches through the data until it finds A followed by a pattern match for the second skip. The second skip iterates to find B followed by the third skip, and the third skip iterates to find C. If A and B are only one line each, then this is reasonably fast. But suppose there are many lines matching A and B, giving rise to a large number of combinations of skips which match A and B, and yet do not find a match for C, triggering backtracking. The nested stepping which tries the combinations of A and B can give rise to a considerable running time.
One way to deal with the problem is to unravel the nesting with the help of blocks. For example:
@(block)
@ (skip)
A
@(end)
@(block)
@ (skip)
B
@(end)
@(skip)
C
Now the scope of each skip is just the remainder of the block in which it occurs. The first skip finds A, and then the block ends. Control passes to the next block, and backtracking will not take place to a block which completed (unless all these blocks are enclosed in some larger construct which backtracks, causing the blocks to be re-executed).
This rewrite is not equivalent, and cannot be used for instance in backreferencing situations such as:
@;
@; Find three lines anywhere in the input which are identical.
@;
@(skip)
@line
@(skip)
@line
@(skip)
@line
This example depends on the nested search-within-search semantics.
The trailer directive introduces a trailing portion of a query or subquery which matches input material normally, but in the event of a successful match, does not advance the current position. This can be used, for instance, to cause @(collect) to match partially overlapping regions.
Trailer can be used in vertical context:
@(trailer)
directives
...
or horizontal:
@(trailer) directives ...
A vertical trailer prevents the vertical input position from advancing as it is matched by directives, whereas a horizontal trailer prevents the horizontal position from advancing. In other words, trailer performs matching without consuming the input, providing a lookahead mechanism.
Example:
@(collect)
@line
@(trailer)
@(skip)
@line
@(end)
This script collects each line which has a duplicate somewhere later in the input. Without the @(trailer) directive, this does not work properly for inputs like:
111
222
111
222
Without @(trailer), the first duplicate pair constitutes a match which spans over the 222. After that pair is found, the matching continues after the second 111.
With the @(trailer) directive in place, the collect body, on each iteration, only consumes the lines matched prior to @(trailer).
The freeform directive provides a useful alternative to TXR's line-oriented matching discipline. The freeform directive treats all remaining input from the current input source as one big line. The query line which immediately follows freeform is applied to that line.
The syntax variations are:
@(freeform)
... query line ..
@(freeform
number)
... query line ..
@(freeform string)
... query line ..
@(freeform number string)
... query line ..
where number and string denote TXR Lisp expressions which evaluate to an integer or string value, respectively.
If number and string are both present, they may be given in either order.
If the number argument is given, its value limits the range of lines which are combined together. For instance @(freeform 5) means to only consider the next five lines to be one big line. Without this argument, freeform is "bottomless". It can match the entire file, which creates the risk of allocating a large amount of memory.
If the string argument is given, it specifies a custom line terminator. The default terminator is "\n". The terminator does not have to be one character long.
Freeform does not convert the entire remainder of the input into one big line all at once, but does so in a dynamic, lazy fashion, which takes place as the data is accessed. So at any time, only some prefix of the data exists as a flat line in which newlines are replaced by the terminator string, and the remainder of the data still remains as a list of lines.
After the subquery is applied to the virtual line, the unmatched remainder of that line is broken up into multiple lines again, by looking for and removing all occurrences of the terminator string within the flattened portion.
Care must be taken if the terminator is other than the default "\n". All occurrences of the terminator string are treated as line terminators in the flattened portion of the data, so extra line breaks may be introduced. Likewise, in the yet unflattened portion, no breaking takes place, even if the text contains occurrences of the terminator string. The extent of data which is flattened, and the amount of it which remains, depends entirely on the query line underneath @(freeform).
In the following example, lines of data are flattened using $ as the line terminator.
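The query (a reconstruction sketch):

@(freeform "$")
@a$@b:
@c
@d

is applied to this data:

1
2:3
4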
The data is turned into the virtual line 1$2:3$4$. The @a$@b: subquery matches the 1$2: portion, binding a to "1", and b to "2". The remaining portion 3$4$ is then split into separate lines again according to the line terminator $:
3
4
Thus the remainder of the query
@c
@d
faces these lines, binding c to 3 and d to 4. Note that since the data does not contain dollar signs, there is no ambiguity; the meaning may be understood in terms of the entire data being flattened and split again.
In the following example, freeform is used to solve a tokenizing problem. The Unix password file has fields separated by colons. Some fields may be empty. Using freeform, we can join the password file using ":" as a terminator. By restricting freeform to one line, we can obtain each line of the password file with a terminating ":", allowing for a simple tokenization, because now the fields are colon-terminated rather than colon-separated.
Example:
@(next "/etc/passwd")
@(collect)
@(freeform 1 ":")
@(coll)@{token /[^:]*/}:@(end)
@(end)
The fuzz directive allows for an imperfect match spanning a set number of lines. It takes two arguments, both of which are TXR Lisp expressions that should evaluate to integers:
@(fuzz m n)
...
This expresses that over the next n query lines, the matching strictness is relaxed a little bit. Only m out of those n lines have to match. Afterward, the rest of the query follows normal, strict processing.
In the degenerate situation where there are fewer than n query lines following the fuzz directive, then m of them must succeed anyway. (If there are fewer than m, then this is impossible.)
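For example (an illustrative sketch), the following requires at least two of the three query lines which follow the directive to match:

@(fuzz 2 3)
alpha: @a
beta: @b
gamma: @c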
The line and chr directives perform binding between the current input line number or character position within a line, against an expression or variable:
@(line 42)
@(line x)
abc@(chr 3)def@(chr y)
The directive @(line 42) means "match the current input line number against the integer 42". If the current line is 42, then the directive matches, otherwise it fails. line is a vertical directive which doesn't consume a line of input. Thus, the following matches at the beginning of an input stream, and x ends up bound to the first line of input:
@(line 1)
@(line 1)
@(line 1)
@x
The directive @(line x) binds variable x to the current input line number, if x is an unbound variable. If x is already bound, then the value of x must match the current line number, otherwise the directive fails.
The chr directive is similar to line except that it's a horizontal directive, and matches the character position rather than the line position. Character positions are measured from zero, rather than one. chr does not consume a character. Hence the two occurrences of chr in the following example both match, and x takes the entire line of input:
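@(chr 0)@(chr 0)@x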
The argument of line or chr may be an @-delimited Lisp expression. This is useful for matching computed lines or character positions:
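@(line @(+ x 1))

Here x is presumed to hold an integer line number established by an earlier match.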
The name directive performs a binding between the name of the current data source and a variable or bind expression:
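@(name na)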
If na is an unbound variable, it is bound and takes on the name of the data source, such as a file name. If na is bound, then it has to match the name of the data source, otherwise the directive fails.
The directive @(name "data.txt") fails unless the current data source has that name.
The data directive performs a binding between the unmatched data at the current position, and a variable or bind expression. The unmatched data takes the form of a list of strings:
@(data d)
The binding is performed using object equality. If d is already bound, a matching failure occurs unless the value of d is the same object as the current unmatched data.
Matching the current data has various uses.
For instance, two branches of pattern matching can, at some point, bind the current data into different variables. When those paths join, the variables can be bound together to create the assertion that the current data had been the same at those points:
@(all)
@ (skip)
foo
@ (skip)
bar
@ (data x)
@(or)
@ (skip)
xyzzy
@ (skip)
bar
@ (data y)
@(end)
@(require (eq x y))
Here, two branches of the @(all) match some material which ends in the line bar. However, it is possible that this is a different line. The data directives are used to create an assertion that the data regions matched by the two branches are identical. That is to say, the unmatched data x captured after the first bar and the unmatched data y captured after the second bar must be the same object in order for @(require (eq x y)) to succeed, which implies that the same bar was matched in both branches of the @(all).
Another use of data is simply to gain access to the trailing remainder of the unmatched input in order to print it, or do some special processing on it.
The tprint Lisp function is useful for printing the unmatched data as newline-terminated lines:
@(data remainder)
@(do (tprint remainder))
The eof directive, if not given any argument, matches successfully when no more input is available from the current input source.
In the following example, the line variable captures the text "One-line file" and then since that is the last line of input, the eof directive matches:
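@line
@(eof)

The data:

One-line file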
If the data consisted of two or more lines, eof would fail.
The eof directive may be given a single argument, which is a pattern that matches the termination status of the input source. This is useful when the input source is a process pipe. For the purposes of eof, sources which are not process pipes have the symbol t as their termination status.
In the following example, which assumes the availability of a POSIX shell command interpreter in the host system, the variable a captures the string "a" and the status variable captures the integer value 5, which is the termination status of the command:
@(next (open-command "echo a; exit 5"))
@a
@(eof status)
These directives, called the parallel directives, combine multiple subqueries, which are applied at the same input position, rather than to consecutive input.
They come in vertical (line mode) and horizontal (character mode) flavors.
In horizontal mode, the current position is understood to be a character position in the line being processed. The clauses advance this character position by moving it to the right. In vertical mode, the current position is understood to be a line of text within the stream. A clause advances the position by some whole number of lines.
The syntax of these parallel directives follows this example:
@(some)
subquery1
.
.
.
@(and)
subquery2
.
.
.
@(and)
subquery3
.
.
.
@(end)
And in horizontal mode:
@(some)subquery1...@(and)subquery2...@(and)subquery3...@(end)
Long horizontal lines can be broken up with line continuations, allowing the above example to be written like this, which is considered a single logical line:
@(some)@\
subquery1...@\
@(and)@\
subquery2...@\
@(and)@\
subquery3...@\
@(end)
The @(some), @(all), @(none), @(maybe), @(cases) or @(choose) must be followed by at least one subquery clause, and be terminated by @(end). If there are two or more subqueries, these additional clauses are indicated by @(and) or @(or), which are interchangeable. The separator and terminator directives also must appear as the only element in a query line.
The choose directive requires keyword arguments. See below.
The syntax supports arbitrary nesting. For example:
QUERY:               SYNTAX TREE:

@(all)               all -+
@ (skip)                  +- skip -+
@ (some)                  |        +- some -+
it                        |        |        +- TEXT
@ (and)                   |        |        +- and
@ (none)                  |        |        +- none -+
was                       |        |        |        +- TEXT
@ (end)                   |        |        |        +- end
@ (end)                   |        |        +- end
a dark                    |        +- TEXT
@(end)                    *- end
Nesting can be indicated using whitespace between @ and the directive expression. Thus, the above is an @(all) query containing a @(skip) clause which applies to a @(some) that is followed by the text line "a dark". The @(some) clause combines the text line "it", and a @(none) clause which contains just one clause consisting of the line "was".
The semantics of the parallel directives is:
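@(all) requires each of its clauses to match; if any clause fails, the directive fails. Bindings established by earlier clauses are visible to later clauses.

@(some) requires at least one clause to match; every clause is tried, and the bindings from all successful clauses are retained. It accepts the :resolve keyword, described below.

@(none) succeeds only if none of its clauses match; it produces no bindings.

@(maybe) always succeeds; clauses which match contribute their bindings.

@(cases) tries the clauses in order and commits to the first one which matches; if no clause matches, the directive fails.

@(choose) matches all of its clauses, and then keeps the bindings of the one clause which binds the shortest or longest text to a designated variable, as selected by its :shortest or :longest keyword argument.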
The :resolve parameter is for situations when the @(some) directive has multiple clauses that need to bind some common variables to different values: for instance, output parameters in functions. Resolve takes a list of variable name symbols as an argument. This is called the resolve set. If the clauses of @(some) bind variables in the resolve set, those bindings are not visible to later clauses. However, those bindings do emerge out of the @(some) directive as a whole. This creates a conflict: what if two or more clauses introduce different bindings for a variable in the resolve set? This is why it is called the resolve set: conflicts for variables in the resolve set are automatically resolved in favor of later directives.
Example:
@(some :resolve (x))
@ (bind a "a")
@ (bind x "x1")
@(or)
@ (bind b "b")
@ (bind x "x2")
@(end)
Here, the two clauses both introduce a binding for x. Without the :resolve parameter, this would mean that the second clause fails, because x comes in with the value "x1", which does not bind with "x2". But because x is placed into the resolve set, the second clause does not see the "x1" binding. Both clauses establish their bindings independently creating a conflict over x. The conflict is resolved in favor of the second clause, and so the bindings which emerge from the directive are:
a="a"
b="b"
x="x2"
For all of the parallel directives other than @(none) and @(choose), the query advances the input position by the greatest number of lines that match in any of the successfully matching subclauses that are evaluated. The @(none) directive does not advance the input position.
For instance if there are two subclauses, and one of them matches three lines, but the other one matches five lines, then the overall clause is considered to have made a five line match at its position. If more directives follow, they begin matching five lines down from that position.
The syntax of @(require) is:
@(require lisp-expression)
The require directive evaluates a TXR Lisp expression. (See TXR LISP far below.) If the expression yields a true value, then it succeeds, and matching continues with the directives which follow. Otherwise the directive fails.
In the context of the require directive, the expression should not be introduced by the @ symbol; it is expected to be a Lisp expression.
Example:
@; require that 4 is greater than 3
@; This succeeds; therefore, @a is processed
@(require (> (+ 2 2) 3))
@a
The if directive allows for conditional selection of pattern-matching clauses, based on the Boolean results of Lisp expressions.
A variant of the if directive is also available for use inside an output clause, where it similarly allows for the conditional selection of output clauses.
The syntax of the if directive can be exemplified as follows:
@(if lisp-expr)
.
.
.
@(elif lisp-expr)
.
.
.
@(elif lisp-expr)
.
.
.
@(else)
.
.
.
@(end)
The @(elif) and @(else) clauses are all optional. If @(else) is present, it must be last, before @(end), after any @(elif) clauses. Any of the clauses may be empty.
@(if (> (length str) 42))
foo: @a @b
@(else)
{@c}
@(end)
In this example, if the length of the variable str is greater than 42, then matching continues with "foo: @a @b", otherwise it proceeds with {@c}.
More precisely, how the if directive works is as follows. The Lisp expressions are evaluated in order, starting with the if expression, then the elif expressions if any are present. If any Lisp expression yields a true result (any value other than nil) then evaluation of Lisp expressions stops. The corresponding clause of that Lisp expression is selected and pattern matching continues with that clause. The result of that clause (its success or failure, and any newly bound variables) is then taken as the result of the if directive. If none of the Lisp expressions yield true, and an else clause is present, then that clause is processed and its result determines the result of the if directive. If none of the Lisp expressions yield true, and there is no else clause, then the if directive is deemed to have trivially succeeded, allowing matching to continue with whatever directive follows it.
The @(output) directive supports the embedding of Lisp expressions, whose values are interpolated into the output. In particular, Lisp if expressions are useful. For instance @(if expr "A" "B") reproduces A if expr yields a true value, otherwise B. Yet the @(if) directive is also supported in @(output). How the apparent conflict between the two is resolved is that the two take different numbers of arguments. An @(if) which has no arguments at all is a syntax error. One that has one argument is the head of the if directive syntax which must be terminated by @(end) and which takes the optional @(elif) and @(else) clauses. An @(if) which has two or more arguments is parsed as a self-contained Lisp expression.
Sometimes text is structured as items that can appear in an arbitrary order. When multiple matches need to be extracted, there is a combinatorial explosion of possible orders, making it impractical to write pattern matches for all the possible orders.
The gather directive is for these situations. It specifies multiple clauses which all have to match somewhere in the data, but in any order.
For further convenience, the lines of the first clause of the gather directive are implicitly treated as separate clauses.
The syntax follows this pattern:
@(gather)
one-line-query1
one-line-query2
.
.
.
one-line-queryN
@(and)
multi
line
query1
.
.
.
@(and)
multi
line
query2
.
.
.
@(end)
The multiline clauses are optional. The gather directive takes keyword parameters, see below.
Similarly to collect, gather has an optional until/last clause:
@(gather)
...
@(until)
...
@(end)
How gather works is that the text is searched for matches for the single-line and multiline queries. The clauses are applied in the order in which they appear. Whenever one of the clauses matches, any bindings it produces are retained and it is removed from further consideration. Multiple clauses can match at the same text position. The position advances by the longest match from among the clauses which matched. If no clauses match, the position advances by one line. The search stops when all clauses are eliminated, and then the cumulative bindings are produced. If the data runs out, but unmatched clauses remain, the directive fails.
Example: extract several environment variables, which do not appear in a particular order:
@(next :env)
@(gather)
USER=@USER
HOME=@HOME
SHELL=@SHELL
@(end)
If the until or last clause is present and a match occurs, then the matches from the other clauses are discarded and the gather terminates. The difference between until and last is that any bindings established in the last clause are retained, and the input position is advanced past the matching material. The until/last clause has visibility to bindings established in the previous clauses in that same iteration, even though those bindings end up thrown away.
For consistency, the :mandatory keyword is supported in the until/last clause of gather. The semantics of using :mandatory in this situation is tricky. In particular, if it is in effect, and the gather terminates successfully by collecting all required matches, it will trigger a failure. On the other hand, if the until or last clause activates before all required matches are gathered, a failure also occurs, whether or not the clause is :mandatory.
Meaningful use of :mandatory requires that the gather be open-ended; it must allow some (or all) variables not to be required. The presence of the option means that for gather to succeed, all required variables must be gathered first, but then termination must be achieved via the until/last clause before all gather clauses are satisfied.
The gather directive also accepts the :vars keyword parameter, whose argument is a list of variable specs, as in the following example:
@(gather :vars (a b c (d "foo")))
...
@(end)
Here, a, b and c are required variables, and d is optional, with the default value given by the Lisp expression "foo".
The presence of :vars changes the behavior in three ways.
Firstly, even if all the clauses in the gather match successfully and are eliminated, the directive will fail if the required variables do not have bindings. It doesn't matter whether the bindings are existing, or whether they are established by gather.
Secondly, if some of the clauses of gather did not match, but all of the required variables have bindings, then the directive succeeds. Without the presence of :vars, it would fail in this situation.
Thirdly, if gather succeeds (all required variables have bindings), then all of the optional variables which do not have bindings are given bindings to their default values.
The expressions which give the default values are evaluated whenever the gather directive is evaluated, whether or not their values are used.
The syntax of the collect directive is:
@(collect)
... lines of subquery
@(end)
or with an until or last clause:
@(collect)
... lines of subquery: main clause
@(until)
... lines of subquery: until clause
@(end)
@(collect)
... lines of subquery: main clause
@(last)
... lines of subquery: last clause
@(end)
The repeat symbol may be specified instead of collect, which changes the meaning, see below:
@(repeat)
... lines of subquery
@(end)
The subquery is matched repeatedly, starting at the current line. If it fails to match, it is tried starting at the subsequent line. If it matches successfully, it is tried at the line following the entire extent of matched data, if there is one. Thus, the collected regions do not overlap. (Overlapping behavior can be obtained: see the @(trailer) directive.)
Unless certain keywords are specified, or unless the collection is explicitly failed with @(fail), it always succeeds, even if it collects nothing, and even if the until/last clause never finds a match.
If no until/last clause is specified, and the collect is not limited using parameters, the collection is unbounded: it consumes the entire data file.
If an until/last clause is specified, the collection stops when that clause matches at the current position.
If an until clause terminates collect, no bindings are collected at that position, even if the main clause matches at that position also. Moreover, the position is not advanced. The remainder of the query begins matching at that position.
If a last clause terminates collect, the behavior is different. Any bindings captured by the main clause are thrown away, just like with the until clause. However, the bindings in the last clause itself survive, and the position is advanced to skip over that material.
Example:
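Consider the following query (an illustrative sketch):

@(collect)
@a
@(until)
42
@b
@(end)
@c

applied to this data:

1
2
3
42
5
6

The bindings produced are:

a[0]="1"
a[1]="2"
a[2]="3"
c="42"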
The line 42 is not collected, even though it matches @a. Furthermore, the @(until) does not advance the position, so variable c takes 42.
If the @(until) is changed to @(last) the output will be different:
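a[0]="1"
a[1]="2"
a[2]="3"
b="5"
c="6"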
The 42 is not collected into a list, just like before. But now the binding captured by @b emerges. Furthermore, the position advances, so variable c now takes 6.
The binding variables within the clause of a collect are treated specially. The multiple matches for each variable are collected into lists, which then appear as array variables in the final output.
Example:
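A sketch (the names and values are illustrative):

@(collect)
@a:@b:@c
@(end)

Data:

John:Doe:101
Mary:Jane:202
Bob:Coder:313

Output:

a[0]="John"
a[1]="Mary"
a[2]="Bob"
b[0]="Doe"
b[1]="Jane"
b[2]="Coder"
c[0]="101"
c[1]="202"
c[2]="313"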
The query matches the data in three places, so each variable becomes a list of three elements, reported as an array.
Variables with list bindings may be referenced in a query. They denote a multiple match. The -D command-line option can establish a one-dimensional list binding.
The clauses of collect may be nested. Variable matches collated into lists in an inner collect are again collated into nested lists in the outer collect. Thus an unbound variable wrapped in N nestings of @(collect) will be an N-dimensional list. A one-dimensional list is a list of strings; a two-dimensional list is a list of lists of strings, etc.
It is important to note that the variables which are bound within the main clause of a collect, that is, the variables which are subject to collection, appear, within the collect, as normal one-value bindings. The collation into lists happens outside of the collect. So for instance in the query:
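@(collect)
@x=@x
@(end)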
The left @x establishes a binding for some material preceding an equal sign. The right @x refers to that binding. The value of @x is different in each iteration, and these values are collected. What finally comes out of the collect clause is a single variable called x which holds a list containing each value that was ever instantiated under that name within the collect clause.
Also note that the until clause has visibility over the bindings established in the main clause. This is true even in the terminating case when the until clause matches, and the bindings of the main clause are discarded.
Within the @(collect) syntax, it is possible to specify keyword parameters for additional control of the behavior. A keyword parameter consists of a keyword symbol followed by an argument, enclosed within the @(collect) syntax. The following are the supported keywords.
@(collect :maxgap 5)
specifies that the gap between the current position and the first match for the body of the collect, or between consecutive matches can be no longer than five lines. A :maxgap value of 0 means that the collected regions must be adjacent and must match right from the starting position. For instance:
@(collect :maxgap 0)
M @a
@(end)
means: from here, collect consecutive lines of the form "M ...". This will not search for the first such line, nor will it skip lines which do not match this form.
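The :mingap keyword similarly specifies a minimum gap between consecutive matches, and :gap specifies both the minimum and maximum at once. Thus, for instance (a sketch),

@(collect :gap 1)
@a
@(end)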
means: collect every other line starting with the current line.
@(collect :lines 2)
foo: @a
bar: @b
baz: @c
@(end)
The above collect will look for a match only twice: at the current position, and one line down.
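The :counter keyword takes the name of a variable as its argument (a sketch of the presumed syntax):

@(collect :counter i)
@a
@(end)

Within each iteration of the body, the counter variable is bound to an integer repetition count which begins at zero.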
If there is an existing binding for variable prior to the processing of the collect, then the variable is shadowed.
The binding is collected in the same way as other bindings that are established in the collect body.
The repetition count only increments after a successful match.
The variable is visible to the collect's until/last clause. If that clause is being processed after a successful match of the body, then variable holds an integer value. If the body fails to match, then the until/last clause sees a binding for variable with a value of nil.
The :vars keyword allows the query writer to add discipline to the collect body.
The argument to :vars is a list of variable specs. A variable spec is either a symbol, denoting a required variable, or a (symbol default-value) pair, where default-value is a Lisp expression whose value specifies a default value for the variable, which is optional.
When a :vars list is specified, it means that only the given variables can emerge from the successful collect. Any newly introduced bindings for other variables do not propagate. More precisely, whenever the collect body matches successfully, the following three rules apply:
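1. Bindings established in the body for variables which are not in the :vars list are discarded; they are not collected.

2. A required variable which is not bound by the body triggers an error exception.

3. An optional variable which is not bound by the body receives a binding to its default value, which is then collected exactly as if the body had matched that value.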
In the event that collect does not match anything, the variables specified in :vars, whether required or optional, are all bound to empty lists. These bindings are established after the processing of the until/last clause, if present.
Example:
@(collect :vars (a b (c "foo")))
@a @c
@(end)
Here, if the body "@a @c" matches, an error will be thrown because one of the mandatory variables is b, and the body neglects to produce a binding for b.
Example:
@(collect :vars (a (c "foo")))
@a @b
@(end)
Here, if "@a @b" matches, only a will be collected, but not b, because b is not in the variable list. Furthermore, because there is no binding for c in the body, a binding is created with the value "foo", exactly as if c matched such a piece of text.
In the following example, the assumption is that THIS NEVER MATCHES is not found anywhere in the input but the line THIS DOES MATCH is found and has a successor which is bound to a. Because the body did not match, the :vars a and b should be bound to empty lists. But a is bound by the last clause to some text, so this takes precedence. Only b is bound to an empty list.
@(collect :vars (a b))
THIS NEVER MATCHES
@(last)
THIS DOES MATCH
@a
@(end)
The following means: do not allow any variables to propagate out of any iteration of the collect and therefore collect nothing:
@(collect :vars nil)
...
@(end)
Instead of writing @(collect :vars nil), it is possible to write @(repeat). @(repeat) takes all collect keywords, except for :vars. There is a @(repeat) directive used in @(output) clauses; that is a different directive.
The until/last clause supports the option keyword :mandatory, exemplified by the following:
@(collect)
...
@(last :mandatory)
...
@(end)
This means that the collect must be terminated by a match for the until/last clause, or else by an explicit @(accept).
Specifically, the collect cannot terminate due to simply running out of data, or exceeding a limit on the number of matches that may be collected. In those situations, if an until or last clause is present with :mandatory, the collect is deemed to have failed.
The coll directive is the horizontal version of collect. Whereas collect works with multiline clauses on line-oriented material, coll works within a single line. With coll, it is possible to recognize repeating regularities within a line and collect lists.
Regular-expression-based Positive Match variables work well with coll.
Example: collect a comma-separated list, terminated by a space.
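One way to write this (a sketch; the until clause here consists of a single space character):

@(coll)@{A /[^, ]+/}@(until) @(end)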
Here, the variable A is bound to tokens which match the regular expression /[^, ]+/: nonempty sequence of characters other than commas or spaces.
Like collect, coll searches for matches. If no match occurs at the current character position, it tries at the next character position. Whenever a match occurs, it continues at the character position which follows the last character of the match, if such a position exists.
If not bounded by an until clause, it will exhaust the entire line. If the until clause matches, then the collection stops at that position, and any bindings from that iteration are discarded. Like collect, coll also supports an until/last clause, which propagates variable bindings and advances the position. The :mandatory keyword is supported.
coll clauses nest, and variables bound within a coll are available to clauses within the rest of the coll clause, including the until/last clause, and appear as single values. The final list aggregation is only visible after the coll clause.
The behavior of coll leads to difficulties when a delimited variable is used to match material which is delimiter-separated rather than delimiter-terminated. For instance, entries in a comma-separated file usually do not appear as "a,b,c," but rather as "a,b,c".
So for instance, the following result is not satisfactory:
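For example (a sketch), applying

@(coll)@a @(end)

to the line "1 2 3 4 5" yields a bound to ("1" "2" "3" "4").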
The 5 is missing because it isn't followed by a space, which the text-delimited variable match "@a " looks for. After matching "4 ", coll continues to look for matches, and doesn't find any. It is tempting to try to fix it like this:
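@(coll)@a@/ ?/@(end)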
The problem now is that the regular expression / ?/ (match either a space or nothing) matches at any position. So when it is used as a variable delimiter, it matches at the current position, which binds the empty string to the variable, the extent of the match being zero. In this situation, the coll directive proceeds character by character. The solution is to use positive matching: specify the regular expression which matches the item, rather than trying to match whatever follows. The coll directive will recognize all items which match the regular expression:
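@(coll)@{a /[^ ]+/}@(end)

Applied to "1 2 3 4 5", this collects all five items.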
The until clause can specify a pattern which, when recognized, terminates the collection. So for instance, suppose that the list of items may or may not be terminated by a semicolon. We must exclude the semicolon from being a valid character inside an item, and add an until clause which recognizes a semicolon:
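@(coll)@{a /[^ ;]+/}@(until);@(end);

Applied to the line "1 2 3 4 5;" this sketch collects all five items.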
Whether followed by the semicolon or not, the items are collected properly.
Note that the @(end) is followed by a semicolon. That's because when the @(until) clause meets a match, the matching material is not consumed.
This repetition can be avoided by using @(last) instead of @(until) since @(last) consumes the terminating material.
Instead of the above regular-expression-based approach, this extraction problem can also be solved with cases:
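@(coll)@(cases)@a @(or)@a@(eol)@(end)@(end)

In this sketch, each item is either followed by a space, or else is the last token on the line.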
The @(coll) directive takes the :vars keyword.
The shorthand @(rep) may be used instead of @(coll :vars nil). @(rep) takes all keywords, except :vars.
The flatten directive can be used to convert variables to one-dimensional lists. Variables which have a scalar value are converted to lists containing that value. Variables which are multidimensional lists are flattened to one-dimensional lists.
For example, suppose that the variable x emerges from a nested collect bound to the two-dimensional list (("a") ("b" "c")). Then the directive
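@(flatten x)

rebinds x to the one-dimensional list ("a" "b" "c"). Similarly, if y holds the scalar value "a", then @(flatten y) rebinds y to the one-element list ("a").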
The syntax of merge follows the pattern:
@(merge destination [sources ...])
destination is a variable, which receives a new binding. sources are bind expressions.
The merge directive provides a way of combining collected data from multiple nested lists in a way which normalizes different nesting levels among the sources. This directive is useful for combining the results from collects at different levels of nesting into a single nested list such that parallel elements are at equal depth.
A new binding is created for the destination variable, which holds the result of the operation.
The merge directive performs its special function if invoked with at least three arguments: a destination and two sources.
The one-argument case @(merge x) binds a new variable x and initializes it with the empty list and is thus equivalent to @(bind x). Likewise, the two-argument case @(merge x y) is equivalent to @(bind x y), establishing a binding for x which is initialized with the value of y.
To understand what merge does when two sources are given, as in @(merge C A B), we first have to define a property called depth. The depth of an atom such as a string is defined as 1. The depth of an empty list is 0. The depth of a nonempty list is one plus the depth of its deepest element. So for instance "foo" has depth 1, ("foo") has depth 2, and ("foo" ("bar")) has depth three.
We can now define a binary (two-argument) merge(A, B) function as follows. First, merge(A, B) normalizes the values A and B to produce a pair of values which have equal depth, as defined above. If either value is an atom, it is first converted to a one-element list containing that atom. After this step, both values are lists; and the only way an argument has depth zero is if it is an empty list. Next, if either value has a smaller depth than the other, it is wrapped in a list as many times as needed to give it equal depth. For instance if A is ("a") and B is (((("b" "c") ("d" "e")))) then A is converted to (((("a")))). Finally, the list values are appended together to produce the merged result. In the case of the preceding two example values, the result is: (((("a"))) ((("b" "c") ("d" "e")))). The result is stored into the newly bound destination variable C.
If more than two source arguments are given, these are merged by a left-associative reduction, which is to say that a three argument merge(X, Y, Z) is defined as merge(merge(X, Y), Z). The leftmost two values are merged, and then this result is merged with the third value, and so on.
The cat directive converts a list variable into a single piece of text. The syntax is:
@(cat var [sep])
The sep argument is a Lisp expression whose value specifies a separating piece of text. If it is omitted, then a single space is used as the separator.
Example:
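@(bind a ("1" "2" "3"))
@(cat a "-")

After the cat directive, a holds the single string "1-2-3". (The values here are a minimal illustration derived from the description above.)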
The syntax of the bind directive is:
@(bind pattern bind-expression {keyword value}*)
The bind directive is a kind of pattern match, which matches one or more variables given in pattern against a value produced by the bind-expression on the right.
Variable names occurring in the pattern expression may refer to bound or unbound variables.
All variable references occurring in bind-expression must have a value.
Binding occurs as follows. The tree structure of pattern and the value of bind-expression are considered to be parallel structures.
Any variables in pattern which are unbound receive a new binding, which is initialized with the structurally corresponding piece of the object produced by bind-expression.
Any variables in pattern which are already bound must match the corresponding part of the value of bind-expression, or else the bind directive fails. Variables which are already bound are not altered, retaining their current values even if the matching is inexact.
The simplest bind is of one variable against itself, for instance binding A against A:
@(bind A A)
This will throw an exception if A is not bound. If A is bound, it succeeds, since A matches itself.
The next simplest bind binds one variable to another:
@(bind A B)
Here, if A is unbound, it takes on the same value as B. If A is bound, it has to match B, or the bind fails. Matching means that either A and B are the same text, or one of them is text and the other is a one-element list containing matching text, or else both are lists whose corresponding elements match.
The right-hand side does not have to be a variable. It may be some other object, like a string, quasiliteral, regexp, or list of strings, etc. For instance,
@(bind A "ab\tc")
will bind the string "ab\tc" to the variable A if A is unbound. If A is bound, this will fail unless A already contains an identical string. However, the right-hand side of a bind cannot be an unbound variable, nor a complex expression that contains unbound variables.
The left-hand side of bind can be a nested list pattern containing variables. The last item of a list at any nesting level can be preceded by a . (dot), which means that the variable matches the rest of the list from that position.
Suppose that the list A contains ("how" "now" "brown" "cow"). Then the directive @(bind (H N . C) A), assuming that H, N and C are unbound variables, will bind H to "how", N to "now", and C to the remainder of the list ("brown" "cow").
Example: suppose that the list A is nested to two dimensions and contains (("how" "now") ("brown" "cow")). Then @(bind ((H N) (B C)) A) binds H to "how", N to "now", B to "brown" and C to "cow".
The dot notation may be used at any nesting level. It must be followed by an item. The forms (.) and (X .) are invalid, but (. X) is valid and equivalent to X.
The number of items in a left pattern match must match the number of items in the corresponding right side object. So the pattern () only matches an empty list. The notations () and nil mean exactly the same thing.
The symbols nil, t and keyword symbols may be used on either side. They represent themselves. For example @(bind :foo :bar) fails, but @(bind :foo :foo) succeeds since the two sides denote the same keyword symbol object.
In this example, suppose A contains "foo" and B contains "bar". Then @(bind (X (Y Z)) (A (B "hey"))) binds X to "foo", Y to "bar" and Z to "hey". This is because the bind-expression produces the object ("foo" ("bar" "hey")) which is then structurally matched against the pattern (X (Y Z)), and the variables receive the corresponding pieces.
@(bind "a" "A" :lfilt :upcase)
produces a match, since the left side is the same as the right after filtering through the :upcase filter.
For example, the following produces a match:
@(bind "A" "a" :rfilt :upcase)
For a description of filters, see Output Filtering below.
Compound filters like (:fromhtml :upcase) are supported with all these keywords. The filters apply across arbitrary patterns and nested data.
Example:
@(bind (a b c) ("A" "B" "C"))
@(bind (a b c) (("z" "a") "b" "c") :rfilt :upcase)
Here, the first bind establishes the values for a, b and c, and the second bind succeeds, because the value of a matches the second element of the list ("z" "a") if it is upcased, and likewise b matches "b" and c matches "c" if these are upcased.
TXR Lisp forms, introduced by @, may be used in the bind-expression argument of bind, or as the entire form. This is consistent with the rules for bind expressions.
TXR Lisp forms can be used in the pattern expression also.
Example:
@(bind a @(+ 2 2))
@(bind @(+ 2 2) @(* 2 2))
Here, a is bound to the integer 4. The second bind then succeeds because the forms (+ 2 2) and (* 2 2) produce equal values.
The syntax of the set directive is:
@(set pattern bind-expression)
The set directive syntactically resembles bind, but is not a pattern match. It overwrites the previous values of variables with new values from the right-hand side. Each variable that is assigned must have an existing binding: set will not induce binding.
Examples follow.
Store the value of A back into A, an operation with no effect:
@(set A A)
Exchange the values of A and B:
@(set (A B) (B A))
Store a string into A:
@(set A "text")
Store a list into A:
@(set A ("line1" "line2"))
Destructuring assignment. A ends up with "A", B ends up with ("B1" "B2") and C ends up with ("C1" "C2").
@(bind D ("A" ("B1" "B2") "C1" "C2"))
@(bind (A B C) (() () ()))
@(set (A B . C) D)
Note that set does not support a TXR Lisp expression on the left side, so the following are invalid syntax:
@(set @(+ 1 1) @(* 2 2))
@(set @b @(list "a"))
The second one is erroneous even though there is a variable on the left. Because it is preceded by the @ escape, it is a Lisp variable, and not a pattern variable.
The set directive also doesn't support Lisp expressions in the pattern, which must consist only of variables.
The syntax of the rebind directive is:
@(rebind pattern bind-expression)
The rebind directive resembles bind. It combines the semantics of local and bind into a single directive. The bind-expression is evaluated in the current environment, and its value remembered. Then a new environment is produced in which all the variables specified in pattern are absent. Then, the pattern is newly bound in that environment against the previously produced value, as if using bind.
The old environment with the previous variables is not modified; it continues to exist. This is in contrast with the set directive, which mutates existing bindings.
rebind makes it easy to create temporary bindings based on existing bindings.
@(define pattern-function (arg))
@;; inside a pattern function:
@(rebind recursion-level @(+ recursion-level 1))
@;; ...
@(end)
When the function terminates, the previous value of recursion-level is restored. The effect is less verbose and more efficient than the following equivalent:
@(define pattern-function (arg))
@;; inside a pattern function:
@(local temp)
@(set temp recursion-level)
@(local recursion-level)
@(set recursion-level @(+ temp 1))
@;; ...
@(end)
Like bind, rebind supports nested patterns, such as
@(rebind (a (b c)) (1 (2 3)))
but it does not support any keyword arguments. The filtering features of bind do not make sense in rebind because the variables are always reintroduced into an environment in which they don't exist, whereas filtering applies in situations when bound variables are matched against values.
The rebind directive also doesn't support Lisp expressions in the pattern, which must consist only of variables.
The forget directive has two spellings: @(forget) and @(local).
The arguments are one or more symbols, for example:
@(forget a)
@(forget a b c)
which can equivalently be written:
@(local a)
@(local a b c)
Directives which follow the forget or local directive no longer see any bindings for the symbols mentioned in that directive, and can establish new bindings.
It is not an error if the bindings do not exist.
It is strongly recommended to use the @(local) spelling in functions, because the forgetting action simulates local variables: for the given symbols, the machine forgets any earlier variables from outside of the function, and consequently, any new bindings for those variables belong to the function. (Furthermore, functions suppress the propagation of variables that are not in their parameter list, so these locals will be automatically forgotten when the function terminates.)
The syntax of @(do) is:
@(do lisp-expression*)
The do directive evaluates zero or more TXR Lisp expressions. (See TXR LISP far below.) The values of the expressions are ignored, and matching continues with the directives which follow the do directive, if any.
In the context of the do directive, the expressions are not introduced by the @ symbol; each argument of do is already expected to be a Lisp expression.
Example:
@; match text into variables a and b, then insert into hash table h
@(bind h @(hash))
@a:@b
@(do (set [h a] b))
The syntax of @(mdo) is:
@(mdo lisp-expression*)
Like the do directive, mdo (macro-time do) evaluates zero or more TXR Lisp expressions. Unlike do, mdo performs this evaluation immediately upon being parsed. Then it disappears from the syntax.
The effect of @(mdo e0 e1 e2 ...) is exactly like @(do (macro-time e0 e1 e2 ...)) except that do doesn't disappear from the syntax.
Another difference is that do can be used as a horizontal or vertical directive, whereas mdo is only vertical.
The in-package directive shares the same syntax and semantics as the TXR Lisp macro of the same name:
(in-package name)
The in-package directive is evaluated immediately upon being parsed, leaving no trace in the syntax tree of the surrounding TXR query.
It causes the *package* special variable to take on the package denoted by name.
The directive requires name to be either a string or a symbol; an error exception is thrown if this isn't the case. Otherwise, the package denoted by name is searched for. If the package is not found, an error exception is thrown.
Blocks are useful for terminating parts of a pattern-matching search prematurely, and escaping to a higher level. This makes blocks not only useful for simplifying the semantics of certain pattern matches, but also an optimization tool.
Judicious use of blocks and escapes can reduce or eliminate the amount of backtracking that TXR performs.
The @(block name) directive introduces a named block, except when name is the symbol nil. The @(block) directive introduces an unnamed block, equivalent to @(block nil).
The @(skip) and @(collect) directives introduce implicit anonymous blocks, as do function bodies.
Blocks must be terminated by @(end) and can be vertical:
@(block foo)
...
@(end)
or horizontal:
@(block foo)...@(end)
The names of blocks are in a distinct namespace from the variable binding space. So @(block foo) is unrelated to the variable @foo.
A block extends from the @(block ...) directive which introduces it, until the matching @(end), and may be empty. For instance:
@(some)
abc
@(block foo)
xyz
@(end)
@(end)
Here, the block foo occurs in a @(some) clause, and so it extends to the @(end) which terminates the block. After that @(end), the name foo is not associated with a block (is not "in scope"). The second @(end) terminates the @(some) block.
The implicit anonymous block introduced by @(skip) has the same scope as the @(skip): it extends over all of the material which follows the skip, to the end of the containing subquery.
Blocks may nest, and nested blocks may have the same names as blocks in which they are nested. For instance:
@(block)
@(block)
...
@(end)
@(end)
is a nesting of two anonymous blocks, and
@(block foo)
@(block foo)
@(end)
@(end)
is a nesting of two named blocks which happen to have the same name. When a nested block has the same name as an outer block, it creates a block scope in which the outer block is "shadowed"; that is to say, directives which refer to that block name within the nested block refer to the inner block, and not to the outer one.
A block normally does nothing. The query material in the block is evaluated normally. However, a block serves as a termination point for @(fail) and @(accept) directives which are in scope of that block and refer to it.
The precise meaning of these directives is:
The @(fail) directive has a vertical and horizontal form.
If the implicit block introduced by @(skip) is terminated in this manner, this has the effect of causing skip itself to fail. In other words, the behavior is as if @(skip)'s search did not find a match for the trailing material, except that it takes place prematurely (before the end of the available data source is reached).
If the implicit block associated with a @(collect) is terminated this way, then the entire collect fails. This is a special behavior, because a collect normally does not fail, even if it matches nothing and collects nothing!
To prematurely terminate a collect by means of its anonymous block, without failing it, use @(accept).
@(accept) communicates the current bindings and input position to the terminated block. These bindings and current position may be altered by special interactions between certain directives and @(accept), described in the following section. Communicating the current bindings and input position means that the block which is terminated by @(accept) exhibits the bindings which were collected just prior to the execution of that @(accept) and the input position which was in effect at that time.
@(accept) has a vertical and horizontal form. In the horizontal form, it communicates a horizontal input position. A horizontal input position thus communicated will only take effect if the block being terminated had been suspended on the same line of input.
If the implicit block introduced by @(skip) is terminated by @(accept), this has the effect of causing the skip itself to succeed, as if all of the trailing material had successfully matched.
If the implicit block associated with a @(collect) is terminated by @(accept), then the collection stops. All bindings collected in the current iteration of the collect are discarded. Bindings collected in previous iterations are retained, and collated into lists in accordance with the semantics of collect.
Example: alternative way to achieve @(until) termination:
@(collect)
@ (maybe)
---
@ (accept)
@ (end)
@LINE
@(end)
This query will collect entire lines into a list called LINE. However, if the line --- is matched (by the embedded @(maybe)), the collection is terminated. Only the lines up to, and not including, the --- line are collected. The effect is identical to:
@(collect)
@LINE
@(until)
---
@(end)
The difference (not relevant in these examples) is that the until clause has visibility into the bindings set up by the main clause.
However, the following example has a different meaning:
@(collect)
@LINE
@ (maybe)
---
@ (accept)
@ (end)
@(end)
Now, lines are collected until the end of the data source, or until a line is found which is followed by a --- line. If such a line is found, the collection stops, and that line is not included in the collection! The @(accept) terminates the processing of the collect body, and so the action of collecting the last @LINE binding into the list is not performed.
Example: communication of bindings and input position:
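Suppose the input consists of the lines 1 and 2. A query of this kind, reconstructed here as a sketch consistent with the description which follows, is:
@(some)
@(block foo)
@first
@(accept foo)
@(end)
@(end)
@second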
At the point where the accept occurs, the foo block has matched the first line, bound the text "1" to the variable @first. The block is then terminated. Not only does the @first binding emerge from this terminated block, but what also emerges is that the block advanced the data past the first line to the second line. Next, the @(some) directive ends, and propagates the bindings and position. Thus the @second which follows then matches the second line and takes the text "2".
Example: abandonment of @(some) clause by @(accept):
In the following query, the foo block occurs inside a maybe clause. Inside the foo block there is a @(some) clause. Its first subclause matches variable @first and then terminates block foo. Since block foo is outside of the @(some) directive, this has the effect of terminating the @(some) clause:
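A reconstructed sketch of such a query, assuming input consisting of the lines 1 through 5, is:
@(maybe)
@(block foo)
@(some)
@first
@(accept foo)
@(or)
@one
@two
@three
@four
@(end)
@(end)
@(end)
@second
With the @(accept foo) in place, first takes "1", the block terminates, and second then matches "2".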
The second clause of the @(some) directive, namely:
@one
@two
@three
@four
is never processed. The reason is that subclauses are processed in top-to-bottom order, but the processing was aborted within the first clause by the @(accept foo). The @(some) construct never gets the opportunity to match four lines.
If the @(accept foo) line is removed from the above query, the output is different:
Now, all clauses of the @(some) directive have the opportunity to match. The second clause grabs four lines, which is the longest match. And so, the next line of input available for matching is 5, which goes to the @second variable.
If one of the clauses which follow a @(trailer) requests a successful termination to an outer block via @(accept), then @(trailer) intercepts the escape and adjusts the data extent to the position that it was given.
Example:
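A reconstructed sketch, matched against input consisting of the lines 1, 2 and 3:
@(block)
@(trailer)
@line1
@line2
@(accept)
@(end)
@line3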
The variable line3 is bound to "1" because although @(accept) yields a data position which has advanced to the third line, this is intercepted by @(trailer) and adjusted back to the first line. Neglecting to do this adjustment would violate the semantics of trailer.
When the clauses under a next directive are terminated by an accept, such that control passes to a block which surrounds that next, the accept is intercepted by next.
The input position being communicated by the accept is replaced with the original input position in the original stream which is in effect prior to the next directive. The accept transfer is then resumed.
In other words, accept cannot be used to "leak" the new stream out of a next scope.
However, next has no effect on the bindings being communicated.
Example:
@(next "file-x")
@(block b)
@(next "file-y")
@line
@(accept b)
@(end)
Here, the variable line matches the first line of the file "file-y", after which an accept transfer is initiated, targeting block b. This transfer communicates the line binding, as well as the position within file-y, pointing at the second line. However, the accept traverses the next directive, causing it to be abandoned. The special unwinding action within that directive detects this transfer and rewrites the input position to be the original one within the stream associated with "file-x". Note that this special handling exists in order for the behavior to be consistent with what would happen if the @(accept b) were removed, and the block b terminated normally: because the inner next is nested within that block, TXR would backtrack to the previous input position within "file-x".
Example:
@(define fun (a))
@ (bind a "a")
@ (bind b "b")
@ (accept blk)
@(end)
@(block blk)
@(fun x)
this line is skipped by accept
@(end)
Here, the accept initiates a control transfer which communicates the a and b variable bindings which are visible in that scope. This transfer is intercepted by the function, and the treatment of the bindings follows the same rules as a normal return (which, in the given function, would readily take place if the accept directive were removed). The b variable is suppressed, because b isn't a parameter of the function. Because a is a parameter, and the argument to that parameter is the unbound variable x, the effect is that x is bound to the value of a. When the accept transfer reaches block blk and terminates it, all that emerges is the x binding carrying "a".
If the accept invocation is removed from fun, then the function returns normally, producing the x binding. In that case, the line this line is skipped by accept isn't skipped since the block isn't being terminated; that line must match something.
The processing of the finally block detects that it has been triggered by an accept transfer. Consequently, it retrieves the current input position and bindings from that transfer, and uses that position and those bindings for the processing of the finally clauses.
If the finally clauses succeed, then the new input position and new bindings are installed into the accept control transfer and that transfer resumes.
If the finally clauses fail, then the accept transfer is converted to a fail, with exactly the same block as its destination.
This creates the possibility that an accept in horizontal context targets a vertical block or vice versa, raising the question of how the input position is treated. The semantics of this situation is defined as follows.
If a horizontal-context accept targets a vertical block, the current position at the target block will be the following line. That is to say, when the horizontal accept occurs, there is a current input line which may have unconsumed material past the current position. If the accept communicates its input position to a vertical context, that unconsumed material is skipped, as if it had been matched and the vertical position is advanced to the next line.
If a horizontal block catches a vertical accept, it rejects that accept's position and stays at the current backtracking position for that block. Only the bindings from the accept are retained.
Functions in TXR are not exactly like functions in mathematics or functional languages, and are not like procedures in imperative programming languages. They are not exactly like macros either. What it means for a TXR function to take arguments and produce a result is different from the conventional notion of a function.
A TXR function may have one or more parameters. When such a function is invoked, an argument must be specified for each parameter. However, a special behavior is at play here. Namely, some or all of the argument expressions may be unbound variables. In that case, the corresponding parameters behave like unbound variables also. Thus TXR function calls can transmit the "unbound" state from argument to parameter.
It should be mentioned that functions have access to all bindings that are visible in the caller; functions may refer to variables which are not mentioned in their parameter list.
With regard to returning, TXR functions are also unconventional. A function call behaves like a kind of match: if the function fails, then the call is considered a failed match.
When a function call succeeds, then the bindings emanating from that function are processed specially. Firstly, any bindings for variables which do not correspond to one of the function's parameters are thrown away. Functions may internally bind arbitrary variables in order to get their job done, but only those variables which are named in the function's parameter list may propagate out of the function call. Thus, a function with no parameters can only indicate matching success or failure, but cannot produce any bindings. Secondly, variables do not propagate out of the function directly, but undergo a renaming. For each parameter which went into the function as an unbound variable (because its corresponding argument was an unbound variable), if that parameter now has a value, that value is bound onto the corresponding argument.
Example:
@(define collect-words (list))
@(coll)@{list /[^ \t]+/}@(end)
@(end)
The above function collect-words contains a query which collects words from a line (sequences of characters other than space or tab), into the list variable called list. This variable is named in the parameter list of the function, therefore, its value, if it has one, is permitted to escape from the function call.
Suppose the input data is:
Fine summer day
and the function is called like this:
@(collect-words wordlist)
The result (with txr -B) is:
wordlist[0]=Fine
wordlist[1]=summer
wordlist[2]=day
How it works is that in the function call @(collect-words wordlist), wordlist is an unbound variable. The parameter corresponding to that unbound variable is the parameter list. Therefore, that parameter is unbound over the body of the function. The function body collects the words of "Fine summer day" into the variable list, and then yields that binding. Then the function call completes by noticing that the function parameter list now has a binding, and that the corresponding argument wordlist has no binding. The binding is thus transferred to the wordlist variable. After that, the bindings produced by the function are thrown away. The only enduring effects are that the match consumed a line of input, and that the wordlist variable acquired a binding.
Another way to understand the parameter behavior is that function parameters behave like proxies which represent their arguments. If an argument is an established value, such as a character string or bound variable, the parameter is a proxy for that value and behaves just like that value. If an argument is an unbound variable, the function parameter acts as a proxy representing that unbound variable. The effect of binding the proxy is that the variable becomes bound, an effect which is settled when the function goes out of scope.
Within the function, both the original variable and the proxy are visible simultaneously, and are independent. What if a function binds both of them? Suppose a function with a parameter called P is called with an argument A, which is an unbound variable, and then, in the function, both A and P are bound. This is permitted, and they can even be bound to different values. However, when the function terminates, the local binding of A simply disappears (because the symbol A is not among the parameters of the function). Only the value bound to P emerges, and is bound to A, which still appears unbound at that point. The P binding disappears also, and the net effect is that A is now bound. The "proxy" binding of A through the parameter P "wins" the conflict with the direct binding.
Function definition syntax comes in two flavors: vertical and horizontal. Horizontal definitions actually come in two forms, the distinction between which is hardly noticeable, and the need for which is made clear below.
A function definition begins with a @(define ...) directive. For vertical functions, this is the only element in a line.
The define symbol must be followed by a symbol, which is the name of the function being defined. After the symbol, there is an optional parenthesized argument list. If there is no such list, or if the list is specified as () or the symbol nil, then the function has no parameters. Examples of valid define syntax are:
@(define foo)
@(define bar ())
@(define match (a b c))
If the define directive is followed by more material on the same line, then it defines a horizontal function:
@(define match-x)x@(end)
If the define is the sole element in a line, then it is a vertical function, and the function definition continues below:
@(define match-x)
x
@(end)
The difference between the two is that a horizontal function matches characters within a line, whereas a vertical function matches lines within a stream. The former match-x matches the character x, advancing to the next character position. The latter match-x matches a line consisting of the character x, advancing to the next line.
Material between @(define) and @(end) is the function body. The define directive may be followed directly by the @(end) directive, in which case the function has an empty body.
Functions may be nested within function bodies. Such local functions have dynamic scope. They are visible in the function body in which they are defined, and in any functions invoked from that body.
The body of a function is an anonymous block. (See Blocks above.)
If a horizontal function is defined as the only element of a line, it may not be followed by additional material. The following construct is erroneous:
@(define horiz (x))@foo:@bar@(end)lalala
This kind of definition is actually considered to be in the vertical context, and like other directives that have special effects and that do not match anything, it does not consume a line of input. If the above syntax were allowed, it would mean that the line would not only define a function but also match lalala. This, in turn, would mean that the @(define)...@(end) is actually in horizontal mode, and so it matches a span of zero characters within a line (which means that it would require a line of input to match: a surprising behavior for a nonmatching directive!)
A horizontal function can be defined in an actual horizontal context. This occurs if it is in a line where it is preceded by other material. For instance:
X@(define fun)...@(end)Y
This is a query line which must match the text XY. It also defines the function fun. The main use of this form is for nested horizontal functions:
@(define fun)@(define local_fun)...@(end)@(end)
A function of the same name may be defined as both vertical and horizontal. Both functions are available at the same time. Which one is used by a call is resolved by context. See the section Vertical Versus Horizontal Calls below.
A function is invoked by a compound directive whose first symbol is the name of that function. Additional elements in the directive are the arguments. Arguments may be symbols, or other objects like string and character literals, quasiliterals or regular expressions.
Example:
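The following is a sketch (the function name pair is illustrative), matched against the two input lines one two and ice milk:
@(define pair (a b))
@a @b
@(end)
@(pair first second)
@(pair "ice" cream)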
The first call to the function takes the line "one two". The parameter a takes "one" and parameter b takes "two". These are transferred to the arguments first and second. The second call to the function binds the a parameter to the word "ice", and b is unbound, because the corresponding argument cream is unbound. Thus inside the function, a is forced to match "ice". Then a space is matched and b collects the text "milk". When the function returns, the unbound cream variable gets this value.
If a symbol occurs multiple times in the argument list, it constrains both parameters to bind to the same value. That is to say, all parameters which, in the body of the function, bind a value, and which are all derived from the same argument symbol must bind to the same value. This is settled when the function terminates, not while it is matching. Example:
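A sketch (the function name two is illustrative), matched against the input line foo:bar:
@(define two (a b))@a:@b@(end)
@(two same same)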
Here the query fails because a and b are effectively proxies for the same unbound variable same and are bound to different values, creating a conflict which constitutes a match failure.
A function call which is the only element of the query line in which it occurs is ambiguous. It can go either to a vertical function or to the horizontal one. If both are defined, then it goes to the vertical one.
Example:
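A sketch in which both a horizontal and a vertical function named which are defined, and the call is the sole element of its line:
@(define which (x))@(bind x "horizontal")@(end)
@(define which (x))
@(bind x "vertical")
@(end)
@(which fun)
The result is that fun is bound to "vertical".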
Not only does this call go to the vertical function, but it is in a vertical context.
If only a horizontal function is defined, then that is the one which is called, even if the call is the only element in the line. This takes place in a horizontal character-matching context, which requires a line of input which can be traversed:
Example:
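A sketch in which only the horizontal function is defined, matched against the input line ABC:
@(define which (x))@(bind x "horizontal")@(end)
@(which fun)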
The query fails: since @(which fun) is in horizontal mode, it matches characters in a line. Since the function body consists only of @(bind ...), which doesn't match any characters, the function call requires an empty line to match. The line ABC is not empty, and so there is a matching failure. The following example corrects this:
Example:
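One possible correction (a sketch) gives the line trailing material to match against ABC:
@(define which (x))@(bind x "horizontal")@(end)
@(which fun)ABC
The call matches a span of zero characters, the literal ABC matches the rest of the line, and fun is bound to "horizontal".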
A call made in a clearly horizontal context will prefer the horizontal function, and only fall back on the vertical one if the horizontal one doesn't exist. (In this fallback case, the vertical function is called with empty data; it is useful for calling vertical functions which process arguments and produce values.)
In the next example, the call is followed by trailing material, placing it in a horizontal context. Leading material will do the same thing:
Example:
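A sketch with both definitions, where trailing material follows the call, matched against the line ABC:
@(define which (x))@(bind x "horizontal")@(end)
@(define which (x))
@(bind x "vertical")
@(end)
@(which fun)ABC
The horizontal function is preferred, so fun is bound to "horizontal".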
As described earlier, variables bound in a function body which are not parameters of the function are discarded when the function returns. However, that, by itself, doesn't make these variables local, because pattern functions have visibility to all variables in their calling environment. If a variable x exists already when a function is called, then an attempt to bind it inside a function may result in a failure. The local directive must be used in a pattern function to list which variables are local.
Example:
@(define path (path))@\
@(local x y)@\
@(cases)@\
(@(path x))@(path y)@(bind path `(@x)@y`)@\
@(or)@\
@{x /[.,;'!?][^ \t\f\v]/}@(path y)@(bind path `@x@y`)@\
@(or)@\
@{x /[^ .,;'!?()\t\f\v]/}@(path y)@(bind path `@x@y`)@\
@(or)@\
@(bind path "")@\
@(end)@\
@(end)
This is a horizontal function which matches a path, handled by means of four cases, three of which are recursive. A path can be a parenthesized path followed by a path; it can be certain characters followed by a path; or it can be empty.
This function ensures that the variables it uses internally, x and y, do not have anything to do with any inherited bindings for x and y.
Note that the function is recursive, which cannot work without x and y being local, even if no such bindings exist prior to the top-level invocation of the function. The invocation @(path x) causes x to be bound, which is visible inside the invocation @(path y), but that invocation needs to have its own binding of x for local use.
Function definitions may appear in a function. Such definitions are visible in all functions which are invoked from the body (and not necessarily enclosed in the body). In other words, the scope is dynamic, not lexical. Inner definitions shadow outer definitions. This means that a caller can redirect the function calls that take place in a callee, by defining local functions which capture the references.
Example:
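A reconstructed sketch (the output texts are illustrative):
@(define which)
@ (fun)
@(end)
@(define fun)
@ (output)
top-level fun!
@ (end)
@(end)
@(define callee)
@ (define fun)
@  (output)
local fun!
@  (end)
@ (end)
@ (which)
@(end)
@(callee)
@(which)
The output is:
local fun!
top-level fun!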
Here, the function which is defined; it calls fun. A top-level definition of fun is introduced which outputs "top-level fun!". The function callee provides its own local definition of fun, which outputs "local fun!", before calling which. When callee is invoked, it calls which, whose @(fun) call is routed to callee's local definition. When which is called directly from the top level, its fun call goes to the top-level definition.
Function indirection may be performed using the call directive. If fun-expr is a Lisp expression which evaluates to a symbol, and that symbol names a function which takes no arguments, then @(call fun-expr) may be used to invoke the function. Additional expressions may be supplied which specify arguments.
Example 1:
@(define foo (arg))
@(bind arg "abc")
@(end)
@(call 'foo b)
In this example, the effect is that foo is invoked, and b ends up bound to "abc".
The call directive here uses the 'foo expression to calculate the name of the function to be invoked. (See the quote operator).
This particular call expression can just be replaced by the direct invocation syntax @(foo b).
The power of call lies in being able to specify the function as a value which comes from elsewhere in the program, as in the following example.
@(define foo (arg))
@(bind arg "abc")
@(end)
@(bind f @'foo)
@(call f b)
Here the call directive obtains the name of the function from the f variable.
Note that function names are resolved to functions in the environment that is apparent at the point in execution where the call takes place. The directive @(call f args ...) is precisely equivalent to @(s args ...) if, at the point of the call, f is a variable which holds the symbol s and symbol s is defined as a function. Otherwise it is erroneous.
The syntax of the load and include directives is:
@(load expr)
@(include expr)
Where expr is a Lisp expression that evaluates to a string giving the path of the file to load.
Firstly, the path given by expr is converted to an effective path, as follows.
If the *load-path* variable has a current value which is not nil, and the path given in expr is pure relative according to the pure-rel-path-p function, then the effective path is interpreted relative to the directory portion of the path which is stored in *load-path*.
If *load-path* is nil, or the path given in expr is not pure relative, then that path is taken as-is as the effective path.
Next, an attempt is made to open the file for processing, in almost exactly the same manner as by the TXR Lisp function load. The difference is that if the effective path is unsuffixed, then the .txr suffix is added to it, and that resulting path is tried first, and if it succeeds, then the file is treated as TXR Pattern Language syntax. If that fails, then the suffix .tlo is tried, and so forth, as described for the load function.
If these initial attempts to find the file fail, and the failure is due to the file not being found rather than some other problem such as a permission error, and expr isn't an absolute path according to abs-path-p, then additional attempts are made by searching for the file in the list of directories given in the *load-search-dirs* variable. Details are given in the description of the TXR Lisp load function.
Both the load and include directives bind the *load-path* variable to the path of the loaded file just before parsing syntax from it. The *package* variable is also given a new dynamic binding, whose value is the same as the existing binding. These bindings are removed when the load operation completes, restoring the prior values of these variables. The *load-hooks* variable is given a new dynamic binding, with a nil value.
If the file opened for processing is TXR Lisp source, or a compiled TXR Lisp file, then it is processed in the manner described for the load function.
Different requirements apply to the processing of the file under the load and include directives.
The include directive performs the processing of the file at parse time. If the file being processed is TXR Pattern Language, then it is parsed, and then its syntax replaces the include directive, as if it had originally appeared in its place. If a TXR Lisp source or a compiled TXR Lisp file is processed by include then the include directive is removed from the syntax.
The load directive performs the processing of the file at evaluation time. Evaluation time occurs after a TXR program is read from beginning to end and parsed. That is to say, when a TXR query is parsed, any embedded @(load ...) forms in it are parsed and constitute part of its syntax tree. They are executed when that query is executed, whenever its execution reaches those load directives. When the load directive processes TXR Pattern Language syntax, it parses the file in its entirety and then executes that file's directives against the current input position. Repeated executions of the same load directive result in repeated processing of the file.
Note: the include directive is useful for loading TXR files which contain Lisp macros which are needed by the parent program. The parent program cannot use load to bring in macros because macros are required during expansion, which takes place prior to evaluation time, whereas load doesn't execute until evaluation time.
See also: the self-path, stdlib and *load-path* variables in TXR Lisp.
A TXR query may perform custom output. Output is performed by output clauses, which may be embedded anywhere in the query, or placed at the end. Output occurs as a side effect of producing a part of a query which contains an @(output) directive, and is executed even if that part of the query ultimately fails to find a match. Thus output can be useful for debugging. An output clause specifies that its output goes to a file, pipe, or (by default) standard output. If any output clause is executed whose destination is standard output, TXR makes a note of this, and later, just prior to termination, suppresses the usual printing of the variable bindings or the word false.
The syntax of the @(output) directive is:
@(output [ destination ] { bool-keyword | keyword value }* )
.
. one or more output directives or lines
.
@(end)
If the directive has arguments, then the first one is evaluated. If it is an object other than a keyword symbol, then it specifies the optional destination. Any remaining arguments after the optional destination are the keyword list. If the destination is missing, then the entire argument list is a keyword list.
The destination argument, if present, is treated as a TXR Lisp expression and evaluated. The resulting value is taken as the output destination. The value may be a string which gives the pathname of a file to open for output. Otherwise, the destination must be a stream object.
The keyword list consists of a mixture of Boolean keywords which do not have an argument, or keywords with arguments.
A number of Boolean keywords, which take no argument, are supported; among them is :nothrow.
Note that since command pipes are processes that report errors asynchronously, a failing command will not produce an immediate exception that can be suppressed with :nothrow. The :nothrow keyword applies to synchronous errors, such as failure to open the destination file due to lack of permissions.
A number of value keywords, which take an argument, are supported; among them are :filter, :named, :continue, :finish and :into, which appear in examples in this and subsequent sections.
See the sections Output Filtering and The Deffilter Directive below.
Text in an output clause is not matched against anything, but is output verbatim to the destination file, device or command pipe.
Variables occurring in an output clause do not match anything; instead their contents are output.
A variable being output can be any object. If it is of a type other than a list or string, it will be converted to a string as if by the tostring function in TXR Lisp.
A list is converted to a string in a special way: the elements are individually converted to a string and then they are catenated together. The default separator string is a single space: an alternate separation can be specified as an argument in the brace substitution syntax. Empty lists turn into an empty string.
Lists may be output within @(repeat) or @(rep) clauses. Each nesting of these constructs removes one level of nesting from the list variables that it contains.
In an output clause, the @{name number} variable syntax generates a fixed-width field which contains the variable's text. The absolute value of the number specifies the field width. For instance -20 and 20 both specify a field width of twenty. If the text is longer than the field, then it overflows the field. If the text is shorter than the field, then it is left-adjusted within that field if the width is specified as a positive number, and right-adjusted if the width is specified as negative.
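For instance, assuming x is bound to "abc", the following clause:
@(output)
<@{x 10}>
<@{x -10}>
@(end)
produces:
<abc       >
<       abc>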
An output variable may specify a filter which overrides any filter established for the output clause. The syntax for this is @{NAME :filter filterspec}. The filter specification syntax is the same as in the output clause. See Output Filtering below.
Additional syntax is supported in output variables that does not appear in pattern-matching variables.
A square bracket index notation may be used to extract elements or ranges from a variable; it works with strings, vectors and lists. Elements are indexed from zero. This notation is only available in brace-enclosed syntax, and looks like this:
@{name[index]}
@{name[from..to]}
If the variable is a list, it is treated as a list substitution, exactly as if it were the value of an unsubscripted list variable. The elements of the list are converted to strings and catenated together with a separator string between them, the default one being a single space.
An alternate separator may be given as a string argument in the brace notation.
Example:
@(bind a ("a" "b" "c" "d"))
@(output)
@{a[1..3] "," 10}
@(end)
The above produces the text "b,c" in a field 10 spaces wide. The [1..3] argument extracts a range of a; the "," argument specifies an alternate separator string, and 10 specifies the field width.
The brace syntax has another syntactic and semantic extension in output clauses. In place of the symbol, an expression may appear. The value of that expression is substituted.
Example:
@(bind a "foo")
@(output)
@{`@a:` -10}
@(end)
Here, the quasiliteral expression `@a:` is evaluated, producing the string "foo:". This string is printed right-adjusted in a 10 character field.
The repeat directive generates repeated text from a "boilerplate", by taking successive elements from lists. The syntax of repeat is like this:
@(repeat)
.
.
main clause material, required
.
.
special clauses, optional
.
.
@(end)
repeat has four types of special clauses, any of which may be specified with empty contents, or omitted entirely. They are described below.
repeat takes arguments, also described below.
All of the material in the main clause and optional clauses is examined for the presence of variables. If none of the variables hold lists which contain at least one item, then no output is performed (unless the repeat specifies an @(empty) clause; see below). Otherwise, among those variables which contain nonempty lists, repeat finds the length of the longest list. The length of this list determines the number of repetitions, R.
If the repeat contains only a main clause, then the lines of this clause are output R times. Over the first repetition, all of the variables which, outside of the repeat, contain lists are locally rebound to just their first item. Over the second repetition, all of the list variables are bound to their second item, and so forth. Any variables which hold shorter lists than the longest list eventually end up with empty values over some repetitions.
Example: if the list A holds "1", "2" and "3"; the list B holds "A", "B"; and the variable C holds "X", then
@(repeat)
>> @C
>> @A @B
@(end)
will produce three repetitions (since there are two lists, the longest of which has three items). The output is:
>> X
>> 1 A
>> X
>> 2 B
>> X
>> 3
The last line has a trailing space, since it is produced by "@A @B", where B has an empty value. Since C is not a list variable, it produces the same value in each repetition.
The special clauses are @(single), @(first), @(last), @(mod), @(modlast) and @(empty).
The precedence among the clauses which take an iteration is: single > first > modlast > last > mod > main. That is, whenever two or more of these clauses can apply to a repetition, then the leftmost one in this precedence list will be selected. It is possible for all these clauses to be viable for processing the same repetition. If a repeat occurs which has only one repetition, then that repetition is simultaneously the first, only and last repetition. Moreover, it also matches (mod 0 m) and, because it is the last repetition, it matches (modlast 0 m). In this situation, if there is a @(single) clause present, then the repetition shall be processed using that clause. Otherwise, if there is a @(first) clause present, that clause is activated. Failing that, @(modlast) is used if there is such a clause, featuring an n argument of zero. If there isn't, then the @(last) clause is considered, if present. Otherwise, the @(mod) clause is considered if present with an n argument of zero. Otherwise, none of these clauses are present or applicable, and the repetition is processed using the main clause.
The @(empty) clause does not appear in the above precedence list because it is mutually exclusive with respect to the others: it is processed only when there are no iterations, in which case even the main clause isn't active.
The @(repeat) clause supports arguments.
@(repeat
[:counter {symbol | (symbol expr)}]
[:vars ({symbol | (symbol expr)}*)])
The :counter argument designates a symbol which will behave as an integer variable over the scope of the clauses inside the repeat. The variable provides access to the repetition count, starting at zero, incrementing with each repetition. If the argument is given as (symbol expr) then expr is a Lisp expression whose value is taken as a displacement value which is added to each iteration of the counter. For instance :counter (c 1) specifies a counter c which counts from 1.
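For instance, assuming a list variable items holding "a", "b" and "c", the following sketch:
@(output)
@(repeat :counter (n 1))
@{n}. @items
@(end)
@(end)
produces:
1. a
2. b
3. c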
The :vars argument specifies a list of variable name symbols symbol or else pairs of the form (symbol init-form) consisting of a variable name and Lisp expression. Historically, the former syntax informed repeat about references to variables contained in Lisp code. This usage is no longer necessary as of TXR 243, since the repeat construct walks Lisp code, identifying all free variables. The latter syntax introduces a new pattern variable binding for symbol over the scope of the repeat construct. The init-form specifies a Lisp expression which is evaluated to produce the binding's value.
The repeat directive then processes the list of variables, selecting from it those which have a binding, either a previously existing binding or the one just introduced. For each selected variable, repeat will assume that the variable occurs in the repeat block and contains a list to be iterated.
The variable binding syntax supported by :vars of the form (symbol init-form) provides a solution for situations when it is necessary to iterate over some list, but that list is the result of an expression, and not stored in any variable. A repeat block iterates only over lists emanating from variables; it does not iterate over lists pulled from arbitrary expressions.
Example: output all file names matching the *.txr pattern in the current directory:
@(output)
@(repeat :vars ((name (glob "*.txr"))))
@name
@(end)
@(end)
Prior to TXR 243, the simple variable-binding syntax supported by :vars of the form symbol was needed for situations in which TXR Lisp expressions which referenced variables were embedded in @(repeat) blocks. Variable references embedded in Lisp code were not identified in @(repeat). For instance, the following produced no output, because no variables were found in the repeat body:
@(bind trigraph ("abc" "def" "ghi"))
@(output)
@(repeat)
@(reverse trigraph)
@(end)
@(end)
There is a reference to trigraph but it's inside the (reverse trigraph) Lisp expression that was not processed by repeat. The solution was to mention trigraph in the :vars construct:
@(bind trigraph ("abc" "def" "ghi"))
@(output)
@(repeat :vars (trigraph))
@(reverse trigraph)
@(end)
@(end)
Then the repeat block would iterate over trigraph, producing the output
cba
fed
ihg
This workaround is no longer required as of TXR 243; the output is produced by the first example, without :vars.
If a repeat clause encloses variables which hold multidimensional lists, those lists require additional nesting levels of repeat (or rep). It is an error to attempt to output a list variable which has not been decimated into primary elements via a repeat construct.
Suppose that a variable X is two-dimensional (contains a list of lists). X must be nested twice in a repeat. The outer repeat will traverse the lists contained in X. The inner repeat will traverse the elements of each of these lists.
A nested repeat may be embedded in any of the clauses of a repeat, not only in the main clause.
The rep directive is similar to repeat. Whereas repeat is line-oriented, rep generates material within a line. It has all the same clauses, but everything is specified within one line:
@(rep)... main material ... .... special clauses ...@(end)
More than one @(rep) can occur within a line, mixed with other material. A @(rep) can be nested within a @(repeat) or within another @(rep).
Also, @(rep) accepts the same :counter and :vars arguments.
Example 1: show the list L in parentheses, with spaces between the elements, or the word EMPTY if the list is empty:
@(output)
@(rep)@L @(single)(@L)@(first)(@L @(last)@L)@(empty)EMPTY@(end)
@(end)
Here, the @(empty) clause specifies EMPTY. So if there are no repetitions, the text EMPTY is produced. If there is a single item in the list L, then @(single)(@L) produces that item between parentheses. Otherwise, if there are two or more items, the first item is produced with a leading parenthesis and a trailing space by @(first)(@L and the last item is produced with a closing parenthesis by @(last)@L). All items in between are emitted with a trailing space by the main clause: @(rep)@L.
Example 2: show the list L like Example 1 above, but the empty list is ().
@(output)
(@(rep)@L @(last)@L@(end))
@(end)
This is simpler. The parentheses are part of the text which surrounds the @(rep) construct, produced unconditionally. If the list L is empty, then @(rep) produces no output, resulting in (). If the list L has one or more items, then each is produced followed by a space, except the last, which has none. If the list has exactly one item, then @(last) applies to it instead of the main clause: it is produced with no trailing space.
The syntax of the close directive is:
@(close expr)
Where expr evaluates to a stream. The close directive can be used to explicitly close streams created using @(output ... :named var) syntax, as an alternative to @(output :finish expr).
Examples:
Write two lines to "foo.txt" over two output blocks using a single stream:
@(output "foo.txt" :named foo)
Hello,
@(end)
@(output :continue foo)
world!
@(end)
@(close foo)
The same as above, using :finish rather than :continue so that the stream is closed at the end of the second block:
@(output "foo.txt" :named foo)
Hello,
@(end)
@(output :finish foo)
world!
@(end)
Often it is necessary to transform the output to preserve its meaning under the convention of a given data format. For instance, if a piece of text contains the characters < or >, then if that text is being substituted into HTML, these should be replaced by &lt; and &gt;. This is what filtering is for. Filtering is applied to the contents of output variables, not to any template text. TXR implements named filters. Built-in filters are named by keywords, given below. User-defined filters are possible, however. See notes on the deffilter directive below.
Instead of a filter name, the syntax (:fun name) can be used. This denotes that the function named name is to be used as a filter. This is described in the next section, Function Filters, below.
Built-in filters are named by keywords; those appearing in the examples below include :tohtml, :fromhtml, :upcase, :downcase and :tofloat.
Examples:
To escape HTML characters in all variable substitutions occurring in an output clause, specify :filter :tohtml in the directive:
@(output :filter :tohtml)
...
@(end)
To filter an individual variable, add the syntax to the variable spec:
@(output)
@{x :filter :tohtml}
@(end)
Multiple filters can be applied at the same time. For instance:
@(output)
@{x :filter (:upcase :tohtml)}
@(end)
This will fold the contents of x to uppercase, and then encode any special characters into HTML. Beware of combinations that do not make sense. For instance, suppose the original text is HTML, containing codes like &quot;. The compound filter (:upcase :fromhtml) will not work, because &quot; will turn to &QUOT; which is no longer recognized by the :fromhtml filter, since the entity names in HTML codes are case-sensitive.
Capture some numeric variables and convert to numbers:
@date @time @temperature @pressure
@(filter :tofloat temperature pressure)
@;; temperature and pressure can now be used in calculations
A function can be used as a filter. For this to be possible, the function must conform to certain rules: it must accept at least two parameters, the first of which receives the input value, while the second is passed in as an unbound variable which the function must bind to the output value; and the function call must succeed. Any additional parameters receive extra arguments given in the filter specification.
For instance, the following is a valid filter function:
@(define foo_to_bar (in out))
@ (next :string in)
@ (cases)
foo
@ (bind out "bar")
@ (or)
@ (bind out in)
@ (end)
@(end)
This function binds the out parameter to "bar" if the in parameter is "foo", otherwise it binds the out parameter to a copy of the in parameter. This is a simple filter.
To use the filter, use the syntax (:fun foo_to_bar) in place of a filter name. For instance in the bind directive:
@(bind "foo" "bar" :lfilt (:fun foo_to_bar))
The above should succeed since the left side is filtered from "foo" to "bar", so that there is a match.
Function filters can be used in a chain:
@(output :filter (:downcase (:fun foo_to_bar) :upcase))
...
@(end)
Here is a split function which takes an extra argument specifying the separator:
@(define split (in out sep))
@ (next :list in)
@ (coll)@(maybe)@token@sep@(or)@token@(end)@(end)
@ (bind out token)
@(end)
Note that it produces a list rather than a string. This function separates the argument in into tokens according to the separator text carried in the variable sep.
Here is another function, join, which catenates a list:
@(define join (in out sep))
@ (output :into out)
@ (rep)@in@sep@(last)@in@(end)
@ (end)
@(end)
Now here are these two being used in a chain:
@(bind text "how,are,you")
@(output :filter ((:fun split ",") (:fun join "-")))
@text
@(end)
Output:
how-are-you
When the filter invokes a function, it generates the first two arguments internally to pass in the input value and capture the output. The remaining arguments from the (:fun ...) construct are also passed to the function. Thus the string objects "," and "-" are passed as the sep argument to split and join.
Note that split puts out a list, which join accepts. So the overall filter chain operates on a string: a string goes into split, and a string comes out of join.
The deffilter directive allows a query to define a custom filter, which can then be used in output clauses to transform substituted data.
The syntax of deffilter is illustrated by the examples which follow. The deffilter symbol must be followed by the name of the filter to be defined, followed by bind expressions which evaluate to lists of strings. Each list must be at least two elements long and specifies one or more texts which are mapped to a replacement text. For instance, the following specifies a telephone keypad mapping from uppercase letters to digits.
@(deffilter alpha_to_phone ("E" "0")
("J" "N" "Q" "1")
("R" "W" "X" "2")
("D" "S" "Y" "3")
("F" "T" "4")
("A" "M" "5")
("C" "I" "V" "6")
("B" "K" "U" "7")
("L" "O" "P" "8")
("G" "H" "Z" "9"))
@(deffilter foo (`@a` `@b`) ("c" `->@d`))
@(bind x ("from" "to"))
@(bind y ("---" "+++"))
@(deffilter sub x y)
The last deffilter has the same effect as the @(deffilter sub ("from" "to") ("---" "+++")) directive.
Filtering works using a longest match algorithm. The input is scanned from left to right, and the longest piece of text is identified at every character position which matches a string on the left-hand side, and that text is replaced with its associated replacement text. The scanning then continues at the first character after the matched text.
If none of the strings matches at a given character position, then that character is passed through the filter untranslated, and the scan continues at the next character in the input.
Filtering is not in-place but rather instantiates a new text, and so replacement text is not re-scanned for more replacements.
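For instance, consider this hypothetical filter and output clause:
@(deffilter num ("ab" "1") ("abc" "2"))
@(bind x "abcab")
@(output)
@{x :filter num}
@(end)
The output is 21: at the first position, "abc" is the longest match and is replaced by "2"; the scan resumes after it, where "ab" matches and is replaced by "1".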
If a filter definition accidentally contains two or more repetitions of the same left-hand string with different right-hand translations, the later ones take precedence. No warning is issued.
The syntax of the filter directive is:
@(filter FILTER { VAR }+ )
A filter is specified, followed by one or more variables whose values are filtered and stored back into each variable.
Example: convert a, b, and c to uppercase and HTML encode:
@(filter (:upcase :tohtml) a b c)
The exceptions mechanism in TXR is another disciplined form of nonlocal transfer, in addition to the blocks mechanism (see Blocks above). Like blocks, exceptions provide a construct which serves as the target for a dynamic exit. Both blocks and exceptions can be used to bail out of deep nesting when some condition occurs. However, exceptions provide more complexity. Exceptions are useful for error handling, and TXR in fact maps certain error situations to exception control transfers. However, exceptions are not inherently an error-handling mechanism; they are a structured dynamic control transfer mechanism, one of whose applications is error handling.
An exception control transfer (simply called an exception) is always identified by a symbol, which is its type. Types are organized in a subtype-supertype hierarchy. For instance, the file-error exception type is a subtype of the error type. This means that a file error is a kind of error. An exception handling block which catches exceptions of type error will catch exceptions of type file-error, but a block which catches file-error will not catch all exceptions of type error. A query-error is a kind of error, but not a kind of file-error. The symbol t is the supertype of every type: every exception type is considered to be a kind of t. (Mnemonic: t stands for type, as in any type).
Exceptions are handled using @(catch) clauses within a @(try) directive.
In addition to being useful for exception handling, the @(try) directive also provides unwind protection by means of a @(finally) clause, which specifies query material to be executed unconditionally when the try clause terminates, no matter how it terminates.
The general syntax of the try directive is
@(try)
... main clause, required ...
... optional catch clauses ...
... optional finally clause
@(end)
A catch clause looks like:
@(catch TYPE [ PARAMETERS ])
.
.
.
and also this simple form:
@(catch)
.
.
.
which catches all exceptions, and is equivalent to @(catch t).
A finally clause looks like:
@(finally)
...
.
.
The main clause may not be empty, but the catch and finally may be.
A try clause is surrounded by an implicit anonymous block (see Blocks section above). So for instance, the following is a no-op (an operation with no effect, other than successful execution):
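@(try)
@(accept)
@(end)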
The @(accept) causes a successful termination of the implicit anonymous block. Execution resumes with query lines or directives which follow, if any.
try clauses and blocks interact. For instance, an accept from within a try clause invokes a finally.
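For instance (a sketch; the output text bye! is illustrative):
@(block foo)
@(try)
@(accept foo)
@(finally)
@(output)
bye!
@(end)
@(end)
@(end)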
How this works: the try block's main clause is @(accept foo). This causes the enclosing block named foo to terminate, as a successful match. Since the try is nested within this block, it too must terminate in order for the block to terminate. But the try has a finally clause, which executes unconditionally, no matter how the try block terminates. The finally clause performs some output, which is seen.
Note that finally interacts with accept in subtle ways not revealed in this example; they are documented in the description of accept under the block directive documentation.
A try directive can terminate in one of three ways. The main clause may match successfully, and possibly yield some new variable bindings. The main clause may fail to match. Or the main clause may be terminated by a nonlocal control transfer, like an exception being thrown or a block return (like the block foo example in the previous section).
No matter how the try clause terminates, the finally clause is processed.
The finally clause is itself a query which binds variables, which leads to questions: what happens to such variables? What if the finally block fails as a query? As well as: what if a finally clause itself initiates a control transfer? Answers follow.
Firstly, a finally clause will contribute variable bindings only if the main clause terminates normally (either as a successful or failed match). If the main clause of the try block successfully matches, then the finally block continues matching at the next position in the data, and contributes bindings. If the main clause fails, then the finally block tries to match at the same position where the main clause failed.
The overall try directive succeeds as a match if either the main clause or the finally clause succeed. If both fail, then the try directive is a failed match.
Example:
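Suppose the input consists of the lines 1, 2 and 3, matched against this sketch:
@(try)
@a
@(finally)
@b
@(end)
@c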
In this example, the main clause of the try captures line "1" of the data as variable a, then the finally clause captures "2" as b, and then the query continues with the @c line after try block, so that c captures "3".
Example:
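Suppose the input consists of the lines 1 and 2, matched against this sketch:
@(try)
hello @a
@(finally)
@b
@(end)
@c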
In this example, the main clause of the try fails to match, because the input is not prefixed with "hello ". However, the finally clause matches, binding b to "1". This means that the try block is a successful match, and so processing continues with @c which captures "2".
When finally clauses are processed during a nonlocal return, they have no externally visible effect if they do not bind variables. However, their execution makes itself known if they perform side effects, such as output.
A finally clause guards only the main clause and the catch clauses. It does not guard itself. Once the finally clause is executing, the try block is no longer guarded. This means if a nonlocal transfer, such as a block accept or exception, is initiated within the finally clause, it will not re-execute the finally clause. The finally clause is simply abandoned.
The disestablishment of blocks and try clauses is properly interleaved with the execution of finally clauses. This means that all surrounding exit points are visible in a finally clause, even if the finally clause is being invoked as part of a transfer to a distant exit point. The finally clause can make a control transfer to an exit point which is nearer than the original one, thereby "hijacking" the control transfer. Also, the anonymous block established by the try directive is visible in the finally clause.
Example:
@(try)
@ (try)
@ (next "nonexistent-file")
@ (finally)
@ (accept)
@ (end)
@(catch file-error)
@ (output)
file error caught
@ (end)
@(end)
In this example, the @(next) directive throws an exception of type file-error, because the given file does not exist. The exit point for this exception is the @(catch file-error) clause in the outermost try block. The inner block is not eligible because it contains no catch clauses at all. However, the inner try block has a finally clause, and so during the processing of this exception which is headed for @(catch file-error), the finally clause performs an anonymous accept. The exit point for that accept is the anonymous block surrounding the inner try. So the original transfer to the catch clause is thereby abandoned. The inner try terminates successfully due to the accept, and since it constitutes the main clause of the outer try, that also terminates successfully. The "file error caught" message is never printed.
catch clauses establish their associated try blocks as potential exit points for exception-induced control transfers (called "throws").
A catch clause specifies an optional list of symbols which represent the exception types which it catches. The catch clause will catch exceptions which are a subtype of any one of those exception types.
If a try block has more than one catch clause which can match a given exception, the first one will be invoked.
When a catch is invoked, it is understood that the main clause did not terminate normally, and so the main clause could not have produced any bindings.
catch clauses are processed prior to finally.
If a catch clause itself throws an exception, that exception cannot be caught by that same clause or its siblings in the same try block. The catch clauses of that block are no longer visible at that point. Nevertheless, the catch clauses are still protected by the finally block. If a catch clause throws, or otherwise terminates, the finally block is still processed.
If a finally block throws an exception, then it is simply aborted; the remaining directives in that block are not processed.
So the success or failure of the try block depends on the behavior of the catch clause or the finally clause, if there is one. If either of them succeeds, then the try block is considered a successful match.
Example:
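@(try)
@(next "nonexistent-file")
@(catch file-error)
@a
@(finally)
@b
@(end)
@c

The input consists of the lines 1, 2 and 3.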
Here, the try block's main clause is terminated abruptly by a file-error exception from the @(next) directive. This is handled by the catch clause, which binds variable a to the input line "1". Then the finally clause executes, binding b to "2". The try block then terminates successfully, and so @c takes "3".
A catch clause may have parameters following the type name, like this:
@(catch pair (a b))
To write a catch-all with parameters, explicitly write the master supertype t:
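@(catch t (a b))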
Parameters are useful in conjunction with throw. The built-in error exceptions carry one argument, which is a string containing the error message. Using throw, arbitrary parameters can be passed from the throw site to the catch site.
The throw directive generates an exception. A type must be specified, followed by optional arguments, which are bind expressions. For example,
@(throw pair "a" `@file.txt`)
throws an exception of type pair, with two arguments, being "a" and the expansion of the quasiliteral `@file.txt`.
The selection of the target catch is performed purely using the type name; the parameters are not involved in the selection.
Binding takes place between the arguments given in throw and the target catch.
If any catch parameter, for which a throw argument is given, is a bound variable, it has to be identical to the argument, otherwise the catch fails. (Control still passes to the catch, but the catch is a failed match).
If any argument is an unbound variable, the corresponding parameter in the catch is left alone: if it is an unbound variable, it remains unbound, and if it is bound, it stays as is.
If a catch has fewer parameters than there are throw arguments, the excess arguments are ignored.
If a catch has more parameters than there are throw arguments, the excess parameters are left alone. They may be bound or unbound variables.
A throw argument passing a value to a catch parameter which is unbound causes that parameter to be bound to that value.
throw arguments are evaluated in the context of the throw, and the bindings which are available there. Consideration of what parameters are bound is done in the context of the catch.
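Example:
@(bind c "c")
@(try)
@(bind a "a")
@(forget c)
@(bind c "lc")
@(throw e c a)
@(catch e (a))
@(end)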
In the above example, c has a top-level binding to the string "c", but then becomes unbound via forget within the try construct, and rebound to the value "lc". Since the try construct is terminated by a throw, these modifications of the binding environment are discarded. Hence, at the end of the query, variable c ends up bound to the original value "c". The throw still takes place within the scope of the bindings set up by the try clause, so the values of a and c that are thrown are "a" and "lc". However, at the catch site, variable a does not have a binding. At that point, the binding to "a" established in the try has disappeared already. Being unbound, the catch parameter a can take whatever value the corresponding throw argument provides, so it ends up with "lc".
There is a horizontal form of throw. For instance:
abc@(throw e 1)
throws exception e if abc matches.
If throw is used to generate an exception derived from type error and that exception is not handled, TXR will issue diagnostics on the *stderr* stream and terminate. If an exception derived from warning is not handled, TXR will generate diagnostics on the *stderr* stream, after which control returns to the throw directive, and proceeds with the next directive. If an exception not derived from error is thrown, control returns to the throw directive and proceeds with the next directive.
The defex directive allows the query writer to invent custom exception types, which are arranged in a type hierarchy (meaning that some exception types are considered subtypes of other types).
Subtyping means that if an exception type B is a subtype of A, then every exception of type B is also considered to be of type A. So a catch for type A will also catch exceptions of type B. Every type is a supertype of itself: an A is a kind of A. This implies that every type is a subtype of itself also. Furthermore, every type is a subtype of the type t, which has no supertype other than itself. Type nil is a subtype of every type, including itself. The subtyping relationship is transitive also. If A is a subtype of B, and B is a subtype of C, then A is a subtype of C.
defex may be invoked with no arguments, in which case it does nothing:
@(defex)
It may be invoked with one argument, which must be a symbol. This introduces a new exception type. Strictly speaking, such an introduction is not necessary; any symbol may be used as an exception type without being introduced by @(defex):
@(defex a)
Therefore, this also does nothing, other than document the intent to use a as an exception.
If two or more argument symbols are given, the symbols are all introduced as types, engaged in a subtype-supertype relationship from left to right. That is to say, the first (leftmost) symbol is a subtype of the next one, which is a subtype of the next one and so on. The last symbol, if it had not been already defined as a subtype of some type, becomes a direct subtype of the master supertype t. Example:
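@(defex d e)
@(defex a b c d)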
The first directive defines d as a subtype of e, and e as a subtype of t. The second defines a as a subtype of b, b as a subtype of c, and c as a subtype of d, which is already defined as a subtype of e. Thus a is now a subtype of e. The above can be condensed to:
@(defex a b c d e)
Exception types have a pervasive scope. Once a type relationship is introduced, it is visible everywhere. Moreover, the defex directive is destructive, meaning that the supertype of a type can be redefined. This is necessary so that something like the following works right:
@(defex gorilla ape)
@(defex ape primate)
These directives are evaluated in sequence. So after the first one, the ape type has the type t as its immediate supertype. But in the second directive, ape appears again, and is assigned the primate supertype, while retaining gorilla as a subtype. This situation could be diagnosed as an error, forcing the programmer to reorder the statements, but instead TXR obliges. However, there are limitations. It is an error to define a subtype-supertype relationship between two types if they are already connected by such a relationship, directly or transitively. So the following definitions are in error:
@(defex a b)
@(defex b c)
@(defex a c)@# error: a is already a subtype of c, through b
@(defex x y)
@(defex y x)@# error: circularity; y is already a supertype of x.
The assert directive requires the remaining query or subquery which follows it to match. If the remainder fails to match, the assert directive throws an exception. If the directive is simply
@(assert)
then it throws an exception of type assert, which is a subtype of error. The assert directive also takes arguments similar to the throw directive: an exception symbol and additional arguments which are bind expressions, and may be unbound variables. The following assert directive, if it triggers, will throw an exception of type foo, with arguments 1 and "2":
@(assert foo 1 "2")
Example:
@(collect)
Important Header
----------------
@(assert)
Foo: @a, @b
@(end)
Without the assertion in place, if the Foo: @a, @b part does not match, then the entire interior of the @(collect) clause fails, and the collect continues searching for another match.
With the assertion in place, if the text "Important Header" and its underline match, then the remainder of the collect body must match, otherwise an exception is thrown. Now the program will not silently skip over any Important Header sections due to a problem in its matching logic. This is particularly useful when the matching deals with numerous varied cases, all of which must be handled.
There is a horizontal assert directive also. For instance:
abc@(assert)d@x
asserts that if the prefix "abc" is matched, then it must be followed by a successful match for "d@x", or else an exception is thrown.
If the exception is not handled, and is derived from error then TXR issues diagnostics on the *stderr* stream and terminates. If the exception is derived from warning and not handled, TXR issues a diagnostic on *stderr* after which control returns to the assert directive. Control silently returns to the assert directive if an exception of any other kind is not handled.
When control returns to assert due to an unhandled exception, it behaves like a failed match, similarly to the require directive.
The TXR language contains an embedded Lisp dialect called TXR Lisp.
This language is exposed in TXR in a number of ways.
In any situation that calls for an expression, a Lisp expression can be used, if it is preceded by the @ character. The Lisp expression is evaluated and its value becomes the value of that expression. Thus, TXR directives are embedded in literal text using @, and Lisp expressions are embedded in directives using @ also.
Furthermore, certain directives evaluate Lisp expressions without requiring @. These are @(do), @(require), @(assert), @(if) and @(next).
TXR Lisp code can be placed into files. On the command line, TXR treats files with a ".tl" or ".tlo" suffix as TXR Lisp source or compiled code, and the @(load) directive does also.
TXR also provides an interactive listener for Lisp evaluation.
Lastly, TXR Lisp expressions can be evaluated via the command line, using the -e and -p options.
Examples:
Bind variable a to the integer 4:
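@(bind a @(+ 2 2))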
Bind variable b to the standard input stream. Note that @ is not required on a Lisp variable:
@(bind b *stdin*)
Define several Lisp functions inside @(do):
@(do
   (defun add (x y) (+ x y))

   (defun occurs (item list)
     (cond ((null list) nil)
           ((atom list) (eql item list))
           (t (or (eq (first list) item)
                  (occurs item (rest list)))))))
Trigger a failure unless previously bound variable answer is greater than 42:
@(require (> (int-str answer) 42))
TXR Lisp is a small and simple dialect, like Scheme, but much more similar to Common Lisp than Scheme. It has separate value and function binding namespaces, like Common Lisp (and thus is a Lisp-2 type dialect), and represents Boolean true and false with the symbols t and nil (note the case sensitivity of identifiers denoting symbols!). Furthermore, the symbol nil is also the empty list, which terminates nonempty lists.
TXR Lisp has lexically scoped local variables and dynamic global variables, similarly to Common Lisp, including the convention that defvar marks symbols for dynamic binding in local scopes. Lexical closures are supported. TXR Lisp also supports global lexical variables via defvarl.
Functions are lexically scoped in TXR Lisp; they can be defined in the pervasive global environment using defun or in local scopes using flet and labels.
Much of the TXR Lisp syntax has been introduced in the previous sections of the manual, since directive forms are based on it. There is some additional syntax that is useful in TXR Lisp programming.
Symbol tokens in TXR Lisp, called lidents (Lisp identifiers), have a similar syntax to the bident (braced identifier) in the TXR pattern language. A lident may consist of all the same characters, as well as the / (slash) character, which may not be used in a bident. Thus a lident may consist of these characters, in addition to letters, numbers and underscores:
! $ % & * + - < = > ? \ ~ /
and may not look like a number.
A lident may also include all of the Unicode characters which are permitted in a bident.
The one character which is allowed in a lident but not in a bident is / (forward slash).
A lone / is a valid lident and consequently a symbol token in TXR Lisp. The token /abc/ is also a symbol, and, unlike in a braced expression, is not a regular expression. In TXR Lisp expressions, regular expressions are written with a leading #.
If a symbol name contains a colon, the lident characters, if any, before that colon constitute the package prefix.
For example, the syntax foo:bar denotes the bar symbol in the foo package.
It is a syntax error to read a symbol whose package doesn't exist.
If the package exists, but the symbol name doesn't exist in that package, then the symbol is interned in that package.
If the package name is an empty string (the colon is preceded by nothing), the package is understood to be the keyword package. The symbol is interned in that package.
The syntax :test denotes the symbol test in the keyword package, the same as keyword:test.
Symbols in the keyword package are self-evaluating. This means that when a keyword symbol is evaluated as a form, the value of that form is the keyword symbol itself. Exactly two non-keyword symbols also have this special self-evaluating behavior: the symbols t and nil in the user package, whose fully qualified names are usr:t and usr:nil.
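For example:
:test -> :test
t -> t
nil -> nil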
The syntax @foo:bar denotes the meta prefix @ being applied to the foo:bar symbol, not to a symbol in the @foo package.
The syntax #:bar denotes an uninterned symbol named bar, described in the next section.
In ANSI Common Lisp, the foo:bar syntax does not intern the symbol bar in the foo package; the symbol must exist and be an exported symbol, or else the syntax is erroneous. In ANSI Common Lisp, the syntax foo::bar does intern bar in the foo package. TXR's package system has no double-colon syntax, and lacks the concept of exported symbols.
Uninterned symbols are written with the #: prefix, followed by zero or more lident characters. When an uninterned symbol is read, a new, unique symbol is constructed, with the specified name. Even if two uninterned symbols have the same name, they are different objects. The make-sym and gensym functions produce uninterned symbols.
"Uninterned" means "not entered into a package". Interning refers to a process which combines package lookup with symbol creation, which ensures that multiple occurrences of a symbol name in written syntax are all converted to the same object: the first occurrence creates the symbol and associates it with its name in a package. Subsequent occurrences do not create a new symbol, but retrieve the existing one.
An expression may be preceded by the @ (at sign) character. If the expression is an atom, then this is a meta-atom, otherwise it is a meta-expression.
When the atom is a symbol, this is also called a meta-symbol and in situations when such a symbol behaves like a variable, it is also referred to as a meta-variable.
When the atom is an integer, the meta-atom expression is called a meta-number.
Meta-atom and meta-expression expressions have no evaluation semantics; evaluating them throws an exception. They play a syntactic role in the op operator, which makes use of meta-variables and meta-numbers, and in structural pattern matching, which uses meta-variables as pattern variables and whose operator vocabulary is based on meta-expressions.
Meta-expressions also appear in the quasiliteral notation.
In other situations, application code may assign meaning to meta syntax as the programmer sees fit.
Meta syntax is defined as a shorthand notation, as follows:
If X is the syntax of an atom, such as a symbol, string or vector, then @X is a shorthand for the expression (sys:var X). Here, sys:var refers to the var symbol in the system package.
If X is a compound expression, either (...) or [...], then @X is a shorthand for the expression (sys:expr X).
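These shorthands can be observed by quoting meta syntax and taking the resulting object apart:
(car '@abc) -> sys:var
(cadr '@abc) -> abc
(car '@(+ 2 2)) -> sys:expr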
The behavior of @ followed by the syntax of a floating-point constant introduced by a leading decimal point, not preceded by digits, is unspecified. Examples of this are @.123 and @.123E+5.
The behavior of @ followed by the syntax of a floating-point expression in E notation, which lacks a decimal point, is also unspecified. An example of this is @12E5.
It is a syntax error for @ to be followed by what appears to be a floating-point constant consisting of a decimal point flanked by digits on both sides. For instance @1.2 is rejected.
A meta-expression followed by a period, and the syntax of another object is otherwise interpreted as a referencing dot expression. For instance @1.E3 denotes (qref @1 E3) which, in turn, denotes (qref (sys:var 1) E3), even though the unprefixed character sequence 1.E3 is otherwise a floating-point constant.
Unlike other major Lisp dialects, TXR Lisp allows a consing dot with no forms preceding it. This construct simply denotes the form which follows the dot. That is to say, the parser implements the following transformation:
(. expr) -> expr
This is convenient in writing function argument lists that only take variable arguments. Instead of the syntax:
(defun fun args ...)
the following syntax can be used:
(defun fun (. args) ...)
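For example, a variadic lambda written using this syntax collects all of its arguments into a list:
((lambda (. args) args) 1 2 3) -> (1 2 3)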
When a lambda form is printed, it is printed in the following style.
(lambda nil ...) -> (lambda () ...)
(lambda sym ...) -> (lambda (. sym) ...)
(lambda (sym) ...) -> (lambda (sym) ...)
In no other circumstances is nil printed as (), or an atom sym as (. sym).
A dot token which is flanked by expressions on both sides, without any intervening whitespace, is the referencing dot, and not the consing dot. The referencing dot is a syntactic sugar which translates to the qref syntax ("quoted ref"). When evaluated as a form, this syntax denotes structure access; see Structures. However, it is possible to put this syntax to use for other purposes, in other contexts.
;; a, b and c may be almost any expressions
a.b <--> (qref a b)
a.b.c <--> (qref a b c)
a.(qref b c) <--> (qref a b c)
(qref a b).c <--> (qref (qref a b) c)
That is to say, this dot operator constructs a qref expression out of its left and right arguments. If the right argument of the dot is already a qref expression (whether produced by another instance of the dot operator, or expressed directly) it is merged. This requires the qref dot operator to be right-to-left associative, so that a.b.c works by first translating b.c to (qref b c), and then adjoining a to produce (qref a b c).
If the referencing dot is immediately followed by a question mark, it forms a single token, which produces the following syntactic variation, in which the following item is annotated as a list headed by the symbol t:
a.?b <--> (t a).b <--> (qref (t a) b)
a.?b.?c <--> (t a).(t b).c <--> (qref (t a) (t b) c)
a.?(b) <--> (t a).(b) <--> (qref (t a) (b))
(a).?b <--> (t (a)).b <--> (qref (t (a)) b)
This syntax denotes null-safe access to structure slots and methods. a.?b means that a may evaluate to nil, in which case the expression yields nil; otherwise, a must evaluate to a struct which has a slot b, and the expression denotes access to that slot. Similarly, a.?(b 1) means that if a evaluates to nil, the expression yields nil; otherwise, a is treated as a struct object whose method b is invoked with argument 1, and the value returned by that method becomes the value of the expression.
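For instance, given a structure type point with slots x and y (a hypothetical type defined here purely for illustration):

(defstruct point nil
  x y)

(let ((p (new point x 1 y 2))
      (q nil))
  (list p.?x q.?x))
-> (1 nil)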
Integer tokens cannot be involved in this syntax, because they form floating-point constants when juxtaposed with a dot. Such ambiguous uses of floating-point tokens are diagnosed as syntax errors:
(a.4) ;; error: cramped floating-point literal
(a .4) ;; good: a followed by 0.4
Closely related to the referencing dot syntax is the unbound referencing dot. This is a dot which is flanked by an expression on the right, without any intervening whitespace, but is not preceded by an expression. Rather, it is preceded by whitespace, or some punctuation such as [, ( or '. This is a syntactic sugar which translates to uref syntax:
.a <--> (uref a)
.a.b <--> (uref a b)
.a.?b <--> (uref (t a) b)
If the unbound referencing dot is itself combined with a question mark to form the .? token, then the translation to uref is as follows:
.?a <--> (uref t a)
.?a.b <--> (uref t a b)
.?a.?b <--> (uref t a (t b))
When the unbound referencing dot is applied to a dotted expression, this can be understood as a conversion of qref to uref.
Indeed, this is exactly what happens if the unbound dot is applied to an explicit qref expression:
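.(qref a b) <--> (uref a b)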
The unbound referencing dot takes its name from the semantics of the uref macro, which produces a function that implements late binding of an object to a method slot. Whereas the expression obj.a.b denotes accessing object obj to retrieve slot a and then accessing slot b of the object from that slot, the expression .a.b represents a "disembodied" reference: it produces a function which takes an object as an argument and then performs the implied slot referencing on that argument. When the function is called, it is said to bind the referencing to the object. Hence that referencing is "unbound".
Whereas the expression .a produces a function whose argument must be an object, .?a produces a function whose argument may be nil. The function detects this case and returns nil.
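The function produced by the unbound referencing dot combines conveniently with mapping functions. For instance, reusing the hypothetical point type from the earlier example:
(mapcar .x (list (new point x 1 y 2)
                 (new point x 3 y 4)))
-> (1 3)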
In the quasiquote syntax ^qq-template, the form qq-template is considered to be a quasiquote template. The template is considered to be a literal structure, except that it may contain the notations ,expr and ,*expr which denote non-constant parts.
A quasiquote gets translated into code which, when evaluated, constructs the structure implied by qq-template, taking into account the unquotes and splices.
A quasiquote also processes nested quasiquotes specially.
If qq-template does not contain any unquotes or splices (which match its level of nesting), or is simply an atom, then ^qq-template is equivalent to 'qq-template; in other words, it is like an ordinary quote. For instance ^(a b ^(c ,d)) is equivalent to '(a b ^(c ,d)). Although there is an unquote ,d, it belongs to the inner quasiquote ^(c ,d), and the outer quasiquote does not have any unquotes of its own, making it equivalent to a quote.
Dialect Note: in Common Lisp and Scheme, ^form is written `form, and quasiquotes are also informally known as backquotes. In TXR, the backquote character ` is used for quasiliterals.
Note: if a variable is called *x*, then the syntax ,*x* means ,* x*: splice the value of x*. In this situation, whitespace between the comma and the variable name must be used: , *x*.
In other Lisp dialects, like Scheme and ANSI Common Lisp, the equivalent syntax is usually ,@ (comma at). The @ character already has an assigned meaning in TXR, so * is used.
However, * is also a character that may appear in a symbol name, which creates a potential for ambiguity. The syntax ,*abc denotes the application of the ,* splicing operator to the symbolic expression abc; to apply the ordinary non-splicing unquote to the symbol *abc, whitespace must be used: , *abc.
In TXR, the unquoting and splicing forms may freely appear outside of a quasiquote template. If they are evaluated as forms, however, they throw an exception:
,(+ 2 2) ;; error!
',(+ 2 2) --> ,(+ 2 2)
In other Lisp dialects, a comma not enclosed by backquote syntax is treated as a syntax error by the reader.
'#(1 2 3)
The #(1 2 3) literal is turned into a vector atom right in the TXR parser, and this atom is being quoted: this is (quote atom) syntactically, which evaluates to atom.
When a vector is quasiquoted, this is a case of ^atom which evaluates to atom.
A vector can be quasiquoted, for example:
^#(1 2 3)
Unquotes can occur within a quasiquoted vector:
(let ((a 42))
  ^#(1 ,a 3)) ; value is #(1 42 3)
In this situation, the ^#(...) notation produces code which constructs a vector.
The vector in the following example is also a quasivector. It contains unquotes, and though the quasiquote is not directly applied to it, it is embedded in a quasiquote:
(let ((a 42))
  ^(a b c #(d ,a))) ; value is (a b c #(d 42))
Hash-table literals have two parts: the list of hash construction arguments and the key-value pairs. For instance:
#H((:eql-based) (a 1) (b 2))
where (:eql-based) indicates that this hash table's keys are treated using eql equality, and (a 1) and (b 2) are the key/value entries. Hash literals may be quasiquoted. In quasiquoting, the arguments and pairs are treated as separate syntax; it is not one big list. So the following is not a possible way to express the above hash:
;; not supported: splicing across the entire syntax
(let ((hash-syntax '((:eql-based) (a 1) (b 2))))
  ^#H(,*hash-syntax))
This is correct:
;; fine: splicing hash arguments and contents separately
(let ((hash-args '(:eql-based))
      (hash-contents '((a 1) (b 2))))
  ^#H(,hash-args ,*hash-contents))
Example:
(eval (let ((a 3)) ^`abc @,a @{,a} @{(list 1 2 ,a)}`))
-> "abc 3 3 1 2 3"
When a struct literal is read, an instance of the denoted struct type is constructed as if by a call to make-struct with an empty plist argument, followed by a sequence of assignments which store into each slot the corresponding value expression.
An empty list can be specified as nil or (), which defaults to a hash table based on the equal function, with no weak semantics or user data.
The entire syntax following #H may be an empty list; however, that empty list may not be specified as nil; the empty parentheses notation is required.
The hash table's key-value contents are specified as zero or more two-element lists, whose first element specifies the key and whose second specifies the value. Both expressions are literal objects, not subject to evaluation.
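For example, the value stored under the key foo in the following hash is the unevaluated three-element list (+ 1 2), not the integer 3:
(gethash #H(() (foo (+ 1 2))) 'foo) -> (+ 1 2)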
Buffers may be constructed by the make-buf function, and other means such as the ffi-get function.
Note that the #b prefix is also used for binary numbers. In that syntax, it is followed by an optional sign, and then a mixture of one or more of the digits 0 or 1.
A tree node is an object of type tnode. Every tnode has three elements: a key, a left link and a right link. They may be objects of any type. If the tree node literal syntax omits any of these, they default to nil.
The list syntax which follows #T may be empty. If so, it cannot be written as nil.
The first element of the #T syntax, if present, must be a list of zero to three elements. These elements are symbols giving the names of the tree object's key abstraction functions. The keyfun specifies the key function which is applied to each element to retrieve its key. If it is omitted, the identity function is used, so that each element is its own key. The lessfun specifies the name of the comparison function by which keys are compared for inequality. It defaults to less. The equalfun specifies the function by which keys are compared for equality. It defaults to equal. A symbol which is specified as the name of any of these three special functions must be an element of the list stored in the special variable *tree-fun-whitelist*, otherwise the tree literal is diagnosed as erroneous. Note: this is due to security considerations, since these three functions are executed during the processing of tree syntax.
A tree object is constructed from a tree literal by first creating an empty tree endowed with the three key abstraction functions that are indicated in the syntax, either explicitly or as defaults. Then, every element object is constructed from its respective literal syntax and inserted into the tree.
Duplicate objects are preserved. For instance the tree literal #T(() 1 1 1) specifies a tree with three nodes which have the same key. Duplicates appear in the tree in the order that they appear in the literal.
The implementation of JSON syntax is based on, and intended to conform with, the IETF RFC 8259 document. Only TXR's extensions to JSON syntax, and the correspondence between JSON syntax and Lisp, are described in this manual.
The json-syntax is translated into a TXR Lisp object as follows.
A JSON string corresponds to a Lisp string. A JSON number corresponds to a Lisp floating-point number. A JSON array corresponds to a Lisp vector. A JSON object corresponds to an equal-based hash table.
The JSON Boolean symbols true and false translate to the Lisp symbols t and nil, respectively, those being the standard ones in the usr package.
The JSON symbol null maps to the null symbol in the usr package.
The #Jjson-syntax expression produces the object:
(json quote lisp-object)
where lisp-object is the Lisp value which corresponds to the json-syntax.
Similarly, but with a key difference, the #J^json-syntax expression produces the object:
(json sys:qquote lisp-object)
in which quote has been replaced with sys:qquote.
The json symbol is bound as a macro, which is expanded when a #J expression is evaluated.
The following remarks indicate special treatment and extensions in the processing of JSON. Similar remarks regarding the production of JSON are given under the put-json function.
When an invalid UTF-8 byte is encountered inside a JSON string, its value is mapped into the code point range U+DC01 to U+DCFF. That byte is consumed, and decoding continues with the next byte. This treatment is consistent with the treatment of invalid UTF-8 bytes in TXR Lisp literals and I/O streams. If the byte 0x00 (ASCII NUL), which is valid UTF-8, occurs in a JSON string, it is mapped to U+DC00, TXR's pseudo-null character. This treatment is consistent with TXR string literals and I/O streams.
The JSON escape sequence \u0000 denoting the U+0000 NUL character is also converted to U+DC00.
TXR Lisp does not impose the restriction that the keys in a JSON object must be strings: #J{1:2,true:false} is accepted.
TXR Lisp allows the circle notation to occur within JSON syntax. See the section Notation for Circular and Shared Structure.
TXR Lisp allows for JSON syntax to be quasiquoted, and provides two extensions for writing unquotes and splicing unquotes. Within a JSON quasiquote, the ~ (tilde) character introduces a Lisp expression whose value is to be substituted at that point. Thus, the tilde serves the role of the unquoting comma used in Lisp quasiquotes. Splicing is indicated by the character sequence ~*, which introduces a Lisp expression that is expected to produce a list, whose elements are interpolated into the JSON value.
Note: quasiquoting allows Lisp values to be introduced into the resulting object which are outside of the JSON type system, such as integers, characters, symbols or structures. These objects have no representation in JSON syntax.
;; Basic JSON:
#Jtrue -> t
#Jfalse -> nil
(list #J true #Jtrue #Jfalse) -> (t t nil)
#J[1, 2, 3.14] -> #(1.0 2.0 3.14)
#J{"foo":"bar"} -> #H(() ("foo" "bar"))
;; Quoting JSON shows the json expression
'#Jfalse -> (json quote ())
'#Jtrue -> (json quote t)
'#J["a", true, 3.0] -> (json quote #("a" t 3.0))
'#J^[~(+ 2 2), 3] -> (json sys:qquote #(,(+ 2 2) 3.0))
;; Circle notation:
#J[#1="abc", #1#, #1#] -> #("abc" "abc" "abc")
;; JSON Quasiquote:
#J^[~*(list 1.0 2.0 3.0), ~(* 2.0 2), 5.0]
--> #(1.0 2.0 3.0 4.0 5.0)
;; Lisp quasiquote around JSON quote: requires evaluation round.
^#J[~*(list 1.0 2.0 3.0), ~(* 2.0 2), 5.0]
--> (json quote #(1.0 2.0 3.0 4.0 5.0))
(eval ^#J[~*(list 1.0 2.0 3.0), ~(* 2.0 2), 5.0])
--> #(1.0 2.0 3.0 4.0 5.0)
The dotdot notation A .. B translates to (rcons A B), and so for instance (a b .. (c d) e .. f . g) means (a (rcons b (c d)) (rcons e f) . g).
The rcons function constructs a range object, which denotes a pair of values. Range objects are most commonly used for referencing subranges of sequences.
For instance, if L is a list, then [L 1 .. 3] computes a sublist of L consisting of elements 1 through 2 (counting from zero).
Note that if this notation is used in the dot position of an improper list, the transformation still applies. That is, the syntax (a . b .. c) is valid and produces the object (a . (rcons b c)) which is another way of writing (a rcons b c), which is quite probably nonsense.
The notation's .. operator associates right to left, so that a..b..c denotes (rcons a (rcons b c)).
Note that range objects are not printed using the dotdot notation. A range literal has the syntax of a two-element list, prefixed by #R. (See Range Literals above.)
In any context where the dotdot notation may be used, and where it is evaluated to its value, a range literal may also be specified. If an evaluated dotdot notation specifies two constant expressions, then an equivalent range literal can replace it. For instance the form [L 1 .. 3] can also be written [L #R(1 3)]. The two are syntactically different, and so if these expressions are being considered for their syntax rather than value, they are not the same.
The DWIM brackets notation [...] denotes an invocation of the dwim operator. For instance if foo is a variable which holds a function object, then [foo 3] can be used to call it, instead of (call foo 3). If foo is a vector, then [foo 3] retrieves the fourth element, like (vecref foo 3). Indexing over lists, strings and hash tables is possible, and the notation is assignable.
Furthermore, any arguments enclosed in [] which are symbols are treated according to a modified namespace lookup rule.
More details are given in the documentation for the dwim operator.
The first position of an ordinary Lisp-2 style compound form is expected to have a function or operator name. Then arguments follow. There may also be an expression in the dotted position, if the form is a function call.
If the form is a function call then the arguments are evaluated. If any of the arguments are symbols, they are treated according to Lisp-2 namespacing rules.
A function name may be a symbol, or else any of the syntactic forms given in the description of the function func-get-name.
If there is an expression in the dotted position of a function call expression, it is also evaluated, and the resulting value is involved in the function call in a special way.
Firstly, note that a compound form cannot be used in the dot position, for obvious reasons, namely that (a b c . (foo z)) does not mean that there is a compound form in the dot position, but denotes an alternate spelling for (a b c foo z), where foo behaves as a variable.
If the dot position of a compound form is an atom, then the behavior may be understood according to the following transformations:
(f a b c ... . x) --> (apply (fun f) a b c ... x)
[f a b c ... . x] --> [apply f a b c ... x]
In addition to atoms, meta-expressions and meta-symbols can appear in the dot position, even though their underlying syntax consists of a compound expression. This works according to a transformation pattern which superficially appears to be the same as that for atoms:
(f a b c ... . @x) --> (apply (fun f) a b c ... @x)
However, in this situation, the @x is actually the form (sys:var x) and the dotted form is actually a proper list. The transformation is in fact taking place over a proper list, like this:
(f a b c ... sys:var x) --> (apply (fun f) a b c ... (sys:var x))
That is to say, the TXR Lisp form expander reacts to the presence of a sys:var or sys:expr atom embedded in the form. That symbol and the items which follow it are wrapped in an additional level of nesting, converted into a single compound-form element.
Effectively, in all these cases, the dot notation constitutes a shorthand for apply.
Examples:
;; a contains 3
;; b contains 4
;; c contains #(5 6 7)
;; s contains "xyz"
(foo a b . c) ;; calls (foo 3 4 5 6 7)
(foo a) ;; calls (foo 3)
(foo . s) ;; calls (foo #\x #\y #\z)
(list . a) ;; yields 3
(list a . b) ;; yields (3 . 4)
(list a . c) ;; yields (3 5 6 7)
(list* a c) ;; yields (3 . #(5 6 7))
(cons a . b) ;; error: cons isn't variadic.
(cons a b . c) ;; error: cons requires exactly two arguments.
[foo a b . c] ;; calls (foo 3 4 5 6 7)
[c 1] ;; indexes into vector #(5 6 7) to yield 6
(call (op list 1 . @1) 2) ;; yields (1 . 2)
Note that the atom in the dot position of a function call may be a symbol macro. Since the semantics works as if by transformation to an apply form in which the original dot position atom is an ordinary argument, the symbol macro may produce a compound form.
Thus:
(symacrolet ((x 2))
  (list 1 . x)) ;; yields (1 . 2)

(symacrolet ((x (list 1 2)))
  (list 1 . x)) ;; yields (1 1 2)
That is to say, the expansion of x is not substituted into the form (list 1 . x) but rather the transformation to apply syntax takes place first, and so the substitution of x takes place in a form resembling (apply (fun list) 1 x).
Dialect Note:
In some other Lisp dialects like ANSI Common Lisp, the improper list syntax may not be used as a function call; a function called apply (or similar) must be used for application even if the expression which gives the trailing arguments is a symbol. Moreover, applying sequences other than lists is not supported.
TXR Lisp allows macros to be called using forms which are improper lists. These forms are simply destructured by the usual macro parameter list destructuring. To be callable this way, the macro must have an argument list which specifies a parameter match in the dot position. This dot position must either match the terminating atom of the improper list form, or else match the trailing portion of the improper list form.
For instance if a macro mac is defined as
(defmacro mac (a b . c) ...)
then it may not be invoked as (mac 1 . 2) because the required argument b is not satisfied, and so the 2 argument cannot match the dot position c as required. The macro may be called as (mac 1 2 . 3) in which case c receives the form 3. If it is called as (mac 1 2 3 . 4) then c receives the improper list form 3 . 4.
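The destructuring becomes visible if mac is given a body which simply collects its three parameters into a list (a sketch):
(defmacro mac (a b . c)
  ^(list ',a ',b ',c))
(mac 1 2 . 3) -> (1 2 3)
(mac 1 2 3 . 4) -> (1 2 (3 . 4))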
TXR Lisp supports a printed notation called circle notation which accurately articulates the representation of objects which contain shared substructures as well as circular references. The notation is supported as a means of input, and is also optionally produced as output, controlled by the *print-circle* variable.
Ordinarily, shared substructure in printed objects is not evident, except in the case of multiple occurrences of interned symbols, in whose semantics it is implicit that they refer to the same object. Other shared structure is printed as separate copies which look like distinct objects. For instance, the object produced by (let ((shared '(1 2))) (list shared shared)) is printed as ((1 2) (1 2)), where it is not clear that the two occurrences of (1 2) are actually the same object. Under the circle notation, this object can be represented as (#5=(1 2) #5#). The #5= part introduces a reference label, associating the arbitrarily chosen nonnegative integer 5 with the object which follows. The subsequent notation #5# simply refers to the object labeled by 5, reproducing that object by reference. The result is a two-element list which has the same (1 2) in two places.
Circular structure presents a greater challenge to printing: namely, if it is printed by a naive recursive descent, it results in infinite output, and possibly stack exhaustion due to recursion. The circle notation detects and handles circular references. For instance, the object produced by (let ((c (list 1))) (rplacd c c)) produces a circular list which looks like an infinite list of 1's: (1 1 1 1 ...). This cannot be printed. However, under the circle notation, it can be represented as #1=(1 . #1#). The entire object itself is labeled by the integer 1. Then, enclosed within the syntax of that labeled object itself, a reference occurs to the label. This circular label reference represents the corresponding circular reference in the object.
A detailed description of the notational elements follows:
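#n=object
Introduces a label definition: the nonnegative decimal integer n becomes a label for the object whose syntax immediately follows.

#n#
Denotes the very object which is labeled by the integer n under a #n= definition.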
There may be no more than one definition for a given label within the syntactic scope being parsed, otherwise a syntax error occurs. In TXR pattern language code, an entire source file is parsed as one unit, and so scope for the circular notation's references is the entire source file. Files processed by @(include) have their own scope. The scope for labels in TXR Lisp source code is the top-level expression in which they appear. Consequently, references in one TXR Lisp top-level expression cannot reach definitions in another.
Note:
Circular notation can span hash-table literals. The syntax #1=#H((:eql-based) (#1# #1#)) denotes an eql-based hash table which contains one entry, in which that same table itself is both the key and value. This kind of circularity is not supported for equal-based hash tables. The analogous syntax #1=#H(() (#1# #1#)) produces a hash table in an inconsistent state.
Dialect Note:
Circle notation is taken from Common Lisp, intended to be unsurprising to users familiar with that language. The implementation is based on descriptions in the ANSI Common Lisp document, judiciously taking into account the content of the X3J13 Cleanup Issues named PRINT-CIRCLE-STRUCTURE:USER-FUNCTIONS-WORK and PRINT-CIRCLE-SHARED:RESPECT-PRINT-CIRCLE.
The #; notation reads the object whose syntax follows, and discards it. This is useful for temporarily "commenting out" an expression.
Notes:
Whereas it is valid for a TXR Lisp source file to be empty, it is a syntax error if a TXR Lisp source file contains nothing but one or more objects which are each suppressed by a preceding #;. In the interactive listener, an input line consisting of nothing but commented-out objects is similarly a syntax error.
The notation does not cascade; consecutive occurrences of #; trigger a syntax error.
The notation interacts with the circle notation. Firstly, if an object which is erased by #; contains circular-referencing instances of the label notation, those instances refer to nil. Secondly, commented-out objects may introduce labels which are subsequently referenced in expr. An example of the first situation occurs in:
#;(#1=(#1#))
Here the #1# label is a circular reference because it refers to an object which is a parent of the object which contains that reference. Such a reference is only satisfied by a "backpatching" process once the entire surrounding syntax is processed to the top level. The erasure perpetrated by #; causes the #1# label reference to be replaced by nil, and therefore the labeled object is the object (nil).
An example of the second situation is
#;(#2=(a b c)) #2#
Here, even though the expression (#2=(a b c)) is suppressed, the label definition which it has introduced persists into the following object, where the label reference #2# resolves to (a b c).
A combination of the two situations occurs in
#;(#1=(#1#)) #1#
which yields (nil). This is because the #1= label is available; but the earlier #1# reference, being a circular reference inside an erased object, had lapsed to nil.
In ancient Lisp in the 1960's, it was not possible to apply the operations car and cdr to the nil symbol (empty list), because it is not a cons cell. In the InterLisp dialect, this restriction was lifted: these operations were extended to accept nil (and return nil). The convention was adopted in other Lisp dialects such as MacLisp and eventually in Common Lisp. Thus there exists an object which is not a cons, yet which takes car and cdr.
In TXR Lisp, this relaxation is extended further. For the sake of convenience, the operations car and cdr are made to work with strings and vectors:
(cdr "") -> nil
(car "") -> nil
(car "abc") -> #\a
(cdr "abc") -> "bc"
(cdr #(1 2 3)) -> #(2 3)
(car #(1 2 3)) -> 1
Moreover, structure types which define the methods car, cdr and nullify can also be treated in the same way.
The ldiff function is also extended in a special way. When the right parameter is a non-list sequence, ldiff uses the equal equality test rather than eq for detecting the tail of the list.
(ldiff "abcd" "cd") -> (#\a #\b)
The ldiff operation starts with "abcd" and repeatedly applies cdr to produce "bcd" and "cd", until the suffix is equal to the second argument: (equal "cd" "cd") yields true.
Operations based on car, cdr and ldiff, such as keep-if and remq extend to strings and vectors.
Most derived list processing operations such as remq or mapcar obey the following rule: the returned object follows the type of the leftmost input list object. For instance, if one or more sequences are processed by mapcar, and the leftmost one is a character string, the function is expected to return characters, which are converted to a character string. However, in the event that the objects produced cannot be assembled into that type of sequence, a list is returned instead.
For example [mapcar list "ab" "12"] returns ((#\a #\b) (#\1 #\2)), because a string cannot hold lists of characters. However [mappend list "ab" "12"] returns "a1b2".
The lazy versions of these functions such as mapcar* do not have this behavior; they produce lazy lists.
TXR Lisp implements a unified paradigm for iterating over sequence-like container structures and abstract spaces such as bounded and unbounded ranges of integers. This concept is based around an iterator abstraction which is directly compatible with Lisp cons-cell traversal in the sense that when iteration takes place over lists, the iterator instance is nothing but a cons cell.
An iterator is created using the constructor function iter-begin which takes a single argument. The argument denotes a space to be traversed; the iterator provides the means for that traversal.
When the iter-begin function is applied to a list (a cons cell or the nil object), the return value is that object itself. The remaining functions in the iterator API then behave like aliases for list processing functions. The iter-more function behaves like identity, iter-item behaves like car and iter-step behaves like cdr.
For example, the following loops not only produce identical behavior, but the iter variable steps through the cons cells in the same manner in both:
;; print all symbols in the list (a b c d):
(let ((iter '(a b c d)))
  (while iter
    (prinl (car iter))
    (set iter (cdr iter))))
;; likewise:
(let ((iter (iter-begin '(a b c d))))
  (while (iter-more iter)
    (prinl (iter-item iter))
    (set iter (iter-step iter))))
There are three important differences.
Firstly, both examples will still work if the list (a b c d) is replaced by a different kind of sequence, such as the string "abcd" or the vector #(a b c d). However, the former example will not execute efficiently on these objects. The reason is that the cdr function will construct successive suffixes of the string or vector object. That requires not only the allocation of memory, but changes the running time complexity of the loop from linear to quadratic.
Secondly, the former example with car/cdr will not work correctly if the sequence is an empty non-list sequence, like the null string or empty vector. Rectifying this problem requires the nullify function to be used:
;; print all characters of the string "abcd":
(let ((iter (nullify "abcd")))
  (while iter
    (prinl (car iter))
    (set iter (cdr iter))))
The nullify function converts empty sequences of all kinds into the empty list nil.
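For example:
(nullify "") -> nil
(nullify #()) -> nil
(nullify "abc") -> "abc"
(nullify '(1 2)) -> (1 2)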
Thirdly, the second example will work even if the input list is replaced with certain objects which are not sequences at all:
;; Print the integers from 0 to 3
(let ((iter (iter-begin 0..4)))
  (while (iter-more iter)
    (prinl (iter-item iter))
    (set iter (iter-step iter))))
;; Print incrementing integers starting at 1,
;; breaking out of the loop after 100.
(let ((iter (iter-begin 1)))
  (while (iter-more iter)
    (if (eql 100 (prinl (iter-item iter)))
      (return))
    (set iter (iter-step iter))))
In TXR Lisp, numerous functions that appear as list processing functions in other contemporary Lisp dialects, and historically, are actually sequence processing functions based on the above iterator paradigm.
In TXR Lisp, sequences (strings, vectors and lists) as well as hashes and regular expressions can be used as functions everywhere, not just with the DWIM brackets.
Sequences work as one- or two-argument functions. With a single argument, an element is selected by position and returned. With two arguments, a range is extracted and returned.
Moreover, when a sequence is used as a function of one argument, and the argument is a range object rather than an integer, then the call is equivalent to the two-argument form. This is the basis for array slice syntax like ["abc" 0..1].
Hashes also work as one or two argument functions, corresponding to the arguments of the gethash function.
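For example, the one-argument form performs a lookup; the second argument supplies a default value for a missing key, as with gethash:
(let ((h #H(() (a 1) (b 2))))
  (list [h 'a] [h 'c 0]))
-> (1 0)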
A regular expression behaves as a one, two, or three argument function, which operates on a string argument. It returns the leftmost matching substring, or else nil.
Example 1:
(mapcar "abc" '(2 0 1)) -> (#\c #\a #\b)
Here, mapcar treats the string "abc" as a function of one argument (since there is one list argument). This function maps the indices 0, 1 and 2 to the corresponding characters of string "abc". Through this function, the list of integer indices (2 0 1) is taken to the list of characters (#\c #\a #\b).
Example 2:
(call '(1 2 3 4) 1..3) -> (2 3)
Here, the shorthand 1 .. 3 denotes (rcons 1 3). A range used as an argument to a sequence performs range extraction: taking a slice starting at index 1, up to and not including index 3, as if by the call (sub '(1 2 3 4) 1 3).
Example 3:
(call '(1 2 3 4) '(0 2)) -> (1 2)
A list of indices applied to a sequence is equivalent to using the select function, as if (select '(1 2 3 4) '(0 2)) were called.
Example 4:
(call #/b./ "abcd") -> "bc"
Here, the regular expression, called as a function, finds the matching substring "bc" within the argument "abcd".
Similarly to Common Lisp, TXR Lisp is lexically scoped by default, but also has dynamically scoped (a.k.a "special") variables.
When a variable is defined with defvar or defparm, a binding for the symbol is introduced in the global name space, regardless of in what scope the defvar form occurs.
Furthermore, at the time the defvar form is evaluated, the symbol which names the variable is tagged as special.
When a symbol is tagged as special, it behaves differently when it is used in a lexical binding construct like let, and all other such constructs such as function parameter lists. Such a binding is not the usual lexical binding, but a "rebinding" of the global variable. Over the dynamic scope of the form, the global variable takes on the value given to it by the rebinding. When the form terminates, the prior value of the variable is restored. (This is true no matter how the form terminates; even if by an exception.)
Because of this "pervasive special" behavior of a symbol that has been used as the name of a global variable, a good practice is to make global variables have visually distinct names via the "earmuffs" convention: beginning and ending the name with an asterisk.
(defvar *x* 42) ;; *x* has a value of 42

(defun print-x ()
  (format t "~a\n" *x*))

(let ((*x* "abc")) ;; this overrides *x*
  (print-x))       ;; *x* is now "abc" and so that is printed

(print-x) ;; *x* is 42 again and so "42" is printed
The terms bind and binding are used differently in TXR Lisp compared to ANSI Common Lisp. In TXR Lisp binding is an association between a symbol and an abstract storage location. The association is registered in some namespace, such as the global namespace or a lexical scope. That storage location, in turn, contains a value. In ANSI Lisp, a binding of a dynamic variable is the association between the symbol and a value. It is possible for a dynamic variable to exist, and not have a value. A value can be assigned, which creates a binding. In TXR Lisp, an assignment is an operation which transfers a value into a binding, not one which creates a binding.
In ANSI Lisp, a dynamic variable can exist which has no value. Accessing the value signals a condition, but storing a value is permitted; doing so creates a binding. By contrast, in TXR Lisp a global variable cannot exist without a value. If a defvar form doesn't specify a value, and the variable doesn't exist, it is created with a value of nil.
Unlike ANSI Common Lisp, TXR Lisp has global lexical variables in addition to special variables. These are defined using defvarl and defparml. The only difference is that when variables are introduced by these macros, the symbols are not marked special, so their binding in lexical scopes is not altered to dynamic binding.
Many variables in TXR Lisp's standard library are global lexicals. Those which are special variables obey the "earmuffs" convention in their naming. For instance s-ifmt, log-emerg and sig-hup are global lexicals, because they provide constant values for which overriding doesn't make sense. On the other hand the standard output stream variable *stdout* is special. Overriding it over a dynamic scope is useful, as a means of redirecting the output of functions which write to the *stdout* stream.
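For instance, output from code which writes to the *stdout* stream can be discarded over a dynamic scope by rebinding the variable to the null device stream *stdnull*:
(let ((*stdout* *stdnull*))
  (prinl "this output is discarded"))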
In Common Lisp, defparm is known as defparameter.
The TXR Lisp feature known as syntactic places allows programs to use the syntax of a form which is used to access a value from an environment or object, as an expression which denotes a place where a value may be stored.
They are almost exactly the same concept as "generalized references" in Common Lisp, and are related to "lvalues" in languages in the C family, or "designators" in Pascal.
A symbol is a syntactic place if it names a variable. If a is a variable, then it may be assigned using the set operator: the form (set a 42) causes a to have the integer value 42.
A compound expression can be a syntactic place, if its leftmost constituent is a symbol which is specially registered, and if the form has the correct syntax for that kind of place, and suitable semantics. Such an expression is a compound place.
An example of a compound place is a car form. If c is an expression denoting a cons cell, then (car c) is not only an expression which retrieves the value of the car field of the cell. It is also a syntactic place which denotes that field as a storage location. Consequently, the expression (set (car c) "abc") stores the character string "abc" in that location. Although the same effect can be obtained with (rplaca c "abc") the syntactic place frees the programmer from having to remember different update functions for different kinds of places. There are various other advantages. TXR Lisp provides a plethora of operators for modifying a place in addition to set. Subject to certain usage restrictions, these operators work uniformly on all places. For instance, the expression (rotate (car x) [str 3] y) causes three different kinds of places to exchange contents, while the three expressions denoting those places are evaluated only once. New kinds of place update macros like rotate are quite easily defined, as are new kinds of compound places.
When a function call form such as the above (car x) is a syntactic place, then the function is called an accessor. This term is used throughout this document to denote functions which have associated syntactic places.
Syntactic places can be macros (global and lexical), including symbol macros. So for instance in (set x 42) the x place can actually be a symbolic macro which expands to, say, (cdr y). This means that the assignment is effectively (set (cdr y) 42).
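A minimal sketch of this, using a lexical symbol macro:
(defvar y (cons 1 2))
(symacrolet ((x (cdr y)))
  (set x 42)) ;; effectively (set (cdr y) 42)
y -> (1 . 42)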
Syntactic places, as well as operators upon syntactic places, are both open-ended. Code can be written quite easily in TXR Lisp to introduce new kinds of places, as well as new place-mutating operators. New places can be introduced with the help of the defplace, define-accessor or defset macros, or possibly the define-place-macro macro in simple cases when a new syntactic place can be expressed as a transformation to the syntax of an existing place. Three ways exist for developing new place update macros (place operators). They can be written using the ordinary macro definer defmacro, with the help of special utility macros called with-update-expander, with-clobber-expander, and with-delete-expander. They can also be written using defmacro in conjunction with the operators placelet or placelet*. Simple update macros similar to inc and push can be written compactly using define-modify-macro.
Unlike generalized references in Common Lisp, TXR Lisp syntactic places support the concept of deletion. Some kinds of places can be deleted, which is an action distinct from (but does not preclude) being overwritten with a value. What exactly it means for a place to be deleted, or whether that is even permitted, depends on the kind of place. For instance a place which denotes a lexical variable may not be deleted, whereas a global variable may be. A place which denotes a hash-table entry may be deleted, and results in the entry being removed from the hash table. Deleting a place in a list causes the trailing items, if any, or else the terminating atom, to move in to close the gap. Users may define new kinds of places which support deletion semantics.
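For instance, a hash-table entry can be deleted through its place using the del operator:
(let ((h (hash)))
  (set (gethash h 'a) 1)
  (del (gethash h 'a)) ;; entry removed from the table
  (gethash h 'a))
-> nil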
To bring about their effect, place operators must evaluate one or more places. Moreover, some of them evaluate additional forms which are not places. Which arguments of a place operator form are places and which are ordinary forms depends on its specific syntax. For all the built-in place operators, the position of an argument in the syntax determines whether it is treated as (and consequently required to be) a syntactic place, or whether it is an ordinary form.
All built-in place operators perform the evaluation of place and non-place argument forms in strict left-to-right order.
Place forms are evaluated not in order to compute a value, but in order to determine the storage location. In addition to determining a storage location, the evaluation of a place form may possibly give rise to side effects. Once a place is fully evaluated, the storage location can then be accessed. Access to the storage location is not considered part of the evaluation of a place. To determine a storage location means to compute some hidden referential object which provides subsequent access to that location without the need for a reevaluation of the original place form. (The subsequent access to the place through this referential object may still require a multi-step traversal of a data structure; minimizing such steps is a matter of optimization.)
Place forms may themselves be compounds, which contain subexpressions that must be evaluated. All such evaluation for the built-in places takes place in left to right order.
Certain place operators, such as shift and rotate, exhibit an unspecified behavior with regard to the timing of the access of the prior value of a place, relative to the evaluation of places which occur later in the same place operator form. Access to the prior values may be delayed until the entire form is evaluated, or it may be interleaved into the evaluation of the form. For example, in the form (shift a b c 1), the prior value of a can be accessed and saved as soon as a is evaluated, prior to the evaluation of b. Alternatively, a may be accessed and saved later, after the evaluation of b or after the evaluation of all the forms. This issue affects the behavior of place-modifying forms whose subforms contain side effects. It is recommended that such forms not be used in programs.
Certain place forms are required to have one or more arguments which are themselves places. The prime example of this, and the only example from among built-in syntactic places, are DWIM forms. A DWIM form has the syntax
(dwim obj-place index [alt])
and the square-bracket-notation equivalent:
[obj-place index [alt]]
Note that not only is the entire form a place, denoting some element or element range of obj-place, but there is the added constraint that obj-place must also itself be a syntactic place.
This requirement is necessary, because it supports the behavior that when the element or element range is updated, then obj-place is also potentially updated.
After the assignment (set [obj 0..3] '("forty" "two")), not only is the range of places denoted by [obj 0..3] replaced by the list of strings ("forty" "two"), but obj may also be overwritten with a new value.
This behavior is necessary because the DWIM brackets notation maintains the illusion of an encapsulated array-like container over several dissimilar types, including Lisp lists. But Lisp lists do not behave as fully encapsulated containers. Some mutations on Lisp lists return new objects, which then have to be stored (or otherwise accepted) in place of the original objects in order to maintain the array-like container illusion.
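The following sketch illustrates this effect on a list:
(defvar obj '(1 2 3 4 5))
(set [obj 0..3] '("forty" "two"))
obj -> ("forty" "two" 4 5) ;; obj itself is updated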
The following is a summary of the built-in place forms, in addition to symbolic places denoting variables. New syntactic place forms can be defined by TXR programs.
(car object)
(first object)
(rest object)
(second object)
(third object)
...
(tenth object)
(last object [num])
(butlast object [num])
(cdr object)
(caar object)
(cadr object)
(cdar object)
(cddr object)
...
(cdddddr object)
(nthcdr index obj)
(nthlast index obj)
(butlastn num obj)
(last num obj)
(nth index obj)
(ref seq idx)
(sub sequence [from [to]])
(vecref vec idx)
(chr-str str idx)
(gethash hash key [alt])
(hash-userdata hash)
(dwim obj-place index [alt])
(sub-list obj [from [to]])
(sub-vec obj [from [to]])
(sub-str str [from [to]])
[obj-place index [alt]] ;; equivalent to dwim
(symbol-value symbol-valued-form)
(symbol-function function-name-valued-form)
(symbol-macro symbol-valued-form)
(fun function-name)
(force promise)
(errno)
(slot struct-obj slot-name-valued-form)
(qref struct-obj slot-name) ;; by macro-expansion to (slot ...)
struct-obj.slot-name ;; equivalent to qref
(sock-peer socket)
(sock-opt socket level option [ffi-type])
(carray-sub carray [from [to]])
(sub-buf buf [from [to]])
(left node)
(right node)
(key node)
(read-once node)
The following is a summary of the built-in place mutating macros. They are described in detail in their own sections.
TXR Lisp is a Lisp-2 dialect: it features separate namespaces for functions and variables.
In TXR Lisp, global functions and operator macros coexist, meaning that the same symbol can be defined as both a macro and a function.
There is a global namespace for functions, into which functions can be introduced with the defun macro. The global function environment can be inspected and modified using the symbol-function accessor.
There is a global namespace for macros, into which macros are introduced with the defmacro macro. The global macro environment can be inspected and modified using the symbol-macro accessor.
If a name x is defined as both a function and a macro, then an expression of the form (x ...) is expanded by the macro, whereas an expression of the form [x ...] refers to the function. Moreover, the macro can produce a call to the function. The expression (fun x) will retrieve the function object.
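A minimal sketch of this coexistence, using a hypothetical name foo:
(defun foo (x) (list 'called-fn x))
(defmacro foo (x) ^(list 'expanded ,x))
(foo 1) -> (expanded 1)   ;; macro expansion applies
[foo 1] -> (called-fn 1)  ;; function is called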
There is a global namespace for variables also. The operators defvar and defparm introduce bindings into this namespace. These operators have the side effect of marking a symbol as a special variable, so that bindings of the symbol are treated as dynamic variables, subject to rebinding. The global variable namespace together with the special dynamic rebinding is called the dynamic environment. The dynamic environment can be inspected and modified using the symbol-value accessor.
The operators defvarl and defparml introduce bindings into the global namespace without marking symbols as special variables. Such bindings are called global lexical variables.
Symbol macros may be defined over the global variable namespace using defsymacro.
Note that whereas a symbol may simultaneously have both a function and macro binding in the global namespace, a symbol may not simultaneously have a variable and symbol macro binding.
In addition to global and dynamic namespaces, TXR Lisp provides lexically scoped binding for functions, variables, macros, and symbol macros. Lexical variable bindings are introduced with let, let* or various binding macros derived from these. Lexical functions are bound with flet and labels. Lexical macros are established with macrolet, and lexical symbol macros with symacrolet.
Macros receive an environment parameter with which they may expand forms in their correct environment, and perform some limited introspection over that environment in order to determine the nature of bindings, or the classification of forms in those environments. This introspection is provided by lexical-var-p, lexical-fun-p, and lexical-lisp1-binding.
Lexical operator macros and lexical functions can also coexist in the following way. A lexical function shadows a global or lexical macro completely. However, the reverse is not the case. A lexical macro shadows only those uses of a function which look like macro calls. This is succinctly demonstrated by the following form:
(flet ((foo () 43))
  (macrolet ((foo () 44))
    (list (fun foo) (foo) [foo])))
-> (#<interpreted fun: lambda nil> 44 43)
The (fun foo) and [foo] expressions are oblivious to the macro; the macro-expansion process leaves the symbol foo alone in those contexts. However, the form (foo) is subject to macro expansion and is replaced with 44.
If the flet and macrolet are reversed, the behavior is different:
(macrolet ((foo () 44))
  (flet ((foo () 43))
    (list (fun foo) (foo) [foo])))
-> (#<interpreted fun: lambda nil> 43 43)
All three forms refer to the function, which lexically shadows the macro.
TXR Lisp expressions can be embedded in the TXR pattern language in various ways. Likewise, the pattern language can be invoked from TXR Lisp. This brings about the possibility that Lisp code attempts to access pattern variables bound in the pattern language. The TXR pattern language can also attempt to access TXR Lisp variables.
The rules are as follows, but they have undergone historic changes. See the COMPATIBILITY section, in particular notes under 138 and 121, and also 124.
A Lisp expression evaluated from the TXR pattern language executes in a null lexical environment. The current set of pattern variables captured up to that point by the pattern language are installed as dynamic variables. They shadow any Lisp global variables (whether those are defined by defvar or defvarl).
In the reverse direction, a variable reference from the TXR pattern language searches the pattern variable space first. If a variable doesn't exist there, then the lookup refers to the TXR Lisp global variable space. The pattern language doesn't see Lisp lexical variables.
When Lisp code is evaluated from the pattern language, the pattern variable bindings are not only installed as dynamic variables for the sake of their visibility from Lisp, but they are also specially stored in a dynamic environment frame. When TXR pattern code is reentered from Lisp, these bindings are picked up from the closest such environment frame, allowing the nested invocation of pattern code to continue with the bindings captured by outer pattern code.
Concisely, in any context in which a symbol has both a binding as a Lisp global variable as well as a pattern variable, that symbol refers to the pattern variable. Pattern variables are propagated through Lisp evaluation into nested invocations of the pattern language.
The pattern language can also reference Lisp variables using the @ prefix, which is a consequence of that prefix introducing an expression that is evaluated as Lisp, the name of a variable being such an expression.
The following sections list all of the special operators, macros and functions in TXR Lisp.
In these sections, syntax is indicated using these conventions:
A compound expression with a symbol as its first element, if intended to be evaluated, denotes either an operator invocation or a function call. This depends on whether the symbol names an operator or a function.
When the form is an operator invocation, the interpretation of the meaning of that form is under the complete control of that operator.
If the compound form is a function call, the remaining forms, if any, denote argument expressions to the function. They are evaluated in left-to-right order to produce the argument values, which are passed to the function. An exception is thrown if there are not enough arguments, or too many. Programs can define named functions with the defun operator.
Some operators are macros. There exist predefined macros in the library, and macro operators can also be user-defined using the macro-defining operator defmacro. Operators that are not macros are called special operators.
Macro operators work as functions which are given the source code of the form. They analyze the form, and translate it to another form which is substituted in their place. This happens during a code walking phase called the expansion phase, which is applied to each top-level expression prior to evaluation. All macros occurring in a form are expanded in the expansion phase, and subsequent evaluation takes place on a structure which is devoid of macros. All that remains are the executable forms of special operators, function calls, symbols denoting either variables or themselves, and atoms such as numeric and string literals.
Special operators can also perform code transformations during the expansion phase, but that is not considered macroexpansion; rather, it is an adjustment of the representation of the operator into a required executable form. In effect, it is a post-macro compilation phase.
Note that Lisp forms occurring in TXR pattern language are not individual top-level forms. Rather, the entire TXR query is parsed at the same time, and the macros occurring in its Lisp forms are expanded at that time.
(quote form)
The quote operator, when evaluated, suppresses the evaluation of form, and instead returns form itself as an object. For example, if form is a symbol sym, then the value of (quote sym) is sym itself. Without quote, sym would evaluate to the value held by the variable which is named sym, or else throw an error if there is no such variable. The quote operator never raises an error, provided that it is given exactly one argument, as required.
The notation 'obj is translated to the object (quote obj) providing a shorthand for quoting. Likewise, when an object of the form (quote obj) is printed, it appears as 'obj.
;; yields symbol a itself, not value of variable a
(quote a) -> a
;; yields three-element list (+ 2 2), not 4.
(quote (+ 2 2)) -> (+ 2 2)
Variables are associations between symbols and storage locations which hold values. These associations are called bindings.
Bindings are held in a context called an environment.
Lexical environments hold local variables, and nest according to the syntactic structure of the program. Lexical bindings are always introduced by some form known as a binding construct, and the corresponding environment is instantiated during the evaluation of that construct. There also exist bindings outside of any binding construct, in the so-called global environment. Bindings in the global environment can be temporarily shadowed by lexically established bindings in the dynamic environment. See the Special Variables section above.
Certain special symbols cannot be used as variable names, namely the symbols t and nil, and all of the keyword symbols (symbols in the keyword package), which are denoted by a leading colon. When any of these symbols is evaluated as a form, the resulting value is that symbol itself. It is said that these special symbols are self-evaluating or self-quoting, similarly to all other atom objects such as numbers or strings.
When a form consisting of a symbol, other than the above special symbols, is evaluated, it is treated as a variable, and yields the value of the variable's storage location. If the variable doesn't exist, an exception is thrown.
Note: symbol forms may also denote invocations of symbol macros. (See the operators defsymacro and symacrolet). All macros, including symbol macros, which occur inside a form are fully expanded prior to the evaluation of a form, therefore evaluation does not consider the possibility of a symbol being a symbol macro.
(defvar sym [value])
(defparm sym value)
The defvar operator binds a name in the variable namespace of the global environment. Binding a name means creating a binding: recording, in some namespace of some environment, an association between a name and some named entity. In the case of a variable binding, that entity is a storage location for a value. The value of a variable is that which has most recently been written into the storage location, and is also said to be a value of the binding, or stored in the binding.
If the variable named sym already exists in the global environment, the form has no effect; the value form is not evaluated, and the value of the variable is unchanged.
If the variable does not exist, then a new binding is introduced, with a value given by evaluating the value form. If the form is absent, the variable is initialized to nil.
The value form is evaluated in the environment in which the defvar form occurs, not necessarily in the global environment.
The symbols t and nil may not be used as variables; neither may keyword symbols (symbols denoted by a leading colon).
In addition to creating a binding, the defvar operator also marks sym as the name of a special variable. This changes what it means to bind that symbol in a lexical binding construct such as the let operator, or a function parameter list. See the section "Special Variables" far above.
The defparm macro behaves like defvar when a variable named sym doesn't already exist.
If sym already denotes a variable binding in the global namespace, defparm evaluates the value form and assigns the resulting value to the variable.
The following equivalence holds:
(defparm x y) <--> (prog1 (defvar x) (set x y))
The defvar and defparm forms return sym.
(defvarl sym [value])
(defparml sym value)
The defvarl and defparml macros behave, respectively, almost exactly like defvar and defparm.
The difference is that these operators do not mark sym as special.
If a global variable sym does not previously exist, then after the evaluation of either of these forms (boundp sym) is true, but (special-var-p sym) isn't.
If sym had been already introduced as a special variable, it stays that way after the evaluation of defvarl or defparml.
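For instance:
(defvarl counter 0)
(boundp 'counter) -> t
(special-var-p 'counter) -> nil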
(let ({sym | (sym init-form)}*) body-form*)
(let* ({sym | (sym init-form)}*) body-form*)
The let and let* operators introduce a new scope with variables and evaluate forms in that scope. The operator symbol, either let or let*, is followed by a list which can contain any mixture of sym or (sym init-form) pairs. Each sym must be a symbol, and specifies the name of a variable to be instantiated and initialized.
The (sym init-form) variant specifies that the new variable sym receives an initial value from the evaluation of init-form. The plain sym variant specifies a variable which is initialized to nil. The init-forms are evaluated in order, by both let and let*.
The symbols t and nil may not be used as variables, and neither can be keyword symbols: symbols denoted by a leading colon.
The difference between let and let* is that in let*, later init-forms are in the scope of the variables established by earlier entries in the same let* construct. In plain let, the init-forms are evaluated in a scope which does not include any of the variables.
When the variables are established, the body-forms are evaluated in order. The value of the last body-form becomes the return value of the let. If there are no body-forms, then the return value nil is produced.
The list of variables may be empty.
The list of variables may contain duplicate syms if the operator is let*. In that situation, a given init-form has in scope the rightmost duplicate of any given sym that has been previously established. The body-forms have in scope the rightmost duplicate of any sym in the construct. Therefore, the following form calculates the value 3:
(let* ((a 1)
       (a (succ a))
       (a (succ a)))
  a)
Each duplicate is a separately instantiated binding, and may be independently captured by a lexical closure placed in a subsequent init-form:
(let* ((a 0)
       (f1 (lambda () (inc a)))
       (a 0)
       (f2 (lambda () (inc a))))
  (list [f1] [f1] [f1] [f2] [f2] [f2]))
--> (1 2 3 1 2 3)
The preceding example shows that there are two mutable variables named a in independent scopes, each respectively captured by the separate closures f1 and f2. Three calls to f1 increment the first a while the second a retains its initial value.
Under let, the behavior of duplicate variables is unspecified.
Implementation note: the TXR compiler diagnoses and rejects duplicate symbols in let whereas the interpreter ignores the situation.
When the name of a special variable is specified in let or let*, a new binding is created for it in the dynamic environment, rather than the lexical environment. In let*, later init-forms are evaluated in a dynamic scope in which previous dynamic variables are established, and later dynamic variables are not yet established. A special variable may appear multiple times in a let*, just like a lexical variable. Each duplicate occurrence extends the dynamic environment with a new dynamic binding. All these dynamic environments are removed when the let or let* form terminates. Dynamic environments aren't captured by lexical closures, but are captured in delimited continuations.
(let ((a 1) (b 2)) (list a b)) -> (1 2)
(let* ((a 1) (b (+ a 1))) (list a b (+ a b))) -> (1 2 3)
(let ()) -> nil
(let (:a nil)) -> error, :a and nil can't be used as variables
(defun name (param* [: opt-param*] [. rest-param])
  body-form)
The defun operator introduces a new function in the global function namespace. The function is similar to a lambda, and has the same parameter syntax and semantics as the lambda operator.
Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.
Unlike in lambda, the body-forms of a defun are surrounded by a block. The name of this block is the same as the name of the function, making it possible to terminate the function and return a value using (return-from name value). For more information, see the definition of the block operator.
A function may call itself by name, allowing for recursion.
The special symbols t and nil may not be used as function names. Neither can keyword symbols.
It is possible to define methods as well as macros with defun, as an alternative to the defmeth and defmacro forms.
To define a method, the syntax (meth type name) should be used as the argument to the name parameter. This gives rise to the syntax (defun (meth type name) args form*) which is equivalent to the (defmeth type name args form*) syntax.
Macros can be defined using (macro name) as the name parameter of defun. This way of defining a macro doesn't support destructuring; it defines the expander as an ordinary function with an ordinary argument list. To work, the function must accept two arguments: the entire macro call form that is to be expanded, and the macro environment. Thus, the macro definition syntax is (defun (macro name) (form env) form*) which is equivalent to the (defmacro name (:form form :env env) form*) syntax.
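A minimal sketch of this style of definition, using a hypothetical macro twice; the env parameter is received but unused:
(defun (macro twice) (form env)
  (let ((arg (cadr form)))
    ^(+ ,arg ,arg)))
(twice 5) -> 10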
In ANSI Common Lisp, keywords may be used as function names. In TXR Lisp, they may not.
A function defined by defun may coexist with a macro defined by defmacro. This is not permitted in ANSI Common Lisp.
(lambda (param* [: opt-param*] [. rest-param])
  body-form)
(lambda rest-param
  body-form)
The lambda operator produces a value which is a function. Like in most other Lisps, functions are objects in TXR Lisp. They can be passed to functions as arguments, returned from functions, aggregated into lists, stored in variables, etc.
Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.
The first argument of lambda is the list of parameters for the function. It may be empty, and it may also be an improper list (dot notation) where the terminating atom is a symbol other than nil. It can also be a single symbol.
The second and subsequent arguments are the forms making up the function body. The body may be empty.
When a function is called, the parameters are instantiated as variables that are visible to the body forms. The variables are initialized from the values of the argument expressions appearing in the function call.
The dotted notation can be used to write a function that accepts a variable number of arguments. There are two ways to write a function that accepts only a variable argument list and no required arguments:
(lambda (. rest-param) ...)
(lambda rest-param ...)
(These notations are syntactically equivalent because the list notation (. X) actually denotes the object X which isn't wrapped in any list).
The keyword symbol : (colon) can appear in the parameter list. This is the symbol in the keyword package whose name is the empty string. This symbol is treated specially: it serves as a separator between required parameters and optional parameters. Furthermore, the : symbol has a role to play in function calls: it can be specified as the argument value for an optional parameter, by which the caller indicates that the optional argument is not being specified. The parameter is then processed exactly as if the argument had been omitted.
An optional parameter can also be written in the form (name expr [sym]). In this situation, if the call does not specify a value for the parameter, or specifies a value as the : (colon) keyword symbol, then the parameter takes on the value of the expression expr. This expression is only evaluated when its value is required.
If sym is specified, then sym will be introduced as an additional binding with a Boolean value which indicates whether or not the optional parameter had been specified by the caller.
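For instance, using a hypothetical function opt-demo:
(defun opt-demo (a : (b 10 b-p))
  (list a b b-p))
(opt-demo 1)   -> (1 10 nil)
(opt-demo 1 2) -> (1 2 t)
(opt-demo 1 :) -> (1 10 nil) ;; : indicates "not specified"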
Each expr is evaluated in an environment in which all of the previous parameters are visible, in addition to the surrounding environment of the lambda. For instance:
(let ((default 0))
  (lambda (str : (end (length str)) (counter default))
    (list str end counter)))
In this lambda, the initializing expression for the optional parameter end is (length str), and the str variable it refers to is the previous argument. The initializer for the optional variable counter is the expression default, and it refers to the binding established by the surrounding let. This reference is captured as part of the lambda's lexical closure.
Keyword symbols, and the symbols t and nil may not be used as parameter names. The behavior is unspecified if the same symbol is specified more than once anywhere in the parameter list, whether as a parameter name or as the indicator sym in an optional parameter or any combination.
Implementation note: the TXR compiler diagnoses and rejects duplicate symbols in lambda whereas the interpreter ignores the situation.
Note: it is not always necessary to use the lambda operator directly in order to produce an anonymous function.
In situations when lambda is being written in order to simulate partial evaluation, it may be possible to instead make use of the op macro. For instance the function (lambda (. args) [apply + a args]), which adds the values of all of its arguments together with the lexically captured variable a, can be written more succinctly as (op + a). The op operator is the main representative of a family of operators: lop, ap, ip, do, ado, opip and oand.
In situations when functions are simply combined together, the effect may be achieved using some of the available functional combinators, instead of a lambda. For instance chaining together functions as in (lambda (x) (square (cos x))) is achievable using the chain function: [chain cos square]. The opip operator can also be used: (opip cos square). Numerous combinators are available; see the section Partial Evaluation and Combinators.
When a function is needed which accesses an object, there are also alternatives. Instead of (lambda (obj) obj.slot) and (lambda (obj arg) obj.(slot arg)), it is simpler to use the .slot and .(slot arg) notations. See the section Unbound Referencing Dot. Also see the functions umethod and uslot as well as the related convenience macros umeth and usl.
If a function is needed which partially applies, to some arguments, a method invoked on a specific object, the method function or meth macro may be used. For instance, instead of (lambda (arg) obj.(method 3 arg)), it is possible to write (meth obj method 3), except that the latter produces a variadic function.
(let ((counter 0))
  (lambda () (inc counter)))
(lambda (x y . z) (list 'my-arguments-are x y z))
(lambda args (list 'my-list-of-arguments args))
[(lambda (x : y) (list x y)) 1] -> (1 nil)
[(lambda (x : y) (list x y)) 1 2] -> (1 2)
(flet ({(name param-list function-body-form*)}*)
  body-form*)
(labels ({(name param-list function-body-form*)}*)
  body-form*)
The flet and labels macros bind local, named functions in the lexical scope.
Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.
The difference between flet and labels is that a function defined by labels can see itself, and therefore recurse directly by name. Moreover, if multiple functions are defined by the same labels construct, they all have each other's names in scope of their bodies. By contrast, a flet-defined function does not have itself in scope and cannot recurse. Multiple functions in the same flet do not have each other's names in their scopes.
More formally, the function-body-forms and param-list of the functions defined by labels are in a scope in which all of the function names being defined by that same labels construct are visible.
Under both labels and flet, the local functions that are defined are lexically visible to the main body-forms.
Note that labels and flet are properly scoped with regard to macros. During macro expansion, function bindings introduced by these macro operators shadow macros defined by macrolet and defmacro.
Furthermore, function bindings introduced by labels and flet also shadow symbol macros defined by symacrolet, when those symbol macros occur as arguments of a dwim form.
See also: the macrolet operator.
The flet and labels macros do not establish named blocks around the body forms of the local functions which they bind. This differs from ANSI Common Lisp, whose local functions have implicit named blocks, allowing return-from to be used.
;; Wastefully slow algorithm for determining evenness.
;; Note:
;; - mutual recursion between labels-defined functions
;; - inner is-even bound by labels shadows the outer
;; one bound by defun so the (is-even n) call goes
;; to the local function.
(defun is-even (n)
  (labels ((is-even (n)
             (if (zerop n) t (is-odd (- n 1))))
           (is-odd (n)
             (if (zerop n) nil (is-even (- n 1)))))
    (is-even n)))
(call function argument*)
The call function invokes function, passing it the given arguments, if any.
Call a lambda, passing the arguments 1 and 2, adding them to produce 3:
(call (lambda (a b) (+ a b)) 1 2)
Useless use of call on a named function; equivalent to (list 1 2):
(call (fun list) 1 2)
(apply function [arg* trailing-args])
(iapply function [arg* trailing-args])
The apply function invokes function, optionally passing to it an argument list. The return value of the apply call is that of function.
If no arguments are present after function, then function is invoked without arguments.
If one argument is present after function, then it is interpreted as trailing-args. If this is a sequence (a list, vector or string), then the elements of the sequence are passed as individual arguments to function. If trailing-args is not a sequence, then function is invoked with an improper argument list, terminated by the trailing-args atom.
If two or more arguments are present after function, then the last of these arguments is interpreted as trailing-args. The previous arguments represent leading arguments which are applied to function, prior to the arguments taken from trailing-args.
Note that if trailing-args value is an atom or an improper list, the function is then invoked with an improper argument list. Only a variadic function may be invoked with an improper argument list. Moreover, all of the function's required and optional parameters must be satisfied by elements of the improper list, such that the terminating atom either matches the rest-param directly (see the lambda operator) or else the rest-param receives an improper list terminated by that atom. To treat the terminating atom of an improper list as an ordinary element which can satisfy a required or optional function parameter, the iapply function may be used, described next.
The iapply function ("improper apply") is similar to apply, except with regard to the treatment of trailing-args. Firstly, under iapply, if trailing-args is an atom other than nil (possibly a sequence, such as a vector or string), then it is treated as an ordinary argument: function is invoked with a proper argument list, whose last element is trailing-args. Secondly, if trailing-args is a list, but an improper list, then the terminating atom of trailing-args becomes an individual argument. This terminating atom is not split into multiple arguments, even if it is a sequence. Thus, in all possible cases, iapply treats an extra non-nil atom as an argument, and never calls function with an improper argument list.
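For instance:
(iapply (fun list) 1 2 3)      -> (1 2 3) ;; atom 3 becomes an ordinary argument
(iapply (fun list) 1 '(2 . 3)) -> (1 2 3) ;; terminating atom 3 becomes an argument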
;; '(1 2 3) becomes arguments to list, thus (list 1 2 3).
(apply (fun list) '(1 2 3)) -> (1 2 3)
;; this effectively invokes (list 1 2 3 4)
(apply (fun list) 1 2 '(3 4)) -> (1 2 3 4)
;; this effectively invokes (list 1 2 . 3)
(apply (fun list) 1 2 3) -> (1 2 . 3)
;; "abc" is separated into characters
;; which become arguments of list
(apply (fun list) "abc") -> (#\a #\b #\c)
Note that some uses of this function that are necessary in other Lisp dialects are not necessary in TXR Lisp. The reason is that in TXR Lisp, improper list syntax is accepted as a compound form, and performs application:
(foo a b . x)
Here, the variables a and b supply the first two arguments for foo. In the dotted position, x must evaluate to a list or vector. The list or vector's elements are pulled out and treated as additional arguments for foo. This syntax can only be used if x is a symbolic form or an atom. It cannot be a compound form, because (foo a b . (x)) and (foo a b x) are equivalent structures.
(fun function-name)
The fun operator retrieves the function object corresponding to a named function in the current lexical environment.
The function-name may be a symbol denoting a named function: a built-in function, or one defined by defun.
The function-name may also take any of the forms specified in the description of the func-get-name function. If such a function-name refers to a function which exists, then the fun operator yields that function.
Note: the fun operator does not see macro bindings via their symbolic names with which they are defined by defmacro. However, the name syntax (macro name) may be used to refer to macros. This syntax is documented in the description of func-get-name. It is also possible to retrieve a global macro expander using the function symbol-macro.
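For instance, using a hypothetical function add3:
(defun add3 (x) (+ x 3))
(call (fun add3) 4) -> 7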
(dwim argument*)
'['argument*']'
(set (dwim obj-place index [alt]) new-value)
(set '['obj-place index [alt]']' new-value)
The dwim operator's name is an acronym: DWIM may be taken to mean "Do What I Mean", or alternatively, "Dispatch, in a Way that is Intelligent and Meaningful".
The notation [...] is a shorthand which denotes (dwim ...).
Note that since the [ and ] are used in this document for indicating optional syntax, in the above Syntax synopsis the quoted notation '[' and ']' denotes bracket tokens which literally appear in the syntax.
The dwim operator takes a variable number of arguments, which are treated as expressions to be individually macro-expanded and evaluated, using the same rules.
This means that the first argument isn't a function name, but an ordinary expression which can simply compute a function object (or, more generally, a callable object).
Furthermore, for those arguments of dwim which are symbols (after all macro-expansion is performed), the evaluation rules are altered. For the purposes of resolving symbols to values, the function and variable binding namespaces are considered to be merged into a single space, creating a situation that is similar to a Lisp-1 style dialect.
This special Lisp-1 evaluation is not recursively applied. All arguments of dwim which, after macro expansion, are not symbols are evaluated using the normal Lisp-2 evaluation rules. Thus, the DWIM operator must be used in every expression where the Lisp-1 rules for reducing symbols to values are desired.
If a symbol has bindings both in the variable and function namespace in scope, and is referenced by a dwim argument, this constitutes a conflict which is resolved according to two rules. When nested scopes are concerned, then an inner binding shadows an outer binding, regardless of their kind. An inner variable binding for a symbol shadows an outer or global function binding, and vice versa.
If a symbol is bound to both a function and variable in the global namespace, then the variable binding is favored.
Macros do not participate in the special scope conflation, with one exception. What this means is that the space of symbol macros is not folded together with the space of operator macros. An argument of dwim that is a symbol might be a symbol macro, variable or function, but it cannot be interpreted as the name of an operator macro.
The exception is this: from the perspective of a dwim form, function bindings can shadow symbol macros. If a function binding is defined in an inner scope relative to a symbol macro for the same symbol, using flet or labels, the function hides the symbol macro. In other words, when macro expansion processes an argument of a dwim form, and that argument is a symbol, it is treated specially in order to provide a consistent name lookup behavior. If the innermost binding for that symbol is a function binding, it refers to that function binding, even if a more outer symbol macro binding exists, and so the symbol is not expanded using the symbol macro. By contrast, in an ordinary form, a symbolic argument never resolves to a function binding. The symbol refers to either a symbol macro or a variable, whichever is nested closer.
If, after macro expansion, the leftmost argument of the dwim is the name of a special operator or macro, the dwim form doesn't denote an invocation of that operator or macro. A dwim form is an invocation of the dwim operator, and the leftmost argument of that operator, if it is a symbol, is treated as a binding to be resolved in the variable or function namespace, like any other argument. Thus [if x y] is an invocation of the if function, not the if operator.
How many arguments are required by the dwim operator depends on the type of object to which the first argument expression evaluates. The possibilities are:
This form is also a syntactic place. If a value is stored to this place, it replaces the element.
The place may also be deleted, which has the effect of removing the element from the sequence, shifting the elements at higher indices, if any, down one element position, and shortening the sequence by one. If the place is deleted, and if sequence is a list, then the sequence form itself must be a place.
This form is implemented using the ref accessor such that, except for the argument evaluation semantics of the DWIM brackets, it is equivalent to using the (ref sequence index) syntax.
This form is also a syntactic place. Storing a value in this place has the effect of replacing the subsequence with a new subsequence. Deleting the place has the effect of removing the specified subsequence from sequence. If sequence is a list, then the sequence form must itself be a place. The new-value argument in a range assignment can be a string, vector or list, regardless of whether the target is a string, vector or list. If the target is a string, the replacement sequence must be a string, or a list or vector of characters.
The semantics is implemented using the sub accessor, such that the following equivalence holds:
[seq from..to] <--> (sub seq from..to)
For this reason, sequence may be any object that is iterable by iter-begin.
This form is equivalent to (select sequence where-index) except when it is the target of an assignment operation.
This form is a syntactic place if sequence is one. If a sequence is assigned to this place, then elements of the sequence are distributed to the specified locations.
The following equivalences hold between index-list-based indexing and the select and replace functions, except that set always returns the value assigned, whereas replace returns its first argument:
[seq idx-list] <--> (select seq idx-list)
(set [seq idx-list] new) <--> (replace seq new idx-list)
Note that unlike the select function, this does not support [hash index-list]: since hash keys may be lists, that syntax is indistinguishable from a simple hash lookup where index-list is the key.
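For illustration, a sketch of index-list access and assignment on a vector:
(defvar v (vec 1 2 3 4))
[v '(0 2)] -> #(1 3)
(set [v '(0 2)] '(10 30))
v -> #(10 2 30 4)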
If start is specified, it gives the starting position where the search begins, and if from-end is given, and has a value other than nil, it specifies a search from right to left. These optional arguments have the same conventions and semantics as their equivalents in the search-regst function.
Note that string is always required, and is always the rightmost argument.
Vector and list range indexing is zero-based, meaning that the first element is numbered zero, the second is numbered one, and so on. Negative values are allowed; the value -1 refers to the last element of the vector or list, and -2 to the second last, and so forth. Thus the range 1 .. -2 means "everything except for the first element and the last two".
The symbol t represents the position one past the end of the vector, string or list, so 0..t denotes the entire list or vector, and the range t..t represents the empty range just beyond the last element. It is possible to assign to t..t. For instance:
(defvar list '(1 2 3))
(set [list t..t] '(4)) ;; list is now (1 2 3 4)
The value zero has a "floating" behavior when used as the end of a range. If the start of the range is a negative value, and the end of the range is zero, the zero is interpreted as being the position past the end of the sequence, rather than the first element. For instance the range -1..0 means the same thing as -1..t. Zero at the start of a range always means the first element, so that 0..-1 refers to all the elements except for the last one.
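For instance:
(defvar s "abcde")
[s 1..-2] -> "bc"   ;; all but the first element and the last two
[s -1..0] -> "e"    ;; same as -1..t
[s 0..-1] -> "abcd" ;; all but the last element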
The dwim operator allows for a Lisp-1 flavor of programming in TXR Lisp, which is principally a Lisp-2 dialect.
A Lisp-1 dialect is one in which an expression like (a b) treats both a and b as expressions subject to the same evaluation rules—at least, when a isn't an operator or an operator macro. This means that the symbols a and b are resolved to values in the same namespace. The form denotes a function call if the value of variable a is a function object. Thus in a Lisp-1, named functions do not exist as such: they are just variable bindings. In a Lisp-1, the form (car 1) means that there is a variable called car, which holds a function, which is retrieved from that variable and the argument 1 is applied to it. In the expression (car car), both occurrences of car refer to the variable, and so this form applies the car function to itself. It is almost certainly meaningless. In a Lisp-2 (car 1) means that there is a function called car, in the function namespace. In the expression (car car) the two occurrences refer to different bindings: one is a function and the other a variable. Thus there can exist a variable car which holds a cons-cell object, rather than the car function, and the form makes sense.
The Lisp-1 approach is useful for functional programming, because it eliminates cluttering occurrences of the call and fun operators. For instance:
;; regular notation
(call foo (fun second) '((1 a) (2 b)))
;; [] notation
[foo second '((1 a) (2 b))]
Lisp-1 dialects can also provide useful extensions by giving a meaning to objects other than functions in the first position of a form, and the dwim/[...] syntax does exactly this.
TXR Lisp is a Lisp-2 because Lisp-2 also has advantages. Lisp-2 programs which use macros naturally achieve hygiene because lexical variables do not interfere with the function namespace. If a Lisp-2 program has a local variable called list, this does not interfere with the hidden use of the function list in a macro expansion in the same block of code. Lisp-1 dialects have to provide hygienic macro systems to attack this problem. Furthermore, even when not using macros, Lisp-1 programmers have to avoid using the names of functions as lexical variable names, if the enclosing code might use them.
The two namespaces of a Lisp-2 also naturally accommodate symbol macros and operator macros. Whereas functions and variables can be represented in a single namespace readily, because functions are data objects, this is not so with symbol macros and operator macros, the latter of which are distinguished syntactically by their position in a form. In a Lisp-1 dialect, given (foo bar), either of the two symbols could be a symbol macro, but only foo can possibly be an operator macro. Yet, having only a single namespace, a Lisp-1 doesn't permit (foo foo), where foo is simultaneously a symbol macro and an operator macro, though the situation is unambiguous by syntax even in Lisp-1. In other words, Lisp-1 dialects do not entirely remove the special syntactic recognition given to the leftmost position of a compound form, yet at the same time they prohibit the user from taking full advantage of it by providing only one namespace.
TXR Lisp provides the "best of both worlds": the DWIM brackets notation provides a model of Lisp-1 computation that is purer than Lisp-1 dialects (since the leftmost argument is not given any special syntactic treatment at all) while the Lisp-2 foundation provides a traditional Lisp environment with its "natural hygiene".
(functionp obj)
The functionp function returns t if obj is a function, otherwise it returns nil.
(copy-fun function)
The copy-fun function produces and returns a duplicate of function, which must be a function.
A duplicate of a function is a distinct function object not eq to the original function, yet which accepts the same arguments and behaves exactly the same way as the original.
If a function contains no captured environment, then a copy made of that function by copy-fun is indistinguishable from the original function in every regard, except for being a distinct object that compares unequal to the original under the eq function.
If a function contains a captured environment, then a copy of that function made by copy-fun has its own copy of that environment. If the copied function changes the values of captured lexical variables, the original function is not affected by these changes and vice versa.
The entire lexical environment is copied; the copy and original function do not share any portion of the environment at any level of nesting.
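A minimal sketch of this independence:
(let* ((f (let ((n 0)) (lambda () (inc n))))
       (g (copy-fun f)))
  (list [f] [f] [g] [f]))
-> (1 2 1 3) ;; g increments its own copy of n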
(progn form*)
(prog1 form*)
The progn operator evaluates each form in left-to-right order, and returns the value of the last form. The value of the form (progn) is nil.
The prog1 operator evaluates each form in left-to-right order, and returns the value of the first form. The value of the form (prog1) is nil.
Various other operators such as let also arrange for the evaluation of a body of forms, the value of the last of which is returned. These operators are said to feature an implicit progn.
These special operators are also functions. The progn function accepts zero or more arguments. It returns its last argument, or nil if called with no arguments. The prog1 function likewise accepts zero or more arguments. It returns its first argument, or nil if called with no arguments.
In ANSI Common Lisp, prog1 requires at least one argument. Neither progn nor prog1 exists as a function.
(prog2 form*)
The prog2 operator evaluates each form in left-to-right order. The value is that of the second form, if present, otherwise it is nil.
The form (prog2 1 2 3) yields 2. The value of (prog2 1 2) is also 2; (prog2 1) and (prog2) yield nil.
The prog2 symbol also has a function binding. The prog2 function accepts any number of arguments. If invoked with at least two arguments, it returns the second one. Otherwise it returns nil.
In ANSI Common Lisp, prog2 requires at least two arguments. It does not exist as a function.
(cond {(test form*)}*)
The cond operator provides a multi-branching conditional evaluation of forms. Enclosed in the cond form are groups of forms expressed as lists. Each group must be a list of at least one form.
The forms are processed from left to right as follows: the first form, test, in each group is evaluated. If it evaluates true, then the remaining forms in that group, if any, are also evaluated. Processing then terminates and the result of the last form in the group is taken as the result of cond. If test is the only form in the group, then the result of test is taken as the result of cond.
If the first form of a group yields nil, then processing continues with the next group, if any. If all form groups yield nil, then the cond form yields nil. This holds in the case that the syntax is empty: (cond) yields nil.
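For instance:
(cond ((> 1 2) 'a)
      ((> 2 1) 'b)
      (t 'c))
-> b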
(caseq test-form normal-clause* [else-clause])
(caseql test-form normal-clause* [else-clause])
(casequal test-form normal-clause* [else-clause])
These three macros arrange for the evaluation of test-form, whose value is then compared against the key or keys in each normal-clause in turn. When the value matches a key, then the remaining forms of normal-clause are evaluated, and the value of the last form is returned; subsequent clauses are not evaluated. When the value doesn't match any of the keys of a normal-clause then the next normal-clause is tested. If all these clauses are exhausted, and there is no else-clause, then the value nil is returned. Otherwise, the forms in the else-clause are evaluated, and the value of the last one is returned. If there are no forms, then nil is returned.
The syntax of a normal-clause takes on these two forms:
(key form*)
where key may be an atom which denotes a single key, or else a list of keys. There is a restriction that the symbol t may not be used as a key. The form (t) may be used as a key to match that symbol.
The syntax of an else-clause is:
(t form*)
which resembles a form that is often used as the final clause in the cond syntax.
The three forms of the case construct differ in the type of test they apply between the value of test-form and the keys. The caseq macro generates code which uses the eq function's equality. The caseql macro uses eql, and casequal uses equal.
(let ((command-symbol (casequal command-string
                        (("q" "quit") 'quit)
                        (("a" "add") 'add)
                        (("d" "del" "delete") 'delete)
                        (t 'unknown))))
  ...)
(caseq* test-form normal-clause* [else-clause])
(caseql* test-form normal-clause* [else-clause])
(casequal* test-form normal-clause* [else-clause])
The caseq*, caseql*, and casequal* macros are similar to the macros caseq, caseql, and casequal, differing from them in only the following regard. The normal-clause of these macros has the form (evaluated-key form*) where evaluated-key is either an atom, which is evaluated to produce a key, or else a compound form, whose elements are evaluated as forms, producing multiple keys. This evaluation takes place at macro-expansion time, in the global environment.
The else-clause works the same way under these macros as under caseq et al.
Note that although in a normal-clause, evaluated-key must not be the atom t, there is no restriction against it being an atom which evaluates to t. In this situation, the value t has no special meaning.
The evaluated-key expressions are evaluated in the order in which they appear in the construct, at the time the caseq*, caseql* or casequal* macro is expanded.
Note: these macros allow the use of variables and global symbol macros as case keys.
(defvarl red 0)
(defvarl green 1)
(defvarl blue 2)
(let ((color blue))
  (caseql* color
    (red "hot")
    ((green blue) "cool")))
--> "cool"
(ecaseq test-form normal-clause* [else-clause])
(ecaseql test-form normal-clause* [else-clause])
(ecasequal test-form normal-clause* [else-clause])
(ecaseq* test-form normal-clause* [else-clause])
(ecaseql* test-form normal-clause* [else-clause])
(ecasequal* test-form normal-clause* [else-clause])
These macros are error-catching variants of, respectively, caseq, caseql, casequal, caseq*, caseql* and casequal*.
If the else-clause is present in the invocation of an error-catching case macro, then the invocation is precisely equivalent to the corresponding non-error-trapping variant.
If the else-clause is missing in the invocation of an error-catching variant, then a default else-clause is inserted which throws an exception of type case-error, derived from error. After this insertion, the semantics follows that of the non-error-trapping variant.
For instance, (ecaseql 3), which has no else-clause, is equivalent to (caseql 3 (t expr)) where expr indicates the inserted expression which throws case-error. However, (ecaseql 3 (t 42)) is simply equivalent to (caseql 3 (t 42)), since it has an else-clause.
Note: the error-catching case macros are intended for situations in which it is a matter of program correctness that every possible value of test-form matches a normal-clause, such that if a failure to match occurs, it indicates a software defect. The error-throwing else-clause helps to ensure that the error situation is noticed. Without this clause, the case macro terminates with a value of nil, which may conceal the defect and delay its identification.
(if cond t-form [e-form])
'['if cond then [else]']'
There exist both an if operator and an if function. A list form with the symbol if in the first position is interpreted as an invocation of the if operator. The function can be accessed using the DWIM bracket notation and in other ways.
The if operator provides a simple two-way-selective evaluation control. The cond form is evaluated. If it yields true then t-form is evaluated, and that form's return value becomes the return value of the if. If cond yields false, then e-form is evaluated and its return value is taken to be that of if. If e-form is omitted, then the behavior is as if e-form were specified as nil.
The if function provides no evaluation control. All of its arguments are evaluated from left to right. If the cond argument is true, then it returns the then argument, otherwise it returns the value of the else argument if present, otherwise it returns nil.
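For instance:
(if (> 2 1) 'yes 'no) -> yes  ;; operator: only 'yes is evaluated
[if (> 2 1) 'yes 'no] -> yes  ;; function: all arguments evaluated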
(and form*)
'['and arg*']'
There exist both an and operator and an and function. A list form with the symbol and in the first position is interpreted as an invocation of the operator. The function can be accessed using the DWIM bracket notation and in other ways.
The and operator provides three functionalities in one. It computes the logical "and" function over several forms. It controls evaluation (a.k.a. "short-circuiting"). It also provides an idiom for the convenient substitution of a value in place of nil when some other values are all true.
The and operator evaluates as follows. First, a return value is established and initialized to the value t. The forms, if any, are evaluated from left to right. The return value is overwritten with the result of each form. Evaluation stops when all forms are exhausted, or when nil is stored in the return value. When evaluation stops, the operator yields the return value.
The and function provides no evaluation control: it receives all of its arguments fully evaluated. If it is given no arguments, it returns t. If it is given one or more arguments, and any of them are nil, it returns nil. Otherwise, it returns the value of the last argument.
(and) -> t
(and (> 10 5) (stringp "foo")) -> t
(and 1 2 3) -> 3 ;; shorthand for (if (and 1 2) 3).
(nand form*)
[nand arg*]
There exist both a nand macro and a nand function. A list form with the symbol nand in the first position is interpreted as an invocation of the macro. The function can be accessed using the DWIM bracket notation and in other ways.
The nand macro and function are the logical negation of the and operator and function. They are related according to the following equivalences:
(nand f0 f1 f2 ...) <--> (not (and f0 f1 f2 ...))
[nand f0 f1 f2 ...] <--> (not [and f0 f1 f2 ...])
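For example:
(nand 1 nil) -> t
(nand 1 2) -> nil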
(or form*)
[or arg*]
There exist both an or operator and an or function. A list form with the symbol or in the first position is interpreted as an invocation of the operator. The function can be accessed using the DWIM bracket notation and in other ways.
The or operator provides three functionalities in one. It computes the logical "or" function over several forms. It controls evaluation (a.k.a. "short-circuiting"). The behavior of or also provides an idiom for the selection of the first non-nil value from a sequence of forms.
The or operator evaluates as follows. First, a return value is established and initialized to the value nil. The forms, if any, are evaluated from left to right. The return value is overwritten with the result of each form. Evaluation stops when all forms are exhausted, or when a true value is stored into the return value. When evaluation stops, the operator yields the return value.
The or function provides no evaluation control: it receives all of its arguments fully evaluated. If it is given no arguments, it returns nil. If all of its arguments are nil, it also returns nil. Otherwise, it returns the value of the first argument which isn't nil.
(or) -> nil
(or 1 2) -> 1
(or nil 2) -> 2
(or (> 10 20) (stringp "foo")) -> t
(nor form*)
[nor arg*]
There exist both a nor macro and a nor function. A list form with the symbol nor in the first position is interpreted as an invocation of the macro. The function can be accessed using the DWIM bracket notation and in other ways.
The nor macro and function are the logical negation of the or operator and function. They are related according to the following equivalences:
(nor f0 f1 f2 ...) <--> (not (or f0 f1 f2 ...))
[nor f0 f1 f2 ...] <--> (not [or f0 f1 f2 ...])
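For example:
(nor nil nil) -> t
(nor nil 2) -> nil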
(when expression form*)
(unless expression form*)
The when macro operator evaluates expression. If expression yields true, and there are additional forms, then each form is evaluated. The value of the last form becomes the result value of the when form. If there are no forms, then the result is nil.
The unless operator is similar to when, except that it reverses the logic of the test. The forms, if any, are evaluated if and only if expression is false.
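For example:
(when (> 2 1) 'a 'b) -> b
(unless (> 2 1) 'a 'b) -> nil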
(while expression form*)
(until expression form*)
The while macro operator provides a looping construct. It evaluates expression. If expression yields nil, then the evaluation of the while form terminates, producing the value nil. Otherwise, if there are additional forms, then each form is evaluated. Next, evaluation returns to expression, repeating all of the previous steps.
The until macro operator is similar to while, except that the until form terminates when expression evaluates true, rather than false.
These operators arrange for the evaluation of all their enclosed forms in an anonymous block. Any of the forms, or expression, may use the return operator to terminate the loop, and optionally to specify a result value for the form.
The only way these forms can yield a value other than nil is if the return operator is used to terminate the implicit anonymous block, and is given an argument, which becomes the result value.
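For instance, this sketch prints the integers 0 through 4, using the inc macro to step the counter:
(let ((i 0))
  (while (< i 5)
    (prinl i)
    (inc i)))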
(while* expression form*)
(until* expression form*)
The while* and until* macros are similar, respectively, to the macros while and until.
They differ in one respect: they begin by evaluating the forms one time unconditionally, without first evaluating expression. After this evaluation, the subsequent behavior is like that of while or until.
Another way to regard the behavior is that these forms execute one iteration unconditionally, without evaluating the termination test prior to the first iteration. Yet another view is that these constructs relocate the test from the top of the loop to the bottom of the loop.
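For instance, in this sketch the body executes once even though the test is false at the outset; the loop prints 10 and then terminates:
(let ((i 10))
  (while* (< i 5)
    (prinl i)
    (inc i)))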
(whilet ({sym | (sym init-form)}+)
  body-form*)
The whilet macro provides a construct which combines iteration with variable binding.
The evaluation of the form takes place as follows. First, fresh bindings are established for syms as if by the let* operator. It is an error for the list of variable bindings to be empty.
After the establishment of the bindings, the value of the last sym is tested. If the value is nil, then whilet terminates. Otherwise, body-forms are evaluated in the scope of the variable bindings, and then whilet iterates from the beginning, again establishing fresh bindings for the syms, and testing the value of the last sym.
All evaluation takes place in an anonymous block, which can be terminated with the return operator. Doing so terminates the loop. If the whilet loop is thus terminated by an explicit return, a return value can be specified. Under normal termination, the return value is nil.
In the syntax, a small convenience is permitted. Instead of the last (sym init-form) it is permissible for the syntax (init-form) to appear, the sym being omitted. A machine-generated variable is substituted in place of the missing sym and that variable is then initialized from init-form and used as the basis of the test.
;; read lines of text from *stdin* and print them,
;; until the end-of-stream condition:
(whilet ((line (get-line)))
  (put-line line))
;; read lines of text from *stdin* and print them,
;; until the end-of-stream condition occurs or
;; a line is identical to the character string "end".
(whilet ((line (get-line))
         (more (and line (nequal line "end"))))
  (put-line line))
(iflet {({sym | (sym init-form)}+) | atom-form}
  then-form [else-form])
(whenlet {({sym | (sym init-form)}+) | atom-form}
  body-form*)
The iflet and whenlet macros combine the variable binding of let* with conditional evaluation of if and when, respectively.
In either construct's syntax, a non-compound form atom-form may appear in place of the variable binding list. In this case, atom-form is evaluated as a form, and the construct is equivalent to its respective ordinary if or when counterpart.
If the list of variable bindings is empty, it is interpreted as the atom nil and treated as an atom-form.
If one or more bindings are specified rather than atom-form, then the evaluation of these forms takes place as follows. First, fresh bindings are established for syms as if by the let* operator.
Then, the last variable's value is tested. If it is not nil then the test is true, otherwise false.
In the syntax, a small convenience is permitted. Instead of the last (sym init-form) it is permissible for the syntax (init-form) to appear, the sym being omitted. A machine-generated variable is substituted in place of the missing sym and that variable is then initialized from init-form and used as the basis of the test. This is intended to be useful in situations in which then-form or else-form do not require access to the tested value.
In the case of the iflet operator, if the test is true, the operator evaluates then-form and yields its value. Otherwise the test is false, and if the optional else-form is present, that is evaluated instead and its value is returned. If this form is missing, then nil is returned.
In the case of the whenlet operator, if the test is true, then the body-forms, if any, are evaluated. The value of the last one is returned, otherwise nil if the forms are missing. If the test is false, then evaluation of body-forms is skipped, and nil is returned.
;; dispose of foo-resource if present
(whenlet ((foo-res (get-foo-resource obj)))
  (foo-shutdown foo-res)
  (set-foo-resource obj nil))

;; contrast with the above, using when and let
(let ((foo-res (get-foo-resource obj)))
  (when foo-res
    (foo-shutdown foo-res)
    (set-foo-resource obj nil)))
;; print frobosity value if it exceeds 150
(whenlet ((fv (get-frobosity-value))
          (exceeds-p (> fv 150)))
  (format t "frobosity value ~a exceeds 150\n" fv))

;; same as above, taking advantage of the
;; last variable being optional:
(whenlet ((fv (get-frobosity-value))
          ((> fv 150)))
  (format t "frobosity value ~a exceeds 150\n" fv))
;; yield 4: 3 interpreted as atom-form
(whenlet 3 4)
;; yield 4: nil interpreted as atom-form
(iflet () 3 4)
(condlet
  ([({sym | (sym init-form)}+) | atom-form]
   body-form*)*)
The condlet macro generalizes iflet.
Each argument is a compound consisting of at least one item: a list of bindings or atom-form. This item is followed by zero or more body-forms.
If there are no body-forms then the situation is treated as if there were a single body-form specified as nil.
The arguments of condlet are considered in sequence, starting with the leftmost.
If the argument's left item is an atom-form then the form is evaluated. If it yields true, then the body-forms next to it are evaluated in order, and the condlet form terminates, yielding the value obtained from the last body-form. If atom-form yields false, then the next argument is considered, if there is one.
If the argument's left item is a list of bindings, then it is processed with exactly the same logic as under the iflet macro. If the last binding contains a true value, then the adjoining body-forms are evaluated in a scope in which all of the bindings are visible, and condlet terminates, yielding the value of the last body-form. Otherwise, the next argument of condlet is considered (processed in a scope in which the bindings produced by the current item are no longer visible).
If condlet runs out of arguments, it terminates and returns nil.
(let ((l '(1 2 3)))
  (condlet
    ;; first arg
    (((a (first l))    ;; a binding gets 1
      (b (second l))   ;; b binding gets 2
      (g (> a b)))     ;; last variable g is nil
     'foo)             ;; not evaluated
    ;; second arg
    (((b (second l))   ;; b gets 2
      (c (third l))    ;; c gets 3
      (g (< b c)))     ;; last variable g is true
     'bar)))           ;; condlet terminates
--> bar                ;; result is bar
(ifa cond then [else])
The ifa macro provides an anaphoric conditional operator resembling the if operator. Around the evaluation of the then and else forms, the symbol it is implicitly bound to a subexpression of cond, a subexpression which is thereby identified as the it-form. This it alias provides a convenient reference to that place or value, similar to the word "it" in the English language, and similar anaphoric pronouns in other languages.
If it is bound to a place form, the binding is established as if using the placelet operator: the form is evaluated only once, even if the it alias is used multiple times in the then or else expressions. Furthermore, the place form is implicitly surrounded with read-once so that the place's value is accessed only once, and multiple references to it refer to a copy of the value cached in a hidden variable, rather than generating multiple accesses to the place. Otherwise, if the form is not a syntactic place it is bound as an ordinary lexical variable to the form's value.
An it-candidate is an expression viable for having its value or storage location bound to the it symbol. An it-candidate is any expression which is not a constant expression according to the constantp function, and not a symbol.
The ifa macro applies several rules to the cond expression. One of these rules is the following rewrite, applied when cond is an invocation of the not function:
(ifa (not expr) then else) -> (ifa expr else then)
which applies likewise for null or false substituted for not. The Boolean inverse function is removed, and the then and else expressions are exchanged.
In all other regards, the ifa macro behaves similarly to if.
The cond expression is evaluated, and, if applicable, the value of, or storage location denoted by, the appropriate argument is captured and bound to the variable it, whose scope extends over the then form, as well as over else, if present.
If cond yields a true value, then then is evaluated and the resulting value is returned, otherwise else is evaluated if present and its value is returned. A missing else is treated as if it were the nil form.
(ifa t 1 0) -> 1
;; Rule 6: it binds to (* x x), which is
;; the only it-candidate.
(let ((x 6) (y 49))
  (ifa (> y (* x x)) ;; it binds to (* x x)
    (list it)))
-> (36)
;; Rule 4: it binds to argument of evenp,
;; even though 4 isn't an it-candidate.
(ifa (evenp 4)
  (list it))
-> (4)

;; Rule 5:
(ifa (not (oddp 4))
  (list it))
-> (4)

;; Rule 7: no candidates: choose leftmost
(let ((x 6) (y 49))
  (ifa (< 1 x y)
    (list it)))
-> (1)
;; Violation of Rule 1:
;; while is not a function
(ifa (while t (print 42))
  (list it))
--> exception!
;; Violation of Rule 2: (* y y y) and (* x x)
;; are both it-candidates.
(let ((x 6) (y 49))
  (ifa (> (* y y y) (* x x))
    (list it)))
--> exception!
(conda {(test form*)}*)
The conda operator provides a multi-branching conditional evaluation of forms, similarly to the cond operator. Enclosed in the conda form are groups of forms expressed as lists. Each group must be a list of at least one form.
The conda operator is anaphoric: it expands into a nested structure of zero or more ifa invocations, according to these patterns:
(conda) -> nil
(conda (x y ...) ...) -> (ifa x (progn y ...) (conda ...))
Thus, conda inherits all the restrictions on the test expressions from ifa, as well as the anaphoric it variable feature.
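A brief sketch; as in ifa, the it variable is available in the selected branch:
(let ((x 4))
  (conda
    ((oddp x) (list 'odd it))
    ((evenp x) (list 'even it))))
-> (even 4)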
(whena test form*)
The whena macro is similar to the when macro, except that it is anaphoric in exactly the same manner as the ifa macro. It may be understood as conforming to the following equivalence:
(whena x f0 f1 ...) <--> (ifa x (progn f0 f1 ...))
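For example, paralleling the ifa examples:
(whena (evenp 4) (list it)) -> (4)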
(dotimes (var count-form [result-form])
  body-form*)
The dotimes macro implements a simple counting loop. var is established as a variable, and initialized to zero. count-form is evaluated one time to produce a limiting value, which should be a number. Then, if the value of var is less than the limiting value, the body-forms are evaluated, var is incremented by one, and the process repeats with a new comparison of var against the limiting value possibly leading to another evaluation of the forms.
If var is found to equal or exceed the limiting value, then the loop terminates.
When the loop terminates, its return value is nil unless a result-form is present, in which case the value of that form specifies the return value.
body-forms as well as result-form are evaluated in the scope in which the binding of var is visible.
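For instance, this sketch collects the squares of the integers 0 through 4, using result-form to deliver the accumulated list:
(let (acc)
  (dotimes (i 5 (reverse acc))
    (push (* i i) acc)))
-> (0 1 4 9 16)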
(each ({(sym init-form)}*) body-form*)
(each* ({(sym init-form)}*) body-form*)
(collect-each ({(sym init-form)}*) body-form*)
(collect-each* ({(sym init-form)}*) body-form*)
(append-each ({(sym init-form)}*) body-form*)
(append-each* ({(sym init-form)}*) body-form*)
These operators establish a loop for iterating over the elements of one or more sequences. Each init-form must evaluate to an iterable object that is suitable as an argument for the iter-begin function. The sequences are then iterated in parallel over repeated evaluations of the body-forms, with each sym variable being assigned successive elements of its sequence. The shortest sequence determines the number of iterations, so if any of the init-forms evaluate to an empty sequence, the body is not executed.
If the list of (sym init-form) pairs itself is empty, then an infinite loop is specified.
The body forms are enclosed in an anonymous block, allowing the return operator to terminate the loop prematurely and optionally specify the return value.
The collect-each and collect-each* variants are like each and each*, except that for each iteration, the resulting value of the body is collected into a list. When the iteration terminates, the return value of the collect-each or collect-each* operator is this collection.
The append-each and append-each* variants are like each and each*, except that for each iteration other than the last, the resulting value of the body must be a list. The last iteration may produce either an atom or a list. The objects produced by the iterations are combined together as if they were arguments to the append function, and the resulting value is the value of the append-each or append-each* operator.
The alternate forms denoted by the adorned symbols each*, collect-each* and append-each*, differ from each, collect-each and append-each in the following way. The plain forms evaluate the init-forms in an environment in which none of the sym variables are yet visible. By contrast, the alternate forms evaluate each init-form in an environment in which bindings for the previous sym variables are visible. In this phase of evaluation, sym variables are list-valued: one by one they are each bound to the list object emanating from their corresponding init-form. Just before the first loop iteration, however, the sym variables are assigned the first item from each of their lists.
The semantics of collect-each may be understood in terms of an equivalence to a code pattern involving mapcar:
(collect-each ((x xinit)
               (y yinit))
  body)
<-->
(mapcar (lambda (x y) body) xinit yinit)
The collect-each* variant may be understood in terms of the following equivalence involving let* for sequential binding and mapcar:
(collect-each* ((x xinit)
                (y yinit))
  body)
<-->
(let* ((x xinit)
       (y yinit))
  (mapcar (lambda (x y) body) x y))
However, note that the let* as well as each invocation of the lambda binds fresh instances of the variables, whereas these operators are permitted to bind a single instance of the variables, which are first initialized with the initializing expressions, and then reused as iteration variables which are stepped by assignment.
The other operators may be understood likewise, with the substitution of the mapdo function in the case of each and each* and of the mappend function in the case of append-each and append-each*.
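Minimal sketches of the collecting variants:
(collect-each ((x '(1 2 3))) (* x 10)) -> (10 20 30)
(append-each ((x '(1 2 3))) (list x x)) -> (1 1 2 2 3 3)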
;; print numbers from 1 to 10 and whether they are even or odd
(each* ((n 1..11) ;; n is just a range object in this scope
        (even (collect-each ((m n)) (evenp m))))
  ;; n is an integer in this scope
  (format t "~s is ~s\n" n (if even "even" "odd")))
1 is "odd"
2 is "even"
3 is "odd"
4 is "even"
5 is "odd"
6 is "even"
7 is "odd"
8 is "even"
9 is "odd"
10 is "even"
({for | for*} ({sym | (sym init-form)}*)
  ([test-form result-form*])
  (inc-form*)
  body-form*)
The for and for* operators combine variable binding with loop iteration. The first argument is a list of variables with optional initializers, exactly the same as in the let and let* operators. Furthermore, the difference between for and for* is like that between let and let* with regard to this list of variables.
The for and for* operators execute these steps: first, bindings are established for the variables; next, test-form is evaluated, and if its value is nil, the loop terminates, with the result-forms being evaluated in order such that the value of the last one becomes the result of the form; otherwise the body-forms are evaluated, followed by the inc-forms, after which evaluation returns to test-form. If test-form is omitted, the loop iterates indefinitely unless terminated by other means.
Furthermore, the for and for* operators establish an anonymous block, allowing the return operator to be used to terminate at any point.
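For instance, this sketch sums the integers 0 through 9, with the result-form s delivering the accumulated value:
(for ((i 0) (s 0)) ((< i 10) s) ((inc i))
  (inc s i))
-> 45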
({doloop | doloop*}
  ({sym | (sym [init-form [step-form]])}*)
  ([test-form result-form*])
  tagbody-form*)
The doloop and doloop* macros provide an iteration construct inspired by the ANSI Common Lisp do and do* macros.
Each sym element in the form must be a symbol suitable for use as a variable name.
The tagbody-forms are placed into an implicit tagbody, meaning that a tagbody-form which is an integer, character or symbol is interpreted as a tagbody label which may be the target of a control transfer via the go macro.
The doloop macro binds each sym to the value produced by evaluating the adjacent init-form. Then, in the environment in which these variables now exist, test-form is evaluated. If that form yields nil, then the loop terminates. The result-forms are evaluated, and the value of the last one is returned.
If result-forms are absent, then nil is returned.
If test-form is also absent, then the loop terminates and returns nil.
If test-form produces a true value, then result-forms are not evaluated. Instead, the implicit tagbody comprised of the tagbody-forms is evaluated. If that evaluation terminates normally, the loop variables are then updated by assigning to each sym the value of step-form.
The following defaulting behaviors apply in regard to the variable syntax. For each sym which has an associated init-form but no step-form, the init-form is duplicated and taken as the step-form. Thus a variable specification like (x y) is equivalent to (x y y). If both forms are omitted, then the init-form is taken to be nil, and the step-form is taken to be sym. This means that the variable form (x) is equivalent to (x nil x) which has the effect that x retains its current value when the next loop iteration begins. Lastly, the sym variant is equivalent to (sym) so that x is also equivalent to (x nil x).
The differences between doloop and doloop* are: doloop binds the variables in parallel, similarly to let, whereas doloop* binds sequentially, like let*; moreover, doloop performs the step-form assignments in parallel as if using a single (pset sym0 step-form-0 sym1 step-form-1 ...) form, whereas doloop* performs the assignment sequentially as if using set rather than pset.
The doloop and doloop* macros establish an anonymous block, allowing early return from the loop, with a value, via the return operator.
These macros are substantially different from the ANSI Common Lisp do and do* macros. Firstly, the termination logic is inverted; effectively they implement "while" loops, whereas their ANSI CL counterparts implement "until" loops. Secondly, in the ANSI CL macros, the defaulting of the missing step-form is different. Variables with no step-form are not updated. In particular, this means that the form (x y) is not equivalent to (x y y); the ANSI CL macros do not feature the automatic replication of init-form into the step-form position.
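A small sketch: i steps downward via pred while a list is accumulated. Because doloop steps in parallel, each step-form sees the previous values of the variables:
(doloop ((i 3 (pred i))
         (acc nil (cons i acc)))
    ((> i 0) acc))
-> (1 2 3)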
(sum-each ({(sym init-form)}*) body-form*)
(sum-each* ({(sym init-form)}*) body-form*)
(mul-each ({(sym init-form)}*) body-form*)
(mul-each* ({(sym init-form)}*) body-form*)
The macros sum-each and mul-each behave very similarly to the each operator. Whereas the each operator form returns nil as its result, the sum-each and mul-each forms, if they execute to completion and return normally, return an accumulated value.
The sum-each macro initializes a newly instantiated, hidden accumulator variable to the value 0. For each iteration of the loop, the body-forms are evaluated, and are expected to produce a value. This value is added to the current value of the hidden accumulator using the + function, and the result is stored into the accumulator. If sum-each returns normally, then the value of this accumulator becomes its resulting value.
The mul-each macro similarly initializes a hidden accumulator to the value 1. The value from each iteration of the body is multiplied with the accumulator using the * function, and the result is stored into the accumulator. If mul-each returns normally, then the value of this accumulator becomes its resulting value.
The sum-each* and mul-each* variants of the macros implement the sequential scoping rule for the variable bindings, exactly the way each* alters the semantics of each.
The body-forms are enclosed in an implicit anonymous block. If the forms terminate by returning from the anonymous block then these macros terminate with the specified value.
When sum-each* and sum-each are specified with variables whose values specify zero iterations, or with no variables at all, the form terminates with a value of 0. In this situation, mul-each and mul-each* terminate with 1. Note that this behavior differs from each, and its closely-related operators, which loop infinitely when no variables are specified.
It is unspecified whether mul-each and mul-each* continue iterating when the accumulator takes on a value satisfying the zerop predicate.
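For example:
(sum-each ((x '(1 2 3))) (* x x)) -> 14
(mul-each ((x '(1 2 3))) x) -> 6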
(each-true ({(sym init-form)}*) body-form*)
(some-true ({(sym init-form)}*) body-form*)
(each-false ({(sym init-form)}*) body-form*)
(some-false ({(sym init-form)}*) body-form*)
These macros iterate zero or more variables over sequences, similarly to the each operator, and calculate logical results, with short-circuiting semantics.
The each-true macro initializes an internal result variable to the t value. It then evaluates the body-forms for each tuple of variable values, replacing the result variable with the value produced by these forms. If that value is nil, the iteration stops. When the iteration terminates normally, the value of the result variable is returned.
If no variables are specified, termination occurs immediately. Note that this is different from the each operator, which iterates indefinitely if no variables are specified.
The body-forms are surrounded by an implicit anonymous block, making it possible to terminate via return or return-from. In these cases, the form terminates with nil or the specified return value. The internal result is ignored.
The some-true macro is similar to each-true, with the following differences. The internal result variable is initialized to nil rather than t. The iteration stops whenever the body-forms produce a true value, and that value is returned.
The each-false and some-false macros are, respectively, similar to each-true and some-true, with one difference. After each iteration, the value produced by the body-forms is logically inverted using the not function prior to being assigned to the result variable.
(each-true ()) -> t
(each-true ((a ()))) -> t
(each-true ((a '(1 2 3))) a) -> 3
(each-true ((a '(1 2 3))
            (b '(4 5 6)))
  (< a b))
-> t

(each-true ((a '(1 2 3))
            (b '(4 0 6)))
  (< a b))
-> nil

(some-true ((a '(1 2 3))) a) -> 1
(some-true ((a '(nil 2 3))) a) -> 2
(some-true ((a '(nil nil nil))) a) -> nil

(some-true ((a '(1 2 3))
            (b '(4 0 6)))
  (< a b))
-> t

(some-true ((a '(1 2 3))
            (b '(0 1 2)))
  (< a b))
-> nil

(each-false ((a '(1 2 3))
             (b '(4 5 6)))
  (> a b))
-> t

(each-false ((a '(1 2 3))
             (b '(4 0 6)))
  (> a b))
-> nil

(some-false ((a '(1 2 3))
             (b '(4 0 6)))
  (> a b))
-> t

(some-false ((a '(1 2 3))
             (b '(0 1 2)))
  (> a b))
-> nil
(each-prod ({(sym init-form)}*) body-form*)
(collect-each-prod ({(sym init-form)}*) body-form*)
(append-each-prod ({(sym init-form)}*) body-form*)
The macros each-prod, collect-each-prod and append-each-prod have a similar syntax to each, collect-each and append-each. However, instead of iterating over sequences in parallel, they iterate over the Cartesian product of the elements from the sequences. The difference between collect-each and collect-each-prod is analogous to that between the functions mapcar and maprod.
Like in the each operator family, the body-forms are surrounded by an anonymous block. If these forms execute a return from this block, then these macros terminate with the specified return value.
When no iterations are performed, including in the case when an empty list of variables is specified, all these macro forms terminate and return nil. Note that this behavior differs from each, and its closely-related operators, which loop infinitely when no variables are specified.
With one caveat noted below, these macros can be understood as providing syntactic sugar according to the pattern established by the following equivalences:
(each-prod ((x xinit)
            (y yinit))
  body)
<-->
(block nil
  (let ((#:gx xinit) (#:gy yinit))
    (maprodo (lambda (x y) body) #:gx #:gy)))

(collect-each-prod ((x xinit)
                    (y yinit))
  body)
<-->
(block nil
  (let ((#:gx xinit) (#:gy yinit))
    (maprod (lambda (x y) body) #:gx #:gy)))

(append-each-prod ((x xinit)
                   (y yinit))
  body)
<-->
(block nil
  (let ((#:gx xinit) (#:gy yinit))
    (maprend (lambda (x y) body) #:gx #:gy)))
However, note that each invocation of the lambda binds fresh instances of the variables, whereas these operators are permitted to bind a single instance of the variables, which are then stepped by assignment.
(collect-each-prod ((a '(a b c))
                    (n #(1 2)))
  (cons a n))
--> ((a . 1) (a . 2) (b . 1) (b . 2) (c . 1) (c . 2))
(each-prod* ({(sym init-form)}*) body-form*)
(collect-each-prod* ({(sym init-form)}*) body-form*)
(append-each-prod* ({(sym init-form)}*) body-form*)
The macros each-prod*, collect-each-prod* and append-each-prod* are variants of each-prod, collect-each-prod and append-each-prod with sequential binding.
These macros can be understood as providing syntactic sugar according to the pattern established by the following equivalences:
(each-prod* ((x xinit)
             (y yinit))
  body)
<-->
(let* ((x xinit)
       (y yinit))
  (maprodo (lambda (x y) body) x y))

(collect-each-prod* ((x xinit)
                     (y yinit))
  body)
<-->
(let* ((x xinit)
       (y yinit))
  (maprod (lambda (x y) body) x y))

(append-each-prod* ((x xinit)
                    (y yinit))
  body)
<-->
(let* ((x xinit)
       (y yinit))
  (maprend (lambda (x y) body) x y))
However, note that the let* as well as each invocation of the lambda binds fresh instances of the variables, whereas these operators are permitted to bind a single instance of the variables, which are first initialized with the initializing expressions, and then reused as iteration variables which are stepped by assignment.
(collect-each-prod* ((a "abc")
                     (b (upcase-str a)))
  `@a@b`)
--> ("aA" "aB" "aC" "bA" "bB" "bC" "cA" "cB" "cC")
(sum-each-prod ({(sym init-form)}*) body-form*)
(sum-each-prod* ({(sym init-form)}*) body-form*)
(mul-each-prod ({(sym init-form)}*) body-form*)
(mul-each-prod* ({(sym init-form)}*) body-form*)
The macros sum-each-prod and mul-each-prod have a similar syntax to sum-each and mul-each. However, instead of iterating over sequences in parallel, they iterate over the Cartesian product of the elements from the sequences.
The sum-each-prod* and mul-each-prod* variants perform sequential variable binding when establishing the initial values of the variables, similarly to the each* operator.
The body-forms are surrounded by an implicit anonymous block. If these forms execute a return from this block, then these macros terminate with the specified return value.
When no iterations are specified, including in the case when an empty list of variables is specified, the summing macros terminate, yielding 0, and the multiplicative macros terminate with 1. Note that this behavior differs from each, and its closely-related operators, which loop infinitely when no variables are specified.
;; Inefficiently calculate (+ (* 1 2 3) (* 4 3 2)).
;; Every value from (1 2 3) is paired with every value
;; from (4 3 2) to form partial products, and
;; sum-each-prod adds these together implicitly:
(sum-each-prod ((x '(1 2 3))
                (y '(4 3 2)))
  (* x y))
-> 54
(block name body-form*)
(block* name-form body-form*)
The block operator introduces a named block around the execution of some forms. The name argument may be any object, though block names are usually symbols. Two block name objects are considered to be the same name according to eq equality. Since a block name is not a variable binding, keyword symbols are permitted, and so are the symbols t and nil. A block named by the symbol nil is slightly special: it is understood to be an anonymous block.
The block* operator differs from block in that it evaluates name-form, which is expected to produce a symbol. The resulting symbol is used for the name of the block.
A named or anonymous block establishes an exit point for the return-from or return operator, respectively. These operators can be invoked within a block to cause its immediate termination with a specified return value.
A block also establishes a prompt for a delimited continuation. Anywhere in a block, a continuation can be captured using the sys:capture-cont function. Delimited continuations are described in the section Delimited Continuations. A delimited continuation allows an apparently abandoned block to be restarted at the capture point, with the entire call chain and dynamic environment between the prompt and the capture point intact.
Blocks in TXR Lisp have dynamic scope. This means that the following situation is allowed:
(defun func () (return-from foo 42))
(block foo (func))
The function can return from the foo block even though the foo block does not lexically surround the return-from form within func.
It is because blocks are dynamic that the block* variant exists; for lexically scoped blocks, it would make little sense to support a dynamically computed name.
Thus blocks in TXR Lisp provide dynamic nonlocal returns, as well as returns out of lexical nesting.
It is permitted for blocks to be aggressively progn-converted by compilation. This means that a block form which meets certain criteria is converted to a progn form which surrounds the body-forms and thus no longer establishes an exit point.
A block form will be spared from progn-conversion by the compiler if it meets the following rules.
Additionally, the compiler may progn-convert blocks in contravention of the above rules, but only if doing so makes no difference to visible program behavior.
(defun helper ()
  (return-from top 42))

;; defun implicitly defines a block named top
(defun top ()
  (helper)                ;; function returns 42
  (prinl 'notreached))    ;; never printed

(defun top2 ()
  (let ((h (fun helper)))
    (block top (call h))        ;; may progn-convert
    (block top (call 'helper))  ;; may progn-convert
    (block top (helper))))      ;; not removed
In the above examples, the block containing (call h) may be converted to progn because it doesn't express a direct call to the helper function. The block which calls helper using (call 'helper) is also not considered to be making a direct call.
In Common Lisp, blocks are lexical. A separate mechanism consisting of catch and throw operators performs nonlocal transfer based on symbols. The TXR Lisp example:
(defun func () (return-from foo 42))
(block foo (func))
is not allowed in Common Lisp, but can be transliterated to:
(defun func () (throw 'foo 42))
(catch 'foo (func))
Note that foo is quoted in CL. This underscores the dynamic nature of the construct. throw itself is a function and not an operator. Also note that the CL example, in turn, is even more closely transcribed back into TXR Lisp simply by replacing its throw and catch with return* and block*:
(defun func () (return* 'foo 42))
(block* 'foo (func))
Common Lisp blocks also do not support delimited continuations.
(return [value])
(return-from name [value])
The return operator must be dynamically enclosed within an anonymous block (a block named by the symbol nil). It immediately terminates the evaluation of the innermost anonymous block which encloses it, causing it to return the specified value. If the value is omitted, the anonymous block returns nil.
The return-from operator must be dynamically enclosed within a named block whose name matches the name argument. It immediately terminates the evaluation of the innermost such block, causing it to return the specified value. If the value is omitted, that block returns nil.
(block foo
  (let ((a "abc\n")
        (b "def\n"))
    (pprint a *stdout*)
    (return-from foo 42)
    (pprint b *stdout*)))
Here, the output produced is "abc". The value of b is not printed, because return-from terminates block foo, and so the second pprint form is not evaluated.
(return* name [value])
The return* function is similar to the return-from operator, except that name is an ordinary function parameter, and so when return* is used, an argument expression must be specified which evaluates to a symbol. Thus return* allows the target block of a return to be dynamically computed.
The following equivalence holds between the operator and function:
(return-from a b) <--> (return* 'a b)
Expressions used as name arguments to return* which do not simply quote a symbol have no equivalent in return-from.
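For instance, the target block name may be computed at run time:
(let ((target 'foo))
  (block foo
    (return* target 42)))
-> 42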
(tagbody {form | label}*)
(go label)
The tagbody macro provides a form of the "go to" control construct. The arguments of a tagbody form are a mixture of zero or more forms and go labels. The latter consist of those arguments which are symbols, integers or characters. Labels are not considered by tagbody and go to be forms, and are not subject to macro expansion or evaluation.
The go macro is available inside tagbody. It is erroneous for a go form to occur outside of a tagbody. This situation is diagnosed by a global macro called go, which unconditionally throws an error.
In the absence of invocations of go or other control transfers, the tagbody macro evaluates each form in left-to-right order. The go labels are ignored. After the last form is evaluated, the tagbody form terminates, and yields nil.
Any form itself, or else any of its subforms, may be the form (go label) where label matches one of the go labels of a surrounding tagbody. When this go form is evaluated, then the evaluation of form is immediately abandoned, and control transfers to the specified label. The forms are then evaluated in left-to-right order starting with the form immediately after that label. If the label is not followed by any forms, then the tagbody terminates. If label doesn't match any label of any surrounding tagbody, the go form is erroneous.
The abandonment of a form by invocation of go is a dynamic transfer. All necessary unwinding inside form takes place.
The go labels are lexically scoped, but dynamically bound. Their scope being lexical means that the labels are not visible to forms which are not enclosed within the tagbody, even if their evaluation is invoked from that tagbody. The dynamic binding means that the labels of a tagbody form are established when it begins evaluating, and removed when that form terminates. Once a label is removed, it is not available to be the target of a go control transfer, even if that go form has the label in its lexical scope. Such an attempted transfer is erroneous.
It is permitted for tagbody forms to nest arbitrarily. The labels of an inner tagbody are not visible to an outer tagbody. However, the reverse is true: a go form in an inner tagbody may branch to a label in an outer tagbody, in which case the entire inner tagbody terminates.
In cases where the same objects are used as labels by an inner and outer tagbody, the inner labels shadow the outer labels.
There is no restriction on what kinds of symbols may be labels. Symbols in the keyword package as well as the symbols t and nil are valid tagbody labels.
ANSI Common Lisp tagbody supports only symbols and integers as labels (which are called "go tags"); characters are not supported.
;; print the numbers 1 to 10
(let ((i 0))
  (tagbody
    (go skip) ;; forward goto skips 0
   again
    (prinl i)
   skip
    (when (<= (inc i) 10)
      (go again))))
;; Example of erroneous usage: by the time func is invoked
;; by (call func) the tagbody has already terminated. The
;; lambda body can still "see" the label, but it doesn't
;; have a binding.
(let (func)
  (tagbody
    (set func (lambda () (go label)))
    (go out)
   label
    (prinl 'never-reached)
   out)
  (call func))
;; Example of unwinding when the unwind-protect
;; form is abandoned by (go out). Output is:
;; reached
;; cleanup
;; out
(tagbody
  (unwind-protect
      (progn
        (prinl 'reached)
        (go out)
        (prinl 'notreached))
    (prinl 'cleanup))
 out
  (prinl 'out))
(prog ({sym | (sym init-form)}*)
  {body-form | label}*)
(prog* ({sym | (sym init-form)}*)
  {body-form | label}*)
The prog and prog* macros combine the features of let and let*, respectively, with those of an anonymous block and a tagbody.
The prog macro treats the sym and init-form expressions similarly to let, establishing variable bindings in parallel. The prog* macro treats these expressions in a similar way to let*.
The forms enclosed are treated like the argument forms of the tagbody macro: labels are permitted, along with use of go.
Finally, an anonymous block is established around all of the enclosed forms (both the init-forms and body-forms) allowing the use of return to terminate evaluation with a value.
The prog macro may be understood according to the following equivalence:
(prog vars forms ...)
<-->
(block nil
  (let vars
    (tagbody forms ...)))
Likewise, the prog* macro follows an analogous equivalence, with let replaced by let*.
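A sketch combining all three features: variable binding, a go label and an explicit return:
(prog ((i 0))
 again
  (when (< i 3)
    (prinl i)
    (inc i)
    (go again))
  (return 'done))
-> done ;; after printing 0, 1 and 2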
(eval form [env])
The eval function treats the form object as a Lisp expression, which is expanded and evaluated. The side effects implied by the form are performed, and the value which it produces is returned. The optional env object specifies an environment for resolving the function and variable references encountered in the expression. If this argument is omitted, then evaluation takes place in the global environment.
The form is not expanded all at once. Rather, it is treated by the following algorithm:
For instance, a form like (progn (defmacro foo ()) (foo)) may be processed with eval, because the above algorithm ensures that the (defmacro foo ()) expression is fully evaluated first, thereby providing the macro definition required by (foo).
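For example:
(eval '(+ 2 3)) -> 5
(eval ''a) -> a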
This expansion and evaluation order is important because the semantics of eval forms the reference model for how the load function processes top-level forms. Moreover, file compilation performs a similar treatment of top-level forms and incremental macro expansion. The result is that the behavior is consistent between source files and compiled files. See the sections Top-Level Forms and File Compilation Model.
Note that, according to these rules, the constituent body forms of a macrolet or symacrolet top-level form are not individual top-level forms, even if the expansion of the construct combines the expanded versions of those forms with progn.
The form (macrolet () (defmacro foo ()) (foo)) will therefore not work correctly. However, the specific problem in this situation can be resolved by rewriting foo as a macrolet macro: (macrolet ((foo ())) (foo)).
See also: the make-env function.
(constantp form [env])
The constantp function determines whether form is a constant form, with respect to environment env.
If env is absent, the global environment is used. The env argument is used for fully expanding form prior to analyzing.
Currently, constantp returns true for any form which, after macro-expansion, is any of the following: a compound form with the symbol quote in its first position; a non-symbolic atom; or one of the symbols which evaluate to themselves and cannot be bound as variables. These symbols are the keyword symbols, and the symbols t and nil.
Additionally, constantp returns true for a compound form, or a DWIM form, whose symbol is a member of a set comprising a large number of constant-foldable library functions, and whose arguments are, recursively, constantp expressions for the same environment. The arithmetic functions are members of this set.
For all other inputs, constantp returns nil.
Note: some uses of constantp require manual expansion.
(constantp nil) -> t
(constantp t) -> t
(constantp :key) -> t
(constantp :) -> t
(constantp 'a) -> nil
(constantp 42) -> t
(constantp '(+ 2 2 [* 3 (/ 4 4)])) -> t
;; symacrolet form expands to 42, which is constant
(constantp '(symacrolet ((a 42)) a)) -> t
(defmacro cp (:env e arg)
  (constantp arg e))

;; The macro call (cp (* a a)) expands to t because
;; the symbol a expands to (+ 2 2) in the given environment,
;; and so (* a a) expands to (* (+ 2 2) (+ 2 2)), which is constantp.
(symacrolet ((a (+ 2 2)))
  (cp (* a a))) -> t
(make-env [var-bindings [fun-bindings [next-env]]])
The make-env function creates an environment object suitable as the env parameter.
The var-bindings and fun-bindings parameters, if specified, should be association lists, mapping symbols to objects. The objects in fun-bindings should be functions, or objects callable as functions.
The next-env argument, if specified, should be an environment.
Note: bindings can also be added to an environment using the env-vbind and env-fbind functions.
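The following sketch constructs an environment, adds a binding to it, and evaluates a form within it; the variable names are arbitrary:
(let ((e (make-env '((x . 10)))))
  (env-vbind e 'y 20)
  (eval '(+ x y) e))
-> 30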
(env-vbind env symbol value)
(env-fbind env symbol value)
These functions bind a symbol to a value in either the function or variable space of environment env.
Values established in the function space should be functions or objects that can be used as functions such as lists, strings, arrays or hashes.
If symbol already exists in the environment, in the given space, then its value is updated with value.
If env is specified as nil, then the binding takes place in the global environment.
(env-vbindings env)
(env-fbindings env)
(env-next env)
These functions retrieve the components of env, which must be an environment. The env-vbindings function retrieves the association list representing variable bindings. Similarly, the env-fbindings function retrieves the association list of function bindings. The env-next function retrieves the next environment, if env has one, otherwise nil.
If e is an environment constructed by the expression (make-env v f n), then (env-vbindings e) retrieves v, (env-fbindings e) retrieves f and (env-next e) returns n.
(symbol-function {symbol | method-name | lambda-expr})
(symbol-macro symbol)
(symbol-value symbol)
(set (symbol-function {symbol | method-name}) new-value)
(set (symbol-macro symbol) new-value)
(set (symbol-value symbol) new-value)
If given a symbol argument, the symbol-function function retrieves the value of the global function binding of the given symbol if it has one: that is, the function object bound to the symbol. If symbol has no global function binding, then nil is returned.
The symbol-function function supports method names of the form (meth struct slot) where struct names a struct type, and slot is either a static slot or one of the keyword symbols :init or :postinit which refer to special functions associated with a structure type. Names in this format are returned by the func-get-name function. The symbol-function function also supports names of the form (macro name) which denote macros. Thus, symbol-function provides unified access to functions, methods and macros.
If a lambda expression is passed to symbol-function, then the expression is macro-expanded and if that is successful, the function implied by that expression is returned. It is unspecified whether this function is interpreted or compiled.
The symbol-macro function retrieves the value of the global macro binding of symbol if it has one.
Note: the name of this function has nothing to do with symbol macros; it is named for consistency with symbol-function and symbol-value, referring to the "macro-expander binding of the symbol cell".
The value of a macro binding is a function object. Intrinsic macros are C functions in the TXR kernel, which receive the entire macro call form and macro environment, performing their own destructuring. Currently, macros written in TXR Lisp are represented as curried C functions which carry the following list object in their environment cell:
(#<environment object> macro-parameter-list body-form*)
Local macros created by macrolet have nil in place of the environment object.
This representation is likely to change or expand to include other forms in future TXR versions.
The symbol-value function retrieves the value stored in the dynamic binding of symbol that is apparent in the current context. If the variable has no dynamic binding, then symbol-value retrieves its value in the global environment. If symbol has no variable binding, but is defined as a global symbol macro, then the value of that symbol macro binding is retrieved. The value of a symbol macro binding is simply the replacement form.
Rather than throwing an exception, each of these functions returns nil if the argument symbol doesn't have the binding in the respective namespace or namespaces which that function searches.
A symbol-function, symbol-macro, or symbol-value form denotes a place, if symbol has a binding of the respective kind. This place may be assigned to or deleted. Assignment to the place causes the denoted binding to have a new value. Deleting a place with the del macro removes the binding, and returns the previous contents of that binding. A binding denoted by a symbol-function form is removed using fmakunbound, one denoted by symbol-macro is removed using mmakunbound and a binding denoted by symbol-value is removed using makunbound.
Deleting a method via symbol-function is not possible; an attempt to do so has no effect.
Storing a value, using any one of these three accessors, to a nonexistent variable, function or macro binding, is not erroneous. It has the effect of creating that binding.
Using the symbol-function accessor to assign to a lambda expression is erroneous.
Deleting a binding, using any of these three accessors, when the binding does not exist, also isn't erroneous. There is no effect and the del operator yields nil as the prior value, consistent with the behavior when accessors are used to retrieve a nonexistent value.
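Minimal sketches of these accessors:
(defun add2 (x) (+ x 2))
(call (symbol-function 'add2) 8) -> 10

;; storing through symbol-value to a nonexistent
;; binding creates a global variable
(set (symbol-value 'v) 42)
v -> 42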
In ANSI Common Lisp, the symbol-function function retrieves a function, macro or special operator binding of a symbol. These are all in one space and may not coexist. In TXR Lisp, it retrieves a symbol's function binding only. Common Lisp has an accessor named macro-function similar to symbol-macro.
(boundp symbol)
(fboundp {symbol | method-name | lambda-expr})
(mboundp symbol)
boundp returns t if the symbol is bound as a variable or symbol macro in the global environment, otherwise nil.
fboundp returns t if the symbol has a function binding in the global environment, the method specified by method-name exists, or a lambda expression argument is given. Otherwise it returns nil.
mboundp returns t if the symbol has an operator macro binding in the global environment, otherwise nil.
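For example, given that when is implemented as a macro:
(defvar x 1)
(boundp 'x) -> t
(fboundp 'car) -> t
(mboundp 'when) -> t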
The boundp function in ANSI Common Lisp doesn't report that global symbol macros have a binding. They are not considered bindings. In TXR Lisp, they are considered bindings.
The ANSI Common Lisp fboundp yields true if its argument has a function, macro or operator binding, whereas the TXR Lisp fboundp does not consider operators or macros. The ANSI CL fboundp does not yield true for lambda expressions. Behavior similar to that of the Common Lisp fboundp can be obtained in TXR Lisp using the expression
(or (fboundp x) (mboundp x) (special-operator-p x))
except that this will also yield true when x is a lambda expression.
The mboundp function doesn't exist in ANSI Common Lisp.
(makunbound symbol)
The makunbound function removes the binding of symbol from either the dynamic environment or the global symbol macro environment. After the call to makunbound, symbol appears to be unbound.
If the makunbound call takes place in a scope in which there exists a dynamic rebinding of symbol, the information for restoring the previous binding is not affected by makunbound. When that scope terminates, the previous binding will be restored.
If the makunbound call takes place in a scope in which the dynamic binding for symbol is the global binding, then the global binding is removed. When the global binding is removed, then if symbol was previously marked as special (for instance by defvar) this marking is removed.
Otherwise if symbol has a global symbol macro binding, that binding is removed.
If symbol has no apparent dynamic binding, and no global symbol macro binding, makunbound does nothing.
In all cases, makunbound returns symbol.
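For example:
(defvar u 42)
(boundp 'u) -> t
(makunbound 'u) -> u
(boundp 'u) -> nil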
The behavior of makunbound differs from its counterpart in ANSI Common Lisp.
The makunbound function in Common Lisp only removes a value from a dynamic variable; the variable does not cease to exist, it only ceases to have a value. In TXR Lisp, the variable ceases to exist. The binding of a variable isn't its value; it is the variable itself: the association between a name and an abstract storage location, in some environment. If the binding is undone, the variable disappears.
The makunbound function in Common Lisp does not remove global symbol macros, which are not considered to be bindings in the variable namespace. That is to say, the Common Lisp boundp does not report true for symbol macros.
The Common Lisp makunbound also doesn't remove the special attribute from a symbol. If a variable is introduced with defvar and then removed with makunbound, the symbol continues to exhibit dynamic binding rather than lexical in subsequent scopes. In TXR Lisp, if a global binding is removed, so is the special attribute.
(fmakunbound symbol)
(mmakunbound symbol)
The function fmakunbound removes any binding for symbol from the function namespace of the global environment. If symbol has no such binding, it does nothing. In either case, it returns symbol.
The function mmakunbound removes any binding for symbol from the operator macro namespace of the global environment. If symbol has no such binding, it does nothing. In either case, it returns symbol.
The behavior of fmakunbound differs from its counterpart in ANSI Common Lisp. The fmakunbound function in Common Lisp removes a function or macro binding, which do not coexist.
The mmakunbound function doesn't exist in Common Lisp.
(func-get-form func)
The func-get-form function retrieves a source code form of func, which must be an interpreted function. The source code form has the syntax (name arglist body-form*).
(func-get-name func [env])
The func-get-name function tries to resolve the function object func to a name. If that is not possible, it returns nil.
The resolution is performed by an exhaustive search through up to three spaces.
If an environment is specified by env, then this is searched first. If a binding is found in that environment which resolves to the function, then the search terminates and the binding's symbol is returned as the function's name.
If the search through environment env fails, or if that argument is not specified, then the global environment is searched for a function binding which resolves to func. If such a binding is found, then the search terminates, and the binding's symbol is returned. If two or more symbols in the global environment resolve to the function, it is not specified which one is returned.
If the global function environment search fails, then the function is considered as a possible macro. The global macro environment is searched for a macro binding whose expander function is func, similarly to the way the function environment was searched. If a binding is found, then the syntax (macro name) is returned, where name is the name of the global macro binding that was found which resolves to func. If two or more global macro bindings share func, it is not specified which of those bindings provides name.
If the global macro search fails, then func is considered as a possible method. The static slot space of all struct types is searched for a slot which contains func. If such a slot is found, then the method name is returned, consisting of the syntax (meth type name) where type is a symbol denoting the struct type and name is the static slot of the struct type which holds func.
A check is also performed whether func might be equal to one of the two special functions of a structure type: its initfun or postinitfun, in which case it is returned as either the (meth type :init) or the (meth type :postinit) syntax.
If func is an interpreted function not found under any name, then a lambda expression denoting that function is returned in the syntax (lambda args form*)
If func cannot be identified as a function, then nil is returned.
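For instance, in this sketch, the built-in function car resolves to its global name, while an anonymous interpreted function is rendered as a lambda expression:
(func-get-name (symbol-function 'car)) -> car
(func-get-name (lambda (x) x)) -> (lambda (x) x)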
(func-get-env func)
The func-get-env function retrieves the environment object associated with function func. The environment object holds the captured bindings of a lexical closure.
(fun-fixparam-count func)
(fun-optparam-count func)
The fun-fixparam-count function reports func's number of fixed parameters. The fixed parameters consist of the required parameters and the optional parameters. Variadic functions have a parameter which captures the remaining arguments which are in excess of the fixed parameters. That parameter is not considered a fixed parameter and therefore doesn't contribute to this count.
The fun-optparam-count function reports func's number of optional parameters.
The func argument must be a function.
Note: if a function isn't variadic (see the fun-variadic function) then the value reported by fun-fixparam-count represents the maximum number of arguments which can be passed to the function. The minimum number of required arguments can be calculated for any function by subtracting the value reported by fun-optparam-count from the value reported by fun-fixparam-count.
(fun-variadic func)
The fun-variadic function returns t if func is a variadic function, otherwise nil.
The func argument must be a function.
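A sketch using a function with two required parameters and one optional parameter (the colon introducing the optionals):
(defun f (a b : c) (list a b c))
(fun-fixparam-count (fun f)) -> 3 ;; required plus optional
(fun-optparam-count (fun f)) -> 1
(fun-variadic (fun f)) -> nil     ;; so at least 3 - 1 = 2 arguments are required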
(interp-fun-p obj)
The interp-fun-p function returns t if obj is an interpreted function, otherwise it returns nil.
(vm-fun-p obj)
The vm-fun-p function returns t if obj is a function compiled for the virtual machine: a function representation produced by means of the functions compile-file, compile-toplevel or compile. If obj is of any other type, the function returns nil.
(special-var-p obj)
The special-var-p function returns t if obj is a symbol marked for special variable binding, otherwise it returns nil. Symbols are marked special by defvar and defparm.
(special-operator-p obj)
The special-operator-p function returns t if obj is a symbol which names a special operator, otherwise it returns nil.
In TXR Lisp, objects obey the following type hierarchy. In this type hierarchy, the internal nodes denote abstract types: no object is an instance of an abstract type. Nodes in square brackets indicate an internal structure in the type graph, invisible to programs, and angle brackets indicate a plurality of types which are not listed by name:
t ----+--- [cobj types] ---+--- hash
| |
| +--- hash-iter
| |
| +--- stream
| |
| +--- random-state
| |
| +--- regex
| |
| +--- buf
| |
| +--- tree
| |
| +--- tree-iter
| |
| +--- seq-iter
| |
| +--- cptr
| |
| +--- dir
| |
| +--- struct-type
| |
| +--- <all structures>
| |
| +--- ... others
|
|
+--- sequence ---+--- string ---+--- str
| | |
| | +--- lstr
| | |
| | +--- lit
| |
| +--- list ---+--- null
| | |
| | +--- cons
| | |
| | +--- lcons
| |
| +--- vec
| |
| +--- <structures with car or length methods>
|
+--- number ---+--- float
| |
| +--- integer ---+--- fixnum
| |
| +--- bignum
|
+--- chr
|
+--- sym
|
+--- env
|
+--- range
|
+--- tnode
|
+--- pkg
|
+--- fun
|
+--- args
In addition to the above hierarchy, the following relationships also exist:
t ---+--- atom --- <any type other than cons> --- nil
|
+--- cons ---+--- lcons --- nil
|
+--- nil
sym --- null
struct ---- <all structures>
That is to say, the types are exhaustively partitioned into atoms and conses; an object is either a cons or else it isn't, in which case it is of the abstract type atom.
The cons type is odd in that it is both an abstract type, serving as a supertype for the type lcons, and also a concrete type, in that regular conses are of this type.
The type nil is an abstract type which is empty. That is to say, no object is of type nil. This type is considered the abstract subtype of every other type, including itself.
The type nil is not to be confused with the type null which is the type of the nil symbol.
Because the type of nil is the type null and nil is also a symbol, the null type is a subtype of sym.
Lastly, the symbol struct serves as the supertype of all structures.
(typeof value)
The typeof function returns a symbol representing the type of value.
The core types are identified by symbols: those which appear in the foregoing type hierarchy diagrams, such as cons, str, sym, chr, fixnum and vec. There are more kinds of objects, such as user-defined structures.
(subtypep left-type right-type)
The subtypep function tests whether left-type and right-type name a pair of types, such that the left type is a subtype of the right type.
The arguments are either type symbols, or structure type objects, as returned by the find-struct-type function. Thus, the symbol time, which is the name of a predefined struct type, and the object returned by (find-struct-type 'time) are considered equivalent argument values.
If either argument doesn't name a type, the behavior is unspecified.
Each type is a subtype of itself. Most other type relationships can be inferred from the type hierarchy diagrams given in the introduction to this section.
In addition, there are inheritance relationships among structures. If left-type and right-type are both structure types, then subtypep yields true if the types are the same struct type, or if the right type is a direct or indirect supertype of the left.
The type symbol struct is a supertype of all structure types.
(typep object type-symbol)
The typep function tests whether the type of object is a subtype of the type named by type-symbol.
The following equivalence holds:
(typep a b) --> (subtypep (typeof a) b)
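A few illustrative cases, following the hierarchy diagrams given earlier:
(typep 42 'fixnum) -> t
(typep 42 'integer) -> t
(typep 42 'number) -> t
(typep 42.0 'integer) -> nil
(subtypep 'lcons 'cons) -> t
(subtypep 'str 'string) -> t
(subtypep 'string 'sequence) -> t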
(typecase test-form {(type-sym clause-form*)}*)
The typecase macro evaluates test-form and then successively tests its type against each clause.
Each clause consists of a type symbol type-sym and zero or more clause-forms.
The first clause whose type-sym is a supertype of the type of test-form's value is considered to be the matching clause. That clause's clause-forms are evaluated, and the value of the last form is returned.
If there is no matching clause, or there are no clauses present, or the matching clause has no clause-forms, then nil is returned.
Note: since t is the supertype of every type, a clause whose type-sym is the symbol t always matches. If such a clause is placed as the last clause of a typecase, it provides a fallback case, whose forms are evaluated if none of the previous clauses match.
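For example, a typecase with a fallback clause:
(typecase 3.14
  (integer 'int)
  (float 'flo)
  (t 'other))
-> flo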
(etypecase test-form {(type-sym clause-form*)}*)
The etypecase macro is the error-catching variant of typecase, similar to the relationship between the ecaseq and caseq families of macros.
If one of the clauses has a type-sym which is the symbol t, then etypecase is precisely equivalent to typecase. Otherwise, a clause with a type-sym of t and which throws an exception of type case-error, derived from error, is appended to the existing clauses, after which the semantics follows that of typecase.
(built-in-type-p object)
The built-in-type-p function returns t if object is a symbol which is the name of a built-in type. For all other objects it returns nil.
(identity value)
(identity* value*)
(use value)
The identity function returns its argument.
If the identity* function is given at least one argument, then it returns its leftmost argument, otherwise it returns nil.
The use function is a synonym of identity.
The identity function is useful as a functional argument, when a transformation function is required, but no transformation is actually desired. In this role, the use synonym leads to readable code. For instance:
;; construct a function which returns its integer argument
;; if it is odd, otherwise it returns its successor.
;; "If it's odd, use it, otherwise take its successor".
[iff oddp use succ]
;; Applications of the function:
[[iff oddp use succ] 3] -> 3 ;; use applied to 3
[[iff oddp use succ] 2] -> 3 ;; succ applied to 2
(null value)
(not value)
(false value)
The null, not and false functions are synonyms. They test whether value is the object nil. They return t if this is the case, nil otherwise.
(null '()) -> t
(null nil) -> t
(null ()) -> t
(false t) -> nil
(if (null x) (format t "x is nil!"))
(let ((list '(b c d)))
(if (not (memq 'a list))
(format t "list ~s does not contain the symbol a\n")))
(true value)
(have value)
The true function is the complement of the null, not and false functions. The have function is a synonym for true.
It returns t if value is any object other than nil. If value is nil, it returns nil.
Note: programs should avoid explicitly testing values with true. For instance (if x ...) should be favored over (if (true x) ...). However, the latter is useful with the ifa macro because (ifa (true expr) ...) binds the it variable to the value of expr, no matter what kind of form expr is, which is not true in the (ifa expr ...) form.
;; Compute indices where the list '(1 nil 2 nil 3)
;; has true values:
[where '(1 nil 2 nil 3) true] -> (1 3)
(eq left-obj right-obj)
(eql left-obj right-obj)
(equal left-obj right-obj)
The principal equality test functions eq, eql and equal test whether two objects are equivalent, using different criteria. They return t if the objects are equivalent, and nil otherwise.
The eq function uses the strictest equivalence test, called implementation equality. The eq function returns t if and only if left-obj and right-obj are actually the same object. The eq test is implemented by comparing the raw bit pattern of the value, whether it is an immediate value or a pointer to a heaped object. Two character values are eq if they are the same character, and two fixnum integers are eq if they have the same value. All other object representations are actually pointers, and are eq if and only if they point to the same object in memory. So, for instance, two bignum integers might not be eq even if they have the same numeric value, two lists might not be eq even if all their corresponding elements are eq, and two strings might not be eq even if they hold identical text.
The eql function is slightly less strict than eq. The difference between eql and eq is that if left-obj and right-obj are numbers which are of the same kind and have the same numeric value, eql returns t, even if they are different objects. Note that an integer and a floating-point number are not eql even if one has a value which converts to the other: thus, (eql 0.0 0) yields nil; a comparison expression which finds these numbers equal is (= 0.0 0). The eql function also specially treats range objects. Two distinct range objects are eql if their corresponding from and to fields are eql. For all other object types, eql behaves like eq.
The equal function is less strict still than eql. In general, it recurses into some kinds of aggregate objects to perform a structural equivalence check. For struct types, it also supports customization via equality substitution. See the Equality Substitution section under Structures.
Firstly, if left-obj and right-obj are eql then they are also equal, though the converse isn't necessarily the case.
If two objects are both cons cells, then they are equal if their car fields are equal and their cdr fields are equal.
If two objects are vectors, they are equal if they have the same length, and their corresponding elements are equal.
If two objects are strings, they are equal if they are textually identical.
If two objects are functions, they are equal if they have equal environments, and if they have the same code. Two compiled functions are considered to have the same code if and only if they are pointers to the same function. Two interpreted functions are considered to have the same code if their list structure is equal.
Two hashes are equal if they use the same equality (both are :equal-based, or both are :eql-based or else both are :eq-based), if their associated user data elements are equal (see the function hash-userdata), if their sets of keys are identical, and if the data items associated with corresponding keys from each respective hash are equal objects.
Two ranges are equal if their corresponding to and from fields are equal.
For some aggregate objects, there is no special semantics. Two arguments which are symbols, packages, or streams are equal if and only if they are the same object.
Certain object types have a custom equal function.
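The distinctions may be summarized by a sketch like the following, which uses freshly constructed lists so that no literal objects are shared:
(let ((a (list 1 2))
      (b (list 1 2)))
  (list (eq a b) (eql a b) (equal a b)))
-> (nil nil t)
(eql 0.0 0) -> nil  ;; different kinds of number
(eql 1.0 1.0) -> t  ;; same kind and value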
(neq left-obj right-obj)
(neql left-obj right-obj)
(nequal left-obj right-obj)
The functions neq, neql and nequal are logically negated counterparts of, respectively, eq, eql and equal.
If eq returns t for a given pair of arguments left-obj and right-obj, then neq returns nil. Vice versa, if eq returns nil, neq returns t.
The same relationship exists between eql and neql, and between equal and nequal.
(meq left-obj right-obj*)
(meql left-obj right-obj*)
(mequal left-obj right-obj*)
The functions meq, meql and mequal ("member equal" or "multi-equal") provide a particular kind of a generalization of the binary equality functions eq, eql and equal to multiple arguments.
The left-obj value is compared to each right-obj value using the corresponding binary equality function. If a match occurs, then t is returned, otherwise nil.
The traversal of the right-obj argument values proceeds from left to right, and stops when a match is found.
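For instance:
(meq 'b 'a 'b 'c) -> t
(meql 2.0 1 2 3) -> nil  ;; 2.0 isn't eql to the integer 2
(mequal "b" "a" "b") -> t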
(less left-obj right-obj)
(less obj obj*)
The less function, when called with two arguments, determines whether left-obj compares less than right-obj in a generic way which handles arguments of various types.
The argument syntax of less is generalized. It can accept one argument, in which case it unconditionally returns t regardless of that argument's value. If more than two arguments are given, then less generalizes in a way which can be described by the following equivalence pattern, with the understanding that each argument expression is evaluated exactly once:
(less a b c) <--> (and (less a b) (less b c))
(less a b c d) <--> (and (less a b) (less b c) (less c d))
The less function is used as the default for the lessfun argument of the functions sort and merge, as well as the testfun argument of the pos-min and find-min functions.
The less function is capable of comparing numbers, characters, symbols, strings, as well as lists and vectors of these. It can also compare buffers.
If both arguments are the same object so that (eq left-obj right-obj) holds true, then the function returns nil regardless of the type of left-obj, even if the function doesn't handle comparing different instances of that type. In other words, no object is less than itself, no matter what it is.
The less function pairs with the equal function. If values a and b are objects which are of suitable types to the less function, then exactly one of the following three expressions must be true: (equal a b), (less a b) or (less b a).
The less relation is: antisymmetric, such that if (less a b) is true, then (less b a) is false; irreflexive, such that (less a a) is false; and transitive, such that (less a b) and (less b c) imply (less a c).
The following are detailed criteria that less applies to arguments of different types and combinations thereof.
If both arguments are numbers or characters, they are compared as if using the < function.
If both arguments are strings, they are compared as if using the string-lt function.
If both arguments are symbols, the following rules apply. If the symbols have names which are different, then the result is that of their names being compared by the string-lt function. If less is passed symbols which have the same name, and neither of these symbols has a home package, then the raw bit patterns of their values are compared as integers: effectively, the object with the lower machine address is considered lesser than the other. If only one of the two same-named symbols has no home package, then if that symbol is the left argument, less returns t, otherwise nil. If both same-named symbols have home packages, then the result of less is that of string-lt applied to the names of their respective packages. Thus a:foo is less than z:foo.
If both arguments are conses, then they are compared as follows: first, their car fields are compared. If the car fields are not equal, then the result of the comparison is the result of applying less to them. Otherwise, less is applied recursively to the cdr fields, and that result is returned.
This logic performs a lexicographic comparison on ordinary lists such that for instance (1 1) is less than (1 1 1) but not less than (1 0) or (1).
Note that the empty list nil compared to a cons is handled by type-based precedence, described below.
Two vectors are compared by less lexicographically, similarly to strings. Corresponding elements, starting with element 0, of the vectors are compared until an index position is found where corresponding elements of the two vectors are not equal. If this differing position is beyond the end of one of the two vectors, then the shorter vector is considered to be lesser. Otherwise, the result of less is the outcome of comparing those differing elements themselves with less.
Two buffers are also compared by less lexicographically, as if they were vectors of integer byte values.
Two ranges are compared by less using lexicographic logic similar to conses and vectors. The from fields of the ranges are first compared. If they are not equal, then less is applied to those fields and the result is returned. If the from fields are equal, then less is applied to the to fields and that result is returned.
If the two arguments are of the above types, but of different types from each other, then less resolves the situation based on the following precedence: numbers and characters are less than ranges, which are less than strings, which are less than symbols, which are less than conses, which are less than vectors, which are less than buffers.
Note that since nil is a symbol, it is ranked lower than a cons. This interpretation ensures correct behavior when nil is regarded as an empty list, since the empty list is lexicographically prior to a nonempty list.
If either argument is a structure for which the equal method is defined, the method is invoked on that argument, and the value returned is used in place of that argument for performing the comparison. Structures with no equal method cannot participate in a comparison, resulting in an error. See the Equality Substitution section under Structures.
Finally, if either of the arguments has a type other than the above types, the situation is an error.
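The following cases sketch both the lexicographic rules and the type-based precedence:
(less 1 2 3) -> t
(less '(1 1) '(1 1 1)) -> t
(less "abc" "abd") -> t
(less 1 "one") -> t  ;; numbers precede strings
(less 'a '(a)) -> t  ;; symbols precede conses
(less nil '(1)) -> t ;; nil is a symbol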
(greater left-obj right-obj)
(greater obj obj*)
The greater function is equivalent to less with the arguments reversed. That is to say, the following equivalences hold:
(greater a) <--> (less a) <--> t
(greater a b) <--> (less b a)
(greater a b c ...) <--> (less ... c b a)
The greater function is used as the default for the testfun argument of the pos-max and find-max functions.
(lequal obj obj*)
(gequal obj obj*)
The functions lequal and gequal are similar to less and greater respectively, but differ in the following respect: when called with two arguments which compare true under the equal function, the lequal and gequal functions return t.
When called with only one argument, both functions return t and both functions generalize to three or more arguments in the same way as do less and greater.
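For example, the following contrast with less holds:
(less 1 1 2) -> nil   ;; 1 isn't less than 1
(lequal 1 1 2) -> t
(gequal 3 3 2) -> t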
(copy object)
The copy function duplicates objects of various supported types: sequences, hashes, structures and random states. If object is nil, it returns nil. Otherwise, copy is equivalent to invoking a more specific copying function according to the type of the argument; for instance, a list argument is copied as if by the copy-list function.
For all other types of object, the invocation is erroneous.
Except in the case when object is nil, copy returns a value that is distinct from (not eq to) object. This is different from the behavior of [object 0..t] or (sub object 0 t), which recognize that they need not make a copy, and just return the input.
Note, however, that the elements of the returned object may be eq to elements of the original. In other words, copy is a deeper copy than just duplicating the object's own value, but it is not a deep copy.
(cons car-value cdr-value)
The cons function allocates, initializes and returns a single cons cell. A cons cell has two fields called car and cdr, which are accessed by functions of the same name, or by the functions first and rest, which are synonyms for these.
Lists are made up of conses. A (proper) list is either the symbol nil denoting an empty list, or a cons cell which holds the first item of the list in its car, and the list of the remaining items in cdr. The expression (cons 1 nil) allocates and returns a single cons cell which denotes the one-element list (1). The cdr is nil, so there are no additional items.
A cons cell whose cdr is an atom other than nil is printed with the dotted pair notation. For example the cell produced by (cons 1 2) is denoted (1 . 2). The notation (1 . nil) is perfectly valid as input, but the cell which it denotes will print back as (1). The notations are equivalent.
The dotted pair notation can be used regardless of what type of object is the cons cell's cdr, so that for instance (a . (b c)) denotes the cons cell whose car is the symbol a and whose cdr is the list (b c). This is exactly the same thing as (a b c). In other words, (a b ... l m . (n o ... w . (x y z))) is exactly the same as (a b ... l m n o ... w x y z).
Every list, and more generally cons-cell tree structure, can be written in a "fully dotted" notation, such that there are as many dots as there are conses. For instance the cons structure of the nested list (1 (2) (3 4 (5))) can be made more explicit using (1 . ((2 . nil) . ((3 . (4 . ((5 . nil) . nil))) . nil))). The structure contains eight conses, and so there are eight dots in the fully dotted notation.
The number of conses in a linear list like (1 2 3) is simply the number of items, so that list in particular is made of three conses. Additional nestings require additional conses, so for instance (1 2 (3)) requires four conses. A visual way to count the conses from the printed representation is to count the atoms, then add the count of open parentheses, and finally subtract one.
A list terminated by an atom other than nil is called an improper list, and the dot notation is extended to cover improper lists. For instance (1 2 . 3) is an improper list of two elements, terminated by 3, and can be constructed using (cons 1 (cons 2 3)). The fully dotted notation for this list is (1 . (2 . 3)).
(atom value)
The atom function tests whether value is an atom. It returns t if this is the case, nil otherwise. All values which are not cons cells are atoms.
(atom x) is equivalent to (not (consp x)).
(atom 3) -> t
(atom (cons 1 2)) -> nil
(atom "abc") -> t
(atom '(3)) -> nil
(consp value)
The consp function tests whether value is a cons. It returns t if this is the case, nil otherwise.
(consp x) is equivalent to (not (atom x)).
Nonempty lists test positive under consp because a list is represented as a reference to the first cons in a chain of one or more conses.
Note that a lazy cons is a cons and satisfies the consp test. See the function make-lazy-cons and the macro lcons.
(consp 3) -> nil
(consp (cons 1 2)) -> t
(consp "abc") -> nil
(consp '(3)) -> t
(car object)
(first object)
(set (car object) new-value)
(set (first object) new-value)
The functions car and first are synonyms.
If object is a cons cell, these functions retrieve the car field of that cons cell. (car (cons 1 2)) yields 1.
For programming convenience, object may be of several other kinds in addition to conses.
(car nil) is allowed, and returns nil.
object may also be a vector or a string. If it is an empty vector or string, then nil is returned. Otherwise the first character of the string or first element of the vector is returned.
object may be a structure. The car operation is possible if the object has a car method. If so, car invokes that method and returns whatever the method returns. If the structure has no car method, but has a lambda method, then the car function calls that method with one argument, that being the integer zero. Whatever the method returns, car returns. If neither method is defined, an error exception is thrown.
A car form denotes a valid place whenever object is a valid argument for the rplaca function. Modifying the place denoted by the form is equivalent to invoking rplaca with object as the left argument, and the replacement value as the right argument. It takes place in the manner given under the description of the rplaca function, and obeys the same restrictions.
A car form supports deletion. The following equivalence then applies:
(del (car place)) <--> (pop place)
This implies that deletion requires the argument of the car form to be a place, rather than the whole form itself. In this situation, the argument place may have a value which is nil, because pop is defined on an empty list.
The abstract concept behind deleting a car is that physically deleting this field from a cons, thereby breaking it in half, would result in just the cdr remaining. Though fragmenting a cons in this manner is impossible, deletion simulates it by replacing the place which previously held the cons, with that cons' cdr field. This semantics happens to coincide with deleting the first element of a list by a pop operation.
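A brief sketch of this deletion semantics, using a freshly constructed list:
(let ((x (list 1 2 3)))
  (list (del (car x)) x))
-> (1 (2 3))  ;; the deleted value, then what remains of x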
(cdr object)
(rest object)
(set (cdr object) new-value)
(set (rest object) new-value)
The functions cdr and rest are synonyms.
If object is a cons cell, these functions retrieve the cdr field of that cons cell. (cdr (cons 1 2)) yields 2.
For programming convenience, object may be of several other kinds in addition to conses.
(cdr nil) is allowed, and returns nil.
object may also be a vector or a string. If it is a string or vector containing at least two items, then the remaining part of the object is returned, with the first element removed. For example (cdr "abc") yields "bc". If object is a one-element vector or string, or an empty vector or string, then nil is returned. Thus (cdr "a") and (cdr "") both result in nil.
If object is a structure, then cdr requires it to support either the cdr method or the lambda method. If both are present, cdr is used. When the cdr function uses the cdr method, it invokes it with no arguments. Whatever value the method returns becomes the return value of cdr. When cdr invokes a structure's lambda method, it passes as the argument the range object #R(1 t). Whatever the lambda method returns becomes the return value of cdr.
The invocation syntax of a cdr or rest form is a syntactic place. The place is semantically correct if object is a valid argument for the rplacd function. Modifying the place denoted by the form is equivalent to invoking rplacd with object as the left argument, and the replacement value as the right argument. It takes place in the manner given under the description of the rplacd function, and obeys the same restrictions.
A cdr place supports deletion, according to the following near equivalence:
(del (cdr place)) <--> (prog1 (cdr place)
(set place (car place)))
The place expression is evaluated only once.
Note that this is symmetric with the delete semantics of car in that the cons stored in place goes away, as does the cdr field, leaving just the car, which takes the place of the original cons.
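A corresponding sketch for the cdr case:
(let ((x (list 1 2 3)))
  (list (del (cdr x)) x))
-> ((2 3) 1)  ;; the deleted suffix; x is left holding the car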
Example:
Walk every element of the list (1 2 3) using a for loop:
(for ((i '(1 2 3))) (i) ((set i (cdr i)))
(print (car i) *stdout*)
(print #\newline *stdout*))
The variable i marches over the cons cells which make up the "backbone" of the list. The elements are retrieved using the car function. Advancing to the next cell is achieved using (cdr i). If i is the last cell in a (proper) list, (cdr i) yields nil and so i becomes nil, the loop guard expression i fails and the loop terminates.
(rplaca object new-car-value)
(rplacd object new-cdr-value)
If object is a cons cell or lazy cons cell, then rplaca and rplacd functions assign new values into the car and cdr fields of the object. In addition, these functions are meaningful for other kinds of objects also.
Note that, except for the difference in return value, (rplaca x y) is the same as the more generic (set (car x) y), and likewise (rplacd x y) can be written as (set (cdr x) y).
The rplaca and rplacd functions return object. Note: In TXR versions 89 and earlier, these functions returned the new value. The behavior was undocumented.
The object argument does not have to be a cons cell. Both functions support meaningful semantics for vectors and strings. If object is a string, it must be modifiable.
The rplaca function replaces the first element of a vector or first character of a string. The vector or string must be at least one element long.
The rplacd function replaces the suffix of a vector or string after the first element with a new suffix. The new-cdr-value must be a sequence, and if the suffix of a string is being replaced, it must be a sequence of characters. The suffix here refers to the portion of the vector or string after the first element.
It is permissible to use rplacd on an empty string or vector. In this case, new-cdr-value specifies the contents of the entire string or vector, as if the operation were done on a nonempty vector or string, followed by the deletion of the first element.
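A sketch of these semantics for strings and vectors; copy is used to obtain a modifiable string, since literals must not be modified:
(let ((s (copy "abc")))
  (rplaca s #\z)
  s)
-> "zbc"
(let ((v (vec 1 2 3)))
  (rplacd v '(8 9))
  v)
-> #(1 8 9)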
The object argument may be a structure. In the case of rplaca, the structure must have a defined rplaca method or else, failing that, a lambda-set method. The first of these methods which is available, in the given order, is used to perform the operation. Whatever the respective method returns, rplaca returns object. If the lambda-set method is used, it is called with two arguments (in addition to object): the integer zero, and new-car-value.
In the case of rplacd, the structure must have a defined rplacd method or else, failing that, a lambda-set method. The first of these methods which is available, in the given order, is used to perform the operation. Whatever the respective method returns, rplacd returns object. If the lambda-set method is used, it is called with two arguments (in addition to object): the range value #R(1 t) and new-cdr-value.
(first object)
(second object)
(third object)
(fourth object)
(fifth object)
(sixth object)
(seventh object)
(eighth object)
(ninth object)
(tenth object)
(set (first object) new-value)
(set (second object) new-value)
...
(set (tenth object) new-value)
Used as functions, these accessors retrieve the elements of a sequence by position. If the sequence is shorter than implied by the position, these functions return nil.
When used as syntactic places, these accessors denote the storage locations by position. The location must exist, otherwise an error exception results. The places support deletion.
(third '(1 2)) -> nil
(second "ab") -> #\b
(third '(1 2 . 3)) -> ;; error: improper list
(let ((x (copy "abcd")))
(inc (fourth x))
x) -> "abce"
(append [sequence* last-arg])
(nconc [sequence* last-arg])
The append function creates a new object which is a catenation of its arguments. All arguments are optional; called with no arguments, append produces the empty list. If a single argument is specified, that argument is returned.
If two or more arguments are present, then the situation is identified as one or more sequence arguments followed by last-arg. The sequence arguments must be sequences; last-arg may be a sequence or atom.
The append operation over three or more arguments is left-associative, such that (append x y z) is equivalent to both (append (append x y) z) and (append x (append y z)).
This allows the catenation of an arbitrary number of arguments to be understood in terms of a repeated application of the two-argument case, whose semantics is given by these rules:
(append nil nil) -> nil
(append nil '(1 2)) -> (1 2)
(append nil '(1 2 . 3)) -> (1 2 . 3)
(append '(1 2) nil) -> (1 2)
(append '(1 2) #(3)) -> (1 2 . #(3))
(append '(1 2) 3) -> (1 2 . 3)
(append #(1 2) #(3 4)) -> #(1 2 3 4)
(append "ab" "cd") -> "abcd"
(append "ab" #(#\c #\d)) -> "abcd"
(append "ab" #(3 4)) -> ;; error
(append #(1 2) 3) -> #(1 2 3)
(append "ab" #\c) -> "abc"
(append "ab" 3) -> ;; error
(append '(1 2 . "ab") "c") -> (1 2 . "abc")
(append '(1 2 . "ab") '(2 3)) -> ;; error
(append 1 2) -> ;; error
(append '(1 . 2) 3) -> ;; error
If N arguments are specified, where N > 1, then the first N-1 arguments must be sequences. Copies of these sequences are catenated together. The last argument N, shown in the above syntax as last-arg, may be any kind of object. If the catenation of the preceding arguments produces a list, last-arg is installed into the cdr field of the last cons cell of the resulting list. Thus, if argument N is also a list, it is catenated onto the resulting list, but without being copied. Argument N may be an atom other than nil; in that case append produces an improper list.
The nconc function works like append, but may destructively manipulate any of the input objects.
;; An atom is returned.
(append 3) -> 3
;; A list is also just returned: no copying takes place.
;; The eq function can verify that the same object emerges
;; from append that went in.
(let ((list '(1 2 3)))
(eq (append list) list)) -> t
(append '(1 2 3) '(4 5 6) 7) -> (1 2 3 4 5 6 . 7)
;; the (4 5 6) tail of the resulting list is the original
;; (4 5 6) object, shared with that list.
(append '(1 2 3) '(4 5 6)) -> (1 2 3 4 5 6)
(append nil) -> nil
;; (1 2 3) is copied: it is not the last argument
(append '(1 2 3) nil) -> (1 2 3)
;; empty lists disappear
(append nil '(1 2 3) nil '(4 5 6)) -> (1 2 3 4 5 6)
(append nil nil nil) -> nil
;; atoms and improper lists other than in the last position
;; are erroneous
(append '(a . b) 3 '(1 2 3)) -> **error**
;; sequences other than lists can be catenated.
(append "abc" "def" "g" #\h) -> "abcdefgh"
;; lists followed by non-list sequences end with non-list
;; sequences catenated in the terminating atom:
(append '(1 2) '(3 4) "abc" "def") -> (1 2 3 4 . "abcdef")
(append* [list*])
The append* function lazily catenates lists.
If invoked with no arguments, it returns nil. If invoked with a single argument, it returns that argument.
Otherwise, it returns a lazy list consisting of the elements of every list argument from left to right.
Arguments other than the last are treated as lists, and traversed using car and cdr functions to visit their elements.
The last argument isn't traversed: rather, that object itself becomes the cdr field of the last cons cell of the lazy list constructed from the previous arguments.
(revappend list1 list2)
(nreconc list1 list2)
The revappend function returns a list consisting of list2 appended to a reversed copy of list1. The returned object shares structure with list2, which is unmodified.
The nreconc function behaves similarly, except that the returned object may share structure with not only list2 but also list1, which is modified.
(list value*)
The list function creates a new list, whose elements are the argument values.
(list) -> nil
(list 1) -> (1)
(list 'a 'b) -> (a b)
(list* value*)
The list* function is a generalization of cons. If called with exactly two arguments, it behaves exactly like cons: (list* x y) is identical to (cons x y). If three or more arguments are specified, the leading arguments specify additional atoms to be consed to the front of the list. So for instance (list* 1 2 3) is the same as (cons 1 (cons 2 3)) and produces the improper list (1 2 . 3). Generalizing in the other direction, list* can be called with just one argument, in which case it returns that argument, and can also be called with no arguments in which case it returns nil.
(list*) -> nil
(list* 1) -> 1
(list* 'a 'b) -> (a . b)
(list* 'a 'b 'c) -> (a b . c)
Note that unlike in some other Lisp dialects, the effect of (list* 1 2 x) can also be obtained using (list 1 2 . x). However, (list* 1 2 (func 3)) cannot be rewritten as (list 1 2 . (func 3)) because the latter is equivalent to (list 1 2 func 3).
(sub-list list [from [to]])
(set (sub-list list [from [to]]) new-value)
The sub-list function has the same parameters and semantics as the sub function, except that it operates on its list argument using list operations, and assumes that list is terminated by nil.
If a sub-list form is used as a place, then the list argument form must also be a place.
The sub-list place denotes a subrange of list as if it were a storage location. The previous value of this location, if needed, is fetched by a call to sub-list. Storing new-value to the place is performed by a call to replace-list. The return value of replace-list is stored into list. In an update operation which accesses the prior value and stores a new value, the arguments list, from, to and new-value are evaluated once.
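For instance, storing a replacement of equal length overwrites that range of the list:
(let ((x (list 1 2 3 4)))
  (set (sub-list x 1 3) '(a b))
  x)
-> (1 a b 4)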
(replace-list list item-sequence [from [to]])
The replace-list function is like the replace function, except that it operates on its list argument using list operations. It assumes that list is terminated by nil, and that it is made of cells which can be mutated using rplaca.
(listp value)
(proper-list-p value)
The listp and proper-list-p functions test, respectively, whether value is a list, or a proper list, and return t or nil accordingly.
The listp test is weaker, and executes without having to traverse the object. The value produced by the expression (listp x) is the same as that of (or (null x) (consp x)), except that x is evaluated only once. The empty list nil is a list, and a cons cell is a list.
The proper-list-p function returns t only for proper lists. A proper list is either nil, or a cons whose cdr is a proper list. proper-list-p traverses the list, and its execution will not terminate if the list is circular.
These functions return nil for list-like sequences that are not made of actual cons cells.
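For example:
(listp nil) -> t
(listp '(1 2 . 3)) -> t
(proper-list-p '(1 2 . 3)) -> nil
(proper-list-p '(1 2 3)) -> t
(listp #(1 2 3)) -> nil  ;; a vector isn't made of conses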
Dialect Note: in TXR 137 and older, proper-list-p is called proper-listp. The name was changed for adherence to conventions and compatibility with other Lisp dialects, like Common Lisp. However, the function continues to be available under the old name. Code that must run on TXR 137 and older installations should use proper-listp, but its use going forward is deprecated.
(endp object)
The endp function returns t if object is the object nil.
If object is a cons cell, then endp returns nil.
Otherwise, endp function throws an exception.
(length-list list)
The length-list function returns the length of list, which may be a proper or improper list. The length of a list is the number of conses in that list.
(copy-list list)
The copy-list function returns a list similar to list, but with a newly allocated cons-cell structure.
If list is an atom, it is simply returned.
Otherwise, list is a cons cell, and copy-list returns the same object as the expression (cons (car list) (copy-list (cdr list))).
Note that the object (car list) is not deeply copied, but only propagated by reference into the new list. copy-list produces a new list structure out of the same items that are in list.
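A sketch of this sharing behavior:
(let* ((x (list (list 1) 2))
       (y (copy-list x)))
  (list (eq x y) (eq (car x) (car y))))
-> (nil t)  ;; new cons structure, same items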
Dialect Note: Common Lisp does not allow the argument to be an atom, except for the empty list nil.
(copy-cons cons)
The copy-cons function creates and returns a new object that is a replica of cons.
The cons argument must be either a cons cell, or else a lazy cons: an object of type lcons.
A new cell of the same type as cons is created, and all of its fields are initialized by copying the corresponding fields from cons.
If cons is lazy, the newly created object is in the same state as the original. If the original has not yet been updated and thus has an update function, the copy also has not yet been updated and has the same update function.
(copy-tree obj)
The copy-tree function returns a copy of obj which represents an arbitrary cons-cell-based structure.
The cell structure of obj is traversed and a similar structure is constructed, but without regard for substructure sharing or circularity.
More precisely, if obj is an atom, then it is returned. If it is an ordinary cons cell, then copy-tree is recursively applied to the car and cdr fields to produce their individual replicas. A new cons cell is then produced from the replicated car and cdr. If obj is a lazy cons, then just like in the ordinary cons case, the car and cdr fields are duplicated with a recursive call to copy-tree. Then, a lazy cons is created from these replicated fields. If the original cell has an update function, then the newly created lazy cons has the same update function; the function isn't copied.
Like copy-cons, the copy-tree function doesn't trigger the update of lazy conses. The copies of lazy conses which have not been updated are also conses which have not been updated.
(reverse list)
(nreverse list)
Description:
The functions reverse and nreverse produce an object which contains the same items as proper list list, but in reverse order. If list is nil, then both functions return nil.
The reverse function is non-destructive: it creates a new list.
The nreverse function creates the structure of the reversed list out of the cons cells of the input list, thereby destructively altering it (if it contains more than one element). How nreverse uses the material from the original list is unspecified. It may rearrange the cons cells into a reverse order, or it may keep the structure intact, but transfer the car values among cons cells into reverse order. Other approaches are possible.
(nthlast index list)
(set (nthlast index list) new-value)
The nthlast function retrieves the n-th last cons cell of a list, indexed from one. The index parameter must be an integer. If index is positive and so large that it specifies a nonexistent cons beyond the beginning of the list, nthlast returns list. Effectively, values of index larger than the length of the list are clamped to the length. If index is negative, then nthlast yields nil. An index value of zero retrieves the terminating atom of list, or else the value list itself, if list is an atom.
The following equivalence holds:
(nthlast 1 list) <--> (last list)
An nthlast place designates the storage location which holds the n-th last cell, as indicated by the value of index.
A negative index doesn't denote a place.
A positive index greater than the length of the list is treated as if it were equal to the length of the list.
If list is itself a syntactic place, then the index value n is permitted for a list of length n. This index value denotes the list place itself. Storing to this value overwrites list. If list isn't a syntactic place, then storing to position n isn't permitted.
If list is of length zero, or an atom (in which case its length is considered to be zero) then the above remarks about position n apply to an index value of zero: if list is a syntactic place, then the position denotes list itself, otherwise the position doesn't exist as a place.
If list contains one or more elements, then index value of zero denotes the cdr field of its last cons cell. Storing a value to this place overwrites the terminating atom.
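For example:
(nthlast 1 '(1 2 3)) -> (3)
(nthlast 2 '(1 2 3)) -> (2 3)
(nthlast 4 '(1 2 3)) -> (1 2 3)  ;; clamped to the length
(nthlast 0 '(1 2 3)) -> nil      ;; the terminating atom
(nthlast -1 '(1 2 3)) -> nil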
(butlastn num list)
(set (butlastn num list) new-value)
The butlastn function calculates that initial portion of list which excludes the last num elements.
Note: the butlastn function doesn't support non-list sequences as sequences; it treats them as the terminating atom of a zero-length improper list. The butlast sequence function supports non-list sequences. If x is a list, then the following equivalence holds:
(butlastn n x) <--> (butlast x n)
If num is zero, or negative, then butlastn returns list.
If num is positive, and meets or exceeds the length of list, then butlastn returns nil.
If a butlastn form is used as a syntactic place, then list must be a place. Assigning to the form causes list to be replaced with a new list which is a catenation of the new value and the last num elements of the original list, according to the following equivalence:
(set (butlastn n x) v)
<-->
(progn (set x (append v (nthlast n x))) v)
except that n, x and v are evaluated only once, in left-to-right order.
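For example:
(butlastn 1 '(1 2 3)) -> (1 2)
(butlastn 0 '(1 2 3)) -> (1 2 3)
(butlastn 4 '(1 2 3)) -> nil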
(nth index object)
(set (nth index object) new-value)
The nth function performs random access on a list, retrieving the n-th element indicated by the zero-based index value given by index. The index argument must be a nonnegative integer.
If index indicates an element beyond the end of the list, then the function returns nil.
The following equivalences hold:
(nth 0 list) <--> (car list) <--> (first list)
(nth 1 list) <--> (cadr list) <--> (second list)
(nth 2 list) <--> (caddr list) <--> (third list)
(nth x y) <--> (car (nthcdr x y))
(nthcdr index list)
(set (nthcdr index list) new-value)
The nthcdr function retrieves the n-th cons cell of a list, indexed from zero. The index parameter must be a nonnegative integer. If index specifies a nonexistent cons beyond the end of the list, then nthcdr yields nil.
The following equivalences hold:
(nthcdr 0 list) <--> list
(nthcdr 1 list) <--> (cdr list)
(nthcdr 2 list) <--> (cddr list)
(car (nthcdr x y)) <--> (nth x y)
An nthcdr place designates the storage location which holds the n-th cell, as indicated by the value of index. Indices beyond the last cell of list do not designate a valid place. If list is itself a place, then the zeroth index is permitted and the resulting place denotes list. Storing a value to (nthcdr 0 list) overwrites list. Otherwise, if list isn't a syntactic place, then the zeroth index does not designate a valid place; index must have a positive value. An nthcdr place does not support deletion.
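For instance, storing to an nthcdr place splices a new suffix onto the list:
(let ((x (list 1 2 3)))
  (set (nthcdr 1 x) '(a b))
  x)
-> (1 a b)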
Dialect Note: in Common Lisp, nthcdr is only a function, not an accessor; nthcdr forms do not denote places.
(tailp object list)
The tailp function tests whether object is a tail of list. This means that object is either list itself, or else one of the cons cells of list or else the terminating atom of list.
More formally, a recursive definition follows. If object and list are the same object (thus equal under the eq function) then tailp returns t. If list is an atom, and is not object, then the function returns nil. Otherwise, list is a cons that is not object and tailp yields the same value as the (tailp object (cdr list)) expression.
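For example:
(let ((x (list 1 2 3)))
  (list (tailp x x)
        (tailp (cddr x) x)
        (tailp nil x)          ;; the terminating atom
        (tailp (list 2 3) x))) ;; equal to a tail, but not eq
-> (t t t nil)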
(caar object)
(cadr object)
(cdar object)
(cddr object)
...
(cdddr object)
(set (caar object) new-value)
(set (cadr object) new-value)
...
The a-d accessors provide a shorthand notation for accessing two to five levels deep into a cons-cell-based tree structure. For instance, the equivalent of the nested function call expression (car (car (cdr object))) can be achieved using the single function call (caadr object).
The symbol names of the a-d accessors are a generalization of the words "car" and "cdr". They encode the pattern of car and cdr traversal of the structure using a sequence of the letters a and d placed between c and r. The traversal is encoded in right-to-left order, so that cadr indicates a traversal of the cdr link, followed by the car. This order corresponds to the nested function call notation, which also encodes the traversal right-to-left. The following diagram illustrates the straightforward relationship:
(cdr (car (cdr x)))
^ ^ ^
| / |
| / /
| / ____/
|| /
(cdadr x)
TXR Lisp provides all possible a-d accessors up to five levels deep, from caar all the way through cdddddr.
Expressions involving a-d accessors are places. For example, (caddr x) denotes the same place as (car (cddr x)), and (cdadr x) denotes the same place as (cdr (cadr x)).
The a-d accessor places support deletion, with semantics derived from the deletion semantics of the car and cdr places. For example, (del (caddr x)) means the same as (del (car (cddr x))).
(cyr address object)
(cxr address object)
The cyr and cxr functions provide car/cdr navigation of tree structure driven by numeric address given by the address argument.
The address argument can express any combination of the application of car and cdr functions, including none at all.
The difference between cyr and cxr is the bit order of the encoding. Under cyr, the most significant bit of the encoding given in address indicates the initial car/cdr navigation, and the least significant bit gives the final one. Under cxr, it is the opposite.
Both functions require address to be a positive integer. Any other argument raises an error.
Under both functions, the address value 1 encodes the identity operation: no car/cdr navigation is performed, and object itself is returned.
(flatten list)
(flatten* list)
The flatten function produces a list whose elements are all of the non-nil atoms contained in the structure of list.
The flatten* function works like flatten except that it produces a lazy list. It can be used to lazily flatten an infinite lazy structure.
(flatten '(1 2 () (3 4))) -> (1 2 3 4)
;; equivalent to previous, since
;; nil is the same thing as ()
(flatten '(1 2 nil (3 4))) -> (1 2 3 4)
(flatten nil) -> nil
(flatten '(((()) ()))) -> nil
(flatcar tree)
(flatcar* tree)
The flatcar function produces a list of all the atoms contained in the tree structure tree, in the order in which they appear, when the structure is traversed left to right.
This list includes those nil atoms which appear in car fields.
The list excludes nil atoms which appear in cdr fields.
The flatcar* function works like flatcar except that it produces a lazy list. It can be used to lazily flatten an infinite lazy structure.
(flatcar '(1 2 () (3 4))) -> (1 2 nil 3 4)
(flatcar '(a (b . c) d (e) (((f)) . g) (nil . z) nil . h))
--> (a b c d e f g nil z nil h)
(tree-find obj tree test-function)
The tree-find function searches tree for an occurrence of obj. The tree argument can be any atom, or a cons. If tree is a cons, it is understood to be a proper list whose elements are also trees.
The equivalence test is performed by test-function which must take two arguments, and has conventions similar to eq, eql or equal.
tree-find works as follows. If tree is equivalent to obj under test-function, then t is returned to announce a successful finding. If this test fails, and tree is an atom, nil is returned immediately to indicate that the find failed. Otherwise, tree is taken to be a proper list, and tree-find is recursively applied to each element of the list in turn, using the same obj and test-function arguments, stopping at the first element which returns a non-nil value.
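For example, searching a nested list with an explicit test function:
(tree-find 3 '(1 (2 (3)) 4) (fun eql)) -> t
(tree-find 5 '(1 (2 (3)) 4) (fun eql)) -> nil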
(memq object list)
(memql object list)
(memqual object list)
The memq, memql and memqual functions search list for a member which is, respectively, eq, eql or equal to object. (See the eq, eql and equal functions above.)
If no such element is found, nil is returned.
Otherwise, that suffix of list is returned whose first element is the matching object.
(member key sequence [testfun [keyfun]])
(member-if predfun sequence [keyfun])
The member and member-if functions search through sequence for an item which matches a key, or satisfies a predicate function, respectively.
The keyfun argument specifies a function which is applied to the elements of the sequence to produce the comparison key. If this argument is omitted, then the untransformed elements of the sequence themselves are examined.
The member function's testfun argument specifies the test function which is used to compare the comparison keys taken from the sequence to the search key. If this argument is omitted, then the equal function is used. If member does not find a matching element, it returns nil. Otherwise it returns the suffix of sequence which begins with the matching element.
The member-if function's predfun argument specifies a predicate function which is applied to the successive comparison keys pulled from the sequence by applying the key function to successive elements. If no match is found, then nil is returned, otherwise what is returned is the suffix of sequence which begins with the matching element.
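A few illustrative cases; recall that testfun defaults to equal:
(member "b" '("a" "b" "c")) -> ("b" "c")
(member 2.0 '(1 2 3) (fun =)) -> (2 3)
(member 5 '((1 a) (5 b)) (fun eql) (fun car)) -> ((5 b))
(member-if (fun oddp) '(2 4 5 6)) -> (5 6)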
(rmemq object list)
(rmemql object list)
(rmemqual object list)
(rmember key sequence [testfun [keyfun]])
(rmember-if predfun sequence [keyfun])
These functions are counterparts to memq, memql, memqual, member and member-if which search for the rightmost matching element, rather than the leftmost one.
(conses list)
(conses* list)
These functions return a list whose elements are the conses which make up list. The conses* function does this in a lazy way, avoiding the computation of the entire list: it returns a lazy list of the conses of list. The conses function computes the entire list before returning.
The input list may be proper or improper.
The first cons of list is that list itself. The second cons is the rest of the list, or (cdr list). The third cons is (cdr (cdr list)) and so on.
(conses '(1 2 3)) -> ((1 2 3) (2 3) (3))
These functions are useful for simulating the maplist function found in other dialects like Common Lisp.
TXR Lisp's (conses x) can be expressed in Common Lisp as (maplist #'identity x).
Conversely, the Common Lisp operation (maplist function list) can be computed in TXR Lisp as (mapcar function (conses list)).
More generally, the Common Lisp operation
(maplist function list0 list1 ... listn)
can be expressed as:
(mapcar function (conses list0)
(conses list1) ... (conses listn))
(delcons cons list)
The delcons function destructively removes a cons cell from a list. The list is searched to see whether one of its cons cells is the same object as cons. If so, that cell is removed from the list.
The list argument may be a proper or improper list, possibly empty. It may also be an atom other than nil, which is regarded as being, effectively, an empty improper list terminated by that atom.
The operation of delcons is divided into the following three cases. If cons is the first cons cell of list, then the cdr of list is returned. If cons is the second or subsequent cons of list, then list is destructively altered to remove cons and then returned. This means that the cdr field of the predecessor of cons is altered from referencing cons to referencing (cdr cons) instead. The returned value is the same cons cell as list. The third case occurs when cons is not found in list. In this situation, list is returned unchanged.
(let ((x (list 1 2 3)))
(delcons x x))
-> (2 3)
(let ((x (list 1 2 . 3)))
(delcons (cdr x) x))
-> (1 . 3)
Association lists are ordinary lists formed according to a special convention. Firstly, any empty list is a valid association list. A nonempty association list contains only cons cells as elements. These cons cells are understood to represent key/value associations, hence the name "association list".
(assoc key alist)
The assoc function searches an association list alist for a cons cell whose car field is equivalent to key under the equal function. The first such cons is returned. If no such cons is found, nil is returned.
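For example:
(assoc 'b '((a . 1) (b . 2))) -> (b . 2)
(assoc "b" '(("a" . 1) ("b" . 2))) -> ("b" . 2)
(assoc 'z '((a . 1) (b . 2))) -> nil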
(assq key alist)
(assql key alist)
The assq and assql functions are very similar to assoc, with the only difference being that they determine equality using, respectively, the eq and eql functions rather than equal.
(rassq value alist)
(rassql value alist)
(rassoc value alist)
The rassq, rassql and rassoc functions are reverse lookup counterparts to assq, assql and assoc. When searching, they examine the cdr field of the pairs of alist rather than the car field.
The rassoc function searches association list alist for a cons whose cdr field is equivalent to value according to the equal function. If such a cons is found, it is returned. Otherwise nil is returned.
The rassq and rassql functions search in the same way as rassoc, but compare values using, respectively, eq and eql.
(acons car cdr alist)
The acons function constructs a new alist by consing a new cons to the front of alist. The following equivalence holds:
(acons car cdr alist) <--> (cons (cons car cdr) alist)
(acons-new car cdr alist)
The acons-new function searches alist, as if using the assoc function, for an existing cell which matches the key provided by the car argument. If such a cell exists, then its cdr field is overwritten with the cdr argument, and then the alist is returned. If no such cell exists, then a new list is returned by adding a new cell to the input list consisting of the car and cdr values, as if by the acons function.
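A sketch, using freshly constructed cells, since acons-new may overwrite the cdr of an existing cell:
(acons-new 'c 3 (list (cons 'a 1) (cons 'b 2)))
-> ((c . 3) (a . 1) (b . 2))
(acons-new 'b 20 (list (cons 'a 1) (cons 'b 2)))
-> ((a . 1) (b . 20))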
(aconsql-new car cdr alist)
The aconsql-new function has the same parameters and semantics as acons-new, except that the eql function is used for equality testing. Thus, the list is searched for an existing cell as if using the assql function rather than assoc.
(alist-remove alist key*)
The alist-remove function takes association list alist and produces a duplicate from which cells matching any of the specified keys have been removed.
(alist-nremove alist key*)
The alist-nremove function is like alist-remove, but potentially destructive. The input list alist may be destroyed and its structural material reused to form the output list. The application should not retain references to the input list.
(copy-alist alist)
The copy-alist function duplicates alist. Unlike copy-list, which only duplicates list structure, copy-alist also duplicates each cons cell of the input alist. That is to say, each element of the output list is produced as if by the copy-cons function applied to the corresponding element of the input list.
(pairlis keys values [alist])
The pairlis function returns an association list consisting of pairs formed from the elements of keys and values prepended to the existing alist.
If an alist argument is omitted, it defaults to nil.
Pairs of elements are formed by taking successive elements from the keys and values sequences in parallel.
If the sequences are not of equal length, the excess elements from the longer sequence are ignored.
The pairs appear in the resulting list in the original order in which their constituents appeared in keys and values.
Dialect Note: the ANSI CL pairlis requires keys and values to be lists, not arbitrary sequences. The behavior of the ANSI CL pairlis is undefined if those lists are of different lengths. Finally, the pairs are permitted to appear in the resulting list in either the original order or in reverse order.
(pairlis nil nil) -> nil
(pairlis "abc" #(1 2 3 4)) -> ((#\a . 1) (#\b . 2) (#\c . 3))
(pairlis '(1 2 3) '(a b c) '((x . y) (z . w)))
-> ((1 . a) (2 . b) (3 . c) (x . y) (z . w))
A property list, also referred to as a plist, is a flat list of even length consisting of interleaved pairs of property names (usually symbols) and their values (arbitrary objects). An example property list is (:a 1 :b "two") which contains two properties, :a having value 1, and :b having value "two".
An improper plist represents Boolean properties in a condensed way, as property indicators which are not followed by a value. Such properties only indicate their presence or absence, which is useful for encoding a Boolean value. If it is absent, then the property is false. Correctly using an improper plist requires that the exact set of Boolean keys is established by convention.
In this document, the unqualified terms property list and plist refer strictly to an ordinary plist, not to an improper plist.
Unlike in some other Lisp dialects, including ANSI Common Lisp, symbols do not have property lists in TXR Lisp. Improper plists aren't a concept in ANSI CL.
(prop plist key)
The prop function searches property list plist for key key. If the key is found, then the value next to it is returned. Otherwise nil is returned.
A nil return is ambiguous: it may indicate that the property isn't found, or that the property is present, with the value nil.
The indicators in plist are compared with key using eq equality, allowing them to be symbols, characters or fixnum integers.
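For instance (a sketch):

(prop '(:a 1 :b "two") :b) -> "two"
(prop '(:a 1 :b "two") :c) -> nil
(prop '(:a nil) :a) -> nil  ;; indistinguishable from a missing key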
(memp key plist)
The memp function searches property list plist for key key, using eq equality.
If the key is found, then the entire suffix of plist beginning with the indicator is returned, such that the first element of the returned list is key and the second element is the property value.
Note the reversed argument convention relative to the prop function, harmonizing with functions in the member family.
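A sketch of the difference from prop: because memp returns the suffix, it distinguishes a nil-valued property from a missing one:

(memp :b '(:a 1 :b 2)) -> (:b 2)
(memp :c '(:a 1 :b 2)) -> nil
(memp :a '(:a nil)) -> (:a nil)  ;; present, with value nil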
(plist-to-alist plist)
(improper-plist-to-alist imp-plist bool-keys)
The functions plist-to-alist and improper-plist-to-alist convert, respectively, a property list and improper property list to an association list.
The plist-to-alist function scans plist and returns the indicator-property pairs as a list of cons cells, such that each car is the indicator, and each cdr is the value.
The improper-plist-to-alist function is similar, except that it handles the Boolean properties which, by convention, aren't followed by a value. The list of all such indicators is specified by the bool-keys argument.
(plist-to-alist '(a 1 b 2)) --> ((a . 1) (b . 2))
(improper-plist-to-alist '(:x 1 :blue :y 2) '(:blue))
--> ((:x . 1) (:blue) (:y . 2))
Note: the functions described below operate on lists. The principal sorting function in TXR Lisp is sort, described under Sequence Manipulation.
The merge function described here provides access to an elementary step of the algorithm used internally by sort when operating on lists.
The multi-sort operation sorts multiple lists in parallel. It is implemented using sort.
(merge seq1 seq2 [lessfun [keyfun]])
The merge function merges two sorted sequences seq1 and seq2 into a single sorted sequence. The semantics and defaulting behavior of the lessfun and keyfun arguments are the same as those of the sort function.
The sequence which is returned is of the same kind as seq1.
This function is destructive of any inputs that are lists. If the output is a list, it is formed out of the structure of the input lists.
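For instance (a sketch; the kind of the result follows seq1, and list inputs may be destroyed, so freshly constructed lists are used here):

(merge (list 1 3 5) (list 2 4 6)) -> (1 2 3 4 5 6)
(merge #(1 4 6) #(2 3 5)) -> #(1 2 3 4 5 6)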
(multi-sort columns less-funcs [key-funcs])
The multi-sort function regards a list of lists to be the columns of a database. The corresponding elements from each list constitute a record. These records are to be sorted, producing a new list of lists.
The columns argument supplies the list of lists which comprise the columns of the database. The lists should ideally be of the same length. If the lists are of different lengths, then the shortest list is taken to be the length of the database. Excess elements in the longer lists are ignored, and do not appear in the sorted output.
The less-funcs argument supplies a list of comparison functions which are applied to the columns. Successive functions correspond to successive columns. If less-funcs is an empty list, then the sorted database will emerge in the original order. If less-funcs contains exactly one function, then the rows of the database are sorted according to the first column. The remaining columns simply follow their row. If less-funcs contains more than one function, then additional columns are taken into consideration if the items in the previous columns compare equal. For instance if two elements from column one compare equal, then the corresponding second column elements are compared using the second column comparison function. The less-funcs argument may be a function object, in which case it is treated as if it were a one-element list containing that function object.
The optional key-funcs argument supplies transformation functions through which column entries are converted to comparison keys, similarly to the single key function used in the sort function and others. If there are more key functions than less functions, the excess key functions are ignored.
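The following sketch sorts a two-column database by the first column, with the second comparison function used only to break ties in the first. The two records whose first-column elements are both 1 are ordered by their second-column strings:

(multi-sort '((1 2 1) ("b" "c" "a")) (list less less))
-> ((1 1 2) ("a" "b" "c"))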
(make-lazy-cons function [car [cdr]])
The function make-lazy-cons makes a special kind of cons cell called a lazy cons, whose type is lcons. Lazy conses are useful for implementing lazy lists.
Lazy lists are lists which are not allocated all at once. Rather, the elements of a lazy list's structure materialize just before they are accessed.
A lazy cons has car and cdr fields like a regular cons, and those fields are initialized to the values of the car and cdr arguments of make-lazy-cons when the lazy cons is created. These arguments default to nil if omitted. A lazy cons also has an update function, which is specified by the function argument to make-lazy-cons.
The function argument must be a function that may be called with exactly one parameter.
When either the car or cdr field of a lazy cons is accessed for the first time to retrieve its value, function is automatically invoked first, and is given the lazy cons as a parameter. That function has the opportunity to store new values into the car and cdr fields. Once the function has been called, it is removed from the lazy cons: the lazy cons no longer has an update function. If the update function itself attempts to retrieve the value of the lazy cons cell's car or cdr field, it will be recursively invoked.
The functions lcons-car and lcons-cdr may be used to access the fields of a lazy cons without triggering the update function.
Storing a value into either the car or cdr field does not have the effect of invoking the update function.
If the function terminates by returning normally, the access to the value of the field then proceeds in the ordinary manner, retrieving whatever value has most recently been stored.
The return value of the function is ignored.
To perpetuate the growth of a lazy list, the function can make another call to make-lazy-cons and install the resulting cons as the cdr of the lazy cons.
;;; lazy list of integers between min and max
(defun integer-range (min max)
  (let ((counter min))
    ;; if min is greater than max, just return the empty
    ;; list; otherwise return a lazy list
    (if (> min max)
      nil
      (make-lazy-cons
        (lambda (lcons)
          ;; install next number into car
          (rplaca lcons counter)
          ;; now deal with the cdr field
          (cond
            ;; max reached: terminate the list with nil
            ((eql counter max)
             (rplacd lcons nil))
            ;; max not reached: increment counter
            ;; and extend with another lazy cons
            (t
             (inc counter)
             (rplacd lcons
               (make-lazy-cons
                 (lcons-fun lcons))))))))))
(lconsp value)
The lconsp function returns t if value is a lazy cons cell. Otherwise it returns nil, even if value is an ordinary cons cell.
(lcons-fun lazy-cons)
The lcons-fun function retrieves the update function of a lazy cons. Once a lazy cons has been accessed, it no longer has an update function and lcons-fun returns nil. While the update function of a lazy cons is executing, it is still accessible. This allows the update function to retrieve a reference to itself and propagate itself into another lazy cons (as in the example under make-lazy-cons).
(lcons-car lazy-cons)
(lcons-cdr lazy-cons)
The functions lcons-car and lcons-cdr retrieve the car and cdr fields of lazy-cons, without triggering the invocation of its associated update function.
The lazy-cons argument must be an object of type lcons. Unlike the functions car and cdr, these functions cannot be applied to any other type of object.
Note: these functions may be used by the update function to retrieve the values which were stored into lazy-cons by the make-lazy-cons constructor, without triggering recursion. The function may then overwrite either or both of these values. This allows the fields of the lazy cons to store state information necessary for the propagation of a lazy list. If that state information consists of no more than two values, then no additional context object need be allocated.
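For instance, the following sketch (the name count-from is ours, not a library function) propagates an infinite lazy list of incrementing integers, using the car field itself to carry the state, so that no separate context object is needed:

(defun count-from (n)
  (make-lazy-cons
    (lambda (lc)
      ;; the car field already holds n, installed by the
      ;; constructor below; read it with lcons-car so the
      ;; update function is not recursively triggered
      (rplacd lc (count-from (succ (lcons-car lc)))))
    n))

(take 5 (count-from 10)) -> (10 11 12 13 14)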
(lcons car-expression cdr-expression)
The lcons macro simplifies the construction of structures based on lazy conses. Syntactically, it resembles the cons function. However, the arguments are expressions rather than values. The macro generates code which, when evaluated, immediately produces a lazy cons. The expressions car-expression and cdr-expression are not immediately evaluated. Rather, when either the car or cdr field of the lazy cons cell is accessed, these expressions are both evaluated at that time, in the order that they appear in the lcons expression, and in the original lexical scope in which that expression was evaluated. The return values of these expressions are used, respectively, to initialize the corresponding fields of the lazy cons.
Note: the lcons macro may be understood in terms of the following reference implementation, as a syntactic sugar combining the make-lazy-cons constructor with a lexical closure provided by a lambda function:
(defmacro lcons (car-form cdr-form)
  (let ((lc (gensym)))
    ^(make-lazy-cons (lambda (,lc)
                       (rplaca ,lc ,car-form)
                       (rplacd ,lc ,cdr-form)))))
;; Given the following function ...
(defun fib-generator (a b)
  (lcons a (fib-generator b (+ a b))))
;; ... the following function call generates the Fibonacci
;; sequence as an infinite lazy list.
(fib-generator 1 1) -> (1 1 2 3 5 8 13 ...)
(lazy-stream-cons stream [no-throw-close-p])
(get-lines [stream [no-throw-close-p]])
The lazy-stream-cons and get-lines functions are equivalent, except that the stream argument is optional in get-lines and defaults to *stdin*. Thus, the following description of lazy-stream-cons also applies to get-lines.
The lazy-stream-cons function returns a lazy cons which generates a lazy list based on reading lines of text from the input stream stream; these lines form the elements of the list. The get-line function is called on demand to add elements to the list.
The lazy-stream-cons function itself makes the first call to get-line on the stream. If this returns nil, then the stream is closed and nil is returned. Otherwise, a lazy cons is returned whose update function will install that line into the car field of the lazy cons, and continue the lazy list by making another call to lazy-stream-cons, installing the result into the cdr field. When this lazy list obtains an end-of-file indication from the stream, it closes the stream.
lazy-stream-cons inspects the real-time property of a stream as if by the real-time-stream-p function. This determines which of two styles of lazy list are returned. For an ordinary (non-real-time) stream, the lazy list treats the end-of-file condition accurately: an empty file turns into the empty list nil, a one line file into a one-element list which contains that line and so on. This accuracy requires one line of lookahead which is not acceptable in real-time streams, and so a different type of lazy list is used, which generates an extra nil item after the last line. Under this type of lazy list, an empty input stream translates to the list (nil); a one-line stream translates to ("line" nil) and so forth.
If and when stream is closed by the function directly, or else by the returned lazy list, the no-throw-close-p Boolean argument, defaulting to nil, controls the throw-on-error-p argument of the call to the close-stream function. These arguments have opposite polarity: if no-throw-close-p is true, then throw-on-error-p shall be false, and vice versa.
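A usage sketch (the file name is hypothetical): because the list materializes on demand, only as much of the file is read as the traversal consumes, and the stream is closed by the lazy list upon reaching end-of-file:

(let ((lines (get-lines (open-file "access.log"))))
  (each ((line lines))
    (put-line (upcase-str line))))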
(delay expression)
The delay operator arranges for the delayed (or "lazy") evaluation of expression. This means that the expression is not evaluated immediately. Rather, the delay expression produces a promise object.
The promise object can later be passed to the force function (described later in this document). The force function will trigger the evaluation of the expression and retrieve the value.
The expression is evaluated in the original scope, no matter where the force takes place.
The expression is evaluated at most once, by the first call to force. Additional calls to force only retrieve a cached value.
;; list is popped only once: the value is computed
;; just once when force is called on a given promise
;; for the first time.
(defun get-it (promise)
  (format t "*list* is ~s\n" *list*)
  (format t "item is ~s\n" (force promise))
  (format t "item is ~s\n" (force promise))
  (format t "*list* is ~s\n" *list*))

(defvar *list* '(1 2 3))

(get-it (delay (pop *list*)))
Output:
*list* is (1 2 3)
item is 1
item is 1
*list* is (2 3)
(force promise)
(set (force promise) new-value)
The force function accepts a promise object produced by the delay macro. The first time force is invoked, the expression which was wrapped inside promise by the delay macro is evaluated (in its original lexical environment, regardless of where in the program the force call takes place). The value of expression is cached inside promise and returned, becoming the return value of the force function call. If the force function is invoked additional times on the same promise, the cached value is retrieved.
A force form is a syntactic place, denoting the value cache location within promise.
Storing a value in a force place causes future accesses to the promise to return that value.
If the promise had not yet been forced, then storing a value into it prevents that from ever happening. The delayed expression will never be evaluated.
If, while a promise is being forced, the evaluation of expression itself causes an assignment to the promise, it is not specified whether the promise will take on the value of expression or the assigned value.
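The following sketch illustrates storing into a force place before the promise has been forced; the delayed expression is consequently never evaluated:

(let ((p (delay (error "not reached"))))
  (set (force p) 42)
  (force p))
-> 42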
(promisep object)
The promisep function returns t if object is a promise object: an object created by the delay macro. Otherwise it returns nil.
Note: promise objects are conses. The typeof function applied to a promise returns cons.
(mlet ({sym | (sym init-form)}*) body-form*)
The mlet macro ("magic let" or "mutual let") implements a variable binding construct similar to let and let*.
Under mlet, the scope of the bindings of the sym variables extends over the init-forms, as well as the body-forms.
Unlike in the let* construct, under mlet every sym is in scope of every init-form. That is to say, an init-form can refer not only to previous variables, but also to later variables, as well as to its own variable.
The variables are not initialized until their values are accessed for the first time. Any sym whose value is not accessed is not initialized.
Furthermore, the evaluation of each init-form does not take place until the time when its value is needed to initialize the associated sym. This evaluation takes place once. If a given sym is not accessed during the evaluation of the mlet construct, then its init-form is never evaluated.
The bound variables may be assigned. If, before initialization, a variable is updated in such a way that its prior value is not needed, it is unspecified whether initialization takes place, and thus whether its init-form is evaluated.
Direct circular references are erroneous and are diagnosed. This takes place when the macro-expanded form is evaluated, not during the expansion of mlet.
;; Dependent calculations in arbitrary order
(mlet ((x (+ y 3))
       (z (+ x 1))
       (y 4))
  (+ z 4)) --> 12

;; Error: circular reference:
;; x depends on y, y on z, but z on x again.
(mlet ((x (+ y 1))
       (y (+ z 1))
       (z (+ x 1)))
  z)

;; Okay: lazy circular reference, because lcons is used
(mlet ((list (lcons 1 list)))
  list) --> (1 1 1 1 1 ...) ;; circular list
In the last example, the list variable is accessed for the first time in the body of the mlet form. This causes the evaluation of the lcons form. This form evaluates its arguments lazily, which means that it is not a problem that list is not yet initialized. The form produces a lazy cons, which is then used to initialize list. When the car or cdr field of the lazy cons is accessed, the list expression in the lcons argument is evaluated. By that time, the variable is initialized and holds the lazy cons itself, which thereby becomes its own cdr, yielding a circular, infinite list.