diff options
Diffstat (limited to 'man2html/scripts/cgi-aux/man/mansearchhelp.aux')
-rw-r--r-- | man2html/scripts/cgi-aux/man/mansearchhelp.aux | 295 |
1 files changed, 295 insertions, 0 deletions
diff --git a/man2html/scripts/cgi-aux/man/mansearchhelp.aux b/man2html/scripts/cgi-aux/man/mansearchhelp.aux new file mode 100644 index 0000000..200b509 --- /dev/null +++ b/man2html/scripts/cgi-aux/man/mansearchhelp.aux @@ -0,0 +1,295 @@ +Content-type: text/html + +<HTML> +<HEAD> +<TITLE>Manual Pages - Search Help</TITLE> +<!-- Changed by: Michael Hamilton, 6-Apr-1996 --> +<!-- Note: this is not html, but requires preprocessing --> +</HEAD> +<BODY> +<H1>Manual Pages - Search Help</H1> + +<A HREF="%cg/mansearch">Perform another search</A> +<BR> +<A HREF="%cg/man2html">Return to Main Contents</A> +<P> +<HR> +<P> +The full text index uses the <I>Glimpse</I> text indexing system. +The +<A HREF="%cg/man2html?1+glimpse">glimpse(1)</A> +manual page documents glimpse in full. This summary documents those +features of glimpse that are valid when searching through the manual pages. +<P> +<HR> + +<H2>Search Options</H2> + +The following search options must be at the start of the search string. + +<DL COMPACT> + +<DT><B>-</B> <I>#</I> +<DD> +<I>#</I> is an integer between 1 and 8 +specifying the maximum number of errors +permitted in finding the approximate matches (the default is zero). +Generally, each insertion, deletion, or substitution counts as one error. +Since the index stores only lower case characters, errors of +substituting upper case with lower case may be missed. + +<DT><B>-B</B> +<DD> +Best match mode. (Warning: -B sometimes misses matches. It is safer +to specify the number of errors explicitly.) +When -B is specified and no exact matches are found, glimpse +will continue to search until the closest matches (i.e., the ones +with minimum number of errors) +are found. +In general, -B may be slower than -#, but not by very much. +Since the index stores only lower case characters, errors of +substituting upper case with lower case may be missed. + +<DT><B>-L <I>x</I> | <I>x</I>:<I>y</I> | <I>x</I>:<I>y</I>:<I>z</I></B> +<DD> +A non-zero value of <I>x</I> limits the number of matches +that will be shown. +A non-zero value of <I>y</I> limits the number of man pages +that will be shown. +A non-zero valye of <I>z</I> will only show pages that have +less that z matches. +For example, -L 0:10 will output all matches for the first 10 files that +contain a match. + +<DT><B>-F</B> <I>pattern</I> +<DD> +The -F option provides a pattern that restricts the search results to +those filenames that match the pattern. +or example, <I>-F 8</I> effectively restricts matches to section 8. + +<DT><B>-w</B> +<DD> +Search for the pattern as a word - i.e., surrounded by non-alphanumeric +characters. For example, +<I>-w -1 car</I> will match cars, but not characters and not +car10. +The non-alphanumeric <I>must</I> +surround the match; they cannot be counted as errors. +This option does not work with regular expressions. + +<DT><B>-W</B> +<DD> +The default for Boolean AND queries is that they cover one record +(the default for a record is one line) at a time. +For example, glimpse 'good;bad' will output all lines containing +both 'good' and 'bad'. +The -W option changes the scope of Booleans to be the whole file. +Within a file glimpse will output all matches to any of the patterns. +So, glimpse -W 'good;bad' will output all lines containing 'good' +<I>or</I> 'bad', but only in files that contain both patterns. + +<DT><B>-k</B> +<DD> +No symbol in the pattern is treated as a meta character. +For example, <I>-k a(b|c)*d</I> will find +the occurrences of a(b|c)*d whereas <I>a(b|c)*d</I> +will find substrings that match the regular expression 'a(b|c)*d'. +(The only exception is ^ at the beginning of the pattern and $ at the +end of the pattern, which are still interpreted in the usual way. +Use \^ or \$ if you need them verbatim.) + +</DL> + +<P> +<HR> + +<H2>Patterns</H2> + +<I>Glimpse</I> +supports a large variety of patterns, including simple +strings, strings with classes of characters, sets of strings, +wild cards, and regular expressions (see <A HREF="#limitations">LIMITATIONS</A>). + +<DL COMPACT> +<DT><B>Strings </B><DD> +Strings are any sequence of characters, including the special symbols +`^' for beginning of line and `$' for end of line. +The following special characters ( +`<B>$</B>', + +`^<B>',</B> + +`<B>*</B>', + +`<B>[</B>'<B>,</B> + +`<B>^</B>', + +`<B>|</B>', + +`<B>(</B>', + +`<B>)</B>', + +`<B>!</B>', + +and +`<B>\</B>' + +) +as well as the following meta characters special to glimpse (and agrep): +`<B>;</B>', + +`<B>,</B>', + +`<B>#</B>', + +`<B><</B>', + +`<B>></B>', + +`<B>-</B>', + +and +`<B>.</B>', + +should be preceded by `\' if they are to be matched as regular +characters. For example, \^abc\\ corresponds to the string ^abc\, +whereas ^abc corresponds to the string abc at the beginning of a +line. +<DT><B>Classes of characters</B><DD> +a list of characters inside [] (in order) corresponds to any character +from the list. For example, [a-ho-z] is any character between a and h +or between o and z. The symbol `^' inside [] complements the list. +For example, [^i-n] denote any character in the character set except +character 'i' to 'n'. +The symbol `^' thus has two meanings, but this is consistent with +egrep. +The symbol `.' (don't care) stands for any symbol (except for the +newline symbol). +<DT><B>Boolean operations</B><DD> +<B>Glimpse </B> + +supports an `AND' operation denoted by the symbol `;' +an `OR' operation denoted by the symbol `,', +or any combination. +For example, +<I>glimpse 'pizza;cheeseburger'</I> will output all lines containing +both patterns. +<I>glimpse -F 'gnu;\.c$' 'define;DEFAULT'</I> +will output all lines containing both 'define' and 'DEFAULT' +(anywhere in the line, not necessarily in order) in +files whose name contains 'gnu' and ends with .c. +<I>glimpse '{political,computer};science'</I> will match 'political science' +or 'science of computers'. +<DT><B>Wild cards</B><DD> +The symbol '#' is used to denote a sequence +of any number (including 0) +of arbitrary characters (see <A HREF="#limitations">LIMITATIONS</A>). +The symbol # is equivalent to .* in egrep. +In fact, .* will work too, because it is a valid regular expression +(see below), but unless this is part of an actual regular expression, +# will work faster. +(Currently glimpse is experiencing some problems with #.) +<DT><B>Combination of exact and approximate matching</B><DD> +Any pattern inside angle brackets <> must match the text exactly even +if the match is with errors. For example, <mathemat>ics matches +mathematical with one error (replacing the last s with an a), but +mathe<matics> does not match mathematical no matter how many errors are +allowed. +(This option is buggy at the moment.) +<DT><B>Regular expressions</B><DD> +Since the index is word based, a regular expression must match +words that appear in the index for glimpse to find it. +Glimpse first strips the regular expression from all non-alphabetic +characters, and searches the index for all remaining words. +It then applies the regular expression matching algorithm to the +files found in the index. +For example, <I>glimpse</I> 'abc.*xyz' will search the index +for all files that contain both 'abc' and 'xyz', and then +search directly for 'abc.*xyz' in those files. +(If you use glimpse -w 'abc.*xyz', then 'abcxyz' will not be found, +because glimpse +will think that abc and xyz need to be matches to whole words.) +The syntax of regular expressions in <B>glimpse</B> is in general the same as +that for <B>agrep</B>. The union operation `|', Kleene closure `*', +and parentheses () are all supported. +Currently '+' is not supported. +Regular expressions are currently limited to approximately 30 +characters (generally excluding meta characters). Some options +(-d, -w, -t, -x, -D, -I, -S) do not +currently work with regular expressions. +The maximal number of errors for regular expressions that use '*' +or '|' is 4. (See <A HREF="#limitations">LIMITATIONS</A>.) + +</DL> + +<HR> + +<H2><A NAME="limitations">Limitations</A></H2> + +The index of glimpse is word based. A pattern that contains more than +one word cannot be found in the index. The way glimpse overcomes this +weakness is by splitting any multi-word pattern into its set of words +and looking for all of them in the index. +For example, <B>glimpse 'linear programming'</B> will first consult the index +to find all files containing both <I>linear</I> and <I>programming</I>, +and then apply agrep to find the combined pattern. +This is usually an effective solution, but it can be slow for +cases where both words are very common, but their combination is not. +<P> + +As was mentioned in the section on PATTERNS above, some characters +serve as meta characters for glimpse and need to be +preceded by '\' to search for them. The most common +examples are the characters '.' (which stands for a wild card), +and '*' (the Kleene closure). +So, "glimpse ab.de" will match abcde, but "glimpse ab\.de" +will not, and "glimpse ab*de" will not match ab*de, but +"glimpse ab\*de" will. +The meta character - is translated automatically to a hyphen +unless it appears between [] (in which case it denotes a range of +characters). +<P> + +The index of glimpse stores all patterns in lower case. +When glimpse searches the index it first converts +all patterns to lower case, finds the appropriate files, +and then searches the actual files using the original +patterns. +So, for example, <I>glimpse ABCXYZ</I> will first find all +files containing abcxyz in any combination of lower and upper +cases, and then searches these files directly, so only the +right cases will be found. +One problem with this approach is discovering misspellings +that are caused by wrong cases. +For example, <I>glimpse -B abcXYZ</I> will first search the +index for the best match to abcxyz (because the pattern is +converted to lower case); it will find that there are matches +with no errors, and will go to those files to search them +directly, this time with the original upper cases. +If the closest match is, say AbcXYZ, glimpse may miss it, +because it doesn't expect an error. +Another problem is speed. If you search for "ATT", it will look +at the index for "att". Unless you use -w to match the whole word, +glimpse may have to search all files containing, for example, "Seattle" +which has "att" in it. +<P> + +There is no size limit for simple patterns and simple patterns +within Boolean expressions. +More complicated patterns, such as regular expressions, +are currently limited to approximately 30 characters. +Lines are limited to 1024 characters. +Records are limited to 48K, and may be truncated if they are larger +than that. +The limit of record length can be +changed by modifying the parameter Max_record in agrep.h. +<P> + +Glimpseindex does not index words of size > 64. +<A NAME="lbAQ"> </A> + +<HR> +</BODY> +</HTML> |