Diffstat (limited to 'gawk-info-1')
-rw-r--r-- gawk-info-1 1231
1 files changed, 1231 insertions, 0 deletions
diff --git a/gawk-info-1 b/gawk-info-1
new file mode 100644
index 00000000..b40278a4
--- /dev/null
+++ b/gawk-info-1
@@ -0,0 +1,1231 @@
+Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
+file gawk.texinfo.
+
+This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them.
+
+Copyright (C) 1989 Free Software Foundation, Inc.
+
+Permission is granted to make and distribute verbatim copies of this
+manual provided the copyright notice and this permission notice are
+preserved on all copies.
+
+Permission is granted to copy and distribute modified versions of
+this manual under the conditions for verbatim copying, provided that
+the entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+
+Permission is granted to copy and distribute translations of this
+manual into another language, under the above conditions for modified
+versions, except that this permission notice may be stated in a
+translation approved by the Foundation.
+
+
+
+File: gawk-info, Node: Top, Next: Preface, Prev: (dir), Up: (dir)
+
+This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them; it
+contains the following chapters:
+
+* Menu:
+
+* Preface:: What you can do with `awk'; brief history
+ and acknowledgements.
+
+* License:: Your right to copy and distribute `gawk'.
+
+* This Manual:: Using this manual.
+
+ Includes sample input files that you can use.
+
+* Getting Started:: A basic introduction to using `awk'.
+ How to run an `awk' program. Command line syntax.
+
+* Reading Files:: How to read files and manipulate fields.
+
+* Printing:: How to print using `awk'. Describes the
+ `print' and `printf' statements.
+ Also describes redirection of output.
+
+* One-liners:: Short, sample `awk' programs.
+
+* Patterns:: The various types of patterns explained in detail.
+
+* Actions:: The various types of actions are introduced here.
+ Describes expressions and the various operators in
+ detail. Also describes comparison expressions.
+
+* Statements:: The various control statements are described in
+ detail.
+
+* Arrays:: The description and use of arrays. Also includes
+ array-oriented control statements.
+
+* User-defined:: User-defined functions are described in detail.
+
+* Built-in:: The built-in functions are summarized here.
+
+* Special:: The special variables are summarized here.
+
+* Sample Program:: A sample `awk' program with a complete explanation.
+
+* Notes:: Something about the implementation of `gawk'.
+
+* Glossary:: An explanation of some unfamiliar terms.
+
+* Index::
+
+
+
+File: gawk-info, Node: Preface, Next: License, Prev: Top, Up: Top
+
+Preface
+*******
+
+If you are like many computer users, you frequently would like to
+make changes in various text files wherever certain patterns appear,
+or extract data from parts of certain lines while discarding the
+rest. Writing a program to do this in a language such as C or
+Pascal is a time-consuming inconvenience that may take many lines of
+code. The job may be easier with `awk'.
+
+The `awk' utility interprets a special-purpose programming language
+that makes it possible to handle simple data-reformatting jobs
+easily with just a few lines of code.
+
+The GNU implementation of `awk' is called `gawk'; it is fully upward
+compatible with the System V Release 3.1 and later versions of `awk'.
+All properly written `awk' programs should work with `gawk'. So we
+usually don't distinguish between `gawk' and other `awk'
+implementations in this manual.
+
+This manual teaches you what `awk' does and how you can use `awk'
+effectively. You should already be familiar with basic,
+general-purpose operating system commands such as `ls'. Using
+`awk' you can:
+
+ * manage small, personal databases,
+
+ * generate reports,
+
+ * validate data,
+
+ * produce indexes, and perform other document preparation tasks,
+
+ * even experiment with algorithms that can be adapted later to
+ other computer languages!
+
+* Menu:
+
+* History:: The history of gawk and awk. Acknowledgements.
+
+
+
+File: gawk-info, Node: History, Up: Preface
+
+History of `awk' and `gawk'
+===========================
+
+The name `awk' comes from the initials of its designers: Alfred V.
+Aho, Peter J. Weinberger, and Brian W. Kernighan. The original
+version of `awk' was written in 1977. In 1985 a new version made the
+programming language more powerful, introducing user-defined
+functions, multiple input streams, and computed regular expressions.
+
+The GNU implementation, `gawk', was written in 1986 by Paul Rubin and
+Jay Fenlason, with advice from Richard Stallman. John Woods
+contributed parts of the code as well. In 1988, David Trueman, with
+help from Arnold Robbins, reworked `gawk' for compatibility with the
+newer `awk'.
+
+Many people need to be thanked for their assistance in producing this
+manual. Jay Fenlason contributed many ideas and sample programs.
+Richard Mlynarik and Robert Chassell gave helpful comments on drafts
+of this manual. The paper ``A Supplemental Document for `awk''' by
+John W. Pierce of the Chemistry Department at UC San Diego,
+pinpointed several issues relevant both to `awk' implementation and
+to this manual, that would otherwise have escaped us.
+
+Finally, we would like to thank Brian Kernighan of Bell Labs for
+invaluable assistance during the testing and debugging of `gawk', and
+for help in clarifying several points about the language.
+
+
+
+File: gawk-info, Node: License, Next: This Manual, Prev: Preface, Up: Top
+
+GNU GENERAL PUBLIC LICENSE
+**************************
+
+ Version 1, February 1989
+
+ Copyright (C) 1989 Free Software Foundation, Inc.
+ 675 Mass Ave, Cambridge, MA 02139, USA
+
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+=========
+
+ The license agreements of most software companies try to keep users
+at the mercy of those companies. By contrast, our General Public
+License is intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+The General Public License applies to the Free Software Foundation's
+software and to any other program whose authors commit to using it.
+You can use it for your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Specifically, the General Public License is designed to make
+sure that you have the freedom to give away or sell copies of free
+software, that you receive source code or can get it if you want it,
+that you can change the software or use pieces of it in new free
+programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if
+you distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of a such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must tell them their rights.
+
+ We protect your rights with two steps: (1) copyright the software,
+and (2) offer you this license which gives you legal permission to
+copy, distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on,
+we want its recipients to know that what they have is not the
+original, so that any problems introduced by others will not reflect
+on the original authors' reputations.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 1. This License Agreement applies to any program or other work
+ which contains a notice placed by the copyright holder saying it
+ may be distributed under the terms of this General Public
+ License. The ``Program'', below, refers to any such program or
+ work, and a ``work based on the Program'' means either the
+ Program or any work containing the Program or a portion of it,
+ either verbatim or with modifications. Each licensee is
+ addressed as ``you''.
+
+ 2. You may copy and distribute verbatim copies of the Program's
+ source code as you receive it, in any medium, provided that you
+ conspicuously and appropriately publish on each copy an
+ appropriate copyright notice and disclaimer of warranty; keep
+ intact all the notices that refer to this General Public License
+ and to the absence of any warranty; and give any other
+ recipients of the Program a copy of this General Public License
+ along with the Program. You may charge a fee for the physical
+ act of transferring a copy.
+
+ 3. You may modify your copy or copies of the Program or any portion
+ of it, and copy and distribute such modifications under the
+ terms of Paragraph 1 above, provided that you also do the
+ following:
+
+ * cause the modified files to carry prominent notices stating
+ that you changed the files and the date of any change; and
+
+ * cause the whole of any work that you distribute or publish,
+ that in whole or in part contains the Program or any part
+ thereof, either with or without modifications, to be
+ licensed at no charge to all third parties under the terms
+ of this General Public License (except that you may choose
+ to grant warranty protection to some or all third parties,
+ at your option).
+
+ * If the modified program normally reads commands
+ interactively when run, you must cause it, when started
+ running for such interactive use in the simplest and most
+ usual way, to print or display an announcement including an
+ appropriate copyright notice and a notice that there is no
+ warranty (or else, saying that you provide a warranty) and
+ that users may redistribute the program under these
+ conditions, and telling the user how to view a copy of this
+ General Public License.
+
+ * You may charge a fee for the physical act of transferring a
+ copy, and you may at your option offer warranty protection
+ in exchange for a fee.
+
+ Mere aggregation of another independent work with the Program
+ (or its derivative) on a volume of a storage or distribution
+ medium does not bring the other work under the scope of these
+ terms.
+
+ 4. You may copy and distribute the Program (or a portion or
+ derivative of it, under Paragraph 2) in object code or
+ executable form under the terms of Paragraphs 1 and 2 above
+ provided that you also do one of the following:
+
+ * accompany it with the complete corresponding
+ machine-readable source code, which must be distributed
+ under the terms of Paragraphs 1 and 2 above; or,
+
+ * accompany it with a written offer, valid for at least three
+ years, to give any third party free (except for a nominal
+ charge for the cost of distribution) a complete
+ machine-readable copy of the corresponding source code, to
+ be distributed under the terms of Paragraphs 1 and 2 above;
+ or,
+
+ * accompany it with the information you received as to where
+ the corresponding source code may be obtained. (This
+ alternative is allowed only for noncommercial distribution
+ and only if you received the program in object code or
+ executable form alone.)
+
+ Source code for a work means the preferred form of the work for
+ making modifications to it. For an executable file, complete
+ source code means all the source code for all modules it
+ contains; but, as a special exception, it need not include
+ source code for modules which are standard libraries that
+ accompany the operating system on which the executable file
+ runs, or for standard header files or definitions files that
+ accompany that operating system.
+
+ 5. You may not copy, modify, sublicense, distribute or transfer the
+ Program except as expressly provided under this General Public
+ License. Any attempt otherwise to copy, modify, sublicense,
+ distribute or transfer the Program is void, and will
+ automatically terminate your rights to use the Program under
+ this License. However, parties who have received copies, or
+ rights to use copies, from you under this General Public License
+ will not have their licenses terminated so long as such parties
+ remain in full compliance.
+
+ 6. By copying, distributing or modifying the Program (or any work
+ based on the Program) you indicate your acceptance of this
+ license to do so, and all its terms and conditions.
+
+ 7. Each time you redistribute the Program (or any work based on the
+ Program), the recipient automatically receives a license from
+ the original licensor to copy, distribute or modify the Program
+ subject to these terms and conditions. You may not impose any
+ further restrictions on the recipients' exercise of the rights
+ granted herein.
+
+ 8. The Free Software Foundation may publish revised and/or new
+ versions of the General Public License from time to time. Such
+ new versions will be similar in spirit to the present version,
+ but may differ in detail to address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+ Program specifies a version number of the license which applies
+ to it and ``any later version'', you have the option of
+ following the terms and conditions either of that version or of
+ any later version published by the Free Software Foundation. If
+ the Program does not specify a version number of the license,
+ you may choose any version ever published by the Free Software
+ Foundation.
+
+ 9. If you wish to incorporate parts of the Program into other free
+ programs whose distribution conditions are different, write to
+ the author to ask for permission. For software which is
+ copyrighted by the Free Software Foundation, write to the Free
+ Software Foundation; we sometimes make exceptions for this. Our
+ decision will be guided by the two goals of preserving the free
+ status of all derivatives of our free software and of promoting
+ the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
+ WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
+ LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ``AS IS''
+ WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
+ INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
+ ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
+ WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE
+ COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+ WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
+ MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
+ LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
+ INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
+ INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS
+ OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+ YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH
+ ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+Appendix: How to Apply These Terms to Your New Programs
+=======================================================
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to humanity, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these
+terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the ``copyright'' line and a pointer to where the full notice is found.
+
+ ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
+ Copyright (C) 19YY NAME OF AUTHOR
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 1, or (at your option)
+ any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+ Also add information on how to contact you by electronic and paper
+mail.
+
+If the program is interactive, make it output a short notice like
+this when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+ The hypothetical commands `show w' and `show c' should show the
+appropriate parts of the General Public License. Of course, the
+commands you use may be called something other than `show w' and
+`show c'; they could even be mouse-clicks or menu items--whatever
+suits your program.
+
+You should also get your employer (if you work as a programmer) or
+your school, if any, to sign a ``copyright disclaimer'' for the
+program, if necessary. Here a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the
+ program `Gnomovision' (a program to direct compilers to make passes
+ at assemblers) written by James Hacker.
+
+ SIGNATURE OF TY COON, 1 April 1989
+ Ty Coon, President of Vice
+
+That's all there is to it!
+
+
+
+File: gawk-info, Node: This Manual, Next: Getting Started, Prev: License, Up: Top
+
+Using This Manual
+*****************
+
+The term `gawk' refers to a program (a version of `awk') developed by
+the Free Software Foundation, and to the language you use to tell it
+what to do. When we need to be careful, we call the program ``the
+`awk' utility'' and the language ``the `awk' language''. The purpose
+of this manual is to explain the `awk' language and how to run the
+`awk' utility.
+
+The term "`awk' program" refers to a program written by you in the
+`awk' programming language.
+
+*Note Getting Started::, for the bare essentials you need to know to
+start using `awk'.
+
+Useful ``one-liners'' are included to give you a feel for the `awk'
+language (*note One-liners::.).
+
+A sizable sample `awk' program has been provided for you (*note
+Sample Program::.).
+
+If you find terms that you aren't familiar with, try looking them up
+in the glossary (*note Glossary::.).
+
+Most of the time complete `awk' programs are used as examples, but in
+some of the more advanced sections, only the part of the `awk'
+program that illustrates the concept being described is shown.
+
+* Menu:
+
+This chapter contains the following sections:
+
+* The Files:: Sample data files for use in the `awk' programs
+ illustrated in this manual.
+
+
+
+File: gawk-info, Node: The Files, Up: This Manual
+
+Input Files for the Examples
+============================
+
+This manual contains many sample programs. The data for many of
+those programs comes from two files. The first file, called
+`BBS-list', represents a list of computer bulletin board systems and
+information about those systems.
+
+Each line of this file is one "record". Each record contains the
+name of a computer bulletin board, its phone number, the board's baud
+rate, and a code for the number of hours it is operational. An `A'
+in the last column means the board operates 24 hours all week. A `B'
+in the last column means the board operates evening and weekend
+hours, only. A `C' means the board operates only on weekends.
+
+ aardvark     555-5553     1200/300          B
+ alpo-net     555-3412     2400/1200/300     A
+ barfly       555-7685     1200/300          A
+ bites        555-1675     2400/1200/300     A
+ camelot      555-0542     300               C
+ core         555-2912     1200/300          C
+ fooey        555-1234     2400/1200/300     B
+ foot         555-6699     1200/300          B
+ macfoo       555-6480     1200/300          A
+ sdace        555-3430     2400/1200/300     A
+ sabafoo      555-2127     1200/300          C
+
+The second data file, called `inventory-shipped', represents
+information about shipments during the year. Each line of this file
+is also one record. Each record contains the month of the year, the
+number of green crates shipped, the number of red boxes shipped, the
+number of orange bags shipped, and the number of blue packages
+shipped, respectively.
+
+ Jan  13  25  15  115
+ Feb  15  32  24  226
+ Mar  15  24  34  228
+ Apr  31  52  63  420
+ May  16  34  29  208
+ Jun  31  42  75  492
+ Jul  24  34  67  436
+ Aug  15  34  47  316
+ Sep  13  55  37  277
+ Oct  29  54  68  525
+ Nov  20  87  82  577
+ Dec  17  35  61  401
+
+ Jan  21  36  64  620
+ Feb  26  58  80  652
+ Mar  24  75  70  495
+ Apr  21  70  74  514
+
+If you are reading this in GNU Emacs using Info, you can copy the
+regions of text showing these sample files into your own test files.
+This way you can try out the examples shown in the remainder of this
+document. You do this by using the command `M-x write-region' to
+copy text from the Info file into a file for use with `awk' (see your
+``GNU Emacs Manual'' for more information). Using this information,
+create your own `BBS-list' and `inventory-shipped' files, and
+practice what you learn in this manual.
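+
+If you are not reading this in Emacs, a shell here-document is one
+other way to create the two files. The following is only a sketch:
+it assumes a Bourne-compatible shell and the `mktemp' utility, and
+the scratch directory and the variable names are illustrative, not
+part of `awk'.
+

```shell
# Use a scratch directory so that no existing files are overwritten.
workdir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from expanding anything in the data.
cat > "$workdir/BBS-list" <<'EOF'
aardvark     555-5553     1200/300          B
alpo-net     555-3412     2400/1200/300     A
barfly       555-7685     1200/300          A
bites        555-1675     2400/1200/300     A
camelot      555-0542     300               C
core         555-2912     1200/300          C
fooey        555-1234     2400/1200/300     B
foot         555-6699     1200/300          B
macfoo       555-6480     1200/300          A
sdace        555-3430     2400/1200/300     A
sabafoo      555-2127     1200/300          C
EOF

cat > "$workdir/inventory-shipped" <<'EOF'
Jan  13  25  15  115
Feb  15  32  24  226
Mar  15  24  34  228
Apr  31  52  63  420
May  16  34  29  208
Jun  31  42  75  492
Jul  24  34  67  436
Aug  15  34  47  316
Sep  13  55  37  277
Oct  29  54  68  525
Nov  20  87  82  577
Dec  17  35  61  401

Jan  21  36  64  620
Feb  26  58  80  652
Mar  24  75  70  495
Apr  21  70  74  514
EOF

# Count the lines written, as a quick check that nothing was lost.
bbs_lines=$(wc -l < "$workdir/BBS-list" | tr -d ' ')
inv_lines=$(wc -l < "$workdir/inventory-shipped" | tr -d ' ')
echo "BBS-list: $bbs_lines lines; inventory-shipped: $inv_lines lines"
```

+
+The blank line in `inventory-shipped' is part of the sample data, so
+it is kept in the here-document as well.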
+
+
+
+File: gawk-info, Node: Getting Started, Next: Reading Files, Prev: This Manual, Up: Top
+
+Getting Started With `awk'
+**************************
+
+The basic function of `awk' is to search files for lines (or other
+units of text) that contain certain patterns. When a line matching
+any of those patterns is found, `awk' performs specified actions on
+that line. Then `awk' keeps processing input lines until the end of
+the file is reached.
+
+An `awk' "program" or "script" consists of a series of "rules".
+(It may also contain "function definitions", but that is an
+advanced feature, so let's ignore it for now. *Note User-defined::.)
+
+A rule contains a "pattern", an "action", or both. Actions are
+enclosed in curly braces to distinguish them from patterns.
+Therefore, an `awk' program is a sequence of rules in the form:
+
+ PATTERN { ACTION }
+ PATTERN { ACTION }
+ ...
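+
+As a rough illustration of the three possible shapes of a rule, the
+sketch below runs a three-rule program on three made-up input lines.
+The input words and the labels `both:' and `seen:' are invented for
+this example only.
+

```shell
# Rule 1 has a pattern and an action.
# Rule 2 has only a pattern, so each matching line is printed as is.
# Rule 3 has only an action, so it runs for every input line.
# The brace of rule 3 starts on its own line; had it followed /ch/ on
# the same line, it would have become the action of rule 2 instead.
out=$(printf 'apple\nbanana\ncherry\n' | awk '
/an/ { print "both:", $0 }
/ch/
     { print "seen:", $0 }
')
echo "$out"
```

+
+Only `banana' matches the first rule and only `cherry' matches the
+second, while the third rule reports all three lines.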
+
+* Menu:
+
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example with two rules.
+* More Complex:: A more complex example.
+* Running gawk:: How to run gawk programs; includes command line syntax.
+* Comments:: Adding documentation to gawk programs.
+* Statements/Lines:: Subdividing or combining statements into lines.
+
+* When:: When to use gawk and when to use other things.
+
+
+
+File: gawk-info, Node: Very Simple, Next: Two Rules, Up: Getting Started
+
+A Very Simple Example
+=====================
+
+The following command runs a simple `awk' program that searches the
+input file `BBS-list' for the string of characters: `foo'. (A string
+of characters is usually called, quite simply, a "string".)
+
+ awk '/foo/ { print $0 }' BBS-list
+
+When lines containing `foo' are found, they are printed, because
+`print $0' means print the current line. (Just `print' by itself
+also means the same thing, so we could have written that instead.)
+
+You will notice that slashes, `/', surround the string `foo' in the
+actual `awk' program. The slashes indicate that `foo' is a pattern
+to search for. This type of pattern is called a "regular
+expression", and is covered in more detail later (*note Regexp::.).
+There are single quotes around the `awk' program so that the shell
+won't interpret any of it as special shell characters.
+
+Here is what this program prints:
+
+ fooey        555-1234     2400/1200/300     B
+ foot         555-6699     1200/300          B
+ macfoo       555-6480     1200/300          A
+ sabafoo      555-2127     1200/300          C
+
+In an `awk' rule, either the pattern or the action can be omitted,
+but not both.
+
+If the pattern is omitted, then the action is performed for *every*
+input line.
+
+If the action is omitted, the default action is to print all lines
+that match the pattern. We could leave out the action (the print
+statement and the curly braces) in the above example, and the result
+would be the same: all lines matching the pattern `foo' would be
+printed. (By comparison, omitting the print statement but retaining
+the curly braces makes an empty action that does nothing; then no
+lines would be printed.)
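+
+The cases just described can be compared side by side. This is a
+small sketch on three made-up input lines; the shell variable names
+are chosen here for illustration.
+

```shell
input=$(printf 'fooey\nbarfly\nmacfoo\n')

# Pattern and action: print the lines that contain foo.
with_action=$(printf '%s\n' "$input" | awk '/foo/ { print $0 }')

# Action omitted: the default action prints the matching lines.
no_action=$(printf '%s\n' "$input" | awk '/foo/')

# Empty action: the braces are present, so nothing is printed.
empty_action=$(printf '%s\n' "$input" | awk '/foo/ { }')

# Pattern omitted: the action runs for every input line.
no_pattern=$(printf '%s\n' "$input" | awk '{ print $0 }')

echo "with action:  $(echo $with_action)"
echo "no action:    $(echo $no_action)"
echo "empty action: $(echo $empty_action)"
echo "no pattern:   $(echo $no_pattern)"
```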
+
+
+
+File: gawk-info, Node: Two Rules, Next: More Complex, Prev: Very Simple, Up: Getting Started
+
+An Example with Two Rules
+=========================
+
+The `awk' utility reads the input files one line at a time. For each
+line, `awk' tries the patterns of all the rules. If several patterns
+match then several actions are run, in the order in which they appear
+in the `awk' program. If no patterns match, then no actions are run.
+
+After processing all the rules (perhaps none) that match the line,
+`awk' reads the next line (however, *note Next::.). This continues
+until the end of the file is reached.
+
+For example, the `awk' program:
+
+ /12/ { print $0 }
+ /21/ { print $0 }
+
+contains two rules. The first rule has the string `12' as the
+pattern and `print $0' as the action. The second rule has the string
+`21' as the pattern and also has `print $0' as the action. Each
+rule's action is enclosed in its own pair of braces.
+
+This `awk' program prints every line that contains the string `12'
+*or* the string `21'. If a line contains both strings, it is printed
+twice, once by each rule.
+
+If we run this program on our two sample data files, `BBS-list' and
+`inventory-shipped', as shown here:
+
+ awk '/12/ { print $0 }
+ /21/ { print $0 }' BBS-list inventory-shipped
+
+we get the following output:
+
+ aardvark     555-5553     1200/300          B
+ alpo-net     555-3412     2400/1200/300     A
+ barfly       555-7685     1200/300          A
+ bites        555-1675     2400/1200/300     A
+ core         555-2912     1200/300          C
+ fooey        555-1234     2400/1200/300     B
+ foot         555-6699     1200/300          B
+ macfoo       555-6480     1200/300          A
+ sdace        555-3430     2400/1200/300     A
+ sabafoo      555-2127     1200/300          C
+ sabafoo      555-2127     1200/300          C
+ Jan  21  36  64  620
+ Apr  21  70  74  514
+
+Note how the line in `BBS-list' beginning with `sabafoo' was printed
+twice, once for each rule.
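+
+To see the double printing in isolation, the same two rules can be
+run on single lines. The sketch below is illustrative only; the
+shell variables `prog', `both' and `one' are names made up here.
+

```shell
# The two-rule program from above, kept in a shell variable.
# Single quotes stop the shell from expanding $0.
prog='/12/ { print $0 }
/21/ { print $0 }'

# This line contains both 12 and 21, so both rules print it.
both=$(printf '%s\n' 'sabafoo 555-2127 1200/300 C' | awk "$prog" | wc -l | tr -d ' ')

# This line contains 12 but not 21, so only the first rule prints it.
one=$(printf '%s\n' 'foot 555-6699 1200/300 B' | awk "$prog" | wc -l | tr -d ' ')

echo "sabafoo line printed $both times; foot line printed $one time"
```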
+
+
+
+File: gawk-info, Node: More Complex, Next: Running gawk, Prev: Two Rules, Up: Getting Started
+
+A More Complex Example
+======================
+
+Here is an example to give you an idea of what typical `awk' programs
+do. This example shows how `awk' can be used to summarize, select,
+and rearrange the output of another utility. It uses features that
+haven't been covered yet, so don't worry if you don't understand all
+the details.
+
+ ls -l | awk '$5 == "Nov" { sum += $4 }
+ END { print sum }'
+
+This command prints the total number of bytes in all the files in the
+current directory that were last modified in November (of any year).
+(In the C shell you would need to type a semicolon and then a
+backslash at the end of the first line; in the Bourne shell you can
+type the example as shown.)
+
+The `ls -l' part of this example is a command that gives you a full
+listing of all the files in a directory, including file size and date.
+Its output looks like this:
+
+ -rw-r--r--  1 close     1933 Nov  7 13:05 Makefile
+ -rw-r--r--  1 close    10809 Nov  7 13:03 gawk.h
+ -rw-r--r--  1 close      983 Apr 13 12:14 gawk.tab.h
+ -rw-r--r--  1 close    31869 Jun 15 12:20 gawk.y
+ -rw-r--r--  1 close    22414 Nov  7 13:03 gawk1.c
+ -rw-r--r--  1 close    37455 Nov  7 13:03 gawk2.c
+ -rw-r--r--  1 close    27511 Dec  9 13:07 gawk3.c
+ -rw-r--r--  1 close     7989 Nov  7 13:03 gawk4.c
+
+The first field contains read-write permissions, the second field
+contains the number of links to the file, and the third field
+identifies the owner of the file. The fourth field contains the size
+of the file in bytes. The fifth, sixth, and seventh fields contain
+the month, day, and time, respectively, that the file was last
+modified. Finally, the eighth field contains the name of the file.
+
+The `$5 == "Nov"' in our `awk' program is an expression that tests
+whether the fifth field of the output from `ls -l' matches the string
+`Nov'. Each time a line has the string `Nov' in its fifth field, the
+action `{ sum += $4 }' is performed. This adds the fourth field (the
+file size) to the variable `sum'. As a result, when `awk' has
+finished reading all the input lines, `sum' will be the sum of the
+sizes of files whose lines matched the pattern.
+
+After the last line of output from `ls' has been processed, the
+action of the `END' rule is executed, and the value of `sum' is
+printed. In this example, the value of `sum' would be 80600.
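+
+The total can be checked without depending on the contents of any
+real directory by feeding the sample listing to the same program
+from a here-document. This is a sketch under that assumption; the
+variable name `total' is made up here. On many current systems
+`ls -l' also prints a group column, which would move the size and
+the month to the fifth and sixth fields, so the field numbers below
+apply to the listing format shown in this section.
+

```shell
# Sum the fourth field (size) of every line whose fifth field is Nov.
total=$(awk '$5 == "Nov" { sum += $4 }
END { print sum }' <<'EOF'
-rw-r--r--  1 close     1933 Nov  7 13:05 Makefile
-rw-r--r--  1 close    10809 Nov  7 13:03 gawk.h
-rw-r--r--  1 close      983 Apr 13 12:14 gawk.tab.h
-rw-r--r--  1 close    31869 Jun 15 12:20 gawk.y
-rw-r--r--  1 close    22414 Nov  7 13:03 gawk1.c
-rw-r--r--  1 close    37455 Nov  7 13:03 gawk2.c
-rw-r--r--  1 close    27511 Dec  9 13:07 gawk3.c
-rw-r--r--  1 close     7989 Nov  7 13:03 gawk4.c
EOF
)
echo "$total"
```

+
+The five November files contribute 1933 + 10809 + 22414 + 37455 +
+7989 bytes, which is the 80600 quoted above.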
+
+These more advanced `awk' techniques are covered in later sections
+(*note Actions::.). Before you can move on to more advanced `awk'
+programming, you have to know how `awk' interprets your input and
+displays your output. By manipulating "fields" and using special
+"print" statements, you can produce some very useful and spectacular
+looking reports.
+
+
+
+File: gawk-info, Node: Running gawk, Next: Comments, Prev: More Complex, Up: Getting Started
+
+How to Run `awk' Programs
+=========================
+
+There are several ways to run an `awk' program. If the program is
+short, it is easiest to include it in the command that runs `awk',
+like this:
+
+ awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
+
+where PROGRAM consists of a series of PATTERNS and ACTIONS, as
+described earlier.
+
+When the program is long, you would probably prefer to put it in a
+file and run it with a command like this:
+
+ awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ...
+
+* Menu:
+
+* One-shot:: Running a short throw-away `awk' program.
+* Read Terminal:: Using no input files (input from terminal instead).
+* Long:: Putting permanent `awk' programs in files.
+* Executable Scripts:: Making self-contained `awk' programs.
+* Command Line:: How the `awk' command line is laid out.
+
+
+
+File: gawk-info, Node: One-shot, Next: Read Terminal, Up: Running gawk
+
+One-shot Throw-away `awk' Programs
+----------------------------------
+
+Once you are familiar with `awk', you will often type simple programs
+at the moment you want to use them. Then you can write the program
+as the first argument of the `awk' command, like this:
+
+ awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
+
+where PROGRAM consists of a series of PATTERNS and ACTIONS, as
+described earlier.
+
+This command format tells the shell to start `awk' and use the
+PROGRAM to process records in the input file(s). There are single
+quotes around the PROGRAM so that the shell doesn't interpret any
+`awk' characters as special shell characters. They cause the shell
+to treat all of PROGRAM as a single argument for `awk'. They also
+allow PROGRAM to be more than one line long.
+
+This format is also useful for running short or medium-sized `awk'
+programs from shell scripts, because it avoids the need for a
+separate file for the `awk' program. A self-contained shell script
+is more reliable since there are no other files to misplace.
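+
+A shell script that carries its own `awk' program might look like
+the following sketch. The function name `count_always_up' and the
+three input lines are hypothetical; the program counts records in
+the `BBS-list' format whose last field is `A'.
+

```shell
# The awk program spans two lines inside one pair of single quotes,
# so the shell passes it to awk as a single argument.
count_always_up() {
    awk '$4 == "A" { n++ }
END { print n + 0 }' "$@"
}

# With no file arguments the function reads its standard input.
answer=$(printf '%s\n' \
    'aardvark 555-5553 1200/300 B' \
    'alpo-net 555-3412 2400/1200/300 A' \
    'barfly 555-7685 1200/300 A' | count_always_up)
echo "$answer"
```

+
+Printing `n + 0' rather than `n' makes the result a number even
+when no record matches and `n' was never set.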
+
+
+
+File: gawk-info, Node: Read Terminal, Next: Long, Prev: One-shot, Up: Running gawk
+
+Running `awk' without Input Files
+---------------------------------
+
+You can also use `awk' without any input files. If you type the
+command line:
+
+ awk 'PROGRAM'
+
+then `awk' applies the PROGRAM to the "standard input", which usually
+means whatever you type on the terminal. This continues until you
+indicate end-of-file by typing `Control-d'.
+
+For example, if you type:
+
+ awk '/th/'
+
+whatever you type next will be taken as data for that `awk' program.
+If you go on to type the following data,
+
+ Kathy
+ Ben
+ Tom
+ Beth
+ Seth
+ Karen
+ Thomas
+ `Control-d'
+
+then `awk' will print
+
+ Kathy
+ Beth
+ Seth
+
+as matching the pattern `th'. Notice that it did not recognize
+`Thomas' as matching the pattern, because `Thomas' contains `Th'
+with an uppercase `T'. The `awk' language is "case sensitive", and
+matches patterns *exactly*.
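+
+The same session can be reproduced without typing at the terminal.
+This sketch pipes the sample names into `awk' on the standard input;
+the shell variable `matches' is only an illustration.
+
```shell
# Feed the sample names on standard input instead of typing them.
matches=$(printf '%s\n' Kathy Ben Tom Beth Seth Karen Thomas | awk '/th/')
# Thomas is absent from the result: its "Th" does not match lowercase "th".
echo "$matches"
```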
+
+
+
+File: gawk-info, Node: Long, Next: Executable Scripts, Prev: Read Terminal, Up: Running gawk
+
+Running Long Programs
+---------------------
+
+Sometimes your `awk' programs can be very long. In this case it is
+more convenient to put the program into a separate file. To tell
+`awk' to use that file for its program, you type:
+
+ awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ...
+
+The `-f' tells the `awk' utility to get the `awk' program from the
+file SOURCE-FILE. Any file name can be used for SOURCE-FILE. For
+example, you could put the program:
+
+ /th/
+
+into the file `th-prog'. Then the command:
+
+ awk -f th-prog
+
+does the same thing as this one:
+
+ awk '/th/'
+
+which was explained earlier (*note Read Terminal::.). Note that you
+don't usually need single quotes around the file name that you
+specify with `-f', because most file names don't contain any of the
+shell's special characters.
+
+If you want to identify your `awk' program files clearly as such, you
+can add the extension `.awk' to the file name. This doesn't affect
+the execution of the `awk' program, but it does make ``housekeeping''
+easier.
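+
+The following sketch checks that a program file and the equivalent
+one-shot program give the same result. It assumes a `mktemp' command
+that accepts `-d'; the file name `th-prog.awk' and the sample data
+are hypothetical.
+
```shell
# Work in a scratch directory so no existing file is overwritten.
dir=$(mktemp -d)
# Put the one-rule program into a file with the .awk extension.
printf '/th/\n' > "$dir/th-prog.awk"
from_file=$(printf '%s\n' Kathy Ben Beth | awk -f "$dir/th-prog.awk")
inline=$(printf '%s\n' Kathy Ben Beth | awk '/th/')
echo "$from_file"
rm -r "$dir"
```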
+
+
+
+File: gawk-info, Node: Executable Scripts, Next: Command Line, Prev: Long, Up: Running gawk
+
+Executable `awk' Programs
+-------------------------
+
+(The following section assumes that you are already somewhat familiar
+with `awk'.)
+
+Once you have learned `awk', you may want to write self-contained
+`awk' scripts, using the `#!' script mechanism. You can do this on
+BSD Unix systems and GNU.
+
+For example, you could create a text file named `hello', containing
+the following (where `BEGIN' is a feature we have not yet discussed):
+
+ #! /bin/awk -f
+
+ # a sample awk program
+
+ BEGIN { print "hello, world" }
+
+After making this file executable (with the `chmod' command), you can
+simply type:
+
+ hello
+
+at the shell, and the system will arrange to run `awk' as if you had
+typed:
+
+ awk -f hello
+
+Self-contained `awk' scripts are particularly useful for putting
+`awk' programs into production on your system, without your users
+having to know that they are actually using an `awk' program.
+
+If your system does not support the `#!' mechanism, you can get a
+similar effect using a regular shell script. It would look something
+like this:
+
+ : a sample awk program
+
+ awk 'PROGRAM' "$@"
+
+Using this technique, it is *vital* to enclose the PROGRAM in single
+quotes to protect it from interpretation by the shell. If you omit
+the quotes, only a shell wizard can predict the result.
+
+The `"$@"' causes the shell to forward all the command line arguments
+to the `awk' program, without interpretation.
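+
+Here is a minimal sketch of such a wrapper. The script name
+`first-field' and the data file name are hypothetical, and a `mktemp'
+command that accepts `-d' is assumed. The data file name contains a
+space to show that `"$@"' passes each argument through unchanged.
+
```shell
dir=$(mktemp -d)
# Write the wrapper script; the quoted EOF keeps $1 and $@ literal.
cat > "$dir/first-field" <<'EOF'
: print the first field of each record
awk '{ print $1 }' "$@"
EOF
printf 'a b\nc d\n' > "$dir/my data"
# Run the wrapper through sh, so no chmod is needed for this sketch.
wrapped=$(sh "$dir/first-field" "$dir/my data")
echo "$wrapped"
rm -r "$dir"
```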
+
+
+
+File: gawk-info, Node: Command Line, Prev: Executable Scripts, Up: Running gawk
+
+Details of the `awk' Command Line
+---------------------------------
+
+(The following section assumes that you are already familiar with
+`awk'.)
+
+There are two ways to run `awk'. Here are templates for both of
+them; items enclosed in `[' and `]' in these templates are optional.
+
+ awk [ -FFS ] [ -- ] 'PROGRAM' FILE ...
+ awk [ -FFS ] -f SOURCE-FILE [ -f SOURCE-FILE ... ] [ -- ] FILE ...
+
+Options begin with a minus sign, and consist of a single character.
+The options and their meanings are as follows:
+
+`-FFS'
+ This sets the `FS' variable to FS (*note Special::.). As a
+ special case, if FS is `t', then `FS' will be set to the tab
+ character (`"\t"').
+
+`-f SOURCE-FILE'
+ Indicates that the `awk' program is to be found in SOURCE-FILE
+ instead of in the first non-option argument.
+
+`--'
+ This signals the end of the command line options. If you wish
+ to specify an input file named `-f', you can precede it with the
+ `--' argument to prevent the `-f' from being interpreted as an
+ option. This handling of `--' follows the POSIX argument
+ parsing conventions.
+
+Any other options will be flagged as invalid with a warning message,
+but are otherwise ignored.
+
+If the `-f' option is *not* used, then the first non-option command
+line argument is expected to be the program text.
+
+The `-f' option may be used more than once on the command line.
+`awk' will read its program source from all of the named files, as if
+they had been concatenated together into one big file. This is
+useful for creating libraries of `awk' functions. Useful functions
+can be written once, and then retrieved from a standard place,
+instead of having to be included into each individual program. You
+can still type in a program at the terminal and use library
+functions, by specifying `/dev/tty' as one of the arguments to a
+`-f'. Type your program, and end it with the keyboard end-of-file
+character `Control-d'.
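+
+As a sketch of the library idea, the fragment below keeps one
+function in its own file and names both files with `-f'. The file
+names `lib.awk' and `main.awk' and the function `twice' are
+hypothetical, and a `mktemp' command that accepts `-d' is assumed.
+
```shell
dir=$(mktemp -d)
# Hypothetical library file holding one reusable function.
printf 'function twice(x) { return 2 * x }\n' > "$dir/lib.awk"
# Hypothetical main program that calls the library function.
printf '{ print twice($1) }\n' > "$dir/main.awk"
# awk reads both files as if they were one program.
doubled=$(printf '1\n4\n' | awk -f "$dir/lib.awk" -f "$dir/main.awk")
echo "$doubled"
rm -r "$dir"
```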
+
+Any additional arguments on the command line are made available to
+your `awk' program in the `ARGV' array (*note Special::.). These
+arguments are normally treated as input files to be processed in the
+order specified. However, an argument of the form VAR`='VALUE
+assigns the value VALUE to the variable VAR; it does not specify a
+file at all.
+
+Command line options and the program text (if present) are omitted
+from the `ARGV' array. All other arguments, including variable
+assignments, are included (*note Special::.).
+
+The distinction between file name arguments and variable-assignment
+arguments is made when `awk' is about to open the next input file.
+At that point in execution, it checks the ``file name'' to see
+whether it is really a variable assignment; if so, instead of trying
+to read a file it will, *at that point in the execution*, assign the
+variable.
+
+Therefore, the variables actually receive the specified values after
+all previously specified files have been read. In particular, the
+values of variables assigned in this fashion are *not* available
+inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run
+before `awk' begins scanning the argument list.
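+
+A small sketch of this timing follows. The variable `v' and the file
+name `datafile' are made up for illustration, and a `mktemp' command
+that accepts `-d' is assumed.
+
```shell
dir=$(mktemp -d)
printf 'x\n' > "$dir/datafile"
# v=1 precedes the data file, so the main rule sees it,
# but the BEGIN rule runs before the assignment takes place.
seen=$(awk 'BEGIN { print "BEGIN sees [" v "]" }
{ print "rule sees [" v "]" }' v=1 "$dir/datafile")
echo "$seen"
rm -r "$dir"
```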
+
+The variable assignment feature is most useful for assigning to
+variables such as `RS', `OFS', and `ORS', which control input and
+output formats, before listing the data files. It is also useful for
+controlling state if multiple passes are needed over a data file.
+For example:
+
+ awk 'pass == 1 { PASS 1 STUFF }
+ pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile
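+
+One way to fill in that template is sketched below: the first pass
+computes an average and the second pass prints the values above it.
+The data values and the variables `sum' and `n' are hypothetical,
+and a `mktemp' command that accepts `-d' is assumed.
+
```shell
dir=$(mktemp -d)
printf '10\n20\n60\n' > "$dir/datafile"
# Pass 1 accumulates the total; pass 2 rereads the same file and
# prints each value larger than the average (here, 30).
above=$(awk 'pass == 1 { sum += $1; n++ }
pass == 2 && $1 > sum / n { print $1 }' pass=1 "$dir/datafile" pass=2 "$dir/datafile")
echo "$above"
rm -r "$dir"
```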
+
+
+
+File: gawk-info, Node: Comments, Next: Statements/Lines, Prev: Running gawk, Up: Getting Started
+
+Comments in `awk' Programs
+==========================
+
+When you write a complicated `awk' program, you can put "comments" in
+the program file to help you remember what the program does, and how
+it works.
+
+A comment starts with the sharp sign character, `#', and
+continues to the end of the line. The `awk' language ignores the
+rest of a line following a sharp sign. For example, we could have
+put the following into `th-prog':
+
+ # This program finds records containing the pattern `th'. This is how
+ # you continue comments on additional lines.
+ /th/
+
+You can put comment lines into keyboard-composed throw-away `awk'
+programs also, but this usually isn't very useful; the purpose of a
+comment is to help yourself or another person understand the program
+at another time.
+
+
+
+File: gawk-info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started
+
+`awk' Statements versus Lines
+=============================
+
+Most often, each line in an `awk' program is a separate statement or
+separate rule, like this:
+
+ awk '/12/ { print $0 }
+ /21/ { print $0 }' BBS-list inventory-shipped
+
+But sometimes statements can be more than one line, and lines can
+contain several statements.
+
+You can split a statement into multiple lines by inserting a newline
+after any of the following:
+
+ , { ? : || &&
+
+Lines ending in `do' or `else' automatically have their statements
+continued on the following line(s). A newline at any other point
+ends the statement.
+
+If you would like to split a single statement into two lines at a
+point where a newline would terminate it, you can "continue" it by
+ending the first line with a backslash character, `\'. This is
+allowed absolutely anywhere in the statement, even in the middle of a
+string or regular expression. For example:
+
+ awk '/This program is too long, so continue it\
+ on the next line/ { print $1 }'
+
+We have generally not used backslash continuation in the sample
+programs in this manual. Since there is no limit on the length of a
+line, it is never strictly necessary; it just makes programs
+prettier. We have preferred to make them prettier still by keeping
+the statements short. Backslash continuation is most useful when
+your `awk' program is in a separate source file, instead of typed in
+on the command line.
+
+*Warning: backslash continuation in a one-shot program does not work
+as described if you are using the C shell.* Continuation with
+backslash works for `awk' programs in files, and also for one-shot
+programs *provided* you are using the Bourne shell, the Korn shell,
+or the Bourne-again shell. But the C shell used on Berkeley Unix
+behaves differently! There, you must use two backslashes in a row,
+followed by a newline.
+
+When `awk' statements within one rule are short, you might want to
+put more than one of them on a line. You do this by separating the
+statements with semicolons, `;'. This also applies to the rules
+themselves. Thus, the above example program could have been written:
+
+ /12/ { print $0 } ; /21/ { print $0 }
+
+*Note:* It is a new requirement that rules on the same line require
+semicolons as a separator in the `awk' language; it was done for
+consistency with the statements in the action part of rules.
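+
+A quick sketch of the one-line form, run against three made-up
+input lines:
+
```shell
# Two rules on one line, separated by a semicolon.
one_line=$(printf '12\n21\n33\n' | awk '/12/ { print $0 } ; /21/ { print $0 }')
echo "$one_line"
```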
+
+
+
+File: gawk-info, Node: When, Prev: Statements/Lines, Up: Getting Started
+
+When to Use `awk'
+=================
+
+You might ask what use all of this is to you. Using additional
+operating system utilities, more advanced patterns, field separators,
+arithmetic statements, and other selection criteria, you can produce
+much more complex output. The `awk' language is very useful for
+producing reports from large amounts of raw data, like summarizing
+information from the output of standard operating system programs
+such as `ls'. (*Note A More Complex Example: More Complex.)
+
+Programs written with `awk' are usually much smaller than they would
+be in other languages. This makes `awk' programs easy to compose and
+use. Often `awk' programs can be quickly composed at your terminal,
+used once, and thrown away. Since `awk' programs are interpreted,
+you can avoid the usually lengthy edit-compile-test-debug cycle of
+software development.
+
+Complex programs have been written in `awk', including a complete
+retargetable assembler for 8-bit microprocessors (*note Glossary::.
+for more information) and a microcode assembler for a special purpose
+Prolog computer. However, `awk''s capabilities are strained by tasks
+of such complexity.
+
+If you find yourself writing `awk' scripts of more than, say, a few
+hundred lines, you might consider using a different programming
+language. Emacs Lisp is a good choice if you need sophisticated
+string or pattern matching capabilities. The shell is also good at
+string and pattern matching; in addition it allows powerful use of
+the standard utilities. More conventional languages like C, C++, or
+Lisp offer better facilities for system programming and for managing
+the complexity of large programs. Programs in these languages may
+require more lines of source code than the equivalent `awk' programs,
+but they will be easier to maintain and usually run more efficiently.
+
+
+
+File: gawk-info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top
+
+Reading Files (Input)
+*********************
+
+In the typical `awk' program, all input is read either from the
+standard input (usually the keyboard) or from files whose names you
+specify on the `awk' command line. If you specify input files, `awk'
+reads data from the first one until it reaches the end; then it reads
+the second file until it reaches the end, and so on. The name of the
+current input file can be found in the special variable `FILENAME'
+(*note Special::.).
+
+The input is split automatically into "records", and processed by the
+rules one record at a time. (Records are the units of text mentioned
+in the introduction; by default, a record is a line of text.) Each
+record read is split automatically into "fields", to make it more
+convenient for a rule to work on parts of the record under
+consideration.
+
+On rare occasions you will need to use the `getline' command, which
+can do explicit input from any number of files.
+
+* Menu:
+
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Field Separators:: The field separator and how to change it.
+* Multiple:: Reading multi-line records.
+
+* Assignment Options:: Setting variables on the command line and a summary
+ of command line syntax. This is an advanced method
+ of input.
+
+* Getline:: Reading files under explicit program control
+ using the `getline' function.
+* Close Input:: Closing an input file (so you can read from
+ the beginning once more).
+
+
+
+File: gawk-info, Node: Records, Next: Fields, Up: Reading Files
+
+How Input is Split into Records
+===============================
+
+The `awk' language divides its input into records and fields.
+Records are separated from each other by the "record separator". By
+default, the record separator is the "newline" character. Therefore,
+normally, a record is a line of text.
+
+Sometimes you may want to use a different character to separate your
+records. You can use different characters by changing the special
+variable `RS'.
+
+The value of `RS' is a string that says how to separate records; the
+default value is `"\n"', the string of just a newline character.
+This is why lines of text are the default record. Although `RS' can
+have any string as its value, only the first character of the string
+will be used as the record separator. The other characters are
+ignored. `RS' is exceptional in this regard; `awk' uses the full
+value of all its other special variables.
+
+The value of `RS' is changed by "assigning" it a new value (*note
+Assignment Ops::.). One way to do this is at the beginning of your
+`awk' program, before any input has been processed, using the special
+`BEGIN' pattern (*note BEGIN/END::.). This way, `RS' is changed to
+its new value before any input is read. The new value of `RS' is
+enclosed in quotation marks. For example:
+
+ awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
+
+changes the value of `RS' to `/', the slash character, before reading
+any input. Records are now separated by a slash. The second rule in
+the `awk' program (the action with no pattern) will proceed to print
+each record. Since each `print' statement adds a newline at the end
+of its output, the effect of this `awk' program is to copy the input
+with each slash changed to a newline.
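+
+The effect can be seen on a short made-up input string. This sketch
+leaves off the trailing newline so that the last record is just `c':
+
```shell
# Records are separated by slashes; print adds a newline after each.
records=$(printf 'a/b/c' | awk 'BEGIN { RS = "/" } ; { print $0 }')
echo "$records"
```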
+
+Another way to change the record separator is on the command line,
+using the variable-assignment feature (*note Command Line::.).
+
+ awk '...' RS="/" INPUT-FILE
+
+`RS' will be set to `/' before processing INPUT-FILE.
+
+The empty string (a string of no characters) has a special meaning as
+the value of `RS': it means that records are separated only by blank
+lines. *Note Multiple::, for more details.
+
+The `awk' utility keeps track of the number of records that have been
+read so far from the current input file. This value is stored in a
+special variable called `FNR'. It is reset to zero when a new file
+is started. Another variable, `NR', is the total number of input
+records read so far from all files. It starts at zero but is never
+automatically reset to zero.
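+
+The difference between the two counters shows up as soon as there is
+a second input file. In this sketch the file names `one' and `two'
+are hypothetical, and a `mktemp' command that accepts `-d' is
+assumed.
+
```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\n' > "$dir/two"
# FNR restarts with each file; NR keeps counting across files.
counts=$(awk '{ print FNR, NR, $0 }' "$dir/one" "$dir/two")
echo "$counts"
rm -r "$dir"
```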
+
+If you change the value of `RS' in the middle of an `awk' run, the
+new value is used to delimit subsequent records, but the record
+currently being processed (and any records already finished) is not
+affected.
+
+