Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
file gawk.texinfo.

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.

Copyright (C) 1989 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.



File: gawk-info, Node: Top, Next: Preface, Prev: (dir), Up: (dir)

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them; it
contains the following chapters:

* Menu:

* Preface::         What you can do with `awk'; brief history
                    and acknowledgements.

* License::         Your right to copy and distribute `gawk'.

* This Manual::     Using this manual.
                    Includes sample input files that you can use.

* Getting Started:: A basic introduction to using `awk'.
                    How to run an `awk' program.  Command line syntax.

* Reading Files::   How to read files and manipulate fields.

* Printing::        How to print using `awk'.  Describes the
                    `print' and `printf' statements.
                    Also describes redirection of output.

* One-liners::      Short, sample `awk' programs.

* Patterns::        The various types of patterns explained in detail.

* Actions::         The various types of actions are introduced here.
                    Describes expressions and the various operators in
                    detail.  Also describes comparison expressions.

* Statements::      The various control statements are described in
                    detail.

* Arrays::          The description and use of arrays.  Also includes
                    array-oriented control statements.

* User-defined::    User-defined functions are described in detail.

* Built-in::        The built-in functions are summarized here.

* Special::         The special variables are summarized here.

* Sample Program::  A sample `awk' program with a complete explanation.

* Notes::           Something about the implementation of `gawk'.

* Glossary::        An explanation of some unfamiliar terms.

* Index::



File: gawk-info, Node: Preface, Next: License, Prev: Top, Up: Top

Preface
*******

If you are like many computer users, you would frequently like to
make changes in various text files wherever certain patterns appear,
or extract data from parts of certain lines while discarding the
rest.  Writing a program to do this in a language such as C or
Pascal is a time-consuming inconvenience that may take many lines of
code.  The job may be easier with `awk'.

The `awk' utility interprets a special-purpose programming language
that makes it possible to handle simple data-reformatting jobs
easily with just a few lines of code.

The GNU implementation of `awk' is called `gawk'; it is fully upward
compatible with the System V Release 3.1 and later versions of `awk'.
All properly written `awk' programs should work with `gawk'.  So we
usually don't distinguish between `gawk' and other `awk'
implementations in this manual.

This manual teaches you what `awk' does and how you can use `awk'
effectively.  You should already be familiar with basic,
general-purpose operating system commands such as `ls'.  Using
`awk' you can:

   * manage small, personal databases,

   * generate reports,

   * validate data,

   * produce indexes, and perform other document preparation tasks,

   * even experiment with algorithms that can be adapted later to
     other computer languages!

* Menu:

* History::  The history of gawk and awk.  Acknowledgements.



File: gawk-info, Node: History, Up: Preface

History of `awk' and `gawk'
===========================

The name `awk' comes from the initials of its designers: Alfred V.
Aho, Peter J. Weinberger, and Brian W. Kernighan.  The original
version of `awk' was written in 1977.  In 1985 a new version made the
programming language more powerful, introducing user-defined
functions, multiple input streams, and computed regular expressions.

The GNU implementation, `gawk', was written in 1986 by Paul Rubin and
Jay Fenlason, with advice from Richard Stallman.  John Woods
contributed parts of the code as well.  In 1988, David Trueman, with
help from Arnold Robbins, reworked `gawk' for compatibility with the
newer `awk'.

Many people need to be thanked for their assistance in producing this
manual.  Jay Fenlason contributed many ideas and sample programs.
Richard Mlynarik and Robert Chassell gave helpful comments on drafts
of this manual.  The paper ``A Supplemental Document for `awk''' by
John W. Pierce of the Chemistry Department at UC San Diego
pinpointed several issues, relevant both to `awk' implementation and
to this manual, that would otherwise have escaped us.

Finally, we would like to thank Brian Kernighan of Bell Labs for
invaluable assistance during the testing and debugging of `gawk', and
for help in clarifying several points about the language.



File: gawk-info, Node: License, Next: This Manual, Prev: Preface, Up: Top

GNU GENERAL PUBLIC LICENSE
**************************

                         Version 1, February 1989

     Copyright (C) 1989 Free Software Foundation, Inc.
     675 Mass Ave, Cambridge, MA 02139, USA

     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.

Preamble
========

  The license agreements of most software companies try to keep users
at the mercy of those companies.  By contrast, our General Public
License is intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.
The General Public License applies to the Free Software Foundation's
software and to any other program whose authors commit to using it.
You can use it for your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Specifically, the General Public License is designed to make
sure that you have the freedom to give away or sell copies of free
software, that you receive source code or can get it if you want it,
that you can change the software or use pieces of it in new free
programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if
you distribute copies of the software, or if you modify it.

  For example, if you distribute copies of a such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must tell them their rights.

  We protect your rights with two steps: (1) copyright the software,
and (2) offer you this license which gives you legal permission to
copy, distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on,
we want its recipients to know that what they have is not the
original, so that any problems introduced by others will not reflect
on the original authors' reputations.

  The precise terms and conditions for copying, distribution and
modification follow.

                          TERMS AND CONDITIONS

  1. This License Agreement applies to any program or other work
     which contains a notice placed by the copyright holder saying it
     may be distributed under the terms of this General Public
     License.  The ``Program'', below, refers to any such program or
     work, and a ``work based on the Program'' means either the
     Program or any work containing the Program or a portion of it,
     either verbatim or with modifications.  Each licensee is
     addressed as ``you''.

  2. You may copy and distribute verbatim copies of the Program's
     source code as you receive it, in any medium, provided that you
     conspicuously and appropriately publish on each copy an
     appropriate copyright notice and disclaimer of warranty; keep
     intact all the notices that refer to this General Public License
     and to the absence of any warranty; and give any other
     recipients of the Program a copy of this General Public License
     along with the Program.  You may charge a fee for the physical
     act of transferring a copy.

  3. You may modify your copy or copies of the Program or any portion
     of it, and copy and distribute such modifications under the
     terms of Paragraph 1 above, provided that you also do the
     following:

        * cause the modified files to carry prominent notices stating
          that you changed the files and the date of any change; and

        * cause the whole of any work that you distribute or publish,
          that in whole or in part contains the Program or any part
          thereof, either with or without modifications, to be
          licensed at no charge to all third parties under the terms
          of this General Public License (except that you may choose
          to grant warranty protection to some or all third parties,
          at your option).

        * If the modified program normally reads commands
          interactively when run, you must cause it, when started
          running for such interactive use in the simplest and most
          usual way, to print or display an announcement including an
          appropriate copyright notice and a notice that there is no
          warranty (or else, saying that you provide a warranty) and
          that users may redistribute the program under these
          conditions, and telling the user how to view a copy of this
          General Public License.

        * You may charge a fee for the physical act of transferring a
          copy, and you may at your option offer warranty protection
          in exchange for a fee.

     Mere aggregation of another independent work with the Program
     (or its derivative) on a volume of a storage or distribution
     medium does not bring the other work under the scope of these
     terms.

  4. You may copy and distribute the Program (or a portion or
     derivative of it, under Paragraph 2) in object code or
     executable form under the terms of Paragraphs 1 and 2 above
     provided that you also do one of the following:

        * accompany it with the complete corresponding
          machine-readable source code, which must be distributed
          under the terms of Paragraphs 1 and 2 above; or,

        * accompany it with a written offer, valid for at least three
          years, to give any third party free (except for a nominal
          charge for the cost of distribution) a complete
          machine-readable copy of the corresponding source code, to
          be distributed under the terms of Paragraphs 1 and 2 above;
          or,

        * accompany it with the information you received as to where
          the corresponding source code may be obtained.  (This
          alternative is allowed only for noncommercial distribution
          and only if you received the program in object code or
          executable form alone.)

     Source code for a work means the preferred form of the work for
     making modifications to it.  For an executable file, complete
     source code means all the source code for all modules it
     contains; but, as a special exception, it need not include
     source code for modules which are standard libraries that
     accompany the operating system on which the executable file
     runs, or for standard header files or definitions files that
     accompany that operating system.

  5. You may not copy, modify, sublicense, distribute or transfer the
     Program except as expressly provided under this General Public
     License.  Any attempt otherwise to copy, modify, sublicense,
     distribute or transfer the Program is void, and will
     automatically terminate your rights to use the Program under
     this License.  However, parties who have received copies, or
     rights to use copies, from you under this General Public License
     will not have their licenses terminated so long as such parties
     remain in full compliance.

  6. By copying, distributing or modifying the Program (or any work
     based on the Program) you indicate your acceptance of this
     license to do so, and all its terms and conditions.

  7. Each time you redistribute the Program (or any work based on the
     Program), the recipient automatically receives a license from
     the original licensor to copy, distribute or modify the Program
     subject to these terms and conditions.  You may not impose any
     further restrictions on the recipients' exercise of the rights
     granted herein.

  8. The Free Software Foundation may publish revised and/or new
     versions of the General Public License from time to time.  Such
     new versions will be similar in spirit to the present version,
     but may differ in detail to address new problems or concerns.

     Each version is given a distinguishing version number.  If the
     Program specifies a version number of the license which applies
     to it and ``any later version'', you have the option of
     following the terms and conditions either of that version or of
     any later version published by the Free Software Foundation.  If
     the Program does not specify a version number of the license,
     you may choose any version ever published by the Free Software
     Foundation.

  9. If you wish to incorporate parts of the Program into other free
     programs whose distribution conditions are different, write to
     the author to ask for permission.  For software which is
     copyrighted by the Free Software Foundation, write to the Free
     Software Foundation; we sometimes make exceptions for this.  Our
     decision will be guided by the two goals of preserving the free
     status of all derivatives of our free software and of promoting
     the sharing and reuse of software generally.

                                NO WARRANTY

 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
     WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
     LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
     HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ``AS IS''
     WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
     INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
     MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE
     ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
     WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE
     COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
     WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
     MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
     LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
     INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
     INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS
     OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
     YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH
     ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

                       END OF TERMS AND CONDITIONS

Appendix: How to Apply These Terms to Your New Programs
=======================================================

  If you develop a new program, and you want it to be of the greatest
possible use to humanity, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the ``copyright'' line and a pointer to where the full notice is found.

     ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
     Copyright (C) 19YY  NAME OF AUTHOR

     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
     the Free Software Foundation; either version 1, or (at your option)
     any later version.

     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     GNU General Public License for more details.

     You should have received a copy of the GNU General Public License
     along with this program; if not, write to the Free Software
     Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

  Also add information on how to contact you by electronic and paper
mail.

If the program is interactive, make it output a short notice like
this when it starts in an interactive mode:

     Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
     Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
     This is free software, and you are welcome to redistribute it
     under certain conditions; type `show c' for details.

  The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License.  Of course, the
commands you use may be called something other than `show w' and
`show c'; they could even be mouse-clicks or menu items--whatever
suits your program.

You should also get your employer (if you work as a programmer) or
your school, if any, to sign a ``copyright disclaimer'' for the
program, if necessary.  Here a sample; alter the names:

     Yoyodyne, Inc., hereby disclaims all copyright interest in the
     program `Gnomovision' (a program to direct compilers to make passes
     at assemblers) written by James Hacker.

     SIGNATURE OF TY COON, 1 April 1989
     Ty Coon, President of Vice

That's all there is to it!



File: gawk-info, Node: This Manual, Next: Getting Started, Prev: License, Up: Top

Using This Manual
*****************

The term `gawk' refers to a program (a version of `awk') developed by
the Free Software Foundation, and to the language you use to tell it
what to do.  When we need to be careful, we call the program ``the
`awk' utility'' and the language ``the `awk' language''.  The purpose
of this manual is to explain the `awk' language and how to run the
`awk' utility.

The term "`awk' program" refers to a program written by you in the
`awk' programming language.

*Note Getting Started::, for the bare essentials you need to know to
start using `awk'.

Useful ``one-liners'' are included to give you a feel for the `awk'
language (*note One-liners::.).

A sizable sample `awk' program has been provided for you (*note
Sample Program::.).

If you find terms that you aren't familiar with, try looking them up
in the glossary (*note Glossary::.).

Most of the time, complete `awk' programs are used as examples, but in
some of the more advanced sections, only the part of the `awk'
program that illustrates the concept being described is shown.

* Menu:

This chapter contains the following sections:

* The Files::   Sample data files for use in the `awk' programs
                illustrated in this manual.



File: gawk-info, Node: The Files, Up: This Manual

Input Files for the Examples
============================

This manual contains many sample programs.  The data for many of
those programs comes from two files.  The first file, called
`BBS-list', represents a list of computer bulletin board systems and
information about those systems.

Each line of this file is one "record".  Each record contains the
name of a computer bulletin board, its phone number, the board's baud
rate, and a code for the number of hours it is operational.  An `A'
in the last column means the board operates 24 hours all week.
A `B' in the last column means the board operates only during evening
and weekend hours.  A `C' means the board operates only on weekends.

     aardvark     555-5553     1200/300          B
     alpo-net     555-3412     2400/1200/300     A
     barfly       555-7685     1200/300          A
     bites        555-1675     2400/1200/300     A
     camelot      555-0542     300               C
     core         555-2912     1200/300          C
     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sdace        555-3430     2400/1200/300     A
     sabafoo      555-2127     1200/300          C

The second data file, called `inventory-shipped', represents
information about shipments during the year.  Each line of this file
is also one record.  Each record contains the month of the year, the
number of green crates shipped, the number of red boxes shipped, the
number of orange bags shipped, and the number of blue packages
shipped, respectively.

     Jan  13  25  15 115
     Feb  15  32  24 226
     Mar  15  24  34 228
     Apr  31  52  63 420
     May  16  34  29 208
     Jun  31  42  75 492
     Jul  24  34  67 436
     Aug  15  34  47 316
     Sep  13  55  37 277
     Oct  29  54  68 525
     Nov  20  87  82 577
     Dec  17  35  61 401

     Jan  21  36  64 620
     Feb  26  58  80 652
     Mar  24  75  70 495
     Apr  21  70  74 514

If you are reading this in GNU Emacs using Info, you can copy the
regions of text showing these sample files into your own test files.
This way you can try out the examples shown in the remainder of this
document.  You do this by using the command `M-x write-region' to
copy text from the Info file into a file for use with `awk' (see your
``GNU Emacs Manual'' for more information).  Using this information,
create your own `BBS-list' and `inventory-shipped' files, and
practice what you learn in this manual.



File: gawk-info, Node: Getting Started, Next: Reading Files, Prev: This Manual, Up: Top

Getting Started With `awk'
**************************

The basic function of `awk' is to search files for lines (or other
units of text) that contain certain patterns.
When a line matching any of those patterns is found, `awk' performs
specified actions on that line.  Then `awk' keeps processing input
lines until the end of the file is reached.

An `awk' "program" or "script" consists of a series of "rules".
(Programs may also contain "function definitions", but that is an
advanced feature, so let's ignore it for now.  *Note User-defined::.)

A rule contains a "pattern", an "action", or both.  Actions are
enclosed in curly braces to distinguish them from patterns.
Therefore, an `awk' program is a sequence of rules in the form:

     PATTERN { ACTION }
     PATTERN { ACTION }
     ...

* Menu:

* Very Simple::       A very simple example.
* Two Rules::         A less simple one-line example with two rules.
* More Complex::      A more complex example.
* Running gawk::      How to run gawk programs; includes command line syntax.
* Comments::          Adding documentation to gawk programs.
* Statements/Lines::  Subdividing or combining statements into lines.

* When::              When to use gawk and when to use other things.



File: gawk-info, Node: Very Simple, Next: Two Rules, Up: Getting Started

A Very Simple Example
=====================

The following command runs a simple `awk' program that searches the
input file `BBS-list' for the string of characters `foo'.  (A string
of characters is usually called, quite simply, a "string".)

     awk '/foo/ { print $0 }' BBS-list

When lines containing `foo' are found, they are printed, because
`print $0' means print the current line.  (Just `print' by itself
also means the same thing, so we could have written that instead.)

You will notice that slashes, `/', surround the string `foo' in the
actual `awk' program.  The slashes indicate that `foo' is a pattern
to search for.  This type of pattern is called a "regular
expression", and is covered in more detail later (*note Regexp::.).
There are single quotes around the `awk' program so that the shell
won't interpret any of it as special shell characters.

Here is what this program prints:

     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sabafoo      555-2127     1200/300          C

In an `awk' rule, either the pattern or the action can be omitted,
but not both.

If the pattern is omitted, then the action is performed for *every*
input line.

If the action is omitted, the default action is to print all lines
that match the pattern.  We could leave out the action (the print
statement and the curly braces) in the above example, and the result
would be the same: all lines matching the pattern `foo' would be
printed.  (By comparison, omitting the print statement but retaining
the curly braces makes an empty action that does nothing; then no
lines would be printed.)



File: gawk-info, Node: Two Rules, Next: More Complex, Prev: Very Simple, Up: Getting Started

An Example with Two Rules
=========================

The `awk' utility reads the input files one line at a time.  For each
line, `awk' tries the patterns of all the rules.  If several patterns
match, then several actions are run, in the order in which they appear
in the `awk' program.  If no patterns match, then no actions are run.

After processing all the rules (perhaps none) that match the line,
`awk' reads the next line (however, *note Next::.).  This continues
until the end of the file is reached.

For example, the `awk' program:

     /12/  { print $0 }
     /21/  { print $0 }

contains two rules.  The first rule has the string `12' as the
pattern and `print $0' as the action.  The second rule has the string
`21' as the pattern and also has `print $0' as the action.  Each
rule's action is enclosed in its own pair of braces.

This `awk' program prints every line that contains the string `12'
*or* the string `21'.  If a line contains both strings, it is printed
twice, once by each rule.
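To watch the rule ordering on a single line, you can give this same
two-rule program a few input lines of your own on standard input.  In
the sketch below, the input lines `a12', `b21', `c1221', and `d99' are
made up for illustration; they are not part of the sample files.
Because `c1221' contains both `12' and `21', both rules match it, so
it is printed twice: first by the `/12/' rule, then by the `/21/' rule.

```shell
# Four invented input lines: one matches only /12/, one only /21/,
# one matches both, and one matches neither.
printf 'a12\nb21\nc1221\nd99\n' |
awk '/12/ { print $0 }
     /21/ { print $0 }'
# Prints a12, b21, c1221, c1221 (one per line); d99 is not printed.
```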

If we run this program on our two sample data files, `BBS-list' and
`inventory-shipped', as shown here:

     awk '/12/ { print $0 }
          /21/ { print $0 }' BBS-list inventory-shipped

we get the following output:

     aardvark     555-5553     1200/300          B
     alpo-net     555-3412     2400/1200/300     A
     barfly       555-7685     1200/300          A
     bites        555-1675     2400/1200/300     A
     core         555-2912     1200/300          C
     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sdace        555-3430     2400/1200/300     A
     sabafoo      555-2127     1200/300          C
     sabafoo      555-2127     1200/300          C
     Jan  21  36  64 620
     Apr  21  70  74 514

Note how the line in `BBS-list' beginning with `sabafoo' was printed
twice, once for each rule.



File: gawk-info, Node: More Complex, Next: Running gawk, Prev: Two Rules, Up: Getting Started

A More Complex Example
======================

Here is an example to give you an idea of what typical `awk' programs
do.  This example shows how `awk' can be used to summarize, select,
and rearrange the output of another utility.  It uses features that
haven't been covered yet, so don't worry if you don't understand all
the details.

     ls -l | awk '$5 == "Nov" { sum += $4 }
                  END { print sum }'

This command prints the total number of bytes in all the files in the
current directory that were last modified in November (of any year).
(In the C shell you would need to type a semicolon and then a
backslash at the end of the first line; in the Bourne shell you can
type the example as shown.)

The `ls -l' part of this example is a command that gives you a full
listing of all the files in a directory, including file size and date.
Its output looks like this:

     -rw-r--r--  1 close      1933 Nov  7 13:05 Makefile
     -rw-r--r--  1 close     10809 Nov  7 13:03 gawk.h
     -rw-r--r--  1 close       983 Apr 13 12:14 gawk.tab.h
     -rw-r--r--  1 close     31869 Jun 15 12:20 gawk.y
     -rw-r--r--  1 close     22414 Nov  7 13:03 gawk1.c
     -rw-r--r--  1 close     37455 Nov  7 13:03 gawk2.c
     -rw-r--r--  1 close     27511 Dec  9 13:07 gawk3.c
     -rw-r--r--  1 close      7989 Nov  7 13:03 gawk4.c

The first field contains read-write permissions, the second field
contains the number of links to the file, and the third field
identifies the owner of the file.  The fourth field contains the size
of the file in bytes.  The fifth, sixth, and seventh fields contain
the month, day, and time, respectively, that the file was last
modified.  Finally, the eighth field contains the name of the file.

The `$5 == "Nov"' in our `awk' program is an expression that tests
whether the fifth field of the output from `ls -l' is equal to the
string `Nov'.  Each time a line has the string `Nov' in its fifth
field, the action `{ sum += $4 }' is performed.  This adds the fourth
field (the file size) to the variable `sum'.  As a result, when `awk'
has finished reading all the input lines, `sum' will be the sum of the
sizes of files whose lines matched the pattern.

After the last line of output from `ls' has been processed, the action
of the `END' rule is executed, and the value of `sum' is printed.  In
this example, the value of `sum' would be 80600 (1933 + 10809 + 22414
+ 37455 + 7989).

These more advanced `awk' techniques are covered in later sections
(*note Actions::.).  Before you can move on to more advanced `awk'
programming, you have to know how `awk' interprets your input and
displays your output.  By manipulating "fields" and using the `print'
and `printf' statements, you can produce some very useful and
spectacular looking reports.
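To check the arithmetic without depending on the files in your own
directory, here is a minimal sketch that feeds the eight sample lines
above to the same `awk' program through a shell here-document, instead
of piping in `ls -l'.  One assumption to keep in mind: many current
versions of `ls -l' (GNU `ls', for example) print a group column after
the owner, which moves the size to field 5 and the month to field 6;
with such output the rule would need to be `$6 == "Nov" { sum += $5 }'.

```shell
# Feed the sample `ls -l' lines from the manual (owner but no group
# column, so the size is field 4 and the month is field 5).
awk '$5 == "Nov" { sum += $4 }
     END         { print sum }' <<'EOF'
-rw-r--r--  1 close   1933 Nov  7 13:05 Makefile
-rw-r--r--  1 close  10809 Nov  7 13:03 gawk.h
-rw-r--r--  1 close    983 Apr 13 12:14 gawk.tab.h
-rw-r--r--  1 close  31869 Jun 15 12:20 gawk.y
-rw-r--r--  1 close  22414 Nov  7 13:03 gawk1.c
-rw-r--r--  1 close  37455 Nov  7 13:03 gawk2.c
-rw-r--r--  1 close  27511 Dec  9 13:07 gawk3.c
-rw-r--r--  1 close   7989 Nov  7 13:03 gawk4.c
EOF
# Prints 80600, that is, 1933 + 10809 + 22414 + 37455 + 7989.
```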



File: gawk-info, Node: Running gawk, Next: Comments, Prev: More Complex, Up: Getting Started

How to Run `awk' Programs
=========================

There are several ways to run an `awk' program.  If the program is
short, it is easiest to include it in the command that runs `awk',
like this:

     awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...

where PROGRAM consists of a series of PATTERNS and ACTIONS, as
described earlier.

When the program is long, you would probably prefer to put it in a
file and run it with a command like this:

     awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ...

* Menu:

* One-shot::            Running a short throw-away `awk' program.
* Read Terminal::       Using no input files (input from terminal instead).
* Long::                Putting permanent `awk' programs in files.
* Executable Scripts::  Making self-contained `awk' programs.
* Command Line::        How the `awk' command line is laid out.



File: gawk-info, Node: One-shot, Next: Read Terminal, Up: Running gawk

One-shot Throw-away `awk' Programs
----------------------------------

Once you are familiar with `awk', you will often type simple programs
at the moment you want to use them.  Then you can write the program
as the first argument of the `awk' command, like this:

     awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...

where PROGRAM consists of a series of PATTERNS and ACTIONS, as
described earlier.

This command format tells the shell to start `awk' and use the
PROGRAM to process records in the input file(s).  There are single
quotes around the PROGRAM so that the shell doesn't interpret any
`awk' characters as special shell characters.  They cause the shell
to treat all of PROGRAM as a single argument for `awk'.  They also
allow PROGRAM to be more than one line long.

This format is also useful for running short or medium-sized `awk'
programs from shell scripts, because it avoids the need for a
separate file for the `awk' program.
A self-contained shell script is more reliable, since there are no
other files to misplace.



File: gawk-info, Node: Read Terminal, Next: Long, Prev: One-shot, Up: Running gawk

Running `awk' without Input Files
---------------------------------

You can also use `awk' without any input files.  If you type the
command line:

     awk 'PROGRAM'

then `awk' applies the PROGRAM to the "standard input", which usually
means whatever you type on the terminal.  This continues until you
indicate end-of-file by typing `Control-d'.

For example, if you type:

     awk '/th/'

whatever you type next will be taken as data for that `awk' program.
If you go on to type the following data,

     Kathy
     Ben
     Tom
     Beth
     Seth
     Karen
     Thomas
     `Control-d'

then `awk' will print

     Kathy
     Beth
     Seth

as matching the pattern `th'.  Notice that it did not recognize
`Thomas' as matching the pattern, because `Thomas' contains `Th', not
`th'.  The `awk' language is "case sensitive", and matches patterns
*exactly*.



File: gawk-info, Node: Long, Next: Executable Scripts, Prev: Read Terminal, Up: Running gawk

Running Long Programs
---------------------

Sometimes your `awk' programs can be very long.  In this case it is
more convenient to put the program into a separate file.  To tell
`awk' to use that file for its program, you type:

     awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ...

The `-f' tells the `awk' utility to get the `awk' program from the
file SOURCE-FILE.  Any file name can be used for SOURCE-FILE.  For
example, you could put the program:

     /th/

into the file `th-prog'.  Then the command:

     awk -f th-prog

does the same thing as this one:

     awk '/th/'

which was explained earlier (*note Read Terminal::.).  Note that you
don't usually need single quotes around the file name that you
specify with `-f', because most file names don't contain any of the
shell's special characters.
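The steps just described can be sketched as a short shell session.  To
keep it non-interactive, the names from the earlier terminal example
are supplied with `printf' instead of being typed at the keyboard;
that substitution is the only assumption here, and the commands are
otherwise the ones shown above.

```shell
# Put the one-line program /th/ into the file th-prog.
printf '/th/\n' > th-prog

# Run it with -f; the names are piped in instead of typed at the terminal.
printf 'Kathy\nBen\nTom\nBeth\nSeth\nKaren\nThomas\n' | awk -f th-prog
# Prints Kathy, Beth, and Seth, exactly as awk '/th/' would.
```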

If you want to identify your `awk' program files clearly as such, you
can add the extension `.awk' to the file name. This doesn't affect
the execution of the `awk' program, but it does make ``housekeeping''
easier.


File: gawk-info, Node: Executable Scripts, Next: Command Line, Prev: Long, Up: Running gawk

Executable `awk' Programs
-------------------------

(The following section assumes that you are already somewhat familiar
with `awk'.)

Once you have learned `awk', you may want to write self-contained
`awk' scripts, using the `#!' script mechanism. You can do this on
BSD Unix systems and on GNU.

For example, you could create a text file named `hello', containing
the following (where `BEGIN' is a feature we have not yet discussed):

     #! /bin/awk -f

     # a sample awk program

     BEGIN { print "hello, world" }

After making this file executable (with the `chmod' command, for
example `chmod +x hello') and placing it in a directory listed in
your `PATH', you can simply type:

     hello

at the shell, and the system will arrange to run `awk' as if you had
typed:

     awk -f hello

Self-contained `awk' scripts are particularly useful for putting
`awk' programs into production on your system, without your users
having to know that they are actually using an `awk' program.

If your system does not support the `#!' mechanism, you can get a
similar effect using a regular shell script. It would look something
like this:

     : a sample awk program

     awk 'PROGRAM' "$@"

Using this technique, it is *vital* to enclose the PROGRAM in single
quotes to protect it from interpretation by the shell. If you omit
the quotes, only a shell wizard can predict the result.

The `"$@"' causes the shell to forward all the command line arguments
to the `awk' program, without interpretation.


File: gawk-info, Node: Command Line, Prev: Executable Scripts, Up: Running gawk

Details of the `awk' Command Line
---------------------------------

(The following section assumes that you are already familiar with
`awk'.)

There are two ways to run `awk'. Here are templates for both of
them; items enclosed in `[' and `]' in these templates are optional.

     awk [ -FFS ] [ -- ] 'PROGRAM' FILE ...
     awk [ -FFS ] -f SOURCE-FILE [ -f SOURCE-FILE ... ] [ -- ] FILE ...

Each option begins with a minus sign and consists of a single
character. The options and their meanings are as follows:

`-FFS'
     This sets the `FS' variable to FS (*note Special::.). As a
     special case, if FS is `t', then `FS' will be set to the tab
     character (`"\t"').

`-f SOURCE-FILE'
     Indicates that the `awk' program is to be found in SOURCE-FILE
     instead of in the first non-option argument.

`--'
     This signals the end of the command line options. If you wish
     to specify an input file named `-f', you can precede it with the
     `--' argument to prevent the `-f' from being interpreted as an
     option. This handling of `--' follows the POSIX argument
     parsing conventions.

Any other options will be flagged as invalid with a warning message,
but are otherwise ignored.

If the `-f' option is *not* used, then the first non-option command
line argument is expected to be the program text.

The `-f' option may be used more than once on the command line.
`awk' will read its program source from all of the named files, as if
they had been concatenated together into one big file. This is
useful for creating libraries of `awk' functions. Useful functions
can be written once, and then retrieved from a standard place,
instead of having to be included into each individual program. You
can still type in a program at the terminal and use library
functions, by giving `/dev/tty' as the argument of one of the `-f'
options. Type your program, and end it with the keyboard end-of-file
character `Control-d'.

Any additional arguments on the command line are made available to
your `awk' program in the `ARGV' array (*note Special::.).
These arguments are normally treated as input files to be processed
in the order specified. However, an argument that has the form
VAR`='VALUE means to assign the value VALUE to the variable VAR; it
does not specify a file at all.

Command line options and the program text (if present) are omitted
from the `ARGV' array. All other arguments, including variable
assignments, are included (*note Special::.).

The distinction between file name arguments and variable-assignment
arguments is made when `awk' is about to open the next input file.
At that point in execution, it checks the ``file name'' to see
whether it is really a variable assignment; if so, instead of trying
to read a file, it assigns the variable *at that point in the
execution*.

Therefore, the variables actually receive the specified values after
all previously specified files have been read. In particular, the
values of variables assigned in this fashion are *not* available
inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run
before `awk' begins scanning the argument list.

The variable assignment feature is most useful for assigning to
variables such as `RS', `OFS', and `ORS', which control input and
output formats, before listing the data files. It is also useful for
controlling state if multiple passes are needed over a data file.
For example:

     awk 'pass == 1 { PASS 1 STUFF }
          pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile

Here PASS 1 STUFF and PASS 2 STUFF stand for the actions to perform
on the first and second passes over `datafile'.


File: gawk-info, Node: Comments, Next: Statements/Lines, Prev: Running gawk, Up: Getting Started

Comments in `awk' Programs
==========================

When you write a complicated `awk' program, you can put "comments" in
the program file to help you remember what the program does, and how
it works.

A comment starts with the sharp sign character, `#', and continues
to the end of the line. The `awk' language ignores the rest of a
line following a sharp sign.
For example, we could have put the following into `th-prog':

     # This program finds records containing the pattern `th'. This is how
     # you continue comments on additional lines.
     /th/

You can put comment lines into keyboard-composed throw-away `awk'
programs also, but this usually isn't very useful; the purpose of a
comment is to help yourself or another person understand the program
at another time.


File: gawk-info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started

`awk' Statements versus Lines
=============================

Most often, each line in an `awk' program is a separate statement or
separate rule, like this:

     awk '/12/ { print $0 }
          /21/ { print $0 }' BBS-list inventory-shipped

But sometimes statements can be more than one line, and lines can
contain several statements.

You can split a statement into multiple lines by inserting a newline
after any of the following:

     ,       {       ?       :       ||      &&

Lines ending in `do' or `else' automatically have their statements
continued on the following line(s). A newline at any other point
ends the statement.

If you would like to split a single statement into two lines at a
point where a newline would terminate it, you can "continue" it by
ending the first line with a backslash character, `\'. This is
allowed absolutely anywhere in the statement, even in the middle of a
string or regular expression. For example:

     awk '/This program is too long, so continue it\
     on the next line/ { print $1 }'

We have generally not used backslash continuation in the sample
programs in this manual. Since there is no limit on the length of a
line, it is never strictly necessary; it just makes programs
prettier. We have preferred to make them prettier still by keeping
the statements short. Backslash continuation is most useful when
your `awk' program is in a separate source file, instead of typed in
on the command line.
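The line-splitting rules above can be tried out with a small program file. This is a sketch rather than an example from the manual: the file name `split-demo.awk', the input numbers, and the scratch directory made with `mktemp -d' are assumptions of this illustration. The pattern is split after `&&', the `print' statement after a comma, and the assignment with a backslash; without that backslash, the newline would end the assignment after `total = total' and the sum would never be accumulated.

```shell
#!/bin/sh
# Sketch only: split-demo.awk is a hypothetical file name.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Quoting 'EOF' stops the shell from expanding $1 or the backslash.
cat > split-demo.awk <<'EOF'
# A newline may follow "&&" (in the pattern) and "," (in the print list).
$1 > 1 &&
$1 < 4 { print "middle:",
               $1 }

# A backslash continues the assignment; without it, the newline
# would end the statement after "total = total".
{ total = total \
          + $1 }

END { print "total:", total }
EOF

# Prints "middle: 2", "middle: 3", then "total: 10", one per line.
printf '%s\n' 1 2 3 4 | awk -f split-demo.awk
```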

*Warning: backslash continuation on the command line does not work
the same way in the C shell.* Continuation with backslash works for
`awk' programs in files, and also for one-shot programs *provided*
you are using the Bourne shell, the Korn shell, or the Bourne-again
shell. But the C shell used on Berkeley Unix behaves differently!
There, you must use two backslashes in a row, followed by a newline.

When `awk' statements within one rule are short, you might want to
put more than one of them on a line. You do this by separating the
statements with semicolons, `;'. This also applies to the rules
themselves. Thus, the first example program in this section could
have been written:

     /12/ { print $0 } ; /21/ { print $0 }

Requiring a semicolon between rules on the same line is new in the
`awk' language; it was introduced for consistency with the way
statements are separated in the action part of rules.


File: gawk-info, Node: When, Prev: Statements/Lines, Up: Getting Started

When to Use `awk'
=================

What use is all of this to me, you might ask? Using additional
operating system utilities, more advanced patterns, field separators,
arithmetic statements, and other selection criteria, you can produce
much more complex output. The `awk' language is very useful for
producing reports from large amounts of raw data, such as summarizing
information from the output of standard operating system programs
like `ls'. (*Note A More Complex Example: More Complex.)

Programs written with `awk' are usually much smaller than they would
be in other languages. This makes `awk' programs easy to compose and
use. Often `awk' programs can be quickly composed at your terminal,
used once, and thrown away. Since `awk' programs are interpreted,
you can avoid the usually lengthy edit-compile-test-debug cycle of
software development.

Complex programs have been written in `awk', including a complete
retargetable assembler for 8-bit microprocessors (*note Glossary::.
for more information) and a microcode assembler for a special-purpose
Prolog computer. However, `awk''s capabilities are strained by tasks
of such complexity.

If you find yourself writing `awk' scripts of more than, say, a few
hundred lines, you might consider using a different programming
language. Emacs Lisp is a good choice if you need sophisticated
string or pattern matching capabilities. The shell is also good at
string and pattern matching; in addition, it allows powerful use of
the standard utilities. More conventional languages like C, C++, or
Lisp offer better facilities for system programming and for managing
the complexity of large programs. Programs in these languages may
require more lines of source code than the equivalent `awk' programs,
but they are easier to maintain and usually run more efficiently.


File: gawk-info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top

Reading Files (Input)
*********************

In the typical `awk' program, all input is read either from the
standard input (usually the keyboard) or from files whose names you
specify on the `awk' command line. If you specify input files, `awk'
reads data from the first one until it reaches the end; then it reads
the second file until it reaches the end, and so on. The name of the
current input file can be found in the special variable `FILENAME'
(*note Special::.).

The input is split automatically into "records", and processed by the
rules one record at a time. (Records are the units of text mentioned
in the introduction; by default, a record is a line of text.) Each
record read is split automatically into "fields", to make it more
convenient for a rule to work on parts of the record under
consideration.

On rare occasions you will need to use the `getline' command, which
reads input explicitly from any number of files.

* Menu:

* Records::             Controlling how data is split into records.
* Fields::              An introduction to fields.
* Field Separators::    The field separator and how to change it.
* Multiple::            Reading multi-line records.

* Assignment Options::  Setting variables on the command line and a summary
                        of command line syntax. This is an advanced method
                        of input.

* Getline::             Reading files under explicit program control
                        using the `getline' function.
* Close Input::         Closing an input file (so you can read from
                        the beginning once more).


File: gawk-info, Node: Records, Next: Fields, Up: Reading Files

How Input is Split into Records
===============================

The `awk' language divides its input into records and fields.
Records are separated from each other by the "record separator". By
default, the record separator is the "newline" character. Therefore,
normally, a record is a line of text.

Sometimes you may want to use a different character to separate your
records. You can do this by changing the special variable `RS'.

The value of `RS' is a string that says how to separate records; the
default value is `"\n"', the string of just a newline character.
This is why lines of text are the default record. Although `RS' can
have any string as its value, only the first character of the string
will be used as the record separator. The other characters are
ignored. `RS' is exceptional in this regard; `awk' uses the full
value of all its other special variables.

The value of `RS' is changed by "assigning" it a new value (*note
Assignment Ops::.). One way to do this is at the beginning of your
`awk' program, before any input has been processed, using the special
`BEGIN' pattern (*note BEGIN/END::.). This way, `RS' is changed to
its new value before any input is read. The new value of `RS' is
enclosed in quotation marks. For example:

     awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list

changes the value of `RS' to `/', the slash character, before reading
any input.
Records are now separated by a slash. The second rule in the `awk'
program (the action with no pattern) prints each record. Since each
`print' statement adds a newline at the end of its output, the effect
of this `awk' program is to copy the input with each slash changed to
a newline.

Another way to change the record separator is on the command line,
using the variable-assignment feature (*note Command Line::.):

     awk '...' RS="/" INPUT-FILE

`RS' will be set to `/' before processing INPUT-FILE.

The empty string (a string of no characters) has a special meaning as
the value of `RS': it means that records are separated only by blank
lines. *Note Multiple::, for more details.

The `awk' utility keeps track of the number of records that have been
read so far from the current input file. This value is stored in a
special variable called `FNR'. It is reset to zero when a new file
is started. Another variable, `NR', is the total number of input
records read so far from all files. It starts at zero but is never
automatically reset to zero.

If you change the value of `RS' in the middle of an `awk' run, the
new value is used to delimit subsequent records, but the record
currently being processed, and any records already read, are not
affected.
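To see `FNR', `NR', and a command line assignment to `RS' working together, here is a minimal sketch. The file names `a.txt' and `b.txt', their contents, and the scratch directory made with `mktemp -d' are invented for this illustration. The files are written without a final newline, so that each slash-separated piece is exactly one record.

```shell
#!/bin/sh
# Sketch only: a.txt and b.txt are hypothetical sample files.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

printf 'x/y/z' > a.txt   # three records when RS is "/"
printf 'p/q' > b.txt     # two records when RS is "/"

# RS="/" is assigned when awk reaches it in the argument list, which is
# before a.txt is opened, so it applies to both files.
# FNR restarts at 1 for b.txt; NR keeps counting across files.
awk '{ print FILENAME, FNR, NR, $0 }' RS="/" a.txt b.txt
```

Running it should print:

     a.txt 1 1 x
     a.txt 2 2 y
     a.txt 3 3 z
     b.txt 1 4 p
     b.txt 2 5 q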