Diffstat (limited to 'gawk-info-1')
-rw-r--r-- | gawk-info-1 | 1231 |
1 files changed, 0 insertions, 1231 deletions
diff --git a/gawk-info-1 b/gawk-info-1
deleted file mode 100644
index b40278a4..00000000
--- a/gawk-info-1
+++ /dev/null
@@ -1,1231 +0,0 @@
-Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
-file gawk.texinfo.
-
-This file documents `awk', a program that you can use to select
-particular records in a file and perform operations upon them.
-
-Copyright (C) 1989 Free Software Foundation, Inc.
-
-Permission is granted to make and distribute verbatim copies of this
-manual provided the copyright notice and this permission notice are
-preserved on all copies.
-
-Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided that
-the entire resulting derived work is distributed under the terms of a
-permission notice identical to this one.
-
-Permission is granted to copy and distribute translations of this
-manual into another language, under the above conditions for modified
-versions, except that this permission notice may be stated in a
-translation approved by the Foundation.
-
-
-
-File: gawk-info, Node: Top, Next: Preface, Prev: (dir), Up: (dir)
-
-This file documents `awk', a program that you can use to select
-particular records in a file and perform operations upon them; it
-contains the following chapters:
-
-* Menu:
-
-* Preface::             What you can do with `awk'; brief history
-                        and acknowledgements.
-
-* License::             Your right to copy and distribute `gawk'.
-
-* This Manual::         Using this manual.
-
-                        Includes sample input files that you can use.
-
-* Getting Started::     A basic introduction to using `awk'.
-                        How to run an `awk' program. Command line syntax.
-
-* Reading Files::       How to read files and manipulate fields.
-
-* Printing::            How to print using `awk'. Describes the
-                        `print' and `printf' statements.
-                        Also describes redirection of output.
-
-* One-liners::          Short, sample `awk' programs.
-
-* Patterns::            The various types of patterns explained in detail.
- -* Actions:: The various types of actions are introduced here. - Describes expressions and the various operators in - detail. Also describes comparison expressions. - -* Statements:: The various control statements are described in - detail. - -* Arrays:: The description and use of arrays. Also includes - array--oriented control statements. - -* User-defined:: User--defined functions are described in detail. - -* Built-in:: The built--in functions are summarized here. - -* Special:: The special variables are summarized here. - -* Sample Program:: A sample `awk' program with a complete explanation. - -* Notes:: Something about the implementation of `gawk'. - -* Glossary:: An explanation of some unfamiliar terms. - -* Index:: - - - -File: gawk-info, Node: Preface, Next: License, Prev: Top, Up: Top - -Preface -******* - -If you are like many computer users, you frequently would like to -make changes in various text files wherever certain patterns appear, -or extract data from parts of certain lines while discarding the -rest. To write a program to do this in a language such as C or -Pascal is a time--consuming inconvenience that may take many lines of -code. The job may be easier with `awk'. - -The `awk' utility interprets a special--purpose programming language -that makes it possible to handle simple data--reformatting jobs -easily with just a few lines of code. - -The GNU implementation of `awk' is called `gawk'; it is fully upward -compatible with the System V Release 3.1 and later version of `awk'. -All properly written `awk' programs should work with `gawk'. So we -usually don't distinguish between `gawk' and other `awk' -implementations in this manual. - -This manual teaches you what `awk' does and how you can use `awk' -effectively. You should already be familiar with basic, -general--purpose, operating system commands such as `ls'. 
Using -`awk' you can: - - * manage small, personal databases, - - * generate reports, - - * validate data, - - * produce indexes, and perform other document preparation tasks, - - * even experiment with algorithms that can be adapted later to - other computer languages! - -* Menu: - -* History:: The history of gawk and awk. Acknowledgements. - - - -File: gawk-info, Node: History, Up: Preface - -History of `awk' and `gawk' -=========================== - -The name `awk' comes from the initials of its designers: Alfred V. -Aho, Peter J. Weinberger, and Brian W. Kernighan. The original -version of `awk' was written in 1977. In 1985 a new version made the -programming language more powerful, introducing user--defined -functions, multiple input streams, and computed regular expressions. - -The GNU implementation, `gawk', was written in 1986 by Paul Rubin and -Jay Fenlason, with advice from Richard Stallman. John Woods -contributed parts of the code as well. In 1988, David Trueman, with -help from Arnold Robbins, reworked `gawk' for compatibility with the -newer `awk'. - -Many people need to be thanked for their assistance in producing this -manual. Jay Fenlason contributed many ideas and sample programs. -Richard Mlynarik and Robert Chassell gave helpful comments on drafts -of this manual. The paper ``A Supplemental Document for `awk''' by -John W. Pierce of the Chemistry Department at UC San Diego, -pinpointed several issues relevant both to `awk' implementation and -to this manual, that would otherwise have escaped us. - -Finally, we would like to thank Brian Kernighan of Bell Labs for -invaluable assistance during the testing and debugging of `gawk', and -for help in clarifying several points about the language. - - - -File: gawk-info, Node: License, Next: This Manual, Prev: Preface, Up: Top - -GNU GENERAL PUBLIC LICENSE -************************** - - Version 1, February 1989 - - Copyright (C) 1989 Free Software Foundation, Inc. 
- 675 Mass Ave, Cambridge, MA 02139, USA - - Everyone is permitted to copy and distribute verbatim copies - of this license document, but changing it is not allowed. - - Preamble -========= - - The license agreements of most software companies try to keep users -at the mercy of those companies. By contrast, our General Public -License is intended to guarantee your freedom to share and change -free software--to make sure the software is free for all its users. -The General Public License applies to the Free Software Foundation's -software and to any other program whose authors commit to using it. -You can use it for your programs, too. - - When we speak of free software, we are referring to freedom, not -price. Specifically, the General Public License is designed to make -sure that you have the freedom to give away or sell copies of free -software, that you receive source code or can get it if you want it, -that you can change the software or use pieces of it in new free -programs; and that you know you can do these things. - - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. -These restrictions translate to certain responsibilities for you if -you distribute copies of the software, or if you modify it. - - For example, if you distribute copies of a such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must tell them their rights. - - We protect your rights with two steps: (1) copyright the software, -and (2) offer you this license which gives you legal permission to -copy, distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. 
If the software is modified by someone else and passed on, -we want its recipients to know that what they have is not the -original, so that any problems introduced by others will not reflect -on the original authors' reputations. - - The precise terms and conditions for copying, distribution and -modification follow. - - TERMS AND CONDITIONS - - 1. This License Agreement applies to any program or other work - which contains a notice placed by the copyright holder saying it - may be distributed under the terms of this General Public - License. The ``Program'', below, refers to any such program or - work, and a ``work based on the Program'' means either the - Program or any work containing the Program or a portion of it, - either verbatim or with modifications. Each licensee is - addressed as ``you''. - - 2. You may copy and distribute verbatim copies of the Program's - source code as you receive it, in any medium, provided that you - conspicuously and appropriately publish on each copy an - appropriate copyright notice and disclaimer of warranty; keep - intact all the notices that refer to this General Public License - and to the absence of any warranty; and give any other - recipients of the Program a copy of this General Public License - along with the Program. You may charge a fee for the physical - act of transferring a copy. - - 3. 
You may modify your copy or copies of the Program or any portion - of it, and copy and distribute such modifications under the - terms of Paragraph 1 above, provided that you also do the - following: - - * cause the modified files to carry prominent notices stating - that you changed the files and the date of any change; and - - * cause the whole of any work that you distribute or publish, - that in whole or in part contains the Program or any part - thereof, either with or without modifications, to be - licensed at no charge to all third parties under the terms - of this General Public License (except that you may choose - to grant warranty protection to some or all third parties, - at your option). - - * If the modified program normally reads commands - interactively when run, you must cause it, when started - running for such interactive use in the simplest and most - usual way, to print or display an announcement including an - appropriate copyright notice and a notice that there is no - warranty (or else, saying that you provide a warranty) and - that users may redistribute the program under these - conditions, and telling the user how to view a copy of this - General Public License. - - * You may charge a fee for the physical act of transferring a - copy, and you may at your option offer warranty protection - in exchange for a fee. - - Mere aggregation of another independent work with the Program - (or its derivative) on a volume of a storage or distribution - medium does not bring the other work under the scope of these - terms. - - 4. 
You may copy and distribute the Program (or a portion or - derivative of it, under Paragraph 2) in object code or - executable form under the terms of Paragraphs 1 and 2 above - provided that you also do one of the following: - - * accompany it with the complete corresponding - machine-readable source code, which must be distributed - under the terms of Paragraphs 1 and 2 above; or, - - * accompany it with a written offer, valid for at least three - years, to give any third party free (except for a nominal - charge for the cost of distribution) a complete - machine-readable copy of the corresponding source code, to - be distributed under the terms of Paragraphs 1 and 2 above; - or, - - * accompany it with the information you received as to where - the corresponding source code may be obtained. (This - alternative is allowed only for noncommercial distribution - and only if you received the program in object code or - executable form alone.) - - Source code for a work means the preferred form of the work for - making modifications to it. For an executable file, complete - source code means all the source code for all modules it - contains; but, as a special exception, it need not include - source code for modules which are standard libraries that - accompany the operating system on which the executable file - runs, or for standard header files or definitions files that - accompany that operating system. - - 5. You may not copy, modify, sublicense, distribute or transfer the - Program except as expressly provided under this General Public - License. Any attempt otherwise to copy, modify, sublicense, - distribute or transfer the Program is void, and will - automatically terminate your rights to use the Program under - this License. However, parties who have received copies, or - rights to use copies, from you under this General Public License - will not have their licenses terminated so long as such parties - remain in full compliance. - - 6. 
By copying, distributing or modifying the Program (or any work - based on the Program) you indicate your acceptance of this - license to do so, and all its terms and conditions. - - 7. Each time you redistribute the Program (or any work based on the - Program), the recipient automatically receives a license from - the original licensor to copy, distribute or modify the Program - subject to these terms and conditions. You may not impose any - further restrictions on the recipients' exercise of the rights - granted herein. - - 8. The Free Software Foundation may publish revised and/or new - versions of the General Public License from time to time. Such - new versions will be similar in spirit to the present version, - but may differ in detail to address new problems or concerns. - - Each version is given a distinguishing version number. If the - Program specifies a version number of the license which applies - to it and ``any later version'', you have the option of - following the terms and conditions either of that version or of - any later version published by the Free Software Foundation. If - the Program does not specify a version number of the license, - you may choose any version ever published by the Free Software - Foundation. - - 9. If you wish to incorporate parts of the Program into other free - programs whose distribution conditions are different, write to - the author to ask for permission. For software which is - copyrighted by the Free Software Foundation, write to the Free - Software Foundation; we sometimes make exceptions for this. Our - decision will be guided by the two goals of preserving the free - status of all derivatives of our free software and of promoting - the sharing and reuse of software generally. - - NO WARRANTY - - 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO - WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE - LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT - HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ``AS IS'' - WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, - INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF - MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE - ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS - WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE - COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. - - 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN - WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY - MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE - LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, - INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR - INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS - OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY - YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH - ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN - ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. - - END OF TERMS AND CONDITIONS - -Appendix: How to Apply These Terms to Your New Programs -======================================================= - - If you develop a new program, and you want it to be of the greatest -possible use to humanity, the best way to achieve this is to make it -free software which everyone can redistribute and change under these -terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -convey the exclusion of warranty; and each file should have at least -the ``copyright'' line and a pointer to where the full notice is found. - - ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES. 
- Copyright (C) 19YY NAME OF AUTHOR - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 1, or (at your option) - any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - - Also add information on how to contact you by electronic and paper -mail. - -If the program is interactive, make it output a short notice like -this when it starts in an interactive mode: - - Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR - Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. - This is free software, and you are welcome to redistribute it - under certain conditions; type `show c' for details. - - The hypothetical commands `show w' and `show c' should show the -appropriate parts of the General Public License. Of course, the -commands you use may be called something other than `show w' and -`show c'; they could even be mouse-clicks or menu items--whatever -suits your program. - -You should also get your employer (if you work as a programmer) or -your school, if any, to sign a ``copyright disclaimer'' for the -program, if necessary. Here a sample; alter the names: - - Yoyodyne, Inc., hereby disclaims all copyright interest in the - program `Gnomovision' (a program to direct compilers to make passes - at assemblers) written by James Hacker. - - SIGNATURE OF TY COON, 1 April 1989 - Ty Coon, President of Vice - -That's all there is to it! 
- - - -File: gawk-info, Node: This Manual, Next: Getting Started, Prev: License, Up: Top - -Using This Manual -***************** - -The term `gawk' refers to a program (a version of `awk') developed by -the Free Software Foundation, and to the language you use to tell it -what to do. When we need to be careful, we call the program ``the -`awk' utility'' and the language ``the `awk' language''. The purpose -of this manual is to explain the `awk' language and how to run the -`awk' utility. - -The term "`awk' program" refers to a program written by you in the -`awk' programming language. - -*Note Getting Started::, for the bare essentials you need to know to -start using `awk'. - -Useful ``one--liners'' are included to give you a feel for the `awk' -language (*note One-liners::.). - -A sizable sample `awk' program has been provided for you (*note -Sample Program::.). - -If you find terms that you aren't familiar with, try looking them up -in the glossary (*note Glossary::.). - -Most of the time complete `awk' programs are used as examples, but in -some of the more advanced sections, only the part of the `awk' -program that illustrates the concept being described is shown. - -* Menu: - -This chapter contains the following sections: - -* The Files:: Sample data files for use in the `awk' programs - illustrated in this manual. - - - -File: gawk-info, Node: The Files, Up: This Manual - -Input Files for the Examples -============================ - -This manual contains many sample programs. The data for many of -those programs comes from two files. The first file, called -`BBS-list', represents a list of computer bulletin board systems and -information about those systems. - -Each line of this file is one "record". Each record contains the -name of a computer bulletin board, its phone number, the board's baud -rate, and a code for the number of hours it is operational. An `A' -in the last column means the board operates 24 hours all week. 
A `B'
-in the last column means the board operates evening and weekend
-hours, only. A `C' means the board operates only on weekends.
-
-     aardvark     555-5553     1200/300          B
-     alpo-net     555-3412     2400/1200/300     A
-     barfly       555-7685     1200/300          A
-     bites        555-1675     2400/1200/300     A
-     camelot      555-0542     300               C
-     core         555-2912     1200/300          C
-     fooey        555-1234     2400/1200/300     B
-     foot         555-6699     1200/300          B
-     macfoo       555-6480     1200/300          A
-     sdace        555-3430     2400/1200/300     A
-     sabafoo      555-2127     1200/300          C
-
-The second data file, called `inventory-shipped', represents
-information about shipments during the year. Each line of this file
-is also one record. Each record contains the month of the year, the
-number of green crates shipped, the number of red boxes shipped, the
-number of orange bags shipped, and the number of blue packages
-shipped, respectively.
-
-     Jan  13  25  15  115
-     Feb  15  32  24  226
-     Mar  15  24  34  228
-     Apr  31  52  63  420
-     May  16  34  29  208
-     Jun  31  42  75  492
-     Jul  24  34  67  436
-     Aug  15  34  47  316
-     Sep  13  55  37  277
-     Oct  29  54  68  525
-     Nov  20  87  82  577
-     Dec  17  35  61  401
-
-     Jan  21  36  64  620
-     Feb  26  58  80  652
-     Mar  24  75  70  495
-     Apr  21  70  74  514
-
-If you are reading this in GNU Emacs using Info, you can copy the
-regions of text showing these sample files into your own test files.
-This way you can try out the examples shown in the remainder of this
-document. You do this by using the command `M-x write-region' to
-copy text from the Info file into a file for use with `awk' (see your
-``GNU Emacs Manual'' for more information). Using this information,
-create your own `BBS-list' and `inventory-shipped' files, and
-practice what you learn in this manual.
-
-
-
-File: gawk-info, Node: Getting Started, Next: Reading Files, Prev: This Manual, Up: Top
-
-Getting Started With `awk'
-**************************
-
-The basic function of `awk' is to search files for lines (or other
-units of text) that contain certain patterns.
When a line matching -any of those patterns is found, `awk' performs specified actions on -that line. Then `awk' keeps processing input lines until the end of -the file is reached. - -An `awk' "program" or "script" consists of a series of "rules". -(They may also contain "function definitions", but that is an -advanced feature, so let's ignore it for now. *Note User-defined::.) - -A rule contains a "pattern", an "action", or both. Actions are -enclosed in curly braces to distinguish them from patterns. -Therefore, an `awk' program is a sequence of rules in the form: - - PATTERN { ACTION } - PATTERN { ACTION } - ... - - * Menu: - -* Very Simple:: A very simple example. -* Two Rules:: A less simple one--line example with two rules. -* More Complex:: A more complex example. -* Running gawk:: How to run gawk programs; includes command line syntax. -* Comments:: Adding documentation to gawk programs. -* Statements/Lines:: Subdividing or combining statements into lines. - -* When:: When to use gawk and when to use other things. - - - -File: gawk-info, Node: Very Simple, Next: Two Rules, Up: Getting Started - -A Very Simple Example -===================== - -The following command runs a simple `awk' program that searches the -input file `BBS-list' for the string of characters: `foo'. (A string -of characters is usually called, quite simply, a "string".) - - awk '/foo/ { print $0 }' BBS-list - -When lines containing `foo' are found, they are printed, because -`print $0' means print the current line. (Just `print' by itself -also means the same thing, so we could have written that instead.) - -You will notice that slashes, `/', surround the string `foo' in the -actual `awk' program. The slashes indicate that `foo' is a pattern -to search for. This type of pattern is called a "regular -expression", and is covered in more detail later (*note Regexp::.). -There are single quotes around the `awk' program so that the shell -won't interpret any of it as special shell characters. 
- -Here is what this program prints: - - fooey 555-1234 2400/1200/300 B - foot 555-6699 1200/300 B - macfoo 555-6480 1200/300 A - sabafoo 555-2127 1200/300 C - -In an `awk' rule, either the pattern or the action can be omitted, -but not both. - -If the pattern is omitted, then the action is performed for *every* -input line. - -If the action is omitted, the default action is to print all lines -that match the pattern. We could leave out the action (the print -statement and the curly braces) in the above example, and the result -would be the same: all lines matching the pattern `foo' would be -printed. (By comparison, omitting the print statement but retaining -the curly braces makes an empty action that does nothing; then no -lines would be printed.) - - - -File: gawk-info, Node: Two Rules, Next: More Complex, Prev: Very Simple, Up: Getting Started - -An Example with Two Rules -========================= - -The `awk' utility reads the input files one line at a time. For each -line, `awk' tries the patterns of all the rules. If several patterns -match then several actions are run, in the order in which they appear -in the `awk' program. If no patterns match, then no actions are run. - -After processing all the rules (perhaps none) that match the line, -`awk' reads the next line (however, *note Next::.). This continues -until the end of the file is reached. - -For example, the `awk' program: - - /12/ { print $0 } - /21/ { print $0 } - -contains two rules. The first rule has the string `12' as the -pattern and `print $0' as the action. The second rule has the string -`21' as the pattern and also has `print $0' as the action. Each -rule's action is enclosed in its own pair of braces. - -This `awk' program prints every line that contains the string `12' -*or* the string `21'. If a line contains both strings, it is printed -twice, once by each rule. 
- -If we run this program on our two sample data files, `BBS-list' and -`inventory-shipped', as shown here: - - awk '/12/ { print $0 } - /21/ { print $0 }' BBS-list inventory-shipped - -we get the following output: - - aardvark 555-5553 1200/300 B - alpo-net 555-3412 2400/1200/300 A - barfly 555-7685 1200/300 A - bites 555-1675 2400/1200/300 A - core 555-2912 1200/300 C - fooey 555-1234 2400/1200/300 B - foot 555-6699 1200/300 B - macfoo 555-6480 1200/300 A - sdace 555-3430 2400/1200/300 A - sabafoo 555-2127 1200/300 C - sabafoo 555-2127 1200/300 C - Jan 21 36 64 620 - Apr 21 70 74 514 - -Note how the line in `BBS-list' beginning with `sabafoo' was printed -twice, once for each rule. - - - -File: gawk-info, Node: More Complex, Next: Running gawk, Prev: Two Rules, Up: Getting Started - -A More Complex Example -====================== - -Here is an example to give you an idea of what typical `awk' programs -do. This example shows how `awk' can be used to summarize, select, -and rearrange the output of another utility. It uses features that -haven't been covered yet, so don't worry if you don't understand all -the details. - - ls -l | awk '$5 == "Nov" { sum += $4 } - END { print sum }' - -This command prints the total number of bytes in all the files in the -current directory that were last modified in November (of any year). -(In the C shell you would need to type a semicolon and then a -backslash at the end of the first line; in the Bourne shell you can -type the example as shown.) - -The `ls -l' part of this example is a command that gives you a full -listing of all the files in a directory, including file size and date. 
-Its output looks like this:
-
-     -rw-r--r--  1 close    1933 Nov  7 13:05 Makefile
-     -rw-r--r--  1 close   10809 Nov  7 13:03 gawk.h
-     -rw-r--r--  1 close     983 Apr 13 12:14 gawk.tab.h
-     -rw-r--r--  1 close   31869 Jun 15 12:20 gawk.y
-     -rw-r--r--  1 close   22414 Nov  7 13:03 gawk1.c
-     -rw-r--r--  1 close   37455 Nov  7 13:03 gawk2.c
-     -rw-r--r--  1 close   27511 Dec  9 13:07 gawk3.c
-     -rw-r--r--  1 close    7989 Nov  7 13:03 gawk4.c
-
-The first field contains read--write permissions, the second field
-contains the number of links to the file, and the third field
-identifies the owner of the file. The fourth field contains the size
-of the file in bytes. The fifth, sixth, and seventh fields contain
-the month, day, and time, respectively, that the file was last
-modified. Finally, the eighth field contains the name of the file.
-
-The `$5 == "Nov"' in our `awk' program is an expression that tests
-whether the fifth field of the output from `ls -l' matches the string
-`Nov'. Each time a line has the string `Nov' in its fifth field, the
-action `{ sum += $4 }' is performed. This adds the fourth field (the
-file size) to the variable `sum'. As a result, when `awk' has
-finished reading all the input lines, `sum' will be the sum of the
-sizes of files whose lines matched the pattern.
-
-After the last line of output from `ls' has been processed, the `END'
-pattern is executed, and the value of `sum' is printed. In this
-example, the value of `sum' would be 80600.
-
-These more advanced `awk' techniques are covered in later sections
-(*note Actions::.). Before you can move on to more advanced `awk'
-programming, you have to know how `awk' interprets your input and
-displays your output. By manipulating "fields" and using special
-"print" statements, you can produce some very useful and spectacular
-looking reports.
- - - -File: gawk-info, Node: Running gawk, Next: Comments, Prev: More Complex, Up: Getting Started - -How to Run `awk' Programs -========================= - -There are several ways to run an `awk' program. If the program is -short, it is easiest to include it in the command that runs `awk', -like this: - - awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ... - - where PROGRAM consists of a series of PATTERNS and ACTIONS, as -described earlier. - -When the program is long, you would probably prefer to put it in a -file and run it with a command like this: - - awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ... - - * Menu: - -* One-shot:: Running a short throw--away `awk' program. -* Read Terminal:: Using no input files (input from terminal instead). -* Long:: Putting permanent `awk' programs in files. -* Executable Scripts:: Making self--contained `awk' programs. -* Command Line:: How the `awk' command line is laid out. - - - -File: gawk-info, Node: One-shot, Next: Read Terminal, Up: Running gawk - -One--shot Throw--away `awk' Programs ------------------------------------- - -Once you are familiar with `awk', you will often type simple programs -at the moment you want to use them. Then you can write the program -as the first argument of the `awk' command, like this: - - awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ... - - where PROGRAM consists of a series of PATTERNS and ACTIONS, as -described earlier. - -This command format tells the shell to start `awk' and use the -PROGRAM to process records in the input file(s). There are single -quotes around the PROGRAM so that the shell doesn't interpret any -`awk' characters as special shell characters. They cause the shell -to treat all of PROGRAM as a single argument for `awk'. They also -allow PROGRAM to be more than one line long. - -This format is also useful for running short or medium--sized `awk' -programs from shell scripts, because it avoids the need for a -separate file for the `awk' program. 
A self--contained shell script -is more reliable since there are no other files to misplace. - - - -File: gawk-info, Node: Read Terminal, Next: Long, Prev: One-shot, Up: Running gawk - -Running `awk' without Input Files ---------------------------------- - -You can also use `awk' without any input files. If you type the -command line: - - awk 'PROGRAM' - -then `awk' applies the PROGRAM to the "standard input", which usually -means whatever you type on the terminal. This continues until you -indicate end--of--file by typing `Control-d'. - -For example, if you type: - - awk '/th/' - -whatever you type next will be taken as data for that `awk' program. -If you go on to type the following data, - - Kathy - Ben - Tom - Beth - Seth - Karen - Thomas - `Control-d' - -then `awk' will print - - Kathy - Beth - Seth - -as matching the pattern `th'. Notice that it did not recognize -`Thomas' as matching the pattern. The `awk' language is "case -sensitive", and matches patterns *exactly*. - - - -File: gawk-info, Node: Long, Next: Executable Scripts, Prev: Read Terminal, Up: Running gawk - -Running Long Programs ---------------------- - -Sometimes your `awk' programs can be very long. In this case it is -more convenient to put the program into a separate file. To tell -`awk' to use that file for its program, you type: - - awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ... - - The `-f' tells the `awk' utility to get the `awk' program from the -file SOURCE-FILE. Any file name can be used for SOURCE-FILE. For -example, you could put the program: - - /th/ - -into the file `th-prog'. Then the command: - - awk -f th-prog - -does the same thing as this one: - - awk '/th/' - -which was explained earlier (*note Read Terminal::.). Note that you -don't usually need single quotes around the file name that you -specify with `-f', because most file names don't contain any of the -shell's special characters. 
- -If you want to identify your `awk' program files clearly as such, you -can add the extension `.awk' to the filename. This doesn't affect -the execution of the `awk' program, but it does make ``housekeeping'' -easier. - - - -File: gawk-info, Node: Executable Scripts, Next: Command Line, Prev: Long, Up: Running gawk - -Executable `awk' Programs -------------------------- - -(The following section assumes that you are already somewhat familiar -with `awk'.) - -Once you have learned `awk', you may want to write self--contained -`awk' scripts, using the `#!' script mechanism. You can do this on -BSD Unix systems and GNU. - -For example, you could create a text file named `hello', containing -the following (where `BEGIN' is a feature we have not yet discussed): - - #! /bin/awk -f - - # a sample awk program - - BEGIN { print "hello, world" } - -After making this file executable (with the `chmod' command), you can -simply type: - - hello - -at the shell, and the system will arrange to run `awk' as if you had -typed: - - awk -f hello - -Self--contained `awk' scripts are particularly useful for putting -`awk' programs into production on your system, without your users -having to know that they are actually using an `awk' program. - -If your system does not support the `#!' mechanism, you can get a -similar effect using a regular shell script. It would look something -like this: - - : a sample awk program - - awk 'PROGRAM' "$@" - -Using this technique, it is *vital* to enclose the PROGRAM in single -quotes to protect it from interpretation by the shell. If you omit -the quotes, only a shell wizard can predict the result. - -The `"$@"' causes the shell to forward all the command line arguments -to the `awk' program, without interpretation. - - - -File: gawk-info, Node: Command Line, Prev: Executable Scripts, Up: Running gawk - -Details of the `awk' Command Line ---------------------------------- - -(The following section assumes that you are already familiar with -`awk'.) 
- -There are two ways to run `awk'. Here are templates for both of -them; items enclosed in `[' and `]' in these templates are optional. - - awk [ -FFS ] [ -- ] 'PROGRAM' FILE ... - awk [ -FFS ] -f SOURCE-FILE [ -f SOURCE-FILE ... ] [ -- ] FILE ... - - Options begin with a minus sign, and consist of a single character. -The options and their meanings are as follows: - -`-FFS' - This sets the `FS' variable to FS (*note Special::.). As a - special case, if FS is `t', then `FS' will be set to the tab - character (`"\t"'). - -`-f SOURCE-FILE' - Indicates that the `awk' program is to be found in SOURCE-FILE - instead of in the first non--option argument. - -`--' - This signals the end of the command line options. If you wish - to specify an input file named `-f', you can precede it with the - `--' argument to prevent the `-f' from being interpreted as an - option. This handling of `--' follows the POSIX argument - parsing conventions. - -Any other options will be flagged as invalid with a warning message, -but are otherwise ignored. - -If the `-f' option is *not* used, then the first non--option command -line argument is expected to be the program text. - -The `-f' option may be used more than once on the command line. -`awk' will read its program source from all of the named files, as if -they had been concatenated together into one big file. This is -useful for creating libraries of `awk' functions. Useful functions -can be written once, and then retrieved from a standard place, -instead of having to be included into each individual program. You -can still type in a program at the terminal and use library -functions, by specifying `/dev/tty' as one of the arguments to a -`-f'. Type your program, and end it with the keyboard end--of--file -character `Control-d'. - -Any additional arguments on the command line are made available to -your `awk' program in the `ARGV' array (*note Special::.). 
These -arguments are normally treated as input files to be processed in the -order specified. However, an argument that has the form VAR`='VALUE, -means to assign the value VALUE to the variable VAR--it does not -specify a file at all. - -Command line options and the program text (if present) are omitted -from the `ARGV' array. All other arguments, including variable -assignments, are included (*note Special::.). - -The distinction between file name arguments and variable--assignment -arguments is made when `awk' is about to open the next input file. -At that point in execution, it checks the ``file name'' to see -whether it is really a variable assignment; if so, instead of trying -to read a file it will, *at that point in the execution*, assign the -variable. - -Therefore, the variables actually receive the specified values after -all previously specified files have been read. In particular, the -values of variables assigned in this fashion are *not* available -inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run -before `awk' begins scanning the argument list. - -The variable assignment feature is most useful for assigning to -variables such as `RS', `OFS', and `ORS', which control input and -output formats, before listing the data files. It is also useful for -controlling state if multiple passes are needed over a data file. -For example: - - awk 'pass == 1 { PASS 1 STUFF } - pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile - - - -File: gawk-info, Node: Comments, Next: Statements/Lines, Prev: Running gawk, Up: Getting Started - -Comments in `awk' Programs -========================== - -When you write a complicated `awk' program, you can put "comments" in -the program file to help you remember what the program does, and how -it works. - -A comment starts with the sharp sign character, `#', and -continues to the end of the line. The `awk' language ignores the -rest of a line following a sharp sign. 
For example, we could have -put the following into `th-prog': - - # This program finds records containing the pattern `th'. This is how - # you continue comments on additional lines. - /th/ - -You can put comment lines into keyboard--composed throw--away `awk' -programs also, but this usually isn't very useful; the purpose of a -comment is to help yourself or another person understand the program -at another time. - - - -File: gawk-info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started - -`awk' Statements versus Lines -============================= - -Most often, each line in an `awk' program is a separate statement or -separate rule, like this: - - awk '/12/ { print $0 } - /21/ { print $0 }' BBS-list inventory-shipped - -But sometimes statements can be more than one line, and lines can -contain several statements. - -You can split a statement into multiple lines by inserting a newline -after any of the following: - - , { ? : || && - -Lines ending in `do' or `else' automatically have their statements -continued on the following line(s). A newline at any other point -ends the statement. - -If you would like to split a single statement into two lines at a -point where a newline would terminate it, you can "continue" it by -ending the first line with a backslash character, `\'. This is -allowed absolutely anywhere in the statement, even in the middle of a -string or regular expression. For example: - - awk '/This program is too long, so continue it\ - on the next line/ { print $1 }' - -We have generally not used backslash continuation in the sample -programs in this manual. Since there is no limit on the length of a -line, it is never strictly necessary; it just makes programs -prettier. We have preferred to make them even more pretty by keeping -the statements short. Backslash continuation is most useful when -your `awk' program is in a separate source file, instead of typed in -on the command line. 
- -*Warning: this does not work if you are using the C shell.* -Continuation with backslash works for `awk' programs in files, and -also for one--shot programs *provided* you are using the Bourne -shell, the Korn shell, or the Bourne--again shell. But the C shell -used on Berkeley Unix behaves differently! There, you must use two -backslashes in a row, followed by a newline. - -When `awk' statements within one rule are short, you might want to -put more than one of them on a line. You do this by separating the -statements with semicolons, `;'. This also applies to the rules -themselves. Thus, the above example program could have been written: - - /12/ { print $0 } ; /21/ { print $0 } - -*Note:* It is a new requirement that rules on the same line require -semicolons as a separator in the `awk' language; it was done for -consistency with the statements in the action part of rules. - - - -File: gawk-info, Node: When, Prev: Statements/Lines, Up: Getting Started - -When to Use `awk' -================= - -What use is all of this to me, you might ask? Using additional -operating system utilities, more advanced patterns, field separators, -arithmetic statements, and other selection criteria, you can produce -much more complex output. The `awk' language is very useful for -producing reports from large amounts of raw data, like summarizing -information from the output of standard operating system programs -such as `ls'. (*Note A More Complex Example: More Complex.) - -Programs written with `awk' are usually much smaller than they would -be in other languages. This makes `awk' programs easy to compose and -use. Often `awk' programs can be quickly composed at your terminal, -used once, and thrown away. Since `awk' programs are interpreted, -you can avoid the usually lengthy edit--compile--test--debug cycle of -software development. - -Complex programs have been written in `awk', including a complete -retargetable assembler for 8--bit microprocessors (*note Glossary::. 
-for more information) and a microcode assembler for a special purpose -Prolog computer. However, `awk''s capabilities are strained by tasks -of such complexity. - -If you find yourself writing `awk' scripts of more than, say, a few -hundred lines, you might consider using a different programming -language. Emacs Lisp is a good choice if you need sophisticated -string or pattern matching capabilities. The shell is also good at -string and pattern matching; in addition it allows powerful use of -the standard utilities. More conventional languages like C, C++, or -Lisp offer better facilities for system programming and for managing -the complexity of large programs. Programs in these languages may -require more lines of source code than the equivalent `awk' programs, -but they will be easier to maintain and usually run more efficiently. - - - -File: gawk-info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top - -Reading Files (Input) -********************* - -In the typical `awk' program, all input is read either from the -standard input (usually the keyboard) or from files whose names you -specify on the `awk' command line. If you specify input files, `awk' -reads data from the first one until it reaches the end; then it reads -the second file until it reaches the end, and so on. The name of the -current input file can be found in the special variable `FILENAME' -(*note Special::.). - -The input is split automatically into "records", and processed by the -rules one record at a time. (Records are the units of text mentioned -in the introduction; by default, a record is a line of text.) Each -record read is split automatically into "fields", to make it more -convenient for a rule to work on parts of the record under -consideration. - -On rare occasions you will need to use the `getline' command, which -can do explicit input from any number of files. - -* Menu: - -* Records:: Controlling how data is split into records. 
-* Fields:: An introduction to fields. -* Field Separators:: The field separator and how to change it. -* Multiple:: Reading multi--line records. - -* Assignment Options:: Setting variables on the command line and a summary - of command line syntax. This is an advanced method - of input. - -* Getline:: Reading files under explicit program control - using the `getline' function. -* Close Input:: Closing an input file (so you can read from - the beginning once more). - - - -File: gawk-info, Node: Records, Next: Fields, Up: Reading Files - -How Input is Split into Records -=============================== - -The `awk' language divides its input into records and fields. -Records are separated from each other by the "record separator". By -default, the record separator is the "newline" character. Therefore, -normally, a record is a line of text. - -Sometimes you may want to use a different character to separate your -records. You can use different characters by changing the special -variable `RS'. - -The value of `RS' is a string that says how to separate records; the -default value is `"\n"', the string of just a newline character. -This is why lines of text are the default record. Although `RS' can -have any string as its value, only the first character of the string -will be used as the record separator. The other characters are -ignored. `RS' is exceptional in this regard; `awk' uses the full -value of all its other special variables. - -The value of `RS' is changed by "assigning" it a new value (*note -Assignment Ops::.). One way to do this is at the beginning of your -`awk' program, before any input has been processed, using the special -`BEGIN' pattern (*note BEGIN/END::.). This way, `RS' is changed to -its new value before any input is read. The new value of `RS' is -enclosed in quotation marks. For example: - - awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list - -changes the value of `RS' to `/', the slash character, before reading -any input. 
Records are now separated by a slash. The second rule in -the `awk' program (the action with no pattern) will proceed to print -each record. Since each `print' statement adds a newline at the end -of its output, the effect of this `awk' program is to copy the input -with each slash changed to a newline. - -Another way to change the record separator is on the command line, -using the variable--assignment feature (*note Command Line::.). - - awk '...' RS="/" SOURCE-FILE - -`RS' will be set to `/' before processing SOURCE-FILE. - -The empty string (a string of no characters) has a special meaning as -the value of `RS': it means that records are separated only by blank -lines. *Note Multiple::, for more details. - -The `awk' utility keeps track of the number of records that have been -read so far from the current input file. This value is stored in a -special variable called `FNR'. It is reset to zero when a new file -is started. Another variable, `NR', is the total number of input -records read so far from all files. It starts at zero but is never -automatically reset to zero. - -If you change the value of `RS' in the middle of an `awk' run, the -new value is used to delimit subsequent records, but the record -currently being processed (and records already finished) are not -affected. - - |
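The behavior of `RS', `FNR', and `NR' described in this node can be tried out with a short shell sketch. The file names (`slashes', `one', `two'), their contents, and the variable names are all invented for illustration, and `mktemp -d` is assumed to be available. The slash-separated file is written without a trailing newline so that the last record is exactly `c'.

```shell
# Scratch directory with small sample files (hypothetical names and data).
dir=$(mktemp -d)
printf 'a/b/c' > "$dir/slashes"     # three records when RS is "/"
printf 'x\ny\n' > "$dir/one"        # two ordinary one-line records
printf 'z\nw\n' > "$dir/two"        # two more

# Change RS in a BEGIN rule, before any input is read.
split_begin=$(awk 'BEGIN { RS = "/" } ; { print $0 }' "$dir/slashes")

# Change RS with a command line assignment placed before the file name.
split_assign=$(awk '{ print $0 }' RS="/" "$dir/slashes")

# FNR restarts for each input file; NR keeps counting across files.
counts=$(awk '{ print FNR, NR }' "$dir/one" "$dir/two")

printf '%s\n' "$split_begin" "$counts"

rm -r "$dir"
```

Both ways of setting `RS' should print `a', `b', and `c' on separate lines. The last command should print the pairs `1 1', `2 2', `1 3', `2 4', showing `FNR' starting over at the second file while `NR' continues.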