Linux in a Nutshell

Chapter 13. The gawk Scripting Language

Contents:

Command-Line Syntax
Patterns and Procedures
gawk System Variables
PROCINFO Array
Operators
Variable and Array Assignments
Group Listing of gawk Commands
Alphabetical Summary of Commands

gawk is the GNU version of awk, a powerful pattern-matching program for processing text files that may be composed of fixed- or variable-length records separated by some delineator (by default, a newline character). gawk may be used from the command line or in gawk scripts. You should normally be able to invoke this utility using either awk or gawk on the shell command line.

With gawk, you can:

For more information on gawk, see sed & awk (O'Reilly) or Effective gawk Programming (O'Reilly).

13.1. Command-Line Syntax

gawk's syntax has two forms:

gawk [options] 'script' var=value file(s)
gawk [options] -f scriptfile var=value file(s)

You can specify a script directly on the command line, or you can store a script in a scriptfile and specify it with -f. Multiple -f options are allowed; gawk concatenates the files and treats them as a single program. This feature is useful for including libraries.
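As a minimal sketch of the library use case, the following creates two script files and passes both with -f. The file names lib.awk and main.awk and the function name double are hypothetical, chosen only for illustration. The AWK variable is also an assumption of this sketch: it picks gawk when it is installed and falls back to awk otherwise.

```shell
# Use gawk when installed, otherwise whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

# Scratch directory for the two hypothetical script files.
dir=$(mktemp -d)

# lib.awk: a small "library" that only defines a function.
cat > "$dir/lib.awk" <<'EOF'
function double(n) { return 2 * n }
EOF

# main.awk: the main script, which calls the library function.
cat > "$dir/main.awk" <<'EOF'
{ print $1, double($1) }
EOF

# Both files are read as one program, so main.awk can call double().
result=$(printf '3\n10\n' | "$AWK" -f "$dir/lib.awk" -f "$dir/main.awk")
printf '%s\n' "$result"

rm -r "$dir"
```

Each input line is printed with its doubled value next to it.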

gawk operates on one or more input files. If none are specified (or if - is specified), gawk reads from standard input.
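The two ways of reading standard input can be sketched as follows. The temporary file and its contents are made up for the example, and the AWK variable (gawk if installed, else awk) is an assumption of this sketch.

```shell
# Use gawk when installed, otherwise whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

# A small input file with two lines.
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

# No file operand: the three lines come from standard input.
from_stdin=$(printf 'one\ntwo\nthree\n' | "$AWK" 'END { print NR }')

# "-" stands for standard input, so it can be mixed with named files.
mixed=$(printf 'gamma\n' | "$AWK" '{ print NR ": " $0 }' "$tmp" -)

printf '%s\n%s\n' "$from_stdin" "$mixed"
rm "$tmp"
```

In the second command the named file is read first and standard input second, because that is the order of the operands.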

Variables can be assigned a value on the command line. The value can be a literal, a shell variable ($name), or a command substitution (`cmd`), but it becomes available only once gawk starts reading the input that follows the assignment (i.e., after the BEGIN procedure has run).
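A short sketch of such assignments, using a shell variable and a command substitution as the values. The names limit, max, and tag are hypothetical, and the AWK variable (gawk if installed, else awk) is an assumption of this sketch. The modern $(cmd) form is used in place of backquotes; the shell treats both the same way here.

```shell
# Use gawk when installed, otherwise whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

limit=2   # hypothetical shell variable

# max and tag are assigned before standard input ("-") is read,
# so both are set by the time the first record is processed.
out=$(printf 'a\nb\nc\n' | "$AWK" '
    NR <= max { print tag ": " $0 }
' max="$limit" tag="$(printf 'row')" -)

printf '%s\n' "$out"
```

Only the first two records are printed, each prefixed with the value of tag.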

For example, to print the first three (colon-separated) fields of the password file, use -F to set the field separator to a colon:

gawk -F : '{print $1; print $2; print $3}' /etc/passwd

Numerous examples are shown later in Section 13.2.

13.1.1. Options

All options exist in both traditional POSIX (one-letter) format and GNU-style (long) format. Some recognized options are:

--
Treat all subsequent text as commands or filenames, not options.

-f scriptfile, --file=scriptfile
Read gawk commands from scriptfile instead of command line.

-v var=value, --assign=var=value
Assign a value to variable var. This allows assignment before the script begins execution.
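The difference between -v and a var=value operand shows up in a BEGIN procedure. This is a minimal sketch; the variable name greeting is hypothetical, and the AWK variable (gawk if installed, else awk) is an assumption of this sketch.

```shell
# Use gawk when installed, otherwise whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

# Assigned as an operand: not yet set when BEGIN runs, so this prints "[]".
late=$("$AWK" 'BEGIN { print "[" greeting "]" }' greeting=hello)

# Assigned with -v: already set when BEGIN runs, so this prints "[hello]".
early=$("$AWK" -v greeting=hello 'BEGIN { print "[" greeting "]" }')

printf '%s\n%s\n' "$late" "$early"
```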

-F c, --field-separator=c
Set the field separator to c, which may be a single character or a regular expression. This is the same as setting the variable FS. Each input line, or record, is divided into fields by whitespace (blanks or tabs) or by some other user-definable field separator. Fields are referred to by the variables $1, $2,..., $n. $0 refers to the entire record.
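As an illustration of a regular expression used as the field separator, the sketch below splits on a comma or semicolon followed by optional spaces. The sample input is made up, and the AWK variable (gawk if installed, else awk) is an assumption of this sketch.

```shell
# Use gawk when installed, otherwise whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

# Split on "," or ";" plus any spaces that follow.
# NF is the number of fields, $2 the second field, $0 the whole record.
fields=$(printf 'red, green;blue\n' |
    "$AWK" -F '[,;] *' '{ print NF; print $2; print $0 }')

printf '%s\n' "$fields"
```

The record itself ($0) is left unchanged; only the way it is divided into fields depends on the separator.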

-W option
All -W options are specific to gawk, as opposed to awk. An alternate syntax is --option (e.g., --compat). option may be one of:

compat, traditional
Behave exactly like traditional (non-GNU) awk.

copyleft, copyright
Print copyleft notice and exit.

dump-variables[=file]
Print the name, type, and value of all global variables to the specified file, or to the file awkvars.out in the current directory if no file is specified.

help, usage
Print syntax and list of options, then exit.

lint[=fatal]
Warn about commands that might not port to other versions of awk or that gawk considers problematic. When fatal is specified, warnings are treated as fatal errors.

lint-old
Like lint, but compares to an older version of awk used on Version 7 Unix.

non-decimal-data
When reading data, interpret numbers beginning with 0 as octal and those beginning with 0x as hexadecimal. (To print nondecimal numbers, use the printf command, as print prints only string representations of nondecimal numbers.)

posix
Expect exact compatibility with POSIX; disable all gawk extensions as if traditional had been specified. Do not recognize \x escape sequences, the ** and **= operators, the keyword func, or single-tab field separators. Disallow newlines after ? or :, and disable the fflush function.

profile[=file]
Write a pretty-printed version of the script being executed to the specified file, or to the file awkprof.out in the current directory if no file is specified. When gawk is invoked as pgawk and passed this version of the program with the -f option, it adds profile data to the file, inserting execution counts to the left of each statement in the program.

re-interval
Allow use of {n,m} intervals in regular expressions.

source=script
Treat script as gawk commands. Like the 'script' argument, but lets you mix commands from files (using -f options) with commands on the gawk command line.

version
Print version information and exit.

Copyright © 2003 O'Reilly & Associates. All rights reserved.