This article also covers nawk and gawk (33.12). With the exception of array subscripts, values in [brackets] are optional; don't type the [ or ].
awk can be invoked in two ways:
    awk [options] 'script' [var=value] [file(s)]
    awk [options] -f scriptfile [var=value] [file(s)]
You can specify a script directly on the command line, or you can store a script in a scriptfile and specify it with -f. In most versions, the -f option can be used multiple times. The variable var can be assigned a value on the command line. The value can be a literal, a shell variable ($name), or a command substitution (`cmd`), but the value is available only after a line of input is read (i.e., after the BEGIN statement). awk operates on one or more file(s). If none are specified (or if - is specified), awk reads from the standard input (13.1).
The other recognized options are:
-Fc
Set the field separator to character c. This is the same as setting the system variable FS. nawk allows c to be a regular expression (26.4). Each record (by default, one input line) is divided into fields by white space (blanks or tabs) or by some other user-definable field separator. Fields are referred to by the variables $1, $2,...$n. $0 refers to the entire record. For example, to print the first three (colon-separated) fields on separate lines:
    % awk -F: '{print $1; print $2; print $3}' /etc/passwd
-v var=value
Assign a value to variable var. This allows assignment before the script begins execution. (Available in nawk only.)
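The difference between -v and a var=value operand is easiest to see by printing the variable from both BEGIN and a main rule. The following is a minimal sketch, assuming an awk that supports -v (nawk or later); the variable name who and the sample input are made up for illustration.

```shell
# -v assigns before BEGIN runs.
with_v=$(echo "input" |
  awk -v who=early 'BEGIN { print "BEGIN sees [" who "]" }
                          { print "main sees [" who "]" }')

# A var=value operand is assigned only when awk reaches it in the
# argument list, which happens after BEGIN has already executed.
# The trailing - tells awk to read the standard input.
with_operand=$(echo "input" |
  awk 'BEGIN { print "BEGIN sees [" who "]" }
             { print "main sees [" who "]" }' who=late -)

echo "$with_v"
echo "$with_operand"
```

With -v the variable is visible in both places; with the operand form BEGIN sees an empty string and only the main rule sees the value.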
awk scripts consist of patterns and procedures:
    pattern { procedure }
Both are optional. If pattern is missing, { procedure } is applied to all records. If { procedure } is missing, the matched record is written to the standard output.
pattern can be any of the following:
    /regular expression/
    relational expression
    pattern-matching expression
    BEGIN
    END
Expressions can be composed of quoted strings, numbers, operators, functions, defined variables, or any of the predefined variables described later under the section "awk System Variables."
Regular expressions use the extended set of metacharacters as described in article 26.4. In addition, ^ and $ can be used to refer to the beginning and end of a field, respectively, rather than the beginning and end of a record (line).
Relational expressions use the relational operators listed under the section "Operators" later in this article. Comparisons can be either string or numeric. For example, $2 > $1 selects records for which the second field is greater than the first.
Pattern-matching expressions use the operators ~ (match) and !~ (don't match). See the section "Operators" later in this article.
The BEGIN pattern lets you specify procedures that will take place before the first input record is processed. (Generally, you set global variables here.)
The END pattern lets you specify procedures that will take place after the last input record is read.
Except for BEGIN and END, patterns can be combined with the Boolean operators || (OR), && (AND), and ! (NOT). A range of lines can also be specified using comma-separated patterns:
    pattern,pattern
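As an illustration of combined patterns and ranges, here is a small sketch run against made-up input; the words start and stop and the numbers in the second field are arbitrary sample data.

```shell
out=$(printf '%s\n' 'a 1' 'start 2' 'b 3' 'stop 4' 'c 5' |
  awk '/start/,/stop/     { print "range:", $0 }   # from a start line through a stop line
       $2 > 2 && !/stop/  { print "combo:", $0 }') # AND combined with NOT
echo "$out"
```

Both rules are tested against every record, so a line such as b 3 is printed twice, once by each rule.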
procedure can consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons (;), and contained within curly braces ({}). Commands fall into four groups:
Variable or array assignments
Printing commands
Built-in functions
Control-flow commands
Print first field of each line:
{ print $1 }
Print all lines that contain pattern:
/pattern/
Print first field of lines that contain pattern:
/pattern/ { print $1 }
Print records containing more than two fields:
NF > 2
Interpret input records as a group of lines up to a blank line:
BEGIN { FS = "\n"; RS = "" } { ...process records... }
Print fields 2 and 3 in switched order, but only on lines whose first field matches the string URGENT:
$1 ~ /URGENT/ { print $3, $2 }
Count and print the number of lines that contain pattern:
/pattern/ { ++x } END { print x }
Add numbers in second column and print total:
{ total += $2 } END { print "column total is", total }
Print lines that contain fewer than 20 characters:
length($0) < 20
Print each line that begins with Name: and that contains exactly seven fields:
NF == 7 && /^Name:/
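The multi-line record example above is the least obvious of these, so here is a hedged, runnable sketch of it. The names and addresses are invented sample data; each blank-line-separated block becomes one record and each line within it becomes one field.

```shell
out=$(printf 'Ann\n12 Elm St\n\nBob\n9 Oak Ave\n' |
  awk 'BEGIN { FS = "\n"; RS = "" }            # blank line ends a record, newline ends a field
       { print NR ": " $1 " lives at " $2 }')
echo "$out"
```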
nawk supports all awk variables. gawk supports both nawk and awk variables.
| Version | Variable | Description |
|---|---|---|
| awk | FILENAME | Current filename |
| | FS | Field separator (default is whitespace) |
| | NF | Number of fields in current record |
| | NR | Number of the current record |
| | OFMT | Output format for numbers (default is %.6g) |
| | OFS | Output field separator (default is a blank) |
| | ORS | Output record separator (default is a newline) |
| | RS | Record separator (default is a newline) |
| | $0 | Entire input record |
| | $n | nth field in current record; fields are separated by FS |
| nawk | ARGC | Number of arguments on command line |
| | ARGV | An array containing the command-line arguments |
| | ENVIRON | An associative array of environment variables |
| | FNR | Like NR, but relative to the current file |
| | RSTART | First position in the string matched by match function |
| | RLENGTH | Length of the string matched by match function |
| | SUBSEP | Separator character for array subscripts (default is \034) |
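To see how NR, FNR, NF, and FILENAME relate, the sketch below runs one rule over two small files. The temporary directory and the file names one and two are placeholders created only for this example, and FNR assumes nawk or later.

```shell
dir=$(mktemp -d)                        # scratch directory for the sample files
printf 'a b\nc d e\n' > "$dir/one"
printf 'f\n'          > "$dir/two"

# NR keeps counting across files; FNR restarts at 1 for each file.
out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' one two)
rm -r "$dir"
echo "$out"
```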
The table below lists the operators, in order of increasing precedence, that are available in awk:
| Symbol | Meaning |
|---|---|
| = += -= *= /= %= ^= | Assignment (^= only in nawk and gawk) |
| ?: | C conditional expression (nawk and gawk) |
| \|\| | Logical OR |
| && | Logical AND |
| ~ !~ | Match regular expression and negation |
| < <= > >= != == | Relational operators |
| (blank) | Concatenation |
| + - | Addition, subtraction |
| * / % | Multiplication, division, and modulus |
| + - ! | Unary plus and minus, and logical negation |
| ^ | Exponentiation (nawk and gawk) |
| ++ -- | Increment and decrement, either prefix or postfix |
| $ | Field reference |
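A few of these precedence rules are easy to check directly. This is a small sketch, assuming an awk with the conditional operator (nawk or gawk); the variable x is arbitrary.

```shell
out=$(awk 'BEGIN {
  print 2 + 3 * 4                    # * binds tighter than +
  print 1 " " 2 + 3                  # + binds tighter than concatenation
  x = 5
  print (x > 3 ? "big" : "small")    # parenthesized so > is not read as a redirection
}')
echo "$out"
```

The parentheses in the last print matter: inside a print statement an unparenthesized > is taken as output redirection rather than a comparison.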
Variables can be assigned a value with an equal sign (=). For example:
FS = ","
Expressions using the operators +, -, *, /, and % (modulo) can be assigned to variables.
Arrays can be created with the split function (see below), or they can simply be named in an assignment statement. Array elements can be subscripted with numbers (array[1],...array[n]) or with names. For example, to count the number of occurrences of a pattern, you could use the following script:
/pattern/ { array["pattern"]++ } END { print array["pattern"] }
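Subscripting with names extends naturally to counting many different strings at once. The sketch below tallies each word in some made-up input; the array name count is arbitrary, and the output is piped through sort because the order in which for (item in array) visits elements is not defined.

```shell
out=$(printf 'red blue\nblue green blue\n' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }     # one element per distinct word
       END { for (w in count) print w, count[w] }' |
  sort)
echo "$out"
```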
awk commands may be classified as follows:
| Arithmetic Functions | String Functions | Control Flow Statements | Input/Output Processing |
|---|---|---|---|
| atan2* | gsub* | break | close* |
| cos* | index | continue | delete* |
| exp | length | do/while* | getline* |
| int | match* | exit | next |
| log | split | for | print |
| rand* | sub* | if | printf |
| sin* | substr | return* | sprintf |
| sqrt | tolower* | while | system* |
| srand* | toupper* | | |

An asterisk (*) marks commands that are not available in original awk.
The following alphabetical list of statements and functions includes all that are available in awk, nawk, or gawk. Unless otherwise mentioned, the statement or function is found in all versions. New statements and functions introduced with nawk are also found in gawk.
atan2
atan2(y,x)
Returns the arctangent of y/x in radians. (nawk)
break
Exit from a while, for, or do loop.
close
close(filename-expr)
close(command-expr)
In some implementations of awk, you can have only ten files open simultaneously and one pipe; modern versions allow more than one pipe open. Therefore, nawk provides a close statement that allows you to close a file or a pipe. close takes as an argument the same expression that opened the pipe or file. (nawk)
continue
Begin next iteration of while, for, or do loop immediately.
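cos
cos(x)
Return the cosine of x, an angle in radians. (nawk)
delete
delete array[element]
Delete element of array. (nawk)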
do
do body while (expr)
Looping statement. Execute statements in body, then evaluate expr. If expr is true, execute body again. More than one command must be put inside braces ({}). (nawk)
exit
exit [expr]
Do not execute remaining instructions and do not read new input. The END procedure, if any, will be executed. The expr, if any, becomes awk's exit status (44.7).
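The interaction between exit, the END procedure, and the exit status can be checked from the shell. This is a minimal sketch; the status value 3 and the message text are arbitrary.

```shell
status=0
# exit in the main rule stops input processing, but END still runs;
# the expression given to exit becomes the status seen by the shell.
out=$(echo "any input" |
  awk '{ exit 3 } END { print "END still runs" }') || status=$?
echo "$out (exit status $status)"
```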
exp
exp(arg)
Return e raised to the power arg.
for
for ([init-expr]; [test-expr]; [incr-expr]) command
C-language-style looping construct. Typically, init-expr assigns the initial value of a counter variable. test-expr is a relational expression that is evaluated each time before executing the command. When test-expr is false, the loop is exited. incr-expr is used to increment the counter variable after each pass. A series of commands must be put within braces ({}). Example:
for (i = 1; i <= 10; i++) printf "Element %d is %s.\n", i, array[i]
for
for (item in array) command
For each item in an associative array, do command. More than one command must be put inside braces ({}). Refer to each element of the array as array[item].
getline
getline [var] [<file]
command | getline [var]
Read next line of input. Original awk does not support the syntax to open multiple input streams. The first form reads input from file, and the second form reads the standard output of a UNIX command. Both forms read one line at a time, and each time the statement is executed it gets the next line of input. The line of input is assigned to $0, and it is parsed into fields, setting NF, NR, and FNR. If var is specified, the result is assigned to var and $0 is not changed. Thus, if the result is assigned to a variable, the current line does not change. getline is actually a function and it returns 1 if it reads a record successfully, 0 if end-of-file is encountered, and -1 if for some reason it is otherwise unsuccessful. (nawk)
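The command form of getline and its return value can be sketched as follows. The command string and the variable names cmd, line, and status are placeholders chosen for this example; close is used so the pipe is released after reading.

```shell
out=$(echo "current record" | awk '{
  cmd = "echo from-command"
  status = (cmd | getline line)   # 1 on success, 0 at end-of-file, -1 on error
  close(cmd)
  print status, line
  print $0                        # unchanged, because the line went into a variable
}')
echo "$out"
```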
gsub
gsub(r,s[,t])
Globally substitute s for each match of the regular expression r in the string t. Return the number of substitutions. If t is not supplied, defaults to $0. (nawk)
if
if (condition) command [else command]
If condition is true, do command(s), otherwise do command(s) in else clause (if any). condition can be an expression that uses any of the relational operators <, <=, ==, !=, >=, or >, as well as the pattern-matching operators ~ or !~ (e.g., if ($1 ~ /[Aa].*[Zz]/)). A series of commands must be put within braces ({}).
index
index(str,substr)
Return position of first substring substr in string str or 0 if not found.
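int
int(arg)
Return the integer value of arg.
length
length(arg)
Return the length of arg. If arg is not supplied, $0 is assumed.
log
log(arg)
Return the natural logarithm of arg.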
match
match(s,r)
Function that matches the pattern, specified by the regular expression r, in the string s and returns either the position in s where the match begins or 0 if no occurrences are found. Sets the values of RSTART and RLENGTH. (nawk)
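A common use of match is to pull the matched text out with substr, using the two variables it sets. The input line below is invented for the example.

```shell
out=$(echo "order id=4711 shipped" | awk '{
  if (match($0, /[0-9]+/))                      # nonzero when a match is found
    print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
}')
echo "$out"
```

The first number is where the match starts, the second is how long it is, and the third field is the matched text itself.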
next
Read next input line and start new cycle through pattern/procedures statements.
print
print [args] [destination]
Print args on output, followed by a newline. args is usually one or more fields, but may also be one or more of the predefined variables or arbitrary expressions. If no args are given, prints $0 (the current input line). Literal strings must be quoted. Fields are printed in the order they are listed. If separated by commas (,) in the argument list, they are separated in the output by the OFS character. If separated by spaces, they are concatenated in the output. destination is a UNIX redirection or pipe expression (e.g., > file) that redirects the default standard output.
printf
printf format [, expression(s)] [destination]
Formatted print statement. Fields or variables can be formatted according to instructions in the format argument. The number of expressions must correspond to the number specified in the format sections. format follows the conventions of the C-language printf statement. Here are a few of the most common formats:
%s
A string.
%d
A decimal number.
%n.mf
A floating-point number, where n is the minimum total field width and m is the number of digits after the decimal point.
%[-]nc
n specifies minimum field length for format type c, while - left justifies value in field; otherwise value is right justified.
format can also contain embedded escape sequences: \n (newline) or \t (tab) are the most common. destination is a UNIX redirection or pipe expression (e.g., > file) that redirects the default standard output. Example: Using the script:
{printf "The sum on line %s is %d.\n", NR, $1+$2}
The following input line:
5 5
produces this output, followed by a newline:
The sum on line 1 is 10.
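Field widths and justification are easier to see with visible delimiters. The sketch below wraps each formatted value in square brackets; the input values are arbitrary.

```shell
# %-8s left-justifies in 8 columns, %8.2f right-justifies with 2 decimals,
# %5d right-justifies an integer in 5 columns.
out=$(echo "widget 3.14159 42" |
  awk '{ printf "[%-8s][%8.2f][%5d]\n", $1, $2, $3 }')
echo "$out"
```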
rand
rand()
Generate a random number between 0 and 1. This function returns the same series of numbers each time the script is executed, unless the random number generator is seeded using the srand() function. (nawk)
return
return [expr]
Used at end of user-defined functions to exit the function, returning value of expression expr, if any. (nawk)
sin
sin(x)
Return the sine of x, an angle in radians. (nawk)
split
split(string,array[,sep])
Split string into elements of array array[1],...array[n]. string is split at each occurrence of separator sep. (In nawk, the separator may be a regular expression.) If sep is not specified, FS is used. The number of array elements created is returned.
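As a quick illustration of split and its return value, the following sketch breaks a made-up date string on hyphens; the array name part is arbitrary.

```shell
out=$(awk 'BEGIN {
  n = split("2024-01-15", part, "-")      # n is the number of elements created
  print n
  for (i = 1; i <= n; i++) print i, part[i]
}')
echo "$out"
```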
sprintf
sprintf(format [, expression(s)])
Return the value of expression(s), using the specified format (see printf). Data is formatted but not printed.
sqrt
sqrt(arg)
Return the square root of arg.
srand
srand(expr)
Use expr to set a new seed for random number generator. Default is time of day. Returns the old seed. (nawk)
sub
sub(r,s[,t])
Substitute s for first match of the regular expression r in the string t. Return 1 if successful; 0 otherwise. If t is not supplied, defaults to $0. (nawk)
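The contrast between sub and gsub shows up in both the modified string and the return value. This is a small sketch on a made-up input; the variable names first and all are arbitrary copies of $0 so the record itself is left alone.

```shell
out=$(echo "a-b-c" | awk '{
  first = $0; n1 = sub(/-/, "+", first)    # replaces only the first match
  all   = $0; n2 = gsub(/-/, "+", all)     # replaces every match
  print n1, first
  print n2, all
}')
echo "$out"
```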
substr
substr(string,m[,n])
Return substring of string beginning at character position m and consisting of the next n characters. If n is omitted, include all characters to the end of string.
system
system(command)
Function that executes the specified UNIX command and returns its status (44.7). The status of the command that is executed typically indicates its success (0) or failure (non-zero). The output of the command is not available for processing within the nawk script. Use command | getline to read the output of the command into the script. (nawk)
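The split between a command's status and its output can be sketched like this. The commands true, false, and echo are standard utilities used only as stand-ins, and the variable names are arbitrary.

```shell
out=$(awk 'BEGIN {
  ok  = system("true")              # status 0: success
  bad = system("false")             # non-zero status: failure
  "echo captured" | getline text    # output has to be read with getline
  print ok, (bad != 0), text
}')
echo "$out"
```

The failing status is printed as a comparison result rather than a raw number, since the exact non-zero value depends on the command.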
tolower
tolower(str)
Translate all uppercase characters in str to lowercase and return the new string. (nawk)
toupper
toupper(str)
Translate all lowercase characters in str to uppercase and return the new string. (nawk)
while
while (condition) command
Do command while condition is true (see if for a description of allowable conditions). A series of commands must be put within braces ({}).
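The practical difference between while and do is when the condition is tested. The sketch below, assuming an awk with do (nawk or later), first sums 1 through 3 with while, then shows that a do body runs once even when its condition is false from the start.

```shell
out=$(awk 'BEGIN {
  i = 1
  while (i <= 3) { sum += i; i++ }        # condition tested before each pass
  print "sum:", sum
  i = 5
  do { print "do body ran with i=" i; i++ } while (i < 3)   # tested after the pass
}')
echo "$out"
```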
- from O'Reilly & Associates' UNIX in a Nutshell (SVR4/Solaris)