The following characters have special meaning in search patterns:
| Character | Action |
|---|---|
| `.` | Match any single character except newline. |
| `*` | Match any number (or none) of the single character that immediately precedes it. The preceding character can also be a regular expression (e.g., since `.` (dot) means any character, `.*` means match any number of any character except newlines). |
| `^` | Match the beginning of the line or string. |
| `$` | Match the end of the line or string. |
| `[ ]` | Match any one of the enclosed characters. A hyphen (`-`) indicates a range of consecutive characters. A circumflex (`^`) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (`]`) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list. |
| `[^ ]` | Match any character except the enclosed characters. |
| `\{n,m\}` | Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a regular expression. `\{n\}` matches exactly *n* occurrences, `\{n,\}` matches at least *n* occurrences, and `\{n,m\}` matches any number of occurrences between *n* and *m*. |
| `{n,m}` | Like `\{n,m\}`. Available in `egrep` by default and in `gawk` with the `-Wre-interval` option. |
| `\` | Turn off the special meaning of the character that follows. |
| `\( \)` | Save the matched text enclosed between `\(` and `\)` in a special holding space. Up to nine patterns can be saved on a single line. They can be "replayed" in the same pattern or within substitutions by the escape sequences `\1` to `\9`. |
| `\n` | Reuse the matched text stored in the *n*th `\( \)`. |
| `\<` | Match the beginning of a word. |
| `\>` | Match the end of a word. |
| `+` | Match one or more instances of the preceding regular expression. |
| `?` | Match zero or one instance of the preceding regular expression. |
| `\|` | Match either the regular expression before it or the one after it. |
| `( )` | In `egrep` and `gawk`, group regular expressions. |
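As a rough illustration, the sketch below runs several of these metacharacters through Python's `re` module using a hypothetical `grep` helper. Python's engine is Perl-style rather than POSIX, so it writes groups and intervals the `egrep`/`gawk` way (`( )` and `{n,m}`) instead of `\( \)` and `\{n,m\}`, and it has no `\<` or `\>`; here `\b` (a word boundary at either end of a word) stands in for both. The word list is made up for the example.

```python
import re
from typing import Iterable, List


def grep(pattern: str, lines: Iterable[str]) -> List[str]:
    """Return the lines that contain a match for pattern.

    Hypothetical helper built on Python's re, whose syntax is close to
    egrep's: groups and intervals use bare parentheses and braces.
    """
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


WORDS = ["cat", "cot", "coot", "cooot", "ct", "coat", "scatter", "dog"]

print(grep(r"c.t", WORDS))          # ['cat', 'cot', 'scatter']
print(grep(r"co*t", WORDS))         # ['cot', 'coot', 'cooot', 'ct']
print(grep(r"^c[ao]t$", WORDS))     # ['cat', 'cot']
print(grep(r"^c[^ao]", WORDS))      # ['ct']
print(grep(r"er$", WORDS))          # ['scatter']
print(grep(r"^co{1,2}t$", WORDS))   # ['cot', 'coot']
print(grep(r"^co+t$", WORDS))       # ['cot', 'coot', 'cooot']
print(grep(r"^co?t$", WORDS))       # ['cot', 'ct']
print(grep(r"^(cat|dog)$", WORDS))  # ['cat', 'dog']
print(grep(r"(o)\1", WORDS))        # back-reference: ['coot', 'cooot']
print(grep(r"\bcat\b", WORDS))      # whole word only: ['cat']

# Inside brackets, a leading ] and a trailing - are ordinary members.
print(grep(r"[]-]", ["a-b", "a]b", "ab"]))  # ['a-b', 'a]b']
# A backslash turns off the special meaning of the dot.
print(grep(r"a\.b", ["a.b", "axb"]))        # ['a.b']
```

Only the metacharacters Python shares with the table are exercised; BRE-style escapes such as `\{n,m\}`, `\( \)`, `\<`, and `\>` would need translating before Python could use them.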
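Many utilities support POSIX character lists, which are useful for matching non-ASCII characters in languages other than English. These lists are recognized only inside a bracket expression (`[ ]`). A typical use is `[[:lower:]]`, which in English is the same as `[a-z]`; note the doubled brackets: `[:lower:]` names the list, and the outer `[ ]` is the bracket expression that contains it.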
The following table lists the POSIX character lists:

| Notation | Matches |
|---|---|
| `[:alnum:]` | Alphanumeric characters |
| `[:alpha:]` | Alphabetic characters, uppercase and lowercase |
| `[:blank:]` | Blank characters: spaces and tabs only (no newlines or other control characters) |
| `[:cntrl:]` | Control characters, such as ^A through ^Z |
| `[:digit:]` | Decimal digits |
| `[:graph:]` | Printable characters, excluding whitespace |
| `[:lower:]` | Lowercase alphabetic characters |
| `[:print:]` | Printable characters, including the space character but not control characters |
| `[:punct:]` | Punctuation, a subclass of printable characters |
| `[:space:]` | Whitespace, including spaces, tabs, and some control characters (such as newline and carriage return) |
| `[:upper:]` | Uppercase alphabetic characters |
| `[:xdigit:]` | Hexadecimal digits |
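Python's `re` module does not understand `[:name:]` lists, so this sketch rewrites each one into an equivalent ASCII range before compiling. It assumes the C (ASCII) locale, where these lists have no non-ASCII members, and the helper name `posix_to_python` is hypothetical. It also shows why the doubled brackets matter: the helper replaces only the `[:name:]` token and relies on the enclosing `[ ]` already being present.

```python
import re
import string

# ASCII ("C" locale) membership of each POSIX list, written as the body of a
# Python character class. Other locales may add non-ASCII members; this
# sketch deliberately ignores them.
ASCII_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": " \\t",
    "cntrl": "\\x00-\\x1f\\x7f",
    "digit": "0-9",
    "graph": "\\x21-\\x7e",
    "lower": "a-z",
    "print": "\\x20-\\x7e",
    "punct": "".join(re.escape(c) for c in string.punctuation),
    "space": " \\t\\n\\r\\x0b\\x0c",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}


def posix_to_python(pattern: str) -> str:
    """Expand each [:name:] token into its ASCII range body.

    Hypothetical helper: assumes every [:name:] already sits inside an
    enclosing bracket expression, as POSIX requires, e.g. "[[:upper:][:digit:]]".
    An unknown name raises KeyError.
    """
    return re.sub(r"\[:([a-z]+):\]",
                  lambda m: ASCII_CLASSES[m.group(1)], pattern)


print(posix_to_python("[[:lower:]]"))   # [a-z]
print(posix_to_python("[^[:alnum:]]"))  # [^a-zA-Z0-9]
print(bool(re.fullmatch(posix_to_python("[[:upper:]][[:digit:]]+"), "A42")))  # True
```

The table entries that differ only subtly become visible this way: `[:blank:]` rejects a newline that `[:space:]` accepts, and `[:print:]` accepts the space character that `[:graph:]` rejects.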
The following characters have special meaning in replacement patterns:
| Character | Action |
|---|---|
| `\` | Turn off the special meaning of the character that follows. |
| `\n` | Insert the text matched by the *n*th pattern previously saved by `\(` and `\)`. *n* is a number from 1 to 9; the saved patterns are numbered from left to right in the search pattern. |
| `&` | Reuse the text matched by the search pattern as part of the replacement pattern. |
| `~` | Reuse the previous replacement pattern in the current replacement pattern. |
| `\e` | End the case conversion started by `\L` or `\U`. |
| `\E` | End the case conversion started by `\L` or `\U`. |
| `\l` | Convert the next character of the replacement pattern to lowercase. |
| `\L` | Convert the rest of the replacement pattern to lowercase (up to `\e` or `\E`). |
| `\u` | Convert the next character of the replacement pattern to uppercase. |
| `\U` | Convert the rest of the replacement pattern to uppercase (up to `\e` or `\E`). |
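The replacement metacharacters are easiest to see in a small model. Python's `re.sub` has its own replacement syntax (for example `\g<0>` for the whole match, and no case conversion), so the sketch below passes `re.sub` a callable that interprets a vi/sed-style template instead. The helper name `expand` is hypothetical; it treats `\u`/`\l` as applying to the next character and `\U`/`\L` as lasting until `\E` or `\e`, and it does not model `~`, which would require remembering the previous replacement.

```python
import re


def expand(template: str, m: re.Match) -> str:
    r"""Build a replacement string from a vi/sed-style template.

    Hypothetical helper. Supports &, \1-\9, \u, \l, \U, \L, \E, \e and
    backslash escapes of other characters; ~ is not modeled.
    """
    out = []
    mode = None      # "upper"/"lower" while \U or \L is in effect
    one_shot = None  # "upper"/"lower" for the next character only (\u, \l)

    def emit(text: str) -> None:
        nonlocal one_shot
        for ch in text:
            if one_shot is not None:
                ch = ch.upper() if one_shot == "upper" else ch.lower()
                one_shot = None
            elif mode is not None:
                ch = ch.upper() if mode == "upper" else ch.lower()
            out.append(ch)

    i = 0
    while i < len(template):
        c = template[i]
        if c == "&":                      # the whole matched text
            emit(m.group(0))
            i += 1
            continue
        if c != "\\" or i + 1 == len(template):
            emit(c)                       # ordinary character
            i += 1
            continue
        nxt = template[i + 1]
        if nxt in "123456789":            # text saved by the nth group
            emit(m.group(int(nxt)) or "")
        elif nxt in "UL":
            mode = "upper" if nxt == "U" else "lower"
        elif nxt in "Ee":
            mode = None
        elif nxt in "ul":
            one_shot = "upper" if nxt == "u" else "lower"
        else:                             # \&, \\ and so on: literal
            emit(nxt)
        i += 2
    return "".join(out)


text = "hello world"
print(re.sub(r"(\w+) (\w+)", lambda m: expand(r"\2 \1", m), text))      # world hello
print(re.sub(r"\w+", lambda m: expand(r"\u&", m), text))                # Hello World
print(re.sub(r"(\w+) (\w+)", lambda m: expand(r"\U\1\E \2", m), text))  # HELLO world
```

Combining the escapes works as the table suggests: `\L\u&` applied to `hELLO` lowercases the whole match but capitalizes its first letter, giving `Hello`.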
Copyright © 2003 O'Reilly & Associates. All rights reserved.