UNIX Power Tools

Chapter 36: Sorting

36.2 Sort Fields: How sort Sorts

Unless you tell it otherwise, sort divides each line into fields at white space (blanks or tabs), and sorts the lines, by field, from left to right.

That is, it sorts on the basis of field 0 (leftmost); but when the leftmost fields are the same, it sorts on the basis of field 1; and so on. This is hard to put into words, but it's really just common sense. Suppose your office inventory manager created a file like this:

supplies     pencils  148
furniture    chairs   40
kitchen      knives   22
kitchen      forks    20
supplies     pens     236
furniture    couches  10
furniture    tables   7
supplies     paper    29

You'd want all the supplies sorted into categories, and within each category you'd want them sorted alphabetically:

% sort supplies
furniture    chairs   40
furniture    couches  10
furniture    tables   7
kitchen      forks    20
kitchen      knives   22
supplies     paper    29
supplies     pencils  148
supplies     pens     236

Of course, you don't always want to sort from left to right. The command line option +n tells sort to start sorting on field n; -n tells sort to stop sorting when it reaches field n, so field n itself is not part of the key. Remember (again) that sort counts fields from left to right, starting with 0. [1] Here's an example. We want to sort a list of telephone numbers of authors, presidents, and blues singers:

[1] I harp on this because I always get confused and have to look it up in the manual page.

Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434

According to standard "telephone book rules," we want these names sorted by last name, first name, and middle initial. We don't want the phone number to play a part in the sorting. So we want to start sorting on field 2, stop sorting on field 3, continue sorting on field 0, sort on field 1, and (just to make sure) stop sorting on field 2 (the last name). We can code this as follows:

% sort +2 -3 +0 -2 phonelist
Lyndon B Johnson      933-1423
Robert M Johnson      344-0909
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Timothy F O'Reilly    443-2434
Jerry O Peek          267-2345
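The +n and -n forms are obsolescent, and many current sort implementations no longer accept them. The POSIX -k option expresses the same keys, but it counts fields from 1 and names the last field that is included in the key. The following is a minimal sketch of the same telephone-book sort, assuming a POSIX-style sort; the variable names phonelist_data and sorted are illustrative only, and LC_ALL=C is set so the comparison does not depend on the local collation rules.

```shell
# Sample data from the text, held in a variable instead of the file "phonelist".
phonelist_data="Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434"

# Old form  +2 -3  corresponds to  -k 3,3  (last name only).
# Old form  +0 -2  corresponds to  -k 1,2  (first name through middle initial).
# LC_ALL=C pins plain byte-order comparison.
sorted=$(printf '%s\n' "$phonelist_data" | LC_ALL=C sort -k 3,3 -k 1,2)
printf '%s\n' "$sorted"
```

Writing -k 3,3 rather than -k 3 matters for the same reason the -3 does in the old form: without an end position, the key runs from the last name to the end of the line.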

A few notes:

There are a couple of variations that are worth mentioning. You may never need them unless you're really serious about sorting data files, but it's good to keep them in the back of your mind. First, you can add any "collation" operations (discard blanks, numeric sort, etc.) to the end of a field specifier to describe how you want that field sorted. Using our previous example, let's say that if two names are identical, you want them sorted in numeric phone number order. The following command does the trick:

% sort +2 -3 +0 -2 +3n phonelist

The +3n option says "do a numeric sort on the fourth field." If you're worried about initial blanks (perhaps some of the phone numbers have area codes), use +3nb.
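To see the numeric tiebreaker do something, the data needs two entries with the same name. Here is a small sketch using the POSIX -k form, where fields are counted from 1, so the old +3n becomes -k 4,4n. The two Robert M Johnson entries and the number 99-1234 are hypothetical, invented for this illustration and not part of the original list; the variable names are also illustrative.

```shell
# Hypothetical data: two entries share a name but have different numbers.
dupes="Robert M Johnson      344-0909
Robert M Johnson      99-1234
Lyndon B Johnson      933-1423"

# -k 3,3   last name
# -k 1,2   first name and middle initial
# -k 4,4n  numeric comparison on the fourth field (old form: +3n)
by_number=$(printf '%s\n' "$dupes" | LC_ALL=C sort -k 3,3 -k 1,2 -k 4,4n)
printf '%s\n' "$by_number"
```

With the n modifier, 99 is compared as a number and sorts ahead of 344; a plain character comparison of that field would put 344 first, because "3" sorts before "9".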

Second, you can specify individual columns within any field for sorting, using the notation +n.c, where n is a field number, and c is a character position within the field. Likewise, the notation -n.c says "stop sorting at the character before character c." If you're counting characters, be sure to use the -b (ignore white space) option; otherwise, it will be very difficult to figure out what character you're counting.
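As an illustration of character positions, the sketch below sorts the telephone list by the last four digits of each number. It uses the POSIX -k form rather than +n.c, so both the field number and the character position count from 1 and the end position is inclusive. This assumes a POSIX-style sort; the variable names are illustrative. The -b option is given before -k so that characters are counted from the first non-blank character of the field.

```shell
# Sample data from the text.
phonelist_data="Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434"

# Field 4 is the phone number. With -b, leading blanks are skipped, so
# characters 5 through 8 of "344-0909" are "0909".
by_suffix=$(printf '%s\n' "$phonelist_data" | LC_ALL=C sort -b -k 4.5,4.8)
printf '%s\n' "$by_suffix"
```

Without -b, the blanks that pad the column would count as characters of the fourth field, and positions 5 through 8 would fall in the padding rather than in the digits.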

- ML

