Processing files with awk, part two
In this second of two columns on the awk programming utility, we show you how to print reports with awk's print and printf commands
This is the second of two parts on awk, so if you missed the first part in last month's issue it's advisable to review it (see Resources below). Awk is a text processing utility that runs through a text file by reading and processing a record at a time. This month we show you how to print and format a user list with awk. (2,600 words)
One more piece of awk syntax will make it an even more useful tool. I said in last month's column that awk treats the spaces in a record as a field separator. It is possible to change the field separator to another value.
Figure 1 is an example of a passwd file. The password itself in this example is replaced with a single exclamation mark. This file has several separate fields in it, but the field separator is a colon (:) rather than spaces.
Figure 1
root:!:0:1:Super User:/:
daemon:!:1:1:System Daemons:/etc
lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh
bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh
joann:!:213:200:Jo Ann Batson:/home/joann:/bin/ksh
jlan:!:214:200:Jack Landon:/home/jlan:/bin/ksh
jank:!:215:200:Jan Kingly:/home/jank:/bin/ksh
ljn:!:216:200:Laura Nugent:/home/ljn:/bin/ksh
mjb:!:220:200:Mo Budlong:/home/mjb:/bin/ksh
bda:!:235:500:Basic Development Accnt:/home/bda:/bin/ksh
obrero:!:245:500::/home/obrero:/bin/ksh
guest1:!:501:500:Guest1 Account:/disk2/guest1:/bin/ksh
guest2:!:502:500:Guest2 Account:/disk2/guest2:/bin/ksh
guest3:!:503:500:Guest3 Account:/disk2/guest3:/bin/ksh
beb:!:248:202:Becky E Brown :/home/beb:/bin/ksh
A passwd file can be used as the input file for an awk report by changing the field separator. Figure 2 is a short example. There are two points to notice.
First, the logic in BEGIN{FS=":"}. In awk, FS is a predefined variable that contains the field separator. If you make no changes to it, FS is a single space, which awk treats specially: fields are split on any run of spaces or tabs. In this listing, the BEGIN logic sets FS to a colon (:), so the value of the field separator is changed before the first record is read. This allows the passwd records to be broken into fields at the colons.
The second point to notice is on line 3 of Figure 2. In all previous examples the file has been piped into awk, as in "ls -l|awk etc." In this example, the input file is named explicitly by placing it on the command line after the closing single quote at the end of the awk commands. Awk can take its input from a pipe, as in previous examples, or from one or more explicitly named files, as in Figure 2.
Remember to type a TAB wherever you see the ^ mark.
Figure 2
awk '
BEGIN{FS=":"}
{print $1 " ^" $5}' /etc/passwd
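Unless you are in the C shell, the closing quote ends multiline input, so be sure to type the closing quote, followed by a space, followed by the name of the file. In the C shell, every line of a multiline command except the last must end with a backslash, as in Figure 3. Figure 4 shows where the file name can and cannot be placed.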
Figure 3
awk ' \
BEGIN{FS=":"} \
{print $1 " ^" $5}' /etc/passwd
Figure 4
awk '
BEGIN{FS=":"}
{print $1 " ^" $5}
' /etc/passwd            < this works, as multiline input is still active

awk '
BEGIN{FS=":"}
{print $1 " ^" $5}'      < multiline input ends here
/etc/passwd              < this won't work; multiline input ended on the previous line
Figure 5 is sample output from Figure 2 (or from Figure 3 in the C shell). The awk script selects field $1, which is the user id, and field $5, which is the user's name, and prints them with a tab between them.
Figure 5
root    Super User
daemon  System Daemons
lbw     Lavinia Bowder Washinton
bob     Robbie Cramer
joann   Jo Ann Batson
jlan    Jack Landon
jank    Jan Kingly
ljn     Laura Nugent
mjb     Mo Budlong
bda     Basic Development Accnt
obrero
guest1  Guest1 Account
guest2  Guest2 Account
guest3  Guest3 Account
beb     Becky E Brown
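On most current versions of awk the same kind of report can be typed without a literal TAB and without a BEGIN block. The sketch below is not one of the original listings. It assumes an awk that accepts the -F option and the "\t" escape inside a string; very old versions may not. The function name user_names is made up for illustration, and the output has a bare tab between the fields rather than a space and a tab.

```shell
# user_names: print login id and full name, separated by a tab.
# Assumes an awk that supports -F and the "\t" string escape.
# Reads the named files, or standard input if no file is given.
user_names() {
    awk -F: '{print $1 "\t" $5}' "$@"
}

# Two sample records from Figure 1, piped in instead of reading /etc/passwd.
printf '%s\n' \
    'root:!:0:1:Super User:/:' \
    'bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh' | user_names
```

Called with a file name, as in "user_names /etc/passwd", the function reads that file instead of the pipe.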
Awk has a number of predefined variables. You have already seen FS. Another useful one is NR, which contains the number of the current record. It is incremented by 1 as each record is read. You may use it to number the output records as in Figure 6, the output of which would look like Figure 7.
Figure 6
awk '
BEGIN{FS=":"}
{print NR ". ^" $1 " ^" $5}' /etc/passwd
Figure 7
1.      root    Super User
2.      daemon  System Daemons
3.      lbw     Lavinia Bowder Washinton
4.      bob     Robbie Cramer
5.      joann   Jo Ann Batson
6.      jlan    Jack Landon
7.      jank    Jan Kingly
8.      ljn     Laura Nugent
9.      mjb     Mo Budlong
10.     bda     Basic Development Accnt
11.     obrero
12.     guest1  Guest1 Account
13.     guest2  Guest2 Account
14.     guest3  Guest3 Account
15.     beb     Becky E Brown
You may also use NR in the END logic. After the last record is read, NR still holds the number of that record, which is the total number of records read. Figure 8 would produce output that looks like Figure 9.
Figure 8
awk '
BEGIN{FS=":"}
{print $1 " ^" $5}
END{print "Total users = " NR}' /etc/passwd
Figure 9
root    Super User
daemon  System Daemons
lbw     Lavinia Bowder Washinton
bob     Robbie Cramer
joann   Jo Ann Batson
jlan    Jack Landon
jank    Jan Kingly
ljn     Laura Nugent
mjb     Mo Budlong
bda     Basic Development Accnt
obrero
guest1  Guest1 Account
guest2  Guest2 Account
guest3  Guest3 Account
beb     Becky E Brown
Total users = 15
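To watch NR at work without touching the real passwd file, here is a small sketch that feeds three records from Figure 1 to awk on standard input. It is an illustration only: the function name count_users is hypothetical, and the -F option is assumed to be available in your awk.

```shell
# count_users: number each record with NR, then report the total in END.
# Assumes an awk that supports the -F option.
count_users() {
    awk -F: '
    {print NR ". " $1}
    END{print "Total users = " NR}' "$@"
}

printf '%s\n' \
    'root:!:0:1:Super User:/:' \
    'daemon:!:1:1:System Daemons:/etc' \
    'lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh' | count_users
# prints:
# 1. root
# 2. daemon
# 3. lbw
# Total users = 3
```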
Complex reporting: using printf to make it look right
The awk print command is good enough for a lot of reporting, but when it comes to more complex or longer print layouts involving tidy columns of information you need something more powerful. The intent of Figure 10 is to print four columns of information from the /etc/passwd file -- user id, name, home path, and login shell. The columns are separated by tabs. The actual output looks something like Figure 11. A single tab is not enough to produce decent alignment when the fields are of substantially varying lengths.
Figure 10
awk '
BEGIN{FS=":";print "User ^Name ^Home ^Shell"}
{print $1 " ^" $5 " ^" $6 " ^" $7}
END{print "Total users = " NR}' /etc/passwd
Figure 11
User    Name    Home    Shell
root    Super User      /
daemon  System Daemons  /etc
lbw     Lavinia Bowder Washinton        /home/lbw       /bin/csh
bob     Robbie Cramer   /home/bob       /bin/ksh
joann   Jo Ann Batson   /home/joann     /bin/ksh
jlan    Jack Landon     /home/jlan      /bin/ksh
jank    Jan Kingly      /home/jank      /bin/ksh
ljn     Laura Nugent    /home/ljn       /bin/ksh
mjb     Mo Budlong      /home/mjb       /bin/ksh
bda     Basic Development Accnt         /home/bda       /bin/ksh
obrero          /home/obrero    /bin/ksh
guest1  Guest1 Account  /disk2/guest1   /bin/ksh
guest2  Guest2 Account  /disk2/guest2   /bin/ksh
guest3  Guest3 Account  /disk2/guest3   /bin/ksh
beb     Becky E Brown   /home/beb       /bin/ksh
Total users = 15
To handle this it is necessary to use the other awk print command which is printf (print formatted). The printf command is similar to the printf command of the C programming language, but a simplified explanation of the command is in order for those who do not know C.
The printf command is executed by providing a format string and a list of the values to be printed using the format string. These are separated by commas as in:
printf "format_string", $1, $3, $6, $7
Some versions of awk require parentheses around the arguments as in:
printf("format_string", $1, $3, $6, $7)
It is always safe to include the parentheses.
The values that can be used in a format string are very extensive and can format data in all sorts of ways, but for simple reports, the most useful format is the fixed width string.
A fixed width string field starts with a percent sign (%). If a minus sign (-) follows, then the printed data is left-justified within the fixed width of the field. Most string data is left-justified, so you should usually include the minus sign. The next part of the format is the width of the field, and finally an 's' ends the format. An example of this would be "%-30s", which is a left-justified field 30 characters wide. Using this format string with printf would look something like:
printf("%-30s",$1)
This would print field $1 in a left-justified, 30-character field space.
If field $1 does not contain 30 characters, then the field is padded with spaces until 30 character spaces are filled. One big advantage of a format string is that you can force a field to always print with a certain width by filling unused portions of the field with spaces. You may combine multiple format fields in a format string as in:
printf("%-20s%-30s", $1, $2)
This example will take field $1 and place it, left-justified into the first printing position. The field will be padded until it is 20 characters long. Then field $2 will be appended and padded out to 30 characters. This guarantees that columns will line up under one another. The format string for each field should be long enough to accommodate the largest value that will be placed in the field.
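The padding is easier to see with something visible around each field. In the sketch below the brackets, the 8-character width, the sample values and the function name show_padding are all chosen purely for illustration, and the -v option is assumed to be supported by your awk. A short value is padded out to the field width, while a value longer than the width is printed in full and pushes everything after it to the right, which is why the width should cover the largest value.

```shell
# show_padding: print two values in left-justified 8-character fields,
# with brackets around each field so the padding is visible.
# Assumes an awk that supports the -v option.
show_padding() {
    awk -v a="$1" -v b="$2" 'BEGIN{printf("[%-8s][%-8s]\n", a, b)}'
}

show_padding bob ksh            # prints [bob     ][ksh     ]
show_padding Washington ksh     # prints [Washington][ksh     ]
```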
There is one small hitch in printf. The print command automatically prints a newline at the end of each print statement. The printf command does not, so you must explicitly end the format string with a newline "\n".
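A quick way to convince yourself of the difference is the following sketch, which needs no input file because everything happens in the BEGIN logic. The function name newline_demo is made up for the example.

```shell
# newline_demo: print adds the newline for you; printf prints only
# what the format string says.
newline_demo() {
    awk 'BEGIN{
        print "one"
        print "two"
        printf("%s", "three")
        printf("%s\n", "four")
    }'
}

newline_demo
# prints:
# one
# two
# threefour
```

Because the first printf has no "\n" in its format string, the output of the second printf lands on the same line.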
Using these rules, let's create a format string for the four fields that we want to print from the /etc/passwd file. In Figure 12 I have taken the four fields, found the longest example, made a guess as to a safe width to use, and then created a format string that is one character longer than the safe width. This allows for a minimum of a single space between fields.
Figure 12
| Field | Longest | Safe Width | Format |
|---|---|---|---|
| User id | 6 | 10 | "%-11s" |
| Name | 24 | 30 | "%-31s" |
| Home | 13 | 15 | "%-16s" |
| Shell | 8 | 15 | "%-16s" |
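The "Longest" column does not have to be worked out by eye. As a rough sketch (not part of the original listings), awk's built-in length() function can find the longest value in each of the four fields. The function name longest_fields and the variable names id, name, home and shell are hypothetical, and the -F option is assumed to be available.

```shell
# longest_fields: report the length of the longest value found in
# fields 1, 5, 6 and 7 of passwd-style input.
# Assumes an awk that supports the -F option.
longest_fields() {
    awk -F: '
    {
        # awk variables start out as zero, so no initialization is needed.
        if (length($1) > id)    id    = length($1)
        if (length($5) > name)  name  = length($5)
        if (length($6) > home)  home  = length($6)
        if (length($7) > shell) shell = length($7)
    }
    END{printf("User id %d, Name %d, Home %d, Shell %d\n", id, name, home, shell)}' "$@"
}

# Two sample records from Figure 1.
printf '%s\n' \
    'lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh' \
    'guest1:!:501:500:Guest1 Account:/disk2/guest1:/bin/ksh' | longest_fields
# prints: User id 6, Name 24, Home 13, Shell 8
```

Run against a real file with "longest_fields /etc/passwd", then add a margin to each number to arrive at a safe width.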
The next step is to combine all of the fields into one long format string and append a newline.
printf("%-11s%-31s%-16s%-16s\n")
Finally list the fields to be printed with separating commas.
printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)
For your version of awk the format string and list of values after printf may not need to be enclosed in parentheses as in:
printf "%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7
It is always safe to use the parentheses, but in many versions of awk you do not need them.
Figure 13 is the first version of the awk script using printf. It does not include column titles.
Figure 13
awk '
BEGIN{FS=":"}
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd
Figure 14 is the C shell version of the same listing.
Figure 14
awk ' \
BEGIN{FS=":"} \
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)} \
END{print "Total users = " NR}' /etc/passwd
Adding column titles involves ensuring that the column titles actually line up with the fields in the format string. Figure 15 uses a simple trick to ensure that the column titles do align. The values used by printf to fill a format string when printing do not need to be variables. They can also be strings. The header or title line can be created by using the same format string that was used in the body of the report.
Figure 15
awk '
BEGIN{FS=":";
printf("%-11s%-31s%-16s%-16s\n","User","Name","Home","Shell")}
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd
The output from Figure 15 is shown in Figure 16 -- it's a much more readable and useful output.
Figure 16
User       Name                           Home            Shell
root       Super User                     /
daemon     System Daemons                 /etc
lbw        Lavinia Bowder Washinton       /home/lbw       /bin/csh
bob        Robbie Cramer                  /home/bob       /bin/ksh
joann      Jo Ann Batson                  /home/joann     /bin/ksh
jlan       Jack Landon                    /home/jlan      /bin/ksh
jank       Jan Kingly                     /home/jank      /bin/ksh
ljn        Laura Nugent                   /home/ljn       /bin/ksh
mjb        Mo Budlong                     /home/mjb       /bin/ksh
bda        Basic Development Accnt        /home/bda       /bin/ksh
obrero                                    /home/obrero    /bin/ksh
guest1     Guest1 Account                 /disk2/guest1   /bin/ksh
guest2     Guest2 Account                 /disk2/guest2   /bin/ksh
guest3     Guest3 Account                 /disk2/guest3   /bin/ksh
beb        Becky E Brown                  /home/beb       /bin/ksh
Total users = 15
In case you're offended by Figure 15
Just before I put this article to bed, there is one thing in Figure 15 that offends me as a programmer. The format string appears twice, on lines 3 and 4. From a programming standpoint this is not optimal. If you need to change the report layout you have to modify the format string in two places, and that invites typographical errors.
You will recall from one of the earlier examples that we used a variable to save the total bytes for all files that were listed. Why not create a variable that contains the format string? In Figure 17 the format string has been assigned to a variable named format as part of the BEGIN logic. In the printf commands, the variable "format" is used as the format string for both the title line and the individual record lines instead of a literal format string. The output is exactly the same as Figure 16. Figure 18 is the C shell version.
Figure 17
awk '
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd
Figure 18
awk ' \
BEGIN{FS=":"; \
format = "%-11s%-31s%-16s%-16s\n"; \
printf(format,"User","Name","Home","Shell")} \
{printf(format,$1,$5,$6,$7)} \
END{print "Total users = " NR}' /etc/passwd
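The same idea can be taken one step further by keeping the column widths in variables and building the format string from them, so that a layout change means editing a single number. This is only a sketch under my own assumptions, not one of the original listings: the names user_report, w1, w2, w3 and w4 are hypothetical, and the -F option is assumed to be available.

```shell
# user_report: the report from Figure 17, with the column widths held
# in variables and the format string assembled from them.
# Assumes an awk that supports the -F option.
user_report() {
    awk -F: '
    BEGIN{
        w1 = 11; w2 = 31; w3 = 16; w4 = 16
        # String concatenation builds "%-11s%-31s%-16s%-16s\n".
        format = "%-" w1 "s%-" w2 "s%-" w3 "s%-" w4 "s\n"
        printf(format, "User", "Name", "Home", "Shell")
    }
    {printf(format, $1, $5, $6, $7)}
    END{print "Total users = " NR}' "$@"
}

# One sample record from Figure 1.
printf '%s\n' 'bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh' | user_report
```

Calling "user_report /etc/passwd" produces the full report.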
So far all the examples I have given have been typed directly at the command line. You may also open a file with vi and type the lines exactly as given in Figure 17. Add an initial line that forces a Bourne or Korn shell to execute the commands, as in Figure 19, and save the file as userlist.
Figure 19
#!/bin/ksh
# (or /bin/sh)
awk '
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd
Change the execution privileges using:
chmod a+x userlist
and you now have a script that will display a user list any time you type "userlist" (or "./userlist" if the current directory is not in your PATH). You may also send the output to a file using redirection as in:
userlist >userlist.txt
or to a printer using one of the printer pipes such as:
userlist|lp
In Figure 19 I created a shell script that executed an awk command on a specific file. This is not a true awk script, but a shell script that executed awk. An awk script includes only the awk commands. Assume for a moment that for security reasons, a copy of the /etc/passwd file is saved every week, allowing a running record of who had access to the system at any time in the past. An awk script could be created by using only the awk commands in Figure 19. This would look like Figure 20. Save this file as userfmt.awk or some similar name to identify it as containing awk commands.
Figure 20
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}
To execute the awk script, use a -f switch to identify the awk script as in:
awk -f userfmt.awk /etc/passwd
Using this awk script you can process any earlier saved versions of the passwd file as in:
awk -f userfmt.awk /old/passwd.970404 >users_970404.txt
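If a whole directory of weekly copies has built up, a short shell loop can run the awk script over all of them. The sketch below assumes the copies are named passwd.DATE, following the example above. The function name report_all and the choice to write each report next to its snapshot are my own, for illustration only.

```shell
# report_all SCRIPT DIR: run an awk script over every DIR/passwd.* file,
# writing the report for passwd.DATE to DIR/users_DATE.txt.
report_all() {
    script=$1
    dir=$2
    for snapshot in "$dir"/passwd.*; do
        [ -f "$snapshot" ] || continue   # the pattern matched nothing
        suffix=${snapshot##*.}           # text after the last dot, e.g. 970404
        awk -f "$script" "$snapshot" > "$dir/users_$suffix.txt"
    done
}
```

With the files described above, it would be called as "report_all userfmt.awk /old".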
Believe it or not, these two articles only scratch the surface of awk. An excellent book on the subject is sed & awk, published by O'Reilly and Associates, Inc. (see Resources below). If you intend to pursue awk further I recommend the book strongly.
Resources
About the author
Mo Budlong is president of King Computer Services, Inc. and has been involved in Unix development on Sun and other platforms for over 15 years. King Computer Services, Inc. specializes in Unix and client/server consulting and training and currently publishes the COBOL Just In Time Course, a crash course for the Year 2000 problem.
Reach Mo at mo.budlong@sunworld.com.
URL: http://www.sunworld.com/swol-05-1997/swol-05-unix101.html