Awk in Shell Script
Awk has a reputation for being overly complex and difficult to use. Scripting languages such as the UNIX shell and specialty tools such as awk and sed have been a standard part of the UNIX landscape since UNIX became commercially available.
Indeed, both awk and sed are rather peculiar tools/languages. Both rely on traditional UNIX “regular expressions”, which are not trivial to learn. Both tools also seem to offer too many features, so understanding all of them and applying them confidently can take a while. However, a user can apply these tools quickly and efficiently once he or she understands their general usefulness and becomes familiar with a subset of their most useful features. Read on to learn more about using awk in shell scripts.
A WORD ABOUT AWK
Awk was named after its original developers: Aho, Weinberger, and Kernighan. Awk scripts are readily portable across all flavors of UNIX/Linux.
Awk is usually used to process structured textual data. It works well as part of a command-line filter sequence, since by default it expects its input from the standard input stream (stdin) and writes its output to the standard output stream (stdout). In some of the most effective applications, awk is used in concert with sed, each complementing the other’s strengths.
The following shell command scans the contents of a file called oldfile, changing all occurrences of the word “UNIX” to “Linux” and writing the resulting text to a file called newfile.
$ awk '{gsub(/UNIX/, "Linux"); print}' oldfile > newfile
Awk does not change the contents of the original file; it behaves as a stream editor, passively writing new content to an output stream. Although awk is commonly invoked from a parent shell script covering a grander scope, it can also be used directly from the command line to perform a single task, as just shown.
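As a minimal sketch of that stream behavior, the same substitution can be run with awk reading from a pipe rather than a named file. The scratch directory and the sample text are invented for illustration.

```shell
# Work in a scratch directory so no real files are touched (hypothetical data).
workdir=$(mktemp -d)
printf 'UNIX is old.\nI like UNIX tools.\n' > "$workdir/oldfile"

# awk reads stdin here, so it can sit in the middle of a pipeline.
first_line=$(head -n 1 "$workdir/oldfile" | awk '{ gsub(/UNIX/, "Linux"); print }')
printf '%s\n' "$first_line"

# The edited text only exists on stdout; the input file is unchanged.
unchanged=$(grep -c 'UNIX' "$workdir/oldfile")
printf 'lines still containing UNIX in oldfile: %s\n' "$unchanged"
```

The edited text appears only on standard output, so keeping it means redirecting it to a new file, as the earlier command does.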
Although awk has been employed to perform a variety of tasks, it is most suitable for parsing and manipulating textual data and generating formatted reports. A typical example application for awk is one where a lengthy system log file needs to be examined and summarized into a formatted report. Consider the log files generated by the sendmail daemon or the uucp program. These files are usually lengthy, boring, and generally hard on a system administrator’s eyes. An awk script can be employed to parse each entry, produce a set of counts, and flag those entries that represent suspicious activity.
Awk scripts are easy to read and are often several pages in length. The awk language offers the typical programming constructs expected in any high-level programming language. It has been described as an interpreted version of the C language, but although there are similarities, awk differs from C both semantically and syntactically.
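The log-summary idea can be sketched as follows. The four-field log layout (date, time, host, status) and the status names are assumptions made up for this example; they are not the real sendmail or uucp log formats.

```shell
# Hypothetical log: date, time, host, delivery status (invented for illustration).
workdir=$(mktemp -d)
cat > "$workdir/mail.log" <<'EOF'
2024-01-05 10:00:01 mailhost sent
2024-01-05 10:00:07 mailhost deferred
2024-01-05 10:01:12 relay2 sent
2024-01-05 10:02:30 relay2 rejected
EOF

report=$(awk '
    { count[$4]++ }                          # tally every entry by its status field
    $4 == "rejected" { print "FLAG: " $0 }   # flag entries that deserve a closer look
    END {
        # Print the counts in a fixed order so the report is reproducible.
        printf "%-10s %s\n", "STATUS", "COUNT"
        printf "%-10s %d\n", "sent", count["sent"]
        printf "%-10s %d\n", "deferred", count["deferred"]
        printf "%-10s %d\n", "rejected", count["rejected"]
    }
' "$workdir/mail.log")
printf '%s\n' "$report"
```

A real log would need a pattern matched to its actual layout, but the shape stays the same: count per record, flag selected records, print the summary at the end.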
AWK INVOCATION
At least two distinct methods can be used to invoke awk. The first includes the awk script in-line within the command line. The second allows the programmer to save the awk script to a file and refer to it on the command line.
awk '{ … }' [data-file-list …]
awk -Fc -f script_file [data-file-list …]
Notice that data-file-list is always optional, since by default awk reads from standard input. As a general rule, it is a good idea to keep the awk script in a separate file if it is of any significant size. The -F option sets the input field-delimiter character (shown as c above). The following are all valid ways of invoking awk at a shell prompt:
$ ls -l | awk -f script_file
$ awk -f script_file
$ awk -F: '{ print $2 }'
$ awk '{ print }' input_file
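As a rough sketch, both invocation methods can be tried on a small colon-delimited file. The file names users.txt and second_field.awk, and the sample records, are hypothetical.

```shell
# Scratch data: one colon-separated record per line (invented for illustration).
workdir=$(mktemp -d)
printf 'alice:/bin/sh\nbob:/bin/bash\n' > "$workdir/users.txt"

# Method 1: the awk script is given in-line on the command line.
inline=$(awk -F: '{ print $2 }' "$workdir/users.txt")

# Method 2: the same script is saved to a file and named with -f.
printf '{ print $2 }\n' > "$workdir/second_field.awk"
from_file=$(awk -F: -f "$workdir/second_field.awk" "$workdir/users.txt")

# With no data file listed, awk falls back to standard input.
from_stdin=$(awk -F: -f "$workdir/second_field.awk" < "$workdir/users.txt")

printf '%s\n' "$inline"
```

All three commands should produce the same two lines, since only the way the script and the data reach awk differs.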
Once the user acquires a thorough understanding of awk’s behavior, the complexity of the language syntax appears far less daunting. Awk offers a well-defined and useful process model: the programmer can define groups of actions to occur before any data processing is performed, while each input record is processed, and after all input data has been processed.
With these groups in mind, the basic layout of any awk script is as follows:
BEGIN { }
{ }
END { }
The code within the BEGIN section is executed by awk before it scans any of its input data. This section can be used to initialize user-defined variables or change the value of a built-in variable. If the script generates a formatted report, this is also the place to print a heading. The code within the END section is executed after all of the input data has been processed. Both the BEGIN and END sections are optional in an awk script.
The middle section is the implicit main input loop of an awk script, and it needs at least one action to do anything useful with the input. That action can be as simple as an unconditional print statement. The code in this section is executed once for each record in the input data. By default, the record delimiter is a line-feed character, so a record is a single line of text. The programmer can redefine the record delimiter by assigning a new value to the built-in variable RS.
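The three sections can be seen working together in a small report script. This is a sketch only: the item names and counts are invented, and the second command changes RS to a semicolon purely to illustrate a non-default record delimiter.

```shell
# BEGIN prints the heading, the main block runs once per record,
# and END prints the total after the last record (sample data is invented).
report=$(printf 'apples 3\npears 5\n' | awk '
    BEGIN { print "ITEM COUNT"; total = 0 }
          { print $1, $2; total += $2 }
    END   { print "TOTAL", total }
')
printf '%s\n' "$report"

# Redefining the record delimiter: with RS set to ";" each
# semicolon-separated chunk becomes its own record.
records=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$records"
```

Setting RS inside BEGIN matters here, because the assignment has to take effect before awk reads the first record.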
THE VERDICT
Scripting languages and specialty tools that allow fast development have been widely accepted for quite some time. Both awk and sed deserve a spot on any Linux developer’s and administrator’s workbench. Both tools are a standard part of any Linux platform. Together, awk and sed can be used to implement virtually any text-filtering application, such as performing repetitive edits to data streams and files. We hope the information provided in this article is useful and inspires you to begin or expand your use of these tools.