Name: AWK command for scanning and analysing a large file in Linux
Author: Pratik

AWK command for scanning and analysing a large file in Linux

awk - A command for pattern scanning and processing of a large file

The awk utility shall execute programs written in the awk programming language, which is specialized for textual data manipulation.
An awk program is a sequence of patterns and corresponding actions. When input is read that matches a pattern, the action associated with that pattern is carried out.
The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a string of non-blank characters. This default white-space field delimiter can be changed by using the FS built-in variable or the -F option, whose argument is an extended regular expression (ERE).
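
As a small illustration of the two ways to change the field separator, the sketch below splits colon-separated records once with -F and once with FS. The sample records and the temporary file are assumptions made for this example only.

```shell
# Hypothetical colon-separated sample data, written to a temporary file.
data=$(mktemp)
printf '%s\n' 'Shankar:624532:Hyderabad' 'Sagar:363533:Pune' > "$data"

# Option 1: set the separator on the command line with -F.
with_flag=$(awk -F ':' '{ print $3 }' "$data")

# Option 2: set the FS built-in variable in a BEGIN block,
# which runs before any input is read.
with_fs=$(awk 'BEGIN { FS = ":" } { print $3 }' "$data")

printf '%s\n' "$with_flag" "$with_fs"
rm -f "$data"
```

Both commands print the third field of every record, so the two results should be identical.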

Field Annotations

The awk utility shall denote the first field in a record $1, the second $2, and so on.
The symbol $0 shall refer to the entire record; setting any other field causes the re-evaluation of $0.
Assigning to $0 shall reset the values of all other fields and the NF built-in variable.
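
The field rules above can be tried on a single record. This is only a sketch: the replacement city "Mumbai" and the string "a b c" are made-up values used to show the effect.

```shell
# One input record with five whitespace-separated fields.
record='Sagar 363533 Pune Male 23'

# $1 is the first field, NF is the number of fields, $NF is the last field.
summary=$(printf '%s\n' "$record" | awk '{ print $1, NF, $NF }')

# Assigning to a field makes awk rebuild $0 from the fields.
rebuilt=$(printf '%s\n' "$record" | awk '{ $3 = "Mumbai"; print $0 }')

# Assigning to $0 splits the record again, so NF and the fields change.
resplit=$(printf '%s\n' "$record" | awk '{ $0 = "a b c"; print NF, $2 }')

printf '%s\n' "$summary" "$rebuilt" "$resplit"
```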

Syntax:
awk '/search pattern1/ {Actions} /search pattern2/ {Actions}' file

Examples:

Prompt:>cat employee.txt
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22

1. To print the entire file.
awk '{print;}' employee.txt

Prompt:>awk '{print;}' employee.txt
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22

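2. To print the lines containing 23, which here gives the employees of age 23. The pattern is matched against the whole line, so a 23 in any other column would match as well.
Prompt:>awk '/23/' employee.txt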
Sagar 363533 Pune Male 23
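
If only the age column should be tested, a stricter variant is to compare the 5th field directly. The sketch below recreates the employee.txt shown earlier in a temporary directory (the directory is an assumption of this example).

```shell
# Recreate the article's employee.txt in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/employee.txt" <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# Compare only the 5th field, so a "23" inside an EmpId cannot match.
age_23=$(awk '$5 == 23' "$dir/employee.txt")

printf '%s\n' "$age_23"
rm -rf "$dir"
```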

3. To print the lines containing 23 or 22, which here gives all employees of age 23 or 22.
Note:- Each search pattern is given on its own line. The '>' at the start of the second line is the shell's continuation prompt, not part of the command.
Prompt:>awk '/23/
> /22/' employee.txt
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
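
One thing to keep in mind with two separate patterns is that a line matching both is printed once per pattern. The sketch below shows this on a made-up record (the employee "Kiran" is hypothetical) and compares it with a single alternation pattern and with a test on the age field.

```shell
# Hypothetical record whose EmpId contains 22 and whose age is 23.
line='Kiran 622345 Delhi Male 23'

# Two separate patterns: the line matches both, so it is printed twice.
two_rules=$(printf '%s\n' "$line" | awk '/23/
/22/')

# One pattern with alternation: the line is printed once.
one_rule=$(printf '%s\n' "$line" | awk '/22|23/')

# Testing only the age field avoids accidental matches in other columns.
by_field=$(printf '%s\n' "$line" | awk '$5 == 22 || $5 == 23')

printf '%s\n' "$two_rules" "$one_rule" "$by_field"
```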

4. Print only the 2nd column of the file, i.e. only the employee IDs.
Prompt:>awk '{print $2}' employee.txt
EmpId
624532
624534
62543
363533
624531

5. Print only the 2nd and 4th columns of the file, i.e. only the employee IDs and sex.
Prompt:>awk '{print $2, $4}' employee.txt
EmpId sex
624532 Male
624534 Female
62543 Male
363533 Male
624531 Female

6. Print all the male employees, i.e. the lines whose 4th column matches 'Male'.
Prompt:>awk '$4~/Male/{print;}' employee.txt
Shankar 624532 Hyderabad Male 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23

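7. Print all the employees whose age is 23 or above. The header line is printed too: its 5th field, "age", is not a number, so awk compares it with 23 as a string, and "age" sorts after "23".
Prompt:>awk '$5>=23{print;}' employee.txt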
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
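
To keep the header line out of the result, one option is to skip the first record with the NR built-in variable. A minimal sketch, using a subset of the employee data written to a temporary file:

```shell
# A subset of employee.txt, including the header line.
data=$(mktemp)
printf '%s\n' 'Name EmpId City sex age' \
  'Shankar 624532 Hyderabad Male 24' \
  'Sagar 363533 Pune Male 23' \
  'Bhavani 624531 Lahore Female 22' > "$data"

# NR is the current record number; NR > 1 skips the header line.
no_header=$(awk 'NR > 1 && $5 >= 23' "$data")

printf '%s\n' "$no_header"
rm -f "$data"
```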

8. Print the number of employees who are in Hyderabad.
Prompt:>awk 'BEGIN {count=0;}
> $3~/Hyderabad/ {count++;}
> END {print "No of employees in Hyderabad=", count;}' employee.txt
No of employees in Hyderabad= 1
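
The same BEGIN/END counting idea extends to counting every value of a column at once by using an associative array. The sketch below counts employees per value of the 4th column; the array name count is arbitrary, and the output is piped through sort because awk does not guarantee the order of a for-in loop.

```shell
# Recreate the article's employee.txt in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# Skip the header, count each distinct value of column 4,
# then print one "value count" line per key in the END block.
counts=$(awk 'NR > 1 { count[$4]++ }
END { for (s in count) print s, count[s] }' "$data" | LC_ALL=C sort)

printf '%s\n' "$counts"
rm -f "$data"
```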

9. Remove duplicate lines from a file

$ awk '!($0 in array) { array[$0]; print }' filename

The above command stores each line as a key in an array the first time it is seen and prints it. When the same line appears again, it is already in the array, so it is not printed.

shankys@local:> cat test
first
second
first
third
third
shankys@local:> awk '!($0 in array) { array[$0];print}' test
first
second
third
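
A shorter form of the same idea is often written with a counter. In this sketch, seen is an arbitrary array name: seen[$0]++ evaluates to 0 the first time a line appears, so the negated pattern is true and the line is printed by the default action.

```shell
# Same input as the test file above, fed through a pipe.
unique=$(printf '%s\n' first second first third third | awk '!seen[$0]++')

printf '%s\n' "$unique"
```

Like the array version, this keeps the first occurrence of each line and preserves the original order.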

10. Print all lines from /etc/passwd that have the same uid and gid

$ awk -F ':' '$3==$4' passwd.txt

Here we check whether the 3rd column (uid) matches the 4th column (gid); if it does, the line is printed. ":" is used as the field separator. passwd.txt is assumed to be a copy of /etc/passwd.
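
To try this without touching the real /etc/passwd, the sketch below runs the same comparison on three made-up entries in passwd format and prints only the user name ($1) of each match.

```shell
# Hypothetical passwd-style entries: name:password:uid:gid:comment:home:shell
same_ids=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'games:x:5:60:games:/usr/games:/usr/sbin/nologin' |
  awk -F ':' '$3 == $4 { print $1 }')

printf '%s\n' "$same_ids"
```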

11. Read the AWK program from a program file and run it on a target file (-f option).

-f program-file
Read the AWK program from the file program-file instead of from the first command line argument, and run it on the target file. Multiple -f (or --file) options may be used.

awk -f awkfile targetfile

Here, awkfile contains:

#! /usr/bin/awk -f
BEGIN{ FS= ";" }

$2 == "shankar" {arr[$3]++}

END{
        for(idx in arr) {
                print "./somescript " idx
        }
}

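So the above awk program scans targetfile with ";" as the field separator and, whenever the second column is exactly "shankar", uses the 3rd column $3 as a key in the array arr. At the end, for each distinct key in the array, it prints a command line that calls the script somescript with that value as its argument. The commands are only printed, not executed; to run them, the output can be piped to a shell, for example: awk -f awkfile targetfile | sh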
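
If the commands should be run from inside awk rather than printed, the built-in system() function is one way to do it. The sketch below is an assumption-laden variant of the program above: the input rows are made up, and echo stands in for ./somescript so that it is safe to run anywhere.

```shell
# Hypothetical semicolon-separated input; column 2 is a name, column 3 a key.
data=$(mktemp)
printf '%s\n' 'r1;shankar;alpha' 'r2;other;beta' \
  'r3;shankar;alpha' 'r4;shankar;gamma' > "$data"

# system() runs its argument as a shell command. "echo processing" is a
# stand-in for ./somescript. Output is sorted because the order of a
# for-in loop is not defined.
ran=$(awk 'BEGIN { FS = ";" }
$2 == "shankar" { arr[$3]++ }
END { for (idx in arr) system("echo processing " idx) }' "$data" | LC_ALL=C sort)

printf '%s\n' "$ran"
rm -f "$data"
```

Because the array key is pasted into a shell command line unquoted, this approach is only reasonable when the values in the 3rd column are trusted.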

Category: Open System-Linux | Added by: shanky | Tags: how to get a column from a large fi, file scanning in linux, awk, awk command in linux, how to use awk/gawk
