AWK command for scanning and analysing a large file in Linux
Posted: 17 March 2014, 5:40 PM
awk - A command for pattern scanning and processing of a large file
 
  1. The awk utility executes programs written in the awk programming language, which is specialized for textual data manipulation.
  2. An awk program is a sequence of patterns and corresponding actions. When an input line matches a pattern, the action associated with that pattern is carried out.
  3. awk interprets each input record (by default, each line) as a sequence of fields. By default a field is a string of non-blank, non-newline characters, so fields are separated by runs of spaces or tabs. This default field separator can be changed with the FS built-in variable or with the -F option (for example -F ':').
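A minimal sketch of changing the field separator, using a made-up colon-separated line in the style of /etc/passwd (the input is an assumption for illustration). The -F option and an assignment to FS in a BEGIN block have the same effect; the last command shows the default splitting on runs of blanks:

```shell
# Hypothetical colon-separated record, used only for illustration
line='root:x:0:0:root:/root:/bin/sh'

# -F sets the field separator on the command line
uid=$(printf '%s\n' "$line" | awk -F ':' '{ print $3 }')

# Assigning FS in a BEGIN block has the same effect
uid2=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $3 }')

# Default splitting: runs of blanks separate fields, leading blanks are ignored
second=$(printf '   one   two  three\n' | awk '{ print $2 }')

echo "$uid $uid2 $second"    # prints: 0 0 two
```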
     

Field Annotations

  • awk denotes the first field in a record $1, the second $2, and so on.
  • $0 refers to the entire record; assigning to any other field causes $0 to be rebuilt.
  • Assigning to $0 resets the values of all other fields and the NF built-in variable (the number of fields).
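A short sketch of the two rules above, on made-up input: assigning to a field rebuilds $0 using the output field separator OFS (a single space by default), so runs of blanks in the original line are not preserved; assigning to $0 re-splits the record and updates NF:

```shell
# Setting $2 rebuilds $0 with OFS; the double spaces collapse to single spaces
rebuilt=$(echo 'a  b  c' | awk '{ $2 = "X"; print }')

# Assigning $0 re-splits the record, so NF and $2 change
nf=$(echo 'a b c' | awk '{ $0 = "one two"; print NF, $2 }')

echo "$rebuilt"    # prints: a X c
echo "$nf"         # prints: 2 two
```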


Syntax:
awk '/search pattern1/ {Actions} /search pattern2/ {Actions}' file
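A minimal sketch of a program with two pattern/action pairs, run on the employee.txt sample used in the examples (recreated here so the snippet runs on its own). Every input line is tested against each pattern in turn, and the END action runs once after the last line:

```shell
# Recreate the sample file from the examples
cat > employee.txt <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# Two independent pattern/action pairs; the regex match is case-sensitive,
# so "Female" does not match /Male/
counts=$(awk '/Male/ { m++ } /Female/ { f++ } END { print m, f }' employee.txt)
echo "$counts"    # prints: 3 2
```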



Examples:

Prompt:>cat employee.txt
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22



1. Print the entire file. An action without a pattern runs for every line, and print with no arguments prints the whole line ($0).
Prompt:>awk '{print;}' employee.txt
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22


2. Print the lines containing 23, which in this file are the employees of age 23. A pattern without an action prints each matching line.
Prompt:>awk '/23/' employee.txt
Sagar 363533 Pune Male 23
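Because /23/ matches "23" anywhere on the line, an employee whose ID happened to contain 23 would be printed as well. A sketch of a stricter version, assuming the age is always the 5th field, compares that field directly; the extra record Ravi 623456 is hypothetical and only illustrates the difference:

```shell
cat > employee.txt <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# Compare only the 5th field (age); the header's "age" is not equal to 23
age23=$(awk '$5 == 23' employee.txt)
echo "$age23"    # prints: Sagar 363533 Pune Male 23

# Hypothetical record whose ID contains "23": the regex matches, the field test does not
extra='Ravi 623456 Pune Male 30'
by_regex=$(echo "$extra" | awk '/23/')
by_field=$(echo "$extra" | awk '$5 == 23')
```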

 

3. Print the lines containing 23 or 22, i.e. all employees of age 23 or 22.
Note: the two patterns are on separate lines; the '>' at the start of the second line is the shell's continuation prompt, not part of the awk program.
Prompt:>awk '/23/
> /22/' employee.txt
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
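Joining the two conditions with || in a single pattern prints each line at most once; with two separate patterns, a line that matches both is printed twice. A short sketch of both cases, using a made-up input line that contains both numbers for the second one:

```shell
cat > employee.txt <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# One pattern with two alternatives: each matching line is printed once
young=$(awk '$5 == 22 || $5 == 23' employee.txt)
echo "$young"

# Two separate patterns: a line containing both 23 and 22 is printed twice
twice=$(echo '22 23' | awk '/23/
/22/')
echo "$twice"
```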

 

4. Print only the 2nd column of the file, i.e. the employee IDs.
Prompt:>awk '{print $2}' employee.txt
EmpId
624532
624534
62543
363533
624531


5. Print only the 2nd and 4th columns of the file, i.e. the employee IDs and sex.
Prompt:>awk '{print $2, $4}' employee.txt
EmpId sex
624532 Male
624534 Female
62543 Male
363533 Male
624531 Female
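Two small variations on the column examples, sketched on the same sample: NR (the current record number) can be used to skip the header line, and setting OFS in a BEGIN block changes the separator that the comma in print inserts (a comma here, giving CSV-like output):

```shell
cat > employee.txt <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# NR > 1 skips the first line (the header)
ids=$(awk 'NR > 1 { print $2 }' employee.txt)
echo "$ids"

# OFS is what print places between comma-separated arguments
csv=$(awk 'BEGIN { OFS = "," } NR > 1 { print $2, $4 }' employee.txt)
echo "$csv"
```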

 

6. Print all the male employees, i.e. lines whose 4th column matches 'Male' (~ tests a field against a regular expression).
Prompt:>awk '$4~/Male/{print;}' employee.txt
Shankar 624532 Hyderabad Male 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
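Since ~ is a substring regular-expression match, /Male/ would also accept a value that merely contains Male. A sketch of an exact string comparison instead, plus a demonstration of the substring behaviour: a lowercase /male/ selects the female employees, because "Female" contains "male" while "Male" (capital M) does not:

```shell
cat > employee.txt <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# Exact string comparison on the 4th field
males=$(awk '$4 == "Male" { print $1 }' employee.txt)
echo "$males"

# Unanchored, case-sensitive regex: "male" occurs inside "Female" only
by_substring=$(awk '$4 ~ /male/ { print $1 }' employee.txt)
echo "$by_substring"
```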

 

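7. Print all the employees whose age is greater than or equal to 23. Note that the header line is printed too: "age" is not a number, so $5>=23 is evaluated as a string comparison, and "age" sorts after "23".
Prompt:>awk '$5>=23{print;}' employee.txt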
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
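To leave the header out, one option, sketched here, is to test the record number before the age, so only data lines reach the numeric comparison:

```shell
cat > employee.txt <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# NR > 1 drops the header line before $5 is compared
older=$(awk 'NR > 1 && $5 >= 23' employee.txt)
echo "$older"

# The header on its own: "age" >= 23 is a string comparison and is true
hdr=$(echo 'Name EmpId City sex age' | awk '$5 >= 23 { print "matched" }')
echo "$hdr"    # prints: matched
```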

 

8. Print the number of employees who are in Hyderabad. The BEGIN action runs before the first line is read, and the END action runs after the last one.
Prompt:>awk 'BEGIN {count=0;}
> $3~/Hyderabad/ {count++;}
> END {print "No of employees in Hyderabad=", count;}' employee.txt
No of employees in Hyderabad= 1
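The same counting idea extends to every city at once with an associative array keyed by the 3rd field. A sketch on the sample file; since a for (key in array) loop visits keys in an unspecified order, the per-city output is piped through sort:

```shell
cat > employee.txt <<'EOF'
Name EmpId City sex age
Shankar 624532 Hyderabad Male 24
Priyanka 624534 Bangalore Female 24
Anirudh 62543 Chennai Male 24
Sagar 363533 Pune Male 23
Bhavani 624531 Lahore Female 22
EOF

# Only Hyderabad; "n + 0" prints 0 instead of an empty string when nothing matches
hyd=$(awk '$3 == "Hyderabad" { n++ } END { print n + 0 }' employee.txt)
echo "$hyd"    # prints: 1

# Every city, skipping the header line
percity=$(awk 'NR > 1 { count[$3]++ } END { for (c in count) print c, count[c] }' employee.txt | sort)
echo "$percity"
```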

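9. Remove duplicate lines from a file

$ awk '!($0 in array) { array[$0]; print }' filename

Each line is printed only the first time it appears. On first sight, $0 is not yet a key in array, so the pattern is true; referencing array[$0] in the action then creates that key, so later copies of the same line fail the test and are not printed. Unlike sort -u or uniq, this keeps the original line order and also removes duplicates that are not adjacent.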

shankys@local:> cat test
first
second
first
third
third

shankys@local:> awk '!($0 in array) { array[$0];print}' test
first
second
third
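A common shorter form of the same technique uses a post-increment: seen[$0]++ evaluates to 0 the first time a line is seen (so !0 is true and the line is printed) and to a positive count afterwards. The array name seen is arbitrary:

```shell
# Same sample input as above
printf '%s\n' first second first third third > test

uniq_lines=$(awk '!seen[$0]++' test)
echo "$uniq_lines"
```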


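10. Print all lines from /etc/passwd that have the same UID and GID

$ awk -F ':' '$3==$4' passwd.txt

Here we check whether the 3rd field (the UID) equals the 4th field (the GID); if it does, the line is printed, because a pattern with no action prints the whole line. -F ':' sets the field separator to a colon. To check the live file, pass /etc/passwd in place of passwd.txt.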
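A sketch with a few made-up passwd-style lines (hypothetical accounts, not taken from a real system), printing only the user name ($1) of each entry whose UID equals its GID:

```shell
# Hypothetical passwd-style entries for illustration
cat > passwd.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:100:Bob:/home/bob:/bin/bash
EOF

# Fields 3 and 4 look numeric, so they are compared as numbers
same=$(awk -F ':' '$3 == $4 { print $1 }' passwd.txt)
echo "$same"
```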

11. Read the AWK program from a file and run it on a target file (-f option).

-f program-file
      Read the AWK program from the file program-file instead of from the first command-line argument, and apply it to the target file. Multiple -f (or, in gawk, --file) options may be used.

awk -f awkfile targetfile

Here awkfile contains:

#! /usr/bin/awk -f
BEGIN{ FS= ";" }

$2 == "shankar" {arr[$3]++}

END{
        for(idx in arr) {
                print "./somescript " idx
        }
}

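The program sets ';' as the field separator. For each line of targetfile whose 2nd field is exactly shankar, it uses the 3rd field ($3) as a key in the array arr, counting occurrences, so each distinct value is stored once. In the END block it prints a line ./somescript <value> for each distinct key. Note that awk only prints these command lines; somescript is not executed unless the output is piped to a shell (awk -f awkfile targetfile | sh) or print is replaced with a call to system().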
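A runnable sketch of this -f workflow, using a made-up semicolon-separated targetfile (the rows and the values alpha, beta, gamma are hypothetical). The for (idx in arr) loop visits keys in an unspecified order, so the output is sorted here to make it predictable:

```shell
# Program file with the same logic as awkfile above
cat > awkfile <<'EOF'
BEGIN { FS = ";" }
$2 == "shankar" { arr[$3]++ }
END {
    for (idx in arr)
        print "./somescript " idx
}
EOF

# Hypothetical semicolon-separated target file
cat > targetfile <<'EOF'
1;shankar;alpha
2;priyanka;beta
3;shankar;gamma
4;shankar;alpha
EOF

# "alpha" appears twice for shankar but is emitted once; "beta" belongs to priyanka
cmds=$(awk -f awkfile targetfile | sort)
echo "$cmds"
```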

 
 

Category: Open System-Linux | Added by: shanky | Tags: how to get a column from a large fi, file scanning in linux, awk, awk command in linux, how to use awk/gawk

