Unix Command to get Common lines in two files

Posted: October 12, 2011 in Unix Commands

I have a friend who keeps bugging me for small scripting solutions, and I always enjoy helping him, because every time I learn something new about Unix. But I never thought of writing these things down anywhere. Then I thought, why not put them on my own blog, right here! 🙂

His most recent question: how do you capture the entries that are common to two files?
Say file1 contains the following entries:

And file2 contains:

How do you find the common lines? Simple: use the comm command.
$ comm -12 file1 file2
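Since the original entries aren't shown here, this is a minimal sketch with hypothetical sample files (the fruit names are my own, not the entries from the example above). It runs in a scratch directory so no real file1/file2 gets overwritten.

```shell
#!/bin/sh
# Work in a throwaway directory so existing files are not overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical, already-sorted sample data (comm needs sorted input).
printf 'apple\nbanana\ncherry\n' > file1
printf 'banana\ncherry\ndate\n'  > file2

# -12 suppresses columns 1 and 2, leaving only the lines found in both files.
comm -12 file1 file2   # prints banana and cherry, one per line
```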

Smooth, isn’t it? 🙂
But what if the files are not sorted? Then comm can’t help you on its own; you need the help of another command, sort.

Suppose the files above are not sorted. Then:
$ sort file1 > new_file1
$ sort file2 > new_file2
$ comm -12 new_file1 new_file2

would give the same result as above.
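If you do this often, the sort-then-comm steps can be wrapped up so the intermediate files don't pile up. A minimal sketch, assuming a hypothetical helper called common_lines (my name, not a standard command) and hypothetical unsorted sample data:

```shell
#!/bin/sh
# common_lines is a hypothetical helper: it sorts both files into
# temporary files, prints the lines they share, then deletes the temps.
common_lines() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    sort "$1" > "$tmp1"
    sort "$2" > "$tmp2"
    comm -12 "$tmp1" "$tmp2"
    rc=$?                      # keep comm's exit status
    rm -f "$tmp1" "$tmp2"
    return $rc
}

# Hypothetical unsorted sample data, written to a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'cherry\napple\nbanana\n' > file1
printf 'date\nbanana\ncherry\n'  > file2

common_lines file1 file2       # prints banana and cherry
```

In bash, ksh or zsh, process substitution does the same without any temporary files: comm -12 <(sort file1) <(sort file2). That syntax isn't available in plain POSIX sh, which is why the sketch sticks to mktemp.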

His next question was ‘what is the -12 in the command for?’
I thought, ‘can’t he just have a look at the man page for comm?’

Anyway, here is an extract from the manual page of the comm Unix command, for your reference:

comm [-123] file1 file2

The comm utility will read file1 and file2, which should be ordered in the current collating sequence, and produce three text columns as output: lines only in file1; lines only in file2; and lines in both files.
If the input files were ordered according to the collating sequence of the current locale, the lines written will be in the collating sequence of the original lines. If not, the results are unspecified.
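That collating-sequence point bites in practice: sort and comm have to agree on the ordering. One way to make sure they do is to run both under the same locale, for example the C (plain byte order) locale. A minimal sketch, assuming hypothetical mixed-case data:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical mixed-case data. Under LC_ALL=C, uppercase letters sort
# before all lowercase ones, which many other locales do not do.
printf 'banana\nApple\ncherry\n' > file1
printf 'Cherry\napple\nbanana\n' > file2

# Use the same locale for sort and comm so they agree on the order.
LC_ALL=C sort file1 > new_file1
LC_ALL=C sort file2 > new_file2

# The comparison is exact, so "Apple" and "apple" count as different lines.
LC_ALL=C comm -12 new_file1 new_file2   # prints only: banana
```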

The following options are supported:
-1    Suppress the output column of lines unique to file1.
-2    Suppress the output column of lines unique to file2.
-3    Suppress the output column of lines duplicated in file1 and file2.
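So -12 simply hides columns 1 and 2, leaving column 3 (the common lines). To see what the other combinations do, here is a small sketch on the same kind of hypothetical sorted sample data:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sorted sample data.
printf 'apple\nbanana\ncherry\n' > file1
printf 'banana\ncherry\ndate\n'  > file2

# No options: all three columns. Column 2 lines start with one tab,
# column 3 lines with two tabs.
comm file1 file2

comm -23 file1 file2   # only lines unique to file1: apple
comm -13 file1 file2   # only lines unique to file2: date
comm -3 file1 file2    # lines unique to either file (date keeps its tab)
```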

The following operands are supported:
file1: A path name of the first file to be compared. If file1 is -, the standard input is used.
file2: A path name of the second file to be compared. If file2 is -, the standard input is used.
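That "-" operand means one of the two sort steps can be piped straight into comm instead of going through new_file1. A minimal sketch, again with hypothetical data where only file1 is unsorted:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf 'cherry\napple\nbanana\n' > file1   # hypothetical, unsorted
printf 'banana\ncherry\ndate\n'  > file2   # hypothetical, already sorted

# "-" tells comm to read its first input from the pipe,
# i.e. the sorted lines of file1.
sort file1 | comm -12 - file2              # prints banana and cherry
```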

If file1, file2, and file3 each contained a sorted list of utilities:

$ comm -23 file1 file2 | comm -23 - file3
would print a list of utilities in file1 not specified by either of the other files;

$ comm -12 file1 file2 | comm -12 - file3
would print a list of utilities specified by all three files;

$ comm -12 file2 file3 | comm -23 - file1
would print a list of utilities specified by both file2 and file3, but not specified in file1.
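To try those three pipelines yourself, here is a sketch with hypothetical, already-sorted lists of utility names (my own sample lists, not from the man page):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sorted lists of utilities.
printf 'awk\ncomm\ngrep\nsed\n' > file1
printf 'comm\ngrep\nsort\n'     > file2
printf 'comm\nsort\ntr\n'       > file3

comm -23 file1 file2 | comm -23 - file3   # only in file1: awk, sed
comm -12 file1 file2 | comm -12 - file3   # in all three: comm
comm -12 file2 file3 | comm -23 - file1   # in file2 and file3, not file1: sort
```

Each pipeline works because comm's output keeps the sorted order of its input, so it can be fed straight into the next comm.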

  1. wow…Thanks for the info gangu 🙂
