1

I have a file (file1.txt) with text as:

aaa,,,,,
aaa,10001781,,,,
aaa,10001782,,,,
bbb,10001783,,,,

My file2 contents are:

11111111
10001781
11111222

I need to look up the second field of each line of file1 in file2 and delete the line from file1 if it matches. So the output will be:

aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,

Can I use grep and cut commands for this?

2 Answers

8

This prints lines from file1.txt only if the second field is not in file2:

$ awk -F, 'FNR==NR{a[$1]=1; next;} !a[$2]' file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,

How it works

This works by reading file2 and recording every line seen as a key in the associative array a. Then each line of file1.txt is printed only if its second column is not a key in a. In more detail:

  • FNR==NR{a[$1]=1; next;}

    When reading file2, set a[$1] to 1 to signal that we have seen the value on this line. We then instruct awk to skip the rest of the commands and start over on the next line.

    This section is only run for file2 because file2 is listed first on the command line and FNR==NR only when we are reading the first file listed on the command line. This is because FNR is the number of lines read from the current file and NR is the total number of lines read so far. These two are equal only for the first file.

  • !a[$2]

    When reading file1.txt, a[$2] evaluates to true if column 2 was seen in file2. Since ! is negation, !a[$2] evaluates to true when column 2 was not seen. When this evaluates to true, the line is printed.
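To make the FNR==NR test concrete, here is a small sketch that prints both counters for every line read. The scratch directory from mktemp, the variable fnr_demo and the "first"/"later" labels are assumptions made for this demo; the data is a shortened copy of the sample files from the question.

```shell
# Scratch copies of the sample data (the temp directory is an assumption for this demo).
tmp=$(mktemp -d)
printf '11111111\n10001781\n11111222\n' > "$tmp/file2"
printf 'aaa,,,,,\naaa,10001781,,,,\n' > "$tmp/file1.txt"

# For each line, print the file name, NR (lines read overall) and
# FNR (lines read in the current file), plus a label for the FNR==NR test.
fnr_demo=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, (FNR==NR ? "first" : "later") }' file2 file1.txt)
printf '%s\n' "$fnr_demo"
```

The three file2 lines are labelled "first" and the file1.txt lines "later", because NR keeps counting (4, 5) while FNR restarts at 1. One caveat: if the first file is empty, FNR==NR also holds while reading the second file, so the idiom quietly assumes file2 has at least one line.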

Alternative

This is the same logic, expressed in a slightly different style, as suggested in the comments by Tom Fenech:

$ awk -F, 'FNR==NR{a[$1]; next;} !($2 in a)' file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
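
The two styles print the same lines, but they differ in one side effect: merely referencing a[$2] creates that element in the array, while $2 in a only tests for it. The sketch below counts the keys left in a after each variant has run; the temp directory, the kept counter and the count_keys helper string are assumptions added for illustration and are not part of the answer's commands.

```shell
# Scratch copies of the sample data from the question.
tmp=$(mktemp -d)
printf '11111111\n10001781\n11111222\n' > "$tmp/file2"
printf 'aaa,,,,,\naaa,10001781,,,,\naaa,10001782,,,,\nbbb,10001783,,,,\n' > "$tmp/file1.txt"

# END block that counts how many keys the array a holds once both files are read.
count_keys='END { n = 0; for (k in a) n++; print n }'

# Variant 1: the lookup a[$2] creates an entry for every second field it tests.
keys_lookup=$(awk -F, 'FNR==NR{a[$1]=1; next} !a[$2]{kept++} '"$count_keys" "$tmp/file2" "$tmp/file1.txt")

# Variant 2: "$2 in a" tests membership without creating anything.
keys_in=$(awk -F, 'FNR==NR{a[$1]; next} !($2 in a){kept++} '"$count_keys" "$tmp/file2" "$tmp/file1.txt")

echo "keys after !a[\$2]:    $keys_lookup"
echo "keys after \$2 in a:   $keys_in"
```

On the sample data the first variant ends with 6 keys (the 3 from file2 plus the empty string, 10001782 and 10001783), the second with 3. For a one-off filter this rarely matters; it may become relevant when file1.txt is very large or when the array is reused later in the script.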

5 Comments

Thanks for your reply. As jurgemaister pointed out, all the commands on my Solaris box resolve to the old versions in '/usr/bin/'. Hence I get the error "awk: syntax error near line 1, awk: bailing out near line 1".
Try running which -a awk gawk. See if you have a newer awk installed.
@user1768029 OK. In place of the default awk, try running nawk or /usr/xpg4/bin/awk or /usr/xpg6/bin/awk.
I would probably have gone with simply setting the key in the array with a[$1]; and then testing $2 in a, but either way works. Nice explanation anyway.
@TomFenech Very good; I added code with your approach to the answer.
1

Solution with grep

$ grep -vf file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,

John1024's awk solution would be faster for large files, though.

9 Comments

I get the error "grep: illegal option -- f". I am using the bash shell. Can we use this command in bash?
Which OS and which version of grep do you use?
SunOS boc02 5.10 Generic_144488-12 sun4u sparc SUNW,Sun-Fire-V240
Or just update your PATH to have /usr/xpg4/bin before /usr/bin. Usually this is set in your .profile or similar.
This assumes that the field cannot occur as a substring of an unwanted field, and that it will not occur in a column other than the one you intended to search. For these reasons, the Awk solution is much superior.
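
A quick sketch of both failure modes described in the comment above. The rows bbb,910001781 and ccc,5,10001781 are hypothetical data invented for this demo (they are not in the question's file1.txt), as are the temp directory and the variable names.

```shell
# Hypothetical data: 10001781 appears as a whole second field, as a
# substring of a longer second field, and as a whole third field.
tmp=$(mktemp -d)
printf 'aaa,10001781,,,,\nbbb,910001781,,,,\nccc,5,10001781,,,\n' > "$tmp/file1.txt"
printf '10001781\n' > "$tmp/file2"

# grep -vf drops every line that contains the pattern anywhere.
# "|| true" because grep exits non-zero when it prints nothing.
grep_out=$(grep -vf "$tmp/file2" "$tmp/file1.txt" || true)

# awk compares the pattern against the whole second field only.
awk_out=$(awk -F, 'FNR==NR{a[$1]; next} !($2 in a)' "$tmp/file2" "$tmp/file1.txt")

printf 'grep kept:\n%s\n' "$grep_out"
printf 'awk kept:\n%s\n' "$awk_out"
```

Here grep keeps nothing, since all three lines contain the string somewhere, while awk removes only the first line and keeps the bbb and ccc rows.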