
I have a file with some contents as follows -

[1412272372] SERVICE ALERT: abc.com;value;WARNING;HARD;3;WARNING: 2014-09-14
[1412272412] SERVICE ALERT: def.com;value;WARNING;HARD;3;WARNING: 2014-09-14
[1412272432] SERVICE ALERT: fgh.com;value;WARNING;HARD;3;WARNING: 2014-09-14
[1412272442] SERVICE ALERT: fgh.com;value;WARNING;HARD;3;CRITICAL: 2014-09-14

From this, I want to grep out just the site name and the date, then save the result to a new file. The new file should then be as follows -

abc.com - 2014-09-14
def.com - 2014-09-14
fgh.com - 2014-09-14

Any help would be appreciated.

Thanks in advance.

  • Grep won't help... try awk or sed. Commented Oct 3, 2014 at 18:02
  • So you want a list of unique site + date entries? You have one fgh.com entry for 2014-09-14 in the output, despite there being two lines in the input. Commented Oct 3, 2014 at 18:02
  • @Abhi: grep alone may not be enough, but grep + cut + sort can be very powerful. The beauty of Unix: lots of little programs that each do one thing very well can be combined to achieve complex tasks! Commented Oct 3, 2014 at 18:23
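For what it's worth, here is one shape that combination can take on this data (a sketch, assuming the log is in file, GNU grep for -oE, and a shell with process substitution such as bash; a small sed inserts the dash):

# site names: 4th space-separated field, up to the first semicolon
# dates: the YYYY-MM-DD token at the end of each line
paste -d' ' <(cut -d' ' -f4 file | cut -d';' -f1) <(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}$' file) | sed 's/ / - /' | sort -u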

6 Answers

sed -E 's/.*: ([^;]*);.*: (.*)/\1 - \2/' file | uniq

Output:

abc.com - 2014-09-14
def.com - 2014-09-14
fgh.com - 2014-09-14
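For readers less fluent in sed, a commented restatement of the same command (behavior unchanged; the notes describe how the regex divides the first sample line):

# '.*: '    matches up through "SERVICE ALERT: " - it cannot reach the
#           final ": " because the next group must be semicolon-free and
#           still be followed by a ";"
# ([^;]*)   capture 1: the site name ("abc.com")
# ';.*: '   skips the middle fields, ending at the ": " before the date
# (.*)      capture 2: the date ("2014-09-14")
sed -E 's/.*: ([^;]*);.*: (.*)/\1 - \2/' file | uniq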

Or something unusual, with the same output:

tr -s ":; " ":" < file | cut -d : -f 4,10 --output-delimiter=" - " | uniq
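To see why fields 4 and 10 are the right ones, run the tr step on its own (GNU tr pads the second set with its last character, so ';' and ' ' also become ':'); the first sample line collapses to:

$ tr -s ":; " ":" < file | head -n 1
[1412272372]:SERVICE:ALERT:abc.com:value:WARNING:HARD:3:WARNING:2014-09-14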

1 Comment

Thank you. I used this and got the desired output.

Just to add to the pile-on... you can solve this using arrays within AWK as well:

awk -F'[:;]' '{arr[$2," -",$8]++}END{for (a in arr) print a}' <file>

This splits on semicolons and colons, then uses fields 2 and 8 as an array key, and finally iterates through the array with the for loop, printing each key it finds; since array keys are unique, only unique values come through.

3 Comments

This works quite nicely, but doesn't start producing output until all the data has been read. It probably doesn't matter, but it is possible to do the printing as the data arrives, which makes for a smoother flow if the files are huge (the awk script isn't a bottleneck).
@Jonathan. I just realized your script is essentially the same thing without waiting for the file to be completely read by awk before hitting the array. I thought Seen was some cool awk functionality I hadn't run into yet, and now realize it's an array... sometimes I'm a little slow.
It just occurred to me; the output from this includes the SUBSEP character, which is '\x1C' by default. I suppose the neat trick here would be to set SUBSEP to ' ' (space) and use arr[$2,"-"$8] to get the required format.
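For completeness, a sketch of that last suggestion, except that it builds the key as one plain string rather than changing SUBSEP, so no \x1C can appear in the output (note that for (a in arr) returns keys in an unspecified order):

# squeeze runs of ':', ';', ' ' so the site is $4 and the date is $NF
awk -F'[:; ]+' '{arr[$4 " - " $NF]++} END {for (a in arr) print a}' file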
awk -F'[:;]' '{if (seen[$2,$NF]++ == 0) print $2 " -" $NF}' data

This only prints the site if it has not been seen before for that date. Thus, it produces:

 abc.com - 2014-09-14
 def.com - 2014-09-14
 fgh.com - 2014-09-14

The output includes a blank at the start of the site name. If you want that eliminated too, then you need to go for:

awk -F'[:; ]' '{if (seen[$5,$NF]++ == 0) print $5 " - " $NF}' data

There's an empty field between each occurrence of colon and space. That produces:

abc.com - 2014-09-14
def.com - 2014-09-14
fgh.com - 2014-09-14

(which is, admittedly, very similar to the previous output).
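To see those empty fields for yourself, a quick diagnostic (not part of the solution) is to dump a couple of fields from the first line:

$ awk -F'[:; ]' '{print "4=[" $4 "] 5=[" $5 "]"; exit}' data
4=[] 5=[abc.com]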

You could eliminate the empty fields by using a repeatable delimiter:

awk -F'[:; ]+' '{if (seen[$4,$NF]++ == 0) print $4 " - " $NF}' data

This has the same output as the previous script.


$ awk -F'[:;]' '{print $2 " -" $NF}' data
 abc.com - 2014-09-14
 def.com - 2014-09-14
 fgh.com - 2014-09-14
 fgh.com - 2014-09-14

Explanation:

  • -F'[:;]'

    The peculiar part of your data is that the fields are sometimes separated by a colon and sometimes by a semicolon. With the -F option, we tell awk to accept either character as a field separator.

  • print $2 " -" $NF

    This prints the output. $2 refers to the second field which is the site name. The date is the last field which is signified by $NF.
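If you are ever unsure which number a field has, a quick check (a diagnostic, not part of the answer) is to print the numbered fields of the first line:

$ awk -F'[:;]' '{for (i = 1; i <= NF; i++) print i, $i; exit}' data

which shows the site in field 2 and the date in field 8, the last field (NF is 8 here).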

Keeping only unique results

$ awk -F'[:;]' '{print $2 " -" $NF}' data | sort -u
 abc.com - 2014-09-14
 def.com - 2014-09-14
 fgh.com - 2014-09-14



You could try the awk command below:

$ awk -F'[:; ]' '{print $5" - "$12}' file
abc.com - 2014-09-14
def.com - 2014-09-14
fgh.com - 2014-09-14
fgh.com - 2014-09-14
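As with the other answers, append sort -u if only the unique pairs are wanted:

$ awk -F'[:; ]' '{print $5" - "$12}' file | sort -u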



Without awk:

$ grep 'WARNING:' file.log |
    cut --delimiter=":" --output-delimiter=";" --fields=2,3 |
    cut --delimiter=";" --output-delimiter=" -" --fields=1,7

It is more verbose than awk but IMHO more readable as well. (The colon in the grep pattern matters: every sample line carries the word WARNING in its state-history field, so a bare grep WARNING would also match the CRITICAL line.) If you want unique entries, pipe the result through sort -u.

If you are not filtering, just remove the grep command and add the filename to the first cut:

$ cut -d : --output-delimiter=";" --fields=2,3 file.log |
    cut --delimiter=";" --output-delimiter=" -" --fields=1,7 |
    sort --unique

2 Comments

The backslashes at the ends of the lines are unnecessary. I guess beauty is in the eye of the beholder; seeing three commands used where one is sufficient is not what I like. Also, it is not clear that you can ignore the 'CRITICAL' lines; there could be a site with only critical alerts on a given day that should be shown.
The sample output from the question is one line shorter than the input, so I inferred he wanted to filter out the CRITICAL (of course "critical" seems more important than "warning", but hey, it is not my question :-). Ok, backslashes gone, but you could argue that some of the quotes are optional as well. I'm a Python programmer; "explicit is better than implicit".
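If the CRITICAL entries should be reported as well, a sketch along the same lines is to anchor the filter to the final state field instead of dropping it entirely (the state list here is an assumption; extend it as needed):

$ grep -E ';(WARNING|CRITICAL): ' file.log |
    cut --delimiter=":" --output-delimiter=";" --fields=2,3 |
    cut --delimiter=";" --output-delimiter=" -" --fields=1,7 |
    sort --unique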
