
I have a file with some contents as follows -

[1412272372] SERVICE ALERT: abc.com;value;WARNING;HARD;3;WARNING: 2014-09-14
[1412272412] SERVICE ALERT: def.com;value;WARNING;HARD;3;WARNING: 2014-09-14
[1412272432] SERVICE ALERT: fgh.com;value;WARNING;HARD;3;WARNING: 2014-09-14
[1412272442] SERVICE ALERT: fgh.com;value;WARNING;HARD;3;CRITICAL: 2014-09-14

From this, I want to grep out just the site name and the date, then save the result to a new file. The new file should then be as follows -

abc.com - 2014-09-14
def.com - 2014-09-14
fgh.com - 2014-09-14

Any help would be appreciated.

Thanks in advance.

  • Grep won't help... try awk or sed. Commented Oct 3, 2014 at 18:02
  • So you want a list of unique site + date entries? You have one fgh.com entry for 2014-09-14 in the output, despite there being two lines in the input. Commented Oct 3, 2014 at 18:02
  • @Abhi: grep alone may not be enough, but grep + cut + sort can be very powerful. The beauty of Unix: lots of little programs that each do one thing very well can be combined to achieve complex tasks! Commented Oct 3, 2014 at 18:23
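For what it's worth, here is one shape that combination can take on this data (a sketch, assuming the log is in file, GNU grep for -oE, and a shell with process substitution such as bash; a small sed inserts the dash):

# site names: 4th space-separated field, up to the first semicolon
# dates: the YYYY-MM-DD token at the end of each line
paste -d' ' <(cut -d' ' -f4 file | cut -d';' -f1) <(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}$' file) | sed 's/ / - /' | sort -u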

6 Answers

sed -E 's/.*: ([^;]*);.*: (.*)/\1 - \2/' file | uniq

Output:

abc.com - 2014-09-14
def.com - 2014-09-14
fgh.com - 2014-09-14
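For readers less fluent in sed, a commented restatement of the same command (behavior unchanged; the notes describe how the regex divides the first sample line):

# '.*: '    matches up through "SERVICE ALERT: " - it cannot reach the
#           final ": " because the next group must be semicolon-free and
#           still be followed by a ";"
# ([^;]*)   capture 1: the site name ("abc.com")
# ';.*: '   skips the middle fields, ending at the ": " before the date
# (.*)      capture 2: the date ("2014-09-14")
sed -E 's/.*: ([^;]*);.*: (.*)/\1 - \2/' file | uniq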

Or something unusual, with the same output:

tr -s ":; " ":" < file | cut -d : -f 4,10 --output-delimiter=" - " | uniq
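To see why fields 4 and 10 are the right ones, run the tr step on its own (GNU tr pads the second set with its last character, so ';' and ' ' also become ':'); the first sample line collapses to:

$ tr -s ":; " ":" < file | head -n 1
[1412272372]:SERVICE:ALERT:abc.com:value:WARNING:HARD:3:WARNING:2014-09-14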

1 Comment

Thank you. I used this and got the desired output.

Just to add to the pile-on... you can solve this using arrays within AWK as well:

awk -F'[:;]' '{arr[$2," -",$8]++}END{for (a in arr) print a}' <file>

This splits on semicolons and colons, then uses fields 2 and 8 as an array key, and finally iterates through the array with the for loop, printing each key it finds; since array keys are unique, only unique values come through.

3 Comments

This works quite nicely, but doesn't start producing output until all the data has been read. It probably doesn't matter, but it is possible to do the printing as the data arrives, which makes for a smoother flow if the files are huge (the awk script isn't a bottleneck).
@Jonathan. I just realized your script is essentially the same thing without waiting for the file to be completely read by awk before hitting the array. I thought Seen was some cool awk functionality I hadn't run into yet, and now realize it's an array... sometimes I'm a little slow.
It just occurred to me; the output from this includes the SUBSEP character, which is '\x1C' by default. I suppose the neat trick here would be to set SUBSEP to ' ' (space) and use arr[$2,"-"$8] to get the required format.
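For completeness, a sketch of that last suggestion, except that it builds the key as one plain string rather than changing SUBSEP, so no \x1C can appear in the output (note that for (a in arr) returns keys in an unspecified order):

# squeeze runs of ':', ';', ' ' so the site is $4 and the date is $NF
awk -F'[:; ]+' '{arr[$4 " - " $NF]++} END {for (a in arr) print a}' file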
awk -F'[:;]' '{if (seen[$2,$NF]++ == 0) print $2 " -" $NF}' data

This only prints the site if it has not been seen before for that date. Thus, it produces:

 abc.com - 2014-09-14
 def.com - 2014-09-14
 fgh.com - 2014-09-14

The output includes a blank at the start of the site name. If you want that eliminated too, then you need to go for:

awk -F'[:; ]' '{if (seen[$5,$NF]++ == 0) print $5 " - " $NF}' data

There's an empty field between each occurrence of colon and space. That produces:

abc.com - 2014-09-14
def.com - 2014-09-14
fgh.com - 2014-09-14

(which is, admittedly, very similar to the previous output).
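To see those empty fields for yourself, a quick diagnostic (not part of the solution) is to dump a couple of fields from the first line:

$ awk -F'[:; ]' '{print "4=[" $4 "] 5=[" $5 "]"; exit}' data
4=[] 5=[abc.com]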

You could eliminate the empty fields by using a repeatable delimiter:

awk -F'[:; ]+' '{if (seen[$4,$NF]++ == 0) print $4 " - " $NF}' data

This has the same output as the previous script.


$ awk -F'[:;]' '{print $2 " -" $NF}' data
 abc.com - 2014-09-14
 def.com - 2014-09-14
 fgh.com - 2014-09-14
 fgh.com - 2014-09-14

Explanation:

  • -F'[:;]'

    The peculiar part of your data is that the fields are sometimes separated by a colon and sometimes by a semicolon. With the -F option, we tell awk to accept either character as a field separator.

  • print $2 " -" $NF

    This prints the output. $2 refers to the second field which is the site name. The date is the last field which is signified by $NF.
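If you are ever unsure which number a field has, a quick check (a diagnostic, not part of the answer) is to print the numbered fields of the first line:

$ awk -F'[:;]' '{for (i = 1; i <= NF; i++) print i, $i; exit}' data

which shows the site in field 2 and the date in field 8, the last field (NF is 8 here).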

Keeping only unique results

$ awk -F'[:;]' '{print $2 " -" $NF}' data | sort -u
 abc.com - 2014-09-14
 def.com - 2014-09-14
 fgh.com - 2014-09-14



You could try the awk command below:

$ awk -F'[:; ]' '{print $5" - "$12}' file
abc.com - 2014-09-14
def.com - 2014-09-14
fgh.com - 2014-09-14
fgh.com - 2014-09-14
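As with the other answers, append sort -u if only the unique pairs are wanted:

$ awk -F'[:; ]' '{print $5" - "$12}' file | sort -u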



Without awk:

$ grep 'WARNING:' file.log |
    cut --delimiter=":" --output-delimiter=";" --fields=2,3 |
    cut --delimiter=";" --output-delimiter=" -" --fields=1,7

It is more verbose than awk but IMHO more readable as well. (The colon in the grep pattern matters: every sample line carries the word WARNING in its state-history field, so a bare grep WARNING would also match the CRITICAL line.) If you want unique entries, pipe the result through sort -u.

If you are not filtering, just remove the grep command and add the filename to the first cut:

$ cut -d : --output-delimiter=";" --fields=2,3 file.log |
    cut --delimiter=";" --output-delimiter=" -" --fields=1,7 |
    sort --unique

2 Comments

The backslashes at the ends of the lines are unnecessary. I guess beauty is in the eye of the beholder; seeing three commands used where one is sufficient is not what I like. Also, it is not clear that you can ignore the 'CRITICAL' lines; there could be a site with only critical alerts on a given day that should be shown.
The sample output from the question is one line shorter than the input, so I inferred he wanted to filter out the CRITICAL (of course "critical" seems more important than "warning", but hey, it is not my question :-). Ok, backslashes gone, but you could argue that some of the quotes are optional as well. I'm a Python programmer; "explicit is better than implicit".
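If the CRITICAL entries should be reported as well, a sketch along the same lines is to anchor the filter to the final state field instead of dropping it entirely (the state list here is an assumption; extend it as needed):

$ grep -E ';(WARNING|CRITICAL): ' file.log |
    cut --delimiter=":" --output-delimiter=";" --fields=2,3 |
    cut --delimiter=";" --output-delimiter=" -" --fields=1,7 |
    sort --unique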
