Bash Script Printing Output Twice

Question

I have the following script which runs commands on each file in a directory to match for a specific pattern. It then prints the matching output to a .csv. I have the desired formatting, however each pattern that I am matching on is getting printed twice. Like this:

Match1
Match2
Match1
Match2

Piping uniq and sort into this script is not fixing the problem so I suspect my syntax is off. I have not been able to find a solution via Google or other answers thus far. Any help is appreciated, thanks!

#!/usr/bin/env bash
FILES=/Users/User1/Desktop/Folder/"*"
for f in $FILES
do
  echo "Processing $f file..."
  # take action on each file. $f store current file name

  sed -n /"New Filters"/,/"Modified Filters"/p "$f" |
    grep -v -e 'Bugtraq ID:' -e 'Common Vulnerabilities and Exposures:' -e 'Android' |
    grep -E '(^|[^0-9])[0-9]{5}($|[^0-9])' |
    sed 's/:/,/1' >> NewFile.csv

   echo "Complete. Check NewFile.csv"
 done;

Sample Input: Expected Result is to extract text in bold

Filters
New Filters
Modified Filters (logic changes)
Modified
Filters (metadata changes only)
Removed Filters

Filters
New Filters:
29722: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1

Modified Filters (logic changes):
Text I don't want

Modified Filters (metadata changes only):
Text I don't want

Hello, and welcome to Stack Overflow. It would help a lot if you also posted some sample data, so we don't have to try to reverse-engineer them from your code. Without a way to quickly test what's happening, most potential answerers will not even bother trying to decipher it.
– Amadan, Commented Jul 4, 2018 at 11:45

Ed Morton · Accepted Answer · 2018-07-04 11:59:57Z

2

We can't tell what your problem is without sample input/output so this isn't an answer to that, but here's how to really do what you're trying to do with that script:

awk '
# Report each file on stderr so the message stays out of the CSV.
FNR==1 { printf "Processing %s file...\n", FILENAME | "cat>&2" }
# No double quotes inside the regexes: in awk they would be matched literally.
/New Filters/ { inBlock=1 }
inBlock {
    if ( !/Bugtraq ID:|Common Vulnerabilities and Exposures:|Android/ &&
             /(^|[^0-9])[0-9]{5}($|[^0-9])/ ) {
        sub(/:/,",")
        print
    }
}
/Modified Filters/ { inBlock=0 }
' /Users/User1/Desktop/Folder/* > "NewFile.csv"
echo "Complete. Check NewFile.csv"

Note that there's no shell loop required. See "Why is using a shell loop to process text considered bad practice?".

Any time you find yourself using multiple commands (in particular multiple seds and/or greps) and pipes just to manipulate text, consider just using awk instead.
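As a small illustration of the advice above, here is a sketch that collapses a grep and sed pair into one awk call. The sample lines are made up for the demo and are not taken from the asker's files.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines, invented for illustration only.
sample=$'29722: HTTP: Example filter\nBugtraq ID: 12345\n29723: TCP: Another filter'

# Pipeline version: one grep plus one sed.
pipeline_out=$(printf '%s\n' "$sample" | grep -v 'Bugtraq ID:' | sed 's/:/,/')

# Single awk version: skip unwanted lines, turn the first colon into a comma, print.
awk_out=$(printf '%s\n' "$sample" | awk '!/Bugtraq ID:/ { sub(/:/, ","); print }')

printf '%s\n' "$awk_out"
```

Both variables should end up holding the same two lines. The awk form keeps the filter and the substitution in one place, which tends to be easier to extend.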


John Kugelman · Accepted Answer · 2018-07-04 11:47:04Z

1

Are you running the script twice? It appends with >> NewFile.csv without truncating the file at the beginning, so if run twice the CSV file would end up with repeated output. You can add > NewFile.csv at the beginning to empty out the output file.

Or, perhaps you have duplicate input files.
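A minimal sketch of the truncate-then-append idea, assuming a throwaway directory and two made-up lines in place of the real pipeline output. The `run_once` function is a hypothetical stand-in for one run of the script.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so the demo does not touch real files.
workdir=$(mktemp -d)
out="$workdir/NewFile.csv"

run_once() {
  : > "$out"                 # truncate (or create) the output file before the loop
  for line in Match1 Match2; do
    echo "$line" >> "$out"   # append inside the loop, as in the question
  done
}

run_once
run_once                     # a second run no longer doubles the output
line_count=$(wc -l < "$out" | tr -d ' ')
echo "$line_count"
```

In bash a bare `> NewFile.csv` on its own line does the same thing; the `: >` spelling is used here only because it also behaves the same way in other shells.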


5 Comments

Nick Over a year ago

I'm running the script once on two files. I tested again by running on a single file and am still getting duplicates. Where exactly do you recommend adding > NewFile.csv to?

Ed Morton Over a year ago

Why not simply show us the file so we can help you? Right now it's like you're asking a mechanic to diagnose a problem with your car but only letting him see half the car. See How to Ask if that's not clear and in particular pay attention to the part about creating a minimal reproducible example.

John Kugelman Over a year ago

Put > NewFile.csv on its own line. It's a standalone command that will truncate the file.

Nick Over a year ago

Thank you all for the input. I've added sample input and what I am aiming to extract. While going through the file I found that "New Filters" and "Modified Filters" are each mentioned more than once. I believe I need to tell the first sed command to grab the text between the 2nd match of "New Filters" and the 2nd match of "Modified Filters".

Ed Morton Over a year ago

The way to format input, output, and code in questions and answers is by indenting it 4 spaces (the editor's {} button will do that for you), not by placing a > at the start of each line. Though the results look similar in the forum, the former gives us something we can simply copy/paste for testing, while the latter would require us to edit out the >s, which is undesirable.
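Following up on Nick's comment about the second occurrence of "New Filters", here is one possible sketch in awk. It assumes the first "New Filters" line is only a table-of-contents entry and that the wanted block runs from the second one to the next "Modified Filters" line. The input is a shortened, hypothetical copy of the sample in the question, and the `seen` counter is an illustrative name, not something from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical input modeled on the sample in the question.
input=$(mktemp)
cat > "$input" <<'EOF'
Filters
New Filters
Modified Filters (logic changes)

Filters
New Filters:
29722: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1

Modified Filters (logic changes):
Text I don't want
EOF

result=$(awk '
/New Filters/ && ++seen == 2 { inBlock = 1; next }   # start after the 2nd heading
/Modified Filters/           { inBlock = 0 }         # any "Modified Filters" line ends the block
inBlock && /(^|[^0-9])[0-9][0-9][0-9][0-9][0-9]($|[^0-9])/ {
    sub(/:/, ",")                                    # first colon becomes a comma
    print
}
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

The five-digit test is spelled out as five `[0-9]` classes rather than `{5}` because some older awk implementations do not support interval expressions.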

Shakiba Moshiri · Accepted Answer · 2018-07-06 12:20:13Z

If you need to:

extract anything between
- New Filters ... Modified Filters
but exclude
- Bugtraq ID:
- Common Vulnerabilities and Exposures:
- Android
also match
- text that starts with 5 digits and runs to the last digit on that line
plus
- replace the first : with ,

then you can try

perl -lne 'BEGIN{$/=undef} push @r,$& while /(?<=New Filters).*?(?=Modified Filters)/gs; @r2=grep(!/Bugtraq ID:|Common Vulnerabilities and Exposures:|Android/g,@r); /\d{5}[^\n]+\d/g && ($_=$&) && s/:/,/ && print for @r2' file

for this sample input file

Modified Filters (logic changes)   
Modified  
Filters (metadata changes only)   
Removed Filters  

Filters     
New Filters:  
29722: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1  

Modified Filters (logic changes):   
Text I don't want  

Modified Filters (metadata changes only):   
Text I don't want  


New Filters:  
Bugtraq ID:

Modified Filters (logic changes):   


New Filters:  
Common Vulnerabilities and Exposures:


Modified Filters (logic changes):   


New Filters:  
Android
Modified Filters (logic changes):   


New Filters:  

29723: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1  
Modified Filters (logic changes):   


New Filters:  

29724: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1  

Modified Filters (logic changes):

output will be:

29722, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
29723, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
29724, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
