I want to use a bash script to process one input file into two output files, each containing the same number of lines as the input file but with different parts of each input line. In particular, one of the output files has to contain an MD5 hash of a selection of the input line (the hash is calculated per line, not per file!).
input_file.txt (3 fields, separated by spaces):
12347654 abcdfg 1verylongalpha1234numeric1
34543673 nvjfur 2verylongalpha1234numeric2
75868643 vbdhde 3verylongalpha1234numeric3
Output file_1.txt would have to look like this (the left field is the MD5 sum, the right field is field 3 from the input file, which is also the part that goes into the MD5 hash):
12df5j754G75f738fjk3483df3fdf9 1verylongalpha1234numeric1
3jf75j47fh4G84ka9J884hs355jhd8 2verylongalpha1234numeric2
4hf7dn46chG4875ldgkk348fk345d9 3verylongalpha1234numeric3
Output file_2.txt would have to look like this (field 1 and field 2 from the input file plus the MD5 hash):
12347654 abcdfg 12df5j754G75f738fjk3483df3fdf9
34543673 nvjfur 3jf75j47fh4G84ka9J884hs355jhd8
75868643 vbdhde 4hf7dn46chG4875ldgkk348fk345d9
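To make "per line" concrete, here is a minimal sketch of hashing one selected part on its own. The helper name line_md5 is hypothetical, and feeding the text with a trailing newline is an assumption (it matches what piping sed output into md5sum does). A real MD5 digest is 32 hexadecimal characters, so the hash values in the samples above are only placeholders.

```shell
#!/bin/bash
# line_md5 (hypothetical helper): print the MD5 digest of one string.
# Assumption: the string is hashed together with a trailing newline.
line_md5() {
    local digest
    # md5sum prints "<digest>  -" when reading stdin
    digest=$(printf '%s\n' "$1" | md5sum)
    # keep only the digest, drop the "  -" suffix
    printf '%s\n' "${digest%% *}"
}

# Usage: line_md5 "1verylongalpha1234numeric1"
```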
I already have a script that does the job, but it performs very badly (the script below may not work as is; it is from the top of my head, and I have no Linux at hand where I am writing this, sorry):
#!/bin/bash
while read -r line
do
    # hash only the selected part of the line; md5sum prints "<hash>  -", so keep field 1
    MD5_HASH=$(sed -nr 's/^[[:digit:]]*\s[[:alpha:]]*\s([[:alnum:]]*)/\1/p' <<<"$line" | md5sum | cut -d' ' -f1)
    # split the same line into its three fields
    read -r DATA_PART1 DATA_PART2 DATA_PART3 <<<"$line"
    echo "$MD5_HASH $DATA_PART3" >> file_1.txt  ## append to file_1.txt inside the loop: THIS IS WHERE IT GETS HORRIBLY SLOW!
    echo "$DATA_PART1 $DATA_PART2 $MD5_HASH"
done < input_file.txt > file_2.txt
exit 0
I think that the append redirection '>>' is responsible for the slow performance, because it reopens file_1.txt on every iteration, but I can't think of another way. It's in the loop because I have to calculate the MD5 hash per line.
(And oh, the sed command is necessary because in reality the part that goes into md5sum can only be captured with a regex and a quite complex pattern.)
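For comparison, here is a minimal sketch of the same loop with file_1.txt opened once on file descriptor 3 instead of being reopened by '>>' on every iteration. The function name split_with_hash and the choice of descriptor 3 are assumptions for illustration, and the sed pattern is the simplified one from the script above. If this is not noticeably faster, the cost is more likely the sed and md5sum processes started for every line than the append itself.

```shell
#!/bin/bash
# split_with_hash (hypothetical name): args are input file, output 1, output 2.
# Both outputs are opened once by the redirections after "done".
split_with_hash() {
    local in=$1 out1=$2 out2=$3
    local line p1 p2 p3 hash
    while read -r line; do
        # per-line hash of the part selected by sed (GNU sed assumed for -r and \s)
        hash=$(sed -nr 's/^[[:digit:]]*\s[[:alpha:]]*\s([[:alnum:]]*)/\1/p' <<<"$line" | md5sum)
        hash=${hash%% *}                           # drop the "  -" that md5sum appends
        read -r p1 p2 p3 <<<"$line"
        printf '%s %s\n' "$hash" "$p3" >&3         # goes to output 1 via fd 3
        printf '%s %s %s\n' "$p1" "$p2" "$hash"    # goes to output 2 via stdout
    done < "$in" 3> "$out1" > "$out2"
}

# Usage: split_with_hash input_file.txt file_1.txt file_2.txt
```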
Does anyone have a suggestion?