Have a reference file "names.txt" with data as below:
Tom
Jerry
Mickey
Note: there are 20k lines in the file "names.txt"
There is another delimited file with multiple lines for every key from the reference file "names.txt" as below:
Name~~Id~~Marks~~Column4~~Column5
Note: there are about 30 columns in the delimited file:
The delimited file looks like something :
Tom~~123~~50~~C4~~C5
Tom~~111~~45~~C4~~C5
Tom~~321~~33~~C4~~C5
.
.
Jerry~~222~~13~~C4~~C5
Jerry~~888~~98~~C4~~C5
.
.
Need to extract rows from the delimited file for every key from the file "names.txt" having the highest value in the "Marks" column.
So, there will be one row in the output file for every key form the file "names.txt".
Below is the code snipped in unix that I am using which is working perfectly fine but it takes around 2 hours to execute the script.
while read -r line; do
getData `echo ${line// /}`
done < names.txt
function getData
{
name=$1
grep ${name} ${delimited_file} | awk -F"~~" '{if($1==name1 && $3>max){op=$0; max=$3}}END{print op} ' max=0 name1=${name} >> output.txt
}
Is there any way to parallelize this and reduce the execution time. Can only use shell scripting.