
Can you help me, please? I have a task: the input is some text with numbers. For example:

beta     1
score   9
something   2
beta     4
something   1

I need to add up the numbers that have the same text next to them. My output should look like this (with ":" as the separator):

beta:5
something:3
score:9

There may also be a problem with temp files, where I can save my scores. I need to use mktemp, and delete the temp file after the script has finished. Please help, thanks.

  • Which part of this is it that you have issues with? Without knowing what you think is hard, it is difficult to give a helpful answer. I don't quite understand what you mean when you talk about temporary files. mktemp creates a file, it does not delete anything. Also, it seems that there is no need for temporary files to solve this exercise. Commented Apr 2, 2022 at 17:25
  • @Kusalananda, I'm not sure, but I think I need to create temp files to save my current scores, and delete them after my script has finished. Commented Apr 2, 2022 at 17:32
  • @Kusalananda, I have issues with the algorithm, i.e. how to do it correctly. Commented Apr 2, 2022 at 17:33

4 Answers


I will be assuming that the input will always contain exactly two fields per line.

You may use the GNU datamash utility to sort the data, group it by the first field, and calculate the sum of the second field for each group:

datamash -s -W --output-delimiter=: groupby 1 sum 2 <file

Here, the -s sorts the input, -W makes the utility treat any run of consecutive whitespace characters as a field delimiter, and --output-delimiter=: sets the output delimiter to the : character. The rest tells datamash to group by the first field and to calculate the sum of the second field for each group.

Given the input in the question in the file called file, this would produce the following output:

beta:5
score:9
something:3

You can solve this in any number of other ways too. Computationally, the simplest solution would be to use awk:

awk '{ sum[$1] += $2 } END { for (key in sum) printf "%s:%d\n", key, sum[key] }' file 

Here, we use an associative array, sum, to hold the sum for each of the strings in the first field. The END block executes at the end of the input and outputs the calculated sums together with the strings.

Note that this solution also assumes that the first field is a single word containing no whitespace characters, as shown in the question.
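POSIX leaves the order in which `for (key in sum)` visits the keys unspecified, so the awk version may print the lines in a different order from the (sorted) datamash output. If a stable order matters, one option is to pipe the result through sort. Below is a sketch wrapped in a function whose name, `sum_by_key_awk`, is made up for illustration. It reads standard input instead of a named file:

```bash
# Hypothetical wrapper: sum field 2 per distinct field 1, then sort the
# "key:sum" lines byte-wise so the output order is predictable.
sum_by_key_awk() {
    awk '{ sum[$1] += $2 } END { for (key in sum) printf "%s:%d\n", key, sum[key] }' |
        LC_ALL=C sort
}
```

Usage: `sum_by_key_awk < file`. Sorting after awk is cheap here, because it only sees one line per distinct key.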


Using a shell loop, reading the sorted lines from the original file, printing and resetting the sum of the second field whenever a new first field is encountered:

unset -v prev

sort file |
{
        while read -r key value; do
                if [ "$key" != "${prev-$key}" ]; then
                        # prev is set and different from $key

                        printf '%s:%d\n' "$prev" "$sum"
                        sum=0
                fi

                prev=$key
                sum=$(( sum + value ))
        done

        if [ "${prev+set}" = set ]; then
                printf '%s:%d\n' "$prev" "$sum"
        fi
}
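The question also mentions temporary files. They are not needed for this task. If the exercise requires one anyway, a common pattern is to create the file with mktemp and remove it from an EXIT trap, so it is deleted however the script ends. The sketch below is a variant of the loop above, written as a hypothetical function `sum_by_key_tmp`. It stores the sorted input in the temporary file and feeds the loop from that file. It assumes an `mktemp` that can be called without a template, as GNU coreutils' can. Because the loop reads from a file instead of a pipe, `prev` and `sum` are still set after the loop ends.

```bash
# Hypothetical function: the body is a subshell "( ... )", so the EXIT trap
# fires (and deletes the temp file) when the function finishes.
sum_by_key_tmp() (
    tmp=$(mktemp) || exit 1
    trap 'rm -f -- "$tmp"' EXIT

    LC_ALL=C sort > "$tmp"        # save a byte-wise sorted copy of stdin

    unset -v prev
    sum=0
    while read -r key value; do
        if [ "$key" != "${prev-$key}" ]; then
            # prev is set and different from $key: the previous group is complete
            printf '%s:%d\n' "$prev" "$sum"
            sum=0
        fi
        prev=$key
        sum=$(( sum + value ))
    done < "$tmp"                 # a redirection, not a pipe: no extra subshell for the loop

    if [ "${prev+set}" = set ]; then
        printf '%s:%d\n' "$prev" "$sum"   # print the last group
    fi
)
```

Usage: `sum_by_key_tmp < file`.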

Related: Why is using a shell loop to process text considered bad practice?
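A loop in Bash alone can also skip the sorting step by using an associative array, which plays the same role as the awk array `sum`. This is a sketch that assumes bash 4.0 or newer (for `local -A`). The function name `sum_by_key_assoc` is made up for illustration. Bash does not promise any particular order when listing array keys, so the output is piped through sort.

```bash
# Hypothetical function (bash >= 4.0): keep a running total per key in an
# associative array, then print "key:sum" lines in sorted order.
sum_by_key_assoc() {
    local -A sum=()               # key -> running total
    local key value
    while read -r key value; do
        [ -n "$key" ] || continue                  # skip empty lines
        sum[$key]=$(( ${sum[$key]:-0} + value ))   # add to this key's total
    done
    for key in "${!sum[@]}"; do
        printf '%s:%d\n' "$key" "${sum[$key]}"
    done | LC_ALL=C sort
}
```

Usage: `sum_by_key_assoc < file`.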

  • Is there also a way to solve it without awk, by writing the whole algorithm myself? Commented Apr 2, 2022 at 18:13
  • @lolilaliaa I'm afraid that I don't understand what "the whole algorithm" is that is not implemented by that awk program (or, for that matter, encapsulated by the datamash command). You may possibly have to update your question if you have further clarifications to it. Commented Apr 2, 2022 at 18:16
  • What I mean is: is there a solution written entirely in Bash, with loops, if/else, etc.? I hope you understand me. Commented Apr 2, 2022 at 18:38
  • @lolilaliaa Using shell loops to parse data is fragile and not usually what you want to do. See for example Why is using a shell loop to process text considered bad practice? Commented Apr 2, 2022 at 18:49
  • OK, but my task is to do it with shell loops. Thanks anyway :) Commented Apr 2, 2022 at 18:57

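If you are dealing with a large file with many distinct keys, consider using sort and awk, so that you don't keep a huge array of keys and sums in RAM. Because sort puts equal keys next to each other, awk only has to remember the current key and its running sum.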

λ cat input.txt 
beta     1
score   9
something   2
beta     4
something   1
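λ sort input.txt |
  awk -v OFS=: '
    NR == 1             { key = $1 }                          # first line: remember its key
    NR > 1 && $1 != key { print key, sum; sum = 0; key = $1 } # new key: print the previous group, reset
                        { sum += $2 }                         # add the value to the current group
    END                 { print key, sum }                    # print the last group
  '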
beta:5
score:9
something:3
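#!/bin/bash
declare -i SECOND                          # integer attribute: assignments to SECOND are evaluated arithmetically
while read -r first second; do
        if [ -z "$FIRST" ] || [ "$first" = "$FIRST" ]; then
                SECOND+=second             # first line or same key: add this value to the running sum
        else
                echo "$FIRST:$SECOND"      # key changed: print the finished group
                SECOND=second              # start a new sum with this value
        fi
        FIRST=$first
done < <(sort file)
echo "$FIRST:$SECOND"                      # print the last group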

I usually write a quick draft like this first; in production code I quote all the variables.
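The lines `SECOND+=second` and `SECOND=second` only work because of `declare -i SECOND`. The Bash manual says that for a variable with the integer attribute, the value assigned is evaluated as an arithmetic expression, and `+=` adds it to the current value. So the bare name `second` is replaced by its numeric value. A minimal illustration with made-up variables `total`, `n` and `plain`:

```bash
declare -i total=1
n=4
total+=n        # integer attribute: arithmetic addition, total is now 5
plain=1
plain+=n        # no integer attribute: string append, plain is now "1n"
printf 'total=%s plain=%s\n' "$total" "$plain"
```

Without `declare -i`, the same result needs explicit arithmetic, e.g. `(( SECOND += second ))`.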

 for k in $(awk '{if(!seen[$1]++)print $1}' file.txt); do awk -v k="$k" 'BEGIN{sum=0} $1 == k {sum=sum+$2} END{print k ":" sum}' file.txt; done

Output:

beta:5
score:9
something:3
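Selecting lines with `$0 ~ k` treats the key as a regular expression that may match anywhere in the line. A key that is a substring of another key, or that contains regex metacharacters such as `.`, then picks up the wrong lines. Comparing the first field with `$1 == k` avoids that. Here is a small demonstration, using made-up helper names and a made-up key `alphabeta` that is not in the question's data:

```bash
# Hypothetical helpers: total the second field of the lines selected for key $1.
sum_regex() { awk -v k="$1" '$0 ~ k { s += $2 } END { print s + 0 }'; }  # k matched as a regex anywhere in the line
sum_exact() { awk -v k="$1" '$1 == k { s += $2 } END { print s + 0 }'; } # first field must equal k exactly

printf 'beta 1\nalphabeta 3\n' | sum_regex beta   # prints 4: "alphabeta" contains "beta"
printf 'beta 1\nalphabeta 3\n' | sum_exact beta   # prints 1
```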
  • That's an O(n²) solution - for n records it will take around n reads of file.txt, comprising n*n lines of data. For comparison, all the other solutions are O(n). Obviously, for a homework exercise such as this with small values of n (lines in the file) there'll be little difference, but for a larger file, such as one with 1000 lines, your solution would read the entire file of 1000 lines up to 1001 times. Commented Apr 4, 2022 at 8:13
