OpenMP average of an array

Question

I'm trying to learn OpenMP for a program I'm writing. For part of it I'm trying to implement a function to find the average of a large array. Here is my code:

double mean(double* mean_array){
    double mean = 0;

    omp_set_num_threads( 4 );
    #pragma omp parallel for reduction(+:mean)
    for (int i=0; i<aSize; i++){
        mean = mean + mean_array[i];
    }

    printf("hello %d\n", omp_get_thread_num());

    mean = mean/aSize;

    return mean;
}

However, when I run the code it is slower than the sequential version. Also, for the print statement I get:

hello 0
hello 0

This doesn't make much sense to me. Shouldn't there be 4 hellos?

Any help would be appreciated.

Nowhere in the code you posted would there be any hellos, so it's unclear how many there should be. At any rate, what is aSize? If it's small, then it is unsurprising that it is slow; there is overhead associated with starting up threads, and unless you have enough data to make the speed-up of using OpenMP appreciable, the overhead will dominate the timing.
– R_Kapp, Commented Nov 19, 2015 at 18:48
Hi, sorry, I removed the print line by accident. I've updated my code and put it back in. aSize is 2000000, so I think that should be big enough.
– user2320239, Commented Nov 19, 2015 at 18:51
For the line just added in, you should only get one hello. It is after the for loop, which is the only thing you have parallelized, so it should only be run by thread 0. It appears, however, that you call your function twice, so it is printed out twice.
– R_Kapp, Commented Nov 19, 2015 at 18:52
How are you measuring time? Are you using omp_get_wtime()?
– R_Kapp, Commented Nov 19, 2015 at 18:55
See the accepted answer here to understand why you should use omp_get_wtime instead of clock.
– R_Kapp, Commented Nov 19, 2015 at 18:58

Anatoly · Accepted Answer · 2020-11-25 06:56:24Z


First, the reason you are not seeing 4 "hello"s is that the only part of the program executed in parallel is the so-called parallel region introduced by #pragma omp parallel. In your code that region is just the for loop, because the omp parallel for directive applies to the for statement that follows it. The printf comes after the loop, in the sequential part of the program, so only one thread runs it.

Rewriting the code as follows would do the trick:

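    double mean = 0;
    #pragma omp parallel num_threads(4)
    {
      // Work-sharing loop: each thread sums its chunk into a private copy of mean
      #pragma omp for reduction(+:mean)
      for (int i=0; i<aSize; i++) {
         mean += mean_array[i];
      }
      // Still inside the parallel region, so every thread prints once
      printf("hello %d\n", omp_get_thread_num());
    }
    // Divide once, after the region, so the threads do not race on mean
    mean /= aSize;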
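If the per-thread printf is not needed, the whole function can be written with the combined parallel for directive. The following is only a sketch: the name array_mean and the explicit length parameter n are illustrative assumptions (the question uses a global aSize instead), and the division is done after the loop, where only one thread is running.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the first n elements of a; returns 0.0 for an empty array.
 * array_mean and n are illustrative names, not taken from the question. */
double array_mean(const double *a, long n)
{
    if (n <= 0) {
        return 0.0;
    }
    assert(a != NULL);

    double sum = 0.0;
    /* Each thread accumulates into a private copy of sum; OpenMP adds the
     * private copies into the shared sum when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += a[i];
    }

    /* Back in sequential code, so the division happens exactly once. */
    return sum / (double)n;
}
```

With GCC this would be compiled with -fopenmp. Without that flag the pragma is ignored and the loop simply runs sequentially, giving the same result.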

Second, there are several possible reasons why your program runs slower than the sequential version. First of all, you need to make sure the array is large enough that the overhead of creating the threads (which usually happens when the parallel region is entered) is negligible. Also, for small arrays you may run into "cache false sharing" issues, in which threads compete for the same cache line and performance degrades.
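The way the time is measured matters as well, as the comments under the question point out: on many platforms clock() reports CPU time added up over all threads, so a parallel loop can look slower than it really is, whereas omp_get_wtime() returns wall-clock time. Below is a minimal sketch of timing the reduction that way. The helper names wall_seconds and timed_parallel_sum are hypothetical, and the clock() fallback exists only so the code still builds without OpenMP.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Wall-clock seconds when built with OpenMP; falls back to CPU time from
 * clock() otherwise. wall_seconds is a hypothetical helper name. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Sums a[0..n-1] with a parallel reduction, stores the result in *sum_out,
 * and returns the elapsed time of the loop in seconds. */
double timed_parallel_sum(const double *a, long n, double *sum_out)
{
    assert(a != NULL && sum_out != NULL);

    double sum = 0.0;
    double start = wall_seconds();

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += a[i];
    }

    double elapsed = wall_seconds() - start;
    *sum_out = sum;
    return elapsed;
}
```

Timing a plain sequential loop over the same array with the same clock gives a fair comparison. Repeating each measurement a few times helps, since the first parallel region usually also pays the cost of starting the threads.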

edited Nov 25, 2020 at 6:56

Anatoly

answered Nov 21, 2015 at 8:11

simpel01

1 Comment

ben26941 Over a year ago

I think you should add the division by aSize otherwise it's not a mean.
