
I'm trying to learn OpenMP for a program I'm writing. For part of it I'm trying to implement a function to find the average of a large array. Here is my code:

    #include <stdio.h>
    #include <omp.h>

    /* aSize (the array length) is defined elsewhere in the program */
    double mean(double* mean_array){
        double mean = 0;

        omp_set_num_threads( 4 );
        #pragma omp parallel for reduction(+:mean)
        for (int i=0; i<aSize; i++){
            mean = mean + mean_array[i];
        }

        printf("hello %d\n", omp_get_thread_num());

        mean = mean/aSize;

        return mean;
    }

However, when I run the code, it runs slower than the sequential version. Also, the print statement outputs:

    hello 0
    hello 0

This doesn't make much sense to me; shouldn't there be 4 hellos?

Any help would be appreciated.

  • Nowhere in the code you posted would there be any hellos, so it's unclear how many there should be. At any rate, what is aSize? If it's small, then it is unsurprising that it is slow; there is overhead associated with starting up threads, and unless you have enough data to make the speed-up of using OpenMP appreciable, the overhead will dominate the timing. Commented Nov 19, 2015 at 18:48
  • Hi, sorry, I removed the print line by accident; I've updated my code and put it back in. aSize is 2000000, so I think that should be big enough. Commented Nov 19, 2015 at 18:51
  • For the line just added in, you should only get one hello. It is after the for loop, which is the only thing you have parallelized, so it should only be run by thread 0. It appears, however, that you call your function twice, so it is printed out twice. Commented Nov 19, 2015 at 18:52
  • How are you measuring time? Are you using omp_get_wtime()? Commented Nov 19, 2015 at 18:55
  • See the accepted answer here to understand why you should use omp_get_wtime instead of clock. Commented Nov 19, 2015 at 18:58

1 Answer


First, you are not seeing 4 "hello"s because the only part of the program executed in parallel is the so-called parallel region, enclosed within a #pragma omp parallel. In your code that region is just the loop (the combined omp parallel for directive is attached to the for statement), so the printf is in the sequential part of the program.

Rewriting the code as follows would do the trick:

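    double mean = 0;
    #pragma omp parallel num_threads(4)
    {
      #pragma omp for reduction(+:mean)
      for (int i=0; i<aSize; i++) {
         mean += mean_array[i];
      }
      // the implicit barrier after "omp for" completes the reduction;
      // every thread in the team executes this printf
      printf("hello %d\n", omp_get_thread_num());
    }
    // divide once, outside the parallel region, so only one thread does it
    mean /= aSize;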

Second, there are several possible reasons why your program runs slower than the sequential version. First of all, the array must be large enough that the overhead of creating the threads (which usually happens when the parallel region starts) is negligible. Also, for small arrays you may run into false sharing, where threads compete for the same cache line, which degrades performance.


1 Comment

I think you should add the division by aSize; otherwise it's not a mean.
