I'm trying to learn OpenMP for a program I'm writing. For part of it I'm trying to implement a function to find the average of a large array. Here is my code:
double mean(double* mean_array){
    double mean = 0;
    omp_set_num_threads( 4 );
    #pragma omp parallel for reduction(+:mean)
    for (int i=0; i<aSize; i++){
        mean = mean + mean_array[i];
    }
    printf("hello %d\n", omp_get_thread_num());
    mean = mean/aSize;
    return mean;
}
However, when I run the code, it is slower than the sequential version. Also, the print statement outputs:
hello 0
hello 0
This doesn't make much sense to me. Shouldn't there be 4 hellos?
Any help would be appreciated.
[...] hellos, so it's unclear how many there should be. At any rate, what is aSize? If it's small, then it is unsurprising that it is slow; there is overhead associated with starting up threads, and unless you have enough data to make the speed-up of using OpenMP appreciable, the overhead will dominate the timing.
[...] hello. It is after the for loop, which is the only thing you have parallelized, so it should only be run by thread 0. It appears, however, that you call your function twice, so it is printed out twice.
[...] omp_get_wtime()?
[...] omp_get_wtime instead of clock.
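To make the points in these comments concrete, here is a rough sketch of how they might look in code. It is not the asker's program: the names parallel_mean, hello_from_each_thread and time_mean are made up for illustration, and the array length is passed as a parameter n because the definition of aSize is not shown in the question. The printf sits inside a parallel region so every thread runs it, and the timing uses omp_get_wtime rather than clock. The #ifdef _OPENMP fallback is only there so the sketch still builds serially when OpenMP is not enabled.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
/* Serial fallbacks (assumption: only needed when built without -fopenmp). */
static int omp_get_thread_num(void) { return 0; }
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
#endif

/* Mean of the first n elements. n is a parameter here instead of a global
 * like aSize, which is an assumption made for this sketch. */
double parallel_mean(const double *a, long n)
{
    assert(a != NULL && n > 0);
    double sum = 0.0;
    /* Each thread accumulates a private sum; they are combined at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum / (double)n;
}

/* The printf is inside the parallel region, so each thread in the team
 * prints once. Returns how many threads printed. */
int hello_from_each_thread(void)
{
    int count = 0;
    #pragma omp parallel reduction(+:count)
    {
        printf("hello %d\n", omp_get_thread_num());
        count += 1;
    }
    return count;
}

/* Elapsed seconds for one call of parallel_mean (wall-clock time when
 * built with OpenMP). The mean itself is stored in *result. */
double time_mean(const double *a, long n, double *result)
{
    double t0 = omp_get_wtime();
    *result = parallel_mean(a, n);
    return omp_get_wtime() - t0;
}
```

With GCC this would typically be built with something like gcc -fopenmp. The number of hello lines then depends on the thread count in effect (for example via OMP_NUM_THREADS), and for small n the time reported by time_mean is likely to be dominated by thread start-up overhead, as the first comment suggests.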