
I don't have much experience with OpenMP.

Is it possible to make the following code faster by using a for loop over a pointer instead of an index?

Is there any other way to make the following code faster?

The code multiplies an array by a constant.

Thank you.

Code:

#include <iostream>
#include <stdlib.h>
#include <stdint.h>
#include <vector>
using namespace std;
int main(void){
    size_t dim0, dim1;
    dim0 = 100;
    dim1 = 200;
    std::vector<float> vec;
    vec.resize(dim0*dim1);
    float scalar = 0.9f;
    size_t size_sq = dim0*dim1;
    #pragma omp parallel
    {       
        #pragma omp for
        for(size_t i = 0; i < size_sq; ++i){
            vec[i] *= scalar;
        }   
    }   
}

Serial pointer loop:

float* ptr_start = vec.data();
float* ptr_end   = ptr_start + dim0*dim1;
float* ptr_now;
for(ptr_now = ptr_start; ptr_now != ptr_end; ++ptr_now){
    *(ptr_now) *= scalar;
}
  • There are only 20,000 values in your loop, and CPU synchronisation also has some overhead. Have you measured how fast the loop is with and without OpenMP? Can you share those results? Commented Jul 29, 2016 at 17:45
  • The actual array is much bigger than this one. I also want to know if I did something that hurts performance, because I will use OpenMP in other places too. Commented Jul 29, 2016 at 17:46
  • The generated code may really differ from what you wrote. Did you disassemble the release program with all optimizations? P.S.: does your OpenMP allow you to use size_t as the index type? Commented Jul 29, 2016 at 18:33
  • I am using the Intel C compiler; so far size_t has worked. What is the correct index type to use? Commented Jul 29, 2016 at 18:37
  • As you are using the Intel compiler, the opt-report options should be able to give you a quick assessment of the relative efficiency of size_t, pointer, and int, whether setting omp for without the simd clause inhibits vectorization, and the like. Commented Jul 30, 2016 at 13:30

1 Answer


The parallel version of the pointer loop should look like this:

size_t size_sq = vec.size();
float * ptr = vec.data();
#pragma omp parallel
{       
    #pragma omp for
    for(size_t i = 0; i < size_sq; i++){
        ptr[i] *= scalar;
    }   
} 

ptr will be the same for all threads, so there is no problem there.

As an explanation, from Data sharing attribute clauses (Wikipedia):

shared: the data within a parallel region is shared, which means visible and accessible by all threads simultaneously. By default, all variables in the work sharing region are shared except the loop iteration counter.

private: the data within a parallel region is private to each thread, which means each thread will have a local copy and use it as a temporary variable. A private variable is not initialized and the value is not maintained for use outside the parallel region. By default, the loop iteration counters in the OpenMP loop constructs are private.

In this case, i is private and ptr is shared.


3 Comments

Thanks. I didn't know the same address would refer to the same block of memory across all threads.
If this is parallelized successfully, the default static scheduling will give each thread a nearly equal sized chunk.
Threads in the same process share the address space, except for the stacks: stackoverflow.com/questions/1762418/process-vs-thread
