
I don't have much experience with OpenMP.

Is it possible to make the following code faster by using a for loop over a pointer instead of an index?

Is there any other way to make the following code faster?

The code multiplies an array by a constant.

Thank you.

Code:

#include <iostream>
#include <stdlib.h>
#include <stdint.h>
#include <vector>
using namespace std;
int main(void){
    size_t dim0, dim1;
    dim0 = 100;
    dim1 = 200;
    std::vector<float> vec;
    vec.resize(dim0*dim1);
    float scalar = 0.9f;
    size_t size_sq = dim0*dim1;
    #pragma omp parallel
    {       
        #pragma omp for
        for(size_t i = 0; i < size_sq; ++i){
            vec[i] *= scalar;
        }   
    }   
}

Serial pointer loop:

float* ptr_start = vec.data();
float* ptr_end   = ptr_start + dim0*dim1;
float* ptr_now;
for(ptr_now = ptr_start; ptr_now != ptr_end; ++ptr_now){
    *(ptr_now) *= scalar;
}
  • There are only 20,000 values in your loop, and CPU synchronisation also has some overhead. Have you measured how fast the loop is with and without OpenMP? Can you share those results? Commented Jul 29, 2016 at 17:45
  • The actual array is much bigger than this one. I also want to know if I did something that hurts performance, because I will use OpenMP in other places too. Commented Jul 29, 2016 at 17:46
  • The generated code may really differ from what you wrote. Did you disassemble the release program with all optimizations? P.S.: does your OpenMP allow you to use size_t as the index type? Commented Jul 29, 2016 at 18:33
  • I am using the Intel C compiler; so far size_t has worked. What is the correct index type to use? Commented Jul 29, 2016 at 18:37
  • As you are using the Intel compiler, the opt-report options should be able to give you a quick assessment of the relative efficiency of size_t, pointer, and int, whether setting omp for without the simd clause inhibits vectorization, and the like. Commented Jul 30, 2016 at 13:30

1 Answer


The parallel version of the pointer loop should look like this:

size_t size_sq = vec.size();
float * ptr = vec.data();
#pragma omp parallel
{       
    #pragma omp for
    for(size_t i = 0; i < size_sq; i++){
        ptr[i] *= scalar;
    }   
} 

ptr will be the same for all threads, so there is no problem there.

As an explanation, from Data sharing attribute clauses (Wikipedia):

shared: the data within a parallel region is shared, which means visible and accessible by all threads simultaneously. By default, all variables in the work sharing region are shared except the loop iteration counter.

private: the data within a parallel region is private to each thread, which means each thread will have a local copy and use it as a temporary variable. A private variable is not initialized and the value is not maintained for use outside the parallel region. By default, the loop iteration counters in the OpenMP loop constructs are private.

In this case, i is private and ptr is shared.


3 Comments

Thanks. I didn't know the same address would refer to the same block of memory across all threads.
If this is parallelized successfully, the default static scheduling will give each thread a nearly equal sized chunk.
Threads in the same process share the address space, except for the stacks: stackoverflow.com/questions/1762418/process-vs-thread
