
I have code that runs many iterations, and the result of an iteration is saved only if a condition is met. This is naturally expressed as a while loop. I am attempting to make the code run in parallel, since each realisation is independent. So I have this:

while(nit<avit){
    #pragma omp parallel shared(nit,avit)
    {
        //do some stuff
        if(condition){
            #pragma omp critical
            {
                nit++;
                //save results
            }
        }
    }//implicit barrier here
}

and this works fine... but there is a barrier after each realisation, which means that if the stuff I am doing inside the parallel block takes longer in one iteration than in the others, all my threads wait for it to finish instead of continuing with the next iteration.

Is there a way to avoid this barrier so that the threads keep working? I am averaging thousands of iterations, so a few extra don't hurt (in case nit has not yet been incremented by threads that are already running)...

I have tried to turn this into a parallel for, but the automatic increment in the for loop makes the nit variable go wild. This is my attempt:

#pragma omp parallel shared(nit,avit)
{
    #pragma omp for
    for(nit=0;nit<avit;nit++){
        //do some stuff
        if(condition){
            //save results
        } else {
            #pragma omp critical
            {
                nit--;
            }
        }
    }
}

and it keeps working and going around the for loop, as expected, but my nit variable takes unpredictable values... as one could expect from it being incremented and decremented by different threads at different times.

I have also tried leaving the increment in the for loop blank, but then it doesn't compile. I have also tried to trick the code into having no increment in the for loop, like

...
incr=0;
for(nit=0;nit<avit;nit+=incr)
...

but then my code crashes...

Any ideas?

Thanks

Edit: Here's a working minimal example of the code on a while loop:

#include <random>
#include <vector>
#include <iostream>
#include <time.h>
#include <omp.h>
#include <stdlib.h>
#include <unistd.h>

using namespace std;

int main(){

    int nit,dit,avit=100,t,j,tmax=100,jmax=10;
    vector<double> Res(10),avRes(10);

    nit=0; dit=0;
    while(nit<avit){
        #pragma omp parallel shared(tmax,nit,jmax,avRes,avit,dit) private(t,j) firstprivate(Res)
        {
            srand(int(time(NULL)) ^ omp_get_thread_num());
            t=0; j=0;
            while(t<tmax&&j<jmax){
                Res[j]=rand() % 10;
                t+=Res[j];
                if(omp_get_thread_num()==5){
                    usleep(100000);
                }
                j++;
            }
            if(t<tmax){
                #pragma omp critical
                {
                    nit++;
                    for(j=0;j<jmax;j++){
                        avRes[j]+=Res[j];
                    }
                    for(j=0;j<jmax;j++){
                        cout<<avRes[j]/nit<<"\t";
                    }
                    cout<<" \t nit="<<nit<<"\t thread: "<<omp_get_thread_num();
                    cout<<endl;
                }
            } else{
                #pragma omp critical
                {
                    dit++;
                    cout<<"Discarded: "<<dit<<"\r"<<flush;
                }
            }
        }
    }
    return 0;
}

I added the usleep part to simulate one thread taking longer than the others. If you run the program, all threads have to wait for thread 5 to finish, and only then do they start the next run. What I am trying to do is precisely to avoid such a wait, i.e. I'd like the other threads to pick up the next iteration without waiting for thread 5 to finish.

  • It's not really possible to answer the question properly without a more specific understanding of some stuff. In particular, we have no idea whether nit is accessed in some stuff, what is supposed to happen when multiple threads meet condition at the same time, or when nit was updated multiple times while one thread was doing some stuff... It's hard, but for a good, specific answer you have to create a minimal reproducible example. Commented Nov 18, 2017 at 10:13
  • Thanks for the comment @Zulan. I edited the question to add a minimal working example at the end. Hope this clarifies things. Commented Nov 21, 2017 at 11:12

1 Answer


You can basically follow the same concept as for this question, with a slight variation to ensure that avRes is not written to in parallel:

int nit = 0;
#pragma omp parallel
while(1) {
    int local_nit;
    #pragma omp atomic read
    local_nit = nit;
    if (local_nit >= avit) {
        break;
    }

    [...]

    if (...) {
        #pragma omp critical
        {
            #pragma omp atomic capture
            local_nit = ++nit;
            for(j=0;j<jmax;j++){
                avRes[j] += Res[j];
            }
            for(j=0;j<jmax;j++){
                // technically you could also use `nit` directly since
                // now `nit` is only modified within this critical section
                cout<<avRes[j]/local_nit<<"\t";
            }
        }
    } else {
        #pragma omp atomic update
        dit++;
    }
}

It also works with critical regions, but atomics are more efficient.

There's another thing you need to consider: rand() should not be used in parallel contexts. See this question. For C++, use a private random number generator (i.e. one defined within the parallel region) from <random>.


8 Comments

Thanks @Zulan, with minor modifications this seems to be (almost) doing the trick! I am not very familiar with atomic, but it would not let me add a block of code as with critical... I need all threads to be able to write results, and to write them more often than just at the end... If I do #pragma omp critical { local_nit = nit++; for(j=0;j<jmax;j++){ avRes[j]+=Res[j]; } [...] } it works, but I am curious as to how I could use atomic, since you mentioned it is more efficient. Also, the link to the first question seems to be wrong...
Never mind, I understand now that atomic can only be used for single, simple instructions. So the final code has to have the critical section for writing. And just as a final correction, I believe the local_nit = nit++; should simply be nit++;, since local_nit is updated above, in #pragma omp atomic read local_nit = nit;, and the if that prevents writing until the end is not wanted... Thanks again for your help.
My code was mistakenly using postfix instead of prefix increment. I fixed that and the link. With the correct form local_nit = ++nit; if (local_nit == avit) ... you can protect the writing of output without an expensive critical section.
That makes a bit more sense... but this would again only write at the end of the program, right? I need to be able to check results more often, so I think I do need a critical section... Could I "hide it" within an if?: #pragma omp atomic capture local_wnit=++wnit; if(local_wnit==wavit){ #pragma omp critical { //write; wnit-=local_wnit;} } where wnit is a global write counter, local_wnit the locally handled variable, and wavit a shared variable with the number of iterations between write events. Would this help or is it the same? Thanks for fixing the link.
If you need to update the result for every increment (sorry I misread your question), you might as well just use a critical section for the update. See my edited answer. Alternative you can do a reduction if it's just about summing up if performance is an issue with the critical section - or if jmax is rather small do every result update atomically. Most importantly you don't want to read the counter in a critical section.
