
I have code that runs many iterations, and the result of an iteration is saved only if a condition is met. This is naturally expressed as a while loop. I am attempting to make the code run in parallel, since each realisation is independent. So I have this:

while(nit<avit){
    #pragma omp parallel shared(nit,avit)
    {
        //do some stuff
        if(condition){
            #pragma omp critical
            {
                nit++;
                //save results
            }
        }
    }//implicit barrier here
}

and this works fine... but there is a barrier after each realisation, which means that if the stuff I am doing inside the parallel block takes longer in one iteration than in the others, all my threads wait for it to finish instead of continuing with the next iteration.

Is there a way to avoid this barrier so that the threads keep working? I am averaging thousands of iterations, so a few extra don't hurt (in case nit has not yet been incremented by threads that are already running)...

I have tried to turn this into a parallel for, but the automatic increment in the for loop makes the nit variable go wild. This is my attempt:

#pragma omp parallel shared(nit,avit)
{
    #pragma omp for
    for(nit=0;nit<avit;nit++){
        //do some stuff
        if(condition){
            //save results
        } else {
            #pragma omp critical
            {
                nit--;
            }
        }
    }
}

and it keeps working and going around the for loop, as expected, but my nit variable takes unpredictable values... as one could expect from it being incremented and decremented by different threads at different times.

I have also tried leaving the increment in the for loop blank, but then it doesn't compile. I have also tried to trick the code into having no increment in the for loop, like

...
incr=0;
for(nit=0;nit<avit;nit+=incr)
...

but then my code crashes...

Any ideas?

Thanks

Edit: Here's a working minimal example of the code on a while loop:

#include <random>
#include <vector>
#include <iostream>
#include <time.h>
#include <omp.h>
#include <stdlib.h>
#include <unistd.h>

using namespace std;

int main(){

    int nit,dit,avit=100,t,j,tmax=100,jmax=10;
    vector<double> Res(10),avRes(10);

    nit=0; dit=0;
    while(nit<avit){
        #pragma omp parallel shared(tmax,nit,jmax,avRes,avit,dit) private(t,j) firstprivate(Res)
        {
            srand(int(time(NULL)) ^ omp_get_thread_num());
            t=0; j=0;
            while(t<tmax&&j<jmax){
                Res[j]=rand() % 10;
                t+=Res[j];
                if(omp_get_thread_num()==5){
                    usleep(100000);
                }
                j++;
            }
            if(t<tmax){
                #pragma omp critical
                {
                    nit++;
                    for(j=0;j<jmax;j++){
                        avRes[j]+=Res[j];
                    }
                    for(j=0;j<jmax;j++){
                        cout<<avRes[j]/nit<<"\t";
                    }
                    cout<<" \t nit="<<nit<<"\t thread: "<<omp_get_thread_num();
                    cout<<endl;
                }
            } else{
                #pragma omp critical
                {
                    dit++;
                    cout<<"Discarded: "<<dit<<"\r"<<flush;
                }
            }
        }
    }
    return 0;
}

I added the usleep part to simulate one thread taking longer than the others. If you run the program, all threads have to wait for thread 5 to finish, and only then do they start the next run. What I am trying to do is precisely to avoid such a wait, i.e. I'd like the other threads to pick up the next iteration without waiting for thread 5 to finish.

  • It's not really possible to answer the question properly without a more specific understanding of some stuff. In particular, we have no idea whether nit is accessed in some stuff, what is supposed to happen when multiple threads meet condition at the same time, or when nit was updated multiple times while one thread was doing some stuff... It's hard, but for a good, specific answer you have to create a minimal reproducible example. Commented Nov 18, 2017 at 10:13
  • Thanks for the comment @Zulan. I edited the question to add a minimal working example at the end. Hope this clarifies things. Commented Nov 21, 2017 at 11:12

1 Answer


You can basically follow the same concept as for this question, with a slight variation to ensure that avRes is not written to in parallel:

int nit = 0;
#pragma omp parallel
while(1) {
    int local_nit;
    #pragma omp atomic read
    local_nit = nit;
    if (local_nit >= avit) {
        break;
    }

    [...]

    if (...) {
        #pragma omp critical
        {
            #pragma omp atomic capture
            local_nit = ++nit;
            for(j=0;j<jmax;j++){
                avRes[j] += Res[j];
            }
            for(j=0;j<jmax;j++){
                // technically you could also use `nit` directly since
                // now `nit` is only modified within this critical section
                cout<<avRes[j]/local_nit<<"\t";
            }
        }
    } else {
        #pragma omp atomic update
        dit++;
    }
}

It also works with critical regions, but atomics are more efficient.

There's another thing you need to consider: rand() should not be used in parallel contexts. See this question. For C++, use a private random number generator (i.e. one defined within the parallel region) from <random>.


8 Comments

Thanks @Zulan, with minor modifications this seems to be (almost) doing the trick! I am not very familiar with atomic, but it would not let me add a block of code as with critical... I need all threads to be able to write results, and to write them more often than just at the end... If I do #pragma omp critical { local_nit = nit++; for(j=0;j<jmax;j++){ avRes[j]+=Res[j]; } [...] } it works, but I am curious as to how I could use atomic, since you mentioned it is more efficient. Also, the link to the first question seems to be wrong...
Never mind, I understand now that atomic can only be used for single, simple instructions. So the final code has to have the critical section for writing. And just as a final correction, I believe the local_nit = nit++; should simply be nit++;, since local_nit is updated above, in #pragma omp atomic read local_nit = nit;, and the if that prevents writing until the end is not wanted... Thanks again for your help.
My code was mistakenly using postfix instead of prefix increment. I fixed that and the link. With the correct form local_nit = ++nit; if (local_nit == avit) ... you can protect the writing of output without an expensive critical section.
That makes a bit more sense... but this would again only write at the end of the program, right? I need to be able to check results more often, so I think I do need a critical section... Could I "hide it" within an if?: #pragma omp atomic capture local_wnit=++wnit; if(local_wnit==wavit){ #pragma omp critical { //write; wnit-=local_wnit;} } where wnit is a global write counter, local_wnit the locally handled variable, and wavit a shared variable with the number of iterations between write events. Would this help or is it the same? Thanks for fixing the link.
If you need to update the result for every increment (sorry I misread your question), you might as well just use a critical section for the update. See my edited answer. Alternative you can do a reduction if it's just about summing up if performance is an issue with the critical section - or if jmax is rather small do every result update atomically. Most importantly you don't want to read the counter in a critical section.
