Openmp multithreaded code giving different answer when using multiple threads

Question

I have the following monte carlo code with openmp multithreading

int main()
{
  std::uniform_int_distribution<int> dir(0, 1);
  std::uniform_int_distribution<int> xyz(0, 2);
  std::uniform_real_distribution<double> angle(0,360);
  mt19937 gen{0};

  auto M = 20;
  long double sum = 0;
  auto num_trials = 10000;
  // omp_set_num_threads(12);

  #pragma omp parallel
  {
    double loc_sum = 0.0;
    #pragma omp for
    for(int i = 0; i < num_trials; ++i)
    {
      double x = 0;
      double y = 0;
      double z = 0;
      double r = 0; 
      auto N = 0;
      
      while(r < M)
      {
        auto d = dir(gen);
        auto p = xyz(gen);
        if(p == 0)
        {
          x += (d == 1) ? -1 : 1;
        }
        else if(p == 1)
        {
          y += (d == 1) ? -1 : 1;
        }
        else
        {
          z += (d == 1) ? -1 : 1;
        }

        r = std::sqrt(x * x + y * y + z * z);
        ++N;
      }

      loc_sum += N;
    }

    #pragma omp critical
      sum += loc_sum;
  }
}

The variable sum is quite different for executing in serial vs multiple threads. I am expecting a slight difference due to the calls to the random uniform distributions, but the difference that I am observing is too significant and can't be due to randomness, and I suspect there's a bug in my multithreaded code.

Are there any race conditions or data races in this code that's affecting sum?

I'm pretty sure you can't call those generators (dir and xyz) without a lock around them. You're also using the PRNG (gen) without locking. Does the code work if you mark the lines to generate d and p with #pragma omp critical? — AVH
– AVH, Commented Oct 21, 2020 at 14:45
@Darhuuk Ah, I think you're right. wrapped d,p with a critical section and it works. I had thought the generators would be produce values atomically. Practically, I don't want to put a critical section around d,p. Would the best solution here be to create the generators inside the parallel region? — student010101
– student010101, Commented Oct 21, 2020 at 14:59
Yeah, in that case your best bet would be to create copies of the generators. Those should be pretty lightweight. You probably don't want to copy the PRNG object, since then you'll have the exact same state in each thread. So I would create it inside the for loop with proper initialization. — AVH
– AVH, Commented Oct 21, 2020 at 15:04

AVH · Accepted Answer · 2020-10-21 15:10:29Z

1

The issue is that you call the generators (dir and xyz) without a lock around them. You're also using the PRNG (gen) without locking.

Neither of those calls is atomic, since implementing them as such by default would make single threaded code slower than it needs to be.

Marking the lines to generate d and p with #pragma omp critical should fix the issue.

If you don't want a critical section, you'll need seperate dir, xyz, and gen objects in each thread. The generators (dir and xyz) can simply be copied. The PRNG (gen) should be properly initialized per thread, otherwise you end up with PNGs with the exact same state in each thread. E.g.:

std::random_device rd; /* Outside the parallel section. */

// Code below once per thread.
/* Initialization of the PRNG calls std::random_device::operator() which 
 * needs a lock around it when called in parallel. */
std::mt199937 gen;    
#pragma omp critical
{
  gen.seed(rd());
}

// for-loop starts here.

answered Oct 21, 2020 at 15:10

AVH

11.5k4 gold badges36 silver badges45 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

student010101 Over a year ago

Is there a race condition in the rd() call?

AVH Over a year ago

Unless explicitly mentioned that the function in question is atomic, I would assume that there are going to be race conditions. It's highly unlikely a function in the standard library is going to be atomic, because that would make it slower than not making it atomic. I.e. by making it atomic, you punish everyone who uses that function from a single thread. Instead, if you write your functions non-atomic, anyway can make them atomic by putting a lock around the call. And now you only pay the price of locking/atomic calls when you really need it.

AVH Over a year ago

Also, the critical section around the generator seeding shouldn't have any discernible impact on the runtime of your program, since it only runs once per thread.

Collectives™ on Stack Overflow

Openmp multithreaded code giving different answer when using multiple threads

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related