0

I have the following monte carlo code with openmp multithreading

int main()
{
  std::uniform_int_distribution<int> dir(0, 1);
  std::uniform_int_distribution<int> xyz(0, 2);
  std::uniform_real_distribution<double> angle(0,360);
  mt19937 gen{0};

  auto M = 20;
  long double sum = 0;
  auto num_trials = 10000;
  // omp_set_num_threads(12);

  #pragma omp parallel
  {
    double loc_sum = 0.0;
    #pragma omp for
    for(int i = 0; i < num_trials; ++i)
    {
      double x = 0;
      double y = 0;
      double z = 0;
      double r = 0; 
      auto N = 0;
      
      while(r < M)
      {
        auto d = dir(gen);
        auto p = xyz(gen);
        if(p == 0)
        {
          x += (d == 1) ? -1 : 1;
        }
        else if(p == 1)
        {
          y += (d == 1) ? -1 : 1;
        }
        else
        {
          z += (d == 1) ? -1 : 1;
        }

        r = std::sqrt(x * x + y * y + z * z);
        ++N;
      }

      loc_sum += N;
    }

    #pragma omp critical
      sum += loc_sum;
  }
}

The variable sum is quite different for executing in serial vs multiple threads. I am expecting a slight difference due to the calls to the random uniform distributions, but the difference that I am observing is too significant and can't be due to randomness, and I suspect there's a bug in my multithreaded code.

Are there any race conditions or data races in this code that's affecting sum?

4
  • I'm pretty sure you can't call those generators (dir and xyz) without a lock around them. You're also using the PRNG (gen) without locking. Does the code work if you mark the lines to generate d and p with #pragma omp critical? Commented Oct 21, 2020 at 14:45
  • @Darhuuk Ah, I think you're right. wrapped d,p with a critical section and it works. I had thought the generators would be produce values atomically. Practically, I don't want to put a critical section around d,p. Would the best solution here be to create the generators inside the parallel region? Commented Oct 21, 2020 at 14:59
  • Yeah, in that case your best bet would be to create copies of the generators. Those should be pretty lightweight. You probably don't want to copy the PRNG object, since then you'll have the exact same state in each thread. So I would create it inside the for loop with proper initialization. Commented Oct 21, 2020 at 15:04
  • Expanded my comments in a proper answer. Commented Oct 21, 2020 at 15:10

1 Answer 1

1

The issue is that you call the generators (dir and xyz) without a lock around them. You're also using the PRNG (gen) without locking.

Neither of those calls is atomic, since implementing them as such by default would make single threaded code slower than it needs to be.

Marking the lines to generate d and p with #pragma omp critical should fix the issue.

If you don't want a critical section, you'll need seperate dir, xyz, and gen objects in each thread. The generators (dir and xyz) can simply be copied. The PRNG (gen) should be properly initialized per thread, otherwise you end up with PNGs with the exact same state in each thread. E.g.:

std::random_device rd; /* Outside the parallel section. */

// Code below once per thread.
/* Initialization of the PRNG calls std::random_device::operator() which 
 * needs a lock around it when called in parallel. */
std::mt199937 gen;    
#pragma omp critical
{
  gen.seed(rd());
}

// for-loop starts here.
Sign up to request clarification or add additional context in comments.

3 Comments

Is there a race condition in the rd() call?
Unless explicitly mentioned that the function in question is atomic, I would assume that there are going to be race conditions. It's highly unlikely a function in the standard library is going to be atomic, because that would make it slower than not making it atomic. I.e. by making it atomic, you punish everyone who uses that function from a single thread. Instead, if you write your functions non-atomic, anyway can make them atomic by putting a lock around the call. And now you only pay the price of locking/atomic calls when you really need it.
Also, the critical section around the generator seeding shouldn't have any discernible impact on the runtime of your program, since it only runs once per thread.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.