I'm having problems with Python code that uses PyTorch. The details are a bit complicated (the code is part of a quantum mechanical calculation), but the code structure is very straightforward and looks more or less like this:
# p is a batch containing 100000 sets of momenta.
# Each set contains four vectors in 3 dimensions.
p = momenta[startbatch:endbatch]
# p.shape: (100000, 4, 3)
# It should be easy to parallelize the following
# with respect to the first index of `p`:
result = 1.0  # * <complicated expression involving p>
# result.shape: (100000, 16, 16)
The same calculation <complicated expression involving p> is performed for, say, 100000 sets of momenta. Parallelizing this in Fortran would involve adding a simple !$omp parallel do.
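For concreteness, here is a runnable toy stand-in with the same shapes. The function batched_expression below is entirely made up (the real expression is much more involved), but the batched structure is the same:

import torch

# Hypothetical stand-in for <complicated expression involving p>:
# flatten each (4, 3) set of momenta to 12 numbers, append four
# constants, and take an outer product so each batch element
# yields a (16, 16) matrix.
def batched_expression(p):
    flat = p.reshape(p.shape[0], -1)                            # (B, 12)
    feat = torch.cat([flat, torch.ones(p.shape[0], 4)], dim=1)  # (B, 16)
    return torch.einsum("bi,bj->bij", feat, feat)               # (B, 16, 16)

p = torch.randn(100000, 4, 3)
result = batched_expression(p)  # result.shape: (100000, 16, 16)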
I'm using Python partially as a learning opportunity and partially because I would later like to automatically calculate gradients with respect to some parameters. Unfortunately, when measuring the performance of the code, I get the following relationship between execution time and number of cores used:
[plot: execution time (seconds) vs. number of threads]
Since memory is plentiful and the calculation can be easily parallelized along the first index of p, I would expect the execution time to decrease much more steeply. For instance, at 20 threads I would expect roughly 7 seconds: 1/20 of the single-thread time.
I'm guessing the automatic parallelization of <complicated expression involving p> is not optimal. Is it possible to specify explicitly that the calculation should be performed in parallel for each i in p[i, :, :]? I could run 20 (or more) Python workers and in each one evaluate <complicated expression involving p> with torch.set_num_threads(1) and torch.set_num_interop_threads(1), but I'm hoping there is a simpler / more elegant solution.
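In case it helps, here is a minimal sketch of that fallback, assuming torch.multiprocessing with 20 worker processes (rather than threads) and the toy batched_expression from above; torch.set_num_threads(1) pins each worker to a single intra-op thread:

import torch
import torch.multiprocessing as mp

def worker(chunk):
    torch.set_num_threads(1)  # one intra-op thread per worker process
    # same toy stand-in for <complicated expression involving p> as above
    flat = chunk.reshape(chunk.shape[0], -1)
    feat = torch.cat([flat, torch.ones(chunk.shape[0], 4)], dim=1)
    return torch.einsum("bi,bj->bij", feat, feat)

if __name__ == "__main__":
    p = torch.randn(100000, 4, 3)
    chunks = torch.chunk(p, 20, dim=0)  # split along the batch dimension
    with mp.Pool(processes=20) as pool:
        parts = pool.map(worker, chunks)
    result = torch.cat(parts, dim=0)    # (100000, 16, 16)

This works in principle, but it feels like working around PyTorch rather than with it, hence the question.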
Any tips or comments would be greatly appreciated.
PS: For a single thread, the performance is similar to that of the Fortran version.