
I have trained three UNet models in Keras for image segmentation to assess the effect of multi-GPU training.

  1. The first model was trained with a batch size of 1 on 1 GPU (P100). Each training step took ~254 ms. (Note: this is per step, not per epoch.)
  2. The second model was trained with a batch size of 2 on 1 GPU (P100). Each training step took ~399 ms.
  3. The third model was trained with a batch size of 2 on 2 GPUs (P100). Each training step took ~370 ms. Logically this should have taken about the same time as the first case, since each GPU processes 1 sample in parallel, but it took longer.

Can anyone tell me whether multi-GPU training actually reduces training time? For reference, all models were trained with Keras; the setup was roughly along the lines of the sketch below.
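A minimal sketch of the setup, assuming TensorFlow 2.x with `tf.distribute.MirroredStrategy` (the tiny model and random data here are placeholders, not the actual UNet or dataset):

```python
import tensorflow as tf

def build_model():
    # Placeholder for the UNet; any Keras segmentation model would go here.
    inputs = tf.keras.Input(shape=(128, 128, 3))
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

# MirroredStrategy replicates the model on every visible GPU and splits
# each (global) batch across the replicas.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_model()
    model.compile(optimizer="adam", loss="binary_crossentropy")

# With 2 GPUs and batch_size=2, each replica processes 1 sample per step.
images = tf.random.uniform((64, 128, 128, 3))
masks = tf.random.uniform((64, 128, 128, 1))
model.fit(images, masks, batch_size=2, epochs=1)
```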

  • Please check stackoverflow.com/questions/59096347/… Commented Mar 24, 2020 at 11:28
  • You should look at, given the same model initialization, the total convergence time. Otherwise there can be a lot of ambiguity about "what is a step" and "what is an epoch" for a multi-GPU model. Commented Mar 24, 2020 at 12:23
  • @DanielMöller: Could you please explain what you mean by total convergence time? Commented Mar 25, 2020 at 5:37
  • Do you mean the time to reach the lowest validation error? Commented Mar 25, 2020 at 6:26
  • Yes, the time the model takes to reach what you expect from it. The answer Srihari put here seems to say something similar. Commented Mar 25, 2020 at 11:59

1 Answer


I presume this is because you use a very small batch_size; in that case, the cost of distributing the computations across two GPUs and gathering the gradients back (as well as the CPU-to-GPU data transfer for both GPUs) outweighs the parallelism you gain over sequential training on 1 GPU.

Expect to see a bigger difference with a batch size of 8 or 16, for instance.
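One way to do this, as a sketch assuming TensorFlow 2.x, is to scale the global batch size with the number of replicas so each GPU gets enough work per step (the model and data below are stand-ins, not the UNet from the question):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
per_replica_batch = 8
# The global batch grows with the number of GPUs, e.g. 16 on 2 GPUs.
global_batch = per_replica_batch * strategy.num_replicas_in_sync

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                               input_shape=(128, 128, 3)),
        tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

images = tf.random.uniform((128, 128, 128, 3))
masks = tf.random.uniform((128, 128, 128, 1))
# With larger per-GPU batches, the all-reduce and host-to-device costs are
# amortized over more compute per step, so the multi-GPU speedup shows up.
model.fit(images, masks, batch_size=global_batch, epochs=1)
```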


1 Comment

You are right. I just tried a batch size of 8 on one V100 GPU and a batch size of 16 on two V100 GPUs, and the time per step was the same for both. In other words, the multi-GPU model processed twice as many samples per training step in the same time the single GPU took. The difference only becomes pronounced at higher batch sizes.
