
I have trained three UNet models in Keras for image segmentation to assess the effect of multi-GPU training.

  1. The first model was trained with a batch size of 1 on 1 GPU (P100). Each training step took ~254 ms. (Note: this is per step, not per epoch.)
  2. The second model was trained with a batch size of 2 on 1 GPU (P100). Each training step took ~399 ms.
  3. The third model was trained with a batch size of 2 on 2 GPUs (P100). Each training step took ~370 ms. Logically this should have taken about the same time as the first case, since each GPU processes 1 sample in parallel, but it took longer.

Can anyone tell me whether multi-GPU training actually reduces training time? For reference, all models were trained with Keras; the setup was roughly along the lines of the sketch below.
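A minimal sketch of the setup, assuming TensorFlow 2.x with `tf.distribute.MirroredStrategy` (the tiny model and random data here are placeholders, not the actual UNet or dataset):

```python
import tensorflow as tf

def build_model():
    # Placeholder for the UNet; any Keras segmentation model would go here.
    inputs = tf.keras.Input(shape=(128, 128, 3))
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

# MirroredStrategy replicates the model on every visible GPU and splits
# each (global) batch across the replicas.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_model()
    model.compile(optimizer="adam", loss="binary_crossentropy")

# With 2 GPUs and batch_size=2, each replica processes 1 sample per step.
images = tf.random.uniform((64, 128, 128, 3))
masks = tf.random.uniform((64, 128, 128, 1))
model.fit(images, masks, batch_size=2, epochs=1)
```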

  • Please check stackoverflow.com/questions/59096347/… Commented Mar 24, 2020 at 11:28
  • You should look at, given the same model initialization, the total convergence time. Otherwise there can be a lot of ambiguity about "what is a step" and "what is an epoch" for a multi-GPU model. Commented Mar 24, 2020 at 12:23
  • @DanielMöller: Could you please explain what you mean by total convergence time? Commented Mar 25, 2020 at 5:37
  • Do you mean the time to reach the lowest validation error? Commented Mar 25, 2020 at 6:26
  • Yes, the time the model takes to reach what you expect from it. The answer Srihari put here seems to say something similar. Commented Mar 25, 2020 at 11:59

1 Answer


I presume this is because you use a very small batch_size; in that case, the cost of distributing the computations across two GPUs and gathering the gradients back (as well as the CPU-to-GPU data transfer for both GPUs) outweighs the parallelism you gain over sequential training on 1 GPU.

Expect to see a bigger difference with a batch size of 8 or 16, for instance.
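One way to do this, as a sketch assuming TensorFlow 2.x, is to scale the global batch size with the number of replicas so each GPU gets enough work per step (the model and data below are stand-ins, not the UNet from the question):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
per_replica_batch = 8
# The global batch grows with the number of GPUs, e.g. 16 on 2 GPUs.
global_batch = per_replica_batch * strategy.num_replicas_in_sync

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                               input_shape=(128, 128, 3)),
        tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

images = tf.random.uniform((128, 128, 128, 3))
masks = tf.random.uniform((128, 128, 128, 1))
# With larger per-GPU batches, the all-reduce and host-to-device costs are
# amortized over more compute per step, so the multi-GPU speedup shows up.
model.fit(images, masks, batch_size=global_batch, epochs=1)
```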


1 Comment

You are right. I just tried a batch size of 8 on one V100 GPU and a batch size of 16 on two V100 GPUs, and the time per step was the same for both. In other words, the multi-GPU model processed twice as many samples per training step in the same time the single GPU took. The difference only becomes pronounced at higher batch sizes.
