
Are there any recommended ways to make PyTorch DataLoader (torch.utils.data.DataLoader) work in a distributed environment, on a single machine and across multiple machines? Can it be done without DistributedDataParallel?

1 Answer


You may need to make your question clearer. DistributedDataParallel (DDP) is what you use to train a model in a distributed environment; this question seems to be asking how to arrange the data-loading process for distributed training.

First of all,

data.DataLoader works for both distributed and non-distributed training; usually there is no need to change anything about it.

But the sampling strategy differs between the two modes: for distributed training you need to specify a sampler for the DataLoader (the sampler argument of data.DataLoader), and adopting torch.utils.data.distributed.DistributedSampler is the simplest way, as sketched below.
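For example, here is a minimal sketch. It assumes the script is launched with torchrun (which sets the environment variables that init_process_group reads) and uses TensorDataset with random tensors as a stand-in for your own dataset; the batch size, worker count, and epoch count are placeholder values.

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Assumes launch with torchrun, which provides RANK, WORLD_SIZE, MASTER_ADDR, etc.
dist.init_process_group(backend="gloo")

# Stand-in dataset; replace with your own Dataset implementation.
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# DistributedSampler partitions the dataset so each process (rank)
# iterates over its own, non-overlapping shard of the data.
sampler = DistributedSampler(dataset, shuffle=True)

# Do not pass shuffle=True to the DataLoader when a sampler is supplied;
# shuffling is handled by the sampler itself.
loader = DataLoader(dataset, batch_size=32, sampler=sampler, num_workers=2)

num_epochs = 3  # placeholder
for epoch in range(num_epochs):
    sampler.set_epoch(epoch)  # reshuffle with a different order each epoch
    for features, labels in loader:
        pass  # training step goes here (with or without DDP)

dist.destroy_process_group()
```

Note that DistributedSampler only needs the process group to be initialized (init_process_group); it does not require wrapping the model in DistributedDataParallel, so the data-loading side can be set up without DDP.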
