
I am building a custom autoencoder to train on a dataset. My model is as follows:

import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder,self).__init__()

        self.encoder = nn.Sequential(
        nn.Conv2d(in_channels = 3, out_channels = 32, kernel_size=3,stride=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels = 32, out_channels = 64, kernel_size=3,stride=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels = 64, out_channels = 128, kernel_size=3,stride=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=128,out_channels=256,kernel_size=5,stride=2),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=256,out_channels=512,kernel_size=5,stride=2),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=512,out_channels=1024,kernel_size=5,stride=2),
        nn.ReLU(inplace=True)
        )

        self.decoder = nn.Sequential(
        nn.ConvTranspose2d(in_channels=1024,out_channels=512,kernel_size=5,stride=2),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=512,out_channels=256,kernel_size=5,stride=2),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=256,out_channels=128,kernel_size=5,stride=2),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=128,out_channels=64,kernel_size=3,stride=1),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=64,out_channels=32,kernel_size=3,stride=1),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=32,out_channels=3,kernel_size=3,stride=1),
        nn.ReLU(inplace=True)
        )


    def forward(self,x):
        x = self.encoder(x)
        print(x.shape)
        x = self.decoder(x)
        return x



def unit_test():
    num_minibatch = 16
    img = torch.randn(num_minibatch, 3, 512, 640).cuda(0)
    model = AutoEncoder().cuda()
    model = nn.DataParallel(model)
    output = model(img)
    print(output.shape)

if __name__ == '__main__':
    unit_test()

As you can see, my input dimension is (3, 512, 640), but the output after passing it through the decoder is (3, 507, 635). Am I missing something when adding the ConvTranspose2d layers?

Any help would be appreciated. Thanks


1 Answer


The mismatch is caused by how ConvTranspose2d computes its output shape: with stride > 1 the output size is ambiguous, so the decoder does not automatically invert the encoder's spatial dimensions. You can add output_padding=1 to the first and third transpose convolution layers to solve this problem,

i.e. nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=2, output_padding=1) and nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=2, output_padding=1)

As per the documentation:

When stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side.
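To make the ambiguity concrete, here is a minimal sketch (not from the original answer) showing that a stride-2 Conv2d maps inputs of height 123 and 124 to the same output height, which is why the matching ConvTranspose2d needs output_padding=1 to recover the larger of the two candidates:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(1, 1, kernel_size=5, stride=2)
    deconv = nn.ConvTranspose2d(1, 1, kernel_size=5, stride=2)
    deconv_pad = nn.ConvTranspose2d(1, 1, kernel_size=5, stride=2, output_padding=1)

    # Both 123 and 124 collapse to 60 after the stride-2 convolution ...
    print(conv(torch.randn(1, 1, 123, 123)).shape)  # torch.Size([1, 1, 60, 60])
    print(conv(torch.randn(1, 1, 124, 124)).shape)  # torch.Size([1, 1, 60, 60])

    # ... so the plain transpose convolution can only return one of them (123),
    print(deconv(torch.randn(1, 1, 60, 60)).shape)      # torch.Size([1, 1, 123, 123])
    # while output_padding=1 selects the other candidate (124).
    print(deconv_pad(torch.randn(1, 1, 60, 60)).shape)  # torch.Size([1, 1, 124, 124])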


Decoder layers' shapes before adding output_padding:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
   ConvTranspose2d-1        [-1, 512, 123, 155]      13,107,712
              ReLU-2        [-1, 512, 123, 155]               0
   ConvTranspose2d-3        [-1, 256, 249, 313]       3,277,056
              ReLU-4        [-1, 256, 249, 313]               0
   ConvTranspose2d-5        [-1, 128, 501, 629]         819,328
              ReLU-6        [-1, 128, 501, 629]               0
   ConvTranspose2d-7         [-1, 64, 503, 631]          73,792
              ReLU-8         [-1, 64, 503, 631]               0
   ConvTranspose2d-9         [-1, 32, 505, 633]          18,464
             ReLU-10         [-1, 32, 505, 633]               0
  ConvTranspose2d-11          [-1, 3, 507, 635]             867
             ReLU-12          [-1, 3, 507, 635]               0

After adding output_padding:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
   ConvTranspose2d-1        [-1, 512, 124, 156]      13,107,712
              ReLU-2        [-1, 512, 124, 156]               0
   ConvTranspose2d-3        [-1, 256, 251, 315]       3,277,056
              ReLU-4        [-1, 256, 251, 315]               0
   ConvTranspose2d-5        [-1, 128, 506, 634]         819,328
              ReLU-6        [-1, 128, 506, 634]               0
   ConvTranspose2d-7         [-1, 64, 508, 636]          73,792
              ReLU-8         [-1, 64, 508, 636]               0
   ConvTranspose2d-9         [-1, 32, 510, 638]          18,464
             ReLU-10         [-1, 32, 510, 638]               0
  ConvTranspose2d-11          [-1, 3, 512, 640]             867
             ReLU-12          [-1, 3, 512, 640]               0
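
For reference, this is the decoder from the question with just those two output_padding=1 arguments added (everything else in the model stays the same):

    self.decoder = nn.Sequential(
        nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=2, output_padding=1),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=2),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=2, output_padding=1),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=3, stride=1),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=3, stride=1),
        nn.ReLU(inplace=True),
        nn.ConvTranspose2d(in_channels=32, out_channels=3, kernel_size=3, stride=1),
        nn.ReLU(inplace=True)
    )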

4 Comments

Thanks a lot, it worked. But I still can't quite understand what it does. And why does it not work when I apply output_padding for the 1st and 2nd Transpose layers instead of the 1st and 3rd?
You need to manually check where to apply output_padding (I checked it by following the output sizes of the encoder layers) using the torchsummary package. Applying it in the wrong place will give the wrong output shape. The extra padding is added to the right and bottom of the image so that the output arrives at the correct shape. discuss.pytorch.org/t/…
Yep, the torchsummary package is really helpful for checking the sizes. Thanks a lot for your help
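
For anyone who wants to reproduce the layer-by-layer tables above, this is roughly how the torchsummary package is used (a sketch; the exact call may vary with your torchsummary version):

    from torchsummary import summary

    model = AutoEncoder().cuda()
    # Prints one row per layer with its output shape and parameter count,
    # which is how the "Output Shape" columns above were obtained.
    summary(model, input_size=(3, 512, 640))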
