I have a set of black-and-white images with the shape (1000, 11, 1). I'm trying to modify the Keras MNIST autoencoder example to work with my data, so I've written the following code:
from tensorflow.keras import layers, Model

input_img = layers.Input(shape=(1000, 11, 1))
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
Printing the summary, I can see that the output shape is different from the input shape:
Model: "model_16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_18 (InputLayer) [(None, 1000, 11, 1)] 0
_________________________________________________________________
conv2d_119 (Conv2D) (None, 1000, 11, 16) 160
_________________________________________________________________
max_pooling2d_51 (MaxPooling (None, 500, 6, 16) 0
_________________________________________________________________
conv2d_120 (Conv2D) (None, 500, 6, 8) 1160
_________________________________________________________________
max_pooling2d_52 (MaxPooling (None, 250, 3, 8) 0
_________________________________________________________________
conv2d_121 (Conv2D) (None, 250, 3, 8) 584
_________________________________________________________________
max_pooling2d_53 (MaxPooling (None, 125, 2, 8) 0
_________________________________________________________________
conv2d_122 (Conv2D) (None, 125, 2, 8) 584
_________________________________________________________________
up_sampling2d_51 (UpSampling (None, 250, 4, 8) 0
_________________________________________________________________
conv2d_123 (Conv2D) (None, 250, 4, 8) 584
_________________________________________________________________
up_sampling2d_52 (UpSampling (None, 500, 8, 8) 0
_________________________________________________________________
conv2d_124 (Conv2D) (None, 498, 6, 16) 1168
_________________________________________________________________
up_sampling2d_53 (UpSampling (None, 996, 12, 16) 0
_________________________________________________________________
conv2d_125 (Conv2D) (None, 996, 12, 1) 145
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
And in fact, the training fails with an error:
ValueError: logits and labels must have the same shape ((None, 996, 12, 1) vs (None, 1000, 11, 1))
What am I doing wrong? How can I fix my code to work with my image dimensions?
There are two issues. First, the decoder's Conv2D layer with 16 filters is missing padding='same' (add it like in the other layers); without it, that convolution shrinks the feature map from (500, 8) to (498, 6). Second, even with that fixed, the width dimension cannot survive the encode/decode round trip: MaxPooling2D with padding='same' computes ceil(n/2), so the width goes 11 → 6 → 3 → 2, and the three UpSampling2D layers then give 2 → 4 → 8 → 16 ≠ 11. Odd sizes lose information when halved, so doubling cannot recover them. The height is fine (1000 → 500 → 250 → 125 → 250 → 500 → 1000). So change your input width to a value you can get back by multiplying by 2 three times, e.g. pad it from 11 to 16.
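You can verify the shape arithmetic without building the model at all. This is a minimal sketch (plain Python/NumPy, no Keras needed); the `round_trip` helper and the batch size of 8 are illustrative assumptions, not part of your original code:

```python
import math
import numpy as np

def round_trip(n, depth=3):
    """Size of one spatial dimension after `depth` MaxPooling2D(padding='same')
    steps (each computes ceil(n/2)) followed by `depth` UpSampling2D((2, 2))
    steps (each doubles)."""
    for _ in range(depth):
        n = math.ceil(n / 2)   # pooling with padding='same'
    for _ in range(depth):
        n *= 2                 # upsampling
    return n

print(round_trip(11))    # 16 -- does not match the original width of 11
print(round_trip(1000))  # 1000 -- the height survives the round trip
print(round_trip(16))    # 16 -- a width divisible by 2 three times round-trips

# One possible fix: zero-pad the width from 11 to 16 before training
images = np.zeros((8, 1000, 11, 1))                    # hypothetical batch
padded = np.pad(images, ((0, 0), (0, 0), (0, 5), (0, 0)))
print(padded.shape)                                    # (8, 1000, 16, 1)
```

With the input padded to (1000, 16, 1) (and the Input layer's shape updated to match), every pooled size divides evenly and the decoder output shape equals the input shape, so binary_crossentropy no longer complains about mismatched logits and labels.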