
I have built a CNN autoencoder using Keras and it worked fine on the MNIST test data set. I am now trying it with a different data set collected from another source. These are raw images that I read in with cv2, which works fine. I then convert them into a NumPy array, which I also think works fine. But when I call the .fit method, it gives me this error:

Error when checking target: expected conv2d_39 to have shape (100, 100, 1) but got array with shape (100, 100, 3)

I tried converting the images to grayscale, but they then get the shape (100, 100) and not the (100, 100, 1) that the model wants. What am I doing wrong here?
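(For reference, the shape mismatch can be sketched with a dummy array standing in for a cv2 grayscale image; adding the missing channel axis is a one-liner:)

```python
import numpy as np

# Stand-in for a single grayscale frame, as returned by cv2.imread(path, 0)
gray = np.zeros((100, 100), dtype=np.uint8)

# Keras expects an explicit channel axis: (100, 100) -> (100, 100, 1)
gray = np.expand_dims(gray, axis=-1)
print(gray.shape)  # (100, 100, 1)
```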

Here is the code that I am using:

import os
import cv2
import numpy as np

def read_in_images(path):
    images = []
    for filename in os.listdir(path):
        img = cv2.imread(os.path.join(path, filename))
        if img is not None:
            images.append(img)
    return images

train_images = read_in_images(train_path)
test_images = read_in_images(test_path)
x_train = np.array(train_images)
x_test = np.array(test_images) # (36, 100, 100, 3)

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras.callbacks import TensorBoard

input_img = Input(shape=(100, 100, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)


x = Conv2D(16, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(168, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)


autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')


autoencoder.fit(x_train, x_train,
            epochs=25,
            batch_size=128,
            shuffle=True,
            validation_data=(x_test, x_test),
            callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

The model works fine with the MNIST data set but not with my own data set. Any help would be appreciated.

  • Here input_img = Input(shape=(100,100,3)) you already specify 3 channels, so the error message contradicts that. And to convert your shape (100,100) to (100,100,1), use numpy.expand_dims. Commented Jun 4, 2019 at 6:30
  • I have changed the code a bit: I now read in a grayscale image from cv2 and did np.expand_dims(x_train, axis=3) to get (36, 100, 100, 1), but the model does nothing useful. It runs, but the loss is loss: -3104.3462 - val_loss: -2954.8867. My original autoencoder gave me loss: 0.1052 - val_loss: 0.1038 for MNIST. Commented Jun 4, 2019 at 6:53
  • I have also tried flattening the array before putting it into Input(). Commented Jun 4, 2019 at 7:28
  • Your input and output shapes are different, which should not be the case for an autoencoder. Commented Jun 4, 2019 at 7:46
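(The negative loss in the comments above is a telltale sign of unnormalized targets: binary cross-entropy assumes targets in [0, 1], and raw pixel values up to 255 push it far below zero. A quick NumPy check with assumed example values, computing the loss the way Keras does per element:)

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-element binary cross-entropy."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Target in [0, 1]: loss is non-negative, as expected
print(binary_crossentropy(1.0, 0.9))    # small positive value
# Raw uint8 pixel value as target: loss goes strongly negative
print(binary_crossentropy(255.0, 0.9))  # large negative value
```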

2 Answers


Your input and output shapes are different. That triggers the error (I think).

decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

should be

decoded = Conv2D(num_channels, (3, 3), activation='sigmoid', padding='same')(x)

7 Comments

Actually, in an encoder-decoder network you try to reconstruct the input, not predict a class. So the output must have the same shape as the input; that's what the 1x1 Conv2D does
Sorry, I meant num_channels. Shouldn't the number of channels be the same in input and output? In the posted code, I see 3 channels in the input and 1 in the output, and that is what I felt was triggering the error. The original problem was never grayscale to start with.
Oh yeah, that changes everything ahah! You are right, the channels must correspond!
Also, where do you find a 1x1 convolution? It was a 3x3 convolution with 1 output channel. Your answer is correct, but it does not solve the original problem. It is just hacking, I would say. :P
I know what a 1x1 conv filter is. I just don't see it anywhere in the code and you mentioned it. So was wondering.

I ran some tests, and with data loaded in grayscale like this:

img = cv2.imread(os.path.join(path, files), 0)  # 0 = cv2.IMREAD_GRAYSCALE

then expanding the dims of the final loaded array like:

x_train = np.expand_dims(x_train, -1)

and finally normalizing your data with a simple:

x_train = x_train / 255.

(the input of your model must be input_img = Input(shape=(100, 100, 1))),

the loss becomes normal again and the model runs well!
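(Put together, the preprocessing steps above can be sketched on a dummy batch standing in for the 36 loaded images; the shapes match the question's data:)

```python
import numpy as np

# Stand-in for 36 grayscale 100x100 frames read with cv2.imread(path, 0)
x_train = np.random.randint(0, 256, size=(36, 100, 100)).astype('float32')

# Add the channel axis the (100, 100, 1) input layer expects...
x_train = np.expand_dims(x_train, -1)
# ...and scale pixel values into [0, 1] so binary cross-entropy behaves
x_train = x_train / 255.

print(x_train.shape)  # (36, 100, 100, 1)
```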

UPDATE after comment

In order to keep all the RGB channels through the network, you need an output matching your input shape.
Here, if you want images with shape (100, 100, 3), you need an output of (100, 100, 3) from your decoder.

The decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x) line shrinks the output to shape (100, 100, 1).

So you simply need to change the number of filters; here we want 3 color channels, so the conv must be:

decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

4 Comments

That worked, thank you. Is there any way to keep it in color, though?
The problem is that the final conv layer shrinks your image to a (100, 100, 1) array. If you want to keep the 3 color channels, you need to change the last layer to decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x). If you do that, you no longer need to load in grayscale and expand the dims.
One last thing: be careful with your second decoder Conv2D layer, you set 168 filters instead of 16.
@ThibaultBacqueyrisses kindly update your answer with the output-channel detail, as it is the accepted answer. It might confuse people otherwise.
