I have written a convolutional neural network from scratch before, but I've decided to use PyTorch for its speed. However, I could not find documentation on how to format the input for the Conv2d layer. In general, there seems to be a lot of overhead and wrappers that prevent me from seeing exactly what is happening and writing my code accordingly.
I have trained a model on the MNIST dataset and loaded the saved weights in order to run it (as per the tutorial):
import torch
from torch import nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.conv2 = nn.Conv2d(8, 8, 3, stride=1, padding=1)
        self.linear1 = nn.Linear(7 * 7 * 8, 128)
        self.linear2 = nn.Linear(128, 128)
        self.linear3 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 28x28 -> 14x14
        x = self.pool(F.relu(self.conv2(x)))  # 14x14 -> 7x7
        x = torch.flatten(x, 1)               # flatten everything except the batch dimension
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = self.linear3(x)
        return x
my_model = NeuralNetwork()
my_model.load_state_dict(torch.load("model_weights.pth", weights_only=True))
my_model.eval()
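For reference, my understanding is that nn.Conv2d expects a 4D float tensor of shape (batch, channels, height, width), so a single image for this model should be (1, 1, 28, 28). A quick sanity check I tried (dummy is just a throwaway name):

dummy = torch.zeros(1, 1, 28, 28)  # one image, one channel, 28x28 pixels
with torch.no_grad():
    out = my_model(dummy)
print(out.shape)  # torch.Size([1, 10]), one logit per digit

That at least confirms the shape the model wants, but it doesn't tell me how to get my canvas data into that shape.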
Now, I have a web application where:
- The user draws on a 28x28 canvas in black and white.
- The drawing is put into a flattened array of size 784, consisting of 0's (white on canvas) and 1's (black on canvas). (e.g. [0, 0, 1, 1, 1, 1, 0, 0, ..., 1, 1])
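I am assuming the array is row-major, i.e. index i corresponds to row i // 28 and column i % 28 of the canvas. A quick check I used to convince myself of that (flat is a hypothetical example array):

flat = [0] * 784                           # hypothetical blank canvas
flat[0 * 28 + 5] = 1                       # pixel at row 0, column 5 drawn black
grid = torch.tensor(flat).reshape(28, 28)
print(grid[0, 5].item())                   # prints 1 if the row-major assumption holds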
Here is some sample code of what I wish to perform:
formatted_array = some_formatting_function(flattened_array_of_0_and_1)
x = torch.tensor(formatted_array)
pred = my_model(x)
guessed_digit = some_reading_function(pred)
print(guessed_digit)
# eventually return the guessed_digit
What should my some_formatting_function and some_reading_function be?
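Here is my current guess for these two functions, though I'm not sure it is correct (the shape handling and the argmax are assumptions on my part):

def some_formatting_function(flat):
    # guess: group the 784 values into 28 rows of 28 columns, convert to floats,
    # and wrap in batch and channel dimensions so that torch.tensor(...)
    # produces a float tensor of shape (1, 1, 28, 28)
    rows = [[float(v) for v in flat[r * 28:(r + 1) * 28]] for r in range(28)]
    return [[rows]]

def some_reading_function(pred):
    # pred should have shape (1, 10): one logit per digit class;
    # take the index of the largest logit as the guessed digit
    return pred.argmax(dim=1).item()

In particular, I don't know whether the raw 0/1 pixel values need the same normalization that was applied to the MNIST images during training.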