I am using pretrained torchvision models in PyTorch with transfer learning to classify my own data set. That works fine, but I think I could further improve classification performance. Our images come in different dimensions, and all of them are resized to fit the input of my model (e.g. to 224x224 pixels).
However, the original image size often says a lot about the class an image belongs to, so I thought it might help to feed the original image dimensions as a second input to the model.
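For context, this is roughly how I record the original size in my Dataset before the resize transform discards it (a simplified sketch; MyImageDataset, image_paths, and labels are placeholders for my actual pipeline):

import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class MyImageDataset(Dataset):
    # Simplified sketch: record each image's original (width, height)
    # before resizing, and return it alongside the image tensor.
    def __init__(self, image_paths, labels):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        size = torch.tensor(image.size, dtype=torch.float32)  # (width, height)
        return self.transform(image), size, self.labels[idx]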
Currently I build my model in PyTorch like this:
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(pretrained=True)  # Could be another base model as well

# Freeze the batch-norm layers so their parameters are not updated
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        for param in module.parameters():
            param.requires_grad = False

# Replace the final fully connected layer with a custom classifier head
model.fc = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(),
    nn.Dropout(0.25),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(0.25),
    nn.Linear(256, num_classes),
)
Now, how would I add another (two-dimensional?) input to this model so that I can feed the x and y dimensions of the original image into it? And where would that make the most sense: directly at the "beginning" of the model, or somewhere "in between"?
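For illustration, this is the kind of wrapper I have been sketching, where the size input is concatenated with the backbone's pooled features just before the classifier head. The name SizeAwareResNet is made up, and sizes is assumed to be a (batch, 2) tensor of original widths and heights; I don't know whether this is the right place to inject them:

import torch
import torch.nn as nn
from torchvision.models import resnet50

class SizeAwareResNet(nn.Module):
    # Hypothetical sketch: concatenate the 2-D size input with the
    # 2048-D pooled backbone features before the classifier head.
    def __init__(self, num_classes):
        super().__init__()
        backbone = resnet50(pretrained=True)
        backbone.fc = nn.Identity()  # keep the 2048-D pooled features
        self.backbone = backbone
        self.classifier = nn.Sequential(
            nn.Linear(2048 + 2, 512),  # +2 for the (width, height) input
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(512, num_classes),
        )

    def forward(self, images, sizes):
        features = self.backbone(images)                # (batch, 2048)
        combined = torch.cat([features, sizes], dim=1)  # (batch, 2050)
        return self.classifier(combined)

A batch call would then look like outputs = model(images, sizes). I assume I would also want to normalize the sizes (e.g. divide by some typical maximum dimension) so they are on a similar scale to the backbone features, but I'm not sure whether that matters in practice.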