
I am trying to implement a simple linear model in PyTorch that can be given x data and y data, and then trained to recognize the equation y = mx + b. However, whenever I test my model after training, it thinks the equation is y = mx + 2b. I'll show my code, and hopefully someone will be able to spot the issue. Thank you in advance for any help.

import torch

D_in = 500
D_out = 500
batch = 200

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, D_out),
)
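For reference, nn.Linear(D_in, D_out) computes the affine map y = x @ W.T + b, so to represent y = 3x + 4 elementwise the layer would have to learn a scaled identity weight and a constant bias. The _target names below are just for illustration:

W_target = 3 * torch.eye(D_out, D_in)  # the weight the layer would need
b_target = 4 * torch.ones(D_out)       # the bias the layer would need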

Next, I create some data and set a rule; let's do y = 3x + 4.

x_data = torch.rand(batch, D_in)
y_data = torch.randn(batch, D_out)

for i in range(batch):
    for j in range(D_in):
        y_data[i][j] = 3 * x_data[i][j] + 4  # model thinks y=mx+c -> y=mx+2c?

loss_fn = torch.nn.MSELoss(reduction='sum')  # size_average=False is deprecated; reduction='sum' is equivalent
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
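As an aside, the nested loops above are equivalent to a single vectorized expression that broadcasts the rule over the whole tensor, and it runs much faster:

y_data = 3 * x_data + 4  # same rule, applied to every element at once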

Now to training...

for epoch in range(500):
    y_pred = model(x_data)
    loss = loss_fn(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
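To confirm training converges, printing the loss inside the loop helps; something like this (the interval is arbitrary):

    if epoch % 100 == 0:
        print(epoch, loss.item())  # the summed MSE should shrink steadily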

Then I test my model with a tensor of all ones.

test_data = torch.ones(batch, D_in)
y_pred = model(test_data)
print(y_pred)

Now, I'd expect to get 3*1 + 4 = 7, but instead, my model thinks it is 11.

tensor([[10.7286, 11.0499, 10.9448,  ..., 11.0812, 10.9387, 10.7516],
        [10.7286, 11.0499, 10.9448,  ..., 11.0812, 10.9387, 10.7516],
        [10.7286, 11.0499, 10.9448,  ..., 11.0812, 10.9387, 10.7516],
        ...,
        [10.7286, 11.0499, 10.9448,  ..., 11.0812, 10.9387, 10.7516],
        [10.7286, 11.0499, 10.9448,  ..., 11.0812, 10.9387, 10.7516],
        [10.7286, 11.0499, 10.9448,  ..., 11.0812, 10.9387, 10.7516]])

Similarly, if I change the rule to y = 3x + 8, my model guesses 19, so I am not sure what is going on. Why is the constant being added twice? By the way, if I just set the rule to y = 3x, my model correctly infers 3, and for y = mx in general it correctly infers m. For some reason, the constant term is throwing it off. Any help is much appreciated. Thanks!

  • What is the loss finally? Is it going to zero? Commented Jul 5, 2018 at 20:04
  • Yes, the loss goes to a very small number, as in 0.005 or less. Commented Jul 5, 2018 at 20:28
  • I doubt it. See M.deckers' answer below. Commented Jul 5, 2018 at 20:47

1 Answer


Your network does not train for long enough. It gets a vector of 500 features to describe a single datum, and it has to map that 500-feature input to an output of 500 values. Your training data is randomly created, so the task is not as simple as your toy rule suggests; I think you just have to train longer for the weights to approximate this function from R^500 to R^500.

As for the consistent 2c offset, I suspect it comes from the inputs: they are uniform on [0, 1] with mean 0.5, so a partially trained network can reproduce the constant c on the training data by spreading weight mass of about 2c across each row (2c · 0.5 = c) while the bias stays near its initialization; a test input of all ones then picks up the full 2c.

If I reduce the input and output dimensionality and increase the batch size, learning rate and training steps I get the expected result:

import torch

D_in = 100
D_out = 100
batch = 512

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, D_out),
)

x_data = torch.rand(batch, D_in)
y_data = torch.randn(batch, D_out)
for i in range(batch):
    for j in range(D_in):
        y_data[i][j] = 3 * x_data[i][j] + 4  # model thinks y=mx+c -> y=mx+2c?

loss_fn = torch.nn.MSELoss(reduction='sum')  # size_average=False is deprecated
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(10000):
    y_pred = model(x_data)
    loss = loss_fn(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

test_data = torch.ones(batch, D_in)
y_pred = model(test_data)
print(y_pred)
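You can also inspect what the linear layer actually learned; if training has converged, the diagonal of the weight should be near 3, the off-diagonal entries near 0, and the bias near 4 (a quick sanity check on the model above):

layer = model[0]
print(layer.weight.diagonal().mean().item())  # should approach 3
print(layer.weight.sum(dim=1).mean().item())  # row sums should also approach 3
print(layer.bias.mean().item())               # should approach 4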

If you just want to approximate f(x) = 3x + 4 with a single scalar input, you could also set D_in and D_out to 1, for example as sketched below.
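A minimal sketch of that (the learning rate and step count here are my own choices, not taken from the question):

import torch

model = torch.nn.Linear(1, 1)  # one scalar in, one scalar out: y = w*x + b
x = torch.rand(512, 1)
y = 3 * x + 4

loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(2000):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())  # should come out near 3 and 4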


1 Comment

Thank you! I'm just still curious as to why it would always have a consistent error of 2c instead of c...? But thank you a lot, this makes sense.
