
Gradient Descent and Overflow Error

I am currently implementing vectorized gradient descent in Python, but I keep getting an overflow error even though the numbers in my dataset are not extremely large. I am using this formula:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$
$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

where $h_\theta(x) = \theta_0 + \theta_1 x$. I chose this implementation to avoid working out the derivatives by hand. Does anyone have a suggestion on how to remedy this problem, or am I implementing it wrong? Thank you in advance!
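For reference, here is how I read the update above in vectorized NumPy form (a sketch, not the code I am actually running below):

# Sketch of the update above with NumPy arrays: x and y are 1-D arrays
# of length m, and both parameters are updated simultaneously.
import numpy as np

def gradient_step(x, y, theta0, theta1, alpha):
    m = len(y)
    errors = (theta0 + theta1 * x) - y                     # h(x) - y for every sample
    new_theta0 = theta0 - alpha * errors.sum() / m         # gradient w.r.t. theta0
    new_theta1 = theta1 - alpha * (errors * x).sum() / m   # gradient w.r.t. theta1
    return new_theta0, new_theta1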

Dataset Link: https://www.kaggle.com/CooperUnion/anime-recommendations-database/data

## Cleaning Data ##
import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv('anime.csv')
# print(data.corr())
# print(data['members'].isnull().values.any()) # Prints False
# print(data['rating'].isnull().values.any()) # Prints True

members = [] # Corresponding fan club size for row 
ratings = [] # Corresponding rating for row

for _, row in data.iterrows():
    if not math.isnan(row['rating']): # Checks for null ratings
        members.append(row['members'])
        ratings.append(row['rating'])


plt.scatter(members, ratings) # Scatter plot of fan club size vs. rating
plt.savefig('scatterplot.png')

theta0 = 0.3 # Random guess
theta1 = 0.3 # Random guess
error = 0

Formulas

def hypothesis(x, theta0, theta1):
    return theta0 + theta1 * x

def costFunction(x, y, theta0, theta1, m):
    loss = 0 
    for i in range(m): # Represents summation
        loss += (hypothesis(x[i], theta0, theta1) - y[i])**2
    loss *= 1 / (2 * m) # Represents 1/2m
    return loss

def gradientDescent(x, y, theta0, theta1, alpha, m, iterations=1500):
    for i in range(iterations):
        gradient0 = 0
        gradient1 = 0
        for j in range(m):
            gradient0 += hypothesis(x[j], theta0, theta1) - y[j]
            gradient1 += (hypothesis(x[j], theta0, theta1) - y[j]) * x[j]
        gradient0 *= 1/m
        gradient1 *= 1/m
        temp0 = theta0 - alpha * gradient0
        temp1 = theta1 - alpha * gradient1
        theta0 = temp0
        theta1 = temp1
        error = costFunction(x, y, theta0, theta1, len(y))
        print("Error is:", error)
    return theta0, theta1

print(gradientDescent(members, ratings, theta0, theta1, 0.01, len(ratings)))

Errors

After several iterations, the costFunction called inside my gradientDescent function raises OverflowError: (34, 'Result too large'). I expected the code to keep printing a decreasing error value instead:

    Error is: 1.7515692852199285e+23
    Error is: 2.012089675182454e+38
    Error is: 2.3113586742689143e+53
    Error is: 2.6551395730578252e+68
    Error is: 3.05005286756189e+83
    Error is: 3.503703756035943e+98
    Error is: 4.024828599077087e+113
    Error is: 4.623463163528686e+128
    Error is: 5.311135890211131e+143
    Error is: 6.101089907410428e+158
    Error is: 7.008538065634975e+173
    Error is: 8.050955905074458e+188
    Error is: 9.248418197694096e+203
    Error is: 1.0623985545062037e+219
    Error is: 1.220414847696018e+234
    Error is: 1.4019337603196565e+249
    Error is: 1.6104509643047377e+264
    Error is: 1.8499820618048921e+279
    Error is: 2.1251399172389593e+294
    Traceback (most recent call last):
      File "tyreeGradientDescent.py", line 54, in <module>
        print(gradientDescent(members, ratings, theta0, theta1, 0.01, len(ratings)))
      File "tyreeGradientDescent.py", line 50, in gradientDescent
        error = costFunction(x, y, theta0, theta1, len(y))
      File "tyreeGradientDescent.py", line 33, in costFunction
        loss += (hypothesis(x[i], theta0, theta1) - y[i])**2
    OverflowError: (34, 'Result too large')
  • Is your neural network very deep? If so, you could be running into the exploding gradients problem: machinelearningmastery.com/… There are several ways to avoid this, for example by using a good initializer. Commented Apr 16, 2018 at 20:47
  • To which competition does this apply? Commented Apr 16, 2018 at 21:34
  • @enumaris It is not a neural net; I was interested in implementing gradient descent. Also, thanks for the article, I am looking at it now. Commented Apr 16, 2018 at 22:26
  • @Prune it is not for any competition. Commented Apr 16, 2018 at 22:27
  • Where is the overflow error? What debugging output do you have? Where does it start to diverge from your expectations? In short, your posting isn't yet up to SO standards. Commented Apr 16, 2018 at 22:51

2 Answers


Your data values are very large, which makes your loss function very steep. The result is that you need a tiny alpha unless you normalize your data to smaller values. With an alpha that is too large, gradient descent overshoots in every step, hopping all over the place, and actually diverges, which is why your error is going up rather than down.

With your current data, an alpha of 0.0000000001 (1e-10) will make the error converge. After 30 iterations my loss went from:

Error is: 66634985.91339202

to

Error is: 16.90452378179708
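The other fix is to normalize the feature so that a conventional alpha works. A minimal sketch, assuming the members and ratings lists built in the question:

# Minimal sketch: z-score normalization of the feature, so a conventional
# learning rate such as 0.01 no longer diverges.
import numpy as np

x = np.array(members, dtype=float)
y = np.array(ratings, dtype=float)
x = (x - x.mean()) / x.std()   # zero mean, unit variance

theta0, theta1, alpha, m = 0.3, 0.3, 0.01, len(y)
for _ in range(1500):
    errors = (theta0 + theta1 * x) - y          # computed once, so both
    theta0 -= alpha * errors.sum() / m          # parameters are updated
    theta1 -= alpha * (errors * x).sum() / m    # simultaneously
print(theta0, theta1)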


2 Comments

How were you able to find the appropriate alpha value?
@user3697597 I couldn't see another error in your code. If the code is correct and the loss is not decreasing, an alpha that is too large should be the first suspect, so I just lowered it until it started to behave.
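In code, that trial-and-error search could look like the following sketch (hypothetical, reusing the gradientDescent function and data from the question):

# Try learning rates on a log scale and keep the first one that does not
# blow up; the question's costFunction raises OverflowError on divergence.
for alpha in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10]:
    try:
        theta0, theta1 = gradientDescent(members, ratings, 0.3, 0.3, alpha, len(ratings))
        print("alpha", alpha, "behaved; thetas:", theta0, theta1)
        break
    except OverflowError:
        print("alpha", alpha, "diverged")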
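A standalone example of the same batch gradient-descent loop, here fitting a sigmoid model f(x) = 1 / (1 + e^-(wx + b)) to two small data points (with inputs this small, no overflow occurs):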
import numpy as np

X = [0.5, 2.5]
Y = [0.2, 0.9]

def f(w, b, x): # sigmoid with parameters w, b
    return 1.0 / (1.0 + np.exp(-(w * x + b)))


def error(w, b):
    err = 0.0
    for x, y in zip(X, Y):
        fx = f(w, b, x)
        err += 0.5 * (fx - y)**2
    return err

def grad_b(w, b, x, y):
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx)

def grad_w(w, b, x, y):
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx) * x

def do_gradient_descent():
    w, b, eta, max_epochs = 1, 1, 0.01, 100
    for i in range(max_epochs):
        dw, db = 0, 0
        for x, y in zip(X, Y):
            dw += grad_w(w, b, x, y)
            db += grad_b(w, b, x, y)
        w = w - eta * dw
        print(w)
        b = b - eta * db
        print(b)
    er = error(w, b)
    #print(er)
    return er
## Calling the gradient descent function
do_gradient_descent()

