
I am getting a strange and unexpected result from NumPy in Python.

I am working with the following libraries:

import os, glob, string, math, csv, json
import datetime as dt
import numpy as np
import scipy as sci
import pandas as pd
import matplotlib.pyplot as plt
import feedparser as fp
import cPickle as pickle
import networkx as nx
from urllib2 import urlopen
import statsmodels.formula.api as sm
import patsy

The following code:

n,k = 2643605051, 648128.068241
print n,type(n)
print k, type(k)
nkvar = (k + 1)*(n + 2)/( (n+2) * (n+1)**2 )
print nkvar

n = np.int64(n)
k = np.float64(k)
print n,type(n)
print k, type(k)
nkvar = (k + 1)*(n + 2)/( (n+2) * (n+1)**2 )
print nkvar

Yields:

2643605051 <type 'int'>
648128.068241 <type 'float'>
9.27402694708e-14
2643605051 <type 'numpy.int64'>
648128.068241 <type 'numpy.float64'>
-0.00383719008751

The second result is obviously wrong: every term in the expression is positive, so the value cannot be negative. Could someone please help me understand what is going on?

  • Ok, sorry for blaming pandas. The issue seems to be with numpy! Here is an example: n,k = 2643605051, 648128.068241 print n,type(n) print k, type(k) nkvar = (k + 1)*(n + 2)/( (n+2) * (n+1)**2 ) print nkvar n = np.int64(n) k = np.float64(k) print n,type(n) print k, type(k) nkvar = (k + 1)*(n + 2)/( (n+2) * (n+1)**2 ) print nkvar which yields: 2643605051 <type 'int'> 648128.068241 <type 'float'> 9.27402694708e-14 2643605051 <type 'numpy.int64'> 648128.068241 <type 'numpy.float64'> -0.00383719008751 Commented Sep 11, 2013 at 18:16
  • Can you put that example in your question? It's near impossible to follow in a comment. Commented Sep 11, 2013 at 18:18
  • int64 is 64-bit. Operations on it are restricted to 64 bits. int produces arbitrary-precision longs if the result doesn't fit into an int. Commented Sep 11, 2013 at 18:30
  • is there a way to set the default precision to int64 and float128 in NUMPY? Commented Sep 11, 2013 at 18:36
  • Is there a reason that you haven't canceled the common factor (n+2) in the numerator and denominator of your expression? Commented Sep 12, 2013 at 2:28
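
The cancellation suggested in the last comment can be sketched as follows. This is only an illustration, not code from the thread: it assumes the expression really does simplify to (k + 1) / (n + 1)**2, and it only helps while (n + 1)**2 stays below the int64 maximum, which holds for this n but not for values of n much above roughly 3.0e9.

```python
import numpy as np

n = np.int64(2643605051)
k = np.float64(648128.068241)

# (k + 1)*(n + 2) / ((n + 2)*(n + 1)**2) simplifies to (k + 1) / (n + 1)**2.
# (n + 1)**2 is about 6.99e18, which is still below the int64 maximum
# of about 9.22e18, so no intermediate integer wraps around here.
nkvar = (k + 1) / (n + 1) ** 2
print(nkvar)  # close to the 9.27402694708e-14 reported in the question
```

Because the safety margin is small, converting n to a Python int or to float64 is probably the more robust choice for general inputs.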

1 Answer


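You are suffering from arithmetic overflow. The denominator (n+2) * (n+1)**2 is roughly 1.8e28, far beyond the largest int64 value (about 9.2e18), so the int64 product wraps around to a wrong value, which here happens to be negative. With NumPy, for the sake of speed, most operations do not check for arithmetic overflow. The onus is on you to choose the proper dtype to avoid overflow.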

import numpy as np

n,k = 2643605051, 648128.068241
nkvar = (k + 1)*(n + 2)/((n+1)**2 * (n+2))
print "In foo nkvar = ", nkvar, "  from (n,k) = ", (n,k)
# In foo nkvar =  9.27402694708e-14   from (n,k) =  (2643605051L, 648128.068241)       

n,k = np.int64(2643605051), np.float32(648128.068241)
nkvar = (k + 1)*(n + 2)/((n+1)**2 * (n+2))
print "In foo nkvar = ", nkvar, "  from (n,k) = ", (n,k)
# In foo nkvar =  -0.00383719005352   from (n,k) =  (2643605051, 648128.06)
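
To see where the overflow happens, the intermediate values can be compared against the int64 limit. The following is a minimal sketch written for Python 3, not part of the original answer; the variable names are illustrative, and `np.errstate` is used only to silence the overflow warning that NumPy scalars may emit.

```python
import numpy as np

n = 2643605051
int64_max = np.iinfo(np.int64).max  # 9223372036854775807

# Exact values, using Python's arbitrary-precision integers.
square = (n + 1) ** 2           # about 6.99e18, still fits in int64
denominator = square * (n + 2)  # about 1.85e28, does not fit

print(square <= int64_max)      # True
print(denominator > int64_max)  # True

# The same product computed in int64 wraps around modulo 2**64.
with np.errstate(over="ignore"):
    wrapped = np.int64(n + 1) ** 2 * np.int64(n + 2)

# Reduce the exact value into the signed 64-bit range for comparison.
expected_wrap = (denominator + 2**63) % 2**64 - 2**63
print(int(wrapped) == expected_wrap)  # True
```

The wrapped denominator is negative, which is why the second result in the question comes out negative.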

A workaround: since there is no NumPy integer dtype large enough to perform the computation without overflow, you'll need to convert n to a Python int first (w here stands for the array that n is computed from):

n = int(w.sum())
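
A fuller sketch of this workaround, written for Python 3. The array `w` below is a hypothetical one-element stand-in, since the asker's actual data is not shown; only the `int(...)` conversion is taken from the answer.

```python
import numpy as np

# Hypothetical stand-in for the asker's array; its sum is the n from the question.
w = np.array([2643605051], dtype=np.int64)
k = 648128.068241

n = int(w.sum())  # Python int: arbitrary precision, cannot overflow
print(type(n))    # <class 'int'>

# All integer arithmetic below is now exact.
nkvar = (k + 1) * (n + 2) / ((n + 1) ** 2 * (n + 2))
print(nkvar)      # close to 9.27402694708e-14
```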

Another alternative is to change the dtype of n to float64:

n,k = np.float64(2643605051), np.float64(648128.068241)
nkvar = (k + 1)*(n + 2)/((n+1)**2 * (n+2))
print "In foo nkvar = ", nkvar, "  from (n,k) = ", (n,k)
# In foo nkvar =  9.27402694708e-14   from (n,k) =  (2643605051.0, 648128.06824099994)

2 Comments

is there a way to set the default precision to int64 and float128 in NUMPY?
When you define your arrays, you can supply the desired dtype. Or, after the fact, the dtype can be changed with arr = arr.astype('int64'). However, even int64 is not big enough to avoid overflow in this case.
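
As a rough illustration of the comment above (a sketch with made-up array names, not code from the thread): the dtype can be given at creation time or changed afterwards with `astype`, but an int64 array still wraps around for this computation, whereas a float64 array does not.

```python
import numpy as np

# Supply the dtype when the array is created ...
a = np.array([1, 2, 3], dtype=np.int64)
# ... or change it after the fact.
b = np.array([1, 2, 3], dtype=np.int32).astype('int64')
print(a.dtype, b.dtype)  # int64 int64

# Even int64 is too small for the denominator in the question.
big = np.array([2643605051], dtype=np.int64)
with np.errstate(over="ignore"):
    wrapped = (big + 1) ** 2 * (big + 2)
print(wrapped[0] < 0)    # True: the product wrapped around

# Converting to float64 first avoids the integer overflow,
# at the cost of floating-point rounding.
as_float = big.astype(np.float64)
product = (as_float + 1) ** 2 * (as_float + 2)
print(product[0] > 0)    # True: about 1.85e28
```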
