
Can anyone help me understand how NumPy's array function infers the data type?

I understand that it basically infers the type from the kind of values passed to the array.

For Example:

> import numpy as np
> data = [1, 2, 3, 4]
> arr = np.array(data)

After these lines, "arr" will have either dtype('int64') or dtype('int32').

What I am trying to understand is how it decides whether to use int64 or int32.

This might be a trivial question, but I was recently asked it in an interview and would like to understand how it works.

1
  • maybe this link might help Commented Aug 9, 2015 at 19:17

4 Answers

3

Numeric data types include integers and floats.

If an array contains both integers and floating point numbers, NumPy gives the entire array a float dtype so that the fractional parts are not lost.

An integer never has a fractional part. So, for example, 2.55 stored in an integer array would become 2.
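A minimal sketch of the two points above, assuming a reasonably recent NumPy on Python 3 (the variable names are only for illustration). It shows that a single float promotes the whole array to a float dtype, and that forcing an integer dtype truncates the fractional part:

```python
import numpy as np

ints_only = np.array([1, 2, 3])    # all integers -> an integer dtype
mixed = np.array([1, 2.55, 3])     # one float makes the whole array float

print(ints_only.dtype.kind)        # 'i' (signed integer; width is platform dependent)
print(mixed.dtype)                 # float64

# Converting to an integer dtype drops the fractional part (truncation toward zero).
truncated = np.array([2.55, -2.55]).astype(np.int64)
print(truncated)                   # [ 2 -2]
```

Note that the conversion truncates rather than rounds, so 2.55 becomes 2 and -2.55 becomes -2.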

As mentioned by @unutbu, whether you get int32 or int64 depends on your platform, for example whether it is a 32-bit or a 64-bit machine.

Strings are values made up of characters, which may include digits. For example, a string might be a word, a sentence, or several sentences. If your array has mixed types (numbers and strings), NumPy assigns the most general dtype, a string dtype, to the whole array.
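As a rough illustration of the mixed-type case, assuming Python 3 (where NumPy uses a Unicode string dtype), the numbers end up stored as text. The exact string width in the dtype can differ between setups, so this sketch only looks at the dtype kind:

```python
import numpy as np

mixed = np.array([1, 2.0, "a"])

# 'U' means a Unicode string dtype; the width (e.g. '<U32') may vary.
print(mixed.dtype.kind)

# Every element is now a string, including the former numbers.
print(mixed.tolist())
```

Because the numbers are stored as text, arithmetic on such an array no longer behaves numerically until the values are converted back.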

For a complete, detailed look, see the SciPy docs.

2

Per the docs,

Some types, such as int and intp, have differing bitsizes, dependent on the platforms (e.g. 32-bit vs. 64-bit machines).

So, on 32-bit machines, np.array([1,2,3,4]) returns an array of dtype int32, but on 64-bit machines it returns an array of dtype int64.
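To see which case applies on your own machine, a small check like the following may help. It is only a sketch: it assumes that the default integer type is exposed as np.int_, and as far as I know older NumPy releases on 64-bit Windows still defaulted to int32, so the bit width of the machine is not the whole story.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
default_int = np.dtype(np.int_)         # the platform's default integer dtype

print(arr.dtype)                        # int32 or int64, depending on the platform
print(arr.dtype.itemsize * 8, "bits")   # width of each element
print(arr.dtype == default_int)         # the inferred dtype matches the default
```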

2

In Python 3 (on a basic 32-bit machine), whether you get int32 or int64 depends on the size of the input:

In [447]: np.array(123456789)
Out[447]: array(123456789)

In [448]: _.dtype
Out[448]: dtype('int32')

In [449]: np.array(12345678901234)
Out[449]: array(12345678901234, dtype=int64)

From the np.array docs:

dtype: The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to 'upcast' the array.

It looks like int32 is the smallest default int size (at least with my configuration). This is also what np.int_ is.
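A short sketch of the same observation that should behave the same way on other configurations: a value that fits in the default integer gets the default dtype, while a value beyond the int32 range ends up as int64 whichever default applies. The names small and large are just for illustration.

```python
import numpy as np

small = np.array(123456789)          # fits in 32 bits
large = np.array(12345678901234)     # does not fit in 32 bits

print(np.iinfo(np.int32).max)        # 2147483647, the largest int32 value
print(small.dtype)                   # the platform default: int32 or int64
print(large.dtype)                   # int64
```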

As an example of the disallowed downcast:

In [456]: np.array(12345678901234, dtype=np.int32)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-456-da7c96e4b0b3> in <module>()
----> 1 np.array(12345678901234, dtype=np.int32)

OverflowError: Python int too large to convert to C long
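If you want to avoid running into this, one option is to check the range before forcing a dtype. The helper below (fits_in_dtype) is hypothetical, not part of NumPy; it only uses np.iinfo to look up the limits. What happens when an out-of-range value is forced into a dtype may differ between NumPy versions, so the sketch sidesteps that case entirely:

```python
import numpy as np

def fits_in_dtype(value, dtype):
    """Hypothetical helper: True if the Python int `value` is within the dtype's range."""
    info = np.iinfo(dtype)
    return info.min <= value <= info.max

value = 12345678901234
print(fits_in_dtype(value, np.int32))   # False
print(fits_in_dtype(value, np.int64))   # True

# Pick the smaller type only when the value actually fits.
target = np.int32 if fits_in_dtype(value, np.int32) else np.int64
arr = np.array(value, dtype=target)
print(arr.dtype)                        # int64
```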

0

I think there is some kind of hierarchical treatment, where it uses the most conservative yet all-encompassing type that can "legally" represent the input. If you just have integers, all of the elements are preserved using int32/64. As soon as you introduce a float, you need float32/64 to preserve all of the elements of the array, and you can always convert a float back to an int. As soon as you introduce a string, you need a string dtype to legally represent everything in the array, and again, you can always convert back to float or int if you need to.

Ex:

>>> array([1]).dtype
dtype('int64')
>>> array([1, 2.0]).dtype
dtype('float64')
>>> array([1, 2.0, 'a']).dtype
dtype('S3')

In short, it is pretty smart about it ;)
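The back-conversion mentioned above can be sketched like this, assuming Python 3 and that every string in the array actually holds a number (a value such as 'a' would make the conversion fail):

```python
import numpy as np

arr = np.array([1, 2.0, "3"])     # promoted to a string dtype
as_float = arr.astype(float)      # parse the text back into floats
as_int = as_float.astype(int)     # then truncate to integers

print(arr.dtype.kind)             # 'U' (Unicode string)
print(as_float)                   # [1. 2. 3.]
print(as_int)                     # [1 2 3]
```

Going through float first is deliberate here: the 2.0 is stored as the text '2.0', which I would not expect to parse directly as an integer.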
