How does NumPy infer the dtype for an array?

Question

Can anyone help me understand how NumPy's array function infers the data type?

I understand that it basically infers the dtype from the kind of values passed to the array.

For example:

> import numpy as np
> data = [1,2,3,4]
> arr = np.array(data)

In the lines above, arr will have dtype('int64') or dtype('int32').

What I am trying to understand is how NumPy decides whether to use int64 or int32.

This may be a trivial question, but I was recently asked it in an interview and would like to understand how it works.

maybe this link might help

– Srivatsan, Commented Aug 9, 2015 at 19:17

Srivatsan · Accepted Answer · 2015-08-10 06:37:07Z

3

Numeric data types include integers and floats.

If an array contains both integers and floating point numbers, NumPy gives the entire array a float dtype so that the fractional parts are not lost.

An integer never has a fractional part. So, for example, if 2.55 is cast to an integer dtype it is stored as 2.
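A minimal sketch of both points, assuming a reasonably recent NumPy; the variable names are only illustrative. It shows a mixed int/float list being promoted to a float dtype, and an explicit cast to an integer dtype dropping the fractional part (truncating toward zero).

```python
import numpy as np

# Integers and a float together: the whole array becomes float64.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# An explicit cast to an integer dtype discards the fractional part.
as_int = np.array([2.55, -2.55]).astype(np.int64)
print(as_int)        # [ 2 -2]
```

Note that the truncation only happens when an integer dtype is requested; inference alone never discards the fraction.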

As mentioned by @unutbu, whether you get int32 or int64 depends on your platform: it is typically int32 on a 32-bit machine and int64 on a 64-bit machine. (On 64-bit Windows, NumPy releases before 2.0 still default to int32, because the default integer there maps to a C long.)

Strings are values that contain characters, digits, or both. For example, a string might be a word, a sentence, or several sentences. If your array mixes numbers and strings, NumPy falls back to the most general type, a string dtype, and stores every element as a string.

For a complete, detailed description, have a look at the data types page of the SciPy docs.
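As a small illustration of the mixed case, assuming Python 3 (where NumPy chooses a Unicode string dtype; on Python 2 it was a byte-string dtype such as 'S3'):

```python
import numpy as np

# Numbers mixed with a string: every element is stored as a string.
arr = np.array([1, 2.0, "a"])
print(arr.dtype.kind)   # 'U' means a Unicode string dtype
print(arr.tolist())     # the numbers have been converted to strings
```

The exact string width in the dtype can vary between NumPy versions, so the sketch checks only the kind.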

edited Aug 10, 2015 at 6:37

answered Aug 9, 2015 at 19:23

Srivatsan


unutbu · Accepted Answer · 2015-08-09 19:36:51Z

2

Per the docs,

Some types, such as int and intp, have differing bitsizes, dependent on the platforms (e.g. 32-bit vs. 64-bit machines).

So, on 32-bit machines, np.array([1,2,3,4]) returns an array of dtype int32, but on 64-bit machines it returns an array of dtype int64.
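To see what your own installation uses, a quick check along these lines may help. It is only a sketch: the printed names differ by platform and NumPy version, so no particular output is assumed.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# np.dtype(int) is the dtype NumPy uses for Python ints by default.
default_int = np.dtype(int)

# np.intp is the pointer-sized integer, shown here for comparison.
print(arr.dtype, default_int, np.dtype(np.intp))
```

On a given machine the first two values should match, since small Python ints are given the default integer dtype.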

answered Aug 9, 2015 at 19:36

unutbu


hpaulj · Accepted Answer · 2015-08-09 20:21:32Z

In Python 3 (on a basic 32-bit machine), int32 vs. int64 depends on the size of the input:

In [447]: np.array(123456789)
Out[447]: array(123456789)

In [448]: _.dtype
Out[448]: dtype('int32')

In [449]: np.array(12345678901234)
Out[449]: array(12345678901234, dtype=int64)

From the np.array docs:

dtype: The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to 'upcast' the array.

It looks like int32 is the smallest default int size (at least with my configuration). This is also the type of np.int_.
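The same idea as a platform-independent sketch (the variable names are illustrative): a value that fits in the default integer keeps it, a larger one is widened to int64, and one too large for any fixed-width integer falls back to the object dtype.

```python
import numpy as np

small = np.array(123456789)        # fits in the default integer type
large = np.array(12345678901234)   # needs 64 bits
huge = np.array(10**30)            # too big for any fixed-width integer

print(small.dtype)   # the platform default, int32 or int64
print(large.dtype)   # int64
print(huge.dtype)    # object
```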

As an example of the disallowed downcast:

In [456]: np.array(12345678901234, dtype=np.int32)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-456-da7c96e4b0b3> in <module>()
----> 1 np.array(12345678901234, dtype=np.int32)

OverflowError: Python int too large to convert to C long
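Whether an out-of-range value raises, warns, or wraps has varied between NumPy versions and platforms, so one cautious option is to check the range yourself before requesting a narrower dtype. A minimal sketch using np.iinfo:

```python
import numpy as np

value = 12345678901234
info = np.iinfo(np.int32)          # limits of the target integer type

fits = info.min <= value <= info.max
print(info.min, info.max)          # -2147483648 2147483647
print(fits)                        # False
```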

isosceleswheel · Accepted Answer · 2015-08-09 20:06:30Z

I think there is some kind of hierarchical treatment, where it uses the most conservative type that can still "legally" represent all of the input. If you just have integers, all of the elements are preserved using int32/64. As soon as you introduce a float, you need float32/64 to preserve all of the elements of the array, and you can always convert a float back to an int. As soon as you introduce a string, you need strings to legally represent everything in the array, and again, you can always convert back to float or int if you need to.

Ex:

>>> array([1]).dtype
dtype('int64')
>>> array([1, 2.0]).dtype
dtype('float64')
>>> array([1, 2.0, 'a']).dtype
dtype('S3')

In short, it is pretty smart about it ;)
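The numeric part of that hierarchy can be queried directly with np.promote_types and np.result_type, which report the smallest dtype that can safely hold both inputs. A short sketch (variable names are illustrative):

```python
import numpy as np

# int64 combined with float64 needs float64.
int_float = np.promote_types(np.int64, np.float64)

# Two integer widths: the wider one wins.
int_int = np.result_type(np.int8, np.int32)

# int32 cannot be held exactly by float32, so the result is float64.
int32_float32 = np.promote_types(np.int32, np.float32)

print(int_float, int_int, int32_float32)   # float64 int32 float64
```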
