I wanted to create an array to hold mixed types: strings and ints.

The following code did not work as intended; all elements were typed as strings.

>>> a=numpy.array(["Str",1,2,3,4])
>>> print a
['Str' '1' '2' '3' '4']
>>> print type(a[0]),type(a[1])
<type 'numpy.string_'> <type 'numpy.string_'>

All elements of the array were typed as 'numpy.string_'.

Oddly enough, though, if I pass one of the elements as None, the types turn out as desired:

>>> a=numpy.array(["Str",None,2,3,4])
>>> print a
['Str' None 2 3 4]
>>> print type(a[0]),type(a[1]),type(a[2])
<type 'str'> <type 'NoneType'> <type 'int'>

So including a None element gives me a workaround, but I am wondering why this happens. Even without a None element, shouldn't each element keep the type it was passed with?

  • Both are not really duplicates; the second one is better, but a more explicit explanation of the None case would be better for the OP. (Commented Jul 3, 2018 at 8:29)
  • The proposed duplicate explains just the string dtype: stackoverflow.com/questions/49751000/…. (Commented Jul 3, 2018 at 14:49)

2 Answers

Mixing types in a NumPy array is strongly discouraged: you lose the benefits of vectorised computation. In this instance:

  • For your first array, NumPy converts every element to a string, giving a uniform array of strings of 3 or fewer characters.
  • For your second array, NumPy does not infer a string dtype for None, so it falls back to the generic object dtype. An object array is a collection of pointers to arbitrary Python objects, so each element keeps the type it was passed with.

You can see this when you print the dtype attributes of your arrays:

import numpy as np

print(np.array(["Str",1,2,3,4]).dtype)     # <U3
print(np.array(["Str",None,2,3,4]).dtype)  # object
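To mirror the question's type() checks in Python 3, here is a minimal sketch (the names mixed_str and mixed_obj are only illustrative). It inspects the stored elements rather than the dtype: the string array hands back numpy.str_ scalars, while the object array hands back the original Python objects. The exact string width of the first dtype may vary between NumPy versions, so the sketch only relies on the dtype kind.

```python
import numpy as np

# Without None, NumPy picks a fixed-width Unicode string dtype for every element.
mixed_str = np.array(["Str", 1, 2, 3, 4])
print(mixed_str.dtype.kind)                     # 'U' means Unicode string
print(type(mixed_str[0]), type(mixed_str[1]))   # both are numpy.str_

# With None present, NumPy falls back to dtype=object and stores the Python objects as-is.
mixed_obj = np.array(["Str", None, 2, 3, 4])
print(mixed_obj.dtype)                          # object
print([type(x).__name__ for x in mixed_obj])    # ['str', 'NoneType', 'int', 'int', 'int']
```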

This should be entirely expected. NumPy strongly prefers homogeneous types, as indeed you should for any meaningful computation. If you genuinely need mixed types, a plain Python list may be a more appropriate data structure.
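A minimal sketch of what losing vectorised computation looks like in practice, assuming the mixed data is held with an explicit dtype=object (the variable names are illustrative). Arithmetic on an object array is dispatched to each element's own Python operator, so the string is repeated rather than rejected, and the work happens one Python object at a time. Slicing out the homogeneous numeric part and converting it with astype gives back an ordinary integer array.

```python
import numpy as np

mixed = np.array(["Str", 1, 2, 3, 4], dtype=object)

# Each element uses its own Python '*': the string is repeated, the ints are doubled.
doubled = mixed * 2
print(doubled)  # ['StrStr' 2 4 6 8]

# The numeric tail is homogeneous, so it can be converted to a real integer array.
numbers = mixed[1:].astype(np.int64)
print(numbers.dtype, numbers.sum())  # int64 10
```

Keeping the numeric data in its own int64 array is what lets NumPy use its fast, typed loops again.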

For a more detailed description of how NumPy prioritises dtype choice, see:

An alternative to adding a None element is to specify the dtype explicitly:

In [80]: np.array(["str",1,2,3,4])
Out[80]: array(['str', '1', '2', '3', '4'], dtype='<U3')
In [81]: np.array(["str",1,2,3,4], dtype=object)
Out[81]: array(['str', 1, 2, 3, 4], dtype=object)

Creating an object dtype array and filling it from a list is another option:

In [85]: res = np.empty(5, object)
In [86]: res
Out[86]: array([None, None, None, None, None], dtype=object)
In [87]: res[:] = ['str', 1, 2, 3, 4]
In [88]: res
Out[88]: array(['str', 1, 2, 3, 4], dtype=object)

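It isn't needed here, but it matters when you want an array of lists: given lists of equal length, np.array would otherwise build a multidimensional array instead of keeping each list as a single element.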
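A small sketch of the array-of-lists case, assuming equal-length inner lists (the names rows, direct and res are illustrative). Passing the lists straight to np.array, even with dtype=object, produces a 2-D array; pre-allocating a 1-D object array keeps each list intact as one element. This variant fills the slots one at a time, so it does not depend on how NumPy interprets a nested list on the right-hand side of a slice assignment.

```python
import numpy as np

rows = [[1, 2], [3, 4]]

# np.array sees equal-length lists and builds a 2-D array, even with dtype=object.
direct = np.array(rows, dtype=object)
print(direct.shape)  # (2, 2)

# Pre-allocate a 1-D object array, then store each list as a single element.
res = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    res[i] = row
print(res.shape)  # (2,)
print(res[0])     # [1, 2]
```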
