I want to create a 2D NumPy array in Python with shape (2,7), specifying the type of each column. Some of the columns will be arrays. So my desired array should look like this:

[[ (0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])]
 [(0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])]]

I tried

>>> A = np.zeros(shape=(2), dtype= 'int, (3)float, (8)float, (8)float, (8)float, (10)float, (10)float')

But I get a 1D array:

>>> print A
[ (0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
 (0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])]

And if I define it this way:

>>> A = np.zeros(shape=(2,7), dtype= 'int, (3)float, (8)float, (8)float, (8)float, (10)float, (10)float')

I get an array much bigger than what I want; it's (2,7x7).

While doing this I get an error:

>>> A = np.zeros(shape=([[2],[7]]), dtype= 'int, (3)float, (8)float, (8)float, (8)float, (10)float, (10)float')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required

I don't understand how to get to my output. Any help, possibly with explanation is highly appreciated! Thanks!

2 Answers

4

A = np.zeros(shape=(2), dtype= '...') means make an array with shape (2,) and with a compound dtype. That's exactly what you got.

(2,) is a 1d shape. The array has named fields rather than columns. Specifying a (2,7) shape just makes a 2d array in which every element has the same 7 fields.

With a dtype like this you get a structured array. You access fields by name, e.g. A['f0'].
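To make the shape-versus-fields point concrete, here is a minimal sketch. It assumes a shortened three-field dtype written as a list of tuples, with the names f0, f1, f2 chosen to match the default field names; the variable names A1 and A2 are just for illustration.

```python
import numpy as np

# Shortened compound dtype: one int field and two float sub-array fields.
dt = np.dtype([('f0', int), ('f1', float, (3,)), ('f2', float, (4,))])

A1 = np.zeros(2, dtype=dt)        # 1d: two records
A2 = np.zeros((2, 7), dtype=dt)   # 2d: fourteen records, each with all three fields

print(A1.shape)         # (2,)
print(A2.shape)         # (2, 7)
print(A2.dtype.names)   # ('f0', 'f1', 'f2')

# Indexing by field name appends the sub-array shape to the array shape.
print(A2['f1'].shape)   # (2, 7, 3)
```

The shape argument counts records, not fields, which is why asking for (2,7) gives far more data than intended.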

Read the docs on dtype and structured arrays if you want to get anywhere with this approach.

The other answer directs you to pandas. That may be better for your purposes, or maybe not. Under the covers pandas stores its data in numpy arrays, and for a column whose entries are themselves arrays it will most likely fall back to dtype=object.

With a simpler dtype:

In [742]: A = np.zeros(shape=(2), dtype= 'int, (3)float, (4)float')
In [743]: A
Out[743]: 
array([(0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]),
       (0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])], 
      dtype=[('f0', '<i4'), ('f1', '<f8', (3,)), ('f2', '<f8', (4,))])

The first field is a 1d array of ints:

In [744]: A['f0']
Out[744]: array([0, 0])

The third field can be viewed as a 2x4 array of floats:

In [745]: A['f2']
Out[745]: 
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

You can select a record or element from this array:

In [746]: A[0]
Out[746]: (0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])

You can perform normal numeric array operations on individual fields. But operating across fields is limited.

You can't, for example, use np.sum(A) to sum across fields, but you can act on one field:

In [749]: np.sum(A['f1'],axis=1)
Out[749]: array([ 0.,  0.])
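A small sketch of that limitation, assuming the same three-field dtype written as a list of tuples. On the NumPy versions I know of, reducing the whole structured array raises a TypeError, so the example catches it rather than relying on the exact message. The names row_sums and per_record_total are illustrative.

```python
import numpy as np

dt = np.dtype([('f0', int), ('f1', float, (3,)), ('f2', float, (4,))])
A = np.zeros(2, dtype=dt)
A['f1'] = [[1, 2, 3], [4, 5, 6]]

# Summing the whole structured array is not supported.
try:
    np.sum(A)
    whole_array_sum_failed = False
except TypeError:
    whole_array_sum_failed = True

# Summing within one field works like any ordinary float array.
row_sums = np.sum(A['f1'], axis=1)

# To combine fields, pull each one out and add the results explicitly.
per_record_total = A['f1'].sum(axis=1) + A['f2'].sum(axis=1)

print(whole_array_sum_failed, row_sums, per_record_total)
```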

Structured arrays are most often created by reading a CSV file, where fields correspond to columns in the file, and some columns are text.
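As a rough illustration of the CSV case, here is a sketch using np.genfromtxt on an in-memory file. The column names and values are made up for the example, and dtype=None asks NumPy to infer a type per column.

```python
import io
import numpy as np

# Hypothetical CSV content: an int column, a text column and a float column.
csv_text = "id,name,score\n1,alice,3.5\n2,bob,4.0\n"

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,        # take field names from the header line
    dtype=None,        # infer a dtype for each column
    encoding="utf-8",
)

print(data.dtype.names)   # ('id', 'name', 'score')
print(data.shape)         # (2,)
print(data['score'])      # the float column as an ordinary 1d array
```

Each line of the file becomes one record, and each column becomes one named field.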

The A in this example could, for instance, represent a file where the 1st column is the record/line counter, the next 3 numbers represent one value, and the following 4 a logically distinct value. The alternative would have been a plain 2d array of floats with shape (2, 1+3+4).
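For comparison, the plain 2d alternative might look like the sketch below. The column layout (counter in column 0, then 3 values, then 4 values) is my assumption about how the data would be packed, and the name flat is illustrative.

```python
import numpy as np

# One float array: 1 counter column + 3 columns + 4 columns.
flat = np.zeros((2, 1 + 3 + 4))

flat[:, 0] = [1, 2]           # record counter, stored as float here
flat[:, 1:4] = 1.0            # the 3-number value
flat[1, 4:] = [1, 2, 3, 4]    # the 4-number value of the second row

# Operations across all columns work, unlike with the structured array.
total = flat.sum()
print(flat.shape, total)      # (2, 8) 19.0
```

The trade-off is that everything shares one dtype, so the counter is no longer an int and the grouping of columns lives only in the slices you choose.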


Regarding setting elements of a compound-dtype array:

In [916]: A = np.zeros(shape=(2), dtype= 'int, (3)float, (4)float')

I can set all the values of one field with a matching-size array or list:

In [918]: A['f0']=[1,2]

I can set all the values of the multi-element field in the same way; here I just fill them all:

In [920]: A['f1']=1
In [921]: A
Out[921]: 
array([(1, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]),
       (2, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])], 
      dtype=[('f0', '<i4'), ('f1', '<f8', (3,)), ('f2', '<f8', (4,))])

I can index and slice one of the fields in the usual way, treating it, in this case, as a 2d array:

In [922]: A['f2'][1,2:]=34
In [923]: A
Out[923]: 
array([(1, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]),
       (2, [1.0, 1.0, 1.0], [0.0, 0.0, 34.0, 34.0])], 
      dtype=[('f0', '<i4'), ('f1', '<f8', (3,)), ('f2', '<f8', (4,))])

I cannot assign all the values of one record (row) with a list of values, even a nested one:

In [924]: A[1]=[3,[1,2,3],[1,2,3,4]]
...
TypeError: 'list' does not support the buffer interface

But I can set it with a tuple:

In [925]: A[1]=(3,[1,2,3],[1,2,3,4])
In [926]: A
Out[926]: 
array([(1, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]),
       (3, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])], 
      dtype=[('f0', '<i4'), ('f1', '<f8', (3,)), ('f2', '<f8', (4,))])

The distinction between lists and tuples is important when dealing with structured arrays. Notice in the display of A that each record is shown as a tuple, in (). Multiple rows of A can be set or initialized with a list of tuples. The use of tuples draws the line between the dimensions of the containing array and the structure within the dtype.
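A short sketch of initializing from a list of tuples, again assuming the three-field dtype written out as a list of tuples. The name records is illustrative.

```python
import numpy as np

dt = np.dtype([('f0', int), ('f1', float, (3,)), ('f2', float, (4,))])

# The outer list gives the array dimension; each tuple is one record.
records = [
    (1, [1, 1, 1], [0, 0, 0, 0]),
    (3, [1, 2, 3], [1, 2, 3, 4]),
]
A = np.array(records, dtype=dt)

# A single record is replaced with a tuple as well.
A[0] = (5, [7, 8, 9], [1, 1, 1, 1])

print(A.shape)    # (2,)
print(A['f0'])    # first field of both records
print(A['f1'])    # 2x3 float array
```

Inside each tuple, plain lists are fine for the sub-array fields; it is only the record itself that has to be a tuple.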

5 Comments

That is really cool! I certainly did not understand the official documentation well, or there is a gap between the docs and the implementation. I hope this helps the OP solve his problem.
Thanks hpaulj! But when I create an empty 2D array with shape=(2,7) and then try to insert a list, I get this error: ValueError: setting an array element with a sequence. So I guess I need to specify that A[0][1], for instance, should be a sub-array of 3 floats, and so on for the other cols. How do I do that? Is it possible?
I've added some examples of setting values in my A array.
Thanks very much! This is a great explanation, much better than the documentation. I hope @innoSPG doesn't mind if I choose your answer, as it is way more detailed.
Not at all @Stefano_g, you choose the answer that is best for your needs. We want the accepted answer to be the most useful one for people who get here later.
1

This might be better suited as a comment, but I judged that it contains enough information to be posted as an answer.

A NumPy array is not what you are looking for; you would do better to look at other tools such as a pandas DataFrame. You need to understand what a numpy array is. The numpy array documentation has this statement:

NumPy provides an N-dimensional array type, the ndarray, which describes a collection of “items” of the same type.

And that is somewhat contrary to what you are trying to achieve. The same documentation also has this statement:

An item extracted from an array, e.g., by indexing, is represented by a Python object whose type is one of the array scalar types built in Numpy. The array scalars allow easy manipulation of also more complicated arrangements of data.

This means that the datatype you provide must correspond to one of those scalar types. You are providing a string of many scalar types.

3 Comments

Thanks for the exhaustive answer. If I understood correctly I should have all float, for instance. I am still a bit confused though as in the dtypes documentation I understood that it is possible to do what I want >>> dt = np.dtype(('i4, (2,3)f8, f4', (2,3))) # 2 x 3 structured sub-array. But then it is probably something different.
You are right that you should have all floats, for instance. The np.dtype that you are talking about is a data structure, not a scalar, so you cannot have a numpy array with a user-defined np.dtype. However, you can also build what you are trying to do with that, but you won't have the easy indexing that an array gives you.
Ok! Thanks, this is very clarifying indeed. I am new to python and sometimes numpy arrays are a bit strange for me.
