How to do numpy structural arrays

Question

I'm having trouble getting my head around structured arrays in NumPy.

Let's say I have two lists of tuples (to use native Python types), foo_list and bar_list, with these properties:

len(foo_list) == len(bar_list): the two lists are the same length.
For all i, j: len(foo_list[i]) == len(foo_list[j]) and len(bar_list[i]) == len(bar_list[j]): all the tuples within each list are the same length. These lengths are not known until runtime, so I can't hard-code them into a dtype string.
For all i, j: len(foo_list[i]) != len(bar_list[j]): tuples in different lists have different lengths.

How do I zip these two together into a structured array?
It seems like specifying the dtype is going to involve a mass of string manipulation after I examine the structure myself. I did try this once, and it was not nice code, so I figure there must be a better way to do it.

Currently my solution is to zip them and pass the result to numpy.asarray, but that has weird consequences. It makes a 2-D array of objects, and those objects are arrays. If you slice it, you end up with an array of arrays, not a 2-D array.

Example data:

foo_list = [(0.0, 1.0, 1.0, 0.0, 1.0),
 (1.0, 0.0, 1.0, 0.0, 1.0),
 (1.0, 1.0, 1.0, 0.0, 0.0),
 (0.0, 0.0, 0.0, 0.0, 1.0),
 (0.0, 1.0, 1.0, 1.0, 0.0),
 (1.0, 1.0, 1.0, 0.0, 1.0),
 (0.0, 0.0, 0.0, 0.0, 0.0),
 (0.0, 0.0, 0.0, 1.0, 0.0),
 (1.0, 1.0, 1.0, 1.0, 0.0),
 (1.0, 0.0, 0.0, 1.0, 0.0)]
bar_list = [(0.56885990540494535, 0.54212235514533669),
 (-1.0024727291757354, 0.75636919036826),
 (1.0912423038752346, 0.66209493674389353),
 (0.52256034116805239, 0.36499434352207855),
 (-1.6837689312941191, 0.90001803836488747),
 (-3.1590090289110528, -0.3383410738003263),
 (1.4080085734609102, -1.6283826051481185),
 (1.5037872498731264, 1.5673560444854553),
 (-2.271232989935922, 0.24542353558497185),
 (-1.9752557923680221, 0.07968567723276497)]

hpaulj: For a normal NumPy array, yes, but structured arrays should (as I understand it) get around that problem.
– Frames Catherine White, Commented May 22, 2014 at 4:59
foo_list and bar_list can individually be made into arrays (shapes (10, 5) and (10, 2)). What's the reason for combining them into a structured array? It's not going to speed up any NumPy calculations. If you do combine them, what shape and dtype do you want it to have?
– hpaulj, Commented May 22, 2014 at 5:01
They are logically not distinct lists. For example, each item in foo_list is a label for an image represented by the data in bar_list.
– Frames Catherine White, Commented May 22, 2014 at 5:31
stackoverflow.com/questions/21308785 is another SO question about reliably constructing an array of arrays.
– hpaulj, Commented May 22, 2014 at 16:49

Warren Weckesser · Accepted Answer · 2014-05-22 05:08:01Z

You could create a structured array in which each structure has two fields, "foo" and "bar". Each field is a 1-D array. Here's one way to create such a structured array.

First get the lengths of the "foo" and "bar" fields:

In [26]: nfoo = len(foo_list[0])

In [27]: nbar = len(bar_list[0])

Create the dtype for the structured array (this assumes NumPy has been imported as np). It has two fields, "foo" and "bar". Each field will contain an array of floating point values, with lengths nfoo and nbar, respectively.

In [28]: dt = np.dtype([('foo', np.float64, nfoo), ('bar', np.float64, nbar)])
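
As a quick check, the dtype built this way can be inspected before any data is loaded. The sketch below assumes hypothetical lengths nfoo = 5 and nbar = 2 (matching the example data) and writes the subarray shapes as tuples, which stays unambiguous even when a length happens to be 1.

```python
import numpy as np

# Lengths would normally come from len(foo_list[0]) and len(bar_list[0]);
# they are hard-coded here only to keep the sketch self-contained.
nfoo, nbar = 5, 2

dt = np.dtype([('foo', np.float64, (nfoo,)), ('bar', np.float64, (nbar,))])

print(dt.names)          # field names, in order
print(dt['foo'].shape)   # shape of the subarray stored in each 'foo' field
print(dt.itemsize)       # bytes per record: (nfoo + nbar) * 8
```

Nothing here depends on string manipulation: the field lengths are ordinary Python integers, so they can be computed at runtime.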

Create the array with np.array, giving it the zipped lists and the new dtype. In Python 3, zip returns an iterator, so wrap it in list() first.

In [29]: a = np.array(list(zip(foo_list, bar_list)), dtype=dt)
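
An alternative that avoids zip altogether is to allocate the structured array first and then assign each field from its list. This is a sketch with shortened, made-up data rather than the method used in this answer; it should produce the same contents.

```python
import numpy as np

# Shortened stand-ins for the question's foo_list and bar_list.
foo_list = [(0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
bar_list = [(0.5, -0.25), (1.5, 2.0), (-1.0, 0.75)]

dt = np.dtype([('foo', np.float64, (len(foo_list[0]),)),
               ('bar', np.float64, (len(bar_list[0]),))])

# Allocate one record per row, then fill each field in a single assignment.
a = np.empty(len(foo_list), dtype=dt)
a['foo'] = foo_list
a['bar'] = bar_list

# Compare against the array built from the zipped lists, field by field.
a_zip = np.array(list(zip(foo_list, bar_list)), dtype=dt)
print(np.array_equal(a['foo'], a_zip['foo']))
print(np.array_equal(a['bar'], a_zip['bar']))
```

Filling by field can be convenient when the two lists arrive separately, since neither has to be paired up row by row first.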

a is a 1-D array with length 10:

In [30]: a.shape
Out[30]: (10,)

In [31]: a
Out[31]: 
array([([0.0, 1.0, 1.0, 0.0, 1.0], [0.5688599054049454, 0.5421223551453367]),
       ([1.0, 0.0, 1.0, 0.0, 1.0], [-1.0024727291757354, 0.75636919036826]),
       ([1.0, 1.0, 1.0, 0.0, 0.0], [1.0912423038752346, 0.6620949367438935]),
       ([0.0, 0.0, 0.0, 0.0, 1.0], [0.5225603411680524, 0.36499434352207855]),
       ([0.0, 1.0, 1.0, 1.0, 0.0], [-1.683768931294119, 0.9000180383648875]),
       ([1.0, 1.0, 1.0, 0.0, 1.0], [-3.159009028911053, -0.3383410738003263]),
       ([0.0, 0.0, 0.0, 0.0, 0.0], [1.4080085734609102, -1.6283826051481185]),
       ([0.0, 0.0, 0.0, 1.0, 0.0], [1.5037872498731264, 1.5673560444854553]),
       ([1.0, 1.0, 1.0, 1.0, 0.0], [-2.271232989935922, 0.24542353558497185]),
       ([1.0, 0.0, 0.0, 1.0, 0.0], [-1.975255792368022, 0.07968567723276497])], 
      dtype=[('foo', '<f8', (5,)), ('bar', '<f8', (2,))])

We can slice and dice a in many ways.

a['foo'] is the entire 2-D array from foo_list:

In [32]: a['foo']
Out[32]: 
array([[ 0.,  1.,  1.,  0.,  1.],
       [ 1.,  0.,  1.,  0.,  1.],
       [ 1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  0.],
       [ 1.,  0.,  0.,  1.,  0.]])

a['bar'][0, -1] is the last column from the first row of bar_list:

In [33]: a['bar'][0,-1]
Out[33]: 0.54212235514533669

a[0]['bar'] is the first row from bar_list. (This could also be accessed as a['bar'][0]).

In [34]: a[0]['bar']
Out[34]: array([ 0.56885991,  0.54212236])
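
One detail worth knowing when slicing this way: indexing by field name gives a view onto the structured array's memory, not a copy, so writing through it changes the original. A minimal sketch with a small zero-filled array (the sizes are arbitrary):

```python
import numpy as np

dt = np.dtype([('foo', np.float64, (3,)), ('bar', np.float64, (2,))])
a = np.zeros(4, dtype=dt)

foo = a['foo']            # 2-D view with one row per record
print(foo.shape)

foo[0, 0] = 9.0           # write through the view
print(a[0]['foo'])        # the first record reflects the change
print(np.shares_memory(a, foo))
```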

Because the individual data elements in the "foo" and "bar" fields are all of type np.float64, you can create a 2-D view of this data. In the following, v is a 2-D array with shape (10, 7).

In [42]: v = a.view(np.float64).reshape(len(a), -1)

In [43]: v.shape
Out[43]: (10, 7)

In [44]: v[0]
Out[44]: 
array([ 0.        ,  1.        ,  1.        ,  0.        ,  1.        ,
        0.56885991,  0.54212236])

In [45]: v[0, -1]
Out[45]: 0.54212235514533669
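
The view trick relies on every field having the same element type, so that each record is just a run of float64 values. For dtypes where that does not hold, numpy.lib.recfunctions.structured_to_unstructured (available in NumPy 1.16 and later) is a more general way to get a plain 2-D array. A sketch comparing the two on shortened, made-up data:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Shortened stand-ins for the question's foo_list and bar_list.
foo_list = [(0.0, 1.0, 1.0), (1.0, 0.0, 1.0)]
bar_list = [(0.5, -0.25), (1.5, 2.0)]

dt = np.dtype([('foo', np.float64, (3,)), ('bar', np.float64, (2,))])
a = np.array(list(zip(foo_list, bar_list)), dtype=dt)

# Reinterpret the records as float64 values, one row per record.
v = a.view(np.float64).reshape(len(a), -1)

# Library helper; each subarray element counts as one column.
u = rfn.structured_to_unstructured(a)

print(v.shape, u.shape)
print(np.array_equal(u, v))
print(np.shares_memory(a, v))   # the view shares memory with a
```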

But if a 2-D array is what you want, you don't need to create a structured array. You can create the 2-D array directly, in several ways. For example,

In [46]: b = np.array([f+b for f, b in zip(foo_list, bar_list)])

In [47]: b.shape
Out[47]: (10, 7)

In [48]: b[0]
Out[48]: 
array([ 0.        ,  1.        ,  1.        ,  0.        ,  1.        ,
        0.56885991,  0.54212236])

In [49]: b[0, -1]
Out[49]: 0.54212235514533669
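
Another of those ways, sketched here with shortened, made-up data, is to convert each list to its own 2-D array and join them column-wise with np.concatenate:

```python
import numpy as np

# Shortened stand-ins for the question's foo_list and bar_list.
foo_list = [(0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
bar_list = [(0.5, -0.25), (1.5, 2.0), (-1.0, 0.75)]

foo = np.asarray(foo_list)   # shape (3, 3)
bar = np.asarray(bar_list)   # shape (3, 2)

# Join along axis 1 so each row holds the foo values followed by the bar values.
b = np.concatenate([foo, bar], axis=1)

print(b.shape)
print(b[0])
```

This gives the same layout as concatenating the tuples row by row, but does the joining inside NumPy.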
