I'm having trouble getting my head around structured arrays in NumPy.

Let's say I have:

  • Two lists of tuples (to use native Python types): foo_list and bar_list.
  • len(foo_list)==len(bar_list): the lists are the same length.
  • For all i,j: len(foo_list[i])==len(foo_list[j]) and len(bar_list[i])==len(bar_list[j]): all the tuples within each list are the same length. But these lengths are not known until runtime (so I can't hard-code them into a dtype string).
  • For all i,j: len(foo_list[i])!=len(bar_list[j]): the tuples in the two lists have different lengths.

How do I zip these two together into a structured array?
It seems like specifying the dtype is going to involve a mass of string manipulation after I examine the structure myself. I did try this once, and it was not nice code, so I figure there must be a better way to do it.

Currently my solution is to zip them and pass them to numpy.asarray, but that has weird consequences. It makes a 2D array of objects, and those objects are arrays. If you slice it, you end up with an array of arrays, not a 2D array.

Example data:

foo_list = [(0.0, 1.0, 1.0, 0.0, 1.0),
 (1.0, 0.0, 1.0, 0.0, 1.0),
 (1.0, 1.0, 1.0, 0.0, 0.0),
 (0.0, 0.0, 0.0, 0.0, 1.0),
 (0.0, 1.0, 1.0, 1.0, 0.0),
 (1.0, 1.0, 1.0, 0.0, 1.0),
 (0.0, 0.0, 0.0, 0.0, 0.0),
 (0.0, 0.0, 0.0, 1.0, 0.0),
 (1.0, 1.0, 1.0, 1.0, 0.0),
 (1.0, 0.0, 0.0, 1.0, 0.0)]
bar_list = [(0.56885990540494535, 0.54212235514533669),
 (-1.0024727291757354, 0.75636919036826),
 (1.0912423038752346, 0.66209493674389353),
 (0.52256034116805239, 0.36499434352207855),
 (-1.6837689312941191, 0.90001803836488747),
 (-3.1590090289110528, -0.3383410738003263),
 (1.4080085734609102, -1.6283826051481185),
 (1.5037872498731264, 1.5673560444854553),
 (-2.271232989935922, 0.24542353558497185),
 (-1.9752557923680221, 0.07968567723276497)]
  • Please post some sample data. Commented May 22, 2014 at 3:54
  • hpaulj: For a normal NumPy array, yes, but structured arrays should (as I understand it) get around that problem. Commented May 22, 2014 at 4:59
  • foo_list and bar_list can individually be made into arrays (shapes (10,5) and (10,2)). What's the reason for combining them into a structured array? It's not going to speed up any numpy calculations. If you do combine them, what shape and dtype do you want it to have? Commented May 22, 2014 at 5:01
  • They are logically not distinct lists. For example, each item in foo_list is a label for an image represented by the data in bar_list. Commented May 22, 2014 at 5:31
  • stackoverflow.com/questions/21308785 is another SO question about reliably constructing an array of arrays. Commented May 22, 2014 at 16:49

1 Answer

You could create a structured array in which each structure has two fields, "foo" and "bar". Each field is a 1-D array. Here's one way to create such a structured array.

First get the lengths of the "foo" and "bar" fields:

In [26]: nfoo = len(foo_list[0])

In [27]: nbar = len(bar_list[0])

Create the dtype for the structured array. It has two fields, "foo" and "bar". Each field will contain an array of floating point values, with lengths nfoo and nbar, respectively.

In [28]: dt = np.dtype([('foo', np.float64, nfoo), ('bar', np.float64, nbar)])

Create the array with np.array, giving it the zipped lists and the new dtype. (In Python 3, zip returns an iterator, so wrap it in list first.)

In [29]: a = np.array(list(zip(foo_list, bar_list)), dtype=dt)

a is a 1-D array with length 10:

In [30]: a.shape
Out[30]: (10,)

In [31]: a
Out[31]: 
array([([0.0, 1.0, 1.0, 0.0, 1.0], [0.5688599054049454, 0.5421223551453367]),
       ([1.0, 0.0, 1.0, 0.0, 1.0], [-1.0024727291757354, 0.75636919036826]),
       ([1.0, 1.0, 1.0, 0.0, 0.0], [1.0912423038752346, 0.6620949367438935]),
       ([0.0, 0.0, 0.0, 0.0, 1.0], [0.5225603411680524, 0.36499434352207855]),
       ([0.0, 1.0, 1.0, 1.0, 0.0], [-1.683768931294119, 0.9000180383648875]),
       ([1.0, 1.0, 1.0, 0.0, 1.0], [-3.159009028911053, -0.3383410738003263]),
       ([0.0, 0.0, 0.0, 0.0, 0.0], [1.4080085734609102, -1.6283826051481185]),
       ([0.0, 0.0, 0.0, 1.0, 0.0], [1.5037872498731264, 1.5673560444854553]),
       ([1.0, 1.0, 1.0, 1.0, 0.0], [-2.271232989935922, 0.24542353558497185]),
       ([1.0, 0.0, 0.0, 1.0, 0.0], [-1.975255792368022, 0.07968567723276497])], 
      dtype=[('foo', '<f8', (5,)), ('bar', '<f8', (2,))])
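
The steps above can be collected into one small function. This is only a sketch for Python 3: the helper name zip_to_structured and the short sample lists are made up for illustration and are not part of the original answer. The field shapes are written as tuples, (nfoo,) and (nbar,), so that a field of length 1 would still be a subarray.

```python
import numpy as np

def zip_to_structured(foo_list, bar_list):
    """Hypothetical helper: build a structured array with 'foo' and 'bar' fields."""
    # Field lengths are only known at runtime, so read them from the data.
    nfoo = len(foo_list[0])
    nbar = len(bar_list[0])
    dt = np.dtype([('foo', np.float64, (nfoo,)),
                   ('bar', np.float64, (nbar,))])
    # list(...) is needed in Python 3 because zip returns an iterator.
    return np.array(list(zip(foo_list, bar_list)), dtype=dt)

# Small made-up data with the same layout as foo_list and bar_list.
foo_small = [(0.0, 1.0, 1.0), (1.0, 0.0, 1.0)]
bar_small = [(0.5, -0.25), (1.5, 2.0)]

a_small = zip_to_structured(foo_small, bar_small)
print(a_small.shape)         # one record per input pair
print(a_small['foo'].shape)  # 2-D block holding the 'foo' tuples
print(a_small['bar'][1, 0])
```

No dtype string has to be assembled by hand; the dtype is built from a list of (name, type, shape) tuples.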

We can slice and dice a in many ways.

a['foo'] is the entire 2-D array from foo_list:

In [32]: a['foo']
Out[32]: 
array([[ 0.,  1.,  1.,  0.,  1.],
       [ 1.,  0.,  1.,  0.,  1.],
       [ 1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  0.],
       [ 1.,  0.,  0.,  1.,  0.]])

a['bar'][0, -1] is the last element of the first row of bar_list:

In [33]: a['bar'][0,-1]
Out[33]: 0.54212235514533669

a[0]['bar'] is the first row from bar_list (this could also be accessed as a['bar'][0]):

In [34]: a[0]['bar']
Out[34]: array([ 0.56885991,  0.54212236])
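
Field access also works for assignment, which gives a way to fill the array without zipping the lists at all. The following is a sketch under that assumption, using made-up sample data and the illustrative names filled and first; it also shows that a single record can be indexed by field position as well as by field name.

```python
import numpy as np

# Small made-up data with the same layout as foo_list and bar_list.
foo_small = [(0.0, 1.0, 1.0), (1.0, 0.0, 1.0)]
bar_small = [(0.5, -0.25), (1.5, 2.0)]

dt = np.dtype([('foo', np.float64, (len(foo_small[0]),)),
               ('bar', np.float64, (len(bar_small[0]),))])

# Allocate an empty structured array, then fill one field at a time.
filled = np.zeros(len(foo_small), dtype=dt)
filled['foo'] = foo_small
filled['bar'] = bar_small

# A single record can be indexed by field name or by field position.
first = filled[0]
print(first['foo'])
print(first[0])  # same data as first['foo']
```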

Because the individual data elements in the "foo" and "bar" fields are all of type np.float64, you can create a 2-D view of this data. In the following, v is a 2-D array with shape (10, 7).

In [42]: v = a.view(np.float64).reshape(len(a), -1)

In [43]: v.shape
Out[43]: (10, 7)

In [44]: v[0]
Out[44]: 
array([ 0.        ,  1.        ,  1.        ,  0.        ,  1.        ,
        0.56885991,  0.54212236])

In [45]: v[0, -1]
Out[45]: 0.54212235514533669
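
Newer NumPy releases (1.16 and later, as far as I know) also provide numpy.lib.recfunctions.structured_to_unstructured, which performs this kind of conversion without relying on the memory layout of the dtype. The sketch below uses made-up data and the illustrative names rec and flat. Do not assume the result shares memory with the original array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([('foo', np.float64, (3,)), ('bar', np.float64, (2,))])

# Two made-up records; each record is a tuple with one entry per field.
rec = np.array([((0.0, 1.0, 1.0), (0.5, -0.25)),
                ((1.0, 0.0, 1.0), (1.5, 2.0))], dtype=dt)

# Every element of every subarray field becomes one column.
flat = rfn.structured_to_unstructured(rec)
print(flat.shape)
print(flat[1])
```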

But if a 2-D array is what you want, you don't need to create a structured array. You can create the 2-D array directly, in several ways. For example,

In [46]: b = np.array([f+b for f, b in zip(foo_list, bar_list)])

In [47]: b.shape
Out[47]: (10, 7)

In [48]: b[0]
Out[48]: 
array([ 0.        ,  1.        ,  1.        ,  0.        ,  1.        ,
        0.56885991,  0.54212236])

In [49]: b[0, -1]
Out[49]: 0.54212235514533669
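
Another of those ways, sketched here with made-up data, is to convert each list to its own 2-D array and join them column-wise with np.hstack. The names foo_arr, bar_arr and stacked are illustrative only.

```python
import numpy as np

# Small made-up data with the same layout as foo_list and bar_list.
foo_small = [(0.0, 1.0, 1.0), (1.0, 0.0, 1.0)]
bar_small = [(0.5, -0.25), (1.5, 2.0)]

foo_arr = np.array(foo_small)  # shape (2, 3)
bar_arr = np.array(bar_small)  # shape (2, 2)

# Join along the second axis: the 'foo' columns first, then the 'bar' columns.
stacked = np.hstack([foo_arr, bar_arr])
print(stacked.shape)
print(stacked[0, -1])
```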

2 Comments

Can you address a[0][0]?
a[0][0] will give the first row of the 'foo' data.
