
Sometimes data, such as speech data, have a known number of observations (n), an unknown duration, and a known number of measurements (k).

In the 2D case in NumPy, it is clear how data with a known number of observations (n) and an unknown duration is represented with an ndarray of shape (n, ). For example:

import numpy as np

x = np.array([ [ 1, 2 ],
               [ 1, 2, 3 ]
             ])

print(x.shape) ### Returns: (2, )

Is there an equivalent for the 3D case in NumPy, where we could have an ndarray of shape (n, , k)? The best alternative to this I can think of is to have a 2D ndarray of shape (n, ) and have each element also be 2D with a (transpose) shape of (k, ). For example,

import numpy as np

x = np.array([ [ [1,2], [1,2] ],
               [ [1,2], [1,2], [1,2] ]
             ])

print(x.shape) ### Returns: (2, ); Desired: (2, , 2)

Ideally, a solution would be able to tell us the dimensionality properties of an ndarray without the need for a recursive call (maybe with an alternative to shape?).

  • Your first code snippet is not doing what I think you believe it is doing. When I print the result of it I get array([array([1, 2, 3]), array([1, 2])], dtype=object). This means that you are getting a one-dimensional array of objects, which in this case are np.ndarray objects. As far as I am aware, it is not possible to allocate an array without a fixed size along every dimension. Commented Apr 7, 2019 at 1:23
  • Define x as (2,2) object dtype, and set the elements from x1 and x2. But it is tricky to do this without getting broadcasting errors. Commented Apr 7, 2019 at 1:29
  • It might be easier to create a (4,) array with list or 1d array elements, and if needed reshape that to (2,2). Commented Apr 7, 2019 at 1:54
  • Thank you for the correction, I revised the code with your suggestion. Commented Apr 7, 2019 at 2:08
  • @JosephKonan: Your revised code is still a one-dimensional array of object dtype. The inner arrays are just Python lists now instead of NumPy arrays. Commented Apr 7, 2019 at 6:38

2 Answers

You seem to have misunderstood what a shape of (2,) means. It doesn't mean (2, <unknown>); the comma is not a separator between 2 and some sort of blank dimension. (2,) is the Python syntax for a one-element tuple whose one element is 2. Python uses this syntax because (2) would mean the integer 2, not a tuple.
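
As a quick illustration of the tuple syntax described above (a minimal sketch; the name one_d is only for illustration):

```python
import numpy as np

# (2,) is a one-element tuple; (2) is just the integer 2 in parentheses.
print(type((2,)), len((2,)))    # <class 'tuple'> 1
print(type((2)))                # <class 'int'>

# A shape of (2,) therefore describes exactly one dimension, of length 2.
one_d = np.zeros(2)
print(one_d.shape, one_d.ndim)  # (2,) 1
```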

You are not creating a two-dimensional array with an arbitrary-length second dimension. You are creating a one-dimensional array of object dtype. Its elements are ordinary Python lists. An array like this is incompatible with almost every useful thing in NumPy.

There is no way to create NumPy arrays with variable-length dimensions, whether in the 2D case you thought worked, or in the 3D case you're trying to make work.
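
The one-dimensional object array described above can be inspected directly. This is a small sketch rather than a recommendation; dtype=object is passed explicitly here because recent NumPy releases raise an error for ragged nested lists without it:

```python
import numpy as np

# Ragged nested lists: NumPy cannot form a rectangular 2-D array,
# so with dtype=object it stores each inner list as a single element.
x = np.array([[1, 2], [1, 2, 3]], dtype=object)

print(x.ndim)      # 1
print(x.shape)     # (2,)
print(x.dtype)     # object
print(type(x[0]))  # <class 'list'>
```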


Just to review the 1d case:

In [33]: x = np.array([[1,2],[1,2,3]])                                          
In [34]: x.shape                                                                
Out[34]: (2,)
In [35]: x                                                                      
Out[35]: array([list([1, 2]), list([1, 2, 3])], dtype=object)

The result is a 2-element array of lists, whereas we started with a list of lists. Not much difference.

But note that if the lists are same size, np.array creates a numeric 2d array:

In [36]: x = np.array([[1,2,4],[1,2,3]])                                        
In [37]: x                                                                      
Out[37]: 
array([[1, 2, 4],
       [1, 2, 3]])

So don't count on the behavior we see in [33].
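
To make that caveat concrete, here is a minimal sketch (the names rows, a, and b are only for illustration). Passing dtype=object does not force one list per element when the rows happen to have equal length; preallocating with np.empty and filling element by element does:

```python
import numpy as np

rows = [[1, 2, 4], [1, 2, 3]]

# Equal-length rows: even with dtype=object, np.array builds a 2-D array.
a = np.array(rows, dtype=object)
print(a.shape)  # (2, 3)

# Preallocating a 1-D object array and filling it keeps one list per element,
# regardless of whether the lists happen to have the same length.
b = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    b[i] = row
print(b.shape)  # (2,)
```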

I could create a 2d object array:

In [59]: x = np.empty((2,2),object)                                             
In [60]: x                                                                      
Out[60]: 
array([[None, None],                  # in this case filled with None
       [None, None]], dtype=object)

I can assign each element with a different kind and size of object:

In [61]: x[0,0] = np.arange(3)                                                  
In [62]: x[0,0] = [1,2,3]                                                       
In [63]: x[1,0] = 'abc'                                                         
In [64]: x[1,1] = np.arange(6).reshape(2,3)                                     
In [65]: x                                                                      
Out[65]: 
array([[list([1, 2, 3]), None],
       ['abc', array([[0, 1, 2],
       [3, 4, 5]])]], dtype=object)

It is still 2d. For most purposes it is like a list or list of lists, containing objects. The data buffer actually holds pointers to objects stored elsewhere in memory (just as a list's buffer does).

There really isn't such a thing as a 3d array with a variable last dimension. At best we can get a 2d array that contains lists or arrays of various sizes.
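
For the situation in the question (n observations, each with its own duration and k measurements), that workaround might look like the sketch below. The values of k and durations are made up for illustration, and the zero-filled arrays merely stand in for real data:

```python
import numpy as np

k = 2               # measurements per frame (assumed for illustration)
durations = [2, 3]  # frames per observation; varies between observations

# One (t_i, k) array per observation, stored in a 1-D object array.
x = np.empty(len(durations), dtype=object)
for i, t in enumerate(durations):
    x[i] = np.zeros((t, k))

print(x.shape)               # (2,)
print([a.shape for a in x])  # [(2, 2), (3, 2)]
```

The outer shape only reports n; the per-observation shapes still have to be read from the elements themselves.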


Make a list of 2 2d arrays:

In [69]: alist = [np.arange(6).reshape(2,3), np.arange(4.).reshape(2,2)]        
In [70]: alist                                                                  
Out[70]: 
[array([[0, 1, 2],
        [3, 4, 5]]), array([[0., 1.],
        [2., 3.]])]

In this case, giving it to np.array raises an error:

In [71]: np.array(alist)
---------------------------------------------------------------------------
ValueError: could not broadcast input array from shape (2,3) into shape (2)

We could fill an object array with elements from this list:

In [72]: x = np.empty((4,),object)                                              
In [73]: x[0]=alist[0][0]                                                       
In [74]: x[1]=alist[0][1]                                                       
In [75]: x[2]=alist[1][0]                                                       
In [76]: x[3]=alist[1][1]                                                       
In [77]: x                                                                      
Out[77]: 
array([array([0, 1, 2]), array([3, 4, 5]), array([0., 1.]),
       array([2., 3.])], dtype=object)

and reshape it to 2d:

In [78]: x.reshape(2,2)                                                         
Out[78]: 
array([[array([0, 1, 2]), array([3, 4, 5])],
       [array([0., 1.]), array([2., 3.])]], dtype=object)

Result is a 2d array containing 1d arrays. To get the shapes of the elements I have to do something like:

In [87]: np.frompyfunc(lambda i:i.shape, 1,1)(Out[78])                          
Out[87]: 
array([[(3,), (3,)],
       [(2,), (2,)]], dtype=object)
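
A plain nested comprehension is an alternative to np.frompyfunc for collecting the element shapes. This is only a sketch that rebuilds an equivalent (2,2) object array so that it runs on its own; the names shapes and lengths are illustrative:

```python
import numpy as np

# Rebuild a (2, 2) object array whose elements are 1-D arrays.
x = np.empty((2, 2), dtype=object)
x[0, 0], x[0, 1] = np.arange(3), np.arange(3, 6)
x[1, 0], x[1, 1] = np.arange(2.), np.arange(2., 4.)

# Iterating a 2-D object array yields rows, then the stored objects.
shapes = [[elem.shape for elem in row] for row in x]
print(shapes)  # [[(3,), (3,)], [(2,), (2,)]]

# If only the lengths matter, they fit in an ordinary integer array.
lengths = np.array([[len(elem) for elem in row] for row in x])
print(lengths.shape)  # (2, 2)
```

Either way, the per-element shapes come from visiting each element; the outer array's shape attribute knows nothing about them.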
