
I need to create a 2D numpy array from a list of 1D arrays and scalars so that the scalars are replicated to match the length of the 1D arrays.

Example of desired behaviour

>>> x = np.ones(5)
>>> something([x, 0, x])
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])

I know that the vectorial elements of the list are always going to have the same length (shape) so I can do it "by hand" by doing something like this:

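import numpy as np

def something(lst):
    # Find the length of the first array in the list.
    for e in lst:
        if isinstance(e, np.ndarray):
            l = len(e)
            break
    tmp = []
    for e in lst:
        if isinstance(e, np.ndarray):
            tmp.append(e)
            l = len(e)
        else:
            # Replicate the scalar into a new row of length l.
            tmp.append(np.empty(l))
            tmp[-1][:] = e
    return np.array(tmp)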

What I am asking for is whether there is some ready-made solution hidden somewhere in numpy or, if there is none, whether there is a better (e.g. more general, more reliable, faster) solution than the one above.

3 Answers

4
In [179]: np.column_stack(np.broadcast(x, 0, x))
Out[179]: 
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])

or

In [187]: np.row_stack(np.broadcast_arrays(x, 0, x))
Out[187]: 
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])

Using np.broadcast is faster than np.broadcast_arrays:

In [195]: %timeit np.column_stack(np.broadcast(*[x, 0, x]*10))
10000 loops, best of 3: 46.4 µs per loop

In [196]: %timeit np.row_stack(np.broadcast_arrays(*[x, 0, x]*10))
1000 loops, best of 3: 380 µs per loop

but slower than your something function:

In [201]: %timeit something([x, 0, x]*10)
10000 loops, best of 3: 37.3 µs per loop

Note that np.broadcast can be passed at most 32 arrays:

In [199]: np.column_stack(np.broadcast(*[x, 0, x]*100))
ValueError: Need at least two and fewer than (32) array objects.

whereas np.broadcast_arrays is unlimited:

In [198]: np.row_stack(np.broadcast_arrays(*[x, 0, x]*100))
Out[198]: 
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       ..., 
       [ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])

Using np.broadcast or np.broadcast_arrays is a bit more general than something. It will work on arrays of different (but broadcastable) shapes, for instance:

In [209]: np.column_stack(np.broadcast(*[np.atleast_2d(x), 0, x]))
Out[209]: 
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])

whereas something([np.atleast_2d(x), 0, x]) returns:

In [211]: something([np.atleast_2d(x), 0, x])
Out[211]: 
array([array([[ 1.,  1.,  1.,  1.,  1.]]), array([ 0.]),
       array([ 1.,  1.,  1.,  1.,  1.])], dtype=object)
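
The np.broadcast_arrays variant above can be wrapped in a small helper. This is only a sketch under the question's assumptions (all array elements are mutually broadcastable); the name stack_rows is hypothetical and not part of NumPy, and np.vstack is used in place of its alias np.row_stack.

```python
import numpy as np

def stack_rows(items):
    """Stack arrays and scalars as rows, repeating scalars to the row length.

    stack_rows is a hypothetical helper name used only for this sketch.
    """
    # broadcast_arrays expands every item to the common broadcast shape;
    # vstack then copies those views into one new 2D array.
    return np.vstack(np.broadcast_arrays(*items))

x = np.ones(5)
result = stack_rows([x, 0, x])
print(result.shape)
```

Because np.vstack copies the data, the result is an ordinary writable array even though np.broadcast_arrays itself returns views.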

1

A shorter way, though I doubt it is faster:

l = len(max(lst, key=lambda e: len(e) if isinstance(e, np.ndarray) else 0))
new_lst = np.array([(x if isinstance(x, np.ndarray) else np.ones(l) * x) for x in lst])

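Edit: np.fromiter may do it faster. It requires a dtype argument, and building whole rows this way needs a subarray dtype, which np.fromiter accepts from NumPy 1.23 on:

l = len(max(lst, key=lambda e: len(e) if isinstance(e, np.ndarray) else 0))
new_lst = np.fromiter(((x if isinstance(x, np.ndarray) else np.ones(l) * x) for x in lst), dtype=np.dtype((float, l)))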

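A while loop finds the row length faster, though the code is a bit longer:

i = 0
while not isinstance(lst[i], np.ndarray):
    i += 1
l = len(lst[i])
# fromiter needs a dtype; a subarray dtype (NumPy >= 1.23) yields one row per item
new_lst = np.fromiter(((x if isinstance(x, np.ndarray) else np.ones(l) * x) for x in lst), dtype=np.dtype((float, l)))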
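
The same idea can also be written with np.full, which creates the replicated row directly instead of multiplying np.ones by the scalar. A minimal sketch, assuming at least one element of the list is a NumPy array; the name rows_from_mixed is hypothetical:

```python
import numpy as np

def rows_from_mixed(lst):
    # Length of the first array in the list; next() raises StopIteration
    # if the list contains no array at all.
    n = next(len(e) for e in lst if isinstance(e, np.ndarray))
    # np.full(n, e) builds a length-n row filled with the scalar e.
    return np.array([e if isinstance(e, np.ndarray) else np.full(n, e) for e in lst])

x = np.ones(5)
new_lst = rows_from_mixed([x, 2.5, x])
print(new_lst.shape)
```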


0

For 25 rows, a list comprehension version of something falls between broadcast and broadcast_arrays in speed:

In [48]: ll=[x,0,x,x,0]*5

In [49]: np.vstack([y if isinstance(y,np.ndarray) else np.zeros(5) for y in ll]).shape
Out[49]: (25, 5)

In [50]: timeit np.vstack([y if isinstance(y,np.ndarray) else np.zeros(5) for y in ll]).shape
1000 loops, best of 3: 219 us per loop

In [51]: timeit np.vstack(np.broadcast_arrays(*ll))
1000 loops, best of 3: 790 us per loop

In [52]: timeit np.column_stack(np.broadcast(*ll)).shape
10000 loops, best of 3: 126 us per loop

Using np.array instead of vstack is even faster:

In [54]: timeit np.array([y if isinstance(y,np.ndarray) else np.zeros(5) for y in ll]).shape
10000 loops, best of 3: 54.2 us per loop

For a 2D x, vstack on the conditional list comprehension may be the only approach that gives the correct result:

In [66]: x=np.arange(10).reshape(2,5)

In [67]: ll=[x,0,x,x,0]

In [68]: np.vstack([y if isinstance(y,np.ndarray) else np.zeros(5) for y in ll]) 
Out[68]: 
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [ 0.,  0.,  0.,  0.,  0.]])
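
The comprehensions above use np.zeros(5), which only matches a scalar of 0 and a fixed row length of 5. A hedged sketch of a version that fills each scalar row with the scalar's own value and reads the row length from the arrays; the name vstack_mixed is hypothetical, and every array is assumed to share the same last-axis length:

```python
import numpy as np

def vstack_mixed(items):
    # Row length taken from the last axis of the first array (works for 1D and 2D).
    ncols = next(y.shape[-1] for y in items if isinstance(y, np.ndarray))
    # Each scalar becomes one row filled with its own value.
    rows = [y if isinstance(y, np.ndarray) else np.full(ncols, y) for y in items]
    return np.vstack(rows)

x = np.arange(10).reshape(2, 5)
out = vstack_mixed([x, 7, x])
print(out.shape)
```

Each 2D array contributes all of its rows and each scalar contributes a single row, matching the layout of the vstack output shown above.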
