Ignore nested structures in numpy's array creation

Question

I want to write to a vlen hdf5 dataset, for that I am using h5py.Dataset.write_direct to speed up the process. Suppose I have a list of numpy arrays (e.g. given by cv2.findContours), and by dataset:

dataset = h5file.create_dataset('dataset', \
                                shape=..., \
                                dtype=h5py.special_type(vlen='int32'))
contours = [numpy array, ...]

For writing contours to a destination given by the slice dest, I must first convert contours to a numpy array of numpy arrays:

contours = numpy.array(contours) # shape=(len(contours),); dtype=object
dataset.write_direct(contours, None, dest)

But this only works, if all numpy arrays in contours have different shapes, e.g.:

contours = [np.zeros((10,), 'int32'), np.zeros((10,), 'int32')]
contours = numpy.array(contours) # shape=(2,10); dtype='int32'

The question is: How can I tell numpy to create an array of objects?

Possible solutions:

Manual creation:

contours_np = np.empty((len(contours),), dtype=object)
for i, contour in enumerate(contours):
    contours_np[i] = contour

But loops are super slow, thus using map:

map(lambda (i, contour): contour.__setitem_(i, contour),  \
    enumerate(contours))

I have tested a second option, which is twice as fast as the above, but also super ugly:

contours = np.array(contours + [None])[:-1]

Here are the micro benchmarks:

l = [np.random.normal(size=100) for _ in range(1000)]

Option 1:

$ start = time.time(); l_array = np.zeros(shape=(len(l),), dtype='O'); map(lambda (i, c): l_array.__setitem__(i, c), enumerate(l)); end = time.time(); print("%fms" % ((end - start) * 10**3))
0.950098ms

Option 2:

$ start = time.time(); np.array(l + [None])[:-1]; end = time.time(); print("%fms" % ((end - start) * 10**3))
0.409842ms

This looks kind of ugly, any other suggestions?

Warren Weckesser · Accepted Answer · 2016-06-11 10:34:18Z

1

In this version

contours_np = np.empty((len(contours),), dtype=object)
for i, contour in enumerate(contours):
    contours_np[i] = contour

you can replace the loop with the single statement

contours_np[...] = contours

answered Jun 11, 2016 at 10:34

Warren Weckesser

116k20 gold badges207 silver badges224 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

user1447257 Over a year ago

This is what I've been looking for :)

user707650 · Accepted Answer · 2016-06-11 10:26:57Z

0

One solution appears to be to create the "outer" array first (with an 'object' dtype), then fill the elements with the inner array.

Thus:

contours = [np.zeros((10,), 'int32'), np.zeros((10,), 'int32')]
a = np.empty(len(contours), dtype=np.object)
for i in range(len(contours)):
    a[i] = contours[i]
print(a)
print()
print(repr(a))

results in

[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
 array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)]

array([array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32),
       array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)], dtype=object)

answered Jun 11, 2016 at 10:26

user707650

Collectives™ on Stack Overflow

Ignore nested structures in numpy's array creation

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related