How to create numpy records array with numerical entries without dtype name

Question

I am trying to create a numpy records array to match data that I am reading from an HDF5 file. The HDF5 dataset (dataset) has a dtype of np.dtype(('u1', (3,))), and the dtype of dataset[0] is dtype('uint8'). I am trying to create a records array to match this, as shown below:

import numpy as np
np.recarray((2,), dtype=('u1', (3,)))

However, this produces a result that looks like it contains bytes rather than integers (which is how it looks when I read the HDF5 dataset):

records = rec.array([b'\xB0\xCE\x61', b'\x38\xD4\x01'], dtype=|V3)

When I check the dtype of this array with records[0].dtype, I get dtype('V3') instead of the dtype('uint8') I get when reading the HDF5 file.

How do I get the records to store these values as uint8 rather than bytes? I noticed that if I give the dtype a name, the values are represented by numbers rather than bytes, but the HDF5 dataset does not have a dtype name.

>>> np.recarray((2,), dtype=[('test', 'u1', (3,))])
rec.array([([ 48,  26,  98],), ([ 56, 212,   1],)], dtype=[('test', 'u1', (3,))])

kcw78 · Accepted Answer · 2024-09-05 14:24:32Z

There are several items in your question that require explanation.

The (3,) makes each element a subarray of 3 uint8 values (3 bytes). That's why you see '|V3', a 3-byte void type.
The "garbage values" you see are whatever happened to be in memory when the array was created (because you created an empty, uninitialized array). You won't see this behavior if you initialize with values (or zeros or ones).
When reading HDF5 data you don't have to create the NumPy array in advance. You can create it when you read the data.
Better still, you can create an h5py dataset object to access the data, then use it "like" an array. Dataset objects behave like NumPy arrays (and use less memory, since the data stays on disk until you read it). For example, you can get the dtype and shape. These will be the same as those of the equivalent NumPy array.
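
To illustrate the first two points, here is a minimal sketch (not part of the original answer) that assumes only NumPy. It creates zero-initialized arrays instead of empty ones, and shows that a plain ndarray with the unnamed ('u1', (3,)) dtype reports uint8, while a record array needs a field name (the name 'test' is taken from the question):

```python
import numpy as np

# Plain ndarray: the (3,) subarray dimension is folded into the array shape.
arr = np.zeros((2,), dtype=('u1', (3,)))
print(arr.shape, arr.dtype)   # (2, 3) uint8
print(arr[0].dtype)           # uint8

# Record array with a named field, initialized with zeros rather than left empty.
rec = np.zeros((2,), dtype=[('test', 'u1', (3,))]).view(np.recarray)
print(rec.test.shape)         # (2, 3)
print(rec[0].test.dtype)      # uint8
```

If the goal is only to hold data with the same shape and dtype as the dataset, the plain ndarray is probably enough; a record array is mainly useful when attribute-style field access is needed.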

Example to access values in an HDF5 dataset as a dataset object or an array:

import h5py

with h5py.File('example.h5', 'r') as h5f:
    # create dataset object:
    ds_obj = h5f['dataset_name']
    print(ds_obj.dtype)
    print(ds_obj.shape)
    # create an array of the entire dataset:
    arr = h5f['dataset_name'][()]
    # or:
    arr = ds_obj[()]
    # create an array from a slice of the dataset:
    arr_slice = h5f['dataset_name'][0:10,]
    # or:
    arr_slice = ds_obj[0:10]

Example to create the record array (for completeness):

import numpy as np
rec_arr = np.empty((10,), dtype=[('int_value', 'uint8'), ('str_value', 'S10')])
rec_arr[0] = (100, b'row 0 val')  # 'S10' is a byte-string field
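
Note that np.empty with a list-of-fields dtype returns a structured ndarray rather than an np.recarray. As a hedged sketch (an addition, not from the original answer), the same dtype can be viewed as a record array to get attribute-style field access, using zeros so that the unassigned rows do not contain leftover memory:

```python
import numpy as np

dt = np.dtype([('int_value', 'uint8'), ('str_value', 'S10')])

# Zero-initialized structured array, viewed as a record array.
rec_arr = np.zeros((10,), dtype=dt).view(np.recarray)
rec_arr[0] = (100, b'row 0 val')

print(rec_arr.int_value[0])   # 100
print(rec_arr[0].str_value)   # b'row 0 val'
```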
