I am trying to create a NumPy record array to match data that I am reading from an HDF5 file. The HDF5 dataset (dataset) has a dtype of np.dtype(('u1', (3,))), and the dtype of dataset[0] is dtype('uint8'). I am trying to create an array to match this, as shown below:

import numpy as np
np.recarray((2,), dtype=('u1', (3,)))

However, this produces a result that looks like it contains bytes rather than integers (integers are what I see when I read the HDF5 dataset):

records = rec.array([b'\xB0\xCE\x61', b'\x38\xD4\x01'], dtype=|V3)

When I check the dtype of this array with records[0].dtype, I get dtype('V3') instead of dtype('uint8'), which is what I get when reading the HDF5 file.

How do I get the records to store these values as uint8 rather than bytes? I noticed that if I give the dtype a field name, the values are represented by numbers rather than bytes, but the HDF5 dataset's dtype does not have a field name.

>>> np.recarray((2,), dtype=[('test', 'u1', (3,))])
rec.array([([ 48,  26,  98],), ([ 56, 212,   1],)], dtype=[('test', 'u1', (3,))])

1 Answer

There are several items in your question that require explanation.

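  1. The (3,) in ('u1', (3,)) makes each element a subarray of three uint8 values, so each element is 3 bytes wide. np.recarray expects a dtype with named fields; without a field name it treats each element as an opaque 3-byte void value. That's why you see '|V3'.
  2. The "garbage values" you see are whatever happened to be in memory when the array was created, because np.recarray (like np.empty) allocates the array without initializing it. You won't see this behavior if you initialize the array with values (or with zeros or ones).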
  3. When reading HDF5 data you don't have to create the NumPy array in advance. You can create it when you read the data.
  4. Better still, you can create an h5py dataset object to access the data, then use it "like" an array. Dataset objects behave much like NumPy arrays (and use less memory, because data is only read from the file when you index the dataset). For example, you can get the dtype and shape. These will be the same as for the equivalent NumPy array.
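
As a rough illustration of items 1 and 2, here is a minimal sketch using NumPy only (no HDF5 file involved). It assumes a plain ndarray is acceptable in place of a recarray. The variable names are made up for this example; the field name 'test' comes from the question.

```python
import numpy as np

# The unnamed subarray dtype from the question.
sub_dt = np.dtype(('u1', (3,)))

# With a plain ndarray constructor, the subarray dimension is appended
# to the array shape, and np.zeros initializes the memory.
arr = np.zeros((2,), dtype=sub_dt)
print(arr.shape, arr.dtype)
print(arr[0].dtype)

# With a named field, np.recarray keeps the uint8 values visible.
# np.recarray does not initialize memory, so fill the field explicitly.
named = np.recarray((2,), dtype=[('test', 'u1', (3,))])
named['test'] = 0
print(named.test.shape, named.test.dtype)
```

In both cases the element values should come back as uint8 integers rather than 3-byte void values.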

Example to access values in an HDF5 dataset as a dataset object or an array:

import h5py

# open the file read-only
with h5py.File('example.h5', 'r') as h5f:
    # create dataset object (no data is read yet):
    ds_obj = h5f['dataset_name']
    print(ds_obj.dtype)
    print(ds_obj.shape)
    # create an array of the entire dataset:
    arr = h5f['dataset_name'][()]
    # or:
    arr = ds_obj[()]
    # create an array from a slice of the dataset:
    arr_slice = h5f['dataset_name'][0:10]
    # or:
    arr_slice = ds_obj[0:10]

Example to create the record array (for completeness):

import numpy as np

rec_arr = np.empty((10,), dtype=[('int_value', 'uint8'), ('str_value', 'S10')])
rec_arr[0] = (100, b'row 0 val')  # 'S10' fields hold bytes
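
Note that np.empty with a structured dtype returns a structured ndarray, not an np.recarray. If attribute-style field access is wanted, one option is to view the array as a record array. The sketch below assumes two illustrative field names ('int_value' and 'rgb') that are not part of your HDF5 file.

```python
import numpy as np

# Structured dtype with a scalar field and a 3-element uint8 subarray field.
# Both field names are hypothetical, chosen only for this example.
dt = np.dtype([('int_value', 'u1'), ('rgb', 'u1', (3,))])

# Zero-initialized structured array, then fill the first row.
struct_arr = np.zeros((2,), dtype=dt)
struct_arr[0] = (100, [48, 26, 98])

# View the same memory as a record array to get attribute access.
rec_view = struct_arr.view(np.recarray)
print(rec_view.int_value)
print(rec_view.rgb.dtype, rec_view.rgb.shape)
```

The view shares memory with struct_arr, so changes made through one are visible through the other.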