
How can I compact the dtype of a NumPy string array (which is effectively about reducing memory usage) when the longest string, say 5 characters (U5), is shorter than the width declared in the dtype, say U10?

One way to "compact" the dtype is to cast it explicitly to U5, but can this be done automatically by some function call, without manually inspecting the string lengths?

For example:

>>> import numpy as np
>>> a = np.array([['a', 'bb', 'cc'], ['aaabc', 'ccc', 'b']], dtype='U10')

# Hypothetical function; it does not exist in NumPy
>>> a = compact(a)
>>> a.dtype
dtype('<U5')

so that compact shrinks the oversized U10 dtype to U5.

Thanks in advance!
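For comparison, the manual route described in the question (find the longest string yourself, then cast explicitly) might look like the sketch below. It also makes the memory argument concrete: NumPy stores 'U' arrays as fixed-width UCS-4, i.e. 4 bytes per character slot, so halving the width halves `nbytes`. The names `longest` and `b` are just illustrative.

```python
import numpy as np

a = np.array([['a', 'bb', 'cc'], ['aaabc', 'ccc', 'b']], dtype='U10')

# Manual approach: find the longest string yourself, then cast to that width
longest = max(len(s) for s in a.ravel())   # 5 ('aaabc')
b = a.astype(f'U{longest}')

print(b.dtype)    # <U5
# 'U' strings use 4 bytes per character slot
print(a.nbytes)   # 6 elements * 10 chars * 4 bytes = 240
print(b.nbytes)   # 6 elements * 5 chars * 4 bytes = 120
```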

1 Answer

This does it for both unicode/str ('U') and bytes ('S') arrays:

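import numpy as np
def compact(str_arr):
    # Longest element, in characters ('U') or bytes ('S'); at least 1 so all-empty arrays still shrink
    width = max(int(np.char.str_len(str_arr).max()), 1)
    # Same kind as the input (str_arr.dtype.kind), narrowed to that width
    return np.asarray(str_arr, dtype=(str_arr.dtype.kind, width))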
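A quick usage check on the array from the question, plus a bytes ('S') array, might look like this. `compact` is copied from the answer so the snippet runs on its own; the only change is wrapping the width in `int()` so the dtype tuple receives a plain Python integer. The bytes array `s` is a made-up example.

```python
import numpy as np

def compact(str_arr):
    # Width = longest element (characters for 'U', bytes for 'S')
    dtype = (str_arr.dtype.kind, int(np.char.str_len(str_arr).max()))
    return np.asarray(str_arr, dtype)

a = np.array([['a', 'bb', 'cc'], ['aaabc', 'ccc', 'b']], dtype='U10')
c = compact(a)
print(c.dtype)              # <U5

s = np.array([b'x', b'yyy'], dtype='S8')
print(compact(s).dtype)     # |S3
```

Note that `.max()` raises `ValueError` on a zero-size array, so an empty input would need to be handled separately.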