How to decode a numpy array of encoded literals/strings in Python3? AttributeError: 'numpy.ndarray' object has no attribute 'decode'

Question

In Python 3, I have the following NumPy array of strings.

Each string in the NumPy array is in the form b'MD18EE' instead of MD18EE.

For example:

import numpy as np
print(array1)
(b'first_element', b'element',...)

Normally, one would use .decode('UTF-8') to decode these elements.

However, if I try:

array1 = array1.decode('UTF-8')

I get the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'decode'

How do I decode these elements from a NumPy array? (That is, I don't want b'')

EDIT:

Let's say I was dealing with a Pandas DataFrame with only certain columns that were encoded in this manner. For example:

import pandas as pd
df = pd.DataFrame(...)

df
        COL1          ....
0   b'entry1'         ...
1   b'entry2'
2   b'entry3'
3   b'entry4'
4   b'entry5'
5   b'entry6'

Community · Accepted Answer · 2017-05-23 10:32:41Z

22

You have an array of bytestrings; dtype is S:

In [338]: arr=np.array((b'first_element', b'element'))
In [339]: arr
Out[339]: 
array([b'first_element', b'element'], 
      dtype='|S13')

astype easily converts them to unicode, the default string type for Py3.

In [340]: arr.astype('U13')
Out[340]: 
array(['first_element', 'element'], 
      dtype='<U13')

There is also a library of string functions, np.char, which applies the corresponding str (or bytes) method to each element of a string array:

In [341]: np.char.decode(arr)
Out[341]: 
array(['first_element', 'element'], 
      dtype='<U13')

astype is faster, but np.char.decode lets you specify an encoding.
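As a minimal sketch of the encoding point, the example below decodes bytes that are not valid ASCII by passing a codec name to np.char.decode. The sample values and the choice of Latin-1 are assumptions made for illustration; they are not taken from the question's data.

```python
import numpy as np

# Illustrative data (an assumption): bytes encoded as Latin-1, not ASCII.
latin1 = np.array([b"caf\xe9", b"na\xefve"])

# np.char.decode forwards the codec name to bytes.decode for each element.
decoded = np.char.decode(latin1, "latin-1")

print(decoded)        # a unicode array
print(decoded.dtype)  # a 'U' dtype wide enough for the longest element
```

The codec has to match how the bytes were actually written, so it is worth checking the data source rather than guessing.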

See also How to decode a numpy array of dtype=numpy.string_?

edited May 23, 2017 at 10:32

CommunityBot


answered Nov 3, 2016 at 0:33

hpaulj


3 Comments

John Jiang Over a year ago

The astype method seems too specific with the byte length information. For instance what if my input dtype is '|S1' rather than '|S13'?

hpaulj Over a year ago

@John, it looks like we don't have to specify the length: np.array('one', 'S7').astype('U')

John Jiang Over a year ago

I tried astype('U') on some bytearray and got UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128). However np.char.decode(arr) worked alright.
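One way to combine the two approaches discussed in these comments is sketched below: try astype('U') first and fall back to np.char.decode when the bytes are not plain ASCII. The helper name to_unicode and its default encoding are hypothetical, introduced only for this illustration.

```python
import numpy as np

def to_unicode(arr, encoding="utf-8"):
    """Hypothetical helper: fast path via astype, fallback to np.char.decode."""
    try:
        # astype('U') picks the width itself but treats the bytes as ASCII.
        return arr.astype("U")
    except UnicodeDecodeError:
        # Non-ASCII bytes need an explicit codec.
        return np.char.decode(arr, encoding)

ascii_only = np.array([b"one", b"two"])
accented = np.array(["é".encode("utf-8"), b"abc"])

print(to_unicode(ascii_only))
print(to_unicode(accented))
```

If the encoding is known up front, calling np.char.decode directly is simpler than relying on the fallback.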

Wander Nauta · Accepted Answer · 2016-11-02 20:23:12Z

6

If you want the result to be a (Python) list of strings, you can use a list comprehension:

>>> l = [el.decode('UTF-8') for el in array1]
>>> print(l)
['element', 'element 2']
>>> print(type(l))
<class 'list'>

Alternatively, if you want to keep it as a NumPy array, you can use np.vectorize to make a vectorized decoder function:

>>> decoder = np.vectorize(lambda x: x.decode('UTF-8'))
>>> array2 = decoder(array1)
>>> print(array2)
['element' 'element 2']
>>> print(type(array2))
<class 'numpy.ndarray'>
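For the DataFrame case in the question's edit, a similar per-column decode might look like the sketch below. The column names and values are made up for illustration, and it assumes the bytes columns are stored with object dtype and are UTF-8 encoded.

```python
import pandas as pd

# Illustrative frame (an assumption): one bytes column, one numeric column.
df = pd.DataFrame({"COL1": [b"entry1", b"entry2"], "COL2": [1, 2]})

# Find the columns whose values are all bytes objects.
bytes_cols = [
    col for col in df.columns
    if df[col].map(lambda v: isinstance(v, bytes)).all()
]

# Series.str.decode applies bytes.decode to each element of the column.
for col in bytes_cols:
    df[col] = df[col].str.decode("utf-8")

print(df)
```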

answered Nov 2, 2016 at 20:23

Wander Nauta


7 Comments

ShanZhengYang Over a year ago

Thanks! I'm taking the numpy array and putting it into a pandas dataframe. Maybe there are quicker shortcuts? Convert by column?

Wander Nauta Over a year ago

Do you mean quicker as in 'runs faster' or quicker as in 'less code'? Because both methods are oneliners, the print statements are just to show that they work :)

ShanZhengYang Over a year ago

:) I was thinking run faster. However, I think this method works fine---this appears to be a Python2/Python3 side effect, so I suspect others have run into this issue.

ShanZhengYang Over a year ago

In any sense, using decoder gives me this error: AttributeError: 'numpy.void' object has no attribute 'decode'

Wander Nauta Over a year ago

Hmm, in that case, it looks like your array is not an array of strings at all, but rather an array of strings and voids - but I'm sure you'll be able to modify the decoder to handle those as well. At any rate, I think the best (and probably fastest) way to approach this would be to make sure you use strings everywhere, rather than bytes. How you would do that depends on where your data is coming from and how you read it.
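One possible source of a numpy.void error is a structured (record) array, where each element is a whole record rather than a single bytestring. The asker's data is not shown, so this is only a guess at the cause; the field names and values below are invented for the sketch.

```python
import numpy as np

# Illustrative structured array (an assumption): a bytes field plus an int field.
records = np.array(
    [(b"entry1", 1), (b"entry2", 2)],
    dtype=[("name", "S6"), ("value", "i4")],
)

# Each element is a numpy.void record, which has no .decode method.
print(type(records[0]))

# Selecting the bytes field first yields a plain 'S' array that can be decoded.
names = np.char.decode(records["name"], "utf-8")
print(names)
```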
