
I have a numpy array A which looks like this:

array([list(['nan', 'nan']),
       list(['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan']),
       list(['red', 'red']), ...,
       list(['nan', 'festival'])], dtype=object)

I want to convert this to an array in which each list contains only unique elements. For example, I want the above array to get converted to:

['nan'],['nan','apple','banana'],['red'],...,['nan','festival']

I have tried doing this:

output = []
for i in A:
    output.append(np.unique(i))
output

The output which I get doing this is not desired and currently looks like this:

[array(['nan'], dtype='<U3'),
 array(['nan'], dtype='<U3'),
 array(['nan'], dtype='<U3'),....]

What can be done?

  • This code produces [array(['nan'], dtype='<U3'), array(['apple', 'banana', 'nan'], dtype='<U6'), array(['red'], dtype='<U3'), array([Ellipsis], dtype=object), array(['festival', 'nan'], dtype='<U8')] for me. Commented Jun 6, 2021 at 9:00
  • Yes, it does, but can we drop the array([..], dtype='<U3') wrapper and get only ['nan'],['nan','apple','banana'],['red'],...,['nan','festival']? Commented Jun 6, 2021 at 9:06
  • Use list instead of an array, and set instead of unique Commented Jun 6, 2021 at 9:29

2 Answers

arr=np.array([list(['nan', 'nan']),
       list(['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan']),
       list(['red', 'red']), ...,
       list(['nan', 'festival'])], dtype=object)

Try a list comprehension:

out=[np.unique(x).tolist() for x in arr]

or

out=[list(np.unique(x)) for x in arr]

output of out:

[['nan'], ['apple', 'banana', 'nan'], ['red'], [Ellipsis], ['festival', 'nan']]

8 Comments

This is exactly the output I was getting. What I am looking for is getting an output that contains only the unique values and not additional information such as dtype.
This is not what the OP is looking for, ['nan','apple','banana'] should remain in the same list.
@user31934 Updated answer...Kindly have a look :)
[ list(set(x)) for x in ...] should be faster.
Your copy-n-paste included ...!

Simple non-numpy answer

In [239]: alist =[list(['nan', 'nan']),
     ...:        list(['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan']),
     ...:        list(['red', 'red']),
     ...:        list(['nan', 'festival'])]
In [240]: alist
Out[240]: 
[['nan', 'nan'],
 ['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan'],
 ['red', 'red'],
 ['nan', 'festival']]
In [241]: [list(set(i)) for i in alist]
Out[241]: [['nan'], ['banana', 'apple', 'nan'], ['red'], ['festival', 'nan']]
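
Note that a set has no guaranteed order, so the order inside each sublist above can differ between runs. If a stable result is needed, one option is to sort each set, which also matches the sorted order that np.unique returns. A minimal sketch, using the same sample data (the name sorted_unique is only illustrative):

```python
# Sample data copied from the session above.
alist = [
    ["nan", "nan"],
    ["nan", "nan", "apple", "apple", "banana", "nan", "nan"],
    ["red", "red"],
    ["nan", "festival"],
]

# set() drops the duplicates, sorted() returns a plain list in a stable order.
sorted_unique = [sorted(set(sub)) for sub in alist]
print(sorted_unique)
# [['nan'], ['apple', 'banana', 'nan'], ['red'], ['festival', 'nan']]
```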

For short lists of strings like these, the numpy version will be slower.

There shouldn't have been any problem with your iteration:

In [245]: output = []
     ...: for i in np.array(alist,object):
     ...:     output.append(np.unique(i))
     ...: output
Out[245]: 
[array(['nan'], dtype='<U3'),
 array(['apple', 'banana', 'nan'], dtype='<U6'),
 array(['red'], dtype='<U3'),
 array(['festival', 'nan'], dtype='<U8')]

np.unique is a numpy function, so it returns an array, not a list. For a simple case like this, with lists of strings, set works and is much faster.
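
Neither set nor np.unique keeps the original order, while the output asked for in the question (['nan','apple','banana']) lists each value in order of first appearance. If that order matters, dict.fromkeys is one way to get it, since dicts preserve insertion order in Python 3.7+. A minimal sketch; the helper name unique_in_order is my own, not something from the question:

```python
def unique_in_order(items):
    """Return the distinct items as a list, keeping first-appearance order."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


alist = [
    ["nan", "nan"],
    ["nan", "nan", "apple", "apple", "banana", "nan", "nan"],
    ["red", "red"],
    ["nan", "festival"],
]

result = [unique_in_order(sub) for sub in alist]
print(result)
# [['nan'], ['nan', 'apple', 'banana'], ['red'], ['nan', 'festival']]
```

The same comprehension works when iterating over an object array of lists, because each element is still an ordinary Python list.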
