Way to create numpy array to contain only unique elements for lists within it

Question

I have a numpy array A which looks like this:

array([list(['nan', 'nan']),
       list(['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan']),
       list(['red', 'red']), ...,
       list(['nan', 'festival'])], dtype=object)

I want to convert this to an array in which each list contains only unique elements. For example, I want the above array to get converted to:

['nan'],['nan','apple','banana'],['red'],...,['nan','festival']

I have tried doing this:

output = []
for i in A:
    output.append(np.unique(i))
output

The output which I get doing this is not desired and currently looks like this:

[array(['nan'], dtype='<U3'),
 array(['nan'], dtype='<U3'),
 array(['nan'], dtype='<U3'),....]

What can be done?

This code produces [array(['nan'], dtype='<U3'), array(['apple', 'banana', 'nan'], dtype='<U6'), array(['red'], dtype='<U3'), array([Ellipsis], dtype=object), array(['festival', 'nan'], dtype='<U8')] for me.
– Guy, Commented Jun 6, 2021 at 9:00
Yes, it does, but can we remove the array([..], dtype='<U3') wrapper and only have ['nan'],['nan','apple','banana'],['red'],...,['nan','festival']?
– user31934, Commented Jun 6, 2021 at 9:06

Anurag Dabas · Accepted Answer · 2021-06-06 09:03:52Z


import numpy as np

arr=np.array([list(['nan', 'nan']),
       list(['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan']),
       list(['red', 'red']), ...,
       list(['nan', 'festival'])], dtype=object)

Try a list comprehension:

out=[np.unique(x).tolist() for x in arr]

OR

out=[list(np.unique(x)) for x in arr]

output of out:

[['nan'], ['apple', 'banana', 'nan'], ['red'], [Ellipsis], ['festival', 'nan']]
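Note that np.unique returns its values sorted, which is why 'apple' comes before 'nan' above. If the first-appearance order from the question matters, one possible sketch uses the return_index option of np.unique. The helper name unique_in_order and the variable names are made up for illustration, and the stray ... from the pasted array is left out here.

```python
import numpy as np

def unique_in_order(values):
    """Hypothetical helper: unique items of `values`, in first-appearance order."""
    arr = np.asarray(values)
    # return_index=True also returns the index of the first occurrence
    # of each unique value in the original array.
    _, first_idx = np.unique(arr, return_index=True)
    # Sorting those indices restores the original order of appearance.
    return arr[np.sort(first_idx)].tolist()

data = [
    ['nan', 'nan'],
    ['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan'],
    ['red', 'red'],
    ['nan', 'festival'],
]

# Build a 1-D object array of lists, similar to the array in the question.
rows = np.empty(len(data), dtype=object)
for i, row in enumerate(data):
    rows[i] = row

out = [unique_in_order(x) for x in rows]
print(out)
```

The result is a plain list of lists of Python strings, so no dtype information is shown when it is printed.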

edited Jun 6, 2021 at 9:03

answered Jun 6, 2021 at 8:57


8 Comments

user31934 Over a year ago

This is exactly the output I was getting. What I am looking for is getting an output that contains only the unique values and not additional information such as dtype.

Guy Over a year ago

This is not what the OP is looking for, ['nan','apple','banana'] should remain in the same list.

Anurag Dabas Over a year ago

@user31934 Updated answer...Kindly have a look :)

hpaulj Over a year ago

[ list(set(x)) for x in ...] should be faster.

hpaulj Over a year ago

Your copy-n-paste included ...!


hpaulj · Accepted Answer · 2021-06-06 17:28:03Z

Simple non-numpy answer

In [239]: alist =[list(['nan', 'nan']),
     ...:        list(['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan']),
     ...:        list(['red', 'red']),
     ...:        list(['nan', 'festival'])]
In [240]: alist
Out[240]: 
[['nan', 'nan'],
 ['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan'],
 ['red', 'red'],
 ['nan', 'festival']]
In [241]: [list(set(i)) for i in alist]
Out[241]: [['nan'], ['banana', 'apple', 'nan'], ['red'], ['festival', 'nan']]

numpy will be slower.

There shouldn't have been any problem with your iteration:

In [245]: output = []
     ...: for i in np.array(alist,object):
     ...:     output.append(np.unique(i))
     ...: output
Out[245]: 
[array(['nan'], dtype='<U3'),
 array(['apple', 'banana', 'nan'], dtype='<U6'),
 array(['red'], dtype='<U3'),
 array(['festival', 'nan'], dtype='<U8')]

np.unique is a numpy function, so it returns an array, not a list. For a simple case like this, with lists of strings, set works and is much faster.
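One caveat with set is that it does not keep the original order (see 'banana' before 'apple' in Out[241]). If the order shown in the question is wanted, a minimal non-numpy sketch could use dict.fromkeys, which keeps keys in insertion order on Python 3.7+. The helper name dedupe_keep_order is only illustrative.

```python
alist = [
    ['nan', 'nan'],
    ['nan', 'nan', 'apple', 'apple', 'banana', 'nan', 'nan'],
    ['red', 'red'],
    ['nan', 'festival'],
]

def dedupe_keep_order(items):
    """Hypothetical helper: drop duplicates, keep first-appearance order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))

result = [dedupe_keep_order(i) for i in alist]
print(result)
# [['nan'], ['nan', 'apple', 'banana'], ['red'], ['nan', 'festival']]
```

Like set, this only needs the items to be hashable, which holds for strings.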
