Python in vs iteration for nested list

Question

What is the fastest way to check whether a list is inside a nested list: full iteration, or using in?

Given

A = [['Yes','2009','Me'],['Yes','2009','You'],['No','2009','You']]
B = [['No','2009','Me'],['Yes','2009','You'],['No','2009','You']]

Count number of duplicates between A and B.

I see two options. Either iterate over all pairs of elements:

count = 0
for i in range(len(A)):
    for j in range(len(B)):
        if A[i] == B[j]:
            count += 1

Or use in while iterating over only one list:

count = 0
for i in range(len(A)):
    if A[i] in B:
        count += 1

The actual A and B each hold over 100,000 arrays of 4 elements. Are there any specific functions or strategies to do this comparison efficiently?

Timings on my data: option 1 is green, option 2 is blue, the answer from qqvc is red, and user1245262's answer is turquoise (the line at the bottom: very fast, with linear complexity). The y-axis is seconds; the x-axis is the number of 4-element arrays being compared in each list.


@wwii I edited the question to show the profiler results for the two options on my data. I am wondering if there are other methods that can accomplish the same thing.
– user-2147482637, Commented Dec 24, 2014 at 4:44
Are all the items in A unique, or are there duplicates in A? Ditto for B.
– wwii, Commented Dec 24, 2014 at 4:56
They should all be unique, but I guess it is possible the data has errors. I can try checking each list against itself.
– user-2147482637, Commented Dec 24, 2014 at 5:03
I have about 30% duplicates between A and B, but have not yet checked for duplicates within each list.
– user-2147482637, Commented Dec 24, 2014 at 5:04

user1245262 · Accepted Answer · 2014-12-24 04:46:40Z

1

You might try using sets. Consider:

>>> A = [['Yes','2009','Me'],['Yes','2009','You'],['No','2009','You']]
>>> B = [['No','2009','Me'],['Yes','2009','You'],['No','2009','You']]

Sets require hashable elements, so you need to convert the lists to tuples. I'm assuming that the items in each of your lists always appear in the same order, so that ['dog',2,'mouse'] will always appear that way, and not as ['mouse', 2, 'dog']. Then convert both nested lists:

>>> AA = set(map(tuple,A))
>>> BB = set(map(tuple,B))
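As a side note on the hashability and ordering assumptions above, here is a minimal sketch of what they mean in practice. The `row_key` helper is hypothetical (it is not part of the answer) and assumes every row holds mutually comparable items, such as all strings.

```python
A = [['Yes', '2009', 'Me'], ['Yes', '2009', 'You']]

# Lists are mutable and therefore unhashable, so they cannot be set members.
try:
    set(A)
    lists_hashable = True
except TypeError:
    lists_hashable = False

# Tuples of hashable items are hashable, so this works.
AA = set(map(tuple, A))

# Tuples compare element by element, so order within a row matters.
same_order = ('Yes', '2009', 'Me') in AA
reordered = ('Me', '2009', 'Yes') in AA


def row_key(row):
    """Hypothetical normalisation for rows whose internal order is irrelevant."""
    return tuple(sorted(row))


AA_unordered = {row_key(row) for row in A}
reordered_normalised = row_key(['Me', '2009', 'Yes']) in AA_unordered

print(lists_hashable, same_order, reordered, reordered_normalised)
```

If the rows really are always in a fixed order, as the answer assumes, the plain `tuple` conversion is enough and the sorting step is unnecessary.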

The intersection of the two sets holds the rows common to both:

>>> BB.intersection(AA)
set([('No', '2009', 'You'), ('Yes', '2009', 'You')])

Since you only seem to want the size of the intersection,

>>> len(BB.intersection(AA))
2

This might be faster than your looping, but you'd have to check it.
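One way to check is a small `timeit` comparison. This is only a sketch: the `make_rows` generator and the function names are made up for illustration, the data is synthetic (distinct 4-element rows), and the sizes are kept small so the quadratic versions finish quickly. Real timings depend on your data and machine.

```python
import random
import timeit


def make_rows(n, seed):
    """Build n distinct 4-element rows (a synthetic stand-in for real data)."""
    rng = random.Random(seed)
    ids = rng.sample(range(3 * n), n)
    return [['Yes' if i % 2 else 'No', '2009', 'Me', str(i)] for i in ids]


def count_nested(A, B):
    # Option 1: compare every pair, O(len(A) * len(B)).
    return sum(1 for a in A for b in B if a == b)


def count_in(A, B):
    # Option 2: `in` on a list is still a linear scan of B.
    return sum(1 for a in A if a in B)


def count_sets(A, B):
    # Set approach: hashing makes this roughly linear on average.
    return len(set(map(tuple, A)) & set(map(tuple, B)))


A = make_rows(500, seed=1)
B = make_rows(500, seed=2)

timings = {}
for func in (count_nested, count_in, count_sets):
    timings[func.__name__] = timeit.timeit(lambda: func(A, B), number=3)
    print(func.__name__, func(A, B), round(timings[func.__name__], 4))
```

Because the synthetic rows are distinct within each list, all three functions return the same count, so only the timings differ.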

answered Dec 24, 2014 at 4:46


wwii · Answer · 2014-12-24 05:10:10Z

0

option z:

sum(thing in B for thing in A)

option y:

import itertools, operator
sum(itertools.starmap(operator.eq, itertools.product(A,B)))
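To make option y easier to follow, here is a small sketch that unrolls it on the question's sample data. `itertools.product(A, B)` yields every `(a, b)` pair, `itertools.starmap(operator.eq, ...)` evaluates `a == b` for each pair, and `sum` counts the `True` values, since `True` counts as 1. The intermediate names `pairs` and `matches` are only for illustration.

```python
import itertools
import operator

A = [['Yes', '2009', 'Me'], ['Yes', '2009', 'You'], ['No', '2009', 'You']]
B = [['No', '2009', 'Me'], ['Yes', '2009', 'You'], ['No', '2009', 'You']]

# Every combination of one row from A with one row from B: 3 * 3 = 9 pairs.
pairs = list(itertools.product(A, B))

# One boolean per pair, True where the two rows are equal.
matches = list(itertools.starmap(operator.eq, pairs))

# Booleans sum as 0/1, so this is the number of equal pairs.
count = sum(matches)
print(len(pairs), count)
```

This is the same pairwise work as the nested loop in option 1, just expressed with iterators, so it is still quadratic in the list lengths.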

edited Dec 24, 2014 at 5:10

answered Dec 24, 2014 at 4:55


3 Comments

user-2147482637 Over a year ago

This has the same efficiency as option 2, although it is slightly slower in my testing: 0.5 seconds slower at 20k lists.

user-2147482637 Over a year ago

The itertools version seems slow in the profiler, but I'm not sure whether that is due to the sum operation or not. How would you test it otherwise? It seems to be slower than option z.

wwii Over a year ago

itertools.starmap(operator.eq, itertools.product(A,B)) generates booleans; just count all the Trues. It is probably slower because of the iterators. The set/tuple (hashable) solution should be the fastest for these types of problems.
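For reference, a minimal sketch of how option z could be combined with the set/tuple idea. The name `B_rows` is made up for illustration, and the rows are assumed to contain only hashable items.

```python
A = [['Yes', '2009', 'Me'], ['Yes', '2009', 'You'], ['No', '2009', 'You']]
B = [['No', '2009', 'Me'], ['Yes', '2009', 'You'], ['No', '2009', 'You']]

# Build the lookup set once; this costs one pass over B.
B_rows = set(map(tuple, B))

# Same shape as `sum(thing in B for thing in A)`, but each membership test
# is an average O(1) hash lookup instead of a linear scan of B.
count = sum(tuple(thing) in B_rows for thing in A)

# A row repeated in A is counted every time it appears, as in option 2,
# whereas the size of a set intersection counts each distinct row once.
A_dup = A + [A[1]]
count_dup = sum(tuple(thing) in B_rows for thing in A_dup)
unique_common = len(set(map(tuple, A_dup)) & B_rows)

print(count, count_dup, unique_common)
```

Which of the two counts is wanted depends on whether repeated rows in A should count once or once per occurrence.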
