0

What is the fastest way to check if a list is inside of a nested list, full iteration or using in ?

Given

A = [['Yes','2009','Me'],['Yes','2009','You'],['No','2009','You']]
B = [['No','2009','Me'],['Yes','2009','You'],['No','2009','You']]

Count number of duplicates between A and B.

I see either iterating over all elements:

for i in range(len(A)):
    for j in range(len(B)):
        if A[i] == B[j]:
            count+=1

Or using in with one element iteration:

for i in range(len(A)):
    if A[i] in B:
        count+=1

With the actual lengths of A and B being over 100,000 arrays, and each contains 4 elements, are there any specific functions or strategies to do this comparison efficiently?

With my data, option 1 is green, option 2 is blue, the answer from qqvc is red, user1245262 answer is turquoise (it is at the bottom with very fast, linear complexity) y axis is seconds, x axis is number of 4 element arrays being compared in each list.

enter image description here

8
  • You can time it yourself using the timeit module. Commented Dec 24, 2014 at 4:22
  • @wwii I edited to show the results on my data between the two from using the profiler. I am wondering if there are other methods that can accomplish the same thing Commented Dec 24, 2014 at 4:44
  • Are all the items in A unique - are there duplicates in A? ditto for B Commented Dec 24, 2014 at 4:56
  • They should all be unique, but I guess it is possible the data has errors. I can try checking against themselves Commented Dec 24, 2014 at 5:03
  • I have about 30% duplicates between A and B, have not yet checked for uniques within themselves Commented Dec 24, 2014 at 5:04

2 Answers 2

1

You might try using sets. Consider:

>>> A = [['Yes','2009','Me'],['Yes','2009','You'],['No','2009','You']]
>>> B = [['No','2009','Me'],['Yes','2009','You'],['No','2009','You']]

sets require hashable elements, so you need to convert the lists to tuples. I'm assuming that your lists are all in some particular order, so that ['dog',2,'mouse'] will always appear that way, and not as ['mouse', 2, 'dog']. Then,

>>> AA = set(map(tuple,A))
>>> BB = set(map(tuple,B))

Then,

>>> BB.intersection(AA)
set([('No', '2009', 'You'), ('Yes', '2009', 'You')])

Since you only seem to want the size of the intersection,

>>> len(BB.intersection(AA))
2

This might be faster than your looping, but you'd have to check it.

Sign up to request clarification or add additional context in comments.

Comments

0

option z:

sum(thing in B for thing in A)

option y:

sum(itertools.starmap(operator.eq, itertools.product(A,B)))

3 Comments

this is the same efficiency as option 2, although this is slightly slower (in my testing). 0.5 seconds slower when at 20k lists
the itertools seems slow in the profiler, but im not sure if it is the sum operation or not. how would you test it otherwise? It seems to be slower than option z
itertools.starmap(operator.eq, itertools.product(A,B))generates booleans - just count all the True's. Probably slower because of the iterators. The set/tuple(hashable) solution should be the fastest for these types of problems.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.