Python: massive performance difference between array iteration and "if in"

Question

Both code snippets below check whether an element exists in the array, but the first approach takes < 100 ms while the second takes ~6 seconds.

Does anyone know why?

```python
import time

import numpy as np

xs = np.random.randint(90000000, size=8000000)

# Approach 1: membership test with "in"
start = time.monotonic()
is_present = -4 in xs
end = time.monotonic()

print('exec time:', round(end - start, 3), 'sec')  # ~100 milliseconds

# Approach 2: explicit Python loop over the array
start = time.monotonic()
for x in xs:
    if x == -4:
        break
end = time.monotonic()

print('exec time:', round(end - start, 3), 'sec')  # ~6000 milliseconds
```


Related: stackoverflow.com/questions/8385602/… and medium.com/@gough.cory/…
– pho, Commented May 2, 2021 at 9:28
Try this with PyPy rather than CPython: the loop becomes much faster and the gap narrows. The reason is that CPython is a (slow) interpreter. The first snippet executes one optimized native C call, while the second uses the interpreter to iterate over the array, which is far slower than doing the same work in natively compiled code.
– Jérôme Richard, Commented May 2, 2021 at 11:57

AntiMatterDynamite · Accepted Answer · 2021-05-02 09:16:33Z

3

NumPy is built specifically to accelerate this kind of code. It is written in C, so `-4 in xs` runs as a single compiled scan with almost all of the Python overhead removed. Your second attempt, by comparison, is a pure-Python loop: the interpreter handles every element individually, so it takes much longer to get through all 8,000,000 of them.
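A minimal sketch of that contrast, assuming a smaller array than in the question so it finishes quickly. The helper names `contains_loop`, `contains_vectorized` and `timed` are made up for illustration and are not part of NumPy. Absolute timings depend on the machine, so none are quoted here.

```python
import time

import numpy as np


def contains_loop(xs, target):
    """Scan element by element; every comparison goes through the interpreter."""
    for x in xs:
        if x == target:
            return True
    return False


def contains_vectorized(xs, target):
    """Compare the whole array at once; the scan runs in compiled code."""
    return bool(np.any(xs == target))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: 200,000 elements instead of the question's 8,000,000.
    rng = np.random.default_rng(0)
    xs = rng.integers(0, 90_000_000, size=200_000)

    for name, fn in [("python loop", contains_loop),
                     ("np.any(xs == t)", contains_vectorized)]:
        found, seconds = timed(fn, xs, -4)
        print(f"{name:>16}: found={found} in {seconds:.4f} s")

    # The "in" operator from the question takes the same compiled route.
    print("-4 in xs:", -4 in xs)
```

If the loop itself is unavoidable, iterating over `xs.tolist()` is usually faster than iterating over the array directly, because the elements are converted to plain Python ints once up front rather than wrapped in a NumPy scalar on every step.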

