
I have a Python script that reads two TIFF images, finds the unique combinations of their pixel values, counts the observations, and saves the counts to a txt file.

You can find the full script at www.spatial-ecology.net

For example, given these two 3x3 inputs, the result is:

tif1
2 2 3
0 0 3
2 3 3

tif2
2 2 3
3 3 4
1 1 1

result (tif1 value, tif2 value, count)
2 2 2
3 3 1
0 3 2
3 4 1
2 1 1
3 1 2

The script works fine. This is how it is implemented:

  1. Read line by line (for irows in range(rows):) so the full image is not loaded into memory (a flag option could eventually be added to read 10 lines at a time).

  2. Go through the two arrays and create a tuple from each pair of pixel values.

  3. Check whether the tuple is already stored in a dict() and update its count (a sketch of these steps follows below).
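For reference, a minimal sketch of these three steps, assuming the rasters are read with GDAL (the actual script on spatial-ecology.net may use a different reader; the path and variable names here are placeholders):

from collections import defaultdict
from osgeo import gdal

def count_combinations(path1, path2):
    ds1, ds2 = gdal.Open(path1), gdal.Open(path2)
    band1, band2 = ds1.GetRasterBand(1), ds2.GetRasterBand(1)
    rows, cols = ds1.RasterYSize, ds1.RasterXSize

    counts = defaultdict(int)
    for irows in range(rows):                      # step 1: read one line at a time
        line1 = band1.ReadAsArray(0, irows, cols, 1)[0]
        line2 = band2.ReadAsArray(0, irows, cols, 1)[0]
        for v1, v2 in zip(line1, line2):           # step 2: build the tuple
            counts[(v1, v2)] += 1                  # step 3: look up / update the dict
    return counts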

My question is: what are the tricks in this case to speed up the process?

I tried saving the results in a two-dimensional array rather than a dict(), but that slowed the process down. I checked this link, and maybe Python's map function can improve the speed. Is this the case?
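For clarity, a rough sketch of what such an array-based variant could look like (MAX_VAL is a hypothetical upper bound on the pixel values, not something taken from the actual script):

import numpy as np

MAX_VAL = 256  # hypothetical upper bound on pixel values

def count_into_array(line1, line2, counts):
    # counts is a (MAX_VAL, MAX_VAL) integer array; the value pair indexes it directly
    for v1, v2 in zip(line1, line2):
        counts[v1, v2] += 1

counts = np.zeros((MAX_VAL, MAX_VAL), dtype=np.int64)
count_into_array([2, 2, 3], [2, 2, 3], counts)  # first rows of tif1 and tif2 above
# counts[2, 2] == 2 and counts[3, 3] == 1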

Thanks in advance, Giuseppe

1 Comment

You should definitely profile, as in Slater's answer below. But most numeric-heavy processing problems in Python are solved best by numpy. Other than that, you might check out the Python Imaging Library (PIL). Both these libraries are designed to solve these kinds of problems efficiently. – Jul 10, 2013 at 14:33
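A minimal sketch of the numpy route this comment suggests, applied to the per-line counting (it assumes each image row has already been read into a 1-D numpy array; np.unique with axis=0 needs numpy 1.13 or newer):

import numpy as np
from collections import defaultdict

def count_line_pairs(line1, line2, counts):
    # stack the two rows column-wise so each row of `pairs` is one (v1, v2) pair
    pairs = np.column_stack((line1, line2))
    # distinct pairs in this line and how often each occurs
    uniq, n = np.unique(pairs, axis=0, return_counts=True)
    for (v1, v2), c in zip(uniq, n):
        counts[(int(v1), int(v2))] += int(c)

counts = defaultdict(int)
count_line_pairs(np.array([2, 2, 3]), np.array([2, 2, 3]), counts)
# counts is now {(2, 2): 2, (3, 3): 1}, matching the first rows of the example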

1 Answer


If you want some real advice on performance, you need to post the parts of your code that are directly relevant to what you want to do. If you won't at least do that, there's very little we can do. That said, if you are having trouble discovering where exactly your inefficiencies are, I would highly recommend using Python's cProfile module. Usage is as follows:

import cProfile

def foo(*args, **kwargs):
    # do something
    pass

# pass the call to be profiled as a string; it is executed and timed
cProfile.run("foo()")

This will print a detailed time profile of your code that lets you know which steps take up the most time. Usually it will be a couple of methods that either get called far more often than they should or do some unnecessary extra processing, leading to a performance bottleneck.
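If the report gets long, sorting it, for example by cumulative time, usually makes the hot spots easier to find:

# 'cumulative' sorts by total time spent in a function and everything it calls;
# 'tottime' sorts by time spent in the function body itself
cProfile.run("foo()", sort="cumulative")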


1 Comment

You can also use line_profiler to have the timings line by line instead of function by function.
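A small sketch of how line_profiler is typically used (the function name is just a placeholder; the @profile decorator is injected by the kernprof runner, so no import is needed):

@profile
def count_pairs(line1, line2, counts):
    for v1, v2 in zip(line1, line2):
        counts[(v1, v2)] += 1

Run the script with kernprof -l -v your_script.py to print the per-line timings.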
