
I have a Python script that reads two TIFF images, finds the unique combinations of their pixel values, counts the observations, and saves the counts to a txt file.

You can find the full script at www.spatial-ecology.net

For example, given these two 3x3 inputs, the result is:

tif1
2 2 3
0 0 3
2 3 3

tif2
2 2 3
3 3 4
1 1 1

result (tif1 value, tif2 value, count)
2 2 2
3 3 1
0 3 2
3 4 1
2 1 1
3 1 2

The script works fine. This is how it is implemented:

  1. Read line by line (for irows in range(rows):) so the full image is not loaded into memory (a flag option could eventually be added to read 10 lines at a time).

  2. Go through the two arrays and create a tuple from each pair of pixel values.

  3. Check whether the tuple is already stored in a dict() and update its count (a sketch of these steps follows below).
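For reference, a minimal sketch of these three steps, assuming the rasters are read with GDAL (the actual script on spatial-ecology.net may use a different reader; the path and variable names here are placeholders):

from collections import defaultdict
from osgeo import gdal

def count_combinations(path1, path2):
    ds1, ds2 = gdal.Open(path1), gdal.Open(path2)
    band1, band2 = ds1.GetRasterBand(1), ds2.GetRasterBand(1)
    rows, cols = ds1.RasterYSize, ds1.RasterXSize

    counts = defaultdict(int)
    for irows in range(rows):                      # step 1: read one line at a time
        line1 = band1.ReadAsArray(0, irows, cols, 1)[0]
        line2 = band2.ReadAsArray(0, irows, cols, 1)[0]
        for v1, v2 in zip(line1, line2):           # step 2: build the tuple
            counts[(v1, v2)] += 1                  # step 3: look up / update the dict
    return counts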

My question is: what are the tricks in this case to speed up the process?

I tried saving the results in a two-dimensional array rather than a dict(), but that slowed the process down. I checked this link, and maybe Python's map function can improve the speed. Is this the case?
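For clarity, a rough sketch of what such an array-based variant could look like (MAX_VAL is a hypothetical upper bound on the pixel values, not something taken from the actual script):

import numpy as np

MAX_VAL = 256  # hypothetical upper bound on pixel values

def count_into_array(line1, line2, counts):
    # counts is a (MAX_VAL, MAX_VAL) integer array; the value pair indexes it directly
    for v1, v2 in zip(line1, line2):
        counts[v1, v2] += 1

counts = np.zeros((MAX_VAL, MAX_VAL), dtype=np.int64)
count_into_array([2, 2, 3], [2, 2, 3], counts)  # first rows of tif1 and tif2 above
# counts[2, 2] == 2 and counts[3, 3] == 1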

Thanks in advance, Giuseppe

1 Comment

You should definitely profile, as in Slater's answer below. But most numeric-heavy processing problems in Python are solved best by numpy. Other than that, you might check out the Python Imaging Library (PIL). Both these libraries are designed to solve these kinds of problems efficiently. – Jul 10, 2013 at 14:33
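A minimal sketch of the numpy route this comment suggests, applied to the per-line counting (it assumes each image row has already been read into a 1-D numpy array; np.unique with axis=0 needs numpy 1.13 or newer):

import numpy as np
from collections import defaultdict

def count_line_pairs(line1, line2, counts):
    # stack the two rows column-wise so each row of `pairs` is one (v1, v2) pair
    pairs = np.column_stack((line1, line2))
    # distinct pairs in this line and how often each occurs
    uniq, n = np.unique(pairs, axis=0, return_counts=True)
    for (v1, v2), c in zip(uniq, n):
        counts[(int(v1), int(v2))] += int(c)

counts = defaultdict(int)
count_line_pairs(np.array([2, 2, 3]), np.array([2, 2, 3]), counts)
# counts is now {(2, 2): 2, (3, 3): 1}, matching the first rows of the example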

1 Answer


If you want some real advice on performance, you need to post the parts of your code that are directly relevant to what you want to do. If you won't at least do that, there's very little we can do. That said, if you are having trouble discovering where exactly your inefficiencies are, I would highly recommend using Python's cProfile module. Usage is as follows:

import cProfile

def foo(*args, **kwargs):
    # do something
    pass

# pass the call to be profiled as a string; it is executed and timed
cProfile.run("foo()")

This will print a detailed time profile of your code that lets you know which steps take up the most time. Usually it will be a couple of methods that either get called far more often than they should or do some unnecessary extra processing, leading to a performance bottleneck.
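If the report gets long, sorting it, for example by cumulative time, usually makes the hot spots easier to find:

# 'cumulative' sorts by total time spent in a function and everything it calls;
# 'tottime' sorts by time spent in the function body itself
cProfile.run("foo()", sort="cumulative")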


1 Comment

You can also use line_profiler to have the timings line by line instead of function by function.
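A small sketch of how line_profiler is typically used (the function name is just a placeholder; the @profile decorator is injected by the kernprof runner, so no import is needed):

@profile
def count_pairs(line1, line2, counts):
    for v1, v2 in zip(line1, line2):
        counts[(v1, v2)] += 1

Run the script with kernprof -l -v your_script.py to print the per-line timings.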
