
Hello, I have a huge list of values. I want to find all n-value patterns, like list[0:30], list[1:31], and so on. Within each pattern, I compute the percentage change of every value relative to the first, i.e. percentage_change(array[0], array[1]), percentage_change(array[0], array[2]), all the way to the end of the pattern. After this, I want to store all the 30-value patterns in an array of patterns to compare against other values in the future.

To do so I have to build a function. The pattern length (30 here) can be changed to anything of my choice via the variable numberOfEntries. For each pattern, I also take the mean of the next 10 values and store it in an array of outcomes at the same index.
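For reference, the percentage change I mean is just the standard formula; as a standalone helper it would look like this (a minimal sketch; percentage_change is my own name for it, matching the pseudocode above):

```python
def percentage_change(old, new):
    # percent change of `new` relative to `old`, in percent
    return (new - old) / abs(old) * 100.0
```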

#end point is the end of array
#inputs (array, numberOfEntries)
#outPut(list of Patterns, list of outcomes)

y = 0
condition = numberOfEntries + 1
# each pattern list
pattern = []
# list of patterns
Patterns = []
# outcomes array
outcomes = []

while y < len(array):
    i = 1
    while i < condition:
        # this is the percentage change function, inlined for speed;
        # try is used because of the possibility of division by zero
        try:
            x = ((float(array[y-(numberOfEntries-i)]) - array[y-numberOfEntries]) / abs(array[y-numberOfEntries])) * 100.00
            if x == 0.0:
                x = 0.000000001
        except ZeroDivisionError:
            x = 0.00000001
        i += 1
        pattern.append(x)
    # here are the outcomes
    outcomeRange = array[y+5:y+15]
    outcomes.append(outcomeRange)
    Patterns.append(pattern)
    # reset the pattern list
    pattern = []
    y += 1

Running this on an array of 8559 values, which is small compared to the quantity of data I have, took 229.6792 seconds.

Is there a way to adapt this to multithreading, or another way to improve its speed?

EDIT:

To explain better, I have this OHLC data:

                     open      high       low     close      volume
TimeStamp                                                            
2016-08-20 15:50:00  0.003008  0.003008  0.002995  0.003000    6.351215
2016-08-20 15:55:00  0.003000  0.003008  0.003000  0.003008    6.692174
2016-08-20 16:00:00  0.003008  0.003009  0.002996  0.003001   10.813029
2016-08-20 16:05:00  0.003001  0.003000  0.002991  0.002991    4.368509
2016-08-20 16:10:00  0.002991  0.002993  0.002989  0.002990    6.662944
2016-08-20 16:15:00  0.002990  0.003015  0.002989  0.003015    8.495640

I extract this as

array=df['close'].values

Then I apply the function to this array, and for this particular set of values it returns a list full of lists like this:

[0.26, 0.03, -0.03, -0.04, 0.005]

These are the percent changes of each row relative to the beginning of the sample, and this is what I call a pattern. I can choose how many entries a pattern has.
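As a worked example, the pattern for the sample above comes from comparing each close to the first close (a sketch; the numbers in my example list are rounded differently, so they only roughly match):

```python
# close values from the OHLC sample above
closes = [0.003000, 0.003008, 0.003001, 0.002991, 0.002990, 0.003015]
base = closes[0]
# percent change of every later close relative to the first one
pattern = [round((c - base) / abs(base) * 100.0, 4) for c in closes[1:]]
print(pattern)
```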

Hope I'm more clear now...

  • Multithreading is a dead-end, don't pursue it. Potentially multiprocessing, but the ideal approach would be vectorization of your loops. Commented Jan 10, 2018 at 20:53
  • For that reason, I want to tag this as numpy but it looks like you're just using Python lists (despite saying you have np arrays)? Commented Jan 10, 2018 at 20:55
  • Actually at this point I'm not using numpy, just pandas, which returns a list. @roganjosh how can I use vectorization of loops? Commented Jan 10, 2018 at 20:58
  • Then the question is pretty confused. You could use a rolling window on your series, keeping it in pandas. Your example should be representative of what you're trying to do but, at a guess, you don't want to pull this data out as a Python list. Commented Jan 10, 2018 at 21:03
  • "pandas that return a list", do you mean a Pandas Series? If so, that behaves very similarly to a numpy array Commented Jan 10, 2018 at 21:03
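Following the vectorization suggested in the comments, the whole matrix of percent changes can be built in one shot with NumPy fancy indexing instead of Python loops (a sketch under my reading of the intended indexing; vectorized_patterns is a hypothetical name, and a zero base value would produce inf/nan here rather than the sentinel used in the question):

```python
import numpy as np

def vectorized_patterns(array, numberOfEntries):
    # row y holds the percent changes of the next `numberOfEntries`
    # values relative to array[y]
    a = np.asarray(array, dtype=float)
    n = len(a) - numberOfEntries
    base = a[:n, None]  # shape (n, 1), broadcast against each window
    # index matrix: row y selects a[y+1], ..., a[y+numberOfEntries]
    idx = np.arange(n)[:, None] + np.arange(1, numberOfEntries + 1)[None, :]
    return (a[idx] - base) / np.abs(base) * 100.0  # shape (n, numberOfEntries)
```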

1 Answer


First, I would turn the while loop into a for loop, since i is incremented faster that way:

for i in range(1,condition):

Now, since y doesn't change within your inner loop, you can optimize your computation from:

x = ((float(array[y-(numberOfEntries-i)])-array[y-numberOfEntries])/abs(array[y-numberOfEntries]))*100.00

to:

x = (float(array[y-(numberOfEntries-i)])-array[y-numberOfEntries]) * z

where z is precomputed before the while/for loop as:

    z = 100.00 / abs(array[y-numberOfEntries])

Why?

  • First, z is precomputed, so there is no abs computation or array access inside the loop.
  • Second, z is the inverse of the value you were dividing by, so you can multiply by it instead; multiplication is much faster than division.
  • Third, no division by zero is possible inside the loop, since you're no longer dividing there. The ZeroDivisionError can now only occur when computing z outside the loop, and has to be handled accordingly (wrap the z computation plus the loop in try/except and fill in the sentinel 0.00000001 when it occurs; it should be equivalent).

so your inner loop could be:

try:
    z = 100.00 / abs(array[y-numberOfEntries])
    for i in range(1, condition):
        x = (float(array[y-(numberOfEntries-i)]) - array[y-numberOfEntries]) * z
        if x == 0.0:
            x = 0.000000001
        pattern.append(x)
except ZeroDivisionError:
    pattern.extend([0.00000001] * numberOfEntries)
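Putting the pieces together, the whole function with the hoisted multiplier might look like this (a sketch that keeps the question's indexing, sentinel values, and raw outcome slices; build_patterns is a name I made up):

```python
def build_patterns(array, numberOfEntries):
    Patterns, outcomes = [], []
    for y in range(len(array)):
        pattern = []
        try:
            # hoisted: one division (and abs) per pattern instead of per entry
            z = 100.00 / abs(array[y - numberOfEntries])
            for i in range(1, numberOfEntries + 1):
                x = (float(array[y - (numberOfEntries - i)]) - array[y - numberOfEntries]) * z
                pattern.append(x if x != 0.0 else 0.000000001)
        except ZeroDivisionError:
            # base value was zero: fill the pattern with the sentinel
            pattern = [0.00000001] * numberOfEntries
        Patterns.append(pattern)
        # outcomes kept as raw slices, as in the question's code
        outcomes.append(array[y + 5:y + 15])
    return Patterns, outcomes
```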