
Hello, I have a huge list of values. I want to find all n-value patterns, like list[0:30], list[1:31], and so on. Within each pattern, I compute the percentage change of every value relative to the first, i.e. percentage_change(array[0], array[1]), percentage_change(array[0], array[2]), all the way to the end of the pattern. After this, I want to store all the 30-value patterns in an array of patterns to compare against other values in the future.

To do so I have to build a function. The pattern length (30 here) can be changed to anything of my choice via the variable numberOfEntries. For each pattern, I also take the mean of the next 10 values and store it in an array of outcomes at the same index.
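For reference, the percentage change I mean is just the standard formula; as a standalone helper it would look like this (a minimal sketch; percentage_change is my own name for it, matching the pseudocode above):

```python
def percentage_change(old, new):
    # percent change of `new` relative to `old`, in percent
    return (new - old) / abs(old) * 100.0
```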

#end point is the end of array
#inputs (array, numberOfEntries)
#outPut(list of Patterns, list of outcomes)

y = 0
condition = numberOfEntries + 1
# each pattern list
pattern = []
# list of patterns
Patterns = []
# outcomes array
outcomes = []

while y < len(array):
    i = 1
    while i < condition:
        # this is the percentage change function, inlined for speed;
        # try is used because of the possibility of division by zero
        try:
            x = ((float(array[y-(numberOfEntries-i)]) - array[y-numberOfEntries]) / abs(array[y-numberOfEntries])) * 100.00
            if x == 0.0:
                x = 0.000000001
        except ZeroDivisionError:
            x = 0.00000001
        i += 1
        pattern.append(x)
    # here are the outcomes
    outcomeRange = array[y+5:y+15]
    outcomes.append(outcomeRange)
    Patterns.append(pattern)
    # reset the pattern list
    pattern = []
    y += 1

Running this on an array of 8559 values, which is small compared to the quantity of data I have, took 229.6792 seconds.

Is there a way to adapt this to multithreading, or another way to improve its speed?

EDIT:

To explain better, I have this OHLC data:

                     open      high       low     close      volume
TimeStamp                                                            
2016-08-20 15:50:00  0.003008  0.003008  0.002995  0.003000    6.351215
2016-08-20 15:55:00  0.003000  0.003008  0.003000  0.003008    6.692174
2016-08-20 16:00:00  0.003008  0.003009  0.002996  0.003001   10.813029
2016-08-20 16:05:00  0.003001  0.003000  0.002991  0.002991    4.368509
2016-08-20 16:10:00  0.002991  0.002993  0.002989  0.002990    6.662944
2016-08-20 16:15:00  0.002990  0.003015  0.002989  0.003015    8.495640

I extract this as

array=df['close'].values

Then I apply the function to this array, and for this particular set of values it returns a list full of lists like this:

[0.26, 0.03, -0.03, -0.04, 0.005]

These are the percent changes of each row relative to the beginning of the sample, and this is what I call a pattern. I can choose how many entries a pattern has.
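As a worked example, the pattern for the sample above comes from comparing each close to the first close (a sketch; the numbers in my example list are rounded differently, so they only roughly match):

```python
# close values from the OHLC sample above
closes = [0.003000, 0.003008, 0.003001, 0.002991, 0.002990, 0.003015]
base = closes[0]
# percent change of every later close relative to the first one
pattern = [round((c - base) / abs(base) * 100.0, 4) for c in closes[1:]]
print(pattern)
```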

Hope I'm more clear now...

  • Multithreading is a dead-end, don't pursue it. Potentially multiprocessing, but the ideal approach would be vectorization of your loops. Commented Jan 10, 2018 at 20:53
  • For that reason, I want to tag this as numpy but it looks like you're just using Python lists (despite saying you have np arrays)? Commented Jan 10, 2018 at 20:55
  • Actually at this point I'm not using numpy, just pandas, which returns a list. @roganjosh how can I use vectorization of loops? Commented Jan 10, 2018 at 20:58
  • Then the question is pretty confused. You could use a rolling window on your series, keeping it in pandas. Your example should be representative of what you're trying to do but, at a guess, you don't want to pull this data out as a Python list. Commented Jan 10, 2018 at 21:03
  • "pandas that return a list", do you mean a Pandas Series? If so, that behaves very similarly to a numpy array Commented Jan 10, 2018 at 21:03
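Following the vectorization suggested in the comments, the whole matrix of percent changes can be built in one shot with NumPy fancy indexing instead of Python loops (a sketch under my reading of the intended indexing; vectorized_patterns is a hypothetical name, and a zero base value would produce inf/nan here rather than the sentinel used in the question):

```python
import numpy as np

def vectorized_patterns(array, numberOfEntries):
    # row y holds the percent changes of the next `numberOfEntries`
    # values relative to array[y]
    a = np.asarray(array, dtype=float)
    n = len(a) - numberOfEntries
    base = a[:n, None]  # shape (n, 1), broadcast against each window
    # index matrix: row y selects a[y+1], ..., a[y+numberOfEntries]
    idx = np.arange(n)[:, None] + np.arange(1, numberOfEntries + 1)[None, :]
    return (a[idx] - base) / np.abs(base) * 100.0  # shape (n, numberOfEntries)
```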

1 Answer


First, I would turn the while loop into a for loop, since i is incremented faster that way:

for i in range(1,condition):

Now, since y doesn't change within your inner loop, you can optimize your computation from:

x = ((float(array[y-(numberOfEntries-i)])-array[y-numberOfEntries])/abs(array[y-numberOfEntries]))*100.00

to:

x = (float(array[y-(numberOfEntries-i)])-array[y-numberOfEntries]) * z

where z is precomputed before the while/for loop as:

    z = 100.00 / abs(array[y-numberOfEntries])

Why?

  • First, z is precomputed, so there is no abs computation or array access inside the loop.
  • Second, z is the inverse of the value you were dividing by, so you can multiply by it instead; multiplication is much faster than division.
  • Third, no division by zero is possible inside the loop, since you're no longer dividing there. The ZeroDivisionError can now only occur when computing z outside the loop, and has to be handled accordingly (wrap the z computation plus the loop in try/except and fill in the sentinel 0.00000001 when it occurs; it should be equivalent).

so your inner loop could be:

try:
    z = 100.00 / abs(array[y-numberOfEntries])
    for i in range(1, condition):
        x = (float(array[y-(numberOfEntries-i)]) - array[y-numberOfEntries]) * z
        if x == 0.0:
            x = 0.000000001
        pattern.append(x)
except ZeroDivisionError:
    pattern.extend([0.00000001] * numberOfEntries)
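Putting the pieces together, the whole function with the hoisted multiplier might look like this (a sketch that keeps the question's indexing, sentinel values, and raw outcome slices; build_patterns is a name I made up):

```python
def build_patterns(array, numberOfEntries):
    Patterns, outcomes = [], []
    for y in range(len(array)):
        pattern = []
        try:
            # hoisted: one division (and abs) per pattern instead of per entry
            z = 100.00 / abs(array[y - numberOfEntries])
            for i in range(1, numberOfEntries + 1):
                x = (float(array[y - (numberOfEntries - i)]) - array[y - numberOfEntries]) * z
                pattern.append(x if x != 0.0 else 0.000000001)
        except ZeroDivisionError:
            # base value was zero: fill the pattern with the sentinel
            pattern = [0.00000001] * numberOfEntries
        Patterns.append(pattern)
        # outcomes kept as raw slices, as in the question's code
        outcomes.append(array[y + 5:y + 15])
    return Patterns, outcomes
```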