
I wrote the piece of code shown below, and I am having severe performance issues with it. In particular, the loop that runs 50 million times (for z in range(total):) seems very slow. Could I modify it to be more efficient? Maybe by changing how the sums of the last 10 values are stored in r1 and r2?

import numpy as np
import math
import scipy.stats as sp

# Define sample size 
sample=4999999
cutoff=int((sample+1)/100)
# Define days for x-day VaR
xdays=10

# Calculate the whole sample size and extended total sample size
size=sample*xdays+xdays-1
total=size+xdays
cutoff_o=int((size+1)/100)


# Sample values for kurtosis
#kurt=[0.0000001,1.0,2.0,3.0,4.0,5.0,6.0,10.0]

kurt=[6.0]

# Number of repetitions
rep=2

# Define correlation coefficient
rho=0.5

# Loop for different iterations
for x in range(rep):
    uni=sp.uniform.rvs(size=total)

# Loop for different values of kurtosis
    for y in kurt:
        df=(6.0/y)+4.0

        # Initialize arrays
        t_corr=np.empty(total)
        n_corr=np.empty(total)
        t_corr_2=np.empty(total)

        r1=np.empty(sample)
        r2=np.empty(size)

        r3=np.empty(sample)
        r4=np.empty(size)

        # Define t dist from uniform
        t_dist=sp.t.ppf(uni,df)
        n_dist=sp.norm.ppf(uni)

        # Loop to generate autocorrelated distributions
        for z in range(total):
            if z==0:
                t_corr[z]=t_dist[z]
                n_corr[z]=n_dist[z]
                t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)
            else:
                t_corr[z]=rho*t_dist[z-1] + math.sqrt((1-rho**2))*t_dist[z]
                n_corr[z]=rho*n_dist[z-1] + math.sqrt((1-rho**2))*n_dist[z]
                t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)
            if z>xdays-1:
                z_x=int(z/xdays)-1
                if (z%xdays)==0 and z_x<sample:
                    r1[z_x]= sum(t_corr[z-10:z])
                    r3[z_x]= sum(t_corr_2[z-10:z])

                r2[z-xdays]= sum(t_corr[z-10:z])
                r4[z-xdays]= sum(t_corr_2[z-10:z])

        print (np.partition(r1, cutoff-1)[cutoff-1], np.partition(r3, cutoff-1)[cutoff-1], np.partition(r2, cutoff_o-1)[cutoff_o-1], np.partition(r4, cutoff_o-1)[cutoff_o-1])
    print ()
  • Learn to use a profiler. I could guess a few improvements here, but I wouldn't want to guarantee that they even cause improvements in speed. E.g. you don't seem to use more than ten elements of t_corr, n_corr and t_corr_2 and a single element of t_dist and n_dist, so why create large arrays for them? Commented Mar 7, 2016 at 7:50
  • If you are seeking performance improvements in your code, you should ask at CodeReview Commented Mar 7, 2016 at 20:47

1 Answer

Some suggestions:

Unneccessary ifs

First, you could remove the if statements from your loop. Checking z == 0 millions of times is unnecessary when you, the programmer, know that z is zero only on the first iteration. The same goes for if z>xdays-1:, which is true for every z from xdays onward, so the loop can be split accordingly:

# z == 0 case, handled once before the loops
t_corr[0]=t_dist[0]
n_corr[0]=n_dist[0]
t_corr_2[0]=sp.t.ppf(sp.norm.cdf(n_corr[0]),df)

# 1 <= z < xdays: build the series, no rolling sums yet
for z in range(1, xdays):
    t_corr[z]=rho*t_dist[z-1] + math.sqrt(1-rho**2)*t_dist[z]
    n_corr[z]=rho*n_dist[z-1] + math.sqrt(1-rho**2)*n_dist[z]
    t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)

# z >= xdays: also record the rolling sums
for z in range(xdays, total):
    t_corr[z]=rho*t_dist[z-1] + math.sqrt(1-rho**2)*t_dist[z]
    n_corr[z]=rho*n_dist[z-1] + math.sqrt(1-rho**2)*n_dist[z]
    t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)
    z_x=int(z/xdays)-1
    if (z%xdays)==0 and z_x<sample:
        r1[z_x]= sum(t_corr[z-10:z])
        r3[z_x]= sum(t_corr_2[z-10:z])
    r2[z-xdays]= sum(t_corr[z-10:z])
    r4[z-xdays]= sum(t_corr_2[z-10:z])

Please double check this; I just threw it out :)
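Going further (a sketch of mine, please verify against your definitions): the recurrence reads only from t_dist and n_dist, never from t_corr itself, so the whole series can be built with NumPy vector operations, and the rolling xdays-element sums can come from a single cumulative sum instead of millions of sum() calls. Small sizes here for illustration:

```python
import numpy as np
import scipy.stats as sp

# Small sizes for illustration; scale up to your real values
xdays = 10
sample = 50
size = sample * xdays + xdays - 1
total = size + xdays
rho = 0.5
df = (6.0 / 6.0) + 4.0

uni = sp.uniform.rvs(size=total)
t_dist = sp.t.ppf(uni, df)
n_dist = sp.norm.ppf(uni)

# Vectorized recurrence: each element depends only on t_dist/n_dist
w = np.sqrt(1 - rho ** 2)
t_corr = np.empty(total)
t_corr[0] = t_dist[0]
t_corr[1:] = rho * t_dist[:-1] + w * t_dist[1:]

n_corr = np.empty(total)
n_corr[0] = n_dist[0]
n_corr[1:] = rho * n_dist[:-1] + w * n_dist[1:]
t_corr_2 = sp.t.ppf(sp.norm.cdf(n_corr), df)

# Rolling sums via cumulative sum: roll[k] == t_corr[k:k+xdays].sum()
c = np.concatenate(([0.0], np.cumsum(t_corr)))
roll = c[xdays:] - c[:-xdays]

# r2[z-xdays] == sum(t_corr[z-xdays:z]) in the original loop
r2 = roll[:size]
```

r1, r3 and r4 should come out of the same rolling-sum arrays as plain or strided slices; the cumsum trick turns every rolling sum into one subtraction.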

Compile your code!

A cheap/hacky fix that could actually provide some serious benefit! You could try compiling your Python code into a binary, using Cython for example. I actually tested this with a contrived but not dissimilar example to yours that I hope will give you enough information to start with. Suppose I have the following Python script:

import math

for j in range(1000):
    for i in range(1000):
        a = math.sqrt(i) * math.sqrt(j)

Running it with python3 fast.py takes consistently .4s of real time on my Ubuntu VM. Running the following:

$ cython3 --embed -o fast.c fast.py
$ gcc -I /usr/include/python3.4m/ -o fast fast.c -lpython3.4m

produces a .c file from my python code and automatically compiles the binary fast for me from it. Running the executable now gives me an average real time of .14 seconds - a huge improvement!
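If you want to reproduce that kind of comparison yourself, a minimal timing harness for the pure-Python baseline could look like the following (absolute numbers will of course vary by machine):

```python
import math
import timeit

def work():
    # Same contrived workload as the fast.py example above
    for j in range(1000):
        for i in range(1000):
            a = math.sqrt(i) * math.sqrt(j)

# Time a single run of the workload
t = timeit.timeit(work, number=1)
print(f"{t:.3f} s")
```

Run the same harness against the Cython-compiled binary (with `time ./fast`) to see the difference on your machine.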

Less list slicing (EDIT - not going to help, this is NumPy slicing not list slicing!)

Another problem could be down to your list slicing. Remember that slice notation involves creating a new list each time, meaning you're creating ~200,000,000 new lists with your four slices. Now I'm not certain this will be faster, but you could achieve the same behavior without copying, e.g.:

sum(t_corr[z-10:z])

could be replaced with

sum(t_corr[i] for i in range(z - 10, z))

Again, fix this to be what you actually want; this is just a concept piece.
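(A side note of my own, not from the original answer: since these are NumPy arrays, the cheapest change to those four lines may be to use NumPy's own reduction rather than the builtin sum, which loops over the view element by element in Python.)

```python
import numpy as np

a = np.arange(100, dtype=float)
z = 50

# builtin sum() iterates the NumPy view in Python, boxing each element;
# ndarray.sum() performs the same reduction in C
slow = sum(a[z - 10:z])     # Python-level loop
fast = a[z - 10:z].sum()    # single C-level reduction
```

Both give the same result; ndarray.sum is typically much faster on the same data, especially when called millions of times.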

Let me know if that helps at all!


6 Comments

You could also try to exploit the delta between e.g. r1[x] and r1[x + 1], which is t_corr[x+1] - t_corr[x-9], as far as I can tell.
Thanks a lot for the comments. I'll try to incorporate those and see if I get any performance gains.
@user3765371 As Ulrich said, a profiler could be hugely helpful. I have little experience with numpy/scipy but if the profiler shows you're spending the majority of your time in those modules then making these improvements are going to be little help :)
@user3765371 I got a bit carried away and added another suggestion: using Cython to compile your script into a native binary. If you manage to get it compiling then I feel this could be a good improvement!
Those slices are NumPy array slices, not list slices. NumPy slicing makes a view, not a copy. (I'm kind of surprised you know Cython, but not NumPy.)
