
I wrote the piece of code shown below, and I am having severe performance issues with it. In particular, the loop that runs 50 million times (for z in range(total):) seems very slow. Could I modify it to be more efficient? Maybe by changing how the sums of the last 10 values are stored in r1 and r2?

import numpy as np
import math
import scipy.stats as sp

# Define sample size 
sample=4999999
cutoff=int((sample+1)/100)
# Define days for x-day VaR
xdays=10

# Calculate the whole sample size and extended total sample size
size=sample*xdays+xdays-1
total=size+xdays
cutoff_o=int((size+1)/100)


# Sample values for kurtosis
#kurt=[0.0000001,1.0,2.0,3.0,4.0,5.0,6.0,10.0]

kurt=[6.0]

# Number of repetitions
rep=2

# Define correlation coefficient
rho=0.5

# Loop for different iterations
for x in range(rep):
    uni=sp.uniform.rvs(size=total)

# Loop for different values of kurtosis
    for y in kurt:
        df=(6.0/y)+4.0

        # Initialize arrays
        t_corr=np.empty(total)
        n_corr=np.empty(total)
        t_corr_2=np.empty(total)

        r1=np.empty(sample)
        r2=np.empty(size)

        r3=np.empty(sample)
        r4=np.empty(size)

        # Define t dist from uniform
        t_dist=sp.t.ppf(uni,df)
        n_dist=sp.norm.ppf(uni)

        # Loop to generate autocorrelated distributions
        for z in range(total):
            if z==0:
                t_corr[z]=t_dist[z]
                n_corr[z]=n_dist[z]
                t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)
            else:
                t_corr[z]=rho*t_dist[z-1] + math.sqrt((1-rho**2))*t_dist[z]
                n_corr[z]=rho*n_dist[z-1] + math.sqrt((1-rho**2))*n_dist[z]
                t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)
            if z>xdays-1:
                z_x=int(z/xdays)-1
                if (z%xdays)==0 and z_x<sample:
                    r1[z_x]= sum(t_corr[z-10:z])
                    r3[z_x]= sum(t_corr_2[z-10:z])

                r2[z-xdays]= sum(t_corr[z-10:z])
                r4[z-xdays]= sum(t_corr_2[z-10:z])

        print (np.partition(r1, cutoff-1)[cutoff-1], np.partition(r3, cutoff-1)[cutoff-1], np.partition(r2, cutoff_o-1)[cutoff_o-1], np.partition(r4, cutoff_o-1)[cutoff_o-1])
    print ()
  • Learn to use a profiler. I could guess a few improvements here, but I wouldn't want to guarantee that they even cause improvements in speed. E.g. you don't seem to use more than ten elements of t_corr, n_corr and t_corr_2 and a single element of t_dist and n_dist, so why create large arrays for them? Commented Mar 7, 2016 at 7:50
  • If you are seeking performance improvements in your code, you should ask at CodeReview Commented Mar 7, 2016 at 20:47

1 Answer

Some suggestions:

Unneccessary ifs

First, you could remove the if statements from your loop. Checking z == 0 millions of times is unnecessary when you, the programmer, know that z is zero only on the first iteration. The same goes for if z>xdays-1:, which is true for every z from xdays onward, so the loop can be split accordingly:

# z == 0 case, handled once before the loops
t_corr[0]=t_dist[0]
n_corr[0]=n_dist[0]
t_corr_2[0]=sp.t.ppf(sp.norm.cdf(n_corr[0]),df)

# 1 <= z < xdays: build the series, no rolling sums yet
for z in range(1, xdays):
    t_corr[z]=rho*t_dist[z-1] + math.sqrt(1-rho**2)*t_dist[z]
    n_corr[z]=rho*n_dist[z-1] + math.sqrt(1-rho**2)*n_dist[z]
    t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)

# z >= xdays: also record the rolling sums
for z in range(xdays, total):
    t_corr[z]=rho*t_dist[z-1] + math.sqrt(1-rho**2)*t_dist[z]
    n_corr[z]=rho*n_dist[z-1] + math.sqrt(1-rho**2)*n_dist[z]
    t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)
    z_x=int(z/xdays)-1
    if (z%xdays)==0 and z_x<sample:
        r1[z_x]= sum(t_corr[z-10:z])
        r3[z_x]= sum(t_corr_2[z-10:z])
    r2[z-xdays]= sum(t_corr[z-10:z])
    r4[z-xdays]= sum(t_corr_2[z-10:z])

Please double check this; I just threw it out :)
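Going further (a sketch of mine, please verify against your definitions): the recurrence reads only from t_dist and n_dist, never from t_corr itself, so the whole series can be built with NumPy vector operations, and the rolling xdays-element sums can come from a single cumulative sum instead of millions of sum() calls. Small sizes here for illustration:

```python
import numpy as np
import scipy.stats as sp

# Small sizes for illustration; scale up to your real values
xdays = 10
sample = 50
size = sample * xdays + xdays - 1
total = size + xdays
rho = 0.5
df = (6.0 / 6.0) + 4.0

uni = sp.uniform.rvs(size=total)
t_dist = sp.t.ppf(uni, df)
n_dist = sp.norm.ppf(uni)

# Vectorized recurrence: each element depends only on t_dist/n_dist
w = np.sqrt(1 - rho ** 2)
t_corr = np.empty(total)
t_corr[0] = t_dist[0]
t_corr[1:] = rho * t_dist[:-1] + w * t_dist[1:]

n_corr = np.empty(total)
n_corr[0] = n_dist[0]
n_corr[1:] = rho * n_dist[:-1] + w * n_dist[1:]
t_corr_2 = sp.t.ppf(sp.norm.cdf(n_corr), df)

# Rolling sums via cumulative sum: roll[k] == t_corr[k:k+xdays].sum()
c = np.concatenate(([0.0], np.cumsum(t_corr)))
roll = c[xdays:] - c[:-xdays]

# r2[z-xdays] == sum(t_corr[z-xdays:z]) in the original loop
r2 = roll[:size]
```

r1, r3 and r4 should come out of the same rolling-sum arrays as plain or strided slices; the cumsum trick turns every rolling sum into one subtraction.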

Compile your code!

A cheap/hacky fix that could actually provide some serious benefit! You could try compiling your Python code into a binary, using Cython for example. I actually tested this with a contrived but not dissimilar example to yours that I hope will give you enough information to start with. Suppose I have the following Python script:

import math

for j in range(1000):
    for i in range(1000):
        a = math.sqrt(i) * math.sqrt(j)

Running it with python3 fast.py takes consistently .4s of real time on my Ubuntu VM. Running the following:

$ cython3 --embed -o fast.c fast.py
$ gcc -I /usr/include/python3.4m/ -o fast fast.c -lpython3.4m

produces a .c file from my python code and automatically compiles the binary fast for me from it. Running the executable now gives me an average real time of .14 seconds - a huge improvement!
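If you want to reproduce that kind of comparison yourself, a minimal timing harness for the pure-Python baseline could look like the following (absolute numbers will of course vary by machine):

```python
import math
import timeit

def work():
    # Same contrived workload as the fast.py example above
    for j in range(1000):
        for i in range(1000):
            a = math.sqrt(i) * math.sqrt(j)

# Time a single run of the workload
t = timeit.timeit(work, number=1)
print(f"{t:.3f} s")
```

Run the same harness against the Cython-compiled binary (with `time ./fast`) to see the difference on your machine.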

Less list slicing (EDIT - not going to help, this is NumPy slicing not list slicing!)

Another problem could be down to your list slicing. Remember that slice notation involves creating a new list each time, meaning you're creating ~200,000,000 new lists with your four slices. Now I'm not certain this will be faster, but you could achieve the same behavior without copying, e.g.:

sum(t_corr[z-10:z])

could be replaced with

sum(t_corr[i] for i in range(z - 10, z))

Again, fix this to be what you actually want; this is just a concept piece.
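(A side note of my own, not from the original answer: since these are NumPy arrays, the cheapest change to those four lines may be to use NumPy's own reduction rather than the builtin sum, which loops over the view element by element in Python.)

```python
import numpy as np

a = np.arange(100, dtype=float)
z = 50

# builtin sum() iterates the NumPy view in Python, boxing each element;
# ndarray.sum() performs the same reduction in C
slow = sum(a[z - 10:z])     # Python-level loop
fast = a[z - 10:z].sum()    # single C-level reduction
```

Both give the same result; ndarray.sum is typically much faster on the same data, especially when called millions of times.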

Let me know if that helps at all!


6 Comments

You could also try to exploit the delta between e.g. r1[x] and r1[x + 1], which is t_corr[x+1] - t_corr[x-9], as far as I can tell.
Thanks a lot for the comments. I'll try to incorporate those and see if I get any performance gains.
@user3765371 As Ulrich said, a profiler could be hugely helpful. I have little experience with numpy/scipy but if the profiler shows you're spending the majority of your time in those modules then making these improvements are going to be little help :)
@user3765371 I got a bit carried away and added another suggestion: using Cython to compile your script into a native binary. If you manage to get it compiling then I feel this could be a good improvement!
Those slices are NumPy array slices, not list slices. NumPy slicing makes a view, not a copy. (I'm kind of surprised you know Cython, but not NumPy.)
