How to avoid for-loop while using append()

Question

First of all, I apologize for being an absolute beginner in both python and numpy. Please forgive my ignorance.

I have a 4D cube of pressure measurements where the dimensions are (number of samples, time, y-axis, x-axis), which means, for each sample, I have a 3D cube of spatio-temporal profile. I need to collect the pressure readings of this 3D cube (time, y-axis, x-axis) and store it into an array for each sample only where the coordinates satisfy a specific condition. Upon varying the specific condition, the size of this array will vary too. So, I have to use append() to build this array. However, since say for 1000 samples, I have to search through more than a millions coordinates using For-Loop for each sample, the code I have written is pretty inefficient and takes a lot of time to run (more than several hours). Can you please help me to write it more efficiently?

Below is the code I've tried to solve the problem. It works nicely and gives expected result but it is extremely slow.

import numpy as np

# Number of sample points in x,y and t-axis
Nx = 101
Ny = 101
Nt = 100
n_train = 1000
target_array = []

for i_train in range (n_train):
    for k in range (Nt):
        for j in range (Ny):
            for i in range (Nx):
                if np.round(np.sqrt((i-np.round(Nx/2))**2+(j-np.round(Ny/2))**2)) == 2*k:
                    target_array.append(Pressure[i_train,k,j,i])

How does it work nicely? You overwrite target_array in the outer loop, so at the end it will only contain the values corresponding to i_train = 999. — CristiFati
– CristiFati, Commented Sep 29, 2019 at 9:39
Extremely sorry for the mistake, I made an error while copying the code from my file. I have corrected it. Thanks a lot for pointing it out. — Bahauddin Omar
– Bahauddin Omar, Commented Sep 29, 2019 at 9:43
First you could try to extract all possible (e.g. np.round(Nx/2)) operations outside of the (inner) loops. This will get you some time improvements but at a max factor of 10. Then you could use vectorized operations (np specific) to reduce the time to seconds (probably). — CristiFati
– CristiFati, Commented Sep 29, 2019 at 9:55
Thank you so much for the suggestion. I apologize for my ignorance, but can you please explain this sentence a little bit detail: "Then you could use vectorized operations (np specific) to reduce the time to seconds (probably)"? — Bahauddin Omar
– Bahauddin Omar, Commented Sep 29, 2019 at 10:00
What @CristiFati means is that you should try and use numpy arrays' built-in capability for performing vectorized operations, instead of relying on Python's for-loops, which are extremely poor performance wise. I'll try and sketch out a solution for you. — Jethro Cao
– Jethro Cao, Commented Sep 29, 2019 at 10:10

Valentino · Accepted Answer · 2019-09-29 13:52:53Z

Since the condition involves the indexes and not the values of your 4D array, you can vectorize it using numpy.meshgrid.

Here pp is your 4D array:

iv, jv, kv = np.meshgrid(np.arange(pp.shape[3]), np.arange(pp.shape[2]), np.arange(pp.shape[1]))
selecting = np.round(np.sqrt((iv - np.round(pp.shape[3]/2))**2 + (jv - np.round(pp.shape[2]/2))**2)) == 2*kv
target = pp[:,selecting]

Provided that I've understood correctly how your 4D array is organized:

the arrays created by meshgrid hold the indexes to select pp elements on the 3 dimensions x, y, t.
selecting is a boolean array created by replicating your equation, to check which coordinates satisfies the condition.
target is a selection of pp, taking all element on 0 axis which satisfies the condition (i.e. selecting is True) on the other 3 axes.

Note that target is a 2D array, to have a 1D array, use target.flatten().

Collectives™ on Stack Overflow

How to avoid for-loop while using append()

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related