1

First of all, I apologize for being an absolute beginner in both python and numpy. Please forgive my ignorance.

I have a 4D cube of pressure measurements where the dimensions are (number of samples, time, y-axis, x-axis), which means, for each sample, I have a 3D cube of spatio-temporal profile. I need to collect the pressure readings of this 3D cube (time, y-axis, x-axis) and store it into an array for each sample only where the coordinates satisfy a specific condition. Upon varying the specific condition, the size of this array will vary too. So, I have to use append() to build this array. However, since say for 1000 samples, I have to search through more than a millions coordinates using For-Loop for each sample, the code I have written is pretty inefficient and takes a lot of time to run (more than several hours). Can you please help me to write it more efficiently?

Below is the code I've tried to solve the problem. It works nicely and gives expected result but it is extremely slow.

import numpy as np

# Number of sample points in x,y and t-axis
Nx = 101
Ny = 101
Nt = 100
n_train = 1000
target_array = []

for i_train in range (n_train):
    for k in range (Nt):
        for j in range (Ny):
            for i in range (Nx):
                if np.round(np.sqrt((i-np.round(Nx/2))**2+(j-np.round(Ny/2))**2)) == 2*k:
                    target_array.append(Pressure[i_train,k,j,i])
8
  • How does it work nicely? You overwrite target_array in the outer loop, so at the end it will only contain the values corresponding to i_train = 999. Commented Sep 29, 2019 at 9:39
  • Extremely sorry for the mistake, I made an error while copying the code from my file. I have corrected it. Thanks a lot for pointing it out. Commented Sep 29, 2019 at 9:43
  • First you could try to extract all possible (e.g. np.round(Nx/2)) operations outside of the (inner) loops. This will get you some time improvements but at a max factor of 10. Then you could use vectorized operations (np specific) to reduce the time to seconds (probably). Commented Sep 29, 2019 at 9:55
  • Thank you so much for the suggestion. I apologize for my ignorance, but can you please explain this sentence a little bit detail: "Then you could use vectorized operations (np specific) to reduce the time to seconds (probably)"? Commented Sep 29, 2019 at 10:00
  • What @CristiFati means is that you should try and use numpy arrays' built-in capability for performing vectorized operations, instead of relying on Python's for-loops, which are extremely poor performance wise. I'll try and sketch out a solution for you. Commented Sep 29, 2019 at 10:10

1 Answer 1

1

Since the condition involves the indexes and not the values of your 4D array, you can vectorize it using numpy.meshgrid.

Here pp is your 4D array:

iv, jv, kv = np.meshgrid(np.arange(pp.shape[3]), np.arange(pp.shape[2]), np.arange(pp.shape[1]))
selecting = np.round(np.sqrt((iv - np.round(pp.shape[3]/2))**2 + (jv - np.round(pp.shape[2]/2))**2)) == 2*kv
target = pp[:,selecting]

Provided that I've understood correctly how your 4D array is organized:

  • the arrays created by meshgrid hold the indexes to select pp elements on the 3 dimensions x, y, t.
  • selecting is a boolean array created by replicating your equation, to check which coordinates satisfies the condition.
  • target is a selection of pp, taking all element on 0 axis which satisfies the condition (i.e. selecting is True) on the other 3 axes.

Note that target is a 2D array, to have a 1D array, use target.flatten().

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.