I have a matrix, Xmatrix, with 12584 rows and 784 columns. I want to extract each row based on the label in another matrix, Tmatrix (12584 rows, 1 column), and append its values to the NumPy array X1 or X2. Even with a smaller row count of 1500, it takes over 10 minutes. I am sure there is a better, more efficient way to extract an entire row and append it to an array.

import numpy as np
import time
start_time = time.time()

Row = 12584
#Row = 1500
Col = 784
Xmatrix = np.random.rand(Row,Col)

Tmatrix = np.random.randint(1,3,(Row,1))
X1 = np.array([])
X2 = np.array([])

for i in range(Row):
    if Tmatrix[i] == 1:
        for y in range(Col):
            print ('Current row and col are --', i, y, Xmatrix[i][y])
            X1 = np.append(X1, Xmatrix[i][y])
    else:
        for y in range(Col):
            X2 = np.append(X2, Xmatrix[i][y])

print (X1)
print("--- %s seconds ---" % (time.time() - start_time))
  • alist.append(Xmatrix[i,y]) should be faster. But either way, iterating over rows and columns is slow. Even if you iterate over Row and do the test, you don't need to iterate over Col: alist.extend(Xmatrix[i]) puts the whole row in the list at once. Commented Sep 7, 2019 at 20:00
  • @hpaulj - your suggestion of extend with a list is working out. If you could post it as an answer, I can go ahead and select it. Commented Sep 7, 2019 at 23:14

3 Answers

Try this:

import numpy as np
import time
start_time = time.time()

Row = 12584
#Row = 1500
Col = 784
Xmatrix = np.random.rand(Row,Col)

Tmatrix = np.random.randint(1,3,(Row,1))

X1 = Xmatrix[(Tmatrix==1).reshape(-1)]
X2 = Xmatrix[(Tmatrix==2).reshape(-1)]

print(X1.reshape(-1))

print(time.time() - start_time)

On my computer the program runs in 0.34 seconds. When using NumPy, it is best to avoid Python loops and use indexing and slicing instead: http://codeinpython.com/tutorials/numpy-array-indexing-slicing/

2 Comments

Can you please explain what "X1 = Xmatrix[(Tmatrix==1).reshape(-1)]" does? It's too Pythonic for me, I guess.
To explain "X1 = Xmatrix[(Tmatrix==1).reshape(-1)]": "reshape(-1)" flattens the array into a 1-D array, and "Xmatrix[Bool_Array]" returns the rows where Bool_Array is True. See stackoverflow.com/questions/7994394/… and docs.scipy.org/doc/numpy/user/…
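The explanation above can be checked on a small case. This is a minimal sketch using tiny stand-in arrays (the values and the name mask are chosen for illustration and are not from the answer); it shows the flattened boolean mask and the rows it selects.

```python
import numpy as np

# Small stand-ins for Xmatrix and Tmatrix (illustrative values only).
Xmatrix = np.arange(12).reshape(4, 3)      # 4 rows, 3 columns
Tmatrix = np.array([[1], [2], [1], [2]])   # one label per row, shape (4, 1)

# (Tmatrix == 1) has shape (4, 1); reshape(-1) flattens it to shape (4,),
# giving one boolean per row of Xmatrix.
mask = (Tmatrix == 1).reshape(-1)

X1 = Xmatrix[mask]    # keeps rows 0 and 2
X2 = Xmatrix[~mask]   # keeps the other rows (labels here are only 1 or 2)

print(mask)             # [ True False  True False]
print(X1)               # rows [0 1 2] and [6 7 8]
print(X1.reshape(-1))   # [0 1 2 6 7 8]
```

Flattening with X1.reshape(-1) gives the same flat layout that the loop in the question builds with repeated np.append calls.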

You can drop the inner loop over columns (for y in range(Col):). In NumPy you can retrieve a whole row with:

Xmatrix[i, :]

and then append it with

X1=np.append(X1, [Xmatrix[i, :]], axis=0)

or alternatively:

X1=np.vstack([X1, Xmatrix[i, :]])

EDIT

To make appending work, you first need to create X1 and X2 with the proper shape. In this case:

X1=np.empty(shape=(0, Col))
X2=np.empty(shape=(0, Col))
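Putting the pieces of this answer together, a minimal sketch might look like the following. The small Row and Col values and the fixed labels are assumptions made so the example runs quickly and deterministically; they are not from the question.

```python
import numpy as np

Row, Col = 6, 4   # small illustrative sizes
Xmatrix = np.arange(Row * Col, dtype=float).reshape(Row, Col)
Tmatrix = np.array([[1], [2], [2], [1], [1], [2]])   # fixed labels for the demo

# Zero rows, Col columns: the column count must match the rows appended later.
X1 = np.empty(shape=(0, Col))
X2 = np.empty(shape=(0, Col))

for i in range(Row):
    row = Xmatrix[i, :]                      # whole row at once, no column loop
    if Tmatrix[i, 0] == 1:
        X1 = np.vstack([X1, row])            # vstack treats the 1-D row as shape (1, Col)
    else:
        X2 = np.append(X2, [row], axis=0)    # [row] makes it 2-D for axis=0

print(X1.shape, X2.shape)   # (3, 4) (3, 4)
```

Both np.vstack and np.append allocate a new array on every call, so this is likely to remain slower than a single boolean-mask selection when there are many rows.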

4 Comments

Getting an error with append: "ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)". With vstack I get: "ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 784".
Small tweak: you need to create X1 and X2 with a predefined shape. See the EDIT in my answer.
I still get the error below: "ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 12584 and the array at index 1 has size 784".
Pardon! Now it should work. I used the number of rows to create the empty array, instead of the number of columns...

With lists, this should be fairly efficient:

X1 = []
X2 = []
for i in range(Row):
    if Tmatrix[i] == 1:
        X1.extend(Xmatrix[i])
    else:
        X2.extend(Xmatrix[i])

You can convert the result with np.array(X1) afterwards if needed.
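As a rough, self-contained sketch of the list approach including the conversion step: the small sizes, the fixed labels, and the name X1_rows are illustrative assumptions, not part of the answer.

```python
import numpy as np

Row, Col = 6, 4   # small illustrative sizes
Xmatrix = np.arange(Row * Col, dtype=float).reshape(Row, Col)
Tmatrix = np.array([[1], [2], [2], [1], [1], [2]])   # fixed labels for the demo

X1 = []
X2 = []
for i in range(Row):
    if Tmatrix[i, 0] == 1:
        X1.extend(Xmatrix[i])   # adds all Col values of the row to the list
    else:
        X2.extend(Xmatrix[i])

X1 = np.array(X1)               # flat 1-D array, same layout as the question's X1
X1_rows = X1.reshape(-1, Col)   # optional: restore one row per selected row

print(X1.shape, X1_rows.shape)  # (12,) (3, 4)
```

Appending to a Python list does not copy everything collected so far on each step, which is the main reason it scales better than calling np.append inside the loop.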

1 Comment

@hpaulji - your solution was the most intuitive to me, but I see the selected answer as the best way to do it. Thanks for the help :)
