
I have a large array K (29000 x 29000):

import numpy
K = numpy.random.random((29000, 29000))

I want to apply the following operation on K:

output = K * (1.5 - 0.5 * K * K)

To avoid a MemoryError, I am doing my computations as suggested in the answer from this thread.

However, when I try to do the assignment operation on the large array as follows, I still get the MemoryError:

K *= 1.5 - 0.5 * K * K

Any help welcome.

NOTE: this is not a duplicate post. That post has a suggestion to use Cython, but I am looking for alternative solutions that do not rely on Cython.

  • Use dask.array. Check it here: dask.pydata.org/en/latest/array.html Commented Jan 5, 2018 at 15:35
  • Possible duplicate of How can I apply the assignment operator correctly in Python? Commented Jan 5, 2018 at 15:40
  • @TillHoffman not a duplicate. This post was made separately to raise the MemoryError issue to a wider audience. Your Cython suggestion is appreciated, but the aim of this new post is to make my current MemoryError issue clear, as I am looking for alternative solutions that do not rely on Cython. Commented Jan 5, 2018 at 15:49
  • The expression 1.5 - 0.5 * K * K still requires the creation of temporary arrays to hold intermediate results. Your array requires almost 7 gigabytes. How much RAM does your computer have? Commented Jan 5, 2018 at 16:00
  • Perhaps temp = K*K; temp *= -0.5; temp += 1.5; K *= temp; del temp. This should avoid ever having to have 3 arrays in memory. Commented Jan 5, 2018 at 16:13
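The in-place approach from the comment above can be sketched as follows (using a smaller array so it runs quickly; the `reference` array is only there to verify the result and would be omitted at the full 29000 x 29000 size):

```python
import numpy as np

K = np.random.random((1000, 1000))
reference = K * (1.5 - 0.5 * K * K)  # direct computation, kept only for checking

# Build the factor in place so at most two full-size arrays exist at once.
temp = K * K   # the one full-size temporary
temp *= -0.5   # in place: temp is now -0.5 * K * K
temp += 1.5    # in place: temp is now 1.5 - 0.5 * K * K
K *= temp      # in place update of K
del temp       # release the temporary immediately
```

Each `*=`/`+=` reuses `temp`'s buffer, so the peak footprint is two full-size arrays instead of the three or more created by evaluating `K * (1.5 - 0.5 * K * K)` in one expression.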

1 Answer


You can do assignment in blocks, say, of 1000 rows. The additional array this creates will be 1/29 of the size of your array, and having a for loop running 29 times shouldn't be much of a speed problem. Typical memory/speed tradeoff.

import numpy as np

block = 1000          # the number of rows per block
K = np.random.random((29000, 29000))
for i in range(int(np.ceil(K.shape[0] / block))):
    K[i*block:(i+1)*block, :] *= 1.5 - 0.5 * K[i*block:(i+1)*block, :]**2

Since there was some concern about the performance on smaller matrices, here is a test for those:

block = 1000
K = np.arange(9).astype(float).reshape((3, 3))
print(1.5 * K - 0.5 * K**3)
for i in range(int(np.ceil(K.shape[0] / block))):
    K[i*block:(i+1)*block, :] *= 1.5 - 0.5 * K[i*block:(i+1)*block, :]**2
print(K)

This prints

[[   0.    1.   -1.]
 [  -9.  -26.  -55.]
 [ -99. -161. -244.]]

twice.


11 Comments

is this superior to the answer by @StevenRumbalski in the comments above?
Test both and you'll find out. I don't have your memory/CPU configuration.
is the suggested code robust to all array sizes (e.g., 30213 x 30213)?
I'd say this is a better solution than @StevenRumbalski's, because in his approach he's iterating through all cells many times, whereas here only as many as needed are iterated (basic algebra is almost free, and even on a CPU you can do many computations in parallel). Also, consider using np.power(K, 2) instead of K**2.
@unknown121 This is a different question. Try casting the array to single precision (np.float32) if that is enough precision for your purpose. There is also memmap, which could be used to hold the array on disk instead of in memory, freeing the memory for the temporary arrays created during the computation.
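A minimal sketch of the memmap idea from the comment above, combined with the block-wise update from the answer (sizes shrunk so it runs quickly; the file path is illustrative):

```python
import numpy as np
import os
import tempfile

n = 1000  # stand-in for 29000 so the sketch runs quickly
path = os.path.join(tempfile.mkdtemp(), "K.dat")

# Hold K on disk in single precision; RAM is only needed for block-sized temporaries.
K = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
K[:] = np.random.random((n, n)).astype(np.float32)

block = 300
for i in range(0, n, block):
    chunk = K[i:i+block, :]            # a view into the memmap
    chunk *= 1.5 - 0.5 * chunk * chunk  # in-place write-through to disk
K.flush()  # make sure all changes are written back to the file
```

Slicing a memmap returns a view, so the in-place `*=` writes the results straight back to the file; only the block-sized intermediate arrays ever live in RAM.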
