
I have a large array K (29000 x 29000):

import numpy
K = numpy.random.random((29000, 29000))

I want to apply the following operation on K:

output = K * (1.5 - 0.5 * K * K)

To avoid a MemoryError, I am doing my computations as suggested in the answer from this thread.

However, when I try to do the assignment operation on the large array as follows, I still get the MemoryError:

K *= 1.5 - 0.5 * K * K

Any help welcome.

NOTE: this is not a duplicate post. That post has a suggestion to use Cython, but I am looking for alternative solutions that do not rely on Cython.

  • Use dask.array. Check it here: dask.pydata.org/en/latest/array.html Commented Jan 5, 2018 at 15:35
  • Possible duplicate of How can I apply the assignment operator correctly in Python? Commented Jan 5, 2018 at 15:40
  • @TillHoffman not a duplicate. This post was made separately to raise the MemoryError issue to a wider audience. Your Cython suggestion is appreciated, but the aim of this new post is to make my current MemoryError issue clear, as I am looking for alternative solutions that do not rely on Cython. Commented Jan 5, 2018 at 15:49
  • The expression 1.5 - 0.5 * K * K still requires the creation of temporary arrays to hold intermediate results. Your array requires almost 7 gigabytes. How much RAM does your computer have? Commented Jan 5, 2018 at 16:00
  • Perhaps temp = K*K; temp *= -0.5; temp += 1.5; K *= temp; del temp. This should avoid ever having to have 3 arrays in memory. Commented Jan 5, 2018 at 16:13
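The in-place approach from the comment above can be sketched as follows (using a smaller array so it runs quickly; the `reference` array is only there to verify the result and would be omitted at the full 29000 x 29000 size):

```python
import numpy as np

K = np.random.random((1000, 1000))
reference = K * (1.5 - 0.5 * K * K)  # direct computation, kept only for checking

# Build the factor in place so at most two full-size arrays exist at once.
temp = K * K   # the one full-size temporary
temp *= -0.5   # in place: temp is now -0.5 * K * K
temp += 1.5    # in place: temp is now 1.5 - 0.5 * K * K
K *= temp      # in place update of K
del temp       # release the temporary immediately
```

Each `*=`/`+=` reuses `temp`'s buffer, so the peak footprint is two full-size arrays instead of the three or more created by evaluating `K * (1.5 - 0.5 * K * K)` in one expression.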

1 Answer


You can do assignment in blocks, say, of 1000 rows. The additional array this creates will be 1/29 of the size of your array, and having a for loop running 29 times shouldn't be much of a speed problem. Typical memory/speed tradeoff.

import numpy as np

block = 1000          # the number of rows per block
K = np.random.random((29000, 29000))
for i in range(int(np.ceil(K.shape[0] / block))):
    K[i*block:(i+1)*block, :] *= 1.5 - 0.5 * K[i*block:(i+1)*block, :]**2

Since there was some concern about the performance on smaller matrices, here is a test for those:

block = 1000
K = np.arange(9).astype(float).reshape((3, 3))
print(1.5 * K - 0.5 * K**3)
for i in range(int(np.ceil(K.shape[0] / block))):
    K[i*block:(i+1)*block, :] *= 1.5 - 0.5 * K[i*block:(i+1)*block, :]**2
print(K)

This prints

[[   0.    1.   -1.]
 [  -9.  -26.  -55.]
 [ -99. -161. -244.]]

twice.


11 Comments

is this superior to the answer by @StevenRumbalski in the comments above?
Test both and you'll find out. I don't have your memory/CPU configuration.
is the suggested code robust to all array sizes (e.g., 30213 x 30213)?
I'd say this is a better solution than @StevenRumbalski's, because in his approach he's iterating through all cells many times, whereas here only as many as needed are iterated (basic algebra is almost free, and even on a CPU you can do many computations in parallel). Also, consider using np.power(K, 2) instead of K**2.
@unknown121 This is a different question. Try casting the array to single precision (np.float32) if that is enough precision for your purpose. There is also memmap, which could be used to hold the array on disk instead of in memory, freeing the memory for the temporary arrays created during the computation.
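A minimal sketch of the memmap idea from the comment above, combined with the block-wise update from the answer (sizes shrunk so it runs quickly; the file path is illustrative):

```python
import numpy as np
import os
import tempfile

n = 1000  # stand-in for 29000 so the sketch runs quickly
path = os.path.join(tempfile.mkdtemp(), "K.dat")

# Hold K on disk in single precision; RAM is only needed for block-sized temporaries.
K = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
K[:] = np.random.random((n, n)).astype(np.float32)

block = 300
for i in range(0, n, block):
    chunk = K[i:i+block, :]            # a view into the memmap
    chunk *= 1.5 - 0.5 * chunk * chunk  # in-place write-through to disk
K.flush()  # make sure all changes are written back to the file
```

Slicing a memmap returns a view, so the in-place `*=` writes the results straight back to the file; only the block-sized intermediate arrays ever live in RAM.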
