I have a data generation class that produces data in batches. It's simplified as below:
import numpy as np
import os
import psutil


def memory_check():
    pid = os.getpid()
    py_mem = psutil.Process(pid)
    memory_use = py_mem.memory_info()[0] / 2. ** 30
    return {"python_usage": memory_use}


class DataBatcher:
    def __init__(self, X, batch_size):
        self.X = X
        self.start = 0
        self.batch_size = batch_size
        self.row_dim, col_dim = X.shape
        self.batch = np.zeros((batch_size, col_dim))

    def gen_batch(self):
        end_index = self.start + self.batch_size
        if end_index < self.row_dim:
            indices = range(self.start, end_index)
            print("before assign batch \n", memory_check())
            self.batch[:] = self.X.take(indices, axis=0, mode='wrap')
            print("after assign batch \n", memory_check())
            self.start = end_index
        return self.batch


if __name__ == "__main__":
    X = np.random.sample((1000000, 50))
    for i in range(100):
        data_batcher = DataBatcher(X, 5000)
        x = data_batcher.gen_batch()
The actual code is pretty close to the above, except that self.X is generated in another method inside the DataBatcher class and is updated periodically. I noticed that Python's memory usage increases steadily on every round at the line self.batch[:] = self.X.take(indices, axis=0, mode='wrap'), even when no changes are made to self.X. I thought it shouldn't, since I pre-allocated the memory for self.batch?
take does create a new temporary array object (with its own data buffer). Yes, that temporary does get assigned to self.batch. But we don't know what numpy and/or Python does with the temporary array/buffer afterwards. numpy appears to do some of its own memory management that is independent of (or layered above) Python's own garbage collection.
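If that temporary turns out to be the culprit, one way to sidestep it is np.take's out= argument, which writes the selected rows directly into an existing array instead of materialising an intermediate one. Below is a minimal, self-contained sketch of that idea; fill_batch_inplace is a hypothetical helper, not part of the original DataBatcher class, and whether this actually flattens the memory curve in the real code would still need to be verified.

import numpy as np


def fill_batch_inplace(X, batch, start, batch_size):
    # Hypothetical helper (not from the original class): fills the
    # pre-allocated `batch` buffer with rows start..start+batch_size of X.
    # Passing batch as out= lets np.take write straight into it, so no
    # temporary (batch_size, n_cols) array is created on each call.
    indices = range(start, start + batch_size)
    np.take(X, indices, axis=0, mode='wrap', out=batch)
    return batch


if __name__ == "__main__":
    X = np.random.sample((1000000, 50))
    batch = np.zeros((5000, X.shape[1]))
    for i in range(100):
        fill_batch_inplace(X, batch, i * 5000, 5000)

Note that, per the np.take documentation, out is buffered when mode='raise', so keeping mode='wrap' (or 'clip') is what makes the write genuinely in place.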