
I have a data generation class that produces data in batches. It's simplified as below:

import numpy as np
import os
import psutil


def memory_check():
    # Report this process's resident set size (RSS) in GiB
    pid = os.getpid()
    py_mem = psutil.Process(pid)
    memory_use = py_mem.memory_info().rss / 2. ** 30
    return {"python_usage": memory_use}


class DataBatcher:
    def __init__(self, X, batch_size):
        self.X = X
        self.start = 0
        self.batch_size = batch_size
        self.row_dim, col_dim = X.shape
        self.batch = np.zeros((batch_size, col_dim))

    def gen_batch(self):
        end_index = self.start + self.batch_size
        if end_index < self.row_dim:
            indices = range(self.start, end_index)
            print("before assign batch \n", memory_check())
            self.batch[:] = self.X.take(indices, axis=0, mode='wrap')
            print("after assign batch \n", memory_check())
            self.start = end_index
            return self.batch


if __name__ == "__main__":
    X = np.random.sample((1000000, 50))
    for i in range(100):
        data_batcher = DataBatcher(X, 5000)
        x = data_batcher.gen_batch()

The actual code is pretty close to the above, except that self.X is generated in another method inside the DataBatcher class and is updated periodically. I noticed that Python's memory usage increases steadily on every round at the line self.batch[:] = self.X.take(indices, axis=0, mode='wrap'), even when no changes are made to self.X. I thought it shouldn't, since I pre-allocated the memory for self.batch?
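One way to sidestep the temporary array entirely is to pass the pre-allocated buffer as the `out` argument of `np.take`, which writes directly into it (note that `out` is only unbuffered with modes other than `'raise'`, per the NumPy docs). A minimal sketch of that variant, with smaller hypothetical sizes for illustration:

```python
import numpy as np

X = np.random.sample((1000, 50))
batch = np.zeros((100, 50))       # pre-allocated destination buffer
indices = range(0, 100)

# Write the selected rows directly into `batch`; no temporary result array
# is assigned, since `out` receives the output in place.
np.take(X, indices, axis=0, mode='wrap', out=batch)
```

This performs the same row selection as `self.batch[:] = self.X.take(...)` but without creating an intermediate array that must then be copied and garbage-collected.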

  • The take does create a new temporary array object (with its own data buffer), and yes, that result gets copied into self.batch. But we don't know what NumPy and/or Python does with the temporary array/buffer afterwards. NumPy appears to do some of its own memory management that is independent of (or layered above) Python's own garbage collection. Commented Jan 30, 2019 at 7:23

1 Answer


As answered in Why does numpy.zeros takes up little space, this surprising behavior is likely due to an OS-level optimization: np.zeros doesn't actually take up physical memory until you effectively write to it, which here happens at self.batch[:] = self.X.take(indices, axis=0, mode='wrap').
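A quick way to observe this lazy allocation (a sketch assuming Linux/macOS, where zeroed pages are committed only when first touched; the array size here is arbitrary):

```python
import os
import numpy as np
import psutil

proc = psutil.Process(os.getpid())

def rss_gib():
    # Current resident set size in GiB, same measure as memory_check() above
    return proc.memory_info().rss / 2. ** 30

a = np.zeros((100_000_000,))   # ~0.75 GiB of virtual address space
before = rss_gib()             # RSS barely moves: pages are not yet committed
a[:] = 1.0                     # writing forces the OS to commit the pages
after = rss_gib()
print(f"RSS before write: {before:.3f} GiB, after: {after:.3f} GiB")
```

The RSS jump appears at the write, not at the np.zeros call, which is why the memory growth shows up on the assignment line rather than in __init__.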


2 Comments

But then the memory shouldn't keep going up after the first round, no?
It will increase every time you write to a position in the array whose page hasn't been touched yet, since that's when the OS actually commits the memory.
