Memory overflow in numpy boolean array

Question

I am trying to create a large Boolean array (for a prime number sieve). I used first Python lists, but at limit = 10^9 this created a MemoryError.

boolarray = [True] * limit

Then I learned about Numpy and read that it is better with space organisation, so I tried

boolarray = np.full(limit, True, dtype = bool)

The limit only marginally increased to 10^10, which is not sufficient, since I need 10^12. I find this surprising, you just need a bit for Boolean, don't you? Any idea, how to overcome this problem? Thanks in advance.

Docs: bool_ Boolean (True or False) stored as a byte. Now i let others / you do some research on this design-decision. It's highly linked to CPU-architecture and intended use-cases. — sascha
– sascha, Commented Nov 20, 2017 at 11:59
Well, I may have made a mistake so you will want to verify that :) (I calculated this number as 10**12 / (8 * 1024**3)). Unless you have a crazy amount of ram you should probably look for a different approach. — MB-F
– MB-F, Commented Nov 20, 2017 at 12:33
@Piinthesky You may find this answer interesting. It discusses ways to store primes efficiently. However, I'm not sure if this helps in generating them in the first place... — MB-F
– MB-F, Commented Nov 20, 2017 at 15:32
Excellent. This is also something I was thinking about. Just in case somebody else is not so much interested in programming and just wants to have a practical solution - there are C libraries like primesieve.org for that. — Mr. T
– Mr. T, Commented Nov 20, 2017 at 15:51

MB-F · Accepted Answer · 2017-11-20 13:00:25Z

2

Let's put aside the fact that 10^12 bits will probably not easily fit into memory. If you care more about memory usage than performance you can pack the bits into a byte array. This comes at the cost of additional computations when reading/writing bits (which is the reason numpy stores booleans as bytes).

import numpy as np


def getbit(bitarray, index):
    i, j = index // 8, index % 8
    x = bitarray[i]
    return x & (1 << j) != 0


def setbit(bitarray, index, value):
    value = bool(value)
    i, j = index // 8, index % 8
    x = bitarray[i]
    bitarray[i] ^= (np.uint(-value) ^ x) & (1 << j)


n = 10**5 // 8
bitarray = np.zeros(n, dtype=np.uint8)  # initialize all bits to 0

print(getbit(bitarray, 19))  # False

setbit(bitarray, 19, True)
print(getbit(bitarray, 19))  # True

setbit(bitarray, 19, False)
print(getbit(bitarray, 19))  # False

answered Nov 20, 2017 at 13:00

MB-F

23.8k5 gold badges71 silver badges127 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Mr. T Over a year ago

Thanks for the explanation and the examples. Speaking from experience with my first self-written summation functions, before I discovered the Python function sum(): I assume these operations mean A LOT more computation time, since they don't give parameters directly to Python to handle the data efficiently within their libraries written in C. If this is true, then I'm afraid that means back to the drawing board.

MB-F Over a year ago

@Piinthesky Jep, if you call these functions a lot - which I assume you do - you will notice the interpreter overhead. Anyway, this does not solve the problem of squeezing 10^12 bits into the ram, but I thought it may be worth answering the question anyway because it can be useful for others too.

Mr. T Over a year ago

I marked it as a good answer, because it shows that I am on the wrong track. This is valuable information to me (and maybe others as well).

Collectives™ on Stack Overflow

Memory overflow in numpy boolean array

1 Answer 1

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related