
I have a sparse array that seems to be too large to handle effectively in memory (2000x2500000, float). I can form it into a sparse lil_array (scipy), but if I try to output a column- or row-compressed sparse array (A.tocsc(), A.tocsr()), my machine runs out of memory. There is also a serious mismatch between the size of the data in a text file (4.4G) and the pickled lil array (12G); it would be nice to have a disk format that more closely approximates the raw data size.

I will probably be handling even larger arrays in the future.

Question: What's the best way to handle large on-disk arrays so that I can use the regular numpy functions transparently? For instance, sums along rows and columns, vector products, max, min, slicing, etc.

Is pytables the way to go? Is there a good (fast) sql-numpy middleware layer? A secret on-disk array built into numpy?

In the past, with (slightly smaller) arrays, I've always just pickle-cached the results of long calculations to disk. This works when the arrays end up being < 4G or so, but is no longer tenable.

  • When you pickled your array, did you make sure to use the binary protocol? If you are using the default text protocol, then this could be the cause of the huge file size. Commented Apr 26, 2012 at 5:57
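To illustrate the point about pickle protocols, here is a minimal sketch that pickles the same array with the text protocol (0) and with the highest binary protocol, then compares the sizes. The array `arr` is a hypothetical stand-in for the real data. Note that on Python 3 the default protocol is already binary, so this mainly matters on Python 2 or when protocol 0 is requested explicitly.

```python
import pickle

import numpy as np

# Hypothetical stand-in for the real data.
arr = np.arange(100000, dtype=np.float32)

# Protocol 0 is the original text-based protocol.
text_bytes = pickle.dumps(arr, protocol=0)

# HIGHEST_PROTOCOL selects the most compact binary protocol available.
binary_bytes = pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)

print("raw array bytes:", arr.nbytes)
print("text pickle:    ", len(text_bytes))
print("binary pickle:  ", len(binary_bytes))

# Both protocols round-trip to the same data.
restored = pickle.loads(binary_bytes)
```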

1 Answer


I often use memory-mapped numpy arrays to process multi-gigabyte numerical matrices. I find them to work really well for my purposes. Obviously, if the size of the data exceeds the amount of RAM, one has to be careful about access patterns to avoid thrashing.
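As a rough sketch of what this looks like with `numpy.memmap`: the file name, the small shape, and the block size below are assumptions chosen so the example runs quickly, not values from the question. The array is written and reduced one block of rows at a time, so each pass reads the file sequentially.

```python
import os
import tempfile

import numpy as np

# Hypothetical small shape so the sketch runs quickly.
n_rows, n_cols = 200, 5000
block = 50  # rows processed per step (an arbitrary choice)
path = os.path.join(tempfile.mkdtemp(), "matrix.dat")

# Create the file-backed array and fill it block by block.
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_rows, n_cols))
for start in range(0, n_rows, block):
    mm[start:start + block] = 1.0
mm.flush()
del mm  # release the mapping

# Reopen read-only; data is paged in only as it is accessed.
ro = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, n_cols))

# Accumulate row and column sums over contiguous blocks of rows.
col_sums = np.zeros(n_cols, dtype=np.float64)
row_sums = np.empty(n_rows, dtype=np.float64)
for start in range(0, n_rows, block):
    chunk = ro[start:start + block]
    col_sums += chunk.sum(axis=0, dtype=np.float64)
    row_sums[start:start + block] = chunk.sum(axis=1, dtype=np.float64)

file_size = os.path.getsize(path)
```

The file on disk holds just the raw values (`n_rows * n_cols * 4` bytes for float32), with no header, so the dtype and shape have to be supplied again when reopening it.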


4 Comments

This might be doable but seems pretty inefficient for sparse arrays. Is there a sparse version?
@AnthonyBak: Not that I know of. However, a 2000x100000 dense array of float32 is only 800MB in size (both on disk and in memory).
Yes, there was a typo in my original question. It should have said 2000x2500000.
