I am new to Python and am using Python 2.7. I want to create a 2D array. I know how to do it with a list, but my data is large and a list uses a lot of memory. To save memory, I want to use array.array rather than a list. This was inspired by the advice "Use array.array('l') instead of list for the (integer) values" given in the answer to Huge memory usage of loading large dictionaries in memory.

Can this method work for a 2D array?

  • If you haven't already, have a look at NumPy (numpy.org). NumPy's arrays are multidimensional (not just 1-, 2-, or 3-D, but N-D) containers that are as memory-efficient as array.array. Commented Mar 7, 2014 at 17:40

3 Answers

You can't really create a 2D array.array(), because its elements are restricted to primitive types (characters, integers, and floating-point numbers), so one array.array can't hold other arrays as its rows. Instead, you could store your data in a regular one-dimensional array and access it through helper functions that compute each element's offset.

Here's an illustration of what I'm trying to describe:

from array import array

INFO_SIZE = 3  # Number of entries used to store info at beginning of array.
WIDTH, HEIGHT = 1000, 1000  # Dimensions.

array2d = array('l', (0 for _ in range(INFO_SIZE + WIDTH*HEIGHT)))
array2d[:INFO_SIZE] = array('l', (INFO_SIZE, WIDTH, HEIGHT))  # save array info

def get_elem(two_d_array, i, j):
    info_size, width, height = two_d_array[:INFO_SIZE]
    return two_d_array[info_size + j*width + i]

def set_elem(two_d_array, i, j, value):
    info_size, width, height = two_d_array[:INFO_SIZE]
    two_d_array[info_size + j*width + i] = value


import sys
print(format(sys.getsizeof(array2d), ",d"))  # -> 4,091,896

print(get_elem(array2d, 999, 999))           # -> 0
set_elem(array2d, 999, 999, 42)
print(get_elem(array2d, 999, 999))           # -> 42

As you can see, the size of array2d is only slightly larger (relatively speaking) than the size of the data itself: 4,000,000 bytes in this case, because 'l' items are 4 bytes on the platform used here (on most 64-bit Linux and macOS builds a C long, and therefore an 'l' item, is 8 bytes). You could dispense with the functions altogether and do the offset calculation in-line to avoid the overhead of a function call on each access. On the other hand, if that's not a big concern, you could go even further and encapsulate all the logic in a generalized Array2D class.
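A minimal sketch of the in-line approach, assuming the same row-major layout but dropping the info header (the names data, WIDTH and HEIGHT are just for illustration). It also preallocates the buffer by repeating a one-element array, so no million-element generator (or, in Python 2, million-element range list) has to be built first.

```python
from array import array

WIDTH, HEIGHT = 1000, 1000

# Preallocate WIDTH*HEIGHT zeros by repeating a one-element array.
data = array('l', [0]) * (WIDTH * HEIGHT)

# Row-major layout: element (i, j) lives at offset j*WIDTH + i,
# where i is the column (0..WIDTH-1) and j is the row (0..HEIGHT-1).
i, j = 999, 999
data[j*WIDTH + i] = 42
print(data[j*WIDTH + i])  # -> 42

# Sum one row in-line; the slice copies that row into a new array.
row = 999
start = row * WIDTH
print(sum(data[start:start + WIDTH]))  # -> 42

# Bytes used by the elements themselves (depends on the platform's itemsize for 'l').
print(format(data.itemsize * len(data), ',d'))
```

The offset arithmetic is the same as in get_elem/set_elem; it is simply written where it is used instead of behind a function call.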

Update

Encapsulating the implementation in a Class

Here's an example of the generalized Array2D class I mentioned. Its advantage is that it can be used in the more natural array-like fashion of passing two integers to the indexing operator (i.e. my_array2d[i, j]) instead of calling standalone functions to retrieve or set the values of its elements.

from array import array as Array
import string
import sys


# Determine dictionary of valid typecodes and default initializer values.
_typecodes = dict()
for code in string.ascii_letters:  # Assume single ASCII chars.
    # Try a numeric zero first, then a space as a native str ('c' in Python 2,
    # 'u' in Python 3), then as a Unicode string ('u' in Python 2).
    for initializer in (0, ' ', u' '):
        try:
            Array(code, [initializer])
        except (ValueError, TypeError):
            continue  # Bad typecode, or wrong kind of initializer for it.
        _typecodes[code] = initializer
        break


class Array2D(object):
    """Partial implementation of preallocated 2D array.array()."""
    def __init__(self, width, height, typecode, initializer=None):
        if typecode not in _typecodes:
            raise ValueError('unsupported typecode: %r' % (typecode,))
        self.width, self.height, self._typecode = width, height, typecode
        if initializer is None:  # Fall back to the default for this typecode.
            initializer = _typecodes[typecode]
        # Elements are stored row by row in one flat array.
        self.data = Array(typecode, (initializer for _ in range(width * height)))

    def __getitem__(self, key):
        i, j = key  # i is the column, j is the row.
        return self.data[j*self.width + i]

    def __setitem__(self, key, value):
        i, j = key
        self.data[j*self.width + i] = value

    def __sizeof__(self):
        # In Python 2, sys.getsizeof() only calls this for new-style classes, hence the (object) base.
        return sum(map(sys.getsizeof, (self.width, self.height, self.data)))

    @property
    def typecode(self):
        return self._typecode

    @property
    def itemsize(self):
        return self.data.itemsize


array2d = Array2D(1000, 1000, 'l')  # 1 million signed longs (4 bytes each on the platform used).
print(format(sys.getsizeof(array2d), ',d'))  # -> 4,091,936
print(format(array2d.itemsize, ',d'))        # -> 4
print(array2d[999, 999])                     # -> 0
array2d[999, 999] = 42
print(array2d[999, 999])                     # -> 42
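One thing to watch with this layout: Array2D does no bounds checking, so an out-of-range column index silently wraps into the next row (array2d[1000, 0] reads the same element as array2d[0, 1]). Below is a hypothetical variant, CheckedArray2D, that validates both indices before computing the offset. It is a sketch rather than part of the answer above, and for brevity it takes a plain initializer (default 0) instead of using the typecode table.

```python
from array import array


class CheckedArray2D(object):
    """Hypothetical Array2D variant that rejects out-of-range indices."""

    def __init__(self, width, height, typecode='l', initializer=0):
        self.width, self.height = width, height
        # One flat, row-major buffer of width*height items.
        self.data = array(typecode, [initializer]) * (width * height)

    def _offset(self, key):
        i, j = key  # i is the column, j is the row.
        if not (0 <= i < self.width and 0 <= j < self.height):
            raise IndexError('index %r out of range' % (key,))
        return j * self.width + i

    def __getitem__(self, key):
        return self.data[self._offset(key)]

    def __setitem__(self, key, value):
        self.data[self._offset(key)] = value


grid = CheckedArray2D(1000, 1000, 'l')
grid[999, 999] = 42
print(grid[999, 999])  # -> 42
try:
    grid[1000, 0]  # Array2D would silently return element (0, 1) here.
except IndexError as exc:
    print(exc)  # -> index (1000, 0) out of range
```

The check costs a little time on every access, so whether it is worth it depends on how much you trust the indices you pass in.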

The question you refer to is about dictionaries, not arrays. Anyhow, you could do this, which creates a list of arrays of signed long integers (typecode 'l') initialized to zero, effectively a 2D array:

from array import array

width, height = 1000, 1000
array2d = [array('l', (0 for _ in xrange(width))) for _ in xrange(height)]

array2d[999][999] = 42

Comments

The linked answer does make that reference: "Use array.array('l') instead of list for the (integer) values".
@Noelkd: Oops. Yes, for 4 byte integers it would need to be type code 'l' -- thanks.
Thanks, martineau, but when I check the sizes of array2d and of array_list = [[0 for _ in xrange(width)] for _ in xrange(height)], they are the same. It seems to me that the claim that using array.array('l') instead of list for the (integer) values saves memory doesn't hold. Is my understanding wrong?
Shirley: Using an array.array('l') -- an array of 4 byte integers -- saves memory in the answer to the linked question because it was used instead of storing a 7-byte id which "appears to be an integer". That was something specific to the data in that situation. The basic idea might apply in other cases, depending on the nature of the data being stored.
Shirley: P.S. sys.getsizeof(array_list) and sys.getsizeof(array2d) return the same value because neither counts the rows themselves or the elements in each row; sys.getsizeof() only measures the outer list of row references. 1000x1000 4-byte integers require at least 4,000,000 bytes of memory. You can see this with something like the value of sys.getsizeof(array('i', (0 for _ in xrange(width*height)))). Using something like that and doing the 2D indexing manually would probably be the most space-efficient approach. If you're interested, I could add another answer showing how that could be done.
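A sketch of the measurement described in the comments, written so it runs under both Python 2.7 and 3. The helper containers_size is hypothetical: it adds the outer list to every row container but, like sys.getsizeof() itself, does not count the objects a list row points to. (In CPython the small integer 0 is cached, so all those list elements refer to one shared object anyway.) It uses typecode 'i', as in the comment, which is 4 bytes on common platforms.

```python
import sys
from array import array

width, height = 1000, 1000

list_rows = [[0] * width for _ in range(height)]                    # list of lists
array_rows = [array('i', [0]) * width for _ in range(height)]       # list of arrays
flat = array('i', [0]) * (width * height)                           # one flat array


def containers_size(rows):
    """Outer list plus each row container; a list row's element objects are not counted."""
    return sys.getsizeof(rows) + sum(sys.getsizeof(row) for row in rows)


# Shallow sizes only measure the outer list of 1000 row references, so they match.
print(sys.getsizeof(list_rows) == sys.getsizeof(array_rows))  # -> True

print('{:,d}'.format(containers_size(list_rows)))
print('{:,d}'.format(containers_size(array_rows)))
print('{:,d}'.format(sys.getsizeof(flat)))
```

On a typical 64-bit build each list row stores 8-byte pointers while each 'i' array row stores 4-byte values, so the list-of-arrays total comes out at roughly half the list-of-lists total, and the single flat array is smaller still because it needs only one object header and no outer list.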

In Python, what other languages call arrays are normally lists.

The memory advantage in the other question was gained from not using a dictionary.

In general you will not see memory savings in moving "from a list to a 2d array".

Give me a sample of your data and I will update my answer.

1 Comment

It's referring to array.array, not the built-in list.
