Can array.array() be used to define a 2d array?

Question

I am new to Python and am using Python 2.7. I want to create a 2D array. I know how to do it using a list, but my data is large and a list uses too much memory. To save memory, I want to use an array rather than a list. This was inspired by the advice "Use array.array('l') instead of list for the (integer) values" given in the answer to "Huge memory usage of loading large dictionaries in memory".

Can this method work for a 2D array?

If you haven't already, have a look at numpy (numpy.org). Numpy's arrays are multidimensional (not just 1, 2, or 3d, but N-d), memory-efficient (just as efficient as array.array) containers.
– Joe Kington, Commented Mar 7, 2014 at 17:40

martineau · Accepted Answer · 2022-06-24 03:16:41Z

You can't really create a 2d array.array() because its elements are restricted to the types: characters, integers, and floating point numbers. Instead, you could store your data in a regular one-dimensional array and access it through some helper functions.

Here's an illustration of what I'm trying to describe:

from array import array

INFO_SIZE = 3  # Number of entries used to store info at beginning of array.
WIDTH, HEIGHT = 1000, 1000  # Dimensions.

array2d = array('l', (0 for _ in range(INFO_SIZE + WIDTH*HEIGHT)))
array2d[:INFO_SIZE] = array('l', (INFO_SIZE, WIDTH, HEIGHT))  # save array info

def get_elem(two_d_array, i, j):
    info_size, width, height = two_d_array[:INFO_SIZE]
    return two_d_array[info_size + j*width + i]

def set_elem(two_d_array, i, j, value):
    info_size, width, height = two_d_array[:INFO_SIZE]
    two_d_array[info_size + j*width + i] = value


import sys
print(format(sys.getsizeof(array2d), ",d"))  # -> 4,091,896

print(get_elem(array2d, 999, 999))           # -> 0
set_elem(array2d, 999, 999, 42)
print(get_elem(array2d, 999, 999))           # -> 42

As you can see, the size of array2d is only slightly more (relatively speaking) than the size of the data itself (4,000,000 bytes in this case, where 'l' takes 4 bytes per item; the item size is platform-dependent and is 8 bytes on most 64-bit Unix builds). You could dispense with the functions altogether and just do the offset calculation in-line to avoid the overhead of a function call on each access. On the other hand, if that's not a big concern, you could go even further and encapsulate all the logic in a generalized class Array2D.
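As a rough sketch of the in-line variant mentioned above (the small dimensions and the name flat are made up for illustration, and the header entries are dropped for brevity), the offset arithmetic can be written directly at each access:

```python
from array import array

WIDTH, HEIGHT = 4, 3  # Small hypothetical dimensions for illustration.

# Flat, zero-initialized storage with no header entries.
flat = array('l', [0]) * (WIDTH * HEIGHT)

# Row-major layout: element (i, j) lives at index j*WIDTH + i.
i, j = 2, 1
flat[j*WIDTH + i] = 42

print(flat[j*WIDTH + i])  # -> 42
print(flat.index(42))     # -> 6, i.e. 1*4 + 2
```

This trades readability for speed: every access site has to repeat the same j*WIDTH + i expression consistently.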

Update

Encapsulating the implementation in a class

Here's an example of that generalized class Array2D I mentioned. It has the advantage of being usable in the more natural array-like fashion of passing two integers to the indexing operator (i.e. my_array2d[i, j], with the column index first and the row index second, as the code is written) instead of calling standalone functions to retrieve or set the values of its elements.

from array import array as Array
import string
import sys


# Determine dictionary of valid typecodes and default initializer values.
_typecodes = dict()
for code in string.ascii_lowercase + string.ascii_uppercase:  # Assume single ASCII chars.
    initializer = 0
    try:
        Array(code, [initializer])
    except ValueError:
        continue  # Skip
    except TypeError:
        initializer = u'\x20'  # Assume it's a Unicode character.

    _typecodes[code] = initializer


class Array2D:
    """Partial implementation of preallocated 2D array.array()."""
    def __init__(self, width, height, typecode, initializer=None):
        if typecode not in _typecodes:
            raise NotImplementedError
        self.width, self.height, self._typecode = width, height, typecode
        if initializer is None:  # Fall back to the typecode's default value.
            initializer = _typecodes[typecode]
        self.data = Array(typecode, (initializer for _ in range(width * height)))

    def __getitem__(self, key):
        i, j = key
        return self.data[j*self.width + i]

    def __setitem__(self, key, value):
        i, j = key
        self.data[j*self.width + i] = value

    def __sizeof__(self):
        # Not called by sys.getsizeof() in Python 2 (although it should be).
        return sum(map(sys.getsizeof, (self.width, self.height, self.data)))

    @property
    def typecode(self):
        return self._typecode

    @property
    def itemsize(self):
        return self.data.itemsize


array2d = Array2D(1000, 1000, 'l')  # 1 million signed longs (4 bytes each here).
print(format(sys.getsizeof(array2d), ',d'))  # -> 4,091,936
print(format(array2d.itemsize, ',d'))        # -> 4
print(array2d[999, 999])                     # -> 0
array2d[999, 999] = 42
print(array2d[999, 999])                     # -> 42
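One thing to be aware of: as written, the class doesn't validate indices, so array2d[1000, 0] maps to offset 1000, which is the same element as array2d[0, 1]. If that matters, a check along these lines could be added. This is only a sketch, and flat_index is a hypothetical helper, not part of the class above:

```python
def flat_index(i, j, width, height):
    """Return the row-major offset of (i, j), rejecting out-of-range indices."""
    if not (0 <= i < width and 0 <= j < height):
        raise IndexError("index (%d, %d) out of range" % (i, j))
    return j*width + i


print(flat_index(999, 999, 1000, 1000))  # -> 999999
try:
    flat_index(1000, 0, 1000, 1000)
except IndexError as exc:
    print(exc)  # -> index (1000, 0) out of range
```

The same check could be called from __getitem__ and __setitem__, at the cost of a little extra work per access.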

martineau · Accepted Answer · 2022-06-23 19:54:57Z

1

The question you refer to is about dictionaries, not arrays. Anyhow, you could do this, which creates a list of arrays of 4-byte integers initialized to zero, which is effectively a 2D array:

from array import array

width, height = 1000, 1000
array2d = [array('l', (0 for _ in xrange(width))) for _ in xrange(height)]

array2d[999][999] = 42

edited Jun 23, 2022 at 19:54

answered Mar 7, 2014 at 11:30

6 Comments

Noelkd Over a year ago

The answer makes reference to "Use array.array('l') instead of list for the (integer) values".

martineau Over a year ago

@Noelkd: Oops. Yes, for 4 byte integers it would need to be type code 'l' -- thanks.

Shirley Over a year ago

Thanks Martineau, but when I check the sizes of array2d and of array_list = [[0 for _ in xrange(width)] for _ in xrange(height)], they are the same. It seems to me that the claim that "Use array.array('l') instead of list for the (integer) values" saves memory does not hold. Is my understanding wrong?

martineau Over a year ago

Shirley: Using an array.array('l') -- an array of 4 byte integers -- saves memory in the answer to the linked question because it was used instead of storing a 7-byte id which "appears to be an integer". That was something specific to the data in that situation. The basic idea might apply in other cases, depending on the nature of the data being stored.

martineau Over a year ago

Shirley: P.S. sys.getsizeof(array_list) and sys.getsizeof(array2d) return the same value because neither of them is counting the size of the elements in each row. 1000x1000 4-byte integers would require at least 4,000,000 bytes of memory. You can see this with something like the value of sys.getsizeof(array('i', (0 for _ in xrange(width*height)))). Using something like that and doing the 2d indexing manually would probably be the most space-efficient approach. If you're interested, I could add another answer showing how that could be done.
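To make that point concrete, here is a rough sketch that adds up the size reported for each row as well as for the outer list. The helper deep_size and the smaller dimensions are made up for illustration, the 'i' typecode is usually (but not always) 4 bytes, and for the list rows this still doesn't count the integer objects the rows point to:

```python
import sys
from array import array

width, height = 100, 100  # Smaller than in the answer, to keep the demo quick.

rows_of_arrays = [array('i', [0]) * width for _ in range(height)]
rows_of_lists = [[0] * width for _ in range(height)]


def deep_size(rows):
    """Size of the outer list plus the size reported for each row."""
    return sys.getsizeof(rows) + sum(sys.getsizeof(row) for row in rows)


# The outer lists only hold references to their rows.
print(sys.getsizeof(rows_of_arrays), sys.getsizeof(rows_of_lists))
# Including the rows gives a more realistic picture.
print(deep_size(rows_of_arrays), deep_size(rows_of_lists))
```

The exact numbers depend on the Python version and platform, so none are shown here.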


avoid3d · Accepted Answer · 2014-03-07 10:57:08Z

-1

In python arrays are lists.

The memory advantage in the other question was gained from not using a dictionary.

In general you will not see memory savings in moving "from a list to a 2d array".

Give me a sample of your data and I will update my answer.

answered Mar 7, 2014 at 10:57

1 Comment

zhangxaochen Over a year ago

It's array.array; the question isn't referring to the builtin list.
