12

I would like to combine the functionality of numpy's array with native python's dict, namely creating a multidimensional array that can be indexed with strings.

For example, I would like to be able to do this:

dict_2d = {'a': {'x': 1, 'y': 2},
           'b': {'x': 3, 'y': 4}}
print dict_2d['a','y']  # returns 2

I know I could do dict_2d['a']['x'], but in the long term I'd like to be able to treat them like numpy arrays, including doing matrix multiplication and such, and that's not possible with layered dicts.

It's also not that hard to write a simple version of the class myself, where the class just converts all the strings to int indexes and then uses numpy, but I'd like to use something that already exists if possible.

Edit: I don't need incredible performance. I'll be working with maybe 10x10 arrays. My goal is to make writing the code simple and robust. Working with numpy arrays is not really much different than just writing it in Fortran. I've spent enough of my life tracking down Fortran indexing errors...

9
  • Do you actually need the data to be stored in a nested structure? You could just use a dict whose keys are tuples. Commented May 12, 2015 at 18:45
  • @BrenBarn that could use a lot of memory. Commented May 12, 2015 at 18:49
  • What are you trying to achieve by not using numpy? If you just want string labels for the rows/columns, look at pandas, which provides nice tabular data types that wrap numpy arrays. Commented May 12, 2015 at 19:00
  • @BrenBarn I'm not trying to avoid numpy at all. It just doesn't seem to do string labels by itself. pandas looks like it might do just what I'm hoping for. I'll check it out in more detail. Commented May 12, 2015 at 19:07
  • @BrenBarn Pandas was just what I wanted. If you want to write it up as an answer I'll accept it. Commented May 12, 2015 at 19:41

3 Answers

11

You may be looking for pandas, which provides handy datatypes that wrap numpy arrays, allowing you to access rows and columns by name instead of just by number.
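As a rough sketch of what that could look like for the 2x2 data in the question (the variable names are illustrative only, and this assumes pandas is installed):

```python
import pandas as pd

# Rows are labelled 'a' and 'b', columns 'x' and 'y', as in the question.
df = pd.DataFrame([[1, 2], [3, 4]], index=["a", "b"], columns=["x", "y"])

# Label-based lookup: .loc takes a (row label, column label) pair.
value = df.loc["a", "y"]

# Matrix multiplication keeps the labels. DataFrame.dot matches the
# columns of the left operand against the index of the right operand,
# so the transpose is used here to make the labels line up.
product = df.dot(df.T)

# The underlying NumPy array is still available when needed.
raw = df.to_numpy()
```

Here `value` is the entry in row 'a', column 'y', and `product` is a new DataFrame whose rows and columns are both labelled 'a' and 'b'.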

2

I dislike giving ready-made answers, but I think it would take much more time to explain this in English.

The basic idea for fetching objects the way numpy does is to customize the __getitem__ method. Comma-separated values are passed to the method as a tuple, so you then just use the values in the tuple, in sequence, as indexes into your nested dictionaries.
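To see that mechanism in isolation, here is a small sketch. `ShowKey` and `lookup` are hypothetical names made up for this illustration and are not part of the class further down:

```python
class ShowKey:
    """Hypothetical helper that returns whatever key it is indexed with."""

    def __getitem__(self, key):
        return key


probe = ShowKey()
single = probe["a"]      # one index arrives as the object itself
pair = probe["a", "y"]   # comma-separated indexes arrive as one tuple


def lookup(nested, key):
    """Walk a nested dict, using each element of the key in turn."""
    if not isinstance(key, tuple):
        key = (key,)
    for part in key:
        nested = nested[part]
    return nested


dict_2d = {"a": {"x": 1, "y": 2}, "b": {"x": 3, "y": 4}}
result = lookup(dict_2d, pair)
```

`pair` is the tuple `('a', 'y')`, and `lookup` follows it one level at a time to reach the value stored under `dict_2d['a']['y']`.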

Beyond that, Python makes it easy to create fully functional dict equivalents with the collections.abc classes: if you implement a minimal set of methods (__getitem__, __setitem__, __delitem__, __iter__, __len__) when inheriting from collections[.abc].MutableMapping, all dictionary behavior is emulated. Then it is just a matter of iterating properly through the key components and creating new, empty, regular dictionaries to store the needed values.

try:
    # Python 3.3+: the abstract base classes live in collections.abc
    from collections.abc import MutableMapping
except ImportError:
    # Python 2 fallback
    from collections import MutableMapping

class NestedDict(MutableMapping):
    def __init__(self, *args, **kw):
        self.data = dict(*args, **kw)

    def get_last_key_levels(self, key, create=False):
        if not isinstance(key, tuple):
            key = (key,)
        current_data = self.data
        for subkey in key:
            previous = current_data
            current_data = current_data[subkey] if not create else current_data.setdefault(subkey, {})
        return previous, current_data, subkey

    def __getitem__(self, key):
        previous, current_data, lastkey = self.get_last_key_levels(key)
        return current_data

    def __setitem__(self, key, value):
        previous, current_data, lastkey = self.get_last_key_levels(key, True)
        previous[lastkey] = value

    def __delitem__(self, key):
        previous, current_data, lastkey = self.get_last_key_levels(key)
        del previous[lastkey]

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return "NestedDict({})".format(repr(self.data))

And you are set to go:

>>> from nesteddict import NestedDict
>>> x = NestedDict(a={})
>>> x
NestedDict({'a': {}})
>>> x["a", "b"] = 10
>>> x
NestedDict({'a': {'b': 10}})
>>> x["a", "c", "e"]  = 25
>>> x
NestedDict({'a': {'c': {'e': 25}, 'b': 10}})
>>> x["a", "c", "e"] 
25
>>> 

Please note that this is a high-level implementation, which will just work, but it comes nowhere near the optimization level you get with NumPy; quite the contrary. If you need to perform fast data operations on these objects, you could look into Cython, or fall back on your idea of translating the dict keys to numeric indexes and using NumPy (that approach could still borrow some ideas from this answer).
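For that last option, a minimal sketch might look like the following. `LabeledArray` is a hypothetical name invented here, it handles only the two-dimensional case, and it assumes NumPy is installed:

```python
import numpy as np


class LabeledArray:
    """Hypothetical 2-D wrapper that maps string labels to integer positions."""

    def __init__(self, values, row_labels, col_labels):
        self.values = np.asarray(values)
        # Each label is translated to its position once, up front.
        self.rows = {label: i for i, label in enumerate(row_labels)}
        self.cols = {label: j for j, label in enumerate(col_labels)}

    def __getitem__(self, key):
        row, col = key
        return self.values[self.rows[row], self.cols[col]]

    def __setitem__(self, key, value):
        row, col = key
        self.values[self.rows[row], self.cols[col]] = value


arr = LabeledArray([[1, 2], [3, 4]], ["a", "b"], ["x", "y"])
arr["b", "x"] = 30

# The plain NumPy array stays available for numeric work.
squared = arr.values @ arr.values
```

Keeping the data in an ordinary array means matrix multiplication and the rest of NumPy work unchanged, while the two label dictionaries do the job of avoiding hand-written integer indexes.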

1 Comment

BrenBarn's pandas suggestion does most of what I want, although it doesn't seem to allow NestedDict['a', 'x']. Wrapping a thin layer on top of pandas with __getitem__ and __setitem__ as you described will pull it all together.

1

Use pandas. Let's say the file looks like this:

test.csv:

Params, Val1, Val2, Val3
Par1,23,58,412
Par2,56,45,123
Par3,47,89,984

So you can do something like this in python:

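import pandas as pd
# skipinitialspace=True drops the blank after each comma in the header row,
# so the columns are named 'Val1' rather than ' Val1'.
x = pd.read_csv('test.csv', index_col='Params', skipinitialspace=True)
print(x['Val1']['Par3'])  # column first, then row: prints 47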
