
I was wondering if there was an easy way to create a class to handle both integer and keyword indexing of a numpy array of numbers.

The end goal is to have a numpy array that I can also index using the names of each variable. For example, if I have the lists

import numpy as np
a = [0,1,2,3,4]
names = ['name0','name1','name2','name3','name4']
A = np.array(a)

I would like to be able to get the values of A easily with a call of (for example) A['name1'], yet have the array retain all of the functionality of a numpy array.

Thanks!

Peter

Edit:

Thanks so much for the help, I'll try to be more clear on the intended use! I have an existing set of code which uses a numpy array to store and apply a vector of variables. My vector has around 30 entries.

When I want to see the value of a particular variable, or when I want to make a change to one of them, I have to remember which entry corresponds to which variable (the order or number of entries doesn't necessarily change once the array is created). Right now I use a dictionary to keep track. For example, I have a numpy array 'VarVector' with 30 values. "vmax" is entry 15, with a value of 0.432. I'll then have a concurrent dictionary with 30 keys, 'VarDict', such that VarDict[entry] = index. This way I can find the value of vmax by chaining the calls

VarVector[VarDict["vmax"]]

which would return 0.432
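
For reference, the two-structure setup described above might look like the following minimal sketch. Only "vmax", its position (15) and its value (0.432) come from the description; the other variable names are placeholders invented for illustration.

```python
import numpy as np

# Placeholder names; only "vmax" at position 15 comes from the question.
names = ["var%d" % i for i in range(30)]
names[15] = "vmax"

# The vector of 30 values, with vmax stored at entry 15.
VarVector = np.zeros(30)
VarVector[15] = 0.432

# Map each variable name to its position in the vector.
VarDict = {name: index for index, name in enumerate(names)}

# Chained lookup: name -> index -> value.
print(VarVector[VarDict["vmax"]])
```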

I was wondering if there would be a good way of simply combining these two structures, such that both VarVector[15] (for compatibility) and VarVector["vmax"] (for convenience to me) would point to the same number.

Thanks! Peter

  • The point of numpy arrays is that they're written in C and hence fast. If you do this you lose the benefit of numpy arrays -- you might as well use a Python list! Commented Jan 17, 2012 at 22:33
  • Can you give a reason why you want to do this? Commented Jan 17, 2012 at 22:33
  • @katrielalex - Not necessarily... The __getitem__ of a numpy array is already quite slow. You're not going to significantly slow things down by adding this to it. However, this is a fairly common use case and has already been done a couple of times (pandas and larry). Have a look at this comparison: scipy.org/StatisticalDataStructures Having "labeled axes" or "labeled items" is a nice thing to have in some cases. Commented Jan 18, 2012 at 0:06
  • Fair enough, I stand corrected. Thanks =) Commented Jan 18, 2012 at 0:24

3 Answers

From your description, it sounds like you just want a structured array (which is built-in to numpy). E.g.

import numpy as np

# Let's suppose we have 30 observations with 5 variables each...
# The five variables are temp, pressure, x-velocity, y-velocity, and z-velocity
x = np.random.random((30, 5))

# Make a structured dtype to represent our variables...
# (np.float was removed in NumPy 1.24; np.float64 is the equivalent type)
dtype = dict(names=['temp', 'pressure', 'x_vel', 'y_vel', 'z_vel'],
             formats=5 * [np.float64])

# Now view "x" as a structured array with the dtype we created...
data = x.view(dtype)

# Each measurement will now have the name fields we created...
print(data[0])
print(data[0]['temp'])

# If we want, say, all the "temp" measurements:
print(data['temp'])

# Or all of the "temp" and "x_vel" measurements:
print(data[['temp', 'x_vel']])

Also have a look at rec arrays. They're slightly more flexible but significantly slower.

# fromarrays expects one array per field, so pass the columns of x
data = np.rec.fromarrays(x.T,
              names=['temp', 'pressure', 'x_vel', 'y_vel', 'z_vel'])
print(data.temp)

However, you'll soon hit the limitations of either of these methods (e.g. you can't name both axes). In that case, have a look at larry if you just want to label items, or pandas if you want labeled arrays with a lot of nice missing-value handling.
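
For the one-dimensional case in the question, the same structured-dtype idea can be applied to a single vector. This is only a sketch, assuming every variable shares one element type (float here) and that the array is contiguous; `named_dtype` and `A_named` are names made up for this example. The view shares memory with the original array, so integer and name-based access refer to the same numbers.

```python
import numpy as np

names = ['name0', 'name1', 'name2', 'name3', 'name4']
A = np.array([0, 1, 2, 3, 4], dtype=float)

# One field per variable, all with A's element type.
named_dtype = np.dtype([(name, A.dtype) for name in names])

# Reinterpret the same memory as a single record with five named fields.
# The result has shape (1,), so each field is a length-1 array.
A_named = A.view(named_dtype)

# Writing through the name changes the original array as well.
A_named['name1'] = 10.0
print(A[1])
print(A_named['name3'][0])
```

Note that `A_named` has one record rather than five elements, so vectorized math should still be done on `A`; the view is only a convenient way to read and write by name.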


I have not tested this, but it should work.

The idea is to assume that the input is an int and use it for the numpy array, and if it isn't, use it for the dict.

import numbers
import numpy

class ThingArray:
    def __init__(self, size):
        # numpy.array() needs data, so allocate a fixed-size array instead
        self.numpy_array = numpy.zeros(size)
        self.other_array = dict()

    def __setitem__(self, key, value):
        # Integer keys go to the numpy array, everything else to the dict
        if isinstance(key, numbers.Integral):
            self.numpy_array[key] = value
        else:
            self.other_array[key] = value

    def __getitem__(self, key):
        if isinstance(key, numbers.Integral):
            return self.numpy_array[key]
        else:
            return self.other_array[key]


thing = ThingArray(5)

thing[1] = 100
thing["one"] = "hundred"

print(thing[1])
print(thing["one"])
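
The class above stores named values separately from the array. If the names should instead point at positions in the same array, as the question's edit describes, a variation might look like this sketch. `NamedVector`, `index_of` and `_resolve` are hypothetical names, and the wrapper only forwards indexing, so other array operations would go through `vec.values`.

```python
import numpy as np

class NamedVector:
    """Hypothetical wrapper: string keys are translated to integer positions."""

    def __init__(self, values, names):
        self.values = np.asarray(values, dtype=float)
        # Map each name to its position in the array.
        self.index_of = {name: i for i, name in enumerate(names)}

    def _resolve(self, key):
        # Strings are looked up in the name table; anything else
        # (ints, slices, index arrays) is passed to numpy unchanged.
        return self.index_of[key] if isinstance(key, str) else key

    def __getitem__(self, key):
        return self.values[self._resolve(key)]

    def __setitem__(self, key, value):
        self.values[self._resolve(key)] = value


vec = NamedVector([0, 1, 2, 3, 4],
                  ['name0', 'name1', 'name2', 'name3', 'name4'])
vec['name1'] = 10
print(vec[1])
print(vec['name3'])
```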


You could subclass the ndarray and override the relevant methods (i.e. __getitem__, __setitem__, ...). More info here. This is similar to @Joe's answer, but has the advantage that it preserves almost all of the functionality of the ndarray. You obviously won't be able to do the following anymore:

In [25]: array = np.empty(3, dtype=[('char', '|S1'), ('int', np.int64)])

In [26]: array['int'] = [0, 1, 2]

In [27]: array['char'] = ['a', 'b', 'c']

In [28]: array
Out[28]: 
array([('a', 0), ('b', 1), ('c', 2)], 
      dtype=[('char', '|S1'), ('int', '<i8')])

In [29]: array['char']
Out[29]: 
array(['a', 'b', 'c'], 
      dtype='|S1')

In [30]: array['int']
Out[30]: array([0, 1, 2])

If we knew why you wanted to do this, we might be able to give a more detailed answer.
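
The subclassing approach suggested in this answer could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: `NamedArray`, its `names` attribute and the `_resolve` helper are invented here, and string keys are always treated as variable names (which is why field access on a structured dtype would stop working).

```python
import numpy as np

class NamedArray(np.ndarray):
    """Hypothetical ndarray subclass that also accepts string keys."""

    def __new__(cls, values, names):
        obj = np.asarray(values, dtype=float).view(cls)
        # Map each name to its position in the array.
        obj.names = {name: i for i, name in enumerate(names)}
        return obj

    def __array_finalize__(self, obj):
        # Runs whenever numpy builds a NamedArray from another array;
        # copy the name table if the source has one.
        if obj is None:
            return
        self.names = getattr(obj, 'names', {})

    def _resolve(self, key):
        # Translate a name to its integer position; pass other keys through.
        return self.names[key] if isinstance(key, str) else key

    def __getitem__(self, key):
        return super().__getitem__(self._resolve(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._resolve(key), value)


V = NamedArray([0, 1, 2, 3, 4],
               ['name0', 'name1', 'name2', 'name3', 'name4'])
V['name1'] = 10
print(V[1])
print(V * 2)
```

One caveat with this sketch: a slice such as `V[2:]` keeps the original name table, so its names still refer to positions in the full vector. Name lookup is only meaningful on the unsliced array.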
