
I am trying to understand (and eventually use) the implementation of object arrays using record arrays from NumPy from here: Numpy object array. In reviewing the code I am apparently learning new things about Python, and I can't seem to fully understand the following:

In the obarray.py file, a function is used to create a new object, and I am confused as to:

  1. Why a function is used,
  2. How the arguments play into the function,
  3. How using this function differs from creating the class directly with the arguments (presumably using the arguments as attributes), and
  4. What is this __main__.Obarray I get when I just call the function?

For 1 and 2, I have a hunch that the arguments somehow become part of that object's "local scope" and are perhaps similar to an object attribute?

Here is the code for the new object from the link:

import numpy as np
def make_obarray(klass, dtype):
    class Obarray(np.ndarray):
        def __new__(cls, obj):
            print("CLS:", cls)
            print("OBJ:", obj)
            A = np.array(obj, dtype=object)  # the np.object alias was removed in NumPy 1.24
            N = np.empty(shape=A.shape, dtype=dtype)
            for idx in np.ndindex(A.shape):
                for name, type in dtype:
                    N[name][idx] = type(getattr(A[idx],name))
            return N.view(cls)
        def __getitem__(self, idx):
            V = np.ndarray.__getitem__(self,idx)
            if np.isscalar(V):
                kwargs = {}
                for i, (name, type) in enumerate(dtype):
                    kwargs[name] = V[i]
                return klass(**kwargs)
            else:
                return V
        def __setitem__(self, idx, value):
            if isinstance(value, klass):
                value = tuple(getattr(value, name) for name, type in dtype)
            # FIXME: treat lists of lists and whatnot as arrays
            return np.ndarray.__setitem__(self, idx, value)
    return Obarray

Here is how I am testing it:

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __str__(self):
        return "<Foo a=%s b=%s>" % (self.a, self.b)

# np.int and np.float were removed in NumPy 1.24; the builtins work everywhere
dtype = [("a", int),
         ("b", float)]
FooArray = make_obarray(Foo, dtype)

A = FooArray([Foo(0,0.1),Foo(1,1.2),Foo(2,2.1),Foo(3,3.3)])
  1. When I call FooArray I get __main__.Obarray - what is this?
  2. What happened to "klass" and "dtype" that I entered as arguments?
  3. How is this different from something along the lines of:


class Obarray(np.ndarray):
    def __new__(cls, input_array, klass, dtype):
        obj = np.asarray(input_array).view(cls)
        obj.klass = klass
        # assigning obj.dtype would try to change the array's own data type
        # (NumPy refuses this for object arrays), so store it under another name
        obj.record_dtype = dtype
        A = np.array(obj, dtype=object)
        N = np.empty(shape=A.shape, dtype=dtype)
        for idx in np.ndindex(A.shape):
            for name, type in dtype:
                N[name][idx] = type(getattr(A[idx], name))
        obj.N = N.view(np.ndarray)
        return obj
  • I'm not sure if your confusion is specific to this numpy code or more general. Do you understand how closures work? Have you ever used a type-factory like collections.namedtuple before? Commented Feb 12, 2016 at 23:07
  • It's a more general question that came up in this code, so I use it as my example. Yes, I am familiar with namedtuple; however, namedtuples are immutable. Commented Feb 12, 2016 at 23:09
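collections.namedtuple, mentioned in the comments, is the standard-library example of the same type-factory pattern: every call builds and returns a new class. The sketch below (the names Point and Pair are illustrative only) shows that two namedtuple classes with identical fields are still different types, and that instances are immutable, as the comment says.

```python
from collections import namedtuple

# namedtuple is a type factory: each call builds and returns a brand-new class
Point = namedtuple("Point", ["x", "y"])
Pair = namedtuple("Pair", ["x", "y"])

p = Point(1, 2)
print(type(p).__name__)        # Point
print(isinstance(p, Point))    # True
print(isinstance(p, Pair))     # False: same fields, but a different class

# instances are immutable, so assigning to a field fails
try:
    p.x = 10
except AttributeError:
    print("cannot assign to a namedtuple field")

# _replace returns a modified copy instead of changing p
print(p._replace(x=10))        # Point(x=10, y=2)
```

make_obarray follows the same pattern, except that the class it builds subclasses np.ndarray, so its instances stay mutable.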

1 Answer


The make_obarray function is a factory that produces classes. The methods of the classes it returns are closures that can access the function's local variables (e.g. the klass and dtype arguments) even after it has finished running.

Here's a much simpler closure that might help you understand how they work:

def make_adder(x):
    def adder(y):
        return x + y
    return adder

make_adder is a factory function. It returns an adder function which is a closure. adder can still see the x argument of the make_adder call in which it was defined, even after make_adder has returned.
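To see where x goes after make_adder returns, you can call the factory twice and inspect the returned functions. This is a small illustrative sketch. It uses CPython's __code__.co_freevars and __closure__ attributes, which list the free variables a function captured and hold their values in "cells".

```python
def make_adder(x):
    def adder(y):
        return x + y
    return adder

add5 = make_adder(5)
add10 = make_adder(10)

print(add5(1))    # 6
print(add10(1))   # 11

# make_adder has returned, but each adder keeps its own x in a closure cell
print(add5.__code__.co_freevars)          # ('x',)
print(add5.__closure__[0].cell_contents)  # 5
```

Each call to make_adder creates a fresh cell, which is why add5 and add10 do not interfere with each other.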

This is similar to the numpy code you've shown. The make_obarray function returns a class rather than a function, but otherwise it's almost the same. The class will be displayed as some_module.Obarray in Python 2 (or some_module.make_obarray.<locals>.Obarray in Python 3), where some_module is the name of the module it was defined in (or __main__ if you've executed its module as a script). The methods of the returned class will be able to see the klass and dtype arguments passed into make_obarray, just like the adder function could see the x argument to make_adder in my simpler example.
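Here is a numpy-free sketch of the same idea. make_record_class is a hypothetical stand-in for make_obarray and is not from the linked code. Its methods read klass and fields from the enclosing call rather than from self, so neither one is stored as an instance attribute. The __qualname__ line assumes Python 3.

```python
def make_record_class(klass, fields):
    # hypothetical, numpy-free stand-in for make_obarray
    class Record(object):
        def __init__(self, obj):
            # 'fields' is read from the enclosing call, not from self
            self.values = tuple(getattr(obj, name) for name in fields)

        def to_object(self):
            # 'klass' is also a closure variable of the enclosing call
            return klass(**dict(zip(fields, self.values)))

    return Record


class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b


FooRecord = make_record_class(Foo, ("a", "b"))
print(FooRecord.__qualname__)   # make_record_class.<locals>.Record (Python 3)

r = FooRecord(Foo(1, 2.5))
print(r.values)                 # (1, 2.5)
foo = r.to_object()
print(type(foo).__name__, foo.a, foo.b)  # Foo 1 2.5

# every call to the factory builds a new, distinct class object
print(make_record_class(Foo, ("a", "b")) is FooRecord)  # False
```

In make_obarray the same thing happens with klass in __getitem__ and __setitem__, and with dtype in all three methods.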

As for why the code you've found is written that way, I couldn't say. Perhaps the code's author thought it would be useful to be able to use isinstance to distinguish between instances of Obarray with different klass or dtype values:

FooArray = make_obarray(Foo, dtype)
BarArray = make_obarray(Bar, some_other_dtype)

f = FooArray([Foo(1,2)])

print(isinstance(f, FooArray)) # True
print(isinstance(f, BarArray)) # False

If the klass and dtype were just arguments to a single class, you couldn't tell the difference between the array instances in this way (though you could probably come up with an equivalent check that compared the instances' attributes).
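For contrast, here is a sketch of the single-class approach from the question, again without numpy. RecordList, Foo and Bar are hypothetical names. Because klass is stored per instance, every array has the same type, so the check has to compare the stored attribute instead of using isinstance.

```python
class RecordList(object):
    # one class for every element type: klass is an instance attribute
    def __init__(self, items, klass):
        self.klass = klass
        self.items = list(items)


class Foo:
    pass


class Bar:
    pass


foos = RecordList([Foo()], Foo)
bars = RecordList([Bar()], Bar)

# isinstance cannot separate them: both are plain RecordList instances
print(isinstance(foos, RecordList), isinstance(bars, RecordList))  # True True

# the equivalent check compares the stored attribute instead
print(foos.klass is Foo)  # True
print(bars.klass is Foo)  # False
```

Both designs work. The factory moves the distinction into the type itself, and the single class keeps it in per-instance data.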
