26

A simple question about NumPy:

I load 100 values into a vector a. From this vector, I want to create an array A with 2 columns, where the first column is named "C1" and has type int32, and the second is named "C2" and has type int64. An example:

import numpy as np

a = range(100)
A = np.array(a).reshape(len(a)//2, 2)
# A.dtype = ...?

How do I define the columns' names and types when I create the array from a?

  • Your best bet is to wrap the array (actually 2) and a list of names into a container class and use that. Commented Aug 12, 2011 at 9:24
  • @Keith: do you mean any particular class (I am new in numpy)? Commented Aug 12, 2011 at 9:25
  • No, I mean one you create. Then you delegate operations to your arrays from methods you define in your new class. Also define a __str__ method to pretty-print your arrays with headers. Commented Aug 12, 2011 at 9:28
  • Do you need to have your data in one array? That is, are you going to perform operations on the whole array at once (even though you state you want different datatypes per column), or are you going to perform different operations per column? In the latter case, there is perhaps no reason to put them in one numpy array instead of multiple different arrays with different names. And as per Keith's suggestion you could combine those separate arrays in a class or a named tuple. Commented Aug 12, 2011 at 9:36

2 Answers

24

NumPy structured arrays have named columns:

import numpy as np
    
a = range(100)
A = np.array(list(zip(*[iter(a)] * 2)), dtype=[('C1', 'int32'),('C2', 'int64')])
print(A.dtype)
# [('C1', '<i4'), ('C2', '<i8')]

You can access the columns by name like this:

print(A['C1'])
# [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
#  50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]

Note that using np.array with zip causes NumPy to build an array from a temporary list of tuples. Python lists of tuples use a lot more memory than equivalent NumPy arrays. So if your array is very large you may not want to use zip.

Instead, given a 2D NumPy array A of 64-bit integers, you could use ravel() to flatten A to 1D, then view to reinterpret each pair of values as one record of a structured array, and finally astype to convert the columns to the desired types. Because view reinterprets the underlying bytes, the field sizes in the view dtype must match the array's item size, so the array is created as int64 explicitly here:

a = range(100)
A = np.array(a, dtype='i8').reshape(len(a)//2, 2)
A = A.ravel().view([('col1','i8'),('col2','i8')]).astype([('col1','i4'),('col2','i8')])
print(A[:5])
# [(0, 1) (2, 3) (4, 5) (6, 7) (8, 9)]

print(A.dtype)
# [('col1', '<i4'), ('col2', '<i8')]
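
Another way to avoid the temporary list of tuples, sketched here as one possible alternative rather than the only approach, is to allocate the structured array first and then fill each named column from a slice of the flat data. The names C1 and C2 are taken from the question; the use of np.empty and strided slices is an assumption of this sketch:

```python
import numpy as np

a = np.arange(100)

# Allocate the structured array up front with the target dtype.
A = np.empty(len(a) // 2, dtype=[('C1', 'int32'), ('C2', 'int64')])

# Fill each named column from a strided slice of the flat data;
# NumPy casts the values to each field's type on assignment.
A['C1'] = a[0::2]
A['C2'] = a[1::2]

print(A[:3])
# [(0, 1) (2, 3) (4, 5)]
```

This never builds intermediate Python objects, so the only extra memory is the structured array itself.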

2 Comments

brilliant and not unknown but amazingly not common knowledge which it should be.
11

I know this is an old question, but a more recently available option would be to try using pandas. The DataFrame type is designed for structured data like this, where columns are named and can be of different types.
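
A rough sketch of what that could look like, assuming pandas is installed and reusing the C1/C2 names and types from the question (the variable name df is just for illustration):

```python
import numpy as np
import pandas as pd

a = np.arange(100)

# Reshape the flat data into 50 rows x 2 columns and name the columns.
df = pd.DataFrame(a.reshape(-1, 2), columns=['C1', 'C2'])

# Give each column its own integer type.
df = df.astype({'C1': 'int32', 'C2': 'int64'})

print(df.dtypes)
print(df['C1'].head())
```

Columns are then accessed by name, as with a structured array, for example df['C1'].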
