numpy doesn't recognize data types in conversion

Question

I would be grateful if you could help me with a solution you gave a while back in the link below: Converting a list of ints, tuples into an numpy array

As you may recall, you explained a method for converting tuples to a numpy array. I'm working on a data-mining project and found that the fastest way to collect the data is with tuples, but for anything beyond recording input I need a numpy array. So I looked up your solution and it kind of worked; the problem is with the data types. I have a list that looks like this:

t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

and when I try to modify your code like so

A = np.array([tuple(i) for i in t1],dtype=[('ReportTime',datetime.datetime.__class__),('activity',str.__class__)])

numpy doesn't recognize the data types. Am I passing the wrong data types? Thank you for your time.

ericmjl · Accepted Answer · 2013-12-30 20:31:06Z

3

Since you're working on a data-mining project, have you considered using Pandas instead?

Here's an example of how I can convert a list of tuples into a Pandas dataframe. I've highlighted a few common newbie errors I made when I first started out with Pandas, to give you an idea of what you can do and cannot do.

In [1]: import pandas as pd

In [2]: data = [(1, 2), (1, 5), (2, 3), (2, 2)]

In [3]: pd.datafr                         

In [3]: pd.DataFrame(data)
Out[3]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [4]: pd.columns[0] = 'column 1'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-c313e6b0cb87> in <module>()
----> 1 pd.columns[0] = 'column 1'

AttributeError: 'module' object has no attribute 'columns'

In [5]: df = pd.DataFrame(data)

In [6]: df
Out[6]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [7]: df.columns
Out[7]: Int64Index([0, 1], dtype=int64)

In [8]: df.columns[1] = "column 2"
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-8-76ee806aec72> in <module>()
----> 1 df.columns[1] = "column 2"

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.12.0-py2.7-macosx-10.6-intel.egg/pandas/core/index.pyc in __setitem__(self, key, value)
    328 
    329     def __setitem__(self, key, value):
--> 330         raise Exception(str(self.__class__) + ' object is immutable')
    331 
    332     def __getitem__(self, key):

Exception: <class 'pandas.core.index.Int64Index'> object is immutable

In [9]: df.columns = ["column 1", "column 2"]

In [10]: df
Out[10]: 
   column 1  column 2
0         1         2
1         1         5
2         2         3
3         2         2

In [11]: exit()

Specifically with your example:

In [1]: import pandas as pd

In [3]: import datetime

In [4]: t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [5]: t1
Out[5]: 
[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
 [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
 [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [6]: df = pd.DataFrame(t1)

In [7]: df
Out[7]: 
                    0       1
0 2013-10-01 20:54:51    last
1 2013-08-01 20:54:51   First
2 2013-09-02 20:54:51  second
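As a follow-up sketch, the columns can be named when the dataframe is built, which avoids the renaming step shown earlier. The column names below are borrowed from the dtype fields in the question and are otherwise an assumption; `sort_values` requires a reasonably recent pandas release.

```python
import datetime

import pandas as pd

t1 = [
    [datetime.datetime(2013, 10, 1, 20, 54, 51), "last"],
    [datetime.datetime(2013, 8, 1, 20, 54, 51), "First"],
    [datetime.datetime(2013, 9, 2, 20, 54, 51), "second"],
]

# Column names are illustrative; they mirror the field names in the question.
df = pd.DataFrame(t1, columns=["ReportTime", "activity"])

# Pandas infers a datetime dtype for the first column on its own.
print(df.dtypes)

# Sort the rows chronologically by the timestamp column.
df_sorted = df.sort_values("ReportTime")
print(df_sorted)
```

Because the timestamp column is recognized as a datetime type, chronological sorting and filtering work without any manual dtype declaration.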



3 Comments

omryjs Over a year ago

The project later moves on to signal processing and working with the scipy toolset. Does Pandas play nice with scipy?

ericmjl Over a year ago

Yep, absolutely. Check it out: pandas.pydata.org. For reference, I'm doing biological science analysis, and I have been using Pandas in IPython HTML notebooks. BioPython, NetworkX etc. all come into play.

ericmjl Over a year ago

Because I deal with biological sequences and their metadata, I save a lot of CSV files to preserve the intermediate steps of my analysis work. Pandas plays extremely well with CSV files, for example. Also, because Pandas dataframes are essentially built on numpy arrays, you can always convert a dataframe to a numpy array by using df.as_matrix(). (Note that as_matrix() was deprecated and later removed in newer pandas releases; df.values or df.to_numpy() serves the same purpose there.)
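A minimal sketch of the dataframe-to-numpy conversion mentioned in this comment, assuming a pandas version that provides `DataFrame.to_numpy()` (it replaces `as_matrix()` in newer releases). The column names are illustrative assumptions, and `to_records` is shown as one way to get named fields, not as the commenter's own method.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    [
        [datetime.datetime(2013, 10, 1, 20, 54, 51), "last"],
        [datetime.datetime(2013, 8, 1, 20, 54, 51), "First"],
        [datetime.datetime(2013, 9, 2, 20, 54, 51), "second"],
    ],
    columns=["ReportTime", "activity"],  # hypothetical column names
)

# Plain 2-D array; mixed column types fall back to the generic object dtype.
arr = df.to_numpy()
print(arr.shape, arr.dtype)

# Record array with one named field per column (index dropped).
rec = df.to_records(index=False)
print(rec.dtype.names)
print(rec["activity"])
```

The record array is closer to what the question asks for, since each column keeps its own name and type instead of being merged into a single object array.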

Iguananaut · Accepted Answer · 2013-12-30 20:24:01Z

1

Don't use .__class__? If you're unsure, just look at what that actually does:

>>> import datetime
>>> datetime.datetime.__class__
<class 'type'>
>>> str.__class__
<class 'type'>

datetime.datetime and str are already classes, so you can pass them to Numpy directly and let it determine the appropriate dtype for each class (provided it has a dtype associated with that class, which should be the case for datetime.datetime and for str).

str.__class__, on the other hand, is the class of the class str (Python classes are objects too). The class of most classes is type, unless the class was defined with a custom metaclass.
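For illustration, here is a minimal sketch of the structured array from the question built with explicit dtype strings rather than bare Python classes. The `datetime64[us]` unit and the `U10` string width are assumptions chosen to fit this sample data; they are not something the question or this answer specifies, and an `object` field would be an alternative when string lengths are unknown.

```python
import datetime

import numpy as np

t1 = [
    [datetime.datetime(2013, 10, 1, 20, 54, 51), "last"],
    [datetime.datetime(2013, 8, 1, 20, 54, 51), "First"],
    [datetime.datetime(2013, 9, 2, 20, 54, 51), "second"],
]

# Assumed field types: microsecond-resolution timestamps and strings of
# up to 10 characters. Longer strings would be silently truncated.
dtype = [("ReportTime", "datetime64[us]"), ("activity", "U10")]

# Each row must be a tuple for numpy to treat it as one record.
A = np.array([tuple(row) for row in t1], dtype=dtype)

print(A.dtype)
print(A["ReportTime"])
print(A["activity"])
```

Spelling out the width matters for the string field: a bare `str` carries no length, so an explicit width such as `U10` (or an `object` field) is the safer choice in a structured dtype.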



2 Comments

abarnert Over a year ago

Also, as a side note, on the rare occasions when you do want to know the type of the str class, it's usually clearer to write type(str) rather than str.__class__.

omryjs Over a year ago

I see my mistake. I moved on and tried 'np.asarray(sortedHer)' and it did the conversion without any problems. Thanks
