
I would be grateful if you could help me with a solution you gave a while back in the question linked below: Converting a list of ints, tuples into an numpy array

As you may recall, you explained a method of converting a tuple to a NumPy array. I'm working on a data mining project, and I found that the fastest way to collect the data is with tuples, but for anything beyond recording input I need a NumPy array. So I looked up your solution and it kind of worked; the problem is with the data types. My data (a list of lists that I convert to tuples) looks like this:

t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

When I try to modify your code like so:

A = np.array([tuple(i) for i in t1],dtype=[('ReportTime',datetime.datetime.__class__),('activity',str.__class__)])

NumPy doesn't recognize the data types. Am I specifying the wrong data types? Thank you for your time.

2 Answers


Since you're working on a data mining project, have you considered using Pandas instead?

Here's an example of how I can convert a list of tuples into a Pandas dataframe. I've included a few common newbie errors I made when I first started out with Pandas, to give you an idea of what you can and cannot do.

In [1]: import pandas as pd

In [2]: data = [(1, 2), (1, 5), (2, 3), (2, 2)]

In [3]: pd.DataFrame(data)
Out[3]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [4]: pd.columns[0] = 'column 1'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-c313e6b0cb87> in <module>()
----> 1 pd.columns[0] = 'column 1'

AttributeError: 'module' object has no attribute 'columns'

In [5]: df = pd.DataFrame(data)

In [6]: df
Out[6]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [7]: df.columns
Out[7]: Int64Index([0, 1], dtype=int64)

In [8]: df.columns[1] = "column 2"
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-8-76ee806aec72> in <module>()
----> 1 df.columns[1] = "column 2"

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.12.0-py2.7-macosx-10.6-intel.egg/pandas/core/index.pyc in __setitem__(self, key, value)
    328 
    329     def __setitem__(self, key, value):
--> 330         raise Exception(str(self.__class__) + ' object is immutable')
    331 
    332     def __getitem__(self, key):

Exception: <class 'pandas.core.index.Int64Index'> object is immutable

In [9]: df.columns = ["column 1", "column 2"]

In [10]: df
Out[10]: 
   column 1  column 2
0         1         2
1         1         5
2         2         3
3         2         2

In [11]: exit()
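
As the session shows, assigning to a single element of df.columns fails because the index is immutable. If you only want to change one label, a minimal sketch (assuming a reasonably recent Pandas; the new label is just for illustration) is to use rename, which returns a new dataframe and leaves the original untouched:

```python
import pandas as pd

data = [(1, 2), (1, 5), (2, 3), (2, 2)]
df = pd.DataFrame(data)

# rename() maps old labels to new ones and returns a new dataframe;
# labels that are not mentioned in the mapping are kept as they are.
renamed = df.rename(columns={1: "column 2"})

print(list(df.columns))       # original labels are unchanged
print(list(renamed.columns))  # only label 1 has been replaced
```

Replacing the whole list with df.columns = [...], as in In [9], remains the simplest option when you want to name every column at once.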

Specifically with your example:

In [1]: import pandas as pd

In [3]: import datetime

In [4]: t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [5]: t1
Out[5]: 
[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
 [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
 [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [6]: df = pd.DataFrame(t1)

In [7]: df
Out[7]: 
                    0       1
0 2013-10-01 20:54:51    last
1 2013-08-01 20:54:51   First
2 2013-09-02 20:54:51  second
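
If you then need NumPy arrays for later steps, here is a rough sketch of one way to go from there. The column names are taken from your dtype attempt; sort_values and to_numpy are assumptions about your Pandas version (to_numpy needs 0.24 or later, and df.values is the older equivalent):

```python
import datetime
import pandas as pd

t1 = [
    [datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
    [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
    [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second'],
]

# Name the columns at construction time instead of assigning them later.
df = pd.DataFrame(t1, columns=['ReportTime', 'activity'])

# Order the rows chronologically.
df = df.sort_values('ReportTime')

# Pull out a single column as a NumPy array of datetime64 values.
times = df['ReportTime'].to_numpy()

print(df)
print(times.dtype)
```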

3 Comments

The project later moves on to signal processing and to working with the SciPy toolset. Does Pandas play nicely with SciPy?
Yep, absolutely. Check it out: pandas.pydata.org. For reference, I'm doing biological science analysis, and I have been using Pandas in IPython HTML notebooks. BioPython, NetworkX etc. all come into play.
Because I deal with biological sequences and their metadata, I do a lot of saving CSV files in order to preserve the intermediate steps of my analysis work. Pandas plays extremely well with CSV files, for example. Also, because Pandas dataframes are essentially built on NumPy arrays, you can always convert a dataframe to a NumPy array by using df.as_matrix() (removed in pandas 1.0; in current versions use df.to_numpy() or df.values).

Don't use .__class__. If you're unsure, just look at what it actually evaluates to:

>>> import datetime
>>> datetime.datetime.__class__
<class 'type'>
>>> str.__class__
<class 'type'>

datetime.datetime and str are already classes, so you can pass them to NumPy directly and let it determine the appropriate dtype for each. NumPy accepts both: str maps to a string dtype, while datetime.datetime has no dedicated dtype and falls back to the generic object dtype.

str.__class__, on the other hand, is the class of the class str (Python classes are objects too). The class of most classes is type, unless they were defined with a custom metaclass.
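
If you want a structured array with typed fields rather than generic objects, here is a minimal sketch of one way to do it. Note that 'datetime64[us]' and 'U10' are my own choices, not something from your code: datetime64[us] is NumPy's own datetime type at microsecond resolution, and 'U10' assumes no activity label is longer than 10 characters.

```python
import datetime
import numpy as np

t1 = [
    [datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
    [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
    [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second'],
]

# Explicit field types (assumed for this sketch): a microsecond-resolution
# datetime64 and a fixed-width unicode string of at most 10 characters.
dt = np.dtype([('ReportTime', 'datetime64[us]'), ('activity', 'U10')])

# Each record must be a tuple for NumPy to treat it as one structured row.
A = np.array([tuple(row) for row in t1], dtype=dt)

# Typed fields make it possible to sort records by a named field.
A_sorted = np.sort(A, order='ReportTime')

print(A.dtype)
print(A_sorted['activity'])
```

Longer labels would be silently truncated to the declared width, so pick it with your real data in mind.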

2 Comments

Also, as a side note, on the rare occasions when you do want to know the type of the str class, it's usually clearer to write type(str) rather than str.__class__.
I see my mistake. I moved on and tried 'np.asarray(sortedHer)' and it did the conversion without any problems. Thanks
