Python genfromtext multiple datatypes

Question

I would like to read in a csv file using genfromtxt. I have six columns that are float, and one column that is a string.

How do I set the datatype so that the float columns will be read in as floats and the string column will be read in as strings? I tried dtype='void' but that is not working.

Suggestions?

Thanks

.csv file

999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3



import sys
import numpy as np

x = sys.argv[1]
f = open(x, 'r')
y = np.genfromtxt(f, delimiter=',', dtype=[('f0', '<f8'), ('f1', 'S4'),
    ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8')])

ionenergy = y[:,0]
units = y[:,1]

Error:

ionenergy = y[:,0]
IndexError: invalid index

I don't get this error when I specify a single data type.

unutbu · Accepted Answer · 2013-10-27 21:17:42Z

4

dtype=None tells genfromtxt to guess the appropriate dtype.

From the docs:

dtype: dtype, optional

Data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.

(my emphasis.)

Since your data is comma-separated, be sure to include delimiter=','. Otherwise np.genfromtxt splits on whitespace, so every column except the last still contains a trailing comma, and each of those columns is mistakenly assigned a string dtype.

For example:

import numpy as np

arr = np.genfromtxt('data', dtype=None, delimiter=',')

print(arr.dtype)
# [('f0', '<f8'), ('f1', 'S4'), ('f2', '<i4'), ('f3', '<f8'), ('f4', '<f8')]
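To see the effect of the delimiter in isolation, here is a minimal sketch that parses the sample rows from the question with and without delimiter=','. It assumes an in-memory io.StringIO buffer in place of the real file, and the helper guessed_dtype is a hypothetical name introduced only for this illustration.

```python
import io
import warnings

import numpy as np

# Sample rows from the question; StringIO stands in for the real file
# (an assumption made only to keep the example self-contained).
SAMPLE = (
    "999.9, abc, 34, 78, 12.3\n"
    "1.3, ghf, 12, 8.4, 23.7\n"
    "101.7, evf, 89, 2.4, 11.3\n"
)


def guessed_dtype(**kwargs):
    """Return the dtype genfromtxt infers for SAMPLE when dtype=None."""
    with warnings.catch_warnings():
        # Some NumPy versions warn about how guessed strings are stored.
        warnings.simplefilter("ignore")
        arr = np.genfromtxt(io.StringIO(SAMPLE), dtype=None, **kwargs)
    return arr.dtype


with_delim = guessed_dtype(delimiter=",")
without_delim = guessed_dtype()  # splits on whitespace instead

# Kind codes: 'f' = float, 'i' = integer, 'S' or 'U' = byte or unicode string.
print([with_delim[name].kind for name in with_delim.names])
print([without_delim[name].kind for name in without_delim.names])
```

With the delimiter, only the second column should come back as a string. Without it, every column except the last should be reported as a string, because the trailing commas prevent numeric conversion. Whether strings are stored as bytes or unicode depends on the NumPy version.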

The output of print(arr.dtype) lists the name and dtype of each column. For example, ('f2', '<i4') means the third column is named 'f2' and has dtype '<i4'. The i means it is an integer dtype. If you need the third column to be a float dtype, there are a few options.

You could manually edit the data by adding a decimal point in the third column to force genfromtxt to interpret values in that column to be of a float dtype.

You could supply the dtype explicitly in the call to genfromtxt:

arr = np.genfromtxt(
    'data', delimiter=',',
    dtype=[('f0', '<f8'), ('f1', 'S4'), ('f2', '<f4'), ('f3', '<f8'), ('f4', '<f8')])

print(arr)
# [(999.9, ' abc', 34.0, 78.0, 12.3) (1.3, ' ghf', 12.0, 8.4, 23.7)
#  (101.7, ' evf', 89.0, 2.4, 11.3)]
# (exact formatting varies with the NumPy and Python versions)

print(arr['f2'])
# [ 34.  12.  89.]

The error message IndexError: invalid index is being generated by the line

ionenergy = y[:,0]

When you have mixed dtypes, np.genfromtxt returns a structured array. A structured array is one-dimensional (one record per row), so it has no second axis to index, and the syntax for accessing columns differs from the syntax used for plain arrays of homogeneous dtype. It is worth reading up on structured arrays.

Instead of y[:, 0], to access the first column of the structured array y, use

y['f0']
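The difference in indexing can be sketched as follows. This is an illustration under assumptions, not the asker's exact script: io.StringIO stands in for the file, and the dtype is trimmed to the five columns of the posted sample.

```python
import io

import numpy as np

# Sample rows from the question; StringIO stands in for the real file.
SAMPLE = (
    "999.9, abc, 34, 78, 12.3\n"
    "1.3, ghf, 12, 8.4, 23.7\n"
    "101.7, evf, 89, 2.4, 11.3\n"
)

# Five fields to match the five columns of the posted sample.
dtype = [("f0", "<f8"), ("f1", "S4"), ("f2", "<f8"), ("f3", "<f8"), ("f4", "<f8")]
y = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",", dtype=dtype)

# Three records in a 1-D array: there is no column axis.
print(y.shape)

try:
    y[:, 0]  # 2-D style indexing on a 1-D array
except IndexError as exc:
    print("IndexError:", exc)

first_column = y["f0"]         # select one column by field name
first_row = y[0]               # select one record (row) by position
two_columns = y[["f0", "f4"]]  # select several fields at once
print(first_column)
```

The exact wording of the IndexError message depends on the NumPy version, but the cause is the same: the array has one axis and two indices were given.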

Or, better yet, supply the names parameter in np.genfromtxt, so you can use a more relevant column name, like y['ionenergy']:

import numpy as np
arr = np.genfromtxt(
    'data', delimiter=',', dtype=None,
    names=['ionenergy', 'foo', 'bar', 'baz', 'quux'])  # one name per column

print(arr['ionenergy'])
# [ 999.9    1.3  101.7]
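Putting these pieces together for the script in the question, a minimal sketch might look like the following. Several things here are assumptions rather than part of the original answer: io.StringIO stands in for the file named on the command line, the field names other than ionenergy and units are placeholders, the string column is declared as 'U4' (unicode) instead of 'S4', and autostrip=True is added to drop the blank that follows each comma.

```python
import io

import numpy as np

# Sample rows from the question; StringIO stands in for the real file.
SAMPLE = (
    "999.9, abc, 34, 78, 12.3\n"
    "1.3, ghf, 12, 8.4, 23.7\n"
    "101.7, evf, 89, 2.4, 11.3\n"
)

y = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter=",",
    # One (name, type) pair per column; col2..col4 are placeholder names.
    dtype=[
        ("ionenergy", "f8"),
        ("units", "U4"),
        ("col2", "f8"),
        ("col3", "f8"),
        ("col4", "f8"),
    ],
    autostrip=True,  # strip the whitespace around each value
)

# Columns are selected by field name instead of y[:, 0] and y[:, 1].
ionenergy = y["ionenergy"]
units = y["units"]
print(ionenergy)
print(units)
```

Naming the fields in the dtype has the same effect as passing names separately, and it keeps the type of each column explicit so that the integer-looking third column is read as float.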

edited Oct 27, 2013 at 21:17

answered Oct 27, 2013 at 20:15

unutbu


6 Comments

unutbu Over a year ago

What was the resultant dtype? Was there any text in the columns that should be floats? Did you use names or skip_header to deal with the header (if there was one)?

user2483176 Over a year ago

I don't have a header. Using dtype=None, I get an invalid index error when I try to assign each column. Also, there is no text in any column that should contain only floats.

unutbu Over a year ago

Please post a sample of the text you are parsing with genfromtxt.

user2483176 Over a year ago

Okay, I tried specifying each column, but I am still getting an invalid index error.

unutbu Over a year ago

Could you post your code and the full traceback error message?


Joye · Answer · answered 2017-03-16 03:01 (edited by Ken Williams, 2017-03-16 03:06:15Z)

-1

Please try this:

import numpy

ionenergy = y.iloc[:,0]
units = y.iloc[:,1]

edited Mar 16, 2017 at 3:06

Ken Williams


answered Mar 16, 2017 at 3:01

Joye


1 Comment

Ken Williams Over a year ago

Hi @Joye, can you explain what the .iloc construction is doing here, and what that has to do with data types?
