ignoring string while reading into an array

Question

I am trying to load a .csv file into an array. However, the file looks something like this.

"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422
 .................................
 .................................

I am trying to skip the leading string. So far I have simply been discarding the first row:

 a = np.genfromtxt(file, delimiter=',', skip_header=1)

Is there a way to read the file into an array while ignoring the string at the beginning?

Is there just one string in the file, or are strings scattered throughout it?
– mgilson, Commented Jan 7, 2014 at 5:54
@GamesBrainiac: with the csv module, you would need to convert all of the strings to numbers yourself, manually filter out what you don't want (the fields that really are strings rather than numbers), and then convert the result into a numpy array. genfromtxt is meant to handle csv files, although (AFAIK) not ones with "strings" in them.
– mgilson, Commented Jan 7, 2014 at 5:55
Use Pandas: pandas.pydata.org/pandas-docs/stable/generated/…
– cyborg, Commented Jan 7, 2014 at 8:20

CT Zhu · Accepted Answer · 2014-01-07 19:33:15Z

2

Can you just use loadtxt(..., usecols=(1,2,3), ...), which avoids skipping a line at the start of the file?

The usecols argument tells loadtxt which columns to extract; here those are columns 1 to 3, which are all numeric, so the string in column 0 is never parsed.

# Put data into file (in shell, just me copying the sample)
cat >> /tmp/data.csv
"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422

# In IPython
In [1]: import numpy as np

In [2]: a = np.loadtxt('/tmp/data.csv', usecols=(1,2,3), delimiter=',')

In [3]: a
Out[3]: 
array([[ 0.03435345, -1.234556  , -3.        ],
       [ 1.43567896, -1.45322124,  9.543422  ]])

edited Jan 7, 2014 at 19:33

CT Zhu

answered Jan 7, 2014 at 13:11

Chris

1 Comment

CT Zhu Over a year ago

He can do it with np.genfromtxt('temp.csv', delimiter=',', usecols=(1,2,3)) as well.
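
The genfromtxt variant from this comment can be tried without touching the filesystem. What follows is only a minimal sketch: it assumes the two sample rows from the question and feeds them through io.StringIO, which is a choice made here to keep the example self-contained (the original reads 'temp.csv' from disk).

```python
import io
import numpy as np

# The two sample rows from the question, held in memory instead of a file
# (an assumption for illustration; the comment reads 'temp.csv').
sample = io.StringIO(
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

# usecols selects columns 1 to 3 only, so the quoted string in
# column 0 and the extra trailing field in row 1 are never converted.
a = np.genfromtxt(sample, delimiter=',', usecols=(1, 2, 3))
print(a.shape)
```

Because only the listed columns are converted, the rows do not need to have the same total number of fields, as long as each row has the requested columns.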

mgilson · Accepted Answer · 2014-01-07 09:56:53Z

0

Since the string only appears at the start of the first line, you could write a helper generator that strips it before genfromtxt sees the data:

import numpy as np

def helper(filename):
    with open(filename) as fin:
        # Drop the leading string field from the first line only.
        # This could be made more robust, e.g. by type-checking the field.
        line = next(fin).split(',')
        yield ','.join(line[1:])
        # Pass the remaining lines through unchanged.
        for line in fin:
            yield line

arr = np.genfromtxt(helper('myfile.csv'), delimiter=',')

edited Jan 7, 2014 at 9:56

answered Jan 7, 2014 at 6:00

mgilson

4 Comments

Ada Xu Over a year ago

I get a nan in the second row. That's expected, because the first row has more elements. However, if I remove the extra element from the first row, genfromtxt raises an exception: line 2 got 4 columns instead of 3. Why?

mgilson Over a year ago

@AdaXu -- Not sure. I suppose I'd need to be able to reproduce the problem that you're having, but I don't know if I can do that with the data you've shown.

Ada Xu Over a year ago

I just deleted 45671234 from the first row in the above example. I am guessing that in the rows below the string "myfilename", the empty first position is filled in with nan. Not sure, I am new to Python.

CT Zhu Over a year ago

@AdaXu, your second row starts with a comma, which means it has 4 columns, just like the first row. The helper() above removes the 1st column from the 1st row only, so that row now has 3 columns. To make this approach work, you need to remove the leading comma from the other rows as well. IMO, usecols=(1,2,3) will do just fine.
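
One way to act on this comment is to strip the first field from every row rather than from the first row only. The sketch below is not the answerer's code: drop_first_field is a hypothetical helper introduced here, and the in-memory sample assumes the edited data described in the comments (the first row with 45671234 already deleted).

```python
import io
import numpy as np

def drop_first_field(lines, delimiter=','):
    """Yield each line with its first field and first delimiter removed.

    Hypothetical helper for illustration, not part of numpy.
    """
    for line in lines:
        # partition() splits on the first delimiter only.
        _, _, rest = line.partition(delimiter)
        yield rest

# Assumed sample: the question's rows after deleting 45671234 from row 1.
sample = io.StringIO(
    '"myfilename",0.034353453,-1.234556,-3\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

# Every row now has exactly 3 numeric fields, so the column counts agree.
arr = np.genfromtxt(drop_first_field(sample), delimiter=',')
print(arr.shape)
```

This keeps the generator idea from the answer while giving every row the same number of fields, which is the condition genfromtxt was complaining about.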
