
I am trying to load a .csv file into an array. However, the file looks something like this.

"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422
 .................................
 .................................

I am trying to skip the leading string. I've been doing away with the first row till now.

 a = np.genfromtxt(file, delimiter=',', skip_header=1)

But I was wondering whether there's a way to read the file into an array while ignoring the string at the beginning.

5
  • why not just use the csv module? Commented Jan 7, 2014 at 5:51
  • Is there just 1 string in the file? Or are there strings randomly dispersed throughout? Commented Jan 7, 2014 at 5:54
  • @GamesBrainiac -- with csv, you'd need to convert all of the strings to numbers yourself, manually filter out the stuff you don't want (the fields that are really strings and not numbers) and then convert the entire thing into a numpy array. genfromtxt is meant to handle csv files, although (AFAIK) not ones with "strings" in them. Commented Jan 7, 2014 at 5:55
  • @mgilson 1 string at the beginning Commented Jan 7, 2014 at 5:57
  • Use Pandas : pandas.pydata.org/pandas-docs/stable/generated/… Commented Jan 7, 2014 at 8:20

2 Answers

2

Can you just use loadtxt(..., usecols=(1,2,3), ...), which avoids skipping a line at the start of the file?

The usecols argument tells loadtxt which columns to extract; here those are the numeric columns 1 to 3, so the string in column 0 is never parsed.

# Put data into file (in shell, just me copying the sample)
cat >> /tmp/data.csv
"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422

# In IPython
In [1]: import numpy as np

In [2]: a = np.loadtxt('/tmp/data.csv', usecols=(1,2,3), delimiter=',')

In [3]: a
Out[3]: 
array([[ 0.03435345, -1.234556  , -3.        ],
       [ 1.43567896, -1.45322124,  9.543422  ]])

1 Comment

He can do it with np.genfromtxt('temp.csv', delimiter=',', usecols=(1,2,3)) as well.
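
For reference, a minimal sketch of that genfromtxt variant. It assumes the two sample rows from the question and feeds them from an in-memory io.StringIO buffer instead of a real file such as temp.csv; the name sample is only for illustration.

```python
import io

import numpy as np

# The two sample rows from the question, held in memory instead of a file.
sample = io.StringIO(
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

# usecols picks columns 1..3 on every row, so the quoted string in column 0
# and the extra fifth value on the first row are never converted.
a = np.genfromtxt(sample, delimiter=",", usecols=(1, 2, 3))
print(a)
```

Because only the selected columns are converted, the rows do not need to have the same total number of fields, as long as each row has at least four.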
0

Since the string only appears on the first line of the file, you could write a helper generator that removes it:

import numpy as np

def helper(filename):
    with open(filename) as fin:
        # this could get more robust ... e.g. by doing typechecking if necessary.
        line = next(fin).split(',')
        yield ','.join(line[1:])
        for line in fin:
            yield line

arr = np.genfromtxt(helper('myfile.csv'), delimiter=',')

4 Comments

I get a nan in the second row. That's obvious because the first row has more elements. However, if I get rid of the extra element from the first row, genfromtxt raises an exception: line 2 got 4 columns instead of 3. Why?
@AdaXu -- Not sure. I suppose I'd need to be able to reproduce the problem that you're having, but I don't know if I can do that with the data you've shown.
I just deleted 45671234 from the first row in the above example. I am guessing the rows below the string "myfilename" get nan substituted in the first position. Not sure; I am new to Python.
@AdaXu, your second row starts with a comma, which means it has 4 columns, just like the first row. The helper() above removes the 1st column from the 1st row only, leaving that row with 3 columns. To get things working this way, you need to remove the extra leading comma from the other rows as well. IMO, usecols=(1,2,3) will be just fine.
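
One way to act on that last comment is to drop the first field from every row, not just the first one. The sketch below is only an illustration under a few assumptions: drop_first_field and its keep parameter are hypothetical names, the split on ',' is naive (it would break if the quoted name itself contained a comma), and each row is trimmed to three numeric fields so all rows end up the same width.

```python
import io


def drop_first_field(lines, keep=3):
    """Yield each CSV line without its first field, trimmed to `keep` fields."""
    for line in lines:
        fields = line.rstrip("\n").split(",")
        # fields[0] is the quoted name on the first row and empty on later rows.
        yield ",".join(field.strip() for field in fields[1:1 + keep])


# The two sample rows from the question, held in memory instead of a file.
sample = io.StringIO(
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)
cleaned = list(drop_first_field(sample))
print(cleaned)
```

Every cleaned line now has exactly three fields, so the generator (or the cleaned list) could be passed to np.genfromtxt with delimiter=',' in place of helper() without the column-count mismatch.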
