
I have a CSV file that always starts with the same two columns, but the number of columns after them varies between files. The CSV can look like this:

Gondi,4012,227,233,157,158,149,158
Gondi,4013,227,231,156,159,145,153
Gondu,4014,228,233,157,158,145,153
Gondu,4015,227,231,156,159,149,158

For now I am working with NumPy, and my code for loading this data is:

import numpy as np

def readfile(fname):
    # Count the columns from the first line of the file
    with open(fname) as f:
        ncols = len(f.readline().split(','))
    # Each loadtxt call below reads the whole file again
    name = np.loadtxt(fname, delimiter=',', usecols=[0], dtype=str)
    ind = np.loadtxt(fname, delimiter=',', usecols=[1], dtype=int)
    data = np.loadtxt(fname, delimiter=',', usecols=range(2, ncols), dtype=int)
    return data, name, ind
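As the comment in that function notes, the file is read several times. A minimal NumPy-only sketch that reads it once, assuming every row has the same number of fields and a NumPy version where `loadtxt(..., dtype=str)` returns plain strings: load everything as text, then convert the numeric slices with `astype(int)`. The function name `readfile_once` is hypothetical.

```python
import io
import numpy as np

def readfile_once(fname):
    # Hypothetical helper: read the whole file once, keeping every field as a string.
    # ndmin=2 keeps a 2-D shape even if the file contains a single row.
    raw = np.loadtxt(fname, delimiter=',', dtype=str, ndmin=2)
    name = raw[:, 0]               # first column: labels such as "Gondi"
    ind = raw[:, 1].astype(int)    # second column: integer IDs
    data = raw[:, 2:].astype(int)  # all remaining columns, however many there are
    return data, name, ind

# Usage with in-memory sample data; pass a file name instead in practice.
sample = "Gondi,4012,227,233,157,158,149,158\nGondu,4014,228,233,157,158,145,153\n"
data, name, ind = readfile_once(io.StringIO(sample))
print(name, ind)
print(data)
```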

Can I do the same thing with pandas more efficiently?


I think you can use read_csv and then iloc to select the first column, the second column and all remaining columns:

import pandas as pd
import io

temp=u"""Gondi,4012,227,233,157,158,149,158
Gondi,4013,227,231,156,159,145,153
Gondu,4014,228,233,157,158,145,153
Gondu,4015,227,231,156,159,149,158"""
# after testing, replace io.StringIO(temp) with the file name
# header=None because the first row is data, not column names
df = pd.read_csv(io.StringIO(temp), header=None)
print(df)

name = df.iloc[:,0]
print(name)
0    Gondi
1    Gondi
2    Gondu
3    Gondu
Name: 0, dtype: object

ind = df.iloc[:,1]
print(ind)
0    4012
1    4013
2    4014
3    4015
Name: 1, dtype: int64

data = df.iloc[:,2:]
print(data)
     2    3    4    5    6    7
0  227  233  157  158  149  158
1  227  231  156  159  145  153
2  228  233  157  158  145  153
3  227  231  156  159  149  158
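To keep the same return values as the question's readfile, the three selections can be wrapped in a function and converted to NumPy arrays with `to_numpy()` (pandas 0.24 and later; older versions use `.values`). This is a minimal sketch, assuming the file has no header row; the function name `readfile_pandas` is hypothetical. Unlike the NumPy version, read_csv parses the file only once.

```python
import io
import pandas as pd

def readfile_pandas(fname):
    # Hypothetical helper: one pass over the file; header=None because row 0 is data.
    df = pd.read_csv(fname, header=None)
    name = df.iloc[:, 0].to_numpy()   # first column as an object array of strings
    ind = df.iloc[:, 1].to_numpy()    # second column as int64
    data = df.iloc[:, 2:].to_numpy()  # every remaining column, whatever the count
    return data, name, ind

temp = """Gondi,4012,227,233,157,158,149,158
Gondi,4013,227,231,156,159,145,153
Gondu,4014,228,233,157,158,145,153
Gondu,4015,227,231,156,159,149,158"""
# Replace io.StringIO(temp) with the file name for real data.
data, name, ind = readfile_pandas(io.StringIO(temp))
print(data.shape)
```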