I am trying to open this dataset: https://www.kaggle.com/dalpozz/creditcardfraud

I am using an IPython notebook. I tried:

data = pd.read_csv("...Desktop/creditcard.csv")

And got:

CParserError: Error tokenizing data. C error: out of memory.

Then I tried the solution suggested by Noobie in the question "Error tokenizing data. C error: out of memory pandas python, large file csv", which reads the file in chunks.

Now the data loads. However, the result looks like a two-column matrix:

entry 0,0: blank;
entry 0,1: All the headers are here;
entry 1,0: 0
entry 1,1: A whole line of unseparated data here
entry 2,0: 1
entry 2,1: A whole line of unseparated data here
...

What can I do to properly format the data?

My implementation:

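import pandas as pd

mylist = []

# Read the file 2000 rows at a time to limit peak memory use
for chunk in pd.read_csv('.../Desktop/creditcard.csv', sep=',', chunksize=2000):
    mylist.append(chunk)

# Stack the chunks row-wise into a single DataFrame
data = pd.concat(mylist, axis=0)
del mylist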

The first few lines of the data:
1st line: Time,"V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount","Class"
2nd line:
0,-1.3598071336738,-0.0727811733098497,2.53634673796914,1.37815522427443,-0.338320769942518,0.462387777762292,0.239598554061257,0.0986979012610507,0.363786969611213,0.0907941719789316,-0.551599533260813,-0.617800855762348,-0.991389847235408,-0.311169353699879,1.46817697209427,-0.470400525259478,0.207971241929242,0.0257905801985591,0.403992960255733,0.251412098239705,-0.018306777944153,0.277837575558899,-0.110473910188767,0.0669280749146731,0.128539358273528,-0.189114843888824,0.133558376740387,-0.0210530534538215,149.62,"0"
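For comparison, here is a minimal sketch that runs the same chunked read on an in-memory stand-in for the file. The stand-in is shortened to five columns and its second data row is made up; only the quoting pattern (unquoted first header, quoted remaining headers, quoted last value) follows the sample above. The helper name `load_in_chunks` is hypothetical. If text with this layout parses into separate columns, the file on disk may not match the sample shown here, though that is only a guess.

```python
import io

import pandas as pd

# Shortened stand-in for creditcard.csv (assumption: same quoting pattern as
# the sample in the post; the second data row is invented for illustration).
sample = (
    'Time,"V1","V2","Amount","Class"\n'
    '0,-1.3598071336738,-0.0727811733098497,149.62,"0"\n'
    '1,0.5,-0.25,10.0,"1"\n'
)


def load_in_chunks(source, chunksize=2000):
    """Read a CSV in chunks and stack the chunks into one DataFrame."""
    reader = pd.read_csv(source, sep=',', chunksize=chunksize)
    return pd.concat(list(reader), axis=0)


# chunksize=1 forces several chunks even on this tiny input
data = load_in_chunks(io.StringIO(sample), chunksize=1)
print(data.shape)
print(list(data.columns))
```

With the real file, `source` would be the path to the CSV instead of the `io.StringIO` object.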

  • What's the separator for the csv? If I understand the sample you provided, it's not splitting the data correctly. Specify sep in pd.read_csv. Commented Mar 17, 2017 at 12:42
  • Please edit your post to include a snippet of your data, since Kaggle requires an account to download the CSV file, and post the actual solution you tried, as we need to see your implementation. Commented Mar 17, 2017 at 12:50
  • Hi, added my implementation. Can't seem to understand how to upload an image here... Commented Mar 17, 2017 at 13:01
  • No image needed. Just copy and paste the first few lines of the data. Commented Mar 17, 2017 at 13:23
  • What is ...Desktop? What operating system are you using? Commented Mar 17, 2017 at 13:37
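On the separator question raised in the comments, one quick check is to look at the raw text of the first lines and count candidate delimiters before involving pandas at all. The sketch below is only an illustration: the helper names `peek_raw_lines` and `count_candidates` are hypothetical, and the in-memory text is a shortened stand-in for the file, not the file itself.

```python
import io
from itertools import islice


def peek_raw_lines(handle, n=2):
    """Return the first n lines as stored, minus the trailing newline."""
    return [line.rstrip('\n') for line in islice(handle, n)]


def count_candidates(line, candidates=(',', ';', '\t')):
    """Count how often each candidate separator occurs in one line."""
    return {sep: line.count(sep) for sep in candidates}


# Shortened stand-in for the first two lines of the CSV (assumption)
handle = io.StringIO(
    'Time,"V1","V2","Amount","Class"\n'
    '0,-1.3598071336738,-0.0727811733098497,149.62,"0"\n'
)

lines = peek_raw_lines(handle)
for line in lines:
    # repr() exposes stray quotes or odd whitespace around a line
    print(repr(line), count_candidates(line))
```

With the real file, `handle` would come from `open()` on the CSV path. A line that turns out to be wrapped in one extra pair of quotes, or a separator other than the one passed as `sep`, would be consistent with everything landing in a single column.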
