4

I'm just getting started with Python, pandas, and matplotlib, and I'm working with a dataset of over a million entries. I'm trying to change the date format: in the CSV file, dates look like 23-JUN-11. Later I would like to use the dates to plot the donation amounts for each candidate. How can I convert these dates into a format that pandas can work with?

Here is a link to a trimmed version of the file (149 entries).

My code:

%matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

First candidate

reader_bachmann = pd.read_csv('P00000001-ALL.csv' ,converters={'cand_id': lambda x: str(x)[1:]},parse_dates=True, squeeze=True, low_memory=False, nrows=411 )

date_frame = pd.DataFrame(reader_bachmann, columns = ['contb_receipt_dt'])

Data slice

s = date_frame.iloc[:,0]
date_slice = pd.Series([s])
date_strip = date_slice.str.replace('JUN','6')

Trying to convert to new date format

date = pd.to_datetime(s, format='%d%b%Y')
print(date_slice)

Here is the error message

ValueError: could not convert string to float: '05-JUL-11'
6
  • Please show an example of the date - as it is in the csv Commented Apr 20, 2017 at 16:58
  • @GiantsLoveDeathMetal Column name is contb_receipt_dt and date format is 6/20/2011 Commented Apr 20, 2017 at 17:05
  • But when I print the array, pandas shows the date as 23-JUN-11 Commented Apr 20, 2017 at 17:08
  • Can you please post a snippet of the CSV, 151MB kinda big. just a hundred rows is good enough to work on this question yeah? Commented Apr 20, 2017 at 17:20
  • @JimFactor I posted a new link with a smaller version of the file 149 entries. Commented Apr 20, 2017 at 17:33

2 Answers

7

You need to use a different date format string:

format='%d-%b-%y'

Why?

The error message gives a clue as to what is wrong:

ValueError: could not convert string to float: '05-JUL-11'

The format string controls the conversion, and is currently:

format='%d%b%Y'

And the fields needed are:

%y - year without a century (range 00 to 99)
%b - abbreviated month name
%d - day of the month (01 to 31)

What is missing are the hyphens (-) that separate the fields in your date string, and a lowercase y for a two-digit year instead of the current uppercase Y for a four-digit year.
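As a minimal sketch of the corrected call: the sample values below are made up to match the DD-MON-YY layout from the question, and the column name is taken from the question's code. It also assumes an English locale, since %b matches locale-dependent month abbreviations.

```python
import pandas as pd

# Made-up sample values in the same DD-MON-YY layout as the CSV column
raw = pd.Series(["23-JUN-11", "05-JUL-11", "01-AUG-11"], name="contb_receipt_dt")

# %d = day, %b = abbreviated month name, %y = two-digit year;
# the hyphens in the format must match the hyphens in the data
parsed = pd.to_datetime(raw, format="%d-%b-%y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2011-06-23', '2011-07-05', '2011-08-01']
```

The result is a datetime Series, so it can be used for grouping, sorting, or as a plot axis.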

2

As an alternative, you can use dateutil.parser to parse date strings directly. I have created a sample DataFrame for the demo.

l = [] 
for i in range(100):
    l.append('23-JUN-11') 
B = pd.DataFrame({'Date':l})

Now let's import dateutil.parser and apply it to our date column:

import dateutil.parser
B['Date2'] = B['Date'].apply(lambda x : dateutil.parser.parse(x))
B.head()
Out[106]: 
        Date      Date2
0  23-JUN-11 2011-06-23
1  23-JUN-11 2011-06-23
2  23-JUN-11 2011-06-23
3  23-JUN-11 2011-06-23
4  23-JUN-11 2011-06-23
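To check that this approach agrees with an explicit format string, here is a small sketch that parses the same values both ways. The sample dates and the column names inferred and explicit are illustrative only. dateutil guesses the layout one string at a time, whereas pd.to_datetime with a format parses the whole column against a single known layout, which is usually the safer choice for a file with over a million rows.

```python
import dateutil.parser
import pandas as pd

# Illustrative sample; the real data would come from the CSV column
dates = pd.DataFrame({"Date": ["23-JUN-11", "05-JUL-11"]})

# Per-row inference: dateutil works out the layout of each string separately
dates["inferred"] = dates["Date"].apply(dateutil.parser.parse)

# Explicit layout: every string must match day-month-two-digit-year
dates["explicit"] = pd.to_datetime(dates["Date"], format="%d-%b-%y")

# Both columns should hold the same timestamps for these values
same = bool((pd.to_datetime(dates["inferred"]) == dates["explicit"]).all())
print(same)
```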
