I have a Twitter data set that I am trying to analyse using pandas, but I can't figure out how to convert strings such as "2 days", "24 hours", "2 months" or "5 years" into datetime format.

I used the following code:

for s in df_merge['last_tweet']:
    n = int(s.split(" ")[0])   # numeric part, e.g. 2
    d = s.split(" ")[1]        # unit part, e.g. "days"
    if d in ["years", "year"]:
        n_days = n * 365
    elif d in ["months", "month"]:
        n_days = n * 30

1 Answer

You may want to write a helper function:

import numpy as np
import pandas as pd

def ym2nptimedelta(delta):
    delta_cfg = {
        'month': 'M',
        'months': 'M',
        'year': 'Y',
        'years': 'Y'
    }
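    n, item = delta.lower().split()
    # int(n): the count arrives as a string. Casting to seconds turns the
    # year/month delta into a fixed (average-length) duration that a
    # pandas Timestamp can subtract.
    return np.timedelta64(int(n), delta_cfg[item]).astype('timedelta64[s]')

# pd.datetime is not available in current pandas; pd.Timestamp is used instead
print(pd.Timestamp.today() - pd.Timedelta('2 days'))
print(pd.Timestamp.today() - pd.Timedelta('24 hours'))
print(pd.Timestamp.now() - ym2nptimedelta('2 years'))
print(pd.Timestamp.now() - ym2nptimedelta('5 years'))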

Output:

2016-03-08 20:39:34.315969
2016-03-09 20:39:34.315969
2014-03-11 09:01:10.316969
2011-03-11 15:33:34.317969
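
Note that the year-based results above land at a different time of day than the current time, because a numpy year is an average-length duration rather than a calendar year. If calendar arithmetic is preferred, one possible alternative is `pd.DateOffset`. The sketch below is not part of the original answer: the names `OFFSET_KW` and `delta_to_date` and the `now` parameter are made up for illustration, and only a handful of units are covered.

```python
import pandas as pd

# Hypothetical mapping (not from the answer above): unit word -> pd.DateOffset keyword
OFFSET_KW = {
    'year': 'years', 'years': 'years',
    'month': 'months', 'months': 'months',
    'week': 'weeks', 'weeks': 'weeks',
    'day': 'days', 'days': 'days',
    'hour': 'hours', 'hours': 'hours',
}

def delta_to_date(delta, now=None):
    """Subtract a string such as '2 years' from `now` using calendar arithmetic."""
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    n, unit = delta.lower().split()
    return now - pd.DateOffset(**{OFFSET_KW[unit]: int(n)})

# A fixed reference time keeps the example reproducible
ref = pd.Timestamp('2016-03-10 20:39:34')
print(delta_to_date('2 years', now=ref))   # 2014-03-10 20:39:34
print(delta_to_date('1 month', now=ref))   # 2016-02-10 20:39:34
```

With this approach "2 years ago" keeps the same day of month and time of day, which may or may not be what the analysis needs.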

UPDATE1 (this helper function will take care of all acceptable numpy time-deltas):

import numpy as np
import pandas as pd

def deltastr2date(delta):
    delta_cfg = {
        'year': 'Y',
        'years': 'Y',
        'month': 'M',
        'months': 'M',
        'week': 'W',
        'weeks': 'W',
        'day': 'D',
        'days': 'D',
        'hour': 'h',
        'hours': 'h',
        'min': 'm',
        'minute': 'm',
        'minutes': 'm',
        'sec': 's',
        'second': 's',
        'seconds': 's',
    }
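    n, item = delta.lower().split()
    # int(n): the count arrives as a string; the cast to seconds makes
    # year/month deltas fixed (average-length) durations
    td = np.timedelta64(int(n), delta_cfg[item]).astype('timedelta64[s]')
    # pd.datetime is not available in current pandas; use pd.Timestamp
    return pd.Timestamp.now() - td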

print(deltastr2date('2 days'))
print(deltastr2date('24 hours'))
print(deltastr2date('2 years'))
print(deltastr2date('5 years'))
print(deltastr2date('1 week'))
print(deltastr2date('10 hours'))
print(deltastr2date('45 minutes'))

OUTPUT:

2016-03-08 20:50:01.701853
2016-03-09 20:50:01.702853
2014-03-11 09:11:37.702853
2011-03-11 15:44:01.703853
2016-03-03 20:50:01.704854
2016-03-10 10:50:01.705854
2016-03-10 20:05:01.705854

UPDATE2 (showing how to apply the helper function to the DF column):

import numpy as np
import pandas as pd

def deltastr2date(delta):
    delta_cfg = {
        'year': 'Y',
        'years': 'Y',
        'month': 'M',
        'months': 'M',
        'week': 'W',
        'weeks': 'W',
        'day': 'D',
        'days': 'D',
        'hour': 'h',
        'hours': 'h',
        'min': 'm',
        'minute': 'm',
        'minutes': 'm',
        'sec': 's',
        'second': 's',
        'seconds': 's',
    }
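    n, item = delta.lower().split()
    # int(n): the count arrives as a string; the cast to seconds makes
    # year/month deltas fixed (average-length) durations
    td = np.timedelta64(int(n), delta_cfg[item]).astype('timedelta64[s]')
    # pd.datetime is not available in current pandas; use pd.Timestamp
    return pd.Timestamp.now() - td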

N = 20

dt_units = ['seconds','minutes','hours','days','weeks','months','years']

# generate random list of deltas
deltas = ['{0[0]} {0[1]}'.format(tup) for tup in zip(np.random.randint(1,5,N), np.random.choice(dt_units, N))]

df = pd.DataFrame({'delta': pd.Series(deltas)})

# add new column 
df['last_tweet_dt'] = df['delta'].apply(deltastr2date)
print(df)

OUTPUT:

        delta              last_tweet_dt
0     3 hours 2016-03-10 20:32:49.252525
1      4 days 2016-03-06 23:32:49.252525
2   3 seconds 2016-03-10 23:32:46.253525
3     1 weeks 2016-03-03 23:32:49.253525
4   1 minutes 2016-03-10 23:31:49.253525
5   2 minutes 2016-03-10 23:30:49.253525
6      4 days 2016-03-06 23:32:49.254525
7     1 years 2015-03-11 17:43:37.254525
8   2 seconds 2016-03-10 23:32:47.254525
9   3 minutes 2016-03-10 23:29:49.254525
10    1 hours 2016-03-10 22:32:49.255525
11  2 seconds 2016-03-10 23:32:47.255525
12  3 minutes 2016-03-10 23:29:49.255525
13   3 months 2015-12-10 16:05:31.255525
14    4 weeks 2016-02-11 23:32:49.256526
15   3 months 2015-12-10 16:05:31.256526
16    4 hours 2016-03-10 19:32:49.256526
17    1 years 2015-03-11 17:43:37.256526
18    2 years 2014-03-11 11:54:25.257526
19  1 minutes 2016-03-10 23:31:49.257526
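
The helper expects a single string, so passing a whole column to it, or a column containing NaN, will fail. A minimal sketch of one way around this is `Series.map` with `na_action='ignore'`, which skips missing values. The column name `last_tweet` comes from the question. The `to_date` function is a simplified stand-in for `deltastr2date` that only handles units `pd.Timedelta` parses directly, and the fixed `now` value is an assumption made to keep the result reproducible.

```python
import numpy as np
import pandas as pd

# Fixed reference time so the example is reproducible (assumption for illustration)
now = pd.Timestamp('2016-03-10 23:32:49')

def to_date(delta):
    # Simplified stand-in for deltastr2date: works for days/hours/minutes etc.,
    # but not for months or years, which pd.Timedelta does not accept
    return now - pd.Timedelta(delta)

df_merge = pd.DataFrame({'last_tweet': ['4 days', np.nan, '1 day', '24 hours']})

# na_action='ignore' leaves NaN untouched instead of passing it to to_date;
# pd.to_datetime then turns the remaining NaN into NaT
df_merge['last_tweet_dt'] = pd.to_datetime(
    df_merge['last_tweet'].map(to_date, na_action='ignore')
)
print(df_merge)
```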

4 Comments

Thanks! I am very new to Python and I am having trouble applying this function to a column of the dataset. I tried this code: date = df_merge['last_tweet']; new_tweet = deltastr2date(date); print(new_tweet)
Please post sample input data, the expected output data, and the full error traceback.
#Sample input
|**last_tweet**|
|-----------------|
|4 days |
|NaN |
|1 day |
|2 days |
|24 hours |
|1 month |
#Sample output
|**last_tweet**|
|-----------------|
|4 |
|NaN |
|1 |
|2 |
|24 |
|1 |
@Sil, I've updated my answer. Please feel free to accept the answer if it was helpful.
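
The sample output in the comments keeps only the leading number of each entry. If that is the goal, a short sketch using `Series.str.extract` could look like the following. The group names `n` and `unit` are made up for this example, and the data is the sample input from the comment.

```python
import numpy as np
import pandas as pd

df_merge = pd.DataFrame(
    {'last_tweet': ['4 days', np.nan, '1 day', '2 days', '24 hours', '1 month']}
)

# Split each entry into its number and its unit; rows with NaN stay NaN
parts = df_merge['last_tweet'].str.extract(r'(?P<n>\d+)\s+(?P<unit>\w+)')
df_merge['n'] = pd.to_numeric(parts['n'])
df_merge['unit'] = parts['unit']
print(df_merge)
```

Keeping the unit in its own column makes it possible to convert the counts to a common scale later.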
