
I have duration strings written as '48m 37s' and sometimes as '1h 38m 29s', stored in a column of a pandas DataFrame.

I'm trying to convert the duration column to datetime as follows:

pd.to_datetime(usg['duration'], format='%Hh %Mm %Ss')

but it fails with the following error:

ValueError: time data '1h 38m 29s' does not match format '%Mm %Ss' (match)

I understand that the hour part is sometimes missing from entries in the duration column, and I'm wondering if there is a way to specify multiple formats, so that if one fails to match, another one is tried.

Doing so should yield output like the following, preserving the order of entries in the column:

     00:39:40
     01:38:29
     07:39:40
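On the narrower question of trying several formats in turn: a minimal sketch, assuming the sample strings below (illustrative, not the OP's real data), parses the column once per format with errors='coerce', so non-matching rows become NaT instead of raising, and then fills the NaT rows from the second attempt with fillna. The final strftime call is only there to display the results as HH:MM:SS.

```python
import pandas as pd

# Illustrative sample data (assumption): mixes entries with and without hours
usg = pd.DataFrame({'duration': ['48m 37s', '1h 38m 29s', '7h 39m 40s']})

# First attempt: full format; rows that don't match become NaT instead of raising
with_hours = pd.to_datetime(usg['duration'], format='%Hh %Mm %Ss', errors='coerce')
# Second attempt: hour-less format, again coercing failures to NaT
without_hours = pd.to_datetime(usg['duration'], format='%Mm %Ss', errors='coerce')

# Fill each NaT from the first attempt with the second attempt's value;
# both Series share the original index, so rows stay in their original order
parsed = with_hours.fillna(without_hours)

# Display only the clock part (the parser adds a placeholder date 1900-01-01)
print(parsed.dt.strftime('%H:%M:%S'))
```

Because both parsed Series share the original index, fillna aligns row by row and the original order is kept. More formats can be chained the same way with additional fillna calls.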

2 Answers


A better option is pd.to_timedelta(usg['duration']), since these strings are durations rather than points in time:

import pandas as pd

usg = pd.DataFrame({'duration': ['48m 37s', '1h 38m 29s']})

pd.to_timedelta(usg['duration'])

gives the output:

0   00:48:37
1   01:38:29
Name: duration, dtype: timedelta64[ns]
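If the HH:MM:SS strings from the question are needed rather than a timedelta64 column, one possible sketch (an assumption about the wanted output, not part of the answer above) splits each value with Series.dt.components and formats the parts; whole days are folded into the hour count so durations of 24 hours or more are not wrapped.

```python
import pandas as pd

usg = pd.DataFrame({'duration': ['48m 37s', '1h 38m 29s']})
td = pd.to_timedelta(usg['duration'])

# Split each timedelta into days, hours, minutes, seconds, ...
parts = td.dt.components

# Fold whole days into the hour count so 24h+ durations are not truncated
hours = parts['days'] * 24 + parts['hours']

# Build zero-padded HH:MM:SS strings in the original row order
formatted = [
    f"{int(h):02d}:{int(m):02d}:{int(s):02d}"
    for h, m, s in zip(hours, parts['minutes'], parts['seconds'])
]
print(formatted)
```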

3 Comments

This will fail on the OP's sample data. Also, please don't post a bare code snippet, as it isn't useful to others; post a complete code example that demonstrates this works for the OP.
This does not fail.
Sorry, I had a missing 's' in my data. The point remains: please don't post short code snippets. Answers should be complete, with sample data, code, and the produced output; as it stands, this looks like a comment.

You can use numpy.where to choose the format per row, depending on whether the string has an hour part:

import numpy as np
import pandas as pd

usg = pd.DataFrame({'duration':['7h 39m 40s','15h 39m 40s','39m 40s']})
print(usg)

# rows containing 'h' use the full format, the others the minutes/seconds format
usg['duration'] = np.where(usg.duration.str.contains('h'),
                           pd.to_datetime(usg['duration'], format='%Hh %Mm %Ss', errors='coerce'),
                           pd.to_datetime(usg['duration'], format='%Mm %Ss', errors='coerce'))
print(usg)
             duration
0 1900-01-01 07:39:40
1 1900-01-01 15:39:40
2 1900-01-01 00:39:40

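Another solution, starting again from the original usg, is to prepend '0h ' to entries without an hour part, so that a single format fits every row: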

usg['duration'] = pd.to_datetime(usg['duration'].where(usg.duration.str.contains('h'),
                                 '0h ' + usg['duration']), format='%Hh %Mm %Ss')
print(usg)
             duration
0 1900-01-01 07:39:40
1 1900-01-01 15:39:40
2 1900-01-01 00:39:40

Or, again from the original usg, prefix those rows in place with loc first:

usg.loc[~usg.duration.str.contains('h'), 'duration'] = '0h ' + usg['duration']
usg['duration'] = pd.to_datetime(usg['duration'], format='%Hh %Mm %Ss')
print(usg)
             duration
0 1900-01-01 07:39:40
1 1900-01-01 15:39:40
2 1900-01-01 00:39:40
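All three variants keep the placeholder date 1900-01-01 that the parser fills in when the string has no date. If only the clock-style value from the question is wanted, a possible follow-up (a sketch, not part of the original answer) formats the parsed column with Series.dt.strftime, shown here on the '0h ' prefix variant:

```python
import pandas as pd

usg = pd.DataFrame({'duration': ['7h 39m 40s', '15h 39m 40s', '39m 40s']})

# Normalise rows without an hour part to the '%Hh %Mm %Ss' shape, then parse
normalised = usg['duration'].where(usg['duration'].str.contains('h'),
                                   '0h ' + usg['duration'])
parsed = pd.to_datetime(normalised, format='%Hh %Mm %Ss')

# Drop the placeholder 1900-01-01 date and keep only HH:MM:SS strings
usg['duration'] = parsed.dt.strftime('%H:%M:%S')
print(usg)
```

Keep in mind that %H only accepts hours 0 to 23, so a duration such as '25h 10m 5s' cannot be parsed this way; pd.to_timedelta from the other answer has no such limit.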
