I have a large data set with many columns, and I am turning each column into an array. The first column is time in %H:%M:%S format:

00:00:00
00:00:01
00:00:02
...
23:59:58
23:59:59

When I put this into an array, it makes an object array. I use this to convert it to datetime:

time1=np.array2string(time)                  
dt.datetime.strptime(time1, "%H:%M:%S")

However, I keep getting an error:

ValueError: time data "[b'00:00:00' b'00:01:00' b'00:02:00' ... b'23:57:00' b'23:58:00'\n b'23:59:00']" does not match format '%H:%M:%S'

When I look at the actual array, it indeed does have that phantom 'b', but there is no 'b' in my dataset. It generates it out of thin air. What is causing this?

UPDATE:

I tried

time1=np.array2string(time)                  
time_strings = [dt.datetime.strptime(t, "%H:%M:%S") for t in time1]

and received the error:

ValueError: time data '[' does not match format '%H:%M:%S'

Not sure why a bracket is in there. It still appears to be making a 'b'.

  • strptime takes a string as first positional argument, not the string representation of a whole array (which is what you tried to do). could use list comp instead: time_strings = [dt.datetime.strptime(t, "%H:%M:%S") for t in time] Commented May 4, 2020 at 14:42
  • The b prefix indicates a bytes object rather than a text string. Maybe your data was read in as bytes? Commented May 4, 2020 at 14:49
  • @QuangHoang: have a look at the return value of np.array2string - it will be hard to make datetime.datetime.strptime work with that I guess Commented May 4, 2020 at 14:50
  • @MrFuppes didn't notice the function. No idea why OP used it anyway. Commented May 4, 2020 at 14:52
  • I am still receiving errors and edited the question to show them. The "type" of object array for time is listed as "bytes64". Commented May 4, 2020 at 14:52
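
A minimal sketch of what the comments above describe, assuming a small bytes array stands in for the real column (the three sample values are made up for illustration). It shows that np.array2string returns one single string for the whole array, so iterating over it yields individual characters, starting with the opening bracket, and that the b prefix comes from the array's bytes dtype:

```python
import numpy as np

# Hypothetical stand-in for the time column as it was read from the file
time = np.array([b'00:00:00', b'00:00:01', b'00:00:02'])

# array2string renders the whole array as ONE str (brackets, b prefixes and all)
as_text = np.array2string(time)
print(type(as_text))

# Iterating over a str yields single characters, so the first item is the bracket
first_item = next(iter(as_text))
print(repr(first_item))

# The elements themselves are bytes, which is why they print with a b prefix
print(time.dtype, type(time[0]))
```

This is why strptime first complained about the entire printed array, and then about '[' once the list comprehension looped over the string.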

1 Answer

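Your input seems to be an array of bytes objects, which is where the b prefix comes from. You need to decode the bytes to str before you can parse them with strptime, and you need to parse each element individually rather than the string representation of the whole array. Example: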

from datetime import datetime
import numpy as np

time = np.array([b'00:00:00', b'00:00:01', b'00:00:02'])

dt_list = [datetime.strptime(t.decode(encoding='utf-8'), "%H:%M:%S") for t in time]

# dt_list 
# [datetime.datetime(1900, 1, 1, 0, 0),
#  datetime.datetime(1900, 1, 1, 0, 0, 1),
#  datetime.datetime(1900, 1, 1, 0, 0, 2)]

Note: 'utf-8' is the default encoding for bytes.decode; adjust it if your data uses a different encoding.
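
As a possible variation on the example above, the decoding can also be done for the whole array at once with np.char.decode before parsing. This is only a sketch, assuming the array really has a bytes dtype as in the sample (it may not apply to a true object array), and the names time_str and times_only are introduced here for illustration:

```python
from datetime import datetime
import numpy as np

# Same hypothetical sample data as in the example above
time = np.array([b'00:00:00', b'00:00:01', b'00:00:02'])

# Decode every element in one call; the result is an array of str
time_str = np.char.decode(time, 'utf-8')

# strptime still works on one string at a time, so parse element by element
dt_list = [datetime.strptime(t, "%H:%M:%S") for t in time_str.tolist()]

# strptime fills in the date 1900-01-01; keep only the time part if that is all you need
times_only = [d.time() for d in dt_list]
print(times_only)
```

If only the time of day matters, the datetime.time objects in times_only avoid carrying the placeholder date around.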

1 Comment

sidenote: this seems like a pretty inefficient way to store something with 8 bytes per entry that only needs max. 3 bytes :)
