0

I am working on a Python data visualization project for WhatsApp chats. I have a string like this:

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'

I want to break it down into exactly this:

['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

I have tried the split() function, but I can't get exactly this result. Also, the first field (the timestamp) varies, so its length might not be the same every time.

I would appreciate some help. Thanks.

1
  • 2
    maybe [line[:line.index(']')+1], line[line.index(']')+2:]] Commented May 14, 2020 at 13:12

5 Answers

1

Find the first occurrence of ] and use that for slicing:

[line[:line.find(']')+1],line[line.find(']')+2:]]

BTW: Storing the result of find in a helper variable should be faster, which may matter when you process many lines for data visualization:

f=line.find(']')
[line[:f+1],line[f+2:]]

Results from timeit:

>>> import timeit
>>> timeit.timeit("line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'; [line[:line.find(']')+1],line[line.find(']')+2:]]")
0.33965302700016764
>>> timeit.timeit("line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'; f=line.find(']'); [line[:f+1],line[f+2:]]")
0.21619235799971648
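
For use across a whole chat export, the slicing above could be wrapped in a small function. This is only a sketch: the name split_timestamp is hypothetical, and it assumes the timestamp's closing bracket is followed by exactly one space. Lines without a ']' are returned as None rather than being sliced at the wrong position.

```python
def split_timestamp(line):
    """Split '[timestamp] rest' into ['[timestamp]', 'rest'].

    Hypothetical helper; returns None when the line contains no ']'.
    """
    end = line.find(']')  # index of the first closing bracket, or -1
    if end == -1:
        return None
    # Keep the bracket in the first part; skip the single space after it.
    return [line[:end + 1], line[end + 2:]]


line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'
print(split_timestamp(line))
```

Because find returns the first occurrence, the later '[to]' in the message does not affect the result.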

1 Comment

Yes, This is perfect. Thanks
1

Try this:

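r = line.split(']', 1)  # split only at the first ']'
r[0] += ']'             # split removes the delimiter, so add it back
r[1] = r[1].lstrip()    # drop the space that followed the bracket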

2 Comments

Also, if the message contains any ']', this might fail.
@ghostshelltaken It will split only on the first occurrence.
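
A quick check of the point made in the reply above, as a sketch with a made-up message that contains brackets. The second argument to split is maxsplit, so only the first ']' is used as a separator.

```python
# Sample line invented for illustration; the message itself contains '[3]'.
line = '[14/11/18, 2:47:26 PM] Chaitanya: see item [3] on the list'

r = line.split(']', 1)   # maxsplit=1: only the first ']' is used
r[0] += ']'              # split() drops the delimiter, so put it back
r[1] = r[1].lstrip()     # remove the space that followed the bracket

print(r)
```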
1
line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
reslist = line.split(']', 1)
reslist[0] += "]"  # needed because split removes the delimiter
reslist[1] = reslist[1].lstrip()
print(reslist) # ['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

1 Comment

This fails if the message contains any ']'.
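
Whether a ']' inside the message matters can be tested directly. The sketch below swaps in str.partition, which is not part of the answer above: it splits at the first occurrence only and returns the separator as well, so the bracket does not have to be added back. The sample line is made up.

```python
# Sample line invented for illustration; the message itself contains '[this]'.
line = '[14/11/18, 2:47:26 PM] Chaitanya: check [this] out'

# partition() splits at the first ']' only and returns (before, separator, after).
head, sep, tail = line.partition(']')
reslist = [head + sep, tail.lstrip()]

print(reslist)
```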
1
import re
re.split(r'(?<=\])\s', line, maxsplit=1)
['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

It splits at the first whitespace character that directly follows a closing bracket, and it splits only once.

  • \s matches any whitespace
  • (?<=\]) is a lookbehind that requires a ] (escaped as \]) immediately before the match
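
The two pieces described above can be seen working together in a self-contained sketch. The sample line is borrowed from the timing test in another answer because its message also contains a bracket followed by a space.

```python
import re

line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'

# (?<=\]) requires a ']' immediately before the whitespace matched by \s.
# maxsplit=1 stops after the first such position, so '[to] night' stays whole.
parts = re.split(r'(?<=\])\s', line, maxsplit=1)

print(parts)
```

Without maxsplit=1 the pattern would also split at the space after '[to]'.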

3 Comments

Thanks, this works. I would appreciate it if you could show how it works, since I am not familiar with the re module.
@ghostshelltaken I gave a short explanation and shortened the regex. Maybe it helps a bit.
This helps, thanks. I will also look into re.
0

I guess we can take for granted that the date format is constant, so the timestamp has a maximum length of 22.

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
loc = line.find(']',0,22)   # find its location => 21 

our_result = [
    line[0:loc+1], line[loc+2:]
]

4 Comments

The format of the date is not constant: [14/11/18, 2:47:26 PM] or [14/11/18, 10:47:26 PM]
When I said constant, I meant the length will not be larger than 22.
Yes, but your code does not work when the hour has two digits: find will return -1, just try it. You must either set the end parameter of find to 23 or drop the start and end parameters completely, because find returns the first occurrence anyway.
Just add a simple if to check for it: if loc == -1: loc = line.find(']',0,23)
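
The behaviour discussed in these comments can be reproduced with two short lines, one with a one-digit hour and one with a two-digit hour. This is a sketch with made-up messages. The end argument of find is exclusive, so a bound of 22 only searches indices 0 to 21.

```python
short_line = '[14/11/18, 2:47:26 PM] Chaitanya: hi'
long_line = '[14/11/18, 10:47:26 PM] Chaitanya: hi'

bounded_short = short_line.find(']', 0, 22)  # ']' is at index 21, inside the bound
bounded_long = long_line.find(']', 0, 22)    # ']' is at index 22, outside the bound
unbounded_long = long_line.find(']')         # no bound: first ']' in the string

print(bounded_short, bounded_long, unbounded_long)

# Dropping the bounds gives the intended split for both hour widths.
loc = long_line.find(']')
our_result = [long_line[:loc + 1], long_line[loc + 2:]]
print(our_result)
```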
