Problem with string manipulation in Python

Question

I have been working on a Python data visualization project for WhatsApp chats. I have a string like this:

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'

And I want to break it down exactly like this:

['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

I have tried the split() function, but I can't seem to get exactly this result. Also, the time field at the start varies, so its length might not be the same every time.

I would appreciate some help. Thanks.

maybe [line[:line.index(']')+1], line[line.index(']')+2:]]
– FObersteiner, Commented May 14, 2020 at 13:12

Ocaso Protal · Accepted Answer · 2020-05-20 13:37:57Z

1

Find the first occurrence of ] and use that for slicing:

[line[:line.find(']')+1],line[line.find(']')+2:]]

BTW: It should be faster to store the result of find in a helper variable instead of calling find twice, which might matter when you process many lines for your DataViz:

f=line.find(']')
[line[:f+1],line[f+2:]]

Results from timeit:

>>> import timeit
>>> timeit.timeit("line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'; [line[:line.find(']')+1],line[line.find(']')+2:]]")
0.33965302700016764
>>> timeit.timeit("line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'; f=line.find(']'); [line[:f+1],line[f+2:]]")
0.21619235799971648
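
The helper-variable version can be wrapped in a small function. This is only a sketch: the name split_timestamp is hypothetical, and returning None for a line without any ']' is an assumption about how such lines should be handled, not something stated in the question.

```python
def split_timestamp(line):
    """Split '[timestamp] rest' into ['[timestamp]', 'rest'].

    Hypothetical helper. Returns None when the line contains no ']'.
    """
    f = line.find(']')  # index of the first ']', or -1 if absent
    if f == -1:
        return None
    # Keep the bracket in the first part; skip the single space after it.
    return [line[:f + 1], line[f + 2:]]


line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'
print(split_timestamp(line))
# ['[14/11/18, 11:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies [to] night']
print(split_timestamp('no timestamp here'))
# None
```

Without the -1 check, a line with no bracket would be sliced as line[:0] and line[1:], which silently produces an empty first part.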

edited May 20, 2020 at 13:37

answered May 14, 2020 at 13:13

Ocaso Protal

1 Comment

ghostshelltaken Over a year ago

Yes, This is perfect. Thanks

Wariored · Accepted Answer · 2020-05-14 13:25:00Z

1

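Try this:

r = line.split(']', 1)  # split only at the first ']'
r[0] += ']'             # split drops the delimiter, so add it back
r[1] = r[1].lstrip()    # drop the space that followed the bracket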

edited May 14, 2020 at 13:25

answered May 14, 2020 at 13:08

Wariored

2 Comments

ghostshelltaken Over a year ago

Also, if the message contains a ']', this might fail.

Wariored Over a year ago

@ghostshelltaken it will split only on the first occurrence
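
A quick check of that behavior, using a made-up message that contains extra brackets (the sample text is an assumption for illustration only):

```python
line = '[14/11/18, 2:47:26 PM] Chaitanya: see list[0] and list[1]'

# maxsplit=1 stops after the first ']', so later brackets stay in the message.
head, rest = line.split(']', 1)
result = [head + ']', rest.lstrip()]
print(result)
# ['[14/11/18, 2:47:26 PM]', 'Chaitanya: see list[0] and list[1]']

# str.partition performs the same single split and returns the separator too.
head, sep, rest = line.partition(']')
assert [head + sep, rest.lstrip()] == result
```

Because the timestamp's closing bracket is always the first ']' in the line, brackets inside the message do not affect the result.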

J. Question · Accepted Answer · 2020-05-14 13:28:50Z

1

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
reslist = line.split(']', 1)
reslist[0] += "]"  # needed because split removes the delimiter
reslist[1] = reslist[1].lstrip()  # drop the space after the bracket
print(reslist) # ['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

edited May 14, 2020 at 13:28

answered May 14, 2020 at 13:12

J. Question

1 Comment

ghostshelltaken Over a year ago

This fails if the message contains any ']'.

gbruenjes · Accepted Answer · 2020-05-14 13:35:10Z

1

import re
re.split(r'(?<=\])\s', line, maxsplit=1)
# ['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

It splits at a whitespace character only if a closing bracket comes directly before it, and it splits only once.

\s matches any whitespace character
(?<=\]) is a lookbehind that requires a preceding ] (escaped as \])
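
As a sketch of how this could be applied to many lines, the pattern can be compiled once and reused. The constant name SPLIT_AFTER_BRACKET is hypothetical, and the sample line is the one from the timeit test in another answer:

```python
import re

# Compile once when the same pattern is applied to many lines.
SPLIT_AFTER_BRACKET = re.compile(r'(?<=\])\s')

line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'

# maxsplit=1 means only the first "] " boundary is used;
# the later "[to] night" is left untouched.
parts = SPLIT_AFTER_BRACKET.split(line, maxsplit=1)
print(parts)
# ['[14/11/18, 11:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies [to] night']
```

The lookbehind only checks that ] precedes the whitespace without consuming it, which is why the bracket stays attached to the timestamp.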

edited May 14, 2020 at 13:35

answered May 14, 2020 at 13:14

gbruenjes

3 Comments

ghostshelltaken Over a year ago

Thanks, this works. I would appreciate it if you could show how it works, since I am not familiar with import re.

gbruenjes Over a year ago

@ghostshelltaken I gave a short explanation and shortened the regex. Maybe it helps a bit.

ghostshelltaken Over a year ago

This helps, thanks. I will also look into re.

khamlichi.khalil · Accepted Answer · 2020-05-14 13:37:37Z

0

I guess we can take for granted that the date format is constant, so the timestamp has a maximum length of 22.

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
loc = line.find(']', 0, 22)  # find its location => 21

our_result = [
    line[0:loc+1], line[loc+2:]
]

answered May 14, 2020 at 13:37

khamlichi.khalil

4 Comments

Ocaso Protal Over a year ago

The format of the date is not constant: [14/11/18, 2:47:26 PM] or [14/11/18, 10:47:26 PM]

khamlichi.khalil Over a year ago

when I said constant I meant length will not be larger than 22

Ocaso Protal Over a year ago

Yes, but your code does not work when the hour has two digits (10, 11 or 12): find will return -1, just try it. You must either set the end parameter of find to 23 or drop the start and end parameters completely, because find returns the first occurrence anyway.

khamlichi.khalil Over a year ago

just add a simple if to check for it: if loc == -1: loc = line.find(']',0,23)
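
The difference discussed in these comments can be checked directly. This is a minimal sketch with two made-up sample lines (names short and long_ are hypothetical), one with a one-digit hour and one with a two-digit hour:

```python
short = '[14/11/18, 2:47:26 PM] Chaitanya: hi'
long_ = '[14/11/18, 10:47:26 PM] Chaitanya: hi'

# The end argument of find is exclusive, so only indexes 0..21 are searched.
print(short.find(']', 0, 22))  # 21
print(long_.find(']', 0, 22))  # -1, the bracket sits at index 22

# Dropping the bounds finds the first ']' regardless of timestamp length.
loc = long_.find(']')
print(loc)  # 22
print([long_[:loc + 1], long_[loc + 2:]])
# ['[14/11/18, 10:47:26 PM]', 'Chaitanya: hi']
```

Since find already returns the first occurrence, the unbounded call avoids both the -1 case and the extra if.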
