
I have an input string like this: a1b2c30d40 and I want to tokenize the string to: a, 1, b, 2, c, 30, d, 40.

I know I can read the characters one by one and keep track of the previous character to decide whether to start a new token (two digits in a row belong to the same token), but is there a more Pythonic way of doing this?

1 Answer

>>> import re
>>> re.split(r'(\d+)', 'a1b2c30d40')
['a', '1', 'b', '2', 'c', '30', 'd', '40', '']

On the pattern: as the comment says, \d means "match one digit" and + is a quantifier that means "match one or more", so \d+ means "match as many digits as possible". The pattern is wrapped in a group (), so in the context of re.split the entire pattern means "split this string using runs of digits as the separator, and also include the matched separators in the result". The trailing '' appears because the string ends with a separator. If you omit the group, you get ['a', 'b', 'c', 'd', ''].
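If the trailing empty string is unwanted, one possible way to drop it is to filter the result. The sketch below is only an illustration: the helper name tokenize is made up for this example, and the re.findall variant is an alternative technique, not the one used in the answer above.

```python
import re

def tokenize(text):
    # Hypothetical helper: split on runs of digits, keep them via the
    # capturing group, and drop the empty strings re.split leaves at the edges.
    return [tok for tok in re.split(r'(\d+)', text) if tok]

tokens = tokenize('a1b2c30d40')

# Alternative: collect runs of digits or runs of non-digits directly,
# which never produces empty strings.
alt_tokens = re.findall(r'\d+|\D+', 'a1b2c30d40')

print(tokens)
print(alt_tokens)
```

Both variants give the same tokens for this input. The findall form groups consecutive non-digits together (for example 'ab12' becomes 'ab', '12'), which matches what the split form does.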


6 Comments

Umm I don't understand regex very well. Do you mind putting some explanation of the (\d+) pattern?
It splits on numbers/consecutive digits (\d is 0-9, + is one or more).
+1: but I think it would be better to use a raw string for the regexp
@6502: it matters only if there are any Python-interpreted escapes (e.g. \n or \t) in the pattern (to avoid escaping the `\` in them).
@Piotr: I know... but I think your answer would be more instructive with a raw string, for two reasons: 1) in my experience, the fact that Python leaves the backslash in the string when the escape sequence is not valid is often surprising: open("c:\zer\woob.dat") works and open("c:\temp\buf.dat") doesn't; 2) there are many more special characters than \t and \n, and specifically '\b' is one of them, while r'\b' is instead special for a regexp. By using that fancy r'(\d+)', maybe the reader will take the time to read about the escaping problem...
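To make the '\b' point concrete, here is a small sketch. The sample text and variable names are made up for illustration. It compares a plain string literal, where '\b' is a backspace character, with a raw string, where the regex engine sees a word-boundary anchor.

```python
import re

# In a normal string literal, '\b' is one character: backspace (0x08).
plain = '\bcat\b'
# In a raw string the backslash is kept, so the regex engine sees \b,
# which it interprets as a word boundary.
raw = r'\bcat\b'

text = 'cat concat cat'  # sample input, chosen for this example

plain_matches = re.findall(plain, text)  # searches for backspace + "cat" + backspace
raw_matches = re.findall(raw, text)      # searches for "cat" as a whole word

print(len('\b'), len(r'\b'))
print(plain_matches, raw_matches)
```

The plain pattern finds nothing because the text contains no backspace characters, while the raw pattern matches the two standalone occurrences of cat.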