Split a split (regex) in python

Question

I have the string below and am looking for a way to split it so that I consistently get the output that follows it.

'1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'

['1GB 02060250396L7.067,70',
'2BE 129517720L6.633,40',
'3NL 134187650L3.824,23',
'4DE 165893440L3.111,00',
'5PL 65775644897L1.010,00',
'6DE 811506926L3.547,40',
'7AT U16235008L-830,00',
'8SE U57469158L3.001,30']

My current approach, re.split("([0-9][0-9][0-9][A-Z][A-Z])", input), also cuts the delimiter out of the records: because of the capturing group, each delimiter (for example "702BE") comes back as a separate list element. I cannot find any other delimiter that splits the string consistently. Is it possible to split my delimiter as well, and assign one part of it ("70") to the string in front and the other part ("2BE") to the following string?
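For reference, a minimal sketch of what that split returns. The variable names input_str and parts are only illustrative (input_str avoids shadowing the built-in input):

```python
import re

input_str = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'

# The capturing group makes re.split() return every delimiter as its own
# element, so "70" and "2BE" end up detached from the records they belong to.
parts = re.split(r"([0-9][0-9][0-9][A-Z][A-Z])", input_str)

print(parts[:3])
# ['1GB 02060250396L7.067,', '702BE', ' 129517720L6.633,']
```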

pho · Accepted Answer · 2021-08-02 14:47:31Z

Use re.findall() instead of re.split().

You want to match

a digit \d, followed by
two uppercase letters [A-Z]{2}, followed by
a whitespace character \s, followed by
one or more characters that are not a comma [^,]+, followed by
a comma , followed by
two digits \d{2}

So do:

import re

input_str = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'

re.findall(r"\d[A-Z]{2}\s[^,]+,\d{2}", input_str)

Which gives

['1GB 02060250396L7.067,70',
 '2BE 129517720L6.633,40',
 '3NL 134187650L3.824,23',
 '4DE 165893440L3.111,00',
 '5PL 65775644897L1.010,00',
 '6DE 811506926L3.547,40',
 '7AT U16235008L-830,00',
 '8SE U57469158L3.001,30']

Alternatively, if you don't want to be so specific with your pattern, you could simply use the regex [^,]+,\d{2}

This matches one or more characters that are not a comma, then a single comma, then two digits.

re.findall(r"[^,]+,\d{2}", input_str)

# Output:
['1GB 02060250396L7.067,70',
 '2BE 129517720L6.633,40',
 '3NL 134187650L3.824,23',
 '4DE 165893440L3.111,00',
 '5PL 65775644897L1.010,00',
 '6DE 811506926L3.547,40',
 '7AT U16235008L-830,00',
 '8SE U57469158L3.001,30']
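The two patterns agree on the input above, but they can differ once the string contains anything besides the records. A small sketch, assuming a made-up input with a stray "REPORT " prefix (the prefix and the names noisy, strict and loose are hypothetical, not from the question):

```python
import re

# Hypothetical input: two of the original records preceded by stray text.
noisy = 'REPORT 1GB 02060250396L7.067,702BE 129517720L6.633,40'

# The specific pattern only starts a match at "digit + two letters + space",
# so the stray prefix is skipped.
strict = re.findall(r"\d[A-Z]{2}\s[^,]+,\d{2}", noisy)

# The looser pattern starts at the first non-comma character it finds,
# so the prefix is swallowed into the first record.
loose = re.findall(r"[^,]+,\d{2}", noisy)

print(strict)  # ['1GB 02060250396L7.067,70', '2BE 129517720L6.633,40']
print(loose)   # ['REPORT 1GB 02060250396L7.067,70', '2BE 129517720L6.633,40']
```

So the shorter regex is fine when the input is known to be clean, while the more specific one is the safer choice if extra text might appear.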

Daweo · Accepted Answer · 2021-08-02 14:55:08Z

Is it possible to split my delimiter as well and assign a part of it "70" to the string in front and a part "2BE" to the following string?

If you must use re.split at any price, you can exploit a zero-length assertion for this task in the following way:

import re
text = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'
parts = re.split(r'(?<=,[0-9][0-9])', text)
print(parts)

output

['1GB 02060250396L7.067,70', '2BE 129517720L6.633,40', '3NL 134187650L3.824,23', '4DE 165893440L3.111,00', '5PL 65775644897L1.010,00', '6DE 811506926L3.547,40', '7AT U16235008L-830,00', '8SE U57469158L3.001,30', '']

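Explanation: this particular assertion is a positive lookbehind. It matches a zero-length substring that is preceded by a comma and two digits, so the split consumes no characters. Note that re.split only splits on empty matches in Python 3.7 and later, and that parts has a superfluous empty str at the end because the text itself ends with a comma and two digits.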
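Two possible ways to get rid of the trailing empty string, as a sketch (requires Python 3.7+; the names filtered and guarded are just illustrative). The second one adds a lookahead, which is an extra assumption that every record starts with a digit followed by two uppercase letters:

```python
import re

text = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'

# Option 1: keep the lookbehind-only split and drop empty strings afterwards.
filtered = [p for p in re.split(r'(?<=,[0-9][0-9])', text) if p]

# Option 2: also require that a new record follows (digit + two letters),
# so the position at the very end of the text is no longer a split point.
guarded = re.split(r'(?<=,[0-9][0-9])(?=[0-9][A-Z]{2})', text)

print(filtered == guarded)  # True
print(len(guarded))         # 8
```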

1 Comment

Fisqkuz Over a year ago

Hi Daweo, could you elaborate further on the zero-length assertion, please? Suppose you have "1IT 02060250996L7.067,70" and you are working with a regex like (\d|[A-Z])[L|T|S](\d|[-]) ... how would you use the same logic here, where 6 has to remain attached to 99, L has to stand alone, and 7 has to be attached to whatever comes after it (in this case a point (.), but it could also be another number)?
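One possible reading of this comment, sketched with the same zero-length technique (Python 3.7+). This is an interpretation, not the commenter's own solution: it assumes the goal is to isolate an L, T or S that sits between a digit or uppercase letter and a digit or minus sign. The names sample, boundary and pieces are illustrative. Note that [L|T|S] in the comment would also match a literal |, so [LTS] is used instead.

```python
import re

sample = '1IT 02060250996L7.067,70'

# Two zero-width split points, combined with |:
#   1. right before an L/T/S that follows a digit or letter and precedes a digit or "-"
#   2. right after that same L/T/S
# Both lookbehinds have a fixed width, which Python's re module requires.
boundary = re.compile(
    r'(?<=[0-9A-Z])(?=[LTS][0-9-])'
    r'|(?<=[0-9A-Z][LTS])(?=[0-9-])'
)

pieces = boundary.split(sample)
print(pieces)  # ['1IT 02060250996', 'L', '7.067,70']
```

The T in "1IT" is left alone because it is followed by a space rather than a digit or minus sign.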
