1

I have the string below and am looking for a way to split it so that I consistently end up with the following output:

'1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'
['1GB 02060250396L7.067,70',
'2BE 129517720L6.633,40',
'3NL 134187650L3.824,23',
'4DE 165893440L3.111,00',
'5PL 65775644897L1.010,00',
'6DE 811506926L3.547,40',
'7AT U16235008L-830,00',
'8SE U57469158L3.001,30']

My current approach, re.split("([0-9][0-9][0-9][A-Z][A-Z])", input), also splits off my delimiter and returns it as separate list elements, and there is no other split pattern I can use while remaining consistent. Is it possible to split the delimiter itself as well, assigning one part of it ("70") to the string in front and the other part ("2BE") to the following string?

2 Answers

2

Use re.findall() instead of re.split().

You want to match:

  • a digit \d, followed by
  • two uppercase letters [A-Z]{2}, followed by
  • a whitespace character \s, followed by
  • one or more characters that are not a comma [^,]+, followed by
  • the comma itself, followed by
  • two digits \d{2}

So do:

import re

input_str = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'

re.findall(r"\d[A-Z]{2}\s[^,]+,\d{2}", input_str)

Which gives

['1GB 02060250396L7.067,70',
 '2BE 129517720L6.633,40',
 '3NL 134187650L3.824,23',
 '4DE 165893440L3.111,00',
 '5PL 65775644897L1.010,00',
 '6DE 811506926L3.547,40',
 '7AT U16235008L-830,00',
 '8SE U57469158L3.001,30']
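
For readability, the same pattern can also be written with re.VERBOSE so that each piece carries its own comment. This is only a sketch of the pattern described above; the name RECORD is made up for illustration, and the comments about what each piece means are my reading of the sample data, not something the question states.

```python
import re

input_str = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'

# RECORD is a hypothetical name; the pattern is the one from the answer,
# spread over several lines with re.VERBOSE.
RECORD = re.compile(r"""
    \d          # one leading digit
    [A-Z]{2}    # two uppercase letters
    \s          # one whitespace character
    [^,]+       # one or more characters that are not a comma
    ,           # the comma itself
    \d{2}       # exactly two digits after the comma
""", re.VERBOSE)

records = RECORD.findall(input_str)
print(records)
```

Note that the single \d assumes the leading number never grows beyond one digit, which holds for the sample but is an assumption about the real data.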

Alternatively, if you don't need the pattern to be that specific, you can use the simpler regex [^,]+,\d{2}.

It matches one or more characters other than a comma, then a single comma, then two digits.

re.findall(r"[^,]+,\d{2}", input_str)

# Output:
['1GB 02060250396L7.067,70',
 '2BE 129517720L6.633,40',
 '3NL 134187650L3.824,23',
 '4DE 165893440L3.111,00',
 '5PL 65775644897L1.010,00',
 '6DE 811506926L3.547,40',
 '7AT U16235008L-830,00',
 '8SE U57469158L3.001,30']
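
Because re.findall() silently skips any text that does not match, it can be worth checking that nothing was lost. The following is a minimal sketch of such a check, assuming the records are meant to cover the whole input with nothing in between; split_records is a hypothetical helper name, not part of the re module.

```python
import re

input_str = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'


def split_records(text, pattern=r"[^,]+,\d{2}"):
    """Return all matches of `pattern`, raising if any input text was skipped."""
    parts = re.findall(pattern, text)
    # If the matches are contiguous and cover everything, joining them
    # reproduces the original string exactly.
    if "".join(parts) != text:
        raise ValueError("pattern did not cover the whole input")
    return parts


records = split_records(input_str)
print(len(records))
```

The looser pattern relies on each record containing exactly one comma, so a check like this fails loudly if that assumption ever breaks.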

1

Is it possible to split my delimiter as well and assign a part of it "70" to the string in front and a part "2BE" to the following string?

If you must use re.split at any price, you can exploit a zero-length assertion for this task as follows:

import re
text = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'
parts = re.split(r'(?<=,[0-9][0-9])', text)
print(parts)

output

['1GB 02060250396L7.067,70', '2BE 129517720L6.633,40', '3NL 134187650L3.824,23', '4DE 165893440L3.111,00', '5PL 65775644897L1.010,00', '6DE 811506926L3.547,40', '7AT U16235008L-830,00', '8SE U57469158L3.001,30', '']

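Explanation: (?<=,[0-9][0-9]) is a positive lookbehind. It matches the zero-length position that is immediately preceded by a comma and two digits, so re.split cuts there without consuming any characters. Splitting on an empty match like this requires Python 3.7 or later. Note that parts has a superfluous empty string at the end, because the position after the final ,30 also satisfies the lookbehind.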
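
If the trailing empty string is unwanted, here are two possible ways to get rid of it, shown as a sketch: filter it out afterwards, or add a negative lookahead (?!$) so that no split happens at the very end. The variable names are only illustrative.

```python
import re

text = '1GB 02060250396L7.067,702BE 129517720L6.633,403NL 134187650L3.824,234DE 165893440L3.111,005PL 65775644897L1.010,006DE 811506926L3.547,407AT U16235008L-830,008SE U57469158L3.001,30'

# Option 1: split as before, then drop empty strings.
parts = [p for p in re.split(r'(?<=,[0-9][0-9])', text) if p]

# Option 2: forbid a split at the end of the string with a negative lookahead.
parts_no_tail = re.split(r'(?<=,[0-9][0-9])(?!$)', text)

print(parts == parts_no_tail)
```

Both variants still rely on splitting at empty matches, so they need Python 3.7 or later as well.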

1 Comment

Hi Daweo, could you further elaborate on the zero-length assertion please? In the case where you got "1IT 02060250996L7.067,70" and you're working on a regex: (\d|[A-Z])[L|T|S](\d|[-]) ... how would you use the same logic here? where 6 has to remain attached to 99, L has to be a stand alone and 7 has to be attached to whatever comes after it (in this case a point (.) but could also be another number) ?
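
One possible reading of the case in this comment, as a sketch rather than a definitive answer: put the neighbouring characters into a lookbehind and a lookahead so they are not consumed, and keep only the separator letter in a capturing group so re.split returns it as its own element. Inside a character class | is a literal character, so [LTS] is used here instead of [L|T|S].

```python
import re

sample = "1IT 02060250996L7.067,70"

# (?<=[\dA-Z])  the character before must be a digit or uppercase letter,
#               but it stays attached to the left-hand piece
# ([LTS])       the separator letter; the group makes re.split keep it
# (?=[\d-])     the character after must be a digit or a minus sign,
#               but it stays attached to the right-hand piece
pieces = re.split(r"(?<=[\dA-Z])([LTS])(?=[\d-])", sample)
print(pieces)  # ['1IT 02060250996', 'L', '7.067,70']
```

The T in "1IT" is not treated as a separator here because it is followed by a space rather than a digit or a minus sign.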
