0

I'm parsing an XML file and need to remove some clutter from the final output.

str = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><chat-message>2018-10'

My attempt at a solution is:

re.sub(r'<(\w|\d|\s){1,}>{1,4}',"",str)

and my desired output is:

2018-10

Currently Python is finding no matches and just returning str. I don't think < or > are special characters so no escaping needed; I tried escaping anyway and it still did not work.

1
  • Could you give some more examples? It's hard to infer exactly what you want to match. For now, maybe something like re.sub(r'.*>(?!<)', "", str) (this matches as much as possible up to a > that isn't immediately followed by a <). Commented Dec 20, 2018 at 19:16

3 Answers

4

In my opinion you are better off using an XML parser rather than regex. Here is an example using xml.etree.ElementTree:

import xml.etree.ElementTree as ET

xmlstring = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><chat-message>2018-10</chat-message>'
root = ET.fromstring(xmlstring)

print(root.text)
# OUTPUT
# 2018-10
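
One thing worth checking if the parser output looks incomplete: root.text only returns the text that appears before the first child element. The sketch below uses a made-up nested input (the <b> element is purely hypothetical, not taken from the question) to show how itertext() gathers the text of all descendants instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with a nested element; not the asker's real data.
xmlstring = '<chat-message>2018-10 <b>hello</b> world</chat-message>'
root = ET.fromstring(xmlstring)

# .text stops at the first child element.
first_text = root.text

# itertext() walks the element and all of its descendants in document order.
all_text = ''.join(root.itertext())

print(repr(first_text))
print(repr(all_text))
```

Whether this matches the situation in the question is an assumption, since the real XML was not posted.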

3 Comments

Did this already. Remnants of the xml tags remain in the parser output.
@MSanders If the xml is valid, then you could post an example of the xml that is giving you trouble and we may be able to help.
The xml has a lot of sensitive info in it (SSN, account numbers, etc). The time it would take to redact/remove this info would be insane.
1

You could try something simpler:

re.sub(r'<.*?>', '', str)
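
As a rough sketch of why this works where the original attempt did not: the original pattern only allows word characters, digits and whitespace between < and >, so it cannot match tags containing ?, =, ", -, or /. The non-greedy .*? accepts any characters up to the nearest >. The sample string is the one from the ElementTree answer above; the variable names are just for illustration.

```python
import re

# Sample string borrowed from the ElementTree answer in this thread.
xml_text = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><chat-message>2018-10</chat-message>'

# Original attempt: fails on '?' in '<?xml', '-' in 'chat-message' and '/' in the closing tag.
original_attempt = re.sub(r'<(\w|\d|\s){1,}>{1,4}', '', xml_text)

# Non-greedy match: '<', then as few characters as possible, then the first '>'.
stripped = re.sub(r'<.*?>', '', xml_text)

print(original_attempt == xml_text)  # True: nothing was replaced
print(stripped)                      # 2018-10
```

Note that this strips anything that looks like a tag, so it assumes the text itself never contains a literal < or >.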

1 Comment

Winner winner chicken dinner Fred! The look ahead functionality of regex is very confusing to me. I've read some stuff online but cannot wrap my head around it. Thanks.
0

This regex works for the test case in your question -

r"<[\w\D]+>([-\d]+)"

You can test it here -

https://regex101.com/
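
For reference, here is a minimal sketch of how that pattern could be used from Python. Unlike re.sub, it relies on a capture group, so the value is read with re.search and group(1). The sample string is taken from the ElementTree answer; the variable names are assumptions made for the example.

```python
import re

# Sample string borrowed from the ElementTree answer in this thread.
xml_text = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><chat-message>2018-10</chat-message>'

# [\w\D] matches any character, so '<[\w\D]+>' greedily consumes tags
# until a '>' that is followed by digits or hyphens, which group 1 captures.
pattern = re.compile(r"<[\w\D]+>([-\d]+)")

match = pattern.search(xml_text)
date_text = match.group(1) if match else None

print(date_text)  # 2018-10
```

Because the captured group only accepts digits and hyphens, this would not work for message text containing other characters.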
