0

I am trying to convert an XML to JSON without using python package. To do so I am converting the XML to a list which will be eventually converted to a nested dictionary and then to JSON. I am unable to distinguish the following elements while reading the XML from a list :

  1. <Description>TestData</Description>\n
  2. Data</Description>\n
  3. <Description>Test\n

The regex I am using to distinguish 1 and 3 are :

  1. x = re.compile("<Description>(.+?)<\/Description>\n")
  2. x = re.compile("^((?!Description).)*<\/Description>\\n")

I am finding it difficult to develop a regex for the THIRD one.

  1. x = re.compile("\s*<Description>(.+)(?!((<\/Description>)))\n")

Although the second regex identifies the text 3 correctly it is also identifying the text 1. This should identify only text 3.

1
  • 1
    Do not use regular expressions to parse XML. Your code will be wrong. Use an XML parser. (Incidentally, none of the three fragments you have posted are elements.) Commented Apr 6, 2018 at 7:55

2 Answers 2

1

You were very close. This regex works for what you need:

re.compile("\s*<Description>(.+)(?<!<\/Description>)\n")

I just added the '<' between the ? and ! to make a negative lookbehind assertion. Check this for more info: https://docs.python.org/2/library/re.html

Sign up to request clarification or add additional context in comments.

Comments

1

Do you want something like this?

<Description>([^<]+)\n

Demo

python script is

 ss=""" <Description>TestData</Description>\n
  Data</Description>\n
  <Description>Test\n"""

regx= re.compile("<Description>([^<]+)\n")
capture=regx.findall(ss)
print(capture)

output is

['Test']

It seems capture[0] value is what you want..

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.