How to distinguish list patterns using a regex in Python

Question

I am trying to convert XML to JSON without using a Python package. To do so, I am converting the XML to a list, which will eventually be converted to a nested dictionary and then to JSON. While reading the XML from the list, I am unable to distinguish the following lines:

1. <Description>TestData</Description>\n
2. Data</Description>\n
3. <Description>Test\n

The regexes I am using to identify lines 1 and 2 are:

x = re.compile("<Description>(.+?)<\/Description>\n")
x = re.compile("^((?!Description).)*<\/Description>\\n")

I am finding it difficult to write a regex for the third one. This is my attempt:

x = re.compile("\s*<Description>(.+)(?!((<\/Description>)))\n")

Although this regex matches line 3 correctly, it also matches line 1. It should match only line 3.

Do not use regular expressions to parse XML. Your code will be wrong. Use an XML parser. (Incidentally, none of the three fragments you have posted are elements.)
– Michael Kay, Commented Apr 6, 2018 at 7:55
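
As a rough illustration of the parser route suggested in the comment above, the sketch below uses the standard library's xml.etree.ElementTree and json modules. The sample XML and the helper name element_to_dict are assumptions made for illustration, since the question does not show the real document. Attributes and mixed content are ignored here.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into nested dicts (hypothetical helper)."""
    children = list(elem)
    if not children:
        # Leaf element: keep its text, even when it spans several lines.
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Assumed sample input: a Description whose text is split across two lines.
xml_text = """<Item>
  <Description>Test
Data</Description>
  <Name>Example</Name>
</Item>"""

root = ET.fromstring(xml_text)
as_json = json.dumps({root.tag: element_to_dict(root)})
print(as_json)
```

Because the parser tracks where each element opens and closes, text that is split across lines needs no special handling.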

A. Barrozo · Accepted Answer · 2018-04-06 02:28:15Z

1

You were very close. This regex works for what you need:

re.compile("\s*<Description>(.+)(?<!<\/Description>)\n")

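I just added a '<' between the '?' and the '!' to turn your negative lookahead into a negative lookbehind assertion. The lookahead is evaluated after (.+) has consumed the rest of the line, so it only checks that the newline that follows does not start </Description>, which is always true. The lookbehind instead checks that the text just before the newline is not </Description>. See the re documentation for more info: https://docs.python.org/2/library/re.html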
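
A quick way to check this is to run the pattern against the three lines from the question. The sketch below assumes each line ends in a real newline character. The names open_only, fragments, and matches are made up for the example, and the pattern is written as a raw string, which does not change its meaning.

```python
import re

# Negative lookbehind: the text right before the newline must NOT be </Description>.
open_only = re.compile(r"\s*<Description>(.+)(?<!</Description>)\n")

fragments = [
    "<Description>TestData</Description>\n",  # line 1: opening and closing tag
    "Data</Description>\n",                   # line 2: closing tag only
    "<Description>Test\n",                    # line 3: opening tag only
]

# match() anchors at the start of each string.
matches = [bool(open_only.match(f)) for f in fragments]
print(matches)  # [False, False, True]
```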


Thm Lee · Answer · 2018-04-06 02:40 (edited by Jon Clements, 2018-04-08 10:41:21Z)

1

Do you want something like this?

<Description>([^<]+)\n

The Python script is:

import re

ss = """ <Description>TestData</Description>\n
  Data</Description>\n
  <Description>Test\n"""

# [^<]+ cannot cross a '<', so a line that still has its closing tag does not match
regx = re.compile("<Description>([^<]+)\n")
capture = regx.findall(ss)
print(capture)

The output is:

['Test']

It seems capture[0] is the value you want.
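
Putting the pieces together, a line-by-line classifier for the three shapes in the question might look like the sketch below. The labels complete, close_only, and open_only and the helper classify are hypothetical names, not taken from the posts. It uses [^<\n] instead of [^<] because [^<] also matches a newline, so <Description>([^<]+)\n can run across several lines when the text continues on the next line.

```python
import re

# Hypothetical labels for the three line shapes from the question.
PATTERNS = [
    ("complete", re.compile(r"\s*<Description>([^<\n]*)</Description>\n")),
    ("close_only", re.compile(r"\s*([^<\n]*)</Description>\n")),
    ("open_only", re.compile(r"\s*<Description>([^<\n]+)\n")),
]


def classify(line):
    """Return (label, captured text) for one line, or (None, None) if nothing fits."""
    for label, pattern in PATTERNS:
        m = pattern.fullmatch(line)
        if m:
            return label, m.group(1)
    return None, None


fragments = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]

results = [classify(f) for f in fragments]
print(results)
```

Each line gets exactly one label, so the open_only text and the following close_only text can be joined when building the dictionary.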

answered Apr 6, 2018 at 2:40 by Thm Lee; edited Apr 8, 2018 at 10:41 by Jon Clements
