How to convert this regular expression into Python

Question

I want to use this regular expression in Python:

 <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>

def removeHtmlTags(page):
    p = re.compile(r'XXXX')
    return p.sub('', page)

It seems that I cannot paste the regular expression directly in place of XXXX in the function above.

Are you escaping the apostrophes in the regex with a backslash? Can we see the real code you have that isn't working?
– Tom, Commented Mar 10, 2010 at 14:00

Ignacio Vazquez-Abrams · Accepted Answer · 2010-03-10 13:59:13Z

3

Works fine here. You're probably having trouble because of the quotes. Just triple-quote it:

import re

def removeHtmlTags(page):
    # Triple-quoted raw string, so both ' and " can appear unescaped in the pattern
    p = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')
    return p.sub('', page)
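
To see why the quoting matters: the pattern contains both ' and ", so a string literal delimited by either one ends early and Python reports a syntax error. A triple-quoted raw string avoids that. Here is a minimal sketch that can be run as-is; the names TAG_RE and remove_html_tags and the sample input are made up for illustration, not taken from the question:

```python
import re

# Raw, triple-quoted literal: the pattern's own ' and " need no escaping.
# (TAG_RE and remove_html_tags are hypothetical names for this sketch.)
TAG_RE = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')

def remove_html_tags(page):
    """Return page with everything the pattern recognises as a tag removed."""
    return TAG_RE.sub('', page)

# Made-up sample with a '>' inside a quoted attribute value.
sample = '<p class="a>b">Hello, <b>world</b>!</p>'
print(remove_html_tags(sample))  # Hello, world!
```

Compiling the pattern once at module level also saves recompiling it on every call, though that is a matter of taste for a function this small.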

mcrisc · Answer · 2010-03-10 14:04:54Z

0

If you need to remove HTML tags, this should do it:

import re

def removeHtmlTags(page):
    pattern = re.compile(r'<[^>]+>')  # < and > need no escaping; re.I has no effect here
    return pattern.sub('', page)

That wasn't the question, but the point of the original regex is to allow for angle brackets within attribute values.
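
The difference shows up as soon as an attribute value contains a >. This is a rough comparison of the quote-aware regex from the question with the simpler [^>]+ style of pattern; the names QUOTE_AWARE and SIMPLE and the sample markup are invented for this sketch:

```python
import re

# Quote-aware pattern from the question: quoted attribute values are consumed
# as a unit, so a '>' inside them does not end the tag.
QUOTE_AWARE = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')

# Simpler pattern: the tag ends at the first '>' seen, quoted or not.
SIMPLE = re.compile(r'<[^>]+>')

# Made-up sample: the title attribute contains a '>'.
html = '<a title="x > y" href="#">link</a>'

print(QUOTE_AWARE.sub('', html))  # link
print(SIMPLE.sub('', html))       #  y" href="#">link
```

The simple pattern stops at the > inside the title attribute and leaves the rest of the tag behind in the output. Neither regex is a full HTML parser, so treat both as approximations for reasonably well-formed input.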