I want to use this regular expression in Python:

 <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>

(from RegEx match open tags except XHTML self-contained tags)

import re

def removeHtmlTags(page):
    p = re.compile(r'XXXX')  # XXXX stands for the regular expression above
    return p.sub('', page)

However, it seems that I cannot paste this regular expression directly in place of XXXX in the function above.

  • What's the error or problem you got? Commented Mar 10, 2010 at 13:59
  • Are you escaping the apostrophes in the regex with a backslash? Can we see the real code you have that isn't working? Commented Mar 10, 2010 at 14:00
  • This helped me: regex101.com (select the Python flavor). Commented Feb 9, 2017 at 20:42

2 Answers

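It works fine here. The trouble is probably the quotes: the pattern contains both ' and ", so it cannot sit inside an ordinary single-quoted or double-quoted raw string without escaping. Triple-quote it instead: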

import re

def removeHtmlTags(page):
    # A triple-quoted raw string can hold both ' and " unescaped
    p = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')
    return p.sub('', page)
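
As a minimal sketch of the fix in use (the names `TAG_RE` and `remove_html_tags` and the sample string are illustrative, not from the question), the triple-quoted pattern can be compiled once at module level and applied to a small input:

```python
import re

# The pattern contains both ' and ", so it is written as a triple-quoted
# raw string; no backslash escaping of the quotes is needed.
TAG_RE = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')

def remove_html_tags(page):
    """Return page with every substring matched by TAG_RE removed."""
    return TAG_RE.sub('', page)

sample = '<p class="intro">Hello, <b>world</b>!</p>'
print(remove_html_tags(sample))  # Hello, world!
```

Compiling at module level avoids recompiling the pattern on every call, although `re` also caches recently used patterns internally.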

If you need to remove HTML tags, this should do it:

import re

def removeHtmlTags(page):
    # < and > need no escaping, and the pattern has no letters for re.I to affect
    pattern = re.compile(r'<[^>]+>')
    return pattern.sub('', page)

1 Comment

That wasn't the question. The point of the original regex is to allow angle brackets inside quoted attribute values, which this simpler pattern does not handle.
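
To illustrate the difference, here is a small sketch comparing the two patterns on a tag whose attribute value contains `>`. The names `SIMPLE` and `QUOTE_AWARE` and the sample markup are made up for this example:

```python
import re

# Simple pattern: stops at the first '>' it sees, even inside quotes.
SIMPLE = re.compile(r'<[^>]+>')

# Pattern from the question: consumes quoted attribute values as a unit.
QUOTE_AWARE = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')

html = '<a title="a > b" href="#">link</a>'

print(repr(SIMPLE.sub('', html)))       # ' b" href="#">link'
print(repr(QUOTE_AWARE.sub('', html)))  # 'link'
```

The simple pattern ends the tag at the `>` inside the `title` value and leaves the rest of the tag in the output, while the quote-aware pattern removes the whole tag.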
