Possible Duplicate:
using python, Remove HTML tags/formatting from a string

I read in an HTML file:

fi = open("Tree.html", "r")
text = fi.read()

I want to delete the HTML header from the text:

text = re.sub("<head>.*?</head>", "", text)

Why does this not work?

1 Answer

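It looks like your pattern isn't matching across newlines. By default, "." matches any character except a newline, so the match fails whenever <head> and </head> sit on different lines. You need to add the DOTALL flag.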

text = re.sub("<head>.*?</head>", "", text, flags=re.DOTALL)
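
To see the difference, here is a small sketch. The sample string and the variable names are made up for illustration and are not from the question's Tree.html:

```python
import re

# Hypothetical sample; a real file's <head> usually spans several lines.
html = "<html>\n<head>\n<title>Tree</title>\n</head>\n<body>Hi</body>\n</html>"

# Without DOTALL, "." stops at each newline, so the pattern never
# reaches </head> and the text comes back unchanged.
unchanged = re.sub("<head>.*?</head>", "", html)

# With DOTALL, "." also matches newlines, so the whole header is removed.
stripped = re.sub("<head>.*?</head>", "", html, flags=re.DOTALL)

print(unchanged == html)
print(stripped)
```

The non-greedy ".*?" keeps the match from running past the first </head>.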

3 Comments

Error message: TypeError: sub() got an unexpected keyword argument 'flags'
What version of Python are you using? The flags keyword argument is only available in v2.7+.
I am using Python v2.6. Without "flags=" it works. Thanks!
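
A note for older versions where re.sub() has no flags keyword: the fourth positional argument of re.sub() is count, so passing re.DOTALL in that position sets a replacement limit rather than turning on the flag. Two alternatives that should work there are compiling the pattern with the flag or using the inline "(?s)" modifier. A minimal sketch, with a made-up sample string and hypothetical variable names:

```python
import re

# Hypothetical sample with a multi-line header.
html = "<head>\n<title>Tree</title>\n</head>\n<body>Hi</body>"

# Option 1: pass the flag to re.compile(), then call sub() on the pattern.
head_re = re.compile("<head>.*?</head>", re.DOTALL)
text1 = head_re.sub("", html)

# Option 2: switch on DOTALL inside the pattern with the inline (?s) modifier.
text2 = re.sub("(?s)<head>.*?</head>", "", html)

print(text1 == text2)
```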
