
Let's say I need to get a string inside some h1, h2, or h3 tags:

/<[hH][1-3][^>]*>(.*?)<\/[hH][1-3]>/

This works great if the user decides to take a sane approach to headers:

<h1>My Header</h1>

But knowing my users, they want bold, italic, underlined h1s, and they have that coding quagmire TinyMCE to help them do it. TinyMCE would output:

<h1><b><span style='text-decoration: underline'><i>My Hideous Header</i></span></b></h1>
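To make the problem concrete, here is a quick check of that pattern, sketched in Python's `re` module. That choice is an assumption: the question doesn't name a language, and because Python takes the pattern as a plain string, the `/.../` delimiters and the escaped `\/` go away. The lazy `(.*?)` stops at the first closing header tag, so every tag TinyMCE nests inside the header ends up in the capture.

```python
import re

# The question's pattern: lazily capture everything between an opening
# <h1>-<h3> tag and the first closing </h1>-</h3> tag that follows it.
HEADER_RE = re.compile(r"<[hH][1-3][^>]*>(.*?)</[hH][1-3]>")

sane = "<h1>My Header</h1>"
tinymce = ("<h1><b><span style='text-decoration: underline'>"
           "<i>My Hideous Header</i></span></b></h1>")

print(HEADER_RE.search(sane).group(1))
# My Header
print(HEADER_RE.search(tinymce).group(1))
# <b><span style='text-decoration: underline'><i>My Hideous Header</i></span></b>
```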

So my question is:

How do I get a string inside h1, h2, or h3, and then inside any number of other surrounding tags as well?

Thanks, Joe

    What about this? <h1><b>My <i>Hideous</i> Header</b></h1> Would you want to retrieve the full title string with its embedded <i> tags? Commented Sep 3, 2009 at 0:06

3 Answers

/<(h[1-3])[^>]*>(?:.*?>)?([^<]+)(?:<.*?)?<\/\1>/i

It is not hard to construct cases that break it hideously, since (as I'm sure people will tell you) parsing HTML is a job for an HTML parser, not a regex; but it works for your given case and various similar ones.
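A quick way to try this pattern, again assuming Python's `re` (where the `/i` flag becomes `re.IGNORECASE`). It pulls the text out of both examples from the question, and it also shows one way it breaks: with a tag in the middle of the text, as in the comment on the question, `([^<]+)` stops at the first `<`, so only the first run of text is captured.

```python
import re

# The answer's pattern in Python form: /i becomes re.IGNORECASE, and \1
# requires the closing tag to name the same header level as the opening one.
ANSWER_RE = re.compile(r"<(h[1-3])[^>]*>(?:.*?>)?([^<]+)(?:<.*?)?</\1>",
                       re.IGNORECASE)

samples = [
    "<h1>My Header</h1>",
    "<h1><b><span style='text-decoration: underline'>"
    "<i>My Hideous Header</i></span></b></h1>",
    # A tag inside the text itself, as raised in the comment on the question.
    "<h1><b>My <i>Hideous</i> Header</b></h1>",
]

for html in samples:
    # group(2) is the first run of characters containing no '<'.
    print(repr(ANSWER_RE.search(html).group(2)))
# 'My Header'
# 'My Hideous Header'
# 'My '
```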


If you're in PHP, you can use your regex:

/<[hH][1-3][^>]*>(.*?)<\/[hH][1-3]>/

then pass the captured result through the strip_tags() function to get rid of all the insanity inside.

If you are not on PHP, you can pass the result through a regex replace that removes tags, something like replacing /<\/?[^>]+?>/ with an empty string.
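A sketch of the capture-then-strip approach in Python. Python has no built-in `strip_tags()`, so the tag-removal pattern above stands in for it. `header_text` is a hypothetical helper name, and like any regex approach this assumes tidy markup (for example, no `>` inside attribute values).

```python
import re

# Step 1: the question's pattern, capturing everything inside an h1-h3.
HEADER_RE = re.compile(r"<[hH][1-3][^>]*>(.*?)</[hH][1-3]>")
# Step 2: the tag-removal pattern, matching any <tag ...> or </tag>.
TAG_RE = re.compile(r"</?[^>]+?>")

def header_text(html):
    """Hypothetical helper: text of each h1-h3 header, inner tags removed."""
    return [TAG_RE.sub("", inner) for inner in HEADER_RE.findall(html)]

print(header_text("<h1><b><span style='text-decoration: underline'>"
                  "<i>My Hideous Header</i></span></b></h1>"))
# ['My Hideous Header']
print(header_text("<h1><b>My <i>Hideous</i> Header</b></h1>"))
# ['My Hideous Header']
```

Unlike a single pattern that tries to skip over the nested tags, stripping after the capture also keeps text that sits between inner tags.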


If you only want to capture the ultimately nested text you could just drop all tags inside the header tag with:

/<([hH][1-3]).*>(.*?)<.*\/\1>/

Untested, but I think it should work.

1 Comment

Nope. (.*?) is allowed to match nothing, and thanks to the greedy .* ahead of it, that's exactly what it does.
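The comment is easy to confirm. This sketch uses Python's `re` and writes the backreference as `\1`, which is how Python (like PCRE) refers to a captured group inside the pattern itself. In a Python pattern, `$1` would be read as an end-of-string anchor followed by a literal `1`.

```python
import re

# The third answer's pattern, with the backreference written as \1.
GREEDY_RE = re.compile(r"<([hH][1-3]).*>(.*?)<.*/\1>")

tinymce = ("<h1><b><span style='text-decoration: underline'>"
           "<i>My Hideous Header</i></span></b></h1>")

match = GREEDY_RE.search(tinymce)
# The leading greedy .* only backs off as far as the '>' just before
# "</h1>", so the lazy (.*?) is left to match the empty string.
print(repr(match.group(2)))
# ''
```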
