Funny that my last question was on the same topic, but alas:

I'm running the following code:

preg_match('/<th.*>.*Organizer.*title=\".*\">(.*)<\/a>/mi', $file_string, $organizer);

On the following content:

<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>

And I can't for the life of me figure out why it's not working. I can get it to match Organizer: with the regexp '/.*Organizer/', but as soon as there's a newline it stops working, despite the /m option. Any ideas?

  • Is it multiline input? Your regex, I believe, doesn't say it should traverse lines, so by default it works on one line. Commented Jan 15, 2014 at 13:20
  • My bad, PHP's m modifier is multiline. Commented Jan 15, 2014 at 13:21
  • Your problem is that .* does NOT match newlines. Commented Jan 15, 2014 at 13:24
  • Isn't /s the modifier you are looking for? php.net/manual/en/reference.pcre.pattern.modifiers.php Commented Jan 15, 2014 at 13:24
  • If you use the /m option, the first piece of the pattern, <th.*>, will match everything up to the last >. That's your problem. Commented Jan 15, 2014 at 13:25

2 Answers

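Okay, so the issue is the newline character: by default, . does not match it. This regex explicitly allows a newline wherever the original used .* and gets the text of the a element:

<th(?:.|\n)*?>(?:.|\n)*?Organizer(?:.|\n)*?title="[^"]*">(.*?)<\/a>

Take note of the expression (?:.|\n). The alternation has to be grouped, because a bare .*|\n would split the whole pattern into separate alternatives.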

Here is a Regex 101 to prove it.


As Niet stated, you could just use the s modifier. The Regex would then be:

<th.*>.*Organizer.*title=".*">(.*)<\/a>

but you would add one more modifier, s, after the closing delimiter. Here is a Regex 101 to prove it.
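
As a rough sketch of how that looks in PHP, the snippet below runs the pattern with the s and i modifiers against the markup from the question. The lazy quantifiers (.*?) and the [^"]* inside the title attribute are my own addition, not part of the pattern above; they are there on the assumption that the real page contains more than one link, where greedy .* would run on to the last </a>.

```php
<?php
// Sample markup copied from the question.
$file_string = <<<'HTML'
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
HTML;

// s: the dot also matches newlines. i: case-insensitive matching.
// Lazy quantifiers and [^"]* are an assumption added for this sketch,
// so the match stops at the first link after "Organizer".
$pattern = '/<th.*?>.*?Organizer.*?title="[^"]*">(.*?)<\/a>/si';

$organizer = [];
if (preg_match($pattern, $file_string, $organizer) === 1) {
    // Index 1 holds the first capture group, i.e. the link text.
    echo $organizer[1], PHP_EOL;
}
```

On the sample above this prints TaKe. The m modifier is dropped because the pattern has no ^ or $ anchors for it to affect.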

4 Comments

Why not just use the DOTALL modifier s?
@NiettheDarkAbsol, fantastic idea--thanks a lot! You learn something new every day!
I too would use the s modifier instead. That's a lot easier.
Thanks a lot for the help, however, I've run into an interesting problem. I'm trying to grab this data from this page from the table on the right hand side, and the regexp you provided is working fine when taking out the text on its own, but as soon as I try to get the page with file_get_contents() and run the regexp on that, the match stops working. Any ideas? Perhaps this is a question in itself.

The dot metacharacter, by default, does not match newlines. If you also want . to match newlines, you need the s modifier.

From the PHP manual:

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded.
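
To make the difference between m and s concrete, here is a small sketch on a made-up two-line string ($text and the variable names are only for illustration). The m modifier changes where ^ and $ can match, while s is the one that changes what the dot matches.

```php
<?php
// Hypothetical two-line input, just for illustration.
$text = "first\nsecond";

// No modifier: the dot stops at the newline, so the pattern cannot span both lines.
$plain = preg_match('/first.*second/', $text);

// m only lets ^ and $ match at line boundaries; the dot still excludes newlines.
$multiline = preg_match('/first.*second/m', $text);
$anchored = preg_match('/^second$/m', $text);

// s (DOTALL): the dot matches newlines too, so the pattern spans both lines.
$dotall = preg_match('/first.*second/s', $text);

var_dump($plain, $multiline, $anchored, $dotall); // int(0), int(0), int(1), int(1)
```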

However, it's generally a bad idea to use regex to parse HTML. I suggest you use a DOM Parser instead.
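
A minimal sketch of the DOM approach, using PHP's bundled DOMDocument and DOMXPath classes (this needs the dom extension). The markup is the snippet from the question wrapped in a <table> element, which is an assumption about the surrounding page, and the XPath expression is just one way to phrase "the link in the cell next to the Organizer header".

```php
<?php
// Snippet from the question; the enclosing <table> is assumed.
$html = <<<'HTML'
<table>
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
</table>
HTML;

// Real pages are rarely valid markup, so collect parser warnings
// internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Find the <th> whose text contains "Organizer", then the first link
// inside the <td> that follows it in the same row.
$links = $xpath->query('//th[contains(., "Organizer")]/following-sibling::td[1]//a');

$organizer = $links->length > 0 ? trim($links->item(0)->textContent) : null;
echo $organizer, PHP_EOL;
```

On this sample it prints TaKe. Unlike the regex, the query does not depend on where the line breaks fall or on the order of the attributes in the link.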
