Trouble with regexp in PHP

Question

Funny that my last question was on the same topic, but alas:

I'm running the following code:

preg_match('/<th.*>.*Organizer.*title=\".*\">(.*)<\/a>/mi', $file_string, $organizer);

On the following content:

<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>

And I can't for the life of me figure out why it's not working. I can get it to match Organizer: with the regexp '/.*Organizer/', but as soon as there's a new line it stops working, despite the /m option. Any ideas?

Is this multiline output? I believe your regex doesn't say it should span lines, so by default it works on a single line.
– Noam Rathaus, Commented Jan 15, 2014 at 13:20
Isn't /s the modifier you are looking for? php.net/manual/en/reference.pcre.pattern.modifiers.php
– Aioros, Commented Jan 15, 2014 at 13:24
If you use the /m option, this first piece of code, <th.*>, will match everything up to the last >. That's your problem.
– Jorge Campos, Commented Jan 15, 2014 at 13:25

Mike Perrenoud · Accepted Answer · 2014-01-15 13:29:53Z

1

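The issue is the newline character, which the dot does not match by default. This regex will still capture the text of the a element:

<th(?:.|\n)*>(?:.|\n)*Organizer(?:.|\n)*title=".*">(.*)<\/a>

Take note of the expression (?:.|\n)*. The group is required: written without it, as .*|\n, the | splits the whole pattern into separate alternatives.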

Here is a Regex 101 to prove it.

As Niet stated, you could just use the s modifier. The Regex would then be:

<th.*>.*Organizer.*title=".*">(.*)<\/a>

but you would add one more modifier, s, after the closing delimiter. Here is a Regex 101 to prove it.
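
A minimal sketch of the s-modifier version, run against the sample markup from the question. The variable names follow the question; the nowdoc input and the decision to keep the i modifier are assumptions made for illustration.

```php
<?php
// Sample markup copied from the question.
$file_string = <<<'HTML'
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
HTML;

// "s" (DOTALL) lets "." match newlines; "i" keeps the match case-insensitive.
$pattern = '/<th.*>.*Organizer.*title=".*">(.*)<\/a>/si';

if (preg_match($pattern, $file_string, $organizer)) {
    echo $organizer[1], "\n"; // prints "TaKe"
} else {
    echo "no match\n";
}
```

On a full page with many links, the greedy .* can run past the intended anchor, so lazy quantifiers (.*?) or a negated class such as [^"]* are probably the safer choice there.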


4 Comments

Niet the Dark Absol Over a year ago

Why not just use the DOTALL modifier s?

Mike Perrenoud Over a year ago

@NiettheDarkAbsol, fantastic idea--thanks a lot! You learn something new every day!

Amal Over a year ago

I too would use the s modifier instead. That's a lot easier.

Anders Over a year ago

Thanks a lot for the help. However, I've run into an interesting problem. I'm trying to grab this data from the table on the right-hand side of this page, and the regexp you provided works fine when I run it on the extracted text on its own, but as soon as I fetch the page with file_get_contents() and run the regexp on that, the match stops working. Any ideas? Perhaps this is a question in itself.

Amal · Accepted Answer · 2014-01-15 13:26:27Z

0

The dot metacharacter, by default, does not match newlines. If you also want . to match newlines, you need the s modifier.

From the PHP manual:

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded.
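
A small sketch of the difference between the m and s modifiers, since the question assumed m would let the pattern span lines. The two-line test string is made up for illustration.

```php
<?php
$text = "first line\nsecond line";

// No modifier: "." stops at the newline, so the pattern cannot span both lines.
$plain = preg_match('/first.*second/', $text);

// "m" only changes where ^ and $ may match; "." still excludes newlines.
$multiline = preg_match('/first.*second/m', $text);

// "s" makes "." match newlines as well.
$dotall = preg_match('/first.*second/s', $text);

// What "m" does affect: ^ can now match at the start of the second line.
$anchoredWithoutM = preg_match('/^second/', $text);
$anchoredWithM = preg_match('/^second/m', $text);

var_dump($plain, $multiline, $dotall);        // int(0) int(0) int(1)
var_dump($anchoredWithoutM, $anchoredWithM);  // int(0) int(1)
```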

However, it's generally a bad idea to use regex to parse HTML. I suggest you use a DOM Parser instead.
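
As a rough sketch of the DOM approach, the following uses PHP's bundled DOMDocument and DOMXPath classes (this assumes the dom extension is enabled, which is the default). Wrapping the question's snippet in a table element and the particular XPath expression are assumptions for illustration, not something taken from the real page.

```php
<?php
// The question's snippet, wrapped in <table> so it is a valid fragment (assumption).
$html = <<<'HTML'
<table>
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
</table>
HTML;

// Collect parser warnings internally instead of emitting them for messy markup.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Find the row whose header cell mentions "Organizer", then the link in its data cell.
$xpath = new DOMXPath($doc);
$links = $xpath->query('//tr[th[contains(., "Organizer")]]/td/a');

$organizer = $links->length > 0 ? trim($links->item(0)->textContent) : null;
var_dump($organizer); // string(4) "TaKe"
```

Note that XPath's contains() is case-sensitive, unlike the i modifier in the original regex, and that newlines in the markup no longer matter at all.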
