How to use PHP to get each matched regex pattern

Question

I am attempting to use preg_match_all to extract a repeated pattern out of an html string.

The problem seems to be that my pattern has a defined beginning and end, but a wildcard portion in between. So the preg_match_all ends up only getting the biggest match, but not the individual matches.

My ultimate goal is to isolate each <a ...>some text</a> out of an html string, and to wrap each one like so: <font ...><a ...>some text</a></font>.

But first off I want to simply successfully isolate them each:

$lvs_regex = "/<a.+<\/a>/" ;
$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

preg_match_all( $lvs_regex , $lvs_test , $matches ) ;
for($i = 0 ; $i < count( $matches ) ; $i++ )
  { print $matches[ $i ][0] . "<br/>" ;
  }

The return that I want:

[0] => <a href='...'>AAA</a>

[1] => <a href='...'>BBB</a>

[2] => <a href='...'>CCC</a>

But I only get one match:

[0] => <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a>

Read up on greediness and the lazy quantifier .*?, or use a negated character class to match only non-tag content in between.
– mario, Commented Dec 14, 2013 at 11:31
Instead of enclosing each link inside <font> tags, why don't you use a CSS rule?
– Casimir et Hippolyte, Commented Dec 14, 2013 at 12:08
Casimir, the text is actually being sent to Flash, which has limits on its HTML text.
– dsdsdsdsd, Commented Dec 14, 2013 at 12:10
Mario ... 'greediness' was a concept that I had never heard of ... it got me on the right track, and enabled me to understand ilpaijin's answer.
– dsdsdsdsd, Commented Dec 14, 2013 at 12:11

ilpaijin · Accepted Answer · 2013-12-14 13:02:24Z

1

Maybe something like this:

$lvs_regex = "/<a.*?<\/a>/" ;
$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

preg_match_all( $lvs_regex , $lvs_test , $matches);

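Basically, the pattern you need is /<a.*?<\/a>/. The ? after .* makes the quantifier lazy, so each match stops at the first </a> instead of the last one, and preg_match_all returns every occurrence in your string.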

Now, var_dump($matches[0]) gives

array (size=3)
    0 => string '<a href='...'>AAA</a>' (length=21)
    1 => string '<a href='...'>BBB</a>' (length=21)
    2 => string '<a href='...'>CCC</a>' (length=21)

which is the result you want.

So, by following up with

for($i = 0 ; $i < count( $matches[0] ) ; $i++ )
{ 
    var_dump($matches[0][ $i ] . "<br/>");
}

you can see that it now matches every occurrence:

string '<a href='...'>AAA</a><br/>' (length=26)
string '<a href='...'>BBB</a><br/>' (length=26)
string '<a href='...'>CCC</a><br/>' (length=26)
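
To make the difference concrete, here is a minimal sketch that runs the greedy pattern from the question, the lazy pattern from this answer, and a negated character class (the alternative mario mentions in the comments) against the same sample string. The variable names $html, $greedy, $lazy and $negated are illustrative, and the negated-class pattern assumes the link text contains no nested tags.

```php
<?php
// Same sample string as in the question.
$html = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";

// Greedy: .+ runs to the end of the string, then backtracks to the LAST </a>.
preg_match_all('/<a.+<\/a>/', $html, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST </a>.
preg_match_all('/<a.*?<\/a>/', $html, $lazy);

// Negated character classes: [^>]* and [^<]* cannot run past a tag boundary.
// Assumption: the link text itself contains no nested tags.
preg_match_all('/<a[^>]*>[^<]*<\/a>/', $html, $negated);

echo count($greedy[0]), "\n";   // prints 1
echo count($lazy[0]), "\n";     // prints 3
echo count($negated[0]), "\n";  // prints 3
```

In all three cases the full matches live in index 0 of the result array, which is why the loops here iterate over $matches[0] rather than $matches.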

-------- NEW EDIT ---------

So now you can modify your loop to wrap every matched a tag.

$result='';

for($i = 0 ; $i < count( $matches[0] ) ; $i++ )
{ 
    $result .= "<font ...>".$matches[0][ $i ] . "</font><br/>";
} 

var_dump($result);

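And $result contains (shown here one entry per line for readability):

<font ...><a href='...'>AAA</a></font><br/>
<font ...><a href='...'>BBB</a></font><br/>
<font ...><a href='...'>CCC</a></font><br/>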
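
If the goal is to wrap the links while leaving the surrounding text in place, one alternative (not part of the original answer) is to skip the loop and let preg_replace do the wrapping. This is only a sketch: $html and $wrapped are illustrative names, and '<font ...>' is kept as the same placeholder used above, so substitute the real attributes.

```php
<?php
$html = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later";

// $0 in the replacement string stands for the whole matched text.
// Single quotes keep PHP from trying to interpolate anything in either string.
$wrapped = preg_replace('/<a.*?<\/a>/', '<font ...>$0</font>', $html);

echo $wrapped, "\n";
// prints: click <font ...><a href='...'>AAA</a></font> now, <font ...><a href='...'>BBB</a></font> later
```

Unlike the loop, which builds a new string from the matches only, this keeps the text between the links ("click", "now,", "later") in the output.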

---------- NEW EDIT ----------

As suggested by @Casimir et Hippolyte, you can avoid matching a wrong or unwanted tag such as <abbr> by adding a word boundary to the pattern:

$lvs_regex = "/<a\b.*?<\/a>/" ;

You can also obtain the same result with a foreach instead of a for loop. For example:

$result = '';

// Use a separate loop variable so the $matches array is not overwritten.
foreach ($matches[0] as $match)
{ 
    $result .= "<font ...>" . $match . "</font><br/>";
}

And a link about foreach's internal behaviour, in case you want a deeper look at the construct.
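
The effect of the word boundary can be checked with a small sketch. The sample string containing an <abbr> tag is hypothetical (it is not from the question), and $without and $with are illustrative names.

```php
<?php
// Hypothetical input: an <abbr> tag appears before the real link.
$html = "<abbr title='x'>HTML</abbr> and <a href='...'>AAA</a>";

// Without \b, "<a" also matches the start of "<abbr", and the lazy .*?
// keeps going until the first literal "</a>", which closes the link.
preg_match_all('/<a.*?<\/a>/', $html, $without);

// With \b, "<a" must be followed by a non-word character, so "<abbr" is skipped.
preg_match_all('/<a\b.*?<\/a>/', $html, $with);

echo $without[0][0], "\n"; // prints the whole input string
echo $with[0][0], "\n";    // prints <a href='...'>AAA</a>
```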

edited Dec 14, 2013 at 13:02

answered Dec 14, 2013 at 11:35

ilpaijin


3 Comments

dsdsdsdsd Over a year ago

that works ... I incorrectly posted that it did not work because I did not notice the $matches[0]... part of your answer ... sorry, and thanks.

Casimir et Hippolyte Over a year ago

Adding a word boundary avoids matching an <abbr> tag: <a\b.... Why use a for loop when foreach can do the job?

ilpaijin Over a year ago

You're absolutely right, but I tend not to rewrite the OP's code entirely, and instead try to follow his direction. I think it's part of his journey to find the correct way to do his stuff. By the way, I'll edit the code with your suggestion.

Dmitry Dubovitsky · Accepted Answer · 2013-12-16 10:59:14Z

0

$lvs_regex = "/<a.+<\/a>/U" ;

$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

preg_match_all( $lvs_regex , $lvs_test , $matches ) ;
// $matches always holds at least one sub-array, so test $matches[0] instead.
if (!empty($matches[0])) {
    foreach ($matches[0] as $match) {
        print $match."\n";
    }
}

Result is:

<a href='...'>AAA</a>
<a href='...'>BBB</a>
<a href='...'>CCC</a>

Use the 'ungreedy' modifier /U, which makes quantifiers lazy by default.

http://www.php.net/manual/fa/reference.pcre.pattern.modifiers.php
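
As a rough illustration of what /U does, the sketch below compares it with the explicit lazy quantifier. Because the modifier inverts greediness, a quantifier followed by ? becomes greedy again under /U, which is easy to trip over. The names $html, $u, $lazy and $flipped are illustrative.

```php
<?php
$html = "<a href='...'>AAA</a> now, <a href='...'>BBB</a>";

// /U makes .+ lazy by default.
preg_match_all('/<a.+<\/a>/U', $html, $u);

// Same result without the modifier, using an explicit lazy quantifier.
preg_match_all('/<a.+?<\/a>/', $html, $lazy);

// Under /U the trailing ? flips .+? back to greedy.
preg_match_all('/<a.+?<\/a>/U', $html, $flipped);

echo count($u[0]), "\n";       // prints 2
echo count($lazy[0]), "\n";    // prints 2
echo count($flipped[0]), "\n"; // prints 1
```

Since /U changes the meaning of every quantifier in the pattern, writing .+? or .*? explicitly is often easier to read when only one part needs to be lazy.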

answered Dec 16, 2013 at 10:59

Dmitry Dubovitsky
