I am attempting to use preg_match_all to extract a repeated pattern out of an html string.

The problem seems to be that my pattern has a fixed beginning and end with a wildcard portion in between, so preg_match_all returns only the single largest match instead of the individual matches.

My ultimate goal is to isolate each <a ...>some text</a> in an HTML string and wrap it like so: <font ...><a ...>some text</a></font>.

But first I simply want to isolate each one:

$lvs_regex = "/<a.+<\/a>/" ;
$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

preg_match_all( $lvs_regex , $lvs_test , $matches ) ;
for($i = 0 ; $i < count( $matches ) ; $i++ )
  { print $matches[ $i ][0] . "<br/>" ;
  } 

The return that I want:

[0] => <a href='...'>AAA</a>

[1] => <a href='...'>BBB</a>

[2] => <a href='...'>CCC</a>

But I only get one match:

[0] => <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a>

  • Read up on greediness with .*? or use a negated character class to only match non-tagish content in between. Commented Dec 14, 2013 at 11:31
  • Instead of enclosing each link inside <font> tags, why don't you use a CSS rule? Commented Dec 14, 2013 at 12:08
  • Casimir, the text is actually being sent to Flash, which has limits on its HTML text. Commented Dec 14, 2013 at 12:10
  • Mario ... 'greediness' was a concept that I had never heard of. It got me on the right track and enabled me to understand ilpaijin's answer. Commented Dec 14, 2013 at 12:11
  • @dsdsdsdsd: I can't wait for this technology to disappear! Commented Dec 14, 2013 at 12:14
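
The negated character class mentioned in the first comment can be sketched roughly as follows. This is only an illustration, and it assumes the link text contains no nested tags; the variable names are made up for the example.

```php
<?php
// Sketch: match each anchor without relying on a lazy quantifier.
// [^>]* cannot run past the end of the opening tag, and [^<]* cannot run
// past the start of the closing tag, so every match stops at its own </a>.
// Assumption: the link text itself contains no nested tags.
$pattern = '/<a\b[^>]*>[^<]*<\/a>/';
$html    = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full-pattern matches, one entry per anchor.
print_r($matches[0]);
```

Because the character classes can never cross a tag boundary, this variant does not depend on greediness at all.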

2 Answers

Maybe something like this:

$lvs_regex = "/<a.*?<\/a>/" ;
$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

preg_match_all( $lvs_regex , $lvs_test , $matches);

Basically, the pattern you need is /<a.*?<\/a>/. The lazy quantifier .*? stops at the first </a> it can reach, so the pattern matches every occurrence in your string separately.

Now, var_dump($matches[0]) gives

array (size=3)
    0 => string '<a href='...'>AAA</a>' (length=21)
    1 => string '<a href='...'>BBB</a>' (length=21)
    2 => string '<a href='...'>CCC</a>' (length=21)

which is the result you want.
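
To make the difference concrete, here is a small side-by-side sketch of the greedy pattern from the question and the lazy one used here. The variable names are only for illustration.

```php
<?php
$html = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";

// Greedy: .+ consumes as much as possible, then backtracks to the LAST </a>.
$greedyCount = preg_match_all('/<a.+<\/a>/', $html, $greedy);

// Lazy: .*? consumes as little as possible, so it stops at the FIRST </a>.
$lazyCount = preg_match_all('/<a.*?<\/a>/', $html, $lazy);

// preg_match_all returns the number of full-pattern matches.
echo $greedyCount, " greedy match, ", $lazyCount, " lazy matches\n";
// prints "1 greedy match, 3 lazy matches"
```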

Then, looping over $matches[0] with

for($i = 0 ; $i < count( $matches[0] ) ; $i++ )
{ 
    var_dump($matches[0][ $i ] . "<br/>");
} 

you can see that it now matches every occurrence:

string '<a href='...'>AAA</a><br/>' (length=26)
string '<a href='...'>BBB</a><br/>' (length=26)
string '<a href='...'>CCC</a><br/>' (length=26)

-------- NEW EDIT ---------

So now you can modify your loop to wrap every matched a tag.

$result='';

for($i = 0 ; $i < count( $matches[0] ) ; $i++ )
{ 
    $result .= "<font ...>".$matches[0][ $i ] . "</font><br/>";
} 

var_dump($result);

And you get

<font ...><a href='...'>AAA</a></font><br/><font ...><a href='...'>BBB</a></font><br/><font ...><a href='...'>CCC</a></font><br/>
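
Note that the loop builds a new string from the matches alone, so the surrounding text ("click", "now," and so on) is dropped. If the goal is to wrap each link in place and keep the rest of the string, one possible sketch uses preg_replace, where $0 in the replacement stands for the whole match. The <font ...> attributes are left elided as in the question, and the variable names are illustrative.

```php
<?php
$html = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";

// $0 in the replacement refers to the full text matched by the pattern,
// so each anchor is wrapped where it stands and the other text is untouched.
$wrapped = preg_replace('/<a\b.*?<\/a>/', '<font ...>$0</font>', $html);

echo $wrapped, "\n";
```

This avoids the explicit loop altogether, at the cost of not having the individual matches available in an array.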

---------- NEW EDIT ----------

As suggested by @Casimir et Hippolyte, you can avoid matching wrong or unwanted tags such as <abbr> by adding a word boundary to the pattern:

$lvs_regex = "/<a\b.*?<\/a>/" ; 

You can optionally get the same result with a foreach instead of a for loop. For example:

// Use a separate loop variable so that $matches itself is not overwritten.
foreach($matches[0] as $match)
{ 
    $result .= "<font ...>".$match . "</font><br/>";
} 

If you want a deeper look at the construct, read up on the internal behaviour of foreach.
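
As a rough illustration of what the word boundary changes, the sketch below runs both patterns over a string that contains an <abbr> tag. The sample string and variable names are invented for the example.

```php
<?php
// Hypothetical input with an <abbr> tag in front of a real link.
$html = "<abbr title='...'>HTML</abbr> and <a href='...'>AAA</a>";

// Without \b, "<a" also matches the start of "<abbr", so the match begins
// too early and runs up to the first "</a>".
preg_match_all('/<a.*?<\/a>/', $html, $without);

// With \b, "<a" must be followed by a non-word character, which rules out "<abbr".
preg_match_all('/<a\b.*?<\/a>/', $html, $with);

var_dump($without[0][0]); // the whole input string
var_dump($with[0][0]);    // only the <a ...>AAA</a> element
```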


3 Comments

That works ... I incorrectly posted that it did not work because I did not notice the $matches[0] part of your answer. Sorry, and thanks.
Adding a word boundary avoids matching an <abbr> tag: <a\b.... Why use a for loop when foreach can do the job?
You're absolutely right, but I tend not to rewrite the OP's code entirely and try to follow his direction instead. I think finding the correct way to do things is part of his journey. By the way, I'll edit the code with your suggestion.

$lvs_regex = "/<a.+<\/a>/U" ;

$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

preg_match_all( $lvs_regex , $lvs_test , $matches ) ;
if ($matches) {
    foreach ($matches[0] as $match) {
        print $match."\n";
    }
}

Result is:

<a href='...'>AAA</a>
<a href='...'>BBB</a>
<a href='...'>CCC</a>

Use the 'ungreedy' pattern modifier U (PCRE_UNGREEDY), which makes quantifiers lazy by default:

http://www.php.net/manual/fa/reference.pcre.pattern.modifiers.php
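
One thing to keep in mind is that U inverts greediness rather than just switching it off: under U, a quantifier followed by ? becomes greedy again. A minimal sketch of that behaviour, with illustrative variable names:

```php
<?php
$html = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";

// With U, a plain .+ is lazy, so each anchor is matched separately.
$lazyByModifier = preg_match_all('/<a.+<\/a>/U', $html, $m1);

// With U, .+? is flipped back to greedy, so everything collapses into one match.
$greedyAgain = preg_match_all('/<a.+?<\/a>/U', $html, $m2);

echo $lazyByModifier, " vs ", $greedyAgain, "\n"; // prints "3 vs 1"
```

For that reason, mixing U with explicit ? quantifiers in the same pattern can be confusing; picking one style per pattern tends to be easier to read.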
