
I'm trying to extract the sentence that contains the link from the following text:

<p> Referencement PG1 est spécialiste en référencement depuis 2004. Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com Mot Clé</a>, aidera de nous trouver. Fascinez le regard avec le film vidéo. Vous demeurerez persistant sur les plateformes Youtube, Dailymotion ... Les images Video apparaissant dans les index de Google appâteront les surfeurs. <img style="padding:5px;float:left" src="http://thumbs.virtual-tour.tv/referencementpage1.jpg Par le appel à la Vidéo, faites-vous connaître. </p>

That is, this sentence:

Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com Mot Clé</a>, aidera de nous trouver.

I'm using this regexp:

([A-Z][^<]*)<a[^>]*>([^<]*)</a>([^\.!\?]*)

I can't figure out why it's not working: it gives me the previous sentence along with the one I need:

Referencement PG1 est spécialiste en référencement depuis 2004. Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com Mot Clé</a>, aidera de nous trouver.

What am I missing? Thanks for the help =D

EDIT (some code):

preg_match_all('#([A-Z][^<\.!\?]*)<a[^>]*>([^<]*)</a>(.*[^\.!\?]*)#U', $spinnedText, $matches);
echo "<pre>";
print_r($matches);
echo "</pre>";
foreach($matches[1] as $key=>$value){
//$spinnedText = str_replace($matches[0][$key], "<a {title=\"".$this->url."\"|} {rev=\"{index|help|bookmark|friend}\"|} {dir=\"rtl\"|}{rel=\"{friend|bookmark|help|}\"|} href=\"".$this->url."\">".trim($value)."</a>", $spinnedText);
$spinnedText = str_replace($matches[0][$key], "<a {title=\"".$this->url."\"|} {rev=\"{index|help|bookmark|friend}\"|} {dir=\"rtl\"|}{rel=\"{friend|bookmark|help|}\"|} href=\"".$this->url."\">".$matches[1][$key].$matches[2][$key].$matches[3][$key]."</a>", $spinnedText);
}
  • Could you post the full php code you are using to do this? Commented May 31, 2012 at 13:07
  • Your HTML is not valid. Always make sure the tags are properly closed. Commented May 31, 2012 at 13:10
  • @twall The HTML doesn't matter, it's considered as plain text for string matching. Commented May 31, 2012 at 13:13
  • I'll just leave this here. Commented May 31, 2012 at 13:14

3 Answers


Your regular expression still matches the first sentence since it begins with a capital letter. You need to start out with \. or (?:^|[\.!?]) or something, but that may be a problem for you since the first sentence may also be valid in some circumstances. Is it possible that you can have multiple sentences with these links? The important question is what defines a sentence.

This will work with what you have, in addition to the first sentence after a p> and a sentence at the start of the string:

preg_match('/
   (?:           # match, but do not capture, any of
   ^             # the start of the string
   |p>\s*        # or an opening or closing p tag followed by any number of spaces
   |[.!?]\s+ )   # or sentence punctuation followed by whitespace (a literal space is ignored under the x flag)
   (             # capture
   [A-Z]         # a capital letter
   [^.!?<]*      # followed by text containing no sentence punctuation or tag, up to
   <a[^>]*>      # an opening anchor tag
   .*?           # followed by the link text until
   <\/a>         # a closing anchor tag
   [^.!?]*       # followed by the rest of the sentence until
   [.?!])        # closing punctuation
/x', $item, $matches);
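
A quick way to check a pattern like this is to run it against the sample paragraph. The sketch below uses one possible variant, not necessarily the exact pattern above. It assumes the anchor tag in the sample is well formed (a closing quote and > after the URL), and $text, $pattern and $sentence are illustrative names. The text before the link is restricted to non-punctuation characters so the match cannot start in the previous sentence.

```php
<?php
// Shortened sample paragraph. Assumption: the <a> tag is well formed.
$text = '<p> Referencement PG1 est spécialiste en référencement depuis 2004. '
      . 'Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com">Mot Clé</a>, '
      . 'aidera de nous trouver. Fascinez le regard avec le film vidéo. </p>';

$pattern = '/
    (?: ^ | p>\s* | [.!?]\s+ )  # start of string, end of a p tag, or end of the previous sentence
    (                           # capture the sentence
      [A-Z]                     # a capital letter
      [^.!?<]*                  # text before the link, without punctuation or tags
      <a[^>]*> .*? <\/a>        # the link itself
      [^.!?]*                   # text after the link
      [.!?]                     # closing punctuation
    )
/x';

// Group 1 holds the sentence; null when nothing matched.
$sentence = preg_match($pattern, $text, $matches) ? $matches[1] : null;
echo $sentence, PHP_EOL;
```

Only the second sentence is captured here, because the first alternative that reaches a capital letter in the first sentence cannot get past its closing period.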

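The first group is the problem: [^<]* excludes only <, so it runs straight across the period that ends the previous sentence. The engine takes the leftmost match and * is greedy, so it starts at the first capital letter in the text and swallows everything up to the link. You have to restrict the START of the regex so it can't cross a sentence boundary.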

Try this:

[^.!?]*<\s*a[^>]+>([^<]*)</a>[^.?!]*[.?!]

It should match the whole sentence (plus any whitespace in front of it, which you can trim) and nothing more.

Hope this helps.
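
For comparison, here is a minimal sketch that runs the pattern from the question and the pattern from this answer against the same text. It assumes the anchor tag in the sample is well formed (a closing quote and > after the URL). The variable names are illustrative, and # is used as the delimiter so that </a> needs no escaping.

```php
<?php
// Shortened sample paragraph. Assumption: the <a> tag is well formed.
$text = '<p> Referencement PG1 est spécialiste en référencement depuis 2004. '
      . 'Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com">Mot Clé</a>, '
      . 'aidera de nous trouver. Fascinez le regard avec le film vidéo. </p>';

// Pattern from the question: the first group may cross a period.
$original = '#([A-Z][^<]*)<a[^>]*>([^<]*)</a>([^\.!\?]*)#';
preg_match($original, $text, $before);

// Pattern from this answer: the leading class stops at sentence punctuation.
$fixed = '#[^.!?]*<\s*a[^>]+>([^<]*)</a>[^.?!]*[.?!]#';
if (preg_match($fixed, $text, $after)) {
    $sentence = trim($after[0]); // drop the space left over from the previous sentence
    $linkText = $after[1];       // text between <a ...> and </a>
} else {
    $sentence = null;
    $linkText = null;
}

echo $before[0], PHP_EOL; // begins at the first sentence
echo $sentence, PHP_EOL;  // only the sentence containing the link
```

The trim() call is there because the match begins right after the previous sentence's period, so it includes the separating space.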


You might want to look into a DOM Parser instead:

For example: http://simplehtmldom.sourceforge.net/

Example from their site:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';

2 Comments

You can't use a DOM parser to locate a sentence in text.
Ah yeah. I misunderstood the question completely. I thought he wanted to parse the html and get the sentence between two tags.
