This is my regular expression:

$pattern_new="/<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(?<price>([0-9.]*)).*?)\$(.*?)(\n|\s)*?</";

This is the sample text I need to match against:

<td><strong>.zx</strong></td><td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s $399</td><td>zxcddcdcdcdc</td></tr><tr class="dark"><td><strong>.aa.rr</strong></td><td><span class="offer"><strong>xscre:<br></strong>$99 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&eae;s $199</td><td>xxxx</td></tr><tr class="bar"><td colspan="3"></td></tr><tr class="bright"><td><strong>.vfd</strong></td><td><span class="offer"><strong>xscre:<br></strong>$99 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>du&ee;s $199</td><td>xxxxxxxx</td></tr><tr class="dark"><td><strong>.qwe</strong></td><td><span class="offer"><strong>xxx<br></strong>$99 xxxc;o<span class="fineprint_number">2</span>

Here is what I am doing in PHP:

$pattern_new="/<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(<price>)*([0-9.]*).*?)\$(.*?)(\n|\s)*?</";
$source = file_get_contents("https://www.abc.com/sources/data.txt");
preg_match_all($pattern_new, $source, $match_newprice, PREG_PATTERN_ORDER);
echo $source;
print_r($match_newprice);

$match_newprice comes back as an empty array.

When I use a regex tester such as myregextester or solmetra.com, I get a perfect match with no issues at all, but when I use PHP's preg_match_all for the same match, it returns an empty array. I increased pcre.backtrack_limit, but the issue remains. I don't understand the problem. Any help would be much appreciated.

  • I assume you were trying to write a non-capturing group for <price..., but you missed the colon: (?:<price... Commented Jun 20, 2013 at 20:20
  • No, that doesn't solve the issue, because another pattern I am mining with, '$pattern_do="/<strong>(?<do>\.(<\w*>)?(.*?)(<\/\w*>)?)<\/strong>/";', works perfectly. Update: the above pattern works fine when I use Perl, but my requirement is PHP. Commented Jun 20, 2013 at 20:23
  • What is this <span(\n|\s|.)*?<\/strong>? Maybe you meant <span[\\s]+.*?<\/strong>. Why use () instead of []? I think you need DOMDocument and DOMXPath here; you are breaking your neck with this convoluted and rather poor regexp. Commented Jun 20, 2013 at 20:25

3 Answers

I assume you were trying to write a non-capturing group for <price..., but you missed the colon. Alternatively, take out the question mark. If the price group is optional, try the regex below. A regex visualizer such as Debuggex can help with this; I find it extremely helpful.

<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(<price>)*([0-9.]*).*?)\$(.*?)(\n|\s)*?<

Edit live on Debuggex

In the above example, your first match would have the following captures:

0: "<td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s $399<"
1: ""
2: "<span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s "
3: ">"
4: ""
5: ""
6: "299"
7: "399"
8: ""

Is this what you are looking for?
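
If the goal is instead a named group, so that preg_match_all fills a price key, the three group syntaxes can be compared in a minimal sketch. The sample string is a shortened, hypothetical stand-in for the question's HTML, and the pattern is deliberately simplified rather than a drop-in replacement:

```php
<?php
// Shortened, hypothetical sample of the HTML from the question.
$source = '<td><span class="offer"><strong>xscre:<br></strong>$299 xx</span><br>des $399</td>';

// (?:...)         groups without capturing
// (?<price>...)   captures under the name "price" (and under a numeric index)
// (<price>)*      would look for the literal text "<price>", zero or more times
$pattern = '/<\/strong>(?:\n|\s)*\$(?<price>[0-9.]+)/';

preg_match_all($pattern, $source, $matches, PREG_PATTERN_ORDER);
print_r($matches['price']);
```

Only the price directly after </strong> is captured here, which is the 299 value the question is after.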

6 Comments

I put the price name there so that preg_match_all returns the values under a named key: with 'preg_match_all($pattern_new, $source, $match_newprice, PREG_PATTERN_ORDER);' the output looks like 'match_newprice( [0] => Array() price=>array())'. That way I can automate the process and don't have to look back.
So is the price bit optional? I'll update the post with what I think you are looking for now. Also, perhaps it would help if you showed exactly what text you are hoping to match.
I am trying to capture the values 299, 99, 99, 99.
The most recent update reaches the numbers (and puts them in a capture group) as you can see if you click "Edit live on Debuggex". Is this what you are going for?
Yes, that is what I was going for, but the issue is that I am not able to capture the numbers when I use PHP's preg_match_all. It works when I use Perl.
Another problem here is PHP-specific:

<?php
echo "\$".PHP_EOL;
echo '\$'.PHP_EOL;

Result:

$
\$

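... because in double-quoted strings the $ is expected to signify the start of a variable and needs escaping if you mean a bare $. PHP therefore consumes the backslash in \$, and the regex engine receives a bare $, which it treats as an end-of-subject anchor instead of a literal dollar sign. Put single quotes around your regex and it will probably be fine. (I haven't looked at it in detail, though; you may want to use the /x option and add some formatting whitespace and comments in case you need to debug this half a year from now.)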
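
To see the effect on an actual match, here is a minimal sketch. The sample string and the simplified pattern are assumptions made for illustration, not the full pattern from the question:

```php
<?php
// Shortened, hypothetical sample of the HTML from the question.
$source = '<td><span class="offer"><strong>xscre:<br></strong>$299 xx</span><br>des $399</td>';

// Double quotes: PHP turns \$ into a bare $, which PCRE reads as an anchor.
$double = "/<\/strong>\$(?<price>[0-9.]+)/";
// Single quotes: the backslash survives, so PCRE sees \$ (a literal dollar sign).
$single = '/<\/strong>\$(?<price>[0-9.]+)/';

echo $double, PHP_EOL; // /<\/strong>$(?<price>[0-9.]+)/
echo $single, PHP_EOL; // /<\/strong>\$(?<price>[0-9.]+)/

$countDouble = preg_match_all($double, $source, $m1); // 0 matches
$countSingle = preg_match_all($single, $source, $m2); // 1 match
print_r($m2['price']);
```

Echoing the pattern before passing it to preg_match_all is a quick way to check what the regex engine actually receives.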

1 Comment

Perfect,thank you very much I was raking my brain since yesterday to solve this issue. Putting Single quotes worked like a charm. I think now I understand the issue. Thanks.
The good way to do that:

$oProductsHTML = new DOMDocument();
@$oProductsHTML->loadHTML($sHtml);

$oSpanNodes = $oProductsHTML->getElementsByTagName('span');

foreach ($oSpanNodes as $oSpanNode) {
    if (preg_match('~\boffer\b~', $oSpanNode->getAttribute('class')) &&
        preg_match('~\$\K\d++~', $oSpanNode->nodeValue, $aMatch) )
    {
        $sPrice = $aMatch[0];
        echo '<br/>' . $sPrice;
    }
}

$sHtml stands for your string.

And I'm sure you can make it shorter with XPath.
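
A possible XPath variant, as a rough sketch only: the sample HTML is a shortened, hypothetical stand-in for $sHtml, and the class test uses the common concat/normalize-space idiom so that the offer class is matched as a whole word.

```php
<?php
// Hypothetical, shortened sample; $sHtml stands for the real page source.
$sHtml = '<table><tr><td><span class="offer"><strong>xscre:<br></strong>$299 xx'
       . '<span class="fineprint_number">2</span></span><br>des $399</td></tr>'
       . '<tr><td><span class="offer"><strong>xscre:<br></strong>$99 xx</span></td></tr></table>';

$oDoc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of using @
$oDoc->loadHTML($sHtml);
libxml_clear_errors();

$oXPath = new DOMXPath($oDoc);
// Select every <span> whose class list contains the word "offer".
$sQuery = '//span[contains(concat(" ", normalize-space(@class), " "), " offer ")]';

$aPrices = [];
foreach ($oXPath->query($sQuery) as $oSpan) {
    // Take the first number that follows a dollar sign in the span's text.
    if (preg_match('~\$\K\d+(?:\.\d+)?~', $oSpan->textContent, $aMatch)) {
        $aPrices[] = $aMatch[0];
    }
}
print_r($aPrices);
```

The nested fineprint_number span is skipped by the query because its class does not contain offer, so each price is collected once.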

The bad way:

$sPattern = '~<span class="offer\b(?>[^>]++|>(?!\$))+>\$\K\d++~';
preg_match_all($sPattern, $sHtml, $aMatches);

print_r ($aMatches[0]);

Note: \d++ can be replaced by \d++(?>\.\d++)? to allow decimal numbers.
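
For illustration, the decimal variant applied to a small made-up sample (the $9.95 price is hypothetical and does not appear in the question's data):

```php
<?php
// Hypothetical sample with one integer price and one decimal price.
$sHtml = '<span class="offer"><strong>a:<br></strong>$299 x</span>'
       . '<span class="offer"><strong>b:<br></strong>$9.95 y</span>';

// Same pattern as above, with the optional decimal part appended.
$sPattern = '~<span class="offer\b(?>[^>]++|>(?!\$))+>\$\K\d++(?>\.\d++)?~';
preg_match_all($sPattern, $sHtml, $aMatches);

print_r($aMatches[0]);
```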

1 Comment

This solution also works. Thanks. I will use it the next time.
