I want to scrape a star-based rating. This is the corresponding markup:

<div class="product_detail_info_rating_stars">
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star"></div>
</div>

Every rating uses this snippet. I am looking for a way to convert each snippet into a number; this one would be a 4 (4 of 5 stars).

The approach that comes to mind is to match the whole block for each rating, then match the full class inside it and count the occurrences, but maybe there is a better way that I am not seeing.

Is there a better way to solve this problem?

Thanks!

  • What have you tried so far? What DOM library are you using? Why do you think you need a regexp? Commented Oct 16, 2012 at 9:21
  • stackoverflow.com/questions/1732348/… You really ought to use a proper HTML parser, there's even one built into PHP (DOMDocument). Commented Oct 16, 2012 at 9:24
  • I am not using a DOM library as it is just a small scraping script for a WordPress plugin. I am currently working on a regex to match the inner divs; I would then loop through the matches and search for full. /<div class="product_detail_info_rating_stars">(<div class="product_detail_star( full)?">)+</div><\/div>/msU is what I've got so far. It needs testing though, as I am not at all fluent in regex. Commented Oct 16, 2012 at 9:25
  • @GordonM I'll look into the parser, thanks. Commented Oct 16, 2012 at 9:31

1 Answer

Here is a quick example of how you can load the page with DOMDocument and query it with SimpleXML and XPath.

// Get the page HTML as a string
$html = file_get_contents('1page.htm');

// Collect invalid-markup warnings internally instead of printing them
libxml_use_internal_errors(true);

// Parse the HTML with DOMDocument, then convert it to a SimpleXML object
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

// Find every rating block
$blocks = $xml->xpath('//div[contains(@class, "product_detail_info_rating_stars")]');

foreach ($blocks as $block)
{
    $count = 0;
    foreach ($block->children() as $child) {
        // Split the class attribute so class order and extra whitespace don't matter
        $classes = preg_split('/\s+/', trim((string) $child['class']));
        if (in_array('full', $classes, true)) {
            $count++;
        }
    }
    // $block->count() is the number of child elements, i.e. the total number of stars
    echo '<pre>Rating: ' . $count . ' of ' . $block->count() . '</pre>';
}

// Clear the invalid-markup error buffer
libxml_clear_errors();

For a test HTML page like this:

<!doctype html>
<html>
<head></head>
<body>

<table>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
</table>

</body>
</html>

It will output something like:

Rating: 1 of 5
Rating: 2 of 5
Rating: 4 of 5

Play with this to adjust to your needs.
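
As one possible variation, the counting can also be pushed into XPath itself with count(), using DOMXPath instead of SimpleXML. This is only a sketch: the function name countStarRatings and the 'full'/'total' array keys are made up for illustration, and it assumes the stars are direct child divs of the rating block, as in the markup from the question.

```php
<?php
// Hypothetical helper (name and return shape are assumptions, not from the question):
// returns one ['full' => int, 'total' => int] entry per rating block found in $html.
function countStarRatings(string $html): array
{
    // Collect invalid-markup warnings internally, then restore the previous setting
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Token-based class tests, so "product_detail_star" does not also match
    // a longer class name that merely contains it
    $block = '//div[contains(concat(" ", normalize-space(@class), " "), " product_detail_info_rating_stars ")]';
    $star  = 'div[contains(concat(" ", normalize-space(@class), " "), " product_detail_star ")]';
    $full  = $star . '[contains(concat(" ", normalize-space(@class), " "), " full ")]';

    $ratings = [];
    foreach ($xpath->query($block) as $node) {
        // evaluate() returns a float for count(), so cast to int
        $ratings[] = [
            'full'  => (int) $xpath->evaluate("count($full)", $node),
            'total' => (int) $xpath->evaluate("count($star)", $node),
        ];
    }
    return $ratings;
}
```

Calling countStarRatings(file_get_contents('1page.htm')) on the test page above should give three entries with 1, 2 and 4 full stars out of 5. Returning the numbers instead of echoing them makes it easier to store the rating from the plugin.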
