
I have some HTML that looks like the snippet below. I would like to get the text inside each <span class="zzAggregateRatingStat">. For the example below, I would get 3 and 5.

I am using Python 2.7 and lxml for this.

<div class="pp-meta-review">
<span class="zrvwidget" style="">
    <span g:inline="true" g:type="NumUsersFoundThisHelpful" g:hideonnoratings="true" g:entity.annotation.groups="maps"    g:entity.annotation.id="http://maps.google.com/?q=Central+Kia+of+Irving++(972)+659-2204+loc:+1600+East+Airport+Freeway,+Irving,+TX+75062&gl=US&sll=32.83624,-96.92526" g:entity.annotation.author="AIe9_BH8MR-1JD_4BhwsKrGCazUyU5siqCtjchckDcg5BAl5rOLd9nvhJJDTrtjL-xFI8D42bD_7">
        <span class="zzNumUsersFoundThisHelpfulActive" zzlabel="helpful">
            <span>
                <span class="zzAggregateRatingStat">3</span>
            </span>
            <span>
                <span>&nbsp;</span>
                      out of
                <span>&nbsp;</span>
            </span>
            <span>
                <span class="zzAggregateRatingStat">5</span>
            </span>
            <span>
                <span>&nbsp;</span>
                    people found this review helpful.
            </span>
        </span>
    </span>
</span>
</div>
  • "get the text which is in the ." <-- finish this sentence please. (Commented Mar 28, 2012 at 13:36)
  • ... and finish the question by showing what you have tried. (Commented Mar 28, 2012 at 13:47)
  • I'm really sorry for the typo. Stack Overflow took that as an HTML tag. (Commented Mar 28, 2012 at 15:09)

2 Answers


The following code works with your input:

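import lxml.html

# Parse the saved page with lxml's lenient HTML parser.
root = lxml.html.parse('text.html').getroot()
# Select every <span> whose class attribute is exactly "zzAggregateRatingStat".
for span in root.xpath('//span[@class="zzAggregateRatingStat"]'):
    print span.text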

it prints:

3
5
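
If the markup is already in a string (for example, fetched with urllib) rather than saved to a file, the same XPath works on the result of lxml.html.fromstring. The sketch below assumes a trimmed copy of the question's HTML held in a variable; ending the expression in /text() makes xpath() return the text nodes as strings, so no loop is needed.

```python
import lxml.html

# Trimmed copy of the question's markup (assumption: the real page wraps
# these spans in much more HTML, which does not affect the XPath below).
html = '''
<div class="pp-meta-review">
  <span class="zzNumUsersFoundThisHelpfulActive">
    <span><span class="zzAggregateRatingStat">3</span></span>
    <span> out of </span>
    <span><span class="zzAggregateRatingStat">5</span></span>
  </span>
</div>
'''

root = lxml.html.fromstring(html)
# The trailing /text() step returns the matching text nodes as strings
# instead of the <span> elements themselves.
values = root.xpath('//span[@class="zzAggregateRatingStat"]/text()')
print(values)
```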

I prefer lxml's XPath support over CSSSelector, though either can do the job.
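
One difference worth knowing: @class="zzAggregateRatingStat" compares the whole attribute value, while the CSS selector .zzAggregateRatingStat matches the class as one token among possibly several. The sketch below uses hypothetical markup where the second span carries an extra class (highlight, not present in the question) and spells out the token-matching XPath, which is roughly what a CSS class selector is translated to.

```python
import lxml.html

# Hypothetical markup: the second span has an extra "highlight" class.
html = ('<div>'
        '<span class="zzAggregateRatingStat">3</span>'
        '<span class="zzAggregateRatingStat highlight">5</span>'
        '</div>')
root = lxml.html.fromstring(html)

# Exact attribute comparison: only matches class="zzAggregateRatingStat".
exact = root.xpath('//span[@class="zzAggregateRatingStat"]/text()')

# Token match on the space-separated class list, like the CSS selector.
token = root.xpath(
    '//span[contains(concat(" ", normalize-space(@class), " "), '
    '" zzAggregateRatingStat ")]/text()')

print(exact)  # only the first span matches
print(token)  # both spans match
```

For the question's HTML, where each span has exactly one class, both forms return the same result.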

ChrisP's example prints 3, but running it on your actual input raises an error:

$ python chrisp.py
Traceback (most recent call last):
  File "chrisp.py", line 6, in <module>
    doc = fromstring(text)
  File "lxml.etree.pyx", line 2532, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48270)
  File "parser.pxi", line 1545, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:71812)
  File "parser.pxi", line 1424, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:70673)
  File "parser.pxi", line 938, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:67442)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63824)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64745)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64088)
lxml.etree.XMLSyntaxError: EntityRef: expecting ';', line 3, column 210

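The error most likely comes from the unescaped & characters in the g:entity.annotation.id URL (for example &gl=US), which are not allowed in XML. ChrisP's code can be changed to use lxml.html.fromstring, a more lenient HTML parser, instead of lxml.etree.fromstring.

With that change, it prints 3.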
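
The difference between the two parsers can be reproduced on a small hypothetical fragment that contains a bare & inside an attribute, similar to the &gl=US in the question's URL. This is a minimal sketch: the strict XML parser raises XMLSyntaxError, while the HTML parser keeps going and still finds the rating.

```python
from lxml import etree
import lxml.html

# Hypothetical fragment with an unescaped "&" in an attribute value.
text = ('<span title="?q=Irving&gl=US">'
        '<span class="zzAggregateRatingStat">3</span></span>')

# The strict XML parser rejects the bare "&".
try:
    etree.fromstring(text)
    strict_ok = True
except etree.XMLSyntaxError as exc:
    strict_ok = False
    print("lxml.etree: %s" % exc)

# The HTML parser tolerates it and still builds a usable tree.
doc = lxml.html.fromstring(text)
print(doc.xpath('//span[@class="zzAggregateRatingStat"]')[0].text)
```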


2 Comments

Hey, thanks for the reply. I am not able to get your code working for the website maps.google.com/maps/…. It keeps giving different errors.
Changing lxml.etree.fromstring to lxml.html.fromstring works, thanks! The only problem is that there is no pretty_print option in lxml.html. :(
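
For what it's worth, lxml.html.tostring() does accept a pretty_print argument (it passes it on to lxml.etree.tostring()). A minimal sketch on a small fragment; how much whitespace is actually added depends on the elements involved.

```python
import lxml.html

text = '<div><span class="zzAggregateRatingStat">3</span></div>'
doc = lxml.html.fromstring(text)

# pretty_print works here just as with lxml.etree.tostring().
# encoding="unicode" returns a text string instead of bytes.
pretty = lxml.html.tostring(doc, pretty_print=True, encoding="unicode")
print(pretty)
```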

This is covered in the lxml documentation:

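from lxml.etree import fromstring
# Newer lxml releases need the separate cssselect package for this import.
from lxml.cssselect import CSSSelector

# Compile the CSS class selector once; it can be applied to many trees.
sel = CSSSelector('.zzAggregateRatingStat')
text = '<span><span class="zzAggregateRatingStat">3</span></span>'
# Strict XML parsing: fine for this well-formed snippet, not for real pages.
doc = fromstring(text)
# sel(doc) returns a list of matches; [0] takes only the first one.
el = sel(doc)[0]
print el.text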
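
The snippet above reads only the first match through [0]. To collect both 3 and 5, as the question asks, without the extra cssselect dependency, this sketch swaps in a different technique: lxml.html elements have a find_class() method that matches a class token much like the .zzAggregateRatingStat selector does. The second rating span is added to the toy markup as an assumption so that more than one element matches.

```python
import lxml.html

# Toy markup from the answer above, plus a second rating span (assumption).
text = ('<span><span class="zzAggregateRatingStat">3</span>'
        '<span class="zzAggregateRatingStat">5</span></span>')
doc = lxml.html.fromstring(text)

# find_class() searches doc and its descendants for elements whose class
# attribute contains the given token, returning all of them in order.
values = [el.text for el in doc.find_class('zzAggregateRatingStat')]
print(values)
```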

2 Comments

Thank you for the answer. I've been trying this code on the website maps.google.com/maps/…, but to no avail. Can you please look into it?
@Zulaikha, if you are trying to get ratings for businesses, you may want to look into the APIs available from Google and Yelp rather than scraping pages.
