
I am running the code below, but I am getting an empty list. Can you please help me find the issue?

Run with: xvfb-run python dynamic_scrapy.py

import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage
import bs4 as bs


class Client(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
    def on_page_load(self):
        self.app.quit()

url = "https://pythonprogramming.net/parsememcparseface/"
client_response = Client(url)
source = client_response.mainFrame().toHtml()
soup = bs.BeautifulSoup(source, 'lxml')
print(soup)
js_test = soup.find_all('p', class_='jstest')
print(js_test)
  • Have you tried printing source? Are you getting any data? Commented Apr 3, 2017 at 10:25
  • As far as I can see, this is not dynamic content, since the class jstest already exists in the page source. Commented Apr 3, 2017 at 10:26
  • Please look at this line: "Look at you shinin!" Commented Apr 3, 2017 at 10:27
  • If you view the page source, you will find this code: <p class='jstest' id='yesnojs'>y u bad tho?</p> <script> document.getElementById('yesnojs').innerHTML = 'Look at you shinin!'; </script> Commented Apr 3, 2017 at 10:29
  • If I do print(source), I get an error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u1d90' in position 6781: ordinal not in range(128) Commented Apr 3, 2017 at 10:31

1 Answer

You need to convert the QString returned by toHtml() into a Python unicode string before passing it to BeautifulSoup. You can do something like this:

import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage
import bs4 as bs

class Client(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
    def on_page_load(self):
        self.app.quit()

url = "https://pythonprogramming.net/parsememcparseface/"
client_response = Client(url)

source = client_response.mainFrame().toHtml()
source_utf = unicode(source.toUtf8(), encoding="UTF-8") # Added
soup = bs.BeautifulSoup(source_utf, 'lxml')
js_test = soup.find_all('p', class_='jstest')
print(js_test)

This prints:

[<p class="jstest" id="yesnojs">Look at you shinin!</p>]
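
The unicode() builtin used above exists only in Python 2, so that line raises a NameError under Python 3. As a hedged sketch (the helper name to_text is hypothetical, not part of PyQt4 or BeautifulSoup), the conversion can be written so that it works in either case. It assumes that toHtml() returns either a QString-like object with a toUtf8() method, or an ordinary text string, which is my understanding of PyQt4's default behaviour under Python 3:

```python
def to_text(source):
    """Return `source` as a native text string.

    Hypothetical helper: handles a QString-like object (anything that
    exposes toUtf8()) as well as a value that is already plain text.
    """
    if hasattr(source, "toUtf8"):
        # QString case: take the UTF-8 bytes and decode them explicitly,
        # instead of relying on the Python 2-only unicode() builtin.
        return bytes(source.toUtf8()).decode("utf-8")
    # Assumed Python 3 case: the value is already a text string.
    return source
```

With this in place, the "# Added" line would become source_utf = to_text(source), and the rest of the script stays the same.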

10 Comments

I am getting an error: NameError: name 'unicode' is not defined
Are you using Python 3?
Can you please help me?
Can you please help me with this? I have been struggling for the last five days.