
So, what I want to do is start the browser, get the page content (with the JavaScript rendered) and find the element I want using BeautifulSoup. Here's my code:

from selenium import webdriver
from bs4 import BeautifulSoup as bs4
from selenium.webdriver.support.ui import WebDriverWait

browser = webdriver.Edge()
browser.get('https://www.premierleague.com/match/22721')
element = WebDriverWait(browser, 10)
html=bs4(browser.page_source,'html.parser')
print(html.body.main.find('div',attrs={'class':'mcTabs'}))

browser.quit()

I get None from the print statement.

  • What is the exact problem that you are experiencing? Are you getting an exception, or unexpected output? Try to be more specific about what you need help with. Commented Jul 27, 2018 at 21:47
  • I simply don't get the element that I want from the print statement; it looks like the JavaScript is not being executed. Commented Jul 27, 2018 at 21:49
  • You can directly use print(driver.page_source). Commented Jul 28, 2018 at 7:34

1 Answer


First of all, you have a typo in your code:

print(html.body.main.find('div',attrs='class':'mcTabs'}))

should be replaced with:

print(html.body.main.find('div',attrs={'class':'mcTabs'})) # the opening { was missing

The second thing:

element = WebDriverWait(browser, 10)

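is redundant: it only creates a wait object, and nothing is actually waited for until you call its until() method. You are also not using element anywhere.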
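Some background on why that line waits for nothing: `WebDriverWait` only waits when you call `.until()` (or `.until_not()`) on it. `until` calls the condition you pass, every `poll_frequency` seconds (0.5 by default), until it returns a truthy value, and raises `TimeoutException` when the timeout runs out. Below is a simplified, dependency-free stand-in for that behaviour. `PollingWait` is a hypothetical class, not Selenium's code: it takes a zero-argument condition (Selenium passes the driver to the condition) and raises the built-in `TimeoutError` instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PollingWait:
    """Hypothetical, simplified stand-in for selenium's WebDriverWait."""

    def __init__(self, timeout: float, poll_frequency: float = 0.5) -> None:
        # Constructing the object only stores settings; nothing is waited for yet.
        self.timeout = timeout
        self.poll_frequency = poll_frequency

    def until(self, condition: Callable[[], T]) -> T:
        # Call the condition repeatedly until it returns something truthy,
        # or give up once the deadline has passed.
        deadline = time.monotonic() + self.timeout
        while True:
            value = condition()
            if value:
                return value
            if time.monotonic() >= deadline:
                raise TimeoutError(f"condition not met within {self.timeout} s")
            time.sleep(self.poll_frequency)


# Usage: simulate an element that "appears" on the third poll.
calls = {"n": 0}


def element_present():
    calls["n"] += 1
    return "<div class='mcTabs'>" if calls["n"] >= 3 else None


waiter = PollingWait(timeout=1.0, poll_frequency=0.01)  # no waiting happens here
print(waiter.until(element_present))  # polls three times, then prints the div
```

Constructing the waiter costs nothing; the polling only happens inside `until`, which is why `WebDriverWait(browser, 10)` on a line of its own has no effect.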

And now to the question itself. I'm not very familiar with BeautifulSoup, but what I have found is this:

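from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs4

browser.get('https://www.premierleague.com/match/22721')
# wait up to 10 s for the JavaScript-rendered div.mcTabs to be present in the DOM
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.mcTabs")))
# get the page source only once the element is present (browser, not driver)
html = bs4(browser.page_source, 'html.parser')
print(html.body.main.find('div', attrs={'class': 'mcTabs'}).prettify())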

Explanation: you are reading page_source before the document is fully rendered. That's why you have to wait until div.mcTabs is present in the DOM and only then read page_source.

Output:

<div class="mcTabs">
 <section class="mcLatestContainer mcMainTab active" data-ui-args='{"type": "latest"}' data-ui-tab="Latest">
  <nav class="tabs" data-built-class="matchLatestContainer" data-script="pl_tabbed" data-tab-class="mcLatestTab" data-tab-wrap=".tabs" data-widget="tabbed-content">
  </nav>
  <div class="matchLatestContainer">
   <nav class="tabs">
    <ul class="tablist" role="tablist">
     <li class="active" data-tab-index="0" role="tab" tabindex="0">
      Latest
     </li>
     <li data-tab-index="1" role="tab" tabindex="0">
      Photos
     </li>
    </ul>
   </nav>
   <div class="blogStreamMatchContainer mcLatestTab active" data-tab-aware-default="true" data-ui-tab="Latest">
    <div class="preMatchContainer" style="display: none;">
     <div class="matchPreviewStreamContainer">
     </div>
     <p class="noContentAvailableContainer" style="display: none;">
      No Content Available
     </p>
    </div>
    <div class="liveMatchContainer" style="">
     <section class="matchBlog">
      <div class="wrapper">
       <div class="mcBlogStream">
        <div class="matchReportStreamContainer" data-report-rendered="true">
         <header>
          <h3 class="subHeader">
           Match summary
          </h3>
         </header>
         <div class="wrapper col-12">
          <div class="standardArticle">
           <p>
            Manuel Lanzini scored twice as West Ham United finished the season with a 3-1 win over Everton.
           </p>
           <p>
            The midfielder opened the scoring from the edge of the area on 39 minutes after latching on to Marko Arnautovic's flick of a Cheikhou Kouyate pass.
           </p>
           <p>
            Arnautovic doubled the lead in the 63rd minute with a fierce shot for his 11th goal of the season.
           </p>
...
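To see why `find` returned `None` in the question: BeautifulSoup only searches the HTML string it is given, so if `page_source` is read before the page's JavaScript has inserted `div.mcTabs`, there is nothing to find. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup to stay dependency-free, and the two HTML snapshots are made-up stand-ins, not the real premierleague.com markup.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether a <div> carrying a given CSS class appears in the markup."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; "class" may hold several names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and self.css_class in classes:
            self.found = True


def has_div_with_class(html: str, css_class: str) -> bool:
    finder = ClassFinder(css_class)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical snapshots: the server sends an empty mount point and the
# page's JavaScript inserts div.mcTabs into it later.
before_js = '<html><body><main><div id="app"></div></main></body></html>'
after_js = ('<html><body><main><div id="app">'
            '<div class="mcTabs">...</div></div></main></body></html>')

print(has_div_with_class(before_js, "mcTabs"))  # False: BeautifulSoup's find() would give None
print(has_div_with_class(after_js, "mcTabs"))   # True
```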

4 Comments

Thank you very much, I'm new to this whole asking questions stuff. Anyway, your code helped me a lot. Have a good one!
Is this specific to this particular page, or always the case? For example, even when the page I'm looking at is fully loaded, I see elements on screen that are not found via selenium. I see them via an inspector, but not in view page source. You seem to be solving a problem of waiting to render, but I thought the question was about html hidden behind a javascript generator script and thus they really aren't in the source. Does that make sense, or do I misunderstand?
@Hendy if you can see an element in the inspector, you can interact with it via Selenium. I guess your element is inside a frame or iframe element, or even under a shadow root. If that is the case, just google it and you will find the answer (also on SO).
Interesting. I was trying to download a bunch of files and could see the ID of all the buttons, but browser.find_element_by_id('single-download-btn') found nothing. I ended up copying the inner html from the inspector and working with that directly. I'll refer back to this and try again if I run into the issue.
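Regarding the iframe case: Selenium only searches the document it is currently switched to, so an element inside an `<iframe>` is not found until you switch into that frame with `driver.switch_to.frame(...)`; `driver.switch_to.default_content()` switches back to the top-level page. A minimal sketch, assuming a hypothetical helper `find_in_iframe` and placeholder CSS selectors:

```python
# "css selector" is the value of selenium.webdriver.common.by.By.CSS_SELECTOR;
# the string is used directly so this sketch needs no imports.
CSS_SELECTOR = "css selector"


def find_in_iframe(driver, iframe_css: str, target_css: str) -> str:
    """Return the outerHTML of an element that lives inside an <iframe>.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver instance.
    """
    iframe = driver.find_element(CSS_SELECTOR, iframe_css)
    driver.switch_to.frame(iframe)  # later searches run inside the iframe's document
    try:
        target = driver.find_element(CSS_SELECTOR, target_css)
        return target.get_attribute("outerHTML")
    finally:
        driver.switch_to.default_content()  # always return to the top-level page
```

Usage would look like `find_in_iframe(browser, 'iframe#downloads', '#single-download-btn')`, where both selectors are hypothetical. Also note that the `find_element_by_*` helpers used in the comment above were removed in Selenium 4.3; the current form is `browser.find_element(By.ID, 'single-download-btn')`.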
