Getting inner HTML with python's selenium not working

Question

I am trying to scrape a JavaScript-driven web page, but I cannot access the HTML that I see in my web browser. Logging in and navigating to the relevant URL works. However, getting the inner HTML does not.

from selenium import webdriver

browser = webdriver.Chrome("path-to-webdriver")

page = browser.get(url)
inner_html = browser.execute_script("return document.body.innerHTML")
print(inner_html)

Below is the part of the HTML code that I want to become accessible; it is inside the first <div></div> tags. The JS script generating the content is found below. The output of my python script contains no extra information compared to the HTML code presented below.

So, how can I get the inner HTML code of this page?

<div class="divmyTrReport" id="divmyTrReport">
    </div>
    <script>
     function loadForm()
    {
                $('#divmyTrReport').html('<img src="/jottonia/gfx/ajaxbar.gif">' );
                $.get( "/jottonia/news/jottoniantimes/frontpageo.jsp", function( data ) {
                    $('#divmyTrReport').html(data );
                });       
    }
            $(document).ready(function(){
                loadForm('');
            });
    </script>

Edit:

The part of the HTML I want is printed below, particularly the "Last update:" part.

<html><head>
 <div id="divContent1" class="clearfix">
<div id="divmyTrReport" class="divmyTrReport">
<title>Jottonian Times</title>
<p>
</p>
<p>&nbsp;</p>
<table width="610" border="0" cellspacing="0" cellpadding="0">
  <tbody><tr>
    <td colspan="2"><img src="img/logo.jpg" alt="The Jottonian Times"></td>
  </tr>
  <tr>
    <td colspan="2"><img src="img/invisible.gif" width="10" height="5"></td>
  </tr>
  <tr>
    <td colspan="2"><table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#9A9A9A">
        <tbody><tr>
          <td><table width="100%" border="0" cellspacing="1" cellpadding="0">
              <tbody><tr>
                <td bgcolor="#EBEBEB"> <div align="center">
                    <table width="600" border="0" cellpadding="0" cellspacing="0">
                      <tbody><tr>
                        <td><font size="-2" face="Verdana, Arial, Helvetica, sans-serif">
                          &nbsp;Jottonian time: 2018-02-26 09:24 </font></td>
                        <td> <div align="center"><font size="-2" face="Verdana, Arial, Helvetica, sans-serif">

                            Last update:
                             166:24 hours ago</font></div></td>
                        <td> <div align="right"><font size="-2" face="Verdana, Arial, Helvetica, sans-serif">Issues: Quite some&nbsp; </font></div></td>
                      </tr>
</body></html>

Running this

news_page = browser.get(news_url)
inner_html = wait(browser, 20).until(lambda browser: browser.find_element_by_id("divContent1").get_attribute("innerHTML").strip())
print(inner_html)

results in

<div id="divmyTrReport" class="divmyTrReport"><img src="/jottonia/gfx/ajaxbar.gif"></div>

 <script>
    function loadForm()
    {
                $('#divmyTrReport').html('<img src="/jottonia/gfx/ajaxbar.gif">' );
                $.get( "/jottonia/news/jottoniantimes/frontpageo.jsp", function( data ) {
                    $('#divmyTrReport').html(data );
                });       
    }
            $(document).ready(function(){
                loadForm('');
            });
      </script>





 <script type="text/javascript">
<!--
    $( document ).ready(function() {
        newMail(1);
    });
//-->
</script>

Andersson · Accepted Answer · 2018-02-26 09:15:16Z

1

If you want to get innerHTML that is generated dynamically, you can try the code below:

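from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait

browser = webdriver.Chrome("path-to-webdriver")

# get() returns None, so there is nothing to assign
browser.get(url)
# 1) wait for non-empty content (the loading image) and remember it
#    (find_element(By.ID, ...) works in Selenium 3 and 4; find_element_by_id is gone from recent releases)
inner_html = wait(browser, 10).until(lambda browser: browser.find_element(By.ID, "divmyTrReport").get_attribute("innerHTML").strip())
# 2) wait until the content differs from the remembered value
wait(browser, 10).until(lambda browser: browser.find_element(By.ID, "divmyTrReport").get_attribute("innerHTML").strip() != inner_html)
# 3) read the updated content
inner_html = browser.find_element(By.ID, "divmyTrReport").get_attribute("innerHTML")
print(inner_html)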

The first wait blocks for up to 10 seconds (increase the timeout if needed) until the innerHTML of the target div is non-empty. The second wait then blocks until that content changes, and the last line reads the new value.
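One caveat with waiting for a change: if the table arrives before the first check runs, the first wait already captures the table and the change never comes, so the second wait times out. A minimal sketch of an alternative, assuming the finished table never references the `ajaxbar.gif` placeholder, is a single condition that accepts only non-empty content without the placeholder. The helper name `loaded_html` is hypothetical and not part of Selenium.

```python
def loaded_html(inner_html, placeholder="ajaxbar.gif"):
    """Return the stripped HTML once it looks final, otherwise False.

    "Final" here means non-empty and no longer containing the loading
    image. That is an assumption about this particular page.
    """
    html = (inner_html or "").strip()
    if not html or placeholder in html:
        return False  # still empty, or still showing the loading image
    return html


# Usage with Selenium (not executed here; it needs a running browser):
#
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.ui import WebDriverWait
#
#   inner_html = WebDriverWait(browser, 20).until(
#       lambda b: loaded_html(
#           b.find_element(By.ID, "divmyTrReport").get_attribute("innerHTML")
#       )
#   )
```

Because `until` returns whatever truthy value the condition produces, the wait itself hands back the table HTML and no extra read is needed.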

edited Feb 26, 2018 at 9:15

answered Feb 25, 2018 at 16:44


8 Comments

Yoda Over a year ago

This results in something, but not the same code I can inspect in my browser. This gives "<img src="/jottonia/gfx/ajaxbar.gif">".

Andersson Over a year ago

And what is your desired output?

Andersson Over a year ago

So content of target div is changed two times: 1) Interim image, e.g. loading or something 2) Table with data... And you want to get table, right?

Yoda Over a year ago

Yes, the table.

Andersson Over a year ago

No. The first wait blocks until non-empty content (the image) appears and assigns that content to the inner_html variable. The second wait blocks until the div's content no longer equals inner_html (this wait returns a boolean, not the HTML). The third line assigns the new value to inner_html.
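The three steps in this comment can be traced without a browser. The sketch below is only an illustration: `poll_until` is a hypothetical plain-Python stand-in for `WebDriverWait(...).until(...)`, and `FakeDiv` simulates a div whose content goes from empty to the loading image to the table.

```python
import time


def poll_until(condition, timeout=10.0, interval=0.01):
    """Call condition() until it returns a truthy value, then return it.

    Hypothetical stand-in for WebDriverWait(...).until(...); raises
    TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(interval)


class FakeDiv:
    """Simulates a div whose innerHTML changes on successive reads."""

    def __init__(self, states):
        self._states = list(states)
        self._reads = 0

    def inner_html(self):
        # Advance one state per read and stay on the last state forever.
        index = min(self._reads, len(self._states) - 1)
        self._reads += 1
        return self._states[index]


def read_final_html(div, timeout=10.0):
    # 1) Wait for any non-empty content (the loading image) and keep it.
    first = poll_until(lambda: div.inner_html().strip(), timeout)
    # 2) Wait until the content differs from what step 1 captured.
    #    This wait only yields True; it does not return the new HTML.
    poll_until(lambda: div.inner_html().strip() != first, timeout)
    # 3) Read the element again to get the new content.
    return div.inner_html()


spinner = '<img src="/jottonia/gfx/ajaxbar.gif">'
table = "<table><tr><td>Last update:</td></tr></table>"
print(read_final_html(FakeDiv(["", spinner, spinner, table])))
```

If the content never changes after step 1, step 2 raises `TimeoutError`, which mirrors the `TimeoutException` that Selenium raises in the same situation.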


Yakir Tsuberi · Accepted Answer · 2018-02-26 10:58:17Z

0

If you want to work with innerHTML the way JavaScript does, you need to do it through JavaScript, for example:

browser.execute_script('''document.getElementById("divmyTrReport").innerHTML = '<img src="/jottonia/gfx/ajaxbar.gif">';''')

edited Feb 26, 2018 at 10:58

answered Feb 25, 2018 at 16:01


1 Comment

Yoda Over a year ago

You are missing a "(", I think. Not sure where to place it. If I place it such that I get innerHTML =( '<img src="/jottonia/gfx/ajaxbar.gif">'), then I get "None" returned.
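The `None` reported here is expected: `execute_script` passes back only what the script explicitly `return`s, and an assignment statement returns nothing. Assigning to `innerHTML` also overwrites the div instead of reading it. A minimal sketch of reading the value instead, where `read_inner_html` and `FakeBrowser` are hypothetical names used so the example runs without a real driver:

```python
def read_inner_html(browser, element_id):
    """Return the current innerHTML of the element with the given id.

    The leading `return` in the script is what makes execute_script
    hand a value back to Python.
    """
    script = "return document.getElementById(arguments[0]).innerHTML;"
    return browser.execute_script(script, element_id)


class FakeBrowser:
    """Stand-in for a WebDriver so the sketch runs without a browser."""

    def __init__(self, elements):
        self._elements = elements

    def execute_script(self, script, *args):
        # Rough imitation of the real driver: no `return`, no value.
        if not script.lstrip().startswith("return"):
            return None
        return self._elements.get(args[0])


browser = FakeBrowser({"divmyTrReport": '<img src="/jottonia/gfx/ajaxbar.gif">'})
print(read_inner_html(browser, "divmyTrReport"))
```

With a real driver this still reads whatever the div contains at that moment, so it needs to be combined with a wait if the AJAX content has not arrived yet.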
