I am trying to scrape a JavaScript-rendered web page, but I cannot access the HTML that I see in my web browser. Logging in and navigating to the relevant URL works. However, reading the inner HTML does not return the rendered content.
from selenium import webdriver
browser = webdriver.Chrome("path-to-webdriver")
browser.get(url)  # get() returns None, so there is nothing useful to assign
inner_html = browser.execute_script("return document.body.innerHTML")
print(inner_html)
Below is the part of the HTML that I want to access; the content should appear inside the first <div></div> element. The JS script that generates the content follows it. The output of my Python script contains nothing beyond the HTML shown below.
So how can I get the inner HTML of this page?
<div class="divmyTrReport" id="divmyTrReport">
</div>
<script>
function loadForm()
{
$('#divmyTrReport').html('<img src="/jottonia/gfx/ajaxbar.gif">' );
$.get( "/jottonia/news/jottoniantimes/frontpageo.jsp", function( data ) {
$('#divmyTrReport').html(data );
});
}
$(document).ready(function(){
loadForm('');
});
</script>
Edit:
The part of the HTML I want is printed below, particularly the "Last update:" part.
<html><head>
<div id="divContent1" class="clearfix">
<div id="divmyTrReport" class="divmyTrReport">
<title>Jottonian Times</title>
<p>
</p>
<p> </p>
<table width="610" border="0" cellspacing="0" cellpadding="0">
<tbody><tr>
<td colspan="2"><img src="img/logo.jpg" alt="The Jottonian Times"></td>
</tr>
<tr>
<td colspan="2"><img src="img/invisible.gif" width="10" height="5"></td>
</tr>
<tr>
<td colspan="2"><table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#9A9A9A">
<tbody><tr>
<td><table width="100%" border="0" cellspacing="1" cellpadding="0">
<tbody><tr>
<td bgcolor="#EBEBEB"> <div align="center">
<table width="600" border="0" cellpadding="0" cellspacing="0">
<tbody><tr>
<td><font size="-2" face="Verdana, Arial, Helvetica, sans-serif">
Jottonian time: 2018-02-26 09:24 </font></td>
<td> <div align="center"><font size="-2" face="Verdana, Arial, Helvetica, sans-serif">
Last update:
166:24 hours ago</font></div></td>
<td> <div align="right"><font size="-2" face="Verdana, Arial, Helvetica, sans-serif">Issues: Quite some </font></div></td>
</tr>
</body></html>
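Once the rendered HTML is available, pulling out the "Last update:" value could look roughly like the sketch below. It uses only the standard library; the function name `extract_last_update` and the pattern `N:N hours ago` are assumptions based on the single sample above, not something the site guarantees.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_last_update(html):
    """Return the text following 'Last update:' in html, or None if absent.

    Hypothetical helper: the 'N:N hours ago' pattern is assumed from the
    one sample shown in the question.
    """
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Collapse whitespace so line breaks inside the table cell do not matter.
    text = " ".join(" ".join(collector.chunks).split())
    match = re.search(r"Last update:\s*(\d+:\d+ hours ago)", text)
    return match.group(1) if match else None
```

The helper takes a plain string, so it can be fed the innerHTML from Selenium or any other source of the same markup.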
Running this
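from selenium.webdriver.support.ui import WebDriverWait as wait  # assumed import; not shown in my original snippet
browser.get(news_url)  # get() returns None, so the result is not stored
inner_html = wait(browser, 20).until(lambda browser: browser.find_element_by_id("divContent1").get_attribute("innerHTML").strip())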
print(inner_html)
results in
<div id="divmyTrReport" class="divmyTrReport"><img src="/jottonia/gfx/ajaxbar.gif"></div>
<script>
function loadForm()
{
$('#divmyTrReport').html('<img src="/jottonia/gfx/ajaxbar.gif">' );
$.get( "/jottonia/news/jottoniantimes/frontpageo.jsp", function( data ) {
$('#divmyTrReport').html(data );
});
}
$(document).ready(function(){
loadForm('');
});
</script>
<script type="text/javascript">
<!--
$( document ).ready(function() {
newMail(1);
});
//-->
</script>
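One possible reading of the output above: the wait condition is satisfied as soon as the innerHTML is non-empty, which is already the case while the ajaxbar.gif spinner is in place, so the wait returns before the $.get callback has filled the div. A minimal sketch of waiting until the placeholder is gone instead; the helper `wait_for_loaded_html` and its parameters are hypothetical, and it assumes the loaded content never references ajaxbar.gif itself.

```python
import time


def wait_for_loaded_html(get_html, placeholder="ajaxbar.gif", timeout=20.0, poll=0.5):
    """Poll get_html() until it returns non-empty HTML without the placeholder.

    get_html is any zero-argument callable returning the current innerHTML.
    Raises TimeoutError if the content has not loaded within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = (get_html() or "").strip()
        # Loaded means: something is there, and it is no longer the spinner.
        if html and placeholder not in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("content still not loaded after %.1f s" % timeout)
        time.sleep(poll)
```

With the browser object from the snippets above, get_html could be something like lambda: browser.find_element_by_id("divmyTrReport").get_attribute("innerHTML"). Keeping the polling logic independent of Selenium makes it easy to try out with a fake callable first.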