I can't get my scraper to return the specific content I'm looking for. If I return $output, I see Digg as though it were hosted on my server, so I know I'm fetching the site properly; I just can't access elements from the resulting DOM. What am I doing wrong?
<?php
include('simple_html_dom.php');
function curl_download($url) {
    $ch = curl_init(); // create a new cURL resource handle
    curl_setopt($ch, CURLOPT_URL, $url); // set the URL to download
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the transfer as a string from curl_exec() instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0"); // set the User-Agent header
    curl_setopt($ch, CURLOPT_HEADER, 0); // include response headers in the output? (0 = no, 1 = yes)
    curl_setopt($ch, CURLOPT_TIMEOUT, 10); // timeout in seconds
    $output = curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);
}
$html = new simple_html_dom();
$html->load($output, true, false);
foreach ($html->find('div.digg-story__kicker') as $article) {
    $article_title = $article->find('.digg-story__kicker')->innertext;
    return $article_title;
}
echo $article_title;
?>
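For comparison, here is a minimal sketch of the fetch helper with the two structural problems removed: it uses its $url argument and it returns the body, so the caller has something to hand to a parser. The options mirror the ones above; throwing a RuntimeException on failure is an assumption about how errors should be handled, not something the question specifies.

```php
<?php
// Sketch: fetch a URL and RETURN the body instead of discarding it.
function curl_download(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);                   // use the argument, not a hard-coded URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);        // curl_exec() returns the body as a string
    curl_setopt($ch, CURLOPT_USERAGENT, 'MozillaXYZ/1.0'); // sets the User-Agent header
    curl_setopt($ch, CURLOPT_HEADER, false);               // keep response headers out of the body
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);                 // give up after 10 seconds

    $output = curl_exec($ch);
    if ($output === false) {
        // curl_exec() returns false on failure; don't pass that on to a parser.
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    // The handle is released automatically when $ch goes out of scope.
    return $output;
}

// Usage: capture the return value before building the DOM from it, e.g.
// $output = curl_download('http://digg.com');
```

The key change is the return statement: $output inside the function is a local variable, so without returning it, the $output used further down the script is simply undefined.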
Edit: Okay, dumb mistake. I'm calling the function now:
$html = curl_download('http://digg.com');
If I echo $html, I see the "mirrored" site, but when I use str_get_html($html), which simple_html_dom.php describes as "get html dom from string", I keep getting this error message:
Fatal error: Call to a member function str_get_html() on null in /home/andrew73124/public_html/scraper/scraper.php on line 31
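The message says that something->str_get_html() was called with method syntax on a variable that holds null. In simple_html_dom.php, str_get_html() is a plain global function rather than a method, so it is called without an object in front of it. Line 31 isn't shown in the question, so the snippet below only reproduces the error in isolation; $parser is a hypothetical name standing in for whatever variable sits to the left of -> on that line.

```php
<?php
// $parser is a hypothetical variable that was never assigned an object.
$parser = null;

try {
    // Method-call syntax on null throws an Error in PHP 7 and later.
    $parser->str_get_html('<div class="digg-story__kicker">Tech</div>');
} catch (Error $e) {
    // Prints: Call to a member function str_get_html() on null
    echo $e->getMessage(), PHP_EOL;
}
```

With simple_html_dom.php included, the function-style call would be $dom = str_get_html($html); after which $dom->find('div.digg-story__kicker') works on the parsed document.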
There is a function curl_download, but it never gets called, and it doesn't return any value either, so it is unclear where the $output variable comes from.
Put $output = curl_download('http://digg.com'); before $html = new simple_html_dom(); $html->load($output, true, false);
Try this:
<?php foreach (@DOMDocument::loadHTML(file_get_contents('http://digg.com/'))->getElementsByTagName("div") as $div) { if ($div->getAttribute("class") !== 'digg-story__kicker') { continue; } var_dump($div->textContent); }
Literally just that: no cURL, no simple_html_dom.php, nothing else.
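The one-liner above calls DOMDocument::loadHTML() statically, which PHP 8 no longer allows because loadHTML() is an instance method, and it compares the entire class attribute, so a div carrying several classes would be skipped. Below is a minimal sketch of the same built-in DOM approach that avoids both issues. It assumes the kicker text sits in div elements whose class list includes digg-story__kicker (taken from the question, not checked against the live site); the function name extract_texts_by_class and the sample markup are hypothetical.

```php
<?php
// Collect the trimmed text of every <div> whose class list contains $class.
function extract_texts_by_class(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token, so class="x digg-story__kicker" also counts.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

// Hypothetical sample markup standing in for the downloaded page.
$sample = '<div class="digg-story__kicker">Tech</div>'
        . '<div class="other">skip</div>'
        . '<div class="x digg-story__kicker">Science</div>';
print_r(extract_texts_by_class($sample, 'digg-story__kicker'));
```

In the scraper, the string returned by curl_download() (or file_get_contents()) would be passed in place of $sample. The concat/normalize-space pattern is the usual XPath 1.0 way to test for a single class token, since XPath 1.0 has no built-in class selector.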