Basically, a page generates some dynamic content, and I want to get that dynamic content, not just the static HTML. I haven't been able to do this with cURL. Help please.
- That's impossible using just curl... – dandavis, Jun 12, 2013 at 22:30
- You need to find out where to properly get it from. Likely, the JS is either making an ajax call, which you can curl to scrape the data, or the data is hardcoded in a JS/HTML file that's loaded with the normal page load. – Matt Berkowitz, Jun 12, 2013 at 22:58
2 Answers
You can't with just cURL.
cURL will grab the specific raw (static) files from the site, but to get JavaScript-generated content, you would have to load that content into a browser-like environment that supports JavaScript and all the other host objects the script uses, so that the script can run.
Then once the script runs, you would have to access the DOM to grab whatever content you wanted from it.
This is why most search engines don't index JavaScript-generated content. It's not easy.
If this is one specific site that you're trying to gather info on, you may want to look into exactly how the site gets the data itself and see whether you can get the data directly from that source. For example, is the data embedded in JS in the page (in which case you can just parse it out of that JS), is the data fetched by an ajax call (in which case you can maybe just make that ajax call directly), or does it arrive by some other method?
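To make those two cases a bit more concrete, here is a rough Python sketch. Everything specific in it is an assumption for illustration: the variable name `pageData`, the sample HTML, and the request headers are made up, and the embedded literal is assumed to be strict JSON (double-quoted keys, no trailing commas). A real site will differ, so check the page source and the browser's network tab first.

```python
import json
import re
import urllib.request


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to `var <var_name> = ...` in the page, or None.

    Assumes the assigned literal is strict JSON; a looser JavaScript object
    literal would make json raise JSONDecodeError.
    """
    pattern = r"\b(?:var|let|const)\s+" + re.escape(var_name) + r"\s*=\s*"
    match = re.search(pattern, html)
    if match is None:
        return None
    # raw_decode parses a single JSON value starting at the given index
    # and ignores whatever text follows it (the trailing `;`, more script).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


def fetch_json(url):
    """Call a JSON endpoint directly, the way the page's own ajax call would.

    The headers are a guess at what such an endpoint might expect.
    """
    request = urllib.request.Request(
        url,
        headers={"Accept": "application/json", "X-Requested-With": "XMLHttpRequest"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical page with its data embedded in an inline script.
SAMPLE_HTML = """<html><body><script>
var pageData = {"items": [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]};
render(pageData);
</script></body></html>"""

data = extract_embedded_json(SAMPLE_HTML, "pageData")
print([item["name"] for item in data["items"]])
```

In the embedded case you would pass the HTML that cURL already gives you to `extract_embedded_json`. In the ajax case you would pass the endpoint URL you found in the network tab to `fetch_json`, or simply point cURL at that URL instead of the page.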