I need to crawl several URLs and store their contents in a DB.
The crawled data must contain both the HTML and the external CSS and JS files.
I used Nokogiri to grab the CSS with no problem, but I am unable to get the JavaScript as easily.
Here is my relevant code:
...
arrJS = []
page = Nokogiri::HTML(open(url))
page.css('script').map {|link| arrJS << link['src'].to_s}
...
When I use this on a site like yahoo.com, I get a weird arrJS array that has no relevance to the JavaScript files in the HTML.
Any thoughts?
The page probably contains inline scripts (<script> tags without a src attribute): not all script tags have a src attribute. Also, you should use each instead of map here.