I am trying to scrape a website for product names.

My controller does the following:

require 'open-uri'

page = Nokogiri::HTML(URI.open(PAGE_URL))
@items_array = page.css("li.item h3")

The view then displays it as:

<% @items_array.each do |item| %>
<%= item.text %><br /><br />
<% end %>

The problem is that the HTML only contains the first 10 items. The rest are generated by JavaScript, and I can't figure out exactly how.

Any ideas on how to scrape the rest of the content are much appreciated!

2 Answers

It won't work. Nokogiri cannot scrape anything that is not in the page's HTML, and from what I can see (using "view source" in my browser), a good part of the list is not in the HTML. How it is loaded (probably by JavaScript) is irrelevant in this case.

Best option would be to ask them if they expose an API you could use (that would make your work much easier).

Scraping is very fragile, as it depends on the exact layout of the page.

1 Comment

You should definitely ask if you can use their information, even if they don't have an API. The likelihood is that they will welcome it as long as you link back to their page.

You need a web driver that controls a headless browser, such as watir-webdriver: https://github.com/watir/watir-webdriver

http://watirwebdriver.com/headless/
