0

I am trying to scrape this page for rental listings using a Ruby script. I have tried Nokogiri and Mechanize without success: only 14 listings come back in the fetched HTML, and the rest are loaded through what I presume is embedded JavaScript. I have also looked briefly at rkelly, but reading through the available classes got me nowhere.

Here is what I have so far:

## First solution: only returned 14 results
require 'mechanize'
require 'nokogiri'
require 'open-uri'
require 'json' # needed for to_json below

url = "http://streeteasy.com/for-rent/soho/"

# URI.open is required on Ruby 3.0+, where open-uri no longer patches Kernel#open
listings = Nokogiri::HTML(URI.open(url))

# agent = Mechanize.new
# agent.get(url)
# pp signin_page = agent.page.link_with(:text => 'Sign In').click
# # pp signin_page.forms

listing_sorted = listings.css('.item_inner')

# Build one hash per listing; the price is stripped down to digits and dots
objects = listing_sorted.map do |listing|
  {
    address: listing.css("div.details_title a").first.inner_html,
    price: listing.css("span.price").inner_html.gsub(/[^0-9.]/, '')
  }
end

# Sort ascending by price and keep the 20 most expensive listings
sorted_object = objects.sort_by { |listing| listing[:price].to_i }.last(20)

puts @json_object = sorted_object.to_json
puts "There are #{sorted_object.length} listings"

There is also an XLS file that you can export the listings to; however, you need to be logged in, and the sign-in is a JavaScript modal, so I'm really reaching a sticking point here. What would be the best way to approach this problem?

2
  • I'm looking at that page now, and I can see the data for page 2 listings right there in the response. Commented Nov 5, 2014 at 6:33
  • If you double-check the listings that come back, you only get a partial response from curl requests: about a third of the listings on each page are rendered using JavaScript. I was able to use Watir to open a browser and grab the rest of the listings. Commented Feb 10, 2015 at 17:18

2 Answers

1

What I managed to do was use Watir, a Ruby wrapper for Selenium, to open the page in a browser and then pass the loaded HTML to Nokogiri for parsing.

0

You can enumerate the pages at http://streeteasy.com/for-rent/soho?page=n, with n going from 1 to the maximum page number, and collect the listings from each page.
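
As a minimal sketch of that loop, assuming the `?page=n` query parameter behaves as described and that you already know the maximum page number (here just an argument you supply): the names `page_url` and `collect_listings` are hypothetical, and the `fetch:` parameter only exists so the download step can be swapped out.

```ruby
require 'open-uri'

BASE_URL = 'http://streeteasy.com/for-rent/soho'

# Build the URL for page n of the search results.
def page_url(n, base = BASE_URL)
  "#{base}?page=#{n}"
end

# Fetch pages 1..max_page, hand each page's HTML to the block, and
# flatten the arrays the block returns into one list of listings.
# max_page is an assumption: it has to be read off the site beforehand.
def collect_listings(max_page, fetch: ->(url) { URI.open(url).read })
  (1..max_page).flat_map do |n|
    html = fetch.call(page_url(n))
    yield html
  end
end

# Usage, with Nokogiri doing the parsing as in the question:
#   require 'nokogiri'
#   listings = collect_listings(5) do |html|
#     Nokogiri::HTML(html).css('.item_inner').map do |listing|
#       listing.css('span.price').text
#     end
#   end
```

Note that this only collects what is in each page's static HTML, so any listings rendered by JavaScript would still be missing.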
