
Is it possible to scrape an infinite-scrolling page using only GET requests? For example,

http://www.justdial.com/Ahmedabad/Bearing-Dealers/ct-302676

loads the following URLs, one per page, as you scroll down:

http://www.justdial.com/functions/ajxsearch.php?national_search=0&act=pagination&city=Ahmedabad&search=Bearing+Dealers&where=&catid=302676&psearch=&prid=&page=2&SID=&mntypgrp=0&toknbkt=&bookDate=&jdsrc=

http://www.justdial.com/functions/ajxsearch.php?national_search=0&act=pagination&city=Ahmedabad&search=Bearing+Dealers&where=&catid=302676&psearch=&prid=&page=3&SID=&mntypgrp=0&toknbkt=&bookDate=&jdsrc=
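The captured URLs differ only in the value of `page`, so they can be generated from one template instead of being concatenated by hand. A minimal sketch using only the standard library; the helper name `pagination_url` is hypothetical, and it assumes the other query parameters can be sent exactly as captured:

```python
from urllib.parse import urlencode

BASE = "http://www.justdial.com/functions/ajxsearch.php"


def pagination_url(city, search, catid, page):
    """Build the AJAX pagination URL seen in the browser (hypothetical helper)."""
    # Same parameters, in the same order, as the captured URLs
    params = {
        "national_search": 0,
        "act": "pagination",
        "city": city,
        "search": search,
        "where": "",
        "catid": catid,
        "psearch": "",
        "prid": "",
        "page": page,
        "SID": "",
        "mntypgrp": 0,
        "toknbkt": "",
        "bookDate": "",
        "jdsrc": "",
    }
    # urlencode escapes spaces as '+', matching "Bearing+Dealers" above
    return BASE + "?" + urlencode(params)


print(pagination_url("Ahmedabad", "Bearing Dealers", 302676, 2))
# prints the page=2 URL shown above
```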

So far, my code looks like this:

import requests

def readJustDial(c):
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}    
    for i in range(1,10):
        url = 'http://www.justdial.com/functions/ajxsearch.php?national_search=0&act=pagination&city='+str(c)+'&search=Bearing+Dealers&where=&catid=302676&psearch=&prid=&page='+str(i)+'&SID=&mntypgrp=0&toknbkt=&bookDate=&jdsrc='
        # pass headers by keyword: the second positional argument of get() is params
        page = requests.get(url, headers=hdr)

def main(): #this is main function of this program
    allCities=["Ahmedabad","Hyderabad","Bangalore","Kolkata","Chennai","Mumbai","Delhi-NCR","Pune"]
    for city in allCities:
        readJustDial(city)
        #print(city)

if __name__ == "__main__":
    main()    

Also, please suggest any changes I could make to my existing code. I am just learning Python, so any suggestions are welcome.
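Some suggestions on the code above, shown as a hedged sketch rather than a tested scraper. The main bug is that `requests.get(url, hdr)` passes `hdr` as the second positional argument, which is `params`, so the headers end up in the query string instead of being sent as HTTP headers. The sketch also lets `params=` build and escape the query string, reuses one `requests.Session`, follows PEP 8 naming, and stops at the first empty page instead of a fixed `range(1, 10)`. The empty-page stop condition and the `max_pages` cap are assumptions, not documented behaviour of the endpoint:

```python
AJAX_URL = "http://www.justdial.com/functions/ajxsearch.php"

# PEP 8 style: constants in UPPER_CASE, functions in snake_case
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 "
                   "(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"),
    "Accept-Language": "en-US,en;q=0.8",
}


def read_just_dial(city, session, search="Bearing Dealers", catid=302676,
                   max_pages=50):
    """Yield the body of each result page for one city.

    `session` is expected to be a requests.Session (or anything with the same
    get() signature), so one connection pool and cookie jar are reused.
    """
    for page in range(1, max_pages + 1):
        # Let the library build and escape the query string
        params = dict(national_search=0, act="pagination", city=city,
                      search=search, where="", catid=catid, psearch="",
                      prid="", page=page, SID="", mntypgrp=0, toknbkt="",
                      bookDate="", jdsrc="")
        # headers must be passed by keyword; get()'s 2nd positional arg is params
        resp = session.get(AJAX_URL, params=params, headers=HEADERS, timeout=10)
        resp.raise_for_status()
        # Assumption: an empty (or whitespace-only) body means no more results
        if not resp.text.strip():
            break
        yield resp.text


def main():
    # Imported here so read_just_dial works with any session-like object
    import requests  # third-party package: pip install requests

    all_cities = ["Ahmedabad", "Hyderabad", "Bangalore", "Kolkata",
                  "Chennai", "Mumbai", "Delhi-NCR", "Pune"]
    with requests.Session() as session:
        for city in all_cities:
            for number, body in enumerate(read_just_dial(city, session), 1):
                print(city, number, len(body))
```

`main()` can be called from the same `if __name__ == "__main__":` guard as in the question. Making `read_just_dial` a generator keeps the fetching separate from whatever parsing you do with each page.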

  • Looks fine to me. How is your code not meeting your needs? Commented Nov 27, 2016 at 12:24
  • @Ouroborus Access denied. Also, if you try to open that page (with page 2 or 3) in your browser, you can't actually open it; it shows an empty list. Why? Commented Nov 27, 2016 at 12:26
  • Looks like referer and cookies are required. Commented Nov 27, 2016 at 12:34
  • @Ouroborus Sorry, could you explain a bit more? I have not worked much with referers or cookies before. I know it is possible using Selenium, but that looks like a clumsy solution to me. Commented Nov 27, 2016 at 12:37
  • The requests documentation has a section on cookies. Referer can be treated like any other header. Google can help you work out the details. Commented Nov 27, 2016 at 12:40
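As the comments say, a Referer is just another request header, and cookies only need a client that remembers them between requests (`requests.Session` does this for you). The same idea with only the standard library, as a sketch; `make_opener` is a hypothetical helper and the short User-Agent string is a placeholder:

```python
import http.cookiejar
import urllib.request

LISTING_URL = "http://www.justdial.com/Ahmedabad/Bearing-Dealers/ct-302676"


def make_opener(referer):
    """Return (opener, jar): the opener keeps cookies in jar and sends a Referer."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # addheaders replaces urllib's default User-Agent and is added to every
    # request made through this opener (unless the request sets its own)
    opener.addheaders = [
        ("User-Agent", "Mozilla/5.0"),
        ("Referer", referer),
        ("X-Requested-With", "XMLHttpRequest"),
    ]
    return opener, jar


opener, jar = make_opener(LISTING_URL)
# No network traffic yet: jar stays empty until opener.open(...) receives
# Set-Cookie headers; later opener.open(...) calls then send them back.
```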

1 Answer


Try to imitate the headers that a normal, working browser sends with its XHR request. You can view those headers in the browser's developer tools (I use Chrome's). When I look at the request, I see that it is sent with these headers:

Accept:application/json, text/javascript, */*; q=0.01
Accept-Encoding:gzip, deflate, sdch
Accept-Language:he-IL,he;q=0.8,en-US;q=0.6,en;q=0.4
Connection:keep-alive
Cookie:f5avrbbbbbbbbbbbbbbbb=BEKIPCFANCEHKADKNPJJJLHGCDKJOEEGKIIEPAAPHGEDJDNKFFBPCKEGMMIAECHOLECIMLJDAICKIFECEPMNKJNMKIDIMHPCOMHNNHMANENHHKEGMABPKFGKBAPGCHCJ; ppc=; PHPSESSID=bh34mlv2ba4gmgbntjtsjtt753; www=1712105664.20480.0000; _gat=1; scity=Ahmedabad; sarea=; dealBackCity=Ahmedabad; inweb_city=Ahmedabad; profbd=0; bdcheck=1; _ga=GA1.2.338746795.1480258713; tab=toprs; BDprofile=1; prevcatid=302676; view=lst_v; main_city=Ahmedabad
Host:www.justdial.com
Referer:http://www.justdial.com/Ahmedabad/Bearing-Dealers/ct-302676
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36
X-Requested-With:XMLHttpRequest

Try sending a request with these headers, except the cookies (cookies like these usually work only temporarily).
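Turning the header block copied from DevTools into a dict, minus the cookie, can be done with a small parser. A sketch; `headers_from_devtools` is a hypothetical helper and `RAW` is a shortened copy of the headers above:

```python
def headers_from_devtools(raw, drop=("Cookie",)):
    """Turn 'Name:value' lines copied from DevTools into a headers dict.

    Header names listed in `drop` are skipped; by default that is Cookie,
    since copied session cookies usually expire.
    """
    headers = {}
    for line in raw.strip().splitlines():
        # Split on the first colon only; values such as URLs contain colons
        name, sep, value = line.partition(":")
        if not sep or name.strip() in drop:
            continue
        headers[name.strip()] = value.strip()
    return headers


RAW = """\
Accept:application/json, text/javascript, */*; q=0.01
Referer:http://www.justdial.com/Ahmedabad/Bearing-Dealers/ct-302676
X-Requested-With:XMLHttpRequest
Cookie:PHPSESSID=bh34mlv2ba4gmgbntjtsjtt753; scity=Ahmedabad
"""

headers = headers_from_devtools(RAW)
```

The result can be passed as `requests.get(url, headers=headers)`. It may also be worth adding `"Accept-Encoding"` to `drop`, since the copied value advertises `sdch`, and letting the HTTP library set its own value.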

If that doesn't work either, you'll need the cookies. You can either use a browser (with Selenium, for example) or reverse-engineer the web page or its cookies and write a method that obtains working cookies.
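One simple thing to try before reaching for Selenium is to let the site issue fresh cookies itself: load the normal listing page first, then request the AJAX pages through the same session. This is a sketch under the unverified assumption that the listing page sets the cookies the endpoint checks; `fetch_with_warmup` is a hypothetical name, and `session` is expected to be a `requests.Session`:

```python
def fetch_with_warmup(session, listing_url, ajax_urls):
    """Open the normal listing page first, then the AJAX pages, in one session.

    Assumption (unverified): the cookies the AJAX endpoint checks are the ones
    the server sets when the listing page itself is loaded.
    """
    # Warm-up request: any Set-Cookie headers land in the session's cookie jar
    session.get(listing_url, timeout=10).raise_for_status()
    # Mimic the browser's XHR request; cookies are NOT copied by hand
    headers = {"Referer": listing_url, "X-Requested-With": "XMLHttpRequest"}
    bodies = []
    for url in ajax_urls:
        # The session sends the stored cookies back automatically
        resp = session.get(url, headers=headers, timeout=10)
        resp.raise_for_status()
        bodies.append(resp.text)
    return bodies
```

Usage would look like `with requests.Session() as s: pages = fetch_with_warmup(s, listing_url, ajax_urls)`, where `listing_url` is the Ahmedabad listing page and `ajax_urls` are the pagination URLs from the question.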
