0

I am trying to send JSON requests to scrape an infinite-scroll box like the one at this link. Its JSON URL is:

http://www.marketwatch.com/news/headline/getheadlines?ticker=XOM&countryCode=US&dateTime=12%3A00+a.m.+Nov.+8%2C+2016&docId=&docType=2007&sequence=6e09aca3-7207-446e-bb8a-db1a4ea6545c&messageNumber=1826&count=10&channelName=%2Fnews%2Fpressrelease%2Fcompany%2Fus%2Fxom&topic=&_=1479366266513

Some of the parameters are not necessary, so I created a dictionary of the effective ones. For example, the count parameter is the number of items shown on each scroll. My code is:

import json
import requests

parameters = {'countryCode':'US','dateTime':'', 'docId':'','sequence':'6e09aca3-7207-446e-bb8a-db1a4ea6545c', 
         'messageNumber':'1826','count':'10','channelName':'', 'topic':'_:1479366266513' }
data = json.dumps(parameters)
firstUrl = "http://www.marketwatch.com/investing/stock/xom"
html = requests.post(firstUrl, params = data).text 

My problem is that the request does not seem to take the parameters into account: when I remove all of them, I get the same page (the firstUrl link) as when I include all of them. Do you have any idea why this happens and how I can fix it?

2
  • I guess the content you want to scrape can't be received via a single request (even if you specify count:1000), because each time you scroll, your browser sends a new XHR request for another piece of data (10 entries). Commented Nov 17, 2016 at 16:42
  • Thank you Anderson. My problem is that even without defining any parameters, I get the same result, which is the main page and not the container I am interested in (there are 3 different infinite-scroll boxes and I am interested in one of them). I am passing the parameters of that specific element, but the request doesn't pick it up. Commented Nov 18, 2016 at 7:36

2 Answers

1

I think the firstUrl you are using is not correct. Moreover, you should use requests.get instead of post, and you should send the same parameters as in your link.

import requests

# Same query parameters as in the XHR link from the question;
# 'topic' is empty there and '_' is a separate parameter.
parameters = {'ticker':'XOM', 'countryCode':'US','dateTime':'', 'docId':'','sequence':'6e09aca3-7207-446e-bb8a-db1a4ea6545c',
         'messageNumber':'1826','count':'10','channelName':'', 'topic':'', '_':'1479366266513' }
firstUrl = "http://www.marketwatch.com/news/headline/getheadlines"
response = requests.get(firstUrl, params = parameters)
print(response.json()) # array of size 10
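
If you would rather not retype the parameters by hand, here is a minimal sketch that derives them from the XHR link quoted in the question, using only the standard library. The variable names xhr_url, endpoint and parameters are just illustrative, and this only shows the parsing step; I have not checked which of the parameters the server actually requires.

```python
from urllib.parse import urlsplit, parse_qsl

# The XHR URL quoted in the question, split across lines for readability.
xhr_url = (
    "http://www.marketwatch.com/news/headline/getheadlines"
    "?ticker=XOM&countryCode=US&dateTime=12%3A00+a.m.+Nov.+8%2C+2016"
    "&docId=&docType=2007&sequence=6e09aca3-7207-446e-bb8a-db1a4ea6545c"
    "&messageNumber=1826&count=10"
    "&channelName=%2Fnews%2Fpressrelease%2Fcompany%2Fus%2Fxom"
    "&topic=&_=1479366266513"
)

parts = urlsplit(xhr_url)

# The endpoint is the URL without its query string.
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"

# keep_blank_values=True preserves empty parameters such as docId and topic.
# parse_qsl also percent-decodes the values and turns '+' into spaces.
parameters = dict(parse_qsl(parts.query, keep_blank_values=True))

print(endpoint)
for name, value in parameters.items():
    print(f"{name!r}: {value!r}")
```

The resulting endpoint and parameters can then be passed to requests.get(endpoint, params=parameters), and changing parameters['count'] is a quick way to test whether the server honours it.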

1 Comment

@vtni: thank you so much, it works very well and I really appreciate your help.
0

params expects a Python dictionary, not a JSON string, so you should pass parameters directly:

parameters = {'countryCode':'US','dateTime':'', 'docId':'','sequence':'6e09aca3-7207-446e-bb8a-db1a4ea6545c', 
         'messageNumber':'1826','count':'10','channelName':'', 'topic':'_:1479366266513' }

html = requests.post(firstUrl, params=parameters).text

Also, check whether you should actually be using post rather than get.
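
To illustrate the difference, here is a small sketch using only the standard library. It compares the string that json.dumps produces with the key=value form a query string needs; as far as I know, requests builds the same form from a dictionary passed as params. The shortened parameters dictionary is only an example.

```python
import json
from urllib.parse import urlencode

# A shortened, illustrative subset of the parameters from the question.
parameters = {'countryCode': 'US', 'count': '10', 'topic': ''}

# json.dumps yields one JSON document, not key=value pairs.
as_json = json.dumps(parameters)

# urlencode yields the key=value&key=value form used in a query string.
as_query = urlencode(parameters)

print(as_json)
print(as_query)
```

Only the second string has the shape of the query string in the link from the question.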

1 Comment

@DeepSpace, thank you so much. I passed parameters directly, but it didn't change the result. When I change count from 10 to 100, I expect to get 100 items, but it's still the same.
