I'm new to Selenium and am struggling to extract data from JSON. I tried several tools without success, then found by coincidence that I can apparently access the data through an API, but the results are split over thousands and thousands of pages.

I want to automate the following steps:

  • Extract "title" or "slug", "reviews", "star_rating", "listing_price", "pretty_price"
  • Extract "next_is_after", concatenate it with "https://api.takealot.com/rest/v-1-10-0/searches/products,filters,facets,sort_options,breadcrumbs,slots_audience,context,seo?", load the resulting URL, and start the extraction again from the beginning. Judging by the summary in the response, this could happen a couple of hundred thousand times.

I would really appreciate any pointers in the right direction. I am already failing at the extraction step, so help with the code below would go a long way.

import requests
res = requests.get('https://api.takealot.com/rest/v-1-10-0/searches/products,filters,facets,sort_options,breadcrumbs,slots_audience,context,seo?').json()

for data in res:
    print(data["next_is_after"])

  • It would have been nice if you had added a link to the API documentation. This is a very complex JSON response, and you need to understand its structure. I recommend making a separate request for each category in "section_keys". That way you deal with much smaller and simpler JSON responses. Commented Sep 5, 2021 at 17:53
  • Thank you Daniel. Unfortunately I'm not aware of any public documentation for this API. I only found it by coincidence when looking at Network > XHR in Chrome, so I don't think the public is even supposed to use it the way I want to. I will play around with it some more and hopefully figure something out. Commented Sep 5, 2021 at 18:22

1 Answer

The response JSON is an object of nested objects, so you need to index into it key by key instead of iterating over the top level.
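To illustrate why looping over the response directly fails, here is a minimal offline sketch. The `sample` dict is a hand-made stand-in that reuses only key names from the code in this answer, and `dig` is a hypothetical helper, not part of `requests` or of the API.

```python
from typing import Any

# Hand-made stand-in for the real response; the values are invented.
sample = {
    "sections": {
        "products": {
            "paging": {"next_is_after": "abc123"},
            "results": [],
        }
    }
}

# Iterating a dict yields its keys (strings). That is why data["next_is_after"]
# inside a bare `for data in res` loop raises a TypeError.
top_level_keys = list(sample)


def dig(obj: Any, *keys: str, default: Any = None) -> Any:
    """Follow keys through nested dicts; return default if any key is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


token = dig(sample, "sections", "products", "paging", "next_is_after")
missing = dig(sample, "sections", "reviews", "paging")  # no such branch
print(top_level_keys, token, missing)
```

A helper like this is optional. Plain chained indexing works as long as every key is present, while `dig` avoids a `KeyError` when a product lacks a field.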

Solution:

import requests
res = requests.get('https://api.takealot.com/rest/v-1-10-0/searches/products,filters,facets,sort_options,breadcrumbs,slots_audience,context,seo?').json()

next_is_after = res['sections']['products']['paging']['next_is_after']

for data in res['sections']['products']['results']:
    # Each product keeps its fields in two nested objects.
    core = data['product_views']['core']
    buybox = data['product_views']['buybox_summary']
    name, slug = core['title'], core['slug']
    reviews, star_rating = core['reviews'], core['star_rating']
    listing_price, pretty_price = buybox['listing_price'], buybox['pretty_price']
    print(name, slug, reviews, star_rating, listing_price, pretty_price)

print("Next is after : ", next_is_after)
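The code above reads only the first page. The paging step described in the question could be sketched as follows. This is a rough outline under stated assumptions, not a verified client for this API: it assumes the value of `next_is_after` can be appended to the base URL as-is (as the question describes) and that an empty or missing value marks the last page. The names `fetch_json`, `iter_products`, `max_pages` and `delay` are hypothetical.

```python
import time
from typing import Callable, Iterator

BASE_URL = ("https://api.takealot.com/rest/v-1-10-0/searches/products,filters,"
            "facets,sort_options,breadcrumbs,slots_audience,context,seo?")


def fetch_json(url: str) -> dict:
    """Download one page and decode it as JSON."""
    import requests  # imported lazily so the paging logic can be tried offline
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_products(fetch: Callable[[str], dict] = fetch_json,
                  max_pages: int = 5,
                  delay: float = 1.0) -> Iterator[dict]:
    """Yield one flat dict per product, following next_is_after page by page."""
    url = BASE_URL
    for _ in range(max_pages):
        products = fetch(url)["sections"]["products"]
        for item in products["results"]:
            core = item["product_views"]["core"]
            buybox = item["product_views"]["buybox_summary"]
            yield {
                "title": core["title"],
                "slug": core["slug"],
                "reviews": core["reviews"],
                "star_rating": core["star_rating"],
                "listing_price": buybox["listing_price"],
                "pretty_price": buybox["pretty_price"],
            }
        token = products["paging"].get("next_is_after")
        if not token:
            break  # assumption: an empty or missing token marks the last page
        url = BASE_URL + token  # assumption: the token is appended as-is
        time.sleep(delay)  # pause between requests to go easy on the server
```

Calling `for row in iter_products(max_pages=3): print(row)` would walk the first three pages. With several hundred thousand pages to fetch, it is worth raising `max_pages` gradually and writing rows to a file as they arrive rather than holding them all in memory.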