I have some experience with web scraping and APIs, but I can't find the right API to scrape products from this website:
https://www.giga.com.vc/Bebida (note: /Bebida is just a category, like "/Drinks")
The issue is that I found several APIs, but each one returns either a single product or a fixed handful of products. I can't work out the rules for paginating them by category or page, so that I can iterate through a category's products and collect prices, EANs, etc.
import requests
import pandas as pd
from bs4 import BeautifulSoup
For example, this works, but the output is the raw page content, which is awkward to parse:
print(requests.get('https://www.giga.com.vc/padaria?initialMap=c&initialQuery=padaria&map=category-1&page=1').content)
or
urlx = 'https://www.giga.com.vc/_v/segment/graphql/v1?workspace=master&maxAge=short&appsEtag=remove&domain=store&locale=pt-BR&operationName=Products&variables=%7B%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%2249a77e3e2082563773aff56ad9c0432d59302e86fd1baaad9ca0f4bca2630d46%22%2C%22sender%22%3A%22vtex.store-resources%400.x%22%2C%22provider%22%3A%22vtex.search-graphql%400.x%22%7D%2C%22variables%22%3A%22eyJoaWRlVW5hdmFpbGFibGVJdGVtcyI6ZmFsc2UsInNrdXNGaWx0ZXIiOiJBTExfQVZBSUxBQkxFIiwiaW5zdGFsbG1lbnRDcml0ZXJpYSI6Ik1BWF9XSVRIT1VUX0lOVEVSRVNUIiwiY2F0ZWdvcnkiOiIiLCJjb2xsZWN0aW9uIjoiMTYvIiwic3BlY2lmaWNhdGlvbkZpbHRlcnMiOltdLCJvcmRlckJ5IjoiIiwiZnJvbSI6MCwidG8iOjExfQ%3D%3D%22%7D'
r = requests.get(urlx)
for x in r.json()['data']['products']:
    print(x)
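To make the long URL above easier to reason about, here is a minimal sketch that assembles the same query string from its parts. The parameter names and the sender/provider values are copied from that URL; `build_products_url` is a hypothetical helper name, and I have not verified that the server accepts a URL rebuilt this way.

```python
import base64
import json
from urllib.parse import urlencode

# Endpoint taken from the URL in the question.
BASE = "https://www.giga.com.vc/_v/segment/graphql/v1"


def build_products_url(variables: dict, sha256_hash: str) -> str:
    """Assemble a 'Products' persisted-query URL (hypothetical helper).

    `variables` is the plain dict of search variables; it is serialized to
    JSON and Base64-encoded, mirroring the 'variables' field seen inside
    the 'extensions' parameter of the original URL.
    """
    encoded_vars = base64.b64encode(
        json.dumps(variables, separators=(",", ":")).encode()
    ).decode()
    extensions = {
        "persistedQuery": {
            "version": 1,
            "sha256Hash": sha256_hash,
            "sender": "vtex.store-resources@0.x",
            "provider": "vtex.search-graphql@0.x",
        },
        "variables": encoded_vars,
    }
    params = {
        "workspace": "master",
        "maxAge": "short",
        "appsEtag": "remove",
        "domain": "store",
        "locale": "pt-BR",
        "operationName": "Products",
        "variables": "{}",
        "extensions": json.dumps(extensions, separators=(",", ":")),
    }
    # urlencode percent-encodes the JSON, like the hand-written URL.
    return f"{BASE}?{urlencode(params)}"
```

The result can be passed to `requests.get` in place of `urlx`, using the `sha256Hash` value from the original URL.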
This also works:
url2 = 'https://www.giga.com.vc/_v/segment/graphql/v1?workspace=master&maxAge=short&appsEtag=remove&domain=store&locale=pt-BR&__bindingId=3f6e91e6-44f2-4fb0-a2d9-e238b53082e0&operationName=ProductRecommendations&variables=%7B%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%22e5782bd9e8bc64d337a7d7f96b9c280b462cdb0754d15b415192dac2755ad280%22%2C%22sender%22%3A%22vtex.shelf%401.x%22%2C%22provider%22%3A%22vtex.search-graphql%400.x%22%7D%2C%22variables%22%3A%22eyJpZGVudGlmaWVyIjp7ImZpZWxkIjoiaWQiLCJ2YWx1ZSI6IjE0NzUyMyJ9LCJ0eXBlIjoidmlldyJ9%22%7D'
requests.get(url2).json()['data']['productRecommendations']
The expected output is something like this:
r = requests.get(urlx)
for items in r.json()['data']['products']:
    prd_dict = {
        'product_id': items['productId'],
        'price': items['priceRange']['sellingPrice']['highPrice'],
        'product_name': items['productName'],
        'category_id': items['categoryId'],
        'ean': items['items'][0]['ean'],
        'box_qty': items['specificationGroups'][0]['specifications'][0]['values']
    }
    print(prd_dict)
Raw output:
{'product_id': '141917', 'price': 20.54, 'product_name': 'Banana Nanica Kg', 'category_id': '433', 'ean': '4511', 'box_qty': ['0']}
{'product_id': '148077', 'price': 1.45, 'product_name': 'Água de Coco Tradicional Quadrado 200Ml', 'category_id': '148', 'ean': '0751320333650', 'box_qty': ['27']}
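The direct indexing above raises an error if a product has no SKUs or no specification groups. A minimal sketch of a more defensive version follows; `extract_row` is a hypothetical helper name, and the field names are the ones used in the snippet above. Whether such incomplete products actually occur in the response is an assumption.

```python
def extract_row(item: dict) -> dict:
    """Flatten one product dict into the fields of interest.

    Missing or empty nested fields become None instead of raising.
    """
    selling = (item.get("priceRange") or {}).get("sellingPrice") or {}
    skus = item.get("items") or [{}]
    groups = item.get("specificationGroups") or [{}]
    specs = groups[0].get("specifications") or [{}]
    return {
        "product_id": item.get("productId"),
        "price": selling.get("highPrice"),
        "product_name": item.get("productName"),
        "category_id": item.get("categoryId"),
        "ean": skus[0].get("ean"),
        "box_qty": specs[0].get("values"),
    }
```

A list of these rows, for example `[extract_row(p) for p in r.json()['data']['products']]`, can be passed straight to `pd.DataFrame`.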
The variables value is in base64, which decoded gives:
'{"hideUnavailableItems":false,"skusFilter":"ALL","simulationBehavior":"default","installmentCriteria":"MAX_WITHOUT_INTEREST","productOriginVtex":false,"map":"c","query":"bebida","orderBy":"OrderByScoreDESC","from":40,"to":59,"selectedFacets":[{"key":"c","value":"bebida"}],"operator":"and","fuzzy":"0","searchState":null,"facetsBehavior":"Static","categoryTreeBehavior":"default","withFacets":false}'
Here "from":40,"to":59 gives 20 values, and the page loads 20 items each time you click the button.
To decode it yourself, take the variables text = "eyJoaWRlVW5hdmFp...." and use base64.b64decode( text.encode() ).decode()
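The decoding step described above, together with re-encoding for a different from/to window, can be sketched as follows. The helper names are hypothetical, and the page size of 20 is taken from the observation that each click loads 20 items; whether the server accepts arbitrary windows is an assumption.

```python
import base64
import json


def decode_variables(encoded: str) -> dict:
    """Decode the Base64 'variables' text into a Python dict."""
    return json.loads(base64.b64decode(encoded.encode()).decode())


def encode_variables(variables: dict) -> str:
    """Serialize the dict to compact JSON and Base64-encode it."""
    raw = json.dumps(variables, separators=(",", ":"))
    return base64.b64encode(raw.encode()).decode()


def page_window(variables: dict, page: int, page_size: int = 20) -> dict:
    """Return a copy of `variables` with from/to set for a zero-based page."""
    updated = dict(variables)
    updated["from"] = page * page_size
    updated["to"] = updated["from"] + page_size - 1
    return updated
```

With these, page 2 corresponds to "from":40,"to":59, matching the decoded value above. Looping over `page` and stopping when a response contains no products would walk the whole category.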