I'm trying to scrape an e-commerce website, but the page is dynamic. The HTML source contains a script that defines the product data as JSON.
My code is:
from bs4 import BeautifulSoup, SoupStrainer
import requests
import json

url = "https://www.lazada.com.ph/chuwi-pilipinas/?q=All-Products&langFlag=en&from=wangpu&lang=en&pageTypeId=2"
page = requests.get(url)
data = page.text
soup = BeautifulSoup(data, 'html.parser')
scripts = soup.find_all('script')
jsonObj = None
for script in scripts:
    if 'window.pageData = ' in script.text:
        jsonStr = script.text
        jsonStr = jsonStr.split('window.pageData = ')[1]
        jsonObj = json.loads(jsonStr)
products = jsonObj['mods']['listItems']
for item in products:
    print(item['productUrl'])
The result is:
PS C:\Users\nate\Documents\Python\LazadaScapper> & "C:/Program Files/Python39/python.exe" c:/Users/nate/Documents/Python/LazadaScapper/LazadaScraper3.py
Traceback (most recent call last):
  File "c:\Users\nate\Documents\Python\LazadaScapper\LazadaScraper3.py", line 21, in <module>
    products = jsonObj['mods']['listItems']
TypeError: 'NoneType' object is not subscriptable
PS C:\Users\nate\Documents\Python\LazadaScapper>
I did some research, and it seems the for loop never finds a matching script, so jsonObj stays None and products cannot be built from it.
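A minimal sketch of how that could be checked without hitting the site, assuming the same 'window.pageData = ' marker as in my code. The helper name extract_page_data and the sample_html string are hypothetical stand-ins (sample_html plays the role of page.text), and the JSON layout is only what my code expects, not something I have confirmed the server still sends:

```python
import json

MARKER = "window.pageData = "  # same marker string as in the code above


def extract_page_data(html, marker=MARKER):
    """Return the JSON value that follows `marker` in `html`, or None if the marker is absent."""
    start = html.find(marker)
    if start == -1:
        # Marker missing: the response does not contain the expected script
        return None
    # raw_decode parses a single JSON value starting at the given index and
    # ignores anything after it (for example a trailing ';' or '</script>').
    # It does not skip whitespace, so the value must start right after the marker.
    obj, _end = json.JSONDecoder().raw_decode(html, start + len(marker))
    return obj


# Hypothetical stand-in for page.text, shaped like the data my code expects
sample_html = (
    '<script>window.pageData = '
    '{"mods": {"listItems": [{"productUrl": "//example.com/p1"}]}};</script>'
)

page_data = extract_page_data(sample_html)
if page_data is None:
    print("marker not found in the response")
else:
    for item in page_data["mods"]["listItems"]:
        print(item["productUrl"])
```

If the same function returns None for the real page.text, that would suggest the response from requests simply does not contain the script, rather than the loop itself being wrong.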
My question is related to a thread that was posted 2 years ago, but the solution from it no longer works.
I'm new to Python and still learning, so I hope you can help me.