
I'm using bs4 (BeautifulSoup) for Python. I want to get JSON from a web page, but it is embedded like this:

<script>
vtex.events.addData({"pageCategory":"Product","pageDepartment":"Calzado","pageUrl":"http://www.taf.com.mx/air-force-1-07-lv8-cu8070-100/p","pageTitle":"AIR FORCE 1 07 LV8 | MASCULINO - tafmx","skuStockOutFromShelf":[],"skuStockOutFromProductDetail":["23312","23313","23314","23316","23325","23326","23327","23328"],"shelfProductIds":["140","141","142","3775","3777","3782","3785","545","17","314","318","530","645","801","822","940"],"accountName":"tafmx","pageFacets":[],"productId":"3829","productReferenceId":"CU8070-100","productEans":["194502172393","194502172409","194502172416","194502172423","194502172430","194502172447","194502172454","194502172461","194502172478","194502172485","194502172492","194502172508","194502172515","194502172522","194502172539","194502172546","194502172553"],"skuStocks":{"23312":0,"23313":0,"23314":0,"23315":11,"23316":0,"23317":19,"23318":29,"23319":22,"23320":12,"23321":7,"23322":9,"23323":15,"23324":14,"23325":0,"23326":0,"23327":0,"23328":0},"productName":"AIR FORCE 1 07 LV8","productBrandId":2000004,"productBrandName":"Nike","productDepartmentId":7,"productDepartmentName":"Calzado","productCategoryId":8,"productCategoryName":"Sneakers","productListPriceFrom":"2199","productListPriceTo":"2199","productPriceFrom":"2199","productPriceTo":"2199","sellerId":"1","sellerIds":"1"});
</script>

I'm using BeautifulSoup for Python, but the script tag has no class I can use to identify it.

Thank you


You can simply use the 'script' tag to find the element:

from bs4 import BeautifulSoup

soup = BeautifulSoup('''<script>vtex.events.addData({"pageCategory":"Product","pageDepartment":"Calzado","pageUrl":"http://www.taf.com.mx/air-force-1-07-lv8-cu8070-100/p","pageTitle":"AIR FORCE 1 07 LV8 | MASCULINO - tafmx","skuStockOutFromShelf":[],"skuStockOutFromProductDetail":["23312","23313","23314","23316","23325","23326","23327","23328"],"shelfProductIds":["140","141","142","3775","3777","3782","3785","545","17","314","318","530","645","801","822","940"],"accountName":"tafmx","pageFacets":[],"productId":"3829","productReferenceId":"CU8070-100","productEans":["194502172393","194502172409","194502172416","194502172423","194502172430","194502172447","194502172454","194502172461","194502172478","194502172485","194502172492","194502172508","194502172515","194502172522","194502172539","194502172546","194502172553"],"skuStocks":{"23312":0,"23313":0,"23314":0,"23315":11,"23316":0,"23317":19,"23318":29,"23319":22,"23320":12,"23321":7,"23322":9,"23323":15,"23324":14,"23325":0,"23326":0,"23327":0,"23328":0},"productName":"AIR FORCE 1 07 LV8","productBrandId":2000004,"productBrandName":"Nike","productDepartmentId":7,"productDepartmentName":"Calzado","productCategoryId":8,"productCategoryName":"Sneakers","productListPriceFrom":"2199","productListPriceTo":"2199","productPriceFrom":"2199","productPriceTo":"2199","sellerId":"1","sellerIds":"1"});</script>''', 'html.parser')

# The text inside the first <script> tag is the JavaScript call
js_code = soup.find('script').contents[0]

js_code is then

vtex.events.addData({"pageCategory":"Product","pageDepartment":"Calzado","pageUrl":"http://www.taf.com.mx/air-force-1-07-lv8-cu8070-100/p","pageTitle":"AIR FORCE 1 07 LV8 | MASCULINO - tafmx","skuStockOutFromShelf":[],"skuStockOutFromProductDetail":["23312","23313","23314","23316","23325","23326","23327","23328"],"shelfProductIds":["140","141","142","3775","3777","3782","3785","545","17","314","318","530","645","801","822","940"],"accountName":"tafmx","pageFacets":[],"productId":"3829","productReferenceId":"CU8070-100","productEans":["194502172393","194502172409","194502172416","194502172423","194502172430","194502172447","194502172454","194502172461","194502172478","194502172485","194502172492","194502172508","194502172515","194502172522","194502172539","194502172546","194502172553"],"skuStocks":{"23312":0,"23313":0,"23314":0,"23315":11,"23316":0,"23317":19,"23318":29,"23319":22,"23320":12,"23321":7,"23322":9,"23323":15,"23324":14,"23325":0,"23326":0,"23327":0,"23328":0},"productName":"AIR FORCE 1 07 LV8","productBrandId":2000004,"productBrandName":"Nike","productDepartmentId":7,"productDepartmentName":"Calzado","productCategoryId":8,"productCategoryName":"Sneakers","productListPriceFrom":"2199","productListPriceTo":"2199","productPriceFrom":"2199","productPriceTo":"2199","sellerId":"1","sellerIds":"1"});
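Note that soup.find('script') returns only the first <script> tag, and real pages usually have many. One way to pick the right one without an id or class is to match on the tag's text. The sketch below assumes the target script is the only one containing vtex.events.addData; the sample HTML is hypothetical and heavily trimmed. (In older bs4 versions the string= argument was called text=.)

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, trimmed page with several <script> tags and no ids or classes.
html = """
<html><head>
<script src="/app.js"></script>
<script>window.dataLayer = [];</script>
<script>vtex.events.addData({"productId":"3829","productName":"AIR FORCE 1 07 LV8"});</script>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Filter on the tag's text rather than on attributes: a compiled regex passed
# as string= is tested against each <script> tag's .string.
script = soup.find("script", string=re.compile(r"vtex\.events\.addData"))
js_code = script.string
print(js_code)
```

Tags whose .string is None (such as the external src= script) simply do not match, so the first two scripts are skipped.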

The tricky part is extracting the JSON from it. I rarely recommend regex for this kind of task, but this is one of the rare cases where it fits.

import re
...
js_code = soup.find('script').contents[0]
# Greedy match from the first "{" to the last "}" in the script text
print(re.search(r'\{.*\}', js_code).group(0))

This outputs

{"pageCategory":"Product","pageDepartment":"Calzado","pageUrl":"http://www.taf.com.mx/air-force-1-07-lv8-cu8070-100/p","pageTitle":"AIR FORCE 1 07 LV8 | MASCULINO - tafmx","skuStockOutFromShelf":[],"skuStockOutFromProductDetail":["23312","23313","23314","23316","23325","23326","23327","23328"],"shelfProductIds":["140","141","142","3775","3777","3782","3785","545","17","314","318","530","645","801","822","940"],"accountName":"tafmx","pageFacets":[],"productId":"3829","productReferenceId":"CU8070-100","productEans":["194502172393","194502172409","194502172416","194502172423","194502172430","194502172447","194502172454","194502172461","194502172478","194502172485","194502172492","194502172508","194502172515","194502172522","194502172539","194502172546","194502172553"],"skuStocks":{"23312":0,"23313":0,"23314":0,"23315":11,"23316":0,"23317":19,"23318":29,"23319":22,"23320":12,"23321":7,"23322":9,"23323":15,"23324":14,"23325":0,"23326":0,"23327":0,"23328":0},"productName":"AIR FORCE 1 07 LV8","productBrandId":2000004,"productBrandName":"Nike","productDepartmentId":7,"productDepartmentName":"Calzado","productCategoryId":8,"productCategoryName":"Sneakers","productListPriceFrom":"2199","productListPriceTo":"2199","productPriceFrom":"2199","productPriceTo":"2199","sellerId":"1","sellerIds":"1"}

This string can be converted to a Python dict using json.loads:

import json
...
# Parse the matched JSON text into a Python dict
print(json.loads(re.search(r'\{.*\}', js_code).group(0)))

Outputs

{'pageCategory': 'Product', 'pageDepartment': 'Calzado', 'pageUrl': 'http://www.taf.com.mx/air-force-1-07-lv8-cu8070-100/p', 'pageTitle': 'AIR FORCE 1 07 LV8 | MASCULINO - tafmx', 'skuStockOutFromShelf': [], 'skuStockOutFromProductDetail': ['23312', '23313', '23314', '23316', '23325', '23326', '23327', '23328'], 'shelfProductIds': ['140', '141', '142', '3775', '3777', '3782', '3785', '545', '17', '314', '318', '530', '645', '801', '822', '940'], 'accountName': 'tafmx', 'pageFacets': [], 'productId': '3829', 'productReferenceId': 'CU8070-100', 'productEans': ['194502172393', '194502172409', '194502172416', '194502172423', '194502172430', '194502172447', '194502172454', '194502172461', '194502172478', '194502172485', '194502172492', '194502172508', '194502172515', '194502172522', '194502172539', '194502172546', '194502172553'], 'skuStocks': {'23312': 0, '23313': 0, '23314': 0, '23315': 11, '23316': 0, '23317': 19, '23318': 29, '23319': 22, '23320': 12, '23321': 7, '23322': 9, '23323': 15, '23324': 14, '23325': 0, '23326': 0, '23327': 0, '23328': 0}, 'productName': 'AIR FORCE 1 07 LV8', 'productBrandId': 2000004, 'productBrandName': 'Nike', 'productDepartmentId': 7, 'productDepartmentName': 'Calzado', 'productCategoryId': 8, 'productCategoryName': 'Sneakers', 'productListPriceFrom': '2199', 'productListPriceTo': '2199', 'productPriceFrom': '2199', 'productPriceTo': '2199', 'sellerId': '1', 'sellerIds': '1'}
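Once parsed, the fields are ordinary Python values. A short sketch using a trimmed copy of the data from the question (only a few fields kept); note that in this payload the prices are strings while the stock counts are integers.

```python
import json

# Trimmed copy of the JSON string extracted above.
raw = ('{"productName":"AIR FORCE 1 07 LV8","productBrandName":"Nike",'
       '"productPriceFrom":"2199",'
       '"skuStocks":{"23312":0,"23315":11,"23317":19,"23325":0}}')

data = json.loads(raw)

name = data["productName"]
price = int(data["productPriceFrom"])  # stored as a string in the payload
# SKU ids whose stock count is greater than zero
in_stock = [sku for sku, qty in data["skuStocks"].items() if qty > 0]

print(name, price, in_stock)  # AIR FORCE 1 07 LV8 2199 ['23315', '23317']
```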

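Note that you may need a more specific regex if the script tag contains other code you did not show in the question. Because .* is greedy, the match runs from the first { to the last } on the same line, so any other braces after the object would be included and json.loads would fail.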
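An alternative that avoids the greedy-match problem is to find the opening brace after the addData( call and let json.JSONDecoder.raw_decode parse exactly one JSON value, ignoring whatever follows it. This is a minimal sketch; extract_add_data and its marker parameter are hypothetical names, and it assumes the object literal is the first { after the marker.

```python
import json


def extract_add_data(js_code, marker="vtex.events.addData("):
    """Return the JSON object passed to `marker` in js_code, or None.

    Hypothetical helper: assumes the object literal is the first "{" after
    the marker. raw_decode parses one JSON value and ignores trailing text.
    """
    start = js_code.find(marker)
    if start == -1:
        return None
    # First opening brace after the marker.
    brace = js_code.find("{", start + len(marker))
    if brace == -1:
        return None
    # Raises json.JSONDecodeError if the text at `brace` is not valid JSON.
    obj, _end = json.JSONDecoder().raw_decode(js_code[brace:])
    return obj


# More braces later on the same line would break a greedy {.*} match.
js_code = ('vtex.events.addData({"productId":"3829","skuStocks":{"23315":11}});'
           ' var other = {"unrelated": true};')
print(extract_add_data(js_code))  # {'productId': '3829', 'skuStocks': {'23315': 11}}
```

On this input the greedy regex would capture everything up to the final }, including "); var other = ...", and json.loads would reject it.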


5 Comments

Thank you, but this will only work if the JSON is static, right?
@JesusTorres I'm not sure what you mean by "static". It will work with whatever exists in the <script> tag when the soup object was created
I mean, what happens if the values in the JSON change?
@JesusTorres They cannot change, by definition. soup parses the page source only once, so it is a snapshot; to see new values you have to download and parse the page again.
Thank you, one last question: an HTML page has many script tags. How can I identify the one I want if it doesn't have an id or class?
