How To Extract Data Within a JavaScript Tag Using Python's BeautifulSoup

Question

I am trying to pull the data that follows 'series: ', as shown below.

 }
        },
        series: [{ name: '', showInLegend: false, animation: false, color: '#c84329', lineWidth: 2, data: [[1640926800000,164243],[1638248400000,224192],[1635566400000,143606],[1632974400000,208461],[1630382400000,85036],[1627704000000,25604],[1625025600000,44012],[1622433600000,111099],[1619755200000,53928],[1617163200000,12286],[1614488400000,12622],[1612069200000,4519],[1609390800000,12665],[1606712400000,314],[1604116800000,3032],[1601438400000,4164],[1598846400000,3302],[1596168000000,22133],[1593489600000,8098],[1590897600000,-1385],[1588219200000,43165],[1585627200000,427],[1582952400000,175],[1580446800000,174],[1577768400000,116],[1575090000000,196],[1572494400000,215],[1569816000000,418],[1567224000000,375],[1564545600000,375],[1561867200000,179],[1559275200000,132],[1556596800000,146],[1554004800000,163],[1551330000000,3],[1548910800000,49],[1546232400000,-29],[1543381200000,108],[1540958400000,35],[1538280000000,159],[1535688000000,287],[1533009600000,1152],[1530331200000,1306]] }],
        navigation: { menuItemStyle: { fontSize: '9px' } }
    });

More specifically, I'm trying to pull the data array, which is a list of [Unix timestamp in milliseconds, integer] pairs. This is what I have so far:

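import urllib.request

from bs4 import BeautifulSoup as bs

url = "https://socialblade.com/twitter/user/twitter"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
req = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(req)
soup = bs(response.read(), 'html.parser')

# Every inline <script type="text/javascript"> tag on the page
scripts = soup.find_all('script', {"type": "text/javascript"})
# On this page the chart configuration is in the seventh such tag (index 6)
script = scripts[6].text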

Any thoughts?
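A minimal sketch of one alternative to slicing by position: search the script text (script in the code above) for the series block with a regular expression and parse the array with json.loads, then convert the timestamps. It assumes the wanted array is the first "data: [[...]]" after "series:", that the array holds only numbers (so it is also valid JSON), and that the 13-digit timestamps are UTC epoch milliseconds. The names SERIES_DATA_RE, extract_series_data and to_utc are hypothetical.

```python
import json
import re
from datetime import datetime, timezone
from typing import List, Tuple

# Assumption: the wanted array is the first `data: [[...]]` after `series:`, and it
# holds only numbers, commas and brackets, so it is also valid JSON.
SERIES_DATA_RE = re.compile(r"series:\s*\[\{.*?\bdata:\s*(\[\[.*?\]\])", re.DOTALL)


def extract_series_data(script_text: str) -> List[List[int]]:
    """Return the [timestamp_ms, value] pairs from the chart's series block."""
    match = SERIES_DATA_RE.search(script_text)
    if match is None:
        raise ValueError("no 'series: [{ ... data: [[...]] }]' block found")
    return json.loads(match.group(1))


def to_utc(pairs: List[List[int]]) -> List[Tuple[datetime, int]]:
    """Convert millisecond Unix timestamps to timezone-aware UTC datetimes."""
    return [(datetime.fromtimestamp(ms / 1000, tz=timezone.utc), value) for ms, value in pairs]


# Shortened copy of the snippet from the question
sample = """
    series: [{ name: '', showInLegend: false, data: [[1640926800000,164243],[1590897600000,-1385]] }],
    navigation: { menuItemStyle: { fontSize: '9px' } }
"""
pairs = extract_series_data(sample)
print(pairs)  # [[1640926800000, 164243], [1590897600000, -1385]]
for when, value in to_utc(pairs):
    print(when.isoformat(), value)
```

On the live page, pass the scraped script string instead of sample. The lazy ".*?" quantifiers stop at the first matching "data:" and the first "]]", which is what keeps the match inside a single series entry.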

NALLAPANENIVENKATESH CHOWDARY · Accepted Answer · 2022-01-25 05:34:24Z

1

The script's content is a string, so we can use the re module to find every occurrence of "data" in it. Each data array in the script is followed by a closing "}", so for each occurrence we find the first "}" after it. Slicing the string from the start of "data" up to that "}" gives the data. See the code and output below.

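import re

sub = "data"
# Every position where the substring "data" occurs in the script text
res = re.finditer(sub, script)
for i in res:
    # Index of the first "}" after this occurrence; the data array ends before it
    k = script.find("}", i.start())
    # Print everything from "data" up to (not including) that "}"
    print(script[i.start():k])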
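The plain pattern "data" also matches inside other words and keys (for example "metadata" or a "dataLabels:" option). A minimal variation, assuming the wanted arrays are always introduced by the exact key "data:", anchors the pattern with a word boundary and the colon, and slices from the end of the key rather than its start. The names KEY_RE and find_data_chunks are hypothetical.

```python
import re
from typing import List

# Assumption: the arrays we want are introduced by the key `data:`; the word
# boundary (\b) and the colon keep `dataLabels:` or `metadata` from matching.
KEY_RE = re.compile(r"\bdata:\s*")


def find_data_chunks(script: str) -> List[str]:
    """Return the text between each `data:` key and the next '}'."""
    chunks = []
    for m in KEY_RE.finditer(script):
        end = script.find("}", m.end())
        if end == -1:  # no closing brace: take the rest of the text
            end = len(script)
        chunks.append(script[m.end():end].strip())
    return chunks


sample = "dataLabels: { enabled: false }, series: [{ name: '', data: [[1,2],[3,4]] }]"
print(find_data_chunks(sample))  # ['[[1,2],[3,4]]']
```

Each returned chunk is the bare array text, ready to be parsed further (for instance with json.loads, since the arrays here contain only numbers).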

Output is:


YOGESHWARAN R · Accepted Answer · 2022-01-25 05:52:13Z

0

Here is a complete script that retrieves the data you need:

import requests
from bs4 import BeautifulSoup

url = "https://socialblade.com/twitter/user/twitter"

r = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
    },
)

soup = BeautifulSoup(r.text, "html.parser")
req = soup.find_all("script", {"type": "text/javascript"})
# Text of the seventh <script type="text/javascript"> tag
script = req[6].contents[0]
# Hard-coded character offsets of the series data at the time of answering;
# they will break as soon as the page content changes
data = script[2447:3873]

print(data)
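The fixed offsets in script[2447:3873] only match the page as it was when this answer was written. A minimal sketch, assuming the data still appears as "data: [[...]]" somewhere after "series:", finds the same span by searching instead; the function name slice_series_data is hypothetical.

```python
def slice_series_data(script: str) -> str:
    """Return the 'data: [[...]]' text that follows 'series:', located by
    searching the script instead of using fixed offsets."""
    series_at = script.find("series:")
    if series_at == -1:
        raise ValueError("'series:' not found in script")
    start = script.find("data:", series_at)
    if start == -1:
        raise ValueError("'data:' not found after 'series:'")
    end = script.find("]]", start)
    if end == -1:
        raise ValueError("closing ']]' not found")
    return script[start:end + 2]  # include the closing ']]'


sample = "series: [{ color: '#c84329', lineWidth: 2, data: [[1,2],[3,-4]] }],"
print(slice_series_data(sample))  # data: [[1,2],[3,-4]]
```

With the scraped page, data = slice_series_data(script) would stand in for the hard-coded slice.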
