I am trying to pull the data that follows 'series: ', as shown below.

 }
        },
        series: [{ name: '', showInLegend: false, animation: false, color: '#c84329', lineWidth: 2, data: [[1640926800000,164243],[1638248400000,224192],[1635566400000,143606],[1632974400000,208461],[1630382400000,85036],[1627704000000,25604],[1625025600000,44012],[1622433600000,111099],[1619755200000,53928],[1617163200000,12286],[1614488400000,12622],[1612069200000,4519],[1609390800000,12665],[1606712400000,314],[1604116800000,3032],[1601438400000,4164],[1598846400000,3302],[1596168000000,22133],[1593489600000,8098],[1590897600000,-1385],[1588219200000,43165],[1585627200000,427],[1582952400000,175],[1580446800000,174],[1577768400000,116],[1575090000000,196],[1572494400000,215],[1569816000000,418],[1567224000000,375],[1564545600000,375],[1561867200000,179],[1559275200000,132],[1556596800000,146],[1554004800000,163],[1551330000000,3],[1548910800000,49],[1546232400000,-29],[1543381200000,108],[1540958400000,35],[1538280000000,159],[1535688000000,287],[1533009600000,1152],[1530331200000,1306]] }],
        navigation: { menuItemStyle: { fontSize: '9px' } }
    });      

More specifically, I'm trying to pull the value of data, which is a list of [Unix timestamp, integer] pairs. This is what I have so far:

import urllib.request
from bs4 import BeautifulSoup as bs

url = "https://socialblade.com/twitter/user/twitter"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
req = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(req)
soup = bs(response.read(), 'html.parser')

# Collect the inline scripts and take the text of the seventh one
scripts = soup.find_all('script', {"type": "text/javascript"})
script = scripts[6].text

Any thoughts?

2 Answers

1

The script is a string, so we can use the "re" module to find every occurrence of "data" in it. In this script, each data entry is followed by a "}", so for each match we find the first "}" after the start of "data" and use string slicing between those two indexes to pull out the data. The code is given below.

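import re

sub = "data"
# Each match marks the start of one "data" occurrence in the script text
for match in re.finditer(sub, script):
    # Index of the first "}" after this occurrence
    end = script.find("}", match.start())
    print(script[match.start():end])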

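If the goal is a Python list rather than a printed substring, the same idea can be taken one step further. The following is a minimal sketch, not part of the original answer: it assumes the array is written as data: [[...]] with plain integers (as in the question's snippet), so it happens to be valid JSON. The function name extract_series_data and the sample string are made up for illustration.

```python
import json
import re


def extract_series_data(script_text):
    """Return the first `data: [[...]]` array as a list of [timestamp, value] pairs."""
    # Non-greedy match from the opening "[[" to the first closing "]]"
    match = re.search(r"data:\s*(\[\[.*?\]\])", script_text, re.DOTALL)
    if match is None:
        return []
    # Assumption: the array holds only numbers, so it parses as JSON
    return json.loads(match.group(1))


# Shortened sample based on the snippet in the question
sample = (
    "series: [{ name: '', lineWidth: 2, "
    "data: [[1640926800000,164243],[1638248400000,224192],[1590897600000,-1385]] }],"
)
pairs = extract_series_data(sample)
print(pairs)
```

With the real page, script from the question's code would be passed in place of sample. If the array ever contained values that are not valid JSON, json.loads would raise an error and a different parser would be needed.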
0

Here is a complete script that extracts the required data:

import requests
from bs4 import BeautifulSoup

url = "https://socialblade.com/twitter/user/twitter"

# Send a browser-like User-Agent with the request
r = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
    },
)

soup = BeautifulSoup(r.text, "html.parser")
req = soup.find_all("script", {"type": "text/javascript"})
script = req[6].contents[0]
# Fixed character offsets: these only hold while the script text stays the same
data = script[2447: 3873]

print(data)
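The question describes the values as Unix timestamps paired with integers. As a hedged follow-up, here is a small sketch of turning such pairs into readable dates once they have been parsed into a list. It assumes the timestamps are in milliseconds (they have 13 digits in the question's snippet) and interprets them in UTC. The helper name to_dated_pairs and the sample list are illustrative only.

```python
from datetime import datetime, timezone


def to_dated_pairs(pairs):
    """Convert [timestamp_ms, value] pairs to (ISO date string, value) tuples."""
    result = []
    for ts_ms, value in pairs:
        # Assumption: timestamps are milliseconds since the epoch, read as UTC
        moment = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        result.append((moment.date().isoformat(), value))
    return result


# Two pairs copied from the snippet in the question
sample_pairs = [[1640926800000, 164243], [1590897600000, -1385]]
dated = to_dated_pairs(sample_pairs)
print(dated)
```

The dates depend on the time zone chosen. UTC is used here only to keep the result reproducible.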
