1

Hi a learner in python I have written code the extracts a son file from a sports website.

The code is

from bs4 import BeautifulSoup
import requests
import json

url = "https://s3-ap-southeast-2.amazonaws.com/racevic.static/2022-08-01/sportsbet-pakenham-synthetic/sectionaltimes/race-2.json?"

payload={}
headers = {}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)

and the output looks like this (small portion)

sectionaltimes_callback({"Horses":[{"Comment":"Slow Out 1 Lengths , got back 2nd last off tardy start 8 Lengths 800m, still mile off them getting widest from the corner, charged home last 200m for eye catching second spot # powered home widest","FinalPosition":2,"FinalPositionAbbreviation":"2nd","FullName":"Ameerati","SaddleNumber":12,"HorseUrl":"/horses/ameerati","SilkUrl":"//cdn.silks.racing.com/bb/114031.png","Trainer":"Robbie Griffiths & Mathew de Kock","TrainerUrl":"/trainers/robbie-griffiths","Jockey":"P.M.Moloney","JockeyUrl":"/jockeys/patrick-moloney","SectionalTimes":[{"Distance":"1200m","Position":11,"Time":"1:11.43","AvgSpeed":0.0},{"Distance":"1000m","Position":11,"Time":"59.29","AvgSpeed":0.0},{"Distance":"800m","Position":11,"Time":"46.95","AvgSpeed":0.0},{"Distance":"600m","Position":11,"Time":"34.77","AvgSpeed":0.0},{"Distance":"400m","Position":11,"Time":"22.71","AvgSpeed":0.0},{"Distance":"200m","Position":4,"Time":"11.45","AvgSpeed":0.0},{"Distance":"Finish","Position":2,"Time":"","AvgSpeed":0.0}],"SplitTimes":[{"Distance":"1200m-1000m","Position":11,"Time":"12.14","AvgSpeed":0.0},{"Distance":"1000m-800m","Position":11,"Time":"12.34","AvgSpeed":0.0},{"Distance":"800m-600m","Position":11,"Time":"12.18","AvgSpeed":0.0},{"Distance":"600m-400m","Position":11,"Time":"12.06","AvgSpeed":0.0},{"Distance":"400m-200m","Position":11,"Time":"11.26","AvgSpeed":0.0},{"Distance":"200m-Finish","Position":4,"Time":"11.45","AvgSpeed":0.0}],"StartPosition":0,"BarrierNumber":12,"RaceTime":"","TimeVarToWinner":0.0,"BeatenMargin":0.0,"DistanceRun":0,"DistanceVarToWinner":"","SixHundredMetresTime":"34.77","TwoHundredMetresTime":"11.45","Early":0.0,"Mid":0.0,"Late":0.0,"OverallPeakSpeed":0.0,"PeakSpeedLocation":null,"OverallAvgSpeed":0.0,"DistanceFromRail":0.0},

The help I would appreciate now is what do I do to put this in a format that I can open in excel

2
  • 1
    That's JSONP, not JSON. It has a function call sectionaltimes_callback(...) wrapped around the JSON. You need to remove that before calling json.loads() Commented Aug 8, 2022 at 2:17
  • So would it be possible for you to expand on your comment a bit more because I am a novice finding my way by watching youtube videos and reading this site. Commented Aug 8, 2022 at 2:23

3 Answers 3

1
import pandas as pd
import requests
import json
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.utils.dataframe import dataframe_to_rows

def race_data_to_xslxs(url, fname):
    # get data
    data = json.loads(requests.get(url).text[24:-1])

    # create dataframes
    dfs = {}
    singulars = pd.DataFrame()
    for k, v in data.items():
        if isinstance(v, list):
            dfs[k] = pd.DataFrame(v)
        else:
            singulars[k] = [v]
    dfs = {'summary': singulars, **dfs}

    # create workbook
    wb = Workbook()
    for k, df in dfs.items():
        # create sheet
        wsx = wb.create_sheet(title=k)
        rows = dataframe_to_rows(df)
        for r_idx, row in enumerate(rows, 1):
            for c_idx, value in enumerate(row, 1):
                wsx.cell(row=r_idx, column=c_idx, value=str(value))
    del wb['Sheet']
    # write excel file
    wb.save(filename=fname)

url = "https://s3-ap-southeast-2.amazonaws.com/racevic.static/2022-08-01/sportsbet-pakenham-synthetic/sectionaltimes/race-2.json?"
path = 'fname.xlsx'
race_data_to_xslxs(url=url, fname=path)
Sign up to request clarification or add additional context in comments.

3 Comments

So to totally understand your solution is this coded added to existing code or replacing?
@Learner it is self-contained
Completed but I note that it is sending data to multiple worksheets I would like all the data to be populated in one worksheet with headings across the top and lines of data running down the page. Any chance you could provide a different layout format option
1

The API is returning JSONP, not JSON. This is JSON wrapped in a call to a callback function, which can be used by browsers without violating the same-origin rule. You need to remove that function call before parsing it as JSON.

import re
import json

response = requests.request("GET", url, headers=headers, data=payload)
json_string = re.sub(r'^sectionaltimes_callback\((.*)\)$', r'\1', response)
data = json.loads(json_string)

Comments

1

You can try this -

import requests
import json

url = "https://s3-ap-southeast-2.amazonaws.com/racevic.static/2022-08-01/sportsbet-pakenham-synthetic/sectionaltimes/race-2.json?"
response = requests.get(url)
json.loads(response.text.split("(", 1)[1].strip(")"))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.