0

I'm scraping a website for product reviews. I can successfully get the JSON data, but I'm having an issue with the parsing. The levels of data are like this: payload -> reviews -> 22Y6N61W6TO2 -> customerReviews.

The data I want is in the "customerReviews" level. However, the "22Y6N61W6TO2" value will be different when looking at another item.

I don't want to write a different Python script for each item just to account for this one value. How can I use a wildcard, or something similar, to get the data I'm after?

I'm using Python 3, requests, and the json module in my script.

My script looks like this:

import json
import pandas as pd
with open('data.json', 'r') as f:
    data = json.load(f)

df = pd.json_normalize(data['payload']['reviews']['22Y6N61W6TO2']['customerReviews'])
print(df)

Below is a section of the JSON I'm working with:

"payload": {
      "products": {},
      "offers": {},
      "idmlMap": {},
      "reviews": {
         "22Y6N61W6TO2": {
            "averageOverallRating": 4.4783,
            "roundedAverageOverallRating": 4.5,
            "overallRatingRange": 5.0,
            "totalReviewCount": 759,
            "recommendedPercentage": 89,
            "ratingValueOneCount": 35,
            "ratingValueTwoCount": 27,
            "ratingValueThreeCount": 30,
            "ratingValueFourCount": 115,
            "ratingValueFiveCount": 552,
            "percentageOneCount": 4,
            "percentageTwoCount": 3,
            "percentageThreeCount": 3,
            "percentageFourCount": 15,
            "percentageFiveCount": 72,
            "activeSort": "relevancy",
            "pagination": {
               "total": 759,
               "pages": [
                  {
                     "num": 1,
                     "gap": false,
                     "active": true,
                     "url": "sort=relevancy&page=1"
                  },
                  {
                     "num": 2,
                     "gap": false,
                     "active": false,
                     "url": "sort=relevancy&page=2"
                  },
                  {
                     "num": 3,
                     "gap": false,
                     "active": false,
                     "url": "sort=relevancy&page=3"
                  },
                  {
                     "num": 4,
                     "gap": false,
                     "active": false,
                     "url": "sort=relevancy&page=4"
                  },
                  {
                     "num": 5,
                     "gap": false,
                     "active": false,
                     "url": "sort=relevancy&page=5"
                  },
                  {
                     "num": 6,
                     "gap": false,
                     "active": false,
                     "url": "sort=relevancy&page=6"
                  },
                  {
                     "num": 0,
                     "gap": true,
                     "active": false
                  },
                  {
                     "num": 38,
                     "gap": false,
                     "active": false,
                     "url": "sort=relevancy&page=38"
                  }
               ],
               "next": {
                  "num": 0,
                  "gap": false,
                  "active": false,
                  "url": "sort=relevancy&page=2"
               },
               "currentSpan": "1-20"
            },
            "customerReviews": [
               {
                  "reviewId": "248695872",
                  "authorId": "13b0b650b7694a54267279bf80e0fdfa99cc7c3c5150d32aff7db274e74c07f5f6e7f7b6c4fe8cb64a007c9e3c0f0c04",
                  "negativeFeedback": 0,
                  "positiveFeedback": 0,
                  "rating": 5.0,
                  "reviewTitle": "Amazing",
                  "reviewText": "This thing is amazing. I cooked bbq ribs in 30 mins. Then caramelized for 6 mins in my oven. They was awesome. Best kitchen appliance of 2020. Wish i had bought it before dec 31st. Buy one folks. You'll love it.",
                  "reviewSubmissionTime": "1/1/2021",
                  "userNickname": "Keith",
                  "badges": [
                     {
                        "badgeType": "Custom",
                        "id": "VerifiedPurchaser",
                        "contentType": "REVIEW"
                     }
                  ],
                  "userAttributes": {},
                  "photos": [
                     {
                        "Id": "e917ed53-cf49-48af-b454-42f3fd87536a",
                        "Sizes": {
                           "normal": {
                              "Id": "normal",
                              "Url": "https://i5.walmartimages.com/dfw/6e29e393-988c/k2-_d716ba9d-2c5b-4f82-b9a6-588575975fe6.v1.bin"
                           },
                           "thumbnail": {
                              "Id": "thumbnail",
                              "Url": "https://i5.walmartimages.com/dfw/6e29e393-988c/k2-_d716ba9d-2c5b-4f82-b9a6-588575975fe6.v1.bin?odnWidth=150&odnHeight=150&odnBg=ffffff"
                           }
                        },
                        "SizesOrder": [
                           "normal",
                           "thumbnail"
                        ]
                     }
                  ],
                  "videos": [],
                  "externalSource": "bazaarvoice"
               }
  • Please make this a minimal reproducible example. Commented Jan 10, 2021 at 21:57
  • And include a sample of the JSON response that you get. Commented Jan 10, 2021 at 22:05

2 Answers

1

I believe you can get the key first, like this:

key = list(data["payload"]['reviews'].keys())[0]
df = pd.json_normalize(data['payload']['reviews'][key]['customerReviews'])
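
If the "reviews" object might ever hold more than one ID, taking only the first key would silently skip the rest. A minimal sketch of one alternative is to loop over every ID and collect the reviews. The helper name collect_customer_reviews, the added "itemId" field, and the second ID with its review are assumptions made up for illustration; only the first entry is trimmed from the JSON in the question.

```python
import json

# Trimmed-down sample shaped like the JSON in the question. The second ID
# and its review are hypothetical, added only to show the loop.
sample = json.loads('''
{
  "payload": {
    "reviews": {
      "22Y6N61W6TO2": {
        "totalReviewCount": 759,
        "customerReviews": [
          {"reviewId": "248695872", "rating": 5.0, "reviewTitle": "Amazing"}
        ]
      },
      "6IYETQATGRMP": {
        "customerReviews": [
          {"reviewId": "hypothetical-1", "rating": 4.0, "reviewTitle": "Good"}
        ]
      }
    }
  }
}
''')


def collect_customer_reviews(data):
    """Return every review under payload -> reviews -> <any ID> -> customerReviews."""
    rows = []
    for item_id, block in data["payload"]["reviews"].items():
        # Tolerate an ID that has no customerReviews list at all.
        for review in block.get("customerReviews", []):
            # Copy the review and record which ID it came from.
            rows.append({"itemId": item_id, **review})
    return rows


rows = collect_customer_reviews(sample)
print(rows)
```

The resulting list of dictionaries can be passed to pd.json_normalize(rows) in place of the hard-coded path, which should give one row per review regardless of the ID.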

0

Store the ID in a variable and use that variable as the dictionary key, so the same lookup code works for any ID.

    import json
    import random

    id_1 = '6IYETQATGRMP'
    id_2 = '7GAADHOOLWCT'
    id_3 = '8WWBHOLWQNNZ'
    the_json_data = '''{
        "level_1": {
            "level_2": {
                "6IYETQATGRMP": {
                    "level_4": "your_1st_level_4_data"
                },
                "7GAADHOOLWCT": {
                    "level_4": "your_2nd_level_4_data"
                },
                "8WWBHOLWQNNZ": {
                    "level_4": "your_3rd_level_4_data"
                }
            }
        }
    }'''
    your_var = random.choice([id_1, id_2, id_3])
    data = json.loads(the_json_data)
    print(data['level_1']['level_2'][your_var]['level_4'])   

This way you can set 'your_var' to the desired ID, and the lookup works as long as that ID exists in the data; a missing ID raises a KeyError.
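
When the ID might not be present, one option is to look it up with dict.get so that a missing ID yields None instead of an exception. This is only a sketch under that assumption: the helper name get_level_4 and the ID "NOT_A_REAL_ID" are hypothetical, and the data is a shortened copy of the example above.

```python
import json

# Shortened copy of the example data, with a single ID.
the_json_data = '''{
    "level_1": {
        "level_2": {
            "6IYETQATGRMP": {"level_4": "your_1st_level_4_data"}
        }
    }
}'''


def get_level_4(data, item_id):
    """Return level_4 for item_id, or None when the ID is not present."""
    entry = data["level_1"]["level_2"].get(item_id)
    if entry is None:
        return None
    return entry.get("level_4")


data = json.loads(the_json_data)
found = get_level_4(data, "6IYETQATGRMP")
missing = get_level_4(data, "NOT_A_REAL_ID")  # hypothetical ID, not in the data
print(found, missing)
```

Returning None keeps a batch over many items running when one ID is absent; catching KeyError around the original lookup would be an equally reasonable choice.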
