
I'm having trouble reading a nested JSON file into a dataframe correctly. This is a sample of the JSON file with pharmaceutical products I'm working on:

[
    [
        {
            "ScrapingOriginIdentifier": "N",
            "ActiveSubstances": [
                "A.C.T.H. pour préparations homéopathiques"
            ],
            "ATC": null,
            "Name": "A.C.T.H. BOIRON, degré de dilution compris entre 4CH et 30CH ou entre 8DH et 60DH",
            "OtherFields": [
                {
                    "Name": null,
                    "Value": "CIS: 6 499 638 6",
                    "Type": "string"
                },
                {
                    "Name": null,
                    "Value": "MA Holder since: 06/10/2021",
                    "Type": "string"
                }
            ],
            "Package": "1 tube de 4 g de granules",
            "PharmaceuticalForm": "Granules",
        },
        {
            "ScrapingOriginIdentifier": "N",
            "ActiveSubstances": [
                "A.C.T.H. pour préparations homéopathiques"
            ],
            "ATC": null,
            "Name": "A.C.T.H. BOIRON, degré de dilution compris entre 4CH et 30CH ou entre 8DH et 60DH",
            "OtherFields": [
                {
                    "Name": null,
                    "Value": "CIS: 6 499 638 6",
                    "Type": "string"
                },
                {
                    "Name": null,
                    "Value": "MA Holder since: 06/10/2021",
                    "Type": "string"
                }
            ],
            "Package": "1 tube de 20 g de pommade",
            "PharmaceuticalForm": "Granules",
        }
    ],
    [
        {
            "ScrapingOriginIdentifier": "34009 341 687 6 5",
            "ActiveSubstances": [],
            "ATC": null,
            "Name": "17 B ESTRADIOL BESINS-ISCOVESCO 0,06 POUR CENT, gel pour application cutanée en tube",
            "OtherFields": [
                {
                    "Name": null,
                    "Value": "CIS: 6 858 620 3",
                    "Type": "string"
                },
                {
                    "Name": null,
                    "Value": "Codes: 34009 341 687 6 5 or 341 687-6",
                    "Type": "string"
                }
            ],
            "Package": "1 tube(s) aluminium verni de 80 g avec applicateur polystyrène",
            "PharmaceuticalForm": "Gel",
        }
    ]
]

I can see the problem is that it's nested by ScrapingOriginIdentifier. I read the file using:

dataset = pd.read_json('data.json', orient='records')

And tried to 'shape' it correctly using:

dataset = pd.json_normalize(dataset)

This still did not work. How can I read the file correctly in order to get all the records?

  • Try removing trailing commas. Commented Jun 23, 2022 at 9:26

1 Answer

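First, the sample as posted is not valid JSON. The unquoted null values are fine (null is a JSON literal), but the trailing commas after the last key of each object, for example after "PharmaceuticalForm": "Granules", must be removed. Second, the structure of your JSON is not suitable for creating a dataframe directly. The structure is the following: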

[
  [
   { "ScrapingOriginIdentifier": "...", ...},
   { "ScrapingOriginIdentifier": "...", ...},
  ],
  [
   { "ScrapingOriginIdentifier": "...", ...},
  ]
]

While it should be a flat list of records, like this:

[
   { "ScrapingOriginIdentifier": "...", ...},
   { "ScrapingOriginIdentifier": "...", ...},
   { "ScrapingOriginIdentifier": "...", ...}
]

Please consider restructuring your list this way:

import pandas as pd

# your_json is the parsed file: a list of lists of dicts
records = []
for group in your_json:
    for item in group:
        records.append(item)  # flatten into one list of records
df = pd.DataFrame(records)
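
For reference, here is a minimal sketch of the same flattening using only the standard library. The `raw` string is a shortened, hypothetical stand-in for the sample in the question (trailing commas already removed), and the names `groups`, `records` and `trailing_comma_ok` are just illustrative. It also checks how the `json` module treats a bare `null` and a trailing comma.

```python
import json
from itertools import chain

# Shortened stand-in for the sample: a list of lists of product dicts.
raw = """
[
    [
        {"ScrapingOriginIdentifier": "N", "ATC": null, "PharmaceuticalForm": "Granules"},
        {"ScrapingOriginIdentifier": "N", "ATC": null, "PharmaceuticalForm": "Granules"}
    ],
    [
        {"ScrapingOriginIdentifier": "34009 341 687 6 5", "ATC": null, "PharmaceuticalForm": "Gel"}
    ]
]
"""

# A bare null is valid JSON and is parsed to None.
groups = json.loads(raw)

# A trailing comma before a closing brace is rejected by the json module.
try:
    json.loads('{"PharmaceuticalForm": "Gel",}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False

# Flatten the outer list so that each product dict is one record.
records = list(chain.from_iterable(groups))

print(len(records), trailing_comma_ok)
```

Passing `records` to `pd.DataFrame(records)` or `pd.json_normalize(records)` should then give one row per product. List-valued fields such as OtherFields would presumably stay as lists inside a single column unless they are expanded separately.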

1 Comment

Yeah, the biggest problem was the structure of the JSON. I managed to fix it at the origin. Thank you!
