Python: problems correctly reading a nested JSON file

Question

I'm having trouble correctly reading a nested JSON file into a DataFrame. This is a sample of the JSON file of pharmaceutical products I'm working with:

[
    [
        {
            "ScrapingOriginIdentifier": "N",
            "ActiveSubstances": [
                "A.C.T.H. pour préparations homéopathiques"
            ],
            "ATC": null,
            "Name": "A.C.T.H. BOIRON, degré de dilution compris entre 4CH et 30CH ou entre 8DH et 60DH",
            "OtherFields": [
                {
                    "Name": null,
                    "Value": "CIS: 6 499 638 6",
                    "Type": "string"
                },
                {
                    "Name": null,
                    "Value": "MA Holder since: 06/10/2021",
                    "Type": "string"
                }
            ],
            "Package": "1 tube de 4 g de granules",
            "PharmaceuticalForm": "Granules",
        },
        {
            "ScrapingOriginIdentifier": "N",
            "ActiveSubstances": [
                "A.C.T.H. pour préparations homéopathiques"
            ],
            "ATC": null,
            "Name": "A.C.T.H. BOIRON, degré de dilution compris entre 4CH et 30CH ou entre 8DH et 60DH",
            "OtherFields": [
                {
                    "Name": null,
                    "Value": "CIS: 6 499 638 6",
                    "Type": "string"
                },
                {
                    "Name": null,
                    "Value": "MA Holder since: 06/10/2021",
                    "Type": "string"
                }
            ],
            "Package": "1 tube de 20 g de pommade",
            "PharmaceuticalForm": "Granules",
        }
    ],
    [
        {
            "ScrapingOriginIdentifier": "34009 341 687 6 5",
            "ActiveSubstances": [],
            "ATC": null,
            "Name": "17 B ESTRADIOL BESINS-ISCOVESCO 0,06 POUR CENT, gel pour application cutanée en tube",
            "OtherFields": [
                {
                    "Name": null,
                    "Value": "CIS: 6 858 620 3",
                    "Type": "string"
                },
                {
                    "Name": null,
                    "Value": "Codes: 34009 341 687 6 5 or 341 687-6",
                    "Type": "string"
                }
            ],
            "Package": "1 tube(s) aluminium verni de 80 g avec applicateur polystyrène",
            "PharmaceuticalForm": "Gel",
        }
    ]
]

I can see the problem is that it's nested by ScrapingOriginIdentifier. I read the file using:

dataset = pd.read_json('data.json', orient='records')

And tried to 'shape' it correctly using:

dataset = pd.json_normalize(dataset)

This still did not work. How can I read the file correctly so that I get all the records?

Try removing trailing commas.

– matszwecja, Commented Jun 23, 2022 at 9:26
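
To illustrate the comment above: Python's standard json module rejects trailing commas, while an unquoted null is accepted and becomes None. The following is a minimal sketch, not a general repair tool. The cut-down `raw` string and the regex cleanup are assumptions made for illustration, and the regex would also alter any string value that happens to contain a comma followed by a closing bracket or brace.

```python
import json
import re

# A cut-down record in the same shape as the sample, including the trailing comma.
raw = '''
[
    [
        {
            "ATC": null,
            "PharmaceuticalForm": "Granules",
        }
    ]
]
'''

try:
    json.loads(raw)
    parsed_ok = True
except json.JSONDecodeError:
    # The comma before the closing "}" is not allowed in strict JSON.
    parsed_ok = False

# Naive cleanup (assumption: no string value contains ",}" or ",]").
cleaned = re.sub(r",\s*([}\]])", r"\1", raw)
data = json.loads(cleaned)
```

If the file can be fixed where it is generated, that is safer than patching the text afterwards.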

artemonsh · Accepted Answer · 2022-06-23 09:47:44Z

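First, the file as posted is not valid JSON: each object ends with a trailing comma (for example after "PharmaceuticalForm": "Granules"), which a strict parser rejects. The unquoted null values are fine; they are valid JSON and load as None in Python. Second, the structure of your JSON is not suitable for creating a DataFrame directly. The structure is the following: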

[
  [
   { "ScrapingOriginIdentifier": "...", ...},
   { "ScrapingOriginIdentifier": "...", ...}
  ],
  [
   { "ScrapingOriginIdentifier": "...", ...}
  ]
]

While a DataFrame needs a flat list of records, like this:

[
  { "ScrapingOriginIdentifier": "...", ...},
  { "ScrapingOriginIdentifier": "...", ...},
  { "ScrapingOriginIdentifier": "...", ...}
]

Please consider restructuring your list this way:

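import json
import pandas as pd

# Load the nested list of lists (the trailing commas must be removed from the file first)
with open('data.json', encoding='utf-8') as f:
    nested = json.load(f)

# Flatten the inner lists into one list of records, one dict per product
records = [item for group in nested for item in group]
df = pd.DataFrame(records)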
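
As a self-contained sketch of the same flattening idea, the example below uses itertools.chain.from_iterable on an in-memory list and then expands the nested "OtherFields" entries with pd.json_normalize. The `nested` literal is a trimmed copy of the sample from the question (only a few keys kept), and the "Field." prefix and the choice of meta columns are assumptions for illustration, not part of the original answer.

```python
from itertools import chain

import pandas as pd

# Trimmed copy of the question's sample: a list of groups, each a list of records.
nested = [
    [
        {"ScrapingOriginIdentifier": "N",
         "Package": "1 tube de 4 g de granules",
         "OtherFields": [
             {"Name": None, "Value": "CIS: 6 499 638 6", "Type": "string"},
             {"Name": None, "Value": "MA Holder since: 06/10/2021", "Type": "string"}]},
        {"ScrapingOriginIdentifier": "N",
         "Package": "1 tube de 20 g de pommade",
         "OtherFields": [
             {"Name": None, "Value": "CIS: 6 499 638 6", "Type": "string"},
             {"Name": None, "Value": "MA Holder since: 06/10/2021", "Type": "string"}]},
    ],
    [
        {"ScrapingOriginIdentifier": "34009 341 687 6 5",
         "Package": "1 tube(s) aluminium verni de 80 g avec applicateur polystyrène",
         "OtherFields": [
             {"Name": None, "Value": "CIS: 6 858 620 3", "Type": "string"},
             {"Name": None, "Value": "Codes: 34009 341 687 6 5 or 341 687-6", "Type": "string"}]},
    ],
]

# Flatten the outer grouping: one dict per product.
records = list(chain.from_iterable(nested))
df = pd.DataFrame(records)

# One row per "OtherFields" entry, carrying selected product columns along.
# record_prefix keeps the nested "Name" from clashing with product-level columns.
fields = pd.json_normalize(
    records,
    record_path="OtherFields",
    meta=["ScrapingOriginIdentifier", "Package"],
    record_prefix="Field.",
)
```

Here `df` keeps one row per product with "OtherFields" still stored as a list, while `fields` is the long form with one row per field entry. Which of the two is more useful depends on how the "OtherFields" values are used later.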

1 Comment

Pedro Domingues Over a year ago

Yeah the biggest problem was the structure of the json, I managed to manipulate it from the origin. Thank you!
