I'm trying to write a Python script that merges two JSON files, for example:

First file: students.json

{"John Smith":{"age":16, "id": 1}, ...., "Paul abercom":{"age":18, "id": 764}}

Second file: teacher.json

{"Agathe Magesti":{"age":36, "id": 765}, ...., "Tom Ranliver":{"age":54, "id": 801}}

As a first step, so as not to lose any information, I modify the files to add the status of each person, like this:

{"John Smith":{"age":16, "id": 1, "status":"student"}, ...., "Paul abercom":{"age":18, "id": 764, "status":"student"}}

{"Agathe Magesti":{"age":36, "id": 765, "status":"teacher"}, ...., "Tom Ranliver":{"age":54, "id": 801, "status":"teacher"}}

To do that, I wrote the following code:

import json
import pandas as pd

# Load each file, add a "status" row (one value per person), and save the result
type_student = pd.read_json('students.json')
type_student.loc["status"] = "student"
type_student.to_json("testStudent.json")
type_teacher = pd.read_json('teacher.json')
type_teacher.loc["status"] = "teacher"
type_teacher.to_json("testTeacher.json")

# Read the tagged files back in as plain dictionaries
with open("testStudent.json") as data_file:
    data_student = json.load(data_file)
with open("testTeacher.json") as data_file:
    data_teacher = json.load(data_file)

What I want to do is merge data_student and data_teacher and write the resulting JSON to a file, but I can only use the standard library, pandas, numpy and scipy.

After some tests I realized that some teachers are also students, which can be a problem for the merge.

    You don't need pandas to mess with JSON data Commented Feb 6, 2016 at 8:01
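
To illustrate the comment above, here is a minimal sketch of adding the status with only the json module. The helper name tag_status is hypothetical, and the inline strings stand in for the contents of the two files, which would normally be read with json.load.

```python
import json

def tag_status(people, status):
    """Return a copy of `people` with a "status" field added to every record."""
    return {name: {**record, "status": status} for name, record in people.items()}

# Inline sample data standing in for the contents of the two files (assumption)
students = json.loads('{"John Smith": {"age": 16, "id": 1}}')
teachers = json.loads('{"Agathe Magesti": {"age": 36, "id": 765}}')

tagged_students = tag_status(students, "student")
tagged_teachers = tag_status(teachers, "teacher")

print(json.dumps(tagged_students))
# {"John Smith": {"age": 16, "id": 1, "status": "student"}}
```

Because the helper builds new dictionaries, the loaded data is left unchanged and the tagged result can be passed straight to json.dump.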

2 Answers


It looks like your JSON files contain objects as top-level structures. These map to Python dictionaries, so this should be easy using just Python: update the first dictionary with the second.

import json

with open("mel1.json") as fo:
    data1 = json.load(fo)

with open("mel2.json") as fo:
    data2 = json.load(fo)

data1.update(data2)

with open("melout.json", "w") as fo:
    json.dump(data1, fo)

2 Comments

What will happen to the duplicate data? Will their status become something like "status": ["teacher","student"]?
@mel it will be replaced. I presume you already took care of the duplicates when you edited it. If not, just a little more code is needed to loop over one and update the value on a duplicate.
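
The loop mentioned in the comment above could look roughly like the sketch below. It is only one possible approach and rests on assumptions not stated in the thread: duplicates are matched by name (the dictionary key), the other fields are kept from the first dictionary, and the status of a duplicate becomes a list. The function name merge_people and the person "Sam Doe" are hypothetical.

```python
import json

def merge_people(first, second):
    """Merge two name -> record dicts; on a duplicate name, collect both statuses."""
    # Copy the records so the inputs are not modified
    merged = {name: dict(record) for name, record in first.items()}
    for name, record in second.items():
        if name not in merged:
            merged[name] = dict(record)
            continue
        existing = merged[name]
        # Normalise the existing status to a list, then append the new one once
        current = existing["status"]
        statuses = list(current) if isinstance(current, list) else [current]
        if record["status"] not in statuses:
            statuses.append(record["status"])
        existing["status"] = statuses
    return merged

# Hypothetical sample data; "Sam Doe" appears in both dictionaries
data_student = {
    "John Smith": {"age": 16, "id": 1, "status": "student"},
    "Sam Doe": {"age": 25, "id": 2, "status": "student"},
}
data_teacher = {
    "Sam Doe": {"age": 25, "id": 2, "status": "teacher"},
}

merged = merge_people(data_student, data_teacher)
print(json.dumps(merged["Sam Doe"]))
# {"age": 25, "id": 2, "status": ["student", "teacher"]}
```

Passing merged to json.dump with an open file object would write the result out, as in the answer's code. If the same person can have different ids or ages in the two files, the rule for which value wins needs to be decided separately.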

You should concatenate the two data frames before converting to JSON:

pd.concat([data_teacher, data_student], axis=1).to_json()

2 Comments

I can't edit your post because the change is fewer than 6 characters, but can you change type_pd.concat to pd.concat? I get the following error: ValueError: DataFrame columns must be unique for orient='columns'. I think it comes from the fact that some teachers are also students, and when I load my JSON into the DataFrame, pandas swaps the index and the columns, I guess because it's nested JSON.
It's difficult to know what is causing the problem without seeing the actual data. The proposed solution works with type_student and type_teacher instead of data_student and data_teacher. I suggest you take a look at the documentation for concat.
