2

I need to read some JSON data for processing. I have a single-line file that contains multiple JSON objects. How can I parse it?

I want the output to be a file with a single line per object.

I have tried a brute-force method that calls json.loads repeatedly to check whether each candidate substring is valid JSON, but I'm getting different results every time I run the program:

import json

with open('sample.json') as inp:
    s = inp.read()

jsons = []

start, end = s.find('{'), s.find('}')
while True:
    try:
        jsons.append(json.loads(s[start:end + 1]))
        print(jsons)
    except ValueError:
        end = end + 1 + s[end + 1:].find('}')
    else:
        s = s[end + 1:]
        if not s:
            break
        start, end = s.find('{'), s.find('}')

for x in jsons:
    writeToFilee(x)

The JSON format can be seen here: https://pastebin.com/DgbyjAG9

9
  • Paste a sample of your file along with how you'd like to have the output. Commented Apr 9, 2019 at 12:49
  • You want to replace the taxi_group_id with what? Commented Apr 9, 2019 at 12:50
  • I want to split the single-line file containing multiple objects into a multi-line file with one object per line. Commented Apr 9, 2019 at 12:53
  • @Jessica are these objects delimited somehow? Or is it just like {...}{...}? I found only 1 occurrence of "}\s*{" regex in the paste you provided, am I right to assume this file contains 2 different JSON objects, or are there more? Commented Apr 9, 2019 at 13:00
  • How about jsons = s.replace('}{', '}|{').split('|') to create a list of JSON strings? Commented Apr 9, 2019 at 13:07

3 Answers

4

why not just use the pos attribute of the JSONDecodeError to tell you where to delimit things?

something like:

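import json

def json_load_all(buf):
    # yield each JSON value from a string of concatenated values
    while True:
        try:
            # succeeds only when buf holds a single remaining value
            yield json.loads(buf)
        except json.JSONDecodeError as err:
            # on "Extra data", err.pos is where the next value starts
            yield json.loads(buf[:err.pos])
            buf = buf[err.pos:]
        else:
            break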

works with your demo data as:

with open('data.json') as fd:
    arr = list(json_load_all(fd.read()))

gives me exactly two elements, but I presume you have more?

to complete this using the standard library, writing out would look something like:

with open('data.json') as inp, open('out.json', 'w') as out:
    for obj in json_load_all(inp.read()):
        json.dump(obj, out)
        print(file=out)

otherwise the jsonlines package is good for dealing with this data format
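
A variation on the same idea, as a rough sketch: the standard library's json.JSONDecoder.raw_decode returns the parsed value together with the index where it stopped, so each object is parsed once instead of twice. The name iter_json_objects is made up for this illustration, and the sketch assumes the whole file fits in memory as one string.

```python
import json

def iter_json_objects(buf):
    """Yield each JSON value found back to back in the string buf."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(buf)
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand
        while pos < end and buf[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(buf, pos)
        yield obj
```

It can be dropped into the write-out loop above in place of json_load_all. Malformed input still raises json.JSONDecodeError, which is probably what you want.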


1

The code below worked for me:

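import json

# input_file_path: path to the single-line file of concatenated objects
with open(input_file_path) as f_in:
    file_data = f_in.read()
    # turn '{...}{...}' into '[{...},{...}]' so it parses as one JSON array
    file_data = file_data.replace("}{", "},{")
    file_data = "[" + file_data + "]"
    data = json.loads(file_data)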


0

Following @Chris A's comment, I've prepared this snippet which should work just fine:

import json
import re

with open('my_jsons.file') as file:
    json_string = file.read()

# insert a separator between adjacent objects, then split on it
json_objects = re.sub(r'}\s*{', '}|!|{', json_string).split('|!|')
# replace |!| with whatever suits you best

for json_object in json_objects:
    print(json.loads(json_object))

This approach, however, breaks as soon as the string '}{' appears inside a value in your JSON, so I strongly recommend using @Sam Mason's solution instead.
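
To make that failure mode concrete, here is a small sketch with a made-up input string (sample is hypothetical, not taken from the pastebin data). The first object's value contains '}{', so the split cuts it in the wrong place, while a real JSON parser still finds the true end of the first object.

```python
import json
import re

# Hypothetical input: the first object's string value itself contains "}{".
sample = '{"a": "x}{y"}{"b": 2}'

# The regex-and-split approach also cuts inside the string value.
pieces = re.sub(r'}\s*{', '}|!|{', sample).split('|!|')

def all_valid(chunks):
    """Return True only if every chunk parses as JSON."""
    try:
        for chunk in chunks:
            json.loads(chunk)
    except json.JSONDecodeError:
        return False
    return True

split_ok = all_valid(pieces)

# A JSON parser knows it is inside a string, so it finds the real boundary.
first, end = json.JSONDecoder().raw_decode(sample)
rest = sample[end:]
```

Here pieces has three fragments instead of two and at least one of them is not valid JSON, whereas first is the intact first object and rest is the remaining text.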
