
Is there a "lenient" JSON Parser for Python?

I keep getting (handwritten) JSON files such as this:

/* This JSON file is created by someone who does not know JSON
   And not competent enough to search about "JSON Validators" */

{

  /* Hey look!
     A honkin' block comment here!
     Yeehaw */

  "key1": "value1",  // Hey look there's a standard-breaking comment here!
  "key3": .65,       // I'm too lazy to type "0"
  "key4": -.75,      // That "other" .Net program works anyways...
  "key5": [ 1 /* One */, 2 /* Two */, 3 /* Three */, 4 /* Four */],
  "key2": "value2",  // Whoopsie, forgot to delete the comma here...
}

The program that actually consumes those monstrously malformed JSON files somehow doesn't puke on those errors. That program is written in C#, by the way.

I'm writing some Python scripts that act on those JSON files, but they keep crashing (correctly) on those mistakes.

I could manually edit those .json files to be standard-compliant, but there are a LOT of them, so it's too much effort; not to mention that I would have to keep editing new incoming JSON files, urgh.

So, back to my question, is there a lenient JSON parser that can consume those malformed JSON files without dying?

Note: The related question about trailing commas concerns only the trailing comma after the last item; it does NOT handle block comments and/or inline comments.


Edit: What the... I just received a JSON file in which the creator decided to drop the leading zero from numbers between -1 and 1 (writing .65 and -.75)... -_-

And I discovered a file where comments are embedded inside an array...

I've updated the example above to reflect my additional "findings"...

  • This gist might help; or, if you just want to use a library, use jsoncomment. Commented Jun 21, 2019 at 6:34
  • The commentjson library might help Commented Jun 21, 2019 at 6:41
  • @pepoluan I am aware; I just suggested it in case you want to write it yourself instead of using a library. PS: it's not my gist, I know it because I used it in one of my past projects. Commented Jun 21, 2019 at 9:25
  • Basically, you want to parse something which does not adhere to any standard and may or may not resemble some subset of JavaScript… Good luck with that. This really needs to be fixed on the producer side. If it can't be fixed there, well… FML, or FYL I guess. Commented Jun 21, 2019 at 9:57
  • @deceze it makes me wonder, though... what kind of godforsaken .Net library dares to accept this unholy mangling of JSON as valid input?? Commented Jun 21, 2019 at 10:13

2 Answers


Okay, so @warl0ck's comment made me think that I might be better off writing my own "JSON Preprocessor" to do the heavy-duty cleanup.

So, here it is in my BitBucket Snippet, complete with a simple unit test.

I've tested it with my corpus of human-generated malformed JSON files, and it seems to work well so far...

Let me know if there's a bug in the code.

But for the time being, I'm content.


EDIT: Because BitBucket deleted all my snippets, I've re-uploaded the code to GitHub: https://gist.github.com/pepoluan/361724bfa5cce9d863dadc6e2bdcb8c9
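The actual preprocessor is in the gist linked above and is not reproduced here. As an illustration of the general approach only, below is an independent minimal sketch; the names clean_json and loads_lenient are hypothetical and do not come from that gist. It assumes the only deviations are the ones in the question's example: // and /* */ comments, trailing commas before } or ], and numbers written as .65 or -.75. It walks the text character by character so that anything inside a double-quoted string is copied untouched, then hands the cleaned text to the standard json module.

```python
import json


def clean_json(text: str) -> str:
    """Hypothetical sketch: turn 'sloppy' JSON into strict JSON.

    Assumed deviations only: // line comments, /* block */ comments,
    trailing commas before } or ], and numbers like .65 or -.75.
    Everything inside double-quoted strings is copied verbatim.
    """
    out = []  # pieces of the cleaned output, in order
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Copy a whole string literal, skipping over backslash escapes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif text.startswith("//", i):
            # Line comment: skip up to (but not including) the newline.
            j = text.find("\n", i)
            i = n if j == -1 else j
        elif text.startswith("/*", i):
            # Block comment: skip past the closing */.
            j = text.find("*/", i + 2)
            if j == -1:
                raise ValueError("unterminated block comment")
            i = j + 2
        elif ch in "}]":
            # Trailing comma: drop a ',' that is followed only by whitespace.
            k = len(out) - 1
            while k >= 0 and out[k].isspace():
                k -= 1
            if k >= 0 and out[k] == ",":
                del out[k]
            out.append(ch)
            i += 1
        elif (ch == "." and i + 1 < n and text[i + 1].isdigit()
              and not (out and out[-1][-1].isdigit())):
            # Leading decimal point (.65 or -.75): insert the missing zero.
            out.append("0.")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def loads_lenient(text: str):
    """Clean the text, then let the standard json module do the parsing."""
    return json.loads(clean_json(text))
```

Because the real parsing is still done by json.loads, any deviation the sketch does not handle (single-quoted strings, unquoted keys, a trailing decimal point like 5.) still raises json.JSONDecodeError instead of being silently misread.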


You might want to consider JSON5, a superset of JSON that allows things like comments, trailing commas, and numbers with a leading decimal point such as .65. The json5 Python package correctly parses the example in the question:

>>> import json5
>>> json5.loads("""
... {
... 
...   /* Hey look!
...      A honkin' block comment here!
...      Yeehaw */
... 
...   "key1": "value1",  // Hey look there's a standard-breaking comment here!
...   "key3": .65,       // I'm too lazy to type "0"
...   "key4": -.75,      // That "other" .Net program works anyways...
...   "key5": [ 1 /* One */, 2 /* Two */, 3 /* Three */, 4 /* Four */],
...   "key2": "value2",  // Whoopsie, forgot to delete the comma here...
... }
... """)
{'key1': 'value1', 'key3': 0.65, 'key4': -0.75, 'key5': [1, 2, 3, 4], 'key2': 'value2'}
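If only some of the incoming files are malformed, one option is to try the standard library first and fall back to json5 only when strict parsing fails. This is a sketch, assuming the third-party json5 package used above is installed whenever a non-standard file actually shows up; the function name loads_with_json5_fallback is made up for this example.

```python
import json


def loads_with_json5_fallback(text: str):
    """Hypothetical helper: strict JSON first, json5 only if that fails."""
    try:
        # Fast path: well-formed files never touch json5.
        return json.loads(text)
    except json.JSONDecodeError:
        # Assumes the third-party 'json5' package is installed; it is
        # imported lazily so strict-only environments don't need it.
        import json5
        return json5.loads(text)
```

The design choice is that well-formed files take the standard json path, while json5, which tends to be slower than the standard json module, is only imported and used for the files that need it.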

1 Comment

Oh interesting, thanks! That said, my Preprocessor works great for my purposes; but if the time comes to do more heavy lifting, I will probably use the JSON5 library. (Though if that time ever comes, I think I'll fight tooth and nail to just move over to YAML and be done with it.)
