python json.loads Unterminated string error

Question

I have been following a chat bot tutorial and am stuck. I have included the exact step that I am on as a link at the bottom of this post in case you are curious what my code looks like (I was frustrated so I copied his code word for word).

During the execution of my code, it processes just over 26,000 lines before it throws the exception. My code can be found below. As you can see, I have tried various solutions, including stripping \r and \n characters and passing strict=False, which I thought would allow unterminated strings in the JSON, but that didn't work either.

with open('C:/Python34/stuff/chatbot/{}/RC_{}'.format(timeframe.split('-')[0], timeframe), buffering=1000) as f:
    for row in f:
        row_counter += 1

        if row_counter > start_row:
            try:
                row = json.loads(row.replace('\n','').replace('\r',''), strict=False)

                # ... rest of the per-row processing (elided) ...

            except Exception as e:
                print("RUH ROH " + str(e))

and the exact error message is below:

RUH ROH Unterminated string starting at: line 1 column 368 (char 367)

link: https://pythonprogramming.net/building-database-chatbot-deep-learning-python-tensorflow/

EDIT:

Getting rid of the try/except gave me a little more information when the error is thrown:

Traceback (most recent call last):
  File "C:/Python34/stuff/chatbot/chatbot_db2.py", line 103, in <module>
    row = json.loads(row.replace('\n','').replace('\r',''), strict=False)
  File "C:\Python34\lib\json\__init__.py", line 331, in loads
    return cls(**kw).decode(s)
  File "C:\Python34\lib\json\decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python34\lib\json\decoder.py", line 359, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 368 (char 367)

EDIT2:

A commenter suggested I print out the line that raised the exception, and it did shed some light. The failing row looks like this:

{"subreddit":"sydney","author_flair_text":null,"id":"cqugtij","gilded":0,"removal_reason":null,"downs":0,"archived":false,"created_utc":"1430439358","link_id":"t3_34e5fd","ups":6,"subreddit_id":"t5_2qkob","name":"t1_cqugtij","score_hidden":false,"author_flair_css_class":null,"parent_id":"t1_cqttsc3","controversiality":0,"score":6,"author":"SilverMeteor9798","body":"As state transport minister almost every press release from Gladys had something in there about how the liberals were \"getting on with the job\" and blaming Labor for something. It wasn't necessarily false, it just got tiresome after a while particular

while a successful row will look like this:

{"created_utc":"1430438400","ups":4,"subreddit_id":"t5_378oi","link_id":"t3_34di91","name":"t1_cqug90g","score_hidden":false,"author_flair_css_class":null,"author_flair_text":null,"subreddit":"soccer_jp","id":"cqug90g","removal_reason":null,"gilded":0,"downs":0,"archived":false,"author":"rx109","score":4,"retrieved_on":1432703079,"body":"\u304f\u305d\n\u8aad\u307f\u305f\u3044\u304c\u8cb7\u3063\u305f\u3089\u8ca0\u3051\u306a\u6c17\u304c\u3059\u308b\n\u56f3\u66f8\u9928\u306b\u51fa\u306d\u30fc\u304b\u306a","distinguished":null,"edited":false,"controversiality":0,"parent_id":"t3_34di91"}

I am honestly more confused now, but it does look like every successful object ends in "}. So either the failing row isn't terminated, or it contains a character that can't be parsed?
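The two possibilities produce different errors, so they can be told apart. A minimal sketch, assuming made-up sample strings rather than rows from the actual dump: strict=False only permits raw control characters inside strings, and it does nothing for a string that is cut off before its closing quote.

```python
import json

# Hypothetical sample rows for illustration, not taken from the real file.
truncated = '{"author": "someone", "body": "cut off mid-sent'
control_char = '{"body": "line one\nline two"}'  # raw newline inside a string


def try_parse(text, **kwargs):
    """Return the parsed object, or the error message if parsing fails."""
    try:
        return json.loads(text, **kwargs)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return str(exc)


# A raw control character is rejected by default but accepted with strict=False.
print(try_parse(control_char))
print(try_parse(control_char, strict=False))

# A truncated string fails either way.
print(try_parse(truncated, strict=False))
```

If the message says "Unterminated string", the data itself is incomplete and no amount of character replacement will fix it.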

EDIT3 - SOLVED

I assumed that the file was complete, but I guess there was an error downloading it and the file was cut off with an incomplete JSON Object as the last entry. So just deleting that entry solved the issue.

Thanks to everyone for the help

How about except ... print(row.replace('\n','').replace('\r',''))? That should give an idea of what's throwing you off.
– Brad Solomon, Commented Mar 14, 2018 at 17:30
The JSON doc is 20000 lines long? Well, you obviously don't want to post it here. If you can strip it down to something small enough that produces the same error, that would be great, but there's a good chance you can't. So link to it in the repo or wherever it comes from, or at least tell us which generated pathname had the error. Also: if you can try a standalone json.load directly on that file (either in the REPL, or in a one-liner script) and verify that you get the same error, that would help.
– abarnert, Commented Mar 14, 2018 at 17:36
Ah, a truncated file is a lot simpler. So now you get to figure out how to write the error handling so next time you get an incomplete file it won't be as much pain to debug. :)
– abarnert, Commented Mar 14, 2018 at 17:48
In answer to your earlier question, it is very common, in cases where data could get jumbled, to throw out bad lines of data. You could just put an exception handler around the entire line. Often when people do that in production code, they log the bad line of data so a human can look at it and decide whether there is a bug or just bad data.
– Michael Robellard, Commented Mar 14, 2018 at 17:50
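The approach described in these comments could be sketched roughly as follows. This is only an illustration under stated assumptions: parse_json_lines is a hypothetical helper name, the input is assumed to hold one JSON object per line, and the logger setup is left to the caller.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_lines(lines):
    """Yield (line_number, obj) for each line that parses; log and skip the rest."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            obj = json.loads(line)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            # Log the position, the parser message, and the start of the bad
            # line so a human can decide whether it is a bug or just bad data.
            logger.warning("Skipping bad JSON on line %d: %s | %r",
                           line_number, exc, line[:200])
            continue
        yield line_number, obj
```

Because an open file object iterates line by line, it can be passed straight in, for example `for n, row in parse_json_lines(f): ...`. A truncated final line then shows up as a single logged warning with its line number.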

unlockme · Accepted Answer · 2019-03-22 03:09:06Z

5

I discovered the good guys at Luminoso have written a library to sort out this kind of issue.

Apparently, sometimes you have to deal with text that comes out of other code, where the text has often passed through several different pieces of software, each with their own quirks, probably with Microsoft Office somewhere in the chain (see this blog post).

This is where ftfy comes to the rescue.

from ftfy import fix_text
import json
# text = some text source with a potential unicode problem
fixed_text = fix_text(text)
data = json.loads(fixed_text)


cjnash · Accepted Answer · 2018-03-14 18:14:42Z

As I explained in EDIT2, I printed out the line that was giving me trouble, and saw that it did not end in a }, which every JSON Object should. I then went into the file, and checked the exact line that was giving me trouble by using a simple search, and I found that the line was not only truncated, but it was also the last line of my file as well.

There was definitely an error when I was either downloading or extracting this file, and it cut the file short. That is what raised the error, and why none of the suggested fixes worked.

To anyone who is having this error and .replace() solutions are not working: try to look through your data and make sure that there is in fact something there to replace or edit. In my case there was a truncating error during the download or extraction which made such solutions impossible.
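A quick way to check for this kind of truncation before processing a large file is to test only its last line. The following is a rough sketch, not a definitive check: last_line_is_valid_json and tail_bytes are hypothetical names, and it assumes one JSON object per line with the final line shorter than tail_bytes.

```python
import json
import os


def last_line_is_valid_json(path, tail_bytes=1 << 20):
    """Return True if the last non-empty line of the file parses as JSON.

    Only the final tail_bytes of the file are read, so a last line longer
    than that would be reported as invalid.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        tail = f.read()

    lines = [ln for ln in tail.splitlines() if ln.strip()]
    if not lines:
        return False  # empty file: nothing to validate
    try:
        json.loads(lines[-1].decode("utf-8"))
    except ValueError:  # covers JSON errors and bad UTF-8 alike
        return False
    return True
```

If this returns False for a freshly downloaded dump, re-downloading or re-extracting the file is probably a better fix than editing the data by hand.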

Big thanks to abarnert, Michael Robellard and Anton Kachurin
