I have been following a chatbot tutorial and am stuck. I have included a link to the exact step I am on further down in this post, in case you are curious what my code looks like (I was frustrated, so I copied the tutorial author's code word for word).
My code processes just over 26,000 lines before it throws the exception. The code can be found below. As you can see, I have tried various fixes, including stripping the \r and \n characters and passing strict=False, which I hoped would let an unterminated string through the JSON parser, but that didn't work either.
    with open('C:/Python34/stuff/chatbot/{}/RC_{}'.format(timeframe.split('-')[0], timeframe), buffering=1000) as f:
        for row in f:
            row_counter += 1
            if row_counter > start_row:
                try:
                    row = json.loads(row.replace('\n','').replace('\r',''), strict=False)
                    # ---------blah blah blah blah------------
                except Exception as e:
                    print("RUH ROH " + str(e))
The exact error message is:
RUH ROH Unterminated string starting at: line 1 column 368 (char 367)
link: https://pythonprogramming.net/building-database-chatbot-deep-learning-python-tensorflow/
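For reference, here is a minimal sketch of what strict=False does and does not change. As far as the standard json module goes, it only permits raw control characters (such as a literal newline) inside a string; it does not make the parser accept a string that never closes. The helper name try_parse and both sample strings are made up for illustration.

```python
import json

# A JSON string containing a raw newline (a control character) inside a value.
with_control_char = '{"body": "line one\nline two"}'
# A record cut off in the middle of a string value (hypothetical sample).
truncated = '{"body": "it just got tiresome after a while particular'

def try_parse(text, strict):
    """Return the parsed object, or the error message if parsing fails."""
    try:
        return json.loads(text, strict=strict)
    except ValueError as e:  # json.JSONDecodeError subclasses ValueError
        return "error: " + str(e)

print(try_parse(with_control_char, strict=True))   # rejected: control character
print(try_parse(with_control_char, strict=False))  # accepted
print(try_parse(truncated, strict=False))          # still rejected: unterminated string
```

So if the error says "Unterminated string", strict=False is unlikely to be the fix; the data itself is probably incomplete.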
EDIT:
Removing the try/except gave me a little more information about where the error is thrown:
    Traceback (most recent call last):
      File "C:/Python34/stuff/chatbot/chatbot_db2.py", line 103, in <module>
        row = json.loads(row.replace('\n','').replace('\r',''), strict=False)
      File "C:\Python34\lib\json\__init__.py", line 331, in loads
        return cls(**kw).decode(s)
      File "C:\Python34\lib\json\decoder.py", line 343, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "C:\Python34\lib\json\decoder.py", line 359, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Unterminated string starting at: line 1 column 368 (char 367)
EDIT2:
A commenter suggested that I print the line on which the exception is thrown, and that did shed some light. The failing row looks like this:
{"subreddit":"sydney","author_flair_text":null,"id":"cqugtij","gilded":0,"removal_reason":null,"downs":0,"archived":false,"created_utc":"1430439358","link_id":"t3_34e5fd","ups":6,"subreddit_id":"t5_2qkob","name":"t1_cqugtij","score_hidden":false,"author_flair_css_class":null,"parent_id":"t1_cqttsc3","controversiality":0,"score":6,"author":"SilverMeteor9798","body":"As state transport minister almost every press release from Gladys had something in there about how the liberals were \"getting on with the job\" and blaming Labor for something. It wasn't necessarily false, it just got tiresome after a while particular
while a successful row will look like this:
{"created_utc":"1430438400","ups":4,"subreddit_id":"t5_378oi","link_id":"t3_34di91","name":"t1_cqug90g","score_hidden":false,"author_flair_css_class":null,"author_flair_text":null,"subreddit":"soccer_jp","id":"cqug90g","removal_reason":null,"gilded":0,"downs":0,"archived":false,"author":"rx109","score":4,"retrieved_on":1432703079,"body":"\u304f\u305d\n\u8aad\u307f\u305f\u3044\u304c\u8cb7\u3063\u305f\u3089\u8ca0\u3051\u306a\u6c17\u304c\u3059\u308b\n\u56f3\u66f8\u9928\u306b\u51fa\u306d\u30fc\u304b\u306a","distinguished":null,"edited":false,"controversiality":0,"parent_id":"t3_34di91"}
I am honestly more confused now but it does look like it ends in a "} for all of the objects. So either it isn't ending, or there is a character that can't be parsed?
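One way to tell those two cases apart is to report the line number and the last few characters of every row that fails to parse. The sketch below assumes one JSON object per line; the function name find_bad_rows and the sample rows are hypothetical. A tail that does not end in } points to a cut-off record rather than an unparseable character.

```python
import json

def find_bad_rows(lines):
    """Yield (line_number, error_message, tail) for each row that fails to parse."""
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            json.loads(text)
        except ValueError as e:
            # Keep only the end of the row; that is where truncation shows up.
            yield line_number, str(e), text[-40:]

# Hypothetical sample: one complete row and one row cut off mid-string.
sample = [
    '{"id": "cqug90g", "body": "ok"}\n',
    '{"id": "cqugtij", "body": "it just got tiresome after a while particular',
]

for line_number, message, tail in find_bad_rows(sample):
    print(line_number, message, repr(tail))
```

With a real dump, the open file object can be passed in place of sample, since iterating over a file yields its lines.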
EDIT3 - SOLVED
I had assumed that the file was complete, but apparently something went wrong during the download and the file was cut off, leaving an incomplete JSON object as the last entry. Deleting that entry solved the issue.
Thanks to everyone for the help
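Instead of editing the file by hand, the reader could tolerate a broken final record. This is only a sketch, assuming one JSON object per line and that only the very last line can be truncated; the generator name iter_json_lines is made up, and io.StringIO stands in for the real file here.

```python
import io
import json

def iter_json_lines(f):
    """Yield parsed objects from a file of JSON lines.

    Errors on any line except the last one propagate as usual; a final line
    that fails to parse is reported and skipped as a likely truncated download.
    """
    previous = None
    for line in f:
        if previous is not None and previous.strip():
            yield json.loads(previous)  # not the last line: let errors surface
        previous = line
    if previous is not None and previous.strip():
        try:
            yield json.loads(previous)
        except ValueError as e:
            print("Skipping truncated final record: " + str(e))

# Hypothetical data: two complete records and one cut off mid-string.
data = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3, "body": "cut off he')
rows = list(iter_json_lines(data))
print(rows)
```

Holding back one line at a time is what lets the loop know which line is the last without reading the whole file into memory.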
except ... print(row.replace('\n','').replace('\r',''))? That should give an idea of what's throwing you off.
json.load directly on that file (either in the REPL, or in a one-liner script) and verify that you get the same error, that would help.