I want to load JSON data mined from the Twitter API into Python. Here is a sample of the JSON objects:

{"created_at":"Mon Apr 22 18:17:09 +0000 2019","id":1120391103813910529,"id_str":"1120391103813910529","text":"On peut dire que la base de cette 8e saison est en place \ud83d\ude4c #GOTS8E2","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":243071138,"id_str":"243071138","name":"Mr B","screen_name":"skeyos","location":"Namur","url":null,"description":null,"translator_type":"none","protected":false,"verified":false,"followers_count":197,"friends_count":1811,"listed_count":6,"favourites_count":7826,"statuses_count":8044,"created_at":"Wed Jan 26 06:49:05 +0000 2011","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/493833348167770112\/aGLGemZ5_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/493833348167770112\/aGLGemZ5_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/243071138\/1406574068","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"GOTS8E2","indices":[59,67
]}],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"fr","timestamp_ms":"1555957029666"}

{"created_at":"Mon Apr 22 18:17:14 +0000 2019","id":1120391124722565123,"id_str":"1120391124722565123","text":"...

I am trying the following code:

with open('tweets.json') as tweet_data:
    json_data = json.load(tweet_data)

But get the following error:

JSONDecodeError: Extra data: line 3 column 1 (char 2149)

Unfortunately, I can't edit the JSON file much by hand because it is really big. I need to figure out how to read it into Python. Any help would be greatly appreciated!

Edit: It works with the following code:

dat = []
with open('data_tweets_E2.json', 'r') as f:
    for line in f:
        if not line.strip():  # skip empty lines
            continue

        json_data = json.loads(line)
        dat.append(json_data)
  • Every line contains a new object, so try parsing them line by line. Also, use loads if you're parsing from a string. Commented Apr 30, 2019 at 23:13
  • Please try my code below; it also covers the DataFrame. The error you got comes from the syntax of your JSON file; the code itself is correct. Try it with a single JSON object and check the file for mistakes. Commented Apr 30, 2019 at 23:45

3 Answers


Here is the code. You need to install pandas first, of course. If the solution helped you, please mark this answer with the green check.

import json
import pandas as pd

with open('tweets.json') as json_file:
    data_list = json.load(json_file)

tweet_data_frame = pd.DataFrame.from_dict(data_list)
print(tweet_data_frame)
print(data_list)

As you can see, print(data_list) prints a list and print(tweet_data_frame) prints a DataFrame.

If you want to see the types of these variables, use type(), for example print(type(data_list)).

Important: json.load expects the file to contain exactly one JSON value. If you have several JSON objects, they need to be inside an array, e.g. [{"example":"value"},{"example":"value"}]. Your file has one object after another, so it is not a single valid JSON document and json.load fails on it. Try the code with a JSON file in the array format.
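
To make the difference concrete, here is a minimal sketch using only the standard json module. The two strings are made-up sample data, not taken from the asker's file: the first holds two objects back to back, the second holds the same objects wrapped in an array.

```python
import json

# Two objects back to back, like a file with one tweet per line (made-up data).
concatenated = '{"example": "value"}\n{"example": "value"}'
# The same two objects wrapped in a single JSON array.
as_array = '[{"example": "value"}, {"example": "value"}]'

try:
    json.loads(concatenated)
    error_message = None
except json.JSONDecodeError as exc:
    # The parser finishes the first object and then finds more text after it.
    error_message = exc.msg

parsed = json.loads(as_array)

print(error_message)
print(len(parsed))
```

The first call should fail with the same kind of "Extra data" error as in the question, while the array version parses into a list of two dictionaries.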


4 Comments

Hi, the only issue is that I now have a list instead of a dictionary (see the edit in the question). Is there a way of doing this with a list? Or, if I could append the data into a dictionary in the first place, that would work too, but I do not know how.
I tried your code, but ran into the same JSONDecodeError. That is why I'm appending all the data into a list. If there is a way of loading all the data into a dictionary instead of a list, that works for me too.
@ArtTatum Okay, so I edited my code. Try it with a different JSON file and you will find that your file is damaged. Hopefully it helps. Please let me know if this is what you are looking for. :)
Thanks, I see what you meant. Fortunately I don't need those columns with the bad data.

Every line contains a new object, so try parsing them line by line.

import json

with open('tweets.json', 'r') as f:
    for line in f:
        if not line.strip():  # skip empty lines
            continue

        json_data = json.loads(line)
        print(json_data)
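
Line-by-line parsing assumes every object sits on exactly one line. The first sample tweet in the question appears to wrap in the middle of its hashtag indices, which may just be a copy-and-paste artifact, but if objects in the real file do span lines, json.loads(line) would fail. A sketch of an alternative, assuming the text fits in memory: the standard library's json.JSONDecoder.raw_decode parses one value starting at a given position and returns where it stopped. The helper name iter_json_objects and the sample text are hypothetical.

```python
import json

def iter_json_objects(text):
    """Yield each JSON value from text holding values separated by whitespace."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not accept leading whitespace, so skip it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Made-up sample: the first object is split across two lines.
sample_text = '{"id": 1, "indices": [59, 67\n]}\n\n{"id": 2}\n'
objects = list(iter_json_objects(sample_text))
print(len(objects))
```

For a very large file this reads everything at once, so the line-by-line loop above is preferable whenever each object really does sit on its own line.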

2 Comments

Thanks, it has loaded. However, when I type json_data, only the last tweet shows up. How do I store all the tweets in one object? Also, if I could convert the JSON into a DataFrame, that would be great!
On each iteration, json_data holds one object. Append each one to a list (or similar) if you want access to all of them.

Each line contains a separate JSON object; parse them and store them in a list:

import json

with open('tweets.json', 'r') as tweet_data:
    # Keep only non-empty lines; each one holds a single JSON object.
    values = [json.loads(line) for line in tweet_data
              if line.strip()]
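
Since the comments above also ask for a dictionary rather than a list, here is a small sketch of both, assuming one JSON object per non-empty line. The helper name load_json_lines is hypothetical, and io.StringIO stands in for the real file so the snippet runs on its own; the two tweets are made-up sample data. Keying by id_str is just one reasonable choice, since that field appears in the sample tweet.

```python
import io
import json

def load_json_lines(lines):
    """Parse one JSON object per non-empty line and return them as a list."""
    return [json.loads(line) for line in lines if line.strip()]

# Stand-in for open('tweets.json'); made-up sample data with a blank line.
fake_file = io.StringIO(
    '{"id_str": "1", "text": "first"}\n'
    '\n'
    '{"id_str": "2", "text": "second"}\n'
)

tweets = load_json_lines(fake_file)

# Index the tweets by their id_str so each can be looked up directly.
tweets_by_id = {tweet["id_str"]: tweet for tweet in tweets}

print(len(tweets))
print(tweets_by_id["2"]["text"])
```

With the real file, passing the open file object to load_json_lines should behave the same way. If pandas is installed, pd.DataFrame(tweets) should turn the list of dictionaries into a DataFrame, and pd.read_json with lines=True is another option for this one-object-per-line layout.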
