Cannot insert extracted json into dataframe column

Question

I have a DataFrame in which one column holds JSON strings. I want to replace each string with a single value extracted from it (here, uid) and get rid of the rest. I have gotten the desired values into a separate object, but I can't figure out how to put them back into the DataFrame in place of the existing column:

import json
import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"}',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0},  "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Bob"}',        
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Carol"}'        
    ]},
    index = ['Alice', 'Alice', 'Bob', 'Carol']
)

lst = []
for d in df['data']:
    d = pd.read_json(d, lines=True)['uid'].values[0]
    lst.append(d)
s = pd.DataFrame(lst)
df['data'] = s
print(s)
print(df)

returns

     0
0  100
1  100
2  200
3  300
       bank_account  data
Alice           101   NaN
Alice           102   NaN
Bob             201   NaN
Carol           301   NaN

I don't know why the data column shows all NaN values. Any help is appreciated.

Updated issue: Some of the rows hold a JSON array of several objects instead of a single JSON object. Here is what I have so far:

import json
import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '[{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"},{"uid": 150, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "jer"}]',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0},  "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Bob"}',        
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Carol"}'        
    ]},
    index = ['Alice', 'Alice', 'Bob', 'Carol']
)

# df["data"] = df["data"].apply(lambda x: pd.read_json(x, lines=True)["uid"][0])

df["data"] = df["data"].apply(lambda array : (",".join(list(map(lambda x : pd.read_json(x, lines=True)["uid"][0], array),(df['data'])))))
print(df)

rachwa · Accepted Answer · 2022-03-03 23:10:33Z

1

This works for me:

import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"}',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0},  "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Bob"}',        
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Carol"}'        
    ]},
    index = ['Alice', 'Alice', 'Bob', 'Carol']
)

df["data"] = df["data"].apply(lambda x: pd.read_json(x, lines=True)["uid"][0])

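Your code does not work because df and s have different indices: df is indexed by names ('Alice', 'Alice', 'Bob', 'Carol'), while s has the default integer index 0 to 3. Column assignment aligns on index labels, and since none of them match, every row becomes NaN. To fix your code, set df['data'] = s[0].values (instead of df['data'] = s) before your two print statements; .values passes a plain array, which is assigned by position rather than by label.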
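The index mismatch can be reproduced in isolation. The following is a minimal sketch, not the asker's exact code: it uses a Series instead of a one-column DataFrame, and the names aligned and by_position are only illustrative.

```python
import pandas as pd

# Frame indexed by names, as in the question.
df = pd.DataFrame(
    {'bank_account': [101, 102, 201, 301]},
    index=['Alice', 'Alice', 'Bob', 'Carol'],
)

# Extracted values carrying the default integer index 0..3.
s = pd.Series([100, 100, 200, 300])

# Assigning a Series aligns on index labels; no label matches, so every row is NaN.
aligned = df.copy()
aligned['data'] = s

# Assigning a plain array skips alignment and fills the rows by position.
by_position = df.copy()
by_position['data'] = s.to_numpy()

print(aligned)
print(by_position)
```

Passing s.to_numpy() (or s.values, as above) works here only because the values are already in the same row order as df.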

edited Mar 3, 2022 at 23:10

answered Mar 3, 2022 at 23:02

rachwa


1 Comment

Justin Benfit Over a year ago

Thanks! Using apply looks much cleaner than a loop.

user7864386 · Accepted Answer · 2022-03-03 23:34:00Z

1

As @rachwa notes, the issue is that the indexes don't match: the index of s is integers while the index of df is names. If you assign lst directly instead of wrapping it in a DataFrame, you will get the desired outcome, i.e.

df['data'] = lst

would work as expected.

You could also use json.loads instead of read_json (it should be faster):

import json
df['data'] = [json.loads(d)['uid'] for d in df['data']]

Output:

       bank_account  data
Alice           101   100
Alice           102   100
Bob             201   200
Carol           301   300

answered Mar 3, 2022 at 23:34

user7864386

3 Comments

Justin Benfit Over a year ago

Hi @enke, thanks for the suggestion! I ran this on the bigger dataset (the one I posted was a test set) and realized that some of the values are JSON arrays of several objects. Any idea how to tackle that? I have updated the original post with a sample row that reproduces the issue, as well as my best stab at it so far.

user7864386 Over a year ago

@JustinBenfit maybe use a nested list comprehension like: [[json.loads(d)['uid'] for d in li] for li in df['data']]

Justin Benfit Over a year ago

I tried this (maybe I have it wrong) df["data"] = [[json.loads(d)['uid'] for d in li] for li in df['data']] and got JSONDecodeError: Expecting value: line 1 column 2 (char 1)
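A likely cause of that error: each cell is still a string, so the inner loop iterates over single characters, and json.loads('[') fails on the first one. One possible way around it, sketched below, is to parse each cell once and then branch on whether the result is a list or a single object. The helper name extract_uids and the new column name uids are assumptions made for illustration, and the sample strings are shortened versions of the ones in the question.

```python
import json
import pandas as pd

def extract_uids(raw):
    """Return the uid values from a JSON string holding one object or an array of objects."""
    parsed = json.loads(raw)
    if isinstance(parsed, list):
        # JSON array: collect the uid of every object in it.
        return [item['uid'] for item in parsed]
    # Single JSON object: wrap its uid in a list so every row has the same type.
    return [parsed['uid']]

df = pd.DataFrame(
    {
        'bank_account': [101, 102, 201, 301],
        'data': [
            '[{"uid": 100, "user_name": "Alice"}, {"uid": 150, "user_name": "jer"}]',
            '{"uid": 100, "user_name": "Alice"}',
            '{"uid": 200, "user_name": "Bob"}',
            '{"uid": 300, "user_name": "Carol"}',
        ],
    },
    index=['Alice', 'Alice', 'Bob', 'Carol'],
)

# apply keeps the frame's own index, so no alignment problem arises.
df['uids'] = df['data'].apply(extract_uids)
print(df[['bank_account', 'uids']])
```

If a comma-separated string is preferred over a list, the values would need converting first, for example ",".join(str(u) for u in uids), since str.join only accepts strings.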
