df.to_numpy returning numpy array of lists instead of uniform numpy array

Question

I'm trying to read a json file as a pandas dataframe and convert it to a numpy array:

    sample.json = [[["1", "2"], ["3", "4"]], [["7", "8"], ["9", "10"]]] 

    -------------------------------------------------------------------

    import pandas as pd

    df = pd.read_json('sample.json', dtype=float)
    data = df.to_numpy()

    print(df)
    print(data)

However, this yields a numpy array of python lists:

                0        1
        0  [1, 2]   [3, 4]
        1  [7, 8]  [9, 10]

        [[list(['1', '2']) list(['3', '4'])]
        [list(['7', '8']) list(['9', '10'])]]

What I want instead is:

        [[[1, 2], [3, 4]],
         [[7, 8], [9, 10]]]

I understand this can be done by iterating over the array manually, but I'd rather avoid that because the data set is quite large. I have also read that using df.values is discouraged. Any help is appreciated.

jkr · Accepted Answer · 2020-09-30 18:58:35Z

3

Why not load the JSON file with the standard-library json module and convert the result to a numpy array?

    import json
    import numpy as np

    data = json.loads("""[[["1", "2"], ["3", "4"]], [["7", "8"], ["9", "10"]]]""")

    np.array(data, dtype=float)

    array([[[ 1.,  2.],
            [ 3.,  4.]],

           [[ 7.,  8.],
            [ 9., 10.]]])
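
As a quick sanity check on this approach, the sketch below confirms the shape and dtype of the result and shows what happens when the nesting is ragged. The `ragged` value is a made-up input for illustration, not something from the question.

```python
import json
import numpy as np

data = json.loads('[[["1", "2"], ["3", "4"]], [["7", "8"], ["9", "10"]]]')
arr = np.array(data, dtype=float)  # numeric strings are parsed as floats

print(arr.shape)  # (2, 2, 2)
print(arr.dtype)  # float64

# Hypothetical ragged input: the inner lists differ in length,
# so NumPy cannot build a uniform float array from it.
ragged = [[["1", "2"], ["3"]]]
try:
    np.array(ragged, dtype=float)
    ragged_ok = True
except ValueError:
    ragged_ok = False

print(ragged_ok)  # False
```

If the conversion raises a ValueError on real data, uneven nesting is one thing worth checking first.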


3 Comments

Add Over a year ago

Hi, this produces an error, but definitely set me on the right track. Thanks :)

jkr Over a year ago

This doesn't produce an error for me. What do you see?

Add Over a year ago

Not sure what was causing the error, but managed to fix it. Thanks for the help :)

Robby the Belgian · Accepted Answer · 2020-09-30 19:05:01Z

0

Your data is 3-dimensional, not 2-dimensional. DataFrames are 2-dimensional, so the only way pandas can fit your sample.json into a DataFrame is as a 2-dimensional table whose cells each hold a 1-dimensional list.
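
To see the mismatch concretely, here is a minimal sketch that builds the same nested data in memory (an assumption made for brevity; the question reads it from sample.json) and compares the shapes that pandas and numpy arrive at.

```python
import numpy as np
import pandas as pd

# Same nesting as sample.json, built in memory instead of read from disk
nested = [[["1", "2"], ["3", "4"]], [["7", "8"], ["9", "10"]]]

df = pd.DataFrame(nested)
print(df.shape)             # (2, 2): two axes only
print(type(df.iloc[0, 0]))  # <class 'list'>: the third level sits inside each cell

arr = np.array(nested, dtype=float)
print(arr.shape)            # (2, 2, 2): all three levels are array dimensions
```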

The easiest fix is to skip pandas completely:

    import json
    import numpy as np

    with open('/home/robby/temp/sample.json', 'r') as f:
        jsonarray = json.load(f)

    # Convert the nested lists of numeric strings to a 3-D float array
    data = np.array(jsonarray, dtype=float)
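
If the data is already in a DataFrame and re-reading the file is not convenient, one possible workaround is to go back through nested Python lists. This is a sketch, not part of either answer. It assumes every cell holds a list of the same length, and the DataFrame is built in memory here purely for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the DataFrame from the question: each cell holds a list of strings
df = pd.DataFrame([[["1", "2"], ["3", "4"]], [["7", "8"], ["9", "10"]]])

# to_numpy() gives a (2, 2) object array of lists; tolist() turns that back
# into plain nested lists, which np.array can then parse as a 3-D float array.
data = np.array(df.to_numpy().tolist(), dtype=float)

print(data.shape)  # (2, 2, 2)
```

The conversion still visits every cell in Python, so for a large file, loading with json directly is likely the cheaper route.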
