How to convert the values in a Python Defaultdict to a Numpy array?

Question

I want multiple values to belong to the same key, so I used a Python defaultdict to work around this. However, since the values in the defaultdict are now nested lists, how do I make each element of the nested lists a row of a NumPy ndarray?

Let's say my defaultdict looks like this:

my_dict = defaultdict(list)

*** in some for loop *** 
 my_dict[key].append(value) # key is a string and value is a Numpy array of shape (1,10)
*** end of the for loop ***

I guess the slowest way would be using a nested for loop like:

data = np.empty((0,10),np.uint8)
for i in my_dict:
    for j in my_dict[i]:
        data = np.append(data,j,axis=0)

Is there a faster way to do this?

That is too slow, yes. Using a list comprehension to create a Python list and constructing the NumPy array from that would be faster. Still, I feel the problem is in having the dict in the first place. This smells like an X-Y problem.
– zvone, Commented Jan 3, 2023 at 8:18
Your question is not clear. A dictionary cannot have duplicate keys.
– The6thSense, Commented Jan 3, 2023 at 8:23
np.array(my_dict.values()) is supposed to work, but it gives a jagged array. Please provide more details.
– Akash Kumar, Commented Jan 3, 2023 at 8:37

hpaulj · Accepted Answer · 2023-01-03 17:14:56Z

You should have provided an example, but I think the following is as general as your code implies.

In [131]: from collections import defaultdict
In [132]: dd = defaultdict(list)
In [133]: dd[1].append(np.ones((1,5),int))
In [134]: dd[2].append(2*np.ones((1,5),int))
In [135]: dd[1].append(3*np.ones((1,5),int))

In [136]: dd
Out[136]: 
defaultdict(list,
            {1: [array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]])],
             2: [array([[2, 2, 2, 2, 2]])]})

Several comments suggested making an array from:

In [137]: list(dd.values())
Out[137]: 
[[array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]])],
 [array([[2, 2, 2, 2, 2]])]]

But with the possibility that there is more than one array in each list, that won't work.

We can flatten the nested lists with something similar to your code, but with a faster list append:

In [140]: alist = []
     ...: for i in dd:
     ...:     for a in dd[i]:
     ...:         alist.append(a)
     ...:         
In [141]: alist
Out[141]: [array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]]), array([[2, 2, 2, 2, 2]])]

We can make a 2d array from this (provided the subarrays match in shape):

In [142]: np.vstack(alist)
Out[142]: 
array([[1, 1, 1, 1, 1],
       [3, 3, 3, 3, 3],
       [2, 2, 2, 2, 2]])

or:

In [144]: np.array(alist).shape
Out[144]: (3, 1, 5)

As a general rule, repeated np.append is inefficient, because each call copies the whole array into a new one. List append (or a list comprehension) is best when iteration is unavoidable.
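
The flattening loop can also be written without an explicit append. This is a minimal sketch, assuming every stored array has shape (1, n) with the same n; the name `rows` is made up for illustration and the sample data mirrors the session above.

```python
from collections import defaultdict
from itertools import chain

import numpy as np

dd = defaultdict(list)
dd[1].append(np.ones((1, 5), int))
dd[2].append(2 * np.ones((1, 5), int))
dd[1].append(3 * np.ones((1, 5), int))

# chain.from_iterable walks the per-key lists in dict insertion order
# and yields each (1, 5) array once.
rows = list(chain.from_iterable(dd.values()))

# A single vstack call builds the 2d result in one step.
data = np.vstack(rows)
print(data.shape)  # (3, 5)
```

The equivalent list comprehension is `[a for arrays in dd.values() for a in arrays]`; either form can be passed straight to np.vstack.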

Trying to recreate the dict with @Guy's suggestion:

In [147]: my_dict = dict()
     ...: key,value=(1,np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)

I would prefer to use np.hstack here (np.append is misused too often).

In [148]: key,value=(2,2*np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)    
In [149]: key,value=(1,3*np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)

In [150]: my_dict
Out[150]: 
{1: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3]),
 2: array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])}

This has duplicated values for some of the additions. And making an array from list(my_dict.values()) is no easier.
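
The duplication seems to come from the first call for each key: setdefault stores value and returns it, and np.append then joins that value to itself. np.append without an axis also flattens its inputs, which is why the results are 1d. The following sketch is one possible variant that avoids both effects, assuming (1, n) rows; `add_row` and `pairs` are hypothetical names, not part of the suggestion being tested.

```python
import numpy as np


def add_row(store, key, value):
    """Stack value under key; the first value for a key is stored as is."""
    if key in store:
        # vstack keeps the 2d shape instead of flattening like np.append.
        store[key] = np.vstack((store[key], value))
    else:
        store[key] = value


# Placeholder data, mirroring the three additions made above.
pairs = [
    (1, np.ones((1, 5), int)),
    (2, 2 * np.ones((1, 5), int)),
    (1, 3 * np.ones((1, 5), int)),
]

my_dict = {}
for key, value in pairs:
    add_row(my_dict, key, value)

data = np.vstack(list(my_dict.values()))
print(data.shape)  # (3, 5)
```

Each np.vstack call still copies the array stored for that key, so this is not expected to be faster than appending to lists.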

We could collect the dict values as arrays, but it's not as simple as with lists. An array doesn't have a simple "empty", and doesn't have an in-place "append".

In [157]: dd = defaultdict(lambda: np.zeros([0,5],int))
In [158]: dd[1]=np.vstack((dd[1],(np.ones((1,5),int))))
In [159]: dd[2]=np.vstack((dd[2],(2*np.ones((1,5),int))))
In [160]: dd[3]=np.vstack((dd[3],(3*np.ones((1,5),int))))

In [161]: dd
Out[161]: 
defaultdict(<function __main__.<lambda>()>,
            {1: array([[1, 1, 1, 1, 1]]),
             2: array([[2, 2, 2, 2, 2]]),
             3: array([[3, 3, 3, 3, 3]])})

In [162]: np.vstack(list(dd.values()))
Out[162]: 
array([[1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3]])

This avoids an iteration after the dict is constructed, but the dict construction is more complex and slower. So I don't think it helps.
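
To check the speed claim on your own data, the two constructions can be timed side by side. This is only a sketch: the function names, the key pattern `i % n_keys`, and the row counts are assumptions chosen for illustration, and timings will vary by machine.

```python
import timeit
from collections import defaultdict

import numpy as np


def build_with_lists(n_rows, n_keys=10, width=5):
    """Collect (1, width) arrays in lists, then flatten and stack once."""
    dd = defaultdict(list)
    for i in range(n_rows):
        dd[i % n_keys].append(np.full((1, width), i))
    return np.vstack([a for arrays in dd.values() for a in arrays])


def build_with_arrays(n_rows, n_keys=10, width=5):
    """Grow one 2d array per key with vstack, then stack the values."""
    dd = defaultdict(lambda: np.zeros((0, width), int))
    for i in range(n_rows):
        key = i % n_keys
        dd[key] = np.vstack((dd[key], np.full((1, width), i)))
    return np.vstack(list(dd.values()))


# Both constructions should produce the same rows in the same order.
assert np.array_equal(build_with_lists(200), build_with_arrays(200))

for fn in (build_with_lists, build_with_arrays):
    seconds = timeit.timeit(lambda: fn(2000), number=5)
    print(fn.__name__, seconds)
```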

Guy · Accepted Answer · 2023-01-03 08:46:59Z

Instead of using defaultdict(list), use the setdefault functionality. This will spare you the nested list.

my_dict = dict()
for key, value in values:
    my_dict[key] = np.append(my_dict.setdefault(key, value), value)

data = np.array(list(my_dict.values()))

5 Comments

sensationti Over a year ago

What is values in "for key, value in values"?

Guy Over a year ago

@sensationti Just a placeholder for your data.

hpaulj Over a year ago

Doesn't list(d.values()) work with a defaultdict?

Guy Over a year ago

@hpaulj It does, but with defaultdict(list) the OP had another layer of list; this way there isn't one.

hpaulj Over a year ago

I tested your code in my answer, and found that it duplicates values.
