I want multiple values to belong to the same key, so I used a Python defaultdict to work around this. However, since the values in the defaultdict are now nested lists, how do I make each element of the nested lists a row of a NumPy ndarray?

Let's say my defaultdict looks like this:

my_dict = defaultdict(list)

for key, value in values:  # `values` is a placeholder for the real data source
    my_dict[key].append(value)  # key is a string and value is a NumPy array of shape (1, 10)

I guess the slowest way would be using a nested for loop like:

data = np.empty((0,10),np.uint8)
for i in my_dict:
    for j in my_dict[i]:
        data = np.append(data,j,axis=0)   

Is there a faster way to do this?

  • That is too slow, yes. Using a list comprehension to create a Python list and constructing the NumPy array from that would be faster. Still, I feel the problem is in having the dict in the first place. This smells like an X-Y problem. Commented Jan 3, 2023 at 8:18
  • Your question is not clear. A dictionary cannot have duplicate keys. Commented Jan 3, 2023 at 8:23
  • np.array(my_dict.values()) is supposed to work, but it gives a jagged array. Please provide more details. Commented Jan 3, 2023 at 8:37

2 Answers

You should have provided an example, but I think the following is as general as your code implies.

In [130]: import numpy as np
In [131]: from collections import defaultdict
In [132]: dd = defaultdict(list)
In [133]: dd[1].append(np.ones((1,5),int))
In [134]: dd[2].append(2*np.ones((1,5),int))
In [135]: dd[1].append(3*np.ones((1,5),int))

In [136]: dd
Out[136]: 
defaultdict(list,
            {1: [array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]])],
             2: [array([[2, 2, 2, 2, 2]])]})

Several comments suggested making an array from:

In [137]: list(dd.values())
Out[137]: 
[[array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]])],
 [array([[2, 2, 2, 2, 2]])]]

But with the possibility that there is more than one array in each list, that won't work.

We can flatten the nested lists with something similar to your code, but with a faster list append:

In [140]: alist = []
     ...: for i in dd:
     ...:     for a in dd[i]:
     ...:         alist.append(a)
     ...:         
In [141]: alist
Out[141]: [array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]]), array([[2, 2, 2, 2, 2]])]

We can make a 2d array from this (provided the subarrays match in shape):

In [142]: np.vstack(alist)
Out[142]: 
array([[1, 1, 1, 1, 1],
       [3, 3, 3, 3, 3],
       [2, 2, 2, 2, 2]])

or a 3d array:

In [144]: np.array(alist).shape
Out[144]: (3, 1, 5)

As a general rule, repeated np.append is inefficient. list append (or a list comprehension) is best when iteration is unavoidable.
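
As a minimal sketch of the list-comprehension variant, assuming the same small `dd` as above (the names `alist` and `data` are just illustrative), the nested lists can be flattened in one expression and stacked once:

```python
from collections import defaultdict

import numpy as np

# Rebuild the small example dict: each value is a list of (1, 5) arrays.
dd = defaultdict(list)
dd[1].append(np.ones((1, 5), int))
dd[2].append(2 * np.ones((1, 5), int))
dd[1].append(3 * np.ones((1, 5), int))

# Flatten the lists of arrays into one flat list (cheap list operations only).
alist = [a for arrays in dd.values() for a in arrays]

# Stack once at the end; this needs all subarrays to have the same number of columns.
data = np.vstack(alist)
print(data.shape)
```

The rows come out grouped by key, in the order the keys were first inserted, not in the order the values were originally appended.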

Guy's suggestion

Trying to recreate the dict with @Guy's suggestion:

In [147]: my_dict = dict()
     ...: key,value=(1,np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)

I would prefer to use np.hstack here (np.append is misused too often).

In [148]: key,value=(2,2*np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)    
In [149]: key,value=(1,3*np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)

In [150]: my_dict
Out[150]: 
{1: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3]),
 2: array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])}

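This duplicates the value on the first insertion for each key: setdefault stores and returns value, and np.append then adds value to it again. np.append without an axis also flattens the result to 1d. And making an array from list(my_dict.values()) is no easier.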
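
One way to keep a plain dict of arrays without the duplication is to stack only when the key already exists. This is a sketch of my own, not the code from the other answer, and `pairs` is a made-up stand-in for the real data:

```python
import numpy as np

# Hypothetical input: (key, value) pairs, each value an array of shape (1, 5).
pairs = [
    (1, np.ones((1, 5), int)),
    (2, 2 * np.ones((1, 5), int)),
    (1, 3 * np.ones((1, 5), int)),
]

my_dict = {}
for key, value in pairs:
    if key in my_dict:
        # Key already seen: add the new row below the existing ones.
        my_dict[key] = np.vstack((my_dict[key], value))
    else:
        # First value for this key: store it as is, with no second copy.
        my_dict[key] = value

# Each value is now 2d with 5 columns, so they stack into one 2d array.
data = np.vstack(list(my_dict.values()))
print(data.shape)
```

This still copies the growing array on every addition, so it does not address the speed concern; it only removes the duplicated rows.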

We could collect the dict values as arrays, but it's not as simple as with lists. An array doesn't have a simple "empty", and doesn't have an in-place "append".

In [157]: dd = defaultdict(lambda: np.zeros([0,5],int))
In [158]: dd[1]=np.vstack((dd[1],(np.ones((1,5),int))))
In [159]: dd[2]=np.vstack((dd[2],(2*np.ones((1,5),int))))
In [160]: dd[3]=np.vstack((dd[3],(3*np.ones((1,5),int))))

In [161]: dd
Out[161]: 
defaultdict(<function __main__.<lambda>()>,
            {1: array([[1, 1, 1, 1, 1]]),
             2: array([[2, 2, 2, 2, 2]]),
             3: array([[3, 3, 3, 3, 3]])})

In [162]: np.vstack(list(dd.values()))
Out[162]: 
array([[1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3]])

This avoids an iteration after the dict is constructed, but the dict construction is more complex and slower. So I don't think it helps.
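
To check that on your own data, a rough timing sketch along these lines may help. The random `pairs`, the sizes, and the function names are all assumptions for illustration, and the timings will vary by machine:

```python
import timeit
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 2000 rows of shape (1, 10) spread over 10 string keys.
pairs = [(str(i % 10), rng.integers(0, 255, (1, 10), dtype=np.uint8))
         for i in range(2000)]

def with_lists():
    # Collect arrays in lists, then flatten and stack once at the end.
    dd = defaultdict(list)
    for key, value in pairs:
        dd[key].append(value)
    return np.vstack([a for arrays in dd.values() for a in arrays])

def with_arrays():
    # Grow one array per key with vstack, which copies it on every addition.
    dd = defaultdict(lambda: np.zeros((0, 10), np.uint8))
    for key, value in pairs:
        dd[key] = np.vstack((dd[key], value))
    return np.vstack(list(dd.values()))

t_lists = timeit.timeit(with_lists, number=5)
t_arrays = timeit.timeit(with_arrays, number=5)
print(f"lists: {t_lists:.4f}s  arrays: {t_arrays:.4f}s")
```

Both functions should return the same array, so the comparison is only about construction cost.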

Instead of using defaultdict(list), use the setdefault method of a plain dict; this will spare you the nested list.

my_dict = dict()
for key, value in values:
    my_dict[key] = np.append(my_dict.setdefault(key, value), value)

data = np.array(list(my_dict.values()))

5 Comments

What is values in "for key, value in values"?
@sensationti just a placeholder for your data.
Doesn't list(d.values()) work with a defaultdict?
@hpaulj it does, but with defaultdict(list) the OP had another layer of lists; this way he doesn't.
I tested your code in my answer, and found that it duplicates values.
