I'm trying to create a routine in Python to collect every diagonal group of values in df. Here's a reproducible example of what I'm trying to achieve:

import numpy as np
import pandas as pd

data = {'column1':[1,1, 2, 3,6, 4,5,6], 'column2':[np.nan,4,3,5,6,2,3,4], 'column3':[np.nan,np.nan,3,2,5,np.nan,8,4], 'column4':[np.nan,np.nan,np.nan,3,6,np.nan,np.nan, 6], 'column5':[np.nan, np.nan, np.nan, np.nan, 8, np.nan, np.nan,np.nan]}

df = pd.DataFrame(data, columns = ['column1', 'column2', 'column3', 'column4', 'column5'])
my_list = []
# dict_list = {'list' + str(i):[] for i in list(range(len(df)))}

for i in range(len(df)):
    for j in range(len(df.columns)):
        # Stop once the diagonal runs past the last row
        if (i + j) < len(df):
            my_list.append(df.iloc[i + j, j])
        else:
            break

This code returns a single flat list:

my_list = [1,4.0,3.0,3.0,8.0,1,3.0,2.0,6.0,nan,2,5.0,5.0,nan,nan,3,6.0,nan,nan,nan,6,2.0,8.0,6.0,4,3.0,4.0,5,4.0,6]

And based on the structure of the given df, what I'm trying to achieve is:

dict_list = [[1,4,3,3,8],[1,3,2,6],[2,5,5],[3,6],[6,2,8,6],[4,3,4],[5,4],[6]]

From what I've seen, I could do this by creating a list of lists (commented out in the code as dict_list; the reference is "Python : creating multiple lists"), but I haven't been able to arrange my data as shown in the dict_list object.

I would appreciate any help or guidance.

Thank you!

  • Yes, it solved it. I've already replied to you and accepted the solution as the one I was looking for. Cheers! Commented Aug 14, 2020 at 21:42
  • I am glad to hear that it's the answer you were looking for. Commented Aug 14, 2020 at 21:49

1 Answer

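Using numpy.diag() will help here. Its k argument selects which diagonal to extract, and a negative k selects a diagonal below the main one.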
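To make the role of k concrete, here is a minimal sketch on a small made-up matrix (the 4x3 array m is only an illustration and is not part of the question's data). With k=-i, the extracted diagonal starts at row i of the first column.

```python
import numpy as np

# A small 4x3 matrix used only for illustration:
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]
m = np.arange(12).reshape(4, 3)

main_diag = np.diag(m)            # k=0, starts at row 0: 0, 4, 8
first_below = np.diag(m, k=-1)    # starts at row 1: 3, 7, 11
second_below = np.diag(m, k=-2)   # starts at row 2: 6, 10

print(main_diag, first_below, second_below)
```

Diagonals that start near the bottom are shorter because they run off the last row, which is why the later groups in the question have fewer values.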

This is the code I used:

import pandas as pd
import numpy as np

data = {'column1':[1,1, 2, 3,6, 4,5,6], 'column2':[np.nan,4,3,5,6,2,3,4], 'column3':[np.nan,np.nan,3,2,5,np.nan,8,4], 'column4':[np.nan,np.nan,np.nan,3,6,np.nan,np.nan, 6], 'column5':[np.nan, np.nan, np.nan, np.nan, 8, np.nan, np.nan,np.nan]}
df = pd.DataFrame(data, columns = ['column1', 'column2', 'column3', 'column4', 'column5'])
nump = df.to_numpy()

my_list = []
for i in range(len(nump)):
    # k=-i selects the diagonal that starts at row i of the first column
    my_list.append(np.diag(nump, k=-i))

OUTPUT:

[array([1., 4., 3., 3., 8.]),
 array([ 1.,  3.,  2.,  6., nan]),
 array([ 2.,  5.,  5., nan, nan]),
 array([ 3.,  6., nan, nan, nan]),
 array([6., 2., 8., 6.]),
 array([4., 3., 4.]),
 array([5., 4.]),
 array([6.])]

To remove the NaN values:

cleanedList = []

for diag in my_list:
    # Keep only the entries that are not NaN
    cleaned = [x for x in diag if not np.isnan(x)]
    cleanedList.append(cleaned)

OUTPUT:

[[1.0, 4.0, 3.0, 3.0, 8.0],
 [1.0, 3.0, 2.0, 6.0],
 [2.0, 5.0, 5.0],
 [3.0, 6.0],
 [6.0, 2.0, 8.0, 6.0],
 [4.0, 3.0, 4.0],
 [5.0, 4.0],
 [6.0]]
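The two steps can also be combined. The following is a minimal sketch, not the only way to do it: the helper name diagonals_without_nan is hypothetical, and it assumes every column can be converted to float. It filters each diagonal with a boolean mask and returns plain Python lists.

```python
import numpy as np
import pandas as pd


def diagonals_without_nan(frame):
    """Return each diagonal that starts in the first column, with NaN dropped."""
    values = frame.to_numpy(dtype=float)  # assumes all columns are numeric
    result = []
    for start_row in range(values.shape[0]):
        diagonal = np.diag(values, k=-start_row)
        # The boolean mask keeps only the entries that are not NaN
        result.append(diagonal[~np.isnan(diagonal)].tolist())
    return result


data = {'column1': [1, 1, 2, 3, 6, 4, 5, 6],
        'column2': [np.nan, 4, 3, 5, 6, 2, 3, 4],
        'column3': [np.nan, np.nan, 3, 2, 5, np.nan, 8, 4],
        'column4': [np.nan, np.nan, np.nan, 3, 6, np.nan, np.nan, 6],
        'column5': [np.nan, np.nan, np.nan, np.nan, 8, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)

diagonals = diagonals_without_nan(df)
print(diagonals)
```

Note that the mask removes every NaN in a diagonal, not only trailing ones. In this data the NaN values only appear at the end of a diagonal, so the distinction does not matter here.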

For more information on numpy.diag(), see the NumPy documentation.
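If you would rather keep the nested-loop structure from the question, the missing piece is starting a fresh sublist for each starting row. This is a sketch under that assumption, using pd.notna to skip missing values. The names list_of_lists and current are chosen for illustration.

```python
import numpy as np
import pandas as pd

data = {'column1': [1, 1, 2, 3, 6, 4, 5, 6],
        'column2': [np.nan, 4, 3, 5, 6, 2, 3, 4],
        'column3': [np.nan, np.nan, 3, 2, 5, np.nan, 8, 4],
        'column4': [np.nan, np.nan, np.nan, 3, 6, np.nan, np.nan, 6],
        'column5': [np.nan, np.nan, np.nan, np.nan, 8, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)

n_rows, n_cols = df.shape
list_of_lists = []
for i in range(n_rows):
    current = []  # a fresh sublist for the diagonal starting at row i
    for j in range(n_cols):
        if i + j >= n_rows:
            break  # the diagonal has run past the last row
        value = df.iloc[i + j, j]
        if pd.notna(value):  # skip missing entries
            current.append(value)
    list_of_lists.append(current)

print(list_of_lists)
```

This gives the same groups as the numpy.diag() version, but element-wise df.iloc access is typically slower on large frames.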

1 Comment

Thank you very much @Youness Saadna, your solution is exactly what I was looking for! Besides, NumPy is considerably faster than writing the for loops by hand. :)
