
I would like to know how I can get a table like this:

Net     123  21   41   42  12  21
123      1   0    1    0    0   0
21       0   0    0    0    0   1
41       0   0    1    1    0   0
42       0   0    1    1    0   0
12       0   0    0    0    1   0
21       0   1    0    0    0   0

from the original dataset:

Net     L
123    [123,41]
21     [21]
41     [41,42]
42     [42,41]
12     [12]
21     [21]

I thought of using explode, but it spreads list elements across rows, not columns.

  • Have a look into pandas.crosstab. Commented Aug 3, 2020 at 18:05
  • Thanks S3DEV. Do you think stack/unstack could also work in this case? Commented Aug 3, 2020 at 18:06
  • Not sure those functions would be a good fit, given you are looking for the paired frequency. With a bit of simple data engineering, crosstab is what you’re after. Commented Aug 3, 2020 at 18:15
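The "bit of simple data engineering" before crosstab could look like the sketch below. It assumes 'L' holds Python lists, as in the question, and the sample frame is hypothetical. explode puts each list element on its own row, and pd.crosstab then counts (row, value) pairs. Because the column labels come from the values themselves, the two "21" rows share a single "21" column, unlike the two separate "21" columns in the target table.

```python
import pandas as pd

# Hypothetical data shaped like the question's 'L' column (lists of ints)
df = pd.DataFrame({
    'Net': [123, 21, 41, 42, 12, 21],
    'L': [[123, 41], [21], [41, 42], [42, 41], [12], [21]],
})

# explode gives each list element its own row; reset_index keeps the
# original row number in an 'index' column so rows can be regrouped
exploded = df.explode('L').reset_index()

# crosstab counts (original row, value) pairs:
# one row per original row, one column per distinct value in 'L'
wide = pd.crosstab(exploded['index'], exploded['L'])
wide.index = df['Net']     # label the rows with 'Net', as in the desired table
wide = wide.clip(upper=1)  # stay 0/1 even if a value repeats within one list
print(wide)
```

Using the row number rather than 'Net' as the crosstab index keeps the two "21" rows apart; grouping on 'Net' directly would merge them into one row.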

2 Answers


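Starting from the 0/1 table, we can use DataFrame.dot: multiplying each cell by its column label plus a comma and summing along the row concatenates the labels of the columns that hold a 1. Then strip the trailing comma and split: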

s = df.drop(columns='Net')  # keep only the 0/1 columns
df['New'] = s.dot(s.columns.astype(str) + ',').str[:-1].str.split(',')
df
Out[283]: 
   Net  123  21  41  42  12    21        New
0  123    1   0   1   0   0     0  [123, 41]
1   21    0   0   0   0   0     1     [21.1]
2   41    0   0   1   1   0     0   [41, 42]
3   42    0   0   1   1   0     0   [41, 42]
4   12    0   0   0   0   1     0       [12]
5   21    0   1   0   0   0     0       [21]
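A self-contained run of the same dot idea, as a sketch on a hypothetical 0/1 frame modelled on the question (the duplicate "21" column is left out to keep the labels unique). It relies on Python string repetition: 1 * '41,' is '41,' and 0 * '41,' is '', so the row-wise sum of products is the comma-joined list of labels.

```python
import pandas as pd

# Hypothetical 0/1 table modelled on the question (duplicate '21' column omitted)
df = pd.DataFrame(
    [[123, 1, 0, 1, 0, 0],
     [21, 0, 1, 0, 0, 0],
     [41, 0, 0, 1, 1, 0],
     [42, 0, 0, 1, 1, 0],
     [12, 0, 0, 0, 0, 1]],
    columns=['Net', '123', '21', '41', '42', '12'],
)

s = df.drop(columns='Net')  # the 0/1 indicator columns only
# Each cell times "label," repeats the label 0 or 1 times; summing concatenates
joined = s.dot(s.columns.astype(str) + ',')
df['New'] = joined.str[:-1].str.split(',')  # drop the trailing comma, then split
print(df[['Net', 'New']])
```

The labels come out in column order, so the row for 42 gives ['41', '42'] rather than the ['42', '41'] shown in the question's original data. A row with no 1 at all would give [''].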

1 Comment

Would it be possible to use your code with strings instead of ints/numbers in the L column? When I try it with strings, I get this message: TypeError: can't multiply sequence by non-int of type 'str'. It probably comes from numpy.dot.
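One plausible cause, offered as an assumption rather than a diagnosis: that exact message is what Python raises for str * str, which happens when the 0/1 cells themselves are stored as strings ('1', '0'), for example after reading a file without type conversion. The tiny frame below is hypothetical and shows the error and a fix (cast the cells to int before calling dot).

```python
import pandas as pd

# Hypothetical frame whose 0/1 cells are strings, not ints
s = pd.DataFrame([['1', '0'], ['1', '1']], columns=['a', 'b'])

try:
    s.dot(s.columns + ',')  # evaluates '1' * 'a,', which is str * str
except TypeError as err:
    print('TypeError:', err)

# With integer cells, 1 * 'a,' is string repetition and the idiom works
s = s.astype(int)
new = s.dot(s.columns + ',').str[:-1].str.split(',')
print(new.tolist())
```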

I assume the values in your column 'L' are strings (not lists), with the items separated by commas. If so, you can do:

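import re
columns = set()  # distinct values in 'L'; each becomes a column name
for cols in df.L.unique():
    cols = cols.split(',')
    for col in cols:
        columns.add(col.strip())  # drop stray spaces, e.g. from '123, 41'
columns = sorted(columns)  # pandas rejects a set as a column indexer

# generate columns; \b word boundaries stop '12' from matching inside '123'
for col in columns:
    df[col] = df.L.str.contains(r'\b' + re.escape(col) + r'\b')

# change False/True to 0/1
df[columns] = df[columns].astype(int)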
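A self-contained sketch of this approach on hypothetical data, assuming 'L' holds comma-separated strings. It uses whole-value regex matches because a plain substring test such as str.contains('12') would also flag the row '123,41'.

```python
import re
import pandas as pd

# Hypothetical data: 'L' holds comma-separated strings, as this answer assumes
df = pd.DataFrame({
    'Net': [123, 21, 41, 42, 12],
    'L': ['123,41', '21', '41,42', '42,41', '12'],
})

# Distinct values across all rows, sorted so the column order is stable
columns = sorted({v.strip() for cell in df['L'] for v in cell.split(',')})

# \b word boundaries make each pattern match whole values only
for col in columns:
    pattern = r'\b' + re.escape(col) + r'\b'
    df[col] = df['L'].str.contains(pattern).astype(int)

print(df)
```

The word-boundary trick assumes the values start and end with letters or digits; a value such as '-5' would need a different delimiter-aware pattern.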

4 Comments

Hi hoomant. Unfortunately, df.L.unique() gives me this: TypeError: unhashable type: 'list'.
Is 'df' a pandas.DataFrame? If yes, what is the data type of the 'L' pandas.Series? (hint: df.L.dtype)
Thanks for replying. Yes, df is a DataFrame, and L is dtype('O').
Try df['L'] = df.L.apply(str) before running the code above. (You may also need to change the 4th line to cols = cols[1:-1].split(', ').)
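A minimal sketch of what this suggestion does, on a hypothetical list-valued column. Lists are unhashable, which is why unique() fails on them; str() turns each list into text such as '[123, 41]', and slicing off the brackets before splitting on ', ' recovers the individual values.

```python
import pandas as pd

# Hypothetical list-valued column, like the question's 'L'
df = pd.DataFrame({'Net': [123, 21, 41], 'L': [[123, 41], [21], [41, 42]]})

# Lists are unhashable, so df.L.unique() would raise TypeError at this point
df['L'] = df.L.apply(str)  # each list becomes text like '[123, 41]'

values = set()
for cols in df.L.unique():
    # [1:-1] drops the brackets; str() separates items with a comma and a space
    for col in cols[1:-1].split(', '):
        values.add(col)
print(sorted(values))
```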
