I have code that uses numpy to convert a dataframe to an array and calculate the Hamming distance between the different entries in the array.
To find the unwanted entries I use an np.where statement, which returns the following:
array([[1, 2, 3, 4]], dtype=int32)
These four numbers correspond to row indices in the dataframe. My question is: how can I transform this array into something I can pass to the dataframe to drop these rows?
EDIT: This is what the code looks like right now, with example data:
from sklearn.preprocessing import OrdinalEncoder
import pandas as pd
import numpy as np
data = [[1,1,1,1,1,1,1,1,1,1,1,1,1], [1,1,1,1,1,1,1,1,1,1,1,1,2], [1,1,1,1,1,1,1,1,1,1,1,1,3]]
df = pd.DataFrame(data, columns=['csk_1', 'csk_2', 'csk_3', 'csk_4', 'csk_5', 'csk_6', 'csk_7', 'csk_8', 'csk_9', 'csk_10', 'csk_11', 'csk_12', 'csk_13'])
enc = OrdinalEncoder()
X = enc.fit_transform(df.to_numpy())
j = 0
totals = (len(X) - 1)
threshold = 1
while j < totals:
    idx = len(X) - j - 1
    row = X[idx]
    prev_rows = X[0:idx]
    dists = np.sum(row != prev_rows, axis=1)
    a = np.where(dists <= threshold)
    df = df.drop(a.flatten(), axis=0)
    X = enc.fit_transform(df.to_numpy())
    j = j + 1
print(df)
np.where, or rather np.nonzero, can give you the indices that could be used in a drop function. The usual 3-argument where used in pandas is not useful here; it generates values based on conditions.
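
As a minimal sketch of that idea (the small frame and the dists values below are made up for illustration and are not the data from the question): one-argument np.where returns a tuple of index arrays, so take element [0] or flatten a 2-D result, then map those positions to index labels before calling drop.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with the default RangeIndex (0..4).
df = pd.DataFrame({"csk_1": [1, 1, 1, 1, 2], "csk_2": [1, 2, 3, 4, 9]})

# Stand-in for the per-row distance vector computed in the question.
dists = np.array([5, 0, 1, 1, 0])
threshold = 1

# One-argument np.where returns a tuple with one index array per dimension,
# so the tuple itself has no .flatten(); take its first element instead.
positions = np.where(dists <= threshold)[0]

# A 2-D result such as array([[1, 2, 3, 4]]) can be flattened with ravel().
positions_2d = np.array([[1, 2, 3, 4]], dtype=np.int32)
flat = positions_2d.ravel()

# drop works on index labels, so translate positions to labels first.
# This stays correct even after earlier drops leave gaps in the index.
result = df.drop(index=df.index[positions])
print(result)
```

Going through df.index[positions] rather than passing the raw positions matters once rows have already been removed, because the positions in the numpy array and the labels in the dataframe no longer line up.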