
So I have some code where I use NumPy to convert a DataFrame to an array and calculate the Hamming distance between the different entries in the array.

To find the unwanted entries, I use an np.where statement, which returns the following:

array([[1, 2, 3, 4]], dtype=int32)

These four numbers are equal to the row indices in the DataFrame. My question is: how can I transform this array into something I can use to tell the DataFrame to drop these rows?

EDIT: This is what the code looks like right now, with example data:

from sklearn.preprocessing import OrdinalEncoder
import pandas as pd
import numpy as np

data = [[1,1,1,1,1,1,1,1,1,1,1,1,1], [1,1,1,1,1,1,1,1,1,1,1,1,2], [1,1,1,1,1,1,1,1,1,1,1,1,3]]
df = pd.DataFrame(data, columns=['csk_1', 'csk_2', 'csk_3', 'csk_4', 'csk_5', 'csk_6', 'csk_7', 'csk_8', 'csk_9', 'csk_10', 'csk_11', 'csk_12', 'csk_13'])

enc = OrdinalEncoder()
X = enc.fit_transform(df.to_numpy())
j = 0
totals = (len(X) - 1)
threshold = 1

while j < totals:
    idx = len(X) - j - 1
    row = X[idx]
    prev_rows = X[0:idx]
    dists = np.sum(row != prev_rows, axis=1)
    a = np.where(dists <= threshold)
    df = df.drop(a.flatten(), axis=0)
    X = enc.fit_transform(df.to_numpy())
    j = j + 1
print(df)
  • What have you tried so far? For example, from the pandas drop() documentation: "labels (list-like): Index or column labels to drop" If you have the indices you want to drop, have you tried using them in the drop method? Commented Aug 27, 2021 at 21:40
  • Yes, I've tried this but haven't succeeded. I'm quite new to Python, so I think my lack of knowledge is what makes me fail. Commented Aug 27, 2021 at 22:11
  • See this link on how to create a minimal reproducible example, which will help us to know how to help you better in the future. Sample input, expected output, and code for what you've tried so far (with the full traceback of any error) gives us an idea of where you're stuck Commented Aug 27, 2021 at 22:17
  • Single-argument np.where, or rather np.nonzero, can give you the indices to use in a drop call. The usual three-argument where (as used with pandas) is not useful here; it generates values based on conditions. Commented Aug 27, 2021 at 22:20
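Following the comment above, here is a minimal sketch of what the single-argument and three-argument forms actually return. The distances array is made up for illustration. The key point is that np.where(mask) and np.nonzero(mask) both return a tuple holding one index array per dimension, so element [0] is the 1D array of row positions:

```python
import numpy as np

# Hypothetical Hamming distances, one per row; threshold of 1 as in the question
dists = np.array([3, 0, 1, 1, 0])
mask = dists <= 1

one_arg = np.where(mask)                    # tuple of index arrays: (array([1, 2, 3, 4]),)
nonzero = np.nonzero(mask)                  # same tuple as the single-argument np.where
three_arg = np.where(mask, "drop", "keep")  # element-wise choice between values, not indices

positions = one_arg[0]                      # the 1D array of row positions inside the tuple
print(positions)
```

Only the single-argument form (or np.nonzero) gives you something that can be turned into rows to drop. The three-argument form just relabels each element.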

2 Answers


So, you need to convert the array to a list and then use df.drop:

import numpy as np
a = np.array([1, 2, 3, 4])
a = a.tolist()
df = df.drop(df.index[a])

7 Comments

Thank you for the answer. I've tried this before, but I get the following error: AttributeError: 'tuple' object has no attribute 'tolist'
drop() can take an array just as easily as a list; it just needs to be flattened into a 1D array.
Show a sample of the data, or of the where call, so we can see what type it returns; or let me know if Anderson's idea worked out.
I've added example code with a data sample as well.
So this is working, but it fails after two iterations because the index goes out of range:
a = np.array(a).tolist()
print(a)
df = df.drop(df.index[a])
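The AttributeError reported above comes from the fact that single-argument np.where returns a tuple, not an array. Below is a minimal sketch with made-up data (the column x is hypothetical). It shows why a.tolist() fails, why np.array(a) produces a 2D array of shape (1, n), and how taking a[0] gives the flat list of positions that df.index[...] accepts:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})  # hypothetical example data

a = np.where(df["x"].to_numpy() > 10)  # a tuple, so a.tolist() raises AttributeError
as_2d = np.array(a)                     # shape (1, 4): wrapping the tuple adds a dimension
positions = a[0].tolist()               # [1, 2, 3, 4]: the 1D array inside the tuple

df = df.drop(df.index[positions])       # translate positions to labels, then drop
print(df)
```

The (1, n) shape of np.array(a) is also what the question's printed output array([[1, 2, 3, 4]]) looks like, so a[0] (or flattening) is the step that was missing.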

The thing that is likely tripping you up is that you have a 2D array, and df.drop() takes only a 1D array or list-like object. Luckily, you can use indexing or flatten() to sort it out.

If your array were named, for example, ind:

df1 = df.drop(ind.flatten(), axis=0)

or

df1 = df.drop(ind[0], axis=0)

Either should work, but it's difficult to be sure without seeing sample data.
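One caveat worth checking, based on general pandas behavior rather than anything this answer states: drop() matches index labels, not positions. Passing raw positions works only while the index is still 0, 1, 2, ... In a loop that has already dropped rows, the same numbers can refer to labels that no longer exist. The sketch below uses hypothetical data to show that flatten() and [0] give the same 1D array, that passing it straight to drop() can raise KeyError, and that mapping through df.index first avoids this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})  # hypothetical example data
df = df.drop([0, 1])                  # an earlier drop: remaining labels are 2, 3, 4

ind = np.array([[0, 1]])              # 2D array of row *positions* (labels 2 and 3)
flat = ind.flatten()                  # array([0, 1]); ind[0] gives the same 1D array

error = None
try:
    df.drop(flat, axis=0)             # read as labels 0 and 1, which no longer exist
except KeyError as err:
    error = err

result = df.drop(df.index[flat], axis=0)  # map positions to labels first
print(result)
```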

1 Comment

Thank you! I tried your code, but it gives me an error. I've added example code with a data sample as well.
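For reference, here is a minimal sketch of the question's loop with both fixes applied: take element [0] of the index tuple (via np.nonzero) and map positions to labels with df.index before dropping. It keeps the question's traversal order (start at the last row, compare it with every earlier row, drop those within threshold, then move to the row just above). It assumes that ordering is the intended one. The OrdinalEncoder step is left out here as an assumption: it maps each column's values one-to-one onto integers, so it should not change which entries compare equal. Put it back if the real data needs it for other reasons.

```python
import numpy as np
import pandas as pd

data = [[1] * 13, [1] * 12 + [2], [1] * 12 + [3]]   # same sample rows as the question
df = pd.DataFrame(data, columns=[f"csk_{i}" for i in range(1, 14)])
threshold = 1

idx = len(df) - 1                      # start from the last row and walk backwards
while idx > 0:
    X = df.to_numpy()
    row = X[idx]
    prev_rows = X[:idx]
    dists = np.sum(row != prev_rows, axis=1)        # Hamming distance to each earlier row
    positions = np.nonzero(dists <= threshold)[0]   # 1D array of positions to drop
    df = df.drop(df.index[positions])               # positions -> labels, then drop
    # every dropped row sat before idx, so the current row moved up by len(positions);
    # the next row to check is the one just above it
    idx = idx - len(positions) - 1

print(df)
```

With the sample data, the last row is within distance 1 of both earlier rows, so only the row labelled 2 remains.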
