How do I delete rows in a dataframe based on numpy where

Question

So I have some code where I use numpy to transform a dataframe into an array so I can calculate the Hamming distance between the different entries in the array.

To find the unwanted entries I use an np.where statement, which returns the following:

array([[1, 2, 3, 4]], dtype=int32)

These four numbers are equal to the row indices in the dataframe. My question is how I can transform this array into something I can pass to the dataframe so that it drops these rows.
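For reference, here is a minimal sketch of what the one-argument form of `np.where` returns. The `dists` values are made up to stand in for the distances in the question. The last line suggests one possible (unconfirmed) origin of the 2D `array([[1, 2, 3, 4]])` shown above: wrapping the returned tuple in `np.array`.

```python
import numpy as np

# Hypothetical Hamming distances standing in for `dists` in the question
dists = np.array([2, 0, 1, 1, 1])

result = np.where(dists <= 1)                # one-argument form
print(type(result).__name__, len(result))    # tuple 1  (one index array per dimension)
positions = result[0]                        # the 1D array of matching positions
print(positions.tolist())                    # [1, 2, 3, 4]

# Wrapping the tuple in np.array produces a 2D array of shape (1, 4)
print(np.array(result).shape)                # (1, 4)
```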

EDIT: This is what the code looks like right now, with example data:

from sklearn.preprocessing import OrdinalEncoder
import pandas as pd
import numpy as np

data = [[1,1,1,1,1,1,1,1,1,1,1,1,1], [1,1,1,1,1,1,1,1,1,1,1,1,2], [1,1,1,1,1,1,1,1,1,1,1,1,3]]
df = pd.DataFrame(data, columns=['csk_1', 'csk_2', 'csk_3', 'csk_4', 'csk_5', 'csk_6', 'csk_7', 'csk_8', 'csk_9', 'csk_10', 'csk_11', 'csk_12', 'csk_13'])

enc = OrdinalEncoder()
X = enc.fit_transform(df.to_numpy())
j = 0
totals = (len(X) - 1)
threshold = 1

while j < totals:
    idx = len(X) - j - 1
    row = X[idx]
    prev_rows = X[0:idx]
    dists = np.sum(row != prev_rows, axis=1)
    a = np.where(dists <= threshold)
    df = df.drop(a.flatten(), axis=0)
    X = enc.fit_transform(df.to_numpy())
    j = j + 1
print(df)

What have you tried so far? For example, from the pandas drop() documentation: "labels (list-like): Index or column labels to drop". If you have the indices you want to drop, have you tried using them in the drop method?
– G. Anderson, Commented Aug 27, 2021 at 21:40
Yes, I've tried this but haven't succeeded. I'm quite new to Python, so I think my lack of knowledge is what makes me fail.
– OldSport, Commented Aug 27, 2021 at 22:11
See this link on how to create a minimal reproducible example, which will help us know how to help you better in the future. Sample input, expected output, and the code you've tried so far (with the full traceback of any error) give us an idea of where you're stuck.
– G. Anderson, Commented Aug 27, 2021 at 22:17
The single-argument np.where, or rather np.nonzero, can give you the indices to use in a drop call. The usual three-argument where used in pandas is not useful here; it generates values based on conditions.
– hpaulj, Commented Aug 27, 2021 at 22:20
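A small sketch of the distinction hpaulj draws, using a made-up boolean condition: the one-argument `np.where` and `np.nonzero` both give index arrays, while the three-argument form returns one chosen value per element.

```python
import numpy as np

cond = np.array([False, True, True, True, True])   # hypothetical condition

# One-argument np.where and np.nonzero both return a tuple of index arrays
idx_where = np.where(cond)[0]
idx_nonzero = np.nonzero(cond)[0]
print(idx_where.tolist(), idx_nonzero.tolist())    # [1, 2, 3, 4] [1, 2, 3, 4]

# The three-argument form picks a value per element instead of returning indices
marks = np.where(cond, "drop", "keep")
print(marks.tolist())    # ['keep', 'drop', 'drop', 'drop', 'drop']
```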

Mohamed Afify · Accepted Answer · 2021-08-27 21:50:11Z

1

So, you need to convert the array to a list and then use df.drop:

import numpy as np

a = np.array([1, 2, 3, 4])
a = a.tolist()
# df.index[a] maps the positions in a to index labels, which drop() expects
df = df.drop(df.index[a])
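As a side note, the conversion to a list appears to be optional: `df.index` can be indexed with a 1D NumPy integer array directly. A minimal sketch, using a made-up frame with string labels so that positions and labels are visibly different:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data) whose labels differ from its positions
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=["a", "b", "c", "d", "e"])
positions = np.array([1, 2, 3, 4])     # e.g. np.where(...)[0]

labels = df.index[positions]           # integer positions -> labels b, c, d, e
df = df.drop(labels)                   # drop() removes rows by label
print(df)
```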

answered Aug 27, 2021 at 21:50

Mohamed Afify


OldSport Over a year ago

Thank you for the answer. I've tried this before, but I get the following error: AttributeError: 'tuple' object has no attribute 'tolist'

G. Anderson Over a year ago

drop() can take an array just as easily as a list; it just needs to be flattened into a 1D array.

Mohamed Afify Over a year ago

Show a sample of the data or the where call so we can see what type it returns, or let me know whether Anderson's idea worked out.

OldSport Over a year ago

I've added example code with a data sample as well

Mohamed Afify Over a year ago

So this is working, but it will fail after two iterations because the index will be out of range:

a = np.array(a).tolist()
print(a)
df = df.drop(df.index[a])
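One thing to watch in the question's loop: `df.drop(a.flatten(), axis=0)` passes positions from the re-encoded array as if they were labels, and after the first drop the two no longer line up. A minimal sketch with made-up data showing the mismatch and the `df.index[...]` translation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v": [0, 1, 2, 3, 4]})   # default labels 0..4
df = df.drop([0, 1])                          # labels are now 2, 3, 4 at positions 0, 1, 2
positions = np.array([0, 1])                  # positions, as np.where on the re-encoded array reports them

raised = False
try:
    df.drop(positions)                        # treated as labels 0 and 1, which no longer exist
except KeyError:
    raised = True
print("KeyError raised:", raised)             # KeyError raised: True

kept = df.drop(df.index[positions])           # map positions to labels 2 and 3 first
print(kept.index.tolist())                    # [4]
```

Another option is to call `df = df.reset_index(drop=True)` after each drop, so that labels and positions stay aligned.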


G. Anderson · Accepted Answer · 2021-08-27 22:14:55Z

1

The thing that is likely tripping you up is that you have a 2D array and df.drop() takes only a 1D array or list-like object. Luckily you can use indexing or flatten() to sort it right out.

If your array were named, for example, ind:

df1 = df.drop(ind.flatten(), axis=0)

or

df1 = df.drop(ind[0], axis=0)

Either should work, but it's difficult to know without seeing sample data.
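If `ind` comes straight from the one-argument `np.where`, it is a tuple rather than a 2D array, so `ind.flatten()` would fail while `ind[0]` still works. That may be behind the errors mentioned in the comments, though no full traceback was posted. A short sketch comparing the two cases:

```python
import numpy as np

ind_2d = np.array([[1, 2, 3, 4]])                     # shape (1, 4), like the array in the question
ind_tuple = np.where(np.array([0, 1, 1, 1, 1]) > 0)   # what one-argument np.where actually returns

print(ind_2d.flatten())   # [1 2 3 4]
print(ind_2d[0])          # [1 2 3 4]
print(ind_tuple[0])       # [1 2 3 4]  (indexing works on the tuple too)
# ind_tuple.flatten() would raise AttributeError: a tuple has no flatten method
```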

answered Aug 27, 2021 at 22:14

G. Anderson


OldSport Over a year ago

Thank you! I tried your code, but it gives me an error. I've added example code with a data sample as well.
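Pulling the comments together, here is one way the loop from the question might be rewritten. This is a sketch under assumptions, not code from either answer. It assumes the goal is to drop every earlier row whose Hamming distance to a later row is at most `threshold` (inferred from the loop). It skips `OrdinalEncoder`, on the assumption that ordinal encoding maps each column's values one-to-one onto codes, so the `!=` counts come out the same on the raw values. `drop_near_duplicates` is a hypothetical name. The function also recomputes the stopping condition on every pass: with the fixed `totals`, `idx` can go negative once rows have been dropped, and a negative index wraps around to the end of the array.

```python
import numpy as np
import pandas as pd


def drop_near_duplicates(df: pd.DataFrame, threshold: int = 1) -> pd.DataFrame:
    """Walk from the last row upward and drop earlier rows whose Hamming
    distance to the current row is <= threshold (hypothetical helper)."""
    j = 0  # number of rows at the bottom that have already been processed
    while True:
        X = df.to_numpy()              # raw values; no encoder (see assumption above)
        idx = len(X) - j - 1           # position of the row being compared
        if idx <= 0:                   # no earlier rows left to compare against
            break
        row = X[idx]
        prev_rows = X[:idx]
        dists = np.sum(row != prev_rows, axis=1)        # Hamming distance to each earlier row
        positions = np.nonzero(dists <= threshold)[0]   # 1D array of positions
        df = df.drop(df.index[positions])               # translate positions to labels, then drop
        j += 1
    return df


cols = [f"csk_{i}" for i in range(1, 14)]
data = [[1] * 12 + [1], [1] * 12 + [2], [1] * 12 + [3]]   # same rows as the question's example
example = pd.DataFrame(data, columns=cols)
print(drop_near_duplicates(example, threshold=1))
```

With the question's three example rows and `threshold=1`, the last row is within distance 1 of both earlier rows, so only the row labelled 2 should remain.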
