Python Dataframe delete rows after comparing multiple column values with a value

Question

I have a data frame with many columns of float values. I want to delete a row if any of its columns has a value below 20.

code:

import numpy as np
import pandas as pd

xdf = pd.DataFrame({'A':np.random.uniform(low=-50, high=53.3, size=(5)),'B':np.random.uniform(low=10, high=130, size=(5)),'C':np.random.uniform(low=-50, high=130, size=(5)),'D':np.random.uniform(low=-100, high=200, size=(5))})

xdf =  
           A          B           C           D
0  -9.270533  42.098425   91.125009  148.350655
1  17.771411  55.564825  106.396381  -89.082831
2 -22.602563  99.330643   17.590466   73.985202
3  15.890920  76.011631   52.366311  194.023063
4  35.202379  41.973846   32.576890  100.523902

# my code
cols = ['A', 'B', 'C', 'D']
xdf[xdf[cols].ge(20).all(axis=1)]

Out[17]: 
           A          B         C           D
4  35.202379  41.973846  32.57689  100.523902

Expected output (drop a row if any column has a value below 20):

xdf =  
           A          B           C           D
4  35.202379  41.973846   32.576890  100.523902

Is this the best way of doing it?
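The condition as stated, "drop a row if any column is below 20", can also be written literally: flag rows where `.lt(20).any(axis=1)` is true and keep the rest with `~`. A minimal sketch on three rows of the sample data above (values rounded to two decimals; `cols` is assumed to be all four columns):

```python
import pandas as pd

# Three rows from the sample above, rounded for readability
xdf = pd.DataFrame({'A': [-9.27, 17.77, 35.20],
                    'B': [42.10, 55.56, 41.97],
                    'C': [91.13, 106.40, 32.58],
                    'D': [148.35, -89.08, 100.52]},
                   index=[0, 1, 4])
cols = ['A', 'B', 'C', 'D']

# "Drop a row if any column is below 20": mark offending rows, then negate
drop_mask = xdf[cols].lt(20).any(axis=1)
kept = xdf[~drop_mask]

# Same result as keeping rows where every column is >= 20
assert kept.equals(xdf[xdf[cols].ge(20).all(axis=1)])
print(kept)
```

The two forms differ only when the data contain NaN: a NaN fails both `ge(20)` and `lt(20)`, so `ge(20).all(axis=1)` drops such a row while `~lt(20).any(axis=1)` keeps it.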

Probably, though it could be faster in numpy.

– Z Li, Commented Jan 26, 2022 at 22:53
@ZLi how do we do it in numpy?

– Mainland, Commented Jan 26, 2022 at 23:03
added an answer below

– Z Li, Commented Jan 26, 2022 at 23:16

Z Li · Accepted Answer · 2022-01-26 23:13:45Z

1

To do it in numpy:

import numpy as np
import pandas as pd

xdf = pd.DataFrame({'A':np.random.uniform(low=-50, high=53.3, size=(5)),'B':np.random.uniform(low=10, high=130, size=(5)),'C':np.random.uniform(low=-50, high=130, size=(5)),'D':np.random.uniform(low=-100, high=200, size=(5))})

%timeit xdf[xdf[['A','B','C','D']].ge(20).all(axis=1)]
%timeit xdf[(xdf[['A','B','C','D']].values >= 20).all(axis=1)]

705 µs ± 277 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
460 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
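`%timeit` is an IPython/Jupyter magic, so the two lines above will not run in a plain Python script. A rough equivalent with the standard-library `timeit` module is sketched below; it also checks that both masks select the same rows. It uses `to_numpy()`, which pandas recommends over `.values`. The random seed and frame size are arbitrary choices, and absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
xdf = pd.DataFrame({'A': rng.uniform(-50, 53.3, 5),
                    'B': rng.uniform(10, 130, 5),
                    'C': rng.uniform(-50, 130, 5),
                    'D': rng.uniform(-100, 200, 5)})
cols = ['A', 'B', 'C', 'D']

def pandas_filter():
    # Boolean mask built with the pandas comparison method
    return xdf[xdf[cols].ge(20).all(axis=1)]

def numpy_filter():
    # Same mask built on the underlying NumPy array
    return xdf[(xdf[cols].to_numpy() >= 20).all(axis=1)]

# Both approaches must select exactly the same rows
assert pandas_filter().equals(numpy_filter())

for fn in (pandas_filter, numpy_filter):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```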

If you do not need the result as a DataFrame, this can be even faster (it returns a plain NumPy array):

xdf.values[(xdf[['A','B','C','D']].values >= 20).all(axis=1)]
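Note that indexing the array this way returns a plain 2-D NumPy array, so the column names and the original row index are lost. If you need them later, one option (a sketch, not part of the answer above; the sample values are made up) is to keep the mask and rebuild a DataFrame from the filtered array:

```python
import numpy as np
import pandas as pd

xdf = pd.DataFrame({'A': [-9.27, 35.20, 25.00],
                    'B': [42.10, 41.97, 30.00],
                    'C': [91.13, 32.58, 10.00],
                    'D': [148.35, 100.52, 50.00]})
cols = ['A', 'B', 'C', 'D']

values = xdf[cols].to_numpy()
mask = (values >= 20).all(axis=1)  # one boolean per row

filtered = values[mask]  # plain ndarray: no labels
# Rebuild with the surviving index labels and the column names
result = pd.DataFrame(filtered, index=xdf.index[mask], columns=cols)
print(result)
```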

answered Jan 26, 2022 at 23:13

Z Li


Daniel Seger · Accepted Answer · 2022-01-26 23:11:16Z

1

NumPy arrays carry less overhead than pandas DataFrames, so purely numerical filtering is often faster in NumPy. Try this:

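import numpy as np

# One column per variable (A, B, C, D), one row per observation
a = np.column_stack([np.random.uniform(low=-50, high=53.3, size=5),
                     np.random.uniform(low=10, high=130, size=5),
                     np.random.uniform(low=-50, high=130, size=5),
                     np.random.uniform(low=-100, high=200, size=5)])

# Keep only rows where every value is at least 20
print(a[np.all(a >= 20, axis=1)])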
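A point worth checking with the array version: `np.array([col_a, col_b, ...])` puts each variable in its own row, so `axis=1` would test each variable across all observations rather than each observation across all variables. Stacking the columns side by side (for example with `np.column_stack`, or by transposing with `.T`) gives the row-per-observation layout that matches the DataFrame. A small deterministic sketch with made-up values:

```python
import numpy as np

col_a = np.array([-9.27, 35.20, 25.00])
col_b = np.array([42.10, 41.97, 30.00])

stacked_rows = np.array([col_a, col_b])         # shape (2, 3): one row per variable
stacked_cols = np.column_stack([col_a, col_b])  # shape (3, 2): one row per observation

print(stacked_rows.shape, stacked_cols.shape)

# Row filter on the observation-per-row layout: keep rows where every value >= 20
kept = stacked_cols[np.all(stacked_cols >= 20, axis=1)]
print(kept)

# Transposing the row-per-variable array gives the same layout
assert np.array_equal(stacked_rows.T, stacked_cols)
```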

If you want to stick with pandas, another idea would be:

xdfFiltered = xdf.loc[(xdf["A"] >= 20) & (xdf["B"] >= 20) & (xdf["C"] >= 20) & (xdf["D"] >= 20)]
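With many columns, writing one comparison term per column by hand gets long. The same chained `&` can be built from a list of column names, for instance with `functools.reduce` and `operator.and_`. This is a sketch of that idea with made-up values; for a single fixed threshold, `xdf[cols].ge(20).all(axis=1)` from the question does the same job more directly.

```python
import functools
import operator

import pandas as pd

xdf = pd.DataFrame({'A': [-9.27, 35.20, 25.00],
                    'B': [42.10, 41.97, 30.00],
                    'C': [91.13, 32.58, 10.00],
                    'D': [148.35, 100.52, 50.00]})
cols = ['A', 'B', 'C', 'D']

# Equivalent to (xdf["A"] >= 20) & (xdf["B"] >= 20) & ... for every column in cols
mask = functools.reduce(operator.and_, (xdf[c] >= 20 for c in cols))
xdfFiltered = xdf.loc[mask]

assert xdfFiltered.equals(xdf[xdf[cols].ge(20).all(axis=1)])
print(xdfFiltered)
```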

answered Jan 26, 2022 at 23:11

Daniel Seger


Derek O · Accepted Answer · 2022-01-26 23:17:15Z

1

You can use np.greater_equal, the numpy equivalent of .ge, instead:

xdf.loc[np.greater_equal(xdf, 20).all(axis=1)]
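In recent pandas versions, NumPy comparison ufuncs applied to a DataFrame are dispatched back to pandas, so the result is a labeled boolean DataFrame and `.all(axis=1)` yields a Series aligned with `xdf.index`, which is what `.loc` expects. Also, `np.greater` corresponds to `.gt` (strictly greater) while `np.greater_equal` corresponds to `.ge`; the difference only matters for values exactly equal to 20. A quick check with made-up values:

```python
import numpy as np
import pandas as pd

xdf = pd.DataFrame({'A': [20.0, 35.20, 25.00],
                    'B': [42.10, 41.97, 30.00]})

ge_mask = np.greater_equal(xdf, 20)
print(type(ge_mask))  # a DataFrame, not a bare ndarray

assert ge_mask.equals(xdf.ge(20))               # greater_equal <-> .ge
assert np.greater(xdf, 20).equals(xdf.gt(20))   # greater <-> .gt

# Row 0 holds exactly 20.0: kept by greater_equal, dropped by greater
print(xdf.loc[np.greater_equal(xdf, 20).all(axis=1)].index.tolist())  # [0, 1, 2]
print(xdf.loc[np.greater(xdf, 20).all(axis=1)].index.tolist())        # [1, 2]
```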

edited Jan 26, 2022 at 23:17

answered Jan 26, 2022 at 23:03

Derek O
