Pandas: replace list of values from list of columns

Question

I have a dataframe with many rows and columns, in which various 'placeholder' values need substituting (in a subset of columns). I've read many examples on the forum that use nested lists or dictionaries, but haven't had any luck with variations on them.

import numpy as np
import pandas as pd

# A test dataframe
df = pd.DataFrame({'Sample':['alpha','beta','gamma','delta','epsilon'],
                  'element1':[1,-0.01,-5000,1,-2000], 
                  'element2':[1,1,1,-5000,2], 
                  'element3':[-5000,1,1,-0.02,2]})

# List of headings containing values to replace
headings = ['element1', 'element2', 'element3']

And I am trying to do something like this (obviously this doesn't work):

 # If any rows have value <-1, NaN
 df[headings].replace(df[headings < -1], np.nan)

 # If a value is between -1 and 0, make a replacement
 df[headings].replace(df[headings < 0 & headings > -1], 0.05)

So, is there possibly a better way to accomplish this using loops or fancy pandas tricks?

akuiper · Accepted Answer · 2016-07-11 00:16:16Z

4

You can set the Sample column as the index and then replace values across the whole data frame based on conditions:

df = df.set_index('Sample')
df[df < -1] = np.nan
df[(df < 0) & (df > -1)] = 0.05

Which gives:

#           element1    element2    element3
#  Sample           
#   alpha       1.00        1.0          NaN
#    beta       0.05        1.0         1.00
#   gamma        NaN        1.0         1.00
#   delta       1.00        NaN         0.05
# epsilon        NaN        2.0         2.00
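The same replacement can also be written without modifying the frame in place. The following is a minimal sketch, assuming the test dataframe from the question, that chains `DataFrame.mask` calls instead of using boolean assignment; the name `result` is only illustrative and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Sample': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
                   'element1': [1, -0.01, -5000, 1, -2000],
                   'element2': [1, 1, 1, -5000, 2],
                   'element3': [-5000, 1, 1, -0.02, 2]})

# mask() returns a new frame, so the original df is left untouched.
result = (
    df.set_index('Sample')
      .mask(lambda d: d < -1)                    # values below -1 become NaN
      .mask(lambda d: (d > -1) & (d < 0), 0.05)  # values in (-1, 0) become 0.05
)
print(result)
```

Because comparisons against NaN evaluate to False, the second `mask` call leaves the NaN values produced by the first one alone.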

edited Jul 11, 2016 at 0:16

answered Jul 10, 2016 at 5:23

akuiper


6 Comments

Shawn Over a year ago

Thanks for that lead @Psidom! The complexity is that I have dozens of columns, and only about two-thirds need the function.

Shawn Over a year ago

Would something like this be the best way? df[df.loc[:,(headings)] < -1] = np.nan

akuiper Over a year ago

That's a good catch. I didn't even think that would work, since the boolean index has a different shape from the original data frame. But evidently it does.

Shawn Over a year ago

Unfortunately, it only works when replacing values with np.nan. An error is produced when trying df[df.loc[:,(headings)] < -1] = 0.05: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

akuiper Over a year ago

If your data is not very large, an alternative is to copy the columns you would like to modify, apply the replacements to that copy using the syntax in the answer, and then assign the result back to the data frame. In detail: df1 = df.loc[:, headings]; df1[df1 < -1] = np.nan; df1[(df1 < 0) & (df1 > -1)] = 0.05; df.loc[:, headings] = df1.


akuiper · Accepted Answer · 2016-07-11 00:57:35Z

2

Here is the successful answer as suggested by @Psidom.

The solution involves taking a slice out of the dataframe, applying the replacements to it, and then reincorporating the amended slice:

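# Work on an explicit copy of the selected columns
df1 = df.loc[:, headings].copy()
df1[df1 < -1] = np.nan
# Values below -1 are already NaN, so only values in [-1, 0) match here
df1[df1 < 0] = 0.05
df.loc[:, headings] = df1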
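For the case raised in the comments, where only some columns should be changed and the frame also holds non-numeric columns, the slice-and-reassign idea can be sketched with `DataFrame.mask` as well. This is a hedged illustration built on the question's test data, not the method from either answer; `sub` and `cleaned` are hypothetical names introduced for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Sample': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
                   'element1': [1, -0.01, -5000, 1, -2000],
                   'element2': [1, 1, 1, -5000, 2],
                   'element3': [-5000, 1, 1, -0.02, 2]})
headings = ['element1', 'element2', 'element3']

# Select only the numeric columns that need cleaning.
sub = df[headings]

# Both conditions are computed from the original values in `sub`.
cleaned = sub.mask(sub < -1).mask((sub > -1) & (sub < 0), 0.05)

# Column assignment replaces the selected columns; 'Sample' is not touched.
df[headings] = cleaned
print(df)
```

Assigning with `df[headings] = ...` replaces the selected columns outright, so an integer column that gains a NaN simply comes back as a float column.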

edited Jul 11, 2016 at 0:57

akuiper


answered Jul 11, 2016 at 0:30

Shawn
