3

I have a dataframe with many rows and many columns, in which several different 'placeholder' values need to be substituted (in a subset of columns only). I've read many examples on the forum that use nested lists or dictionaries, but haven't had any luck with my variations.

# A test dataframe
import numpy as np
import pandas as pd

df = pd.DataFrame({'Sample':['alpha','beta','gamma','delta','epsilon'],
                   'element1':[1,-0.01,-5000,1,-2000],
                   'element2':[1,1,1,-5000,2],
                   'element3':[-5000,1,1,-0.02,2]})

# List of headings containing values to replace
headings = ['element1', 'element2', 'element3']

And I am trying to do something like this (obviously this doesn't work):

 # If any rows have value <-1, NaN
 df[headings].replace(df[headings < -1], np.nan)

 # If a value is between -1 and 0, make a replacement
 df[headings].replace(df[headings < 0 & headings > -1], 0.05)

So, is there possibly a better way to accomplish this using loops or fancy pandas tricks?

2 Answers

4

You can set the Sample column as index and then replace values on the whole data frame based on conditions:

df = df.set_index('Sample')
df[df < -1] = np.nan
df[(df < 0) & (df > -1)] = 0.05

Which gives:

#           element1    element2    element3
#  Sample           
#   alpha       1.00        1.0          NaN
#    beta       0.05        1.0         1.00
#   gamma        NaN        1.0         1.00
#   delta       1.00        NaN         0.05
# epsilon        NaN        2.0         2.00
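
As a side note on the syntax used above, here is a minimal sketch of why each comparison is wrapped in parentheses. In Python, `&` binds more tightly than `<` and `>`, so an unparenthesised `df < 0 & df > -1` is parsed as a chained comparison and fails instead of building a mask. The two-column frame and the name `between` are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Small illustrative frame (a subset of the question's test data)
df = pd.DataFrame({'element1': [1, -0.01, -5000, 1, -2000],
                   'element3': [-5000, 1, 1, -0.02, 2]})

# Each comparison yields a boolean frame; & then combines them element-wise.
between = (df < 0) & (df > -1)

# Count the cells that lie strictly between -1 and 0
print(between.sum().sum())  # 2
```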

6 Comments

Thanks for that lead @Psidom! The complexity is that I have dozens of columns, and only about two-thirds need the function.
Would something like this be the best way? df[df.loc[:,(headings)] < -1] = np.nan
That's a good catch. I didn't even think that would work, since the boolean mask has different dimensions from the original data frame. But evidently it does.
Unfortunately, it only works when replacing values with np.nan. An error is produced when trying df[df.loc[:,(headings)] < -1] = 0.05: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value
If your data is not very large, an alternative is to copy the columns you want to modify, apply the replacements to that copy using the syntax in the answer, and then assign the result back to the data frame. In detail: df1 = df.loc[:, headings]; df1[df1 < -1] = np.nan; df1[(df1 < 0) & (df1 > -1)] = 0.05; df.loc[:, headings] = df1.
2

Here is the successful answer as suggested by @Psidom.

The solution involves taking a slice out of the dataframe, applying the replacements to it, then writing the amended slice back:

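df1 = df.loc[:, headings].copy()   # work on a copy of the selected columns
df1[df1 < -1] = np.nan             # values below -1 become NaN
df1[(df1 < 0)] = 0.05              # remaining negatives (from -1 up to 0) become 0.05
df.loc[:, headings] = df1          # write the amended columns back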
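
For comparison, a minimal sketch of the same idea written with `DataFrame.mask`, which returns a new frame instead of assigning through a boolean mask in place. This is a swapped-in alternative, not the method from the answers; the name `subset` is illustrative, and the test data is the one from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Sample': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
                   'element1': [1, -0.01, -5000, 1, -2000],
                   'element2': [1, 1, 1, -5000, 2],
                   'element3': [-5000, 1, 1, -0.02, 2]})
headings = ['element1', 'element2', 'element3']

# Select only the numeric columns that need the replacements
subset = df[headings]

# mask() replaces values where the condition is True and returns a new frame
subset = subset.mask(subset < -1, np.nan)
subset = subset.mask((subset > -1) & (subset < 0), 0.05)

# Put the amended columns back; 'Sample' is left untouched
df[headings] = subset
print(df)
```

Because the string column `Sample` is never part of the masked frame, the mixed-types error mentioned in the comments should not come up with this approach.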
