This may be a naive question, but I could not resolve it after reading several articles. One column in my dataset contains numeric, blank, and null values. The data looks like this:

fund_value

Null
123
-10

I wrote a function to handle it, but it does not work and keeps raising this error:

def values(x):
    if x:
        if int(x) > 0:
            return 'Positive'
        elif int(x) < 0:
            return 'Negative'
    else:
        return 'Zero'

df2 = pd.read_csv('/home/siddhesh/Downloads/s2s_results.csv')  # Assuming it as query results
df2 = df2.astype(str)
df2['fund_value'] = df2.fund_value.apply(values)

Error:

Traceback (most recent call last):
  File "/home/../Downloads/pyspark/src/sample/actual_dataset_testin.py", line 31, in <module>
    df2['fund_value'] = df2.fund_value.apply(values)
  File "/home/../.local/lib/python3.8/site-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/home/../.local/lib/python3.8/site-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/home/../.local/lib/python3.8/site-packages/pandas/core/apply.py", line 1099, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "/home/../Downloads/pyspark/src/sample/actual_dataset_testin.py", line 16, in values
    if int(x) > 0:
ValueError: invalid literal for int() with base 10: 'nan'  

I also tried the checks x == "" and not x, but neither worked.

Expected Output:

fund_value  
Zero
Positive
Negative
  • Please share a sample input dataframe with expected output for better understanding. Commented Feb 16, 2022 at 14:09
  • I have added input and the expected output. Commented Feb 16, 2022 at 14:16
  • Is Null a string or a proper Nan? Commented Feb 16, 2022 at 14:18
  • a proper Nan value Commented Feb 16, 2022 at 14:21
  • How is the dataframe being constructed? There are built in functions in pandas that can deal with missing data Commented Feb 16, 2022 at 14:22

3 Answers

Considering df to be:

In [1278]: df = pd.DataFrame({'fund_value': [np.nan, 123, '', 10]})

In [1279]: df
Out[1279]: 
  fund_value
0        NaN
1        123
2           
3         10

Use numpy.select with pd.to_numeric:

In [1246]: import numpy as np

In [1283]: df['fund_value'] = pd.to_numeric(df.fund_value, errors='coerce')

In [1247]: conds = [df.fund_value.gt(0), df.fund_value.lt(0)]
In [1250]: choices = ['Positive', 'Negative']

In [1261]: df['fund_value'] = np.select(conds, choices, default='Zero')

In [1288]: df
Out[1288]: 
  fund_value
0       Zero
1   Positive
2       Zero
3   Positive
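
For reference, the steps above can be combined into one self-contained script. This is only a minimal sketch: the sample data is hypothetical (it swaps in a negative value to match the question's input), and the intermediate name numeric is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: a NaN, a blank, and signed numbers.
df = pd.DataFrame({'fund_value': [np.nan, 123, '', -10]})

# Entries that cannot be parsed as numbers (such as '') become NaN instead of raising.
numeric = pd.to_numeric(df['fund_value'], errors='coerce')

# Comparisons against NaN are False, so NaN rows fall through to the default label.
conds = [numeric.gt(0), numeric.lt(0)]
choices = ['Positive', 'Negative']
df['fund_value'] = np.select(conds, choices, default='Zero')

print(df['fund_value'].tolist())  # ['Zero', 'Positive', 'Zero', 'Negative']
```

Keeping the numeric version in a separate variable means the original column is overwritten only once, with the final labels.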

9 Comments

I'm not sure this tackles the problem. You are implicitly solving it by using float (which is great), but the original question was trying to make a decision based on an int cast. That is the true problem here, less so the selection of the condition.
@KingOtto I concentrated on the sample input and expected output. In that respect I think it's correct.
Sure, it is a great solution and fixes the problem! It's just that the next time someone tries int(np.nan), we are back to square one.
I want to replace the fund_value column itself. I don't want to create any flags column for it.
@whatsinthename Please check my answer now.

The problem is that int has no NaN support: a NaN value cannot be converted to int.

Your solution: Read the values as float, which has native NaN support, then fill the missing values using DataFrame.fillna() (e.g., with 0), or remove them.

Background: Casting the column to str first and then converting it back to int in your check function, which raises the error on 'nan', looks like a workaround. Here is what causes the problem: reading directly as int won't work, because int does not understand NaN (the pandas nullable integer dtype, Int with a capital I, does).

Example: Assume you have a 'dirty int' input that includes NaN, like this:

df = pd.DataFrame({'fund_value': [None, 123, 10]})
   fund_value
0         NaN
1       123.0
2        10.0

Pandas will do you the courtesy of converting this to float, given all values are numeric, and fill the "gaps" (None or np.nan) with NaN. You get something on screen, but it is in fact a column of float, not int.

Option 1: How to "convert" NaN values to the integer 0 (for your case distinction between 'positive' and 'negative'):

df.fillna(0).astype('int')

Option 2: Directly cast a column with NaN values to the nullable Int dtype:

df.astype('Int32')

You can then work with either result, both of which truly contain integers (Option 1 assumes every NaN is 0; Option 2 keeps true <NA> values instead of floats).
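
To make the two options concrete, here is a small sketch under stated assumptions: the sample values are hypothetical, the names filled and nullable are introduced only for illustration, and 'Int64' is used where the answer uses 'Int32' (both are pandas nullable integer dtypes).

```python
import numpy as np
import pandas as pd

# int() cannot represent NaN, whether it arrives as a float or as the string 'nan'.
for bad in (np.nan, 'nan'):
    try:
        int(bad)
    except ValueError as exc:
        print(type(exc).__name__, exc)

# Hypothetical 'dirty int' input; pandas stores it as float64 because of the None.
df = pd.DataFrame({'fund_value': [None, 123, -10]})

filled = df['fund_value'].fillna(0).astype('int64')  # Option 1: NaN treated as 0
nullable = df['fund_value'].astype('Int64')          # Option 2: NaN kept as <NA>

print(filled.tolist())    # [0, 123, -10]
print(nullable.dtype)     # Int64
```

Option 1 loses the distinction between a missing value and a real 0, which is acceptable here only because the expected output labels both as 'Zero'.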

2 Comments

Can you help with the snippet? Little bit confusing
Type int(np.nan) into your console and see what happens... You cannot ever cast an NaN value into int. To fill NA values, see updated answer

Your df2.astype(str) turns everything into strings. When values(...) is applied to the column, the first if-check is False only for an empty string, which is not what str(np.nan) produces: converting np.nan to a string gives the non-empty string 'nan'.

'nan' therefore passes the first if-check, and in the second if-check int('nan') fails, so Python raises the ValueError.

To take care of that, convert the column to float first and then map the comparisons to labels:
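
The behaviour described above can be checked directly, and the same observation suggests one possible row-wise fix. The sketch below is an assumption-laden illustration, not the asker's code: classify is a hypothetical replacement for values(), and treating unparseable text as 'Zero' is a choice inferred from the expected output.

```python
import numpy as np

# str(np.nan) is the non-empty string 'nan', so it is truthy, unlike ''.
assert str(np.nan) == 'nan'
assert bool('nan') and not bool('')


def classify(x):
    """Hypothetical replacement for values(): label a cell by its sign."""
    try:
        number = float(x)  # float() accepts 'nan'; int() does not
    except (TypeError, ValueError):
        return 'Zero'      # blanks, None, and text such as 'Null'
    if number > 0:
        return 'Positive'
    if number < 0:
        return 'Negative'
    return 'Zero'          # 0 and NaN (NaN fails both comparisons)


labels = [classify(v) for v in ['nan', '123', '-10', '', None]]
print(labels)  # ['Zero', 'Positive', 'Negative', 'Zero', 'Zero']
```

A function like this could be passed to Series.apply in place of values, although a vectorized approach is usually faster on large columns.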

x = df['fund_value'].replace('', np.nan).astype(float)

(x > 0).map({True: 'Positive'}).fillna( (x < 0).map({True: 'Negative'}) ).fillna('Zero')
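
Put together as a runnable script, the two lines above might look like the following sketch. The sample column is hypothetical and imitates what the question's astype(str) produces; assigning the result back to df['fund_value'] follows the suggestion in the comments.

```python
import numpy as np
import pandas as pd

# Hypothetical column as it looks after astype(str): NaN has become the text 'nan'.
df = pd.DataFrame({'fund_value': ['nan', '123', '', '-10']})

# Blank strings become NaN, then everything is parsed as float ('nan' parses to NaN).
x = df['fund_value'].replace('', np.nan).astype(float)

# map() leaves False entries as NaN, so each fillna() supplies the next label in turn.
positive = (x > 0).map({True: 'Positive'})
negative = (x < 0).map({True: 'Negative'})
df['fund_value'] = positive.fillna(negative).fillna('Zero')

print(df['fund_value'].tolist())  # ['Zero', 'Positive', 'Zero', 'Negative']
```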

3 Comments

Thank you again, Raymond :) Just one question, do I need to assign x to some variable?
x is a temporary variable to keep the float version of fund_value because I am going to use it 2 times in my second line of code. Btw, you may just assign the outcome of the second line back to df['fund_value'] or to the dataframe but a different column name.
Cool. Understood :)
