21

How can I preserve nulls after converting to string? I'm working with social security numbers, where it's necessary to go back and forth between float and string.

import pandas as pd
import numpy as np    
x = pd.Series([np.nan, 123., np.nan, 456.], dtype = float)
x.isnull()

...Has nulls

y = x.astype(str)
y.isnull()

...No nulls

So ideally x.isnull() and y.isnull() would be the same.

I think it's dangerous to use a Series of mixed dtypes, but I think this is the best solution for the time being:

z = y.copy()
z[z == 'nan'] = np.nan
z.isnull() # works as desired
type(z[0]) # but has floats for nulls
type(z[1]) # and strings for values

6 Answers

19

You can also use the "string" dtype instead of str in pandas >= 1.0:

y = x.astype("string")

should preserve the NaNs.

It's described in the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html
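A minimal sketch of the round trip with the "string" dtype, assuming pandas >= 1.0 as stated above. The variable names `same_nulls` and `back` are illustrative only; missing values show up as `<NA>` in the string Series and come back as NaN after casting to float.

```python
import numpy as np
import pandas as pd

x = pd.Series([np.nan, 123.0, np.nan, 456.0], dtype=float)

# Cast to the nullable "string" dtype; missing values stay missing (<NA>)
y = x.astype("string")

# The null mask of the string Series matches the original float Series
same_nulls = y.isna().equals(x.isna())

# Cast back to float; <NA> becomes NaN again
back = y.astype(float)
```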


3 Comments

This one should be the top answer as this is the easiest and most "pandas way of solving"
The problem I run into with this solution is that all the null values are turned into "<NA>" when I later try to save it to the database.
So now I have object (str) columns with None values and object (str) columns with <NA> (<class 'pandas._libs.missing.NAType'>) values. Why couldn't they just accept None as the only NA/NaN/NaI/NaD value? I think the top-rated answer is better in this regard, since it allows you to set the desired "None" value yourself.
16

I encountered this problem too, but for DataFrames. A method which works on both pandas Series and DataFrame is to make use of mask():

data = pd.Series([np.nan, 10, 30, np.nan]) # Also works for pd.DataFrame; np.nan is used because the np.NaN alias was removed in NumPy 2.0
null_cells = data.isnull()
data = data.astype(str).mask(null_cells, np.nan)
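The same pattern on a DataFrame might look like the sketch below. The column names `ssn` and `zip` and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with missing values in different cells
df = pd.DataFrame({"ssn": [np.nan, 123.0], "zip": [10.0, np.nan]})

# Remember where the nulls are before converting
null_cells = df.isnull()

# Convert every cell to a string, then put NaN back where it was
df_str = df.astype(str).mask(null_cells, np.nan)
```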

1 Comment

I love this solution. Worked well for my data set with mixed data types. Thank you.
6

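You can cast to string, conditional on not being null.

x = x.astype(object)  # let the Series hold strings; recent pandas versions warn about or reject writing strings into a float Series
x[x.notnull()] = x.astype(str)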

x
Out[32]:
0      NaN
1    123.0
2      NaN
3    456.0
dtype: object

x.values
Out[33]: array([nan, '123.0', nan, '456.0'], dtype=object)

x.isnull()
Out[34]:
0     True
1    False
2     True
3    False
dtype: bool

1 Comment

This won't work if you are converting categorical (int) to categorical (str). This will: x[c.notnull()] = x[c.notnull()].astype(str)
1

If you convert np.nan to str, it becomes the string 'nan' which will be treated by isnull like every other string.

Regarding your edit: after converting to str values, you need to define which strings count as "null" for your purposes. One way to do so might be:

y.isin(['nan', '0', '']) # list contains whatever you want to be evaluated as null

This would at least give you the desired result.
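As a rough sketch of that idea, here is a Series that already contains strings, with a hand-picked list of markers. The list `null_markers` is an assumption; choose whatever strings mean "missing" in your data.

```python
import pandas as pd

# A Series that already holds strings, including the literal text 'nan'
y = pd.Series(["nan", "123.0", "", "456.0"])

# Hypothetical list of strings to treat as null
null_markers = ["nan", ""]

# Custom null mask based on the marker list
is_null = y.isin(null_markers)

# The built-in check sees only ordinary strings, so it flags nothing
builtin = y.isnull()
```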

5 Comments

I don't think I asked this clearly enough. I want to preserve nulls, instead of converting them into 'nan' strings. So ideally y.isnull() would be the same as x.isnull()
I think you cannot use isnull for that purpose; however, you could write your own method that looks for all the string values you consider to be null
Got it. It seems sketchy to have a Series of object dtype that's character for everything except the nulls, but I'm thinking that's the best option here
It's good but I'm more comfortable using pre-built pandas methods wherever possible. As long as I'm careful setting everything up, having mixed dtypes is something I can forget about downstream in my work, but having different ways of finding NAs will be arduous IMO
1

Use Series.where to convert only the non-null values to str:

y = x.where(x.isnull(), x.astype(str))
y.isnull()
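A small sketch of the full round trip with this approach, assuming the values are plain floats as in the question. The name `back` is illustrative.

```python
import numpy as np
import pandas as pd

x = pd.Series([np.nan, 123.0, np.nan, 456.0], dtype=float)

# Keep the original value where it is null, use the string form elsewhere
y = x.where(x.isnull(), x.astype(str))

# Going back to float: the strings parse as numbers and NaN stays NaN
back = y.astype(float)
```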

0

For some reason, np.nan is converted to the string 'nan' when you convert a Series using Series.astype(str), but not when creating a new Series with dtype=str. So the following would work:

x_str = pd.Series([np.nan, 123., np.nan, 456.], dtype = str)
x_str.isnull() # Has nulls as expected

Knowing this, we can use the Series constructor to convert an existing series to string while preserving null values:

x = pd.Series([np.nan, 123., np.nan, 456.], dtype = float)
x.isnull() 
y1 = pd.Series(x.array, dtype=str)
y1.isnull() # Has nulls as expected

Just be aware that for this to work, you need to pass an array or list to the Series constructor (which, in the current example, means passing x.array or x.values). If you pass a Series, the null values will be converted as if you had called astype():

y2 = pd.Series(x, dtype=str)  # x is a series
y2.isnull()  # Nulls converted to 'nan'
