Python pandas series: convert float to string, preserving nulls

Question

How can I preserve nulls after converting to string? I'm working with social security numbers, where it's necessary to go back and forth between float and string.

import pandas as pd
import numpy as np    
x = pd.Series([np.nan, 123., np.nan, 456.], dtype = float)
x.isnull()

...Has nulls

y = x.astype(str)
y.isnull()

...No nulls

So ideally x.isnull() and y.isnull() would be the same.

I think it's dangerous to use a Series of mixed dtypes, but I'm thinking this is the best solution for the time being:

z = y.copy()
z[z == 'nan'] = np.nan
z.isnull() # works as desired
type(z[0]) # but has floats for nulls
type(z[1]) # and strings for values

janmonko · Accepted Answer · 2021-08-24 16:18:13Z

19

You can also use the "string" dtype instead of str in pandas >= 1.0:

y = x.astype("string")

This should preserve the nulls (they are shown as <NA>).

It's described in the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html
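A minimal sketch of the round trip, assuming pandas >= 1.0 as stated above. The last step is only one possible way to hand real None values to a database driver; that is an assumption about what the driver expects, and the names same_mask, back and for_db are illustrative.

```python
import numpy as np
import pandas as pd

x = pd.Series([np.nan, 123.0, np.nan, 456.0], dtype=float)

# Nullable string dtype: missing entries become pd.NA rather than the text 'nan'.
y = x.astype("string")

# The null mask survives the conversion.
same_mask = y.isnull().equals(x.isnull())

# Back to float: pd.NA turns into NaN again.
back = y.astype("float64")

# Hypothetical export step: swap missing entries for None so that
# "<NA>" is never written out as text.
for_db = [None if missing else value for value, missing in zip(y, y.isnull())]
print(same_mask, for_db)
```

The list comprehension avoids relying on how a particular driver serializes pd.NA.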

answered Aug 24, 2021 at 16:18

janmonko



3 Comments

Sam Over a year ago

This one should be the top answer as this is the easiest and most "pandas way of solving"

Chris Over a year ago

The problem I run into with this solution is that all the null values are turned into "<NA>" when I later try to save it to the database.

Siete Over a year ago

So now I have object (str) columns with None values and object (Str) columns with <Na> (<class 'pandas._libs.missing.NAType'>) values, why couldn't they just accept None as the only NA/NaN/NaI/NaD value... I think the top rated answer is better in this regard, since it allows you to set the desired "None" value yourself.

Greg Dooley · Accepted Answer · 2020-03-20 17:41:52Z

16

I encountered this problem too, but for DataFrames. A method which works on both pandas Series and DataFrame is to make use of mask():

data = pd.Series([np.nan, 10, 30, np.nan]) # Also works for pd.DataFrame
null_cells = data.isnull()
# np.nan rather than np.NaN: the NaN alias was removed in NumPy 2.0
data = data.astype(str).mask(null_cells, np.nan)
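As a sketch of the DataFrame case mentioned above, the same two steps can be applied to a whole frame. The column names ssn and zip are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with nulls in different rows of each column.
df = pd.DataFrame({
    "ssn": [np.nan, 123.0, 456.0],
    "zip": [10001.0, np.nan, 94105.0],
})

# Remember where the nulls are before casting.
null_cells = df.isnull()

# Cast every cell to str, then put NaN back wherever the original was null.
as_text = df.astype(str).mask(null_cells, np.nan)

print(as_text)
print(as_text.isnull().equals(null_cells))
```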

answered Mar 20, 2020 at 17:41

Greg Dooley


1 Comment

Aaron England Over a year ago

I love this solution. Worked well for my data set with mixed data types. Thank you.

chrisb · Accepted Answer · 2017-03-30 21:03:21Z

6

You can cast to string, conditional on not being null.

x[x.notnull()] = x.astype(str)

x
Out[32]:
0      NaN
1    123.0
2      NaN
3    456.0
dtype: object

x.values
Out[33]: array([nan, '123.0', nan, '456.0'], dtype=object)

x.isnull()
Out[34]:
0     True
1    False
2     True
3    False
dtype: bool
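Recent pandas releases may warn about, or reject, assigning strings into a float64 Series in place. A hedged variant of the same idea is to cast to object first and then fill in the strings only where the value is present. The names present and out are illustrative.

```python
import numpy as np
import pandas as pd

x = pd.Series([np.nan, 123.0, np.nan, 456.0], dtype=float)

# Rows that actually hold a value.
present = x.notnull()

# Work on an object-dtype copy so that it can hold both NaN and str.
out = x.astype(object)

# Overwrite only the non-null rows with their string form.
out[present] = x[present].astype(str)

print(out.tolist())
print(out.isnull().equals(x.isnull()))
```

The result is still a mixed object Series (NaN for nulls, str for values), just as in the output above.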

answered Mar 30, 2017 at 21:03

chrisb


1 Comment

drnk Over a year ago

This won't work if you are converting categorical (int) to categorical (str). This will: x[c.notnull()] = x[c.notnull()].astype(str)

Arco Bast · Accepted Answer · 2017-03-30 19:27:22Z

1

If you convert np.nan to str, it becomes the string 'nan' which will be treated by isnull like every other string.

Regarding your edit: after converting to str values, you need to define which strings count as "null" in your opinion. One way to do so might be:

y.isin(['nan', '0', '']) # list contains whatever you want to be evaluated as null

This would at least give you the desired result.
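A small sketch of how that could be wrapped up for reuse. NULL_STRINGS and is_null_string are hypothetical names, and the list of markers is only an example of "whatever you want to be evaluated as null".

```python
import pandas as pd

# Strings that should count as null; adjust to taste (assumption).
NULL_STRINGS = ["nan", "", "<NA>"]

def is_null_string(s: pd.Series) -> pd.Series:
    """Return True for real nulls and for strings listed in NULL_STRINGS."""
    return s.isnull() | s.isin(NULL_STRINGS)

# A Series that has already been converted to strings.
y = pd.Series(["nan", "123.0", "", "456.0"], dtype=object)

print(is_null_string(y).tolist())
```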

edited Mar 30, 2017 at 19:27

answered Mar 30, 2017 at 18:47

Arco Bast


5 Comments

mef jons Over a year ago

I don't think I asked this clearly enough. I want to preserve nulls, instead of converting them into 'nan' strings. So ideally y.isnull() would be the same as x.isnull()

Arco Bast Over a year ago

I think you cannot use isnull for that purpose. However, you could write your own method that looks for all the string values you consider to be null.

mef jons Over a year ago

Got it. It seems sketchy to have a Series of object dtype that's character for everything except the nulls, but I'm thinking that's the best option here

mef jons Over a year ago

It's good but I'm more comfortable using pre-built pandas methods wherever possible. As long as I'm careful setting everything up, having mixed dtypes is something I can forget about downstream in my work, but having different ways of finding NAs will be arduous IMO

mef jons Over a year ago

Let us continue this discussion in chat.

JohnLazar · Accepted Answer · 2018-04-16 10:50:36Z

1

Use Series.where to convert only the non-null values to str:

y = x.where(x.isnull(), x.astype(str))
y.isnull()
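One caveat that applies here and to the other astype(str) approaches: the strings keep the float formatting, so 123.0 becomes '123.0'. If the values are whole numbers, as with the social security numbers in the question, going through the nullable "Int64" dtype first is one possible way to drop the '.0'. This is a sketch under that assumption, not part of the answer above.

```python
import numpy as np
import pandas as pd

x = pd.Series([np.nan, 123.0, np.nan, 456.0], dtype=float)

# Float -> nullable integer: NaN becomes pd.NA, 123.0 becomes 123.
# This only works when every non-null value is a whole number.
as_int = x.astype("Int64")

# Nullable integer -> nullable string: pd.NA is kept, digits lose the '.0'.
s = as_int.astype("string")

print(s.tolist())
print(s.isnull().equals(x.isnull()))
```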

answered Apr 16, 2018 at 10:50

JohnLazar


capelastegui · Accepted Answer · 2019-06-28 11:19:28Z

For some reason, np.nan is converted to the string 'nan' when you convert a Series using Series.astype(str), but not when creating a new Series with dtype=str. So the following would work:

x_str = pd.Series([np.nan, 123., np.nan, 456.], dtype = str)
x_str.isnull() # Has nulls as expected

Knowing this, we can use the Series constructor to convert an existing series to string while preserving null values:

x = pd.Series([np.nan, 123., np.nan, 456.], dtype = float)
x.isnull() 
y1 = pd.Series(x.array, dtype=str)
y1.isnull() # Has nulls as expected

Just be aware that for this to work, you need to pass an array or list to the Series constructor (which, in the current example, means using x.array or x.values). If you pass a Series, the null values will be converted as if you had called astype():

y2 = pd.Series(x, dtype=str)  # x is a series
y2.isnull()  # Nulls converted to 'nan'
