Pandas - extract numeric values from string column using replace + regex

Question

I have a DataFrame column that mixes plain numbers with value ranges. Example below:

dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])

I am trying to clean up this column by producing a new column in which each value range is replaced by its average. Expected result:

clean_col = pd.Series([5, 6, 1.5, 50, 10])

I am trying to map this using regex in vectorized mapping functions, something like:

clean_col = pd.Series([5, 6, '1-2', '40-60', 10]).replace({'^[0-9]-[0-9]$':--average here--},regex=True)

But I am stuck here. How could I get the expected result above USING a mapping dictionary and regular expressions? I am aware I could work directly on the dataframe by splitting the text on '-' and then averaging, but I already have many other cleaning mappings inside the dictionary above, so it would be more convenient and cleaner to keep using the same dictionary for all the cleaning.

I think the solution I am looking for probably uses lambdas, or an extra function that gets called from inside the dictionary, but I cannot figure out how to do this.

Chris · Accepted Answer · 2020-12-11 08:21:49Z

3

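I don't think pandas.Series.replace supports a callable as the replacement value. One possible workaround is to rewrite each range into an arithmetic expression string and evaluate it with pandas.eval:

dirty_col.replace({r'^(\d+)-(\d+)$': r'(\1+\2)/2'}, regex=True).apply(pd.eval)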

Output:

0     5.0
1     6.0
2     1.5
3    50.0
4    10.0
dtype: float64
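To see why this works, it can help to look at the intermediate result. The sketch below splits the one-liner into two steps; the variable names `expressions` and `clean_col` are introduced here for illustration only.

```python
import pandas as pd

dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])

# Step 1: the regex rewrites every "a-b" string into the expression "(a+b)/2".
# Non-string elements (the plain integers) are left untouched.
expressions = dirty_col.replace({r'^(\d+)-(\d+)$': r'(\1+\2)/2'}, regex=True)
print(expressions.tolist())  # [5, 6, '(1+2)/2', '(40+60)/2', 10]

# Step 2: pd.eval evaluates each element, turning the expression strings
# into numbers and passing the plain integers through unchanged.
clean_col = expressions.apply(pd.eval)
print(clean_col)
```

Because the averaging is encoded in the replacement string, the rule can sit in the same dictionary as the other regex mappings, as long as the final `.apply(pd.eval)` runs after all replacements.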


Andy L. · Accepted Answer · 2020-12-11 08:42:28Z

2

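You may try Series.str.replace with a callable as repl, then use fillna to restore the non-string values (which .str.replace turns into NaN):

# Average the two numbers of a matched "a-b" range and return it as a string
f_repr = lambda m: str(sum(map(int, m[0].split('-')))/2)
# regex=True is required for a callable repl (the default is False since pandas 2.0)
s_out = dirty_col.str.replace(r'^[0-9]+-[0-9]+$', f_repr, regex=True).fillna(dirty_col)

Output: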
0       5
1       6
2     1.5
3    50.0
4      10
dtype: object
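The result above has dtype object because the averaged values are strings while the untouched values are still integers. A minimal sketch of one way to get a numeric column, assuming every remaining element is a number or a numeric string, is to pass the result through `pd.to_numeric`. The helper name `average_range` is introduced here for illustration.

```python
import pandas as pd

dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])

def average_range(match):
    """Return the average of a matched 'low-high' range as a string."""
    low, high = match.group(0).split('-')
    return str((int(low) + int(high)) / 2)

# .str.replace yields NaN for non-string elements, so fillna restores them.
s_out = dirty_col.str.replace(
    r'^[0-9]+-[0-9]+$', average_range, regex=True
).fillna(dirty_col)

# Convert the mixed int/str object column into a float column.
clean_col = pd.to_numeric(s_out)
print(clean_col)
```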


1 Comment

Pab Over a year ago

+1 for your input, but as I explain in the post above, I was looking for a dictionary-based solution. Regards.
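For readers who want to keep a single dictionary but also use callables, one option is a small wrapper around `re.sub`, which accepts either a string or a function as the replacement. This is only a sketch, not a pandas feature: `apply_cleaning_map`, `cleaning_map`, `average_range`, and the `'n/a'` rule are all hypothetical names and rules made up for illustration.

```python
import re
import pandas as pd

def average_range(match):
    """Return the average of the two captured numbers as a string."""
    return str((int(match.group(1)) + int(match.group(2))) / 2)

# Hypothetical cleaning dictionary: values may be strings or callables.
cleaning_map = {
    r'^(\d+)-(\d+)$': average_range,  # range -> its average
    r'^n/a$': '0',                    # made-up extra rule, for illustration
}

def apply_cleaning_map(series, mapping):
    """Apply every regex rule in `mapping` to the string elements of `series`."""
    def clean(value):
        if not isinstance(value, str):
            return value  # leave numbers untouched
        for pattern, repl in mapping.items():
            value = re.sub(pattern, repl, value)
        return value
    return series.map(clean)

dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])
clean_col = pd.to_numeric(apply_cleaning_map(dirty_col, cleaning_map))
print(clean_col)
```

This runs element by element in Python rather than through `Series.replace`, so it trades some speed for the ability to mix plain replacements and functions in one mapping.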
