
I have a dataframe with a column with many value ranges. Example below:

import pandas as pd
dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])

I am trying to clean up this column producing a new column with the average of the value ranges. Expected result:

clean_col = pd.Series([5, 6, 1.5, 50, 10])

I am trying to map this using regex in vectorized mapping functions, something like:

clean_col = pd.Series([5, 6, '1-2', '40-60', 10]).replace({r'^[0-9]+-[0-9]+$': --average here--}, regex=True)

But I am stuck here. How could I get the expected result above USING a mapping dictionary and regular expressions? I am aware I could work directly in the dataframe, splitting the text by '-' and then averaging, but I already have many other cleaning mappings inside the above dictionary, so it would be more convenient and cleaner to keep using the same dictionary for all the cleaning.

I think the solution I am looking for probably uses lambdas, or an extra function that gets called from inside the dictionary, but I cannot figure out how to do this.
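For comparison, the split-and-average route mentioned above (the one the question would rather avoid) might look roughly like this. It is a minimal sketch assuming every value is either a plain non-negative number or a single "low-high" range; negative numbers or other formats would need extra handling.

```python
import pandas as pd

dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])

# Turn every value into a string and split on '-':
# 5 -> ['5'], '40-60' -> ['40', '60'].
parts = dirty_col.astype(str).str.split('-')

# Average the pieces in each row (a single piece averages to itself).
clean_col = parts.apply(lambda xs: sum(map(float, xs)) / len(xs))
print(clean_col.tolist())  # [5.0, 6.0, 1.5, 50.0, 10.0]
```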


2 Answers


I don't think pandas.Series.replace supports a callable replacement. One possible way is to use pandas.eval:

dirty_col.replace({r'^(\d+)-(\d+)$': r'(\1+\2)/2'}, regex=True).apply(pd.eval)

Output:

0     5.0
1     6.0
2     1.5
3    50.0
4    10.0
dtype: float64
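To see what pd.eval actually receives, the replace step can be run on its own. This sketch only reproduces that first step: each "a-b" string becomes the text of an arithmetic expression via regex backreferences, while the plain integers are left untouched because regex replacement only applies to string elements.

```python
import pandas as pd

dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])

# Step 1 only: rewrite "a-b" strings into "(a+b)/2" expression strings.
exprs = dirty_col.replace({r'^(\d+)-(\d+)$': r'(\1+\2)/2'}, regex=True)
print(exprs.tolist())  # [5, 6, '(1+2)/2', '(40+60)/2', 10]
```

Since pd.eval then evaluates these strings as expressions, this approach is best kept to trusted data.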


You could try Series.str.replace with a callable as repl, then use fillna to put back the non-string values:

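# Average the two numbers of an "a-b" match and return the result as text
f_repr = lambda m: str(sum(map(int, m[0].split('-'))) / 2)
# A callable repl needs regex=True (the default became False in pandas 2.0)
s_out = dirty_col.str.replace(r'^[0-9]+-[0-9]+$', f_repr, regex=True).fillna(dirty_col)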

Output:
0       5
1       6
2     1.5
3    50.0
4      10
dtype: object
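The result above is an object column mixing ints and strings such as '1.5'. If a numeric column is needed, a final pd.to_numeric call should convert it; this is a sketch of the same answer with that extra step added.

```python
import pandas as pd

dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])

# Average the two numbers of an "a-b" match and return the result as text.
f_repr = lambda m: str(sum(map(int, m[0].split('-'))) / 2)

# Non-strings become NaN under .str, so fillna restores the original ints.
s_out = dirty_col.str.replace(r'^[0-9]+-[0-9]+$', f_repr, regex=True).fillna(dirty_col)

# Convert the mixed ints/strings into a float64 column.
clean_col = pd.to_numeric(s_out)
print(clean_col.tolist())  # [5.0, 6.0, 1.5, 50.0, 10.0]
```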

1 Comment

+1 for your input, but as I explained in the question, I was looking for a dictionary-based solution. Regards.
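A dictionary-based variant seems possible if the dictionary is applied with Python's re.sub instead of Series.replace, because re.sub accepts either a template string or a callable as the replacement. This is a sketch under assumptions, not a pandas feature: CLEANING_MAP and apply_mapping are hypothetical names, and the whitespace rule only stands in for the other cleaning rules mentioned in the question.

```python
import re

import pandas as pd

# Hypothetical cleaning dictionary: regex pattern -> replacement.
# A replacement may be an re.sub template string or a callable that
# receives the re.Match object and returns the replacement text.
CLEANING_MAP = {
    r'\s+': '',  # placeholder for the other cleaning rules
    r'^(\d+)-(\d+)$': lambda m: str((int(m[1]) + int(m[2])) / 2),
}


def apply_mapping(value, mapping):
    """Run every pattern in `mapping` over `value` if it is a string."""
    if not isinstance(value, str):
        return value  # non-strings such as plain ints pass through unchanged
    for pattern, repl in mapping.items():
        value = re.sub(pattern, repl, value)
    return value


dirty_col = pd.Series([5, 6, '1-2', '40-60', 10])
# Apply the dictionary element-wise, then turn the results into numbers.
clean_col = pd.to_numeric(dirty_col.map(lambda v: apply_mapping(v, CLEANING_MAP)))
print(clean_col.tolist())  # [5.0, 6.0, 1.5, 50.0, 10.0]
```

The rules run in the dictionary's insertion order, so any rule that must fire first should be listed first.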
