
I'm looking for a way to replace several strings within a single cell of a pandas DataFrame in Python.

Each column has its own set of elements to be replaced, based on a legend that is already defined.

I've already found a way to replace values within a column, but the result replaces only one string at a time and removes the others. For example, with cell value AA, BB, CC and legend AA - Level 1, BB - Level 2, CC - Level 3, DD - Level 4, the result is just Level 1.

Data set:
Field Name | Category 1 | Category 2
Test1      | AA BB CC   | LD DD
Test2      | BB CC      | DD
Test3      | AA         | LD
Test4      | AA BB DD   | LD DD

Legend:
Category 1: AA - Level 1, BB - Level 2, CC - Level 3, DD - Level 4
Category 2: LD - High, DD - Low

I expect the results to be combined in one cell, for example Level 1; Level 2 when the cell value was AA, BB.

  • What kind of data structure does your legend have? Is that a dictionary? Commented Aug 6, 2019 at 8:27
  • Create dictionary and then replace Commented Aug 6, 2019 at 8:28
  • @jezrael, this would not work for this case since he has multiple values in one cell Commented Aug 6, 2019 at 8:29
  • Yes, the legend can be a dictionary; it simply maps a string to a value: AA - Level 1 Commented Aug 6, 2019 at 8:30
  • Which pandas version are you on? print(pd.__version__)? Commented Aug 6, 2019 at 8:31

1 Answer

Use:

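# Note: 'DD' appears twice in this literal, so Python keeps only the last value ('Low')
d = {'AA':'Level 1','BB':'Level 2','CC':'Level 3','DD':'Level 4','LD': 'High', 'DD' :'Low'}

# Build one alternation pattern that matches any legend key as a whole word
regex = '|'.join(r"\b{}\b".format(x) for x in d.keys())
# For each column, replace every match with its legend value via a callback
df = df.apply(lambda col: col.str.replace(regex, lambda m: d[m.group()], regex=True))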

print (df)

  Field Name               Category 1 Category 2
0      Test1  Level 1 Level 2 Level 3   High Low
1      Test2          Level 2 Level 3        Low
2      Test3                  Level 1       High
3      Test4      Level 1 Level 2 Low   High Low
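
To see what the callback does in isolation, here is a minimal sketch using the standard-library re module instead of pandas. The input string and the name lookup are illustrative assumptions, not part of the original answer. For each match, the replacement function receives a match object, and match.group() returns the matched key, which is then looked up in the dictionary.

```python
import re

# Small legend for illustration only
d = {"AA": "Level 1", "BB": "Level 2"}

# Same pattern construction as in the answer: whole-word alternation of the keys
pattern = "|".join(r"\b{}\b".format(re.escape(k)) for k in d)

def lookup(match):
    # match.group() is the exact text that matched, e.g. 'AA'
    return d[match.group()]

result = re.sub(pattern, lookup, "AA, BB")
print(result)  # Level 1, Level 2
```

Text that matches no key (here the comma and the space) is left untouched, which is why the original separators survive in the output.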

If you need to apply the solution to only one column:

df['Category 1'] = df['Category 1'].str.replace(regex, lambda x: d[x.group()], regex=True)
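
Since the question describes a separate legend for each column, and a single dictionary can hold only one value for 'DD', one possible variation is to keep one dictionary per column. The following is only a sketch under that assumption: the names legends and translate are hypothetical, the sample frame is rebuilt from the question's data set, and the "; " separator follows the output format the question asks for.

```python
import re
import pandas as pd

# Sample data rebuilt from the question's data set
df = pd.DataFrame({
    "Field Name": ["Test1", "Test2", "Test3", "Test4"],
    "Category 1": ["AA BB CC", "BB CC", "AA", "AA BB DD"],
    "Category 2": ["LD DD", "DD", "LD", "LD DD"],
})

# Hypothetical structure: one legend per column, so 'DD' can differ by column
legends = {
    "Category 1": {"AA": "Level 1", "BB": "Level 2", "CC": "Level 3", "DD": "Level 4"},
    "Category 2": {"LD": "High", "DD": "Low"},
}

def translate(cell, legend):
    # Split the cell on commas and/or whitespace, dropping empty pieces
    codes = [c for c in re.split(r"[,\s]+", cell.strip()) if c]
    # Map each code through the legend; unknown codes are kept unchanged
    return "; ".join(legend.get(c, c) for c in codes)

for column, legend in legends.items():
    df[column] = df[column].map(lambda cell, legend=legend: translate(cell, legend))

print(df)
```

With this layout, DD in Category 1 becomes Level 4 while DD in Category 2 becomes Low. The sketch assumes every cell is a string; missing values would need separate handling.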

11 Comments

Nice one, what does d[x.group()] do in this case? +1
Thank you! I will check this very shortly and come back with feedback
@Erfan - It is regex with callback :)
Once I add the code as described (with a reference to a specific column in my data set), it throws the following error: File "<ipython-input-17-5e5ed74417ef>", line 4 df.['Data Class'] = df.['Data Class'].apply(lambda x: x.str.replace(regex, lambda x: d[x.group()]) ^ SyntaxError: invalid syntax
Change df.['Data Class'] to df['Data Class']
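
For reference, a sketch of how the single-column call from the comments might look once the stray dot is removed. Note that Series.apply passes individual cell values to the function, so x.str would not be available there; calling .str.replace directly on the column avoids that. The sample values below are hypothetical; only the column name 'Data Class' comes from the comment above.

```python
import pandas as pd

d = {"AA": "Level 1", "BB": "Level 2", "CC": "Level 3", "DD": "Level 4"}
regex = "|".join(r"\b{}\b".format(k) for k in d)

# Hypothetical sample values for the 'Data Class' column
df = pd.DataFrame({"Data Class": ["AA BB", "CC", "AA DD"]})

# Index with df['Data Class'] (no dot) and use the .str accessor on the Series itself
df["Data Class"] = df["Data Class"].str.replace(
    regex, lambda m: d[m.group()], regex=True
)

print(df)
```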