Replacing multiple values within a cell in a dataframe - Python/Pandas

Question

I'm looking for a way to replace several strings within a single cell of a data frame in Python/pandas.

Each column has its own set of elements to be replaced, based on a legend that is already defined.

I've already found a way to replace values within a column, but the result replaces only one string at a time and removes the others. For example:
cell value: AA, BB, CC
legend: AA - Level 1, BB - Level 2, CC - Level 3, DD - Level 4
result: Level 1

Data set:
Field Name | Category 1 | Category 2
Test1      | AA BB CC   | LD DD
Test2      | BB CC      | DD
Test3      | AA         | LD
Test4      | AA BB DD   | LD DD

Legend:
Category 1: AA - Level 1, BB - Level 2, CC - Level 3, DD - Level 4
Category 2: LD - High, DD - Low

I expect the result to be combined in one cell, for example: Level 1; Level 2 when the cell value was AA, BB.

What kind of data structure does your legend have? Is it a dictionary?
– Erfan, Commented Aug 6, 2019 at 8:27
@jezrael, this would not work in this case, since he has multiple values in one cell.
– Erfan, Commented Aug 6, 2019 at 8:29
Yes, the legend can be a dictionary; it simply matches a string to a value: AA - Level 1
– mikemorrison, Commented Aug 6, 2019 at 8:30

jezrael · Accepted Answer · 2019-08-06 09:17:39Z

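Use a separate dictionary for each column, because DD means Level 4 in Category 1 but Low in Category 2 (a single dictionary with 'DD' listed twice keeps only the last value):

d1 = {'AA':'Level 1','BB':'Level 2','CC':'Level 3','DD':'Level 4'}
d2 = {'LD':'High','DD':'Low'}

for col, d in {'Category 1': d1, 'Category 2': d2}.items():
    # \b matches whole codes only; the callback looks up each matched code
    regex = '|'.join(r"\b{}\b".format(x) for x in d.keys())
    df[col] = df[col].str.replace(regex, lambda x: d[x.group()], regex=True)

print(df)

  Field Name               Category 1 Category 2
0      Test1  Level 1 Level 2 Level 3   High Low
1      Test2          Level 2 Level 3        Low
2      Test3                  Level 1       High
3      Test4  Level 1 Level 2 Level 4   High Low

To apply the solution to only one column:

regex = '|'.join(r"\b{}\b".format(x) for x in d1.keys())
df['Category 1'] = df['Category 1'].str.replace(regex, lambda x: d1[x.group()], regex=True)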
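
The question asks for the translated values to be joined with "; " (for example Level 1; Level 2). One rough way to get that is to split each cell into codes, look each code up, and join the results again. This is a different technique from the regex callback, not a restatement of it. The names legends and translate_cell are made up for illustration, and the sketch assumes that codes are separated by spaces or commas and that codes missing from a legend should be left unchanged.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Field Name": ["Test1", "Test2", "Test3", "Test4"],
    "Category 1": ["AA BB CC", "BB CC", "AA", "AA BB DD"],
    "Category 2": ["LD DD", "DD", "LD", "LD DD"],
})

# Hypothetical structure: one legend per column, so DD can mean
# "Level 4" in Category 1 and "Low" in Category 2
legends = {
    "Category 1": {"AA": "Level 1", "BB": "Level 2", "CC": "Level 3", "DD": "Level 4"},
    "Category 2": {"LD": "High", "DD": "Low"},
}


def translate_cell(cell, legend, sep="; "):
    """Translate every code in one cell and join the results with `sep`."""
    # Treat commas like spaces so "AA, BB" and "AA BB" behave the same
    codes = cell.replace(",", " ").split()
    # Assumption: a code that is not in the legend is kept as it is
    return sep.join(legend.get(code, code) for code in codes)


for col, legend in legends.items():
    # The lambda is called right away inside this iteration,
    # so it always sees the legend that belongs to `col`
    df[col] = df[col].map(lambda cell: translate_cell(cell, legend))
```

With the sample data, df.loc[0, "Category 1"] becomes "Level 1; Level 2; Level 3" and df.loc[3, "Category 2"] becomes "High; Low". Passing a different sep changes the separator without touching the lookup.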

edited Aug 6, 2019 at 9:17

answered Aug 6, 2019 at 8:32


11 Comments

Erfan Over a year ago

Nice one, what does d[x.group()] do in this case? +1

mikemorrison Over a year ago

Thank you! I will check this very shortly and come back with feedback

jezrael Over a year ago

@Erfan - It is regex with callback :)
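
To make the "regex with callback" idea concrete, here is a small sketch using only Python's standard re module, without pandas. The function name lookup and the sample string are chosen for illustration. When the replacement argument is a function, it is called once per match with a match object, and group() returns the text that matched, which is then used as the dictionary key.

```python
import re

d = {"AA": "Level 1", "BB": "Level 2", "CC": "Level 3"}

# Build one alternation of whole-word codes: \bAA\b|\bBB\b|\bCC\b
regex = "|".join(r"\b{}\b".format(re.escape(code)) for code in d)


def lookup(match):
    # match.group() is the exact text that matched, e.g. "BB",
    # so d[match.group()] is the replacement for that code
    return d[match.group()]


result = re.sub(regex, lookup, "AA BB CC")
print(result)  # Level 1 Level 2 Level 3
```

The \b word boundaries keep a code from matching inside a longer token, so a value such as "AAB" is left as it is.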

mikemorrison Over a year ago

Once I add the code as described (referencing the specific column in my data set), it throws the following error:
File "<ipython-input-17-5e5ed74417ef>", line 4
    df.['Data Class'] = df.['Data Class'].apply(lambda x: x.str.replace(regex, lambda x: d[x.group()])
       ^
SyntaxError: invalid syntax

jezrael Over a year ago

Change df.['Data Class'] to df['Data Class']
