Replace some values in a CSV column with values from another column

Question

I have a CSV file that looks like this (without the spaces between the columns):

id, process_id, name,                                   application_label
2,  384,        com.qualcomm.telephony,                 com.qualcomm.atfwd
6,  0,          com.facebook.katana:videoplayer,        \N
7,  0,          com.facebook.orca:videoplayer,          \N
9,  29195,      com.wsandroid.suite,                    McAfee Security
10, 12909,      com.life360.android.safetymapd:service, \N

What is the fastest way to replace the '\N' values in the application_label column with the values from the name column?

The output should be:

id, process_id, name,                                   application_label
2,  384,        com.qualcomm.telephony,                 com.qualcomm.atfwd
6,  0,          com.facebook.katana:videoplayer,        com.facebook.katana:videoplayer
7,  0,          com.facebook.orca:videoplayer,          com.facebook.orca:videoplayer
9,  29195,      com.wsandroid.suite,                    McAfee Security
10, 12909,      com.life360.android.safetymapd:service, com.life360.android.safetymapd:service

Curiosity:

If it were a pandas DataFrame, what would be the fastest way to do this? I've written something like this:

# Row-by-row loop; df.at is used here because get_value/set_value were removed in pandas 1.0.
for index in df.index:
    if df.at[index, 'application_label'] == r'\N':
        df.at[index, 'application_label'] = df.at[index, 'name']

But can I do this even faster?

Have a look at replace: pandas.pydata.org/pandas-docs/stable/generated/… — Alan
– Alan, Commented May 21, 2018 at 17:15
Yes, but I can't figure out which value to give to the 'value' param. I see that value=1 replaces all my '\N' with 1, but I want dynamic values taken from the other column.
– Ruben Alves, Commented May 21, 2018 at 17:25
If you need raw speed on CPython, Pandas is the way to go. pandas.read_csv() is really well optimized. Then get rid of the for loop and follow @chthonicdaemon's example. Finally, write the dataframe back using to_csv(). — akaihola
– akaihola, Commented May 21, 2018 at 17:37

chthonicdaemon · Accepted Answer · 2018-05-21 17:16:42Z

2

Try this,

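# Boolean mask of the rows whose label is the literal string \N
mask = df['application_label'] == r'\N'
# Assign the name column into those rows only; pandas aligns the values on the index
df.loc[mask, 'application_label'] = df['name']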
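To tie this to the CSV in the question, here is a minimal end-to-end sketch: read the data, apply the mask, and serialize it again. The sample rows are inlined through io.StringIO so the snippet runs on its own, and the helper name fill_label_from_name is made up for illustration. With a real file you would pass its path to pd.read_csv and df.to_csv instead.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
# The raw string keeps \N as two literal characters.
CSV_TEXT = r"""id,process_id,name,application_label
2,384,com.qualcomm.telephony,com.qualcomm.atfwd
6,0,com.facebook.katana:videoplayer,\N
7,0,com.facebook.orca:videoplayer,\N
9,29195,com.wsandroid.suite,McAfee Security
10,12909,com.life360.android.safetymapd:service,\N
"""


def fill_label_from_name(df):
    """Hypothetical helper: copy `name` into `application_label` where it is \\N."""
    mask = df['application_label'] == r'\N'
    df.loc[mask, 'application_label'] = df['name']
    return df


df = fill_label_from_name(pd.read_csv(io.StringIO(CSV_TEXT)))

# index=False keeps the pandas row index out of the written CSV.
result_csv = df.to_csv(index=False)
print(result_csv)
```

The comparison and the assignment each run over whole columns at once, which is why this tends to be much faster than the row-by-row loop in the question.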

edited May 21, 2018 at 17:16

chthonicdaemon

answered May 21, 2018 at 17:09

Mohamed Thasin ah

7 Comments

Orenshi Over a year ago

I think you need == for the mask

Mohamed Thasin ah Over a year ago

@Orenshi, I've updated the answer. Thanks for your comment 😊

Ruben Alves Over a year ago

Clever, and way faster than my loop.

Ruben Alves Over a year ago

And what if I want to change the values directly in the CSV?

Mohamed Thasin ah Over a year ago

If the above answer was useful to you, please upvote and accept it.
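On the follow-up about changing the values directly in the CSV: a text file generally cannot be edited in place when the replacement has a different length, so one possible approach without pandas is to stream the rows into a temporary file and then swap it over the original. The sketch below uses only the standard library; the function name fill_labels_in_csv is hypothetical, and it assumes the file has a header row with the name and application_label columns from the question.

```python
import csv
import os
import tempfile


def fill_labels_in_csv(path):
    """Hypothetical helper: rewrite the CSV at `path`, replacing \\N labels with the name."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final swap
    # stays on one filesystem.
    with open(path, newline='') as src, tempfile.NamedTemporaryFile(
            'w', newline='', dir=directory, delete=False) as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                lineterminator='\n')
        writer.writeheader()
        for row in reader:
            if row['application_label'] == r'\N':
                row['application_label'] = row['name']
            writer.writerow(row)
    # Replace the original file only after every row has been written.
    os.replace(dst.name, path)
```

Calling fill_labels_in_csv('apps.csv') (an example file name) would rewrite that file one row at a time, so memory use stays flat however large the CSV is. Whether this beats read_csv plus to_csv depends on the data, so it is worth timing both.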
