
I have a CSV file that looks like this (without the spaces between the columns):

id, process_id, name,                                   application_label
2,  384,        com.qualcomm.telephony,                 com.qualcomm.atfwd
6,  0,          com.facebook.katana:videoplayer,        \N
7,  0,          com.facebook.orca:videoplayer,          \N
9,  29195,      com.wsandroid.suite,                    McAfee Security
10, 12909,      com.life360.android.safetymapd:service, \N

How can I replace the '\N' values in the application_label column with the values from the name column, and what is the fastest way to do it?

The output should be:

id, process_id, name,                                   application_label
2,  384,        com.qualcomm.telephony,                 com.qualcomm.atfwd
6,  0,          com.facebook.katana:videoplayer,        com.facebook.katana:videoplayer
7,  0,          com.facebook.orca:videoplayer,          com.facebook.orca:videoplayer
9,  29195,      com.wsandroid.suite,                    McAfee Security
10, 12909,      com.life360.android.safetymapd:service, com.life360.android.safetymapd:service

Curiosity:

If this were a pandas DataFrame, what would be the fastest way to do this? I've written something like this:

# Row-by-row loop; df.at replaces get_value/set_value, which newer pandas versions no longer provide
for index in df.index:
    if df.at[index, 'application_label'] == r'\N':
        df.at[index, 'application_label'] = df.at[index, 'name']

But can I do this even faster?

  • Have a look at replace: pandas.pydata.org/pandas-docs/stable/generated/… Commented May 21, 2018 at 17:15
  • Yes, but I can't figure out which value to give to the 'value' param. I see that value=1 replaces all my '\N' with 1, but I want dynamic values taken from the other column. Commented May 21, 2018 at 17:25
  • If you need raw speed on CPython, Pandas is the way to go. pandas.read_csv() is really well optimized. Then get rid of the for loop and follow @chthonicdaemon's example. Finally, write the dataframe back using to_csv(). Commented May 21, 2018 at 17:37
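The round trip described in the last comment (read with pandas.read_csv(), replace without a loop, write back with to_csv()) could be sketched roughly as follows. This is a minimal sketch, assuming the file has the four columns shown in the question and that '\N' is stored literally in the file. The function name fill_labels_from_name is hypothetical, and in-memory buffers stand in for real file paths.

```python
import io

import pandas as pd


def fill_labels_from_name(src, dst):
    """Read a CSV, copy `name` into `application_label` where it is '\\N', write it back.

    Hypothetical helper: `src` and `dst` can be file paths or file-like objects.
    """
    df = pd.read_csv(src)
    # Boolean mask of the rows whose label is the literal two-character string \N
    mask = df["application_label"] == r"\N"
    # Vectorized assignment: only the masked rows are overwritten
    df.loc[mask, "application_label"] = df.loc[mask, "name"]
    # index=False keeps the pandas row index out of the output file
    df.to_csv(dst, index=False)


# Sample data taken from a few rows of the question (assumed to have no padding spaces)
sample = io.StringIO(
    "id,process_id,name,application_label\n"
    "2,384,com.qualcomm.telephony,com.qualcomm.atfwd\n"
    "6,0,com.facebook.katana:videoplayer,\\N\n"
    "9,29195,com.wsandroid.suite,McAfee Security\n"
)
out = io.StringIO()
fill_labels_from_name(sample, out)
result = out.getvalue()
print(result)
```

With real files, the same helper would be called with two paths instead of the buffers. Writing to a new file rather than overwriting the input is the safer choice while testing.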

1 Answer


Try this:

mask = DF['application_label'] == r'\N'
DF.loc[mask, 'application_label'] = DF['name']
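To try the two lines above in isolation, here is a minimal, self-contained sketch. The DataFrame is built by hand from three of the question's rows, which is an assumption about how the data looks once loaded. Series.mask is shown as an alternative that should give the same result; it is not part of the original answer.

```python
import pandas as pd

# Three rows copied from the question; '\N' is the literal placeholder string
DF = pd.DataFrame(
    {
        "id": [2, 6, 9],
        "process_id": [384, 0, 29195],
        "name": [
            "com.qualcomm.telephony",
            "com.facebook.katana:videoplayer",
            "com.wsandroid.suite",
        ],
        "application_label": ["com.qualcomm.atfwd", r"\N", "McAfee Security"],
    }
)
original = DF.copy()  # untouched copy, used for the alternative below

# Approach from the answer: boolean mask plus .loc assignment.
# The right-hand side is aligned on the index, so each masked row gets its own name.
mask = DF["application_label"] == r"\N"
DF.loc[mask, "application_label"] = DF["name"]

# Alternative (not from the answer): Series.mask replaces values where the condition is True
alt = original["application_label"].mask(
    original["application_label"] == r"\N", original["name"]
)

print(DF)
```

Both variants avoid the Python-level loop from the question, which is where the speedup comes from.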

7 Comments

I think you need == for the mask
@Orenshi: I've updated the answer. Thanks for your comment 😊
Clever, and way faster than my loop.
And what if I want to change the values directly in the CSV?
If the above answer is useful to you, please upvote and accept it.