I have a CSV file that looks like this (without the spaces between the columns):
id, process_id, name, application_label
2, 384, com.qualcomm.telephony, com.qualcomm.atfwd
6, 0, com.facebook.katana:videoplayer, \N
7, 0, com.facebook.orca:videoplayer, \N
9, 29195, com.wsandroid.suite, McAfee Security
10, 12909, com.life360.android.safetymapd:service, \N
What is the fastest way to replace each '\N' in the application_label column with the value from the name column of the same row?
The output should be:
id, process_id, name, application_label
2, 384, com.qualcomm.telephony, com.qualcomm.atfwd
6, 0, com.facebook.katana:videoplayer, com.facebook.katana:videoplayer
7, 0, com.facebook.orca:videoplayer, com.facebook.orca:videoplayer
9, 29195, com.wsandroid.suite, McAfee Security
10, 12909, com.life360.android.safetymapd:service, com.life360.android.safetymapd:service
Out of curiosity:
If this were a pandas DataFrame, what would be the fastest way to do this? I've written something like this:
for index in df.index:
    if df.get_value(index, 'application_label') == r'\N':
        df.set_value(index, 'application_label', df.get_value(index, 'name'))
But can I do this even faster?
replace: pandas.pydata.org/pandas-docs/stable/generated/…
pandas.read_csv() is really well optimized. Then get rid of the for loop and follow @chthonicdaemon's example. Finally, write the dataframe back using to_csv().
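The steps in the comment above (read with read_csv(), drop the for loop, write back with to_csv()) could be sketched roughly as below. This is only a minimal sketch, not necessarily the approach from @chthonicdaemon's answer: it assumes the '\N' marker appears literally in the file, uses a boolean mask with .loc in place of the loop, and reads the sample rows from an in-memory string. The helper name fill_labels_from_name is made up for illustration.

```python
import io

import pandas as pd

# Sample rows from the question, without spaces between the columns.
# An in-memory string stands in for the real file here (assumption).
CSV_TEXT = (
    "id,process_id,name,application_label\n"
    "2,384,com.qualcomm.telephony,com.qualcomm.atfwd\n"
    "6,0,com.facebook.katana:videoplayer,\\N\n"
    "7,0,com.facebook.orca:videoplayer,\\N\n"
    "9,29195,com.wsandroid.suite,McAfee Security\n"
    "10,12909,com.life360.android.safetymapd:service,\\N\n"
)


def fill_labels_from_name(df):
    """Hypothetical helper: copy `name` into `application_label` where it is '\\N'."""
    # Boolean mask of the rows whose label is the literal two-character marker \N.
    mask = df["application_label"] == r"\N"
    # One vectorized assignment replaces the row-by-row loop.
    df.loc[mask, "application_label"] = df.loc[mask, "name"]
    return df


df = pd.read_csv(io.StringIO(CSV_TEXT))
df = fill_labels_from_name(df)

# index=False keeps the row index out of the output; pass a path to write a file.
result_csv = df.to_csv(index=False)
print(result_csv)
```

With a real file, replace io.StringIO(CSV_TEXT) with the input path and pass an output path to to_csv(). Whether this is the fastest option for a given file size is best checked by timing it against the loop.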