I want to join the string columns of a pandas DataFrame (or a NumPy ndarray) into a new last column, like this:
a  b  c          a  b  c  d
-------   --->   --------------
a  b  c          a  b  c  a_b_c
d  e  f          d  e  f  d_e_f
g  h  i          g  h  i  g_h_i
I can think of two representative options:
import pandas as pd

# Compose data
a = ['a','b','c']
b = ['d','e','f']
c = ['g','h','i']
pdf = pd.DataFrame([a,b,c], columns=['a','b','c'])
# One option
%%timeit
pdf.loc[:,'d'] = [i for i in map(lambda x: '_'.join([x.a, x.b, x.c]), pdf.itertuples())]
>>>1.08 ms ± 4.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Another option
%%timeit
tmp=[]
for i in pdf.itertuples():
    tmp.append('_'.join([i.a, i.b, i.c]))
pdf.loc[:,'d'] = tmp
>>>1.08 ms ± 5.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I understand that there may be too little data here to see any difference between these methods, but my question is: is there a smarter method built into NumPy or pandas that I can call? Also, is there any problem with either of the two methods I came up with?
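For concreteness, the kind of built-in call I have in mind might look something like the sketch below. It assumes that pandas' `Series.str.cat` and plain `+` on string columns are appropriate here. I have not timed either on realistic data, and the name `d_plus` is just for illustration.

```python
import pandas as pd

# Same toy data as in the setup above
pdf = pd.DataFrame(
    [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    columns=['a', 'b', 'c'],
)

# Candidate 1: Series.str.cat appends the other columns to the first one,
# inserting the separator between the values of each row
pdf['d'] = pdf['a'].str.cat([pdf['b'], pdf['c']], sep='_')

# Candidate 2: element-wise string concatenation with the + operator
# (d_plus is a hypothetical name, kept separate so both results can be compared)
d_plus = pdf['a'] + '_' + pdf['b'] + '_' + pdf['c']

print(pdf)
print(d_plus.equals(pdf['d']))
```

Both candidates avoid an explicit Python-level loop over rows, but whether they are actually faster than the two options above would need to be measured on a larger frame.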
Thank you!
