Copying (efficiently) dataframe content inside of another dataframe using Python

Question

I have one dataframe (df1):

I then create a new dataframe (df2) which has twice as many rows as df1. My goal is to copy some elements of the first dataframe into the second one so that the result looks like this:

So far I was able to reach this goal by using the following commands:

import pandas as pd

raw_data = {'A': ['pinco', 'pallo', 'pollo'],
            'B': ['lollo', 'fallo', 'gollo'],
            'C': ['pizzo', 'pazzo', 'razzo']}
df1 = pd.DataFrame(raw_data, columns=['A', 'B', 'C'])
columns = ['XXX', 'YYY', 'ZZZ']
N = 3
df2 = pd.DataFrame(columns=columns, index=range(N*2))

idx = 0
for i in range(N):
    # Each row of df1 fills two consecutive rows of df2.
    # .loc[row, column] is used instead of chained indexing, which
    # may assign to a temporary copy rather than to df2 itself.
    df2.loc[idx, 'XXX'] = df1.loc[i, 'A']
    df2.loc[idx+1, 'XXX'] = df1.loc[i, 'A']
    df2.loc[idx, 'YYY'] = df1.loc[i, 'B']
    df2.loc[idx+1, 'YYY'] = df1.loc[i, 'C']
    idx += 2

However, I am looking for a more efficient (more compact and elegant) way to obtain this result. I tried the following combination inside the for loop, without success:

df2[['XXX','YYY']].loc[idx] = df1[['A', 'B']].loc[i]
df2[['XXX','YYY']].loc[idx+1] = df1[['A', 'C']].loc[i]

Nickil Maveli · Accepted Answer · 2016-09-16 09:09:46Z

4

You could do it this way:

df2['XXX'] = np.repeat(df1['A'].values, 2)   # Repeat elements in A twice
df2.loc[::2, 'YYY'] = df1['B'].values        # Fill even rows with B values
df2.loc[1::2, 'YYY'] = df1['C'].values       # Fill odd rows with C values

     XXX    YYY  ZZZ
0  pinco  lollo  NaN
1  pinco  pizzo  NaN
2  pallo  fallo  NaN
3  pallo  pazzo  NaN
4  pollo  gollo  NaN
5  pollo  razzo  NaN
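
For anyone who wants to paste this into a fresh session, here is a self-contained sketch of the same approach. It only assumes the sample data from the question; `.to_numpy()` is used in place of `.values`, which is my substitution rather than part of the snippet above.

```python
import numpy as np
import pandas as pd

raw_data = {'A': ['pinco', 'pallo', 'pollo'],
            'B': ['lollo', 'fallo', 'gollo'],
            'C': ['pizzo', 'pazzo', 'razzo']}
df1 = pd.DataFrame(raw_data, columns=['A', 'B', 'C'])

# Empty target frame with twice as many rows as df1.
N = len(df1)
df2 = pd.DataFrame(columns=['XXX', 'YYY', 'ZZZ'], index=range(N * 2))

# Every value of A appears twice in a row.
df2['XXX'] = np.repeat(df1['A'].to_numpy(), 2)
# Rows 0, 2, 4 take B; rows 1, 3, 5 take C.
df2.loc[::2, 'YYY'] = df1['B'].to_numpy()
df2.loc[1::2, 'YYY'] = df1['C'].to_numpy()

print(df2)
```

The slice assignments rely on df2 having a default integer index, so that `::2` and `1::2` pick out alternating rows.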

edited Sep 16, 2016 at 9:09

answered Sep 16, 2016 at 9:04

Nickil Maveli


Community · Accepted Answer · 2017-05-23 12:00:55Z

2

Working from Nickil Maveli's answer, there's a faster (if somewhat more arcane) solution: interleave B and C into a single array first, then assign it to the column in one step (cf. this question).

# Repeat elements in A twice
df2['XXX'] = np.repeat(df1['A'].values, 2)
# Make a single interleaved array from the values of B and C and copy it to YYY
df2['YYY'] = np.dstack((df1['B'].values, df1['C'].values)).ravel()
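
To see why the `np.dstack(...).ravel()` step interleaves the two columns, here is a small sketch on plain NumPy arrays. The names `b`, `c`, `stacked` and `alt` are just illustrative, and the `np.column_stack` variant is an alternative I am adding for comparison, not something from the answer above.

```python
import numpy as np

b = np.array(['lollo', 'fallo', 'gollo'])
c = np.array(['pizzo', 'pazzo', 'razzo'])

# dstack pairs up b[i] and c[i] along a new last axis.
stacked = np.dstack((b, c))
print(stacked.shape)  # (1, 3, 2)

# Flattening in row-major order walks the pairs: b0, c0, b1, c1, ...
interleaved = stacked.ravel()
print(interleaved)

# column_stack builds a (3, 2) array and flattens to the same sequence.
alt = np.column_stack((b, c)).ravel()
print((interleaved == alt).all())
```

Because the interleaved array already has one entry per row of df2, it can be assigned to the whole column at once instead of in two sliced assignments.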

On my machine this gave roughly a 3x speedup:

In [110]: %timeit df2.loc[::2, 'YYY'] = df1['B'].values; df2.loc[1::2, 'YYY'] = df1['C'].values
1000 loops, best of 3: 274 µs per loop

In [111]: %timeit df2['YYY'] = np.dstack((df1['B'].values, df1['C'].values)).ravel()
10000 loops, best of 3: 87.5 µs per loop
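
If you are not in IPython, a rough way to repeat this comparison is the standard `timeit` module. This is only a sketch: the frame size `n`, the repeat count and the helper names `by_slices` and `by_dstack` are arbitrary choices of mine, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 1000  # arbitrary number of source rows, chosen for illustration
b = np.array([f'b{i}' for i in range(n)], dtype=object)
c = np.array([f'c{i}' for i in range(n)], dtype=object)
df2 = pd.DataFrame(columns=['YYY'], index=range(2 * n))


def by_slices():
    # Two sliced assignments: even rows from b, odd rows from c.
    df2.loc[::2, 'YYY'] = b
    df2.loc[1::2, 'YYY'] = c


def by_dstack():
    # One whole-column assignment of the interleaved array.
    df2['YYY'] = np.dstack((b, c)).ravel()


t_slices = timeit.timeit(by_slices, number=200)
t_dstack = timeit.timeit(by_dstack, number=200)
print(f'slices: {t_slices:.4f} s, dstack: {t_dstack:.4f} s')
```

Both functions leave the same values in `YYY`, so the only thing being compared is how the assignment is done.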

edited May 23, 2017 at 12:00


answered Sep 16, 2016 at 15:02

Elliot


1 Comment

Nickil Maveli Over a year ago

Definitely a much better approach in terms of speed.
