I have one dataframe (df1), whose contents are defined in the code below.

I then create a new dataframe (df2) which has twice as many rows as df1. My goal is to copy some elements of the first dataframe into the second one in a smart way, so that each row of df1 produces two consecutive rows in df2: both rows get the value of A in XXX, the first row gets B in YYY, and the second row gets C in YYY.

So far I was able to reach this goal by using the following commands:

import pandas as pd

raw_data = {'A': ['pinco', 'pallo', 'pollo'],
            'B': ['lollo', 'fallo', 'gollo'],
            'C': ['pizzo', 'pazzo', 'razzo']}
df1 = pd.DataFrame(raw_data, columns=['A', 'B', 'C'])
columns = ['XXX', 'YYY', 'ZZZ']
N = 3
df2 = pd.DataFrame(columns=columns, index=range(N*2))

# Row i of df1 fills rows idx and idx+1 of df2.
# A single .loc[row, column] call avoids chained assignment.
idx = 0
for i in range(N):
    df2.loc[idx, 'XXX'] = df1.loc[i, 'A']
    df2.loc[idx+1, 'XXX'] = df1.loc[i, 'A']
    df2.loc[idx, 'YYY'] = df1.loc[i, 'B']
    df2.loc[idx+1, 'YYY'] = df1.loc[i, 'C']
    idx += 2

However, I am looking for a more efficient (more compact and elegant) way to obtain this result. I tried the following inside the for loop, without success:

df2[['XXX','YYY']].loc[idx] = df1[['A', 'B']].loc[i]
df2[['XXX','YYY']].loc[idx+1] = df1[['A', 'C']].loc[i]

2 Answers

You could do it this way:

import numpy as np

df2['XXX'] = np.repeat(df1['A'].values, 2)   # Repeat elements in A twice
df2.loc[::2, 'YYY'] = df1['B'].values        # Fill even rows with B values
df2.loc[1::2, 'YYY'] = df1['C'].values       # Fill odd rows with C values

     XXX    YYY  ZZZ
0  pinco  lollo  NaN
1  pinco  pizzo  NaN
2  pallo  fallo  NaN
3  pallo  pazzo  NaN
4  pollo  gollo  NaN
5  pollo  razzo  NaN
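
If you want to run this end to end, here is a minimal self-contained sketch that rebuilds df1 and df2 from the question and then applies the three assignments above. The name `n` and the use of `.to_numpy()` instead of `.values` are my own choices for the sketch, not part of the snippet above.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame({'A': ['pinco', 'pallo', 'pollo'],
                    'B': ['lollo', 'fallo', 'gollo'],
                    'C': ['pizzo', 'pazzo', 'razzo']})

n = len(df1)  # hypothetical name; the question hard-codes N = 3
df2 = pd.DataFrame(columns=['XXX', 'YYY', 'ZZZ'], index=range(2 * n))

# Each value of A appears twice in a row.
df2['XXX'] = np.repeat(df1['A'].to_numpy(), 2)
# Even rows (0, 2, 4) take B, odd rows (1, 3, 5) take C.
df2.loc[::2, 'YYY'] = df1['B'].to_numpy()
df2.loc[1::2, 'YYY'] = df1['C'].to_numpy()

print(df2)
```

ZZZ is never assigned, so it stays all NaN, as in the output above.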

Working from Nickil Maveli's answer, there's a faster (if somewhat more arcane) solution: interleave B and C into a single array first (cf. this question).

# Repeat elements in A twice
df2['XXX'] = np.repeat(df1['A'].values, 2)
# Make a single interleaved array from the values of B and C and copy it to YYY
df2['YYY'] = np.dstack((df1['B'].values, df1['C'].values)).ravel()
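
Since the `dstack` step is the arcane part, here is a small sketch of what it does to the shapes, using plain NumPy arrays with the same values as B and C. The names `b`, `c`, `stacked` and `alt` are only for illustration, and the `np.column_stack` variant is an alternative I am adding for comparison, not something from the linked question.

```python
import numpy as np

b = np.array(['lollo', 'fallo', 'gollo'], dtype=object)
c = np.array(['pizzo', 'pazzo', 'razzo'], dtype=object)

# dstack promotes each 1-D array to shape (1, 3, 1) and joins them
# along the last axis, so every b[i] sits right next to c[i].
stacked = np.dstack((b, c))
print(stacked.shape)      # (1, 3, 2)

# ravel() walks the last axis fastest, which yields b0, c0, b1, c1, ...
interleaved = stacked.ravel()
print(interleaved.tolist())

# column_stack builds a (3, 2) array and ravels to the same order.
alt = np.column_stack((b, c)).ravel()
print(alt.tolist())
```

Both variants give the same interleaved order, so which one you pick is mostly a matter of readability.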

On my machine, the interleaved assignment was about 3x faster than the two sliced assignments:

In [110]: %timeit df2.loc[::2, 'YYY'] = df1['B'].values; df2.loc[1::2, 'YYY'] = df1['C'].values
1000 loops, best of 3: 274 µs per loop

In [111]: %timeit df2['YYY'] = np.dstack((df1['B'].values, df1['C'].values)).ravel()
10000 loops, best of 3: 87.5 µs per loop
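
The `%timeit` lines need IPython. If you want to repeat the comparison in a plain script, something like the following rough sketch with the standard `timeit` module should work. The frame size `n`, the synthetic strings and the helper names `fill_by_slices` and `fill_interleaved` are all assumptions made for the sketch. Timings depend on the machine and the pandas version, so I am not quoting any expected numbers.

```python
import timeit

import numpy as np
import pandas as pd

n = 1000  # assumed size, larger than the 3-row example
df1 = pd.DataFrame({'B': [f'b{i}' for i in range(n)],
                    'C': [f'c{i}' for i in range(n)]})
df2 = pd.DataFrame(columns=['YYY'], index=range(2 * n))

def fill_by_slices():
    # Two strided assignments: even rows from B, odd rows from C.
    df2.loc[::2, 'YYY'] = df1['B'].to_numpy()
    df2.loc[1::2, 'YYY'] = df1['C'].to_numpy()

def fill_interleaved():
    # One assignment from a pre-interleaved array.
    df2['YYY'] = np.dstack((df1['B'].to_numpy(),
                            df1['C'].to_numpy())).ravel()

# Check that both approaches produce the same column before timing them.
fill_by_slices()
by_slices = df2['YYY'].tolist()
fill_interleaved()
interleaved = df2['YYY'].tolist()

runs = 100
t_slices = timeit.timeit(fill_by_slices, number=runs) / runs
t_interleaved = timeit.timeit(fill_interleaved, number=runs) / runs
print(f"slices:      {t_slices * 1e6:.1f} µs per call")
print(f"interleaved: {t_interleaved * 1e6:.1f} µs per call")
```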

1 Comment

Definitely a much better approach in terms of speed.
