I have two dataframes containing data collected at two different frequencies. I want to replace a row's label in df2 with the label from df1 whenever the df2 timestamp falls within the duration of a df1 event.
I wrote a nested for-loop to do this, but it takes a rather long time. Here is the code I used:
for i in np.arange(len(df1)-1):
    for j in np.arange(len(df2)):
        if (df2.timestamp[j] > df1.timestamp[i]) & (df2.timestamp[j] < (df1.timestamp[i] + df1.duration[i])):
            df2.loc[j,"label"] = df1.loc[i,"label"]
Is there a more efficient way of doing this? df1 has shape (367, 4) and df2 has shape (342423, 9).
Short example data:
import numpy as np
import pandas as pd
data1 = {'timestamp': [1,2,3,4,5,6,7,8,9],
         'duration': [0.5,0.3,0.8,0.2,0.4,0.5,0.3,0.7,0.5],
         'label': ['inh','exh','inh','exh','inh','exh','inh','exh','inh']
         }
df1 = pd.DataFrame(data1, columns=['timestamp','duration','label'])
data2 = {'timestamp': [1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5],
         'label': ['plc','plc','plc','plc','plc','plc','plc','plc','plc','plc','plc','plc','plc','plc','plc','plc','plc','plc']
         }
df2 = pd.DataFrame(data2, columns=['timestamp','label'])
If a row in df2 matches more than one event in df1, is the intent to then take the last (by index order)?
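
One possible way to avoid the nested loop is a binary search over the event start times with `np.searchsorted`. The following is only a sketch and rests on assumptions the question does not state: the events in df1 do not overlap, so each df2 timestamp can fall into at most one event, and the strict `>` / `<` comparisons from the loop are what is wanted. The helper name `label_by_event` is made up for illustration. Unlike the loop, which stops at `len(df1)-1`, this version also considers the last row of df1.

```python
import numpy as np
import pandas as pd


def label_by_event(events, samples):
    """Return a copy of `samples` relabelled from `events` (hypothetical helper)."""
    # Assumption: events do not overlap; sorting makes the binary search valid.
    events = events.sort_values("timestamp")
    starts = events["timestamp"].to_numpy(dtype=float)
    ends = starts + events["duration"].to_numpy(dtype=float)
    labels = events["label"].to_numpy()
    ts = samples["timestamp"].to_numpy(dtype=float)

    # Position of the last event that starts strictly before each sample
    # (-1 means the sample precedes every event start).
    idx = np.searchsorted(starts, ts, side="left") - 1
    safe = np.clip(idx, 0, None)  # avoid indexing with -1
    inside = (idx >= 0) & (ts < ends[safe])

    out = samples.copy()
    out.loc[inside, "label"] = labels[safe[inside]]
    return out


# Sample data copied from the question.
df1 = pd.DataFrame({
    "timestamp": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "duration": [0.5, 0.3, 0.8, 0.2, 0.4, 0.5, 0.3, 0.7, 0.5],
    "label": ["inh", "exh", "inh", "exh", "inh", "exh", "inh", "exh", "inh"],
})
df2 = pd.DataFrame({
    "timestamp": [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5,
                  6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5],
    "label": ["plc"] * 18,
})

result = label_by_event(df1, df2)
print(result[result["label"] != "plc"])
```

On the sample data only the rows at timestamps 3.5 and 8.5 change (to 'inh' and 'exh'), because every other timestamp lies on or outside an event boundary. If events can overlap, this sketch only checks the most recently started event, so it would need a different approach.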