One way to refactor the loop is to locate the desired rows with idxmax() and then index them in one shot:
df = df[df.index < current_day]
th_array = th_array[th_array < df.cumulated.max()]
indexes = pd.DataFrame(df.cumulated.values[:, None] > th_array).idxmax()
df.iloc[indexes]
# cumulated
# 2021-03-25 185
# 2021-03-30 1110
# 2021-04-04 2035
# 2021-04-10 3145
Explanation
First, keep only the rows before current_day:
df = df[df.index < current_day]
# cumulated
# 2021-03-24 0
# 2021-03-25 185
# ...
# 2021-04-10 3145
# 2021-04-11 3330
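And keep only the th_array values less than cumulated.max(), so that every remaining threshold is exceeded by at least one row (for a column with no True at all, idxmax() would misleadingly return the first row):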
th_array = th_array[th_array < df.cumulated.max()]
# array([ 0, 1000, 2000, 3000])
Then use array broadcasting to build a boolean matrix of cumulated > th_array where rows correspond to cumulated and columns to th_array:
valid = pd.DataFrame(df.cumulated.values[:, None] > th_array)
# 0 1 2 3
# 0 False False False False
# 1 True False False False
# 2 True False False False
# 3 True False False False
# 4 True False False False
# 5 True False False False
# 6 True True False False
# 7 True True False False
# 8 True True False False
# 9 True True False False
# 10 True True False False
# 11 True True True False
# 12 True True True False
# 13 True True True False
# 14 True True True False
# 15 True True True False
# 16 True True True False
# 17 True True True True
# 18 True True True True
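To see the broadcasting step in isolation, here is a minimal sketch on toy values (the numbers and the names mask and thresholds are made up for illustration, not taken from the question's data):

```python
import numpy as np

# Hypothetical toy values, not the question's data
cumulated = np.array([0, 185, 1110, 2035])   # shape (4,)
thresholds = np.array([0, 1000, 2000])       # shape (3,)

# [:, None] adds a trailing axis, turning shape (4,) into (4, 1).
# Comparing (4, 1) against (3,) broadcasts to a (4, 3) boolean matrix:
# one row per cumulated value, one column per threshold.
mask = cumulated[:, None] > thresholds

print(mask.shape)  # (4, 3)
print(mask)
```

Each column of mask answers "which rows exceed this threshold?", which is the same layout as valid above.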
So for each column (one per th_array value), we want the first row (cumulated value) that is True. These can be found with idxmax(): since False is 0 and True is 1, all the True rows are tied for the max, and idxmax() returns the first of them:
indexes = valid.idxmax()
# 0 1
# 1 6
# 2 11
# 3 17
# dtype: int64
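The tiebreak behaviour, and the reason the thresholds were filtered first, can be checked on a small made-up frame (toy, first_true and has_true are illustrative names, not part of the original answer):

```python
import pandas as pd

# Hypothetical toy matrix: column "a" first becomes True at row 2,
# column "b" is never True
toy = pd.DataFrame({
    "a": [False, False, True, True],
    "b": [False, False, False, False],
})

first_true = toy.idxmax()
print(first_true["a"])  # 2: first row holding the maximum (True)
print(first_true["b"])  # 0: every value ties at False, so row 0 wins

# any() distinguishes "first True is at row 0" from "no True at all"
has_true = toy.any()
print(has_true)
```

Column "b" shows the pitfall: without the th_array filter, a threshold that is never exceeded would silently map to the first row.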
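Because valid was built from a NumPy array, it has a default integer index, so these values are row positions in df. Pass them to iloc for the final filtered df: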
df.iloc[indexes]
# cumulated
# 2021-03-25 185
# 2021-03-30 1110
# 2021-04-04 2035
# 2021-04-10 3145
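For anyone who wants to run the steps end to end, here is a self-contained sketch. The question's actual data is not shown here, so the sample frame (19 daily rows growing by 185), the th_array range, and the current_day value are assumptions chosen to match the printed outputs above:

```python
import numpy as np
import pandas as pd

# Assumed sample data: 19 daily rows, cumulated growing by 185 per day
dates = pd.date_range("2021-03-24", periods=19, freq="D")
df = pd.DataFrame({"cumulated": np.arange(19) * 185}, index=dates)

# Assumed inputs: thresholds every 1000, and a cutoff after the last row
th_array = np.arange(0, 6000, 1000)
current_day = pd.Timestamp("2021-04-12")

# Same steps as in the answer
df = df[df.index < current_day]
th_array = th_array[th_array < df.cumulated.max()]
indexes = pd.DataFrame(df.cumulated.values[:, None] > th_array).idxmax()
result = df.iloc[indexes]

print(result)
#             cumulated
# 2021-03-25        185
# 2021-03-30       1110
# 2021-04-04       2035
# 2021-04-10       3145
```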
Timing
For the sample data, the indexing method runs ~11 times faster than looping+appending:
>>> %timeit iloc(df, th_array)
989 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit loop(df, th_array)
10.9 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Testing functions for reference:
def iloc(df, th_array):
    # current_day is read from the enclosing (global) scope
    df = df[df.index < current_day]
    th_array = th_array[th_array < df.cumulated.max()]
    indexes = pd.DataFrame(df.cumulated.values[:, None] > th_array).idxmax()
    return df.iloc[indexes]

def loop(df, th_array):
    # note: DataFrame.append requires pandas < 2.0
    df_filtered = pd.DataFrame()
    for y in th_array:
        x = df[(df['cumulated'] > y) & (df.index < current_day)]
        if not x.empty:
            df_filtered = df_filtered.append(x.iloc[0])
    return df_filtered
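DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the loop version will not run on current pandas. One possible adaptation collects the rows in a list and calls pd.concat once. The name loop_concat, the explicit current_day parameter, and the sample data are introduced here for illustration, and the timings above do not apply to this variant:

```python
import numpy as np
import pandas as pd

def loop_concat(df, th_array, current_day):
    """First row with cumulated > y for each threshold y, built with pd.concat."""
    rows = []
    for y in th_array:
        x = df[(df["cumulated"] > y) & (df.index < current_day)]
        if not x.empty:
            # Double brackets keep a one-row DataFrame rather than a Series
            rows.append(x.iloc[[0]])
    if not rows:
        # No threshold matched: return an empty frame with the same columns
        return df.iloc[0:0]
    return pd.concat(rows)

# Assumed sample data: 19 daily rows, cumulated growing by 185 per day
dates = pd.date_range("2021-03-24", periods=19, freq="D")
sample = pd.DataFrame({"cumulated": np.arange(19) * 185}, index=dates)

filtered = loop_concat(sample, np.arange(0, 6000, 1000), pd.Timestamp("2021-04-12"))
print(filtered)
```

Concatenating once at the end also avoids copying the growing frame on every iteration, which is what made repeated appending slow.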