I have a dataframe in the following general format:
id,transaction_dt,units,measures
1,2018-01-01,4,30.5
1,2018-01-03,4,26.3
2,2018-01-01,3,12.7
2,2018-01-03,3,8.8
What I am trying to accomplish is to expand each record into 'units' new records, enumerating consecutive dates starting from 'transaction_dt' and carrying the other values along, to produce something like this:
id,transaction_dt,measures
1,2018-01-01,30.5
1,2018-01-02,30.5
1,2018-01-03,30.5
1,2018-01-04,30.5
1,2018-01-03,26.3
1,2018-01-04,26.3
1,2018-01-05,26.3
1,2018-01-06,26.3
2,2018-01-01,12.7
2,2018-01-02,12.7
2,2018-01-03,12.7
2,2018-01-03,8.8
2,2018-01-04,8.8
2,2018-01-05,8.8
I have been trying to create a vectorized, more performant version of the answer that someone was kind enough to give to my prior question here: Python PANDAS: Stack and Enumerate Date to Create New Records
df.set_index('transaction_dt', inplace=True)
df.apply(lambda x: pd.Series(pd.date_range(x.name, periods=x.units)), axis=1). \
stack(). \
reset_index(level=1). \
join(df['measures']). \
drop('level_1', axis=1). \
reset_index(). \
rename(columns={0:'enumerated_dt'})
This works, but I need to run it on a very large dataset, so I need to optimize it further. The answerer suggested creating an array of all dates, which I can do with something like this:
date_range = pd.date_range('2004-01-01', '2017-12-31', freq='1D')
He also suggested then reindexing against that array and forward filling the values, but I am not sure how to do that. If anyone could help me, I would sincerely appreciate it!
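For reference, here is a minimal sketch of one vectorized direction, which may or may not be what was meant by reindexing and forward filling. It swaps in a different technique: repeat each row 'units' times with NumPy, then add a per-row day offset. It assumes 'transaction_dt' is already a datetime64 column and 'units' holds positive integers; the names row_pos, starts, offsets and out are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (transaction_dt assumed to be datetime64).
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "transaction_dt": pd.to_datetime(
        ["2018-01-01", "2018-01-03", "2018-01-01", "2018-01-03"]
    ),
    "units": [4, 4, 3, 3],
    "measures": [30.5, 26.3, 12.7, 8.8],
})

units = df["units"].to_numpy()

# Position of the source row for every output row, e.g. [0,0,0,0,1,1,1,1,2,...].
row_pos = np.repeat(np.arange(len(df)), units)

# Output position at which each source row starts, then the day offset
# (0, 1, ..., units - 1) of every output row within its source row.
starts = np.cumsum(units) - units
offsets = np.arange(units.sum()) - np.repeat(starts, units)

# Repeat the rows, then shift each date by its offset in days.
out = df.iloc[row_pos].drop(columns="units").reset_index(drop=True)
out["transaction_dt"] = out["transaction_dt"] + pd.to_timedelta(offsets, unit="D")

print(out)
```

This avoids the row-wise apply entirely, so the cost should scale with the number of output rows rather than with Python-level calls per input row. I have not benchmarked it on the full dataset.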