
Let's say I have a DataFrame like this:

|       timestamp     | value |
| ------------------- | ----- |
| 01/01/2013 00:00:00 |  2.1  |
| 01/01/2013 00:00:03 |  3.7  |
| 01/01/2013 00:00:05 |  2.4  |

I'd like to insert a row for every missing second, carrying forward the last known value:

|       timestamp     | value |
| ------------------- | ----- |
| 01/01/2013 00:00:00 |  2.1  |
| 01/01/2013 00:00:01 |  2.1  |
| 01/01/2013 00:00:02 |  2.1  |
| 01/01/2013 00:00:03 |  3.7  |
| 01/01/2013 00:00:04 |  3.7  |
| 01/01/2013 00:00:05 |  2.4  |

How do I go about this?

Answer 1

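You can use `resample` with `ffill`. `resample` only works with a datetime-like index, so first check the dtypes. Here `timestamp` holds strings (`object` dtype), so convert it with `pd.to_datetime`:

```python
import pandas as pd

print(df.dtypes)
# timestamp     object
# value        float64
# dtype: object

df['timestamp'] = pd.to_datetime(df['timestamp'])

print(df.dtypes)
# timestamp    datetime64[ns]
# value               float64
# dtype: object
```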
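A quick way to see why the conversion is needed: with string timestamps, `resample` refuses to run at all. The frame below is the question's table typed in by hand; the rest is a minimal sketch, not part of the original answer.

```python
import pandas as pd

# The question's data, with timestamps still stored as strings
df = pd.DataFrame({
    "timestamp": ["01/01/2013 00:00:00", "01/01/2013 00:00:03", "01/01/2013 00:00:05"],
    "value": [2.1, 3.7, 2.4],
})

raised = False
try:
    # The index here is a plain Index of strings, not a DatetimeIndex
    df.set_index("timestamp").resample("s")
except TypeError as exc:
    raised = True
    print("resample failed:", exc)

# After conversion the index is a DatetimeIndex, so resample is accepted
df["timestamp"] = pd.to_datetime(df["timestamp"])
resampled = df.set_index("timestamp").resample("s").ffill()
print(len(resampled))  # 6
```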

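Then set `timestamp` as the index, resample to a one-second frequency and forward-fill the rows that `resample` inserts:

```python
# 's' means one-second bins ('S' is deprecated in recent pandas versions)
out = df.set_index('timestamp').resample('s').ffill()
print(out)
#                      value
# timestamp
# 2013-01-01 00:00:00    2.1
# 2013-01-01 00:00:01    2.1
# 2013-01-01 00:00:02    2.1
# 2013-01-01 00:00:03    3.7
# 2013-01-01 00:00:04    3.7
# 2013-01-01 00:00:05    2.4
```

If you want `timestamp` back as a regular column, add `reset_index`:

```python
out = df.set_index('timestamp').resample('s').ffill().reset_index()
print(out)
#             timestamp  value
# 0 2013-01-01 00:00:00    2.1
# 1 2013-01-01 00:00:01    2.1
# 2 2013-01-01 00:00:02    2.1
# 3 2013-01-01 00:00:03    3.7
# 4 2013-01-01 00:00:04    3.7
# 5 2013-01-01 00:00:05    2.4
```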
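For comparison, here is a sketch using a different pandas method, `DataFrame.asfreq`, which reindexes to a regular grid and can forward-fill the new rows in one call. For pure upsampling like this it should give the same result as `resample('s').ffill()`; the data is the question's table typed in by hand.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["01/01/2013 00:00:00", "01/01/2013 00:00:03",
                                 "01/01/2013 00:00:05"]),
    "value": [2.1, 3.7, 2.4],
})
indexed = df.set_index("timestamp")

# Approach from the answer: upsample to 1 s bins and forward-fill
via_resample = indexed.resample("s").ffill()

# Alternative: asfreq builds a regular 1 s grid and fills the holes forward
via_asfreq = indexed.asfreq("s", method="ffill")

print(via_asfreq.equals(via_resample))  # True
```

`resample` is the more general tool (it can also aggregate when downsampling), while `asfreq` only reindexes, which is all this question needs.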

Comments:

Could you tell me why you used `pd.to_datetime()`? Isn't `timestamp` already in datetime format?
Because `resample` only works with datetimes, and `01/01/2013 00:00:00` is not a datetime, only a string representation of one.
But once you `resample`, `timestamp` becomes the index, right? So I'd have to copy `df.index.values` to a list, make it a column, and then reindex?
Can I just say I've spent the best part of a whole day on Stack Overflow and other websites, and this is the first solution that has worked. Thank you so much :)
Could someone please tell me why I am getting the following error? `cannot reindex a non-unique index with a method or limit`
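On the last comment: a likely cause is duplicate values in the `timestamp` column. Upsampling with a fill method reindexes the data, and pandas cannot reindex with a fill method on a non-unique index (newer versions word the error differently). A sketch of one workaround, assuming it is acceptable to collapse duplicates by keeping the last reading per timestamp; the sample data is made up.

```python
import pandas as pd

# Made-up data in which 00:00:03 appears twice
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2013-01-01 00:00:00", "2013-01-01 00:00:03",
                                 "2013-01-01 00:00:03", "2013-01-01 00:00:05"]),
    "value": [2.1, 3.7, 3.9, 2.4],
})

# Duplicated timestamps are what make the upsampling reindex step fail
print(df["timestamp"].duplicated().any())  # True

# Collapse duplicates first; keeping the last value is an assumption,
# .first() or .mean() may suit other data better
deduped = df.groupby("timestamp").last()

out = deduped.resample("s").ffill()
print(out["value"].tolist())  # [2.1, 2.1, 2.1, 3.9, 3.9, 2.4]
```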
Answer 2

Note: if your index is already a DatetimeIndex, the `set_index('timestamp')` step in @jezrael's answer will throw an error, because there is no longer a `timestamp` column to set. You could convert the index back to a column and use @jezrael's answer, or calculate a new index with `pd.date_range`.
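If the index is already a DatetimeIndex, another option is to skip `set_index` and call `resample` on the frame directly. A minimal sketch with a hypothetical frame built from the question's values:

```python
import pandas as pd

# Hypothetical frame whose index is already a DatetimeIndex,
# so there is no 'timestamp' column for set_index to find
df = pd.DataFrame(
    {"value": [2.1, 3.7, 2.4]},
    index=pd.to_datetime(["2013-01-01 00:00:00", "2013-01-01 00:00:03",
                          "2013-01-01 00:00:05"]),
)

# resample works directly on the existing DatetimeIndex
out = df.resample("s").ffill()
print(out["value"].tolist())  # [2.1, 2.1, 2.1, 3.7, 3.7, 2.4]
```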

Consider `df_test`, which holds 5-minute data with some rows missing. To add the missing rows, create a complete datetime index with `pd.date_range` and reindex against it:

```python
import pandas as pd

# df_test must already have a DatetimeIndex
# create a new datetime index covering the full range ('5min' replaces the deprecated '5T')
daterng_all = pd.date_range(start='2021-08-17 15:00:00', end='2021-08-17 16:30:00', freq='5min')

# add rows for the missing intervals (NaN is reindex's default fill), then interpolate
df_test = df_test.reindex(daterng_all).interpolate()
```

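A self-contained sketch of the reindex step, using a made-up `df_test` (four readings, with 15:10 and 15:15 missing) since the original data isn't included here:

```python
import pandas as pd

# Made-up 5-minute data with the 15:10 and 15:15 rows missing
df_test = pd.DataFrame(
    {"value": [1.0, 2.0, 5.0, 6.0]},
    index=pd.to_datetime(["2021-08-17 15:00", "2021-08-17 15:05",
                          "2021-08-17 15:20", "2021-08-17 15:25"]),
)

# Complete 5-minute grid over the same span
daterng_all = pd.date_range(start="2021-08-17 15:00", end="2021-08-17 15:25", freq="5min")

# reindex adds NaN rows for timestamps that df_test lacks
reindexed = df_test.reindex(daterng_all)
missing = reindexed.index[reindexed["value"].isna()]
print(list(missing.strftime("%H:%M")))  # ['15:10', '15:15']

# Linear interpolation fills the gap between 2.0 (15:05) and 5.0 (15:20)
filled = reindexed.interpolate()
print(filled["value"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```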

Above, I've chained `interpolate()` to fill in the missing values, but you could also use `.ffill()` as in @jezrael's answer. `interpolate()` has more keyword arguments and works well for my particular data (environmental time series). I particularly like the `limit` kwarg, which caps how many consecutive missing values get filled, so gaps too long to fill sensibly that way aren't filled all the way across.
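To illustrate the difference between `.ffill()` and `interpolate(limit=...)`, here is a sketch on a made-up 5-minute series with a one-row gap and a three-row gap. Note that `limit` fills the start of a longer gap (up to `limit` values) rather than skipping that gap entirely.

```python
import numpy as np
import pandas as pd

# Made-up 5-minute series: a 1-row gap at 15:05 and a 3-row gap at 15:15-15:25
idx = pd.date_range("2021-08-17 15:00", periods=7, freq="5min")
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0], index=idx)

# Forward-fill: repeat the last observed value into every gap
ffilled = s.ffill()
print(ffilled.tolist())  # [1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 7.0]

# Linear interpolation, filling at most one consecutive NaN per gap
limited = s.interpolate(limit=1)
print(limited.tolist())  # [1.0, 2.0, 3.0, 4.0, nan, nan, 7.0]
```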
