Suppose I create a sequential-input LSTM in TensorFlow along the lines of:
import numpy as np

def Sequential_Input_LSTM(df, input_sequence):
    df_np = df.to_numpy()
    X = []
    y = []
    for i in range(len(df_np) - input_sequence):
        # window of `input_sequence` consecutive values as one input sample
        row = [a for a in df_np[i:i + input_sequence]]
        X.append(row)
        # the value immediately after the window is the label
        label = df_np[i + input_sequence]
        y.append(label)
    return np.array(X), np.array(y)
X, y = Sequential_Input_LSTM(df_data, 10)  # pandas DataFrame df_data contains our data
In this example, I slice my data into X (input sequences) and y (labels) so that, e.g., the first 10 values (the sequence length) form the first X sample and the 11th value is the first y. Then the window of 10 values is moved one step to the right (one timestep further), the next 10 values become the second X sample, the value directly after them becomes the second y, and so on.
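To make the windowing concrete, here is a small sketch of the same logic on a plain NumPy array instead of a DataFrame, with a window length of 3 and a short series for readability (`make_windows` is just an illustrative name):

```python
import numpy as np

# Same windowing idea as Sequential_Input_LSTM above, but on a plain
# NumPy array, with window length 3 instead of 10 for readability.
def make_windows(series, input_sequence):
    X, y = [], []
    for i in range(len(series) - input_sequence):
        X.append(series[i:i + input_sequence])  # 3 consecutive values
        y.append(series[i + input_sequence])    # the value right after them
    return np.array(X), np.array(y)

series = np.arange(8)  # 0, 1, ..., 7
X, y = make_windows(series, 3)
print(X.shape)      # (5, 3): 8 - 3 = 5 samples of 3 timesteps each
print(X[0], y[0])   # [0 1 2] 3
print(X[1], y[1])   # [1 2 3] 4 -- window shifted one timestep
```

Each label y[i] is the value directly after window X[i], so X and y always have the same number of rows.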
Then suppose I take part of X as my X_test and use an LSTM model to make a time-series prediction, e.g. predictions = model.predict(X_test).
When I actually tried this and plotted the results of predict(X_test), the y array and the predictions appeared to be synchronized without any further adjustment. I had expected to shift the prediction array 10 timesteps to the right manually when plotting it together with the labels, since I cannot explain where the predictions for the first 10 timesteps come from.
Where do the predictions for the first 10 timesteps of X_test come from, given that at that point the model has not yet received 10 input values? Does TensorFlow use the last timesteps of X_test to create the predictions for the first 10 values, or are the predictions at the beginning just pure guesses?
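As far as I can tell, the alignment I observed would follow if predict simply returns one output per input window, since window i already contains the 10 values preceding label i. Here is a sketch of that reasoning with a stand-in for model.predict (`fake_predict` is hypothetical, just the window mean, not an LSTM):

```python
import numpy as np

# Stand-in for model.predict: the mean of each window. Not an LSTM --
# just something that returns exactly one output per input sample.
def fake_predict(X):
    return X.mean(axis=1)

series = np.arange(20, dtype=float)
input_sequence = 10

# Windowing as in Sequential_Input_LSTM above.
X = np.array([series[i:i + input_sequence]
              for i in range(len(series) - input_sequence)])
y = series[input_sequence:]

predictions = fake_predict(X)
# One prediction per window: len(predictions) == len(y), so the two
# arrays line up index-for-index without any 10-step shift.
print(len(X), len(y), len(predictions))  # 10 10 10
print(predictions[0], y[0])  # 4.5 10.0 -- both refer to timestep 10
```

If this is what happens, predictions[0] is not a guess about the very first timestep of the series; it is the prediction for the timestep right after the first window, which is exactly where y[0] sits.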