Suppose I create a sequential-input LSTM in TensorFlow along the lines of:
import numpy as np

def Sequential_Input_LSTM(df, input_sequence):
    df_np = df.to_numpy()
    X = []
    y = []
    for i in range(len(df_np) - input_sequence):
        # window of `input_sequence` consecutive values as one input sample
        row = [a for a in df_np[i:i + input_sequence]]
        X.append(row)
        # the value immediately after the window is the label
        label = df_np[i + input_sequence]
        y.append(label)
    return np.array(X), np.array(y)
X, y = Sequential_Input_LSTM(df_data, 10)  # pandas DataFrame df_data contains our data
In this example, I slice my data into X (input sequences) and y (labels) so that, e.g., the first 10 values (the sequence length) form the first X sample and the 11th value is the first y. Then the window of 10 values is moved one step to the right (one timestep further), the next 10 values become the second X sample, the value directly after them becomes the second y, and so on.
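To make the windowing concrete, here is a small sketch of the same logic on a plain NumPy array instead of a DataFrame, with a window length of 3 and a short series for readability (`make_windows` is just an illustrative name):

```python
import numpy as np

# Same windowing idea as Sequential_Input_LSTM above, but on a plain
# NumPy array, with window length 3 instead of 10 for readability.
def make_windows(series, input_sequence):
    X, y = [], []
    for i in range(len(series) - input_sequence):
        X.append(series[i:i + input_sequence])  # 3 consecutive values
        y.append(series[i + input_sequence])    # the value right after them
    return np.array(X), np.array(y)

series = np.arange(8)  # 0, 1, ..., 7
X, y = make_windows(series, 3)
print(X.shape)      # (5, 3): 8 - 3 = 5 samples of 3 timesteps each
print(X[0], y[0])   # [0 1 2] 3
print(X[1], y[1])   # [1 2 3] 4 -- window shifted one timestep
```

Each label y[i] is the value directly after window X[i], so X and y always have the same number of rows.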
Then suppose I take part of X as my X_test and use an LSTM model to make a time-series prediction, e.g. predictions = model.predict(X_test).
When I actually tried this and plotted the results of predict(X_test), the y array and the predictions appeared to be synchronized without any further adjustment. I had expected to shift the prediction array 10 timesteps to the right manually when plotting it together with the labels, since I cannot explain where the predictions for the first 10 timesteps come from.
Where do the predictions for the first 10 timesteps of X_test come from, given that at that point the model has not yet received 10 input values? Does TensorFlow use the last timesteps of X_test to create the predictions for the first 10 values, or are the predictions at the beginning just pure guesses?
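As far as I can tell, the alignment I observed would follow if predict simply returns one output per input window, since window i already contains the 10 values preceding label i. Here is a sketch of that reasoning with a stand-in for model.predict (`fake_predict` is hypothetical, just the window mean, not an LSTM):

```python
import numpy as np

# Stand-in for model.predict: the mean of each window. Not an LSTM --
# just something that returns exactly one output per input sample.
def fake_predict(X):
    return X.mean(axis=1)

series = np.arange(20, dtype=float)
input_sequence = 10

# Windowing as in Sequential_Input_LSTM above.
X = np.array([series[i:i + input_sequence]
              for i in range(len(series) - input_sequence)])
y = series[input_sequence:]

predictions = fake_predict(X)
# One prediction per window: len(predictions) == len(y), so the two
# arrays line up index-for-index without any 10-step shift.
print(len(X), len(y), len(predictions))  # 10 10 10
print(predictions[0], y[0])  # 4.5 10.0 -- both refer to timestep 10
```

If this is what happens, predictions[0] is not a guess about the very first timestep of the series; it is the prediction for the timestep right after the first window, which is exactly where y[0] sits.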