Fail recovery when missing redo checkpoint record without backup_labelREL_15_STABLE

This commit adds an extra check at the beginning of recovery to ensure that the redo record of a checkpoint exists before attempting WAL replay, logging a PANIC if the redo record referenced by the checkpoint record could not be found. This is the same level of failure as when a checkpoint record is missing. This check is added when a cluster is started without a backup_label, after retrieving its checkpoint record. The redo LSN used for the check is retrieved from the checkpoint record successfully read. In the case where a backup_label exists, the startup process already fails if the redo record cannot be found after reading a checkpoint record at the beginning of recovery. Previously, the presence of the redo record was not checked. If the redo and checkpoint records were located on different WAL segments, it would be possible to miss a entire range of WAL records that should have been replayed but were just ignored. The consequences of missing the redo record depend on the version dealt with, these becoming worse the older the version used: - On HEAD, v18 and v17, recovery fails with a pointer dereference at the beginning of the redo loop, as the redo record is expected but cannot be found. These versions are good students, because we detect a failure before doing anything, even if the failure is misleading in the shape of a segmentation fault, giving no information that the redo record is missing. - In v16 and v15, problems show at the end of recovery within FinishWalRecovery(), the startup process using a buggy LSN to decide from where to start writing WAL. The cluster gets corrupted, still it is noisy about it. - v14 and older versions are worse: a cluster gets corrupted but it is entirely silent about the matter. The redo record missing causes the startup process to skip entirely recovery, because a missing record is the same as not redo being required at all. This leads to data loss, as everything is missed between the redo record and the checkpoint record. Note that I have tested that down to 9.4, reproducing the issue with a version of the author's reproducer slightly modified. The code is wrong since at least 9.2, but I did not look at the exact point of origin. This problem has been found by debugging a cluster where the WAL segment including the redo segment was missing due to an operator error, leading to a crash, based on an investigation in v15. Requesting archive recovery with the creation of a recovery.signal or a standby.signal even without a backup_label would mitigate the issue: if the record cannot be found in pg_wal/, the missing segment can be retrieved with a restore_command when checking that the redo record exists. This was already the case without this commit, where recovery would re-fetch the WAL segment that includes the redo record. The check introduced by this commit makes the segment to be retrieved earlier to make sure that the redo record can be found. On HEAD, the code will be slightly changed in a follow-up commit to not rely on a PANIC, to include a test able to emulate the original problem. This is a minimal backpatchable fix, kept separated for clarity. Reported-by: Andres Freund <andres@anarazel.de> Analyzed-by: Andres Freund <andres@anarazel.de> Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com Backpatch-through: 14
author: Michael Paquier 2025-12-16 04:29:42 +0000
committer: Michael Paquier 2025-12-16 04:29:42 +0000
commit: 95ef6f4904a53a643590021fbcc27812df169379 (patch)
tree: adb5319035e017356a1f0a95276fa7648bac8ae9
parent: b0b52b7123ae6f26775d35c868f14a5dca7884d0 (diff)
1 files changed, 10 insertions, 0 deletions
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 7c6692dee6e..b07a54a9216 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -811,6 +811,16 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 		}
 		memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
 		wasShutdown = ((record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN);
+
+		/* Make sure that REDO location exists. */
+		if (checkPoint.redo < CheckPointLoc)
+		{
+			XLogPrefetcherBeginRead(xlogprefetcher, checkPoint.redo);
+			if (!ReadRecord(xlogprefetcher, LOG, false, checkPoint.ThisTimeLineID))
+				ereport(PANIC,
+						errmsg("could not find redo location %X/%08X referenced by checkpoint record at %X/%08X",
+							   LSN_FORMAT_ARGS(checkPoint.redo), LSN_FORMAT_ARGS(CheckPointLoc)));
+		}
 	}
 
 	/*
author	Michael Paquier	2025-12-16 04:29:42 +0000
committer	Michael Paquier	2025-12-16 04:29:42 +0000
commit	95ef6f4904a53a643590021fbcc27812df169379 (patch)
tree	adb5319035e017356a1f0a95276fa7648bac8ae9
parent	b0b52b7123ae6f26775d35c868f14a5dca7884d0 (diff)