|
This change makes pgstat_report_vacuum() more consistent with
pgstat_report_analyze(), which also takes a Relation. This enforces a
policy that callers of this routine must open and lock the relation
whose statistics are updated before calling it. We are unlikely to
ever have many callers of this routine in the tree, but it seems like a
good idea to make this requirement explicit in the long run.
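As a rough sketch of the expected calling pattern (the parameter list
beyond the Relation is assumed here, not taken from the actual
signature):

    /* open and lock the relation first, as the new API requires */
    Relation	rel = table_open(relid, ShareUpdateExclusiveLock);

    pgstat_report_vacuum(rel, livetuples, deadtuples, starttime);
    table_close(rel, ShareUpdateExclusiveLock);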
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal
|
|
We already do this in CreateParallelContext, InitializeParallelDSM, and
LaunchParallelWorkers. I suspect the reason why the matching logic was
omitted from ReinitializeParallelDSM is that I failed to realize that
any memory allocation was happening here -- but shm_mq_attach does
allocate, which could result in a shm_mq_handle being allocated in a
shorter-lived context than the ParallelContext which points to it.
That could result in a crash if the shorter-lived context is freed
before the parallel context is destroyed. As far as I am currently
aware, there is no way to reach a crash using only code that is
present in core PostgreSQL, but extensions could potentially trip
over this. Fixing this in the back-branches appears low-risk, so
back-patch to all supported versions.
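The fix follows the same pattern as the existing call sites; a
simplified sketch:

    MemoryContext oldcontext;

    /* ensure shm_mq_attach's allocation lives as long as the ParallelContext */
    oldcontext = MemoryContextSwitchTo(TopTransactionContext);
    pcxt->worker[i].error_mqh = shm_mq_attach(mq, pcxt->seg, NULL);
    MemoryContextSwitchTo(oldcontext);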
Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Co-authored-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Backpatch-through: 14
Discussion: http://postgr.es/m/CAKZiRmwfVripa3FGo06=5D1EddpsLu9JY2iJOTgbsxUQ339ogQ@mail.gmail.com
|
|
heap_page_prune_and_freeze() fills in PruneState->deadoffsets, the array
of OffsetNumbers of dead tuples. It is returned to the caller in the
PruneFreezeResult. To avoid having two copies of the array, the
PruneState saves only a pointer to the array. This was a bit unusual and
confusing, so add a clarifying comment.
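The arrangement, roughly (a sketch, trimmed down from the real
definitions in pruneheap.c):

    typedef struct PruneState
    {
        /*
         * Points into the caller's PruneFreezeResult->deadoffsets array;
         * this is not a separate copy of the dead item offsets.
         */
        OffsetNumber *deadoffsets;
    } PruneState;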
Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=jiD1nqch4JQN+odAxZSD7mRvdoHUGJYN2r6tQG_66yQ@mail.gmail.com
|
|
The const qualification of the presult argument to prune_freeze_setup()
is later cast away, so it was not correct. Remove it and add a comment
explaining that presult should not be modified.
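A hypothetical illustration of the problem that was fixed:

    static void
    setup(const PruneFreezeResult *presult)
    {
        /* the const qualification is cast away here ... */
        PruneFreezeResult *result = (PruneFreezeResult *) presult;

        /* ... so the parameter's const promise was never true */
        result->ndeleted = 0;
    }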
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fb97d0ae-a0bc-411d-8a87-f84e7e146488%40eisentraut.org
|
|
This commit provides test coverage for dc7c77f825d7, where the redo
record and the checkpoint record finish on different WAL segments, with
the start of recovery able to detect that the redo record is missing.
The test uses a wait injection point placed in the critical section of
a checkpoint, a method that requires not one but two wait injection
points to avoid any memory allocations within the critical section of
the checkpoint (see the sketch after the following list):
- A checkpoint is run with a background psql.
- A first wait point is reached by the checkpointer before the critical
section, allocating the shared memory required by the DSM registry for
the wait machinery in the injection_points library.
- The first point is woken up.
- A second wait point is loaded before the critical section, allocating
the memory needed to build the path to the loaded library, then run
inside the critical section once the checkpoint redo record has been
logged.
- The WAL segment is switched while waiting on the second point.
- The checkpoint completes.
- The cluster is stopped in immediate mode.
- The segment that includes the redo record is removed.
- On restart, recovery fails as the redo record cannot be found.
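The two-step pattern that keeps allocations out of the critical section
looks roughly like this (point name hypothetical; macro arity differs
across branches):

    /* before the critical section: loading the point may allocate */
    INJECTION_POINT_LOAD("checkpoint-redo-wait");
    START_CRIT_SECTION();
    /* ... the checkpoint redo record is logged here ... */
    /* run the preloaded point; no allocation happens at this stage */
    INJECTION_POINT_CACHED("checkpoint-redo-wait", NULL);
    END_CRIT_SECTION();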
The error message introduced in dc7c77f825d7 is now reduced to a FATAL,
meaning that the information is still provided while making it possible
to write a test for it. Nitin provided a basic version of the test,
which I have enhanced with the two points to make it portable. Without
dc7c77f825d7, the cluster crashes in this test, not with a PANIC but
due to the pointer dereference at the beginning of recovery, the
failure mentioned in the other commit.
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
|
|
This commit adds an extra check at the beginning of recovery to ensure
that the redo record of a checkpoint exists before attempting WAL
replay, logging a PANIC if the redo record referenced by the checkpoint
record cannot be found. This is the same level of failure as when a
checkpoint record is missing. The check is done when a cluster is
started without a backup_label, after retrieving its checkpoint record;
the redo LSN used for the check is taken from the checkpoint record
just read.
In the case where a backup_label exists, the startup process already
fails if the redo record cannot be found after reading a checkpoint
record at the beginning of recovery.
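A simplified sketch of the added check (names modelled on
xlogrecovery.c; the exact code differs):

    if (checkPoint.redo < CheckPointLoc)
    {
        XLogPrefetcherBeginRead(xlogprefetcher, checkPoint.redo);
        if (!ReadRecord(xlogprefetcher, LOG, false, replayTLI))
            ereport(PANIC,
                    (errmsg("could not find redo location %X/%X referenced by checkpoint record at %X/%X",
                            LSN_FORMAT_ARGS(checkPoint.redo),
                            LSN_FORMAT_ARGS(CheckPointLoc))));
    }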
Previously, the presence of the redo record was not checked. If the
redo and checkpoint records were located on different WAL segments, it
was possible to miss an entire range of WAL records that should have
been replayed but were simply ignored. The consequences of missing the
redo record depend on the version involved, and become worse the older
the version:
- On HEAD, v18 and v17, recovery fails with a pointer dereference at
the beginning of the redo loop, as the redo record is expected but
cannot be found. These versions are good students: we detect the
failure before doing anything, even if the failure is misleading in the
shape of a segmentation fault, giving no hint that the redo record is
missing.
- In v16 and v15, problems show up at the end of recovery within
FinishWalRecovery(), with the startup process using a bogus LSN to
decide from where to start writing WAL. The cluster gets corrupted,
but at least it is noisy about it.
- v14 and older versions are worse: the cluster gets corrupted while
staying entirely silent about the matter. The missing redo record
causes the startup process to skip recovery entirely, because a missing
record is treated the same as no redo being required at all. This
leads to data loss, as everything between the redo record and the
checkpoint record is missed.
Note that I have tested this down to 9.4, reproducing the issue with a
slightly modified version of the author's reproducer. The code has
been wrong since at least 9.2, but I did not look at the exact point of
origin.
This problem was found by debugging a cluster where the WAL segment
including the redo record was missing due to an operator error, leading
to a crash, based on an investigation in v15.
Requesting archive recovery by creating a recovery.signal or a
standby.signal, even without a backup_label, would mitigate the issue:
if the record cannot be found in pg_wal/, the missing segment can be
retrieved with a restore_command when checking that the redo record
exists. This was already the case without this commit, where recovery
would re-fetch the WAL segment that includes the redo record. The
check introduced by this commit causes the segment to be retrieved
earlier, making sure that the redo record can be found.
On HEAD, the code will be slightly changed in a follow-up commit to not
rely on a PANIC, to include a test able to emulate the original problem.
This is a minimal backpatchable fix, kept separated for clarity.
Reported-by: Andres Freund <andres@anarazel.de>
Analyzed-by: Andres Freund <andres@anarazel.de>
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
Backpatch-through: 14
|
|
This removes a never-used CacheInvalidateHeapTupleInplace() parameter.
It adds README content about inplace update visibility in logical
decoding. It rewrites other comments.
Back-patch to v18, where commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704
first appeared. Since this removes a CacheInvalidateHeapTupleInplace()
parameter, expect a v18 ".abi-compliance-history" edit to follow. PGXN
contains no calls to that function.
Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com
Backpatch-through: 18
|
|
This corrects commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257.
Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CA+renyWCW+_2QvXERBQ+mna6ANwAVXXmHKCA-WzL04bZRsjoBA@mail.gmail.com
|
|
Late-model gcc with -fsanitize=undefined enabled issues warnings
about uses of PageGetItemId() when it can't prove that the
offsetNumber is > 0. The call sites where this happens are
checking that the offnum is <= PageGetMaxOffsetNumber(page), so
it seems reasonable to add an explicit check that offnum >= 1 too.
While at it, rearrange the code to be less contorted and avoid
duplicate checks on PageGetMaxOffsetNumber. Maybe the compiler
would optimize away the duplicate logic or maybe not, but the
existing coding has little to recommend it anyway.
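The adjusted pattern, as a sketch:

    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
    ItemId		lp;

    /* bound offnum on both ends, once, before using it */
    if (offnum < FirstOffsetNumber || offnum > maxoff)
        elog(ERROR, "invalid offset number %u", offnum);
    lp = PageGetItemId(page, offnum);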
There are multiple instances of this identical coding pattern in
heapam.c and heapam_xlog.c. Current gcc only complains about two
of them, but I fixed them all in the name of consistency.
Potentially this could be back-patched in the name of silencing
warnings; but I think enabling UBSAN is mainly something people
would do on HEAD, so for now it seems not worth the trouble.
Discussion: https://postgr.es/m/1699806.1765746897@sss.pgh.pa.us
|
|
In the server, check explicitly for multixids with zero members. We
used to have an assertion for that; commit d4b7bde418 replaced it with
more extensive runtime checks, but missed the original case of zero
members.
In the upgrade code, a negative length never makes sense, so better to
check for it explicitly. Commit d4b7bde418 added a similar sanity
check to the corresponding server code on master; in the back-branches,
the 'length' is passed to palloc, which would fail with an "invalid
memory alloc request size" error. Clarify the comments on what kinds
of invalid entries are tolerated by the upgrade code and which ones are
reported as fatal errors.
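A sketch of the upgrade-side check (names assumed):

    /* a corrupt entry must be rejected before its length reaches palloc */
    if (length < 0)
        pg_fatal("invalid multixact member count: %d", length);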
Coverity complained about 'length' in the upgrade code being
tainted. That's bogus because we trust the data on disk at least to
some extent, but hopefully this will silence the complaint. If not,
I'll dismiss it manually.
Discussion: https://www.postgresql.org/message-id/7b505284-c6e9-4c80-a7ee-816493170abc@iki.fi
|
|
The current list from the buildfarm includes quite a few typedef
names that it used to miss. The reason is a bit obscure, but it
seems likely to have something to do with our recent increased
use of palloc_object and palloc_array. In any case, this makes
the relevant struct declarations be much more nicely formatted,
so I'll take it. Install the current list and re-run pgindent
to update affected code.
Syncing with the current list also removes some obsolete
typedef names and fixes some alphabetization errors.
Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
|
|
Change WAIT_LSN_TYPE_COUNT from an enum sentinel to a macro definition,
similar to the IOObject, IOContext, and BackendType enums. Remove the
explicit enum value assignments as well.
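The resulting shape, with hypothetical member names, mirroring how
IOOBJECT_NUM_TYPES and friends are defined:

    typedef enum WaitLSNType
    {
        WAIT_LSN_TYPE_REPLAY,
        WAIT_LSN_TYPE_FLUSH,
    } WaitLSNType;

    #define WAIT_LSN_TYPE_COUNT (WAIT_LSN_TYPE_FLUSH + 1)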
Author: Xuneng Zhou <xunengzhou@gmail.com>
|
|
Similar to commit 75f49221c22, it is preferable to use
StaticAssertDecl() instead of StaticAssertStmt() when possible.
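The difference, in brief: StaticAssertDecl() works at file scope, while
StaticAssertStmt() must appear where a statement is allowed:

    /* file scope, no function needed */
    StaticAssertDecl(sizeof(uint32) == 4, "uint32 must be 4 bytes");

    static void
    check_sizes(void)
    {
        /* statement context */
        StaticAssertStmt(sizeof(uint64) == 8, "uint64 must be 8 bytes");
    }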
Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
|
|
Before this commit, when multixid wraparound happens,
MultiXactState->nextMXact goes to 0, which is invalid. All the readers
need to deal with that possibility and skip over the 0. That's
error-prone and we've missed it a few times in the past. This commit
changes the responsibility so that all the writers of
MultiXactState->nextMXact skip over the zero already, and readers can
trust that it's never 0.
We were already doing that for MultiXactState->oldestMultiXactId; none
of its writers would set it to 0. ReadMultiXactIdRange() was
nevertheless checking for that possibility. For clarity, remove that
check.
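The writer-side rule amounts to this sketch:

    /* advance, stepping over the invalid zero value at wraparound */
    MultiXactId next = MultiXactState->nextMXact + 1;

    if (next < FirstMultiXactId)
        next = FirstMultiXactId;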
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://www.postgresql.org/message-id/3624730d-6dae-42bf-9458-76c4c965fb27@iki.fi
|
|
It's not far-fetched that we'd try to read a multixid with an invalid
offset in case of bugs or corruption, or when
pg_get_multixact_members() is called after a crash that left behind
invalid but unused multixids. Better to get a somewhat descriptive
error message if that happens.
Discussion: https://www.postgresql.org/message-id/3624730d-6dae-42bf-9458-76c4c965fb27@iki.fi
|
|
Make it clear why _bt_killitems sorts the scan's so->killedItems[]
array. Also add an assertion to the _bt_killitems loop (that iterates
through this array) to verify it accesses tuples in leaf page order.
Follow-up to commit bfb335df58.
Author: Peter Geoghegan <pg@bowt.ie>
Suggested-by: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAGnEboirgArezZDNeFrR8FOGvKF-Xok333s2iVwWi65gZf8MEA@mail.gmail.com
|
|
Oversight in commit bd8d9c9b.
Discussion: https://postgr.es/m/CAH2-WzmvwVKZ+0Z=RL_+g_aOku8QxWddDCXmtyLj02y+nYaD0g@mail.gmail.com
|
|
Always return TIDs in descending order when returning groups of TIDs
from an nbtree posting list tuple during nbtree backwards scans. This
makes backwards scans tend to require fewer buffer hits, since the scan
is less likely to repeatedly pin and unpin the same heap page/buffer
(we'll get exactly as many buffer hits as we get with a similar forwards
scan case).
Commit 0d861bbb, which added nbtree deduplication, originally did things
this way to avoid interfering with _bt_killitems's approach to setting
LP_DEAD bits on posting list tuples. _bt_killitems makes a soft
assumption that it can always iterate through posting lists in ascending
TID order, finding corresponding killItems[]/so->currPos.items[] entries
in that same order. This worked out because of the prior _bt_readpage
backwards scan behavior. If we just changed the backwards scan posting
list logic in _bt_readpage, without altering _bt_killitems itself, it
would break its soft assumption.
Avoid that problem by sorting the so->killedItems[] array at the start
of _bt_killitems. That way the order that dead items are saved in from
btgettuple can't matter; so->killedItems[] will always be in the same
order as so->currPos.items[] in the end. Since so->currPos.items[] is
now always in leaf page order, regardless of the scan direction used
within _bt_readpage, and since so->killedItems[] is always in that same
order, the _bt_killitems loop can continue to make a uniform assumption
about everything being in page order. In fact, sorting like this makes
the previous soft assumption about item order into a hard invariant.
Also deduplicate the so->killedItems[] array after it is sorted. That
way there's no risk of the _bt_killitems loop becoming confused by a
duplicate dead item/TID. This was possible in cases that involved a
scrollable cursor that encountered the same dead TID more than once
(within the same leaf page/so->currPos context). This doesn't come up
very much in practice, but it seems best to be as consistent as possible
about how and when _bt_killitems will LP_DEAD-mark index tuples.
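The new preamble of _bt_killitems amounts to something like this
(comparator name assumed; qunique() is from lib/qunique.h):

    /* sort killed items into so->currPos.items[] order, then deduplicate */
    qsort(so->killedItems, so->numKilled, sizeof(int), cmp_int);
    so->numKilled = (int) qunique(so->killedItems, so->numKilled,
                                  sizeof(int), cmp_int);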
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=Wut2pKvbW-u3hJ_LXwsYeiXHiW8oN1GfbKPavcGo8Ow@mail.gmail.com
|
|
The comment above heap_xlog_visible() about the critical integrity
requirement for PD_ALL_VISIBLE and the visibility map should also be in
heap_xlog_prune_freeze() where we set PD_ALL_VISIBLE.
Oversight in add323da40a6bf9e
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
|
|
Phase I vacuum gives the page a once-over after pruning and freezing to
check that the values of all_visible and all_frozen agree with the
result of heap_page_is_all_visible(). This is meant to keep the logic in
phase I for determining visibility in sync with the logic in phase III.
Rewrite the assertion to avoid an Assert(false).
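The shape of the change, with assumed variable names:

    /* before: a reachable branch that ends in an unconditional failure */
    if (all_visible != debug_all_visible)
        Assert(false);

    /* after: state the invariant directly, making the condition visible */
    Assert(all_visible == debug_all_visible);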
Suggested by Andres Freund.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y%40sy4ymcdtdklo
|
|
The idea is to further encourage the use of these new routines across
the tree, as they offer stronger type-safety guarantees than palloc().
This batch of changes includes most of the trivial changes suggested by
the author for src/backend/.
A total of 334 files are updated here. Among these files, 48 have
their build output change slightly; this is caused by line number
changes, as the new allocation calls are simpler, shaving around 100
lines of code in total.
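The kind of change applied throughout, as a sketch with illustrative
types:

    WaitEventSet *set;
    int		   *items;

    set = palloc_object(WaitEventSet);	/* was: palloc(sizeof(WaitEventSet)) */
    items = palloc_array(int, nitems);	/* was: palloc(nitems * sizeof(int)) */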
Similar work has been done in 0c3c5c3b06a3 and 31d3847a37be.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
|
|
The new columns, mode and started_by, indicate the vacuum
mode ('normal', 'aggressive', or 'failsafe') and the initiator of the
vacuum ('manual', 'autovacuum', or 'autovacuum_wraparound'),
respectively. This allows users and monitoring tools to better
understand VACUUM behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAOzEurQcOY-OBL_ouEVfEaFqe_md3vB5pXjR_m6L71Dcp1JKCQ@mail.gmail.com
|
|
Author: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BFpmFf-hWXtrC0Q3Cr_Xo78zuP_M_VC5xgWPOYOkwqOD0T8eg@mail.gmail.com
|
|
This eliminates MultiXactOffset wraparound and the 2^32 limit on the
total number of multixid members. Multixids are still limited to 2^31,
but this is a nice improvement because 'members' can grow much faster
than the number of multixids. On systems where that happens, you can
now run longer before hitting hard limits or triggering anti-wraparound
vacuums.
Not having to deal with MultiXactOffset wraparound also simplifies the
code and removes some gnarly corner cases.
We no longer need to perform emergency anti-wraparound freezing
because of running out of 'members' space, so the offset stop limit is
gone. But you might still not want 'members' to consume huge amounts
of disk space. For that reason, I kept the logic for lowering vacuum's
multixid freezing cutoff if a large amount of 'members' space is used.
The thresholds for that are roughly the same as the "safe" and "danger"
thresholds used before: 2 billion and 4 billion members. This keeps
the behavior of the freeze cutoff roughly the same as before. It might
make sense to make this smarter or configurable, now that the threshold
is only needed to manage disk usage, but that's left for the future.
Add code to pg_upgrade to convert multitransactions from the old to
the new format, rewriting the pg_multixact SLRU files. Because
pg_upgrade now rewrites the files, we can get rid of some hacks we had
put in place to deal with old bugs and upgraded clusters. Bump catalog
version for the pg_multixact/offsets format change.
Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com
|
|
This makes them accessible from pg_upgrade, needed by the next commit.
I'm doing this mechanical move as a separate commit to make the next
commit's changes to these definitions more obvious.
Author: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezbZo_3_fnx%3DS5BfepwRftzrpJ%2B7WET4EkTU6wnjDTsnjg@mail.gmail.com
|
|
There were a number of useless casts in format arguments: either the
input to the cast already had the right type, or the code cast between
types instead of just using the right format placeholder to begin
with.
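Both kinds, sketched:

    Size	len = 128;
    int64	val = 42;

    /* useless: len already has the type the placeholder expects */
    elog(LOG, "len = %zu", (Size) len);
    elog(LOG, "len = %zu", len);			/* equivalent, and clearer */

    /* casting to match a placeholder, instead of using the right one */
    elog(LOG, "val = %ld", (long) val);
    elog(LOG, "val = " INT64_FORMAT, val);	/* better */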
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
|
|
Many of these are probably left over from before use of %zu/%zd was
portable.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
|
|
Plus a similar fix to the README.
Backpatch as far back as the sgml issue exists. The README issue does
exist in v14, but that seems unlikely to harm anyone.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ed3db7ea-55b4-4809-86af-81ad3bb2c7d3@gmail.com
Backpatch-through: 15
|
|
Make _bt_readpage pass down the current scan direction to various
utility functions within its pstate variable. Also have _bt_readpage
work off of a local copy of scan->ignore_killed_tuples within its
per-tuple loop (rather than using scan->ignore_killed_tuples directly).
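The idea, as a trimmed-down sketch of the per-page state:

    typedef struct BTReadPageState
    {
        ScanDirection dir;		/* current scan direction, passed down
                                 * to the utility functions */
        /* ... other per-page state elided ... */
    } BTReadPageState;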
Testing has shown that this significantly benefits large range scans,
which are naturally able to take full advantage of the pstate.startikey
optimization added by commit 8a510275. Running a pgbench script with a
"SELECT abalance FROM pgbench_accounts WHERE aid BETWEEN ..." query
shows an increase in transaction throughput of over 5%. There also
appears to be a small performance benefit when running pgbench's
built-in select-only script.
Follow-up to commit 65d6acbc.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmwMwcwKFgaf+mYPwiz3iL4AqpXnwtW_O0vqpWPXRom9Q@mail.gmail.com
|
|
Quite a bit of code within nbtutils.c is only called by _bt_readpage.
Move _bt_readpage and all of the nbtutils.c functions it depends on into
a new .c file, nbtreadpage.c. Also reorder some of the functions within
the new file for clarity.
This commit has no functional impact. It is strictly mechanical.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmwMwcwKFgaf+mYPwiz3iL4AqpXnwtW_O0vqpWPXRom9Q@mail.gmail.com
|
|
The code in BootStrapXLOG() and in pg_test_fsync.c tried to align WAL
buffers in complicated ways. Also, they still used XLOG_BLCKSZ for
the alignment, even though that should now be PG_IO_ALIGN_SIZE. This
can now be simplified and made more consistent by using
PGAlignedXLogBlock: directly in BootStrapXLOG(), and via alignas in
pg_test_fsync.c.
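A sketch of the simplification: a variable of the union type is
already suitably aligned, with no manual alignment arithmetic needed:

    PGAlignedXLogBlock buf;

    memset(buf.data, 0, XLOG_BLCKSZ);
    /* buf.data can now be handed directly to the WAL I/O routines */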
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f462a175-b608-44a1-b428-bdf351e914f4%40eisentraut.org
|
|
In commit 789d65364c, we started updating the next multixid's offset
too when recording a multixid, so that it can always be used to
calculate the number of members. I got it wrong at offset wraparound:
we need to skip over offset 0. Fix that.
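The rule, as a sketch with assumed names: offset 0 is reserved as
invalid, so step over it when the counter wraps:

    nextOffset += nmembers;
    if (nextOffset == 0)
        nextOffset = 1;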
Discussion: https://www.postgresql.org/message-id/d9996478-389a-4340-8735-bfad456b313c@iki.fi
Backpatch-through: 14
|
|
Adjust the prune_freeze_setup() parameter types of new_relfrozen_xid
and new_relmin_mxid to prevent misleading Coverity analysis.
heap_page_prune_and_freeze() compared these values against NULL when
passing them to prune_freeze_setup(), causing Coverity to assume they
could be NULL and to flag a possible null-pointer dereference later,
even though that dereference occurs inside a directly related
conditional.
Reported-by: Coverity
Author: Melanie Plageman <melanieplageman@gmail.com>
|
|
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e9), so they are not needed anymore.
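For instance (a sketch):

    Pointer	p;

    p = (Pointer) palloc(32);	/* needed while Pointer was char * */
    p = palloc(32);				/* suffices now that it is void * */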
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
|
|
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e9), so they are not needed anymore.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
|
|
The assertion checking MyProcNumber used MaxBackends as the upper
bound, but the procInfos array is allocated with size
MaxBackends + NUM_AUXILIARY_PROCS. This inconsistency would cause
a false assertion failure if an auxiliary process calls WaitForLSN().
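The corrected bound, sketched:

    Assert(MyProcNumber >= 0 &&
           MyProcNumber < MaxBackends + NUM_AUXILIARY_PROCS);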
Author: Xuneng Zhou <xunengzhou@gmail.com>
|
|
With this commit, the next multixid's offset will always be set on the
offsets page, by the time that a backend might try to read it, so we
no longer need the waiting mechanism with the condition variable. In
other words, this eliminates "corner case 2" mentioned in the
comments.
The waiting mechanism was broken in a few scenarios:
- When nextMulti was advanced without WAL-logging the next
multixid. For example, if a later multixid was already assigned and
WAL-logged before the previous one was WAL-logged, and then the
server crashed. In that case the next offset would never be set in
the offsets SLRU, and a query trying to read it would get stuck
waiting for it. Same thing could happen if pg_resetwal was used to
forcibly advance nextMulti.
- In hot standby mode, a deadlock could happen where one backend waits
for the next multixid assignment record, but WAL replay is not
advancing because of a recovery conflict with the waiting backend.
The old TAP test used carefully placed injection points to exercise
the old waiting code, but now that the waiting code is gone, much of
the old test is no longer relevant. Rewrite the test to reproduce the
IPC/MultixactCreation hang after crash recovery instead, and to verify
that previously recorded multixids stay readable.
Backpatch to all supported versions. In back-branches, we still need
to be able to read WAL that was generated before this fix, so in the
back-branches this includes a hack to initialize the next offsets page
when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a
page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the
WAL is not compatible.
Author: Andrey Borodin <amborodin@acm.org>
Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru
Backpatch-through: 14
|
|
Standard practice in PostgreSQL is to use "foo(void)" instead of
"foo()", as the latter looks like an "old-style" function
declaration. Similar changes were made in commits cdf4b9aff2,
0e72b9d440, 7069dbcc31, f1283ed6cc, 7b66e2c086, e95126cf04, and
9f7c527af3.
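The distinction, with hypothetical function names:

    extern int	bar();		/* old-style: the argument list is unspecified */
    extern int	baz(void);	/* preferred: explicitly takes no arguments */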
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/aTBObQPg%2Bps5I7vl%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
The comment for the Pointer type says 'XXX Pointer arithmetic is done
with this, so it can't be void * under "true" ANSI compilers.'. This
commit addresses that: code that needs pointer arithmetic now uses
char * explicitly instead of Pointer. This makes the meaning of the
code clearer locally and removes a dependency on the actual definition
of the Pointer type. (The definition of the Pointer type is not
changed in this commit.)
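The pattern applied, as a self-contained sketch:

    static void *
    advance_bytes(void *base, Size nbytes)
    {
        /* explicit char * makes the byte-wise arithmetic clear */
        return (char *) base + nbytes;
    }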
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
|
|
in arguments of memcpy() and memmove() calls
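The removed casts are of the kind sketched here; memcpy() and
memmove() take void * parameters, so the casts added nothing:

    char	src[8] = "0123456";
    char	dst[8];

    /* previously: memcpy((char *) dst, (char *) src, sizeof(dst)) */
    memcpy(dst, src, sizeof(dst));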
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
|
|
To increase our test coverage in general, and because I will use this
in the next commit to test a bug we currently have in amcheck.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/33e39552-6a2a-46f3-8b34-3f9f8004451f@garret.ru
|
|
To increase our test coverage in general, and because I will add onto
this in the next commit to also test amcheck with incomplete splits.
This is copied from the similar test we had for GIN indexes. B-tree's
incomplete splits work similarly to GIN's, so with small changes, the
same test works for B-tree too.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/abd65090-5336-42cc-b768-2bdd66738404@iki.fi
|
|
This removes some casts where the input already has the same type as
the type specified by the cast. Their presence risks hiding actual
type mismatches in the future or silently discarding qualifiers. The
removal also improves readability. Same kind of idea as 7f798aca1d5
and ef8fe693606. (This does not change all such instances, only those
hand-picked by the author.)
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
Instead of complicated pointer arithmetic, overlay a uint32 array and
just access the array members. That's safe thanks to
XLogRecGetBlockData() returning a MAXALIGNed buffer.
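A sketch of the resulting access pattern (the consumer function is
hypothetical):

    Size	len;
    char   *data = XLogRecGetBlockData(record, 0, &len);
    uint32 *vals = (uint32 *) data;		/* safe: data is MAXALIGNed */

    for (Size i = 0; i < len / sizeof(uint32); i++)
        handle_value(vals[i]);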
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
The input is ASCII anyway, so it's better to be clear that it's not
locale-dependent.
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
|
|
This split exists for most of the other RMGRs, and it makes the
separation cleaner between the WAL code, the redo code, and the record
description code (already in its own file) for the sequence RMGR. The
redo and masking routines are moved to a new file, sequence_xlog.c.
All the RMGR routines are now declared in a new header,
sequence_xlog.h.
This separation is useful for a different patch related to sequences
that I have been working on, where a refactoring of sequence.c becomes
easier if its RMGR routines and its core routines are split.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/aSfTxIWjiXkTKh1E@paquier.xyz
|
|
We have some limited ability to detect redundant and contradictory
conditions involving an nbtree row comparison key following commits
f09816a0 and bd3f59fd: we can do so in simple cases involving IS NULL
and IS NOT NULL keys on a row compare key's first column. We can
likewise determine that a scan's qual is unsatisfiable given a row
compare whose first subkey's arg is NULL. Update obsolete comments that
claimed that we merely copied row compares into the output key array
"without any editorialization".
Also update another _bt_preprocess_keys header comment paragraph: add a
parenthetical remark that points out that preprocessing will generate a
skip array for the preceding example qual. That will ultimately lead
to preprocessing marking the example's lower-order y key required --
which is exactly what the example supposes cannot happen. Keep the
original comment, though, since it accurately describes the mechanical
rules that determine which keys get marked required in the absence of
skip arrays (which can occasionally still matter). This fixes an
oversight in commit 92fe23d9, which added the nbtree skip scan
optimization.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
|
|
The solution used in 0ca3b1697 to determine the Parallel TID Range
Scan's start location was to modify the signature of
table_block_parallelscan_startblock_init() to allow the startblock
to be passed in as a parameter. This allows the scan limits to be
adjusted before that function is called so that the limits are picked up
when the parallel scan starts. The commit made it so the call to
table_block_parallelscan_startblock_init uses the HeapScanDesc's
rs_startblock to pass the startblock to the parallel scan. That all
works ok for Parallel TID Range scans as the HeapScanDesc rs_startblock
gets set by heap_setscanlimits(), but for Parallel Seq Scans, initscan()
does not initialize rs_startblock, and that results in passing an
uninitialized value to table_block_parallelscan_startblock_init() as
noted by the buildfarm member skink, running Valgrind.
To fix this issue, make it so initscan() sets the rs_startblock for
parallel scans unless we're doing a rescan. This makes it so
table_block_parallelscan_startblock_init() will be called with the
startblock set to InvalidBlockNumber, and that'll allow the syncscan
code to find the correct start location (when enabled). For Parallel
TID Range Scans, this InvalidBlockNumber value will be overwritten in
the call to heap_setscanlimits().
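A simplified sketch of the fix in initscan():

    /* give parallel scans a defined start, unless preserving it for a rescan */
    if (scan->rs_base.rs_parallel != NULL && !keep_startblock)
        scan->rs_startblock = InvalidBlockNumber;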
initscan() is a bit light on documentation about what's meant to get
initialized where for parallel scans. From what I can tell, it just
didn't matter prior to 0ca3b1697 that rs_startblock was left
uninitialized for parallel scans. To address the light documentation,
I've also added some comments to mention that the syncscan location for
parallel scans is figured out in
table_block_parallelscan_startblock_init().
I've also taken the liberty to adjust the if/else if/else code in
initscan() to make it clearer which parts apply to parallel scans and
which parts are for the serial scans.
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqALm+k7FyfdQdCw1yF_8HojvR61YRrNhwRQPE=zSmnQA@mail.gmail.com
|
|
In v14, bb437f995 added support for scanning ranges of TIDs using a
dedicated executor node. Here, we allow these scans to be parallelized.
The range of blocks to scan is divvied up similarly to how a Parallel
Seq Scan does it: 'chunks' of blocks are allocated to each worker, and
the size of those chunks is slowly reduced down to 1 block per worker
as the scan nears its end. Doing that means workers finish at roughly
the same time.
Allowing TID Range Scans to be parallelized removes a dilemma from the
planner. Previously, a Parallel Seq Scan could cost less than a
non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan
(disk costs are not divided by the number of workers), so the planner
could choose the Parallel Seq Scan even though it would read more
blocks during execution than the TID Range Scan would have. Allowing
Parallel TID Range Scans removes that trade-off between the reduced
CPU costs of parallelism and the additional I/O the Parallel Seq Scan
incurs by scanning blocks outside the required TID range. There are
also, of course, the traditional performance benefits of parallelism
to be gained, which likely don't need to be explained here.
Author: Cary Huang <cary.huang@highgo.ca>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca
|
|
Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it
clearer where the examination of tuples ends and page modifications
begin.
No code change beyond what was required to extract the code into
helper functions.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y%40sy4ymcdtdklo
|