path: root/src/backend
Age | Commit message | Author
10 hours | Change pgstat_report_vacuum() to use Relation | Michael Paquier
This change makes pgstat_report_vacuum() more consistent with pgstat_report_analyze(), which also uses a Relation. This enforces a policy that callers of this routine should open and lock the relation whose statistics are updated before calling it. There are unlikely to be many callers of this routine in the tree, but it seems like a good idea to establish this requirement for the long run. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Suggested-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal
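For illustration, a minimal sketch of the caller-side policy described above; the exact post-change argument list of pgstat_report_vacuum() is an assumption here, and relid/livetuples/deadtuples/starttime stand in for the caller's own values:

    Relation    rel = table_open(relid, ShareUpdateExclusiveLock);

    /* ... perform the vacuum work on rel ... */

    /* The relation is already open and locked when its stats are reported. */
    pgstat_report_vacuum(rel, livetuples, deadtuples, starttime);

    /* Keep the lock until end of transaction, as vacuum does. */
    table_close(rel, NoLock);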
13 hours | Remove useless code in InjectionPointAttach() | Michael Paquier
strlcpy() ensures that a target string is zero-terminated, so there is no need to enforce it a second time in this code. This simplification could have been done in 0eb23285a257. Author: Feilong Meng <feelingmeng@foxmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/tencent_771178777C5BC17FCB7F7A1771CD1FFD5708@qq.com
13 hours | Avoid global LC_CTYPE dependency in pg_locale_icu.c. | Jeff Davis
ICU still depends on libc for compatibility with certain historical behavior for single-byte encodings. Make the dependency explicit by holding a locale_t object when required. We should consider a better solution in the future, such as decoding the text to UTF-32 and using u_tolower(). That would be a behavior change and would require additional infrastructure, though, so for now just avoid the global LC_CTYPE dependency. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
13 hours | downcase_identifier(): use method table from locale provider. | Jeff Davis
Previously, libc's tolower() was always used for lowercasing identifiers, regardless of the database locale (though only characters beyond 127 in single-byte encodings were affected). Refactor to allow each provider to supply its own implementation of identifier downcasing. For historical compatibility, when using a single-byte encoding, ICU still relies on tolower(). One minor behavior change is that, before the database default locale is initialized, identifiers are downcased using ASCII semantics. Previously, the postmaster's LC_CTYPE setting from the environment was used. While that could have had some effect during GUC processing, for example, relying on the environment setting would have been fragile anyway. (Also, it only matters when the encoding is single-byte.) Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
20 hours | Switch memory contexts in ReinitializeParallelDSM. | Robert Haas
We already do this in CreateParallelContext, InitializeParallelDSM, and LaunchParallelWorkers. I suspect the reason why the matching logic was omitted from ReinitializeParallelDSM is that I failed to realize that any memory allocation was happening here -- but shm_mq_attach does allocate, which could result in a shm_mq_handle being allocated in a shorter-lived context than the ParallelContext which points to it. That could result in a crash if the shorter-lived context is freed before the parallel context is destroyed. As far as I am currently aware, there is no way to reach a crash using only code that is present in core PostgreSQL, but extensions could potentially trip over this. Fixing this in the back-branches appears low-risk, so back-patch to all supported versions. Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Co-authored-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com> Backpatch-through: 14 Discussion: http://postgr.es/m/CAKZiRmwfVripa3FGo06=5D1EddpsLu9JY2iJOTgbsxUQ339ogQ@mail.gmail.com
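As a sketch of the pattern involved (the surrounding variable names are illustrative, not the exact committed hunk), allocations made while attaching to the message queues should happen in the same long-lived context that owns the ParallelContext:

    oldcontext = MemoryContextSwitchTo(TopTransactionContext);
    /* shm_mq_attach() palloc's a shm_mq_handle in the current context */
    pcxt->worker[i].error_mqh = shm_mq_attach(mq, pcxt->seg, NULL);
    MemoryContextSwitchTo(oldcontext);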
21 hours | Add explanatory comment to prune_freeze_setup() | Melanie Plageman
heap_page_prune_and_freeze() fills in PruneState->deadoffsets, the array of OffsetNumbers of dead tuples. It is returned to the caller in the PruneFreezeResult. To avoid having two copies of the array, the PruneState saves only a pointer to the array. This was a bit unusual and confusing, so add a clarifying comment. Author: Melanie Plageman <melanieplageman@gmail.com> Suggested-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2=jiD1nqch4JQN+odAxZSD7mRvdoHUGJYN2r6tQG_66yQ@mail.gmail.com
21 hours | Fix const qualification in prune_freeze_setup() | Melanie Plageman
The const qualification of the presult argument to prune_freeze_setup() is later cast away, so it was not correct. Remove it and add a comment explaining that presult should not be modified. Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fb97d0ae-a0bc-411d-8a87-f84e7e146488%40eisentraut.org
29 hours | Separate out bytea sort support from varlena.c | John Naylor
In the wake of commit b45242fd3, bytea_sortsupport() still called out to varstr_sortsupport(). Treating bytea as a kind of text/varchar required varstr_sortsupport() to allow for the possibility of NUL bytes, but only for C collation. This was confusing. For better separation of concerns, create an independent sortsupport implementation in bytea.c. The heuristics for bytea_abbrev_abort() remain the same as for varstr_abbrev_abort(). It's possible that the bytea case warrants different treatment, but that is left for future investigation. In passing, adjust some strange looking comparisons in varstr_abbrev_abort(). Author: Aleksander Alekseev <aleksander@tigerdata.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
31 hours | Add TAP test to check recovery when redo LSN is missing | Michael Paquier
This commit provides test coverage for dc7c77f825d7, where the redo record and the checkpoint record finish on different WAL segments, with the start of recovery able to detect that the redo record is missing. This test uses a wait injection point placed in the critical section of a checkpoint, a method that requires not one but two wait injection points so as to avoid any memory allocations within the checkpoint's critical section:
- A checkpoint is run with a background psql.
- A first wait point is run by the checkpointer before the critical section, allocating the shared memory required by the DSM registry for the wait machinery in the library injection_points.
- The first point is woken up.
- A second wait point is loaded before the critical section, allocating the memory needed to build the path to the loaded library, then run in the critical section once the checkpoint redo record has been logged.
- The WAL segment is switched while waiting on the second point.
- The checkpoint completes.
- The cluster is stopped in immediate mode.
- The segment that includes the redo record is removed.
- On restart, recovery fails as the redo record cannot be found.
The error message introduced in dc7c77f825d7 is now reduced to a FATAL, meaning that the information is still provided while making it possible to write a test for it. Nitin provided a basic version of the test, which I have enhanced with the two points to make it portable. Without dc7c77f825d7, the cluster crashes in this test, not on a PANIC but due to the pointer dereference at the beginning of recovery, a failure mentioned in the other commit. Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
32 hours | Fail recovery when missing redo checkpoint record without backup_label | Michael Paquier
This commit adds an extra check at the beginning of recovery to ensure that the redo record of a checkpoint exists before attempting WAL replay, logging a PANIC if the redo record referenced by the checkpoint record could not be found. This is the same level of failure as when a checkpoint record is missing. This check is added when a cluster is started without a backup_label, after retrieving its checkpoint record. The redo LSN used for the check is retrieved from the checkpoint record successfully read. In the case where a backup_label exists, the startup process already fails if the redo record cannot be found after reading a checkpoint record at the beginning of recovery. Previously, the presence of the redo record was not checked. If the redo and checkpoint records were located on different WAL segments, it would be possible to miss an entire range of WAL records that should have been replayed but were just ignored. The consequences of missing the redo record depend on the version involved, and become worse the older the version:
- On HEAD, v18 and v17, recovery fails with a pointer dereference at the beginning of the redo loop, as the redo record is expected but cannot be found. These versions are good students, because we detect a failure before doing anything, even if the failure is misleading in the shape of a segmentation fault, giving no information that the redo record is missing.
- In v16 and v15, problems show up at the end of recovery within FinishWalRecovery(), with the startup process using an incorrect LSN to decide from where to start writing WAL. The cluster gets corrupted, but at least it is noisy about it.
- v14 and older versions are worse: the cluster gets corrupted and is entirely silent about the matter. The missing redo record causes the startup process to skip recovery entirely, because a missing record is treated the same as no redo being required at all. This leads to data loss, as everything between the redo record and the checkpoint record is missed.
Note that I have tested that down to 9.4, reproducing the issue with a slightly modified version of the author's reproducer. The code has been wrong since at least 9.2, but I did not look for the exact point of origin. This problem was found by debugging a cluster where the WAL segment including the redo record was missing due to an operator error, leading to a crash, based on an investigation in v15. Requesting archive recovery with the creation of a recovery.signal or a standby.signal, even without a backup_label, would mitigate the issue: if the record cannot be found in pg_wal/, the missing segment can be retrieved with a restore_command when checking that the redo record exists. This was already the case without this commit, where recovery would re-fetch the WAL segment that includes the redo record. The check introduced by this commit causes the segment to be retrieved earlier, to make sure that the redo record can be found. On HEAD, the code will be slightly changed in a follow-up commit to not rely on a PANIC, so as to include a test able to emulate the original problem. This is a minimal backpatchable fix, kept separate for clarity. Reported-by: Andres Freund <andres@anarazel.de> Analyzed-by: Andres Freund <andres@anarazel.de> Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com Backpatch-through: 14
40 hours | Allow passing a pointer to GetNamedDSMSegment()'s init callback. | Nathan Bossart
This commit adds a new "void *arg" parameter to GetNamedDSMSegment() that is passed to the initialization callback function. This is useful for reusing an initialization callback function for multiple DSM segments. Author: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAN4CZFMjh8TrT9ZhWgjVTzBDkYZi2a84BnZ8bM%2BfLPuq7Cirzg%40mail.gmail.com
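A minimal sketch of how an extension might use this; the position of the new parameter and the callback's second argument are assumptions based on the description above, and MyStats/my_stats_init are hypothetical names:

    typedef struct MyStats { uint64 hits; uint64 misses; int scale; } MyStats;

    static void
    my_stats_init(void *ptr, void *arg)          /* assumed callback shape */
    {
        MyStats    *stats = (MyStats *) ptr;

        memset(stats, 0, sizeof(MyStats));
        stats->scale = *(int *) arg;             /* extra state reaches us here */
    }

    /* ... later, in the extension's shared memory lookup path ... */
    static int  my_scale = 4;
    bool        found;
    MyStats    *stats = GetNamedDSMSegment("my_stats", sizeof(MyStats),
                                           my_stats_init, &my_scale, &found);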
41 hours | Revisit cosmetics of "For inplace update, send nontransactional invalidations." | Noah Misch
This removes a never-used CacheInvalidateHeapTupleInplace() parameter. It adds README content about inplace update visibility in logical decoding. It rewrites other comments. Back-patch to v18, where commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 first appeared. Since this removes a CacheInvalidateHeapTupleInplace() parameter, expect a v18 ".abi-compliance-history" edit to follow. PGXN contains no calls to that function. Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reported-by: Ilyasov Ian <ianilyasov@outlook.com> Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Surya Poondla <s_poondla@apple.com> Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com Backpatch-through: 18
41 hours | Correct comments of "Fix data loss at inplace update after heap_update()". | Noah Misch
This corrects commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257. Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reported-by: Surya Poondla <s_poondla@apple.com> Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Discussion: https://postgr.es/m/CA+renyWCW+_2QvXERBQ+mna6ANwAVXXmHKCA-WzL04bZRsjoBA@mail.gmail.com
43 hours | Remove unused single-byte char_is_cased() API. | Jeff Davis
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
43 hours | Use multibyte-aware extraction of pattern prefixes. | Jeff Davis
Previously, like_fixed_prefix() used char-at-a-time logic, which forced it to be too conservative for case-insensitive matching. Introduce like_fixed_prefix_ci(), and use that for case-insensitive pattern prefixes. It uses multibyte and locale-aware logic, along with the new pg_iswcased() API introduced in 630706ced0. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
43 hours | Add offnum range checks to suppress compile warnings with UBSAN. | Tom Lane
Late-model gcc with -fsanitize=undefined enabled issues warnings about uses of PageGetItemId() when it can't prove that the offsetNumber is > 0. The call sites where this happens are checking that the offnum is <= PageGetMaxOffsetNumber(page), so it seems reasonable to add an explicit check that offnum >= 1 too. While at it, rearrange the code to be less contorted and avoid duplicate checks on PageGetMaxOffsetNumber. Maybe the compiler would optimize away the duplicate logic or maybe not, but the existing coding has little to recommend it anyway. There are multiple instances of this identical coding pattern in heapam.c and heapam_xlog.c. Current gcc only complains about two of them, but I fixed them all in the name of consistency. Potentially this could be back-patched in the name of silencing warnings; but I think enabling UBSAN is mainly something people would do on HEAD, so for now it seems not worth the trouble. Discussion: https://postgr.es/m/1699806.1765746897@sss.pgh.pa.us
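The pattern being described, sketched for illustration (not one of the exact committed hunks):

    if (offnum >= FirstOffsetNumber && offnum <= PageGetMaxOffsetNumber(page))
    {
        ItemId      lp = PageGetItemId(page, offnum);

        if (ItemIdIsNormal(lp))
            tuple = (HeapTupleHeader) PageGetItem(page, lp);
    }
    /* With both bounds checked explicitly, the compiler can prove offnum > 0. */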
2 days | Improve sanity checks on multixid members length | Heikki Linnakangas
In the server, check explicitly for multixids with zero members. We used to have an assertion for it; commit d4b7bde418 replaced it with more extensive runtime checks but missed the original case of zero members. In the upgrade code, a negative length never makes sense, so better to check for it explicitly. Commit d4b7bde418 added a similar sanity check to the corresponding server code on master, and in backbranches the 'length' is passed to palloc, which would fail with an "invalid memory alloc request size" error. Clarify the comments on what kind of invalid entries are tolerated by the upgrade code and which ones are reported as fatal errors. Coverity complained about 'length' in the upgrade code being tainted. That's bogus because we trust the data on disk at least to some extent, but hopefully this will silence the complaint. If not, I'll dismiss it manually. Discussion: https://www.postgresql.org/message-id/7b505284-c6e9-4c80-a7ee-816493170abc@iki.fi
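For illustration, the kind of explicit runtime check being described might look like this (the error wording is hypothetical, not the committed message):

    if (length <= 0)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("multixact %u has invalid number of members %d",
                        multi, length)));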
2 days | Fix typo in tablecmds.c | David Rowley
Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2%3DAib%2BcatZn6wHKmz0BWe8-q10NAhpxu8wUDT19SddKNA%40mail.gmail.com
2 days | Add retry logic to pg_sync_replication_slots(). | Amit Kapila
Previously, pg_sync_replication_slots() would finish without synchronizing slots that didn't meet requirements, rather than failing outright. This could leave some failover slots unsynchronized if required catalog rows or WAL segments were missing or at risk of removal, while the standby continued removing needed data. To address this, the function now waits for the primary slot to advance to a position where all required data is available on the standby before completing synchronization. It retries cyclically until all failover slots that existed on the primary at the start of the call are synchronized. Slots created after the function begins are not included. If the standby is promoted during this wait, the function exits gracefully and the temporary slots will be removed. Author: Ajin Cherian <itsajin@gmail.com> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Yilin Zhang <jiezhilove@126.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
3 days | Allow cumulative statistics to read/write auxiliary data from/to disk | Michael Paquier
Cumulative stats kinds gain the capability to write additional per-entry data when flushing the stats at shutdown, and to read this data back when loading the stats at startup. This is a good fit, for example, for variable-length data (like normalized query strings), as it becomes possible to link the shared memory stats entries to data that is stored in a different area, like a DSA segment. Three new optional callbacks are added to PgStat_KindInfo, available to variable-numbered stats kinds:
* to_serialized_data: writes auxiliary data for an entry.
* from_serialized_data: reads auxiliary data for an entry.
* finish: performs actions after read/write/discard operations. This is invoked after processing all the entries of a kind, allowing extensions to close file handles and clean up resources.
Stats kinds have the option to store this data in the existing pgstats file, but can also store it in one or more additional files whose names can be built from the entry keys. The new serialization callbacks are called once an entry key is read from or written to the main stats file. A file descriptor to the main pgstats file is available in the arguments of the callbacks. Author: Sami Imseih <samimseih@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
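For illustration, a sketch of how a variable-numbered stats kind might wire these callbacks into its PgStat_KindInfo; only the callback field names come from the description above, and the my_stats_* functions (assumed to be defined elsewhere) and their signatures are hypothetical:

    static const PgStat_KindInfo my_stats_kind = {
        .name = "my_stats",
        .fixed_amount = false,
        /* ... the kind's existing sizes and callbacks ... */
        .to_serialized_data = my_stats_to_serialized_data,
        .from_serialized_data = my_stats_from_serialized_data,
        .finish = my_stats_finish,
    };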
3 days | Update typedefs.list to match what the buildfarm currently reports. | Tom Lane
The current list from the buildfarm includes quite a few typedef names that it used to miss. The reason is a bit obscure, but it seems likely to have something to do with our recent increased use of palloc_object and palloc_array. In any case, this makes the relevant struct declarations be much more nicely formatted, so I'll take it. Install the current list and re-run pgindent to update affected code. Syncing with the current list also removes some obsolete typedef names and fixes some alphabetization errors. Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
3 days | bufmgr: Add one-entry cache for private refcount | Andres Freund
The private refcount entry for a buffer is often looked up repeatedly for the same buffer, e.g. to pin and then unpin a buffer. Benchmarking shows that it's worthwhile to have a one-entry cache for that case. With that cache in place, it's worth splitting GetPrivateRefCountEntry() into a small inline portion (for the cache hit case) and an out-of-line helper for the rest. This is helpful for some workloads today, but becomes more important in an upcoming patch that will utilize the private refcount infrastructure to also store whether the buffer is currently locked, as that increases the rate of lookups substantially. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
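A generic sketch of the one-entry cache split described above, using a hypothetical name for the out-of-line helper (GetPrivateRefCountEntrySlow); this is not the committed bufmgr.c code:

    static PrivateRefCountEntry *last_refcount_entry = NULL;

    static inline PrivateRefCountEntry *
    GetPrivateRefCountEntry(Buffer buffer, bool do_move)
    {
        /* Fast path: repeated pin/unpin of the same buffer hits the cache. */
        if (last_refcount_entry != NULL && last_refcount_entry->buffer == buffer)
            return last_refcount_entry;

        /* Slow path stays out of line; it is assumed to refresh the cache. */
        return GetPrivateRefCountEntrySlow(buffer, do_move);
    }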
3 days | bufmgr: Separate keys for private refcount infrastructure | Andres Freund
This makes lookups faster by allowing them to be auto-vectorized. It is also beneficial for an upcoming patch, independent of auto-vectorization: that patch wants to track more information for each pinned buffer, which would make the existing loop over an array of PrivateRefCountEntry more expensive by increasing the entry size. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
3 days | Refactor WaitLSNType enum to use a macro for type count | Alexander Korotkov
Change WAIT_LSN_TYPE_COUNT from an enum sentinel to a macro definition, in a similar way to the IOObject, IOContext, and BackendType enums. Remove explicit enum value assignments as well. Author: Xuneng Zhou <xunengzhou@gmail.com>
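The resulting shape, sketched with hypothetical enum members (only WAIT_LSN_TYPE_COUNT is named above), mirroring how IOOBJECT_NUM_TYPES is defined:

    typedef enum WaitLSNType
    {
        WAIT_LSN_TYPE_REPLAY,       /* hypothetical member */
        WAIT_LSN_TYPE_FLUSH,        /* hypothetical member */
    } WaitLSNType;

    /* Count kept outside the enum instead of as a trailing sentinel. */
    #define WAIT_LSN_TYPE_COUNT (WAIT_LSN_TYPE_FLUSH + 1)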
3 days | Fix usage of palloc() in MERGE/SPLIT PARTITION(s) code | Alexander Korotkov
f2e4cc427951 and 4b3d173629f4 implement ALTER TABLE ... MERGE/SPLIT PARTITION(s) commands. In several places, these commits use palloc(), where we should use palloc_object() and palloc_array(). This commit provides appropriate usage of palloc_object() and palloc_array(). Reported-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/tencent_3661BB522D5466B33EA33666%40qq.com
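For illustration (these particular call sites are made up, not the ones actually touched), the difference reads like this:

    PartitionBoundSpec **bounds;
    PartitionBoundSpec  *spec;

    /* before: untyped palloc() with manual sizeof arithmetic */
    bounds = (PartitionBoundSpec **) palloc(sizeof(PartitionBoundSpec *) * nparts);
    spec = (PartitionBoundSpec *) palloc(sizeof(PartitionBoundSpec));

    /* after: the typed wrappers keep the element type and the result in sync */
    bounds = palloc_array(PartitionBoundSpec *, nparts);
    spec = palloc_object(PartitionBoundSpec);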
3 days | Implement ALTER TABLE ... SPLIT PARTITION ... command | Alexander Korotkov
This new DDL command splits a single partition into several partitions. Just like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions are created using the createPartitionTable() function with the parent partition as the template. This commit comprises a quite naive implementation which works in a single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations, including the tuple routing. This is why the new DDL command can't be recommended for large, partitioned tables under high load. However, this implementation comes in handy in certain cases, even as it is. Also, it could serve as a foundation for future implementations with less locking and possibly parallelism. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval <d.koval@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Tender Wang <tndrwang@gmail.com> Co-authored-by: Richard Guo <guofenglinux@gmail.com> Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Co-authored-by: Jian He <jian.universality@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Robert Haas <rhaas@postgresql.org> Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com>
3 days | Implement ALTER TABLE ... MERGE PARTITIONS ... command | Alexander Korotkov
This new DDL command merges several partitions into a single partition of the target table. The target partition is created using the new createPartitionTable() function with the parent partition as the template. This commit comprises a quite naive implementation which works in a single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations, including the tuple routing. This is why this new DDL command can't be recommended for large partitioned tables under a high load. However, this implementation comes in handy in certain cases, even as it is. Also, it could serve as a foundation for future implementations with less locking and possibly parallelism. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval <d.koval@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Tender Wang <tndrwang@gmail.com> Co-authored-by: Richard Guo <guofenglinux@gmail.com> Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Co-authored-by: Jian He <jian.universality@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Robert Haas <rhaas@postgresql.org> Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com>
4 days | Fix jsonb_object_agg crash after eliminating null-valued pairs. | Tom Lane
In commit b61aa76e4 I added an assumption in jsonb_object_agg_finalfn that it'd be okay to apply uniqueifyJsonbObject repeatedly to a JsonbValue. I should have studied that code more closely first, because in skip_nulls mode it removed leading nulls by changing the "pairs" array start pointer. This broke the data structure's invariants in two ways: pairs no longer references a repalloc-able chunk, and the distance from pairs to the end of its array is less than parseState->size. So any subsequent addition of more pairs is at high risk of clobbering memory and/or causing repalloc to crash. Unfortunately, adding more pairs is exactly what will happen when the aggregate is being used as a window function. Fix by rewriting uniqueifyJsonbObject to not do that. The prior coding had little to recommend it anyway. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/ec5e96fb-ee49-4e5f-8a09-3f72b4780538@gmail.com
4 days | Fix out-of-date comment on makeRangeConstructors | Peter Eisentraut
We did define 4 functions in 4429f6a9e3, but in df73584431e7 we got rid of the 0- and 1-arg versions. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CA%2BrenyVQti3iC7LE4UxtQb4ROLYMs6%2Bu-d4LrN5U4idH1Ghx6Q%40mail.gmail.com
4 days | Clarify comment about temporal foreign keys | Peter Eisentraut
In RI_ConstraintInfo, period_contained_by_oper and period_intersect_oper can take either anyrange or anymultirange. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/CA%2BrenyWzDth%2BjqLZA2L2Cezs3wE%2BWX-5P8W2EOVx_zfFD%3Daicg%40mail.gmail.com
5 days | Reject opclass options in ON CONFLICT clause | Álvaro Herrera
It's as pointless as ASC/DESC and NULLS FIRST/LAST are, so reject all of them in the same way. While at it, normalize the others' error messages to have fewer translatable strings. Add tests for these errors. Noticed while reviewing recent INSERT ON CONFLICT patches. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/202511271516.oiefpvn3z27m@alvherre.pgsql
5 days | Replace most StaticAssertStmt() with StaticAssertDecl() | Peter Eisentraut
Similar to commit 75f49221c22, it is preferable to use StaticAssertDecl() instead of StaticAssertStmt() when possible. Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
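For illustration, the two forms (the specific condition here is an example, not one of the assertions actually converted):

    /* Declaration form, usable at file scope; preferred where possible. */
    StaticAssertDecl(sizeof(ItemIdData) == 4, "ItemIdData is expected to be 4 bytes");

    /* Statement form, still needed where only a statement fits. */
    StaticAssertStmt(sizeof(ItemIdData) == 4, "ItemIdData is expected to be 4 bytes");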
5 days | Never store 0 as the nextMXact | Heikki Linnakangas
Before this commit, when multixid wraparound happens, MultiXactState->nextMXact goes to 0, which is invalid. All the readers need to deal with that possibility and skip over the 0. That's error-prone and we've missed it a few times in the past. This commit changes the responsibility so that all the writers of MultiXactState->nextMXact skip over the zero already, and readers can trust that it's never 0. We were already doing that for MultiXactState->oldestMultiXactId; none of its writers would set it to 0. ReadMultiXactIdRange() was nevertheless checking for that possibility. For clarity, remove that check. Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Maxim Orlov <orlovmg@gmail.com> Discussion: https://www.postgresql.org/message-id/3624730d-6dae-42bf-9458-76c4c965fb27@iki.fi
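A sketch of the writer-side rule (not the exact committed code): skip the invalid value when advancing past the wraparound point, so readers never see 0.

    next = MultiXactState->nextMXact + 1;       /* may wrap around to 0 */
    if (next < FirstMultiXactId)
        next = FirstMultiXactId;                /* i.e. skip over 0 */
    MultiXactState->nextMXact = next;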
6 days | Fix some comments. | Nathan Bossart
Like commit 123661427b, these were discovered while reviewing Aleksander Alekseev's proposed changes to pgindent.
6 days | Fix infer_arbiter_index for partitioned tables | Álvaro Herrera
The fix for concurrent index operations in bc32a12e0db2 started considering indexes that are not yet marked indisvalid as arbiters for INSERT ON CONFLICT. For partitioned tables, this leads to including indexes that may not exist in partitions, causing a trivially reproducible "invalid arbiter index list" error to be thrown because of failure to match the index. To fix, it suffices to ignore !indisvalid indexes on partitioned tables. There should be no risk that the set of indexes will change for concurrent transactions, because in order for such an index to be marked valid, an ALTER INDEX ATTACH PARTITION must run which requires AccessExclusiveLock. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/17622f79-117a-4a44-aa8e-0374e53faaf0%40gmail.com
6 days | Fix comment on how temp files and subtransactions are handled | Heikki Linnakangas
The comment was accurate a long time ago, but not any more. I failed to update the comment in commit ab3148b712.
6 days | Add runtime checks for bogus multixact offsets | Heikki Linnakangas
It's not far-fetched that we'd try to read a multixid with an invalid offset in case of bugs or corruption. Or if you call pg_get_multixact_members() after a crash that left behind invalid but unused multixids. Better to get a somewhat descriptive error message if that happens. Discussion: https://www.postgresql.org/message-id/3624730d-6dae-42bf-9458-76c4c965fb27@iki.fi
6 days | Use palloc_object() and palloc_array(), the last change | Michael Paquier
This is the last batch of changes that have been suggested by the author, this part covering the non-trivial changes. Some of the changes suggested have been discarded as they seem to lead to more instructions generated, leaving the parts that can be qualified as in-place replacements. Similar work has been done in 1b105f9472bd, 0c3c5c3b06a3 and 31d3847a37be. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
6 days | Enhance slot synchronization API to respect promotion signal. | Amit Kapila
Previously, during a promotion, only the slot synchronization worker was signaled to shut down. The backend executing slot synchronization via the pg_sync_replication_slots() SQL function was not signaled, allowing it to complete its synchronization cycle before exiting. An upcoming patch improves pg_sync_replication_slots() to wait until replication slots are fully persisted before finishing. This behaviour requires the backend to exit promptly if a promotion occurs. This patch ensures that, during promotion, a signal is also sent to the backend running pg_sync_replication_slots(), allowing it to be interrupted and exit immediately. Author: Ajin Cherian <itsajin@gmail.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
6 days | Clarify why _bt_killitems sorts its items array. | Peter Geoghegan
Make it clear why _bt_killitems sorts the scan's so->killedItems[] array. Also add an assertion to the _bt_killitems loop (that iterates through this array) to verify it accesses tuples in leaf page order. Follow-up to commit bfb335df58. Author: Peter Geoghegan <pg@bowt.ie> Suggested-by: Victor Yegorov <vyegorov@gmail.com> Discussion: https://postgr.es/m/CAGnEboirgArezZDNeFrR8FOGvKF-Xok333s2iVwWi65gZf8MEA@mail.gmail.com
6 days | Fix allocation formula in llvmjit_expr.c | Michael Paquier
An array of LLVMBasicBlockRef was allocated with the element size taken from "LLVMBasicBlockRef *" rather than "LLVMBasicBlockRef". LLVMBasicBlockRef is itself a pointer type, so this did not directly cause a problem because both have the same size, but it is still incorrect. This issue was spotted while reviewing a different patch, and has existed since 2a0faed9d702, so backpatch all the way down. Discussion: https://postgr.es/m/CA+hUKGLngd9cKHtTUuUdEo2eWEgUcZ_EQRbP55MigV2t_zTReg@mail.gmail.com Backpatch-through: 14
7 days | Fix MULTIXACT_DEBUG builds. | Peter Geoghegan
Oversight in commit bd8d9c9b. Discussion: https://postgr.es/m/CAH2-WzmvwVKZ+0Z=RL_+g_aOku8QxWddDCXmtyLj02y+nYaD0g@mail.gmail.com
7 days | Return TIDs in desc order during backwards scans. | Peter Geoghegan
Always return TIDs in descending order when returning groups of TIDs from an nbtree posting list tuple during nbtree backwards scans. This makes backwards scans tend to require fewer buffer hits, since the scan is less likely to repeatedly pin and unpin the same heap page/buffer (we'll get exactly as many buffer hits as we get with a similar forwards scan case). Commit 0d861bbb, which added nbtree deduplication, originally did things this way to avoid interfering with _bt_killitems's approach to setting LP_DEAD bits on posting list tuples. _bt_killitems makes a soft assumption that it can always iterate through posting lists in ascending TID order, finding corresponding killItems[]/so->currPos.items[] entries in that same order. This worked out because of the prior _bt_readpage backwards scan behavior. If we just changed the backwards scan posting list logic in _bt_readpage, without altering _bt_killitems itself, it would break its soft assumption. Avoid that problem by sorting the so->killedItems[] array at the start of _bt_killitems. That way the order that dead items are saved in from btgettuple can't matter; so->killedItems[] will always be in the same order as so->currPos.items[] in the end. Since so->currPos.items[] is now always in leaf page order, regardless of the scan direction used within _bt_readpage, and since so->killedItems[] is always in that same order, the _bt_killitems loop can continue to make a uniform assumption about everything being in page order. In fact, sorting like this makes the previous soft assumption about item order into a hard invariant. Also deduplicate the so->killedItems[] array after it is sorted. That way there's no risk of the _bt_killitems loop becoming confused by a duplicate dead item/TID. This was possible in cases that involved a scrollable cursor that encountered the same dead TID more than once (within the same leaf page/so->currPos context). This doesn't come up very much in practice, but it seems best to be as consistent as possible about how and when _bt_killitems will LP_DEAD-mark index tuples. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Mircea Cadariu <cadariu.mircea@gmail.com> Reviewed-By: Victor Yegorov <vyegorov@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=Wut2pKvbW-u3hJ_LXwsYeiXHiW8oN1GfbKPavcGo8Ow@mail.gmail.com
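A generic sketch of the sort-then-deduplicate step described above (cmp_itemindex is a hypothetical comparator and the local variables are illustrative; this is not the committed _bt_killitems code):

    qsort(so->killedItems, numKilled, sizeof(int), cmp_itemindex);

    /* Drop duplicate entries so no dead TID is processed twice. */
    j = 0;
    for (i = 0; i < numKilled; i++)
    {
        if (i == 0 || so->killedItems[i] != so->killedItems[i - 1])
            so->killedItems[j++] = so->killedItems[i];
    }
    numKilled = j;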
7 days | Add pg_iswcased(). | Jeff Davis
True if character has multiple case forms. Will be a useful multibyte-aware replacement for char_is_cased(). Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
7 days | Remove char_tolower() API. | Jeff Davis
It's only useful for an ILIKE optimization for the libc provider using a single-byte encoding and a non-C locale, but it creates significant internal complexity. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
7 days | Add comment about keeping PD_ALL_VISIBLE and VM in sync | Melanie Plageman
The comment above heap_xlog_visible() about the critical integrity requirement for PD_ALL_VISIBLE and the visibility map should also be in heap_xlog_prune_freeze() where we set PD_ALL_VISIBLE. Oversight in add323da40a6bf9e Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
7 days | Simplify vacuum visibility assertion | Melanie Plageman
Phase I vacuum gives the page a once-over after pruning and freezing to check that the values of all_visible and all_frozen agree with the result of heap_page_is_all_visible(). This is meant to keep the logic in phase I for determining visibility in sync with the logic in phase III. Rewrite the assertion to avoid an Assert(false). Suggested by Andres Freund. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y%40sy4ymcdtdklo
7 days | Fix comment in GetPublicationRelations | Heikki Linnakangas
This function gets the list of relations associated with the publication but the comment said the opposite. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CANhcyEV3C_CGBeDtjvKjALDJDMH-Uuc9BWfSd=eck8SCXnE=fQ@mail.gmail.com
7 days | Fix some near-bugs related to ResourceOwner function arguments | Heikki Linnakangas
These functions took a ResourceOwner argument, but only checked whether it was NULL and then used CurrentResourceOwner for the actual work. Surely the intention was to use the passed-in resource owner. All current callers passed CurrentResourceOwner or NULL, so this has no consequences at the moment, but it's an accident waiting to happen for future callers and extensions. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com Backpatch-through: 17
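The class of mistake being fixed, sketched (the call shown is illustrative, not one of the actual functions involved):

    /* before: the argument is checked, then silently ignored */
    if (owner != NULL)
        ResourceOwnerEnlarge(CurrentResourceOwner);

    /* after: actually use the passed-in resource owner */
    if (owner != NULL)
        ResourceOwnerEnlarge(owner);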
8 days | Fix typo in comment | Heikki Linnakangas
Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CABPTF7V8CbOXGePqrad6EH3Om7DRhNiO3C0rQ-62UuT7RdU-GQ@mail.gmail.com