|
The private refcount entry for a buffer is often looked up repeatedly, e.g. to
pin and then unpin the same buffer. Benchmarking shows that it's
worthwhile to have a one-entry cache for that case. With that cache in place,
it's worth splitting GetPrivateRefCountEntry() into a small inline
portion (for the cache hit case) and an out-of-line helper for the rest.
This is helpful for some workloads today, but becomes more important in an
upcoming patch that will utilize the private refcount infrastructure to also
store whether the buffer is currently locked, as that increases the rate of
lookups substantially.
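As a rough sketch of the idea (the cache variable and the out-of-line helper
name are illustrative, not the actual identifiers):

    /* one-entry cache remembering the entry returned by the last lookup */
    static PrivateRefCountEntry *LastRefCountEntry = NULL;

    static inline PrivateRefCountEntry *
    GetPrivateRefCountEntry(Buffer buffer, bool do_move)
    {
        /* fast path: the same buffer as in the previous lookup */
        if (LastRefCountEntry != NULL && LastRefCountEntry->buffer == buffer)
            return LastRefCountEntry;

        /* everything else stays out of line, keeping the inline portion small */
        return GetPrivateRefCountEntrySlow(buffer, do_move);
    }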
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
|
|
This makes lookups faster by allowing them to be auto-vectorized. It is also
beneficial for an upcoming patch, independent of auto-vectorization, as that
patch wants to track more information for each pinned buffer, making the
existing loop over an array of PrivateRefCountEntry more expensive due to the
increased entry size.
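Sketch of the resulting lookup shape (names illustrative): keeping just the
Buffer keys in a dense array lets the compiler vectorize the search loop.

    /* buffer IDs stored separately from the per-entry payload */
    static Buffer PrivateRefCountBufferIds[REFCOUNT_ARRAY_ENTRIES];

    static inline int
    FindRefCountSlot(Buffer buffer)
    {
        /* a plain loop over int32 keys, amenable to auto-vectorization */
        for (int i = 0; i < REFCOUNT_ARRAY_ENTRIES; i++)
        {
            if (PrivateRefCountBufferIds[i] == buffer)
                return i;
        }
        return -1;
    }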
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
Similar to commit 75f49221c22, it is preferable to use
StaticAssertDecl() instead of StaticAssertStmt() when possible.
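For example (with a made-up condition), the declaration form can sit at file
scope, whereas the statement form needs a surrounding function:

    /* StaticAssertStmt() is a statement, so it must appear inside a function */
    static void
    check_assumptions(void)
    {
        StaticAssertStmt(sizeof(Buffer) == sizeof(int32),
                         "Buffer is expected to be a 32-bit integer");
    }

    /* StaticAssertDecl() is a declaration and works directly at file scope */
    StaticAssertDecl(sizeof(Buffer) == sizeof(int32),
                     "Buffer is expected to be a 32-bit integer");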
Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
|
|
This is the last batch of changes suggested by the author, this part covering
the non-trivial changes. Some of the suggested changes have been discarded as
they seemed to lead to more generated instructions, leaving the parts that can
be qualified as in-place replacements.
Similar work has been done in 1b105f9472bd, 0c3c5c3b06a3 and
31d3847a37be.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
|
|
The key is the first member of PrivateRefCountEntry, which has type
Buffer. This commit changes the key size from sizeof(int32) to
sizeof(Buffer). This appears to be an oversight in commit
4b4b680c3d, but it's of no consequence because Buffer has been a
signed 32-bit integer for a long time.
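The relevant setup now reads roughly as follows (sketch of the hash table
creation):

    HASHCTL     hash_ctl;

    hash_ctl.keysize = sizeof(Buffer);          /* was sizeof(int32) */
    hash_ctl.entrysize = sizeof(PrivateRefCountEntry);

    PrivateRefCountHash = hash_create("PrivateRefCount", 100, &hash_ctl,
                                      HASH_ELEM | HASH_BLOBS);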
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aS77DTpl0fOkIKSZ%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
In an upcoming patch more wait events will be added to the wait event
class (for buffer locking), making the current name too
specific. Alternatively we could introduce a dedicated wait event class for
those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait
event class.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
It seems cleaner to use an enum to tie the different values together. It also
helps to have a more descriptive type in the argument to various functions.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
This commit introduces new internal bufmgr routines for marking shared
buffers as dirty:
* MarkDirtyUnpinnedBuffer()
* MarkDirtyRelUnpinnedBuffers()
* MarkDirtyAllUnpinnedBuffers()
These functions provide an efficient mechanism to mark as dirty, respectively,
one buffer, all the buffers of a relation, or the entire shared buffer pool,
something that can be useful to force write patterns for the checkpointer. An
extra routine, MarkDirtyUnpinnedBufferInternal(), is used by these three to
mark an unpinned buffer as dirty.
They are intended as developer tools to manipulate buffer dirtiness in
bulk, and will be used in a follow-up commit.
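The expected shape of the relation-level variant, as a rough sketch (the
signature and details are guesses; the real code also has to deal with
locking and error cases):

    void
    MarkDirtyRelUnpinnedBuffers(Relation rel)
    {
        for (int i = 0; i < NBuffers; i++)
        {
            BufferDesc *bufHdr = GetBufferDescriptor(i);

            /* cheap, unlocked pre-check: skip buffers of other relations */
            if (!BufTagMatchesRelFileLocator(&bufHdr->tag, &rel->rd_locator))
                continue;

            CHECK_FOR_INTERRUPTS();

            /* shared helper dirties the buffer iff it is valid and unpinned */
            MarkDirtyUnpinnedBufferInternal(bufHdr);
        }
    }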
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com
|
|
Was added in commit 5e89985928795f243dc287210c2aa016dfd00bfe.
Reported-by: Ashutosh Bapat
Author: Ashutosh Bapat
Discussion: https://postgr.es/m/CAExHW5tba_biyuMrd_iPVzq-+XvsMdPcEnjQ+d+__V=cjYj8Pg@mail.gmail.com
Backpatch-through: master
|
|
The prior commit made it legal to modify BufferDesc.state while the buffer
header spinlock is held. This allows us to replace the CAS loop
in UnpinBufferNoOwner() with an atomic sub. This improves scalability
significantly. See the prior commits for more background.
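In essence (simplified, ignoring usage-count and pin-count-waiter handling),
the unpin path changes from a CAS loop into a single fetch-sub:

    /* before: wait for the header spinlock, then CAS the decremented value */
    old_buf_state = pg_atomic_read_u32(&buf->state);
    for (;;)
    {
        if (old_buf_state & BM_LOCKED)
            old_buf_state = WaitBufHdrUnlocked(buf);

        buf_state = old_buf_state - BUF_REFCOUNT_ONE;
        if (pg_atomic_compare_exchange_u32(&buf->state, &old_buf_state,
                                           buf_state))
            break;
    }

    /* after: a single atomic subtraction, regardless of the spinlock */
    buf_state = pg_atomic_fetch_sub_u32(&buf->state, BUF_REFCOUNT_ONE)
        - BUF_REFCOUNT_ONE;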
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
Until now BufferDesc.state was not allowed to be modified while the buffer
header spinlock was held. This meant that operations like unpinning buffers
needed to use a CAS loop, waiting for the buffer header spinlock to be
released before updating.
The benefit of that restriction is that it allowed us to unlock the buffer
header spinlock with just a write barrier and an unlocked write (instead of a
full atomic operation). That was important to avoid regressions in
48354581a49c. However, since then the hottest buffer header spinlock uses have
been replaced with atomic operations (in particular, the most common use of
PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been
removed in 5e899859287).
This change will allow a subsequent commit to release buffer pins with a
single atomic-sub operation. That was previously not possible: as long as such
modifications were disallowed while the buffer header spinlock was held, an
atomic-sub would not have allowed a race-free check for the buffer header lock
being held.
Using atomic-sub to unpin buffers is a nice scalability win, but it is not the
primary motivation for this change (although it would be sufficient). The
primary motivation is that we would like to merge the buffer content lock into
BufferDesc.state, which will result in more frequent changes of the state
variable; in some situations that can cause a performance regression, due to an
increased CAS failure rate when unpinning buffers. The regression entirely
vanishes when using atomic-sub.
Naively implementing this would require putting CAS loops in every place that
modifies the buffer state while holding the buffer header lock. To avoid that,
introduce UnlockBufHdrExt(), which can set/add flags and adjust the refcount
together with releasing the lock.
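A rough sketch of what such a helper can look like (the argument names are
guesses; the point is that the release is no longer a plain unlocked write):

    static inline void
    UnlockBufHdrExt(BufferDesc *desc, uint32 old_buf_state,
                    uint32 set_flags, int refcount_change)
    {
        for (;;)
        {
            uint32      new_buf_state = old_buf_state;

            new_buf_state |= set_flags;
            new_buf_state += refcount_change * BUF_REFCOUNT_ONE;
            new_buf_state &= ~BM_LOCKED;        /* release the header lock */

            /*
             * Other backends may now change the refcount even while we hold
             * the header lock, so release it with a CAS rather than an
             * unlocked write.
             */
            if (pg_atomic_compare_exchange_u32(&desc->state,
                                               &old_buf_state, new_buf_state))
                break;
        }
    }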
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
Now that commit 06edbed47862 has introduced XLogRecPtrIsValid(), we can
use that instead of:
- XLogRecPtrIsInvalid()
- direct comparisons with InvalidXLogRecPtr
- direct comparisons with literal 0
This makes the code more consistent.
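A representative before/after:

    /* before */
    if (!XLogRecPtrIsInvalid(lsn))
        XLogFlush(lsn);
    if (recptr != InvalidXLogRecPtr)
        XLogFlush(recptr);

    /* after */
    if (XLogRecPtrIsValid(lsn))
        XLogFlush(lsn);
    if (XLogRecPtrIsValid(recptr))
        XLogFlush(recptr);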
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
|
|
This commit adds CHECK_FOR_INTERRUPTS to the shared buffer iteration
loops in EvictRelUnpinnedBuffers and EvictAllUnpinnedBuffers. These
functions, used by pg_buffercache's pg_buffercache_evict_relation and
pg_buffercache_evict_all, can now be interrupted during long-running
operations.
Backpatch to version 18, where these functions and their corresponding
pg_buffercache functions were introduced.
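The change is essentially of this shape (sketch):

    for (int buf = 1; buf <= NBuffers; buf++)
    {
        BufferDesc *desc = GetBufferDescriptor(buf - 1);

        /* allow query cancel/termination while scanning a large buffer pool */
        CHECK_FOR_INTERRUPTS();

        /* ... existing per-buffer eviction logic ... */
    }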
Author: Yuhang Qiu <iamqyh@gmail.com>
Discussion: https://postgr.es/m/8DC280D4-94A2-4E7B-BAB9-C345891D0B78%40gmail.com
Backpatch-through: 18
|
|
Currently there's no bug, because we have no code path where we
invalidate relcache entries where it'd cause a problem. But it's more
robust to do it this way in case we introduce such a path later, as some
Postgres forks reportedly already have.
Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Discussion: https://postgr.es/m/CAJDiXgj3FNzAhV+jjPqxMs3jz=OgPohsoXFj_fh-L+nS+13CKQ@mail.gmail.com
|
|
In 5e899859287 I made StrategyGetBuffer() pin buffers with a single CAS,
instead of using PinBuffer_Locked(). Unfortunately I missed that
PinBuffer_Locked() marked the page as defined for valgrind.
Fix this oversight by centralizing the valgrind initialization into
TrackNewBufferPin(), which also allows us to reduce the number of places doing
VALGRIND_MAKE_MEM_DEFINED.
Per buildfarm animal skink and Amit Langote.
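Roughly, the new helper gives every code path that creates a fresh pin one
place for the Valgrind annotation (sketch; the real function also handles the
pin bookkeeping, and its signature is a guess):

    static void
    TrackNewBufferPin(Buffer buffer)
    {
        /* ... resource-owner / private refcount bookkeeping ... */

        /* the page contents may now be accessed; tell Valgrind */
        VALGRIND_MAKE_MEM_DEFINED(BufferGetBlock(buffer), BLCKSZ);
    }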
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Discussion: https://postgr.es/m/CA+HiwqGKJ6nEXEPQW7EpykVsEtzxp5-up_xhtcUAkWFtATVQvQ@mail.gmail.com
|
|
Previously StrategyGetBuffer() acquired the buffer header spinlock for every
buffer, whether it was reusable or not. If reusable, it'd be returned, with
the lock held, to GetVictimBuffer(), which then would pin the buffer with
PinBuffer_Locked(). That somewhat violates the spirit of the guidelines for
holding spinlocks (i.e. that they are only held for a few lines of consecutive
code) and necessitates using PinBuffer_Locked(), which scales worse than
PinBuffer() due to holding the spinlock. This alone makes it worth changing
the code.
However, the main reason to change this is that a future commit will make
PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain
scalability for the much more common case of pinning a pre-existing buffer. By
pinning the buffer with a single atomic operation, iff the buffer is reusable,
we avoid any potential regression for miss-heavy workloads. There are strictly
fewer atomic operations for each potential buffer after this change.
The price for this improvement is that freelist.c needs two CAS loops and
needs to be able to set up the resource accounting for pinned buffers. The
latter is achieved by exposing a new function for that purpose from bufmgr.c,
which seems better than exposing the entire private refcount infrastructure.
The improvement seems worth the complexity.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
We're planning to merge buffer content locks into BufferDesc.state. To reduce
the size of that patch, centralize calls to BufferDescriptorGetContentLock().
The biggest part of the change is in assertions, which now use the newly
introduced BufferIsLockedByMe[InMode]() (replacing BufferIsExclusiveLocked()).
This seems like an improvement even without the aforementioned plans.
Additionally replace some direct calls to LWLockAcquire() with calls to
LockBuffer().
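For instance, an assertion that previously spelled out the content-lock access
now reads roughly as follows (sketch; the exact signatures may differ):

    /* before */
    Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
                                LW_EXCLUSIVE));

    /* after */
    Assert(BufferIsLockedByMeInMode(buffer, BUFFER_LOCK_EXCLUSIVE));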
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
BM_PERMANENT is defined as 1U<<31, which is a negative number when interpreted
as a signed integer. Unfortunately the mask variable in BufferSync() was
signed. This has been wrong for a long time, but failed to fail, due to
integer conversion rules.
However, in an upcoming patch the width of the state variable will be
increased, with the wrong signedness leading to never flushing permanent
buffers - luckily caught in a test.
It seems better to fix this separately, instead of doing so as part of a
large, otherwise mechanical, patch.
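A tiny standalone illustration of the conversion behavior (not PostgreSQL
code):

    #include <stdio.h>

    #define BM_PERMANENT (1U << 31)

    int
    main(void)
    {
        int         mask = BM_PERMANENT;    /* stored as a negative number */

        /* converted back to 32 bits, it happens to be the intended bit ... */
        printf("32-bit mask: %08x\n", (unsigned) mask);          /* 80000000 */

        /* ... but widening sign-extends it into something quite different */
        printf("64-bit mask: %016llx\n",
               (unsigned long long) mask);            /* ffffffff80000000 */

        return 0;
    }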
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
There were several copies of code locking a buffer, flushing its contents, and
unlocking the buffer. It seems worth centralizing that into a helper function.
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
While testing a new potential use for ReadRecentBuffer(), Andres
reported that it scales badly when called concurrently for the same
buffer by many backends. Instead of a naive (but wrong) coding with
PinBuffer(), it used the spinlock, so that it could be careful to pin
only if the buffer was valid and holding the expected block, to avoid
breaking invariants in e.g. GetVictimBuffer(). Unfortunately that made it
less scalable than PinBuffer(), which uses compare-exchange instead.
We can fix that by giving PinBuffer() a new skip_if_not_valid mode that
doesn't pin invalid buffers. It might occasionally skip when it
shouldn't due to the unlocked read of the header flags, but that's
unlikely and perfectly acceptable for an opportunistic optimisation
routine, and it can only succeed when it really should due to the
compare-exchange loop.
Note that this fixes ReadRecentBuffer()'s failure to bump the usage
count. While this could be seen as a bug, there currently aren't cases
affected by this in core, so it doesn't seem worth backpatching that portion.
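In terms of the pin loop, the new mode boils down to roughly this (simplified
sketch):

    old_buf_state = pg_atomic_read_u32(&buf->state);
    for (;;)
    {
        /* opportunistic caller: don't pin a buffer that isn't (yet) valid */
        if (skip_if_not_valid && !(old_buf_state & BM_VALID))
            return false;

        buf_state = old_buf_state + BUF_REFCOUNT_ONE;

        /* on failure the CAS refreshes old_buf_state and we retry */
        if (pg_atomic_compare_exchange_u32(&buf->state, &old_buf_state,
                                           buf_state))
            break;
    }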
Author: Thomas Munro <thomas.munro@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/20230627020546.t6z4tntmj7wmjrfh%40awork3.anarazel.de
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
StrategyInitialize() calls InitBufTable() with the maximum number of entries
that the buffer lookup table can ever have. Thus there should not be any need
to allocate more elements after initialization. Hence mark the hash table as
fixed-size.
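The change essentially amounts to adding HASH_FIXED_SIZE where the table is
created (sketch based on the existing InitBufTable() call):

    SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
                                  size, size,
                                  &info,
                                  HASH_ELEM | HASH_BLOBS |
                                  HASH_PARTITION | HASH_FIXED_SIZE);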
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5v0jh3F_wj86yC=qBfWk0uiT94qy=Z41uzAHLHh0SerRA@mail.gmail.com
|
|
This set of changes removes the list of available buffers and instead simply
uses the clock-sweep algorithm to find and return an available buffer. This
also removes the have_free_buffer() function and simply caps the
pg_autoprewarm process to at most NBuffers.
While on the surface this appears to be removing an optimization, it is in
fact eliminating code that induces synchronization overhead that is
problematic for multi-core systems.
The main reason for removing the freelist, however, is not the moderate
improvement in scalability, but that having the freelist would require
dedicated complexity in several upcoming patches. As we have not been able to
find a case benefiting from the freelist...
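For reference, the clock-sweep search that remains is conceptually this
(heavily simplified sketch):

    for (;;)
    {
        BufferDesc *buf = GetBufferDescriptor(ClockSweepTick());
        uint32      local_buf_state = LockBufHdr(buf);

        if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0)
        {
            if (BUF_STATE_GET_USAGECOUNT(local_buf_state) != 0)
            {
                /* recently used: age it and keep sweeping */
                local_buf_state -= BUF_USAGECOUNT_ONE;
                UnlockBufHdr(buf, local_buf_state);
            }
            else
            {
                /* unpinned and unused: this is the victim (real code pins it) */
                return buf;
            }
        }
        else
            UnlockBufHdr(buf, local_buf_state);
    }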
Author: Greg Burd <greg@burd.me>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com
|
|
Minor edits to comments only.
Author: Greg Burd <greg@burd.me>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com
|
|
Mostly adding some quoting.
|
|
Add various missing conversions from and to Datum. The previous code
mostly relied on implicit conversions or its own explicit casts
instead of using the correct DatumGet*() or *GetDatum() functions.
We think these omissions are harmless. Some actual bugs that were
discovered during this process have been committed
separately (80c758a2e1d, fd2ab03fea2).
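Typical instances look like this (illustrative):

    /* before: bare casts relying on Datum being an integer type */
    int32   extidx = (int32) values[0];
    values[1] = (Datum) isnull;

    /* after: explicit conversion macros */
    int32   extidx = DatumGetInt32(values[0]);
    values[1] = BoolGetDatum(isnull);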
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
|
|
Remove a comment about potential for AIO in StartReadBuffersImpl(),
because that change happened.
|
|
Arrange that only the "aligned chunk" part of the allocated space is
included in a Valgrind vchunk. This suppresses complaints about that
vchunk being possibly lost because PG is retaining only pointers to
the aligned chunk. Also make sure that trailing wasted space is
marked NOACCESS.
As a tiny performance improvement, arrange that MCXT_ALLOC_ZERO zeroes
only the returned "aligned chunk", not the wasted padding space.
In passing, fix GetLocalBufferStorage to use MemoryContextAllocAligned
instead of rolling its own implementation, which was equally broken
according to Valgrind.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
|
|
Per buildfarm member koel, Nathan Bossart, and David Rowley.
|
|
Commit ffae5cc5a6024b4e25ec920ed5c4dfac649605f8 added this hint in 2006,
but it's now obsolete and doesn't reflect what users should really check
in this situation. We were not able to agree on a new hint, so just delete
the existing one and update the comments to mention one possibility that
is known to cause problems of this kind: something other than PostgreSQL
is modifying files in the PostgreSQL data directory.
Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/CAKZiRmxNbcaL76x=09Sxf7aUmrRQJBf8drzDdUHo+j9_eM+VMg@mail.gmail.com
|
|
The new name more accurately reflects the effects of this flag on a
requested checkpoint. Checkpoint-related log messages (i.e., those
controlled by the log_checkpoints configuration parameter) will now
say "fast" instead of "immediate", too. Likewise, references to
"immediate" checkpoints in the documentation have been updated to
say "fast". This is preparatory work for a follow-up commit that
will add a MODE option to the CHECKPOINT command.
Author: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
|
|
The new name more accurately reflects the effects of this flag on a
requested checkpoint. Checkpoint-related log messages (i.e., those
controlled by the log_checkpoints configuration parameter) will now
say "flush-unlogged" instead of "flush-all", too. This is
preparatory work for a follow-up commit that will add a
FLUSH_UNLOGGED option to the CHECKPOINT command.
Author: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
|
|
Previously, truncating a temporary relation required scanning the entire
local buffer pool once per relation fork to invalidate buffers. This could
be slow, especially with a large local buffer pool, as the scan was repeated
multiple times.
A similar issue with regular tables (shared buffers) was addressed in
commit 6d05086c0a7 by scanning the buffer pool only once for all forks.
This commit applies the same optimization to temporary relations,
improving truncation performance.
Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://postgr.es/m/CAJDiXggNqsJOH7C5co4jA8nDk8vw-=sokyh5s1_TENWnC6Ofcg@mail.gmail.com
|
|
Commit f4ece891fc2f3f96f0571720a1ae30db8030681b added the assertion in
an attempt to catch some defects even after VACUUM FULL or REINDEX.
However, IsCatalogTextUniqueIndexOid(tag.relNumber) always returns false
after a relfilenode change, provoking unintended assertion failures.
Reported-by: Adam Guo <adamguo@amazon.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Bug: #18912
Discussion: https://postgr.es/m/18912-a41c9bd0e0ad19b1@postgresql.org
|
|
This cleans up some loose ends left by commit e8ca9ed1d. I hadn't
looked closely enough at these places before, but now I have.
The use of double-quoted #includes for Perl headers in plperl_system.h
seems to be simply a mistake introduced in 6c944bf3c and faithfully
copied forward since then. (I had thought possibly it was required
by some weird Windows build setup, but there's no evidence of that in
our history.)
The occurrences in SectionMemoryManager.h and SectionMemoryManager.cpp
evidently stem from those files' origin as LLVM code. It's
understandable that LLVM would treat their own files as needing
double-quoted #includes; but they're still system headers to us.
I also applied the same check to *.c files, and found a few other
random incorrect usages in both directions.
Our ECPG headers and test files routinely use angle brackets to refer
to ECPG headers. I left those usages alone, since it seems reasonable
for an ECPG user to regard those headers as system headers.
|
|
Similar to 84fd3bc14 but these ones were found using a regex that can span
multiple lines.
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
|
|
The large majority of these have been introduced by recent commits done
in the v18 development cycle.
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
|
|
Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind,
which existed in all branches for six days before detection. While the
probability of reaching the trouble was low, the disruption was extreme. No
new backends could start, and service restoration needed an immediate
shutdown. Hence, add this to catch the next bug like it.
The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to commit
0bada39. This also adds checks in a number of similar places. It replaces each
Assert(IsTransactionState()) that pertained to a conditional catalog read.
No back-patch for now, but a back-patch of commit 243e9b4 should back-patch
this, too. A back-patch could omit the src/test/regress changes, since back
branches won't gain new index columns.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
|
|
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places. These
inconsistencies were all introduced during Postgres 18 development.
This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
|
|
Before, BAS_BULKREAD was always of size 256kB. With the default
io_combine_limit of 16, that only allowed 1-2 IOs to be in flight -
insufficient even on very low latency storage.
We don't just want to increase the size to a much larger hardcoded value, as
very large rings (10s of MBs of buffers) appear to have negative
performance effects when reading in data that the OS has cached (but not when
actually needing to do IO).
To address this, increase the size of BAS_BULKREAD to allow for
io_combine_limit * effective_io_concurrency buffers getting read in. To
prevent the ring being much larger than useful, limit the increased size with
GetPinLimit().
The formula outlined above keeps the ring within sizes for which we have not
observed performance regressions, unless very large effective_io_concurrency
values are used together with a large shared_buffers setting.
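Roughly, the sizing now works out as follows (sketch of the formula, not the
exact code):

    /* enough buffers for effective_io_concurrency full-sized combined reads */
    uint32      ring_buffers = io_combine_limit * effective_io_concurrency;

    /* ... but no larger than the number of buffers we may pin concurrently */
    ring_buffers = Min(ring_buffers, GetPinLimit());

    /*
     * e.g. 16 * 16 = 256 buffers (2MB with 8kB blocks), vs. the old fixed
     * 256kB ring of 32 buffers, which fit only two 16-block combined IOs.
     */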
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/lqwghabtu2ak4wknzycufqjm5ijnxhb4k73vzphlt2a3wsemcd@gtftg44kdim6
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah@brqs62irg4dt
|
|
In addition to the added functions, the pg_buffercache_evict() function now
shows whether the buffer was flushed.
pg_buffercache_evict_relation(): Evicts all shared buffers in a
relation at once.
pg_buffercache_evict_all(): Evicts all shared buffers at once.
Both functions provide a mechanism to evict multiple shared buffers at
once. They are designed to address the inefficiency of repeatedly calling
pg_buffercache_evict() for each individual buffer, which can be time-consuming
when dealing with large shared buffer pools (e.g., ~477ms vs. ~2576ms for
16GB of fully populated shared buffers).
These functions are intended for developer testing and debugging
purposes and are available to superusers only.
Minimal tests for the new functions are included. Also, there was previously
no test for pg_buffercache_evict(), so one has been added.
No new extension version is needed, as it was already increased this release
by ba2a3c2302f.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com
|
|
In some edge cases valgrind flags issues with the memory referenced by
IOs. All of the cases addressed in this change are false positives.
Most of the false positives are caused by UnpinBuffer[NoOwner] marking buffer
data as inaccessible. This happens even though the AIO subsystem still holds a
pin. That's good; there shouldn't be accesses to the buffer outside of AIO-
related code until it is pinned by "user" code again. But it requires some
explicit work - if the buffer is not pinned by the current backend, we need to
explicitly mark the buffer data accessible/inaccessible while executing
completion callbacks.
That however causes a cascading issue in IO workers: After the completion
callbacks for a buffer are executed, the page is marked as inaccessible. If
subsequently the same worker is executing IO targeting the same buffer, we
would get an error, as the memory is still marked inaccessible. To avoid that,
we need to explicitly mark the memory as accessible in IO workers.
Another issue is that IO executed in workers or via io_uring will not mark
memory as DEFINED. In the case of workers that is because valgrind does not
track memory definedness across processes. For io_uring that is because
valgrind does not understand io_uring, and therefore its IOs never mark memory
as defined, whether the completions are processed in the defining process or
in another context. It's not entirely clear how to best solve that. The
current user of AIO is not affected, as it explicitly marks buffers as DEFINED
& NOACCESS anyway. Defer solving this issue until we have a user with
different needs.
Per buildfarm animal skink.
Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/3pd4322mogfmdd5nln3zphdwhtmq3rzdldqjwb2sfqzcgs22lf@ok2gletdaoe6
|
|
This mirrors 1e0dfd166b3 (+ 46ef520b9566), for temporary table buffers. This
is mainly interesting right now because the AIO work currently triggers
spurious valgrind errors, and the fix for that is cleaner if temp buffers
behave the same as shared buffers.
This requires one change beyond the annotations themselves, namely to pin
local buffers while writing them out in FlushRelationBuffers().
Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/3pd4322mogfmdd5nln3zphdwhtmq3rzdldqjwb2sfqzcgs22lf@ok2gletdaoe6
|
|
PgAioResult.result is never accessed in the relevant path, but coverity
complains about an uninitialized access anyway. So just zero-initialize the
whole thing. While at it, reduce the scope of the variable.
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CAEudQApsKqd-s+fsUQ0OmxJAMHmBSXxrAz3dCs+uvqb3iRtjSw@mail.gmail.com
|
|
To make the tests possible, a few functions from bufmgr.c/localbuf.c had to be
exported, via buf_internals.h.
Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
|
|
Submitting IO in larger batches can be more efficient than doing so
one-by-one, particularly for many small reads. It does, however, require
the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
batching (c.f. pgaio_enter_batchmode()). Basically, the callback may not:
a) block without first calling pgaio_submit_staged(), unless a
to-be-waited-on lock cannot be part of a deadlock, e.g. because it is
never held while waiting for IO.
b) directly or indirectly start another batch via pgaio_enter_batchmode()
As this requires care and is nontrivial in some cases, batching is only
used with explicit opt-in.
This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and
uses it where appropriate.
There are two cases where batching would likely be beneficial, but where we
aren't using it yet:
1) bitmap heap scans, because the callback reads the VM
This should soon be solved, because we are planning to remove the use of
the VM, due to that not being sound.
2) The first phase of heap vacuum
This could be made to support batchmode, but would require some care.
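A stream whose callback satisfies these rules can opt in when it is created,
e.g. (sketch):

    /*
     * The callback below only computes block numbers and never blocks in a
     * way that could deadlock against staged IO, so batching is safe.
     */
    stream = read_stream_begin_relation(READ_STREAM_FULL |
                                        READ_STREAM_USE_BATCHING,
                                        bstrategy,
                                        rel,
                                        MAIN_FORKNUM,
                                        block_range_read_stream_cb,
                                        &p,
                                        0);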
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
|
|
This finally introduces the first actual use of AIO. StartReadBuffers() now
uses the AIO routines to issue IO.
As the implementation of StartReadBuffers() is also used by the functions for
reading individual blocks (StartReadBuffer() and through that
ReadBufferExtended()) this means all buffered read IO passes through the AIO
paths. However, as those are synchronous reads, actually performing the IO
asynchronously would be rarely beneficial. Instead such IOs are flagged to
always be executed synchronously. This way we don't have to duplicate a fair
bit of code.
When io_method=sync is used, the IO patterns generated after this change are
the same as before, i.e. actual reads are only issued in WaitReadBuffers() and
StartReadBuffers() may issue prefetch requests. This allows bypassing most of
the actual asynchronicity, which is important to make a change as big as this
less risky.
One thing worth calling out is that, if IO is actually executed
asynchronously, the precise meaning of what track_io_timing is measuring has
changed. Previously it tracked the time for each IO, but that does not make
sense when multiple IOs are executed concurrently. Now it only measures the
time actually spent waiting for IO. A subsequent commit will adjust the docs
for this.
While AIO is now actually used, the logic in read_stream.c will often prevent
using sufficiently many concurrent IOs. That will be addressed in the next
commit.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
|
|
This commit implements the infrastructure to perform asynchronous reads into
the buffer pool.
To do so, it:
- Adds readv AIO callbacks for shared and local buffers
It may be worth calling out that shared buffer completions may be run in a
different backend than where the IO started.
- Adds an AIO wait reference to BufferDesc, to allow backends to wait for
in-progress asynchronous IOs
- Adapts StartBufferIO(), WaitIO(), TerminateBufferIO(), and their localbuf.c
equivalents, to be able to deal with AIO
- Moves the code to handle BM_PIN_COUNT_WAITER into a helper function, as it
now also needs to be called on IO completion
As of this commit, nothing issues AIO on shared/local buffers. A future commit
will update StartReadBuffers() to do so.
Buffer reads executed through this infrastructure will report invalid page /
checksum errors / warnings differently than before:
In the error case the error message will cover all the blocks that were
included in the read, rather than just reporting the first invalid
block. If more than one block is invalid, the error will include information
about the range of the read, the first invalid block and the number of invalid
pages, with a HINT towards the server log for per-block details.
For the warning case (i.e. zero_damaged_pages) we would previously emit one
warning message for each buffer in a multi-block read. Now there is only a
single warning message for the entire read, again referring to the server log
for more details in case of multiple checksum failures within a single larger
read.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
|
|
For AIO the completion of a read into shared buffers (i.e. verifying the page
including the checksum, updating the BufferDesc to reflect the IO) can happen
in a different backend than the backend that started the IO. As
ignore_checksum_failure can differ between backends, we need to allow the
caller of PageIsVerified() to control whether to ignore checksum failures.
The commit leaves a gap in the PIV_* values, as an upcoming commit, which
depends on this commit, will add PIV_LOG_LOG, which better fits just after
PIV_LOG_WARNING.
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250329212929.a6.nmisch@google.com
|
|
For AIO we execute completion callbacks in critical sections (to ensure that
AIO can in the future be used for WAL, which in turn requires that we can call
completion callbacks in critical sections, to get the resources for WAL
io). To report checksum errors a backend now has to call
pgstat_prepare_report_checksum_failure(), before entering a critical section,
which guarantees the relevant pgstats entry is in shared memory, the relevant
DSM segment is mapped into the backend's memory and the address is known via a
PgStat_EntryRef.
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/wkjj4p2rmkevutkwc6tewoovdqznj6c6nvjmvii4oo5wmbh5sr@retq7d6uqs4j
|