summaryrefslogtreecommitdiff
path: root/src/include
AgeCommit message (Collapse)Author
2025-10-08bufmgr: Don't lock buffer header in StrategyGetBuffer()Andres Freund
Previously StrategyGetBuffer() acquired the buffer header spinlock for every buffer, whether it was reusable or not. If reusable, it'd be returned, with the lock held, to GetVictimBuffer(), which then would pin the buffer with PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for holding spinlocks (i.e. that they are only held for a few lines of consecutive code) and necessitates using PinBuffer_Locked(), which scales worse than PinBuffer() due to holding the spinlock. This alone makes it worth changing the code. However, the main reason to change this is that a future commit will make PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain scalability for the much more common case of pinning a pre-existing buffer. By pinning the buffer with a single atomic operation, iff the buffer is reusable, we avoid any potential regression for miss-heavy workloads. There strictly are fewer atomic operations for each potential buffer after this change. The price for this improvement is that freelist.c needs two CAS loops and needs to be able to set up the resource accounting for pinned buffers. The latter is achieved by exposing a new function for that purpose from bufmgr.c, that seems better than exposing the entire private refcount infrastructure. The improvement seems worth the complexity. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-10-08bufmgr: fewer calls to BufferDescriptorGetContentLockAndres Freund
We're planning to merge buffer content locks into BufferDesc.state. To reduce the size of that patch, centralize calls to BufferDescriptorGetContentLock(). The biggest part of the change is in assertions, by introducing BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This seems like an improvement even without aforementioned plans. Additionally replace some direct calls to LWLockAcquire() with calls to LockBuffer(). Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-10-08Add mem_exceeded_count column to pg_stat_replication_slots.Masahiko Sawada
This commit introduces a new column mem_exceeded_count to the pg_stat_replication_slots view. This counter tracks how often the memory used by logical decoding exceeds the logical_decoding_work_mem limit. The new statistic helps users determine whether exceeding the logical_decoding_work_mem limit is a rare occurrences or a frequent issue, information that wasn't available through existing statistics. Bumps catversion. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
2025-10-08Cleanup NAN code in float.h, too.Tom Lane
In the same spirit as 3bf905692, assume that all compilers we still support provide the NAN macro, and get rid of workarounds for that. The C standard allows implementations to omit NAN if the underlying float arithmetic lacks quiet (non-signaling) NaNs. However, we've required that feature for years: the workarounds only supported lack of the macro, not lack of the functionality. I put in a compile-time #error if there's no macro, just for clarity. Also fix up the copies of these functions in ecpglib, and leave a breadcrumb for the next hacker who touches them. History of the hacks being removed here can be found in commits 1bc2d544b, 4d17a2146, cec8394b5. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1952095.1759764279@sss.pgh.pa.us
2025-10-08Add extension_state member to PlannedStmt.Robert Haas
Extensions can stash data computed at plan time into this list using planner_shutdown_hook (or perhaps other mechanisms) and then access it from any code that has access to the PlannedStmt (such as explain hooks), allowing for extensible debugging and instrumentation of plans. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-08Add planner_setup_hook and planner_shutdown_hook.Robert Haas
These hooks allow plugins to get control at the earliest point at which the PlannerGlobal object is fully initialized, and then just before it gets destroyed. This is useful in combination with the extendable plan state facilities (see extendplan.h) and perhaps for other purposes as well. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-08Add ExplainState argument to pg_plan_query() and planner().Robert Haas
This allows extensions to have access to any data they've stored in the ExplainState during planning. Unfortunately, it won't help with EXPLAIN EXECUTE is used, but since that case is less common, this still seems like an improvement. Since planner() has quite a few arguments now, also add some documentation of those arguments and the return value. Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-08Implement Eager AggregationRichard Guo
Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. In the current planner architecture, the separation between the scan/join planning phase and the post-scan/join phase means that aggregation steps are not visible when constructing the join tree, limiting the planner's ability to exploit aggregation-aware optimizations. To implement eager aggregation, we collect information about aggregate functions in the targetlist and HAVING clause, along with grouping expressions from the GROUP BY clause, and store it in the PlannerInfo node. During the scan/join planning phase, this information is used to evaluate each base or join relation to determine whether eager aggregation can be applied. If applicable, we create a separate RelOptInfo, referred to as a grouped relation, to represent the partially-aggregated version of the relation and generate grouped paths for it. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths in this step. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations is currently not supported. To further limit planning time, we currently adopt a strategy where partial aggregation is pushed only to the lowest feasible level in the join tree where it provides a significant reduction in row count. This strategy also helps ensure that all grouped paths for the same grouped relation produce the same set of rows, which is important to support a fundamental assumption of the planner. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys, using compatible operators. This is essential to ensure that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does. This ensures that all rows within the same partial group share the same "destiny", which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final paths will compete in the usual way with paths built from regular planning. The patch was originally proposed by Antonin Houska in 2017. This commit reworks various important aspects and rewrites most of the current code. However, the original patch and reviews were very useful. Author: Richard Guo <guofenglinux@gmail.com> Author: Antonin Houska <ah@cybertec.at> (in an older version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version) Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version) Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version) Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
2025-10-08Allow negative aggtransspace to indicate unbounded state sizeRichard Guo
This patch reuses the existing aggtransspace in pg_aggregate to signal that an aggregate's transition state can grow unboundedly. If aggtransspace is set to a negative value, it now indicates that the transition state may consume unpredictable or large amounts of memory, such as in aggregates like array_agg or string_agg that accumulate input rows. This information can be used by the planner to avoid applying memory-sensitive optimizations (e.g., eager aggregation) when there is a risk of excessive memory usage during partial aggregation. Bump catalog version. Per idea from Robert Haas, though applied differently than originally suggested. Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com
2025-10-08Add stats_reset to pg_stat_user_functionsMichael Paquier
It is possible to call pg_stat_reset_single_function_counters() for a single function, but the reset time was missing the system view showing its statistics. Like all the fields of pg_stat_user_functions, the GUC track_functions needs to be enabled to show the statistics about function executions. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to PgStat_StatFuncEntry. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aONjnsaJSx-nEdfU@paquier.xyz
2025-10-07Cleanup INFINITY code in float.hDavid Rowley
The INFINITY macro is always defined per C99 standard, so this should mean we can now get rid of the workaround code for when that macro isn't defined. Also, delete the (now unneeded) #pragma code which was disabling a compiler warning in MSVC. There was a comment explaining why the #pragma was placed outside the function body to work around a MSVC compiler bug, but the link explaining that was dead, as reported by jian he. Author: David Rowley <dgrowleyml@gmail.com> Reported-by: jian he <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxGARYETnNwtCK7QC0zE_7gq-tfN0mME=gT5rTNtC=VSHQ@mail.gmail.com
2025-10-07Remove PlannerInfo's join_search_private method.Robert Haas
Instead, use the new mechanism that allows planner extensions to store private state inside a PlannerInfo, treating GEQO as an in-core planner extension. This is a useful test of the new facility, and also buys back a few bytes of storage. To make this work, we must remove innerrel_is_unique_ext's hack of testing whether join_search_private is set as a proxy for whether the join search might be retried. Add a flag that extensions can use to explicitly signal their intentions instead. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-07Allow private state in certain planner data structures.Robert Haas
Extension that make extensive use of planner hooks may want to coordinate their efforts, for example to avoid duplicate computation, but that's currently difficult because there's no really good way to pass data between different hooks. To make that easier, allow for storage of extension-managed private state in PlannerGlobal, PlannerInfo, and RelOptInfo, along very similar lines to what we have permitted for ExplainState since commit c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-07Assign each subquery a unique name prior to planning it.Robert Haas
Previously, subqueries were given names only after they were planned, which makes it difficult to use information from a previous execution of the query to guide future planning. If, for example, you knew something about how you want "InitPlan 2" to be planned, you won't know whether the subquery you're currently planning will end up being "InitPlan 2" until after you've finished planning it, by which point it's too late to use the information that you had. To fix this, assign each subplan a unique name before we begin planning it. To improve consistency, use textual names for all subplans, rather than, as we did previously, a mix of numbers (such as "InitPlan 1") and names (such as "CTE foo"), and make sure that the same name is never assigned more than once. We adopt the somewhat arbitrary convention of using the type of sublink to set the plan name; for example, a query that previously had two expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up named expr_1 and expr_2. Because names are assigned before rather than after planning, some of the regression test outputs show the numerical part of the name switching positions: what was previously SubPlan 2 was actually the first one encountered, but we finished planning it later. We assign names even to subqueries that aren't shown as such within the EXPLAIN output. These include subqueries that are a FROM clause item or a branch of a set operation, rather than something that will be turned into an InitPlan or SubPlan. The purpose of this is to make sure that, below the topmost query level, there's always a name for each subquery that is stable from one planning cycle to the next (assuming no changes to the query or the database schema). Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us
2025-10-06Optimize hex_encode() and hex_decode() using SIMD.Nathan Bossart
The hex_encode() and hex_decode() functions serve as the workhorses for hexadecimal data for bytea's text format conversion functions, and some workloads are sensitive to their performance. This commit adds new implementations that use routines from port/simd.h, which testing indicates are much faster for larger inputs. For small or invalid inputs, we fall back on the existing scalar versions. Since we are using port/simd.h, these optimizations apply to both x86-64 and AArch64. Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Chiranmoy Bhattacharya <chiranmoy.bhattacharya@fujitsu.com> Co-authored-by: Susmitha Devanga <devanga.susmitha@fujitsu.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aLhVWTRy0QPbW2tl%40nathan
2025-10-06Expose sequence page LSN via pg_get_sequence_data.Amit Kapila
This patch enhances the pg_get_sequence_data function to include the page-level LSN (Log Sequence Number) of the sequence. This additional metadata will be used by upcoming patches to support synchronization of sequences during logical replication. By exposing the LSN, we enable more accurate tracking of sequence changes, which is essential for maintaining consistency across replicated nodes. Author: vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://www.postgresql.org/message-id/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-06Add comment in ginxlog.h about block used with ginxlogInsertListPageMichael Paquier
All the other structures describe the list of blocks used, and in the case of a GIN_INSERT_LISTPAGE record block 0 refers to a list page with the items added to it. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com
2025-10-06Add stats_reset to pg_stat_all_{tables,indexes} and related viewsMichael Paquier
It is possible to call pg_stat_reset_single_table_counters() on a relation (index or table) but the reset time was missing from the system views showing their statistics. This commit adds the reset time as an attribute of pg_stat_all_tables, pg_stat_all_indexes, and other relations related to them. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to PgStat_StatTabEntry. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aN8l182jKxEq1h9f@paquier.xyz
2025-10-05Don't include access/htup_details.h in executor/tuptable.hÁlvaro Herrera
This is not at all needed; I suspect it was a simple mistake in commit 5408e233f066. It causes htup_details.h to bleed into a huge number of places via execnodes.h. Remove it and fix fallout. Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql
2025-10-05Don't include execnodes.h in brin.h or gin.hÁlvaro Herrera
These headers don't need execnodes.h for anything. I think they never have. Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql
2025-10-03Optimize vector8_has_le() on AArch64.Nathan Bossart
Presently, the SIMD implementation of this function uses unsigned saturating subtraction to find bytes less than or equal to the given value, which is a workaround for the lack of unsigned comparison instructions on some architectures. However, Neon offers vminvq_u8(), which returns the minimum (unsigned) value in the vector. This commit adds a Neon-specific implementation that uses vminvq_u8() to optimize vector8_has_le() on AArch64. In passing, adjust the SSE2 implementation to use vector8_min() and vector8_eq() to find values less than or equal to the given value. This was the only use of vector8_ssub(), so it has been removed. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aNHDNDSHleq0ogC_%40nathan
2025-10-03Add IGNORE NULLS/RESPECT NULLS option to Window functions.Tatsuo Ishii
Add IGNORE NULLS/RESPECT NULLS option (null treatment clause) to lead, lag, first_value, last_value and nth_value window functions. If unspecified, the default is RESPECT NULLS which includes NULL values in any result calculation. IGNORE NULLS ignores NULL values. Built-in window functions are modified to call new API WinCheckAndInitializeNullTreatment() to indicate whether they accept IGNORE NULLS/RESPECT NULLS option or not (the API can be called by user defined window functions as well). If WinGetFuncArgInPartition's allowNullTreatment argument is true and IGNORE NULLS option is given, WinGetFuncArgInPartition() or WinGetFuncArgInFrame() will return evaluated function's argument expression on specified non NULL row (if it exists) in the partition or the frame. When IGNORE NULLS option is given, window functions need to visit and evaluate same rows over and over again to look for non null rows. To mitigate the issue, 2-bit not null information array is created while executing window functions to remember whether the row has been already evaluated to NULL or NOT NULL. If already evaluated, we could skip the evaluation work, thus we could get better performance. Author: Oliver Ford <ojford@gmail.com> Co-authored-by: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Krasiyan Andreev <krasiyan@gmail.com> Reviewed-by: Andrew Gierth <andrew@tao11.riddles.org.uk> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Fetter <david@fetter.org> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com
2025-09-30Rename pg_builtin_integer_constant_p to pg_integer_constant_pPeter Eisentraut
Since it's not builtin. Also fix a related typo. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAApHDvom02B_XNVSkvxznVUyZbjGAR%2B5myA89ZcbEd3%3DPA9UcA%40mail.gmail.com
2025-09-30Make some use of anonymous unions [reorderbuffer xact_time]Peter Eisentraut
Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes the ReorderBufferTXN struct. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org
2025-09-30Make some use of anonymous unions [pg_locale_t]Peter Eisentraut
Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes the pg_locale_t type. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org
2025-09-30Do a tiny bit of header file maintenanceÁlvaro Herrera
Stop including utils/relcache.h in access/genam.h, and stop including htup_details.h in nodes/tidbitmap.h. Both these files (genam.h and tidbitmap.h) are widely used in other header files, so it's in our best interest that they remain as lean as reasonable. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql
2025-09-29Add GROUP BY ALL.Tom Lane
GROUP BY ALL is a form of GROUP BY that adds any TargetExpr that does not contain an aggregate or window function into the groupClause of the query, making it exactly equivalent to specifying those same expressions in an explicit GROUP BY list. This feature is useful for certain kinds of data exploration. It's already present in some other DBMSes, and the SQL committee recently accepted it into the standard, so we can be reasonably confident in the syntax being stable. We do have to invent part of the semantics, as the standard doesn't allow for expressions in GROUP BY, so they haven't specified what to do with window functions. We assume that those should be treated like aggregates, i.e., left out of the constructed GROUP BY list. In passing, wordsmith some existing documentation about GROUP BY, and update some neglected synopsis entries in select_into.sgml. Author: David Christensen <david@pgguru.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAHM0NXjz0kDwtzoe-fnHAqPB1qA8_VJN0XAmCgUZ+iPnvP5LbA@mail.gmail.com
2025-09-28Add support for tracking of entry count in pgstatsMichael Paquier
Stats kinds can set a new option called "track_entry_count" (disabled by default, available for variable-numbered stats) that will make pgstats track the number of entries that exist in its shared hashtable. As there is only one code path where a new entry is added, and one code path where entries are freed, the count tracking is straight-forward in its implementation. Reads of these counters are optimistic, and may change across two calls. The counter is incremented when an entry is created (not when reused), and is decremented when an entry is freed from the hashtable (marked for drop with its refcount reaching 0), which is something that pgstats decides internally. A first use case of this facility would be pg_stat_statements, where we need to be able to cap the number of entries that would be stored in the shared hashtable, based on its "max" GUC. The module currently relies on hash_get_num_entries(), which offers a cheap way to count how many entries are in its hash table, but we cannot do that in pgstats for variable-sized stats kinds as a single hashtable is used for all the stats kinds. Independently of PGSS, this is useful for other custom stats kinds that want to cap, control, or track the number of entries they have, without depending on a potentially expensive sequential scan to know the number of entries while holding an extra exclusive lock. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Keisuke Kuroda <keisuke.kuroda.3862@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aMPKWR81KT5UXvEr@paquier.xyz
2025-09-27Teach MSVC that elog/ereport ERROR doesn't returnDavid Rowley
It had always been intended that this already works correctly as pg_unreachable() uses __assume(0) on MSVC, and that directs the compiler in a way so it knows that a given function won't return. However, with ereport_domain(), it didn't work... It's now understood that the failure to determine that elog(ERROR) does not return comes from the inability of the MSVC compiler to detect the "const int elevel_" is the same as the "elevel" macro parameter. MSVC seems to be unable to make the "if (elevel_ >= ERROR) branch constantly-true when the macro is used with any elevel >= ERROR, therefore the pg_unreachable() is seen to only be present in a *conditional* branch rather than present *unconditionally*. While there seems to be no way to force the compiler into knowing that elevel_ is equal to elevel within the ereport_domain() macro, there is a way in C11 to determine if the elevel parameter is a compile-time constant or not. This is done via some hackery using the _Generic() intrinsic function, which gives us functionality similar to GCC's __builtin_constant_p(), albeit only for integers. Here we define pg_builtin_integer_constant_p() for this purpose. Callers can check for availability via HAVE_PG_BUILTIN_INTEGER_CONSTANT_P. ereport_domain() has been adjusted to use pg_builtin_integer_constant_p() instead of __builtin_constant_p(). It's not quite clear at this stage if this now allows us to forego doing the likes of "return NULL; /* keep compiler quiet */" as there may be other compilers in use that have similar struggles. It's just a matter of time before someone commits a function that does not "return" a value after an elog(ERROR). Let's make time and lack of complaints about said commit be the judge of if we need to continue the "/* keep compiler quiet */" palaver. Author: David Rowley <drowleyml@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAApHDvom02B_XNVSkvxznVUyZbjGAR+5myA89ZcbEd3=PA9UcA@mail.gmail.com
2025-09-26Remove unused for_all_tables field from AlterPublicationStmt.Masahiko Sawada
No backpatch as AlterPublicationStmt struct is exposed. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAD21AoC6B6AuxWOST-TkxUbDgp8FwX=BLEJZmKLG_VrH-hfxpA@mail.gmail.com
2025-09-26Create a separate file listing backend typesÁlvaro Herrera
Use our established coding pattern to reduce maintenance pain when adding other per-process-type characteristics. Like PG_KEYWORD, PG_CMDTAG, PG_RMGR. To keep the strings translatable, the relevant makefile now also scans src/include for this specific file. I didn't want to have it scan all .h files, as then gettext would have to scan all header files. I didn't find any way to affect the meson behavior in this respect though. Author: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-by: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com> Discussion: https://postgr.es/m/202507151830.dwgz5nmmqtdy@alvherre.pgsql
2025-09-25Don't include execnodes.h in replication/conflict.hÁlvaro Herrera
... which silently propagates a lot of headers into many places via pgstat.h, as evidenced by the variety of headers that this patch needs to add to seemingly random places. Add a minimum of typedefs to conflict.h to be able to remove execnodes.h, and fix the fallout. Backpatch to 18, where conflict.h first appeared. Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
2025-09-25Update some more forward declarations to use typedefÁlvaro Herrera
As commit d4d1fc527bdb. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/202509191025.22agk3fvpilc@alvherre.pgsql
2025-09-24Fix incorrect and inconsistent comments in tableam.h and heapam.c.Fujii Masao
This commit corrects several issues in function comments: * The parameter "rel" was incorrectly referred to as "relation" in the comments for table_tuple_delete(), table_tuple_update(), and table_tuple_lock(). * In table_tuple_delete(), "changingPart" was listed as an output parameter in the comments but is actually input. * In table_tuple_update(), "slot" was listed as an input parameter in the comments but is actually output. * The comment for "update_indexes" in table_tuple_update() was mis-indented. * The comments for heap_lock_tuple() incorrectly referenced a non-existent "tid" parameter. Author: Chao Li <lic@highgo.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAEoWx2nB6Ay8g=KEn7L3qbYX_4+sLk9XOMkV0XZqHR4cTY8ZvQ@mail.gmail.com
2025-09-24Remove PointerIsValid()Peter Eisentraut
This doesn't provide any value over the standard style of checking the pointer directly or comparing against NULL. Also remove related: - AllocPointerIsValid() [unused] - IndexScanIsValid() [had one user] - HeapScanIsValid() [unused] - InvalidRelation [unused] Leaving HeapTupleIsValid(), ItemIdIsValid(), PortalIsValid(), RelationIsValid for now, to reduce code churn. Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/ad50ab6b-6f74-4603-b099-1cd6382fb13d%40eisentraut.org Discussion: https://www.postgresql.org/message-id/CA+hUKG+NFKnr=K4oybwDvT35dW=VAjAAfiuLxp+5JeZSOV3nBg@mail.gmail.com Discussion: https://www.postgresql.org/message-id/bccf2803-5252-47c2-9ff0-340502d5bd1c@iki.fi
2025-09-23Keep track of what RTIs a Result node is scanning.Robert Haas
Result nodes now include an RTI set, which is only non-NULL when they have no subplan, and is taken from the relid set of the RelOptInfo that the Result is generating. ExplainPreScanNode now takes notice of these RTIs, which means that a few things get schema-qualified in the regression tests that previously did not. This makes the output more consistent between cases where some part of the plan tree is replaced by a Result node and those where this does not happen. Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs stored in a Result node just as it already does for other RTI-bearing node types. Result nodes also now include a result_reason, which tells us something about why the Result node was inserted. Using that information, EXPLAIN now emits, where relevant, a "Replaces" line describing the origin of a Result node. The purpose of these changes is to allow code that inspects a Plan tree to understand the origin of Result nodes that appear therein. Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
2025-09-22Fix various incorrect filename referencesDavid Rowley
Author: Chao Li <li.evan.chao@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEoWx2=hOBCPm-Z=F15twr_23XjHeoXSbifP5GdEdtWona97wQ@mail.gmail.com
2025-09-22Fix misleading comment in RangeTblEntryRichard Guo
The comment describing join_using_alias incorrectly referred to the alias field as being defined "below", when it actually appears earlier in the RangeTblEntry struct. This patch fixes that. Author: Steve Lau <stevelauc@outlook.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/TYWPR01MB10612B020C33FD08F729415CEB613A@TYWPR01MB10612.jpnprd01.prod.outlook.com
2025-09-20Track the maximum possible frequency of non-MCE array elements.Tom Lane
The lossy-counting algorithm that ANALYZE uses to identify most-common array elements has a notion of cutoff frequency: elements with frequency greater than that are guaranteed to be collected, elements with smaller frequencies are not. In cases where we find fewer MCEs than the stats target would permit us to store, the cutoff frequency provides valuable additional information, to wit that there are no non-MCEs with frequency greater than that. What the selectivity estimation functions actually use the "minfreq" entry for is as a ceiling on the possible frequency of non-MCEs, so using the cutoff rather than the lowest stored MCE frequency provides a tighter bound and more accurate estimates. Therefore, instead of redundantly storing the minimum observed MCE frequency, store the cutoff frequency when there are fewer tracked values than we want. (When there are more, then of course we cannot assert that no non-stored elements are above the cutoff frequency, since we're throwing away some that are; so we still use the minimum stored frequency in that case.) Notably, this works even when none of the values are common enough to be called MCEs. In such cases we previously stored nothing in the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the selectivity functions falling back to default estimates. So in that case we want to construct a STATISTIC_KIND_MCELEM entry that contains no "values" but does have "numbers", to wit the three extra numbers that the MCELEM entry type defines. A small obstacle is that update_attstats() has traditionally stored a null, not an empty array, when passed zero "values" for a slot. That gives rise to an MCELEM entry that get_attstatsslot() will spit up on. The least risky solution seems to be to adjust update_attstats() so that it will emit a non-null (but possibly empty) array when the passed stavalues array pointer isn't NULL, rather than conditioning that on numvalues > 0. In other existing cases I don't believe that that changes anything. For consistency, handle the stanumbers array the same way. In passing, improve the comments in routines that use STATISTIC_KIND_MCELEM data. Particularly, explain why we use minfreq / 2 not minfreq as the estimate for non-MCE values. Thanks to Matt Long for the suggestion that we could apply this idea even when there are more than zero MCEs. Reported-by: Mark Frost <FROSTMAR@uk.ibm.com> Reported-by: Matt Long <matt@mattlong.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
2025-09-19Fix obsolete references to postgres.h in comments.Nathan Bossart
Oversights in commits d08741eab5 and d952373a98. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aMxbfSJ2wLWd32x-%40nathan
2025-09-19Add optional pid parameter to pg_replication_origin_session_setup().Amit Kapila
Commit 216a784829c introduced parallel apply workers, allowing multiple processes to share a replication origin. To support this, replorigin_session_setup() was extended to accept a pid argument identifying the process using the origin. This commit exposes that capability through the SQL interface function pg_replication_origin_session_setup() by adding an optional pid parameter. This enables multiple processes to coordinate replication using the same origin when using SQL-level replication functions. This change allows the non-builtin logical replication solutions to implement parallel apply for large transactions. Additionally, an existing internal error was made user-facing, as it can now be triggered via the exposed SQL API. Author: Doruk Yilmaz <doruk@mixrank.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
2025-09-19Document and check that PgStat_HashKey has no paddingMichael Paquier
This change is a tighter rework of 7d85d87f4d5c, which tried to improve the code so as it would work should PgStat_HashKey gain new fields that create padding bytes. However, the previous change is proving to not be enough as some code paths of pgstats do not pass PgStat_HashKey by reference (valgrind would warn when padding is added to the structure, through a new field). Per discussion, let's document and check that PgStat_HashKey has no padding rather than try to complicate the code of pgstats so as it is able to work around that. This removes a couple of memset(0) calls that should not be required. While on it, this commit adds a static assertion checking that no padding is introduced in the structure, by checking that the size of PgStat_HashKey matches with the sum of the size of all its fields. The object ID part of the hash key is already 8 bytes, which should be plenty enough already. A comment is added to discourage the addition of new fields. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0t9omat+HVSakJXwTMWvhpYFcAZb41RPWKwrKFUgmAFBQ@mail.gmail.com
2025-09-17jit: Fix type used for Datum values in LLVM IR.Thomas Munro
Commit 2a600a93 made Datum 8 bytes wide everywhere. It was no longer appropriate to use TypeSizeT on 32 bit systems, and JIT compilation would fail with various type check errors. Introduce a separate LLVMTypeRef with the name TypeDatum. TypeSizeT is still used in some places for actual size_t values. Reported-by: Dmitry Mityugov <d.mityugov@postgrespro.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Dmitry Mityugov <d.mityugov@postgrespro.ru> Discussion: https://postgr.es/m/0a9f0be59171c2e8f1b3bc10f4fcf267%40postgrespro.ru
2025-09-16Provide more-specific error details/hints for function lookup failures.Tom Lane
Up to now we've contented ourselves with a one-size-fits-all error hint when we fail to find any match to a function or procedure call. That was mostly okay in the beginning, but it was never great, and since the introduction of named arguments it's really not adequate. We at least ought to distinguish "function name doesn't exist" from "function name exists, but not with those argument names". And the rules for named-argument matching are arcane enough that some more detail seems warranted if we match the argument names but the call still doesn't work. This patch creates a framework for dealing with these problems: FuncnameGetCandidates and related code will now pass back a bitmask of flags showing how far the match succeeded. This allows a considerable amount of granularity in the reports. The set-bits-in-a-bitmask approach means that when there are multiple candidate functions, the report will reflect the match(es) that got the furthest, which seems correct. Also, we can avoid mentioning "maybe add casts" unless failure to match argument types is actually the issue. Extend the same return-a-bitmask approach to OpernameGetCandidates. The issues around argument names don't apply to operator syntax, but it still seems worth distinguishing between "there is no operator of that name" and "we couldn't match the argument types". While at it, adjust these messages and related ones to more strictly separate "detail" from "hint", following our message style guidelines' distinction between those. Reported-by: Dominique Devienne <ddevienne@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/1756041.1754616558@sss.pgh.pa.us
2025-09-16Move pg_int64 back to postgres_ext.hPeter Eisentraut
Fix for commit 3c86223c998. That commit moved the typedef of pg_int64 from postgres_ext.h to libpq-fe.h, because the only remaining place where it might be used is libpq users, and since the type is obsolete, the intent was to limit its scope. The problem is that if someone builds an extension against an older (pre-PG18) server version and a new (PG18) libpq, they might get two typedefs, depending on include file order. This is not allowed under C99, so they might get warnings or errors, depending on the compiler and options. The underlying types might also be different (e.g., long int vs. long long int), which would also lead to errors. This scenario is plausible when using the standard Debian packaging, which provides only the newest libpq but per-major-version server packages. The fix is to undo that part of commit 3c86223c998. That way, the typedef is in the same header file across versions. At least, this is the safest fix doable before PostgreSQL 18 releases. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/25144219-5142-4589-89f8-4e76948b32db%40eisentraut.org
2025-09-16Fix incorrect const qualifierPeter Eisentraut
Commit 7202d72787d added in passing some const qualifiers, but the one on the postmaster_child_launch() startup_data argument was incorrect, because the function itself modifies the pointed-to data. This is hidden from the compiler because of casts. The qualifiers on the functions called by postmaster_child_launch() are still correct.
2025-09-15Change fmgr.h typedefs to use original namesPeter Eisentraut
fmgr.h defined some types such as fmNodePtr which is just Node *, but it made its own types to avoid having to include various header files. With C11, we can now instead typedef the original names without fear of conflicts. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
2025-09-15Remove hbaPort typePeter Eisentraut
This was just a workaround to avoid including the header file that defines the Port type. With C11, we can now just re-define the Port type without the possibility of a conflict. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
2025-09-15Update various forward declarations to use typedefPeter Eisentraut
There are a number of forward declarations that use struct but not the customary typedef, because that could have led to repeat typedefs, which was not allowed. This is now allowed in C11, so we can update these to provide the typedefs as well, so that the later uses of the types look more consistent. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
2025-09-15Improve ExplainState type handling in header filesPeter Eisentraut
Now that we can have repeat typedefs with C11, we don't need to use "struct ExplainState" anymore but can instead make a typedef where necessary. This doesn't change anything but makes it look nicer. (There are more opportunities for similar changes, but this is broken out because there was a separate discussion about it, and it's somewhat bulky on its own.) Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f36c0a45-98cd-40b2-a7cc-f2bf02b12890%40eisentraut.org#a12fb1a2c1089d6d03010f6268871b00 Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org