summaryrefslogtreecommitdiff
path: root/src/backend/commands
AgeCommit message (Collapse)Author
2025-03-19Introduce io_max_combine_limit.Thomas Munro
The existing io_combine_limit can be changed by users. The new io_max_combine_limit is fixed at server startup time, and functions as a silent clamp on the user setting. That in itself is probably quite useful, but the primary motivation is: aio_init.c allocates shared memory for all asynchronous IOs including some per-block data, and we didn't want to waste memory you'd never used by assuming they could be up to PG_IOV_MAX. This commit already halves the size of 'AioHandleIov' and 'AioHandleData'. A follow-up commit can now expand PG_IOV_MAX without affecting that. Since our GUC system doesn't support dependencies or cross-checks between GUCs, the user-settable one now assigns a "raw" value to io_combine_limit_guc, and the lower of io_combine_limit_guc and io_max_combine_limit is maintained in io_combine_limit. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-18Add some new hooks so extensions can add details to EXPLAIN.Robert Haas
Specifically, add a per-node hook that is called after the per-node information has been displayed but before we display children, and a per-query hook that is called after existing query-level information is printed. This assumes that extension-added information should always go at the end rather than the beginning or the middle, but that seems like an acceptable limitation for simplicity. It also assumes that extensions will only want to add information, not remove or reformat existing details; those also seem like acceptable restrictions, at least for now. If multiple EXPLAIN extensions are used, the order in which any additional details are printed is likely to depend on the order in which the modules are loaded. That seems OK, since the user may have opinions about the order in which output should appear, and the extension author can't really know whether their stuff is more or less important to a particular user than some other extension. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <srinath2133@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Sami Imseih <samimseih@gmail.com>
2025-03-18Make it possible for loadable modules to add EXPLAIN options.Robert Haas
Modules can use RegisterExtensionExplainOption to register new EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState, and SetExplainExtensionState to store related state inside the ExplainState object. Since this substantially increases the amount of code that needs to handle ExplainState-related tasks, move a few bits of existing code to a new file explain_state.c and add the rest of this infrastructure there. See the comments at the top of explain_state.c for further explanation of how this mechanism works. This does not yet provide a way for such such options to do anything useful. The intention is that we'll add hooks for that purpose in a separate commit. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <srinath2133@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Sami Imseih <samimseih@gmail.com>
2025-03-18Allow non-btree unique indexes for matviewsPeter Eisentraut
We were rejecting non-btree indexes in some cases owing to the inability to determine the equality operators for other index AMs; that problem no longer exists, because we can look up the equality operator using COMPARE_EQ. Stop rejecting these indexes, but instead rely on all unique indexes having equality operators. Unique indexes must have equality operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-18Allow non-btree unique indexes for partition keysPeter Eisentraut
We were rejecting non-btree indexes in some cases owing to the inability to determine the equality operators for other index AMs; that problem no longer exists, because we can look up the equality operator using COMPARE_EQ. The problem of not knowing the strategy number for equality in other index AMs is already resolved. Stop rejecting the indexes upfront, and instead reject any for which the equality operator lookup fails. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-18Use correct variable name in publicationcmds.c.Amit Kapila
subid was used at few places for publicationid in publicationcmds.c/.h. Author: vignesh C <vignesh21@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com
2025-03-14Remove direct handling of reloptions for toast tablesÁlvaro Herrera
It doesn't actually work, even with allow_system_table_mods turned on: the ALTER TABLE operation is rejected by ATSimplePermissions(), so even the error message we're adding in this commit is unreachable. Add a test case for it. Author: Nikolay Shaplov <dhyan@nataraj.su> Discussion: https://postgr.es/m/1913854.tdWV9SEqCh@thinkpad-pgpro
2025-03-13ATExecSetRelOptions: Reduce scope of 'isnull' variableÁlvaro Herrera
Author: Nikolay Shaplov <dhyan@nataraj.su> Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru> Discussion: https://postgr.es/m/1913854.tdWV9SEqCh@thinkpad-pgpro
2025-03-13Avoid invalidating all RelationSyncCache entries on publication rename.Amit Kapila
On Publication rename, we need to only invalidate the RelationSyncCache entries corresponding to relations that are part of the publication being renamed. As part of this patch, we introduce a new invalidation message to invalidate the cache maintained by the logical decoding output plugin. We can't use existing relcache invalidation for this purpose, as that would unnecessarily cause relcache invalidations in other backends. This will improve performance by building fewer relation cache entries during logical replication. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-11Assert that a snapshot is active or registered before it's usedHeikki Linnakangas
The comment in GetTransactionSnapshot() said that you "should call RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be used very long". That felt too unclear to me. Make the comment more strongly worded. To enforce that rule and to catch potential bugs where a snapshot might get invalidated while it's still in use, add an assertion to HeapTupleSatisfiesMVCC() to check that the snapshot is registered or pushed to active stack. No new bugs were found by this, but it seems like good future-proofing. It's not a great place for the check; HeapTupleSatisfiesMVCC() is in fact safe to call with an unregistered snapshot, and the assertion won't catch other unsafe uses. But it goes a long way in practice. Fix a few cases that were playing fast and loose with that and just assumed that the snapshot cannot be invalidated during a scan. Those assumptions were not wrong, but they're not performance critical, so let's drop the excuses and just register the snapshot. These were false positives found by the new assertion. Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
2025-03-11Improve EXPLAIN's display of window functions.Tom Lane
Up to now we just punted on showing the window definitions used in a plan, with window function calls represented as "OVER (?)". To improve that, show the window definition implemented by each WindowAgg plan node, and reference their window names in OVER. For nameless window clauses generated by "OVER (...)", assign unique names w1, w2, etc. In passing, re-order the properties shown for a WindowAgg node so that the Run Condition (if any) appears after the Window property and before the Filter (if any). This seems more sensible since the Run Condition is associated with the Window and acts before the Filter. Thanks to David G. Johnston and Álvaro Herrera for design suggestions. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us
2025-03-11Show index search count in EXPLAIN ANALYZE, take 2.Peter Geoghegan
Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan/index-only scan/bitmap index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index searches can be unpredictable due to implementation details that interact with physical index characteristics (at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8). The information shown also provides useful context when EXPLAIN ANALYZE runs a plan with an index scan node that successfully applied the skip scan optimization (set to be added to nbtree by an upcoming patch). The instrumentation works by teaching all index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs already increment the pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). Parallel queries have workers copy their local counter struct into shared memory when an index scan node ends -- even when it isn't a parallel aware scan node. An earlier version of this patch that only worked with parallel aware scans became commit 5ead85fb (though that was quickly reverted by commit d00107cd following "debug_parallel_query=regress" buildfarm failures). Our approach doesn't match the approach used when tracking other index scan related costs (e.g., "Rows Removed by Filter:"). It is comparable to the approach used in similar cases involving costs that are only readily accessible inside an access method, not from the executor proper (e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently enhanced to show per-worker costs by commit 5a1e6df3, using essentially the same scheme as the one used here). It is necessary for index AMs to have direct responsibility for maintaining the new counter, since the counter might need to be incremented multiple times per amgettuple call (or per amgetbitmap call). But it is also necessary for the executor proper to manage the shared memory now used to transfer each worker's counter struct to the leader. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-11Move RemoveInheritedConstraint() call slightly earlierPeter Eisentraut
This change is harmless and does not affect the existing intended operation. It is necessary for a subsequent patch operation (NOT ENFORCED foreign keys), where we may need to change the child constraint to enforced. In this case, we would create the necessary triggers and queue the constraint for validation, so it is important to remove any unnecessary constraints before proceeding. This is a small change that could have been included in the previous "split tryAttachPartitionForeignKey" refactoring patch (commit 1d26c2d2c4b), but was kept separate to highlight the changes. Author: Amul Sul <amul.sul@enterprisedb.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-11refactor: Split tryAttachPartitionForeignKey()Peter Eisentraut
Split tryAttachPartitionForeignKey() into three functions: AttachPartitionForeignKey(), RemoveInheritedConstraint(), and DropForeignKeyConstraintTriggers(), so they can be reused in some subsequent patches for the NOT ENFORCED feature. Author: Amul Sul <amul.sul@enterprisedb.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-11refactor: re-add ATExecAlterChildConstr()Peter Eisentraut
ATExecAlterChildConstr() was removed in commit 80d7f990496, but it is needed in some subsequent patches for the NOT ENFORCED feature, to recurse over child constraints. This adds it back in slightly altered form. Author: Amul Sul <amul.sul@enterprisedb.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-10Remove support for temporal RESTRICT foreign keysPeter Eisentraut
It isn't clear how these should behave, so let's wait to implement them until we are sure how to do it. This feature was initially added by commit 89f908a6d0a, so it hasn't been released yet. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://postgr.es/m/e773bc11-4ac1-40de-bb91-814e02f05b6d%40eisentraut.org
2025-03-05Revert "Show index search count in EXPLAIN ANALYZE."Peter Geoghegan
This reverts commit 5ead85fbc81162ab1594f656b036a22e814f96b3. This commit shows test failures with debug_parallel_query=regress. The underlying issue needs to be debugged, so revert for now.
2025-03-05Show index search count in EXPLAIN ANALYZE.Peter Geoghegan
Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index scans isn't predictable in advance (at least not with optimizations like the one added to nbtree by Postgres 17 commit 5bf748b8). It will also be useful when EXPLAIN ANALYZE shows details of an nbtree index scan that uses skip scan optimizations set to be introduced by an upcoming patch. The instrumentation works by teaching index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs must already increment the index's pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). The new counter is stored in the scan descriptor (IndexScanDescData), which explain.c reaches by going through the scan node's PlanState. This approach doesn't match the approach used when tracking other index scan specific costs (e.g., "Rows Removed by Filter:"). It is similar to the approach used in other cases where we must track costs that are only readily accessible inside an access method, and not from the executor (e.g., "Heap Blocks:" output for a Bitmap Heap Scan). It is inherently necessary to maintain a counter that can be incremented multiple times during a single amgettuple call (or amgetbitmap call), and directly exposing PlanState.instrument to index access methods seems unappealing. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
2025-03-05Add ALTER TABLE ... ALTER CONSTRAINT ... SET [NO] INHERITÁlvaro Herrera
This allows to redefine an existing non-inheritable constraint to be inheritable, which allows to straighten up situations with NO INHERIT constraints so that thay can become normal constraints without having to re-verify existing data. For existing inheritance children this may require creating additional constraints, if they don't exist already. It also allows to do the opposite, if only for symmetry. Author: Suraj Kharage <suraj.kharage@enterprisedb.com> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CAF1DzPVfOW6Kk=7SSh7LbneQDJWh=PbJrEC_Wkzc24tHOyQWGg@mail.gmail.com
2025-03-04Refactor Copy{From|To}GetRoutine() to use pass-by-reference argument.Masahiko Sawada
The change improves efficiency by eliminating unnecessary copying of CopyFormatOptions. The coverity also complained about inefficiencies caused by pass-by-value. Oversight in 7717f6300 and 2e4127b6d. Reported-by: Junwang Zhao <zhjwpku@gmail.com> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (per reports from coverity) Author: Sutou Kouhei <kou@clear-code.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAEG8a3L6YCpPksTQMzjD_CvwDEhW3D_t=5md9BvvdOs5k+TA=Q@mail.gmail.com
2025-03-04Fix accidental use of = instead of ==Peter Eisentraut
Fix for commit 630f9a43cec. It used = instead of ==. The result would be an incorrect error message. Author: Jacob Brazeal <jacob.brazeal@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CA%2BCOZaC-JMbhQ4O0Q8V1Bxa0R%2BNex_RN9D6UyuLPiEx_CK4Heg%40mail.gmail.com
2025-03-04Fix ALTER TABLE ADD VIRTUAL GENERATED COLUMN when table rewritePeter Eisentraut
demo: CREATE TABLE gtest20a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL); ALTER TABLE gtest20a ADD COLUMN c float8 DEFAULT RANDOM() CHECK (b < 60); ERROR: no generation expression found for column number 2 of table "pg_temp_17306" In ATRewriteTable, the variable OIDNewHeap (if valid) corresponding pg_attrdef default expression entry was not populated. So OIDNewHeap cannot be used to call expand_generated_columns_in_expr or build_generation_expression. Therefore in ATRewriteTable, we can only use the existing relation to expand the generated expression. Author: jian he <jian.universality@gmail.com> Reviewed-by: Srinath Reddy <srinath2133@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CACJufxEJ%3DFoajabWXjszo_yrQeKSxdZ87KJqBW373rSbajKGAA%40mail.gmail.com
2025-03-03Simplify some logic around setting pg_attribute.atthasdef.Tom Lane
DefineRelation was of the opinion that it could usefully pre-fill atthasdef flags to eliminate work for StoreAttrDefault. This is not the case, however: the tupledesc that it's filling is not the one that InsertPgAttributeTuples will work from. The tupledesc used there is made by RelationBuildLocalRelation, which deliberately doesn't copy atthasdef. Moreover, if this did happen as the code thinks, it would be wrong for the case of plain "DEFAULT NULL" clauses, since we detect and ignore simple-null-Const defaults later on. Hence, remove the useless code. It also emerges that it's not really worth a special-case path in StoreAttrDefault() for atthasdef already being set, because as far as we can see that never happens: cases where an existing default gets updated always do RemoveAttrDefault first, so as to clean up possibly-no-longer-correct dependency entries. If it were the case the code would still work, anyway. Also remove a nearby comment made moot by 5eaa0e92e. Author: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
2025-03-03Remove now-dead code in StoreAttrDefault().Tom Lane
StoreAttrDefault() is no longer responsible for filling attmissingval, so remove the code for that. Get rid of RawColumnDefault.missingMode, too, as we no longer need that to pass information around. While here, clean up some sloppy coding in StoreAttrDefault(), such as failure to use XXXGetDatum macros. These aren't bugs but they're not good code either. Reported-by: jian he <jian.universality@gmail.com> Author: jian he <jian.universality@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
2025-03-03Fix broken handling of domains in atthasmissing logic.Tom Lane
If a domain type has a default, adding a column of that type (without any explicit DEFAULT clause) failed to install the domain's default value in existing rows, instead leaving the new column null. This is unexpected, and it used to work correctly before v11. The cause is confusion in the atthasmissing mechanism about which default value to install: we'd only consider installing an explicitly-specified default, and then we'd decide that no table rewrite is needed. To fix, take the responsibility for filling attmissingval out of StoreAttrDefault, and instead put it into ATExecAddColumn's existing logic that derives the correct value to fill the new column with. Also, centralize the logic that determines the need for default-related table rewriting there, instead of spreading it over four or five places. In the back branches, we'll leave the attmissingval-filling code in StoreAttrDefault even though it's now dead, for fear that some extension may be depending on that functionality to exist there. A separate HEAD-only patch will clean up the now-useless code. Reported-by: jian he <jian.universality@gmail.com> Author: jian he <jian.universality@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com Backpatch-through: 13
2025-03-03Add relallfrozen to pg_classMelanie Plageman
Add relallfrozen, an estimate of the number of pages marked all-frozen in the visibility map. pg_class already has relallvisible, an estimate of the number of pages in the relation marked all-visible in the visibility map. This is used primarily for planning. relallfrozen, together with relallvisible, is useful for estimating the outstanding number of all-visible but not all-frozen pages in the relation for the purposes of scheduling manual VACUUMs and tuning vacuum freeze parameters. A future commit will use relallfrozen to trigger more frequent vacuums on insert-focused workloads with significant volume of frozen data. Bump catalog version Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com
2025-02-28Re-export NextCopyFromRawFields() to copy.h.Masahiko Sawada
Commit 7717f630069 removed NextCopyFromRawFields() from copy.h. While it was hoped that NextCopyFrom() could serve as an alternative, certain use cases still require NextCopyFromRawFields(). For instance, extensions like file_text_array_fdw, which process source data with an unknown number of columns, rely on this function. Per buildfarm member crake. Reported-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Sutou Kouhei <kou@clear-code.com> Discussion: https://postgr.es/m/5c7e1ac8-5083-4c08-af19-cb9ade2f16ce@dunslane.net
2025-02-28Refactor COPY FROM to use format callback functions.Masahiko Sawada
This commit introduces a new CopyFromRoutine struct, which is a set of callback routines to read tuples in a specific format. It also makes COPY FROM with the existing formats (text, CSV, and binary) utilize these format callbacks. This change is a preliminary step towards making the COPY FROM command extensible in terms of input formats. Similar to 2e4127b6d2d, this refactoring contributes to a performance improvement by reducing the number of "if" branches that need to be checked on a per-row basis when sending field representations in text or CSV mode. The performance benchmark results showed ~5% performance gain in text or CSV mode. Author: Sutou Kouhei <kou@clear-code.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-28Avoid including explain.h in explain_format.h and explain_dr.hRobert Haas
As per a suggestion from Tom Lane, we do this by declaring "struct ExplainState" here and refer to that rather than "ExplainState". Also per Tom, CreateExplainSerializeDestReceiver was still defined in explain.h in addition to explain_dr.h. Remove leftover prototype. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYtaad3i21V0jqua-fbr+CR0ix6uBvEX8_s6BG96abd=g@mail.gmail.com
2025-02-28Fix missing space in EXPLAIN ANALYZE output.Robert Haas
Commit ddb17e387aa28d61521227377b00f997756b8a27 introduced this regression. Ideally, the regression tests would have caught this mistake, but apparently they don't test with timing enabled, presumably because that would make the output vary. Author: Thom Brown <thom@linux.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Discussion: http://postgr.es/m/CAA-aLv6nq=UeiyvM7_Mxgo9TVBzs2oh46b9vfyLzuyVEz3j1-g@mail.gmail.com
2025-02-27Refactor COPY TO to use format callback functions.Masahiko Sawada
This commit introduces a new CopyToRoutine struct, which is a set of callback routines to copy tuples in a specific format. It also makes the existing formats (text, CSV, and binary) utilize these format callbacks. This change is a preliminary step towards making the COPY TO command extensible in terms of output formats. Additionally, this refactoring contributes to a performance improvement by reducing the number of "if" branches that need to be checked on a per-row basis when sending field representations in text or CSV mode. The performance benchmark results showed ~5% performance gain in text or CSV mode. Author: Sutou Kouhei <kou@clear-code.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-27Create explain_dr.c and move DestReceiver-related code there.Robert Haas
explain.c has grown rather large, and the code that deals with the DestReceiver that supports the SERIALIZE option is pretty easily severable from the rest of explain.c; hence, move it to a separate file. Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
2025-02-27Create explain_format.c and move relevant code there.Robert Haas
explain.c has grown rather large, so move various functions that are principally concerned with output generation to a new source file, explain_format.c, instead of lumping them in with everything else that is part of explain.c Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
2025-02-27EXPLAIN: Always use two fractional digits for row counts.Robert Haas
Commit ddb17e387aa28d61521227377b00f997756b8a27 attempted to avoid confusing users by displaying digits after the decimal point only when nloops > 1, since it's impossible to have a fraction row count after a single iteration. However, this made the regression tests unstable since parallal queries will have nloops>1 for all nodes below the Gather or Gather Merge in normal cases, but if the workers don't start in time and the leader finishes all the work, they will suddenly have nloops==1, making it unpredictable whether the digits after the decimal point would be displayed or not. Although 44cbba9a7f51a3888d5087fc94b23614ba2b81f2 seemed to fix the immediate failures, it may still be the case that there are lower-probability failures elsewhere in the regression tests. Various fixes are possible here. For example, it has previously been proposed that we should try to display the digits after the decimal point only if rows/nloops is an integer, but currently rows is storead as a float so it's not theoretically an exact quantity -- precision could be lost in extreme cases. It has also been proposed that we should try to display the digits after the decimal point only if we're under some sort of construct that could potentially cause looping regardless of whether it actually does. While such ideas are not without merit, this patch adopts the much simpler solution of always display two decimal digits. If that approach stands up to scrutiny from the buildfarm and human users, it spares us the trouble of doing anything more complex; if not, we can reassess. This commit incidentally reverts 44cbba9a7f51a3888d5087fc94b23614ba2b81f2, which should no longer be needed. Author: Robert Haas <robertmhaas@gmail.com> Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Discussion: http://postgr.es/m/CA+TgmoazzVHn8sFOMFAEwoqBTDxKT45D7mvkyeHgqtoD2cn58Q@mail.gmail.com
2025-02-27Generalize hash and ordering support in amapiPeter Eisentraut
Stop comparing access method OID values against HASH_AM_OID and BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see if it advertises its ability to perform the necessary ordering, hashing, or cross-type comparing functionality. A field amcanorder already existed, this uses it more widely. Fields amcanhash and amcancrosscompare are added for the other purposes. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-21Avoid race condition between "GRANT role" and "DROP ROLE".Tom Lane
Concurrently dropping either the granted role or the grantee does not stop GRANT from completing, instead resulting in a dangling role reference in pg_auth_members. That's relatively harmless in the short run, but inconsistent catalog entries are not a good thing. This patch solves the problem by adding the granted and grantee roles as explicit shared dependencies of the pg_auth_members entry. That's a bit indirect, but it works because the pg_shdepend code applies the necessary locking and rechecking. Commit 6566133c5 previously established similar handling for the grantor column of pg_auth_members; it's not clear why it didn't cover the other two role OID columns. A side-effect of this approach is that DROP OWNED BY will now drop pg_auth_members entries that mention the target role as either the granted or grantee role. That's clearly appropriate for the grantee, since we'll drop its other privileges too. It doesn't seem too far out of line for the granted role, since we're presumably about to drop it and besides we're removing all reasons why it'd matter to be a member of it. (One could argue that this makes DropRole's code to auto-drop pg_auth_members entries unnecessary, but I chose to leave it in place since perhaps some people's workflows expect that to work without a DROP OWNED BY.) Note to patch readers: CreateRole's first CommandCounterIncrement call is now unconditional, because this change creates another case in which it's needed, and it seemed to be more trouble than it's worth to preserve that micro-optimization. Arguably this is a bug fix, but the fact that it changes the expected contents of pg_shdepend seems like not a great thing to do in the stable branches, and perhaps we don't want the change in DROP OWNED BY semantics there either. On the other hand, I opted not to force a catversion bump in HEAD, because the presence or absence of these entries doesn't matter for most purposes. Reported-by: Virender Singla <virender.cse@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Discussion: https://postgr.es/m/CAM6Zo8woa62ZFHtMKox6a4jb8qQ=w87R2L0K8347iE-juQL2EA@mail.gmail.com
2025-02-21Allow EXPLAIN to indicate fractional rows.Robert Haas
When nloops > 1, we now display two digits after the decimal point, rather than none. This is important because what we print is actually planstate->instrument->ntuples / nloops, and sometimes what you want to know is planstate->instrument->ntuples. You can estimate that by multiplying the displayed row count by the displayed nloops value, but the fact that the displayed value is rounded makes that inexact. It's still inexact even if we show these two extra decimal places, but less so. Perhaps we will agree on a way to further improve this output later, but for now this seems better than doing nothing. Author: Ibrar Ahmed <ibrar.ahmad@gmail.com> Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Naeem Akhter <akhternaeem@gmail.com> Reviewed-by: Hamid Akhtar <hamid.akhtar@percona.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrei Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Guillaume Lelarge <guillaume@lelarge.info> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: http://postgr.es/m/603c8f070905281830g2e5419c4xad2946d149e21f9d%40mail.gmail.com
2025-02-21Fix a WARNING for data origin discrepancies.Amit Kapila
Previously, a WARNING was issued at the time of defining a subscription with origin=NONE only when the publisher subscribed to the same table from other publishers, indicating potential data origination from different origins. However, the publisher can subscribe to the partition ancestors or partition children of the table from other publishers, which could also result in mixed-origin data inclusion. So, give a WARNING in those cases as well. Reported-by: Sergey Tatarintsev <s.tatarintsev@postgrespro.ru> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 16, where it was introduced Discussion: https://postgr.es/m/5eda6a9c-63cf-404d-8a49-8dcb116a29f3@postgrespro.ru
2025-02-21Drop opcintype from index AM strategy translation APIPeter Eisentraut
The type argument wasn't actually really necessary. It was a remnant of converting the API of the gist strategy translation from using opclass to using opfamily+opcintype (commits c09e5a6a016, 622f678c102). For looking up the gist translation function, we used the convention "amproclefttype = amprocrighttype = opclass's opcintype" (see pg_amproc.h). But each operator family should only have one translation function, and getting the right type for the lookup is sometimes cumbersome and fragile, so this is all unnecessarily complicated. To simplify this, change the gist stategy support procedure to take "any", "any" as argument. (This is arbitrary but seems intuitive. The alternative of using InvalidOid as argument(s) upsets various DDL commands, so it's not practical.) Then we don't need opcintype for the lookup, and we can remove it from all the API layers introduced by commit c09e5a6a016. This also adds some more documentation about the correct signature of the gist support function and adds more checks in gistvalidate(). This was previously underspecified. (It relied implicitly on convention mentioned above.) Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-20Don't lock partitions pruned by initial pruningAmit Langote
Before executing a cached generic plan, AcquireExecutorLocks() in plancache.c locks all relations in a plan's range table to ensure the plan is safe for execution. However, this locks runtime-prunable relations that will later be pruned during "initial" runtime pruning, introducing unnecessary overhead. This commit defers locking for such relations to executor startup and ensures that if the CachedPlan is invalidated due to concurrent DDL during this window, replanning is triggered. Deferring these locks avoids unnecessary locking overhead for pruned partitions, resulting in significant speedup, particularly when many partitions are pruned during initial runtime pruning. * Changes to locking when executing generic plans: AcquireExecutorLocks() now locks only unprunable relations, that is, those found in PlannedStmt.unprunableRelids (introduced in commit cbc127917e), to avoid locking runtime-prunable partitions unnecessarily. The remaining locks are taken by ExecDoInitialPruning(), which acquires them only for partitions that survive pruning. This deferral does not affect the locks required for permission checking in InitPlan(), which takes place before initial pruning. ExecCheckPermissions() now includes an Assert to verify that all relations undergoing permission checks, none of which can be in the set of runtime-prunable relations, are properly locked. * Plan invalidation handling: Deferring locks introduces a window where prunable relations may be altered by concurrent DDL, invalidating the plan. A new function, ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle invalidation caused by deferred locking. If invalidation occurs, ExecutorStartCachedPlan() updates CachedPlan using the new UpdateCachedPlan() function and retries execution with the updated plan. To ensure all code paths that may be affected by this handle invalidation properly, all callers of ExecutorStart that may execute a PlannedStmt from a CachedPlan have been updated to use ExecutorStartCachedPlan() instead. UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new CachedPlan.stmt_context, created as a child of CachedPlan.context, allows freeing old PlannedStmts while preserving the CachedPlan structure and its statement list. This ensures that loops over statements in upstream callers of ExecutorStartCachedPlan() remain intact. ExecutorStart() and ExecutorStart_hook implementations now return a boolean value indicating whether plan initialization succeeded with a valid PlanState tree in QueryDesc.planstate, or false otherwise, in which case QueryDesc.planstate is NULL. Hook implementations are required to call standard_ExecutorStart() at the beginning, and if it returns false, they should do the same without proceeding. * Testing: To verify these changes, the delay_execution module tests scenarios where cached plans become invalid due to changes in prunable relations after deferred locks. * Note to extension authors: ExecutorStart_hook implementations must verify plan validity after calling standard_ExecutorStart(), as explained earlier. For example: if (prev_ExecutorStart) plan_valid = prev_ExecutorStart(queryDesc, eflags); else plan_valid = standard_ExecutorStart(queryDesc, eflags); if (!plan_valid) return false; <extension-code> return true; Extensions accessing child relations, especially prunable partitions, via ExecGetRangeTableRelation() must now ensure their RT indexes are present in es_unpruned_relids (introduced in commit cbc127917e), or they will encounter an error. This is a strict requirement after this change, as only relations in that set are locked. The idea of deferring some locks to executor startup, allowing locks for prunable partitions to be skipped, was first proposed by Tom Lane. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2025-02-19Add ATAlterConstraint struct for ALTER .. CONSTRAINTÁlvaro Herrera
Replace the use of Constraint with a new ATAlterConstraint struct, which allows us to pass additional information. No functionality is added by this commit. This is necessary for future work that allows altering constraints in other ways. I (Álvaro) took the liberty of restructuring the code for ALTER CONSTRAINT beyond what Amul did. The original coding before Amul's patch was unnecessarily baroque, and this change makes things simpler by removing one level of subroutine. Also, partly remove the assumption that only partitioned tables are relevant (by passing sensible 'recurse' arguments) and no longer ignore whether ONLY was specified. I say 'partly' because the current coding only walks down via the 'conparentid' relationship, which is only used for partitioned tables; but future patches could handle ONLY or not for other types of constraint changes for legacy inheritance trees too. Author: Amul Sul <sulamul@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com
2025-02-17Add information about WAL buffers full to VACUUM/ANALYZE (VERBOSE)Michael Paquier
This commit adds the information about the number of times WAL buffers have been full to the logs generated by VACUUM/ANALYZE (VERBOSE) and in the logs generated by autovacuum, complementing the existing information stored by WalUsage. This is the last part of the backend code where the value of wal_buffers_full can be reported, similarly to all the other fields of WalUsage. 320545bfcfee and ce5bcc4a9f26 have done the same for EXPLAIN and pgss. Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-17Add information about WAL buffers being full to EXPLAIN (WAL)Michael Paquier
This is similar to ce5bcc4a9f26, relying on the addition of wal_buffers_full to WalUsage. This time, the information is added to the output generated by EXPLAIN (WAL). Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-14Add delay time to VACUUM/ANALYZE (VERBOSE) and autovacuum logs.Nathan Bossart
Commit bb8dff9995 added this information to the pg_stat_progress_vacuum and pg_stat_progress_analyze system views. This commit adds the same information to the output of VACUUM and ANALYZE with the VERBOSE option and to the autovacuum logs. Suggested-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-02-13Remove unused parameter from execute_extension_script().Nathan Bossart
This function's schemaOid parameter appears to have never been used for anything. Author: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Discussion: https://postgr.es/m/20250214010218.550ebe4ec1a7c7811a7fa2bb%40sraoss.co.jp
2025-02-13Remove unnecessary (char *) casts [xlog]Peter Eisentraut
Remove (char *) casts no longer needed after XLogRegisterData() and XLogRegisterBufData() argument type change. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-13Fix MakeTransitionCaptureState() to return a consistent resultMichael Paquier
When an UPDATE trigger referencing a new table and a DELETE trigger referencing an old table are both present, MakeTransitionCaptureState() returns an inconsistent result for UPDATE commands in its set of flags and tuplestores holding the TransitionCaptureState for transition tables. As proved by the test added here, this issue causes a crash in v14 and earlier versions (down to 11, actually, older versions do not support triggers on partitioned tables) during cross-partition updates on a partitioned table. v15 and newer versions are safe thanks to 7103ebb7aae8. This commit fixes the function so that it returns a consistent state by using portions of the changes made in commit 7103ebb7aae8 for v13 and v14. v15 and newer versions are slightly tweaked to match with the older versions, mainly for consistency across branches. Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20250207.150238.968446820828052276.horikyota.ntt@gmail.com Backpatch-through: 13
2025-02-11Add cost-based vacuum delay time to progress views.Nathan Bossart
This commit adds the amount of time spent sleeping due to cost-based delay to the pg_stat_progress_vacuum and pg_stat_progress_analyze system views. A new configuration parameter named track_cost_delay_timing, which is off by default, controls whether this information is gathered. For vacuum, the reported value includes the sleep time of any associated parallel workers. However, parallel workers only report their sleep time once per second to avoid overloading the leader process. Bumps catversion. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Sergei Kornilov <sk@zsrv.org> Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-02-11Add is_analyze parameter to vacuum_delay_point().Nathan Bossart
This function is used in both vacuum and analyze code paths, and a follow-up commit will require distinguishing between the two. This commit forces callers to specify whether they are in a vacuum or analyze path, but it does not use that information for anything yet. Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-02-11Eagerly scan all-visible pages to amortize aggressive vacuumMelanie Plageman
Aggressive vacuums must scan every unfrozen tuple in order to advance the relfrozenxid/relminmxid. Because data is often vacuumed before it is old enough to require freezing, relations may build up a large backlog of pages that are set all-visible but not all-frozen in the visibility map. When an aggressive vacuum is triggered, all of these pages must be scanned. These pages have often been evicted from shared buffers and even from the kernel buffer cache. Thus, aggressive vacuums often incur large amounts of extra I/O at the expense of foreground workloads. To amortize the cost of aggressive vacuums, eagerly scan some all-visible but not all-frozen pages during normal vacuums. All-visible pages that are eagerly scanned and set all-frozen in the visibility map are counted as successful eager freezes and those not frozen are counted as failed eager freezes. If too many eager scans fail in a row, eager scanning is temporarily suspended until a later portion of the relation. The number of failures tolerated is configurable globally and per table. To effectively amortize aggressive vacuums, we cap the number of successes as well. Capping eager freeze successes also limits the amount of potentially wasted work if these pages are modified again before the next aggressive vacuum. Once we reach the maximum number of blocks successfully eager frozen, eager scanning is disabled for the remainder of the vacuum of the relation. Original design idea from Robert Haas, with enhancements from Andres Freund, Tomas Vondra, and me Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com