| Age | Commit message (Collapse) | Author |
|
Like commit 123661427b, these were discovered while reviewing
Aleksander Alekseev's proposed changes to pgindent.
|
|
We've been nibbling away at removing uses of "long" for a long time,
since its width is platform-dependent. Here's one more: change the
remaining "long" fields in Plan nodes to Cardinality, since the three
surviving examples all represent group-count estimates. The upstream
planner code was converted to Cardinality some time ago; for example
the corresponding fields in Path nodes are type Cardinality, as are
the arguments of the make_foo_path functions. Downstream in the
executor, it turns out that these all feed to the table-size argument
of BuildTupleHashTable. Change that to "double" as well, and fix it
so that it safely clamps out-of-range values to the uint32 limit of
simplehash.h, as was not being done before.
Essentially, this is removing all the artificial datatype-dependent
limitations on these values from upstream processing, and applying
just one clamp at the moment where we're forced to do so by the
datatype choices of simplehash.h.
Also, remove BuildTupleHashTable's misguided attempt to enforce
work_mem/hash_mem_limit. It doesn't have enough information
(particularly not the expected tuple width) to do that accurately,
and it has no real business second-guessing the caller's choice.
For all these plan types, it's really the planner's responsibility
to not choose a hashed implementation if the hashtable is expected
to exceed hash_mem_limit. The previous patch improved the
accuracy of those estimates, and even if BuildTupleHashTable had
more information it should arrive at the same conclusions.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
|
|
For several types of plan nodes that use TupleHashTables, the
planner estimated the expected size of the table as basically
numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)).
This is pretty far off, especially for small data widths, because
it doesn't account for the overhead of the simplehash.h hash table
nor for any per-tuple "additional space" the plan node may request.
Jeff Janes noted a case where the estimate was off by about a factor
of three, even though the obvious hazards such as inaccurate estimates
of numEntries or dataWidth didn't apply.
To improve matters, create functions provided by the relevant executor
modules that can estimate the required sizes with reasonable accuracy.
(We're still not accounting for effects like allocator padding, but
this at least gets the first-order effects correct.)
I added functions that can estimate the tuple table sizes for
nodeSetOp and nodeSubplan; these rely on an estimator for
TupleHashTables in general, and that in turn relies on one for
simplehash.h hash tables. That feels like kind of a lot of mechanism,
but if we take any short-cuts we're violating modularity boundaries.
The other places that use TupleHashTables are nodeAgg, which took
pains to get its numbers right already, and nodeRecursiveunion.
I did not try to improve the situation for nodeRecursiveunion because
there's nothing to improve: we are not making an estimate of the hash
table size, and it wouldn't help us to do so because we have no
non-hashed alternative implementation. On top of that, our estimate
of the number of entries to be hashed in that module is so suspect
that we'd likely often choose the wrong implementation if we did have
two ways to do it.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
|
|
For all extant uses of TupleHashTables, execGrouping.c itself does
nothing with the "tablecxt" except to allocate new hash entries in it,
and the callers do nothing with it except to reset the whole context.
So this is an ideal use-case for a BumpContext, and the hash tables
are frequently big enough for the savings to be significant.
(Commit cc721c459 already taught nodeAgg.c this idea, but neglected
the other callers of BuildTupleHashTable.)
While at it, let's clean up some ill-advised leftovers from rebasing
TupleHashTables on simplehash.h:
* Many comments and variable names were based on the idea that the
tablecxt holds the whole TupleHashTable, whereas now it only holds the
hashed tuples (plus any caller-defined "additional storage"). Rename
to names like tuplescxt and tuplesContext, and adjust the comments.
Also adjust the memory context names to be like "<Foo> hashed tuples".
* Make ResetTupleHashTable() reset the tuplescxt rather than relying
on the caller to do so; that was fairly bizarre and seems like a
recipe for leaks. This is less efficient in the case where nodeAgg.c
uses the same tuplescxt for several different hashtables, but only
microscopically so because mcxt.c will short-circuit the extra resets
via its isReset flag. I judge the extra safety and intellectual
cleanliness well worth those few cycles.
* Remove the long-obsolete "allow_jit" check added by ac88807f9;
instead, just Assert that metacxt and tuplescxt are different.
We need that anyway for this definition of ResetTupleHashTable() to
be safe.
There is a side issue of the extent to which this change invalidates
the planner's estimates of hashtable memory consumption. However,
those estimates are already pretty bad, so improving them seems like
it can be a separate project. This change is useful to do first to
establish consistent executor behavior that the planner can expect.
A loose end not addressed here is that the "entrysize" calculation
in BuildTupleHashTable seems wrong: "sizeof(TupleHashEntryData) +
additionalsize" corresponds neither to the size of the simplehash
entries nor to the total space needed per tuple. It's questionable
why BuildTupleHashTable is second-guessing its caller's nbuckets
choice at all, since the original source of the number should have had
more information. But that all seems wrapped up with the planner's
estimation logic, so let's leave it for the planned followup patch.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
Discussion: https://postgr.es/m/2268409.1761512111@sss.pgh.pa.us
|
|
This is a follow up 991295f. I searched over src/ and made all
ItemPointer arguments as const as much as possible.
Note: We cut out from the original patch the pieces that would have
created incompatibilities in the index or table AM APIs. Those could
be considered separately.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com
|
|
This patch adds support for a new SQL command:
ALTER SUBSCRIPTION ... REFRESH SEQUENCES
This command updates the sequence entries present in the
pg_subscription_rel catalog table with the INIT state to trigger
resynchronization.
In addition to the new command, the following subscription commands have
been enhanced to automatically refresh sequence mappings:
ALTER SUBSCRIPTION ... REFRESH PUBLICATION
ALTER SUBSCRIPTION ... ADD PUBLICATION
ALTER SUBSCRIPTION ... DROP PUBLICATION
ALTER SUBSCRIPTION ... SET PUBLICATION
These commands will perform the following actions:
Add newly published sequences that are not yet part of the subscription.
Remove sequences that are no longer included in the publication.
This ensures that sequence replication remains aligned with the current
state of the publication on the publisher side.
Note that the actual synchronization of sequence data/values will be
handled in a subsequent patch that introduces a dedicated sequence sync
worker.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
|
|
There are a number of forward declarations that use struct but not the
customary typedef, because that could have led to repeat typedefs,
which was not allowed. This is now allowed in C11, so we can update
these to provide the typedefs as well, so that the later uses of the
types look more consistent.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
|
|
If an INSERT has an ON CONFLICT DO UPDATE clause, the executor must
check that the target relation supports UPDATE as well as INSERT. In
particular, it must check that the target relation has a REPLICA
IDENTITY if it publishes updates. Formerly, it was not doing this
check, which could lead to silently breaking replication.
Fix by adding such a check to CheckValidResultRel(), which requires
adding a new onConflictAction argument. In back-branches, preserve ABI
compatibility by introducing a wrapper function with the original
signature.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 13
|
|
Commit e2d4ef8de86 (the fix for CVE-2017-7484) added security checks
to the selectivity estimation functions to prevent them from running
user-supplied operators on data obtained from pg_statistic if the user
lacks privileges to select from the underlying table. In cases
involving inheritance/partitioning, those checks were originally
performed against the child RTE (which for plain inheritance might
actually refer to the parent table). Commit 553d2ec2710 then extended
that to also check the parent RTE, allowing access if the user had
permissions on either the parent or the child. It turns out, however,
that doing any checks using the child RTE is incorrect, since
securityQuals is set to NULL when creating an RTE for an inheritance
child (whether it refers to the parent table or the child table), and
therefore such checks do not correctly account for any RLS policies or
security barrier views. Therefore, do the security checks using only
the parent RTE. This is consistent with how RLS policies are applied,
and the executor's ACL checks, both of which use only the parent
table's permissions/policies. Similar checks are performed in the
extended stats code, so update that in the same way, centralizing all
the checks in a new function.
In addition, note that these checks by themselves are insufficient to
ensure that the user has access to the table's data because, in a
query that goes via a view, they only check that the view owner has
permissions on the underlying table, not that the current user has
permissions on the view itself. In the selectivity estimation
functions, there is no easy way to navigate from underlying tables to
views, so add permissions checks for all views mentioned in the query
to the planner startup code. If the user lacks permissions on a view,
a permissions error will now be reported at planner-startup, and the
selectivity estimation functions will not be run.
Checking view permissions at planner-startup in this way is a little
ugly, since the same checks will be repeated at executor-startup.
Longer-term, it might be better to move all the permissions checks
from the executor to the planner so that permissions errors can be
reported sooner, instead of creating a plan that won't ever be run.
However, such a change seems too far-reaching to be back-patched.
Back-patch to all supported versions. In v13, there is the added
complication that UPDATEs and DELETEs on inherited target tables are
planned using inheritance_planner(), which plans each inheritance
child table separately, so that the selectivity estimation functions
do not know that they are dealing with a child table accessed via its
parent. Handle that by checking access permissions on the top parent
table at planner-startup, in the same way as we do for views. Any
securityQuals on the top parent table are moved down to the child
tables by inheritance_planner(), so they continue to be checked by the
selectivity estimation functions.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-8713
|
|
This enhancement builds upon the infrastructure introduced in commit
228c370868, which enables the preservation of deleted tuples and their
origin information on the subscriber. This capability is crucial for
handling concurrent transactions replicated from remote nodes.
The update introduces support for detecting update_deleted conflicts
during the application of update operations on the subscriber. When an
update operation fails to locate the target row-typically because it has
been concurrently deleted-we perform an additional table scan. This scan
uses the SnapshotAny mechanism and we do this additional scan only when
the retain_dead_tuples option is enabled for the relevant subscription.
The goal of this scan is to locate the most recently deleted tuple-matching
the old column values from the remote update-that has not yet been removed
by VACUUM and is still visible according to our slot (i.e., its deletion
is not older than conflict-detection-slot's xmin). If such a tuple is
found, the system reports an update_deleted conflict, including the origin
and transaction details responsible for the deletion.
This provides a groundwork for more robust and accurate conflict
resolution process, preventing unexpected behavior by correctly
identifying cases where a remote update clashes with a deletion from
another origin.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
As pointed out by Tom Lane, the patch introduced fragile and invasive
design around plan invalidation handling when locking of prunable
partitions was deferred from plancache.c to the executor. In
particular, it violated assumptions about CachedPlan immutability and
altered executor APIs in ways that are difficult to justify given the
added complexity and overhead.
This also removes the firstResultRels field added to PlannedStmt in
commit 28317de72, which was intended to support deferred locking of
certain ModifyTable result relations.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/605328.1747710381@sss.pgh.pa.us
|
|
This was left out of the original patch for virtual generated columns
(commit 83ea6c54025).
This just involves a bit of extra work in the executor to expand the
generation expressions and run a "IS NOT NULL" test against them.
There is also a bit of work to make sure that not-null constraints are
checked during a table rewrite.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Navneet Kumar <thanit3111@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com
|
|
Reduces memory required for hash aggregation by avoiding an allocation
and a pointer in the TupleHashEntryData structure. That structure is
used for all buckets, whether occupied or not, so the savings is
substantial.
Discussion: https://postgr.es/m/AApHDvpN4v3t_sdz4dvrv1Fx_ZPw=twSnxuTEytRYP7LFz5K9A@mail.gmail.com
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
|
|
Refactor for upcoming optimizations.
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/1cc3b400a0e8eead18ff967436fa9e42c0c14cfb.camel@j-davis.com
|
|
Commit cbc127917e introduced tracking of unpruned relids to avoid
processing pruned relations, and changed ExecInitModifyTable() to
initialize only unpruned result relations. As a result, MERGE
statements that prune all target partitions can now lead to crashes
or incorrect behavior during execution.
The crash occurs because some executor code paths rely on
ModifyTableState.resultRelInfo[0] being present and initialized,
even when no result relations remain after pruning. For example,
ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo
to determine the appropriate action. Similarly,
ExecInitPartitionInfo() assumes that at least one result relation
exists.
To preserve these assumptions, ExecInitModifyTable() now includes the
first result relation in the initialized result relation list if all
result relations for that ModifyTable were pruned. To enable that,
ExecDoInitialPruning() ensures the first relation is locked if it was
pruned and locking is necessary.
To support this exception to the pruning logic, PlannedStmt now
includes a list of RT indexes identifying the first result relation
of each ModifyTable node in the plan. This allows
ExecDoInitialPruning() to check whether each such relation was
pruned and, if so, lock it if necessary.
Bug: #18830
Reported-by: Robins Tharakan <tharakan@gmail.com>
Diagnozed-by: Tender Wang <tndrwang@gmail.com>
Diagnozed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org
|
|
Knowing when the side-effects of an expression is the intended result
of the execution, rather than the returnvalue, is important for being
able generate more efficient JITed code. This replaces EEOP_DONE with
two new steps: EEOP_DONE_RETURN and EEOP_DONE_NO_RETURN. Expressions
which return a value should use the former step; expressions used for
their side-effects which don't return value should use the latter.
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/415721CE-7D2E-4B74-B5D9-1950083BA03E@yesql.se
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
|
|
Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.
This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.
* Changes to locking when executing generic plans:
AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily. The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.
This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.
* Plan invalidation handling:
Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.
UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.
ExecutorStart() and ExecutorStart_hook implementations now return a
boolean value indicating whether plan initialization succeeded with a
valid PlanState tree in QueryDesc.planstate, or false otherwise, in
which case QueryDesc.planstate is NULL. Hook implementations are
required to call standard_ExecutorStart() at the beginning, and if it
returns false, they should do the same without proceeding.
* Testing:
To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to changes in prunable relations
after deferred locks.
* Note to extension authors:
ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:
if (prev_ExecutorStart)
plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (!plan_valid)
return false;
<extension-code>
return true;
Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.
The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
|
|
This commit introduces changes to track unpruned relations explicitly,
making it possible for top-level plan nodes, such as ModifyTable and
LockRows, to avoid processing partitions pruned during initial
pruning. Scan-level nodes, such as Append and MergeAppend, already
avoid the unnecessary processing by accessing partition pruning
results directly via part_prune_index. In contrast, top-level nodes
cannot access pruning results directly and need to determine which
partitions remain unpruned.
To address this, this commit introduces a new bitmapset field,
es_unpruned_relids, which the executor uses to track the set of
unpruned relations. This field is referenced during plan
initialization to skip initializing certain nodes for pruned
partitions. It is initialized with PlannedStmt.unprunableRelids,
a new field that the planner populates with RT indexes of relations
that cannot be pruned during runtime pruning. These include relations
not subject to partition pruning and those required for execution
regardless of pruning.
PlannedStmt.unprunableRelids is computed during set_plan_refs() by
removing the RT indexes of runtime-prunable relations, identified
from PartitionPruneInfos, from the full set of relation RT indexes.
ExecDoInitialPruning() then updates es_unpruned_relids by adding
partitions that survive initial pruning.
To support this, PartitionedRelPruneInfo and PartitionedRelPruningData
now include a leafpart_rti_map[] array that maps partition indexes to
their corresponding RT indexes. The former is used in set_plan_refs()
when constructing unprunableRelids, while the latter is used in
ExecDoInitialPruning() to convert partition indexes returned by
get_matching_partitions() into RT indexes, which are then added to
es_unpruned_relids.
These changes make it possible for ModifyTable and LockRows nodes to
process only relations that remain unpruned after initial pruning.
ExecInitModifyTable() trims lists, such as resultRelations,
withCheckOptionLists, returningLists, and updateColnosLists, to
consider only unpruned partitions. It also creates ResultRelInfo
structs only for these partitions. Similarly, child RowMarks for
pruned relations are skipped.
By avoiding unnecessary initialization of structures for pruned
partitions, these changes improve the performance of updates and
deletes on partitioned tables during initial runtime pruning.
Due to ExecInitModifyTable() changes as described above, EXPLAIN on a
plan for UPDATE and DELETE that uses runtime initial pruning no longer
lists partitions pruned during initial pruning.
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
|
|
This turns GistTranslateCompareType() into a callback function of the
gist index AM instead of a standalone function. The existing callers
are changed to use IndexAmTranslateCompareType(). This then makes
that code not hardcoded toward gist.
This means in particular that the temporal keys code is now
independent of gist. Also, this generalizes commit 74edabce7a3, so
other index access methods other than the previously hardcoded ones
could now work as REPLICA IDENTITY in a logical replication
subscriber.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries
to explicitly return old and new values by using the special aliases
"old" and "new", which are automatically added to the query (if not
already defined) while parsing its RETURNING list, allowing things
like:
RETURNING old.colname, new.colname, ...
RETURNING old.*, new.*
Additionally, a new syntax is supported, allowing the names "old" and
"new" to be changed to user-supplied alias names, e.g.:
RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ...
This is useful when the names "old" and "new" are already defined,
such as inside trigger functions, allowing backwards compatibility to
be maintained -- the interpretation of any existing queries that
happen to already refer to relations called "old" or "new", or use
those as aliases for other relations, is not changed.
For an INSERT, old values will generally be NULL, and for a DELETE,
new values will generally be NULL, but that may change for an INSERT
with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule
changes the command type. Therefore, we put no restrictions on the use
of old and new in any DML queries.
Dean Rasheed, reviewed by Jian He and Jeff Davis.
Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
|
|
This reverts commit e0ece2a981ee9068f50c4423e303836c2585eb02 due to
performance regressions.
Reported-by: David Rowley
|
|
Previously, the caller needed to allocate the memory and the
TupleHashTable would store a pointer to it. That wastes space for the
palloc overhead as well as the size of the pointer itself.
Now, the TupleHashTable relies on the caller to correctly specify the
additionalsize, and allocates that amount of space. The caller can
then request a pointer into that space.
Discussion: https://postgr.es/m/b9cbf0219a9859dc8d240311643ff4362fd9602c.camel@j-davis.com
Reviewed-by: Heikki Linnakangas
|
|
Backpatch-through: 13
|
|
It was reasonable to preserve the old API of BuildTupleHashTable()
in the back branches, but in HEAD we should actively discourage use
of that version. There are no remaining callers in core, so just
get rid of it. Then rename BuildTupleHashTableExt() back to
BuildTupleHashTable().
While at it, fix up the miserably-poorly-maintained header comment
for BuildTupleHashTable[Ext]. It looks like more than one patch in
this area has had the opinion that updating comments is beneath them.
Discussion: https://postgr.es/m/538343.1734646986@sss.pgh.pa.us
|
|
The original design for set operations involved appending the two
input relations into one and adding a flag column that allows
distinguishing which side each row came from. Then the SetOp node
pries them apart again based on the flag. This is bizarre. The
only apparent reason to do it is that when sorting, we'd only need
one Sort node not two. But since sorting is at least O(N log N),
sorting all the data is actually worse than sorting each side
separately --- plus, we have no chance of taking advantage of
presorted input. On top of that, adding the flag column frequently
requires an additional projection step that adds cycles, and then
the Append node isn't free either. Let's get rid of all of that
and make the SetOp node have two separate children, using the
existing outerPlan/innerPlan infrastructure.
This initial patch re-implements nodeSetop.c and does a bare minimum
of work on the planner side to generate correctly-shaped plans.
In particular, I've tried not to change the cost estimates here,
so that the visible changes in the regression test results will only
involve removal of useless projection steps and not any changes in
whether to use sorted vs hashed mode.
For SORTED mode, we combine successive identical tuples from each
input into groups, and then merge-join the groups. The tuple
comparisons now use SortSupport instead of simple equality, but
the group-formation part should involve roughly the same number of
tuple comparisons as before. The cross-comparisons between left and
right groups probably add to that, but I'm not sure to quantify how
many more comparisons we might need.
For HASHED mode, nodeSetop's logic is almost the same as before,
just refactored into two separate loops instead of one loop that
has an assumption that it will see all the left-hand inputs first.
In both modes, I added early-exit logic to not bother reading the
right-hand relation if the left-hand input is empty, since neither
INTERSECT nor EXCEPT modes can produce any output if the left input
is empty. This could have been done before in the hashed mode, but
not in sorted mode. Sorted mode can also stop as soon as it exhausts
the left input; any remaining right-hand tuples cannot have matches.
Also, this patch adds some infrastructure for detecting whether
child plan nodes all output the same type of tuple table slot.
If they do, the hash table logic can use slightly more efficient
code based on assuming that that's the input slot type it will see.
We'll make use of that infrastructure in other plan node types later.
Patch by me; thanks to Richard Guo and David Rowley for review.
Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
|
|
0f5738202 adjusted the execGrouping.c code so it made use of ExprStates to
generate hash values. That commit made a wrong assumption that the slot
type to pass to ExecBuildHash32FromAttrs() is always &TTSOpsMinimalTuple.
That's not the case as the slot type depends on the slot type passed to
LookupTupleHashEntry(), which for nodeRecursiveunion.c, could be any of
the current slot types.
Here we fix this by adding a new parameter to BuildTupleHashTableExt()
to allow the slot type to be passed in. In the case of nodeSubplan.c
and nodeAgg.c the slot type is always &TTSOpsVirtual, so for both of
those cases, it's beneficial to pass the known slot type as that allows
ExecBuildHash32FromAttrs() to skip adding the tuple deform step to the
resulting ExprState. Another possible fix would have been to have
ExecBuildHash32FromAttrs() set "fetch.kind" to NULL so that
ExecComputeSlotInfo() always determines the EEOP_INNER_FETCHSOME is
required, however, that option isn't favorable as slows down aggregation
and hashed subplan evaluation due to the extra (needless) deform step.
Thanks to Nathan Bossart for bisecting to find the offending commit
based on Paul's report.
Reported-by: Paul Ramsey <pramsey@cleverelephant.ca>
Discussion: https://postgr.es/m/99F064C1-B3EB-4BE7-97D2-D2A0AA487A71@cleverelephant.ca
|
|
This speeds up obtaining hash values for GROUP BY and hashed SubPlans by
using the ExprState support for hashing, thus allowing JIT compilation for
obtaining hash values for these operations.
This, even without JIT compilation, has been shown to improve Hash
Aggregate performance in some cases by around 15% and hashed NOT IN
queries in one case by over 30%, however, real-world cases are likely to
see smaller gains as the test cases used were purposefully designed to
have high hashing overheads by keeping the hash table small to prevent
additional memory overheads that would be a factor when working with large
hash tables.
In passing, fix a hypothetical bug in ExecBuildHash32Expr() so that the
initial value is stored directly in the ExprState's result field if
there are no expressions to hash. None of the current users of this
function use an initial value, so the bug is only hypothetical.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpYSO3kc9UryMevWqthTBrxgfd9djiAjKHMPUSQeX9vdQ@mail.gmail.com
|
|
get_equal_strategy_number_for_am() gets the equal strategy number for
an AM. This currently only supports btree and hash. In the more
general case, this also depends on the operator class (see for example
GistTranslateStratnum()). To support that, replace this function with
get_equal_strategy_number() that takes an opclass and derives it from
there. (This function already existed before as a static function, so
the signature is kept for simplicity.)
This patch is only a refactoring, it doesn't add support for other
index AMs such as gist. This will be done separately.
Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
|
|
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution. The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun. This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented. That led to assertion failures in some corner
cases. The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan. (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)
This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.
Per report from Yugo Nagata, who also reviewed and tested
this patch. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
|
|
This patch provides the additional logging information in the following
conflict scenarios while applying changes:
insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint.
update_differ: Updating a row that was previously modified by another origin.
update_exists: The updated row value violates a NOT DEFERRABLE unique constraint.
update_missing: The tuple to be updated is missing.
delete_differ: Deleting a row that was previously modified by another origin.
delete_missing: The tuple to be deleted is missing.
For insert_exists and update_exists conflicts, the log can include the origin
and commit timestamp details of the conflicting key with track_commit_timestamp
enabled.
update_differ and delete_differ conflicts can only be detected when
track_commit_timestamp is enabled on the subscriber.
We do not offer additional logging for exclusion constraint violations because
these constraints can specify rules that are more complex than simple equality
checks. Resolving such conflicts won't be straightforward. This area can be
further enhanced if required.
Author: Hou Zhijie
Reviewed-by: Shveta Malik, Amit Kapila, Nisha Moond, Hayato Kuroda, Dilip Kumar
Discussion: https://postgr.es/m/OS0PR01MB5716352552DFADB8E9AD1D8994C92@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
Here we add ExprState support for obtaining a 32-bit hash value from a
list of expressions. This allows both faster hashing and also JIT
compilation of these expressions. This is especially useful when hash
joins have multiple join keys as the previous code called ExecEvalExpr on
each hash join key individually and that was inefficient as tuple
deformation would have only taken into account one key at a time, which
could lead to walking the tuple once for each join key. With the new
code, we'll determine the maximum attribute required and deform the tuple
to that point only once.
Some performance tests done with this change have shown up to a 20%
performance increase of a query containing a Hash Join without JIT
compilation and up to a 26% performance increase when JIT is enabled and
optimization and inlining were performed by the JIT compiler. The
performance increase with 1 join column was less with a 14% increase
with and without JIT. This test was done using a fairly small hash
table and a large number of hash probes. The increase will likely be
less with large tables, especially ones larger than L3 cache as memory
pressure is more likely to be the limiting factor there.
This commit only addresses Hash Joins, but lays expression evaluation
and JIT compilation infrastructure for other hashing needs such as Hash
Aggregate.
Author: David Rowley
Reviewed-by: Alexey Dvoichenkov <alexey@hyperplane.net>
Reviewed-by: Tels <nospam-pg-abuse@bloodgate.com>
Discussion: https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm%3Dm0L%2BGc%3DT%3DkEa4g%40mail.gmail.com
|
|
This allows the target relation of MERGE to be an auto-updatable or
trigger-updatable view, and includes support for WITH CHECK OPTION,
security barrier views, and security invoker views.
A trigger-updatable view must have INSTEAD OF triggers for every type
of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command.
An auto-updatable view must not have any INSTEAD OF triggers. Mixing
auto-update and trigger-update actions (i.e., having a partial set of
INSTEAD OF triggers) is not supported.
Rule-updatable views are also not supported, since there is no
rewriter support for non-SELECT rules with MERGE operations.
Dean Rasheed, reviewed by Jian He and Alvaro Herrera.
Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com
|
|
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
This excludes any changes that would change the external AM APIs.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/14c31f4a-0347-0805-dce8-93a9072c05a5%40eisentraut.org
|
|
This commit removes unnecessary ExecExprFreeContext() calls in
ExecEnd* routines because the actual cleanup is managed by
FreeExecutorState(). With no callers remaining for
ExecExprFreeContext(), this commit also removes the function.
This commit also drops redundant ExecClearTuple() calls, because
ExecResetTupleTable() in ExecEndPlan() already takes care of
resetting and dropping all TupleTableSlots initialized with
ExecInitScanTupleSlot() and ExecInitExtraTupleSlot().
After these modifications, the ExecEnd*() routines for ValuesScan,
NamedTuplestoreScan, and WorkTableScan became redundant. So, this
commit removes them.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
|
|
Commit 89e46da5e5 allowed using BTREE indexes that are neither
PRIMARY KEY nor REPLICA IDENTITY on the subscriber during apply of
update/delete. This patch extends that functionality to also allow HASH
indexes.
We explored supporting other index access methods as well but they don't
have a fixed strategy for equality operation which is required by the
current infrastructure in logical replication to scan the indexes.
Author: Kuroda Hayato
Reviewed-by: Peter Smith, Onder Kalaci, Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58669D7414E59664E17A5827F522A@TYAPR01MB5866.jpnprd01.prod.outlook.com
|
|
The idea of EvalPlanQual is that we replace the query's scan of the
result relation with a single injected tuple, and see if we get a
tuple out, thereby implying that the injected tuple still passes the
query quals. (In join cases, other relations in the query are still
scanned normally.) This logic was not updated when commit 86dc90056
made it possible for a single DML query plan to have multiple result
relations, when the query target relation has inheritance or partition
children. We replaced the output for the current result relation
successfully, but other result relations were still scanned normally;
thus, if any other result relation contained a tuple satisfying the
quals, we'd think the EPQ check passed, even if it did not pass for
the injected tuple itself. This would lead to update or delete
actions getting performed when they should have been skipped due to
a conflicting concurrent update in READ COMMITTED isolation mode.
Fix by blocking all sibling result relations from emitting tuples
during an EvalPlanQual recheck. In the back branches, the fix is
complicated a bit by the need to not change the size of struct
EPQState (else we'd have ABI-breaking changes in offsets in
struct ModifyTableState). Like the back-patches of 3f7836ff6
and 4b3e37993, add a separately palloc'd struct to avoid that.
The logic is the same as in HEAD otherwise.
This is only a live bug back to v14 where 86dc90056 came in.
However, I chose to back-patch the test cases further, on the
grounds that this whole area is none too well tested. I skipped
doing so in v11 though because none of the test applied cleanly,
and it didn't quite seem worth extra work for a branch with only
six months to live.
Per report from Ante Krešić (via Aleksander Alekseev)
Discussion: https://postgr.es/m/CAJ7c6TMBTN3rcz4=AjYhLPD_w3FFT0Wq_C15jxCDn8U4tZnH1g@mail.gmail.com
|
|
This provides a very simple way to see the generic plan for a
parameterized query. Without this, it's necessary to define
a prepared statement and temporarily change plan_cache_mode,
which is a bit tedious.
One thing that's a bit of a hack perhaps is that we disable
execution-time partition pruning when the GENERIC_PLAN option
is given. That's because the pruning code may attempt to
fetch the value of one of the parameters, which would fail.
Laurenz Albe, reviewed by Julien Rouhaud, Christoph Berg,
Michel Pelletier, Jim Jones, and myself
Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel@cybertec.at
|
|
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed by block summarizing indexes without
references to individual tuples that need to be cleaned up.
A new type TU_UpdateIndexes provides a signal to the executor to
determine which indexes to update - no indexes, all indexes, or only the
summarizing indexes.
This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.
This was originally committed as 5753d4ee32, but then got reverted by
e3fcca0d0d because of correctness issues.
Original patch by Josef Simanek, various fixes and improvements by Tomas
Vondra and me.
Authors: Matthias van de Meent, Josef Simanek, Tomas Vondra
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
|
|
While testing a fix for bug #17823, I discovered that EvalPlanQualStart
failed to copy es_rteperminfos from the parent EState, resulting in
failure if anything in EPQ execution wanted to consult that information.
This led me to conclude that commit a61b1f748 had been too haphazard
about where to fill es_rteperminfos, and that we need to be sure that
that happens exactly where es_range_table gets filled. So I changed the
signature of ExecInitRangeTable to help ensure that this new requirement
doesn't get missed. (Indeed, pgoutput.c was also failing to fill it.
Maybe we don't ever need it there, but I wouldn't bet on that.)
No test case yet; one will arrive with the fix for #17823.
But that needs to be back-patched, while this fix is HEAD-only.
Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
|
|
Backpatch-through: 11
|
|
Currently, information about the permissions to be checked on relations
mentioned in a query is stored in their range table entries. So the
executor must scan the entire range table looking for relations that
need to have permissions checked. This can make the permission checking
part of the executor initialization needlessly expensive when many
inheritance children are present in the range range. While the
permissions need not be checked on the individual child relations, the
executor still must visit every range table entry to filter them out.
This commit moves the permission checking information out of the range
table entries into a new plan node called RTEPermissionInfo. Every
top-level (inheritance "root") RTE_RELATION entry in the range table
gets one and a list of those is maintained alongside the range table.
This new list is initialized by the parser when initializing the range
table. The rewriter can add more entries to it as rules/views are
expanded. Finally, the planner combines the lists of the individual
subqueries into one flat list that is passed to the executor for
checking.
To make it quick to find the RTEPermissionInfo entry belonging to a
given relation, RangeTblEntry gets a new Index field 'perminfoindex'
that stores the corresponding RTEPermissionInfo's index in the query's
list of the latter.
ExecutorCheckPerms_hook has gained another List * argument; the
signature is now:
typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
List *rtePermInfos,
bool ereport_on_violation);
The first argument is no longer used by any in-core uses of the hook,
but we leave it in place because there may be other implementations that
do. Implementations should likely scan the rtePermInfos list to
determine which operations to allow or deny.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
|
|
ri_RootToPartitionMap is currently only initialized for tuple routing
target partitions, though a future commit will need the ability to use
it even for the non-partition child tables, so make adjustments to the
decouple it from the partitioning code.
Also, make it lazily initialized via ExecGetRootToChildMap(), making
that function its preferred access path. Existing third-party code
accessing it directly should no longer do so; consequently, it's been
renamed to ri_RootToChildMap, which also makes it consistent with
ri_ChildToRootMap.
ExecGetRootToChildMap() houses the logic of setting the map appropriately
depending on whether a given child relation is partition or not.
To support this, also add a separate entry point for TupleConversionMap
creation that receives an AttrMap. No new code here, just split an
existing function in two.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com
|
|
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in storage, catalog,
access method, executor, and logical replication code, as well as in
miscellaneous utility/library code.
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy. Later commits will do the
same for other parts of the codebase.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
|
|
The API contract for planstate_tree_walker() callbacks is that they
take a PlanState pointer and a context pointer. Somebody figured
they could save a couple lines of code by ignoring that, and passing
ExecShutdownNode itself as the walker even though it has but one
argument. Somewhat remarkably, we've gotten away with that so far.
However, it seems clear that the upcoming C2x standard means to
forbid such cases, and compilers that actively break such code
likely won't be far behind. So spend the extra few lines of code
to do it honestly with a separate walker function.
In HEAD, we might as well go further and remove ExecShutdownNode's
useless return value. I left that as-is in back branches though,
to forestall complaints about ABI breakage.
Back-patch, with the thought that this might become of practical
importance before our stable branches are all out of service.
It doesn't seem to be fixing any live bug on any currently known
platform, however.
Discussion: https://postgr.es/m/208054.1663534665@sss.pgh.pa.us
|
|
The reverts the following and makes some associated cleanups:
commit f79b803dc: Common SQL/JSON clauses
commit f4fb45d15: SQL/JSON constructors
commit 5f0adec25: Make STRING an unreserved_keyword.
commit 33a377608: IS JSON predicate
commit 1a36bc9db: SQL/JSON query functions
commit 606948b05: SQL JSON functions
commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR()
commit 4e34747c8: JSON_TABLE
commit fadb48b00: PLAN clauses for JSON_TABLE
commit 2ef6f11b0: Reduce running time of jsonb_sqljson test
commit 14d3f24fa: Further improve jsonb_sqljson parallel test
commit a6baa4bad: Documentation for SQL/JSON features
commit b46bcf7a4: Improve readability of SQL/JSON documentation.
commit 112fdb352: Fix finalization for json_objectagg and friends
commit fcdb35c32: Fix transformJsonBehavior
commit 4cd8717af: Improve a couple of sql/json error messages
commit f7a605f63: Small cleanups in SQL/JSON code
commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug
commit a79153b7a: Claim SQL standard compliance for SQL/JSON features
commit a1e7616d6: Rework SQL/JSON documentation
commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types.
commit 3c633f32b: Only allow returning string types or bytea from json_serialize
commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size.
The release notes are also adjusted.
Backpatch to release 15.
Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
|
|
Run pgindent, pgperltidy, and reformat-dat-files.
I manually fixed a couple of comments that pgindent uglified.
|
|
This reverts commit 99392cdd78b788295e52b9f4942fa11992fd5ba9.
We'd rather rewrite ri_triggers.c as a whole rather than piecemeal.
Discussion: https://postgr.es/m/E1ncXX2-000mFt-Pe@gemulon.postgresql.org
|
|
Modify the subroutines called by RI trigger functions that want to check
if a given referenced value exists in the referenced relation to simply
scan the foreign key constraint's unique index, instead of using SPI to
execute
SELECT 1 FROM referenced_relation WHERE ref_key = $1
This saves a lot of work, especially when inserting into or updating a
referencing relation.
This rewrite allows to fix a PK row visibility bug caused by a partition
descriptor hack which requires ActiveSnapshot to be set to come up with
the correct set of partitions for the RI query running under REPEATABLE
READ isolation. We now set that snapshot indepedently of the snapshot
to be used by the PK index scan, so the two no longer interfere. The
buggy output in src/test/isolation/expected/fk-snapshot.out of the
relevant test case added by commit 00cb86e75d6d has been corrected.
(The bug still exists in branch 14, however, but this fix is too
invasive to backpatch.)
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Li Japin <japinli@hotmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com
|
|
This introduces the SQL/JSON functions for querying JSON data using
jsonpath expressions. The functions are:
JSON_EXISTS()
JSON_QUERY()
JSON_VALUE()
All of these functions only operate on jsonb. The workaround for now is
to cast the argument to jsonb.
JSON_EXISTS() tests if the jsonpath expression applied to the jsonb
value yields any values. JSON_VALUE() must return a single value, and an
error occurs if it tries to return multiple values. JSON_QUERY() must
return a json object or array, and there are various WRAPPER options for
handling scalar or multi-value results. Both these functions have
options for handling EMPTY and ERROR conditions.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
|