| Age | Commit message (Collapse) | Author |
|
In v14, bb437f995 added support for scanning for ranges of TIDs using a
dedicated executor node for the purpose. Here, we allow these scans to
be parallelized. The range of blocks to scan is divvied up similarly to
how a Parallel Seq Scans does that, where 'chunks' of blocks are
allocated to each worker and the size of those chunks is slowly reduced
down to 1 block per worker by the time we're nearing the end of the
scan. Doing that means workers finish at roughly the same time.
Allowing TID Range Scans to be parallelized removes the dilemma from the
planner as to whether a Parallel Seq Scan will cost less than a
non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan
(disk costs are not divided by the number of workers). It was possible
the planner could choose the Parallel Seq Scan which would result in
reading additional blocks during execution than the TID Scan would have.
Allowing Parallel TID Range Scans removes the trade-off the planner
makes when choosing between reduced CPU costs due to parallelism vs
additional I/O from the Parallel Seq Scan due to it scanning blocks from
outside of the required TID range. There is also, of course, the
traditional parallelism performance benefits to be gained as well, which
likely doesn't need to be explained here.
Author: Cary Huang <cary.huang@highgo.ca>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca
|
|
This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport
functions to receive an Aggref and allow them to determine if there is a
way that the Aggref call can be optimized.
Also added is a support function to allow transformation of COUNT(ANY)
into COUNT(*). This is possible to do when the given "ANY" cannot be
NULL and also that there are no ORDER BY / DISTINCT clauses within the
Aggref. This is a useful transformation to do as it is common that
people write COUNT(1), which until now has added unneeded overhead.
When counting a NOT NULL column. The overheads can be worse as that
might mean deforming more of the tuple, which for large fact tables may
be many columns in.
It may be possible to add prosupport functions for other aggregates. We
could consider if ORDER BY could be dropped for some calls, e.g. the
ORDER BY is quite useless in MAX(c ORDER BY c).
There is a little bit of passing fallout from adjusting
expr_is_nonnullable() to handle Const which results in a plan change in
the aggregates.out regression test. Previously, nothing was able to
determine that "One-Time Filter: (100 IS NOT NULL)" was always true,
therefore useless to include in the plan.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com
|
|
This request allows a support function to replace a function call
appearing in FROM (typically a set-returning function) with an
equivalent SELECT subquery. The subquery will then be subject
to the planner's usual optimizations, potentially allowing a much
better plan to be generated. While the planner has long done this
automatically for simple SQL-language functions, it's now possible
for extensions to do it for functions outside that group.
Notably, this could be useful for functions that are presently
implemented in PL/pgSQL and work by generating and then EXECUTE'ing
a SQL query.
Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com
|
|
We've been nibbling away at removing uses of "long" for a long time,
since its width is platform-dependent. Here's one more: change the
remaining "long" fields in Plan nodes to Cardinality, since the three
surviving examples all represent group-count estimates. The upstream
planner code was converted to Cardinality some time ago; for example
the corresponding fields in Path nodes are type Cardinality, as are
the arguments of the make_foo_path functions. Downstream in the
executor, it turns out that these all feed to the table-size argument
of BuildTupleHashTable. Change that to "double" as well, and fix it
so that it safely clamps out-of-range values to the uint32 limit of
simplehash.h, as was not being done before.
Essentially, this is removing all the artificial datatype-dependent
limitations on these values from upstream processing, and applying
just one clamp at the moment where we're forced to do so by the
datatype choices of simplehash.h.
Also, remove BuildTupleHashTable's misguided attempt to enforce
work_mem/hash_mem_limit. It doesn't have enough information
(particularly not the expected tuple width) to do that accurately,
and it has no real business second-guessing the caller's choice.
For all these plan types, it's really the planner's responsibility
to not choose a hashed implementation if the hashtable is expected
to exceed hash_mem_limit. The previous patch improved the
accuracy of those estimates, and even if BuildTupleHashTable had
more information it should arrive at the same conclusions.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
|
|
This information appears to have been unused since commit
c5b7ba4e67. We could not find any references in third-party code,
either.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO_CyFRpbVMtgJWM%40nathan
|
|
These hooks allow plugins to get control at the earliest point at
which the PlannerGlobal object is fully initialized, and then just
before it gets destroyed. This is useful in combination with the
extendable plan state facilities (see extendplan.h) and perhaps for
other purposes as well.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
|
|
This allows extensions to have access to any data they've stored
in the ExplainState during planning. Unfortunately, it won't help
with EXPLAIN EXECUTE is used, but since that case is less common,
this still seems like an improvement.
Since planner() has quite a few arguments now, also add some
documentation of those arguments and the return value.
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
|
|
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
|
|
Instead, use the new mechanism that allows planner extensions to store
private state inside a PlannerInfo, treating GEQO as an in-core planner
extension. This is a useful test of the new facility, and also buys
back a few bytes of storage.
To make this work, we must remove innerrel_is_unique_ext's hack of
testing whether join_search_private is set as a proxy for whether
the join search might be retried. Add a flag that extensions can
use to explicitly signal their intentions instead.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
|
|
Extension that make extensive use of planner hooks may want to
coordinate their efforts, for example to avoid duplicate computation,
but that's currently difficult because there's no really good way to
pass data between different hooks.
To make that easier, allow for storage of extension-managed private
state in PlannerGlobal, PlannerInfo, and RelOptInfo, along very
similar lines to what we have permitted for ExplainState since commit
c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
|
|
Previously, subqueries were given names only after they were planned,
which makes it difficult to use information from a previous execution of
the query to guide future planning. If, for example, you knew something
about how you want "InitPlan 2" to be planned, you won't know whether
the subquery you're currently planning will end up being "InitPlan 2"
until after you've finished planning it, by which point it's too late to
use the information that you had.
To fix this, assign each subplan a unique name before we begin planning
it. To improve consistency, use textual names for all subplans, rather
than, as we did previously, a mix of numbers (such as "InitPlan 1") and
names (such as "CTE foo"), and make sure that the same name is never
assigned more than once.
We adopt the somewhat arbitrary convention of using the type of sublink
to set the plan name; for example, a query that previously had two
expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up
named expr_1 and expr_2. Because names are assigned before rather than
after planning, some of the regression test outputs show the numerical
part of the name switching positions: what was previously SubPlan 2 was
actually the first one encountered, but we finished planning it later.
We assign names even to subqueries that aren't shown as such within the
EXPLAIN output. These include subqueries that are a FROM clause item or
a branch of a set operation, rather than something that will be turned
into an InitPlan or SubPlan. The purpose of this is to make sure that,
below the topmost query level, there's always a name for each subquery
that is stable from one planning cycle to the next (assuming no changes
to the query or the database schema).
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us
|
|
There are a number of forward declarations that use struct but not the
customary typedef, because that could have led to repeat typedefs,
which was not allowed. This is now allowed in C11, so we can update
these to provide the typedefs as well, so that the later uses of the
types look more consistent.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
|
|
This is allowed in C11, so we don't need the workarounds anymore.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
|
|
The typedef Relids (Bitmapset *) is intended to represent set of
relation identifiers, but was incorrectly used in several places to
store sets of attribute numbers. This is my oversight in e2debb643.
Fix that by replacing such usages with Bitmapset * to reflect the
correct semantics.
Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3LJhp_xriXf39iCz0TsK+M-2biuhDhpLC6Baxw8+ZYT3A@mail.gmail.com
|
|
Now that the only call to relation_has_unique_index_for() that
supplied an exprlist and oprlist has been removed, the loop handling
those lists is effectively dead code. This patch removes that loop
and simplifies the function accordingly.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com
|
|
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, considering both a hash-based implementation using the cheapest
total path of the original RHS rel, and sort-based implementations
that either exploit presorted input paths or explicitly sort the
cheapest total path. All resulting paths compete in add_path(), and
those deemed worthy of consideration are added to the new RelOptInfo.
Finally, the unique-ified rel is joined with the other side of the
semijoin using a plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or
RHS path should be made unique -- has been removed. Besides, the
T_Unique path now has the same meaning for both semijoins and upper
DISTINCT clauses: it represents adjacent-duplicate removal on
presorted input. This patch unifies their handling by sharing the
same data structures and functions.
This patch also removes the UNIQUE_PATH_NOOP related code along the
way, as it is dead code -- if the RHS rel is provably unique, the
semijoin should have already been simplified to a plain inner join by
analyzejoins.c.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com
|
|
Commit 9e6104c66 disallowed transition tables on foreign tables, but
failed to account for cases where a foreign table is a child table of a
partitioned/inherited table on which transition tables exist, leading to
incorrect transition tuples collected from such foreign tables for
queries on the parent table triggering transition capture. This
occurred not only for inherited UPDATE/DELETE but for partitioned INSERT
later supported by commit 3d956d956, which should have handled it at
least for the INSERT case, but didn't.
To fix, modify ExecAR*Triggers to throw an error if the given relation
is a foreign table requesting transition capture. Also, this commit
fixes make_modifytable so that in case of an inherited UPDATE/DELETE
triggering transition capture, FDWs choose normal operations to modify
child foreign tables, not DirectModify; which is needed because they
would otherwise skip the calls to ExecAR*Triggers at execution, causing
unexpected behavior.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAPmGK14QJYikKzBDCe3jMbpGENnQ7popFmbEgm-XTNuk55oyHg%40mail.gmail.com
Backpatch-through: 13
|
|
There've been a few complaints that it can be overly difficult to figure
out why the planner picked a Memoize plan. To help address that, here we
adjust the EXPLAIN output to display the following additional details:
1) The estimated number of cache entries that can be stored at once
2) The estimated number of unique lookup keys that we expect to see
3) The number of lookups we expect
4) The estimated hit ratio
Technically #4 can be calculated using #1, #2 and #3, but it's not a
particularly obvious calculation, so we opt to display it explicitly.
The original patch by Lukas Fittl only displayed the hit ratio, but
there was a fear that might lead to more questions about how that was
calculated. The idea with displaying all 4 is to be transparent which
may allow queries to be tuned more easily. For example, if #2 isn't
correct then maybe extended statistics or a manual n_distinct estimate can
be used to help fix poor plan choices.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
|
|
In commit b262ad440, we introduced an optimization that reduces an IS
[NOT] NULL qual on a NOT NULL column to constant true or constant
false, provided we can prove that the input expression of the NullTest
is not nullable by any outer joins or grouping sets. This deduction
happens quite late in the planner, during the distribution of quals to
rels in query_planner. However, this approach has some drawbacks: we
can't perform any further folding with the constant, and it turns out
to be prone to bugs.
Ideally, this deduction should happen during constant folding.
However, the per-relation information about which columns are defined
as NOT NULL is not available at that point. This information is
currently collected from catalogs when building RelOptInfos for base
or "other" relations.
This patch moves the collection of NOT NULL attribute information for
relations before pull_up_sublinks, storing it in a hash table keyed by
relation OID. It then uses this information to perform the NullTest
deduction for Vars during constant folding. This also makes it
possible to leverage this information to pull up NOT IN subqueries.
Note that this patch does not get rid of restriction_is_always_true
and restriction_is_always_false. Removing them would prevent us from
reducing some IS [NOT] NULL quals that we were previously able to
reduce, because (a) the self-join elimination may introduce new IS NOT
NULL quals after constant folding, and (b) if some outer joins are
converted to inner joins, previously irreducible NullTest quals may
become reducible.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
|
|
There are several pieces of catalog information that need to be
retrieved for a relation during the early stage of planning. These
include relhassubclass, which is used to clear the inh flag if the
relation has no children, as well as a column's attgenerated and
default value, which are needed to expand virtual generated columns.
More such information may be required in the future.
Currently, these pieces of catalog data are collected in multiple
places, resulting in repeated table_open/table_close calls for each
relation in the rangetable. This patch centralizes the collection of
all required early-stage catalog information into a single loop over
the rangetable, allowing each relation to be opened and closed only
once.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
|
|
Currently, we expand virtual generated columns after we have pulled up
any SubLinks within the query's quals. This ensures that the virtual
generated column references within SubLinks that should be transformed
into joins are correctly expanded. This approach works well and has
posed no issues.
In an upcoming patch, we plan to centralize the collection of catalog
information needed early in the planner. This will help avoid
repeated table_open/table_close calls for relations in the rangetable.
Since this information is required during sublink pull-up, we are
moving the expansion of virtual generated columns to occur beforehand.
To achieve this, if any EXISTS SubLinks can be pulled up, their
rangetables are processed just before pulling them up.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
|
|
For an ordered Append or MergeAppend, we need to inject an explicit
sort into any subpath that is not already well enough ordered.
Currently, only explicit full sorts are considered; incremental sorts
are not yet taken into account.
In this patch, for subpaths of an ordered Append or MergeAppend, we
choose to use explicit incremental sort if it is enabled and there are
presorted keys.
The rationale is based on the assumption that incremental sort is
always faster than full sort when there are presorted keys, a premise
that has been applied in various parts of the code. In addition, the
current cost model tends to favor incremental sort as being cheaper
than full sort in the presence of presorted keys, making it reasonable
not to consider full sort in such cases.
No backpatch as this could result in plan changes.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4_V7a2enTR+T3pOY_YZ-FU8ZsFYym2swOz4jNMqmSgyuw@mail.gmail.com
|
|
In the wake of commit a16ef313f, we need to deal with more cases
involving PlaceHolderVars in NestLoopParams than we did before.
For one thing, a16ef313f was incorrect to suppose that we could
rely on the required-outer relids of the lefthand path to decide
placement of nestloop-parameter PHVs. As Richard Guo argued at
the time, we must look at the required-outer relids of the join
path itself.
For another, we have to apply replace_nestloop_params() to such
a PHV's expression, in case it contains references to values that
will be supplied from NestLoopParams of higher-level nestloops.
For another, we need to be more careful about the phnullingrels
of the PHV than we were being. identify_current_nestloop_params
only bothered to ensure that the phnullingrels didn't contain
"too many" relids, but now it has to be exact, because setrefs.c
will apply both NRM_SUBSET and NRM_SUPERSET checks in different
places. We can compute the correct relids by determining the
set of outer joins that should be able to null the PHV and then
subtracting whatever's been applied at or below this join.
Do the same for plain Vars, too. (This should make it possible
to use NRM_EQUAL to process nestloop params in setrefs.c, but
I won't risk making such a change in v18 now.)
Lastly, if a nestloop parameter PHV was pulled up out of a subquery
and it contains a subquery that was originally pushed down from this
query level, then that will still be represented as a SubLink, because
SS_process_sublinks won't recurse into outer PHVs, so it didn't get
transformed during expression preprocessing in the subquery. We can
substitute the version of the PHV's expression appearing in its
PlaceHolderInfo to ensure that that preprocessing has happened.
(Seems like this processing sequence could stand to be redesigned,
but again, late in v18 development is not the time for that.)
It's not very clear to me why the old have_dangerous_phv join-order
restriction prevented us from seeing the last three of these problems.
But given the lack of field complaints, it must have done so.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18953-1c9883a9d4afeb30@postgresql.org
|
|
Commit 85e5e222b, which added (a forerunner of) this logic,
argued that
Adding the necessary complexity to make this work doesn't seem like
it would be repaid in significantly better plans, because in cases
where such a PHV exists, there is probably a corresponding join order
constraint that would allow a good plan to be found without using the
star-schema exception.
The flaw in this claim is that there may be other join-order
restrictions that prevent us from finding a join order that doesn't
involve a "dangerous" PHV. In particular we now recognize that
small join_collapse_limit or from_collapse_limit could prevent it.
Therefore, let's bite the bullet and make the case work.
We don't have to extend the executor's support for nestloop parameters
as I thought at the time, because we can instead push the evaluation
of the placeholder's expression into the left-hand input of the
NestLoop node. So there's not really a lot of downside to this
solution, and giving the planner more join-order flexibility should
have value beyond just avoiding failure.
Having said that, there surely is a nonzero risk of introducing
new bugs. Since this failure mode escaped detection for ten years,
such cases don't seem common enough to justify a lot of risk.
Therefore, let's put this fix into master but leave the back branches
alone (for now anyway).
Bug: #18953
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18953-1c9883a9d4afeb30@postgresql.org
|
|
When creating an explicit Sort node for the outer path of a mergejoin,
we need to determine the number of presorted keys of the outer path to
decide whether explicit incremental sort can be applied. Currently,
this is done by repeatedly calling pathkeys_count_contained_in.
This patch caches the number of presorted outer pathkeys in MergePath,
allowing us to save several calls to pathkeys_count_contained_in. It
can be considered a complement to the changes in commit 828e94c9d.
Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqvBireB_w6x8BN5txdvBEHxVgZBt=rUnpf5ww5P_E_ww@mail.gmail.com
|
|
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places. These
inconsistencies were all introduced during Postgres 18 development.
This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
|
|
When planning queries to partitioned tables, we clone all
EquivalenceMembers belonging to the partitioned table into em_is_child
EquivalenceMembers for each non-pruned partition. For partitioned tables
with large numbers of partitions, this meant the ec_members list could
become large and code searching that list would become slow. Effectively,
the more partitions which were present, the more searches needed to be
performed for operations such as find_ec_member_matching_expr() during
create_plan() and the more partitions present, the longer these searches
would take, i.e., a quadratic slowdown.
To fix this, here we adjust how we store EquivalenceMembers for
em_is_child members. Instead of storing these directly in ec_members,
these are now stored in a new array of Lists in the EquivalenceClass,
which is indexed by the relid. When we want to find EquivalenceMembers
belonging to a certain child relation, we can narrow the search to the
array element for that relation.
To make EquivalenceMember lookup easier and to reduce the amount of code
change, this commit provides a pair of functions to allow iteration over
the EquivalenceMembers of an EC which also handles finding the child
members, if required. Callers that never need to look at child members
can remain using the foreach loop over ec_members, which will now often
be faster due to only parent-level members being stored there.
The actual performance increases here are highly dependent on the number
of partitions and the query being planned. Performance increases can be
visible with as few as 8 partitions, but the speedup is marginal for
such low numbers of partitions. The speedups become much more visible
with a few dozen to hundreds of partitions. With some tested queries
using 56 partitions, the planner was around 3x faster than before. For
use cases with thousands of partitions, these are likely to become
significantly faster. Some testing has shown planner speedups of 60x or
more with 8192 partitions.
Author: Yuya Watari <watari.yuya@gmail.com>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Tested-by: Thom Brown <thom@linux.com>
Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com>
Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com
|
|
This commit implements the automatic conversion of 'x IN (VALUES ...)' into
ScalarArrayOpExpr. That simplifies the query tree, eliminating the appearance
of an unnecessary join.
Since VALUES describes a relational table, and the value of such a list is
a table row, the optimizer will likely face an underestimation problem due to
the inability to estimate cardinality through MCV statistics. The cardinality
evaluation mechanism can work with the array inclusion check operation.
If the array is small enough (< 100 elements), it will perform a statistical
evaluation element by element.
We perform the transformation in the convert_ANY_sublink_to_join() if VALUES
RTE is proper and the transformation is convertible. The conversion is only
possible for operations on scalar values, not rows. Also, we currently
support the transformation only when it ends up with a constant array.
Otherwise, the evaluation of non-hashed SAOP might be slower than the
corresponding Hash Join with VALUES.
Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
This commit extracts the code to generate ScalarArrayOpExpr on top of the list
of expressions from match_orclause_to_indexcol() into a separate function
make_SAOP_expr(). This function was extracted to be used in optimization for
conversion of 'x IN (VALUES ...)' to 'x = ANY ...'. make_SAOP_expr() is
placed in clauses.c file as only two additional headers were needed there
compared with other places.
Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
Change the PathKey struct to use CompareType to record the sort
direction instead of hardcoding btree strategy numbers. The
CompareType is then converted to the index-type-specific strategy when
the plan is created.
This reduces the number of places btree strategy numbers are
hardcoded, and it's a self-contained subset of a larger effort to
allow non-btree indexes to behave like btrees.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
Derived clauses are stored in ec_derives, a List of RestrictInfos.
These clauses are later looked up by matching the left and right
EquivalenceMembers along with the clause's parent EC.
This linear search becomes expensive in queries with many joins or
partitions, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.
To improve performance, introduce a hash table (ec_derives_hash) that
is built when the list reaches 32 entries -- the same threshold used
for join_rel_hash. The original list is retained alongside the hash
table to support EC merging and serialization
(_outEquivalenceClass()).
Each clause is stored in the hash table using a canonicalized key: the
EquivalenceMember with the lower memory address is placed in the key
before the one with the higher memory address. This avoids storing or
searching for both permutations of the same clause. For clauses
involving a constant EM, the key places NULL in the first slot and the
non-constant EM in the second.
The hash table is initialized using list_length(ec_derives_list) as
the size hint. simplehash internally adjusts this to the next power of
two after dividing by the fillfactor, so this typically results in at
least 64 buckets near the threshold -- avoiding immediate resizing
while adapting to the actual number of entries.
The lookup logic for derived clauses is now centralized in
ec_search_derived_clause_for_ems(), which consults the hash table when
available and falls back to the list otherwise.
The new ec_clear_derived_clauses() always frees ec_derives_list, even
though some of the original code paths that cleared the old
ec_derives field did not. This ensures consistent cleanup and avoids
leaking memory when large lists are discarded.
An assertion originally placed in find_derived_clause_for_ec_member()
is moved into ec_search_derived_clause_for_ems() so that it is
enforced consistently, regardless of whether the hash table or list is
used for lookup.
This design incorporates suggestions by David Rowley, who proposed
both the key canonicalization and the initial sizing approach to
balance memory usage and CPU efficiency.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Amit Langote <amitlangote09@gmail.com>
Tested-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAExHW5vZiQtWU6moszLP5iZ8gLX_ZAUbgEX0DxGLx9PGWCtqUg@mail.gmail.com
|
|
Commit 83ea6c540 added support for virtual generated columns that are
computed on read. All Var nodes in the query that reference virtual
generated columns must be replaced with the corresponding generation
expressions. Currently, this replacement occurs in the rewriter.
However, this approach has several issues. If a Var referencing a
virtual generated column has any varnullingrels, those varnullingrels
need to be propagated into the generation expression. Failing to do
so can lead to "wrong varnullingrels" errors and improper outer-join
removal.
Additionally, if such a Var comes from the nullable side of an outer
join, we may need to wrap the generation expression in a
PlaceHolderVar to ensure that it is evaluated at the right place and
hence is forced to null when the outer join should do so. In certain
cases, such as when the query uses grouping sets, we also need a
PlaceHolderVar for anything that is not a simple Var to isolate
subexpressions. Failure to do so can result in incorrect results.
To fix these issues, this patch expands the virtual generated columns
in the planner rather than in the rewriter, and leverages the
pullup_replace_vars architecture to avoid code duplication. The
generation expressions will be correctly marked with nullingrel bits
and wrapped in PlaceHolderVars when needed by the pullup_replace_vars
callback function. This requires handling the OLD/NEW RETURNING list
Vars in pullup_replace_vars_callback, as it may now deal with Vars
referencing the result relation instead of a subquery.
The "wrong varnullingrels" error was reported by Alexander Lakhin.
The incorrect result issue and the improper outer-join removal issue
were reported by Richard Guo.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/75eb1a6f-d59f-42e6-8a78-124ee808cda7@gmail.com
|
|
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result. Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures. Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses. Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.
This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;
In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.
SELECT * FROM t1 WHERE x IN
(SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);
Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.
The SJE proof is based on innerrel_is_unique() machinery.
We can remove a self-join when for each outer row:
1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must
match exactly.
The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins. Tests covering this feature
were added to join.sql. Some of the existing regression tests changed due
to self-join removal logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries
to explicitly return old and new values by using the special aliases
"old" and "new", which are automatically added to the query (if not
already defined) while parsing its RETURNING list, allowing things
like:
RETURNING old.colname, new.colname, ...
RETURNING old.*, new.*
Additionally, a new syntax is supported, allowing the names "old" and
"new" to be changed to user-supplied alias names, e.g.:
RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ...
This is useful when the names "old" and "new" are already defined,
such as inside trigger functions, allowing backwards compatibility to
be maintained -- the interpretation of any existing queries that
happen to already refer to relations called "old" or "new", or use
those as aliases for other relations, is not changed.
For an INSERT, old values will generally be NULL, and for a DELETE,
new values will generally be NULL, but that may change for an INSERT
with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule
changes the command type. Therefore, we put no restrictions on the use
of old and new in any DML queries.
Dean Rasheed, reviewed by Jian He and Jeff Davis.
Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
|
|
Backpatch-through: 13
|
|
Remove the code for inserting flag columns in the inputs of a SetOp.
That was the only reason why there would be resjunk columns in a
set-operations plan tree, so we can get rid of some code that
supported that, too.
Get rid of choose_hashed_setop() in favor of building Paths for
the hashed and sorted alternatives, and letting them fight it out
within add_path().
Remove set_operation_ordered_results_useful(), which was giving wrong
answers due to examining the wrong ancestor node: we need to examine
the immediate SetOperationStmt parent not the topmost node. Instead
make each caller of recurse_set_operations() pass down the relevant
parent node. (This thinko seems to have led only to wasted planning
cycles and possibly-inferior plans, not wrong query answers. Perhaps
we should back-patch it, but I'm not doing so right now.)
Teach generate_nonunion_paths() to consider pre-sorted inputs for
sorted SetOps, rather than always generating a Sort node.
Patch by me; thanks to Richard Guo and David Rowley for review.
Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
|
|
The original design for set operations involved appending the two
input relations into one and adding a flag column that allows
distinguishing which side each row came from. Then the SetOp node
pries them apart again based on the flag. This is bizarre. The
only apparent reason to do it is that when sorting, we'd only need
one Sort node not two. But since sorting is at least O(N log N),
sorting all the data is actually worse than sorting each side
separately --- plus, we have no chance of taking advantage of
presorted input. On top of that, adding the flag column frequently
requires an additional projection step that adds cycles, and then
the Append node isn't free either. Let's get rid of all of that
and make the SetOp node have two separate children, using the
existing outerPlan/innerPlan infrastructure.
This initial patch re-implements nodeSetop.c and does a bare minimum
of work on the planner side to generate correctly-shaped plans.
In particular, I've tried not to change the cost estimates here,
so that the visible changes in the regression test results will only
involve removal of useless projection steps and not any changes in
whether to use sorted vs hashed mode.
For SORTED mode, we combine successive identical tuples from each
input into groups, and then merge-join the groups. The tuple
comparisons now use SortSupport instead of simple equality, but
the group-formation part should involve roughly the same number of
tuple comparisons as before. The cross-comparisons between left and
right groups probably add to that, but I'm not sure to quantify how
many more comparisons we might need.
For HASHED mode, nodeSetop's logic is almost the same as before,
just refactored into two separate loops instead of one loop that
has an assumption that it will see all the left-hand inputs first.
In both modes, I added early-exit logic to not bother reading the
right-hand relation if the left-hand input is empty, since neither
INTERSECT nor EXCEPT modes can produce any output if the left input
is empty. This could have been done before in the hashed mode, but
not in sorted mode. Sorted mode can also stop as soon as it exhausts
the left input; any remaining right-hand tuples cannot have matches.
Also, this patch adds some infrastructure for detecting whether
child plan nodes all output the same type of tuple table slot.
If they do, the hash table logic can use slightly more efficient
code based on assuming that that's the input slot type it will see.
We'll make use of that infrastructure in other plan node types later.
Patch by me; thanks to Richard Guo and David Rowley for review.
Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
|
|
Traditionally, remove_useless_groupby_columns() was called during
grouping_planner() directly after the call to preprocess_groupclause().
While in many ways, it made sense to populate the field and remove the
functionally dependent columns from processed_groupClause at the same
time, it's just that doing so had the disadvantage that
remove_useless_groupby_columns() was being called before the RelOptInfos
were populated for the relations mentioned in the query. Not having
RelOptInfos available meant we needed to manually query the catalog tables
to get the required details about the primary key constraint for the
table.
Here we move the remove_useless_groupby_columns() call to
query_planner() and put it directly after the RelOptInfos are populated.
This is fine to do as processed_groupClause still isn't final at this
point as it can still be modified inside standard_qp_callback() by
make_pathkeys_for_sortclauses_extended().
This commit is just a refactor and simply moves
remove_useless_groupby_columns() into initsplan.c. A planned follow-up
commit will adjust that function so it uses RelOptInfo instead of doing
catalog lookups and also teach it how to use unique indexes as proofs to
expand the cases where we can remove functionally dependent columns from
the GROUP BY.
Reviewed-by: Andrei Lepikhov, jian he
Discussion: https://postgr.es/m/CAApHDvqLezKwoEBBQd0dp4Y9MDkFBDbny0f3SzEeqOFoU7Z5+A@mail.gmail.com
|
|
The ordering of DISTINCT items is semantically insignificant, so we
can reorder them as needed. In fact, in the parser, we absorb the
sorting semantics of the sortClause as much as possible into the
distinctClause, ensuring that one clause is a prefix of the other.
This can help avoid a possible need to re-sort.
In this commit, we attempt to adjust the DISTINCT keys to match the
input path's pathkeys. This can likewise help avoid re-sorting, or
allow us to use incremental-sort to save efforts.
For DISTINCT ON expressions, the parser already ensures that they
match the initial ORDER BY expressions. When reordering the DISTINCT
keys, we must ensure that the resulting pathkey list matches the
initial distinctClause pathkeys.
This introduces a new GUC, enable_distinct_reordering, which allows
the optimization to be disabled if needed.
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
Discussion: https://postgr.es/m/CAMbWs48dR26cCcX0f=8bja2JKQPcU64136kHk=xekHT9xschiQ@mail.gmail.com
|
|
When optimizer generates bitmap paths, it considers breaking OR-clause
arguments one-by-one. But now, a group of similar OR-clauses can be
transformed into SAOP during index matching. So, bitmap paths should
keep up.
This commit teaches bitmap paths generation machinery to group similar
OR-clauses into dedicated RestrictInfos. Those RestrictInfos are considered
both to match index as a whole (as SAOP), or to match as a set of individual
OR-clause argument one-by-one (the old way).
Therefore, bitmap path generation will takes advantage of OR-clauses to SAOP's
transformation. The old way of handling them is also considered. So, there
shouldn't be planning regression.
Discussion: https://postgr.es/m/CAPpHfdu5iQOjF93vGbjidsQkhHvY2NSm29duENYH_cbhC6x%2BMg%40mail.gmail.com
Author: Alexander Korotkov, Andrey Lepikhov
Reviewed-by: Alena Rybakina, Andrei Lepikhov, Jian he, Robert Haas
Reviewed-by: Peter Geoghegan
|
|
Two near-identical copies of clause_sides_match_join() existed in
joinpath.c and analyzejoins.c. Deduplicate this by moving the function
into restrictinfo.h.
It isn't quite clear that keeping the inline property of this function
is worthwhile, but this commit is just an exercise in code
deduplication. More effort would be required to determine if the inline
property is worth keeping.
Author: James Hunter <james.hunter.pg@gmail.com>
Discussion: https://postgr.es/m/CAJVSvF7Nm_9kgMLOch4c-5fbh3MYg%3D9BdnDx3Dv7Fcb64zr64Q%40mail.gmail.com
|
|
Up to now, remove_rel_from_query() has done a pretty shoddy job
of updating our where-needed bitmaps (per-Var attr_needed and
per-PlaceHolderVar ph_needed relid sets). It removed direct mentions
of the to-be-removed baserel and outer join, which is the minimum
amount of effort needed to keep the data structures self-consistent.
But it didn't account for the fact that the removed join ON clause
probably mentioned Vars of other relations, and those Vars might now
not be needed as high up in the join tree as before. It's easy to
show cases where this results in failing to remove a lower outer join
that could also have been removed.
To fix, recalculate the where-needed bitmaps from scratch after
each successful join removal. This sounds expensive, but it seems
to add only negligible planner runtime. (We cheat a little bit
by preserving "relation 0" entries in the bitmaps, allowing us to
skip re-scanning the targetlist and HAVING qual.)
The submitted test case drew attention because we had successfully
optimized away the lower join prior to v16. I suspect that that's
somewhat accidental and there are related cases that were never
optimized before and now can be. I've not tried to come up with
one, though.
Perhaps we should back-patch this into v16 and v17 to repair the
performance regression. However, since it took a year for anyone
to notice the problem, it can't be affecting too many people. Let's
let the patch bake awhile in HEAD, and see if we get more complaints.
Per bug #18627 from Mikaël Gourlaouen. No back-patch for now.
Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org
|
|
When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys,
we failed to realize that the grouping/ordering expressions might be
nullable by grouping sets. As a result, we may incorrectly deem that
the PathKeys are redundant by EquivalenceClass processing and thus
remove them from the pathkeys list. That would lead to wrong results
in some cases.
To fix this issue, we mark the grouping expressions nullable by
grouping sets if that is the case. If the grouping expression is a
Var or PlaceHolderVar or constructed from those, we can just add the
RT index of the RTE_GROUP RTE to the existing nullingrels field(s);
otherwise we have to add a PlaceHolderVar to carry on the nullingrel
bit.
However, we have to manually remove this nullingrel bit from
expressions in various cases where these expressions are logically
below the grouping step, such as when we generate groupClause pathkeys
for grouping sets, or when we generate PathTarget for initial input to
grouping nodes.
Furthermore, in set_upper_references, the targetlist and quals of an
Agg node should have nullingrels that include the effects of the
grouping step, ie they will have nullingrels equal to the input
Vars/PHVs' nullingrels plus the nullingrel bit that references the
grouping RTE. In order to perform exact nullingrels matches, we also
need to manually remove this nullingrel bit.
Bump catversion because this changes the querytree produced by the
parser.
Thanks to Tom Lane for the idea to invent a new kind of RTE.
Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from
various threads.
Author: Richard Guo
Reviewed-by: Ashutosh Bapat, Sutou Kouhei
Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com
|
|
If there are subqueries in the grouping expressions, each of these
subqueries in the targetlist and HAVING clause is expanded into
distinct SubPlan nodes. As a result, only one of these SubPlan nodes
would be converted to reference to the grouping key column output by
the Agg node; others would have to get evaluated afresh. This is not
efficient, and with grouping sets this can cause wrong results issues
in cases where they should go to NULL because they are from the wrong
grouping set. Furthermore, during re-evaluation, these SubPlan nodes
might use nulled column values from grouping sets, which is not
correct.
This issue is not limited to subqueries. For other types of
expressions that are part of grouping items, if they are transformed
into another form during preprocessing, they may fail to match lower
target items. This can also lead to wrong results with grouping sets.
To fix this issue, we introduce a new kind of RTE representing the
output of the grouping step, with columns that are the Vars or
expressions being grouped on. In the parser, we replace the grouping
expressions in the targetlist and HAVING clause with Vars referencing
this new RTE, so that the output of the parser directly expresses the
semantic requirement that the grouping expressions be gotten from the
grouping output rather than computed some other way. In the planner,
we first preprocess all the columns of this new RTE and then replace
any Vars in the targetlist and HAVING clause that reference this new
RTE with the underlying grouping expressions, so that we will have
only one instance of a SubPlan node for each subquery contained in the
grouping expressions.
Bump catversion because this changes the querytree produced by the
parser.
Thanks to Tom Lane for the idea to invent a new kind of RTE.
Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from
various threads.
Author: Richard Guo
Reviewed-by: Ashutosh Bapat, Sutou Kouhei
Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com
|
|
Previously, when a path type was disabled by e.g. enable_seqscan=false,
we either avoided generating that path type in the first place, or
more commonly, we added a large constant, called disable_cost, to the
estimated startup cost of that path. This latter approach can distort
planning. For instance, an extremely expensive non-disabled path
could seem to be worse than a disabled path, especially if the full
cost of that path node need not be paid (e.g. due to a Limit).
Or, as in the regression test whose expected output changes with this
commit, the addition of disable_cost can make two paths that would
normally be distinguishible in cost seem to have fuzzily the same cost.
To fix that, we now count the number of disabled path nodes and
consider that a high-order component of both the startup cost and the
total cost. Hence, the path list is now sorted by disabled_nodes and
then by total_cost, instead of just by the latter, and likewise for
the partial path list. It is important that this number is a count
and not simply a Boolean; else, as soon as we're unable to respect
disabled path types in all portions of the path, we stop trying to
avoid them where we can.
Because the path list is now sorted by the number of disabled nodes,
the join prechecks must compute the count of disabled nodes during
the initial cost phase instead of postponing it to final cost time.
Counts of disabled nodes do not cross subquery levels; at present,
there is no reason for them to do so, since the we do not postpone
path selection across subquery boundaries (see make_subplan).
Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley.
Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com
|
|
To determine if the two relations being joined can use partitionwise
join, we need to verify the existence of equi-join conditions
involving pairs of matching partition keys for all partition keys.
Currently we do that by looking through the join's restriction
clauses. However, it has been discovered that this approach is
insufficient, because there might be partition keys known equal by a
specific EC, but they do not form a join clause because it happens
that other members of the EC than the partition keys are constrained
to become a join clause.
To address this issue, in addition to examining the join's restriction
clauses, we also check if any partition keys are known equal by ECs,
by leveraging function exprs_known_equal(). To accomplish this, we
enhance exprs_known_equal() to check equality per the semantics of the
opfamily, if provided.
It could be argued that exprs_known_equal() could be called O(N^2)
times, where N is the number of partition key expressions, resulting
in noticeable performance costs if there are a lot of partition key
expressions. But I think this is not a problem. The number of a
joinrel's partition key expressions would only be equal to the join
degree, since each base relation within the join contributes only one
partition key expression. That is to say, it does not scale with the
number of partitions. A benchmark with a query involving 5-way joins
of partitioned tables, each with 3 partition keys and 1000 partitions,
shows that the planning time is not significantly affected by this
patch (within the margin of error), particularly when compared to the
impact caused by partitionwise join.
Thanks to Tom Lane for the idea of leveraging exprs_known_equal() to
check if partition keys are known equal by ECs.
Author: Richard Guo, Tom Lane
Reviewed-by: Tom Lane, Ashutosh Bapat, Robert Haas
Discussion: https://postgr.es/m/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk+ihstpKEt3a1LT6X78A@mail.gmail.com
|
|
In try_partitionwise_join, we aim to break down the join between two
partitioned relations into joins between matching partitions. To
achieve this, we iterate through each pair of partitions from the two
joining relations and create child-join relations for them. With
potentially thousands of partitions, the local objects allocated in
each iteration can accumulate significant memory usage. Therefore, we
opt to eagerly free these local objects at the end of each iteration.
In line with this approach, this patch frees the bitmap set that
represents the relids of child-join relations at the end of each
iteration. Additionally, it modifies build_child_join_rel() to reuse
the AppendRelInfo structures generated within each iteration.
Author: Ashutosh Bapat
Reviewed-by: David Christensen, Richard Guo
Discussion: https://postgr.es/m/CAExHW5s4EqY43oB=ne6B2=-xLgrs9ZGeTr1NXwkGFt2j-OmaQQ@mail.gmail.com
|
|
In the case of a parallel plan, when computing the number of tuples
processed per worker, we divide the total number of tuples by the
parallel_divisor obtained from get_parallel_divisor(), which accounts
for the leader's contribution in addition to the number of workers.
Accordingly, when estimating the number of tuples for gather (merge)
nodes, we should multiply the number of tuples per worker by the same
parallel_divisor to reverse the division. However, currently we use
parallel_workers rather than parallel_divisor for the multiplication.
This could result in an underestimation of the number of tuples for
gather (merge) nodes, especially when there are fewer than four
workers.
This patch fixes this issue by using the same parallel_divisor for the
multiplication. There is one ensuing plan change in the regression
tests, but it looks reasonable and does not compromise its original
purpose of testing parallel-aware hash join.
In passing, this patch removes an unnecessary assignment for path.rows
in create_gather_merge_path, and fixes an uninitialized-variable issue
in generate_useful_gather_paths.
No backpatch as this could result in plan changes.
Author: Anthonin Bonnefoy
Reviewed-by: Rafia Sabih, Richard Guo
Discussion: https://postgr.es/m/CAO6_Xqr9+51NxgO=XospEkUeAg-p=EjAWmtpdcZwjRgGKJ53iA@mail.gmail.com
|
|
Previously, the code charged disable_cost for CurrentOfExpr, and then
subtracted disable_cost from the cost of a TID path that used
CurrentOfExpr as the TID qual, effectively disabling all paths except
that one. Now, we instead suppress generation of the disabled paths
entirely, and generate only the one that the executor will actually
understand.
With this approach, we do not need to rely on disable_cost being
large enough to prevent the wrong path from being chosen, and we
save some CPU cycle by avoiding generating paths that we can't
actually use. In my opinion, the code is also easier to understand
like this.
Patch by me. Review by Heikki Linnakangas.
Discussion: http://postgr.es/m/591b3596-2ea0-4b8e-99c6-fad0ef2801f5@iki.fi
|
|
0452b461bc made get_eclass_for_sort_expr() always set
EquivalenceClass.ec_sortref if it's not done yet. This leads to an asymmetric
situation when whoever first looks for the EquivalenceClass sets the
ec_sortref. It is also counterintuitive that get_eclass_for_sort_expr()
performs modification of data structures.
This commit makes make_pathkeys_for_sortclauses_extended() responsible for
setting EquivalenceClass.ec_sortref. Now we set the
EquivalenceClass.ec_sortref's needed to explore alternative GROUP BY ordering
specifically during building pathkeys by the list of grouping clauses.
Discussion: https://postgr.es/m/17037754-f187-4138-8285-0e2bfebd0dea%40postgrespro.ru
Reported-by: Tom Lane
Author: Andrei Lepikhov
Reviewed-by: Alexander Korotkov, Pavel Borisov
|