Asynchronous MergeAppend

Lists: pgsql-hackers
From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Asynchronous MergeAppend
Date: 2024-07-17 13:24:28
Message-ID: 59be194c5a409fb9fc9f2031581b8a44@postgrespro.ru

Hello.

I'd like to make the MergeAppend node async-capable, like the Append node.
Currently, when the planner chooses a MergeAppend plan, asynchronous execution
is not possible. With the attached patches you can see plans like

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Merge Append
   Sort Key: async_pt.b, async_pt.a
   ->  Async Foreign Scan on public.async_p1 async_pt_1
         Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
         Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
   ->  Async Foreign Scan on public.async_p2 async_pt_2
         Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
         Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST

This can be quite profitable (in our test cases, async execution of
MergeAppend on remote servers gave up to a twofold speedup).

Code for asynchronous execution in Merge Append was mostly borrowed from
Append node.

The significant difference is that ExecMergeAppendAsyncGetNext() must
return a tuple from the specified slot. The subplan number determines the
tuple slot into which data should be retrieved. When a subplan is ready to
provide some data, it's cached in ms_asyncresults. When we get a tuple for
the subplan specified in ExecMergeAppendAsyncGetNext(),
ExecMergeAppendAsyncRequest() returns true and the loop in
ExecMergeAppendAsyncGetNext() ends. We can fetch data only for subplans
that either don't have a cached result ready or have already returned it
to the upper node. This flag is stored in ms_has_asyncresults. As we can
get data for a subplan either before or after the loop in
ExecMergeAppendAsyncRequest(), we check this flag twice in this function.
Unlike ExecAppendAsyncEventWait(), it seems
ExecMergeAppendAsyncEventWait() doesn't need a timeout, as there's no
need to get a result from a synchronous subplan if a tuple from an async
one was explicitly requested.
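
The caching rule described above can be sketched roughly as follows. This is only an illustration and not code from the patch: the struct, the fixed array size and all function names are made up, and plain ints stand in for tuples. Only the roles of ms_asyncresults and ms_has_asyncresults are taken from the description.

```c
#include <assert.h>
#include <stdbool.h>

#define NPLANS 4				/* hypothetical number of subplans */

/* Hypothetical stand-in for the relevant part of the node state. */
typedef struct MergeAsyncSketch
{
	int			results[NPLANS];	/* plays the role of ms_asyncresults */
	bool		has_result[NPLANS]; /* plays the role of ms_has_asyncresults */
} MergeAsyncSketch;

/* More data may be requested only for a subplan with nothing cached. */
static bool
can_request(const MergeAsyncSketch *st, int plan)
{
	return !st->has_result[plan];
}

/* A subplan delivered a tuple: cache it until the merge asks for it. */
static void
store_result(MergeAsyncSketch *st, int plan, int tuple)
{
	assert(!st->has_result[plan]);
	st->results[plan] = tuple;
	st->has_result[plan] = true;
}

/*
 * Hand the cached tuple of the requested subplan to the caller.
 * Returns false if nothing is cached, i.e. the caller has to keep waiting.
 */
static bool
take_result(MergeAsyncSketch *st, int plan, int *tuple)
{
	if (!st->has_result[plan])
		return false;
	*tuple = st->results[plan];
	st->has_result[plan] = false;	/* subplan may be asked again */
	return true;
}
```

In this sketch a tuple that arrives for a subplan other than the requested one simply stays cached, which is why the wait loop can only end once take_result() succeeds for the requested subplan.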

We also had to fix postgres_fdw to avoid looking directly at Append
fields. Perhaps the accessors to Append fields look strange, but they allow
us to avoid some code duplication. I suppose the duplication could be reduced
further if we reworked the async Append implementation, but so far I haven't
tried to do this, to avoid a big diff from master.

Also, mark_async_capable() assumes that the path corresponds to the plan. This
may not be true when create_[merge_]append_plan() inserts a Sort node.
In this case mark_async_capable() can treat the Sort plan node as some other
node type and crash, so there's a small fix for this.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
v1-0001-mark_async_capable-subpath-should-match-subplan.patch text/x-diff 1.9 KB
v1-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch text/x-diff 34.7 KB

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2024-08-10 20:24:43
Message-ID: 764dd8b8-6374-4f5a-aac7-d8e3f6ebe5fd@postgrespro.ru

Hi! Thank you for your work on this subject! I think this is a very
useful optimization :)

While looking through your code, I noticed some points that I think
should be taken into account. Firstly, I noticed only two tests that
verify the functionality of this feature, and I think that this is not
enough.
Are you thinking about adding some tests with queries involving, for
example, joins between different tables and unusual operators?

In addition, I have a question about testing your feature on a
benchmark. Are you going to do this?

On 17.07.2024 16:24, Alexander Pyhalov wrote:
> Hello.
>
> I'd like to make MergeAppend node Async-capable like Append node.
> Nowadays when planner chooses MergeAppend plan, asynchronous execution
> is not possible. With attached patches you can see plans like
>
> EXPLAIN (VERBOSE, COSTS OFF)
> SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
>                                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>
>  Merge Append
>    Sort Key: async_pt.b, async_pt.a
>    ->  Async Foreign Scan on public.async_p1 async_pt_1
>          Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
>          Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b %
> 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
>    ->  Async Foreign Scan on public.async_p2 async_pt_2
>          Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
>          Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b %
> 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
>
> This can be quite profitable (in our test cases you can gain up to two
> times better speed with MergeAppend async execution on remote servers).
>
> Code for asynchronous execution in Merge Append was mostly borrowed
> from Append node.
>
> What significantly differs - in ExecMergeAppendAsyncGetNext() you must
> return tuple from the specified slot.
> Subplan number determines tuple slot where data should be retrieved
> to. When subplan is ready to provide some data,
> it's cached in ms_asyncresults. When we get tuple for subplan,
> specified in ExecMergeAppendAsyncGetNext(),
> ExecMergeAppendAsyncRequest() returns true and loop in
> ExecMergeAppendAsyncGetNext() ends. We can fetch data for
> subplans which either don't have cached result ready or have already
> returned them to the upper node. This
> flag is stored in ms_has_asyncresults. As we can get data for some
> subplan either earlier or after loop in ExecMergeAppendAsyncRequest(),
> we check this flag twice in this function.
> Unlike ExecAppendAsyncEventWait(), it seems
> ExecMergeAppendAsyncEventWait() doesn't need a timeout - as there's no
> need to get result
> from synchronous subplan if a tuple from async one was explicitly
> requested.
>
> Also we had to fix postgres_fdw to avoid directly looking at Append
> fields. Perhaps, accessors to Append fields look strange, but allows
> to avoid some code duplication. I suppose, duplication could be even
> less if we reworked async Append implementation, but so far I haven't
> tried to do this to avoid big diff from master.
>
> Also mark_async_capable() believes that path corresponds to plan. This
> can be not true when create_[merge_]append_plan() inserts sort node.
> In this case mark_async_capable() can treat Sort plan node as some
> other and crash, so there's a small fix for this.

I think you should add this explanation to the commit message because
without it it's hard to understand the full picture of how your code works.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2024-08-20 09:14:44
Message-ID: 159b591411bb2c81332018927acbd509@postgrespro.ru

Hi.

Alena Rybakina wrote on 2024-08-10 23:24:
> Hi! Thank you for your work on this subject! I think this is a very
> useful optimization)
>
> While looking through your code, I noticed some points that I think
> should be taken into account. Firstly, I noticed only two tests to
> verify the functionality of this function and I think that this is not
> enough.
> Are you thinking about adding some tests with queries involving, for
> example, join connections with different tables and unusual operators?

I've added some more tests - tests for joins and pruning.

>
> In addition, I have a question about testing your feature on a
> benchmark. Are you going to do this?
>

The main reason for this work is a dramatic performance degradation when
Append plans with async foreign scan nodes are switched to MergeAppend
plans with synchronous foreign scans.

I've performed some synthetic tests to prove the benefits of async Merge
Append. So far tests are performed on one physical host.

For tests I've deployed 3 PostgreSQL instances on ports 5432-5434.

The first instance:
create server s2 foreign data wrapper postgres_fdw
  OPTIONS (port '5433', dbname 'postgres', async_capable 'on');
create server s3 foreign data wrapper postgres_fdw
  OPTIONS (port '5434', dbname 'postgres', async_capable 'on');

create foreign table players_p1 partition of players
  for values with (modulus 4, remainder 0) server s2;
create foreign table players_p2 partition of players
  for values with (modulus 4, remainder 1) server s2;
create foreign table players_p3 partition of players
  for values with (modulus 4, remainder 2) server s3;
create foreign table players_p4 partition of players
  for values with (modulus 4, remainder 3) server s3;

s2 instance:
create table players_p1 (id int, name text, score int);
create table players_p2 (id int, name text, score int);
create index on players_p1(score);
create index on players_p2(score);

s3 instance:
create table players_p3 (id int, name text, score int);
create table players_p4 (id int, name text, score int);
create index on players_p3(score);
create index on players_p4(score);

s1 instance:
insert into players
  select i, 'player_' || i, random() * 100 from generate_series(1,100000) i;

pgbench script:
\set rnd_offset random(0,200)
\set rnd_limit random(10,20)

select * from players order by score desc offset :rnd_offset limit :rnd_limit;

pgbench was run as:
pgbench -n -f 1.sql postgres -T 100 -c 16 -j 16

CPU idle was about 5-10%.

pgbench results:

Without patch, async_capable on:

pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 16
duration: 100 s
number of transactions actually processed: 130523
latency average = 12.257 ms
initial connection time = 29.824 ms
tps = 1305.363500 (without initial connection time)

Without patch, async_capable off:

pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 16
duration: 100 s
number of transactions actually processed: 130075
latency average = 12.299 ms
initial connection time = 26.931 ms
tps = 1300.877993 (without initial connection time)

As expected, we see no difference.

Patched, async_capable on:

pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 16
duration: 100 s
number of transactions actually processed: 135616
latency average = 11.796 ms
initial connection time = 28.619 ms
tps = 1356.341587 (without initial connection time)

Patched, async_capable off:

pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 16
duration: 100 s
number of transactions actually processed: 131300
latency average = 12.185 ms
initial connection time = 29.573 ms
tps = 1313.138405 (without initial connection time)

Here we can see that async MergeAppend behaves a bit better. You can
argue that the benefit is not so big and perhaps is related to some random
factors.
However, if we use a single client and a single thread, so that the CPU has
idle cores, the improvement becomes more evident:

Patched, async_capable on:
pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 20221
latency average = 4.945 ms
initial connection time = 7.035 ms
tps = 202.221816 (without initial connection time)

Patched, async_capable off
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 14941
latency average = 6.693 ms
initial connection time = 7.037 ms
tps = 149.415688 (without initial connection time)
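
As a rough check of the numbers above, the relative gain can be computed like this. The helper name is made up for illustration; the tps values are the ones reported in the runs above.

```c
#include <assert.h>

/* Relative throughput gain in percent (illustrative helper only). */
static double
tps_gain_percent(double tps_new, double tps_base)
{
	return (tps_new / tps_base - 1.0) * 100.0;
}
```

With the patched numbers, tps_gain_percent(202.221816, 149.415688) gives roughly 35% for the single-client run, while tps_gain_percent(1356.341587, 1313.138405) gives roughly 3% for the 16-client run on a nearly saturated CPU.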

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
v2-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch text/x-diff 45.2 KB
v2-0001-mark_async_capable-subpath-should-match-subplan.patch text/x-diff 1.9 KB

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-07-26 07:56:22
Message-ID: 242bace2babce8489701c9ca65cf84f7@postgrespro.ru

Hi.

I've updated the patches for asynchronous merge append. They allowed us to
significantly improve performance in practice. Previously, a select from a
partitioned (and distributed) table could switch from an asynchronous Append
plan to a synchronous MergeAppend plan. Given that the table could have 20+
partitions, the new plan was estimated as cheaper but was much less
efficient, because the remote parts were executed synchronously.

In this version there are a couple of small fixes: previously
ExecMergeAppend() scanned all asyncplans, but it should do this only for
valid asyncplans. I also incorporated the logic from

commit af717317a04f5217728ce296edf4a581eb7e6ea0
Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Date: Wed Mar 12 20:53:09 2025 +0200

Handle interrupts while waiting on Append's async subplans

into ExecMergeAppendAsyncEventWait().

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch text/x-diff 46.2 KB
0001-mark_async_capable-subpath-should-match-subplan.patch text/x-diff 1.9 KB

From: Álvaro Herrera <alvherre(at)kurilemu(dot)de>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-10-25 11:59:09
Message-ID: 202510251154.isknefznk566@alvherre.pgsql

I noticed that this patch has gone largely unreviewed, but it needs
rebase due to the GUC changes, so here it is again.

Thanks

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

Attachment Content-Type Size
v4-0001-mark_async_capable-subpath-should-match-subplan.patch text/x-diff 1.9 KB
v4-0002-MergeAppend-should-support-Async-Foreign-Scan-sub.patch text/x-diff 46.4 KB

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Alexander Pyhalov" <a(dot)pyhalov(at)postgrespro(dot)ru>, "Alena Rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-11-03 13:00:48
Message-ID: DDZ2ULUYDQJ4.MXMP02V4GIG@gmail.com

Hi, thanks for working on this!

On Tue Aug 20, 2024 at 6:14 AM -03, Alexander Pyhalov wrote:
>> In addition, I have a question about testing your feature on a
>> benchmark. Are you going to do this?
>>
>
> The main reason for this work is a dramatic performance degradation when
> Append plans with async foreign scan nodes are switched to MergeAppend
> plans with synchronous foreign scans.
>
> I've performed some synthetic tests to prove the benefits of async Merge
> Append. So far tests are performed on one physical host.
>
> For tests I've deployed 3 PostgreSQL instances on ports 5432-5434.
>
> The first instance:
> create server s2 foreign data wrapper postgres_fdw OPTIONS ( port
> '5433', dbname 'postgres', async_capable 'on');
> create server s3 foreign data wrapper postgres_fdw OPTIONS ( port
> '5434', dbname 'postgres', async_capable 'on');
>
> create foreign table players_p1 partition of players for values with
> (modulus 4, remainder 0) server s2;
> create foreign table players_p2 partition of players for values with
> (modulus 4, remainder 1) server s2;
> create foreign table players_p3 partition of players for values with
> (modulus 4, remainder 2) server s3;
> create foreign table players_p4 partition of players for values with
> (modulus 4, remainder 3) server s3;
>
> s2 instance:
> create table players_p1 (id int, name text, score int);
> create table players_p2 (id int, name text, score int);
> create index on players_p1(score);
> create index on players_p2(score);
>
> s3 instance:
> create table players_p3 (id int, name text, score int);
> create table players_p4 (id int, name text, score int);
> create index on players_p3(score);
> create index on players_p4(score);
>
> s1 instance:
> insert into players select i, 'player_' ||i, random()* 100 from
> generate_series(1,100000) i;
>
> pgbench script:
> \set rnd_offset random(0,200)
> \set rnd_limit random(10,20)
>
> select * from players order by score desc offset :rnd_offset limit
> :rnd_limit;
>
> pgbench was run as:
> pgbench -n -f 1.sql postgres -T 100 -c 16 -j 16
>
> CPU idle was about 5-10%.
>
> pgbench results:
>
> [...]
> However, if we set number of threads to 1, so that CPU has idle cores,
> we'll see more evident improvements:
>
> Patched, async_capable on:
> pgbench (14.13, server 18devel)
> transaction type: 1.sql
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 100 s
> number of transactions actually processed: 20221
> latency average = 4.945 ms
> initial connection time = 7.035 ms
> tps = 202.221816 (without initial connection time)
>
>
> Patched, async_capable off
> transaction type: 1.sql
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 100 s
> number of transactions actually processed: 14941
> latency average = 6.693 ms
> initial connection time = 7.037 ms
> tps = 149.415688 (without initial connection time)
>
I ran some benchmarks based on v4 attached by Alvaro in [1] using a
smaller number of threads so that some CPU cores would be idle and I
also obtained better results:

Patched, async_capable on:
tps = 4301.567405

Master, async_capable on:
tps = 3847.084545

So I'm +1 for the idea. I know it's been a while since the last patch, and
unfortunately it hasn't received reviews since then. Do you still plan to
work on it? I still need to take a look at the code to see if I can help
with some comments.

During the tests I got compiler errors due to fce7c73fba4, so I'm
attaching a v5 with guc_parameters.dat correctly sorted.

The postgres_fdw/regress tests were also failing due to some whitespace
problems; v5 fixes this as well.

[1] https://www.postgresql.org/message-id/202510251154.isknefznk566%40alvherre.pgsql

--
Matheus Alcantara

Attachment Content-Type Size
v5-0001-mark_async_capable-subpath-should-match-subplan.patch text/plain 1.9 KB
v5-0002-MergeAppend-should-support-Async-Foreign-Scan-sub.patch text/plain 46.4 KB

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-11-05 06:30:59
Message-ID: 2fb1d9923b6995492e7b163e6cb95402@postgrespro.ru

Hi.

Matheus Alcantara wrote on 2025-11-03 16:00:
> So I'm +1 for the idea. I know it's been while since the last patch,
> and
> unfortunully it hasn't received reviews since then. Do you still plan
> to
> work on it? I still need to take a look on the code to see if I can
> help
> with some comments.
>
> During the tests I got compiler errors due to fce7c73fba4, so I'm
> attaching a v5 with guc_parameters.dat correctly sorted.
>
> The postgres_fdw/regress tests was also failling due to some whitespace
> problems, v5 also fix this.
>
> [1]
> https://www.postgresql.org/message-id/202510251154.isknefznk566%40alvherre.pgsql
>

I'm still interested in working on this patch, but it didn't get any
review (besides an internal one). I suppose the Append and MergeAppend nodes
need some unification: for example, ExecAppendAsyncEventWait and
ExecMergeAppendAsyncEventWait look the same, and both
classify_matching_subplans() versions are suspiciously similar. But
honestly, the patch needs a thorough review.
--
Best regards,
Alexander Pyhalov,
Postgres Professional


From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Alexander Pyhalov" <a(dot)pyhalov(at)postgrespro(dot)ru>, "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
Cc: "Alena Rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-11-11 21:00:53
Message-ID: DE662JHGRO2O.3KJBBG2R3IT17@gmail.com

On Wed Nov 5, 2025 at 3:30 AM -03, Alexander Pyhalov wrote:
>> So I'm +1 for the idea. I know it's been while since the last patch,
>> and unfortunully it hasn't received reviews since then. Do you still
>> plan to work on it? I still need to take a look on the code to see if
>> I can help with some comments.
>
> I'm still interested in working on this patch, but it didn't get any
> review (besides internal one). I suppose, Append and MergeAppend nodes
> need some unification, for example, ExecAppendAsyncEventWait and
> ExecMergeAppendAsyncEventWait looks the same, both
> classify_matching_subplans() versions are suspiciously similar. But
> honestly, patch needs thorough review.
>
Here are some comments on my first look at the patches. I still don't
have too much experience with the executor code but I hope that I can
help with something.

v5-0001-mark_async_capable-subpath-should-match-subplan.patch

I don't have too many comments on this; perhaps we could have a commit
message explaining the reason behind the change.

----

v5-0002-MergeAppend-should-support-Async-Foreign-Scan-sub.patch

The AppendState struct has the "as_syncdone" field. Is this field not needed
in MergeAppendState?

----
Regarding the duplicated code in classify_matching_subplans, I think that
we can have a more generic function that operates only on its
parameters, something like this:

/*
 * Classify valid subplans into sync and async groups.
 *
 * It calculates the intersection of *valid_subplans and *asyncplans,
 * stores the result in *valid_asyncplans, and removes those members
 * from *valid_subplans (leaving only sync plans).
 *
 * Returns true if valid async plans were found, false otherwise.
 */
static bool
classify_subplans_internal(Bitmapset **valid_subplans,
                           Bitmapset *asyncplans,
                           Bitmapset **valid_asyncplans);
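
To make the intended behaviour concrete, here is a minimal self-contained sketch of such a helper. It is only an illustration under a loud assumption: a 64-bit mask (the made-up PlanMask type) stands in for Bitmapset, and the function name is hypothetical. In the real code the same steps would presumably be expressed with bms_intersect() and bms_del_members().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a 64-bit mask stands in for PostgreSQL's Bitmapset. */
typedef uint64_t PlanMask;

/*
 * Split *valid_subplans into async and sync members.
 * Returns true if at least one valid async plan was found.
 */
static bool
classify_subplans_sketch(PlanMask *valid_subplans,
						 PlanMask asyncplans,
						 PlanMask *valid_asyncplans)
{
	/* Valid async plans are the members present in both sets. */
	*valid_asyncplans = *valid_subplans & asyncplans;

	/* Remove them, leaving only the synchronous plans. */
	*valid_subplans &= ~(*valid_asyncplans);

	return *valid_asyncplans != 0;
}
```

For example, with valid subplans {0, 1, 3} and async-capable subplans {1, 2}, the sketch reports {1} as valid async plans and leaves {0, 3} as sync plans.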

----
GetValidAsyncplans() is not being used.

----
We have some reduction of code coverage on nodeMergeAppend.c. The
significant blocks are on ExecMergeAppendAsyncBegin():
+    /* If we've yet to determine the valid subplans then do so now. */
+    if (!node->ms_valid_subplans_identified)
+    {
+        node->ms_valid_subplans =
+            ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
+        node->ms_valid_subplans_identified = true;
+
+        classify_matching_subplans(node);
+    }

And there are some blocks in ExecReScanMergeAppend(). Is it worth adding
a test case for them? I'm not sure how hard it would be to write a
regression test that covers these blocks.

----
I agree that duplicated code is not good, but it seems to me that we
already have some code in nodeMergeAppend.c borrowed from nodeAppend.c
even without your patch, for example ExecInitMergeAppend(),
ExecReScanMergeAppend() and partially ExecMergeAppend().

Although nodeMergeAppend.c and nodeAppend.c have similar functions,
some differences exist, and I'm wondering if we should wait for the rule
of three [1] before refactoring this duplicated code?

[1] https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)

--
Matheus Alcantara
EDB: http://www.enterprisedb.com


From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-11-15 10:57:04
Message-ID: efbfbcf00b6b790e8f80c13a83417a23@postgrespro.ru

Matheus Alcantara wrote on 2025-11-12 00:00:
> On Wed Nov 5, 2025 at 3:30 AM -03, Alexander Pyhalov wrote:
>>> So I'm +1 for the idea. I know it's been while since the last patch,
>>> and unfortunully it hasn't received reviews since then. Do you still
>>> plan to work on it? I still need to take a look on the code to see if
>>> I can help with some comments.
>>
>> I'm still interested in working on this patch, but it didn't get any
>> review (besides internal one). I suppose, Append and MergeAppend nodes
>> need some unification, for example, ExecAppendAsyncEventWait and
>> ExecMergeAppendAsyncEventWait looks the same, both
>> classify_matching_subplans() versions are suspiciously similar. But
>> honestly, patch needs thorough review.
>>

Hi.
Thanks for review.

> Here are some comments on my first look at the patches. I still don't
> have too much experience with the executor code but I hope that I can
> help with something.
>
> v5-0001-mark_async_capable-subpath-should-match-subplan.patch
>
> I don't have to much comments on this, perhaps we could have a commit
> message explaining the reason behind the change.

I've expanded the commit message. The issue is that mark_async_capable()
relies on the plan node type corresponding to the path type. This is not
true when (Merge)Append decides to insert a Sort node.

>
> ----
>
> v5-0002-MergeAppend-should-support-Async-Foreign-Scan-sub.patch
>
> The AppendState struct has the "as_syncdone", this field is not needed
> on MergeAppendState?

We don't need as_syncdone. Async Append fetches tuples from async subplans
and waits for them either when they have some data or when there are no
more sync subplans (as we can return any tuple we receive from the
subplans). But ExecMergeAppend must decide which tuple to return based
on sort order, so there's no need to remember whether we are done with sync
subplans: subplan ordering matters, and we can't arbitrarily switch
between them.
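
The difference can be illustrated with a toy model. This is not executor code: the struct and both functions are hypothetical, and an int sort key stands in for a tuple. An Append-like node may return the head of any subplan that is ready, while a MergeAppend-like node cannot pick a tuple until the head of every unfinished subplan is known.

```c
#include <assert.h>
#include <stdbool.h>

#define NSUB 3					/* hypothetical number of subplans */

/* Hypothetical view of one subplan's next tuple. */
typedef struct SubplanHead
{
	bool		exhausted;		/* subplan has no more tuples */
	bool		ready;			/* head tuple has already arrived */
	int			key;			/* sort key of the head tuple */
} SubplanHead;

/* Append-like choice: any ready subplan will do. Returns index or -1. */
static int
append_pick(const SubplanHead *h, int n)
{
	for (int i = 0; i < n; i++)
		if (!h[i].exhausted && h[i].ready)
			return i;
	return -1;
}

/*
 * MergeAppend-like choice: the smallest key wins, but only once every
 * unfinished subplan has a head tuple. Returns -1 if we still have to
 * wait for some subplan (or if all subplans are exhausted).
 */
static int
merge_pick(const SubplanHead *h, int n)
{
	int			best = -1;

	for (int i = 0; i < n; i++)
	{
		if (h[i].exhausted)
			continue;
		if (!h[i].ready)
			return -1;			/* must wait for this particular subplan */
		if (best < 0 || h[i].key < h[best].key)
			best = i;
	}
	return best;
}
```

In this model there is nothing like "fall back to sync subplans while async ones are busy", which is the situation as_syncdone tracks in Append.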

> Regarding the duplicated code on classify_matching_subplans I think
> that
> we can have a more generic function that operates over function
> parameters

I've tried to do so, but there are two issues. First, there's no suitable
common header between nodeAppend and nodeMergeAppend. I've put
classify_matching_subplans_common() into src/include/nodes/execnodes.h,
and I'm sure that it's not the best choice. The second issue is with
as_syncdone: it exists only in AppendState, so we have to check for empty
valid_subplans separately. In fact, there are 3 outcomes for Append:
1) no sync plans and no async plans, 2) no async plans, 3) async plans
present. Only the last two states are meaningful for MergeAppend.

> The GetValidAsyncplans() is not being used

The same goes for GetValidAsyncRequest(). These parts are used by our FDW,
so they slipped into the patch. I've removed them.

>
> ----
> We have some reduction of code coverage on nodeMergeAppend.c. The
> significant blocks are on ExecMergeAppendAsyncBegin():
> + /* If we've yet to determine the valid subplans then do so now. */
> + if (!node->ms_valid_subplans_identified)
> + {
> + node->ms_valid_subplans =
> + ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
> + node->ms_valid_subplans_identified = true;
> +
> + classify_matching_subplans(node);
> + }
>
> And there are some blocks on ExecReScanMergeAppend(). It's worth adding
> a test case for then? I'm not sure how hard would be to write a
> regression test that cover these blocks.
>

You are right. There's a difference between ExecAppend and
ExecMergeAppend. Append identifies valid subplans in
ExecAppendAsyncBegin(); MergeAppend does it earlier, in ExecMergeAppend().
So this really is dead code. There was also an issue with it, which
became evident when I added a test for rescan: when we identify
new subplans in ExecMergeAppend(), we have to classify them.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
v6-0001-mark_async_capable-subpath-should-match-subplan.patch text/x-diff 2.2 KB
v6-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch text/x-diff 49.8 KB

From: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-11-17 21:09:19
Message-ID: CAFY6G8d3Yvxa_kRQA24BsJhwqfmSCv1ujiv_7b6g5isf-ZTs_Q@mail.gmail.com

On Sat Nov 15, 2025 at 7:57 AM -03, Alexander Pyhalov wrote:
>> Here are some comments on my first look at the patches. I still don't
>> have too much experience with the executor code but I hope that I can
>> help with something.
>>
>> v5-0001-mark_async_capable-subpath-should-match-subplan.patch
>>
>> I don't have to much comments on this, perhaps we could have a commit
>> message explaining the reason behind the change.
>
> I've expanded commit message. The issue is that mark_async_capable()
> relies
> on the fact that plan node type corresponds to path type. This is not
> true when
> (Merge)Append decides to insert Sort node.
>
Your explanation of why this change is needed, which you included in
your first email, sounds clearer IMHO. I would suggest the following
for a commit message:

    mark_async_capable() believes that path corresponds to plan. This is
    not true when create_[merge_]append_plan() inserts sort node. In
    this case mark_async_capable() can treat Sort plan node as some
    other and crash. Fix this by handling the Sort node separately.

    This is needed to make MergeAppend node async-capable, which will
    be implemented in the next commit.

What do you think?

I was reading the patch changes again and I have a minor point:

             SubqueryScan *scan_plan = (SubqueryScan *) plan;

             /*
-             * If the generated plan node includes a gating Result node,
-             * we can't execute it asynchronously.
+             * If the generated plan node includes a gating Result node or
+             * a Sort node, we can't execute it asynchronously.
              */
-            if (IsA(plan, Result))
+            if (IsA(plan, Result) || IsA(plan, Sort))

Shouldn't we cast the plan to SubqueryScan* after the IsA(...) check? I
know this code predates your changes, but type casting before an
IsA() check looks a bit strange to me. Also, perhaps we could add an
Assert(IsA(plan, SubqueryScan)) after the IsA(...) check and before the
type casting, just for sanity?
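
A minimal self-contained sketch of the ordering I have in mind. The node tags, structs and the IS_A macro below are mock stand-ins rather than the real PostgreSQL definitions, and the final return value is just a placeholder for whatever check the real function performs on the subplan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a tiny mock of node tagging, for illustration only. */
typedef enum NodeTagSketch
{
	T_ResultSk,
	T_SortSk,
	T_SubqueryScanSk
} NodeTagSketch;

typedef struct PlanSketch
{
	NodeTagSketch type;
} PlanSketch;

typedef struct SubqueryScanSketch
{
	PlanSketch	plan;			/* must be first so the cast is valid */
	PlanSketch *subplan;
} SubqueryScanSketch;

#define IS_A(node, tag) ((node)->type == (tag))

static bool
subquery_scan_async_ok(PlanSketch *plan)
{
	SubqueryScanSketch *scan_plan;

	/* A gating Result or an inserted Sort can't be executed asynchronously. */
	if (IS_A(plan, T_ResultSk) || IS_A(plan, T_SortSk))
		return false;

	/* Only now is it safe to treat the node as a SubqueryScan. */
	assert(IS_A(plan, T_SubqueryScanSk));
	scan_plan = (SubqueryScanSketch *) plan;

	/* Placeholder for the real check on the subplan. */
	return scan_plan->subplan != NULL;
}
```

This way the pointer is never interpreted as a SubqueryScan unless the tag says it is one.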

>> ----
>>
>> v5-0002-MergeAppend-should-support-Async-Foreign-Scan-sub.patch
>>
>> The AppendState struct has the "as_syncdone", this field is not needed
>> on MergeAppendState?
>
> We don't need as_syncdone. Async Append fetches tuples from async subplans
> and waits for them either when they have some data or when there are no
> more sync subplans (as we can return any tuple we receive from
> subplans). But ExecMergeAppend should decide which tuple to return based
> on sort order, so there's no need to remember if we are done with sync
> subplans, as subplan ordering matters, and we can't arbitrarily switch
> between them.
>
Ok, thanks for the explanation.
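
A minimal sketch of the point above, assuming plain sorted int arrays
stand in for subplans (SortedInput, merge_pick and merge_all are
hypothetical names, not executor code; the real MergeAppend uses a binary
heap rather than a linear scan). It shows why a merge has to see the head
tuple of every live input before it can return anything, so "sync
subplans are done" is not a useful state to track:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a subplan: a sorted array plus a cursor. */
typedef struct SortedInput
{
    const int  *vals;
    size_t      len;
    size_t      pos;
} SortedInput;

/*
 * Return the index of the input whose head value is smallest, or -1 when
 * every input is exhausted.  Each live input must expose its head before
 * a choice can be made, so the merge cannot simply hand back whichever
 * input happens to be ready first.
 */
static int
merge_pick(const SortedInput *in, int n)
{
    int         best = -1;

    for (int i = 0; i < n; i++)
    {
        if (in[i].pos >= in[i].len)
            continue;           /* this input is exhausted */
        if (best < 0 || in[i].vals[in[i].pos] < in[best].vals[in[best].pos])
            best = i;
    }
    return best;
}

/* Merge all inputs into out[]; returns the number of values written. */
static size_t
merge_all(SortedInput *in, int n, int *out, size_t outcap)
{
    size_t      nout = 0;
    int         i;

    while ((i = merge_pick(in, n)) >= 0)
    {
        assert(nout < outcap);
        out[nout++] = in[i].vals[in[i].pos++];
    }
    return nout;
}
```

With inputs {1, 4, 7} and {2, 3, 9}, merge_all() has to alternate between
the two inputs in an order dictated purely by the values, not by which
input delivered data first.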

>
>> Regarding the duplicated code on classify_matching_subplans I think
>> that
>> we can have a more generic function that operates over function
>> parameters
>
> I've tried to do so, but there are two issues. There's no suitable
> common header between
> nodeAppend and nodeMergeAppend. I've put
> classify_matching_subplans_common() into src/include/nodes/execnodes.h
> and I'm sure that it's not the best choice. The second issue is with
> as_syncdone: it exists only in AppendState, so
> we should check for empty valid_subplans separately. In fact, there are 3
> outcomes for Append: 1) no sync plans,
> no async plans, 2) no async plans, 3) async plans present, and only
> the last two states have meaning
> for MergeAppend.
>
I think it's ok to have these separate checks in nodeAppend.c and
nodeMergeAppend.c as long as the majority of the duplicated steps
are centralized into a single reusable function.

I also agree that execnodes.h may not be the best place to declare this
function but I also don't have too many ideas of where to put it. Let's
see if we have more comments on this.

>> ----
>> We have some reduction of code coverage on nodeMergeAppend.c. The
>> significant blocks are on ExecMergeAppendAsyncBegin():
>> + /* If we've yet to determine the valid subplans then do so now. */
>> + if (!node->ms_valid_subplans_identified)
>> + {
>> + node->ms_valid_subplans =
>> + ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
>> + node->ms_valid_subplans_identified = true;
>> +
>> + classify_matching_subplans(node);
>> + }
>>
>> And there are some blocks in ExecReScanMergeAppend(). Is it worth adding
>> a test case for them? I'm not sure how hard it would be to write a
>> regression test that covers these blocks.
>>
>
> You are right. There's a difference between ExecAppend and
> ExecMergeAppend. Append identifies valid subplans in
> ExecAppendAsyncBegin. MergeAppend does so earlier, in ExecMergeAppend(). So
> this really is dead code. And there was an issue with it, which
> became evident when I added a test for rescan. When we've identified
> new subplans in ExecMergeAppend(), we have to classify them.
>
Thanks, the code coverage looks better now.

I plan to do another round of review on 0002; in the meantime I'm
sharing these comments.

--
Matheus Alcantara
EDB: http://www.enterprisedb.com


From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-11-18 07:14:23
Message-ID: b546d8bd5e2218f4f917fda4c93ead21@postgrespro.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi.

Matheus Alcantara wrote on 2025-11-18 00:09:
> On Sat Nov 15, 2025 at 7:57 AM -03, Alexander Pyhalov wrote:
>>> Here are some comments on my first look at the patches. I still don't
>>> have too much experience with the executor code but I hope that I can
>>> help with something.
>>>
>>> v5-0001-mark_async_capable-subpath-should-match-subplan.patch
>>>
>>> I don't have too many comments on this; perhaps we could have a commit
>>> message explaining the reason behind the change.
>>
>> I've expanded commit message. The issue is that mark_async_capable()
>> relies
>> on the fact that plan node type corresponds to path type. This is not
>> true when
>> (Merge)Append decides to insert Sort node.
>>
> Your explanation about why this change is needed that you've include on
> your first email sounds more clear IMHO. I would suggest the following
> for a commit message:
> mark_async_capable() believes that path corresponds to plan. This
> is
> not true when create_[merge_]append_plan() inserts sort node. In
> this case mark_async_capable() can treat Sort plan node as some
> other and crash. Fix this by handling the Sort node separately.
>
> This is needed to make MergeAppend node async-capable that it will
> be implemented in a next commit.
>
> What do you think?
>

Seems to be OK.

> I was reading the patch changes again and I have a minor point:
>
> SubqueryScan *scan_plan = (SubqueryScan *) plan;
>
> /*
> - * If the generated plan node includes a gating Result node,
> - * we can't execute it asynchronously.
> + * If the generated plan node includes a gating Result node or
> + * a Sort node, we can't execute it asynchronously.
> */
> - if (IsA(plan, Result))
> + if (IsA(plan, Result) || IsA(plan, Sort))
>
> Shouldn't we cast the plan to SubqueryScan* after the IsA(...) check? I
> know this code has been before your changes but type casting before a
> IsA() check sounds a bit strange to me. Also perhaps we could add an
> Assert(IsA(plan, SubqueryScan)) after the IsA(...) check and before the
> type casting just for sanity?

Yes, checking that a node is not of type A and then using it as type B
does look strange. But casting to another type first and then checking
whether the node is of a particular type before using it is rather
common. It doesn't do any harm as long as we don't actually refer to
SubqueryScan fields.
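
A minimal sketch of that pattern, assuming simplified stand-ins for the
node machinery (DemoPlan, DemoSubqueryScan and DemoIsA are hypothetical
and only mimic the tagged-struct convention; they are not the PostgreSQL
definitions). The early cast only changes the pointer's static type;
nothing is read through it until the tag checks have passed:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum DemoTag
{
    T_DemoResult,
    T_DemoSort,
    T_DemoSubqueryScan
} DemoTag;

/* Every node starts with its tag, so any node can be viewed as DemoPlan. */
typedef struct DemoPlan
{
    DemoTag     type;
} DemoPlan;

typedef struct DemoSubqueryScan
{
    DemoPlan    plan;           /* must be first */
    int         subplan_id;
} DemoSubqueryScan;

#define DemoIsA(ptr, tag) (((const DemoPlan *) (ptr))->type == T_##tag)

static bool
demo_subquery_is_async_capable(DemoPlan *plan)
{
    /* Early cast: no field of scan_plan is touched yet. */
    DemoSubqueryScan *scan_plan = (DemoSubqueryScan *) plan;

    /* A gating Result or an inserted Sort cannot run asynchronously. */
    if (DemoIsA(plan, DemoResult) || DemoIsA(plan, DemoSort))
        return false;

    /* The sanity check suggested in the review. */
    assert(DemoIsA(plan, DemoSubqueryScan));

    /* Only now is it safe to read a DemoSubqueryScan field. */
    return scan_plan->subplan_id >= 0;
}
```

Passing a node tagged T_DemoSort returns false before scan_plan is ever
dereferenced, which is the situation described above.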

Updated the first patch.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
v7-0001-mark_async_capable-subpath-should-match-subplan.patch text/x-diff 2.4 KB
v7-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch text/x-diff 49.8 KB

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Alexander Pyhalov" <a(dot)pyhalov(at)postgrespro(dot)ru>, "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
Cc: "Alena Rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-11-19 21:51:35
Message-ID: DED05POJZS2W.2EZ60AOBMDDAE@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Tue Nov 18, 2025 at 4:14 AM -03, Alexander Pyhalov wrote:
> Updated the first patch.
>
Thanks for the new version. Some new comments.

v7-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch:

1. Should this be "nasyncplans" instead of "nplans"? ExecInitAppend uses
"nasyncplans" to allocate the as_asyncresults array.

+ mergestate->ms_asyncresults = (TupleTableSlot **)
+ palloc0(nplans * sizeof(TupleTableSlot *));
+

2. I think that this comment should be updated. IIUC ms_valid_subplans
may not be all subplans because classify_matching_subplans() may move
async ones to ms_valid_asyncplans. Is that right?

/*
* If we've yet to determine the valid subplans then do so now. If
* run-time pruning is disabled then the valid subplans will always be
* set to all subplans.
*/

3. Should this code comment also mention the Assert(!bms_is_member(...))?

+ /* The result should be a TupleTableSlot or NULL. */
+ Assert(slot == NULL || IsA(slot, TupleTableSlot));
+ Assert(!bms_is_member(areq->request_index, node->ms_has_asyncresults));

4. Is it worth including a bms_num_members(node->ms_needrequest) <= 0 check
in ExecMergeAppendAsyncRequest() as an early return? IIUC it would avoid
the bms_is_member(), bms_next_member() and bms_is_empty(needrequest)
calls.

ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
Bitmapset *needrequest;
int i;

+ if (bms_num_members(node->ms_needrequest) <= 0)
+ return false;
+

I'm attaching a diff with some cosmetic changes to indentation and
comments. Feel free to include it in the patch or not.

--
Matheus Alcantara
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
diff.txt text/plain 2.6 KB

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2025-11-20 14:22:32
Message-ID: bcdefce7e3db9566d0619ad2eed2a299@postgrespro.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Matheus Alcantara wrote on 2025-11-20 00:51:
> On Tue Nov 18, 2025 at 4:14 AM -03, Alexander Pyhalov wrote:
>> Updated the first patch.
>>
> Thanks for the new version. Some new comments.
>
> v7-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch:
>
> 1. Should be "nasyncplans" instead of "nplans"? ExecInitAppend use
> "nasyncplans" to allocate the as_asyncresults array.
>
> + mergestate->ms_asyncresults = (TupleTableSlot **)
> + palloc0(nplans * sizeof(TupleTableSlot *));
> +
>

No. There's a difference between how Append and MergeAppend handle async
results.

When Append looks for the next result, it can return any of them.
So async results are not ordered in Append.
We have at most nasyncplans async results and return the first
available result
when we are asked for one. For example, in ExecAppendAsyncRequest():

    /*
     * If there are any asynchronously-generated results that have not yet
     * been returned, we have nothing to do; just return one of them.
     */
    if (node->as_nasyncresults > 0)
    {
        --node->as_nasyncresults;
        *result = node->as_asyncresults[node->as_nasyncresults];
        return true;
    }

ExecAppendAsyncGetNext() looks (via ExecAppendAsyncRequest()) for any
result.

However, when we are asked for a result in MergeAppend, we should return
the result of
the specific subplan. To achieve this, we need to know which subplan
a given result corresponds to.
So we index async results in the same way as requests (or
ms_valid_asyncplans).
Look at ExecAppendAsyncGetNext()/ExecAppendAsyncRequest().
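
A minimal sketch of the difference, using hypothetical Demo* types rather
than the real AppendState/MergeAppendState. The Append-style cache is an
anonymous stack, so it needs at most one entry per async subplan; the
MergeAppend-style cache is addressed by subplan number, so it is sized by
the total number of subplans:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NPLANS 4           /* total number of subplans (assumed) */

typedef struct DemoTuple
{
    int         value;
} DemoTuple;

/* Append-style: any cached result may be returned, so a stack suffices. */
typedef struct DemoAppendResults
{
    DemoTuple  *results[DEMO_NPLANS];
    int         nresults;
} DemoAppendResults;

static void
demo_append_push(DemoAppendResults *a, DemoTuple *t)
{
    assert(a->nresults < DEMO_NPLANS);
    a->results[a->nresults++] = t;
}

static DemoTuple *
demo_append_pop(DemoAppendResults *a)
{
    if (a->nresults == 0)
        return NULL;
    return a->results[--a->nresults];
}

/* MergeAppend-style: one slot per subplan, indexed by subplan number. */
typedef struct DemoMergeResults
{
    DemoTuple  *results[DEMO_NPLANS];
    bool        has_result[DEMO_NPLANS];
} DemoMergeResults;

static void
demo_merge_store(DemoMergeResults *m, int subplan, DemoTuple *t)
{
    assert(subplan >= 0 && subplan < DEMO_NPLANS);
    assert(!m->has_result[subplan]);    /* at most one cached result */
    m->results[subplan] = t;
    m->has_result[subplan] = true;
}

static DemoTuple *
demo_merge_take(DemoMergeResults *m, int subplan)
{
    assert(subplan >= 0 && subplan < DEMO_NPLANS);
    if (!m->has_result[subplan])
        return NULL;
    m->has_result[subplan] = false;
    return m->results[subplan];
}
```

If results arrive for subplans 3 and 1, demo_append_pop() hands back
whichever was pushed last, while demo_merge_take(m, 1) returns exactly the
tuple that belongs to subplan 1.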

> 2. I think that this comment should be updated. IIUC ms_valid_subplans
> may not be all subplans because classify_matching_subplans() may move
> async ones to ms_valid_asyncplans. Is that right?
>
> /*
> * If we've yet to determine the valid subplans then do so now. If
> * run-time pruning is disabled then the valid subplans will always be
> * set to all subplans.
> */
>

Yes, you are correct, and the similar comment in nodeAppend.c lacks the
last sentence.
Removed it.

> 3. Should this code comment also mention the
> Assert(!bms_is_member(...))?
>
> + /* The result should be a TupleTableSlot or NULL. */
> + Assert(slot == NULL || IsA(slot, TupleTableSlot));
> + Assert(!bms_is_member(areq->request_index, node->ms_has_asyncresults));
>

> 4. Is it worth including a bms_num_members(node->ms_needrequest) <= 0 check
> in ExecMergeAppendAsyncRequest() as an early return? IIUC it would
> avoid
> the bms_is_member(), bms_next_member() and bms_is_empty(needrequest)
> calls.

We can't skip the first bms_is_member(): node->ms_needrequest can
be empty
because we've already got the result, in which case we don't need to
issue a request and just return the previously fetched result.

I'm not sure about adding the check above the following lines:

    i = -1;
    while ((i = bms_next_member(node->ms_needrequest, i)) >= 0)
    {
        if (!bms_is_member(i, node->ms_has_asyncresults))
            needrequest = bms_add_member(needrequest, i);
    }

I don't think the early check would be much cheaper: on an empty set
bms_next_member() executes
a couple of instructions to find out that there is nothing to scan
and does nothing expensive.
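
A rough sketch of that loop, assuming a single 32-bit word in place of a
real Bitmapset (demo_next_member and demo_collect_requests are
hypothetical helpers, not the bitmapset.c API, and the -1 return value is
this sketch's own convention). On an empty set the first lookup already
returns a negative value, so the loop body never runs:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the smallest member greater than prev, or -1 if there is none.
 * An empty set is detected immediately.
 */
static int
demo_next_member(uint32_t set, int prev)
{
    if (set == 0)
        return -1;
    for (int i = prev + 1; i < 32; i++)
    {
        if (set & (UINT32_C(1) << i))
            return i;
    }
    return -1;
}

/* Collect subplans that need a request and have no cached result yet. */
static uint32_t
demo_collect_requests(uint32_t needrequest, uint32_t has_results)
{
    uint32_t    out = 0;
    int         i = -1;

    while ((i = demo_next_member(needrequest, i)) >= 0)
    {
        if (!(has_results & (UINT32_C(1) << i)))
            out |= UINT32_C(1) << i;
    }
    return out;
}
```

Calling demo_collect_requests(0, has_results) costs one failed lookup, so
a separate emptiness test in front of the loop would save very little.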

>
> ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
> Bitmapset *needrequest;
> int i;
>
> + if (bms_num_members(node->ms_needrequest) <= 0)
> + return false;
> +
>

No, as I've mentioned, we can't skip the bms_is_member(mplan,
node->ms_has_asyncresults) check.
We could have received a result while waiting for data for another
subplan.

Let's assume we have 2 async subplans (0 and 1) and have
decided
to get data from subplan 1. We've already sent requests to both async
subplans (in ExecMergeAppendAsyncBegin() or later).
Now we call ExecMergeAppendAsyncRequest(), but there are no subplans that
need a request,
so we enter the waiting loop. Suppose we get an event for the other
subplan (0). Via ExecAsyncNotify() and
ExecAsyncForeignScanNotify() we reach postgresForeignAsyncNotify(),
fetch data and mark the request as complete.
Via ExecAsyncMergeAppendResponse() we save the result for subplan 0 in
node->ms_asyncresults[0]. When we finally
get the result for subplan 1, we do the same, but now exit the loop. When
MergeAppend later decides that it needs
a result from subplan 0, we already have it, but ms_needrequest is
empty, so ExecMergeAppendAsyncRequest()
just returns this pre-fetched tuple.
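
The scenario can be replayed with a small simulation. This is only a
sketch under stated assumptions: DemoMergeState, demo_wait_for_event,
demo_request and demo_get_next are hypothetical, bitmaps are plain 32-bit
words, a "tuple" is an int, and the event order is scripted instead of
coming from a wait event set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NPLANS 2

typedef struct DemoMergeState
{
    uint32_t    needrequest;    /* subplans that still need a request */
    uint32_t    has_results;    /* subplans with a cached result */
    int         results[DEMO_NPLANS];
    const int  *events;         /* scripted order of arriving responses */
    int         nevents;
    int         next_event;
} DemoMergeState;

/* Consume one scripted event and cache the result under its subplan. */
static bool
demo_wait_for_event(DemoMergeState *st)
{
    int         sub;

    if (st->next_event >= st->nevents)
        return false;
    sub = st->events[st->next_event++];
    st->results[sub] = 100 + sub;   /* pretend tuple */
    st->has_results |= UINT32_C(1) << sub;
    return true;
}

/* True if a result for mplan is available; checked before needrequest. */
static bool
demo_request(DemoMergeState *st, int mplan, int *result)
{
    uint32_t    bit = UINT32_C(1) << mplan;

    if (st->has_results & bit)
    {
        st->has_results &= ~bit;
        *result = st->results[mplan];
        return true;
    }
    /* Real code would issue requests for needrequest members here. */
    st->needrequest = 0;
    return false;
}

/* Loop until the requested subplan has produced a result. */
static bool
demo_get_next(DemoMergeState *st, int mplan, int *result)
{
    while (!demo_request(st, mplan, result))
    {
        if (!demo_wait_for_event(st))
            return false;
    }
    return true;
}
```

With events arriving in the order 0, 1 and needrequest already empty,
asking for subplan 1 consumes both events; asking for subplan 0 afterwards
returns the cached value without waiting. An early "needrequest is empty,
return false" exit placed before the cached-result check would miss it.
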
--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
v8-0001-mark_async_capable-subpath-should-match-subplan.patch text/x-diff 2.4 KB
v8-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch text/x-diff 50.3 KB