
I have a table in Postgres 14 that is used as a queue, consumed with FOR UPDATE SKIP LOCKED queries. It was missing an index, but since rows are usually processed quickly, that never became a problem. Until it suddenly did: after a surge of new work, the query slowed the entire database down. I have now added the index and all is well, but I'd like to understand what really happened.

A simplified version of the table looks like this:

CREATE TABLE outgoing(
    id int PRIMARY KEY,
    created_at timestamptz NOT NULL,
    some_data text NOT NULL
);

It is periodically queried like this:

SELECT * 
FROM outgoing
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;
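
For context, a consumer typically wraps this in a transaction, processes the returned rows, and removes them before committing. Roughly like this (a simplified sketch; the ids in the DELETE stand in for whatever was actually processed):

BEGIN;

SELECT *
FROM outgoing
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

-- ... process the returned rows in the application ...

DELETE FROM outgoing WHERE id IN (1, 2, 3);  -- placeholder ids

COMMIT;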

When the table suddenly grew to around 500,000 rows, the above query started to take about 30 seconds to execute. After some debugging, I found that the sorting was the culprit. With the FOR UPDATE clause, it would choose Sort Method: external merge. If I removed FOR UPDATE from the query, it changed to the much faster and less resource-hungry (and also expected) Sort Method: top-N heapsort. Why does the presence of FOR UPDATE change the sort method?

The behaviour can be reproduced with the above table definition and query, plus the following block to add some rows:

do $$
begin
    -- each row's created_at lies one more minute in the past
    for i in 1..500000 loop
        insert into outgoing values (i, now() - make_interval(0, 0, 0, 0, 0, i), i::text);
    end loop;
end; $$;
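
As an aside, the same test data can be loaded considerably faster with a single set-based statement. This is equivalent to the loop above; make_interval's named-argument form replaces the positional zeros:

INSERT INTO outgoing
SELECT i, now() - make_interval(mins => i), i::text
FROM generate_series(1, 500000) AS i;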

Plan without FOR UPDATE:

EXPLAIN (ANALYZE, BUFFERS) SELECT * 
FROM outgoing
ORDER BY created_at
LIMIT 100;

Limit  (cost=14230.69..14242.36 rows=100 width=18) (actual time=80.188..85.939 rows=100 loops=1)
  Buffers: shared hit=3305 dirtied=1
  ->  Gather Merge  (cost=14230.69..62845.12 rows=416666 width=18) (actual time=80.186..85.929 rows=100 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=3305 dirtied=1
        ->  Sort  (cost=13230.67..13751.50 rows=208333 width=18) (actual time=25.455..25.459 rows=67 loops=3)
              Sort Key: created_at
              Sort Method: top-N heapsort  Memory: 40kB
              Buffers: shared hit=3305 dirtied=1
              Worker 0:  Sort Method: top-N heapsort  Memory: 40kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Parallel Seq Scan on outgoing  (cost=0.00..5268.33 rows=208333 width=18) (actual time=0.007..7.269 rows=166667 loops=3)
                    Buffers: shared hit=3185 dirtied=1
Planning Time: 0.097 ms
Execution Time: 85.972 ms

Plan with FOR UPDATE:

EXPLAIN (ANALYZE, BUFFERS) SELECT * 
FROM outgoing
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

Limit  (cost=27294.64..27295.89 rows=100 width=24) (actual time=163.076..163.107 rows=100 loops=1)
  Buffers: shared hit=3285, temp read=494 written=2441
  ->  LockRows  (cost=27294.64..33544.64 rows=500000 width=24) (actual time=163.075..163.101 rows=100 loops=1)
        Buffers: shared hit=3285, temp read=494 written=2441
        ->  Sort  (cost=27294.64..28544.64 rows=500000 width=24) (actual time=163.053..163.058 rows=100 loops=1)
              Sort Key: created_at
              Sort Method: external merge  Disk: 19432kB
              Buffers: shared hit=3185, temp read=494 written=2441
              ->  Seq Scan on outgoing  (cost=0.00..8185.00 rows=500000 width=24) (actual time=0.010..29.042 rows=500000 loops=1)
                    Buffers: shared hit=3185
Planning Time: 0.073 ms
Execution Time: 166.365 ms

2 Answers


The reason for the difference is that data-modifying queries cannot use parallel query, and locking rows modifies the data. With two parallel workers, each of the three cooperating processes (the leader plus two workers) only has to sort about a third of the rows, which can be done in the available memory, as configured with work_mem. Without parallel query, work_mem is not sufficient, and the sort has to spill to disk.

The solution is to increase work_mem.
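
For example (64MB is just an illustrative value; the sort above spilled about 19 MB to disk, and the in-memory representation needs somewhat more than that):

SET work_mem = '64MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM outgoing
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

With enough memory, the Sort node should report an in-memory quicksort instead of an external merge. Note that work_mem is allotted per sort operation, not per connection, so a generous global setting can multiply across concurrent queries; raising it per session (as above) or per transaction is often safer.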


Comments

Thank you! However, shouldn't a top-N sort use very little memory (you only need to keep 100 items in your head)? The explain plan above says 40 kB (+25 kB for the other worker?) - that would fit nicely in the default 4 MB of work_mem. Why couldn't you still choose a top-N sort, just without parallelism?
I am not sure if your understanding of a top-N heapsort is correct. Note that in your first query, the top-N heapsort used more memory than the quicksort. You could read up on the algorithm in src/backend/utils/sort/tuplesort.c.
Thanks for pointing me to the code! Now I've read the relevant parts instead of just guessing what it would do. Unfortunately, I still don't see why we shouldn't use a top-N heapsort. tuplesort_puttuple_common switches to bounded mode when we have twice LIMIT (200) tuples in the buffer, which should fit. In bounded mode the buffer (now a heap) no longer grows; either the new tuple is discarded, or it replaces an existing one. That should mean that we never need more memory than what can hold 200 rows. What am I missing?
I don't know why PostgreSQL doesn't choose a top-N heapsort there. You could step through the function with a debugger - that should give you a clue.
You should subscribe to the pgsql-hackers mailing list and send your feature request to them. Describe your reasoning.

In addition to the explanation from @LaurenzAlbe, consider adding an index on the created_at column.

You can avoid the sort operation entirely with an index on created_at:

create index ix_created_at_outgoung on outgoing(created_at);

select count(*) from outgoing;

 count
-------
 50000
EXPLAIN ANALYZE
SELECT *
FROM outgoing
ORDER BY created_at
LIMIT 100;

QUERY PLAN

Limit  (cost=0.29..8.07 rows=100 width=44) (actual time=0.268..0.298 rows=100 loops=1)
  ->  Index Scan using ix_created_at_outgoung on outgoing  (cost=0.29..2804.99 rows=36047 width=44) (actual time=0.129..0.151 rows=100 loops=1)
Planning Time: 0.709 ms
Execution Time: 0.766 ms
EXPLAIN ANALYZE
SELECT *
FROM outgoing
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

QUERY PLAN

Limit  (cost=0.29..9.07 rows=100 width=50) (actual time=0.046..0.156 rows=100 loops=1)
  ->  LockRows  (cost=0.29..3165.47 rows=36047 width=50) (actual time=0.045..0.146 rows=100 loops=1)
        ->  Index Scan using ix_created_at_outgoung on outgoing  (cost=0.29..2804.99 rows=36047 width=50) (actual time=0.033..0.067 rows=100 loops=1)
Planning Time: 0.125 ms
Execution Time: 0.183 ms


Without the index:

QUERY PLAN

Limit  (cost=2057.16..2058.41 rows=100 width=50) (actual time=41.817..41.921 rows=100 loops=1)
  ->  LockRows  (cost=2057.16..2507.75 rows=36047 width=50) (actual time=41.816..41.910 rows=100 loops=1)
        ->  Sort  (cost=2057.16..2147.28 rows=36047 width=50) (actual time=41.793..41.810 rows=100 loops=1)
              Sort Key: created_at
              Sort Method: external merge  Disk: 1864kB
              ->  Seq Scan on outgoing  (cost=0.00..679.47 rows=36047 width=50) (actual time=0.013..8.923 rows=50000 loops=1)
Planning Time: 0.115 ms
Execution Time: 42.521 ms

2 Comments

Thank you for taking the time to answer my question! You are right, of course, and your answer will surely be helpful to many. What interested me personally the most was the change in sorting behaviour that I saw without the index.
Yes, your question was answered by Laurenz Albe. I just wanted to see how the index changes this behavior.
