PostgreSQL update performance with a select

Question

"Name" is a table with more or less 1 million rows. I've tried this request but it never ends. Is there an issue to avoid the "in" ?

update name 
   set name_val = true 
where name_pk in (select max (name_pk) 
                  from name 
                  group by foreign_key_pk);

I'm not against triggers if it's necessary.

Query plan :

"Nested Loop  (cost=26073.59..26310.38 rows=200 width=54)"
"  ->  HashAggregate  (cost=26073.59..26075.59 rows=200 width=4)"
"        ->  HashAggregate  (cost=23122.82..24598.20 rows=118031 width=12)"
"              ->  Seq Scan on name  (cost=0.00..19956.21 rows=633321 width=12)"
"  ->  Index Scan using name_pk on name  (cost=0.00..1.16 rows=1 width=54)"
"        Index Cond: (public.name.name_pk = (max(public.name.name_pk)))"

2 indexes :

CREATE INDEX link_name_foreign_key_pk
  ON name
  USING btree
  (foreign_key_pk);

CREATE UNIQUE INDEX name_pk
  ON name
  USING btree
  (name_pk);

Thanks.

What does the execution plan say? Which indexes are defined on the table? — user330315
– user330315, Commented Jun 10, 2013 at 15:02
For the note, what does name_pk represent, and could the query be rewritten like this? postgres.cz/wiki/… — Denis de Bernardy
– Denis de Bernardy, Commented Jun 10, 2013 at 15:23
Try an index on (foreign_key_pk, name_pk) especially with Postgres 9.2 this might help. — user330315
– user330315, Commented Jun 10, 2013 at 15:32
Please remember to always include a table definition and your version of Postgres for questions like this. For starters, it would have helped to see NOT NULL constraints ... — Erwin Brandstetter
– Erwin Brandstetter, Commented Jun 10, 2013 at 16:34

Community · Accepted Answer · 2017-05-23 11:57:24Z

3

Create a multi-column index like this (much like @a_horse already suggested in the comment):

CREATE INDEX name_foo_id ON name (foreign_key_pk, name_pk DESC)

DESC makes is only slightly faster. Postgres can scan indexes backwards almost as fast. But it might get tricky with multi-column indexes.

And use this alternative syntax for the UPDATE:

UPDATE name n
SET    name_val = TRUE
FROM  (
    SELECT max(name_pk) AS max_pk
    FROM   name 
    GROUP  BY foreign_key_pk
  ) x
WHERE n.name_pk = x.max_pk
AND   name_val IS DISTINCT FROM TRUE;

IN tends to be the slowest possible solution for bigger sets. A JOIN should be faster.

The additional WHERE clause AND name_val IS DISTINCT FROM TRUE avoids (expensive) empty updates.

An anti-semi-join with NOT EXISTS might be a contender for the performance crown, too:

UPDATE name n
SET    name_val = TRUE
WHERE  NOT EXISTS (
   SELECT 1
   FROM   name
   WHERE  foreign_key_pk = n.foreign_key_pk
   AND    name_pk > n.name_pk
   )
AND    name_val IS DISTINCT FROM TRUE;

edited May 23, 2017 at 11:57

CommunityBot

11 silver badge

answered Jun 10, 2013 at 16:22

Erwin Brandstetter

669k160 gold badges1.2k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

saka Over a year ago

Good job, it works perfectly ! For information, the second request is 20% faster. Thanks.

Collectives™ on Stack Overflow

PostgreSQL update performance with a select

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related