1

"Name" is a table with more or less 1 million rows. I've tried this request but it never ends. Is there an issue to avoid the "in" ?

update name 
   set name_val = true 
where name_pk in (select max (name_pk) 
                  from name 
                  group by foreign_key_pk);

I'm not against triggers if it's necessary.

Query plan :

"Nested Loop  (cost=26073.59..26310.38 rows=200 width=54)"
"  ->  HashAggregate  (cost=26073.59..26075.59 rows=200 width=4)"
"        ->  HashAggregate  (cost=23122.82..24598.20 rows=118031 width=12)"
"              ->  Seq Scan on name  (cost=0.00..19956.21 rows=633321 width=12)"
"  ->  Index Scan using name_pk on name  (cost=0.00..1.16 rows=1 width=54)"
"        Index Cond: (public.name.name_pk = (max(public.name.name_pk)))"

2 indexes :

CREATE INDEX link_name_foreign_key_pk
  ON name
  USING btree
  (foreign_key_pk);

CREATE UNIQUE INDEX name_pk
  ON name
  USING btree
  (name_pk);

Thanks.

9
  • 1
    What does the execution plan say? Which indexes are defined on the table? Commented Jun 10, 2013 at 15:02
  • i've edited my original post. Commented Jun 10, 2013 at 15:15
  • For the note, what does name_pk represent, and could the query be rewritten like this? postgres.cz/wiki/… Commented Jun 10, 2013 at 15:23
  • 1
    Try an index on (foreign_key_pk, name_pk) especially with Postgres 9.2 this might help. Commented Jun 10, 2013 at 15:32
  • 1
    Please remember to always include a table definition and your version of Postgres for questions like this. For starters, it would have helped to see NOT NULL constraints ... Commented Jun 10, 2013 at 16:34

1 Answer 1

3

Create a multi-column index like this (much like @a_horse already suggested in the comment):

CREATE INDEX name_foo_id ON name (foreign_key_pk, name_pk DESC)

DESC makes is only slightly faster. Postgres can scan indexes backwards almost as fast. But it might get tricky with multi-column indexes.

And use this alternative syntax for the UPDATE:

UPDATE name n
SET    name_val = TRUE
FROM  (
    SELECT max(name_pk) AS max_pk
    FROM   name 
    GROUP  BY foreign_key_pk
  ) x
WHERE n.name_pk = x.max_pk
AND   name_val IS DISTINCT FROM TRUE;

IN tends to be the slowest possible solution for bigger sets. A JOIN should be faster.

The additional WHERE clause AND name_val IS DISTINCT FROM TRUE avoids (expensive) empty updates.

An anti-semi-join with NOT EXISTS might be a contender for the performance crown, too:

UPDATE name n
SET    name_val = TRUE
WHERE  NOT EXISTS (
   SELECT 1
   FROM   name
   WHERE  foreign_key_pk = n.foreign_key_pk
   AND    name_pk > n.name_pk
   )
AND    name_val IS DISTINCT FROM TRUE;
Sign up to request clarification or add additional context in comments.

1 Comment

Good job, it works perfectly ! For information, the second request is 20% faster. Thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.