Query optimization in PostgreSQL

Question

I have a rather simple query with a long total runtime. Can you advise me on how to optimize it?

Here is the EXPLAIN ANALYZE output: http://explain.depesz.com/s/9xC5

query:

select wpis_id from spoleczniak_oznaczone
where etykieta_id in(
  select tag_id
  from spoleczniak_subskrypcje
  where postac_id = 376476
  );

spoleczniak_oznaczone:

   Column    |  Type   |                             Modifiers
-------------+---------+--------------------------------------------------------------------
 id          | integer | not null default nextval('spoleczniak_oznaczone_id_seq'::regclass)
 etykieta_id | integer | not null
 wpis_id     | integer | not null
Indexes:
    "spoleczniak_oznaczone_pkey" PRIMARY KEY, btree (id)
    "spoleczniak_oznaczone_etykieta_id" btree (etykieta_id)
    "spoleczniak_oznaczone_wpis_id" btree (wpis_id)
Foreign-key constraints:
    "spoleczniak_oznaczone_etykieta_id_fkey" FOREIGN KEY (etykieta_id) REFERENCES    spoleczniak_etykiety(id) DEFERRABLE INITIALLY DEFERRED
    "spoleczniak_oznaczone_wpis_id_fkey" FOREIGN KEY (wpis_id) REFERENCES spoleczniak_tablica(id) DEFERRABLE INITIALLY DEFERRED

spoleczniak_subskrypcje:

  Column   |  Type   |                              Modifiers
-----------+---------+----------------------------------------------------------------------
 id        | integer | not null default nextval('spoleczniak_subskrypcje_id_seq'::regclass)
 postac_id | integer | not null
 tag_id    | integer | not null
Indexes:
    "spoleczniak_subskrypcje_pkey" PRIMARY KEY, btree (id)
    "spoleczniak_subskrypcje_postac_id" btree (postac_id)
    "spoleczniak_subskrypcje_postac_tag" btree (postac_id, tag_id)
    "spoleczniak_subskrypcje_tag_id" btree (tag_id)
Foreign-key constraints:
    "spoleczniak_subskrypcje_postac_id_fkey" FOREIGN KEY (postac_id) REFERENCES postac_postacie(id) DEFERRABLE INITIALLY DEFERRED
    "spoleczniak_subskrypcje_tag_id_fkey" FOREIGN KEY (tag_id) REFERENCES spoleczniak_etykiety(id) DEFERRABLE INITIALLY DEFERRED

... and the resulting query plan, please. Does your table have valid statistics? Did you perform any tuning on the database settings?
– wildplasser, Commented Mar 19, 2014 at 14:44
@wildplasser: the query plan is there, as a link to explain.depesz.com.
– user330315, Commented Mar 19, 2014 at 15:01
Which Postgres version is that? On 9.2 I would have expected it to make use of an index-only scan on the index spoleczniak_subskrypcje_postac_tag. Also, your row estimates are a bit off, so maybe it is a problem with statistics.
– user330315, Commented Mar 19, 2014 at 15:13
If you read the plan carefully, you can see that your statistics are off. Run vacuum analyze; on both tables.
– wildplasser, Commented Mar 19, 2014 at 16:13

simon at rcl · Accepted Answer · 2014-03-19 15:11:04Z

2

From the query plan, most of the time seems to be spent evaluating the IN part of the WHERE clause. The appropriate indexes seem to be used.

select o.wpis_id 
from spoleczniak_oznaczone o
inner join spoleczniak_subskrypcje s on s.tag_id = o.etykieta_id
where s.postac_id = 376476

...looks to be functionally the same but tries it in a different way and could generate a different query plan.

Also, as @wildplasser says, make sure statistics are up-to-date, and indexes defragmented (don't know how to do those in PostgreSQL myself).
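
For reference, a minimal sketch of how that maintenance is usually done in PostgreSQL, using the two tables from the question. Whether the indexes actually need rebuilding here is an assumption; the plan alone does not show it.

```sql
-- Refresh planner statistics and reclaim dead rows on both tables.
VACUUM ANALYZE spoleczniak_oznaczone;
VACUUM ANALYZE spoleczniak_subskrypcje;

-- Rebuild all indexes on each table. This takes locks that block
-- writes while it runs, so it is best done in a quiet period.
REINDEX TABLE spoleczniak_oznaczone;
REINDEX TABLE spoleczniak_subskrypcje;
```

Plain ANALYZE on a table is enough if only the statistics are in doubt.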

EDIT: as @a_horse_with_no_name says in the comment below, the query I've suggested can return duplicates where the original wouldn't. Without knowing your data I don't know whether it will or not. That's a warning to bear in mind.
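
If duplicates turn out to be a problem, one possible way to keep the join form while matching the IN semantics is to deduplicate the subscription rows before joining. This is only a sketch, and it may or may not produce a better plan than the original query:

```sql
-- Join against the distinct set of tag_ids for this postac_id, so each
-- matching row of spoleczniak_oznaczone is returned at most once,
-- just as with IN (...).
select o.wpis_id
from spoleczniak_oznaczone o
inner join (
  select distinct tag_id
  from spoleczniak_subskrypcje
  where postac_id = 376476
) s on s.tag_id = o.etykieta_id;
```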

edited Mar 19, 2014 at 15:11

answered Mar 19, 2014 at 14:54

6 Comments

user330315 Over a year ago

A join is not necessarily a replacement for an IN condition.

user330315 Over a year ago

I don't think so. postac_id is not unique, neither is tag_id. So a single postac_id could return the same tag_id multiple times and then the result of the join will be different to the result of an IN query.

simon at rcl Over a year ago

...and I missed something. You're right - Thanks! I'll annotate the answer.

wildplasser Over a year ago

@a_horse_with_no_name: in this case you are wrong: in (...) removes duplicates (and NULLs) from the subquery's result before using it in the main query.

user330315 Over a year ago

@wildplasser: exactly, and that's why the join will return something different.

scav · Accepted Answer · 2014-03-19 14:59:44Z

1

Is there a reason you preferred using IN with a subquery over a plain join like this:

select wpis_id 
from spoleczniak_oznaczone, spoleczniak_subskrypcje
where etykieta_id = tag_id 
and postac_id = 376476

I would guess a simple join might be simpler for the query optimiser.

edited Mar 19, 2014 at 14:59

user330315

answered Mar 19, 2014 at 14:56

3 Comments

wildplasser Over a year ago

To the OP: If you read the plan carefully, you could see that your statistics are off. Run vacuum analyze; on both tables.

user3437500 Over a year ago

I did. Before running that query.

wildplasser Over a year ago

Seems illogical: the estimated number of rows in the subquery is still terribly wrong.

wildplasser · Accepted Answer · 2014-03-19 15:12:13Z

1

This should be equivalent (and in most cases will generate the same query plan)

SELECT so.wpis_id
FROM spoleczniak_oznaczone so
WHERE EXISTS (
  SELECT *
  FROM spoleczniak_subskrypcje ss
  WHERE ss.tag_id= so.etykieta_id
  AND ss.postac_id = 376476
  );

answered Mar 19, 2014 at 15:12

Michał Kołodziejski · Accepted Answer · 2014-03-19 21:04:56Z

1

Try replacing this index:

"spoleczniak_oznaczone_etykieta_id" btree (etykieta_id)

with an index on (etykieta_id, wpis_id). This way the database could perform an index-only scan and avoid the cost of fetching whole rows from the table.
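
A rough sketch of that change follows. The new index name is made up for illustration; the old name is the one shown in the table description in the question.

```sql
-- Hypothetical index name; covers both the filter column and the
-- selected column, so the query can be answered from the index alone.
CREATE INDEX spoleczniak_oznaczone_etykieta_wpis
  ON spoleczniak_oznaczone (etykieta_id, wpis_id);

-- The old single-column index is then redundant, since the new index
-- has etykieta_id as its leading column.
DROP INDEX spoleczniak_oznaczone_etykieta_id;

-- Index-only scans rely on the visibility map, so vacuum the table.
VACUUM ANALYZE spoleczniak_oznaczone;
```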

answered Mar 19, 2014 at 21:04

3 Comments

user3437500 Over a year ago

It's faster, but not fast enough (original query ~800 ms, after this index change ~500 ms).

Michał Kołodziejski Over a year ago

Can you attach the explain plan with the BUFFERS option? Your query returns more than 500,000 rows, and this may be the cause.
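
For reference, a plan with buffer statistics can be requested like this, using the query from the question:

```sql
-- ANALYZE actually runs the query; BUFFERS adds shared-buffer hit and
-- read counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
select wpis_id from spoleczniak_oznaczone
where etykieta_id in(
  select tag_id
  from spoleczniak_subskrypcje
  where postac_id = 376476
  );
```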

Michał Kołodziejski Over a year ago

It now performs an index-only scan, but it needs to loop through it 23 times. Shouldn't the pair (postac_id, tag_id) in spoleczniak_subskrypcje be unique? If I understand correctly, this table holds people's subscriptions to tags, right? If so, you may try @simonatrcl's query: replace IN (...) with a join. This may eliminate the nested loop.
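
If the pair really is meant to be unique, a sketch of how that could be checked and then enforced is below. The constraint name is hypothetical, and adding it will fail if the first query returns any rows.

```sql
-- List any (postac_id, tag_id) pairs that occur more than once.
SELECT postac_id, tag_id, count(*)
FROM spoleczniak_subskrypcje
GROUP BY postac_id, tag_id
HAVING count(*) > 1;

-- Assuming the check above returns nothing, enforce uniqueness.
-- Hypothetical constraint name.
ALTER TABLE spoleczniak_subskrypcje
  ADD CONSTRAINT spoleczniak_subskrypcje_postac_tag_uniq
  UNIQUE (postac_id, tag_id);
```

With the constraint in place, the plain join and the IN query return the same rows for a single postac_id, because each tag_id can match at most once.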
