I have a rather simple query with a long total runtime. Can you advise me on how to optimize it?

Here is the EXPLAIN ANALYZE output: http://explain.depesz.com/s/9xC5

Query:

select wpis_id from spoleczniak_oznaczone
where etykieta_id in(
  select tag_id
  from spoleczniak_subskrypcje
  where postac_id = 376476
  );

spoleczniak_oznaczone:

   Column    |  Type   |                             Modifiers
-------------+---------+--------------------------------------------------------------------
 id          | integer | not null default nextval('spoleczniak_oznaczone_id_seq'::regclass)
 etykieta_id | integer | not null
 wpis_id     | integer | not null
Indexes:
    "spoleczniak_oznaczone_pkey" PRIMARY KEY, btree (id)
    "spoleczniak_oznaczone_etykieta_id" btree (etykieta_id)
    "spoleczniak_oznaczone_wpis_id" btree (wpis_id)
Foreign-key constraints:
    "spoleczniak_oznaczone_etykieta_id_fkey" FOREIGN KEY (etykieta_id) REFERENCES spoleczniak_etykiety(id) DEFERRABLE INITIALLY DEFERRED
    "spoleczniak_oznaczone_wpis_id_fkey" FOREIGN KEY (wpis_id) REFERENCES spoleczniak_tablica(id) DEFERRABLE INITIALLY DEFERRED

spoleczniak_subskrypcje:

  Column   |  Type   |                              Modifiers
-----------+---------+----------------------------------------------------------------------
 id        | integer | not null default nextval('spoleczniak_subskrypcje_id_seq'::regclass)
 postac_id | integer | not null
 tag_id    | integer | not null
Indexes:
    "spoleczniak_subskrypcje_pkey" PRIMARY KEY, btree (id)
    "spoleczniak_subskrypcje_postac_id" btree (postac_id)
    "spoleczniak_subskrypcje_postac_tag" btree (postac_id, tag_id)
    "spoleczniak_subskrypcje_tag_id" btree (tag_id)
Foreign-key constraints:
    "spoleczniak_subskrypcje_postac_id_fkey" FOREIGN KEY (postac_id) REFERENCES postac_postacie(id) DEFERRABLE INITIALLY DEFERRED
    "spoleczniak_subskrypcje_tag_id_fkey" FOREIGN KEY (tag_id) REFERENCES spoleczniak_etykiety(id) DEFERRABLE INITIALLY DEFERRED
  • Please also add the actual query. Commented Mar 19, 2014 at 14:27
  • ... and the resulting query plan, please. Does your table have valid statistics? Did you perform any tuning on the database settings? Commented Mar 19, 2014 at 14:44
  • @wildplasser: the query plan is there. A link to explain.depesz.com Commented Mar 19, 2014 at 15:01
  • Which Postgres version is that? On 9.2 I would have expected it to use an index-only scan on the index spoleczniak_subskrypcje_postac_tag. Also, your row estimates are a bit off, so maybe it is a problem with statistics. Commented Mar 19, 2014 at 15:13
  • If you read the plan carefully, you could see that your statistics are off. Run vacuum analyze; on both tables. Commented Mar 19, 2014 at 16:13

4 Answers


According to the query plan, most of the time is spent evaluating the IN part of the WHERE clause. The proper indexes appear to be used.

select o.wpis_id 
from spoleczniak_oznaczone o
inner join spoleczniak_subskrypcje s on s.tag_id = o.etykieta_id
where s.postac_id = 376476

...looks functionally the same but expresses the lookup as a join, which could produce a different query plan.

Also, as @wildplasser says, make sure the statistics are up to date and the indexes are defragmented (I don't know how to do either in PostgreSQL myself).
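
For reference, a minimal sketch of how statistics are refreshed and indexes rebuilt in PostgreSQL, using the table names from the question. Whether either step changes the plan here is not known, and REINDEX blocks writes to the table while it runs, so treat the last statement as optional.

```sql
-- Refresh planner statistics (and reclaim dead rows) on both tables.
VACUUM ANALYZE spoleczniak_oznaczone;
VACUUM ANALYZE spoleczniak_subskrypcje;

-- Rebuild all indexes on a table; only worthwhile if they are bloated.
REINDEX TABLE spoleczniak_oznaczone;
```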

EDIT: as @a_horse_with_no_name says in the comment below, the query I've suggested can return duplicates where the original wouldn't. Without knowing your data I don't know whether it will or not. That's a warning to bear in mind.
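
One way to keep the join while avoiding the duplicates is to de-duplicate the subscribed tags before joining. This is a sketch that has not been run against the real data; it relies on tag_id being NOT NULL, as the table definition in the question shows.

```sql
-- Each row of spoleczniak_oznaczone matches at most one row of the
-- derived table, so the result has the same rows as the IN version.
SELECT o.wpis_id
FROM spoleczniak_oznaczone o
JOIN (
  SELECT DISTINCT tag_id
  FROM spoleczniak_subskrypcje
  WHERE postac_id = 376476
) s ON s.tag_id = o.etykieta_id;
```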

6 Comments

A join is not necessarily a replacement for an IN condition.
I don't think so. postac_id is not unique, neither is tag_id. So a single postac_id could return the same tag_id multiple times and then the result of the join will be different to the result of an IN query.
...and I missed something. You're right - Thanks! I'll annotate the answer.
@a_horse_with_no_name: in this case you are wrong: in (...) removes duplicates (and NULLs) from the subquery's result before using it in the main query.
@wildplasser: exactly, and that's why the join will return something different.

Is there a reason you preferred IN with a subquery over the following?

select wpis_id 
from spoleczniak_oznaczone, spoleczniak_subskrypcje
where etykieta_id = tag_id 
and postac_id = 376476

I would guess a simple join might be simpler for the query optimiser.

3 Comments

To the OP: If you read the plan carefully, you could see that your statistics are off. Run vacuum analyze; on both tables.
I did. Before running that query.
Seems illogical: the estimated number of rows in the subquery is still terribly wrong.

This should be equivalent (and in most cases will generate the same query plan):

SELECT so.wpis_id
FROM spoleczniak_oznaczone so
WHERE EXISTS (
  SELECT *
  FROM spoleczniak_subskrypcje ss
  WHERE ss.tag_id = so.etykieta_id
  AND ss.postac_id = 376476
  );

Try replacing this index:

"spoleczniak_oznaczone_etykieta_id" btree (etykieta_id)

with an index on (etykieta_id, wpis_id). This way the database can perform an index-only scan, avoiding the cost of fetching whole rows from the table.
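
A sketch of that index swap, assuming PostgreSQL 9.2 or later, where index-only scans are available (the question does not state the version). The new index name is made up for illustration.

```sql
-- Hypothetical index name; any unused name works.
CREATE INDEX spoleczniak_oznaczone_etykieta_wpis
  ON spoleczniak_oznaczone (etykieta_id, wpis_id);

-- The single-column index is now redundant: the new index has the
-- same leading column.
DROP INDEX spoleczniak_oznaczone_etykieta_id;

-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM ANALYZE spoleczniak_oznaczone;
```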

3 Comments

It's faster, but not fast enough (original query ~800 ms, after this index change ~500 ms).
Can you attach the EXPLAIN output with the BUFFERS option? Your query returns more than 500,000 rows, and this may be the cause.
It now performs an index-only scan, but it has to loop through it 23 times. Shouldn't the pair (postac_id, tag_id) in spoleczniak_subskrypcje be unique? If I understand correctly, this table stores people's subscriptions to tags, right? If so, you may try @simonatrcl's query: replace IN (...) with a join. This may eliminate the nested loop.
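
The two checks raised in the comments above could be run roughly as follows. The first statement produces the plan with buffer usage for the original query; the second is a sketch for finding out whether (postac_id, tag_id) is in fact unique, which the question does not say.

```sql
-- Plan with buffer statistics for the original query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT wpis_id
FROM spoleczniak_oznaczone
WHERE etykieta_id IN (
  SELECT tag_id
  FROM spoleczniak_subskrypcje
  WHERE postac_id = 376476
);

-- Lists any (postac_id, tag_id) pairs that occur more than once.
SELECT postac_id, tag_id, count(*)
FROM spoleczniak_subskrypcje
GROUP BY postac_id, tag_id
HAVING count(*) > 1;
```

If the second query returns no rows, the plain join and the IN version return the same result for the current data.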
