4

Below is a postgres query that seems to be taking far longer than I would expect. The field_instances table is indexed on both form_instance_id and field_id, and the form_instances table is indexed on workflow_state. So I thought it would be a fast query, but it takes forever. Can anybody help me interpret the query plan and what kinds of indexes to add to speed it up? Thanks.

explain analyze
select form_id,form_instance_id,answer,field_id
from form_instances,field_instances
where workflow_state = 'DRqueued'
    and form_instance_id = form_instances.id
    and field_id = 'Book_EstimatedDueDate';
                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=8733.85..95692.90 rows=9277 width=29) (actual time=2550.000..15430.000 rows=11431 loops=1)
   Hash Cond: (field_instances.form_instance_id = form_instances.id)
   ->  Bitmap Heap Scan on field_instances  (cost=2681.11..89071.72 rows=47567 width=25) (actual time=850.000..13690.000 rows=51726 loops=1)
         Recheck Cond: ((field_id)::text = 'Book_EstimatedDueDate'::text)
         ->  Bitmap Index Scan on index_field_instances_on_field_id  (cost=0.00..2669.22 rows=47567 width=0) (actual time=830.000..830.000 rows=51729 loops=1)
               Index Cond: ((field_id)::text = 'Book_EstimatedDueDate'::text)
   ->  Hash  (cost=5911.34..5911.34 rows=11312 width=8) (actual time=1590.000..1590.000 rows=11431 loops=1)
         ->  Bitmap Heap Scan on form_instances  (cost=511.94..5911.34 rows=11312 width=8) (actual time=720.000..1570.000 rows=11431 loops=1)
               Recheck Cond: ((workflow_state)::text = 'DRqueued'::text)
               ->  Bitmap Index Scan on index_form_instances_on_workflow_state  (cost=0.00..509.11 rows=11312 width=0) (actual time=650.000..650.000 rows=11509 loops=1)
                     Index Cond: ((workflow_state)::text = 'DRqueued'::text)
 Total runtime: 15430.000 ms
(12 rows)
2
  • You can try something like set enable_hashjoin = 0; and see if that gives you a faster plan. If it does, then we shall proceed to check why that plan wasn't used in the first place. Commented Jun 22, 2011 at 4:37
  • A couple of things. What version of pg is this, and have you tried upping work_mem a bit (say 16MB or so)? Oh and could we get either the schema for the tables, or fully qualified column names, it's a little confusing when I don't know which columns come from which tables. Also, have you tried using explicit join syntax? (i.e. from a join b on (a.x=b.y) Commented Jun 22, 2011 at 20:11

3 Answers 3

4

When you say The field_instances table is indexed on both form_instance_id and field_id you mean that there are separate indexes on form_instance_id and field_id on that table?

Try dropping the index on form_instance_id and put a concatenated index on (form_instance_id, field_id).

An index works by giving you a quick lookup that tells you where the rows are that match your index. It then has to fetch through those rows to do what you want. So you always want your index to be as specific as possible. If you put two indexes on the table, you'll have two different ways to do a lookup, but a query will usually only take advantage of one of them. If you put a concatenated index on the table, you'll be able to look up on the first field in the index, the first two fields, etc efficiently. (So a concatenated index on (a, b) gives you fast lookups on a, even faster lookups on both a and b, but doesn't help you look things up on b)

Right now it is figuring out all possible things in form_instances that have the right state. It separately figures out all of the field_instances that have the right field id. It then does a hash join. For this makes a lookup hash from one result set, and scans the other for matches.

With my suggestion it should figure out all possible form_instances of interest. It will then go to the index, and figure out all of the field_instances that match on both the form instance and field id, and then it will find exactly the results of interest. Because the index is more specific, the database will have fewer rows of data to deal with to process your query.

Sign up to request clarification or add additional context in comments.

3 Comments

+1 - this would seem like the best solution to the problem with the information provided - fingers crossed!
As you suspected, I had two separate indexes, one for form_instance_id and one for field_id on the field_instances table. However, I followed your advice, and dropped the form_instance_id index and created the new index on (form_instance_id, field_id), but it doesn't make a difference. The new plan doesn't even seem to use that index. Instead it uses a different index I had created for a different query on (answer,field_id).
@zippy: OK, now the PostgreSQL optimizer is just doing stupid things. I'd need to have access to the system to play with it for a while to figure out why it is being so silly, and figure out how to trick it into doing the right thing. Sorry I can't be of more help.
1

http://explain.depesz.com is a fantastic online tool that helps you identify the hot spots visually. I pasted your results into the tool and got this analysis: http://explain.depesz.com/s/VIk

It's hard to tell anything specifically without seeing your tables and indexes, however.

1 Comment

We were told all of the information we needed about tables and indexes. Also knowing that it spend all of the time in the hash join does nothing to tell us how to get rid of that hash join.
0

Need to know the data you have in your table however just from looking at the sql and column names I would recommend

  1. do you really need an index on workflow_state assuming elements within it can't be very unique - this might not improve select but will insert or update...
  2. try making field_id check the first condition in your where statement

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.