I'm trying to get this query to run faster. It seems like sorting by the quality field is what really slows it down (the table has about 5 million rows) - maybe there is an index I can use for that?

Query:

SELECT "connectr_twitterpassage"."id", "connectr_twitterpassage"."third_party_id", "connectr_twitterpassage"."third_party_created", "connectr_twitterpassage"."source", "connectr_twitterpassage"."text", "connectr_twitterpassage"."author", "connectr_twitterpassage"."raw_data", "connectr_twitterpassage"."retweet_count", "connectr_twitterpassage"."favorited_count", "connectr_twitterpassage"."lang", "connectr_twitterpassage"."location", "connectr_twitterpassage"."author_followers_count", "connectr_twitterpassage"."is_retweet", "connectr_twitterpassage"."url", "connectr_twitterpassage"."author_fk_id", "connectr_twitterpassage"."quality", "connectr_twitterpassage"."is_top_tweet", "connectr_twitterpassage"."created", "connectr_twitterpassage"."modified" 
FROM "connectr_twitterpassage" 
INNER JOIN "connectr_twitterpassage_words" ON ("connectr_twitterpassage"."id" = "connectr_twitterpassage_words"."twitterpassage_id") 
WHERE "connectr_twitterpassage_words"."word_id" = 18974807  
ORDER BY "connectr_twitterpassage"."quality" DESC 
LIMIT 100

Here is the EXPLAIN ANALYZE: http://explain.depesz.com/s/7zb

And the table definitions:

\d connectr_twitterpassage

         Column         |           Type           |                              Modifiers                               
------------------------+--------------------------+----------------------------------------------------------------------
 id                     | integer                  | not null default nextval('connectr_twitterpassage_id_seq'::regclass)
 third_party_id         | character varying(10000) | not null
 source                 | character varying(10000) | not null
 text                   | character varying(10000) | not null
 author                 | character varying(10000) | not null
 raw_data               | character varying(10000) | not null
 created                | timestamp with time zone | not null
 modified               | timestamp with time zone | not null
 third_party_created    | timestamp with time zone | 
 retweet_count          | integer                  | not null
 favorited_count        | integer                  | not null
 lang                   | character varying(10000) | not null
 location               | character varying(10000) | not null
 author_followers_count | integer                  | not null
 is_retweet             | boolean                  | not null
 url                    | character varying(10000) | not null
 author_fk_id           | integer                  | 
 quality                | bigint                   | 
 is_top_tweet           | boolean                  | not null
Indexes:
    "connectr_passage_pkey" PRIMARY KEY, btree (id)
    "connectr_twitterpassage_third_party_id_uniq" UNIQUE CONSTRAINT, btree (third_party_id)
    "connectr_passage_author_followers_count" btree (author_followers_count)
    "connectr_passage_favorited_count" btree (favorited_count)
    "connectr_passage_retweet_count" btree (retweet_count)
    "connectr_passage_source" btree (source)
    "connectr_passage_source_like" btree (source varchar_pattern_ops)
    "connectr_passage_third_party_id" btree (third_party_id)
    "connectr_passage_third_party_id_like" btree (third_party_id varchar_pattern_ops)
    "connectr_twitterpassage_author_fk_id" btree (author_fk_id)
    "connectr_twitterpassage_created" btree (created)
    "connectr_twitterpassage_is_top_tweet" btree (is_top_tweet)
    "connectr_twitterpassage_quality" btree (quality)
    "connectr_twitterpassage_third_party_created" btree (third_party_created)
    "id_to_quality_sorted" btree (id, quality DESC NULLS LAST)
Foreign-key constraints:
    "author_fk_id_refs_id_074720a5" FOREIGN KEY (author_fk_id) REFERENCES connectr_twitteruser(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "connectr_passageviewevent" CONSTRAINT "passage_id_refs_id_892b36a6" FOREIGN KEY (passage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_connection" CONSTRAINT "twitter_from_id_refs_id_8adbab24" FOREIGN KEY (twitter_from_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_connection" CONSTRAINT "twitter_to_id_refs_id_8adbab24" FOREIGN KEY (twitter_to_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_twitterpassage_words" CONSTRAINT "twitterpassage_id_refs_id_720f772f" FOREIGN KEY (twitterpassage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED


connectr=# \d connectr_twitterpassage_words
                               Table "public.connectr_twitterpassage_words"
      Column       |  Type   |                                 Modifiers                                  
-------------------+---------+----------------------------------------------------------------------------
 id                | integer | not null default nextval('connectr_twitterpassage_words_id_seq'::regclass)
 twitterpassage_id | integer | not null
 word_id           | integer | not null
Indexes:
    "connectr_twitterpassage_words_pkey" PRIMARY KEY, btree (id)
    "connectr_twitterpassage_twitterpassage_id_613c80271f09fba8_uniq" UNIQUE CONSTRAINT, btree (twitterpassage_id, word_id)
    "connectr_twitterpassage_words_twitterpassage_id" btree (twitterpassage_id)
    "connectr_twitterpassage_words_word_id" btree (word_id)
    "word_to_twitterpassage_id" btree (word_id, twitterpassage_id)
Foreign-key constraints:
    "twitterpassage_id_refs_id_720f772f" FOREIGN KEY (twitterpassage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
    "word_id_refs_id_64f49629" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED

connectr=# \d connectr_word
                                        Table "public.connectr_word"
       Column        |           Type           |                         Modifiers                          
---------------------+--------------------------+------------------------------------------------------------
 id                  | integer                  | not null default nextval('connectr_word_id_seq'::regclass)
 word                | character varying(10000) | not null
 created             | timestamp with time zone | not null
 modified            | timestamp with time zone | not null
 frequency           | double precision         | 
 is_username         | boolean                  | not null
 is_hashtag          | boolean                  | not null
 cloud_eligible      | boolean                  | not null
 passage_count       | integer                  | 
 avg_quality         | double precision         | 
 last_twitter_search | timestamp with time zone | 
 cloud_approved      | boolean                  | not null
 display_word        | character varying(10000) | not null
 is_trend            | boolean                  | not null
Indexes:
    "connectr_word_pkey" PRIMARY KEY, btree (id)
    "connectr_word_word_uniq" UNIQUE CONSTRAINT, btree (word)
    "connectr_word_avg_quality" btree (avg_quality)
    "connectr_word_cloud_eligible" btree (cloud_eligible)
    "connectr_word_last_twitter_search" btree (last_twitter_search)
    "connectr_word_passage_count" btree (passage_count)
    "connectr_word_word" btree (word)
Referenced by:
    TABLE "connectr_passageviewevent" CONSTRAINT "source_word_id_refs_id_178d46eb" FOREIGN KEY (source_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_wordmatchrewardevent" CONSTRAINT "tapped_word_id_refs_id_c2ffb369" FOREIGN KEY (tapped_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_connection" CONSTRAINT "word_id_refs_id_00cccde2" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_twitterpassage_words" CONSTRAINT "word_id_refs_id_64f49629" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
  • This looks a bit like full text search. Are you by chance trying to query for connectr_twitterpassage rows that mention a particular "word"? If so, you shouldn't roll your own implementation; Postgres has very good support for that built in. Commented Oct 11, 2013 at 17:03
  • Well, my thinking was that it should be faster to look up a word from the table based on an exact match, and then look up that word's matching twitterpassages. I tried using pg_trgm before but it wasn't as fast. Commented Oct 11, 2013 at 18:00
  • If you can afford the downtime, then running a cluster on the words table by the word_id column could be very beneficial. I'd take the word_id and passage id single column indexes off of the word table as well. Commented Oct 11, 2013 at 18:25
  • Thanks David. I did a CLUSTER and that has improved the performance significantly. Commented Oct 14, 2013 at 16:01
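
For readers following the CLUSTER suggestion in the comments, a minimal sketch of what that could look like. It assumes the "words table" means connectr_twitterpassage_words and that the existing single-column index on word_id is the one to cluster by; the comments do not say which index was actually used.

```sql
-- Assumption: cluster by the existing single-column index on word_id.
-- CLUSTER rewrites the table in index order and holds an ACCESS EXCLUSIVE
-- lock while it runs, so the table is unavailable for the duration.
CLUSTER connectr_twitterpassage_words
    USING connectr_twitterpassage_words_word_id;

-- Refresh planner statistics after the rewrite.
ANALYZE connectr_twitterpassage_words;
```

CLUSTER is a one-time operation: rows inserted or updated afterwards are not kept in index order, so the command may need to be repeated periodically.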

1 Answer

Looking at the EXPLAIN output, the sort itself takes very little of the time. Most of the time goes into gathering the rows that need to be sorted.

Much of that time is probably spent reading from disk. If you could get more of your data cached, the same query should run a lot faster.
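
One way to check how much of the time goes to disk reads is the BUFFERS option of EXPLAIN (available since PostgreSQL 9.0). The sketch below reuses the query from the question with a shortened select list and table aliases, which are my own additions for readability.

```sql
-- In the output, "shared hit" counts pages found in PostgreSQL's buffer
-- cache and "read" counts pages that had to be fetched from outside it.
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.*
FROM connectr_twitterpassage p
INNER JOIN connectr_twitterpassage_words w
        ON p.id = w.twitterpassage_id
WHERE w.word_id = 18974807
ORDER BY p.quality DESC
LIMIT 100;
```

Running it twice in a row and comparing the "read" counts gives a rough idea of how much a warm cache helps for this particular word_id.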

Otherwise, your best bet may be to denormalize the data by adding the quality field to the connectr_twitterpassage_words table and creating an index on (word_id, quality,...).
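
A minimal sketch of that denormalization, under stated assumptions: the copied column is also named quality, the index name word_quality_idx is hypothetical, and the index covers only the two columns named above (any further columns are left out here). How the copy is kept in sync with connectr_twitterpassage.quality (a trigger or application code) is not shown.

```sql
-- Assumption: same name and type as connectr_twitterpassage.quality.
ALTER TABLE connectr_twitterpassage_words ADD COLUMN quality bigint;

-- One-time backfill from the parent table.
UPDATE connectr_twitterpassage_words w
SET quality = p.quality
FROM connectr_twitterpassage p
WHERE p.id = w.twitterpassage_id;

-- Hypothetical index name. For a fixed word_id, a backward scan of this
-- index returns rows in "quality DESC" order, so no separate sort is needed.
CREATE INDEX word_quality_idx
    ON connectr_twitterpassage_words (word_id, quality);

-- Pick the top 100 ids from the narrow table first, then fetch the wide rows.
SELECT p.*
FROM (
    SELECT twitterpassage_id, quality
    FROM connectr_twitterpassage_words
    WHERE word_id = 18974807
    ORDER BY quality DESC
    LIMIT 100
) w
INNER JOIN connectr_twitterpassage p ON p.id = w.twitterpassage_id
ORDER BY w.quality DESC;
```

With this layout the database only has to visit about 100 rows of connectr_twitterpassage instead of every passage that contains the word. The cost is extra write work and the risk of the copied quality value going stale.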
