1

I have a very simple query that is taking way too long to run.

SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;

Which indexes do I need to add to speed it up? I ran a plain VACUUM and added the following index, but neither helped.

CREATE INDEX tbl_idx ON tbl1(col1,col2,col3,col4);

The table has 400k rows. In fact, even counting them takes extremely long. Running a simple

SELECT count(*) from tbl1;

takes 8 seconds. So my problem may be with vacuuming or reindexing, but I'm not sure.

Here is the EXPLAIN output:

EXPLAIN SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Unique  (cost=3259846.80..3449267.51 rows=137830 width=25)
   ->  Sort  (cost=3259846.80..3297730.94 rows=15153657 width=25)
         Sort Key: col1, col2, col3, col4
         ->  Seq Scan on tbl1 (cost=0.00..727403.57 rows=15153657 width=25)
(4 rows)

Edit: I'm currently running VACUUM FULL, which I hope fixes the issue; after that, maybe someone can give me some pointers on where I went wrong. It is several hours in and still running, as far as I can tell. I did run

select relname, last_autoanalyze, last_autovacuum, last_vacuum, n_dead_tup from pg_stat_all_tables where n_dead_tup >0;

and the table has nearly 16 million dead tuples (n_dead_tup).
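
The plan above estimates 15,153,657 rows for a table said to hold 400k, which is consistent with a large number of dead tuples. As a minimal sketch, assuming the table is named tbl1 as in the question, the live and dead tuple counts and the on-disk size can be compared for just that table:

```sql
-- Sketch only: compare live vs. dead tuples and total on-disk size for tbl1.
-- pg_stat_user_tables and pg_total_relation_size() are standard PostgreSQL;
-- the figures are statistics-collector estimates, not exact counts.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname = 'tbl1';
```

A total size far larger than 400k rows of width 25 would need suggests the sequential scan is mostly reading dead space.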

10
  • What happens when you run `SELECT DISTINCT col1 FROM tbl1;`? Commented Jul 12, 2017 at 0:18
  • It takes super long. 12 seconds Commented Jul 12, 2017 at 1:05
  • If that one column takes 12 seconds, then the time for each additional column is going to double for each column. So is the whole query taking 48 seconds? Or is it longer? Commented Jul 12, 2017 at 1:07
  • No, it's about the same. The DISTINCT on the four columns took 9 seconds when I ran it just now. The one column took 7 seconds when I ran it a second time just now. Commented Jul 12, 2017 at 1:10
  • If DISTINCT on col1 takes 12 seconds, how can it take 9 seconds for all four? edit - I just saw that it only took 7 seconds on col1. Commented Jul 12, 2017 at 1:11

3 Answers

1

My data doesn't change that frequently, so I ended up creating a materialized view

CREATE MATERIALIZED VIEW tbl1_distinct_view AS SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;

that I refresh with a cronjob once a day at 6am

0 6 * * * psql -U mydb mydb -c 'REFRESH MATERIALIZED VIEW tbl1_distinct_view;'
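
A plain REFRESH locks the view against reads while it runs. If that matters, a possible variant (a sketch, assuming PostgreSQL 9.4 or later and an already populated view; the index name tbl1_distinct_view_uniq is made up for illustration) is a concurrent refresh, which requires a unique index on the view:

```sql
-- Hypothetical index name. REFRESH ... CONCURRENTLY needs at least one
-- unique index on plain columns of the materialized view (no WHERE clause).
CREATE UNIQUE INDEX tbl1_distinct_view_uniq
    ON tbl1_distinct_view (col1, col2, col3, col4);

-- Readers can keep querying the view while this runs.
REFRESH MATERIALIZED VIEW CONCURRENTLY tbl1_distinct_view;
```

The concurrent form is usually slower than a plain refresh, so it is only worth it if queries hit the view during the refresh window.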

0

Try forcing the database to use your index:

set enable_seqscan=off ;
SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
set enable_seqscan=on ;
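
To check whether the setting actually changes the plan and the run time, the same experiment can be wrapped in a transaction so the setting cannot leak into the rest of the session. This is only a sketch of one way to do it, using the table and columns from the question:

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_seqscan = off;
-- ANALYZE executes the query and reports actual timings;
-- BUFFERS shows how many pages were read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT col1, col2, col3, col4 FROM tbl1;
ROLLBACK;
```

enable_seqscan is meant for diagnosis like this, not as a permanent setting; it discourages sequential scans rather than forbidding them.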

0

VACUUM and VACUUM FULL are two commands that sound the same but have very different effects.

VACUUM scans a table for tuples that are no longer needed (the dead tuples left behind by DELETE and UPDATE) so that their space can be overwritten by later INSERT or UPDATE statements. It does not "defragment" the table: the file on disk generally stays the same size, and the space held by dead tuples is simply marked as free for reuse.

VACUUM FULL rewrites the whole table, reclaiming the space left by deleted rows and dead tuples and returning it to the operating system, essentially "defragmenting" the table. On a live table it can take a very long time, holds an exclusive lock on the table while it runs, and causes heavy I/O; on PostgreSQL versions before 9.0 it could also cause index bloat.

I imagine what you need is a VACUUM followed by an ANALYZE, which refreshes the table statistics the query planner uses to choose a plan. Both should be run reasonably regularly, at low-usage times for the database. Use VACUUM FULL only if you have a lot of space to reclaim (for example, after many DELETE statements).

Anyhow, since you've started a VACUUM FULL, once it is complete you should run an ANALYZE on the database, followed by a REINDEX (on the database), and then run EXPLAIN on your query again. You should notice an improvement.
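
As a rough sketch of the plain VACUUM plus ANALYZE route, limited to the one table from the question rather than the whole database:

```sql
-- Mark dead tuples as reusable and refresh planner statistics in one pass.
-- VERBOSE prints per-table details such as how many dead row versions
-- were removed.
VACUUM (VERBOSE, ANALYZE) tbl1;

-- Re-check the plan; the row estimate on tbl1 should drop towards the
-- real row count once statistics are current.
EXPLAIN SELECT DISTINCT col1, col2, col3, col4 FROM tbl1;
```

Unlike VACUUM FULL, this does not take an exclusive lock, so the table stays readable and writable while it runs.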

1 Comment

Thanks. The VACUUM FULL ended up timing out and disconnecting after 10 hours, and I had to kill it and restart. After running a normal VACUUM followed by ANALYZE, the query is down to less than a second. I'll add these to cron once a day.
