How to add an index to a column with duplicate values to make a query faster in PostgreSQL?

Question

Here's the table in PostgreSQL:

name, ts, value
A, 2017-05-28, 1
A, 2017-05-27, 5
A, 2017-05-26, 2
...
B, 2017-05-28, 9
B, 2017-05-28, 12
...

The table will have over 10 million rows. I'm trying to run select count(distinct(name)) from "table"; but it runs for over 240 s without finishing. Could anyone suggest how to optimise this, for example by partitioning the table as in Hive, or by adding an index? (I assume an index needs to be unique, but name is duplicated across many records.) Thanks!

wiki.postgresql.org/wiki/Slow_Query_Questions

– user330315, Commented May 28, 2017 at 15:42

Gordon Linoff · Accepted Answer · 2017-05-28 15:48:47Z

For some reason, Postgres does not optimize count(distinct name) very well. (Intriguingly, Hive -- which has a very different optimizer -- has a similar problem.)

Try running the query this way:

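-- Collapse to distinct names first, then count the resulting rows.
-- Note: unlike count(distinct name), this counts NULL as one value
-- if the name column contains NULLs.
select count(*)
from (select distinct name
      from t
     ) distinct_names;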
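
To check whether the rewrite actually helps on your data, you could compare the plans of both forms with EXPLAIN, which is also what the wiki page linked in the comment asks for. This is a minimal sketch, assuming the table is named t as in the query above; the subquery alias distinct_names is just an illustrative name.

```sql
-- Original form: the DISTINCT is handled inside the aggregate.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(DISTINCT name)
FROM t;

-- Rewritten form: the subquery removes duplicates first,
-- then the outer query counts the remaining rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM (SELECT DISTINCT name
      FROM t
     ) AS distinct_names;
```

Keep in mind that EXPLAIN with the ANALYZE option really executes the statement, so the first query will take as long as it does today. Plain EXPLAIN (without ANALYZE) only shows the estimated plan and returns immediately.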

I don't think an index will help, but you can always try using one on t(name).
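
If you want to try the index, note that an index in PostgreSQL does not have to be unique; a plain B-tree index accepts duplicate values. The sketch below assumes the table is named t, and idx_t_name is a hypothetical index name. Whether the planner uses the index for the distinct query depends on your version, statistics, and data, so treat this as something to test rather than a guaranteed fix.

```sql
-- Plain (non-unique) B-tree index; duplicate names are allowed.
-- idx_t_name is a hypothetical name chosen for this example.
CREATE INDEX idx_t_name ON t (name);

-- Refresh planner statistics so the new index is considered sensibly.
ANALYZE t;

-- If there are few distinct names compared to the number of rows,
-- a recursive CTE can emulate a "loose index scan" (a technique
-- described on the PostgreSQL wiki): each step jumps to the next
-- larger name through the index instead of reading every row.
WITH RECURSIVE names AS (
    (SELECT name FROM t ORDER BY name LIMIT 1)
    UNION ALL
    SELECT (SELECT name
            FROM t
            WHERE name > names.name
            ORDER BY name
            LIMIT 1)
    FROM names
    WHERE names.name IS NOT NULL
)
-- count(name) skips the trailing NULL row that ends the recursion.
SELECT count(name) FROM names;
```

The recursive form only pays off when the number of distinct names is small relative to the table size, since it performs one index lookup per distinct value. Like count(distinct name), it does not count NULL names.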

