
I have a table named events in my PostgreSQL 9.5 database, and it has about 6 million records.

I am running a select count(event_id) from events query, but it takes 40 seconds, which is a very long time for a database. The event_id column is the table's primary key and is indexed. Why does this take so long? (The server is an Ubuntu VM on VMware with 4 CPUs.)

EXPLAIN (ANALYZE, BUFFERS) output:

"Aggregate  (cost=826305.19..826305.20 rows=1 width=0) (actual time=24739.306..24739.306 rows=1 loops=1)"
"  Buffers: shared hit=13 read=757739 dirtied=53 written=48"
"  ->  Seq Scan on event_source  (cost=0.00..812594.55 rows=5484255 width=0) (actual time=0.014..24087.050 rows=6320689 loops=1)"
"        Buffers: shared hit=13 read=757739 dirtied=53 written=48"
"Planning time: 0.369 ms"
"Execution time: 24739.364 ms"
  • 757739 blocks for just 6 million rows seems like a lot; that's roughly 8 rows per block. How many columns does the table have? Does it have several long text columns (with less than 2k in them)? It could also be table bloat. Does this change if you clean up the table using vacuum full events;? Commented Mar 6, 2019 at 11:57
  • There are 7 columns in the table, but one column is of type text and contains long JSON data. Commented Mar 6, 2019 at 12:18
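
To see whether the size comes from the rows themselves or from bloat, you can compare the table's physical size (heap plus TOAST) with the planner's row estimate. A quick sketch, assuming the table is named events as in the question:

SELECT pg_size_pretty(pg_table_size('events'))   AS table_size,   -- heap + TOAST
       pg_size_pretty(pg_indexes_size('events')) AS index_size,
       (SELECT reltuples::bigint
          FROM pg_class
         WHERE relname = 'events')               AS row_estimate;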

2 Answers


I just ran into a situation where a table of 1.3 million rows was taking about 35 seconds for a simple SELECT COUNT(*). None of the other solutions helped. The issue turned out to be that the table was bloated and hadn't been vacuumed, so Postgres couldn't plan the query efficiently. After I ran this, the query time dropped to about 25 ms!

VACUUM (ANALYZE, VERBOSE, FULL) my_table_name;
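
If you want to verify the diagnosis before running a blocking VACUUM FULL, the statistics collector shows dead-tuple counts and the last (auto)vacuum times. A sketch; substitute your own table name for my_table_name:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table_name';
-- A large n_dead_tup relative to n_live_tup suggests the table is bloated.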


5 Comments

Alec, you’re a lifesaver! I spent hours trying to figure out why my COUNT queries were taking ~8 minutes to run. Turns out Postgres’ autovacuum hadn’t been enough and the table needed a manual vacuum.
@AlecSanger How much time does this VACUUM command take? I have three million records in my table. The query is running in the background, so I'm not able to verify.
Vacuum could take a while depending on how much data is in there and what state it's in. I've had it take something like 10 minutes in the past. The next time you run it, though, it should be super quick which would confirm that the table was indeed vacuumed.
This will lock your table! Potentially for a long time! For our table with 20M records, it took 4 minutes.
Watch out for VERBOSE + log_statement value, as this might run your instance out of storage space

There are multiple factors that play a big role in how PostgreSQL decides to execute a count(), but first of all: the column you use inside the count function does not matter for performance. In fact, unless you need a DISTINCT count, stick with count(*).
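
The one semantic difference to be aware of: count(*) counts rows, while count(some_column) skips rows where that column is NULL. A throwaway example, not from the question's schema:

CREATE TEMP TABLE demo (v int);
INSERT INTO demo VALUES (1), (NULL), (3);
SELECT count(*) AS all_rows,   -- 3
       count(v) AS non_null_v  -- 2
FROM demo;

Since event_id is the primary key and therefore never NULL, both forms return the same number in your case.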

You can try the following to force an index-only scan:

SELECT count(*) FROM (SELECT event_id FROM events) t;

...if that still results in a sequential scan, then most likely the index is not much smaller than the table itself. To see how an index-only scan would perform anyway, you can enforce one with:

SELECT count(*) FROM (SELECT event_id FROM events ORDER BY 1) t;

If that is not much faster, you should also consider upgrading PostgreSQL to at least version 9.6, which introduces parallel sequential scans to speed up such queries.
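
On 9.6 or later, the degree of parallelism is controlled per session; a sketch (the value 4 matches the question's 4 CPUs and is only a starting point):

-- Requires PostgreSQL 9.6+; setting it to 0 disables parallel query.
SET max_parallel_workers_per_gather = 4;
SELECT count(*) FROM events;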

In addition, you can achieve dramatic speedups by choosing from a variety of counting techniques, which largely depend on your use case and requirements, such as catalog-based estimates or maintained counter tables.
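
For instance, if an approximate number is acceptable, the planner's own row estimate can be read from the catalog in constant time; its accuracy depends on how recently the table was analyzed:

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'events';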

Last but not least, please always provide the output of an extended EXPLAIN, as @a_horse_with_no_name already recommended, e.g.:

EXPLAIN (ANALYZE, BUFFERS) SELECT count(event_id) FROM events;

