0

I'm having trouble with a "sub select" (a subquery inside another query):

select
    f.timestamp::date as date,
    user_id,
    activity_type,
    f.container_id as group_id,
    (
        select
            string_agg(distinct("userId"), ',') as group_owners
        from
            jusers_groups_copy g
        where
            g.place_id = f.container_id
            and state like 'owner'
    ) as group_owners
from
    fact_activity f
where
    f.container_type like '700'
    and f.timestamp::date < to_date('2016-09-05', 'YYYY-MM-DD')
group by
    date, user_id, activity_type, group_id
order by
    date, user_id, activity_type, group_id

Indeed, the string_agg inside takes about 20 seconds to return. I used pgAdmin to EXPLAIN the query, and it gives me this output:

"Group  (cost=7029.62..651968.20 rows=17843 width=27) (actual time=431.017..4513.973 rows=11483 loops=1)"
"  Buffers: shared hit=139498 read=411, temp read=255 written=255"
"  ->  Sort  (cost=7029.62..7074.90 rows=18111 width=27) (actual time=430.630..667.098 rows=54660 loops=1)"
"        Sort Key: ((f."timestamp")::date), f.user_id, f.activity_type, f.container_id"
"        Sort Method: external merge  Disk: 2008kB"
"        Buffers: shared hit=1702 read=411, temp read=255 written=255"
"        ->  Seq Scan on fact_activity f  (cost=0.00..5748.76 rows=18111 width=27) (actual time=0.107..188.827 rows=54660 loops=1)"
"              Filter: ((container_type ~~ '700'::text) AND (("timestamp")::date < to_date('2016-09-05'::text, 'YYYY-MM-DD'::text)))"
"              Rows Removed by Filter: 125414"
"              Buffers: shared hit=1691 read=411"
"  SubPlan 1"
"    ->  Aggregate  (cost=36.12..36.13 rows=1 width=5) (actual time=0.315..0.318 rows=1 loops=11483)"
"          Buffers: shared hit=137796"
"          ->  Seq Scan on users_groups_copy g  (cost=0.00..36.09 rows=11 width=5) (actual time=0.041..0.266 rows=13 loops=11483)"
"                Filter: ((state ~~ 'owner'::text) AND (place_id = f.container_id))"
"                Rows Removed by Filter: 1593"
"                Buffers: shared hit=137796"
"Total runtime: 4536.074 ms"

Moreover, I tried joining the tables instead, but the query is much slower that way:

select
    f.timestamp::date as date,
    user_id,
    activity_type,
    f.container_id as group_id,
    string_agg(distinct("userId"), ',') as group_owners
from
    fact_activity f
    join jusers_groups_copy g
        on g.place_id = f.container_id
where
    f.container_type like '700'
    and f.timestamp::date < to_date('2016-09-05', 'YYYY-MM-DD')
    and g.state like 'owner'
group by
    date, user_id, activity_type, group_id
order by
    date, user_id, activity_type, group_id

Finally, there aren't any indexes in this database; is that why the query is so slow?

I'd like to know how to improve this query.

Thanks in advance

  • there "is" or "are not" any indexes? Commented Sep 5, 2016 at 14:57
  • This is not the main reason, but there is no reason to use "like" without wildcards: f.container_type like '700' Commented Sep 5, 2016 at 14:59
  • Try an index on fact_activity(container_type, timestamp), use container_type = instead of container_type like, and f.timestamp < to_timestamp('2016-09-05', 'YYYY-MM-DD') (sketched below) Commented Sep 5, 2016 at 15:01
  • I think you want indexes on both of these too, right: g.place_id, f.container_id? Commented Sep 5, 2016 at 15:02
  • Type this into Google: "how to create index in postgres" Commented Sep 5, 2016 at 15:25
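A minimal sketch of the index and predicate changes suggested in those comments (the index name is hypothetical, and whether this particular index helps depends on the data):

-- hypothetical name; covers the two columns filtered on in fact_activity
CREATE INDEX fact_activity_type_ts_idx
    ON fact_activity (container_type, "timestamp");

-- sargable predicates: equality instead of LIKE and no per-row ::date cast,
-- so the planner can actually consider the index above
-- where f.container_type = '700'
--   and f.timestamp < to_timestamp('2016-09-05', 'YYYY-MM-DD')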

2 Answers

1

Without changing the query, the biggest performance improvement would come from an index that speeds up the subselect:

CREATE INDEX nice_name ON jusers_groups_copy(place_id, state text_pattern_ops);

But I would rewrite the query as a join. That way you might get something more efficient than a nested loop, depending on your data.

Instead of

SELECT f.somecol,
   (SELECT g.othercol
    FROM jusers_groups_copy g
    WHERE g.place_id = f.container_id
      AND g.state LIKE 'owner')
FROM fact_activity f
WHERE ...;

you should write

SELECT f.somecol, g.othercol
FROM fact_activity f
   JOIN jusers_groups_copy g
      ON g.place_id = f.container_id
WHERE g.state LIKE 'owner'
  AND ...;

Depending on the join type selected, the index above (for a nested loop) or a different index can make that query fast.
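Applied to the query in the question, the join version might look something like this (only a sketch; table and column names are taken from the question, the predicates follow the comment suggestions, and the left join keeps the original behaviour of returning NULL group_owners for groups without owners):

select
    f.timestamp::date as date,
    f.user_id,
    f.activity_type,
    f.container_id as group_id,
    string_agg(distinct g."userId", ',') as group_owners
from
    fact_activity f
    left join jusers_groups_copy g
        on g.place_id = f.container_id
        and g.state = 'owner'
where
    f.container_type = '700'
    and f.timestamp < to_timestamp('2016-09-05', 'YYYY-MM-DD')
group by
    date, f.user_id, f.activity_type, group_id
order by
    date, f.user_id, f.activity_type, group_id;

Switch to a plain join if you only care about groups that actually have owners.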


0

I guess you need to change some configuration in /data/postgresql.conf; use the following website:

pgtune

I think the most important parameter is "work_mem".
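For example (the value below is only illustrative; the plan in the question spilled about 2 MB to disk for its sort, so even a modest increase would keep that sort in memory, and note that work_mem is allocated per sort/hash operation and per connection):

-- try it for the current session first
SET work_mem = '16MB';

-- or set it for everyone in postgresql.conf and reload:
--   work_mem = 16MB
-- then run: select pg_reload_conf();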

