PostgreSQL: Group by array elements in common

Question

I have a table like this:

CREATE TABLE preferences (name varchar, preferences varchar[]);
INSERT INTO preferences (name, preferences) 
VALUES 
    ('John','{pizza, spaghetti}'), 
    ('Charlie','{spaghetti, rice}'), 
    ('Lucy','{rice, potatoes}'), 
    ('Beth','{bread, cheese}'), 
    ('Trudy','{rice, milk}');

So from the table

John      {pizza, spaghetti}
Charlie   {spaghetti, rice}
Lucy      {rice, potatoes}
Beth      {bread, cheese}
Trudy     {rice, milk}

I would like to group all rows that have elements in common, even if the connection only exists through other people. So in this case I would like to end up with:

{John,Charlie,Lucy,Trudy}     {pizza,spaghetti,rice,potatoes,milk}
{Beth}                        {bread, cheese}

because John's preferences intersect with those of Charlie, and Charlie's intersect with those of Lucy and of Trudy.

I already have an array_intersection function like this:

CREATE OR REPLACE FUNCTION array_intersection(anyarray, anyarray)
  RETURNS anyarray
  language sql
as $FUNCTION$
    SELECT ARRAY(
        SELECT UNNEST($1)
        INTERSECT
        SELECT UNNEST($2)
    );
$FUNCTION$;

and I know the array_agg function for aggregating arrays, but the step I am missing is how to turn these into the grouping I want.

How do you plan to use your function? Create an aggregate on it? Compare with a window?
– Vao Tsun, Commented Sep 29, 2017 at 9:52
Do you want array_intersection because you can't use the && operator? How is it different?
– Vao Tsun, Commented Sep 29, 2017 at 10:04
@VaoTsun This is just a sample to reduce the question to the bare essentials. In real life the data is more complex, but basically I want to group in the way illustrated in the example, going from the first table to the second. On all other columns in the real-world case I will use aggregate functions.
– Dolf Andringa, Commented Sep 29, 2017 at 12:58
@VaoTsun I can indeed (and did) remove my array_intersection function in favor of &&. I had forgotten about &&, but it doesn't change the question.
– Dolf Andringa, Commented Sep 29, 2017 at 12:59
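
For reference, the && operator mentioned in these comments is PostgreSQL's built-in array overlap test: it returns true when two arrays share at least one element. A minimal sketch using values from the sample data (the column aliases are only illustrative):

```sql
-- && is true when the two arrays have at least one element in common.
select
    array['pizza','spaghetti'] && array['spaghetti','rice'] as john_charlie,  -- true: both contain spaghetti
    array['bread','cheese']    && array['rice','milk']      as beth_trudy;    -- false: nothing in common
```

This only detects direct overlap between two rows. The transitive grouping asked for here (John linked to Lucy through Charlie) still needs something extra on top of it.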

klin · Accepted Answer · 2017-09-29 13:48:58Z

This is a typical task for recursion. You need an auxiliary function to merge and sort two arrays:

create or replace function public.array_merge(arr1 anyarray, arr2 anyarray)
    returns anyarray
    language sql immutable
as $function$
    select array_agg(distinct elem order by elem)
    from (
        select unnest(arr1) elem 
        union
        select unnest(arr2)
    ) s
$function$;
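
A quick check of the helper, assuming array_merge has been created exactly as defined above. The result is deduplicated and sorted, so two arrays holding the same set of elements always come out identical:

```sql
-- Assumes public.array_merge(anyarray, anyarray) from above already exists.
select array_merge(
    array['pizza','spaghetti'],
    array['spaghetti','rice']
) as merged;
-- merged: {pizza,rice,spaghetti}
```

That canonical form matters for the recursive query, because UNION can only discard a row as a duplicate when the merged arrays compare equal.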

Use the function in the recursive query:

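with recursive cte(name, preferences) as (  
    -- start with every person and their own preferences
    select *
    from preferences
union
    -- extend each accumulated set with any other person's overlapping preferences;
    -- union drops rows already seen, so the recursion stops once nothing new appears
    select p.name, array_merge(c.preferences, p.preferences)
    from cte c
    join preferences p 
    on c.preferences && p.preferences 
    and c.name <> p.name
)
select array_agg(name) as names, preferences
from (
    -- keep only the largest accumulated set for each person
    select distinct on(name) *
    from cte
    order by name, cardinality(preferences) desc
    ) s
group by preferences;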

           names           |             preferences              
---------------------------+--------------------------------------
 {Charlie,John,Lucy,Trudy} | {milk,pizza,potatoes,rice,spaghetti}
 {Beth}                    | {bread,cheese}
(2 rows)
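
To see how the sets grow, it can help to look at the raw rows the recursion produces before the final DISTINCT ON and GROUP BY. This is only a tracing sketch: it assumes the preferences table from the question and the array_merge function above exist, and the size column is an illustrative alias.

```sql
-- Same recursive CTE as above, but listing every intermediate row per person.
with recursive cte(name, preferences) as (
    select name, preferences
    from preferences
union
    select p.name, array_merge(c.preferences, p.preferences)
    from cte c
    join preferences p
    on c.preferences && p.preferences
    and c.name <> p.name
)
select name, cardinality(preferences) as size, preferences
from cte
order by name, size;
```

Each person appears several times with progressively larger arrays. The last row per name is the one the final query keeps, and people who end up with the same full array fall into the same group.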
