Postgresql Group by array elements in common

Refresh

November 2018

Views

318 time

2

I have a table like this:

CREATE TABLE preferences (name varchar, preferences varchar[]);
INSERT INTO preferences (name, preferences) 
VALUES 
    ('John','{pizza, spaghetti}'), 
    ('Charlie','{spaghetti, rice}'), 
    ('Lucy','{rice, potatoes}'), 
    ('Beth','{bread, cheese}'), 
    ('Trudy','{rice, milk}');

So from the table

John      {pizza, spaghetti}
Charlie   {spaghetti, rice}
Lucy      {rice, potatoes}
Beth      {bread, cheese}
Trudy     {rice, milk}

I would like group all rows that have elements in common (even if it is through other people). So in this case I would like to end up with:

{John,Charlie,Lucy,Trudy}     {pizza,spaghetti,rice,potatoes,milk}
{Beth}                        {bread, cheese}

because Johns preferences intersect with those of Charlie, and those of Charlie intersect with those of Lucy and with those of Trudy.

I already haven an array_intersection function like this:

CREATE OR REPLACE FUNCTION array_intersection(anyarray, anyarray)
  RETURNS anyarray
  language sql
as $FUNCTION$
    SELECT ARRAY(
        SELECT UNNEST($1)
        INTERSECT
        SELECT UNNEST($2)
    );
$FUNCTION$;

and know the array_agg function to aggregate arrays, but how to turn those into a grouping like I want is the step I am missing.

1 answers

2

Это типичная задача для рекурсии. Вам нужно вспомогательную функцию, для объединения и сортировки двух массивов:

create or replace function public.array_merge(arr1 anyarray, arr2 anyarray)
    returns anyarray
    language sql immutable
as $function$
    select array_agg(distinct elem order by elem)
    from (
        select unnest(arr1) elem 
        union
        select unnest(arr2)
    ) s
$function$;

Используйте функцию рекурсивного запроса:

with recursive cte(name, preferences) as (  
    select *
    from preferences
union
    select p.name, array_merge(c.preferences, p.preferences)
    from cte c
    join preferences p 
    on c.preferences && p.preferences 
    and c.name <> p.name
)
select array_agg(name) as names, preferences
from (
    select distinct on(name) *
    from cte
    order by name, cardinality(preferences) desc
    ) s
group by preferences;

           names           |             preferences              
---------------------------+--------------------------------------
 {Charlie,John,Lucy,Trudy} | {milk,pizza,potatoes,rice,spaghetti}
 {Beth}                    | {bread,cheese}
(2 rows)