Merging records by 1+ common elements


April 2019


22 time


I have a hive table with the following schema:

int key1 # this is unique
array<int> key2_list

Now I want to merge records here if their key2_lists have any common element(s). For example, if record A has (10, [1,3,5]) and B has (12, [1,2,4]), I want to merge it as ([10,12], [1,2,3,4,5]) or ([1,2,3,4,5]).

If it is easier, the input table can have the following schema instead:

int key1
int key2

If this can be done through Hive or SparkSQL, that would be my preference.



1 answers


for your second table definition you can run something like

select collect_list(key1) from yourtable group by key2

Or for your first table definition

select collect_list(key1) from 
(select key1,  mykey FROM yourtable LATERAL VIEW explode(key2_list) a AS mykey) t
group by mykey

I hope it helps.