I have a hive table with the following schema:
int key1 # this is unique array<int> key2_list
Now I want to merge records here if their key2_lists have any common element(s). For example, if record A has (10, [1,3,5]) and B has (12, [1,2,4]), I want to merge it as ([10,12], [1,2,3,4,5]) or ([1,2,3,4,5]).
If it is easier, the input table can have the following schema instead:
int key1 int key2
If this can be done through Hive or SparkSQL, that would be my preference.