Select random and unique elements from a vector

April 2019

Say I have a simple vector with repeated elements:

a <- c(1,1,1,2,2,3,3,3)

Is there a way to randomly select one element from each group of repeated values? For example, one random draw of the indices to keep would be:

1,4,6 ## here I selected the first 1, the first 2 and the first 3

And another:

1,5,8 ## here I selected the first 1, the second 2 and the third 3

I could do this with a loop over each repeated element, but I am sure there must be a faster way.

EDIT:

Ideally, the solution should also always select an element if it is already unique. For example, my vector could also be:

b <- c(1,1,1,2,2,3,3,3,4) ## The number four is unique and should always be drawn

1 answer

Using base R's ave, we could do something like:

unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 5 6

unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 4 7

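This generates an index for every element of a, groups the indices by the values of a, and selects one random index in each group. Because ave returns a vector as long as its input, the selected index is repeated at every position of its group, and unique then reduces the result to one index per group.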
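To make the intermediate step visible, here is a minimal sketch that splits the one-liner into parts. The helper name pick_one is hypothetical and only used for illustration; the logic is the same as in the call above.

```r
a <- c(1, 1, 1, 2, 2, 3, 3, 3)

# pick_one is a hypothetical helper name: it returns one random index
# from a group, or the index itself when the group has a single member.
pick_one <- function(x) if (length(x) > 1) head(sample(x), 1) else x

# ave() recycles each group's pick across every position of that group.
picked <- ave(seq_along(a), a, FUN = pick_one)
length(picked) == length(a)   # TRUE: one entry per element of a

# unique() collapses the repeats to one index per distinct value.
idx <- unique(picked)
a[idx]                        # 1 2 3
```

The indices in idx differ from run to run, but a[idx] always contains each distinct value once, in order of first appearance.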


Using the same logic with sapply and split:

sapply(split(seq_along(a), a), function(x) if(length(x) > 1) head(sample(x), 1) else x)

And it would also work with tapply

tapply(seq_along(a), a, function(x) if(length(x) > 1) head(sample(x), 1) else x)
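As a rough illustration on the vector b from the question's edit, the sapply and split variant returns a vector named by group value, so unname may be wanted when only the indices matter. The helper name pick_one is again hypothetical.

```r
b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)

# Hypothetical helper, same logic as the anonymous function above.
pick_one <- function(x) if (length(x) > 1) head(sample(x), 1) else x

res <- sapply(split(seq_along(b), b), pick_one)
names(res)          # "1" "2" "3" "4": one entry per distinct value of b

idx <- unname(res)  # drop the names to keep plain indices
idx[4]              # 9: the value 4 occurs once, so its index is always kept
b[idx]              # 1 2 3 4
```

Note that split orders the groups by sorted value, whereas the unique(ave(...)) form orders them by first appearance. For a and b the two orders coincide.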

We need to check the length (if(length(x) > 1)) because of this behaviour documented in ?sample:

If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.

Hence, when sample() receives a single number n, it samples from 1:n rather than returning n, so we need to check the length of x before calling it.
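The pitfall can be seen directly in a small sketch. The second half shows one possible alternative that is not part of the answer above: sampling a position with sample.int and subsetting, which makes the length check unnecessary. The name safe_pick is hypothetical.

```r
x <- 9L              # a group that contains a single index
length(sample(x))    # 9: sample(9) permutes 1:9 instead of returning 9

# Assumed alternative: draw a position in 1..length(x), then subset.
# With a single element the only possible position is 1.
safe_pick <- function(x) x[sample.int(length(x), 1)]
safe_pick(9L)        # 9

b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)
idx <- unique(ave(seq_along(b), b, FUN = safe_pick))
b[idx]               # 1 2 3 4
```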