You need to play a bit with the two parameters available from the function: perc.over and perc.under.
As per the documentation for DMwR::SMOTE:
The parameters perc.over and perc.under control the amount of
over-sampling of the minority class and under-sampling of the majority
class, respectively.
perc.over will typically be a number above 100. With this type of
values, for each case in the original data set belonging to the
minority class, perc.over/100 new examples of that class will be
created.
I can't see your data, but if your minority class has 100 cases and
perc.over=100, the algorithm will generate 100/100 = 1 new case for each of them, that is, 100 new cases for that class.
The parameter perc.under controls the proportion of cases of the
majority class that will be randomly selected for the final "balanced"
data set. This proportion is calculated with respect to the number of
newly generated minority class cases.
So, for example, a value of
perc.under=100 will select from the majority class in the original data the same number of observations as were generated for the minority class.
In our example 100 new cases were generated, so it will select 100 majority cases, resulting in a new dataset with 300 cases: the 100 original minority cases, the 100 synthetic ones, and 100 sampled from the majority class.
I suggest using values above 100 for
perc.over, and an even higher value for
perc.under (the defaults are 200 for both).
Keep in mind that you're adding synthetic observations to your minority class, not real ones, so I'd try to keep their number under control.
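If you want the two classes to end up roughly the same size, the definitions quoted above give a starting point. With n minority cases, SMOTE creates n * perc.over/100 synthetic ones, and the majority sample is perc.under/100 times that number, so equal class sizes require perc.under = 100 * (1 + 100/perc.over). The helper below is a hypothetical sketch, not part of DMwR, and it assumes perc.over is a multiple of 100 and at least 100:

```r
# Hypothetical helper (not part of DMwR): the perc.under value for which the
# sampled majority class matches the final minority class
# (original + synthetic cases).
# Assumption: perc.over is a multiple of 100 and at least 100.
balanced_perc_under <- function(perc.over) {
  stopifnot(perc.over >= 100, perc.over %% 100 == 0)
  100 * (1 + 100 / perc.over)
}

balanced_perc_under(100)  # 200
balanced_perc_under(200)  # 150
```

Treat the result as a first guess: the counts are truncated to integers, so the final classes can differ by a case or two for other values of perc.over.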
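# Toy data: 50 rows with a rare positive class (no seed is set, so counts vary between runs)
data <- data.frame(var1 = sample(50),
                   var2 = sample(50),
                   out = as.factor(rbinom(50, 1, prob = 0.1)))
table(data$out)
#  0  1
# 43  7   # 50 rows total (original data)
smote_data <- DMwR::SMOTE(out ~ var1, data, perc.over = 200, perc.under = 400)
table(smote_data$out)
#  0  1
# 56 21   # 77 rows total (smote data)
# 21 = 7 original + 14 synthetic (200% of 7); 56 = 400% of the 14 synthetic cases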
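The class counts in this example can be reproduced by hand from the two parameters. The function below is a hypothetical helper, not part of DMwR, that sketches the arithmetic. It assumes perc.over is at least 100 and that the original minority cases are kept alongside the synthetic ones, which is what the 7 to 21 jump in the example implies:

```r
# Hypothetical helper (not part of DMwR): expected class sizes after SMOTE,
# following the documented meaning of perc.over and perc.under.
# Assumption: perc.over >= 100; counts are truncated to whole cases.
smote_sizes <- function(n_minority, perc.over, perc.under) {
  stopifnot(perc.over >= 100)
  synthetic <- n_minority * floor(perc.over / 100)   # new minority cases
  majority  <- floor(perc.under / 100 * synthetic)   # sampled majority cases
  minority  <- n_minority + synthetic                # originals are kept
  c(minority = minority, majority = majority, total = minority + majority)
}

# The example from this answer: 7 minority cases, perc.over = 200, perc.under = 400
smote_sizes(7, perc.over = 200, perc.under = 400)
# minority 21, majority 56, total 77
```

Note that the 56 majority cases are drawn from only 43 originals, so some rows necessarily appear more than once in the result.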