# Questions tagged [sample-size]

9 questions

1 vote, 0 answers, 13 views

### SMOTE in R reducing sample size significantly

I have a data set with around 130,000 records. The records are divided into two classes of the target variable, 0 and 1; class 1 makes up only 0.09% of the total.
I'm running my analysis in R 3.5.1 on Windows 10. I used the SMOTE algorithm to work with this imbalanced data set.
I used the following code to handle imbalanc...
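
The code in the excerpt is cut off, so this assumes (not confirmed by the question) that it calls `DMwR::SMOTE`. As I understand that function's documented behaviour, `perc.over` sets how many synthetic minority cases are generated per original minority case, and `perc.under` sets how many majority cases are *sampled* per synthetic case, so the majority class is not kept in full and the output can be far smaller than the input. The hypothetical helper `dmwr_smote_sizes` below only does that bookkeeping, for `perc.over >= 100`; it is a sketch, not a reimplementation of the package.

```python
def dmwr_smote_sizes(n_minority, perc_over, perc_under):
    """Bookkeeping sketch of DMwR-style SMOTE output sizes (assumes perc_over >= 100).

    Returns (original minority kept, synthetic minority, sampled majority).
    """
    if perc_over < 100:
        raise ValueError("this sketch only covers perc_over >= 100")
    per_case = int(perc_over // 100)               # synthetic cases per minority case
    synthetic = n_minority * per_case
    majority = int(perc_under / 100 * synthetic)   # majority cases drawn per synthetic case
    return n_minority, synthetic, majority


n_total = 130_000
n_minority = round(n_total * 0.0009)  # 0.09% of the records, about 117
kept, synthetic, majority = dmwr_smote_sizes(n_minority, perc_over=200, perc_under=200)
print(kept + synthetic + majority)  # 819 rows in total, versus 130,000 before
```

Under this assumption, raising `perc.under` is what keeps more of the majority class.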

3 votes, 1 answer, 1.9k views

### Is there a good way to display sample size on grouped boxplots using Python Matplotlib

I could get the size info using groupby and add text at the corresponding location, but I can't help thinking there's a better way, as this seems mundane, something many people would like to see...
To illustrate, the following code would generate a grouped boxplot:

```python
import pandas as pd
df = pd.D...
```
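
One possible approach, sketched here rather than offered as a built-in Matplotlib feature, avoids placing text by hand: fold each group's count into its tick label. The helper name `boxplot_with_counts` is hypothetical, and the grouping is done with plain Python instead of pandas so the sketch stays small; a `groupby(...).size()` would give the same counts.

```python
from collections import defaultdict

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def boxplot_with_counts(groups, values, ax=None):
    """Draw one box per group and show each group's sample size in its tick label."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        buckets[group].append(value)
    names = sorted(buckets)
    labels = [f"{name}\n(n={len(buckets[name])})" for name in names]
    if ax is None:
        _, ax = plt.subplots()
    ax.boxplot([buckets[name] for name in names])  # boxes sit at x = 1, 2, ...
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(labels)
    return ax, labels


ax, labels = boxplot_with_counts(["a", "b", "a", "c"], [1.0, 2.0, 3.0, 4.0])
print(labels)  # ['a\n(n=2)', 'b\n(n=1)', 'c\n(n=1)']
```

Setting the ticks explicitly with `set_xticks`/`set_xticklabels` sidesteps the renaming of `boxplot`'s label keyword across Matplotlib versions.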

3 votes, 1 answer, 395 views

### pwr.chisq.test error in R

I am trying to estimate the sample size needed for an A/B test of website conversion rate. pwr.chisq.test always gives me an error message when the conversion rate is small:

```r
# conversion rate for two groups
p1 = 0.001
p2 = 0.0011
# degrees of freedom
df = 1
# effect size
w = ES.w1(p1,p2)
p...
```
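
The error message itself is cut off, so this is not a diagnosis. One thing worth checking is the effect size: as far as I know, pwr's `ES.w1` compares two full probability vectors (a goodness-of-fit setting), while a two-group conversion comparison is usually framed as a 2x2 table, which pwr handles with `ES.w2`. The sketch below, with hypothetical helper names, computes Cohen's w from that 2x2 table and an approximate total N for a df = 1 test, N ≈ ((z_{1-α/2} + z_{power}) / w)², which ignores the negligible far tail. It illustrates how a 10% relative lift on a 0.1% base rate pushes the required N into the millions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cohens_w_ab(p1, p2, share_a=0.5):
    """Cohen's w for a 2x2 table of group (A/B) by outcome (converted or not).

    share_a is the fraction of traffic sent to group A.
    """
    table = [[share_a * p1, share_a * (1 - p1)],
             [(1 - share_a) * p2, (1 - share_a) * (1 - p2)]]
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    w2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j]  # cell probability under independence
            w2 += (table[i][j] - expected) ** 2 / expected
    return sqrt(w2)


def total_n_df1(w, alpha=0.05, power=0.8):
    """Approximate total N for a df = 1 chi-square test: ((z_{1-alpha/2} + z_power) / w) ** 2."""
    z = NormalDist().inv_cdf
    return ceil(((z(1 - alpha / 2) + z(power)) / w) ** 2)


w = cohens_w_ab(0.001, 0.0011)
print(w, total_n_df1(w))  # total N across both groups
```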

2 votes, 1 answer, 64 views

### Simulating thousands of regressions and obtaining p-values

I'm looking to do some basic simulation in R to examine the nature of p-values. My goal is to see whether large sample sizes tend toward small p-values. My thought is to generate random vectors of 1,000,000 data points, regress them on each other, and then plot the distribution of p-values and loo...
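
A minimal version of that simulation, sketched in Python rather than R and with far fewer than 1,000,000 points per vector so it runs quickly in pure Python. When the two vectors are independent (the null is true), slope p-values are roughly uniform on [0, 1] whatever the sample size; it is only when a real, possibly tiny, effect exists that a larger n drives p-values toward zero. The function names are illustrative, and the p-value uses the normal approximation to the t distribution, which is accurate at these sample sizes.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def slope_p_value(x, y):
    """Two-sided p-value for the OLS slope of y on x (normal approximation to t)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = my - beta * mx
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return 2 * (1 - NormalDist().cdf(abs(beta / se)))


def simulate(reps=200, n=1000, slope=0.0, seed=1):
    """p-values from `reps` regressions of y = slope * x + noise on x."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [slope * xi + rng.gauss(0, 1) for xi in x]
        pvals.append(slope_p_value(x, y))
    return pvals


null_p = simulate()  # slope = 0: the null hypothesis is true
print(fmean(null_p))  # near 0.5, as expected for roughly uniform p-values
print(sum(p < 0.05 for p in null_p) / len(null_p))  # near 0.05
```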

6 votes, 1 answer, 5.8k views

### Sample size and power calculation in R as a viable alternative to proc power in SAS?

So I am trying to see how close the sample size calculations (for two independent proportions with unequal sample sizes) are between proc power in SAS and some sample size functions in R. I am using the data from a UCLA website.
The UCLA site gives parameters as follows:
p1=.3,p2=...
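
For comparison, here is the textbook normal-approximation formula for unequal allocation, with pooled variance under H0, unpooled variance under H1, and no continuity correction. With ratio k = n2/n1 and p̄ = (p1 + k·p2)/(1 + k), n1 = [z_{1-α/2}·sqrt(p̄(1 − p̄)(1 + 1/k)) + z_{1-β}·sqrt(p1(1 − p1) + p2(1 − p2)/k)]² / (p1 − p2)². This is a sketch, not necessarily the method either SAS or a given R function uses, so small discrepancies are expected. Only p1 = .3 comes from the excerpt; p2 = 0.15 and k = 2 below are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_two_proportions(p1, p2, ratio=1.0, alpha=0.05, power=0.8):
    """Normal-approximation sample sizes (n1, n2) for a two-sided test of p1 vs p2.

    ratio = n2 / n1; pooled variance under H0, unpooled under H1, no continuity correction.
    """
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    p_bar = (p1 + ratio * p2) / (1 + ratio)  # pooled proportion under H0
    null_sd = sqrt(p_bar * (1 - p_bar) * (1 + 1 / ratio))
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2) / ratio)
    n1 = ((z_a * null_sd + z_b * alt_sd) / (p1 - p2)) ** 2
    return ceil(n1), ceil(ratio * n1)


n1, n2 = n_two_proportions(0.3, 0.15, ratio=2)  # p2 and ratio are hypothetical
print(n1, n2)
```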

2 votes, 1 answer, 507 views

### Optimizing for global minimum

I am attempting to use optimize() to find the minimum value of n for the following function (Clopper-Pearson lower bound):
f
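
The function in the excerpt is cut off, so the sketch below uses a hypothetical stand-in: the two-sided Clopper-Pearson lower bound when all n trials succeed, which has the closed form (α/2)^(1/n) because it is the α/2 quantile of Beta(n, 1). R's `optimize()` searches a continuous interval and can return a local, non-integer minimum; when the quantity is monotone in an integer n, a plain integer search for the smallest n meeting the target is simpler. Both helper names are illustrative.

```python
def smallest_n(predicate, start=1, limit=10**7):
    """Smallest integer n >= start with predicate(n) True (predicate assumed monotone in n)."""
    n = start
    while n <= limit:
        if predicate(n):
            return n
        n += 1
    raise ValueError("no n found up to limit")


def cp_lower_all_successes(n, alpha=0.05):
    """Two-sided Clopper-Pearson lower bound when all n trials succeed: (alpha/2) ** (1/n)."""
    return (alpha / 2) ** (1 / n)


n_min = smallest_n(lambda n: cp_lower_all_successes(n) >= 0.95)
print(n_min)  # 72
```

For a large search range, a bisection over integers would replace the linear scan without changing the idea.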

2 votes, 1 answer, 182 views

### Stratified Bootstrapping in R with >25 strata

I have data with about 25 different groups. To see how the variance of each group would change with different sample sizes, I am trying to do stratified bootstrapping. For example, at sample size 5 it should produce 1000 collections of 5 resampled points for each group. I like to coll...
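
The resampling described above can be sketched as follows (in Python rather than R; the helper name and the dict-of-lists input are assumptions): each group is resampled independently with replacement, `reps` times at the chosen size, and the sample variance of every resample is kept.

```python
import random
from statistics import variance


def stratified_bootstrap_variances(groups, size, reps=1000, seed=None):
    """For each group, draw `reps` resamples of `size` points with replacement
    and return the sample variance of each resample, keyed by group name."""
    if size < 2:
        raise ValueError("sample variance needs at least 2 points")
    rng = random.Random(seed)
    result = {}
    for name, values in groups.items():
        result[name] = [variance(rng.choices(values, k=size)) for _ in range(reps)]
    return result


groups = {"g1": [2.1, 3.4, 1.9, 5.0, 4.2], "g2": [10.0, 12.5, 9.8, 11.1]}
res = stratified_bootstrap_variances(groups, size=5, seed=42)
print({name: sum(v) / len(v) for name, v in res.items()})  # mean bootstrap variance per group
```

Looping over several values of `size` gives the variance-versus-sample-size comparison the question is after.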

7 votes, 1 answer, 8.4k views

### Minimum number of observations when performing Random Forest

Is it possible to apply random forests to very small datasets?
I have a dataset with many variables but only 25 observations each. Random forests produce reasonable results with low OOB errors (10-25%).
Is there any rule of thumb regarding the minimum number of observations to use?
In fact one of the...
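
This does not answer the rule-of-thumb question, but it shows how little data each tree's out-of-bag estimate rests on. A bootstrap sample of size n leaves out each observation with probability (1 − 1/n)^n, which tends to 1/e ≈ 0.368, so with 25 observations about 9 are out of bag per tree and the reported OOB error is correspondingly noisy. The sketch compares that formula with a direct simulation; the function names are illustrative.

```python
import random


def expected_oob(n):
    """Expected number of observations left out of one bootstrap sample of size n."""
    return n * (1 - 1 / n) ** n


def simulated_oob(n, trees=10_000, seed=0):
    """Average out-of-bag count per tree, estimated by drawing bootstrap samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trees):
        in_bag = {rng.randrange(n) for _ in range(n)}  # distinct rows drawn with replacement
        total += n - len(in_bag)
    return total / trees


print(expected_oob(25), simulated_oob(25))  # both about 9
```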

2 votes, 2 answers, 371 views

### Sample size for Named Entity Recognition gold standard corpus

I have a corpus of 170 Dutch literary novels to which I will apply Named Entity Recognition. To evaluate existing NER taggers for Dutch, I want to manually annotate Named Entities in a random sample of this corpus; I use brat for this purpose. The manually annotated random sample will fun...
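
One common starting point, offered as an assumption-laden sketch rather than a rule, is to size the sample so that a proportion such as precision or recall is estimated within a chosen margin: n = z²·p(1 − p)/e², counting entities rather than pages. This assumes entities behave like independent draws; entities from the same novel tend to be correlated, so a sample clustered by novel would need more annotation than this. The helper name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(margin, p=0.5, confidence=0.95):
    """Items needed so a proportion (e.g. precision) has a normal-approximation CI of +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(n_for_proportion(0.05))  # 385 entities in the worst case p = 0.5
```

Plugging in an expected precision closer to the taggers' likely performance (for example p = 0.9) lowers the requirement.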