Questions tagged [sample-size]

1 vote, 0 answers, 13 views

SMOTE in R reducing sample size significantly

I have a data set with around 130,000 records. The records are divided into two classes of the target variable, 0 and 1; class 1 makes up only 0.09% of the total. I'm running my analysis in R 3.5.1 on Windows 10. I used the SMOTE algorithm to work with this imbalanced data set. I used the following code to handle imbalanc...
Sonia

3 votes, 1 answer, 1.9k views

Is there a good way to display sample size on grouped boxplots using Python Matplotlib?

I could get the size info using groupby and add text to the corresponding location. But I can't help thinking there's a better way as this really seems mundane, something many people would like to see... To illustrate, the following code would generate a grouped boxplot import pandas as pd df = pd.D...
Tian He

3 votes, 1 answer, 395 views

pwr.chisq.test error in R

I am trying to estimate the sample size needed for A/B testing of a website conversion rate. pwr.chisq.test always gives me an error message when the conversion rate is small: # conversion rate for two groups p1 = 0.001 p2 = 0.0011 # degree of freedom df = 1 # effect size w = ES.w1(p1,p2) p...
Peter Pan
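
For a rough cross-check of the sample size in a setup like this, a minimal Python sketch using the standard normal approximation for comparing two proportions might look as follows. It is not `pwr.chisq.test` and will not reproduce its output exactly; the function name `n_per_group` and the defaults `alpha=0.05` and `power=0.8` are assumptions, since the excerpt does not state them.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    Normal-approximation formula; alpha and power defaults are assumptions.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two binomial variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Conversion rates taken from the question excerpt.
    print(n_per_group(0.001, 0.0011))
```

With rates this small and this close together, the approximation calls for well over a million observations per group.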

2 votes, 1 answer, 64 views

Simulating thousands of regressions and obtaining p-values

I'm looking to do some basic simulation in R to examine the nature of p-values. My goal is to see whether large sample sizes trend towards small p-values. My thought is to generate random vectors of 1,000,000 data points, regress them on each other, and then plot the distribution of p-values and loo...
macworthy
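
One way to sketch such a simulation, here in Python with only the standard library rather than R: draw two independent normal vectors, test their correlation (the same t statistic as the slope test in a simple linear regression), and collect the p-values. The sample size, the number of simulations, the seed, and the normal approximation to the t distribution are all assumptions made to keep the sketch small.

```python
import math
import random
from statistics import NormalDist


def slope_p_value(x, y):
    """Two-sided p-value for the slope of y ~ x (normal approximation to t)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)                # Pearson correlation
    t = r * math.sqrt((n - 2) / (1 - r * r))      # same t statistic as the slope test
    return 2 * (1 - NormalDist().cdf(abs(t)))


def simulate_p_values(n=200, n_sims=300, seed=0):
    """Regress independent random vectors on each other and collect p-values."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]   # generated independently of x
        p_values.append(slope_p_value(x, y))
    return p_values


if __name__ == "__main__":
    pv = simulate_p_values()
    print(sum(p < 0.05 for p in pv) / len(pv))    # share of "significant" results
```

Because x and y are unrelated here, the p-values should be roughly uniform on [0, 1] whatever n is; larger samples push p-values toward zero only when a real effect exists.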

6 votes, 1 answer, 5.8k views

Sample size and power calculation in R as viable alternative to proc power in SAS?

I am trying to see how close the sample size calculations (for two independent sample proportions with unequal sample sizes) are between proc power in SAS and some sample size functions in R. I am using the data found here at a UCLA website. The UCLA site gives parameters as follows: p1=.3,p2=...
user27008

2 votes, 1 answer, 507 views

Optimizing for global minimum

I am attempting to use optimize() to find the minimum value of n for the following function (Clopper-Pearson lower bound): f
a.powell

2 votes, 1 answer, 182 views

Stratified Bootstrapping in R with >25 strata

I have data with about 25 different groups. In an effort to see how the variance of each group would change if I had different sample sizes, I am trying to do stratified bootstrapping. For example, at sample size 5, it should produce 1000 collections of 5 resampled points for each group. I like to coll...
andemexoax
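
The resampling described here can be sketched in Python with the standard library (not R, and not any particular bootstrap package): for each group, draw `n_boot` resamples of a given size with replacement and record the variance of each. The group names, data values, and function name are made up for illustration.

```python
import random
from statistics import variance


def stratified_bootstrap_variances(groups, sample_size, n_boot=1000, seed=0):
    """For each group, return n_boot variances of resamples of size sample_size."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2 to compute a variance")
    rng = random.Random(seed)
    result = {}
    for name, values in groups.items():
        # Resample within the group only, with replacement.
        result[name] = [
            variance(rng.choices(values, k=sample_size))
            for _ in range(n_boot)
        ]
    return result


if __name__ == "__main__":
    # Hypothetical data: two of the ~25 groups.
    data = {
        "A": [1.2, 3.4, 2.2, 5.1, 4.0, 2.9],
        "B": [10.0, 12.5, 9.8, 11.1, 13.3],
    }
    out = stratified_bootstrap_variances(data, sample_size=5)
    for name, vs in out.items():
        print(name, len(vs), sum(vs) / len(vs))
```

Calling the function once per sample size of interest gives one set of 1000 variances per group and size, which can then be compared across sizes.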

7 votes, 1 answer, 8.4k views

Minimum number of observations when performing Random Forest

Is it possible to apply random forests to very small datasets? I have a dataset with many variables but only 25 observations each. Random forests produce reasonable results with low OOB errors (10-25%). Is there any rule of thumb regarding the minimum number of observations to use? In fact one of the...
Oritteropus

2 votes, 2 answers, 371 views

Sample size for Named Entity Recognition gold standard corpus

I have a corpus of 170 Dutch literary novels on which I will apply Named Entity Recognition. For an evaluation of existing NER taggers for Dutch I want to manually annotate Named Entities in a random sample of this corpus – I use brat for this purpose. The manually annotated random sample will fun...
roelmetgevoel