# Questions tagged [statistics]

3731 questions

1 vote, 1 answer, 7.4k views

### R: Boxplot - how to move the x-axis label down?

#RGR ~ Treatment:Geno boxplot
fit

1 vote, 1 answer, 329 views

### Why does variogram always plot 15 points in R?

I would like to plot a variable number of points as my sample size increases. However, for some reason the "variogram" function only plots 15 points every time.
I checked to make sure that the size of the data I'm passing "variogram" was varying correctly - it was.
library(gstat)
library(RandomFiel...

1 vote, 1 answer, 25 views

### How can I get similar distribution from different groups?

I have to find subgroups in the dataset whose averages for 2 metrics are similar to those of my original group.
For example, I'd like to find a city or group of cities with the closest average(metric 1) = 10 and average(metric 2) = 5.
Dataset example:
How can I do it?

-1 vote, 0 answers, 15 views

### How to fix this function?

I am doing an exercise to create a function. One of the questions is:
"We can estimate the cumulative risk of a certain event using the exponential formula 1-exp(-1/10000*t), where t is the time to the event. Create a function ans(t), which returns the risk at time t."
I am using this command:...
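A minimal sketch of the function the exercise asks for, written in Python (the exercise's own language is not shown in the excerpt). It evaluates the stated formula directly; `ans` is the name the exercise gives.

```python
import math


def ans(t):
    """Cumulative risk at time t under the exponential model 1 - exp(-t/10000)."""
    # math.exp accepts a single number; for an array of times use numpy.exp instead
    return 1 - math.exp(-t / 10000)


print(ans(0))      # risk at time 0 is 0.0
print(ans(10000))  # 1 - e**-1, roughly 0.632
```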

1 vote, 0 answers, 14 views

### Why does exponential smoothing return an error for a leap year in Python?

Here is my code:
alpha=0.01
beta=0.9
gamma=0.6
trend="additive"
seasonal="additive"
period=364
fit = Exponential_method(period,alpha,beta,gamma,trend,seasonal)
My data has 366 values because of the leap year, and I am forecasting daily data.
Here period=364 gives an error (Why it is giv...

-1 vote, 0 answers, 3 views

### Quantifying and comparing vendor provided data across multiple data vendors

Layman and first-time poster here. Apologies and thanks in advance.
I want to test the accuracy of vendors I'm considering hiring for data collection. So far I have narrowed my list to eight providers, each delivering two to six columns of data per record using the same standard 100 record...

0 votes, 0 answers, 5 views

### How to iterate and increase a counter in SPSS?

I want to count educational advancement in my dataset in SPSS. I have some programming experience, but I am stuck with the syntax.
I have a variable my_education. I want to iteratively compare my_education with education_father and education_mother. If my_education is bigger than that of my paren...

0 votes, 0 answers, 27 views

### Plot the difference between two lists of values in matplotlib

I have two datetime based lists. I want to plot the difference between their values.
The problem is, the lists are of different lengths / resolutions.
For example:
list 1 is a list of readings taken every minute throughout the day.
list 2 is a list of readings taken randomly throughout the day.
I c...
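One common approach (not necessarily what the asker settled on) is to linearly interpolate the regular per-minute series at the irregular timestamps and then subtract. A NumPy sketch, assuming both lists are (datetime, value) series and the per-minute timestamps are sorted; `difference_at` and the sample data are hypothetical.

```python
from datetime import datetime

import numpy as np


def difference_at(times_a, values_a, times_b, values_b):
    """Return values_b minus series A linearly interpolated at times_b.

    times_a must be sorted ascending. Outside A's time range np.interp
    holds A's first/last value instead of extrapolating.
    """
    xa = np.array([t.timestamp() for t in times_a])
    xb = np.array([t.timestamp() for t in times_b])
    a_at_b = np.interp(xb, xa, np.asarray(values_a, dtype=float))
    return np.asarray(values_b, dtype=float) - a_at_b


# Hypothetical data: list 1 every minute, list 2 at irregular times
times_1 = [datetime(2020, 1, 1, 0, m) for m in range(5)]
values_1 = [0.0, 1.0, 2.0, 3.0, 4.0]
times_2 = [datetime(2020, 1, 1, 0, 1, 30), datetime(2020, 1, 1, 0, 3)]
values_2 = [2.0, 3.5]

diff = difference_at(times_1, values_1, times_2, values_2)
print(diff)  # one difference per list-2 timestamp
```

The result lines up with `times_2`, so `plt.plot(times_2, diff)` in matplotlib plots the difference.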

1 vote, 2 answers, 257 views

### Calculate pairwise Spearman's rank correlation from data present in all files in a directory

I'm trying to calculate Spearman's rank correlation, where the data (tsv with name and rank) for each experiment is stored in separate files in a directory.
Following is the format of input files:
#header not present
#geneName value
ENSMUSG00000026179.14 14.5648627685587
ENSMUSG00000026179.14...
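A dependency-free sketch of the pairwise computation. It assumes each file holds whitespace-separated `geneName value` lines with unique gene names, and it pairs two experiments on the genes they share. `read_values`, `pairwise_spearman`, `load_directory` and the `*.tsv` pattern are hypothetical. Spearman's rho is computed as the Pearson correlation of average ranks, which is what `scipy.stats.spearmanr` returns when SciPy is available.

```python
import itertools
from pathlib import Path


def read_values(path):
    """Read 'geneName value' lines into a dict; blank and '#' lines are skipped."""
    values = {}
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            name, value = line.split()[:2]
            values[name] = float(value)  # a repeated gene name keeps its last value
    return values


def average_ranks(xs):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the average ranks of a and b."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5  # undefined if either input is constant


def pairwise_spearman(experiments):
    """experiments maps name -> {gene: value}; returns {(name1, name2): rho}."""
    result = {}
    for (n1, d1), (n2, d2) in itertools.combinations(experiments.items(), 2):
        shared = sorted(d1.keys() & d2.keys())
        result[(n1, n2)] = spearman([d1[g] for g in shared], [d2[g] for g in shared])
    return result


def load_directory(directory, pattern="*.tsv"):
    """Load every matching file, keyed by file name without its extension."""
    return {p.stem: read_values(p) for p in sorted(Path(directory).glob(pattern))}
```

For example, `pairwise_spearman(load_directory("path/to/dir"))` returns one coefficient per pair of files.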

1 vote, 1 answer, 22 views

### Generate a special matrix (minimal maximum column sum) with a given number of columns from a vector

Recently I came across a question like this: given a vector, generate a special matrix with a given number of columns. If the vector does not have enough elements to fill the generated matrix, put 0 in the last row of the generated matrix. For the generated...

1 vote, 0 answers, 13 views

### SMOTE in R reducing sample size significantly

I have a data set with around 130000 records. The records are divided into two classes of the target variable, 0 and 1. Class 1 makes up only 0.09% of the total.
I'm running my analysis in R-3.5.1 on Windows 10. I used the SMOTE algorithm to work with this imbalanced data set.
I used the following code to handle imbalanc...

0 votes, 0 answers, 5 views

### How does Naive Bayes work?

I have already read that naive Bayes is a classification algorithm that can make predictions based on the data you give it, but in this example I just can't see how the output [3,4] came about.
Following the example:
#assigning predictor and target variables
x= np.array([[-3,7],[1,5], [1,2...

1 vote, 0 answers, 12 views

### Split-normal distribution

What's the best way to compute a split-normal distribution given a mean value with an upper and lower error?
So far I have:
from random import choice, gauss
def random_split_normal(mu: float, upper_sigma: float, lower_sigma:int) -> float:
return abs(gauss(0.0, 1.0)) * choice([upper_sigma, -lower_sig...
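A sketch of a sampler for the usual split-normal (two-piece normal) density, which is continuous at the mode. Under that definition the probability of landing below `mu` is `lower_sigma / (lower_sigma + upper_sigma)`, so the two sides are chosen with those weights rather than with equal probability as `choice([...])` does; the two agree only when the sigmas are equal. The optional `rng` parameter is an addition for reproducibility, not part of the original signature.

```python
import random


def random_split_normal(mu: float, upper_sigma: float, lower_sigma: float, rng=random) -> float:
    """Draw one value from a split-normal distribution with mode mu.

    Below mu the shape is a normal with sd lower_sigma, above it a normal with
    sd upper_sigma, scaled so the density is continuous at mu.
    """
    z = abs(rng.gauss(0.0, 1.0))  # half-normal magnitude
    # Probability mass below the mode is proportional to lower_sigma
    if rng.random() < lower_sigma / (lower_sigma + upper_sigma):
        return mu - lower_sigma * z
    return mu + upper_sigma * z


rng = random.Random(42)
draws = [random_split_normal(10.0, 2.0, 1.0, rng) for _ in range(100_000)]
print(sum(d < 10.0 for d in draws) / len(draws))  # close to 1/3
```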

1 vote, 2 answers, 2.1k views

### What is a good way to compare similarity between datasets with little variance?

Let's say I have a list of 100 MLB pitchers and 5 statistics for each of them. The difference between, for example, an ERA of 3.5 and 3.1 might not look like a lot to a naive similarity algorithm, but is a lot in baseball. Given that a lot of the player statistics that I'm looking at have this littl...
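One standard way to make such differences count (an assumption, not something the question settles) is to rescale each statistic by its spread across the pitchers, that is, convert it to a z-score before measuring distance. A 0.4 gap in ERA is then judged against how much ERA varies among the pitchers, not against its raw size. A standard-library sketch; the sample rows are hypothetical.

```python
import statistics


def standardize_columns(rows):
    """Convert each column (one statistic per column) to z-scores: (x - mean) / sd."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # fails if a column is constant
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical pitchers: [ERA, strikeouts]
pitchers = [[3.5, 200], [3.1, 198], [4.6, 150], [2.9, 170]]
z = standardize_columns(pitchers)
print(euclidean(z[0], z[1]))  # distance between the first two pitchers in standardized units
```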

0 votes, 0 answers, 6 views

### How to solve this statistical (standard deviation) problem?

Problem:
Your data set has missing values. Further examination tells you that they are spread along 1.5 standard deviations from the median, with distribution mean = 0 and variance = 5. How much data would remain unaffected (tell us the %)? Why?
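Under one reading of the problem (an assumption: the data are normal, so the median equals the mean of 0, and the missing values lie within ±1.5 standard deviations of it), the affected share is P(|Z| ≤ 1.5) = erf(1.5/√2) and the rest is unaffected. The variance of 5 does not change these shares, since the band is measured in standard deviations. A small Python check:

```python
import math


def share_within(k):
    """P(|X - mean| <= k * sd) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


inside = share_within(1.5)
print(f"inside the band: {inside:.1%}, outside: {1 - inside:.1%}")
# inside the band: 86.6%, outside: 13.4%
```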

1 vote, 1 answer, 1.6k views

### R: Multiple Linear Regression with a specific range of variables [duplicate]

This question already has an answer here:
short formula call for many variables when building a model [duplicate]
2 answers
It appears simple, but I don't know how to code it in R.
I have a dataframe (df) with ~100 variables, and I would like to do a multiple regression between the response which i...

0 votes, 0 answers, 8 views

### Can someone confirm if I am running this generalized linear model in R correctly?

I'm a grad student and stats beginner just trying to make sure I'm using the right model and using it correctly. I'm using R version 3.5.0. My data look like this:
Example Data
I have multiple BCI data points for each nest and 5 treatment groups. I want to know if there are differences in BCI betwe...

1 vote, 2 answers, 6.7k views

### Matlab Plotting Normal Distribution Probability Density Function

I am new to statistics. I have a discriminant function:
g(x) = ln p(x| w)+ lnP(w)
I know it has a normal distribution, and I know mu and sigma. How can I plot its pdf in Matlab?
There is a related question, "How to draw probability density function in MatLab?", however I don't want to...

1 vote, 1 answer, 950 views

### python statsmodels.tsa.stattools.pacf with masked array?

Is there a general trick to using masked arrays (or arrays containing NaNs) with the statsmodels routines, for example pacf and acf?

1 vote, 2 answers, 98 views

### How to perform statistical computations in a query?

I have a table which is filled with float values. I need to calculate the number of results grouped by their distribution around the mean value (Gaussian Distribution). Basically, it is calculated like this:
SELECT COUNT(*), FloatColumn - AVG(FloatColumn) - STDEV(FloatColumn)
FROM Data
GROUP BY Fl...
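The intended grouping can be expressed as: compute the overall mean and standard deviation once, then count values per band floor((x - mean) / sd). A Python sketch of that logic (`count_by_sd_band` is hypothetical); in SQL the same idea usually means computing AVG and STDEV in a subquery or CTE and grouping by the FLOOR expression.

```python
import math
import statistics
from collections import Counter


def count_by_sd_band(values):
    """Count values per band k, where k <= (x - mean) / sd < k + 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, like T-SQL's STDEV
    return Counter(math.floor((x - mean) / sd) for x in values)


print(sorted(count_by_sd_band([1.0, 2.0, 3.0, 4.0, 5.0]).items()))
# [(-2, 1), (-1, 1), (0, 2), (1, 1)]
```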

1 vote, 1 answer, 186 views

### Creating a line from the t table using simulation (in R)

How would I go about creating a line from the t-table in R, after running a simulation for a t distribution? In essence, I want to perform the qt function using only values calculated from a random sample from the normal distribution, rather than using the confidence levels as inputs.
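A sketch of the simulation approach in Python: draw many normal samples of size n, compute the t statistic for each, and read off an empirical quantile, which estimates the corresponding t-table entry (qt(prob, n - 1) in R). The function name, the repetition count and the seed are arbitrary choices.

```python
import random


def simulated_t_quantile(n, prob, reps=20_000, seed=0):
    """Estimate the prob-quantile of Student's t with n - 1 degrees of freedom by simulation."""
    rng = random.Random(seed)
    t_stats = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t_stats.append(mean / (var / n) ** 0.5)  # true mean is 0
    t_stats.sort()
    return t_stats[int(prob * (reps - 1))]  # simple empirical quantile


# One point on the "line": compare with the table value qt(0.975, 9), about 2.262
print(simulated_t_quantile(10, 0.975))
```

Repeating this over a range of sample sizes, e.g. `[simulated_t_quantile(n, 0.975) for n in range(3, 31)]`, gives a simulated line to plot against the tabulated values.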
I have run a s...

1 vote, 1 answer, 2k views

### Python SciPy chisquare test returns different p value from Excel and LibreOffice

After reading a recent blog post about an application of the Poisson distribution, I tried reproducing its findings using Python's 'scipy.stats' module, as well as Excel/LibreOffice 'POISSON' and 'CHITEST' functions.
For the expected values shown in the article, I simply used:
import scipy.stats
for...

1 vote, 2 answers, 1.7k views

### Scale parameter in the logit model

While going through the logit model notes, I came across something called the "scale parameter" in the likelihood. Can someone please explain what it is and what it is used for? What would happen if it is not used? Also, is it used in the probit model too?
Cheers

1 vote, 2 answers, 421 views

### C/C++ How to calculate the streakedness of numerical data sets?

Would anyone know how to use C/C++ to calculate the streakedness of data? The definition of streakedness is how many deviations away from the mean (i.e., running average) a numerical data streak is. Thank you for your help.
[EDIT] From our company's chief software architect, here is the requirement for a s...

1 vote, 1 answer, 105 views

### Fitting a linear model

I have a data frame that looks like
> t
Institution Subject Class ML1 ML1SD
aPhysics0 A Physics 0 0.8730469 0.3329205
aPhysics1 A Physics 1 0.8471074 0.3598839
aPhysics2 A Physics 2 0.8593750 0.3476343
aPhysics3 A Physics 3 0.8875000...

1 vote, 2 answers, 240 views

### CSS3 browser compatibility through the years

I'm trying to find a study or chart that shows the percentage of CSS3 support in different browsers and versions. I've been looking for 2 hours, but all I find is support for individual parts of CSS3, not for CSS3 as a whole.
Could you help me with this?

1 vote, 3 answers, 1.8k views

### Generating a mixture of binomial distributions

I want to generate a mixture of binomial distributions. I need it because I want a discrete version of a mixture of Gaussian distributions. Is there a SciPy library available for it, or can you point me to an algorithm?
I know in general for predefined distributions one can use ppf...
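A mixture is usually sampled in two steps: pick a component with probability equal to its weight, then draw from that component. A NumPy sketch (the weights, trial counts and probabilities in the example are made up). The mixture's probability mass function is the weight-averaged sum of the component PMFs, which is also shown.

```python
from math import comb

import numpy as np


def sample_binomial_mixture(weights, ns, ps, size, seed=None):
    """Draw `size` values from sum_k weights[k] * Binomial(ns[k], ps[k])."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    component = rng.choice(len(w), size=size, p=w / w.sum())  # step 1: pick components
    return rng.binomial(np.asarray(ns)[component], np.asarray(ps)[component])  # step 2


def binomial_mixture_pmf(x, weights, ns, ps):
    """P(X = x) for the same mixture."""
    total = sum(weights)
    return sum(w / total * comb(n, x) * p**x * (1 - p) ** (n - x)
               for w, n, p in zip(weights, ns, ps) if 0 <= x <= n)


draws = sample_binomial_mixture([0.3, 0.7], [10, 20], [0.2, 0.6], size=100_000, seed=1)
print(draws.mean())  # theoretical mean is 0.3*10*0.2 + 0.7*20*0.6 = 9.0
```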

1 vote, 1 answer, 1.4k views

### Calculate autocorrelation with lag u in R

I tried calculating the autocorrelation with lag u, u = 1...9.
I expect a 9x1 autocorrelation function. However, when I use this code it always gives me a 10x1 autocorrelation function with the first term = 1. I am not sure how to proceed.
# initialize a vector to store autocovariance
maxlag
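If the code relies on R's acf(), the extra first term is the lag-0 autocorrelation, which is 1 by definition; acf() starts at lag 0, so dropping the first element (for example `acf(x, lag.max = 9, plot = FALSE)$acf[-1]`) should leave lags 1 to 9. For reference, a Python sketch of the same estimator, r_k = Σ(x_t − x̄)(x_{t+k} − x̄) / Σ(x_t − x̄)², returning only lags 1 to `max_lag`:

```python
def autocorrelations(x, max_lag=9):
    """Sample autocorrelations r_1 .. r_max_lag; lag 0 (always 1) is omitted."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # n times the lag-0 autocovariance
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


print(autocorrelations([1, 2, 3, 4, 5], max_lag=4))
# [0.4, -0.1, -0.4, -0.4]
```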

1 vote, 1 answer, 127 views

### Standardize not among columns, but small parts of columns, using R

I have a multilevel structure, and what I need to do is standardize for each individual (which is the higher level unit, each having several separate measures).
Consider:
ID measure score
1 1 1 5
2 1 2 7
3 1 3 3
4 2 1 10
5 2 2 5
6 2 3...
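Standardizing within each individual means computing the mean and standard deviation per ID and applying them only to that ID's rows. A pandas sketch using the first five rows of the table (the sixth row is cut off in the excerpt); `standardize_within` is a hypothetical helper. In R, `ave(df$score, df$ID, FUN = function(v) (v - mean(v)) / sd(v))` follows the same idea.

```python
import pandas as pd


def standardize_within(df, group_col, value_col):
    """z-score value_col separately within each group (sample sd, as in R's sd())."""
    grouped = df.groupby(group_col)[value_col]
    # transform returns the group statistics aligned to the original rows
    return (df[value_col] - grouped.transform("mean")) / grouped.transform("std")


df = pd.DataFrame({"ID": [1, 1, 1, 2, 2],
                   "measure": [1, 2, 3, 1, 2],
                   "score": [5, 7, 3, 10, 5]})
df["score_z"] = standardize_within(df, "ID", "score")
print(df)
```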

1 vote, 1 answer, 41 views

### How to execute the version of R installed in a local folder?

I unpacked the new version of the R package and, inside its folder, ran these commands:
./configure
make
Now I want to run it, but if I give the command:
$ R
it runs the older version, and I don't have the privileges to change that. So I want to run the newly installed version. Any help?
Perhaps it needs to be exported bu...

1 vote, 3 answers, 114 views

### How to determine if a current set of data values represent or relate to previous historic data values?

I am trying to develop a method to identify the browsing pattern of a user on the basis of page requests.
In a simple example I have created 8 pages and for each page request from the user to the page I have stored that page's request frequency in the database as you can see below:
Now, my hypothesis...

1 vote, 4 answers, 500 views

### iOS library to detect app stats

Is there any iOS library which detects various user stats within the app like time spent on a view, number of times app was activated etc.? Any suggestions will be most welcome.
Thanks.

1 vote, 1 answer, 626 views

### Running R code in `python`: SyntaxError: keyword can't be an expression

I'm looking to run some R code from Python.
I have already installed the R package robustbase on Ubuntu using apt-get install r-cran-robustbase, as well as the rpy package.
From the Python console I can successfully run from rpy import * and r.library("robustbase"),
but when I run
result = robjects.FloatVector...

1 vote, 2 answers, 245 views

### Finding white pixels on monitor in camera image

I have a camera pointed at a monitor displaying a line of white pixels. I get an array of byte values back from the camera. The area of the camera's view is larger than the space taken up by the monitor. I need to find out where on the camera image the white monitor pixels appear. See the sample im...

1 vote, 1 answer, 603 views

### Generating random numbers from various distributions in CUDA

I am playing around with doing MCMC on the GPU, and need implementations for various samplers, written for CUDA.
Most of the posts I've seen on StackOverflow relate to uniform, binomial, and normal sampling. Are there any libraries that allow me the simplicity and variety of the d-p-q-r functions i...

1 vote, 1 answer, 282 views

### PHP Random Number Generation Issue

10,000 Loops
Range 0-1
Base Average: 0.5
Base Standard Deviation: 0.288675134595
=======================================
mt_rand()
Average: 0.337839939116
Standard Deviation: 0.264176807272
---
hexdec(sha1(*GUID*))
Average: 0.37834
Standard Deviation: 0.284251515902
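The "Base" figures are the theoretical mean (1/2) and standard deviation (1/√12 ≈ 0.288675134595) of a continuous Uniform(0, 1) variable, so a correctly scaled generator should come close to both over 10,000 draws. A Python sketch of the same check, with Python's `random.random` standing in for the PHP generators (it does not diagnose the PHP code):

```python
import math
import random
import statistics

BASE_MEAN = 0.5
BASE_SD = 1 / math.sqrt(12)  # standard deviation of Uniform(0, 1)


def summarize(draw, loops=10_000):
    """Mean and population standard deviation of `loops` calls to draw()."""
    values = [draw() for _ in range(loops)]
    return statistics.fmean(values), statistics.pstdev(values)


rng = random.Random(2024)
mean, sd = summarize(rng.random)
print(f"Base: {BASE_MEAN} / {BASE_SD:.12f}")
print(f"random.random(): {mean:.6f} / {sd:.6f}")
```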

1 vote, 1 answer, 194 views

### Two lines on a line graph with non-proportional values

I am trying to get a Google Charts line graph to show two lines, with a Y axis of date and an X axis of the total amount of substance used. It will compare, for example, the total amount of alcohol consumed with the total amount of tobacco consumed per day.
The area I'm struggling with...

0 votes, 0 answers, 5 views

### Model selection function

I am trying to create a new function that chooses the best model. First, if the data has no X variable, the function will automatically choose the best ARIMA model using the auto.arima function. Second, I have candidate models: if there is one X variable, the function will choose the best of them. Third, if...

1

votes

1

answer

5.6k

Views

### R: converting non-stationary to stationary

I have a data series that is not stationary, and I'm trying to make it stationary.
I tried a log transformation, a Box-Cox transformation, and lag 1, 2 and 3 differences.
None of these transformations or differences helped.
I used the ADF test to test stationarity in R.
Can anybody tell me whether there is any other method to make it s...

1 vote, 2 answers, 1.2k views

### How do I calculate popularity of content?

I'm developing a website where users rate content (1-5 stars). I need to measure the popularity of the content (also referred to as importance/hotness/interest). My first thought was just to add up the user ratings for a piece of content:
Popularity = SUM(Rating - 2.5)
If two users give it 5 stars and on...
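The proposed formula as a short sketch (`popularity` and its `neutral` parameter are hypothetical). On a 1-5 scale the midpoint is 3, so with 2.5 as the offset a 3-star rating adds +0.5; passing `neutral=3` makes a middle rating count as zero. Because the score is a sum, it also grows with the number of ratings.

```python
def popularity(ratings, neutral=2.5):
    """Popularity = SUM(Rating - neutral) over all ratings of one piece of content."""
    return sum(r - neutral for r in ratings)


print(popularity([5, 5]))                # two 5-star ratings: 2.5 + 2.5 = 5.0
print(popularity([3, 3, 3]))             # three middle ratings still add 1.5
print(popularity([3, 3, 3], neutral=3))  # 0 with the midpoint as the offset
```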