# Questions tagged [statistics]

4977 questions

### sklearn Linear Regression vs Batch Gradient Descent

1 vote, 0 answers, 331 views

tldr: Why would sklearn LinearRegression give a different result than gradient descent?
My understanding is that LinearRegression is computing the closed form solution for linear regression (described well here https://stats.stackexchange.com/questions/278755/why-use-gradient-descent-for-linear-reg...
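Here is a minimal pure-Python sketch for the one-feature case, using made-up data. It puts the closed-form OLS estimates next to batch gradient descent on the mean squared error. If both models include an intercept (scikit-learn's `LinearRegression` fits one by default, `fit_intercept=True`), the learning rate is stable, and the loop runs long enough, the two should agree to many decimal places. Common reasons for a mismatch are stopping too early, a learning rate that is too large or too small, unscaled features, or a missing intercept column. The learning rate and iteration count below are assumptions chosen for this toy data.

```python
def closed_form(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one feature)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def batch_gradient_descent(xs, ys, lr=0.05, iters=5000):
    """Minimise mean squared error with full-batch gradient steps."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Residuals for the current parameters, using every sample (batch GD).
        resid = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad_b0 = 2.0 / n * sum(resid)
        grad_b1 = 2.0 / n * sum(r * x for r, x in zip(resid, xs))
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
print(closed_form(xs, ys))
print(batch_gradient_descent(xs, ys))
```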

### How to understand the equation used for proving the transitivity property for correlation?

1 vote, 0 answers, 29 views

I am trying to understand the transitivity proof for correlation. That is, if X is highly correlated with Y and Y is highly correlated with Z, is it necessarily true that X and Z are also highly correlated? I found the equation that is used to prove this statement:
Corr(X,Y) = Corr(Y,Z)*Cor...
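The standard result here is an inequality rather than an equality. Because the 3×3 correlation matrix of X, Y and Z must be positive semidefinite, Corr(X, Z) must lie in the interval $\rho_{xy}\rho_{yz} \pm \sqrt{(1-\rho_{xy}^2)(1-\rho_{yz}^2)}$. I can't tell whether this matches the truncated equation above. Here is a small sketch of the bound:

```python
import math


def corr_xz_bounds(r_xy, r_yz):
    """Range of Corr(X, Z) compatible with Corr(X, Y) and Corr(Y, Z).

    Follows from the 3x3 correlation matrix having to be positive semidefinite.
    """
    if not (-1.0 <= r_xy <= 1.0 and -1.0 <= r_yz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    centre = r_xy * r_yz
    half_width = math.sqrt((1.0 - r_xy ** 2) * (1.0 - r_yz ** 2))
    return centre - half_width, centre + half_width


print(corr_xz_bounds(0.9, 0.9))  # lower bound about 0.62
print(corr_xz_bounds(0.5, 0.5))  # lower bound -0.5: no transitivity
```

Strong correlations are therefore forced to carry over: 0.9 and 0.9 guarantee at least 0.62. Moderate ones are not: 0.5 and 0.5 allow anything down to -0.5.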

### How to efficiently store a constant stream of stats

1 vote, 0 answers, 37 views

I'm sure this has been asked dozens of times but I can't seem to find the correct terms to google for to get the info I need.
I look after a video streaming platform built in ASP.NET MVC 5.2. We film and stream live events. Some of our events have thousands of users watching at a time, sometimes it'...

### Difference between OLS and f_oneway (ANOVA)

1 vote, 0 answers, 63 views

I ran an ANOVA test between 'density' (continuous) and 'quality' (categorical) using both OLS and f_oneway, but got different F-statistics and p-values.
Now I am confused about which one I should use to reject or support the null hypothesis.
import statsmodels.formula.api as smf
import patsy
formula = 'density ~ C(quality)'
y,x = patsy.dmatrices(form...
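With a single categorical predictor, the overall F-test of an OLS model `density ~ C(quality)` and a one-way ANOVA such as `scipy.stats.f_oneway` compute the same statistic. When they disagree, check that both see exactly the same observations. For example, group membership should be identical, and rows with missing values should not be dropped by only one of them. The plain-Python sketch below shows the one-way ANOVA F that both should reproduce. The group data are made up.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group vs within-group mean squares."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 13.5
```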

### Sampling XYZ along a vector in R with error

1 vote, 0 answers, 69 views

How do I randomly sample XYZ values along the red vector with some predefined error?
i.e. what is an x+error, y+error, z+error set of values halfway along the red line?
The red vector is basically a subset of the cloud data that I am interested in.
The code below will produce the plot attached.
#...
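The question is about R, but the idea does not depend on the language. Here is a Python sketch that rests on a few assumptions. The red vector is treated as the segment from `p0` to `p1`, with endpoints made up here. A position along it is chosen with a fraction `t`, where 0.5 means halfway. The error is independent Gaussian noise with standard deviation `sigma` on each coordinate. `sample_along_segment` is a hypothetical helper, not part of any library.

```python
import random


def sample_along_segment(p0, p1, t, sigma, rng=random):
    """Point at fraction t along p0 -> p1, with independent Gaussian noise per axis."""
    # Linear interpolation: t = 0 gives p0, t = 1 gives p1, t = 0.5 the midpoint.
    base = [a + t * (b - a) for a, b in zip(p0, p1)]
    return tuple(c + rng.gauss(0.0, sigma) for c in base)


rng = random.Random(42)
start, end = (0.0, 0.0, 0.0), (10.0, 4.0, 2.0)
# Five noisy XYZ points halfway along the segment.
points = [sample_along_segment(start, end, 0.5, 0.1, rng) for _ in range(5)]
# Uniformly random positions along the segment instead of a fixed t.
spread = [sample_along_segment(start, end, rng.random(), 0.1, rng) for _ in range(5)]
```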

### Statistics Mode function Exceptions in Python

1 vote, 1 answer, 235 views

I have written a small program using the mode function from the statistics library in Python. The code takes input from sensors, collects it in a list of 10 inputs, and then finds the mode of that list. The problem is that as soon as there are 2 equally common values, the code gives...
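Since Python 3.8, `statistics.mode` returns the first mode it encounters instead of raising `StatisticsError` on a tie (earlier versions raised), and `statistics.multimode` returns all tied values. Below is a minimal sketch for a 10-reading sensor window. The hypothetical `window_mode` helper breaks ties by preferring the most recent reading, which is only one possible rule:

```python
from collections import deque
from statistics import multimode

WINDOW = 10  # number of sensor readings kept, as in the question


def window_mode(readings):
    """Return one mode of the window, breaking ties in favour of the newest reading."""
    modes = multimode(readings)  # every value with the highest count (Python 3.8+)
    if len(modes) == 1:
        return modes[0]
    # Tie: pick the tied value that appeared most recently (hypothetical rule).
    for value in reversed(readings):
        if value in modes:
            return value


window = deque(maxlen=WINDOW)  # oldest readings drop out automatically
for reading in [3, 3, 5, 5, 7]:
    window.append(reading)
print(multimode(window))           # [3, 5]
print(window_mode(list(window)))   # 5
```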

### Does the R survey package have a function like prop.test for comparing two population proportions?

1 vote, 0 answers, 186 views

I am working with a database that involves weighted statistics and complex survey design, namely the National/Nationwide Inpatient Sample, so I am using R's 'survey' package for tasks like summary statistics, regression, etc.
One test I was interested in running was comparing the population of a c...

### Maximizing Pearson correlation: what should the loss function be?

1 vote, 0 answers, 100 views

I'm using Keras for a deep learning prediction task and my aim is to maximize Pearson correlation between predicted values and true labels.
What would be the ideal loss function to use in this scenario?
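A common approach is to minimise the negative Pearson correlation between predictions and labels over each batch. The plain-Python function below only illustrates the quantity. Inside Keras, the same arithmetic would have to be written with backend tensor operations so that gradients can flow. The `eps` guard is an assumption, added to avoid division by zero on constant batches. Correlation ignores scale and offset, so a model trained this way may output predictions that are shifted or rescaled relative to the labels.

```python
import math


def negative_pearson_loss(y_true, y_pred, eps=1e-12):
    """Loss = -Pearson r; minimising it maximises correlation (range -1 to 1)."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    # eps avoids division by zero when a batch of predictions is constant.
    r = cov / (math.sqrt(var_t * var_p) + eps)
    return -r


print(negative_pearson_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # close to -1
```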

### Azure resource ID reported depending on consumed volumes

1 vote, 0 answers, 35 views

I couldn't find a similar question, so I'm asking here.
We have a client for the Microsoft REST API, and we normally receive consumed usage for multiple subscriptions.
But there's a problematic point.
There are some resource types that are billed depending on the consumed volume. Each of these has got i...

### Session duration in R

1 vote, 3 answers, 44 views

Is there any way in R to compute the duration of each session when the data look like this:
actionId;SessionId;Date
1;1;'2018-02-02 08:10:00'
2;1;'2018-02-02 08:30:00'
3;1;'2018-02-02 09:01:00'
4;2;'2018-03-01 09:01:00'
5;2;'2018-05-10 09:01:00'
Thanks.
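If a session's duration means the time from its first action to its last one (so a one-action session lasts zero), the computation is a group-by with max minus min. Here is a Python sketch on the sample data above. The same grouping logic applies in R.

```python
import csv
import io
from datetime import datetime

RAW = """actionId;SessionId;Date
1;1;'2018-02-02 08:10:00'
2;1;'2018-02-02 08:30:00'
3;1;'2018-02-02 09:01:00'
4;2;'2018-03-01 09:01:00'
5;2;'2018-05-10 09:01:00'
"""


def session_durations(text):
    """Duration of each session = last action time minus first action time."""
    first, last = {}, {}
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        # Dates are wrapped in single quotes in the sample data.
        ts = datetime.strptime(row["Date"].strip("'"), "%Y-%m-%d %H:%M:%S")
        sid = row["SessionId"]
        first[sid] = min(first.get(sid, ts), ts)
        last[sid] = max(last.get(sid, ts), ts)
    return {sid: last[sid] - first[sid] for sid in first}


for sid, duration in session_durations(RAW).items():
    print(sid, duration)  # session 1: 0:51:00, session 2: 70 days
```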

### “Filtering” input data for analysis in Python

1 vote, 0 answers, 36 views

I have a large set of data on which I have to perform a lot of search operations. To reduce the number of data points, the data is 'compressed' by merging every continuous run of positive-slope or negative-slope points into a single point representing a local maximum or minimum, and also recording...
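Here is a minimal sketch of the compression step as described. It keeps the first and last points plus every point where the slope changes sign. The question doesn't say how flat runs (equal neighbouring values) should be handled, so skipping them when deciding the direction is an assumption. The extra information the question records per point is left out because that part is truncated.

```python
def compress_to_extrema(values):
    """Keep the first point, each turning point (local max/min), and the last point.

    Returns (index, value) pairs. Flat steps (equal neighbours) are ignored when
    deciding the direction, which is an assumption for this sketch.
    """
    if len(values) < 3:
        return list(enumerate(values))
    kept = [(0, values[0])]
    direction = 0  # +1 rising, -1 falling, 0 not known yet
    for i in range(1, len(values)):
        step = values[i] - values[i - 1]
        if step == 0:
            continue
        new_direction = 1 if step > 0 else -1
        if direction and new_direction != direction:
            kept.append((i - 1, values[i - 1]))  # the turning point
        direction = new_direction
    kept.append((len(values) - 1, values[-1]))
    return kept


print(compress_to_extrema([1, 2, 3, 2, 1, 2]))  # [(0, 1), (2, 3), (4, 1), (5, 2)]
```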

### ARTool package in R - multiple within factors

1 vote, 0 answers, 29 views

I have recently discovered the ARTool package for R (https://cran.r-project.org/web/packages/ARTool/) when looking for a non-parametric alternative for a repeated measures ANOVA.
I have used ARTool and find it very useful, but I came across a problem that I am not sure how to deal with. Spec...

### adehabitat compana() doesn't work or returns lambda=NaN

1 vote, 1 answer, 40 views

I'm trying to do the compositional analysis of habitat use with the compana() function in the adehabitatHS package (I use adehabitat because I can't install adehabitatHS).
Compana() needs two matrices: one of habitat use and one of available habitat.
When I try to run the function it doesn't work (it...

### Compare many CSV files to get Ranks

1 vote, 0 answers, 77 views

I have many CSV files that contain daily product sales ranked by position:
Sale_2018_04_10.csv:
position Products
1 product_a
2 product_b
3 product_c
4 product_d
Sale_2018_04_11.csv:
position Products
1 product_c
2 product_d
3 produ...
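If the goal is an overall ranking, one possible way to combine the files is to average each product's daily position and sort by that average. In this sketch the file contents are held in memory so it runs on its own. The last two rows of `Sale_2018_04_11.csv` are hypothetical because the question is cut off there. For real files, the dictionary could be built with `glob.glob("Sale_*.csv")` and `open()`.

```python
from collections import defaultdict

# Hypothetical file contents; the real files are whitespace-separated as in the question.
FILES = {
    "Sale_2018_04_10.csv": "position Products\n1 product_a\n2 product_b\n3 product_c\n4 product_d\n",
    "Sale_2018_04_11.csv": "position Products\n1 product_c\n2 product_d\n3 product_a\n4 product_b\n",
}


def average_positions(files):
    """Mean daily position of each product across all files (lower is better)."""
    positions = defaultdict(list)
    for text in files.values():
        for line in text.splitlines()[1:]:  # skip the header row
            if not line.strip():
                continue
            pos, product = line.split()
            positions[product].append(int(pos))
    return {p: sum(v) / len(v) for p, v in positions.items()}


def overall_ranking(files):
    """Products ordered by mean position; ties keep alphabetical order."""
    avg = average_positions(files)
    return sorted(avg, key=lambda p: (avg[p], p))


print(overall_ranking(FILES))
```

A product missing from some days is averaged only over the days on which it appears. Whether that is fair depends on what the ranking is meant to capture.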

### Random integer generating

1 vote, 1 answer, 98 views

I've come across a curious question. Maybe someone could point me to the relevant literature.
So, in Python, I've created this method, which appends random integers to a set until a repeated value occurs. When a generated integer is not unique for that set, the method breaks:
import random
def count_no...
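This is the birthday problem. Suppose each integer is drawn uniformly from `n` possible values. The probability that the first `k` draws are all distinct is $\prod_{i=0}^{k-1} (n-i)/n$. The expected number of distinct values stored before the first repeat is the sum of those probabilities, which grows roughly like $\sqrt{\pi n / 2}$. The original method is truncated, so the range `1..n` and the function names below are assumptions.

```python
import math
import random


def count_until_repeat(n, rng=random):
    """Add random integers from 1..n to a set; return the set size when a repeat first occurs."""
    seen = set()
    while True:
        value = rng.randint(1, n)
        if value in seen:
            return len(seen)
        seen.add(value)


def expected_count(n):
    """Exact expectation: sum over k of P(first k draws are all distinct)."""
    total, p_distinct = 0.0, 1.0
    for k in range(1, n + 1):
        p_distinct *= (n - k + 1) / n  # k-th draw avoids the k-1 earlier values
        total += p_distinct
    return total


n = 365
rng = random.Random(0)
trials = [count_until_repeat(n, rng) for _ in range(20000)]
print(sum(trials) / len(trials), expected_count(n), math.sqrt(math.pi * n / 2))
```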

### Histogram Values Distorted When Using `ppc_stat()`

1 vote, 0 answers, 34 views

I am using the following data:
speed

### Analyzing Multiple Records per ID using Python

1 vote, 0 answers, 17 views

I have a data frame that looks something like this:
ID Date Name ColA ColB ColC ColD Column_Interest
1 09/12 Ann String String String String OneThing
2 09/13 Pete String String String String OneThing
2 09/13 Pete String String String Strin...

### AdStock Transformation in R

1 vote, 0 answers, 316 views

I am referencing this document here:
https://mpra.ub.uni-muenchen.de/7683/4/Adstock
On Page 6 there is a formula for AdStock Transformation that looks like this:
I found R code that reproduces this adstock transformation:
https://analyticsartist.wordpress.com/2013/11/02/calculating-adsto...
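The formula image from page 6 is not reproduced here. This sketch therefore assumes the common geometric (recursive) adstock $A_t = x_t + \lambda A_{t-1}$, which is what R's `stats::filter(x, rate, method = "recursive")` computes. Check it against the paper and the linked code before relying on it.

```python
def adstock(spend, rate):
    """Geometric adstock: a[t] = x[t] + rate * a[t-1], with a[-1] = 0."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate should be in [0, 1)")
    out, carry = [], 0.0
    for x in spend:
        carry = x + rate * carry  # this period's spend plus decayed past effect
        out.append(carry)
    return out


print(adstock([100, 0, 0, 50], 0.5))  # [100.0, 50.0, 25.0, 62.5]
```

Some formulations scale the current spend by $(1 - \lambda)$ so that the carried-over effects sum to the original spend. That is a one-line change inside the loop.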

### How to produce a cosinor model with a linear component in R?

1 vote, 0 answers, 148 views

Hi StackOverflow community,
I'm not a programmer but a self-taught R user for stats and data visualisation. This is my first question as I have always found posts from other members answered my questions, so thank you and please forgive any etiquette missteps in my question.
I'm trying to create a c...
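Assume a known period $\tau$ (for example 24 h). One common way to write a cosinor model with an added linear trend is shown below. Every unknown enters linearly, so the model can be fitted by ordinary least squares, for instance `lm(y ~ t + cos(2*pi*t/tau) + sin(2*pi*t/tau))` in R. The amplitude and acrophase are then recovered from the two harmonic coefficients:

```latex
\begin{aligned}
y(t) &= M + \gamma t + \beta_1 \cos\!\left(\frac{2\pi t}{\tau}\right) + \beta_2 \sin\!\left(\frac{2\pi t}{\tau}\right) + \varepsilon(t),\\
A &= \sqrt{\beta_1^2 + \beta_2^2}, \qquad \phi = \operatorname{atan2}(-\beta_2, \beta_1),
\end{aligned}
```

With these definitions the periodic part equals $A \cos(2\pi t/\tau + \phi)$, and $\gamma$ is the linear slope.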

### Fleiss' kappa score for inter-annotator agreement

1 vote, 0 answers, 152 views

In my dataset I have a set of categories, and for every category I have a set of 150 examples. Each example has been annotated as true/false by 5 human raters. I am computing the inter-annotator agreement using Fleiss' kappa:
1) for the entire dataset
2) for each category in particular
Ho...
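Here is a sketch of the standard Fleiss' kappa computation. It assumes the ratings are arranged as a table with one row per example and one column per category (true, false), with each cell holding how many of the 5 raters chose that category. Passing all rows gives the dataset-level value. Passing one category's 150 rows gives the per-category value.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a list of rows; each row holds per-category rater counts.

    Every row must sum to the same number of raters n (5 in the question).
    """
    N = len(table)
    n = sum(table[0])
    if any(sum(row) != n for row in table):
        raise ValueError("every item needs the same number of ratings")
    k = len(table[0])
    # Proportion of all assignments that went to each category.
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    # Agreement within each item: share of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P_i) / N
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if P_e == 1.0:
        raise ZeroDivisionError("all ratings fall in one category; kappa is undefined")
    return (P_bar - P_e) / (1.0 - P_e)


# Columns: [true, false]; 5 raters per example.
print(fleiss_kappa([[5, 0], [0, 5]]))  # 1.0 (perfect agreement)
print(fleiss_kappa([[3, 2], [2, 3]]))  # about -0.2
```

If every rating within a category falls in the same class, the chance agreement $P_e$ equals 1 and kappa is undefined. With very skewed true/false proportions, kappa can also be low even when raw agreement is high.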

### Websocket usage statistics

1 vote, 0 answers, 94 views

Is there any information about how many sites use WebSockets? Trends, etc., something like https://w3techs.com/technologies/details/ws-nodejs/all/all? Thank you!

### Statistical significance test for ranked data

1 vote, 0 answers, 20 views

I have a list of rankings in the following format:

| Item | Score | Rank |
|---|---|---|
| item1 | 0.97 | 6 |
| item2 | 0.53 | 4 |
| item3 | 0.05 | 1 |
| item4 | 0.68 | 5 |
| item5 | 0.10 | 2 |
| item6 | 0.29 | 3 |

I want to determine whether the difference between each pair of ranked items is significant given the scores. What s...

### Analysing simulation result of event sequences with branch

1 vote, 0 answers, 14 views

So I have a problem where a sequence of
A1 > B1 > C1 > D1
or
A1 > B1 > C2 > D2
or
A1 > B1 > C2 > D3
or
A2 > B2 > C3 > D4
Note that there is more than one root starting point. Each stage also has some other properties. So I'd want to ask:
find all stages (regardless of ABCD) where property 1 = som...

### How to get the range of values of a vector based on a range of values of another vector?

1 vote, 1 answer, 56 views

I have a vector with statistical numerical values (Time).
I have a second vector, also with numerical values (Distance).
I have calculated the quartiles (using the quantile function) of the second vector (Distance) and created a new third vector with values 1 to 4, so I can see to which quart...
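Here is a Python sketch of the same idea. It computes the three quartile cut points of Distance, labels each observation 1 to 4, and takes the minimum and maximum Time within each label. `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation as R's default `quantile()` (type 7). Values that sit exactly on a cut point go into the upper quartile here. That convention is an assumption and may differ from how the original 1-4 vector was built. The data are made up.

```python
from bisect import bisect_right
from statistics import quantiles


def time_range_by_distance_quartile(time, distance):
    """Min and max of `time` within each quartile (1-4) of `distance`."""
    # Three cut points; 'inclusive' interpolates like R's default quantile(type = 7).
    cuts = quantiles(distance, n=4, method="inclusive")
    ranges = {}
    for t, d in zip(time, distance):
        q = bisect_right(cuts, d) + 1  # values equal to a cut go to the upper quartile
        lo, hi = ranges.get(q, (t, t))
        ranges[q] = (min(lo, t), max(hi, t))
    return dict(sorted(ranges.items()))


distance = [1, 2, 3, 4, 5, 6, 7, 8]
time = [10, 20, 30, 40, 50, 60, 70, 80]
print(time_range_by_distance_quartile(time, distance))
# {1: (10, 20), 2: (30, 40), 3: (50, 60), 4: (70, 80)}
```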

### Calling R function from Javascript (Node)

1 vote, 1 answer, 427 views

I'm trying to call a function from an R script. I'm using r-script like below:
var R = require('r-script');
var out = R('SampleR.R').data(5, 20).callSync();
console.log(out);
It returns undefined for R('SampleR.R'). Here is my R script; it's a very simple script, just for testing.
print('Hello')
Please...

### Integration and false convergence of optimization in R

1 vote, 0 answers, 73 views

I am trying to find the MLEs of three positive parameters a, mu and theta, and then the value of a function, say f1.
f1

### How to calculate hazard ratio with coxph output

1 vote, 1 answer, 301 views

I have successfully got the summary output from 'coxph'. However, I am now curious how to get the hazard ratio from these numbers. Is there a calculation I can do with what I have, or is there certain code in R that will produce what I want?
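In a Cox model the hazard ratio for a covariate is $\exp(\text{coef})$, and a Wald 95% interval is $\exp(\text{coef} \pm 1.96\,\text{se})$. `summary()` of a `coxph` fit already prints these as the `exp(coef)`, `lower .95` and `upper .95` columns. The output image is not available, so the arithmetic below uses a hypothetical coefficient and standard error:

```python
import math
from statistics import NormalDist


def hazard_ratio(coef, se, level=0.95):
    """Hazard ratio exp(coef) with a Wald confidence interval on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Hypothetical coefficient and standard error read off a coxph summary.
hr, lower, upper = hazard_ratio(coef=0.405, se=0.12)
print(f"HR = {hr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```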

### How is the score of a node calculated in hill climbing using bnlearn in R?

1 vote, 0 answers, 93 views

I am working on my first assignment using the bnlearn package to perform EDA.
I have created a network using hill climbing (hc) in R with all the default values.
BUT there are a few nodes in the Bayesian network that do NOT have any predecessor or successor node in the directed acyclic graph (DAG) created.
Wh...
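By default `hc()` in bnlearn scores networks with BIC, and that score is decomposable. The network score is the sum of one term per node, and each term depends only on the node and its parents. An isolated node still contributes its marginal term (log-likelihood minus $\tfrac{d}{2}\log n$). Hill climbing only adds an arc when it raises the total score, which is why some nodes can stay unconnected. The sketch below computes that per-node term for discrete data on the log-likelihood scale (higher is better). It illustrates the formula and is not bnlearn's code. The toy data are made up.

```python
import math
from collections import Counter


def bic_node_score(data, node, parents, levels):
    """BIC term for one discrete node given its parents (log-likelihood scale).

    data: list of dicts, one per observation; levels: {variable: list of states}.
    """
    n = len(data)
    configs = [tuple(row[p] for p in parents) for row in data]
    joint = Counter(zip(configs, (row[node] for row in data)))
    parent_counts = Counter(configs)
    # Maximised log-likelihood: sum of N_jk * log(N_jk / N_j).
    loglik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (states - 1) for every possible parent configuration.
    q = 1
    for p in parents:
        q *= len(levels[p])
    n_params = (len(levels[node]) - 1) * q
    return loglik - 0.5 * n_params * math.log(n)


levels = {"A": ["a", "b"], "B": ["x", "y"]}
data = [{"A": "a", "B": "x"}, {"A": "a", "B": "x"},
        {"A": "b", "B": "y"}, {"A": "b", "B": "y"}]
empty_graph = bic_node_score(data, "A", [], levels) + bic_node_score(data, "B", [], levels)
a_to_b = bic_node_score(data, "A", [], levels) + bic_node_score(data, "B", ["A"], levels)
print(empty_graph, a_to_b)
```

On this toy data the empty graph scores $-10\log 2$ in total, while adding the arc A → B raises it to $-7\log 2$, so hill climbing would add that arc.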

### Query on Arimax results

1 vote, 0 answers, 121 views

We’ve run Arimax models in R. We had a lot of queries around the interpretation of the outputs and how businesses use these results.
The dataset has quarterly growth variables from 2014 to 2018 (training data).
The test data contain quarterly data for 2019.
Dependent variable = Volume (Growth %)
Independ...

### SQL Server statistics for all tables within database

1 vote, 0 answers, 31 views

I want to use this query, but every time I execute it the result is empty. Do you have any idea why?
DECLARE @name VARCHAR(50)
DECLARE db_cursor CURSOR FOR
SELECT name FROM sys.tables
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT
s.name AS statistics_n...

### Fat tail with D3 (v4) histogram function means empty bins

1 vote, 0 answers, 54 views

I am trying to make good use of the D3 histogram function and am struggling due to an awkward fat-tailed data distribution. The data_points array below pertains to country population densities across multiple years. Cities like Hong Kong with high population densities are responsible for the fat tail....
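With a long right tail, equal-width bins put nearly everything in the first bin and leave the rest empty. A common workaround is to space the bin edges evenly on a log scale. In D3 v4 such an array of edges could be passed to `d3.histogram().thresholds(...)`. The sketch below computes log-spaced edges and counts in Python, assuming all values are positive. The densities are made up.

```python
import math
from bisect import bisect_right


def log_bin_edges(lo, hi, bins):
    """bins + 1 edges spaced evenly in log10 between lo and hi (both must be > 0)."""
    a, b = math.log10(lo), math.log10(hi)
    edges = [10 ** (a + (b - a) * i / bins) for i in range(bins + 1)]
    edges[0], edges[-1] = lo, hi  # pin the ends to avoid floating-point drift
    return edges


def histogram(values, edges):
    """Count values in bins [edges[i], edges[i+1]); the top edge joins the last bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the binned range
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


densities = [2, 5, 8, 15, 30, 90, 400, 6500]  # hypothetical people per km^2
edges = log_bin_edges(min(densities), max(densities), 4)
print([round(e, 1) for e in edges])
print(histogram(densities, edges))
```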

### Wrong daily interval from UsageStatsManager

1 vote, 0 answers, 181 views

I am trying to get app usage stats for yesterday, from 21:00 to 21:00 UTC; my time zone is UTC+3.
My code:
private String createReport() {
UsageStatsManager usageStatsManager = (UsageStatsManager) getApplication().getSystemService(Context.USAGE_STATS_SERVICE);
if (usageStatsManager != null) {
List usag...

### Error in boot.ci() function in R

1 vote, 1 answer, 176 views

I am trying to calculate bootstrap confidence intervals.
Here is my code.
library(boot)
nboot

### PyMC3: How to use math.switch for high-dimensional random variables

1 vote, 0 answers, 165 views

I am currently trying to implement change point detection using this guide: http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_PyMC3.ipynb
It uses a switch statement to decide between the p...

### Repeated measures ANOVA and link to mixed-effect models in R

1 vote, 1 answer, 139 views

I have a problem when performing a two-way rm ANOVA in R on the following data (link: https://drive.google.com/open?id=1nIlFfijUm4Ib6TJoHUUNeEJnZnnNzO29):
subjectnbr is the ID of the subject, and blockType and linesTTL are the independent variables. RT2 is the dependent variable.
I first performed th...

### Using boot on subset of data

1 vote, 0 answers, 33 views

From the boot documentation, it looks like it should be possible to pass information about which observations to use for bootstrapping directly via the command, but I cannot figure out how to access the indices.
To take a simple example, say I want to use only cars with automatic transmissions for m...

### gstat in R - Variogram cutoff distance is not working at larger specified distances with large gridded datasets

1 vote, 1 answer, 80 views

I am attempting to compute variograms in R with the gstat package for biomass data across management areas. The biomass data is a raster dataset with a 3.5 ft (1.0668 m) resolution. The size of the SpatialPointsDataFrame I am passing to the variogram function is 18.6 Mb (814223 elements). (I have als...

### Error using boot.ci function in R “estimated adjustment 'w' is infinite”

1 vote, 0 answers, 236 views

While calculating bootstrap confidence intervals for means using the boot.ci command, I get an error. In the same dataset, it works for some data and not for others.
my.mean = function(x, indices) {
return( mean( x[indices] ) ) }
pakkeSTdependents.boot = boot(pakkeST$B4..Depe...
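As far as I know, the BCa interval in `boot.ci` computes a bias correction `w` as `qnorm` of the share of bootstrap replicates below the original estimate, and it stops with this error when `w` is not finite. That happens when none (or all) of the replicates fall below the estimate. A variable whose values are all identical is one example, and it would explain why some columns work and others fail. Below is a Python sketch of that quantity next to a plain percentile interval, using made-up data.

```python
import random
from statistics import NormalDist, mean


def bootstrap_means(data, reps=2000, rng=random):
    """Means of `reps` resamples drawn with replacement from data."""
    n = len(data)
    return [mean(rng.choices(data, k=n)) for _ in range(reps)]


def bca_bias_correction(data, reps=2000, rng=random):
    """z0 = Phi^-1(share of bootstrap means below the sample mean); inf if undefined."""
    t0 = mean(data)
    reps_t = bootstrap_means(data, reps, rng)
    share = sum(t < t0 for t in reps_t) / len(reps_t)
    if share in (0.0, 1.0):
        return float("inf")  # the situation boot.ci reports as "'w' is infinite"
    return NormalDist().inv_cdf(share)


def percentile_ci(data, reps=2000, level=0.95, rng=random):
    """Plain percentile bootstrap interval for the mean."""
    t = sorted(bootstrap_means(data, reps, rng))
    lo = int((1 - level) / 2 * reps)
    hi = int((1 + level) / 2 * reps) - 1
    return t[lo], t[hi]


rng = random.Random(1)
print(bca_bias_correction([4, 4, 4, 4, 4], rng=rng))     # inf: constant data
print(bca_bias_correction([1, 3, 4, 7, 9, 12], rng=rng))  # finite
print(percentile_ci([1, 3, 4, 7, 9, 12], rng=rng))
```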

### R Tsp attribute coerces order comparison logicals into dates

1 vote, 0 answers, 38 views

A weird bug/feature: applying stats::lag to a Date object and then using an order comparison returns another date.
date1 [1] '1970-01-02'
As it turns out, as.Date('1970-01-02') == structure(TRUE, class = 'Date'). Similarly, date1 > lag(date2) yields as.Date('1970-01-01'), which is st...

### Sampling from discrete distribution without replacement where the probabilities change each draw

1 vote, 0 answers, 78 views

I have a sequence S = (s1, s2, ..., sk) with probability weights P = (p1, p2, ..., pk) for each sequence site, where the sum of P is 1. The maximum length of S may be around 10^9.
In the simulation, a site k is picked and modified after each draw; as a result, pk also changes on each run through. Expected number of sit...
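A Fenwick (binary indexed) tree is a standard data structure for repeatedly drawing a site with probability proportional to weights that change after every draw. Updating one weight and drawing an index both take O(log k). Weights don't need to be normalised, and setting a drawn site's weight to 0 gives sampling without replacement. This technique is swapped in here and does not come from the question. With k near 10^9, a dense array of floats needs several gigabytes, so a real implementation might have to group sites or use a sparse layout.

```python
import random


class FenwickSampler:
    """Draw index i with probability weights[i] / sum(weights); O(log n) updates and draws."""

    def __init__(self, weights):
        self.n = len(weights)
        self.weights = list(weights)
        self.tree = [0.0] * (self.n + 1)  # 1-based Fenwick array of partial sums
        for i, w in enumerate(self.weights):
            self._add(i, w)

    def _add(self, i, delta):
        """Add delta to the weight of site i (0-based) in the partial sums."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def set_weight(self, i, w):
        """Change the weight of site i; use 0 to remove it from future draws."""
        self._add(i, w - self.weights[i])
        self.weights[i] = w

    def total(self):
        """Sum of all current weights."""
        s, i = 0.0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def find(self, u):
        """0-based index of the site whose cumulative-weight interval contains u."""
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= u:
                u -= self.tree[nxt]
                pos = nxt
            step >>= 1
        return min(pos, self.n - 1)  # guard against floating-point round-off

    def draw(self, rng=random):
        """Pick a site with probability proportional to its current weight."""
        return self.find(rng.random() * self.total())


rng = random.Random(7)
sampler = FenwickSampler([0.1, 0.4, 0.2, 0.3])
picked = sampler.draw(rng)
sampler.set_weight(picked, 0.0)  # without replacement: this site can't be drawn again
```

Repeated floating-point additions and subtractions can drift slightly over many updates. Periodically rebuilding the tree from the stored weights keeps the totals accurate.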