Questions tagged [statistics]

1

votes
0

answer
331

Views

sklearn Linear Regression vs Batch Gradient Descent

tldr: Why would sklearn LinearRegression give a different result than gradient descent? My understanding is that LinearRegression is computing the closed form solution for linear regression (described well here https://stats.stackexchange.com/questions/278755/why-use-gradient-descent-for-linear-reg...
bradm707
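A note on the comparison in this question: when batch gradient descent is run to convergence on the same least-squares objective, it should land on the same coefficients as the closed-form solution. Mismatches usually trace back to the learning rate, the number of iterations, unscaled features, or a missing intercept term. The sketch below is a minimal pure-Python illustration for one feature, not the asker's code and not sklearn's implementation; the data, `closed_form`, `batch_gd`, and the `lr`/`steps` defaults are all assumptions chosen for the example.

```python
def closed_form(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def batch_gd(xs, ys, lr=0.05, steps=5000):
    """Batch gradient descent on mean squared error (full data each step)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_b = (2.0 / n) * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data, roughly y = 2x + 1 with small noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope_cf, intercept_cf = closed_form(xs, ys)
slope_gd, intercept_gd = batch_gd(xs, ys)
```

With a learning rate that is too large the loop diverges, and with too few steps it stops short of the closed-form values, which is one way the two results can end up different.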
1

votes
0

answer
29

Views

How to understand the equation used for proving the transitivity property for correlation?

I am trying to understand the transitivity proof for correlation. That is if X is highly correlated to Y and Y is highly correlated to Z, then is it necessary that X and Z are also highly correlated. I found the equation which is used for proving this statement and that is: Corr(X,Y) = Corr(Y,Z)*Cor...
neha
1

votes
0

answer
37

Views

How to efficiently store a constant stream of stats

I'm sure this has been asked dozens of times but I can't seem to find the correct terms to google for to get the info I need. I look after a video streaming platform built in Asp.Net MVC 5.2. We film and stream live events. Some of our events have thousands of users watching at a time, sometimes it'...
AsciiSmoke
1

votes
0

answer
63

Views

difference between OLS and f_oneway (Anova)

I ran an ANOVA test between 'density' (continuous) and 'quality' (categorical), but got different F-statistics and p-values. Now I am confused about which one I should use to reject or support the null hypothesis. import statsmodels.formula.api as smf import patsy formula = 'density ~ C(quality)' y,x = patsy.dmatrices(form...
evil genius
1

votes
0

answer
69

Views

Sampling XYZ along a vector in R with error

How do I randomly sample XYZ values along the red vector with some predefined error? i.e. what is an x+error, y+error, z+error set of values halfway along the red line? The red vector is basically a subset of the cloud data that I am interested in. The code below will produce the plot attached. #...
ITM
1

votes
1

answer
235

Views

Statistics Mode function Exceptions in Python

I have written a small program using the mode function from the statistics library in Python. The code basically takes input from sensors, collects it in an array of 10 inputs, and then finds the mode of that list. The problem is that as soon as there are 2 equally common values, the code gives...
SohaibAJ
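On the tie case described in this question: in Python versions before 3.8, `statistics.mode` raises `StatisticsError` when no single value is most common, and from 3.8 onward it returns the first mode encountered. A minimal sketch of one tie-tolerant approach follows, assuming Python 3.8+ for `statistics.multimode`; the `readings` list and the `robust_mode` helper are hypothetical stand-ins for the sensor data, not the asker's code.

```python
from statistics import multimode


def robust_mode(values):
    """Return every most-common value, so ties never raise."""
    # multimode returns a list of all modes in first-seen order,
    # and an empty list for empty input, instead of raising on ties.
    return multimode(values)


# Hypothetical batch of 10 sensor readings with a two-way tie.
readings = [3, 3, 3, 3, 7, 7, 7, 7, 1, 2]
modes = robust_mode(readings)
```

The caller then decides how to break the tie, for example by taking the first element or by averaging the tied values.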
1

votes
0

answer
186

Views

Does the R survey package have a function like prop.test for comparing two population proportions?

I am working with a database that involves weighted statistics and complex survey design, namely the National/Nationwide Inpatient Sample, so I am using R's 'survey' package for tasks like summary statistics, regression, etc. One test I was interested in running was comparing the population of a c...
johntitor761
1

votes
0

answer
100

Views

Maximizing Pearson Correlation, what should be loss function

I'm using Keras for a deep learning prediction task and my aim is to maximize Pearson correlation between predicted values and true labels. What would be the ideal loss function to use in this scenario?
megan adams
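One option sometimes used for this goal is to minimize `1 - r` directly, where `r` is the Pearson correlation over the batch. The sketch below shows only the arithmetic in plain Python; it is not a Keras loss. A Keras version would need the same steps written with tensor operations, which is not shown here. The name `pearson_loss` and the `eps` guard are assumptions made for the example.

```python
import math


def pearson_loss(y_true, y_pred, eps=1e-12):
    """Return 1 - r, so a perfect positive correlation gives a loss near 0."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    # eps keeps the division defined when either input is constant.
    r = cov / (math.sqrt(var_t * var_p) + eps)
    return 1.0 - r


loss_aligned = pearson_loss([1, 2, 3, 4], [2, 4, 6, 8])
loss_reversed = pearson_loss([1, 2, 3, 4], [8, 6, 4, 2])
```

Because correlation ignores shift and scale, a model trained only on this quantity can produce predictions that correlate well with the labels but sit on a different scale.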
1

votes
0

answer
35

Views

Azure resource ID reported depending on consumed volumes

Guys, I couldn't find a similar question, so I'm asking here. We have a client for the Microsoft REST API, and we normally receive consumed usage for multiple subscriptions. But there's a problematic point. There are some resource types which are billed depending on the consumed volume. Each of these has got i...
Ol_dirty
1

votes
3

answer
44

Views

Session duration in R

Is there any way in R to compute the duration of each session when the data looks like this: actionId;SessionId;Date 1;1;'2018-02-02 08:10:00' 2;1;'2018-02-02 08:30:00' 3;1;'2018-02-02 09:01:00' 4;2;'2018-03-01 09:01:00' 5;2;'2018-05-10 09:01:00' Thx
aa8
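The computation asked about here is the last timestamp minus the first within each session. The sketch below shows that logic in Python rather than R, using the five sample rows from the question; `session_durations` is a hypothetical helper name, and the rows are assumed to be already parsed into tuples.

```python
from datetime import datetime

# (actionId, SessionId, Date) rows copied from the question.
rows = [
    (1, 1, "2018-02-02 08:10:00"),
    (2, 1, "2018-02-02 08:30:00"),
    (3, 1, "2018-02-02 09:01:00"),
    (4, 2, "2018-03-01 09:01:00"),
    (5, 2, "2018-05-10 09:01:00"),
]


def session_durations(rows):
    """Map each session id to (latest timestamp - earliest timestamp)."""
    bounds = {}
    for _action_id, session_id, stamp in rows:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        first, last = bounds.get(session_id, (t, t))
        bounds[session_id] = (min(first, t), max(last, t))
    return {sid: last - first for sid, (first, last) in bounds.items()}


durations = session_durations(rows)
```

The same group-then-subtract idea carries over to R with any grouped min/max over the Date column.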
1

votes
0

answer
36

Views

“Filtering” input data for analysis in Python

I have a large set of data on which I have to perform a lot of search operations. In order to reduce the number of data points, the data is 'compressed' by merging every continuous positive-slope or negative-slope point into a single point representing a local maximum or minimum, and also recording...
1

votes
0

answer
29

Views

ARTool package in R - multiple within factors

I have recently discovered the ARTool package for R (https://cran.r-project.org/web/packages/ARTool/) when looking for a non-parametric alternative for a repeated measures ANOVA. I have used ARTool and find it really very useful, but I came across a problem, that I am not sure how to deal with. Spec...
Jan Wiener
1

votes
1

answer
40

Views

adehabitat compana() doesn't work or returns lambda=NaN

I'm trying to do the compositional analysis of habitat use with the compana() function in the adehabitatHS package (I use adehabitat because I can't install adehabitatHS). Compana() needs two matrices: one of habitat use and one of available habitat. When I try to run the function it doesn't work (it...
Franza
1

votes
0

answer
77

Views

Compare many CSV files to get Ranks

I have many csv files that contain daily product sales ranked by position Sale_2018_04_10.csv: position Products 1 product_a 2 product_b 3 product_c 4 product_d Sale_2018_04_11.csv: position Products 1 product_c 2 product_d 3 produ...
Fatima TT
1

votes
1

answer
98

Views

Random integer generating

I've come across a curious question. Maybe someone could point me to relevant literature. In Python, I've written this method, which appends random integers to a set until a repeated value occurs. When a generated integer is not unique for that particular set, the method breaks: import random def count_no...
armavox
1

votes
0

answer
34

Views
1

votes
0

answer
17

Views

Analyzing Multiple Records per ID using Python

I have a data frame that looks something like this: ID Date Name ColA ColB ColC ColD Column_Interest 1 09/12 Ann String String String String OneThing 2 09/13 Pete String String String String OneThing 2 09/13 Pete String String String Strin...
REFER
1

votes
0

answer
316

Views

AdStock Transformation in R

I am referencing this document here: https://mpra.ub.uni-muenchen.de/7683/4/Adstock On Page 6 there is a formula for AdStock Transformation that looks like this: I found an R code that reproduces this adstock transformation below: https://analyticsartist.wordpress.com/2013/11/02/calculating-adsto...
nak5120
1

votes
0

answer
148

Views

How to produce a cosinor model with a linear component in R?

Hi StackOverflow community, I'm not a programmer but a self-taught R user for stats and data visualisation. This is my first question as I have always found posts from other members answered my questions, so thank you and please forgive any etiquette missteps in my question. I'm trying to create a c...
Katy J
1

votes
0

answer
152

Views

Fleiss-kappa score for interannotator agreement

In my dataset I have a set of categories, where for every category I have a set of 150 examples. Each example has been annotated as true/false by 5 human raters. I am computing the inter-annotator agreement using the Fleiss-kappa score: 1) for the entire dataset 2) for each category in particular Ho...
Crista23
1

votes
0

answer
94

Views

Websocket usage statistics

Is there any information about how many sites use websockets? Trends etc, something like https://w3techs.com/technologies/details/ws-nodejs/all/all? Thank you!
Elias Goss
1

votes
0

answer
20

Views

Statistical significance test for ranked data

I have a list of rankings in the following format: Item | Score | Rank item1 | 0.97 | 6 item2 | 0.53 | 4 item3 | 0.05 | 1 item4 | 0.68 | 5 item5 | 0.10 | 2 item6 | 0.29 | 3 I want to determine whether the difference between each pair of ranked items is significant given the scores. What s...
Crista23
1

votes
0

answer
14

Views

Analysing simulation result of event sequences with branch

So I have a problem with sequences of the form A1 > B1 > C1 > D1 or A1 > B1 > C2 > D2 or A1 > B1 > C2 > D3 or A2 > B2 > C3 > D4 Note there's more than 1 root starting point too. Each stage also has some other properties. So I'd want to find all stages (regardless of ABCD) where property 1 = som...
Sleeper Smith
1

votes
1

answer
56

Views

How to get range of values of a vector based on a range of values another vector?

I have a vector with statistical numerical values (Time). I have a second vector also with numerical statistical values. (Distance) I have calculated the quartile(using quantile function) of the second vector (Distance) and created a new third vector with values 1 to 4, so I can see to which quart...
AL.
1

votes
1

answer
427

Views

Calling R function from Javascript (Node)

I'm trying to call a function from an R script. I'm using r-script like below: var R = require('r-script'); var out = R('SampleR.R').data(5, 20).callSync(); console.log(out); It returns undefined for R('SampleR.R'). Here is my R script, it's a very simple script, just for testing. print('Hello') Please...
Hamid
1

votes
0

answer
73

Views

Integration and false convergence of optimization in R

I am trying to find MLEs of three positive parameters a, mu and theta, and then the value of a function, saying f1. f1
C.C.
1

votes
1

answer
301

Views

How to calculate hazard ratio with coxph output

I have successfully got summary output from 'coxph'. However, now I am curious how to get the hazard ratio from these numbers. Is there a calculation I can do with what I have, or is there certain code in R that will produce what I want?
Carson
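For reference on this question: in a Cox model the hazard ratio for a covariate is the exponential of its coefficient, and the `summary()` of a `coxph` fit already reports it in the `exp(coef)` column. The sketch below shows the arithmetic only, in Python; the coefficient `0.5` and standard error `0.2` are made-up values, and `hazard_ratio` is a hypothetical helper, not part of any survival library.

```python
import math


def hazard_ratio(coef, se, z=1.96):
    """Hazard ratio with an approximate 95% Wald confidence interval."""
    hr = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return hr, lower, upper


# Made-up coefficient and standard error, for illustration only.
hr, lower, upper = hazard_ratio(coef=0.5, se=0.2)
```

A coefficient of 0 gives a hazard ratio of 1, meaning no difference in hazard between the groups being compared.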
1

votes
0

answer
93

Views

How is score of a NODE calculated in Hill Climb using bnlearn in R

I am working on my first assignment using the bnlearn package to perform EDA. I have created a network using hill climbing (hc) in R with all the default values. BUT there are a few nodes in the Bayesian network which do NOT have any predecessor or successor node in the directed acyclic graph (DAG) created. Wh...
Maddy
1

votes
0

answer
121

Views

Query on Arimax results

We’ve run Arimax models in R. We had a lot of queries around the interpretation of the outputs and how businesses use these results. The dataset has quarterly growth variables from 2014 - 2018 (train data). Test data contains quarterly data for 2019. Dependent variable = Volume (Growth %) Independ...
Sanchi Bhatia
1

votes
0

answer
31

Views

SQL Server statistics for all tables within database

I want to use this query but every time I executed it the result was empty. Do you have any idea why? DECLARE @name VARCHAR(50) DECLARE db_cursor CURSOR FOR SELECT name FROM sys.tables OPEN db_cursor FETCH NEXT FROM db_cursor INTO @name WHILE @@FETCH_STATUS = 0 BEGIN SELECT s.name AS statistics_n...
DavidLinares
1

votes
0

answer
54

Views

Fat tail with D3 (v4) histogram function means empty bins

I am trying to make good use of the D3 histogram function and am struggling due to an awkward fat tail data distribution. The data_points array below pertain to country population densities across multiple years. Cities like Hong Kong with high population densities are responsible for the fat tail....
Noobster
1

votes
0

answer
181

Views

Wrong daily interval from UsageStatsManager

I try to get app usage stats for yesterday from 21:00 to 21:00 by UTC, my time zone is UTC+3. My code: private String createReport() { UsageStatsManager usageStatsManager = (UsageStatsManager) getApplication().getSystemService(Context.USAGE_STATS_SERVICE); if (usageStatsManager != null) { List usag...
1

votes
1

answer
176

Views

Error in boot.ci() function in R

I am trying to calculate bootstrap confidence intervals. Here is my code. library(boot) nboot
sendHelpPlease
1

votes
0

answer
165

Views

PYMC3: How to use math.switch for high dimensional random variables

I am currently trying to implement change point detection using this guide: http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_PyMC3.ipynb It uses a switch statement to decide between the p...
Chaow Wu
1

votes
1

answer
139

Views

Repeated measures ANOVA and link to mixed-effect models in R

I have a problem when performing a two-way rm ANOVA in R on the following data (link : https://drive.google.com/open?id=1nIlFfijUm4Ib6TJoHUUNeEJnZnnNzO29): subjectnbr is the id of the subject and blockType and linesTTL are the independent variables. RT2 is the dependent variable I first performed th...
user9935785
1

votes
0

answer
33

Views

Using boot on subset of data

From the boot documentation, it looks like it should be possible to pass information about which observations to use for bootstrapping directly via the command, but I cannot figure out how to access the indices. To take a simple example, say I want to use only cars with automatic transmissions for m...
Claus Portner
1

votes
1

answer
80

Views

gstat in R - Variogram cutoff distance is not working at larger specified distances with large gridded datasets

I am attempting to compute variograms in R with the gstat package of biomass data across management areas. The biomass data is a raster dataset with a 3.5 ft resolution or 1.0668m. The size of the spatialpointsDataFrame I am passing to the variogram function is 18.6 Mb (814223 elements). (I have als...
vsjansen
1

votes
0

answer
236

Views

Error using boot.ci function in R “estimated adjustment 'w' is infinite”

While calculating bootstrap confidence intervals for means for some data using the boot.ci command, I get an error. In the same dataset, it works for some data and not for others. my.mean = function(x, indices) { return( mean( x[indices] ) ) } pakkeSTdependents.boot = boot(pakkeST$B4..Depe...
Daktre
1

votes
0

answer
38

Views

R Tsp attribute coerces order comparison logicals into dates

A weird bug/feature that occurs when applying stats::lag to a date object and then using an order operation returns another date. date1 [1] '1970-01-02' As it turns out, as.Date('1970-01-02') == structure(TRUE, class = 'Date'). Similarly, date1 > lag(date2) yields as.Date('1970-01-01'), which is st...
AJP123
1

votes
0

answer
78

Views

Sampling from discrete distribution without replacement where the probabilities change each draw

I have a sequence S = (s1,s2,...sk) with probability weights for each sequence site P = (p1,p2,...pk), where the sum of P = 1. The maximum length of S may be around 10^9. In the simulation, a site k is picked and modified after each draw, so pk also changes on each pass. Expected number of sit...
Prometheus
