# Stats Stack Exchange

We have two metric (continuous) variables, say X and Y, and are interested in a correlation between X and Y. Actually, a correlation test (Pearson or Spearman) is not significant, i.e. it does not reject the null hypothesis of no correlation. However,...
From: Stats Stack Exchange | By: Herbert_Muc | Wednesday, October 29, 2014
I am a complete newbie to regression trees so maybe I am not understanding it properly. I got the following tree from my analysis (function tree() from R package tree): This is nice, but how can I get a more precise tree? For example, the clc_312 variable...
From: Stats Stack Exchange | By: Curious | Friday, October 31, 2014
I have two questions on this subject: (1) The literature on propensity score (PS) consistently discusses the ability of PS to balance groups with different treatments. Does PS allow for balancing on exposures other than treatment, such as being a case...
From: Stats Stack Exchange | By: Adam Robinsson | Friday, October 31, 2014
I am using SVM for a prediction task. My sample size is small, only N=140. Suppose I want to compare the prediction accuracy when using two different feature selection methods. Would it be better to: create a hold-out set that contains e.g. 40 samples,...
From: Stats Stack Exchange | By: user59707 | Friday, October 31, 2014
How to calculate Median Error Distance? I'm looking at "Schulz A. et al. A Multi-Indicator Approach for Geolocalization of Tweets". They calculate the Median Error Distance, but how do they do this?
From: Stats Stack Exchange | By: nub | Friday, October 31, 2014
I asked a question earlier about comparing models using Precision-Recall AUC. One of the answers included the following statement: "The larger the fraction of positives in the data set, the larger the area under the PR curve will be for a given model"....
From: Stats Stack Exchange | By: Jack Twain | Wednesday, October 29, 2014
How do I analyse GAM-style effects plots for interpreting qrnn models? I couldn't quite understand it from the R documentation.
From: Stats Stack Exchange | By: Nim J | Friday, October 31, 2014
In system identification, parameter estimation I have found in several papers that an analytical bound is derived which is the CRB of the error variance of the estimates. It is shown that the MSE of the estimates reaches the CRB of the error variance...
From: Stats Stack Exchange | By: Ria George | Wednesday, October 29, 2014
Is there a name for approximating the Hessian as the outer product of the gradient with itself? If one is approximating the Hessian of the loss, then the outer product of the gradient with itself is the Fisher information matrix. What about in general?...
From: Stats Stack Exchange | By: Neil G | Friday, October 31, 2014
This may sound like a silly question, but I'm not a statistician, mathematician, or programmer, really. I did quite a bit of googling, including looking into Stack Exchange, but couldn't find a clear answer. I have three plots and do the corresponding...
From: Stats Stack Exchange | By: user3790338 | Friday, October 31, 2014
Suppose a $p \times 1$ vector $x \sim N_p(\boldsymbol 0, \boldsymbol \Sigma_1)$. Now, there is another covariance matrix $\boldsymbol \Sigma_2$. We know that $|\boldsymbol \Sigma_2| < |\boldsymbol \Sigma_1|$, where $|\cdot|$ is the determinant. Is...
From: Stats Stack Exchange | By: user154969 | Friday, October 31, 2014
In the process of designing a quiz application that can assess the student for understanding of a particular concept, I came across Item Response Theory. I have absolutely no clue whether applying this would make my life easy or will complicate further....
From: Stats Stack Exchange | By: labyrinth | Friday, October 31, 2014
I would like to use Gaussian Process regression for the first time, with the covariance kernel $\frac{\beta^{\alpha}}{(x + x' + \beta) ^ \alpha}$ where $\alpha$ and $\beta$ are hyperparameters. I was hoping to use scikit-learn's GaussianProcess module,...
From: Stats Stack Exchange | By: 1'' | Friday, October 31, 2014
Suppose the model is $$Y = b_0 + b_1X_1 + b_2X_2 + b_3D + b_4X_1D + e \\ e \sim\mathcal N(0, \sigma^2)$$ Where $D$ is a categorical variable. $$E(Y|X_1, X_2, D=1) \sim\mathcal ?? \\ E(Y|X_1, X_2, D=0) \sim\mathcal ??$$ I want the sampling distribution...
From: Stats Stack Exchange | By: robbieboy74 | Friday, October 31, 2014
I have a time series of proportions that typically fall in the 0.01-0.05 range. I had intended to use GLM to model these proportions, but I ran into trouble when I needed to first remove a strong seasonal component in the data. After deseasonalization,...
From: Stats Stack Exchange | By: Jay L | Friday, October 31, 2014
How can I calculate the variance of the precision in a normal distribution, knowing I used a conjugate prior? Thanks.
From: Stats Stack Exchange | By: blitzstat | Friday, October 31, 2014
I am working in R. I use lm() for maximizing the likelihood in the first analysis, and STAN to sample from the posterior in a second analysis. require(rstan) I have fabricated some data. set.seed(123) N <- 1000 data <- as.data.frame(sapply(1:3,function(x)rnorm(N)))...
From: Stats Stack Exchange | By: ndoogan | Friday, October 31, 2014
I am running two regressions in Stata: one without controls and another with controls. I'm using lincom to find the coefficient and se for the sum of two of my regressors. I am then using outreg2 to create an Excel table with my results. So, here is...
From: Stats Stack Exchange | By: Zena | Friday, October 31, 2014
I am trying to understand how to interpret the output for cvFit(). The data is from UCI's ML repository. This is my model model <- rpart(religion ~ circles + crosses + saltires + quarters + sunstars + crescent + triangle, data=traindata, method="class",...
From: Stats Stack Exchange | By: Blou91 | Friday, October 31, 2014
I am currently exploring the gbm functions in the package dismo to create boosted regression trees for species distribution modeling. I have been using the dismo vignettes as well as the 2008 paper "A working guide to boosted regression trees" by Elith...
From: Stats Stack Exchange | By: GNG | Thursday, October 30, 2014
Consider the following three phenomena. Stein's paradox: given some data from multivariate normal distribution in $\mathbb R^n, \: n\ge 3$, sample mean is not a very good estimator of the true mean. One can obtain an estimation with lower mean squared...
From: Stats Stack Exchange | By: amoeba | Thursday, October 30, 2014
The model generating the observation is of the form $y_n = A^Tx_n + U_n$ where $x$ is the output of a linear stationary model and $U$ is a zero mean Gaussian noise of known variance. The set of unknown parameters $\theta$ are the coefficients of the...
From: Stats Stack Exchange | By: Srishti M | Thursday, October 30, 2014
As explained in this course handout (page 1), a linear model can be written in the form: $$y = \beta_1 x_{1} + \cdots + \beta_p x_{p} + \varepsilon,$$ where $y$ is the response variable and $x_{i}$ is the $i^{th}$ explanatory variable. Often with...
From: Stats Stack Exchange | By: Remi.b | Thursday, October 30, 2014
Is there an accepted method for separating out a validation set in python? In R I would use the sample function. I have 4000 training instances as json and I want to save out a validation set. Should I just randomly pick indices and separate those out?...
From: Stats Stack Exchange | By: inquisitiveIdiot | Thursday, October 30, 2014
I have plotted Kaplan-Meier curves of survival times for cancer patients with above and below average levels of Copy Number Variation (CNV), and performed a log-rank test for each cancer type. I have also fit a univariate Cox Model for each cancer type,...
From: Stats Stack Exchange | By: user5064 | Thursday, October 30, 2014
I am working on sales data. I have a binary variable (win/loss of the opportunity), and the rest are the activities done by the sales force, with 40+ variables (different types of activities done for the opportunity). I built the logistic model on the available...
From: Stats Stack Exchange | By: user43247 | Thursday, October 30, 2014
I am currently trying to analyze some data in R using Cox proportional hazards. I have been able to get my coxph model to run, but I am having some coding difficulties. I have two factors (individual and trt) with 32 and 2 levels. I am trying to compare...
From: Stats Stack Exchange | By: Corin White | Thursday, October 30, 2014
I'm trying to apply GLMs on a dataset in which the dependent variable Y is dichotomous. I applied both logit and probit models, and the probit fitted better than the logit model. How do I justify the choice of the probit over the logit model? #use of link=logit...
From: Stats Stack Exchange | By: Roberto | Thursday, October 30, 2014
This is a simple question. I've fitted a model with 1334 variables using elastic net to perform feature selection and regularization. I'm now trying to interpret the obtained coefficients in order to find correlations between the input variables and...
From: Stats Stack Exchange | By: jmnavarro | Thursday, October 30, 2014
I am looking for a very flexible bell shape function, with asymmetry on both sides of the bell, also with the possibility that the left arm of the bell had a milder slope while the right had a steep fall. Any hints, please?
From: Stats Stack Exchange | By: Przemyslaw Remin | Thursday, October 30, 2014
I have data from three time points in my intervention study (pre-test, post-test and follow-up test; 0, 2.5 and 3.5 months from the initial data). I am confused about how to set the slope loadings. I tried loadings of 0, 2.5 and 3.5; I got model fit and...
From: Stats Stack Exchange | By: joby | Thursday, October 30, 2014
I'm struggling to understand the topic of deviance. Let's have two models as follows: Model 1: glm.nb(Resp ~ Parm1 + Parm2 + Parm3) Model 2: glm.nb(Resp ~ Parm1 + Parm2) The only difference between the two models is that the Parm3 was removed for model...
From: Stats Stack Exchange | By: Airone | Thursday, October 30, 2014
Likelihood I have a sample of data (observed) which follow a Gamma distribution. However these observations are conditional on the success of a particular event. Prior The condition highlighted above follows a Binomial distribution. Posterior How do...
From: Stats Stack Exchange | By: Kuda | Thursday, October 30, 2014
If I generated two random variables with mean $\mu_1$ and $\mu_2$, but use the covariance matrix as the second parameter of the normal distribution - does this imply that the two variables are jointly normally distributed?
From: Stats Stack Exchange | By: Pegah | Thursday, October 30, 2014
I want to study the relationship between two variables. I've got the following scatter plot. But now I'm hesitating on what to do with this: 1) Should I check the assumptions of OLS and then use the lm function? 2) Or should I remove some outliers first?...
From: Stats Stack Exchange | By: Anita | Thursday, October 30, 2014
Assume a simple clinical study with N=200. Half of the participants are men and half of the participants are women. The hemoglobin of the participants ranges between 80 and 150. There's also several other variables. I would like to split the data into...
From: Stats Stack Exchange | By: learner | Thursday, October 30, 2014
I'm interested in a continuous variable, namely blood pressure. The higher the blood pressure, the greater the risk of heart attack and stroke. However, observational data frequently report that low blood pressure is also associated with adverse outcomes....
From: Stats Stack Exchange | By: Adam Robinsson | Thursday, October 30, 2014
I'm trying to get a predictive density and currently getting something which I know can't be true (based on both logic and simulation based techniques). Here's the relevant information. $\theta$ is a probability and thus $0 \leq \theta \leq 1$ $p(x|\theta)...
From: Stats Stack Exchange | By: BrewStats | Thursday, October 30, 2014
I want to use an ARIMA model. I use both S+ and R but I don't know which is better. What is your suggestion: should I use R or the S+ package for time series analysis and ARIMA models?
From: Stats Stack Exchange | By: mnf | Thursday, October 30, 2014
Why does an eigenvector turn towards the direction of maximum variance? I have observed that classical PCA fails in the presence of outliers. Can you explain?
From: Stats Stack Exchange | By: user59635 | Thursday, October 30, 2014
I have a question regarding re-leveling in lme4 1.1-7. Experimental Design: Our experiment is an eyetracking while reading study (single sentence stimuli). We are analyzing four different continuous eyetracking DVs over three different regions of interest. For...
From: Stats Stack Exchange | By: D T | Thursday, October 30, 2014
I would like to extract the slopes for each individual in a mixed effect model, as outlined in the following paragraph: Mixed effects models were used to characterize individual paths of change in the cognitive summary measures, including terms for age,...
From: Stats Stack Exchange | By: Andrews | Thursday, October 30, 2014
I am trying to calculate differential entropy over my data. This is what a subset of my data set looks like: test.Kidney.meth 0.0666666 0.129032 0.0333333 0 0 0 test.Liver.meth 0.25 0.0625 0.1875 0 0 0 test.brain.meth 0.0192308 0 0.0196079 0.0526316...
From: Stats Stack Exchange | By: saad khan | Thursday, October 30, 2014
My specific example: I have a dataframe, say 'df', containing columns of x coordinates, y coordinates, elevation, and I have a vector of precipitation values associated with the x/y coordinates (say the variable name is 'prec'). I can create a 3d representation...
From: Stats Stack Exchange | By: swu4 | Thursday, October 30, 2014
I have a distribution of samples with a small number of values in each one (less than $10$). I have calculated the median for each sample, which I want to compare with a model and obtain the difference between the model and the median of each sample....
From: Stats Stack Exchange | By: Py-ser | Thursday, October 30, 2014
here's my model: respond = B0 + B1resplast + B2avggift + B3propresp + B4mailsyear + u Please correct me if I've done this wrong, but here's what I have so far: for regressors = x, var(u|x) = (sigma^2)*x Weight that I will use for WLS is 1/x. The question...
From: Stats Stack Exchange | By: Marty | Thursday, October 30, 2014
I have created a plot of NSW using the oz package in R and I would like to overlay points where hospitals are located. I have the lat and long of the hospitals but I don't know how to plot them on the existing plot. I am a new user of R, any help would...
From: Stats Stack Exchange | By: Jon | Thursday, October 30, 2014
I am doing a meta-analysis on a group of studies, where the observation is a rate (no. of successes / no of trials). Some of the studies have small sample sizes (n<10) and/or did not observe any successes. Because of this I did not do the typical...
From: Stats Stack Exchange | By: Kevin | Thursday, October 30, 2014
I am running logistic regression models to compare the impact of different indicators using Stata. As these comparisons may lead to false conclusion due to confounding and rescaling if log-odds or odds ratios are compared (see Karlson/Holm/Breen 2012...
From: Stats Stack Exchange | By: non-numeric_argument | Wednesday, October 29, 2014
I am conducting a small numerical study by simulating random variables to show that a formula I derived does indeed work. I am trying to get this work peer reviewed so I would like to know what the best practice is for this type of study. I usually...
From: Stats Stack Exchange | By: user27271 | Wednesday, October 29, 2014
