Data sets quite commonly contain missing values. Suppose we want to fill them in. For this we have techniques such as single/multiple imputation and matrix completion methods. In general, are matrix completion...

From: Stats Stack Exchange | By: GXR | Saturday, December 27, 2014

My method of cross-validation is to first split my sample into two sub-samples with 80% and 20% of the observations, respectively, and then to correlate the predicted values of my model (fitted on the larger sample) with the actually observed values (of...

From: Stats Stack Exchange | By: 00schneider | Sunday, December 28, 2014
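
The 80/20 hold-out procedure described in the excerpt above can be sketched as follows. This is a minimal illustration, not the asker's code: the synthetic data, the simple one-predictor least-squares model, and the helper names `fit_line` and `pearson` are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

rng = random.Random(0)

# Synthetic data (assumption): y = 2 + 3x + Gaussian noise.
xs = [rng.uniform(0, 10) for _ in range(200)]
data = [(x, 2 + 3 * x + rng.gauss(0, 1)) for x in xs]

# Shuffle, then split 80% / 20%.
rng.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]

# Fit on the larger sample only.
a, b = fit_line([x for x, _ in train], [y for _, y in train])

# Correlate predictions with observed values on the held-out 20%.
predicted = [a + b * x for x, _ in test]
observed = [y for _, y in test]
r = pearson(predicted, observed)
print(f"hold-out correlation: {r:.3f}")
```

A single split gives one estimate that depends on how the data happened to be divided; repeating the split, or using k-fold cross-validation, would show how much that estimate varies.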

I am currently doing academic research in a linguistic field. Unfortunately, I have never had any statistical education. I have been reading on statistics for beginners lately (e.g. 1, 2, 3, 4, and also here on SE 5), and I've had my hands full with...

From: Stats Stack Exchange | By: Bram Vanroy | Sunday, December 28, 2014

I obtained values of the same parameter at various locations (the abundance fractions of different minerals in a hyperspectral image) and I want to cluster them. These fractions are spatially correlated. For example, abundance of one mineral could be increase...

From: Stats Stack Exchange | By: Solmaz | Sunday, December 28, 2014

I've run this linear regression: mtcars_lm <- lm(mpg ~ wt, mtcars) Let's say I observe a value of mpg that is 2 above the predicted value given x wt. Am I right in saying this would be 0.67 standard deviations above the predicted value? Here are my workings...

From: Stats Stack Exchange | By: luciano | Saturday, December 27, 2014

Need a bit of hand-holding here since my stats never went past the 100 level in undergrad. I'm trying to write an application that will hopefully allow a user to sort an array of items based on a line they have created. The items are an array of n...

From: Stats Stack Exchange | By: imjared | Saturday, December 27, 2014

I have a simple 2-class classification problem (classes 1 and 0). I am using the MATLAB version of LibSVM to learn the model from the data. A very weird thing is happening: if I randomly permute the training data points (both features and labels are...

From: Stats Stack Exchange | By: Yashoteja Prabhu | Saturday, December 27, 2014

How do I plot a Weibull distribution by specifying its parameters? I also want a histogram on the same plot.

From: Stats Stack Exchange | By: kancha | Saturday, December 27, 2014

I'm looking for an intuitive answer to the following question: In statistics and information theory, what's the difference between the Bhattacharyya distance and the KL divergence when measuring the difference between two discrete probability distributions?...

From: Stats Stack Exchange | By: JewelSue | Saturday, December 27, 2014

I was just reading a paper in which, as a pre-processing step, the authors did the following: PCA on the original data -> stacked autoencoder. They then fed this pre-processed data into a feed-forward neural network. But - there are two...

From: Stats Stack Exchange | By: Steven | Friday, December 26, 2014

I am interested in the effect of age on outcome Y. I have two nested linear regression models to test linear and quadratic effects of age: Y= $\beta_0$ + $\beta_1$ some_covariate + $\beta_2$ Age + error Y= $\beta_0$ + $\beta_1$ some_covariate + $\beta_2$...

From: Stats Stack Exchange | By: Vincent | Friday, December 26, 2014

Suppose I have $N$ training examples and $K$ classes, and the targets have a $1$-of-$K$ encoding. Let $t_k^n$ denote the kth component of the nth training target. Let $x^n$ denote the input to the final hidden layer for the nth training example...

From: Stats Stack Exchange | By: Thomas Mathers | Saturday, December 27, 2014

How can I do a logistic regression and/or SVM on sparse data in R? I have $ 10^6 $ observations, $ 10^4 $ TRUE/FALSE features, and each observation has only a small number of features, i.e. 1,0,0,0,0,0,1,0,0,... 0,0,0,1,0,0,0,0,0,... 0,0,0,0,0,0,1,0,1,......

From: Stats Stack Exchange | By: user31264 | Saturday, December 27, 2014

I'm not looking for a plug-and-play method like BEST in R, but rather a mathematical explanation of some Bayesian methods I can use to test the difference between the means of two samples.

From: Stats Stack Exchange | By: John | Friday, December 26, 2014

I am trying to run a nested logit model using mlogit() in R. There is an option un.nest.el, which is "a boolean, if TRUE, the hypothesis of unique elasticity is imposed for nested logit models" (from the help file). My model only runs if this is TRUE. If...

From: Stats Stack Exchange | By: user1791950 | Friday, December 26, 2014

How can I check if the data is drawn i.i.d. from an unknown multivariate distribution? I tried to validate that assumption by checking if the sample follows a normal distribution in all variables. I also tested for some other distributions, but all tests...

From: Stats Stack Exchange | By: JimBoy | Friday, December 26, 2014

I'm searching for time series smoothing algorithms, which give "future-independent" results - each next smoothed value depends only on previous data (smoothed or not smoothed), but not on any future data. The obvious variant is exponential smoothing...

From: Stats Stack Exchange | By: allchemist | Thursday, December 25, 2014
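
The "future-independent" property the excerpt above asks for can be illustrated with simple exponential smoothing. This is a minimal sketch: the function name `exponential_smoothing`, the choice to start from the first observation, and the sample series are assumptions for the example. Each smoothed value uses only the current observation and the previous smoothed value, so smoothing a prefix of the series gives the same values as smoothing the whole series.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1 - alpha)*s[t-1].

    Every output depends only on observations up to and including time t.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(v)  # initialise with the first observation
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

series = [3.0, 5.0, 4.0, 8.0, 6.0]
full = exponential_smoothing(series, alpha=0.5)
prefix = exponential_smoothing(series[:3], alpha=0.5)

print(full)    # [3.0, 4.0, 4.0, 6.0, 6.0]
print(prefix)  # [3.0, 4.0, 4.0], identical to the first three values above
```

Centered moving averages and most spline or kernel smoothers fail this prefix check, because they look at observations on both sides of each point.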

I did a large study (N about 200 in both the control and treatment groups) in which one of the user ratings is significantly different (p < 0.0001). When I ran the unpaired t-test, the F-test was also significant (p = 0.0003). This violates the t-test...

From: Stats Stack Exchange | By: user2974849 | Friday, December 26, 2014

Doing a Jarque-Bera test in R, I get this result: jarque.bera.test(rnorm(85)) data: rnorm(85) X-squared = 1.259, df = 2, p-value = 0.5329 Does it mean that the probability of rejecting the normality hypothesis (it being true) is 53.29%? If so, why do I get...

From: Stats Stack Exchange | By: will198 | Friday, December 26, 2014
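
The p-value in the output quoted above can be checked by hand. The Jarque-Bera statistic is compared against a chi-squared distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(-x/2). The sketch below uses the textbook definition of the statistic; the function names are hypothetical, and it is not claimed to reproduce any particular R package's implementation.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2)

def jarque_bera(sample):
    """Textbook Jarque-Bera statistic and its asymptotic p-value.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and
    K the sample kurtosis, both based on moments divided by n.
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, chi2_sf_df2(jb)

# Reproduce the p-value from the statistic reported in the excerpt.
p = chi2_sf_df2(1.259)
print(round(p, 4))  # 0.5329
```

The p-value is the probability, computed under the normality hypothesis, of a statistic at least as large as the one observed. It is not the probability of rejecting that hypothesis.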

Can someone tell me what the assumptions for unbiasedness are for a simple probit model like this: $ Prob(y=1|x) = G^{-1}(\beta_0 + x\beta) $? I know that dependent variable models are estimated by MLE, so are there any assumptions to check like in the case...

From: Stats Stack Exchange | By: m3d1v0 | Friday, December 26, 2014

I'm wondering how useful the standard deviation is when applied to positively skewed data? The standard deviation implies that 68% of data will lie within one standard deviation of the mean, but surely this will not work for positively skewed data? This...

From: Stats Stack Exchange | By: luciano | Friday, December 26, 2014
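
The 68% figure in the excerpt above is a property of the normal distribution, not of the standard deviation in general. As a hedged illustration, the sketch below uses an exponential distribution with rate 1 as a stand-in for positively skewed data (an assumption; the asker's data is not shown). Its mean and standard deviation are both 1, so the share within one standard deviation of the mean is P(0 <= X <= 2) = 1 - exp(-2), roughly 86%.

```python
import math
import random
import statistics

rng = random.Random(42)

# Positively skewed sample (assumption): exponential with rate 1.
sample = [rng.expovariate(1.0) for _ in range(20_000)]

mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)

# Empirical share of observations within one standard deviation of the mean.
within = sum(mean - sd <= x <= mean + sd for x in sample) / len(sample)

# Exact value for the exponential(1) distribution.
theory = 1 - math.exp(-2)

print(f"empirical: {within:.3f}, exact: {theory:.3f}, normal rule: 0.683")
```

The standard deviation is still a valid measure of spread here; only the "68% within one SD" interpretation fails to carry over.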

I saw this example in the book R in Action; the R code is as follows: library(multcomp) attach(cholesterol) table(trt) trt aggregate(response, by=list(trt), FUN=mean) aggregate(response, by=list(trt), FUN=sd) fit<-aov(response ~ trt) summary(fit)...

From: Stats Stack Exchange | By: yue86231 | Friday, December 26, 2014

Can somebody help me out with the seasonal ARIMA equation for the model (1,0,1)(1,0,1)?

From: Stats Stack Exchange | By: USer123 | Friday, December 26, 2014
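
For reference, the general form of the model named in the excerpt above can be written with the backshift operator $B$. The seasonal period $s$ is not given in the excerpt, so it is left symbolic here. The sketch also assumes a zero-mean series and the sign convention in which moving-average terms enter with a plus sign (some texts use minus signs instead).

```latex
% ARIMA(1,0,1)(1,0,1)_s in backshift notation
(1 - \phi_1 B)\,(1 - \Phi_1 B^{s})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{s})\, \varepsilon_t

% Expanded form
y_t = \phi_1 y_{t-1} + \Phi_1 y_{t-s} - \phi_1 \Phi_1 y_{t-s-1}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1}
      + \Theta_1 \varepsilon_{t-s} + \theta_1 \Theta_1 \varepsilon_{t-s-1}
```

Here $\phi_1$ and $\theta_1$ are the non-seasonal AR and MA coefficients, $\Phi_1$ and $\Theta_1$ are their seasonal counterparts, and $\varepsilon_t$ is white noise.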

I am trying to model the relationship between patients' length of stay in hospital (Y) and age in years (X). The data set I've got doesn't specify the unit of length of stay. The estimated value of my coefficient for age is $ b_1 = 0.084 $. So if a patient...

From: Stats Stack Exchange | By: Durin | Thursday, December 25, 2014

I am trying to find the mode of a probability distribution function given by \begin{equation} g(x/\alpha,\beta,\sigma)=\frac{1}{\Gamma \left( \alpha \right)\beta^{\alpha}}exp\left\{{-\frac{x^2}{2\sigma^{2}}\frac{1}{\beta}}\right\}\frac{x^{2\alpha-1}}{2^{\alpha-1}\sigma^{2\alpha}}I_{{\rm...

From: Stats Stack Exchange | By: Murat Arat | Thursday, December 25, 2014

I have a variable Xi that is highly skewed, because values of a certain type occur at a much higher frequency than other values. The density plot of this variable Xi is as shown in the figure below. I am trying to...

From: Stats Stack Exchange | By: Science11 | Thursday, December 25, 2014

When I run Replace Missing Values with Expectation-Maximization in SPSS, I receive the following message: The EM algorithm failed to converge in 25 iterations. Should the algorithm be able to converge? Can anyone help me? I have 20 variables and 299 cases (217...

From: Stats Stack Exchange | By: Isadora | Thursday, December 25, 2014

This book was written in 1939. It's available here on archive.org. Would you recommend this as an introduction to the mathematics of statistics for beginners?...

From: Stats Stack Exchange | By: Kedar Mhaswade | Thursday, December 25, 2014

In a logistic Generalized Linear Mixed Model (family = binomial), I don't know how to interpret the random effects variance: Random effects: Groups Name Variance Std.Dev. HOSPITAL (Intercept) 0.4295 0.6554 Number of obs: 2275, groups: HOSPITAL, 14 How...

From: Stats Stack Exchange | By: user2310909 | Thursday, December 25, 2014

I was wondering if there is a general procedure for solving for the primal variables of a linear or quadratic or, in general, a convex program after already having solved the dual program. The problem I am working on is a least squares problem with l1-norm...

From: Stats Stack Exchange | By: Dave31415 | Thursday, December 25, 2014

Background: The problem I am going to describe concerns the analysis of microarray data. Measurements are made at probes, and p-values are returned for differences between groups at those probes. The measurements made at these probes are spatially correlated...

From: Stats Stack Exchange | By: Ankur Chakravarthy | Thursday, December 25, 2014

