# Stats Stack Exchange

I'm not sure this is the right site to post my question. If not, please direct me to the right one. I'm interested in machine learning and computational intelligence. I've spent the last year of my free time on personal projects in computational intelligence...
From: Stats Stack Exchange | By: Andrey | Friday, May 22, 2015
I'm using the Python Keras package for neural networks. This is the link. Is batch_size equal to the number of test samples? From Wikipedia we have this information: "However, in other cases, evaluating the sum-gradient may require expensive evaluations of the..."
From: Stats Stack Exchange | By: user2991243 | Friday, May 22, 2015
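For reference, the Keras documentation describes `batch_size` in `fit` as the number of samples per gradient update, so it concerns training samples rather than test samples. The sketch below is plain Python, not Keras: a hypothetical `minibatch_sgd` helper that fits $y \approx wx$ and, for each update, averages the per-sample gradients (the "sum-gradient" from the Wikipedia excerpt, divided by the batch size) over `batch_size` samples.

```python
import random

def minibatch_sgd(xs, ys, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit y ~ w * x by minibatch SGD on the mean squared error.

    Each gradient update averages per-sample gradients over at most
    `batch_size` samples; this helper is illustrative, not Keras code.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    w = 0.0
    updates = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # gradient of mean((w*x - y)^2) over the current batch
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
            updates += 1
    return w, updates

# toy, noise-free data with true slope 3
xs = [i / 10 for i in range(1, 11)]
ys = [3.0 * x for x in xs]
w, updates = minibatch_sgd(xs, ys, batch_size=4)
```

With 10 samples and `batch_size=4`, each epoch performs 3 updates (batches of 4, 4 and 2), so 50 epochs give 150 updates.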
If I have a time series with 1000 values and I want to build a predictive model, how far into the future should I be able to forecast successfully for the model to be considered valid? Is there any condition or rule for this?
From: Stats Stack Exchange | By: Just Gal | Sunday, May 24, 2015
This EViews workfile contains the quarterly US unemployment index from 1960 to 2008. I'm trying to understand the ACF and PACF. Below is a correlogram for the first 24 lags: What can the correlogram and ACF/PACF tell me about the data? To my understanding,...
From: Stats Stack Exchange | By: Kuromusha | Sunday, May 24, 2015
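A minimal sketch of the usual sample autocorrelation estimator behind the ACF panel of a correlogram, written as a hypothetical plain-Python `sample_acf` helper. It is not meant to reproduce EViews output exactly and does not compute the PACF; the toy series is made up.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the series x.

    r_k = sum_{t=1}^{n-k} (x_t - xbar)(x_{t+k} - xbar) / sum_{t=1}^{n} (x_t - xbar)^2
    """
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

# toy series for illustration only (not the unemployment data)
series = [1.0, -1.0] * 10
acf = sample_acf(series, 2)
```

For the alternating toy series, $r_1 = -0.95$ and $r_2 = 0.9$. A slowly decaying ACF, by contrast, is typical of a highly persistent series.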
I have a large data set with repeated measurements of the same blood value (co), with 2 to 7 measurements per patient. Each measurement is paired with a time, the interval between the surgical operation and the blood-level measurement. My aim is to show...
From: Stats Stack Exchange | By: arkiaamu | Sunday, May 24, 2015
How can I convert the raw .DAT files of the 1975 World Fertility Survey (for Pakistan) into Stata/SPSS data files? I found this link, but it didn't work: Turning the World Fertility Surveys’ raw data into stata files. The World Fertility Survey...
From: Stats Stack Exchange | By: Wazir | Sunday, May 24, 2015
I have a problem with my dependent variable, which is a proportion that includes ones and zeros. I am analyzing the use of a fungicide in apple farming. I have a survey sample of 1300 farmers, and many of them don't use this fungicide at all...
From: Stats Stack Exchange | By: Ole | Sunday, May 24, 2015
I am running a confirmatory factor analysis (CFA) in R. Because I have several missing observations, I get two sets of results, for "used" (152) and "total" (246) observations. How is the total calculated? Which of the two results should I use for the analysis?...
From: Stats Stack Exchange | By: Valentina Montalto | Sunday, May 24, 2015
I am trying to draw a nomogram from a logistic regression in R using the rms package, but I have a problem: I can get the nomogram, but the "linear predictor" axis ranges from -2.5 to +3, and I'd like to know whether I can make it...
From: Stats Stack Exchange | By: Leonardo Frazzoni | Sunday, May 24, 2015
I'm trying to show that if $X_n$ converges in probability to 0 and $Y_n$ converges in probability to 0, then $X_n+Y_n$ converges in probability to $0$, i.e. the sum rule for probability limits. What I'm trying to do is to show it directly using $\epsilon$,...
From: Stats Stack Exchange | By: CloseToC | Sunday, May 24, 2015
Imagine you have many observations on which you want to run a classification algorithm. Each observation is characterized by a matrix of non-negative values. For all observations, 90-98% of the values are 0. To ensure that a machine learning algorithm...
From: Stats Stack Exchange | By: felbo | Sunday, May 24, 2015
According to FBI data, 12.4% of burglaries are cleared with arrests. A new detective is assigned to 5 different burglaries. What is the probability that at least one of them is cleared with an arrest?
From: Stats Stack Exchange | By: sandra | Sunday, May 24, 2015
I'm new to this, and I have a question about the results I get when checking whether my data contain seasonality. I have a csv file which contains date, period and year. However, R reads the date as a factor and I didn't know how to deal...
From: Stats Stack Exchange | By: Nab | Sunday, May 24, 2015
There must be a fundamental error in my approach. Let's start with a simple regression with two variables X and Y: $Y_t = BX_t + e_t$, where B is the coefficient and e is the error term. Next, take the first difference of that equation...
From: Stats Stack Exchange | By: Dole | Sunday, May 24, 2015
I'm doing self-study training in statistics and I'm stuck on an exercise. If someone could help me... I'm using the decathlon data from the FactoMineR package and I need to find the expression that best predicts an athlete's performance in the 1500 m event...
From: Stats Stack Exchange | By: gleancrawler | Sunday, May 24, 2015
To work out exactly the expected frequency of a given sum in a dice toss (given a certain number of dice and sides per die), the following formula is posted here by @Glen_b (adapted to six-sided dice and two dice tossed): the multiplication of the...
From: Stats Stack Exchange | By: Antoni Parellada | Saturday, May 23, 2015
I have a question about the results that I get from checking whether my data contains seasonality or not. I have a csv file which contains date, period and year. However, R reads the date as a factor and I didn't know how to deal with it. So I extracted...
From: Stats Stack Exchange | By: Nab | Saturday, May 23, 2015
What does the component (factor) score covariance matrix in PCA or FA explain?
From: Stats Stack Exchange | By: Raina | Saturday, May 23, 2015
I am trying to assess the predictive performance of two competing linear regression models: $$\text{model 1}: Y \sim X_{1} + X_{2}$$ $$\text{model 2}: Y \sim X_{1} + X_{2} + X_{3}$$ where $Y$ is continuous. I would like to estimate the mean squared error (MSE) using...
From: Stats Stack Exchange | By: user2957945 | Saturday, May 23, 2015
I would like to solve the following problem: $$\beta = \arg\min_{\beta:\,\|\beta\|_1 \leq 1} \|X\beta-y\|_2^2,$$ which happens to be the constrained formulation of the Lasso, considering only parameter vectors that lie inside the $\ell_1$ unit ball. However,...
From: Stats Stack Exchange | By: broncoAbierto | Saturday, May 23, 2015
Here's a statement I read in the methods section of a paper: "One disadvantage of the fixed-effects approach is that the results obtained are conditional on the data used to estimate them; that is, results cannot be generalized to other years or microregions...
From: Stats Stack Exchange | By: NonSleeper | Saturday, May 23, 2015
Univariate analysis of 18 variables possibly associated with spine infection: can all the historical variables be combined into one variable, and can logistic regression then be performed?...
From: Stats Stack Exchange | By: Steven Shroyer | Saturday, May 23, 2015
Consider the following dataset: # color type region_west region_cent region_east region_west_pct region_cent_pct region_east_pct # 1 red shirt 24 17 48 0.2697 0.1910 0.5393 # 2 blue shirt 24 18 44 0.2791 0.2093 0.5116 # 3 red pant 42 13 33 0.4773 0.1477...
From: Stats Stack Exchange | By: JasonAizkalns | Saturday, May 23, 2015
There are many distance functions for distributions out there, but I'm having a hard time wading through them all to find one that is "distribution-free", by which I mean that it makes few / weak assumptions about the underlying distributions (in particular,...
From: Stats Stack Exchange | By: kjo | Friday, May 22, 2015
I'd like to analyze my data (animal breeding) with a linear quantile mixed model. The lqmm package in R does that, but its covariance structure cannot accommodate A (the numerator relationship matrix, which is sparse), because it is $\phi^2 I$, whereas my covariance structure is...
From: Stats Stack Exchange | By: Hossein naeeimipour yuonesi | Saturday, May 23, 2015
One question about nonlinear dimensionality reduction. I have 800 samples and 4900 features for a regression problem, with 80% for training and 20% for testing. I have tried linear PCA to reduce it to 200 features, which increased the cc by 1%. I want...
From: Stats Stack Exchange | By: Jingtao | Saturday, May 23, 2015
As usual my questions develop, and this is a continuation of this thread. I got $AIC_1=153.519$ for a mixture of two standard Gaussians (with $\chi_1^2=16.6125,\,\nu_1=20-6-1=13$) and $AIC_2=157.24$ for a mixture of three Gaussians (with $\chi_2^2=14.0735,\,\nu_2=20-9-1=10$)....
From: Stats Stack Exchange | By: corey979 | Saturday, May 23, 2015
I just encountered a problem while analyzing experimental data using lme4 and lmerTest. In the experiment, 67 subjects gave 3 ratings for 50 stimuli shown for 3 different durations (a total of 10050 responses). I used the same nested model for each of...
From: Stats Stack Exchange | By: embo | Saturday, May 23, 2015
(I guess stats.SE is the right place for this.) I'm reading Albert's book "Bayesian Computation with R". To get the prior predictive density, he extensively uses this formula $$f(y) = \frac{f(y\mid \lambda)\, g(\lambda)}{g(\lambda \mid y)},$$ where $f(y\mid$...
From: Stats Stack Exchange | By: kekkonen | Saturday, May 23, 2015
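The formula is Bayes' rule rearranged: since $g(\lambda\mid y) = f(y\mid\lambda)\,g(\lambda)/f(y)$, solving for $f(y)$ gives the expression in the question, and it holds for every value of $\lambda$. A quick numerical check in Python, assuming for illustration a Poisson likelihood with a Gamma(shape $a$, rate $b$) prior, so that the posterior is Gamma($a+y$, $b+1$) and the prior predictive has a closed negative binomial form. The model choice and helper names are assumptions, not taken from the book.

```python
import math

def log_gamma_pdf(x, shape, rate):
    # log density of Gamma(shape, rate) at x > 0
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(x) - rate * x

def log_poisson_pmf(y, lam):
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def prior_predictive_via_identity(y, lam, a, b):
    # f(y) = f(y | lam) g(lam) / g(lam | y), posterior g(lam | y) = Gamma(a + y, b + 1)
    log_f = (log_poisson_pmf(y, lam) + log_gamma_pdf(lam, a, b)
             - log_gamma_pdf(lam, a + y, b + 1))
    return math.exp(log_f)

def negative_binomial_pmf(y, a, b):
    # closed-form prior predictive of the Poisson-gamma model
    log_f = (math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
             + a * math.log(b / (b + 1)) - y * math.log(b + 1))
    return math.exp(log_f)

a, b, y = 2.0, 1.0, 3
exact = negative_binomial_pmf(y, a, b)
# the identity should give the same f(y) whatever lambda is plugged in
checks = [prior_predictive_via_identity(y, lam, a, b) for lam in (0.5, 2.0, 7.0)]
```

All $\lambda$ terms cancel in the log of the ratio, which is why any convenient $\lambda$ (for example a posterior mode) can be used in practice.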
I am trying to calculate the KS p-value and RMSE for a set of streamflow data in R. This is my first time doing this, and I want to be sure I am not doing something wrong. The data: Obs_data<-c(37.5,20.3,19.7,34.5,63.3,97.3,97.5,70.6,22.0,49.9,21.8,14.9,...
From: Stats Stack Exchange | By: MikeB | Saturday, May 23, 2015
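RMSE is computed on paired observed and simulated values, $\text{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (o_i - s_i)^2}$. A minimal Python sketch with a hypothetical `rmse` helper; only the first four observed values come from the question, and the simulated values are made up for illustration.

```python
import math

def rmse(observed, simulated):
    """Root-mean-square error between paired observed and simulated values."""
    if len(observed) != len(simulated):
        raise ValueError("series must be paired and of equal length")
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed))

obs = [37.5, 20.3, 19.7, 34.5]   # first values of Obs_data from the question
sim = [35.0, 22.3, 18.7, 30.5]   # hypothetical simulated flows
error = rmse(obs, sim)
```

Unlike RMSE, the KS test compares the distributions of the two samples and ignores the pairing, so the two numbers answer different questions.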
I have two different empirical CDFs, and I would like to evaluate how different they are. I started with the Kolmogorov-Smirnov test; the result in my case looks like this (from R's ks.test): Two-sample Kolmogorov-Smirnov...
From: Stats Stack Exchange | By: petwe | Saturday, May 23, 2015
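The statistic that `ks.test` reports as D is the largest vertical distance between the two empirical CDFs, $D = \sup_x |F_1(x) - F_2(x)|$. A plain-Python sketch of the statistic only (no p-value), with a hypothetical helper name and toy samples:

```python
from bisect import bisect_right

def ks_two_sample_statistic(a, b):
    """D = sup_x |F_a(x) - F_b(x)| for the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # ECDFs only jump at sample points, so the supremum is attained at a pooled point
    for x in a + b:
        fa = bisect_right(a, x) / n
        fb = bisect_right(b, x) / m
        d = max(d, abs(fa - fb))
    return d

d_shift = ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6])
d_same = ks_two_sample_statistic([1, 2, 3, 4], [1, 2, 3, 4])
```

Here the shifted samples give $D = 0.5$ and identical samples give $D = 0$.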
I used ridge regression on data with multicollinearity. I was expecting the standard error of each predictor to be smaller than in the OLS version, but in the R output I can only see Standard error (scaled), and these values...
From: Stats Stack Exchange | By: user77906 | Saturday, May 23, 2015
I wanted to use the Kolmogorov-Smirnov test in R to evaluate how dissimilar two emp...
From: Stats Stack Exchange | By: heinheo | Saturday, May 23, 2015
I am trying to model an inference problem, but it doesn't seem to fit readily with the algorithms we usually hear about. I am hoping that I have missed something, and that someone here can point me to the algorithm or approach I missed. Roughly speaking,...
From: Stats Stack Exchange | By: Chthonic Project | Saturday, May 23, 2015
So, we have Recurrent Neural Networks and Recursive Neural Networks. Both are usually denoted by the same acronym: RNN. According to Wikipedia, Recurrent NN are in fact Recursive NN, but I don't really understand the explanation. Moreover, I don't seem...
From: Stats Stack Exchange | By: crscardellino | Friday, May 22, 2015
The question uses orthogonal contrasts: 4 different paints (1, 2, 3, 4) on surfaces with 2 levels (asphalt, concrete). For contrast 1, I would like to compare paint 1 and paint 2; for contrast 2, I would like to compare paint 3 with the average performance of paint...
From: Stats Stack Exchange | By: Emilia311 | Saturday, May 23, 2015
I have a small data set of purchases of books by an author who has published books over 11 years. I categorized them by the year in which each book sold, with 2015=1, 2014=2, ... and 2005=11. When I run a polynomial effect test in a GLM, it gives a significant value...
From: Stats Stack Exchange | By: lonesome | Saturday, May 23, 2015
The attached figure plots the distributions of two variables. I want to demonstrate statistically how closely the two distributions match each other. What is the best way of doing this?...
From: Stats Stack Exchange | By: luciano | Saturday, May 23, 2015
Does anybody know how to simulate from a log-copula function? I'm trying to simulate $(u,v)$ from a log-copula function with the CDF: $$C(u,v, a) = \exp\bigg(1-\big[(1 - \ln u)^a + (1 - \ln v)^a - 1\big]^{1/a}\bigg)$$ I guess I should use the Marshall-Olkin...
From: Stats Stack Exchange | By: Andrew Lin | Saturday, May 23, 2015
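One way to simulate, sketched here in place of the Marshall-Olkin construction the question mentions, is the conditional distribution method: draw $u$ and $w$ uniformly, then solve $\partial C(u,v)/\partial u = w$ for $v$. Differentiating the CDF in the question gives $\partial C/\partial u = C(u,v)\,A^{1/a-1}(1-\ln u)^{a-1}/u$ with $A = (1-\ln u)^a + (1-\ln v)^a - 1$. The sketch assumes $a \ge 1$, where this is a valid Archimedean copula with generator $\varphi(t) = (1-\ln t)^a - 1$, and inverts the conditional CDF by bisection; all function names are hypothetical.

```python
import math
import random

def copula_cdf(u, v, a):
    """C(u, v; a) = exp(1 - [(1 - ln u)^a + (1 - ln v)^a - 1]^(1/a))."""
    s = ((1 - math.log(u)) ** a + (1 - math.log(v)) ** a - 1) ** (1 / a)
    return math.exp(1 - s)

def conditional_cdf(v, u, a):
    """P(V <= v | U = u) = dC/du, derived analytically from copula_cdf."""
    big_a = (1 - math.log(u)) ** a + (1 - math.log(v)) ** a - 1
    return copula_cdf(u, v, a) * big_a ** (1 / a - 1) * (1 - math.log(u)) ** (a - 1) / u

def sample_pair(a, rng, iters=60):
    """Draw one (u, v) pair by inverting the conditional CDF with bisection."""
    u = 1.0 - rng.random()      # uniform on (0, 1], avoids log(0)
    w = rng.random()            # target probability for V given U = u
    lo, hi = 0.0, 1.0
    for _ in range(iters):      # conditional_cdf is increasing in v for a >= 1
        mid = 0.5 * (lo + hi)
        if conditional_cdf(mid, u, a) < w:
            lo = mid
        else:
            hi = mid
    return u, 0.5 * (lo + hi)

rng = random.Random(42)
pairs = [sample_pair(2.0, rng) for _ in range(4000)]
# empirical estimate of C(0.5, 0.5) to compare with the closed form
emp = sum(1 for u, v in pairs if u <= 0.5 and v <= 0.5) / len(pairs)
```

For $a = 1$ the copula reduces to independence, $C(u,v) = uv$, which is a convenient sanity check on the implementation.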
I'm revising a paper on pollination, where the data are binomially distributed (fruit matures or does not). So I used glmer with one random effect (individual plant) and one fixed effect (treatment). A reviewer wants to know whether plant had an effect...
From: Stats Stack Exchange | By: zephyr44 | Friday, May 22, 2015
I am building a random forest in R and was wondering how to extract the most important variables. I am using a random forest to classify if a click is fraud or not, and the goal is to identify characteristics that increase the probability of a click...
From: Stats Stack Exchange | By: lord12 | Saturday, May 23, 2015
I have an empirical probability function $p(z)$. The first column contains $z$ and the second column contains the $p(z)$ values. The data are as follows: data.cat +0.01234 +0.002816 +0.03693 +0.003265 +0.06152 +0.003551 +0.08611 +0.006612 +0.1107 +0.008898...
From: Stats Stack Exchange | By: Dalek | Friday, May 22, 2015
Hi, I am estimating a regression of Y on X, where both Y and X are indicator variables, that is, they take the values 1 or 0. The model is $Y=\beta X$ (without a constant). I use OLS to estimate the coefficient (as it is in line with my theoretical framework)....
From: Stats Stack Exchange | By: ger | Friday, May 22, 2015
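For OLS through the origin the estimate is $\hat\beta = \sum_i X_iY_i / \sum_i X_i^2$. With 0/1 regressors $X_i^2 = X_i$, so $\hat\beta$ reduces to the share of $Y=1$ among observations with $X=1$, and observations with $X=0$ do not enter the estimate at all. A Python check on made-up data, with a hypothetical helper name:

```python
def ols_through_origin(x, y):
    """OLS slope for y = beta * x with no intercept: sum(x*y) / sum(x*x)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# hypothetical 0/1 data for illustration
x = [1, 1, 1, 1, 0, 0, 0, 1]
y = [1, 0, 1, 1, 1, 0, 0, 0]
beta = ols_through_origin(x, y)
# with 0/1 regressors, x*x == x, so beta equals the share of y == 1 among x == 1
share = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
```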
According to this tutorial on deep learning, weight decay (regularization) is not usually applied to the bias terms $b$. Why? What is the significance (intuition) behind it?...
From: Stats Stack Exchange | By: Harshit | Friday, May 22, 2015
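Concretely, applying weight decay only to the weights means the L2 penalty $\tfrac{\lambda}{2}\sum w^2$ adds $\lambda w$ to each weight's gradient, while the bias update uses the data gradient alone. A minimal plain-Python sketch of one SGD step under that convention; the function and argument names are hypothetical:

```python
def sgd_step(weights, bias, grad_w, grad_b, lr, weight_decay):
    """One SGD step on loss + (weight_decay / 2) * sum(w**2).

    The penalty adds weight_decay * w to each weight gradient;
    the bias is updated with its data gradient only (no decay term).
    """
    new_w = [w - lr * (g + weight_decay * w) for w, g in zip(weights, grad_w)]
    new_b = bias - lr * grad_b
    return new_w, new_b

# zero data gradients isolate the effect of the penalty
new_w, new_b = sgd_step([2.0, -4.0], 5.0, [0.0, 0.0], 0.0, lr=0.1, weight_decay=0.5)
```

With zero data gradients, the weights shrink toward zero while the bias is left unchanged.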
I've been going through time series material of late, trying to re-invent myself as a practitioner in the field. Until I got to the point of actually trying to Matlab some models, I had never run across the characterization of models as "conditional...
From: Stats Stack Exchange | By: StatSmartWannaB | Friday, May 22, 2015
I am confused about how to interpret two observations that are otherwise identical but differ in a dummy variable. For example, suppose we have the following model with a factor variable race, with White as the reference category: Call: lm(formula = Score ~...
From: Stats Stack Exchange | By: Ronnie | Friday, May 22, 2015
Assuming a very simplified view of my data looks like this: # factor_1 factor_2 region_1 region_2 region_3 region_1_pct region_2_pct region_3_pct # 1 A X 80 31 57 0.48 0.18 0.34 # 2 B X 10 80 11 0.10 0.79 0.11 # 3 A X 25 81 38 0.17 0.56 0.26 # 4 B X...
From: Stats Stack Exchange | By: JasonAizkalns | Friday, May 22, 2015
Dear members, I'm really new to econometrics, and I would like to ask if someone is kind enough to give me some advice. How might I run this condition in an econometrics program: Ri,t+ > AVG(Ri(-60,-11)) + 2*STDV(Ri(-60,-11)) Where...
From: Stats Stack Exchange | By: Manasaca | Friday, May 22, 2015
I have a dataset composed of 4 variables, 2 numerical and 2 categorical (ordinal, in fact). They all represent 4 types of indicators/measures of the same phenomenon. I want to analyse them in a multivariate way. I first tried to apply a PCA on the...
From: Stats Stack Exchange | By: agenis | Friday, May 22, 2015
I have a 300+ column data.frame, and no matter how I break it up I get this error every time: Error in solve.default(cv) : Lapack routine dgesv: system is exactly singular: U[107,107] = 0 I tried breaking the dataframe up and running vlf() on it then...
From: Stats Stack Exchange | By: Rilcon42 | Friday, May 22, 2015
