I am doing a machine learning exercise, and I have built a Gaussian Naive Bayes classifier (i.e., I have defined values of mean and standard deviation) using scikit-learn. Now I am supposed to "compute the error rate using k-fold validation based...

From: Stats Stack Exchange | By: badnack | Friday, April 17, 2015

I would like to determine whether a given time series has a trend. I have carried out a Cox-Stuart test in R and decomposed the series to inspect it visually, but I am still a bit confused about whether there is evidence of a trend. The command used...

From: Stats Stack Exchange | By: user42571 | Monday, April 20, 2015

I got a bit confused at the end of this proof, so I am asking for a check. Take $$Y_n = \begin{cases} 1 &\mbox{with probability} \ 1 -p_n \\ n & \mbox{with probability} \ p_n \end{cases} $$ Assume $p_n \rightarrow 0$; prove that $Y_n$ converges...

From: Stats Stack Exchange | By: Monolite | Sunday, April 19, 2015

I have a set of 10 variables: 9 explanatory, 1 response. I wish to do a constrained regression on the variables and use the values of the coefficients as weights in a TOPSIS analysis. I am having several issues with this - but I think the biggest one...

From: Stats Stack Exchange | By: TheBean | Sunday, April 19, 2015

What is the difference between the variance inflation factor (VIF) and stepwise regression as both help in detecting multicollinearity? What variables are different while running both techniques?

From: Stats Stack Exchange | By: neha | Sunday, April 19, 2015

I originally asked this question in Overflow but someone suggested I post it here. I'm trying to model the number of parks in a neighborhood as a function of education, land area (both continuous variables), and poverty percentage (categorical). There...

From: Stats Stack Exchange | By: user3642531 | Sunday, April 19, 2015

My professor mentioned using the invnorm function on my calculator, but many websites say that you need a mean and standard deviation to figure out the answer. Anybody have an idea of where I can begin? Thank you for your help.

From: Stats Stack Exchange | By: Lynda Strasser-Schweitzer | Sunday, April 19, 2015

For strictly educational purposes, our fictitious high school utilizes a GPA grading system represented by an ordinal variable ranging from 0 to 4 (5 potential inputs) and the board test submitted for college admissions is ranked from 0 to 10 (11 potential...

From: Stats Stack Exchange | By: Ioannis Tikas | Sunday, April 19, 2015

I am having difficulties specifying the appropriate structure for nested/random effects in a mixed model that I am trying to pass through the 'Lasso' shrinkage algorithm. I am using the package 'glmmLasso'. My data consists of disease incidence data...

From: Stats Stack Exchange | By: johnybinwv | Sunday, April 19, 2015

Given that our loss function is an $\alpha$-strongly convex function, we know the online gradient descent algorithm can get $C\log(T)/\alpha$ for some constant $C$. My question is what happens when $\alpha$ is close to zero (in particular, much smaller than 1)...

From: Stats Stack Exchange | By: Raba Poco | Sunday, April 19, 2015

I am trying to predict a density function using LOESS in R. However, the predicted values I got are not on the estimated LOESS line. #Generate data n<-10000 a1<-a2<-0.1 a3<-a4<-0.2 a0<-0.1 u1 <-rnorm(n,0,1) u2 <-runif(n,0,1) u3...

From: Stats Stack Exchange | By: user37180 | Sunday, April 19, 2015

I am going through a model selection process with a mixed model with 3 variables: A, B, and C. B and C are orthogonal factors. B or C may interact with A, so my full model would be: fixed: Y ~ A + B + C + A*B + A*C random: ~1|D When I run my analysis,...

From: Stats Stack Exchange | By: user14241 | Sunday, April 19, 2015

Suppose I am using random forests where the classes are highly unbalanced. How do you detect overfitting and what can you do to avoid it? Breiman says in his paper that random forests do not overfit, but others say that they can. If overfitting does...

From: Stats Stack Exchange | By: lord12 | Saturday, April 18, 2015

I have been reading about the Johansen cointegration model. I am using the jcitest function of MATLAB. I am having a little trouble selecting the 'model'. Can somebody explain in layman's terms which option is appropriate where? I can't understand what having...

From: Stats Stack Exchange | By: cryptex | Sunday, April 19, 2015

Suppose I have $n$ data points $x_1,\dots,x_n$, each of which is $p$-dimensional. Let $\Sigma$ be the (non-singular) population covariance of these samples. With respect to $\Sigma$, what is the most efficient way known to compute the vector of squared...

From: Stats Stack Exchange | By: Lepidopterist | Sunday, April 19, 2015

See this question on Math SE. Short story: I read The Elements of Statistical Learning and got frustrated when I was trying to verify some of the results, e.g., given $$\text{RSS}(\beta) = \left(\mathbf{y}-\mathbf{X}\beta\right)^{T}\left(\mathbf{y}-\mathbf{X}\beta\right)\text{,}$$...

From: Stats Stack Exchange | By: Clarinetist | Sunday, April 19, 2015

The linearized rate is a method to summarise a constant hazard function in a very simple way, and is defined as the total number of observed events divided by the total patient-years (person-years). These rates should be reported with CIs. I was trying to calculate linearized...

From: Stats Stack Exchange | By: Rafik Margaryan | Sunday, April 19, 2015
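
The definition in the linearized-rate excerpt above (events divided by patient-years, reported with a CI) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `linearized_rate_ci` and the example counts are hypothetical, and the interval is a normal approximation on the log scale, which is one common choice and not necessarily the method the asker was using.

```python
import math

def linearized_rate_ci(events, person_years, z=1.96):
    """Return (rate, lower, upper) in events per person-year.

    Assumes a constant hazard, so the event count is treated as Poisson.
    The interval is a normal approximation on the log scale, where the
    standard error of log(rate) is roughly 1/sqrt(events).
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = events / person_years
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor

# Hypothetical numbers, for illustration only.
rate, lower, upper = linearized_rate_ci(events=12, person_years=340.0)
print(f"{100 * rate:.2f} per 100 patient-years "
      f"(95% CI {100 * lower:.2f} to {100 * upper:.2f})")
```

With few events the approximation is rough, and an exact Poisson interval would be the more careful choice.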

I am trying to implement my own cross-correlation function in R by translating it into a convolution problem. Part I: I have two arrays, e.g. two identical arrays, and I want to get the cross-correlation in R. Do I then need the following code? a1 = 1:9 a2...

From: Stats Stack Exchange | By: PeteChro | Saturday, April 18, 2015

I keep getting an error when I try to run this simulation. Could somebody tell me what I might be doing wrong? P <-matrix(c(0.2,0.8,0.3,0.7,0.5,0.5), nrow=3,byrow=T) results <- numeric(1000) set.seed(87654321) for (i in c(20,30,40)){ y <-...

From: Stats Stack Exchange | By: odb | Sunday, April 19, 2015

I have a least-squares fit like this: fit = lsfit(log10(M), log10(RS), wt) This function lists statistics and p-values for the coefficient under the null hypothesis that it is zero, but I want to change the null hypothesis of the coefficient from 0 to...

From: Stats Stack Exchange | By: Fred | Saturday, April 18, 2015

I am trying to build a second-order Markov chain model, and now I am trying to find the transition matrix from the following data. dat<-data.frame(replicate(20,sample(c("A", "B", "C","D"), size = 100, replace=TRUE))) Now I know how to fit the first-order Markov...

From: Stats Stack Exchange | By: simonyy | Sunday, April 19, 2015

I need a corpus to try Mahout classification. I've tried AG's corpus of news articles, downloaded from this site http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html, but that was not enough for me because it has only small articles. Help please...

From: Stats Stack Exchange | By: Ben Youb | Sunday, April 19, 2015

I have this table for chi-square in R: x <- matrix(c(23,22,10,14,11,12),ncol=3) phi(x) This is a 2 by 3 table, and thus the phi correlation won't work here. Can anyone help me get code for a phi correlation (or something similar) in R for multiple cells?...

From: Stats Stack Exchange | By: jbest | Sunday, April 19, 2015

What is a concrete example of a Bayesian resolution to the Two Envelopes Problem?

From: Stats Stack Exchange | By: Garrett | Sunday, April 19, 2015

As far as I understand it, chi-square provides a measure for determining the similarity of the expected and observed (empirical and theoretical) distributions of nominal variables. It can be employed, for example, in a goodness-of-fit test that enables...

From: Stats Stack Exchange | By: wehnsdaefflae | Friday, April 17, 2015

I am kind of new to statistics. I have 4 independent variables that have been observed from a system in 4 different configurations. At this point, I don't know which statistical functions would give the best comparison between them. I...

From: Stats Stack Exchange | By: lonesome | Sunday, April 19, 2015

I have to plot a few different simple linear models on a chart, the main point being to comment on them. I have no data for the models. I can't get R to create a plot with appropriate axes, i.e. I can't get the range of the axes correct. I think I'd...

From: Stats Stack Exchange | By: briantreg | Sunday, April 19, 2015

Hi there, I am trying to perform a visual analysis of significance on these stats. Other information provided is the Standardized motor skills test score [M = 100, SD = 15], not sure if this is relevant. What I can see is that the difference between...

From: Stats Stack Exchange | By: Tracy | Sunday, April 19, 2015

I have a dataset X, and I'm trying to predict the response variables a, b, c given an instance x. Typically, one might run whatever regression routine on a, b, and c separately. However, what happens if a, b, and c are closely related? For example,...

From: Stats Stack Exchange | By: user1858363 | Sunday, April 19, 2015

I'm looking at a computer vision application where I try to analyze the strength of the edges a certain set of colors make with another color. For this, I take images of two colors falling on top of each other and record the edge strength (through a gradient)...

From: Stats Stack Exchange | By: dev_nut | Saturday, April 18, 2015

Bookmakers quite often price players to score a goal at any point during the game. For example, they may give Ronaldo a 52% chance of scoring a goal in a game, and Messi a 60% chance of scoring a goal in a game. However, how do you work out the possibility...

From: Stats Stack Exchange | By: Odds Help | Sunday, April 19, 2015

This question was taken from a practice exam in my statistics course. Given a random sample $X_1, X_2, \dots, X_n$ from a Poisson distribution with mean $\lambda$, can you show that $\bar{X}$ is consistent for $\lambda$? We are told to use Tchebysheff's...

From: Stats Stack Exchange | By: Nicky_Ay | Sunday, April 19, 2015
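
For the Poisson consistency excerpt above, the standard Tchebysheff-style bound can be sketched as follows. It assumes the $X_i$ are independent, which the excerpt implies by calling them a random sample; the tolerance $\varepsilon > 0$ is notation introduced here for illustration.

$$E[\bar{X}] = \lambda, \qquad \operatorname{Var}(\bar{X}) = \frac{\lambda}{n}, \qquad P\left(|\bar{X} - \lambda| \ge \varepsilon\right) \le \frac{\operatorname{Var}(\bar{X})}{\varepsilon^{2}} = \frac{\lambda}{n\varepsilon^{2}} \rightarrow 0 \quad \text{as } n \rightarrow \infty.$$

Since the bound goes to zero for every fixed $\varepsilon$, $\bar{X}$ converges in probability to $\lambda$.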

For a sequence $X_1, X_2, \dots$, let $F_n(x)$ denote the cdf of $X_n$.
Suppose our sequence is $X_n \sim N(0,n)$; then for all $x$ the pointwise limit of $F_n(x)$ is $\frac{1}{2}$.
How would one prove this?

From: Stats Stack Exchange | By: Monolite | Saturday, April 18, 2015
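
One way to see the claimed limit in the excerpt above, assuming $N(0,n)$ denotes variance $n$ and writing $\Phi$ for the standard normal cdf (notation introduced here):

$$F_n(x) = P(X_n \le x) = P\left(\frac{X_n}{\sqrt{n}} \le \frac{x}{\sqrt{n}}\right) = \Phi\left(\frac{x}{\sqrt{n}}\right) \rightarrow \Phi(0) = \frac{1}{2} \quad \text{as } n \rightarrow \infty,$$

using that $x/\sqrt{n} \rightarrow 0$ for each fixed $x$ and that $\Phi$ is continuous. The same argument goes through with $x/n$ in place of $x/\sqrt{n}$ if $n$ is read as the standard deviation instead.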

Is the process of calculating the residual standard error the same for the training set and the test set?

From: Stats Stack Exchange | By: caroline | Saturday, April 18, 2015

I came across an old exam question as follows: If the life of one computer component (in years) has Gamma distribution with mean $6$ and variance $18$, how can we find the probability that this component has a lifetime of at least $9$ years? What is...

From: Stats Stack Exchange | By: Dr. Hoshang | Saturday, April 18, 2015
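
For the Gamma lifetime excerpt above, a minimal Python sketch of the first part of the question (the probability of lasting at least 9 years). It assumes the shape-scale parameterization, where mean = shape × scale and variance = shape × scale²; the helper name `gamma_survival_integer_shape` is hypothetical, and the closed form it uses only applies because the shape works out to an integer here.

```python
import math

mean, variance = 6.0, 18.0

# Moment identities for Gamma(shape k, scale theta):
# mean = k * theta and variance = k * theta**2.
theta = variance / mean  # scale
k = mean / theta         # shape

def gamma_survival_integer_shape(x, shape, scale):
    """P(X >= x) for a Gamma with positive integer shape (Erlang closed form)."""
    if shape != int(shape) or shape < 1:
        raise ValueError("this closed form needs a positive integer shape")
    y = x / scale
    return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(int(shape)))

p = gamma_survival_integer_shape(9.0, k, theta)
print(k, theta, p)
```

With these moments the shape is 2 and the scale is 3, so the probability reduces to $4e^{-3}$, roughly 0.199.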

I've implemented the Bayesian Probabilistic Matrix Factorization algorithm using pymc3 in Python. I also implemented its precursor, Probabilistic Matrix Factorization (PMF). See my previous question for a reference to the data used here. I'm having...

From: Stats Stack Exchange | By: Mack | Saturday, April 18, 2015

I am developing an artificial model to simulate the growth of two types of biological cells under different conditions. The data I obtained from my model takes the form of two data-sets representing the number of cells in the culture across time in what...

From: Stats Stack Exchange | By: max0005 | Saturday, April 18, 2015

1) I think one of the algorithms used to handle ties for the Wilcoxon rank-sum test (a.k.a. Mann-Whitney U test) is Streitberg/Rohmel. I could not find a good source which explains the algorithm, gives a proof, or even simply outlines the algorithm....

From: Stats Stack Exchange | By: a.e. | Saturday, April 18, 2015

I have fitted an ordinal probit model in Stata and have 2 queries. Kindly help. The parallel lines assumption test (run by oparallel or brant) does not run, and it gives the error that the test is only for logit models. Is that so? fitsat(test for goodness...

From: Stats Stack Exchange | By: numra | Saturday, April 18, 2015

I am trying to implement a neural network to identify a nonlinear system. I have implemented a very simple system in Simulink and, on the basis of examples of its input and output, I would like the NN to mimic its behaviour. The system is the following...

From: Stats Stack Exchange | By: MagoNick | Saturday, April 18, 2015

I am a bit new to the whole nonparametric and Bayesian idea, so tell me if this is correct: to estimate, say, the mean of a dataset's population we do the following: We define a function $f(x)$ that is the PDF of our prior assumption of the distribution...

From: Stats Stack Exchange | By: Simon Kuang | Saturday, April 18, 2015

I asked this question on Stack Overflow: http://stackoverflow.com/questions/29710525/symbol-in-r-lm but I feel this would be a better place to get an answer. What exactly does the ^ symbol do to the regression and why does it make the r^2 so much higher?...

From: Stats Stack Exchange | By: japem | Saturday, April 18, 2015

I have three groups A, B, C, with participant ns of 20, 89, and 165. Each participant ranked her or his concern with 14 items (potential impediments to success). Scale was 0-1-2-3, 3 = most concern. I have the mean rank of each item for each group, and...

From: Stats Stack Exchange | By: Doug A.C. | Saturday, April 18, 2015

The following are ACF and PACF plots of a monthly data series. The second plot is the ACF with ci.type='ma': The persistence of high values in the ACF plot probably represents a long-term positive trend. The question is whether this represents seasonal variation. I tried...

From: Stats Stack Exchange | By: rnso | Saturday, April 18, 2015

I'm examining some C++ code for a nonlinear fit. It is basically a Levenberg-Marquardt routine you can find on Netlib or elsewhere. The last step is estimating the errors of the parameters that are fitted. From the literature, I know that the variance of...

From: Stats Stack Exchange | By: Clemens | Saturday, April 18, 2015

I am using R2jags to fit a model in R using JAGS. Here is my code: predictorNames <- c("BMIX", "AGE", "TEXPWK", "FRUITS", "VEGTABLS", "FISH", "REDMEAT", "POULTRY", "SOY", "NUTS", "GRAINS", "WHLGRNS", "MILKS", "DAIRY", "RACE.BLACK", "REGION.NE", "REGION.MW",...

From: Stats Stack Exchange | By: user3821273 | Saturday, April 18, 2015

I am studying an experiment of the kind: Let $n_{ij}$ be the number of fetuses, $X_{ij}$ the number of responses, i.e. the number of fetuses with a malformation in the $j$th litter of the $i$th dose level for $j=1,\dots,25$ and $i=1,\dots,5$. Then, $p_{ij}$ is the...

From: Stats Stack Exchange | By: CrishaD | Saturday, April 18, 2015

I took a test two days ago. One of our questions was as follows: a decision tree with depth 2 is constructed for two binary features. The hypothesis space that can be represented with the following tree has how many features? The answer sheet gives the solution as $16$ but...

From: Stats Stack Exchange | By: Anjela Dark | Saturday, April 18, 2015

I have a population of n unique items and am taking a sample of r. I am sampling with replacement. I would like to calculate the probability of sampling any specific item x times given the sample size and population.

From: Stats Stack Exchange | By: David | Saturday, April 18, 2015
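
For the sampling-with-replacement excerpt above, a minimal Python sketch. It assumes every item is equally likely on each draw and that draws are independent, so the count for a given item is binomial with r trials and success probability 1/n; the function name `prob_item_drawn_x_times` and the example sizes are hypothetical.

```python
import math

def prob_item_drawn_x_times(n, r, x):
    """P(a given item appears exactly x times in r draws with replacement from n items)."""
    if not 0 <= x <= r:
        return 0.0
    p = 1.0 / n  # chance of picking the given item on any single draw
    return math.comb(r, x) * p ** x * (1.0 - p) ** (r - x)

# Hypothetical sizes, for illustration only: 50 items, 20 draws.
probs = [prob_item_drawn_x_times(50, 20, x) for x in range(21)]
print(probs[0], probs[1], sum(probs))
```

Summing over all possible counts gives 1, which is a quick sanity check on the formula.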

I have the numeric values to plot a probability density function. They look like 0.000390911, 0.00039091099989183763, 0.0003909109997836753, 0.0003909109996755129, 0.0003909109995673506, 0.0003909109994591882, 10.398579783795636, 10.398469842516338,...

From: Stats Stack Exchange | By: triub | Friday, April 17, 2015

