# Stats Stack Exchange

I am searching for algorithms for feature extraction from images that I want to classify using machine learning. I have heard only about SIFT. I have images of buildings and flowers to classify. Other than SIFT, what are some good algorithms...
From: Stats Stack Exchange | By: Trafalgar Law | Saturday, March 8, 2014
For a potential emotion recognition bachelor project, I was wondering what statistical test I have to perform when I get my results to test whether they are significant. I will be testing which combination of feature extraction and machine learning algorithm...
From: Stats Stack Exchange | By: NumesSanguis | Sunday, March 9, 2014
Multi-Armed Bandit: http://en.wikipedia.org/wiki/Multi-armed_bandit Uplift Modeling: http://en.wikipedia.org/wiki/Uplift_modelling How are these two approaches different? How are they similar? Is one better than the other? Edit: If an example scenario...
From: Stats Stack Exchange | By: tony | Monday, March 10, 2014
When comparing the performance of two classifiers over a single domain, in the context of a classification problem in machine learning, it is common to use a paired t-test, using the 10 average results from 10x10-fold cross-validation as measurements,...
From: Stats Stack Exchange | By: Vincent Barnabé-Lortie | Friday, March 7, 2014
I want to test whether changes in Hemoglobin (Hgb) levels over time can help diagnose Myelodysplastic syndrome (MDS). I have a cohort of thousands of patients. Each patient has several Hgb measurements over a period of 3 years. Measurements are unevenly...
From: Stats Stack Exchange | By: user2387584 | Monday, March 10, 2014
If you have several linear models, say model1, model2 and model3, how would you cross-validate it to pick the best model? (In R) I'm wondering this because my AIC and BIC for each model are not helping me determine a good model. Here are the results:...
From: Stats Stack Exchange | By: Dino Abraham | Sunday, March 9, 2014
Suppose that we have data on 1000 people. Each of these people is from California, Texas, or Hawaii. We have various lifestyle variables on each person (e.g. age, gender, etc.). We are interested in comparing the income of people in California...
From: Stats Stack Exchange | By: guestom | Monday, March 10, 2014
In confirmatory analysis do you basically just test hypotheses? Then in exploratory analysis you try to generate hypotheses? In general, I know that you can first do exploratory analysis to form hypotheses and then confirmatory analysis to test them....
From: Stats Stack Exchange | By: topguypoland | Monday, March 10, 2014
I have two sets of data collected from two groups (group 1 in red, group 2 in blue). Each data set has been fit with a sigmoid with 4 parameters: Bottom Plateau, Top Plateau, Point of Inflection, and Slope at Point of Inflection. So for each group, I...
From: Stats Stack Exchange | By: user41607 | Monday, March 10, 2014
There are several methods to make forecasts of equidistant time series (e.g. Holt-Winters, ARIMA, ...). However, I am currently working on the following irregularly spaced data set, which has a varying amount of data points per year and no regular time...
From: Stats Stack Exchange | By: Ojo | Sunday, March 9, 2014
I found this question on the open MOOC from Stanford, however the answer is not present. I think the median $\leq$ mean. Is this the case?
From: Stats Stack Exchange | By: unj2 | Sunday, March 9, 2014
Disclaimer: I am new to this site, relatively new to R (two weeks of learning), and have just a really basic knowledge of statistics, so sorry if I'm making a dumb mistake or asking a bad question. I also don't know how to nicely embed my dataset...
From: Stats Stack Exchange | By: VaNa | Sunday, March 9, 2014
The formula for the optimal weighting matrix when you perform regression with more instrumental variables than endogenous predictors is the following: $W_{opt} = (\frac{1}{N}Z'Z)^{-1}$ This tells us that we only have to look at the variance covariance...
From: Stats Stack Exchange | By: Kasper | Sunday, March 9, 2014
So I'm trying to show that ${\rm Var}(Z) \le 2({\rm Var}(X)+{\rm Var}(Y))$ for $Z = X + Y$. This seems to be pretty easy to show given that $X$ and $Y$ are uncorrelated. But I'm running into trouble at this step: $${\rm Var}(Z) = {\rm Var}(X) + {\rm...
From: Stats Stack Exchange | By: user2208604 | Sunday, March 9, 2014
The confidence interval for the mean of a random variable Y has coverage $1-\alpha$ which I am trying to show. Starting from $$\widehat{E(Y)} - q_{1-\frac{1}{\alpha}}\sqrt{\frac{\widehat{Var(Y)}}{n}} \leq E(Y) \leq \widehat{E(Y)} + q_{1-\frac{1}{\alpha}}\sqrt{\frac{\widehat{Var(Y)}}{n}}...
From: Stats Stack Exchange | By: user41593 | Sunday, March 9, 2014
I have fitted a glm to my data set and used the Durbin-Watson test to check model fit. I have obtained the result. How can I interpret it? lag Autocorrelation D-W Statistic p-value 1 0.7750748 0.4466024 0 Alternative hypothesis: rho != 0...
From: Stats Stack Exchange | By: user40494 | Sunday, March 9, 2014
In this study: Rosenblum, Sara, et al. "Handwriting as an objective tool for Parkinson’s disease diagnosis." Journal of neurology 260.9 (2013): 2357-2361. http://link.springer.com/article/10.1007/s00415-013-6996-x The researchers attempt to classify...
From: Stats Stack Exchange | By: Omri374 | Sunday, March 9, 2014
I have a question about the acceptance ratio used when implementing a random walk M-H in a Gibbs sampler to generate sample paths of an unobservable process. When computing the likelihood of a set of parameters, does the likelihood also include the likelihood...
From: Stats Stack Exchange | By: Tyler S | Sunday, March 9, 2014
If my state space is E=1,2,3,4,5,6,7 and transition matrix is {.5 0 0 .5 0 0 0} {0 .2 .3 .1 .4 0 0} {.4 0 0 .6 0 0 0} {0 0 .2 .8 0 0 0} {0 .3 0 0 0 .3 .4} {0 0 0 0 .2 .6 .2} {0 0 0 0 1 0 0} can someone tell me the communicating classes, closed sets...
From: Stats Stack Exchange | By: slidepuppy1 | Sunday, March 9, 2014
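One way to explore the communicating-classes question in the excerpt above is to compute reachability directly from the transition matrix. The sketch below is an illustration only, not part of the original post. It assumes the fifth row reads {0 .3 0 0 0 .3 .4} (the excerpt shows a stray space, and this reading makes the row sum to 1), and the function names are hypothetical.

```python
# Transition matrix from the excerpt; row 5 is an assumed reading (see note above).
P = [
    [0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.3, 0.1, 0.4, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.8, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0, 0.0, 0.3, 0.4],
    [0.0, 0.0, 0.0, 0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]


def reachable(matrix, start):
    """Return the set of state indices reachable from `start` (including itself)."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, prob in enumerate(matrix[i]):
            if prob > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def communicating_classes(matrix):
    """Group states that reach each other; flag each class as closed or not."""
    n = len(matrix)
    reach = [reachable(matrix, i) for i in range(n)]
    assigned = set()
    classes = []
    for i in range(n):
        if i in assigned:
            continue
        # j communicates with i when each is reachable from the other.
        members = sorted(j for j in reach[i] if i in reach[j])
        assigned.update(members)
        # A class is closed when no member can leave it.
        closed = all(reach[j] <= set(members) for j in members)
        classes.append(([s + 1 for s in members], closed))  # 1-based state labels
    return classes


classes = communicating_classes(P)
for members, closed in classes:
    print(members, "closed" if closed else "not closed")
```

Under the assumed matrix, a class that is not closed can be left with positive probability, which is the usual starting point for deciding which states are transient.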
lnQ = the natural logarithm of output (real value added -- RVA) lnL = the natural logarithm of labor lnK = the natural logarithm of the real capital stock (RCAPITAL) COOP = a dummy variable for a worker cooperative. Conventional firms have code (or ID)...
From: Stats Stack Exchange | By: afsdf dfsaf | Sunday, March 9, 2014
I am attempting to perform regression calibration on some test data to familiarize myself with the idea behind it. I don't have much of a stats background. The method rcal in Stata seems to be what I am looking for, however I am having trouble when trying...
From: Stats Stack Exchange | By: Arthur | Sunday, March 9, 2014
I'm working with a uniform distribution as a prior, defined as: $\pi(\theta) = \begin{cases} \frac{1}{7} & \text{if } \theta\in\{0,\frac{1}{6},\frac{2}{6},\ldots,1\} \\ 0 & \text{otherwise } \end{cases}$ I've to compute the prior predictive,...
From: Stats Stack Exchange | By: Ken MacAlpin | Sunday, March 9, 2014
I have a distribution of microparticles that follows a lognormal distribution. The cumulative distribution function thus is given by: $$F_X(x;\mu,\sigma) = \frac12 \operatorname{erfc}\!\left(-\frac{\ln x - \mu}{\sigma\sqrt{2}}\right)$$ $$\mu = log(M)...
From: Stats Stack Exchange | By: akid | Sunday, March 9, 2014
Hi all and thank you in advance for helping! I have three variables: y the dependent count variable, x the independent positive continuous variable, U is the observation unit (and I expect a random effect with it). I would like to test the hypothesis...
From: Stats Stack Exchange | By: k-zar | Sunday, March 9, 2014
I am doing a simple linear regression to test the relationship between voters' self-reported policy stands and their approval ratings for two hypothetical presidential candidates in a simulated election campaign. Participants first report their own position...
From: Stats Stack Exchange | By: Sam | Sunday, March 9, 2014
Apart from the fact that D prime is in Z units (units of measurement transformed to standard deviation units, aka z scores), making it comparable regardless of the original units of measurement, I can't see what the advantage in analysing D prime instead...
From: Stats Stack Exchange | By: user41270 | Sunday, March 9, 2014
I'm a newbie using LinearSVM to train the classifier. I labelled the images of 'buildings' as 1 and the others as -1. The training result is as follows: As you can see in the image, some of the buildings have positive scores and others are having...
From: Stats Stack Exchange | By: Varun Das | Sunday, March 9, 2014
I understand that the 95% confidence interval is supposed to tell us that if we were to repeatedly take samples (of the same size) from the population, and compute the interval in the same way, then 95% of those intervals we computed will contain the...
From: Stats Stack Exchange | By: mauna | Sunday, March 9, 2014
Can I do my statistics work based on the central limit theorem? I need to perform a t-test, ANOVA and multiple regression. My outcome variable is highly non-normally distributed (highly positively skewed) and my sample size is N=115. I'd like to keep the...
From: Stats Stack Exchange | By: Mahmoud Ismael | Friday, March 7, 2014
I wonder what are the better approaches to categorize continuous data (e.g. age) than dividing them with the use of quantiles and cut function (in R). I have heard about using trees to divide data in the way which takes into consideration how a division...
From: Stats Stack Exchange | By: Marciszka | Sunday, March 9, 2014
I have data on grades obtained in a course over three consecutive years. The grade distribution includes the number of students who received a fail, pass, credit, distinction and high distinction. I need to work out if the proportion of students who...
From: Stats Stack Exchange | By: Bunya | Sunday, March 9, 2014
I have a hard time understanding what the null hypothesis is in the definition of Sensitivity and Specificity here. I'm a student of Statistics and from my little knowledge in this field I can see that here the null hypothesis should be: \textrm{H}_{0}:\textrm{...
From: Stats Stack Exchange | By: MYaseen208 | Sunday, March 9, 2014
Consider $$X_n = \begin{cases} 1 & w.p (1 - 2^{-n})/2\\ -1 &w.p~ (1 - 2^{-n})/2\\ 2^{k} &w.p~ 2^{-k} \text{ for } k > n\\ \end{cases}$$ I need to show that even though this has finite moments, $$\sqrt{n}(\bar{X}_n) \overset{d}{\to} N(0,1)...
From: Stats Stack Exchange | By: Greenparker | Sunday, March 9, 2014
I am predicting a time series' future evolution and am evaluating the path uncertainty using bootstrapping. Is there a good way to visualise the uncertainty that goes beyond simply plotting a pair of confidence bands, e.g. 90% lines? I thought of something...
From: Stats Stack Exchange | By: Marvin | Friday, March 7, 2014
I need to present information about the main predictors of a candidate's votes using public opinion survey data. I have run a logistic regression using all the variables that I care about, but I can't find a good way to present this information. My...
From: Stats Stack Exchange | By: user1172558 | Saturday, March 8, 2014
(I'm assuming the second x_bar should be a y_bar), but I'm mostly confused how to solve this problem because it seems like since x_bar and y_bar are values, not random variables, W is just a value? If that's the case, how can you take the expected value...
From: Stats Stack Exchange | By: user41569 | Sunday, March 9, 2014
I was reading some notes on ML and clustering and it claimed that the run time of clustering was O(kn) where k is the number of clusters and n is the number of points. I was wondering why this was true and if someone had an analysis for it. This is what...
From: Stats Stack Exchange | By: Pinocchio | Friday, March 7, 2014
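The O(kn) figure in the excerpt above matches the cost of one assignment pass in k-means (Lloyd's algorithm), assuming that is the clustering method the notes refer to and treating the dimension as a constant. The sketch below is only an illustration: the function name and the data points are hypothetical, and it covers a single pass rather than the full iteration to convergence.

```python
def assign_points(points, centers):
    """Assign each point to its nearest center.

    Returns the labels and the number of distance evaluations performed,
    which is len(points) * len(centers) for one pass.
    """
    evaluations = 0
    labels = []
    for p in points:
        best_index, best_dist = None, float("inf")
        for idx, c in enumerate(centers):
            # Squared Euclidean distance; cost is constant for fixed dimension.
            d = sum((a - b) ** 2 for a, b in zip(p, c))
            evaluations += 1
            if d < best_dist:
                best_index, best_dist = idx, d
        labels.append(best_index)
    return labels, evaluations


# Hypothetical data: n = 5 points, k = 2 centers.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels, evaluations = assign_points(points, centers)
print(labels, evaluations)
```

The nested loops make the n times k count explicit. The total run time also depends on the number of passes, which this sketch does not address.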
Using Williams type contrast in the multiple comparison test with multcomp package will give different test statistics and test p-values for the same dataset and same hypothesis. I would like to know the difference between the two procedures. Can I use...
From: Stats Stack Exchange | By: Zhenglei | Saturday, March 8, 2014
I've got a data set looking at how different groups change over time. #High Abundance, Low Change HALC<-c(100,99,101,98,99,100,100,101,99,100) #Low Abundance, Low Change LALC<-c(1,2,1,2,2,2,1,2,1,2) #High Abundance, High Absolute Change HAHAC<-c(100,99,98,91,86,50,45,30,21,9)...
From: Stats Stack Exchange | By: Vinterwoo | Saturday, March 8, 2014
My girlfriend and I were having a debate about a magic trick that could be used to pick up women on the street. I am a magician and I have a coin. I walk up to a complete stranger with the coin in my hand and know which side is up. I ask her to state...
From: Stats Stack Exchange | By: Noah Clark | Saturday, March 8, 2014
Consider this model: summary(lm(mpg ~ hp*wt, mtcars)) Call: lm(formula = mpg ~ hp * wt, data = mtcars) Residuals: Min 1Q Median 3Q Max -3.0632 -1.6491 -0.7362 1.4211 4.5513 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 49.80842 3.60516...
From: Stats Stack Exchange | By: luciano | Saturday, March 8, 2014
Can you please give an example of environmental research in which a ranked perspective rather than a linear perspective on monotonic relationships may be as useful or more useful? I want to study a tropical disease related to soil (podoconiosis), there...
From: Stats Stack Exchange | By: Kaleab | Saturday, March 8, 2014
When applying gam.check in the mgcv package, R produces some residual plots and basis dimension output. Is there a way to only produce the plots and not the printed output? library(mgcv) set.seed(0) dat <- gamSim(1,n=200) b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)...
From: Stats Stack Exchange | By: hgeop | Saturday, March 8, 2014
I'm analyzing survey data that contains the following 3 variables (questions): If the election was today who would you vote for in the following list? (A, B, C, D, or E)? Let's imagine the respondent chose A in the previous item. Then, she is asked:...
From: Stats Stack Exchange | By: user1172558 | Saturday, March 8, 2014
I am studying the effect of a certain chemical and age on an outcome. As variables, age is a factor with levels 1, 2, and 3 and covar1 is continuous. After fitting the model mod1 <- glm(out1 ~ covar1*age, family=poisson, data=df) The estimates are...
From: Stats Stack Exchange | By: user41550 | Saturday, March 8, 2014
I work in experimental economics, labor and housing market. I am trying to answer the following question: why do researchers use matched pairs of applications in labor market correspondence tests (typically in hiring context), whereas they use a completely...
From: Stats Stack Exchange | By: Luca | Saturday, March 8, 2014
Is there such a thing as a procedure to 'fit' a random walk? I have done analysis on the increments of my data and have fitted several distributions and performed goodness-of-fit tests. With this done, is that the random walk model 'fitted' and then I...
From: Stats Stack Exchange | By: user40124 | Saturday, March 8, 2014
I am doing an experiment on mice. I found 11 recombinants within 142 mice. I calculated the recombination frequency and found it to be 7.7%. Other studies (many studies, not one or two) had 1000 mice and 58 recombinants with recombination frequency 5.8%....
From: Stats Stack Exchange | By: IBRAHIM HAMAD | Saturday, March 8, 2014
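The two percentages quoted in the excerpt above can be checked with a short calculation. This is a minimal sketch that only reproduces the arithmetic from the counts given in the post. The function name is hypothetical, and no significance test is implied.

```python
def recombination_frequency(recombinants, total):
    """Recombination frequency as a percentage of the animals scored."""
    return 100.0 * recombinants / total


own = recombination_frequency(11, 142)        # counts from the experiment
published = recombination_frequency(58, 1000)  # counts attributed to other studies
print(f"{own:.1f}% vs {published:.1f}%")
```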
I am working with a Bayesian posterior distribution derived from a sample (0.8,-0.2,0.4,1.3,0.2) on a N($\mu$,1) distribution, assuming an Exp(1) prior distribution on $\mu$. My immediate concern is plotting the function in R. The posterior distribution...
From: Stats Stack Exchange | By: twolffpiggott | Saturday, March 8, 2014
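For the setup described in the excerpt above (a N($\mu$,1) likelihood with an Exp(1) prior), the posterior is proportional to $\exp\{-\tfrac12\sum_i (x_i-\mu)^2 - \mu\}$ for $\mu \ge 0$ and zero otherwise. The sketch below evaluates that expression on a grid. It is written in Python rather than the R the post asks about, and the grid range and step size are assumptions chosen for illustration.

```python
import math

sample = [0.8, -0.2, 0.4, 1.3, 0.2]  # data quoted in the excerpt


def log_unnormalized_posterior(mu):
    """Log of likelihood times prior, up to an additive constant."""
    if mu < 0:
        return float("-inf")  # the Exp(1) prior has no mass below zero
    log_likelihood = -0.5 * sum((x - mu) ** 2 for x in sample)
    log_prior = -mu
    return log_likelihood + log_prior


# Assumed grid: mu from 0 to 3 in steps of 0.001.
step = 0.001
grid = [i / 1000 for i in range(0, 3001)]
log_values = [log_unnormalized_posterior(mu) for mu in grid]

# Subtract the peak before exponentiating for numerical stability,
# then normalize with a simple Riemann sum over the grid.
peak = max(log_values)
weights = [math.exp(v - peak) for v in log_values]
area = sum(weights) * step
density = [w / area for w in weights]

mode = grid[log_values.index(peak)]
print(f"grid mode: {mode}")
```

The `grid` and `density` lists can be passed to any plotting routine. The grid mode agrees with setting the derivative of the log posterior to zero, $\sum_i x_i - n\mu - 1 = 0$, which gives $\mu = 0.3$ for this sample.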
Consider the model $y_i=\beta x_i + \epsilon_i$ (without a constant term and with $k=1$), where $\mathbb{E}[\epsilon_i]=0, \mathbb{E}[\epsilon_i \epsilon_j]=0, \forall i \neq j$, and $\mathbb{E}[\epsilon_i^2]=\sigma_i^2$. Then consider following estimator...
From: Stats Stack Exchange | By: Jack | Saturday, March 8, 2014
