# Stats Stack Exchange

I am currently using PLS (the set of predictors is quite high-dimensional) to predict a particular variable, $age$, and I am using caret's `train` implementation with the pls method: `modelFit <- train(train$age~.,data=train,method = "pls",tuneLength=100)`...
From: Stats Stack Exchange | By: ben18785 | Monday, August 31, 2015
I'm brushing up on machine learning (specifically image analysis) and of course looked into Markov random fields. I really cannot wrap my head around the concept of cliques and their application in MRFs. The definition of a clique is, to my knowledge...
From: Stats Stack Exchange | By: Dänu | Monday, August 31, 2015
I have a series of quadrats placed randomly across aerial photography of a region. In each quadrat I have estimated the proportion of the quadrat under cropping, and my goal is to estimate the proportion of the region cropped with a 95% confidence interval...
From: Stats Stack Exchange | By: user87212 | Monday, August 31, 2015
In a time series regression I am finding a certain predictor variable significant which, according to the client, should not be. Could this be due to the higher variance of this predictor compared to the other variables in the model? In general,...
From: Stats Stack Exchange | By: user2450223 | Wednesday, September 2, 2015
Suppose I have a time series which is almost periodic. If I were to segment each of the visually most evident periods (say, those of the longest period), I would find a strong mean cross-correlation among pairs of such segmented periods. Is there a way to...
From: Stats Stack Exchange | By: np20 | Wednesday, September 2, 2015
I'm moving into a new shared house with two other people. There are three bedrooms in the house, and we will draw numbers from a hat for the rooms. All agree that room one is better than room two and room two is better than room three. I would be happy...
From: Stats Stack Exchange | By: user10473 | Wednesday, September 2, 2015
So, I've found that the misclassified instances in my random forest model have lower values in some predictors. How can I adjust the model so that the threshold is more sensitive to these predictors? `fit1 <- cforest((b == 'three')~ affect+ certain+ negemo+...`
From: Stats Stack Exchange | By: Lucia | Wednesday, September 2, 2015
There are certain variables in my data frame which need to be encrypted because they contain sensitive data. How do I go about encrypting these columns/variables?
From: Stats Stack Exchange | By: Mayank Bhushan | Wednesday, September 2, 2015
In my data the classes were defined by binning a variable into 10 bins. After growing the random forest, its proximity matrix is viewed as the following MDSplot: As can be seen from the plot, all classes overlap in all clusters. I wonder if it is...
From: Stats Stack Exchange | By: dima | Wednesday, September 2, 2015
When the examples of our data set contain a very large number of one label (y=0) [e.g. the patient does NOT have cancer] and a comparatively small number of the other label (y=1) [e.g. the patient has cancer], the term “skewed classes” is often used...
From: Stats Stack Exchange | By: Alex Ryan | Tuesday, September 1, 2015
I am in the process of building a model in which I need to predict the total deposit balance of customers over the next 3 years. Data are available at the customer level. For example, I have data for 0.1 million customers for the last year (monthly data). Currently,...
From: Stats Stack Exchange | By: Ujjawal Bhandari | Tuesday, September 1, 2015
I am doing a CFA for 5 constructs with 23 items in total, each construct having at least 3 items. The error variances of two items are 1.21 and 1.08. How can an error variance be more than 1? I am confused. Somebody please help!...
From: Stats Stack Exchange | By: Confused | Tuesday, September 1, 2015
Let us say there is a variable that is not normally distributed. Under what circumstances will the natural logarithm of the variable be normally distributed? I have seen many articles and papers where the author will take a logarithm of a variable and...
From: Stats Stack Exchange | By: Victor | Tuesday, September 1, 2015
Consider a discrete-time Markov chain with countable state space $S = \{\ldots, -1, 0, 1, \ldots\}$ and transition probabilities $p_{i,i+1} = p = 1 - p_{i,i-1}$, $i \in S$. Show that the chain is recurrent if $p = 1/2$ and transient if $p \neq 1/2$.
From: Stats Stack Exchange | By: puneet | Tuesday, September 1, 2015
If I were given the total sample size, the overall proportion of the sample with the exposure (risk factor), the overall proportion of the sample with the disease (outcome), and the odds ratio ($= ad/bc$), how could I get the formulas for each individual cell ($a, b, c, d$)?...
From: Stats Stack Exchange | By: Martin Frigaard | Tuesday, September 1, 2015
I'm in need of an (optimization) algorithm in R that can deal with dependent outcome variables. Let me give you a little example: let's define some input variables $x_1, \ldots, x_n$ and some categorical (ordinal) outcome variables $y_1, \ldots, y_m$ that describe...
From: Stats Stack Exchange | By: JimBoy | Tuesday, September 1, 2015
Problem: I would like to do some inference on a system analogous to a die with an unknown number of sides. The die is rolled several times, after which I would like to infer a probability distribution over a parameter corresponding to the number of sides...
From: Stats Stack Exchange | By: davipatti | Tuesday, September 1, 2015
Wikipedia mentions that it is crucial to ensure normality for within-subject variables in a mixed model. I am interpreting this to mean that I should check normality for my within-subject variables and not for between-subject variables. I have 4 variables: 2 within-subject...
From: Stats Stack Exchange | By: PhDDhP | Tuesday, September 1, 2015
I had two questions regarding model selection for a Hierarchical Bayesian (HB) Regression Model and the purpose of Cross-Validation. 1) I understand cross-validation as one way to perform model selection for a Hierarchical Bayesian Regression Model...
From: Stats Stack Exchange | By: TSP | Tuesday, September 1, 2015
This seems like it should be an easy problem to solve, but I can't think of anything, nor can I find anything to help me. I'm working with multiple data sets where I have a series of variables that are sometimes, but usually not, correlated. I want to select individuals...
From: Stats Stack Exchange | By: MHtaylor | Tuesday, September 1, 2015
I am self-studying and trying to understand the derivation of the posterior predictive distribution of a naive Bayes model from a graph perspective. In an online course, the derivation of the probability $P(X[M+1] \mid x[1], \ldots, x[M])$ is based on multiplying...
From: Stats Stack Exchange | By: Wouter | Sunday, August 30, 2015
Let $X$ be a real-valued random variable with an exponential distribution. Let $a$ be a complex number. What is the distribution of $Y = e^{aX}$? Can $Y$ be written in the form of another known distribution? NOTE: based on the answer of Deep North (below)...
From: Stats Stack Exchange | By: Blue | Tuesday, September 1, 2015
Could anyone explain to me the difference between fixed and random effects in panel data regressions, with examples? I have actually used panel data regressions for some of my predictive models, but I don't really get the cases when I could use fixed...
From: Stats Stack Exchange | By: mmarcc | Tuesday, September 1, 2015
I've been using bootstrap regression lately to get confidence intervals and assess the variability of coefficients for glm regressions. I'd like to know how to interpret the results when a variable is significant (CI not crossing zero) with the bootstrap and...
From: Stats Stack Exchange | By: Bakaburg | Tuesday, September 1, 2015
I want to find out the hottest topic Twitter users are discussing about "Cancer". I have downloaded 10,000 medical article abstracts for training and 20,000 tweets for testing. What topic model should I use? I wanted the easiest model because...
From: Stats Stack Exchange | By: ehsan shirzadi | Tuesday, September 1, 2015
I performed a Cox regression analysis for two biomarkers. Both were significant (p < 0.05). Biomarker 1 showed higher significance but a lower Exp(B) (hazard ratio) than Biomarker 2 (see below). Biomarker 1: HR = 3.06; 95% CI = 1.71-5.48; p < 0.001 (sorry,...
From: Stats Stack Exchange | By: user86880 | Tuesday, September 1, 2015
Could anyone check that the alternative hypothesis is valid? I wanted to prove that the "Mahalanobis distance ($T_i = (\mathbf{x}_i - \bar{\mathbf{x}})^T \Sigma^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})$)" is a log-likelihood ratio test statistic for the following hypothesis...
From: Stats Stack Exchange | By: Block Jeong | Tuesday, September 1, 2015
I've run an experiment in which different subjects had to make a number of decisions, which are stored in the dependent boolean variable Y (0 or 1). I have multiple independent variables which may influence the outcome, namely an ordinal variable containing...
From: Stats Stack Exchange | By: Coen | Tuesday, September 1, 2015
Given a random process $x(t)$, assume that we know its statistics completely, e.g. its pdf (probability density function), higher-order statistics, and so on. Now let's consider the time derivative of $x(t)$. Obviously, $dx/dt$ is another...
From: Stats Stack Exchange | By: Bahram | Monday, August 31, 2015
I'm new here, so please point out any errors! My moment generating function looks like this (after some tidying): $E[e^{wY(t)}] = \frac{1}{(1 - 2\theta(t) w)^{k/2}} \exp\left(\frac{\lambda(t) w}{1 - 2\theta(t) w}\right)$. It's very close to a non-central...
From: Stats Stack Exchange | By: lampishthing | Tuesday, September 1, 2015
If $R^2$ describes the variation explained by a model, what explains the correlation between the coefficients estimated for a slope parameter and an intercept? I have been thinking of it in two ways: 1) If I could grab the slope and physically move it around...
From: Stats Stack Exchange | By: B.Miller | Tuesday, September 1, 2015
I have fitted an LMM with this formula: `f1 <- lmer (dprime_f ~ language_f + (1+language_f|listener_f), data = data1.frame, REML = TRUE)`. Then I used lsmeans to run pairwise comparisons, and this is part of the result: `$contrasts contrast estimate SE df t.ratio...`
From: Stats Stack Exchange | By: user3288202 | Monday, August 31, 2015
My implementation of a part-of-speech tagger in scikit-learn produces the same values for accuracy, precision and recall. From my point of view, it looks like I've either produced a bug in my implementation or done something within the idea behind...
From: Stats Stack Exchange | By: jwacalex | Tuesday, September 1, 2015
Suppose I would like to select $m$ integers from the set $S = \{1, 2, \ldots, n\}$ with the following rules: 1) $j$ out of $m$ are necessarily distinct; denote this as $S_1$. 2) The remaining $m - j$ are selected from a subset of $S$ of size $i \le n$ that contains...
From: Stats Stack Exchange | By: Hashed | Monday, August 31, 2015
How would you proceed with your model if the confusion matrix looks like the one in the picture below? Classes 2, 3 and 4 are frequently misclassified as one another. My approach was to build 4 different random forest estimators in scikit-learn in IPython: estimator1 can...
From: Stats Stack Exchange | By: sultan | Tuesday, September 1, 2015
I have to predict the values of a continuous target variable $Y$ using a bunch of $X$ features. Unfortunately, the regression approach does not provide satisfying results. Thus, I was thinking of transforming such a regression problem into a classification...
From: Stats Stack Exchange | By: stochazesthai | Tuesday, September 1, 2015
I live in a world of size $N$. There are $U = uN$ many urns, and $B = bN$ many balls. The likelihood of any particular urn having $k$ many balls is given by $f_N(k, n=bN, p=(uN)^{-1})$, where $f_n()$ is the binomial probability mass function in a world...
From: Stats Stack Exchange | By: FooBar | Tuesday, September 1, 2015
I've read somewhere that the reason we square the differences instead of taking absolute values when calculating variance is that variance defined in the usual way, with squares in the numerator, plays a unique role in the Central Limit Theorem. Well, then...
From: Stats Stack Exchange | By: user4205580 | Tuesday, September 1, 2015
Binary data is often mentioned as a nominal sub-category, especially in such examples as female/male, smoker/non-smoker, etc. However, binary data with such values as pass/fail, correct/incorrect, absent/present, etc, seems to give some weight to its...
From: Stats Stack Exchange | By: Billy the Poet | Tuesday, September 1, 2015
I am reading an article whose method is fully based on the likelihood ratio test. The author says that the LR test against one-sided alternatives is UMP. He proceeds by claiming that "...even when it [the LR test] can not be shown to be uniformly most...
From: Stats Stack Exchange | By: Sergey Zykov | Monday, August 31, 2015
Stone (1980) provides a minimax rate of convergence $a_n$ for pointwise estimation of a regression function, defining it as $\liminf_n \sup_{\theta \in \Theta} P_\theta(\hat{T}_n - T(\theta) > c a_n) > 0$ for all $c > 0$ and $\lim_{c\rightarrow...
From: Stats Stack Exchange | By: alice | Tuesday, September 1, 2015
I have hierarchical data of individuals nested within families. For each individual, I have independent variables such as age, gender, education, and familiarity with the product. For each family unit, I also have covariates such as household income, purchase...
From: Stats Stack Exchange | By: Amw 5G | Monday, August 31, 2015
Random chains of symbols $s_1, s_2, \ldots$ (e.g. zeros and ones) drawn from some finite alphabet appear in practically all sciences. Examples include spins in one-dimensional magnets, written texts, DNA sequences (here the symbols are letters), geological...
From: Stats Stack Exchange | By: Srishti M | Monday, August 31, 2015
As best practice when writing up SEM, it is advised to provide the means, SDs, and correlations of the data. However, the books I read did not mention how to obtain or provide those scores for the latent variables. Thus, I wonder: are those values derived...
From: Stats Stack Exchange | By: icd | Monday, August 31, 2015
How do I solve for the matrix $X$ in the linear equation $AXB + X = CD$, where $A$ and $B$ are full-rank symmetric matrices and there is no structure to $CD$? $CD$ could just be $C$.
From: Stats Stack Exchange | By: user_1992_1992 | Monday, August 31, 2015
I have monthly time series data and would like to do forecasting with detection of outliers. This is a sample of my data set (monthly values, Jan to Dec): 2006: 7.55, 7.63, 7.62, 7.50, 7.47, 7.53, 7.55, 7.47, 7.65, 7.72, 7.78, 7.81; 2007: 7.71...
From: Stats Stack Exchange | By: Ted | Monday, August 31, 2015
I am studying Bayesian networks and came across a problem of multiplying conditional probabilities. $p(a \mid b) p(b) = p(a, b)$ is a well-known formula, but the answer to the multiplication $p(A \mid B) p(C \mid D)$, where $A, B, C, D$ are sets of variables, is not a conditional...
From: Stats Stack Exchange | By: user15988 | Monday, August 31, 2015
I am trying to find the probability of matching a string to the digits 1, 9, 6, 7. The potential values are 1-9. If the string does not match one of the four values, I will modify the string to see if the modified version matches. The modification will either...
From: Stats Stack Exchange | By: user1894167 | Monday, August 31, 2015
The `arimax` function in the TSA package is, to my knowledge, the only option in R that will fit a transfer function for intervention models. It lacks a predict function, though, which is sometimes needed. Is the following a workaround for this issue, leveraging...
From: Stats Stack Exchange | By: B_Miner | Monday, August 31, 2015
Given a set of labels $y$ and a design matrix $X$, we often compute a linear regression to find a set of parameters $\hat{\beta}$ such that $E[y \mid X] = X\hat{\beta}$. However, how does one perform regression when $X$ itself must be inferred conditioned on...
From: Stats Stack Exchange | By: Nicholas Mancuso | Monday, August 31, 2015
