# Stats Stack Exchange

I have a column named inp, and 10 columns named resp1, ..., resp10 in a matrix in R, and I want to compute receiver operating characteristic roc() using inp as input variable and each of resp1, ..., resp10 as a response variable. g <- roc(resp1 ~...
From: Stats Stack Exchange | By: user5054 | Monday, July 6, 2015
I am opening a long silent question here because I cannot understand my mean variance trend produced by limma. I am new to RNA-Seq analysis and tried using limma on my data. After voom transformation, I get something like in the image attached. I double...
From: Stats Stack Exchange | By: Shani A. | Tuesday, July 7, 2015
I am trying to understand scikit's gradient boosted tree implementation, and I am struggling to understand the terminal node update part: def _update_terminal_region(self, tree, terminal_regions, leaf, X, y, residual, pred, sample_weight): """Make a single Newton-Raphson...
From: Stats Stack Exchange | By: Kumaran | Tuesday, July 7, 2015
Consider $K$ independent Laplace variables $X_i$ ($1 \leq i \leq K$) with mean 0 and scale $\lambda$. Let $X'$ be the variable taking the value of the Laplace variable whose absolute value is the minimum among all $X_i$'s. Due to the randomness of...
From: Stats Stack Exchange | By: NeedHelp | Tuesday, July 7, 2015
I have a question regarding the validity of the approach I am considering to look for relationships in the data I am interested in collecting. The goal of the analysis is to see if the Y variable (a final grade) has any relationship with the movements...
From: Stats Stack Exchange | By: Jesse Johnson | Tuesday, July 7, 2015
I'm new to the neural network field and I would like to understand how one can backtest a neural network trained with backpropagation methodology. Particularly, I have a time series dataset and I trained a neural network by using the neuralnet package...
From: Stats Stack Exchange | By: Quantopic | Monday, July 6, 2015
I understand the proof that $$Var(aX+bY) = a^2Var(X) +b^2Var(Y) + 2abCov(X,Y),$$ but I don't understand how to prove the generalization to arbitrary linear combinations. Let $a_i$ be scalars for $i\in \{1,\dots ,n\}$ so we have a vector $\underline a$, and...
From: Stats Stack Exchange | By: Spiky | Tuesday, July 7, 2015
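One common route to the generalization asked about above is bilinearity of covariance. The following is a sketch only: the names $X_1,\dots,X_n$ for the random variables and $\Sigma$ for their covariance matrix are assumptions, since the excerpt is cut off before it introduces its own notation.

$$\operatorname{Var}\Big(\sum_{i=1}^n a_i X_i\Big) = \operatorname{Cov}\Big(\sum_{i=1}^n a_i X_i,\ \sum_{j=1}^n a_j X_j\Big) = \sum_{i=1}^n \sum_{j=1}^n a_i a_j \operatorname{Cov}(X_i, X_j) = \underline a^\top \Sigma\, \underline a$$

Separating the diagonal terms ($i=j$) from the off-diagonal ones and using the symmetry of covariance gives

$$\operatorname{Var}\Big(\sum_{i=1}^n a_i X_i\Big) = \sum_{i=1}^n a_i^2 \operatorname{Var}(X_i) + 2\sum_{1 \leq i < j \leq n} a_i a_j \operatorname{Cov}(X_i, X_j),$$

which reduces to the two-variable formula when $n=2$.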
I'm trying to figure out the G Power 3 software. I'm working on my PhD and am researching leadership styles preferred by the Millennial generation. I'm using a MANOVA test. My population is 736. I'm trying to figure out my sample based on 95% and 90%...
From: Stats Stack Exchange | By: Marsha Powell | Tuesday, July 7, 2015
I'm trying to implement a $Q(\lambda)$ algorithm from this paper (warning: link is a download of a PDF) and can't seem to get it to find anything that close to the optimal policy. If possible, I'd like for someone to look over my script and see if I'm...
From: Stats Stack Exchange | By: user3704120 | Monday, July 6, 2015
I have a csv file with ID and variable. It looks like this ID V1 1 0 2 -0,12 3 0,05 .... If I use hist(mydata$V1) I get an error message Error in hist.default(mydata$V1) : 'x' must be numeric But with variable ID (mydata$ID) it works. What is bad with...
From: Stats Stack Exchange | By: Lukas | Monday, July 6, 2015
My question is the following: You have a dataset, and you want to determine theoretically what accuracy score (or other way to measure performance such as AUC, etc.) a "perfect" model could get on test data (or: what average accuracy score you would...
From: Stats Stack Exchange | By: Pholochtairze | Monday, July 6, 2015
I am really new to this and I was wondering if I can get some help. I am basically confused as to how to compare one normally distributed variable to two other variables that are skewed (e.g., I want to do a paired samples t-test). The two skewed variables...
From: Stats Stack Exchange | By: user81602 | Monday, July 6, 2015
5-point Likert Scale: SA = Strongly Agree = 5; A = Agree = 4; N = Neutral = 3; D = Disagree = 2; SD = Strongly Disagree = 1 Here is a sample of the results: (x) number of participants 1) SA (3) A (1) N = (0) D = (5) SD = (2)...
From: Stats Stack Exchange | By: Teri | Monday, July 6, 2015
I'm trying to recreate an image using either R, Matlab or Python (the languages I know) but I'm having trouble finding a good library for the task. The image I'd like to recreate is the one below. The important parts are the signals, dynamic bands and...
From: Stats Stack Exchange | By: GustafG | Monday, July 6, 2015
I have a bunch of data like these: Out1 Out2 Out3 Out4 Out5 Out6 x1 0.76 0.57 0.45 0.38 0.32 0.28 x2 0.79 0.59 0.47 0.39 0.34 0.29 x3 0.81 0.61 0.49 0.41 0.35 0.31 x4 0.84 0.63 0.51 0.42 0.36 0.32 x5 0.87 0.65 0.52 0.44 0.37 0.33 x6 0.90 0.68 0.54 0.45...
From: Stats Stack Exchange | By: NESHOM | Monday, July 6, 2015
I saw this on your website. Where can I find more about this subject? Here is the url where I found it. How would I calculate the expected change? Based on the info below, how would I calculate the expected change in the test scores for males and females?...
From: Stats Stack Exchange | By: user81583 | Monday, July 6, 2015
I have 2000 observations in a dataset with features and a binary-class outcome. I split the dataset into two sets for split sample validation. I use 80% to train the model and internally perform cross validation (CV). I then test this model on the 20%...
From: Stats Stack Exchange | By: user677101 | Monday, July 6, 2015
Let me ask the question in detail with an example: I have a historical dataset with columns (a,b,c,d,e,f,g). Now I have to predict (b,c,d,e,f,g) based on the value of 'a', and all the variables are inter-dependent on each other! I did use K-NN, K-Means,...
From: Stats Stack Exchange | By: rupesh | Monday, July 6, 2015
What statistical test is appropriate for the following? I have student ranking by year (1=freshman, etc), along with their responses to questions that seek to find how developed they are in a number of skills (e.g. for team skills, a 3 would indicate...
From: Stats Stack Exchange | By: Deborah Lundberg | Monday, July 6, 2015
I've been using the BMR (Bayesian Macroeconometrics in R) package to carry out BVAR (Bayesian Vector Auto Regression). When defining the Minnesota prior for my monthly dataset and have obtained mean of each coefficient of the variables for 13 lags. testbvarmIT<-BVARM(ITdata[1:84,],coefprior=NULL,p=13,constant=TRUE,irf.periods=20,...
From: Stats Stack Exchange | By: Gai | Monday, July 6, 2015
I'm reading through Fan and Li's paper "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties". On page 2, near the bottom right corner, they propose three properties that a good penalized estimator should have: Unbiasedness: The...
From: Stats Stack Exchange | By: Aaron Zeng | Monday, July 6, 2015
This is probably a fairly silly question, but I have the following regression model: > print(summary(step1)) Call: lm(formula = model1, data = newdat1) Residuals: Min 1Q Median 3Q Max -1.66219 -0.00725 -0.00725 -0.00725 1.28056 Coefficients: Estimate...
From: Stats Stack Exchange | By: costebk08 | Monday, July 6, 2015
I am trying to import a .dta STATA dataset into R. I input: library (foreign) read.dta ("E:/Gabrielle.dta") Then what seems to be some of my data appears with a mix of numbers and "NA" Then I input: read.dta("E:/Gabrielle.dta", convert.dates=TRUE, convert.factors=TRUE,...
From: Stats Stack Exchange | By: Gabrielle Emanuel | Monday, July 6, 2015
I've created four linear regression models, each with different variables. I looked at the error rate: (actual-prediction)/actual and also on the confidence levels (90%). I've noticed that there is no correlation between the error rate as I measured it and...
From: Stats Stack Exchange | By: user49422 | Monday, July 6, 2015
I'd like to know some metric or index that is able to measure the uniformity of a document set. A naive example is that if all of the documents are the same as each other, the index should be 1, and if a doc set includes two types of documents, the index should...
From: Stats Stack Exchange | By: rkjt50r983 | Monday, July 6, 2015
I'm looking to teach myself more about NLP. I started with NLTK and I see that it has potential to eventually become something I could get paid for. From there, my journey continued to reading this blog post by Matthew Honnibal. So I'm interested in...
From: Stats Stack Exchange | By: Farley Knight | Monday, July 6, 2015
There are two independent uniform continuous random variables $X$ and $Y$ (such that $0 \leq X \leq 10$, $0 \leq Y \leq 10$). The function $f$ is the difference between the two random variables ($|X-Y|$). What is the expected value of $f(X,Y)$? My analytic...
From: Stats Stack Exchange | By: Saju | Saturday, July 4, 2015
As I discover machine learning I see different interesting techniques such as: automatically tune algorithms with techniques such as grid search, get more accurate results through the combination of different algorithms of the same "type", that's boosting,...
From: Stats Stack Exchange | By: Pholochtairze | Monday, July 6, 2015
In hierarchical graphical models, the parameters are often themselves drawn from hyper-parameters. Many a time, owing to the conjugate structure of the model, the parameters are marginalized out. I want to know how marginalizing affects (if at all)...
From: Stats Stack Exchange | By: user2008220 | Monday, July 6, 2015
I'm assessing the normality assumption in a data set using both the ad.test function of the nortest package and the equivalent in the Adgoftest one. The resulting p-values are strongly different and lead to opposite conclusions (one would lead to acceptance the...
From: Stats Stack Exchange | By: Giorgio Spedicato | Monday, July 6, 2015
Suppose we have a statistical model $\mathcal{p}= \{N(\theta,\sigma_1^2)^{\otimes m} \otimes N(\mu_2,\sigma_2^2)^{\otimes(n-m)}: \theta \in \mathbb{R} \}$. All random variables are independent. The parameter of interest is $\theta=\mu_1$. Consider $T(\boldsymbol{X})=(X_1,X_2,\dots,X_m)$...
From: Stats Stack Exchange | By: Danny | Monday, July 6, 2015
I have the following in my textbook: $$r_k \thicksim N(0,\sigma_r^2) \\ \Rightarrow \sum\limits_{k=1}^K r_k(t) \thicksim N(0,K\sigma_r^2) \\ \Rightarrow \frac1{K} \sum\limits_{k=1}^K r_k(t) \thicksim N(0,\frac{K}{K^2}\sigma_r^2) = N(0,\frac1{K}\sigma_r^2)\dots$$
From: Stats Stack Exchange | By: user2740 | Monday, July 6, 2015
I have a mixed effects model with 2 Locations, 4 Blocks (nested in each location). In each Block I dug a hole and took soil samples at 10 Depths. My variable of interest is nitrate concentration (ConcNO3). My lmer model could be if I nest both block within...
From: Stats Stack Exchange | By: Kate Tully | Monday, July 6, 2015
I would like help understanding why a survival regression with no censored data-points does not give the same variance estimates as a linear model (see code below). I think it must be something to do with the fact that the variance is an actual parameter...
From: Stats Stack Exchange | By: sqrt | Monday, July 6, 2015
If I have a simple regression model and the residuals are autocorrelated, what is the difference between a) simply adding the lagged dependent variable to the list of regressors and running OLS b) doing iterative Cochrane-Orcutt procedure First one is...
From: Stats Stack Exchange | By: Cagdas Ozgenc | Monday, July 6, 2015
I'm trying to fit a (logistic) regression model to predict the successful funding of crowdfunding ventures (0/1) based on a series of IVs with different levels of measurement. One of these IVs is a categorical variable that indicates the nature of the venture...
From: Stats Stack Exchange | By: Tacit | Monday, July 6, 2015
I want to do a regression analysis (as I have found out on What regression model to use when independent variables are percentages to predict % outcome? should probably be a logistic regression) but I am not sure if I want to do the right one. My dependent...
From: Stats Stack Exchange | By: TeeVeeZee | Monday, July 6, 2015
I have a problem of sorting out medical history based on departments. Another problem is that it will be from a mobile snapshot of the report.
From: Stats Stack Exchange | By: user81537 | Monday, July 6, 2015
Some explaining facts in the beginning: I have got my data structured in SPSS in the following way. I've got 20 variables (case_number, a_1, b_1, c_1, a_2, b_2, c_2, ....) The variables are named in such a way because I took repeated measures (at different...
From: Stats Stack Exchange | By: Ihkavs | Monday, July 6, 2015
I am going through the LAB section §6.6 on Ridge Regression/Lasso in the book 'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, Tibshirani (2013). More specifically, I am trying to apply the scikit-learn Ridge...
From: Stats Stack Exchange | By: Jordi | Saturday, July 4, 2015
I tried the J48 classifier in Weka to train the learner and got an accuracy of 99%, but when I supplied it with a test dataset it threw an error saying Training and Test sets are incompatible. I failed at resolving this error and hence moved to R, I'd be glad...
From: Stats Stack Exchange | By: Neelima Seshadri | Monday, July 6, 2015
For example, in SAS's PROC SQL, there are two ways to insert new rows to a table: INSERT is a statement but VALUES is a clause. So if they are different, what is the purpose of knowing the difference between a statement and a clause?
From: Stats Stack Exchange | By: Gemini | Sunday, July 5, 2015
I've been trying to understand Gibbs sampling. What I'm looking for is a paper or other reference which uses a simple canonical example and uses that to illustrate Gibbs Sampling. Sadly I've not found one that does that. Would appreciate any such reference...
From: Stats Stack Exchange | By: user975917 | Monday, July 6, 2015
I'm doing a linear regression with cluster robust SE and I have the following conceptual problem: I have five regressors, of which four are statistically significant, while the remaining regressor is not. When I put K dummy variables in the model in...
From: Stats Stack Exchange | By: Luca Dibo | Sunday, July 5, 2015
My colleagues and I conducted a study of the effects of an experimental translocation on the movement and activity patterns of common brushtail possums in New Zealand. This involved first capturing 12 individuals (6 males and 6 females), fitting them...
From: Stats Stack Exchange | By: Todd Dennis | Monday, July 6, 2015
This is not homework. I am just bothered about question 2.2.1 of Introduction to Mathematical Statistics (sixth or seventh edition) by Hogg, McKean and Craig. Question for ready reference is: If the joint pmf of $X_1$ and $X_2$ is $$p(x_1,x_2)=(2/3)^{x_1+x_2}\dots$$
From: Stats Stack Exchange | By: Malik | Sunday, July 5, 2015
I am calculating the distance measure between two frequency distributions using the Jensen-Shannon divergence, but the output is NaN, as I have 0 values in my distribution. Could you please advise me why the output is NaN?
From: Stats Stack Exchange | By: user81519 | Monday, July 6, 2015
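A likely cause of the NaN described above is a term of the form 0 * log(0) being evaluated numerically. The sketch below is not the asker's code (which the excerpt does not show); it is a minimal NumPy illustration that applies the usual convention 0 * log(0) = 0 by skipping zero-probability terms. The function names `kl_divergence` and `js_divergence` and the example vectors are hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in bits, using the convention 0 * log(0 / q) = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # skip zero-probability terms instead of evaluating log(0)
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise raw frequencies to probabilities
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture; m > 0 wherever p > 0 or q > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical distributions that both contain zero entries
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
jsd = js_divergence(p, q)
print(jsd)  # 0.5
```

Because the mixture is positive wherever either input is positive, the masked terms never divide by zero, so the result is finite and, in base 2, lies between 0 and 1.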
I am an R beginner, so sorry if I am missing something basic. I did a linear fit with 2 input variables and 1 output. Plots of the output vs each input show a clear positive slope for each. However, the data output shows a negative coefficient for one variable....
From: Stats Stack Exchange | By: user3444294 | Sunday, July 5, 2015
I have time series data for any number of time points for both males and females. Would it make sense to have sex as the entity in the panel approach? I have data on the annual rates at which a particular disease occurs. I also have the spending that...
From: Stats Stack Exchange | By: Brad | Saturday, July 4, 2015
I use Cholesky decomposition to simulate correlated random variables given a correlation matrix. The thing is, the result never reproduces the correlation structure as it is given. Here is a small example in Python to illustrate the situation. import...
From: Stats Stack Exchange | By: Eli Korvigo | Sunday, July 5, 2015
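The asker's own example is cut off in the excerpt above, so the following is only a generic sketch of the Cholesky approach, with a hypothetical 3x3 target matrix and sample size. It also shows that the sample correlation matches the target only approximately: the gap shrinks as the sample grows but is not zero for a finite sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target correlation matrix (symmetric, positive definite)
target = np.array([[1.0, 0.6, 0.3],
                   [0.6, 1.0, 0.5],
                   [0.3, 0.5, 1.0]])

# Lower-triangular factor with L @ L.T == target
L = np.linalg.cholesky(target)

# Independent standard normal draws, one row per variable
n = 200_000
z = rng.standard_normal((3, n))

# Correlated draws: Cov(L z) = L I L^T = target
x = L @ z

# Compare the sample correlation with the target
sample_corr = np.corrcoef(x)
max_abs_error = np.max(np.abs(sample_corr - target))
print(max_abs_error)
```

With a small `n`, or with inputs that are not themselves uncorrelated with unit variance, the discrepancy can be much larger, which is one possible reason a simulation appears not to reproduce the given structure.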
