Question: Does simple linear regression still work if one does not take the average of the squared residuals, but instead just minimizes their sum? Context: I am just beginning the Machine Learning course via Coursera:...

From: Stats Stack Exchange | By: John Conor Cosnett | Friday, February 12, 2016
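
A quick numerical check of the question above, as a minimal sketch in Python with made-up data (`xs`, `ys`, the grid, and the helper names are illustrative and not from the original post): dividing the sum of squared residuals by a positive constant such as the sample size rescales the objective but should not move its minimizer.

```python
# Toy data (hypothetical, for illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def mse(intercept, slope):
    """Mean of squared residuals: the SSE divided by the sample size."""
    return sse(intercept, slope) / len(xs)

# Candidate (intercept, slope) pairs on a coarse grid (an assumption made
# to keep the sketch dependency-free; a real fit would use a solver).
grid = [(a / 10, b / 10) for a in range(-20, 21) for b in range(0, 41)]

# Minimize each objective over the same candidates.
best_by_sse = min(grid, key=lambda p: sse(*p))
best_by_mse = min(grid, key=lambda p: mse(*p))
```

Both searches select the same grid point, which is what one would expect from any objective that differs only by a positive scale factor.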

While going through Oracle Data Mining, I found an interesting statement. https://docs.oracle.com/database/121/DMCON/process.htm#DMCON115 "Data Mining and Statistics There is a great deal of overlap between data mining and statistics. In fact most...

From: Stats Stack Exchange | By: Hari Prasad | Friday, February 12, 2016

Having recently graduated from my PhD program in statistics, I have spent the last couple of months searching for work in the field of statistics. Almost every company I considered had a job posting with a job title of "Data Scientist". In fact, it...

From: Stats Stack Exchange | By: RustyStatistician | Thursday, February 11, 2016

In relation to this question and answer, the default value for the Python LDA for alpha is 0.1 and eta is 0.01. Is this supposed to be the normal value? If yes, then how low and how high can alpha and eta go? Like let's say for alpha, is there...

From: Stats Stack Exchange | By: ZeferiniX | Sunday, February 14, 2016

I am new to PCA, currently working on a dataset with 10,000 rows and 471 attributes. I just ran PCA using prcomp in R. I could find out the 1st PC to retain from b$sd^2 of all attributes where b is the result after PCA, but do not know which one hold...

From: Stats Stack Exchange | By: Wang Nick | Sunday, February 14, 2016

How can I create time series plots for the data of API values with respect to the threshold u = 100? I already used this code in R: API<- read.csv(file.choose()) API<- as.vector(as.matrix(t(API))) length(API) plot.ts(API,xlim=c(0,30000), ylim=c(0,100))...

From: Stats Stack Exchange | By: juseef | Sunday, February 14, 2016

I am looking to model fraudulent cases using logistic regression. However, there are two different datasets available. The one I used to build my model had 4% fraud cases, and my model gave pretty good accuracy on it. The dataset on which...

From: Stats Stack Exchange | By: darkage | Sunday, February 14, 2016

How can I estimate functions of the form: $f(X,Y,Z) = a + bX + cY + bcZ$ I know through expert knowledge that the population coefficient of $Z$ is equal to $bc$ but am not sure how to estimate the model with this constraint. If I use OLS to estimate...

From: Stats Stack Exchange | By: user2763361 | Sunday, February 14, 2016

I want to run a hierarchical Bayes regression model using this runiregGibbs function. My data is like the following: y Proximity Time Knowledge Test Purchase Service 1 8 4 2 2 1 3 2 7 2 2 2 2 2 3 9 1 4 2 1 3 4 7 2 1 2 1 1 My X are the six attributes...

From: Stats Stack Exchange | By: lll | Sunday, February 14, 2016

In SPSS, how do you adjust for two binary variables, "gender" and "age" (23)? The primary covariate is a continuous variable with a binary dependent variable. Is it correct to insert age * gender or is it sufficient to enter the variables separately?...

From: Stats Stack Exchange | By: schvost | Sunday, February 14, 2016

I know that a test statistic is used to help us in hypothesis testing, etc. We compute the test statistic, and then compare it to the $\alpha$ value to reject or accept the null hypothesis. For a normal distribution, this is easy, you just do $ Z = ((X-\mu)\sqrt...

From: Stats Stack Exchange | By: Hunle | Sunday, February 14, 2016

Can I apply a Generalized Linear Model with a repeated measures design? I have an experiment in which n=12, two groups with n=6; measures taken at different moments in time. R's glm function doesn't seem to have an option for repeated measures tests. Thank...

From: Stats Stack Exchange | By: AlfredoB | Sunday, February 14, 2016

I am looking for a method which is able to find confidence regions in multivariate distributions with $(1-\alpha)\%$ probability of occurrence. What I mean by probability of occurrence is that, the probability that an event lies in that region should...

From: Stats Stack Exchange | By: Faran | Saturday, February 13, 2016

Under Item Response Theory, test information $I(\hat\theta)$ is a function of the examinee's estimated ability $\hat\theta$, discovered at the end of the test, and the items that were answered during said test. The standard error of estimation $SE$, in...

From: Stats Stack Exchange | By: Douglas De Rizzo Meneghetti | Saturday, February 13, 2016

A sporting goods manufacturer claims that the variance of string tensions for any decent tennis racquet should be about 9 pounds. The string tensions of 18 randomly selected tennis racquets produced a variance of 8.13 pounds. Find the p-value to test...

From: Stats Stack Exchange | By: Kaustav Sen | Sunday, February 14, 2016

I have built two linear regressions independently of one another, and $Y_1$ and $Y_2$ are in the same units. I am interested in using the sum of $\widehat{Y}_1$ + $\widehat{Y}_2$ (the predictions) to do additional calculations. My questions: Assuming...

From: Stats Stack Exchange | By: akw | Saturday, February 13, 2016

I have approximately-log-normal price data. The data is hierarchically structured. Let's say there are three levels, so the notation for a log-transformed observation could be $x_{ijk}$, with the indexing decreasing in granularity to the right. A minority...

From: Stats Stack Exchange | By: Brash Equilibrium | Sunday, February 14, 2016

One of the biggest issues with clustering is that we draw different conclusions depending on the linkage and method used. I would like to know your opinion on this: which method would you select? One might say, the best method of clustering which gives...

From: Stats Stack Exchange | By: Mola | Saturday, February 13, 2016

I came across this in my textbook, and am having trouble understanding why these two things are equal. Is there an identity someone could show me to help me understand? It seems like we're dropping $\bar{Y}$ and $\bar{X}$....

From: Stats Stack Exchange | By: Peaches | Saturday, February 13, 2016

When I centre data, do you use the absolute value or do you keep the sign? For example: you have data (4,5,6) and the mean is 5. After centring the data, is the data (-1,0,1) or (1,0,1)?
Thanks

From: Stats Stack Exchange | By: Marion | Saturday, February 13, 2016
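
For reference, a minimal Python sketch of centring using the numbers from the question (the variable names are illustrative): centring subtracts the mean from each value, so the sign is kept and the centred values sum to zero.

```python
data = [4, 5, 6]

# The mean of the example data is 5.
mean = sum(data) / len(data)

# Subtract the mean from every value; no absolute value is taken,
# so points below the mean become negative.
centred = [x - mean for x in data]
```

Taking absolute values instead would give (1,0,1), which no longer sums to zero and loses the information about which side of the mean each point lies on.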

I am wondering how to standardize a multivariate normal value and check that the standardization gives the correct result. This question is a copy of mine from the math section, where I got no answer or idea about the problem. Before reading the question, remember that standardization...

From: Stats Stack Exchange | By: Bogdan | Saturday, February 13, 2016

Background: It has been shown and widely referenced (applets even exist, etc.) that for even a highly-skewed numeric variable, a sample size of $n \ge 30$ is often "large enough" for the Central Limit Theorem (CLT) to take effect, and thus for the distribution...

From: Stats Stack Exchange | By: Meg | Saturday, February 13, 2016

I am using an SVC to do binary classification. I am using the rbf kernel and doing leave-one-out cross validation to choose my value of C. I ran the model using my normal features and had a detection probability of 1 and false alarm probability of 0....

From: Stats Stack Exchange | By: user984165 | Saturday, February 13, 2016

I am reading Introduction to Stochastic Processes by Lawler and I am a bit confused about how it demonstrates that you get the transition matrix $\textbf{P}_t$ from the infinitesimal generator $\textbf{A}$. I'll provide the example in the book: $\textbf{Example...

From: Stats Stack Exchange | By: oversound | Saturday, February 13, 2016

I'm reading a book: For N-look intensity SAR images, $$I_N=\frac{1}{N}\sum_{i=1}^N I_1(i)=\frac{1}{N}\sum_{i=1}^N (x(i)^2+y(i)^2)$$ Where $x(i)$ and $y(i)$ are the real of the $i$th look (or sample). Since $x(i)$ and $y(i)$ are independently Gaussian...

From: Stats Stack Exchange | By: sepideh | Saturday, February 13, 2016

I'm unsure of how to convince myself that $$\hat{\beta} = \frac{\sum X_i Y_i}{\sum X_i^2}$$ is an unbiased estimator when the regression model $$Y_i = \beta X_i + \epsilon_i$$ follows basic OLS assumptions. To show this is unbiased, we need to show that...

From: Stats Stack Exchange | By: Peaches | Saturday, February 13, 2016
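
One way to see the unbiasedness asked about above, sketched under the assumptions that the $X_i$ are treated as fixed (or conditioned on) and that $E[\epsilon_i] = 0$: substitute the model $Y_i = \beta X_i + \epsilon_i$ into the estimator and take expectations.

$$\hat{\beta} = \frac{\sum X_i (\beta X_i + \epsilon_i)}{\sum X_i^2} = \beta + \frac{\sum X_i \epsilon_i}{\sum X_i^2}, \qquad E[\hat{\beta}] = \beta + \frac{\sum X_i E[\epsilon_i]}{\sum X_i^2} = \beta$$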

I have an estimate of a CDF in R (nonparametric) and I need to compare this distribution to another one by Kullback-Leibler. In order to do so, I need to find the pdf of this random variable. What is the best way to estimate the pdf in this case? thank...

From: Stats Stack Exchange | By: Mic | Friday, February 12, 2016

I am not able to understand the difference between the joint density function and density function for a random variable Z = x1 + x2 where x1, x2 are uniform rvs in [0,1]. I think joint density in this case is f(x1,x2) = 1 (ref: http://www.math.wm.edu/~leemis/chart/UDR/PDFs/StandarduniformStandardtriangular.pdf)...

From: Stats Stack Exchange | By: user104051 | Saturday, February 13, 2016
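
To illustrate the distinction in the question above, here is a minimal Python sketch that assumes x1 and x2 are independent (the excerpt does not say so explicitly; the function name is illustrative). The joint density of (x1, x2) is 1 on the unit square, while the density of Z = x1 + x2 is a function of z alone, obtained by convolving the two uniform densities.

```python
def density_sum_of_two_uniforms(z):
    """Density of Z = X1 + X2 for independent X1, X2 ~ Uniform[0, 1].

    The convolution integral of the two uniform densities reduces to the
    length of the overlap between the intervals [0, 1] and [z - 1, z].
    """
    return max(0.0, min(1.0, z) - max(0.0, z - 1.0))
```

The result is the triangular shape: it rises linearly from 0 at z = 0 to 1 at z = 1, falls back to 0 at z = 2, and is zero elsewhere.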

Suppose that X and Y are independent Poisson distributed values with means 2θ and θ, respectively. Consider the combined estimator of θ: ˜θ = k1X + k2Y. (a) Find the condition on k1 and k2 such that ˜θ is an unbiased estimator of θ. (b) For ˜θ...

From: Stats Stack Exchange | By: stas | Saturday, February 13, 2016

Trying to answer a question from a textbook but struggling a bit to use the formula. I am trying to calculate the WACC on this particular set of data: So I have that: Re = 20.69 Rd = 1842 E = 792.95 (assuming that this refers to the average of the 'Common...

From: Stats Stack Exchange | By: blurub | Saturday, February 13, 2016

I was trying to understand how much data I would need compared to the number of parameters (and to have good generalization) when I train a radial basis function (RBF) network on a regression task where we estimate some unknown $f^*$ that has the mapping...

From: Stats Stack Exchange | By: Charlie Parker | Saturday, February 13, 2016

I want to estimate a dynamic panel data model using data from Arellano and Bond (1991) [write in Stata: use http://www.stata-press.com/data/r7/abdata.dta]. Now, I want to estimate parameters using the FOD (forward orthogonal deviations) rather than using...

From: Stats Stack Exchange | By: Mark Vitale-Ferrari | Saturday, February 13, 2016

Given the variable $Z$, distributed as follows: $Z \sim Skellam(k;\lambda_1;\lambda_2)$ I would like to understand how to compute the probability $P(Z>0)$. I know how to compute the value $P(Z=z)$, by following what is described in the wikipedia web...

From: Stats Stack Exchange | By: Quantopic | Saturday, February 13, 2016
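
One possible way to approximate $P(Z>0)$, sketched in plain Python. It relies on the fact that a Skellam variable is the difference $N_1 - N_2$ of two independent Poisson variables with means $\lambda_1$ and $\lambda_2$; the function names and the truncation point `max_count` are assumptions made for this illustration, not part of the question.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def skellam_prob_positive(lam1, lam2, max_count=100):
    """Approximate P(N1 - N2 > 0) for independent Poisson N1, N2.

    Both Poisson supports are truncated at max_count, which is adequate
    only when lam1 and lam2 are small relative to max_count.
    """
    total = 0.0
    for n2 in range(max_count + 1):
        # Truncated tail probability P(N1 > n2).
        tail = sum(poisson_pmf(n1, lam1) for n1 in range(n2 + 1, max_count + 1))
        total += poisson_pmf(n2, lam2) * tail
    return total
```

For example, `skellam_prob_positive(2.0, 2.0)` is below one half, because with equal rates the distribution is symmetric around zero and $P(Z=0)$ takes up some of the mass.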

I have a large matrix. It consists of about 10.000 rows (each row one document) and 10.000 columns (each column one word). The binary value indicates if a word exists (1) in the particular document or not (0). id,topic,cat,dog,mouse,... 1,music,1,1,0,......

From: Stats Stack Exchange | By: user1170330 | Saturday, February 13, 2016

I don't have any major in Statistics but as a data scientist sometimes I use chi square significance test to test independence of categorical variables. I was wondering about: 1) What exactly is chi square? 2) Why does it work in testing independence...

From: Stats Stack Exchange | By: user2409011 | Saturday, February 13, 2016

I often run into charts of data projecting expectations that a rate of increase will continue at an increasing rate. http://arctic-news.blogspot.com/ Is it legit to include in a graph a range of expectations modeled by different geometric and arithmetric...

From: Stats Stack Exchange | By: Rowland Whittet | Saturday, February 13, 2016

I have a time series, let's say N daily log-returns. I want to study the moments (possibly the distribution) of the weekly returns. I have two ways: 1) Using the time-additivity property of logarithms, I just add 5 consecutive daily log-returns to get...

From: Stats Stack Exchange | By: Puzzle | Friday, February 12, 2016
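
The first approach in the question above can be sketched as follows in Python. The prices are hypothetical, the helper name is illustrative, and a 5-day trading week with non-overlapping blocks is assumed.

```python
import math

def weekly_log_returns(daily, days_per_week=5):
    """Sum non-overlapping blocks of daily log-returns.

    A trailing partial block (fewer than days_per_week values) is dropped.
    """
    n_weeks = len(daily) // days_per_week
    return [
        sum(daily[i * days_per_week:(i + 1) * days_per_week])
        for i in range(n_weeks)
    ]

# Hypothetical closing prices: six prices give five daily log-returns.
prices = [100.0, 101.0, 99.5, 100.5, 102.0, 103.0]
daily = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# By time-additivity, the single weekly value equals log(103 / 100)
# up to floating-point rounding.
weekly = weekly_log_returns(daily)
```

Non-overlapping blocks give only about N/5 weekly observations; overlapping windows would give more values but make them serially dependent.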

I want to use SVD to fit a least-square simple linear regression. So far, I'm able to use SVD to recover the coefficients and MSE. But I'm not able to calculate the standard errors for the coefficients. I'm following the methods in this post, $\hat{\sigma}^2...

From: Stats Stack Exchange | By: Student T | Saturday, February 13, 2016

I came across an article with a regression output: y=-53,06+122,58x-200,7x^2 Here y is total spending and x is the ratio of people receiving benefits. The mean of x is 0.204, the min is 0.106, and the max is 0.287. In the article it is stated that the spending...

From: Stats Stack Exchange | By: annesophie | Saturday, February 13, 2016

I'm really new to K-Means clustering technique. I'd like to calculate BIC for K-Means to find best K (number of clusters). I looked around on the web to find a solution in python but there is no specific example except this thread which I'm not sure...

From: Stats Stack Exchange | By: Araz | Saturday, February 13, 2016

This is a cross-post from here (sorry for the redundancy): I'm receiving an unexplainable error when using PGMM. Here's a snippet of my session: > head(df) NUTS_ID year elderly mortality gdp cross_ext_danger 9 AT111 2002 0.1986077 0.01058817 17300...

From: Stats Stack Exchange | By: Daniel Lee | Saturday, February 13, 2016

I thought I understood this issue, but now I'm not as sure and I'd like to check with others before I proceed. I have two variables, X and Y. Y is a proportion, but it is not bounded by 0 and 1 and is generally normally distributed. X is also a proportion,...

From: Stats Stack Exchange | By: Bajcz | Friday, February 12, 2016

I'm a new user of R. I have quantal response data from a sampling test. Before the sampling test, the sample size and sampling points are decided. This data is from a test with a sample size of 75 and 20 sampling points. years fail failure_rate 1 0 0.00000000 3 0 0.00000000 5...

From: Stats Stack Exchange | By: Hee Nam | Saturday, February 13, 2016

I'm working with time series data, and I have seen the ACF (autocorrelation function) applied to the log of a series. Basically, applying the ACF to the log of the data was the correct way, while the ACF without the log was not. Why would I use a log function...

From: Stats Stack Exchange | By: Alvaro Joao | Saturday, February 13, 2016

I have performed a binary logistic regression with whether or not a sports person was re-contracted as the DV. Draft year is a significant predictor; therefore, I am now trying to determine the predicted probabilities for each draft year (1999-2012)....

From: Stats Stack Exchange | By: Courtney | Saturday, February 13, 2016

I understand that a state-space model is a common model where the current observation $y_t$ depends on the current state $x_t$. Is there any common model where the current observation $y_t$ depends on the future states $x_{t+1}, x_{t+2}, x_{t+3},...$?...

From: Stats Stack Exchange | By: rkjt50r983 | Saturday, February 13, 2016

After seeing this example, I have a doubt. Is the probability of being diagnosed HIV positive given the positive ELISA reading dependent on the incidence rate (p)? Why should it be? $$P(\text{HIV positive }| \text{ positive ELISA reading}) = \frac{p\times.977}...

From: Stats Stack Exchange | By: Marie-Eve | Friday, February 12, 2016

I have done some initial analyses on some patient datasets regarding neuro-degenerative diseases. A method I've used is DFA (Detrended Fluctuation Analysis) in MATLAB, which produced a family of curves (straight lines, in fact) for each patient-disease...

From: Stats Stack Exchange | By: Arkoudinos | Friday, February 12, 2016

I'm trying to evaluate a series of face detection algorithms. For that I need the best protocol or measures to distinguish each algorithm. As output I only have the number of faces detected (until now I can get more information as the answer tells me)....

From: Stats Stack Exchange | By: Alvaro Joao | Thursday, February 11, 2016

I have a covariate $B$ (let's say age) and two different responses $T_1$ and $T_2$. The bivariate distributions of $B,T_1$ as well as $B,T_2$ are bivariate normal and known: $$ \begin{pmatrix}B\\T_1\end{pmatrix} \sim N \left[ \begin{pmatrix}\mu_B\\\mu_1\end{pmatrix}...

From: Stats Stack Exchange | By: Alexx Hardt | Friday, February 12, 2016

