# Stats Stack Exchange

I am confronted with a particular type of problem that I do not know how to handle, and I cannot find any literature about it. Problem settings: I have a dataset of $n$ sample points. Each point consists of a set of $m$ vectors of features, named...
From: Stats Stack Exchange | By: Pop | Monday, November 23, 2015
I developed a simple fraud detection example to test logistic regression. I have n features (e.g. credit score, account balance, etc.), m samples for training and I try to compute my output y with 0 - fraud or 1 - no fraud. Everything works well so far....
From: Stats Stack Exchange | By: neurix | Sunday, November 22, 2015
Reading through CV all-time classics I came across a statement that I would like to clarify. This is the post and my question refers to the closing remarks: "I have to note that all of the knowledge I just imparted is somewhat obsolete; now that we have...
From: Stats Stack Exchange | By: Antoni Parellada | Wednesday, November 25, 2015
I am trying to compute a 95% confidence interval for a mean response on a small dataset, yet when I calculate this manually I get a very different interval. How is R calculating the interval when using predict.lm? Am I using the wrong function call?...
From: Stats Stack Exchange | By: ssahli | Wednesday, November 25, 2015
I've also asked this question on math.stackexchange.com. I hope that's not a problem. I'd like to know how I can minimize, with respect to $\hat{y}(x)$, $$\DeclareMathOperator{\Tr}{Tr} \mathbb{E}_{p(x,y)}[(\hat{y}(x)-y)^2 + (\hat{y}(x)-y)\Tr(\nabla^2_x\hat{y}(x))...
From: Stats Stack Exchange | By: Kiuhnm | Monday, November 23, 2015
I know how to do one-way ANOVA when the response variable is continuous. I need help learning how to do one-way ANOVA when the response variable is binary (i.e., how to calculate MSE within group and between groups when the response is binary)....
From: Stats Stack Exchange | By: Heather Keturah | Wednesday, November 25, 2015
New to R! Just learning! I have a row of data that I would like to break out into 5 rows based on the value. Attached is an image of what I'd like to achieve - don't know where to start! The cost is divided evenly over the 7 days. I have 52 weeks of...
From: Stats Stack Exchange | By: retroscience | Tuesday, November 24, 2015
I have tried to fit a VAR model for two stationary time series dlogsl_ts and dlogllc_ts (tested by PP test and ADF test), the monthly river flow data. From: VARselect(dlogdata, lag.max=10) # SC(3) It seems that I could try fitting the model with a lag...
From: Stats Stack Exchange | By: user95902 | Sunday, November 22, 2015
What reparametrization of the vector of parameters $\theta$ makes the Jeffreys prior $$\sqrt{\det I(\theta)}$$ correspond to the uniform prior? A change of parametrization from $\theta$ to $\eta$ changes the Fisher information as follows (I think): \begin{align}...
From: Stats Stack Exchange | By: Neil G | Tuesday, November 24, 2015
Based on my reading of some course notes as well as the answers and comments to this SO post I started thinking about the general steps for creating a stationary distribution. Assuming my problem were to model the chance of snow tomorrow based on weather...
From: Stats Stack Exchange | By: Rilcon42 | Monday, November 23, 2015
This question perhaps does not belong here. If you do know where it belongs, please let me know and I will delete the question. Data: Consumer Expenditure Survey from the BLS (PUMD) Files used: MTBI (expenditure by UCC codes) and FMLI (expenditure per...
From: Stats Stack Exchange | By: Elad663 | Tuesday, November 24, 2015
I am attempting to model probabilities using the multinomial logit link and I am confused about how the link works. To study the link function I have been attempting to use a deterministic system. As an example, I am attempting to model the probability...
From: Stats Stack Exchange | By: Mark Miller | Tuesday, November 24, 2015
Let's say we have a curve A with 50 time points (max = 60, min = 1). One thing I can do is standardize the values of the curve so that the range becomes [0,1] instead of [1,60]. However, I have another curve B with 50 time points (max = 34, min = 3)....
From: Stats Stack Exchange | By: RockTheStar | Tuesday, November 24, 2015
Say I am OK with the numbers being drawn from a standard normal distribution, but I also want the autocorrelation of the series at lag 1 to be a specific number. How can I generate such a series of numbers?
From: Stats Stack Exchange | By: The Baron | Tuesday, November 24, 2015
I used the X-12 decomposition method to decompose my tourist arrivals data into trend, seasonal and error terms separately. My data exhibit a pseudo-additive model. I want to combine the seasonal and error components, because I am trying to fit an ARIMA model for the combined series...
From: Stats Stack Exchange | By: Hansanie | Tuesday, November 24, 2015
I'm in a situation that I can't seem to find much information specifically dealing with, so the details of which may be important to the actual question: I'm currently a fourth year student in university studying mathematics. I'll be coming back...
From: Stats Stack Exchange | By: Nathan Margaglio | Tuesday, November 24, 2015
We have a blog site, with various authors, and I'd like to create an algorithm to rank them. Some of the variables we can use are: total views (hits across all authors' blog posts), number of blogs published, word count of all blogs (not sure how relevant...
From: Stats Stack Exchange | By: Woodsey | Tuesday, November 24, 2015
I have different users who can change product prices (ecommerce) at random moments. I want to design an algorithm which will rank these strategies on their effectiveness in any time period. All data available (prices, costs, types of products, total sales,...
From: Stats Stack Exchange | By: Василий Лукин | Tuesday, November 24, 2015
I tried to calculate it using simple Monte Carlo. However, I am not convinced of the correctness of the results, so I wanted to double-check. For vectors of length 10,000, how high a Pearson R do I need to get a p-value of 0.1%? Also, is there another way...
From: Stats Stack Exchange | By: The Baron | Tuesday, November 24, 2015
I am using ARIMA on time series data to predict the next x no. of values. The data is not seasonal but has an increasing mean and constant variance. When fewer data points are used to test (let's say 15-16 data points), there is an issue....
From: Stats Stack Exchange | By: tmbsundar | Tuesday, November 24, 2015
I have a sequence of events (e1,e1,....,en), and each event (ei) is described by two features (ti,F1,F2), where ti is the time of capturing the event. The events (ei) are captured every second for 10 hours. An example of the events is as follows: {(1,10,5),(2,10,4),(3,10,8),(4,10,0),(5,11,5),(6,11,5),(7,11,19),...
From: Stats Stack Exchange | By: Omar14 | Tuesday, November 24, 2015
I want to compare two models using MATLAB, let's say: M1: Y ~ 1 + AGE M2: Y ~ 1 + AGE + WEIGHT + HEIGHT I would like to obtain the F and p-value for the comparison of M1 vs M2. Is there a simple way to do this?...
From: Stats Stack Exchange | By: mat | Tuesday, November 24, 2015
I am trying to test the predictive accuracy of regression using training sets of varying sizes. Y <- rnorm(100) X <- replicate(5, Y+rnorm(100) ) data <- as.data.frame(cbind(Y,X)) Let's say the training set is 2% of the data: train <- nrow(data)...
From: Stats Stack Exchange | By: user3742038 | Tuesday, November 24, 2015
I'm trying to build a regression model in Stata. So far, I want to know whether there is a difference in wages based on educational level (master's vs. undergraduate). However, my DTA file has a lot of label lists, for example 70 Post Grad, M.A/M.S.xxxx...
From: Stats Stack Exchange | By: purugin | Tuesday, November 24, 2015
I would like to test for significant differences between several replicated measurements (experimental conditions measured in replicates) considering a numerical covariate. I suppose I don't have a dependent variable. In order to encode the experimental...
From: Stats Stack Exchange | By: datamole | Tuesday, November 24, 2015
I ran a regression model between X and y. I used robust regression. The results are significant, but when I ran a Pearson correlation I found that the correlation values aren't significant. I don't understand how this can happen.
From: Stats Stack Exchange | By: Ehab Ibrahim | Tuesday, November 24, 2015
Here is my old question. I would like to ask if someone knows the difference (if there is any difference) between Hidden Markov models (HMM) and Particle Filter (PF), and as a consequence Kalman Filter, or under which circumstances we use which algorithm....
From: Stats Stack Exchange | By: user5584748 | Monday, November 23, 2015
As far as I understand, estimating the error of a model, say an artificial neural network, requires knowing the "true" model. Wikipedia says in its article "Errors and residuals": "The error (or disturbance) of an observed value is the deviation of the...
From: Stats Stack Exchange | By: Julian | Tuesday, November 24, 2015
I have data which are, as most data are, right-censored. It's about a panel of firms over 15 years. However, I have a variable telling me the age of each firm, per observation. Therefore, data are shown for an interval only, but I know exactly when each...
From: Stats Stack Exchange | By: Giacomo | Tuesday, November 24, 2015
I am interested in the Binomial-Binomial hierarchical model, where the number of trials itself follows a binomial distribution. I would like to know the expected value (first central moment, $\mu_1$) and variance (second central moment, $\mu_2$) of this...
From: Stats Stack Exchange | By: paideia | Monday, November 23, 2015
I have the following dataframe on which I did logistic regression with response as outcome. There are some good predictors in these variables so I expected significant variables. structure(list(response = c(0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L,...
From: Stats Stack Exchange | By: Ansjovis86 | Tuesday, November 24, 2015
I have hundreds of sample points where I have data of three variables ranging from 0.0 to 1.0. I would like to use some statistical test to find a function that could predict a phenomenon. I also have data of this phenomenon in these sample points. Here...
From: Stats Stack Exchange | By: Albert C | Tuesday, November 24, 2015
I want to draw Bayesian inference via importance sampling and I cannot come up with a good idea for an importance density for $$p(\sigma)\sim\frac{1}{\sigma}.$$ Is there a way to sample from this distribution directly? I am not sure whether $$\frac{1}{z},...
From: Stats Stack Exchange | By: muffin1974 | Monday, November 23, 2015
For a slight bit of insight, I'm (crudely) measuring my energy consumption (kWh and gas) per day. I have the thermostat set at a constant 19 degrees Celsius. However, I want to adjust the outcome for average outside temperature. How do I do that? I'm...
From: Stats Stack Exchange | By: Ben | Tuesday, November 24, 2015
I'm trying to find the joint probability of three probabilities using the correlation coefficient. To find the joint probability between two of them I used a well-known formula: pAB = pApB + corrAB*sqrt(pA(1 − pA)pB(1 − pB)). However,...
From: Stats Stack Exchange | By: Matthew | Tuesday, November 24, 2015
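The pairwise formula quoted in this question can be sketched in a few lines of Python. This is only an illustration of the two-event case from the excerpt, assuming `corr` is the Pearson correlation between the indicator variables of events A and B; the function name `joint_prob` and the input values are made up for the example.

```python
from math import sqrt


def joint_prob(p_a: float, p_b: float, corr: float) -> float:
    """P(A and B) from the two marginals and the correlation of the indicators."""
    # Cov(1_A, 1_B) = corr * sd(1_A) * sd(1_B), and P(A and B) = pA*pB + Cov.
    return p_a * p_b + corr * sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical inputs, not taken from the question.
print(joint_prob(0.5, 0.5, 0.0))  # uncorrelated: product of the marginals
print(joint_prob(0.5, 0.5, 1.0))  # perfectly correlated events
```

Not every combination of marginals and correlation is feasible, so the result is worth checking against the bounds max(0, pA + pB - 1) and min(pA, pB).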
I am working on a project for which I need to compute the answer to the following problem. I am having trouble working this out. There is a bag with 48 balls numbered 1 to 48. Person A draws 15 balls out of the bag without putting the previously picked...
From: Stats Stack Exchange | By: Vinay | Tuesday, November 24, 2015
Here is my R code and the output of the t-test. Even if the mean is clearly 0, the t-test still accepts the alternative hypothesis. > x = c(-2, -2, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 2, 2) > t.test(x, alternative="two.sided") One...
From: Stats Stack Exchange | By: Mazvél | Tuesday, November 24, 2015
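For the vector quoted in this question, the one-sample t statistic can be checked by hand. The sketch below uses only Python's standard library; the helper name `t_statistic` is illustrative, and the null value `mu0 = 0` is an assumption based on the excerpt.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0=0.0):
    """One-sample t statistic for H0: population mean == mu0."""
    n = len(sample)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


# The vector quoted in the question.
x = [-2, -2, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
print(mean(x), t_statistic(x))
```

The sample mean is exactly 0, so the statistic is 0 as well. The "alternative hypothesis" line in R's `t.test` printout names the alternative being tested; it does not report a decision.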
Let's say I have some data, and then I fit the data with a model (a non-linear regression). Then I calculate the r-squared (R^2). When r-squared is negative, what does that mean? Does that mean my model is bad? I know the range of R^2 can be [-1,1]. When...
From: Stats Stack Exchange | By: RockTheStar | Tuesday, November 24, 2015
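A small numeric sketch may help with the negative case. It assumes the common definition R^2 = 1 - SS_res/SS_tot; the function name `r_squared` and the toy data are made up for illustration and are not from the question.

```python
def r_squared(y_true, y_pred):
    """R^2 defined as 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares of the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the observations.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0]
print(r_squared(y, [2.0, 2.0, 2.0]))  # always predicting the mean
print(r_squared(y, [3.0, 2.0, 1.0]))  # predictions worse than the mean
```

Under this definition a negative value means the predictions have a larger squared error than simply predicting the mean, and the value is not bounded below by -1.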
I have two unknown random variables W and A, where W~(nx,ny) and A~(x,z). My aim is to find out the range of n for W/A. I first tried to standardise W and A and make them greater than -1.96 and smaller than 1.96. However, I end up with something like:...
From: Stats Stack Exchange | By: Oiile990 | Monday, November 23, 2015
I am currently working with a universe of 100 companies and interested in how they compare on certain metrics. Some companies do not report all metrics that I am interested in so I am reporting them as null vs inserting a 0 value. In the extreme case...
From: Stats Stack Exchange | By: greg | Tuesday, November 24, 2015
I have a sample of $n$ values between 0 and 1 (histogram above). I have $k$ subsets of varying sizes that partition the sample. I would like to test if the averages of the subsamples are significantly different from the average of the sample - ie if...
From: Stats Stack Exchange | By: stat3_ik | Tuesday, November 24, 2015
There seems to be something in our human understanding that creates difficulties in grasping intuitively the idea of variance. In a narrow sense the answer is immediate: squaring throws us off from our reflexive understanding. But, is it just variance...
From: Stats Stack Exchange | By: Antoni Parellada | Monday, November 23, 2015
I am using R, and I have a dataset with 400000 rows and 800 columns. Training a random forest model with only 100 trees on this dataset takes about an hour and a half on my laptop. So I went on and performed PCA on the dataset and found out that the...
From: Stats Stack Exchange | By: Ryan Zhang | Tuesday, November 24, 2015
I am attempting to get the Adjusted R-Square value in R (the programming language) and store it as a variable. I am not sure how to accomplish this. I can see the R-Square value if I call: summary(lm(x~y)) Along with the rest of the data, but how do...
From: Stats Stack Exchange | By: redIago | Tuesday, November 24, 2015
I am working on Big Data analytics, and I would like to know how well the k-means algorithm can be used for clustering Big Data.
From: Stats Stack Exchange | By: SSoans | Monday, November 23, 2015
The regression model y = b0 + b1 x + b2 x^2 + b3 x^3 and the second regression model y = b0 + b1 (x-u) + b2 (x-u)^2 + b3 (x-u)^3, where u is the mean of x. These two models lead to the same curves, that is, the same fitted values. I understand centering the...
From: Stats Stack Exchange | By: Sangdi Lin | Monday, November 23, 2015
Simply put, I'd like to know how the plm package in R calculates the residuals of a random-effect regression. I ask this because I'm getting some "weird" outputs. Let me reproduce them here using the Grunfeld data for four firms, like Gujarati in his...
From: Stats Stack Exchange | By: Rodrigo Remedio | Monday, November 23, 2015
I am curious about the intuition behind Tukey's HSD. I know that it is designed for post-hoc tests (the WHEN and HOW part), but I want to know the underlying theory that justifies its usage (the WHY part). To phrase my question differently, why is using the test...
From: Stats Stack Exchange | By: Alby | Monday, November 23, 2015
I'm looking at the exam results of a university course. The exam was completed by 500 people, who each completed 105 multiple choice questions. I have each person's total score on the exam (ExamTotalScore). 11 of those questions were on a topic that...
From: Stats Stack Exchange | By: user1205901 | Monday, November 23, 2015
I would like to ask: if I have a random variable $A \sim N(b,c)$, what is the distribution of the inverse of $A$?
From: Stats Stack Exchange | By: Oiile990 | Monday, November 23, 2015
