For my master's thesis in corporate finance I am researching debt concentration (i.e. whether companies use several debt types or only one, measured by the HHI). I have several determinants and some control variables. My data consists of 24503 observations,...

Stats Stack Exchange | By: Brx | Sunday, May 1, 2016
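
As a rough illustration of the concentration measure mentioned in the excerpt above, here is a minimal sketch of the Herfindahl-Hirschman index computed on debt-type shares. The function name `hhi` and the example amounts are hypothetical, and the thesis may well use a normalized variant rather than this plain sum of squared shares.

```python
def hhi(amounts):
    """Plain Herfindahl-Hirschman index: sum of squared shares of the total."""
    total = sum(amounts)
    if total <= 0:
        raise ValueError("total debt must be positive")
    return sum((a / total) ** 2 for a in amounts)


# Hypothetical firms with four debt types each (amounts are made up).
single_type = hhi([100.0, 0.0, 0.0, 0.0])       # all debt in one type
evenly_spread = hhi([25.0, 25.0, 25.0, 25.0])   # debt split equally
print(single_type, evenly_spread)
```

With this definition the index runs from 1/k (debt spread evenly over k types) up to 1 (a single debt type).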

I am implementing a vanilla variational mixture of multivariate Gaussians, as per Chapter 10 of Pattern Recognition and Machine Learning (Bishop, 2007). The Bayesian approach requires specifying parameters for the Gaussian-inverse-Wishart prior: $\alpha_0$...

Stats Stack Exchange | By: lacerbi | Friday, April 29, 2016

I am reading Chris Bishop's Pattern Recognition and Machine Learning. In Section 2.3.5 he introduces some ideas on the contribution of the $n$th observation in a data set to the maximum likelihood estimator of the mean. He says that the larger number...

Stats Stack Exchange | By: cgo | Friday, April 29, 2016

I computed an A x B (2 x 2) within-subject ANOVA for a given ROI using a repeated-measures GLM. The interaction between A and B was not significant, but two main effects were detected. Can I still compare A1 vs A2 within B1 or within B2 using a paired t-test...

Stats Stack Exchange | By: ping yang | Friday, April 29, 2016

Can someone explain the math used to determine that 34 participants were required for this study? To have an 80% chance of detecting a 1.5–percentage point between-group A1C difference as significant (at the two-sided 5% level), with an assumed...

Stats Stack Exchange | By: haim | Sunday, May 1, 2016

I am fitting a mixed effects model in R using nlme: lme(y~x+I(x^2),random=~x|subject,data=train). Is this the correct way, or should it be lme(y~x+I(x^2),random=~x+I(x^2)|subject,data=train)? What is the difference in the interpretation of fitting these...

Stats Stack Exchange | By: kon7 | Monday, May 2, 2016

Hi, I'd like to know a bit more about implementations of kNN-like approaches for classification problems, specifically classification problems where we want a probability distribution as the output (to compute log-loss-like metrics, for example). In...

Stats Stack Exchange | By: Fagui Curtain | Monday, May 2, 2016
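
One common way to get a probability distribution out of kNN is to use the fraction of neighbour votes per class. The sketch below assumes this approach and adds additive smoothing so that no class gets probability zero (which would make the log loss infinite). The names `knn_proba` and `alpha` and the toy data are hypothetical; this is not taken from the question itself.

```python
import math
from collections import Counter


def knn_proba(train_X, train_y, query, k, classes, alpha=1.0):
    """Class probabilities from the k nearest neighbours, with additive smoothing."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    neighbours = order[:k]
    votes = Counter(train_y[i] for i in neighbours)
    denom = len(neighbours) + alpha * len(classes)
    return {c: (votes[c] + alpha) / denom for c in classes}


# Toy data: two well-separated groups labelled "a" and "b".
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

proba = knn_proba(train_X, train_y, (0.5, 0.5), k=3, classes=["a", "b"])
log_loss = -math.log(proba["a"])  # log loss if the true label is "a"
print(proba, log_loss)
```

Setting `alpha=0` recovers raw vote fractions; a larger `alpha` pulls the estimates toward the uniform distribution.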

I am trying to replicate this paper "Gleditsch, Kristian Skrede and Michael D. Ward. 2006. "Diffusion and the International Context of Democratization", International Organization 50: 911-933" and I have problems finding the gamma coefficients. The base...

Stats Stack Exchange | By: Maria | Monday, May 2, 2016

I am working with the following model and am attempting to derive coordinate ascent updates using mean field variational inference: Sample $p_X \sim Beta(\alpha_1, \alpha_2)$ Sample $p_Y \sim Beta(\alpha_2, \alpha_1)$ For $i \in \{ 1...d\}$, sample...

Stats Stack Exchange | By: lrAndroid | Sunday, May 1, 2016

In train or rfe I can only set Accuracy or Kappa. Is there a way to edit the functions to define a scoring function? I am using Kappa at the moment but I need to optimize for positive predictive Value (= hit rate = fraction of positives recognized as...

Stats Stack Exchange | By: user670186 | Sunday, May 1, 2016

A couple weeks back, I was seeing if I could solve the basic formulation of the Birthday Problem (i.e. assuming 365 equally likely birthdays, what's the probability that, given a room of ${n}$ people, at least one pair of people share a birthday). The...

Stats Stack Exchange | By: ZombieSocrates | Sunday, May 1, 2016
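
The basic formulation described above (365 equally likely birthdays, at least one shared pair among $n$ people) can be computed through the complement: the probability that all $n$ birthdays are distinct. A minimal sketch, with the hypothetical function name `p_shared_birthday`:

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), birthdays uniform over `days`."""
    if n > days:
        return 1.0  # pigeonhole: a shared birthday is certain
    p_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct


for n in (10, 22, 23, 50):
    print(n, round(p_shared_birthday(n), 4))
```

The probability first exceeds one half at $n = 23$.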

I have two pivot tables, one with gallons of gas consumed prior to treatment, and a second with gallons of gas consumed after treatment, which is a mixture added to the full gas tank. See image below. I have the pivot table containing a subset of data...

Stats Stack Exchange | By: Jazzmine | Sunday, May 1, 2016

I have two data sets (base and to_match), each with 10 individuals, grouped in 2 classes. Each individual is described by a set of 4 variables. What I want to do is: test whether the groups in the first dataset (base) are identical, based on all the describing...

Stats Stack Exchange | By: Wiliam | Friday, April 29, 2016

What kind of $f(n): \mathbb{N} \to \mathbb{N}$'s make the following statement true? What kind don't? $\limsup A_{f(n)} \subseteq \limsup A_n$ where $n \in \mathbb{N}$ (*) Well obviously the answers to each are: $(f(n) \ | \ \limsup A_{f(n)} \subseteq...

Stats Stack Exchange | By: BCLC | Sunday, May 1, 2016

From Williams' Probability with Martingales: $X_n(\omega)$ does not converge to a limit in $[-\infty,\infty]$ --> Is this supposed to be stronger than $\lim X_n$ does not exist? Why do we have Is the part with $$\liminf X_n(\omega) < \limsup X_n(\omega)$$...

Stats Stack Exchange | By: BCLC | Sunday, May 1, 2016

If I create a weekly ts time series with <= 188 values in it and plot it, I get a "fractional" labeled x axis: x <- ts(rnorm(188,0,1), frequency=52, start=c(2000,1)); plot(x) but if I create a time series with >= 189 values, plot displays the...

Stats Stack Exchange | By: Randy Wilson | Sunday, May 1, 2016

I have the following data series:

#    retrn   vix
1    7.44    27.799999
2    14.57   23.4
3    8.03    19.440001
4    4.42    18.43
5    2.27    15.5
6    9.67    17.15
7    -3.44   24.059999
8    8.32    17.08
9    4.65    18.93
10   7.7     17.469999
11   2.87    15.73
12   5.02    18.6
...

retrn - my asset returns (monthly)...

Stats Stack Exchange | By: Vingthor | Sunday, May 1, 2016

In part of an experimental trial (n=1), I asked the participant to answer a specific questionnaire (continuous response variable) under the influence of 4 different dosages (dosage 1, 2, 3 and 4) of a same substance. This task was repeated (after a certain...

Stats Stack Exchange | By: ynwa_in_stats | Sunday, May 1, 2016

Say I know the distribution of $X-Y$, but I do not know the distribution of $X$ (or $Y$), but I know that they are statistically independent, and I know they have the same distribution. Is the problem of finding the distribution well-defined, as in will...

Stats Stack Exchange | By: pkofod | Friday, April 29, 2016

I have a series of monthly returns on financial data. My goal is to estimate the volatility of 10 year rolling returns. I am a bit confused on two options. a) Calculate 10 year rolling returns, annualize this and then calculate the volatility of the...

Stats Stack Exchange | By: Jantamanta | Sunday, May 1, 2016

Suppose we compute the correlation PCA of a dataset $X$ (with $m$ variables and $n$ observations) by first normalizing the input variables. That is: mean -> 0 and standard deviation -> 1. Let us assume for the sake of this question that $\mu_i=0$...

Stats Stack Exchange | By: Werner Van Belle | Sunday, May 1, 2016

Is M-estimation valid only for regression models, or does it also work for robust estimation of parameters in other statistical models? I understand that M-estimators are asymptotically normal for least squares models. Is it also true for any...

Stats Stack Exchange | By: user251385 | Sunday, May 1, 2016

I am attempting to model the fluorescent signal emitted by a fluorescent calcium indicator (lights up when there is calcium influx into a cell). According to [1], the following formula is a workable approximation under certain conditions: $\Delta...

Stats Stack Exchange | By: mowe | Sunday, May 1, 2016

I'm fairly new to statistics - I'm sure this is a basic question but my google searching is failing me. Happy to just be pointed to other reading. I have 3 datasets of varying sizes (N1 ~ 200,000, N2 ~ 80,000, N3 ~ 400). In each dataset, for each sample...

Stats Stack Exchange | By: kevbonham | Sunday, May 1, 2016

I am totally new to "machine learning" and am looking for how to get started. Can you point me to a few resources, geared for the beginner, that are excellent starting points? What are the main families of tasks in machine learning? Who are the famous...

Stats Stack Exchange | By: Disco Dancer | Sunday, May 1, 2016

This is probably a very basic question; I have a data-frame with a fake questionnaire with three sets of questions measuring three constructs. I'm currently reading some research papers which in order to create the construct aggregate the mean per country,...

Stats Stack Exchange | By: John Smith | Sunday, May 1, 2016

I was self-studying a poorly elaborated lecture note on factorial design. It mentioned that a $2^{9-5}$ design has resolution 3. This is checked with the table below. It has $2^4=16$ runs, and we require $9+1=10$ runs to delineate all main effects....

Stats Stack Exchange | By: user2513881 | Sunday, May 1, 2016

I am having some problems with estimating a VAR in R. I am trying to replicate a study from Park and Ratti 2008. Using a time period from January 1997 to February 2016, I have been able to perform KPSS and PP tests, whose results resemble the ones in...

Stats Stack Exchange | By: fwintherdk | Saturday, April 30, 2016

From wiki: Given a set of independent identically distributed data points $\mathbb{X}=(x_1,\ldots,x_n)$, where $x_i \sim p(x_i|\theta)$ according to some probability distribution parameterized by θ, where θ itself is a random variable described by...

Stats Stack Exchange | By: slava_b | Sunday, May 1, 2016

I am using Esri Arc to generate random points. I then analyze the pattern from this process using Average Nearest Neighbor, which is also in Esri GIS, but let's say it can be in any other software. Is there a chance that it comes out as dispersed or clustered...

Stats Stack Exchange | By: Navid | Sunday, May 1, 2016
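
To get a feel for how much a nearest-neighbour statistic varies on genuinely random points, one can simulate it. The sketch below draws a fixed number of uniform points in a rectangle (a homogeneous Poisson process conditioned on its point count) and computes the ratio of the observed mean nearest-neighbour distance to the value $0.5/\sqrt{n/A}$ expected under complete spatial randomness. The function names, the seed and the 10 x 10 study area are assumptions for illustration; no edge correction is applied, and this is not claimed to reproduce the Esri tool exactly.

```python
import math
import random


def random_points(n, width, height, rng):
    """n points drawn independently and uniformly in a width x height rectangle."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def average_nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbour distance divided by its expectation under CSR."""
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)  # no edge correction
    return observed / expected


rng = random.Random(0)  # fixed seed so the run is repeatable
pts = random_points(300, 10.0, 10.0, rng)
ratio = average_nearest_neighbor_ratio(pts, area=100.0)
print(ratio)
```

A ratio below 1 points toward clustering and above 1 toward dispersion. Rerunning with different seeds shows the ratio scattering around 1, so an individual random pattern can look mildly clustered or dispersed by chance.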

I have a 5-point Likert scale questionnaire as the dependent variable and a yes/no questionnaire as the independent variable. How do I analyze this with SPSS? I want to find the correlation of these 2 variables and find the relationship.

Stats Stack Exchange | By: Anil | Sunday, May 1, 2016

I use Arc software to do a Moran's I analysis and it only takes polygons as input. Why is it called a point process if it only takes polygons?

Stats Stack Exchange | By: Navid | Sunday, May 1, 2016

Let $Y_1 < Y_2 < \ldots < Y_n$ be the order statistics of $n$ independent observations from a continuous distribution with cumulative distribution function $F(x)$ and probability density function: $$f(x)=F'(x)$$ where $0 < F(x) < 1$ over...

Stats Stack Exchange | By: Hamid | Sunday, May 1, 2016

I need to generate a random point process manually, to learn how it is done in other software such as Esri's Arc. I can use RAND(), but I know that what I produce then has to follow a Poisson distribution, because that is what I see in the literature. How can I make sure...

Stats Stack Exchange | By: Navid | Sunday, May 1, 2016

I am running the following model in R: model = lmer(Tau ~ ageS*days+YrsOfEds*days+sex*days+tract*days + (1|SubjectID), data=long) With this model I am trying to predict change in tau over time based on the quality of a tract. Both tau and tract are continuous...

Stats Stack Exchange | By: HIL | Sunday, May 1, 2016

Let $A$, $B$ be two zero-mean random variables. Let the variances be $\sigma^2_A$, $\sigma^2_B$ and let the correlation be $\sigma_{AB}$. Consider the following expression: $$ \mathbb{E}\big[A|B=b\big] $$ When $A,B$ are jointly Gaussian the above expression...

Stats Stack Exchange | By: Vivek Bagaria | Sunday, May 1, 2016

I know how to find a correlation between 2 variables. How am I supposed to find correlations between multiple variables in R, and how do I plot a graph for it?

Stats Stack Exchange | By: Akshay Sirsikar | Sunday, May 1, 2016

I recently saw* a pmf: $f(y)=\frac{\mu^y}{(y!)^\theta z(\mu,\theta)}$, where $z(\mu,\theta) = \sum_{i=0}^{\infty}\frac{\mu^i}{(i!)^\theta}$. * It is a bonus question on a homework assignment. I am wondering if this belongs to the exponential family?...

Stats Stack Exchange | By: Kevin | Sunday, May 1, 2016

I've gone through the theoretical definition of cluster analysis and have learnt the basics of it. But I want to know the advantages of the cluster analysis process and a real-world example of where it is used.

Stats Stack Exchange | By: Akshay Sirsikar | Sunday, May 1, 2016

Many statistical software packages ask whether to standardize data or not: What is a general rule for when data should be standardized? Do we standardize categorical variables? Is there a difference in how standardization affects or is interpreted in different...

Stats Stack Exchange | By: kon7 | Sunday, May 1, 2016

Are there neural networks that can reach state-of-the-art accuracy with two or three hours of training on datasets like CIFAR, MNIST, etc.?

Stats Stack Exchange | By: Eli He | Sunday, May 1, 2016

Based on public data and using Excel 2010 or later, I want to forecast/predict the football match winner.

Stats Stack Exchange | By: ray | Saturday, April 30, 2016

We know that if $(X_1,X_2,\ldots,X_k) \sim multinomial(n;p_1,p_2,\ldots,p_k)$ then $X_i \sim bin(n;p_i)$. Then, $var(X_i) = np_i(1-p_i)$. But we have $cov(X_i,X_j) = -np_ip_j$. So doesn't that imply $var(X_i) = cov(X_i,X_i) = -np_i^2$? (Which is basically impossible...

Stats Stack Exchange | By: RibD | Sunday, May 1, 2016
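
The two formulas quoted above can be checked numerically by enumerating a small multinomial distribution exactly and computing the moments from its pmf. In the sketch below, the choices $n = 4$ and $p = (0.2, 0.3, 0.5)$ and the helper names are arbitrary assumptions. The output illustrates that $-np_ip_j$ describes the off-diagonal entries ($i \neq j$), while the diagonal entries are $np_i(1-p_i)$.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Exact multinomial probability of the count vector `counts`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p


def exact_moments(n, probs):
    """Mean vector and covariance matrix of a 3-category multinomial, by enumeration."""
    mean = [0.0] * 3
    second = [[0.0] * 3 for _ in range(3)]
    for x1 in range(n + 1):
        for x2 in range(n + 1 - x1):
            x = (x1, x2, n - x1 - x2)
            w = multinomial_pmf(x, probs)
            for i in range(3):
                mean[i] += w * x[i]
                for j in range(3):
                    second[i][j] += w * x[i] * x[j]
    cov = [[second[i][j] - mean[i] * mean[j] for j in range(3)] for i in range(3)]
    return mean, cov


n, probs = 4, (0.2, 0.3, 0.5)
mean, cov = exact_moments(n, probs)
print("var(X_1)      =", cov[0][0], "vs n p_1 (1 - p_1) =", n * probs[0] * (1 - probs[0]))
print("cov(X_1, X_2) =", cov[0][1], "vs -n p_1 p_2      =", -n * probs[0] * probs[1])
```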

Can someone please explain how the sample mean and sample variance are independent?

Stats Stack Exchange | By: Blueberry | Saturday, April 30, 2016

The two formulations seem identical to me:
$H(x) = \sum p(x) \log(1/p(x))$
Why is the latter attributed to Shannon rather than Gibbs?

Stats Stack Exchange | By: hayer | Saturday, April 30, 2016
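
For concreteness, the formula in the excerpt, $H = \sum p(x) \log(1/p(x))$, can be evaluated directly for a discrete distribution. This is a minimal sketch: the function name `entropy` and the choice of base 2 (bits) are assumptions, and zero-probability outcomes are skipped under the usual convention $0 \log(1/0) = 0$.

```python
import math


def entropy(probs, base=2.0):
    """H = sum p * log(1/p) over outcomes with p > 0."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)


uniform4 = entropy([0.25, 0.25, 0.25, 0.25])  # four equally likely outcomes
certain = entropy([1.0, 0.0, 0.0])            # one outcome is certain
print(uniform4, certain)
```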

If I understand correctly, boxplot() treats numerical group variable values as discrete values and spaces the boxes evenly on the plot. What can I do to produce a boxplot with a horizontal axis scaled for continuous group variable values? (e.g. in SAS...

Stats Stack Exchange | By: Amit | Saturday, April 30, 2016

I'm new to Game Theory and just got stuck calculating the discount rate (or discounting parameter) for a 2x2 matrix. The main condition is that the game is repeated. Here is the matrix: Here is what I want to learn: (1) how can...

Stats Stack Exchange | By: RLearnsStats | Saturday, April 30, 2016

Please tell me break points for each graph. Thank you....

Stats Stack Exchange | By: B11b | Saturday, April 30, 2016

I know that for regular problems, if we have a best regular unbiased estimator, it must be the MLE. But generally, if we have an unbiased MLE, would it also be a best unbiased estimator (or maybe I should call it UMVUE, as long as it has the smallest...

Stats Stack Exchange | By: Gary Cheng | Saturday, April 30, 2016

I have a neural network that I trained on 32 * 32 px size images. Can I use these filters learned from the network on larger images not used in training the network such as a 600 * 800 px image? Or does it not make any sense to apply filters that were...

Stats Stack Exchange | By: Kevin | Saturday, April 30, 2016

