Consider the kernel methods in machine learning that are used in Support Vector Machines, Gaussian Processes, etc. We need to define $k(x;y)$ that measures the similarity between two data points $x;y$. The common choice is the RBF kernel: $k(x;y)=exp(- || x-y... From: Stats Stack Exchange | By: Lan Trần Thị | Thursday, December 1, 2016
I am currently working on a project, and I must say that I am struggling a bit with this particular step. To be fair, I am unaware as to whether or not this is possible at all, and that is mostly why I am here. So, I have been provided with a rather... From: Stats Stack Exchange | By: C. Maddox | Wednesday, November 30, 2016
To start off, please go through this question regarding measuring non-uniformity in probability distributions. Among several good answers, user495285 has suggested a heuristic of simply taking the L2 norm of a vector whose values add to 1. I've found... From: Stats Stack Exchange | By: Ketan | Wednesday, November 30, 2016
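As a small illustration of the heuristic mentioned in this excerpt, the sketch below computes the L2 norm of a few probability vectors. The example vectors and the helper name `l2_norm` are assumptions made for illustration and do not come from the question. For a vector of length n that sums to 1, the norm is smallest (1/sqrt(n)) for the uniform distribution and largest (1) for a point mass.

```python
import math

def l2_norm(p):
    """Euclidean (L2) norm of a probability vector."""
    return math.sqrt(sum(v * v for v in p))

n = 4
uniform = [1.0 / n] * n            # perfectly uniform
skewed = [0.7, 0.1, 0.1, 0.1]      # hypothetical non-uniform example
point_mass = [1.0, 0.0, 0.0, 0.0]  # all mass on one outcome

norm_uniform = l2_norm(uniform)    # equals 1/sqrt(n)
norm_skewed = l2_norm(skewed)
norm_point = l2_norm(point_mass)   # equals 1

print(norm_uniform, norm_skewed, norm_point)
```

Because the range depends on n, comparing vectors of different lengths would need some rescaling; that choice is left open here.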
Consider a sequence of i.i.d. Bernoulli RVs $\{X_i\}_{i \in \mathbb N}$ with parameter $p$. Based on this sequence we build a sequential estimator of $p$ (using inverse binomial sampling for example). This means there's a stopping time $N$ on the sequence,... From: Stats Stack Exchange | By: Luis Mendo | Friday, December 2, 2016
I have a dataset that includes a group of vegetation variables (number of red maples, fir, pine, etc.) and a group of environmental data (soil pH, moisture, etc.). I am looking to understand the relationship between these two groups of variables using... From: Stats Stack Exchange | By: ChadSims | Saturday, December 3, 2016
The environment: We have a state equation: $$\xi_t =F\xi_{t-1} + v_t$$ and a measurement equation $$y_t = H\xi_t + w_t$$ with $$E\Bigg[\begin{pmatrix}v_t\\w_t\end{pmatrix}\begin{pmatrix}v_t'&w_t'\end{pmatrix}\Bigg ] = \begin{pmatrix}Q_t&0\\0&R\end{pmatrix}... From: Stats Stack Exchange | By: user106860 | Saturday, December 3, 2016
I'm learning about doing inference with two samples. d̄ = x̄₁ - x̄₂. Sampling is independent. Let's suppose x̄₁ and x̄₂ are both normally distributed. My understanding is that this means d̄ will also be normally distributed. Why... From: Stats Stack Exchange | By: Adam Zerner | Friday, December 2, 2016
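A difference of two independent normal variables is itself normal, with mean equal to the difference of the means and variance equal to the sum of the variances. The sketch below only checks those two moments by simulation; it does not test normality itself. All numeric values (means, standard errors, number of draws) are hypothetical and chosen purely for illustration.

```python
import random

random.seed(0)

# Hypothetical sampling distributions of the two sample means.
mu1, se1 = 5.0, 2.0
mu2, se2 = 3.0, 1.5

# Draw independent pairs and form the difference d = xbar1 - xbar2.
draws = [random.gauss(mu1, se1) - random.gauss(mu2, se2) for _ in range(200_000)]

mean_d = sum(draws) / len(draws)
var_d = sum((d - mean_d) ** 2 for d in draws) / len(draws)

# Theory: mean = mu1 - mu2 = 2.0, variance = se1**2 + se2**2 = 6.25
print(mean_d, var_d)
```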
I am a particle physicist, and a very frequent task is: given data sampled from a distribution mixture $$ Z \sim \sum_i \phi_i F_i, $$ where $\phi_i$ are the mixture weights / prior probabilities and $F_i$ are the distributions of the separate components,... From: Stats Stack Exchange | By: jwimberley | Friday, December 2, 2016
I need to show that $E\{Z\Phi(Z)\} = 1 / \left( 2\sqrt{\pi} \right)$. Let $Z$ be a standard normal random variable with density $\phi$ and distribution function $\Phi$. I don't know how to start. From: Stats Stack Exchange | By: mmm | Friday, December 2, 2016
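Before attempting a proof, the stated identity can be checked numerically. The sketch below integrates z Φ(z) ϕ(z) over a wide interval with the trapezoid rule, using only the standard library. The integration limits and step count are arbitrary choices, and the function names are hypothetical. One possible route to a proof, offered only as a hint, is the identity E[Z g(Z)] = E[g'(Z)] for a standard normal Z, which reduces the problem to integrating ϕ(z) squared.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expect_z_Phi(lo=-10.0, hi=10.0, n=200_000):
    """Trapezoid-rule approximation of the integral of z * Phi(z) * phi(z)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * z * Phi(z) * phi(z)
    return total * h

numeric = expect_z_Phi()
target = 1.0 / (2.0 * math.sqrt(math.pi))
print(numeric, target)
```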
I have gene expression data over a time course and am wondering how best to formulate it. If the data consists of a continuous outcome, gene_expression, day, subject, such that for each subject, both gene_expression and outcome are measured over many... From: Stats Stack Exchange | By: user116351 | Friday, December 2, 2016
Suppose I have a model that predicts the probability of some event (e.g. likelihood of cancer). For the sick (red) and healthy (green) individuals in a holdout set (not used to train the model), we might see something like the following distribution:... From: Stats Stack Exchange | By: user48956 | Friday, December 2, 2016
I have questions about whether a colleague's statistical approach is appropriate. They are looking at whether the effects of 9 continuous predictors on a continuous outcome differ between 3 natural / non-assigned groups. All variables are directly measured... From: Stats Stack Exchange | By: zephryl | Friday, December 2, 2016
Although it is often calculated differently, my intuitive understanding of PCA arises from its definition as the eigendecomposition of the sample covariance matrix. I have recently become aware of various popular methods for improving estimation of the... From: Stats Stack Exchange | By: user310374 | Thursday, December 1, 2016
I have a random variable estimated over time by an online algorithm. I have the mean and variance of the random Gaussian variable at every step t. I expect the time series to have sudden shifts. What is the best way to estimate if my estimated variable... From: Stats Stack Exchange | By: Dr.Thanos | Friday, December 2, 2016
In the method described here http://dmkd.cs.vt.edu/papers/TKDE17.pdf R code implementation is provided https://github.com/MLSurvival/ESP/blob/master/ESP_TKDE2016/TKDE_code.R the Kaplan-Meier estimator is initially estimated from data but then (from line... From: Stats Stack Exchange | By: user140913 | Friday, December 2, 2016
I recently concluded a survey that used a forced choice Likert Scale (Strongly Agree to Strongly Disagree) and an "I Don't Know" option. Unfortunately, not all responses are complete and I wonder if I can assume a non-response is the same as "I... From: Stats Stack Exchange | By: Michelle | Friday, December 2, 2016
I have a question on how to fit a probit model with endogenous binary regressors. I am so confused after reading so many threads and articles and I am not an econometrics person. But I am reading a paper similar to my research, with different dependent... From: Stats Stack Exchange | By: Bibo | Friday, December 2, 2016
I'm trying to forecast 15 data points based on a time series of 61 data points. Each point is the daily total for a measure, and values of zero are possible. I do have the actual values for the 15 points I'm trying to forecast, so the model can be validated... From: Stats Stack Exchange | By: BogdanC | Friday, December 2, 2016
I am predicting a "yearly cumulative variable" from monthly results. I use Yj = Σ (Xj) / fj where the summation runs from 1,j with j being the current month. I know the f from history; e.g. January = .073, February = .070, March = .087, April = .076,... From: Stats Stack Exchange | By: Jeff | Thursday, December 1, 2016
I am estimating a system of simultaneous linear equations using R and the systemfit package. I have several equations where one of the coefficients is known to be 1 by theory. Until yesterday I implemented linear restrictions to force this, but then... From: Stats Stack Exchange | By: Benjamin | Thursday, December 1, 2016
I have a table with 65 records and 1000 variables for which I use LASSO to perform feature selection. Then, in order to quantify the relative impact of the variables, I regress the response only on these variables using traditional linear regression.... From: Stats Stack Exchange | By: matsuo_basho | Friday, December 2, 2016
I am currently trying to apply a regression via 'plm' on a panel dataset I constructed. e1<-plm(rgdpna~capital+humancapital,data=nameofdata,na.action=na.omit) Unfortunately I always get the following error code: Error in model.frame.default(formula... From: Stats Stack Exchange | By: clusterb8gaxilulu | Friday, December 2, 2016
I am currently learning (by myself) about the analysis of panel data. What I have seen so far is that a fixed effects model allows us to control for idiosyncratic differences between entities. Moreover, dependencies that universally affect all entities... From: Stats Stack Exchange | By: harlekin | Thursday, December 1, 2016
I am struggling to get the right covariance matrix for a Fixed Effects First Difference panel data model. My guess is that there might be some problem of error autocorrelation, $cov(\varepsilon_t,\varepsilon_{t-1})\neq0$, so the covariance matrix is not... From: Stats Stack Exchange | By: adrian1121 | Friday, December 2, 2016
I'm trying to understand the solution to question 1 in this: http://dept.stat.lsa.umich.edu/~ionides/620/hw/hw2sol.pdf $E[(N(t))^2] + E[N(t)] E[N(s)] = (\lambda t)^2 + \lambda t \lambda s + \lambda t$. Where does the very last $\lambda t$... From: Stats Stack Exchange | By: mike24 | Friday, December 2, 2016
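A short worked step may help with the excerpt above. It assumes that $N(t)$ is a homogeneous Poisson process with rate $\lambda$, which the excerpt suggests but does not state, so that $N(t)$ has mean and variance both equal to $\lambda t$. Under that assumption the second moment splits into a squared mean plus a variance term:

```latex
\begin{aligned}
E\big[(N(t))^2\big]
  &= \big(E[N(t)]\big)^2 + \operatorname{Var}\big(N(t)\big) \\
  &= (\lambda t)^2 + \lambda t .
\end{aligned}
```

Read this way, the extra $\lambda t$ in the displayed sum would be the variance of $N(t)$, while $\lambda t \, \lambda s$ is the product of the two means.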
I've been looking for a solution in R to estimate a structural VAR with long and short run restrictions as done in Bjørnland & Leitemo (2009), where they represent a stable VAR in a moving average form: $$ y_t=B(L)u_t $$ where $B(L)$ is a convergent... From: Stats Stack Exchange | By: lucasfariaslf | Thursday, December 1, 2016
I need to know how to compute chi-square distance that will be used as distance metric for Kmeans clustering on one-dimensional data sets like domain terms used for concept formation in the process of ontology learning. Thanks for your help.... From: Stats Stack Exchange | By: Kidane Woldemariyam | Friday, December 2, 2016
I want to know how to work with Average Precision (AP) when the ratings are non-binary. I will use this excellent worked example as a reference. Suppose the keyword "porcupines" is used in a search engine, and 4 results are retrieved. It is easy to rate... From: Stats Stack Exchange | By: Hamman Samuel | Friday, December 2, 2016
I have a series of data over the last 15 years for a number of countries on annual inflation rates. I want to calculate the probability that the inflation rate in one of those years exceeded a specified inflation rate (say 10%) for each country. For... From: Stats Stack Exchange | By: mjthoms2 | Friday, December 2, 2016
Crosspost from math.stackexchange, though it might fit better here. My question is about the possibility of showing equivalence between the hazard rate, the conditional probability (of failure) and a likelihood function. Dynamics: Consider a coin that... From: Stats Stack Exchange | By: TMorville | Thursday, December 1, 2016
I'm working with a dataset from Gapminder (www.gapminder.org), and I've been running some logistic regressions. As I understand it, if you get an Odds Ratio of exactly 1, then the interpretation is that there cannot be a significant difference. However,... From: Stats Stack Exchange | By: Idlan Zakaria | Friday, December 2, 2016
I am wondering if it's possible to model or fit a discrete distribution where there are only three possible outcomes: 0, 1, and 3. I get hung up on the fact that 2 is not an allowed outcome and don't really see how to do it. A sample table is below. I... From: Stats Stack Exchange | By: J. Paul | Friday, December 2, 2016
What is the difference/similarity between a biological chromosome and a chromosome from a genetic algorithm? A biological chromosome represents a specific living organism that can be the result of evolution, and can be the starting point of a new evolution.... From: Stats Stack Exchange | By: anonymous | Friday, December 2, 2016
I am trying to understand whether the discrete Fourier transform gives the same representation of a curve as a regression using a Fourier basis. For example, library(fda) Y=dailytempav[,1] ## my data length(Y) ## =365 ## create fourier basis and estimate... From: Stats Stack Exchange | By: qoheleth | Friday, December 2, 2016
I am considering the standard test for randomness based on runs (Wald–Wolfowitz runs test). In it, you basically take a sequence of 1's and 2's, having length $n$, with $n_1$ 1's and $n_2$ 2's, and you want to test: $H_{0}$: there is no trend (i.e. each element... From: Stats Stack Exchange | By: sdd | Friday, December 2, 2016
In one-way ANOVA, does the "one" refer to the independent variable or the dependent variable? For example, I have three groups: Group A (music therapy), Group B (cognitive therapy) and Group C (no therapy); the dependent variable is stress scores. Does the one way... From: Stats Stack Exchange | By: shindy | Thursday, December 1, 2016
I've run multiple regressions (>100) and have generated plots of predicted vs observed values. However when I look at the plots using the R viewer I can only see the last 10-15 regressions. How would I modify the R viewer so I can see all the plots... From: Stats Stack Exchange | By: Sandro | Thursday, December 1, 2016
Consider the following Panel Data model: $$ y_{it}=x_{it}\beta+\alpha_{i}+u_{it} $$ where $\alpha_{i}$ denotes the individual specific fixed effect, $x$ and $y$ are both scalars for individual $i$ at time $t$. I wish to estimate this equation using fixed... From: Stats Stack Exchange | By: Kwame Brown | Thursday, December 1, 2016
I am wondering if there is a way to include upper limits into fitting a function using minimum chi squared when only the chi squared values are known. Essentially I have a model that has two parameters, and I am trying to fit this model to various sets... From: Stats Stack Exchange | By: NeuralLotus | Thursday, December 1, 2016
This isn't really a statistical problem, more like a curiosity. In any case, comments much appreciated. I read a paper where they hypothesize the following: Hypothesis 1. Z moderates the relationship between X and Y. When Z is high, the positive relationship... From: Stats Stack Exchange | By: andree | Thursday, December 1, 2016
Does $E[E(X|Y)|Z] = E[X|Y,Z]$? Also, what about $E[E(X|Y=y)|Z=z... I would like to model it as a Bayesian GLM and had a look at the bayesglm function on ARM (package). The package says: modeling with independent... From: Stats Stack Exchange | By: Silver | Thursday, December 1, 2016
I have a time series with only 20 observations (see below). I would like to forecast 5 points in the future using sarima.for or arimax in R. Is the number of observations too small for forecasting as such? If so, is there another recommended approach?... From: Stats Stack Exchange | By: Rafael | Thursday, December 1, 2016
Given the frequency of certain readership patterns, for example: if a person has read: the previous issue of a magazine (P1), read the 2nd last issue of a magazine (P2) and 3rd last issue of a magazine (P3) I have created a group of 7 categories: 7:... From: Stats Stack Exchange | By: mmulibra | Thursday, December 1, 2016
I'm wanting to compare scores derived from a reaction time task between two groups with unequal sizes (G1 = 78; G2 = 23). However, when I run the U test it tells me there is no significant difference, U = 897.00, z = .000, P = 1.00. How am I getting... From: Stats Stack Exchange | By: user140660 | Wednesday, November 30, 2016
I would like to compare a model with time varying slope coefficient to a model with constant slope coefficient. In order to do this, I use the R package "dlm" to set up the models and calculate the coefficients, which actually works fine. However, I... From: Stats Stack Exchange | By: Schlaftablette | Thursday, December 1, 2016
I have two data sets, both ranging from 1996-2016. However, the y-axis values are on completely different scales. The first is for mean NDVI values where 0 is centered on the mean (.1865) and the ranges are the differences between the mean and the values... From: Stats Stack Exchange | By: user71356 | Wednesday, November 30, 2016
Here is what I have: 1 nominal categorical dependent variable, 6 nominal categorical independent variables, 1 continuous independent variable. I used binomial regression for analysis, but my adviser insists that I use correlation instead. So I need help... From: Stats Stack Exchange | By: Kobie | Thursday, December 1, 2016
I was reading from this page on Princeton.edu. They are performing a logistic regression (with R). At some point they calculate the probability of getting a residual deviance higher than the one they got on a $\chi^2$ distribution with degrees of freedom...
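A side note on the U test excerpt above (groups of 78 and 23, U = 897.00, z = .000): under the null hypothesis the Mann-Whitney U statistic has mean n1*n2/2. The sketch below computes that mean and the usual normal-approximation standard deviation without a tie correction. The function name is hypothetical, and whether the software in the question applied a tie or continuity correction is unknown. With these group sizes the reported U coincides exactly with the null mean, which is consistent with a z of zero.

```python
import math

def mann_whitney_null_moments(n1, n2):
    """Null mean and standard deviation of U (normal approximation, no tie correction)."""
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return mean_u, sd_u

n1, n2 = 78, 23      # group sizes quoted in the excerpt
u_observed = 897.0   # U value quoted in the excerpt

mean_u, sd_u = mann_whitney_null_moments(n1, n2)
z = (u_observed - mean_u) / sd_u
print(mean_u, z)
```

Why the observed U would land exactly on the null mean cannot be determined from the excerpt alone.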