I apologize in advance for my stats illiteracy. I'm not a stats guru by any stretch, but I am trying to learn. To start, I'll introduce what my data set looks like, then what I'd like to accomplish. I am working with geological data (vectors) that...

From: Stats Stack Exchange | By: dazzle | Saturday, September 20, 2014

I am performing an extreme value analysis on a dataset that contains daily rainfall over 100 years at a single location. However, there are around 500 missing values in the whole dataset. In this case the exact reason why data is missing is not known, but...

From: Stats Stack Exchange | By: carl | Friday, September 19, 2014

Recently I was introduced to the field of data science (it's been approximately 6 months), and I started the journey with the Machine Learning course by Andrew Ng, after which I started working on the Data Science Specialization by JHU. On the practical application front...

From: Stats Stack Exchange | By: Vinay Tiwari | Friday, September 19, 2014

I have a classification problem where I would like to develop a binary classifier to classify between two different types of objects, given a time-series (signal) related to that object. The problem I'm dealing with is the following: most literature...

From: Stats Stack Exchange | By: Maverick007 | Thursday, September 18, 2014

This may seem like a trivial issue. When I looked at the linear values of the mean square error, mse = [0.000615833333333331,0.000577499999999998,0.000752499999999997,0.000800833333333331,0.000812499999999997,0.000812499999999997,0.000812499999999997,0.000812499999999997,0.000812499999999997]...

From: Stats Stack Exchange | By: SKM | Sunday, September 21, 2014

https://moodle.concordia.ca/moodle/pluginfile.php/1785520/mod_resource/content/1/Assignment%201.pdf It seems as if I need to find the mean of the marginal distribution table, but I do not know what variables to use....

From: Stats Stack Exchange | By: Michael Gianni | Sunday, September 21, 2014

Hello, dear researchers. I want to list the advantages and disadvantages of the neural network (NN) method for classification or estimation purposes. I have already found the advantages of the NN method in many papers, but they didn't write anything specific about its disadvantages...

From: Stats Stack Exchange | By: Electricman | Sunday, September 21, 2014

I am reading about and trying to learn the probability integral transform and some of its uses. From the CV question "PIT on a sample with m bins, and KS test used to estimate a good value for m", the probability integral transform of random variable $X$...

From: Stats Stack Exchange | By: Alexis | Sunday, September 21, 2014
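
As a small illustration of the transform itself (not of the binning/KS procedure from the linked question): if $X$ has a continuous CDF $F$, then $U = F(X)$ is Uniform(0, 1). The Python sketch below assumes an exponential model with rate 2, an arbitrary choice for illustration, and checks two simple features of uniformity.

```python
import math
import random


def pit_exponential(samples, rate):
    """Probability integral transform U = F(X) using the Exponential(rate) CDF."""
    return [1.0 - math.exp(-rate * x) for x in samples]


random.seed(1)
rate = 2.0  # assumed model parameter, chosen for illustration
x = [random.expovariate(rate) for _ in range(100_000)]
u = pit_exponential(x, rate)

# If the model CDF is correct, U should look Uniform(0, 1).
mean_u = sum(u) / len(u)                                 # should be near 0.5
share_below_quarter = sum(v < 0.25 for v in u) / len(u)  # should be near 0.25
print(round(mean_u, 3), round(share_below_quarter, 3))
```

If the draws came from a different distribution than the CDF used, the transformed values would no longer be uniform, which is what PIT-based goodness-of-fit checks exploit.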

Let $X_1, \ldots, X_n$ be i.i.d. random variables with pdf given by $$f(x;\theta) = \exp(-(x-\theta))I_{(\theta, \infty)}(x).$$ The task is to find a sufficient statistic for $\theta$ and to verify whether it is also complete. Since $$L(\theta;x)=\exp(-\sum...

From: Stats Stack Exchange | By: Giiovanna | Sunday, September 21, 2014
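
The excerpt breaks off at the likelihood. As a sketch of the standard textbook argument (not the poster's own continuation), the factorization theorem and the density of the minimum finish it, writing $x_{(1)} = \min_i x_i$:

$$L(\theta;x)=\prod_{i=1}^{n}\exp(-(x_i-\theta))I_{(\theta,\infty)}(x_i)=\underbrace{e^{n\theta}I_{(\theta,\infty)}(x_{(1)})}_{g(x_{(1)};\,\theta)}\cdot\underbrace{\exp\Big(-\sum_{i=1}^{n}x_i\Big)}_{h(x)},$$

so $T = X_{(1)}$ is sufficient. Its density is $f_T(t;\theta) = n e^{-n(t-\theta)} I_{(\theta,\infty)}(t)$, and if $E_\theta[\varphi(T)] = 0$ for every $\theta$, then

$$0 = n e^{n\theta}\int_{\theta}^{\infty} \varphi(t)e^{-nt}\,dt \quad\forall\,\theta \;\Longrightarrow\; \frac{d}{d\theta}\int_{\theta}^{\infty} \varphi(t)e^{-nt}\,dt = -\varphi(\theta)e^{-n\theta} = 0 \text{ for almost every } \theta,$$

so $\varphi = 0$ almost everywhere and $X_{(1)}$ is also complete.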

I have several time series in a VAR(1) and, because some of them don't share the same unit of measure, I'd like to express the RMSE as a percentage. I know that it can be done in several ways (see below), but I don't know precisely which is the one that fits...

From: Stats Stack Exchange | By: fipelle | Sunday, September 21, 2014

I had a discussion about covariance recently and it would be nice to hear your feedback about this. Let's say we have a dataset of $n$ samples with $d$ attributes. For simplicity, let's say 3 of those $d$ attributes are e.g., $d_1$ = distance in miles...

From: Stats Stack Exchange | By: Sebastian Raschka | Sunday, September 21, 2014

How do you calculate the mean and variance of two random variables X~F(m = 3, n = 6) and Y~F(m = 8, n = 6) from their density functions using R?

From: Stats Stack Exchange | By: Kalam | Sunday, September 21, 2014
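
The question asks for R; as a cross-check in Python, the sketch below uses the standard closed-form moments of the F distribution rather than integrating the density numerically: for $X \sim F(m, n)$, $E[X] = n/(n-2)$ when $n > 2$, and $\mathrm{Var}(X) = 2n^2(m+n-2)/(m(n-2)^2(n-4))$ when $n > 4$. With $n = 6$ both means equal 1.5, because the mean depends only on the denominator degrees of freedom. In R, the same values should be reproducible by passing `x*df(x, m, n)` (and `x^2*df(x, m, n)`) to `integrate()` over $(0, \infty)$.

```python
def f_dist_mean(m, n):
    """Mean of an F(m, n) distribution; defined only for n > 2."""
    if n <= 2:
        raise ValueError("mean is undefined for n <= 2")
    return n / (n - 2)


def f_dist_variance(m, n):
    """Variance of an F(m, n) distribution; defined only for n > 4."""
    if n <= 4:
        raise ValueError("variance is undefined for n <= 4")
    return 2 * n**2 * (m + n - 2) / (m * (n - 2) ** 2 * (n - 4))


print(f_dist_mean(3, 6), f_dist_variance(3, 6))  # 1.5 5.25
print(f_dist_mean(8, 6), f_dist_variance(8, 6))  # 1.5 3.375
```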

I have body mass and age data for a population of individuals. I want to fit a cubic smoothing spline curve to the data. I'm using smooth.spline in R, which warns against using cross-validation to select a smoothing parameter when there are duplicate...

From: Stats Stack Exchange | By: Michelle | Sunday, September 21, 2014

My model is $$ Y_{it}=X_{it}'\beta+\varepsilon_{it} $$ where $Y_{it}$ is a vector of weekly observations of a dependent variable and $X_{it}$ is a vector of explanatory variables (also weekly) with coefficients vector $\beta$. I plan on using fixed effects...

From: Stats Stack Exchange | By: Sunv | Sunday, September 21, 2014

Here, for 1000 simulations of 40 samples each, I generate random exponential draws using the replicate function:
lambda = 0.2
n = 40 # The number of samples per simulation
nosim = 1000 # The number of simulations
set.seed(25)
st <- replicate(nosim,rexp(n,lambda));...

From: Stats Stack Exchange | By: user2129623 | Sunday, September 21, 2014
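
A Python analogue of the R simulation above, assuming (the excerpt is cut off) that the goal is the usual check of simulated sample means against the central limit theorem: for an exponential with rate λ = 0.2, the means of 40 draws should centre on 1/λ = 5 with variance (1/λ)²/40 = 0.625. Python's `random` module uses a different generator from R, so the draws will not match `set.seed(25)` value for value.

```python
import random
import statistics

lam = 0.2     # rate of the exponential distribution
n = 40        # number of samples per simulation
nosim = 1000  # number of simulations

random.seed(25)  # same seed number as the R code, but a different generator
sample_means = [
    statistics.fmean(random.expovariate(lam) for _ in range(n))
    for _ in range(nosim)
]

# CLT expectations: mean of the sample means ≈ 1/lam = 5,
# variance of the sample means ≈ (1/lam)**2 / n = 0.625
print(statistics.fmean(sample_means), statistics.variance(sample_means))
```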

Suppose I compute the Hausman-Taylor estimator using the plm command with the option model= "ht". Using the result, I would like to obtain a robust variance-covariance matrix to make inference fully robust. For this purpose the vcovHC() command (part of the...

From: Stats Stack Exchange | By: Manuel S | Sunday, September 21, 2014

My goal is to investigate a dependent variable which is metric (time in hours). The independent variables include 3 metric, 2 binary (factors), and one factor variable, which consists of 11 districts of a city. I tried to conduct a GLM. Can I put all...

From: Stats Stack Exchange | By: mark us | Sunday, September 21, 2014

I want to find the point at which I can expect less than $p_{Error}$ errors in a group of events that follow Poisson distribution. With a little help from Wikipedia I have found out that the equation I need to solve is $$P(X \geq x) \leq \dfrac{e^{-\lambda}(e\lambda)^x}{x^x}\text{,...

From: Stats Stack Exchange | By: Cine | Sunday, September 21, 2014
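
The bound in the question, $P(X \geq x) \leq e^{-\lambda}(e\lambda)^x/x^x$, is a Chernoff bound that holds for $x > \lambda$, and it decreases in $x$ there (the derivative of its logarithm is $\ln(\lambda/x) < 0$), so the smallest qualifying $x$ can be found by stepping upward. A minimal Python sketch, working in log space to avoid overflow; the function names are illustrative, and $\lambda > 0$ and $0 < p_{Error} < 1$ are assumed.

```python
import math


def chernoff_log_bound(x, lam):
    """log of the bound e^{-lam} (e*lam)^x / x^x on P(X >= x), valid for x > lam."""
    return -lam + x * (1.0 + math.log(lam)) - x * math.log(x)


def smallest_x_for_error(lam, p_error):
    """Smallest integer x > lam whose Chernoff bound on P(X >= x) is <= p_error."""
    x = math.floor(lam) + 1  # the bound is only meaningful (and decreasing) for x > lam
    while chernoff_log_bound(x, lam) > math.log(p_error):
        x += 1
    return x


def poisson_upper_tail(x, lam):
    """Exact P(X >= x) for X ~ Poisson(lam), for comparison with the bound."""
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x))


x = smallest_x_for_error(lam=1.0, p_error=0.01)
print(x, poisson_upper_tail(x, 1.0))  # 6, exact tail ≈ 0.0006
```

For $\lambda = 1$ and $p_{Error} = 0.01$ the bound gives $x = 6$, while the exact tail already drops below 0.01 at $x = 5$ ($P(X \geq 5) \approx 0.0037$): the bound is safe but conservative.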

If I know that d = 0.8, sig.level = 0.05, power = 0.8 and n1/n2 = 3, how can I calculate n1 and n2 with the function pwr.t2n.test(n1 = , n2= , d = , sig.level =, power = ) in the pwr package in R, or with some other function in R?

From: Stats Stack Exchange | By: user2230101 | Friday, September 19, 2014
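
pwr.t2n.test solves for one missing quantity at a time, so one route to a fixed ratio is to search over n2 with n1 = 3·n2 and stop at the first pair reaching the target power. The Python sketch below does that search with a normal approximation to the two-sample test instead of the exact noncentral t used by pwr, so it may slightly understate the required sizes; the function names are illustrative.

```python
import math
from statistics import NormalDist


def approx_power(n1, n2, d, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test with effect size d."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))
    # Both rejection tails; the lower one is negligible for d > 0.
    return NormalDist().cdf(ncp - z_crit) + NormalDist().cdf(-ncp - z_crit)


def sizes_for_ratio(d, power, ratio, alpha=0.05):
    """Smallest n2 (with n1 = ratio * n2, rounded up) that reaches the target power."""
    n2 = 2
    while approx_power(math.ceil(ratio * n2), n2, d, alpha) < power:
        n2 += 1
    return math.ceil(ratio * n2), n2


print(sizes_for_ratio(d=0.8, power=0.8, ratio=3))  # (51, 17) under this approximation
```

Plugging the resulting n1 and n2 back into pwr.t2n.test with power left unspecified is a quick way to see how close the exact t-based power is to 0.8.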

Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution with mean $\theta$. Obtain an unbiased estimator for $\theta$ based on $G$, where $G$ is the geometric mean of the observations. Hint: the answer may be expressed in terms of the gamma function. Approach:...

From: Stats Stack Exchange | By: user56158 | Sunday, September 21, 2014
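
For an exponential with mean $\theta$, $E[X^{k}] = \theta^{k}\Gamma(1+k)$, so by independence $E[G] = \prod_i E[X_i^{1/n}] = \theta\,\Gamma(1+1/n)^n$, and $G/\Gamma(1+1/n)^n$ is unbiased for $\theta$. The Python sketch below checks this by Monte Carlo; $\theta = 3$ and $n = 5$ are arbitrary illustration values.

```python
import math
import random
import statistics


def unbiased_from_geometric_mean(sample):
    """G / Gamma(1 + 1/n)^n, using E[X^(1/n)] = theta^(1/n) * Gamma(1 + 1/n)."""
    n = len(sample)
    g = statistics.geometric_mean(sample)
    return g / math.gamma(1 + 1 / n) ** n


random.seed(0)
theta, n = 3.0, 5
estimates = [
    # expovariate takes a rate, so 1/theta gives mean theta
    unbiased_from_geometric_mean([random.expovariate(1 / theta) for _ in range(n)])
    for _ in range(50_000)
]
print(statistics.fmean(estimates))  # should be close to theta = 3.0
```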

How can I get an ANOVA table for exponential regression models in MS Excel?

From: Stats Stack Exchange | By: Faisal | Sunday, September 21, 2014

I am wondering whether a 2000:4500 ratio of no:yes, respectively, is a class imbalance problem.
I would highly appreciate any help.

From: Stats Stack Exchange | By: mparida | Sunday, September 21, 2014

I am using Stata. I have 11 separate variables, all "0 1" binary variables. I would like to recode them into one variable, and at the same time I would like to group the values from variables 1-3 into one, 4-9 into one, and leave variables 10 and...

From: Stats Stack Exchange | By: Ivy Jane | Sunday, September 21, 2014

I have a group of 222 test results that were reviewed separately by two interpreters X and Y. Based on their assessment, they were to assign management into 4 categories A, B, C and D. Then another variable that they were blinded to was revealed to them...

From: Stats Stack Exchange | By: Steve | Sunday, September 21, 2014

Survey margin of error contracts as the proportions become more extreme. Its validity and applicability in such cases has always concerned me, but I suppose much depends on the context. Where we have mere traces of data on one side, an extreme proportion...

From: Stats Stack Exchange | By: LinearB | Sunday, September 21, 2014
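
To make the concern concrete: the usual margin of error $z\sqrt{\hat p(1-\hat p)/n}$ collapses to zero as $\hat p \to 0$ or 1, which is exactly the regime where it is least trustworthy. The sketch below contrasts it with the Wilson score interval, one standard alternative chosen here for illustration (others, such as Clopper-Pearson, also apply); n = 1000 is an arbitrary example size.

```python
import math
from statistics import NormalDist


def wald_margin(p_hat, n, conf=0.95):
    """Classic survey margin of error z * sqrt(p(1-p)/n); shrinks to 0 as p_hat -> 0 or 1."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def wilson_interval(p_hat, n, conf=0.95):
    """Wilson score interval, which stays informative for extreme proportions."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


n = 1000
for p_hat in (0.5, 0.01, 0.0):
    print(p_hat, round(wald_margin(p_hat, n), 4), wilson_interval(p_hat, n))
```

At $\hat p = 0$ the classic margin is exactly zero, while the Wilson interval still has an upper limit of about 0.004, reflecting that "no observed cases" does not mean the true proportion is known to be zero.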

I am trying to fit a multinomial mixture model to data from a stream depletion survey. The data were collected by selecting a stream site that is a standard length (usually 150-200m depending on width), blocking the upper and lower end of the site off...

From: Stats Stack Exchange | By: Jason | Saturday, September 20, 2014

Can we convert blob.noun_phrases into a pandas DataFrame? The data type of blob.noun_phrases is class 'textblob.blob.WordList': type(blob.noun_phrases) returns class 'textblob.blob.WordList'. I am asking this because it would be much easier to count noun_phrases...

From: Stats Stack Exchange | By: user35577 | Sunday, September 21, 2014

My solution to the problem is as follows: The answer I get is 27. My reasoning is that the last digit must be even, so for that position there are 3 choose 1 possibilities. Then the first digit cannot be zero so there are only 3 possible digits. Finally,...

From: Stats Stack Exchange | By: Peter | Sunday, September 21, 2014

I have a dependent variable (scale data) called posttest, also an independent variable (nominal) of teaching method that has four levels. The covariate is the pretest measured at the scale level. I am using SPSS so the independent variable is the fixed...

From: Stats Stack Exchange | By: Hilda | Sunday, September 21, 2014

To whom it may concern: my "population" with known size P is a landscape and has no more than 4 peaks of about the same height. Naturally, top elements group locally within the population. I seek the top T elements. Sorting the whole population is easy,...

From: Stats Stack Exchange | By: Stephan | Friday, September 19, 2014
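
If only the top T values are needed, a size-T heap avoids the full sort: Python's `heapq.nlargest` runs in roughly O(P log T) time instead of O(P log P). This sketch ignores the peak structure described above and uses a hypothetical population of random values; exploiting the locality of the peaks would need a different, problem-specific approach.

```python
import heapq
import random


def top_t(population, t):
    """Return the t largest values, largest first, without sorting all P elements."""
    return heapq.nlargest(t, population)


random.seed(42)
population = [random.random() for _ in range(200_000)]  # hypothetical P = 200,000 values
best = top_t(population, 10)
print(best[:3])
```

The result is identical to `sorted(population, reverse=True)[:10]`; the heap only pays off when T is much smaller than P.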

Social network data is frequently found in a two-mode form: people vs. events they attend, people vs. classes they attend, countries vs. treaties they sign, etc. A strategy for analyzing this data is to project the rectangular, binary matrix $X$ into...

From: Stats Stack Exchange | By: Jesse | Saturday, September 20, 2014
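
A minimal sketch of the usual projection step with a made-up 4-person by 3-event matrix: $XX^\top$ gives the person-by-person matrix of shared events and $X^\top X$ the event-by-event matrix of shared attendees, with the diagonals holding each node's degree. Plain Python lists are used to keep it dependency-free.

```python
# Hypothetical 4 people x 3 events attendance matrix (1 = attended).
X = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]


def project_rows(X):
    """One-mode projection X X^T: entry (i, j) counts events shared by persons i and j."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in X] for row_i in X]


def project_cols(X):
    """One-mode projection X^T X: entry (k, l) counts persons attending both events k and l."""
    return project_rows([list(col) for col in zip(*X)])


people = project_rows(X)
events = project_cols(X)
print(people)  # [[2, 1, 1, 2], [1, 2, 1, 2], [1, 1, 2, 2], [2, 2, 2, 3]]
print(events)  # [[3, 2, 2], [2, 3, 2], [2, 2, 3]]
```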

Let's say I have a sample size established with alpha=0.05 and power=0.8 (based on time constraint for study). This makes the value of alpha and power mathematically interdependent. So, the same sample size can be achieved with any of: alpha = 0.001...

From: Stats Stack Exchange | By: Vlad | Saturday, September 20, 2014
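
Under a normal approximation with the effect size and sample size held fixed, $z_{1-\alpha/2} + z_{1-\beta}$ (where $1-\beta$ is the power) is roughly constant, so the power at any other $\alpha$ follows directly from the design values. The Python sketch below assumes a two-sided test and ignores the negligible opposite rejection tail; the function name is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()


def power_at_alpha(alpha, design_alpha=0.05, design_power=0.8):
    """Two-sided power at a new alpha when n and the effect size are fixed.

    With n and the effect fixed, z_{1-alpha/2} + z_{power} is (approximately)
    a constant K, so lowering alpha necessarily lowers power.
    """
    k = Z.inv_cdf(1 - design_alpha / 2) + Z.inv_cdf(design_power)
    return Z.cdf(k - Z.inv_cdf(1 - alpha / 2))


for alpha in (0.001, 0.01, 0.05, 0.10):
    print(alpha, round(power_at_alpha(alpha), 3))
```

Under this approximation, the sample size designed for alpha = 0.05 and power = 0.8 gives a power of only about 0.31 at alpha = 0.001.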

I have a matrix with 0.25 million rows and 50 columns. I have scaled this matrix up to 1.5 million rows and 50 columns using Method A. I would like to measure the quality of the method I have used, in terms of the distribution that is retained or any...

From: Stats Stack Exchange | By: user2761431 | Saturday, September 20, 2014

Suppose I have a data set listing the chromosome count of a large number of species. Each species, in addition to its chromosome count, has data on its Phylum, Order, and Family. I would like to know what kind of test the following question would require:...

From: Stats Stack Exchange | By: ChromosomeCount | Saturday, September 20, 2014

I run a multinomial logit regression model for a multiclass classification problem and use the following R function: trainedModel <- multinom(UNS ~ ., data = traindata), where UNS is the target variable (there are 4 classes). How can I find the...

From: Stats Stack Exchange | By: TheBlueNotebook | Friday, September 19, 2014

A data set has been analyzed in a research paper. Is it possible to use the same data set but different analysis technique and write a research paper?

From: Stats Stack Exchange | By: Daniel | Saturday, September 20, 2014

I am an entry-level R programmer trying to learn statistics. I have downloaded the daily adjusted close price of one stock from Sep 2011 to date. As per my study plan, I have plotted some basic plots to understand the daily stock adjusted...

From: Stats Stack Exchange | By: StatsUser | Saturday, September 20, 2014

This is my first post. I am curious to understand the effect of using a non-random sample to estimate a population quantile with the sample quantile. Let's say I need to measure the 10th quantile of a population. Now I have non-independent sampling...

From: Stats Stack Exchange | By: Maural | Saturday, September 20, 2014

So I have an SVC training series that tests pretty well, putting me in the mid-80th percentile without outrageous C/g values. My current C value is 2.0 and gamma is 0.5. Good numbers across the range during refinement; looking solid. Here's the cross-validation...

From: Stats Stack Exchange | By: Michael | Saturday, September 20, 2014

For the purpose of simulation, I would like to construct a correlation matrix that respects, to some extent, a given set of preferred/desired correlation coefficients for each pair of variables. I tried filling in a matrix with the given...

From: Stats Stack Exchange | By: Ivan | Saturday, September 20, 2014

I am working with the basic RBM that can be found on Geoffrey Hinton's website and the MNIST dataset. What I want to do is cluster the input data graphically. I am currently working with a three-layer network: 784 (28x28 pixels) -> 200 -> 50 ->...

From: Stats Stack Exchange | By: user1406177 | Saturday, September 20, 2014

I have a problem.
I'm trying to fit a robust regression with different weight functions, such as Welsch and logistic,
but I cannot do it in R.
Please help me.

From: Stats Stack Exchange | By: shakeel | Saturday, September 20, 2014

I understand the modular nature of directed models, and that each node captures a conditional probability. But why do we need undirected models? As far as I can see they lack intuition in that the factors don't represent any type (conditional/marginal)...

From: Stats Stack Exchange | By: user3246971 | Saturday, September 20, 2014

I wrote the following code in SAS, but I did not get the expected result! The resulting histogram is grey and the range of the data is not as I specified. What is the problem? I got the following warning too: WARNING: The MIDPOINTS= list was extended to accommodate the...

From: Stats Stack Exchange | By: PSS | Saturday, September 20, 2014

I was going through the C4.5 and ID3 algorithms used to construct a decision tree, and I was wondering whether there is an efficient way to compute information gain for a continuous variable (during the step where the variable to split on is selected), other than...

From: Stats Stack Exchange | By: dasman | Saturday, September 20, 2014
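
A common answer, and roughly what C4.5 does for numeric attributes, is to sort the values once and sweep the candidate thresholds in order, updating the class counts on each side incrementally instead of recounting for every threshold. The Python sketch below uses midpoints between consecutive distinct values as thresholds and plain information gain; C4.5 itself differs in details such as using gain ratio, so treat this as an illustration rather than that implementation.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (bits) of a class-count mapping."""
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def best_threshold(values, labels):
    """Best binary split 'value <= t' by information gain: one sort plus a linear sweep.

    Class counts on each side are updated incrementally, so each candidate threshold
    costs O(number of classes) instead of O(n).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    right = Counter(labels)
    left = Counter()
    parent = entropy(right, n)
    best_gain, best_t = 0.0, None
    for i in range(n - 1):
        v, y = pairs[i]
        left[y] += 1   # move one observation from the right side to the left side
        right[y] -= 1
        next_v = pairs[i + 1][0]
        if v == next_v:
            continue  # a split is only possible between distinct values
        n_left = i + 1
        child = (n_left * entropy(left, n_left)
                 + (n - n_left) * entropy(right, n - n_left)) / n
        gain = parent - child
        if gain > best_gain:
            best_gain, best_t = gain, (v + next_v) / 2
    return best_t, best_gain


print(best_threshold([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # (6.5, 1.0)
```

The overall cost is O(n log n) for the sort plus O(n·k) for the sweep with k classes, compared with O(n²) when each threshold is evaluated from scratch.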

Can AdaBoost choose the same variable for multiple splits within a given tree? The model was given 100+ variables to choose from, and it did choose them for the other trees in the ensemble. I am using the gbm package -> var_1 (<= 0.7197815 ) -> ->...

From: Stats Stack Exchange | By: Ferric | Friday, September 19, 2014

I have a dataset that consists of 100 variables, and I used a for loop to perform a Kruskal-Wallis test on each column of the dataset. Of course, I get the test output for each column, and the p-values are in the output. I need all the p-value...

From: Stats Stack Exchange | By: thx_all | Friday, September 19, 2014

I am reading Categorical Data Analysis by Dr. Agresti. It explains: "The likelihood function for the GLM also determines the asymptotic covariance matrix of the ML estimator Beta_hat. This matrix is the inverse of the information matrix" I...

From: Stats Stack Exchange | By: Elementary Mistake | Friday, September 19, 2014

According to LeCun's paper "Efficient BackProp" [1], the tanh activation function should be preferred over the logistic activation function for the hidden units in neural networks. For tanh units, an output of a_i = -1 is considered inactive. But...

From: Stats Stack Exchange | By: chris elgoog | Friday, September 19, 2014

I am an Italian student, and I'm looking for a particular dataset. I'm interested in a model for spatial regression with time-varying data. I'm looking for data with coordinates, measured at different moments for each point. Where can I find them? Sorry...

From: Stats Stack Exchange | By: Darko | Friday, September 19, 2014

