Serendeputy - your personal news assistant.

Welcome to Serendeputy!

Serendeputy is your personal news assistant.

Your deputy:
- learns what you like and don't like,
- lovingly compiles a list of news and blogs for you.

You can help your deputy learn by searching, clicking links and pressing the little smiley faces.

What to do:
  1. Click links to teach your deputy
  2. Click smileys and frownies
  3. Find favorite topics and sources
  4. See how much better your deputy is getting at finding you good stuff.
  5. Sign in for free to save your profile, or please tell me why you won't.
I apologize in advance for my stats illiteracy; I'm not a stats guru by any stretch, but I am trying to learn. To start, I'll just introduce what my data set looks like, then what I'd like to accomplish. I am working with geological data (vectors) that...
From: Stats Stack Exchange | By: dazzle | Saturday, September 20, 2014
I am performing an extreme value analysis on a dataset that contains daily rainfall over 100 years for a single location. However, there are around 500 missing values in the whole dataset. In this case the exact reason why the data are missing is not known, but...
From: Stats Stack Exchange | By: carl | Friday, September 19, 2014
Recently I was introduced to the field of data science (it's been about 6 months), and I started the journey with the Machine Learning course by Andrew Ng and after that started working on the Data Science Specialization by JHU. On the practical application front...
From: Stats Stack Exchange | By: Vinay Tiwari | Friday, September 19, 2014
I have a classification problem where I would like to develop a binary classifier to classify between two different types of objects, given a time-series (signal) related to that object. The problem I'm dealing with is the following: most literature...
From: Stats Stack Exchange | By: Maverick007 | Thursday, September 18, 2014
This may seem to be a trivial issue when I looked at the linear values of Mean Square error mse =[0.000615833333333331,0.000577499999999998,0.000752499999999997,0.000800833333333331,0.000812499999999997,0.000812499999999997,0.000812499999999997,0.000812499999999997,0.000812499999999997]...
From: Stats Stack Exchange | By: SKM | Sunday, September 21, 2014
https://moodle.concordia.ca/moodle/pluginfile.php/1785520/mod_resource/content/1/Assignment%201.pdf It seems as if I need to find the mean of the marginal distribution table, but I do not know what variables to use....
From: Stats Stack Exchange | By: Michael Gianni | Sunday, September 21, 2014
Hello, dear researchers. I want to list the advantages and disadvantages of the neural network method for classification or estimation purposes. I have already found the advantages of the NN method in many papers, but they didn't write anything specific about disadvantages...
From: Stats Stack Exchange | By: Electricman | Sunday, September 21, 2014
I am reading and trying to learn about the probability integral transform and some of its uses. From the CV question PIT on a sample with m bins, and KS test used to estimate a good value for m, the probability integral transform of random variable $X$...
From: Stats Stack Exchange | By: Alexis | Sunday, September 21, 2014
Let $X_1, \ldots, X_n$ be i.i.d. random variables with pdf given by $$f(x;\theta) = \exp(-(x-\theta))I_{(\theta, \infty)}(x)$$ It is asked to find a sufficient statistic for $\theta$ and to verify whether it is also complete. Since $$L(\theta;x)=\exp(-\sum...
From: Stats Stack Exchange | By: Giiovanna | Sunday, September 21, 2014
I have several time series in a VAR(1) and, because some of them do not share the same unit of measure, I'd like to express the RMSE as a percentage. I know that this can be done in several ways (see below), but I don't know precisely which is the one that fits...
From: Stats Stack Exchange | By: fipelle | Sunday, September 21, 2014
I had a discussion about covariance recently and it would be nice to hear your feedback about this. Let's say we have a dataset of $n$ samples with $d$ attributes. For simplicity, let's say 3 of those $d$ attributes are e.g., $d_1$ = distance in miles...
From: Stats Stack Exchange | By: Sebastian Raschka | Sunday, September 21, 2014
How do you calculate the mean and variance of two random variables X~F(m = 3, n = 6) and Y~F(m = 8, n = 6) from their density functions using R?
From: Stats Stack Exchange | By: Kalam | Sunday, September 21, 2014
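One way to approach the question above is to integrate the density numerically. The sketch below is in Python rather than R and uses only the standard library; the helper names `f_pdf`, `f_moment` and `f_mean_var` are hypothetical, and the substitution x = t / (1 - t) is just one convenient way to map the infinite range onto (0, 1).

```python
import math


def f_pdf(x, m, n):
    """Density of the F distribution with (m, n) degrees of freedom."""
    if x <= 0.0:
        return 0.0
    log_beta = math.lgamma(m / 2) + math.lgamma(n / 2) - math.lgamma((m + n) / 2)
    log_pdf = (
        (m / 2) * math.log(m / n)
        + (m / 2 - 1) * math.log(x)
        - ((m + n) / 2) * math.log1p(m * x / n)
        - log_beta
    )
    return math.exp(log_pdf)


def f_moment(k, m, n, steps=100_000):
    """Approximate E[X^k] with the midpoint rule after substituting x = t / (1 - t)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        x = t / (1.0 - t)
        # dx/dt = 1 / (1 - t)^2 is the Jacobian of the substitution.
        total += x**k * f_pdf(x, m, n) / (1.0 - t) ** 2
    return total * h


def f_mean_var(m, n):
    """Mean and variance obtained from the first two raw moments."""
    mean = f_moment(1, m, n)
    var = f_moment(2, m, n) - mean**2
    return mean, var


for m, n in [(3, 6), (8, 6)]:
    mean, var = f_mean_var(m, n)
    print(f"F({m}, {n}): mean = {mean:.4f}, variance = {var:.4f}")
```

The results can be checked against the closed forms n / (n - 2) for the mean and 2 n^2 (m + n - 2) / (m (n - 2)^2 (n - 4)) for the variance, which give 1.5 and 5.25 for F(3, 6) and 1.5 and 3.375 for F(8, 6).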
I have body mass and age data for a population of individuals. I want to fit a cubic smoothing spline curve to the data. I'm using smooth.spline in R, which warns against using cross-validation to select a smoothing parameter when there are duplicate...
From: Stats Stack Exchange | By: Michelle | Sunday, September 21, 2014
My model is $$ Y_{it}=X_{it}'\beta+\varepsilon_{it} $$ where $Y_{it}$ is a vector of weekly observations of a dependent variable and $X_{it}$ is a vector of explanatory variables (also weekly) with coefficients vector $\beta$. I plan on using fixed effects...
From: Stats Stack Exchange | By: Sunv | Sunday, September 21, 2014
Here, for 1000 simulations of 40 samples each, is a random exponential generator using the replicate function:
    lambda = 0.2
    n = 40  # The number of samples per simulation
    nosim = 1000  # The number of simulations
    set.seed(25)
    st <- replicate(nosim,rexp(n,lambda));...
From: Stats Stack Exchange | By: user2129623 | Sunday, September 21, 2014
Suppose I compute the Hausman-Taylor estimator using the plm command with the option model = "ht". Using the result, I would like to obtain a robust variance-covariance matrix to make inference fully robust. For this purpose the vcovHC() command (part of the...
From: Stats Stack Exchange | By: Manuel S | Sunday, September 21, 2014
My goal is to investigate a dependent variable which is metric (time in hours). The independent variables include 3 metric, 2 binary (factors), and one factor variable, which consists of 11 districts of a city. I tried to conduct a GLM. Can I put all...
From: Stats Stack Exchange | By: mark us | Sunday, September 21, 2014
I want to find the point at which I can expect less than $p_{Error}$ errors in a group of events that follow Poisson distribution. With a little help from Wikipedia I have found out that the equation I need to solve is $$P(X \geq x) \leq \dfrac{e^{-\lambda}(e\lambda)^x}{x^x}\text{,...
From: Stats Stack Exchange | By: Cine | Sunday, September 21, 2014
If I know that d = 0.8, sig.level = 0.05, power = 0.8 and n1/n2 = 3, how can I calculate n1 and n2 with the function pwr.t2n.test(n1 = , n2 = , d = , sig.level = , power = ) in the pwr package in R, or with some other function in R?
From: Stats Stack Exchange | By: user2230101 | Friday, September 19, 2014
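A rough starting point for the question above can be sketched with a normal approximation: with n1 = ratio * n2, the standardized difference has variance (1 + 1/ratio) / n2, so n2 is approximately (1 + 1/ratio) (z_{1-alpha/2} + z_{power})^2 / d^2. The Python sketch below assumes a two-sided test; the function name `two_sample_sizes` is hypothetical, and because it ignores the t distribution it may come out slightly smaller than an exact calculation.

```python
import math
from statistics import NormalDist


def two_sample_sizes(d, sig_level, power, ratio):
    """Approximate (n1, n2) for a two-sided two-sample test with n1 = ratio * n2.

    Normal approximation only: an exact t-based answer can be a little larger.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - sig_level / 2.0)  # critical value of the test
    z_beta = z.inv_cdf(power)                   # quantile for the target power
    n2 = math.ceil((1.0 + 1.0 / ratio) * (z_alpha + z_beta) ** 2 / d**2)
    n1 = math.ceil(ratio * n2)
    return n1, n2


n1, n2 = two_sample_sizes(d=0.8, sig_level=0.05, power=0.8, ratio=3)
print(n1, n2)  # prints: 51 17
```

Candidate pairs near this result could then be passed to pwr.t2n.test with power left unspecified, to see which pair first reaches the target power.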
Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution with mean $\theta$. Obtain an unbiased estimator for $\theta$ based on $G$, where $G$ is the geometric mean of the observations. Hint: the answer may be expressed in terms of the gamma function. Approach:...
From: Stats Stack Exchange | By: user56158 | Sunday, September 21, 2014
How can I get an ANOVA table for exponential regression models in MS Excel?
From: Stats Stack Exchange | By: Faisal | Sunday, September 21, 2014
I am wondering whether a no:yes ratio of 2000:4500 constitutes a class imbalance problem. I would highly appreciate any help.
From: Stats Stack Exchange | By: mparida | Sunday, September 21, 2014
I am using Stata. I have 11 separate variables, all "0 1" binary variables. I would like to recode them into one variable, and at the same time I would like to group the values from variables 1-3 into one, 4-9 into one, and leave variables 10 and...
From: Stats Stack Exchange | By: Ivy Jane | Sunday, September 21, 2014
I have a group of 222 test results that were reviewed separately by two interpreters X and Y. Based on their assessment, they were to assign management into 4 categories A, B, C and D. Then another variable that they were blinded to was revealed to them...
From: Stats Stack Exchange | By: Steve | Sunday, September 21, 2014
Survey margin of error contracts as the proportions become more extreme. Its validity and applicability in such cases have always concerned me, but I suppose much depends on the context. Where we have mere traces of data on one side, an extreme proportion...
From: Stats Stack Exchange | By: LinearB | Sunday, September 21, 2014
I am trying to fit a multinomial mixture model to data from a stream depletion survey. The data were collected by selecting a stream site that is a standard length (usually 150-200m depending on width), blocking the upper and lower end of the site off...
From: Stats Stack Exchange | By: Jason | Saturday, September 20, 2014
Can we convert blob.noun_phrases into a pandas DataFrame? The data type of blob.noun_phrases is 'textblob.blob.WordList': type(blob.noun_phrases) gives class 'textblob.blob.WordList'. I am asking this because it would be much easier to count noun_phrases...
From: Stats Stack Exchange | By: user35577 | Sunday, September 21, 2014
My solution to the problem is as follows: The answer I get is 27. My reasoning is that the last digit must be even, so for that position there are 3 choose 1 possibilities. Then the first digit cannot be zero so there are only 3 possible digits. Finally,...
From: Stats Stack Exchange | By: Peter | Sunday, September 21, 2014
I have a dependent variable (scale data) called posttest, also an independent variable (nominal) of teaching method that has four levels. The covariate is the pretest measured at the scale level. I am using SPSS so the independent variable is the fixed...
From: Stats Stack Exchange | By: Hilda | Sunday, September 21, 2014
To whom it may concern. My "population" with known size P is a landscape and has no more than 4 peaks of about the same height. Naturally, the top elements cluster locally within the population. I seek the top T elements. Sorting the whole population is easy,...
From: Stats Stack Exchange | By: Stephan | Friday, September 19, 2014
Social network data is frequently found in a two-mode form: people vs. events they attend, people vs. classes they attend, countries vs. treaties they sign, etc. A strategy for analyzing this data is to project the rectangular, binary matrix $X$ into...
From: Stats Stack Exchange | By: Jesse | Saturday, September 20, 2014
Let's say I have a sample size established with alpha=0.05 and power=0.8 (based on time constraint for study). This makes the value of alpha and power mathematically interdependent. So, the same sample size can be achieved with any of: alpha = 0.001...
From: Stats Stack Exchange | By: Vlad | Saturday, September 20, 2014
I have a matrix with 0.25 million rows and 50 columns. I have scaled up this matrix to 1.5 million rows and 50 columns using a Method A. I would like to measure the quality of the method I have used, in terms of the distribution that is retained or any...
From: Stats Stack Exchange | By: user2761431 | Saturday, September 20, 2014
Suppose I have a data set listing the chromosome count of a large number of species. Each species, in addition to chromosome count, has data on its Phylum, Order, and Family. I would like to know what kind of test the following question would require:...
From: Stats Stack Exchange | By: ChromosomeCount | Saturday, September 20, 2014
I run a multinomial logit regression model for a multiclass classification problem and use the following R function: trainedModel <- multinom(UNS ~ ., data = traindata), where UNS is the target variable (there are 4 classes). How can I find the...
From: Stats Stack Exchange | By: TheBlueNotebook | Friday, September 19, 2014
A data set has been analyzed in a research paper. Is it possible to use the same data set with a different analysis technique and write a research paper?
From: Stats Stack Exchange | By: Daniel | Saturday, September 20, 2014
I am an entry-level R programmer trying to learn statistics. I have downloaded the daily Adjusted Close price of one stock from Sep 2011 to date. As per my study plan, I have plotted some basic plots to understand the daily stock Adjusted...
From: Stats Stack Exchange | By: StatsUser | Saturday, September 20, 2014
This is my first post. I am curious to understand the effect of using a non-random sample to estimate a population quantile with the sample quantile. Let's say I need to measure the 10th quantile of a population. Now I have non-independent sampling...
From: Stats Stack Exchange | By: Maural | Saturday, September 20, 2014
So I have a pretty well testing SVC train series which puts me into the mid 80 percentile without outrageous C/g values. My current C value is 2.0 and gamma is 0.5. Good numbers across the range during refinement - looking solid. Here's the cross-validation...
From: Stats Stack Exchange | By: Michael | Saturday, September 20, 2014
For the purpose of simulation, I would like to construct a correlation matrix that respects, to some extent, a given set of preferable/desirable correlation coefficients for each pair of variables. I tried filling in a matrix with the given...
From: Stats Stack Exchange | By: Ivan | Saturday, September 20, 2014
I am working with the basic RBM that can be found on Geoffrey Hinton's website and the MNIST dataset. What I want to do is graphically cluster the input data. I am working with a three layer network currently: 784 (28x28 pixels) -> 200 -> 50 ->...
From: Stats Stack Exchange | By: user1406177 | Saturday, September 20, 2014
I have a problem. I'm trying to fit a robust regression with different weight functions, like Welsch and logistic, but I cannot do it in R. Please help me.
From: Stats Stack Exchange | By: shakeel | Saturday, September 20, 2014
I understand the modular nature of directed models, and that each node captures a conditional probability. But why do we need undirected models? As far as I can see they lack intuition in that the factors don't represent any type (conditional/marginal)...
From: Stats Stack Exchange | By: user3246971 | Saturday, September 20, 2014
I wrote the following code in SAS, but I did not get the expected result! The resulting histogram is in grey and the range of the data is not as I specified! What is the problem? I also got the following warning: WARNING: The MIDPOINTS= list was extended to accommodate the...
From: Stats Stack Exchange | By: PSS | Saturday, September 20, 2014
I was going through the C4.5 and ID3 algorithms used to construct a decision tree. Was wondering if there is an efficient way to compute information gain from a continuous variable (during the step where the variable to split is selected), other than...
From: Stats Stack Exchange | By: dasman | Saturday, September 20, 2014
Can AdaBoost choose the same variable for multiple splits within a given tree? The model was given 100+ variables to choose from, and it did choose them for the other trees in the ensemble. I am using the gbm package -> var_1 (<= 0.7197815 ) -> ->...
From: Stats Stack Exchange | By: Ferric | Friday, September 19, 2014
I have a dataset that consists of 100 variables, and I used a for loop to perform a Kruskal-Wallis test on each column of the dataset. Of course, I get the test output for each column, and the p-values are in the output. I need all the p-value...
From: Stats Stack Exchange | By: thx_all | Friday, September 19, 2014
I am reading Categorical Data Analysis by Dr. Agresti. There, it explains: "The likelihood function for the GLM also determines the asymptotic covariance matrix of the ML estimator Beta_hat. This matrix is the inverse of the information matrix." I...
From: Stats Stack Exchange | By: Elementary Mistake | Friday, September 19, 2014
According to LeCun's paper "Efficient BackProp" [1], the tanh activation function should be preferred over the logistic activation function for the hidden units in neural networks. For the tanh units an output of a_i = -1 is considered as inactive. But...
From: Stats Stack Exchange | By: chris elgoog | Friday, September 19, 2014
I am an Italian student, and I'm looking for a particular dataset. I'm interested in a model for spatial regression with time-varying data. I'm looking for data with coordinates, measured at different times for each point. Where can I find them? Sorry...
From: Stats Stack Exchange | By: Darko | Friday, September 19, 2014