Serendeputy - your personal news assistant.

Welcome to Serendeputy!

Your deputy:
- learns what you like and don't like,
- lovingly compiles a list of news and blogs for you.

I am beginning to use scikit's SVM to perform some news analytics. Their tutorials perform a classification (using a linear SVM) on a dataset called 20 Newsgroups. I chose 4 categories and finally input a 2257 x 35843 sparse...
From: Stats Stack Exchange | By: Vikram Murthy | Monday, January 26, 2015
For example, the distribution of human weights. There are not many adults under 40 kg, but a lot more people heavier than 100 kg, although the average of an adult's weight is, let's say, 70 kg. Another example is this human reaction time, sharing the...
From: Stats Stack Exchange | By: ziyuang | Sunday, January 25, 2015
Suppose I have the actual and fitted values of two regression lines. Each regression line is modeling the sales of some good. The fitted and actual values of one of the regression lines are much smaller than those of the other one. I want to compare the fitted...
From: Stats Stack Exchange | By: phil12 | Monday, January 26, 2015
I was wondering if there was a more comprehensive summary() function in R that perhaps includes more model metrics, such as confidence intervals around the estimates, maybe log-likelihood, AIC, BIC, stuff like that. I know it's pretty easy to just call other...
From: Stats Stack Exchange | By: moku | Monday, January 26, 2015
My final target is to develop a predictive model for a rate (fraction) DV. The DV showed bimodality and I have no variable that separates the two modes. Hence I created an IV using two observed IVs that can help in producing estimates near the two modes....
From: Stats Stack Exchange | By: Yan Mu | Monday, January 26, 2015
I am unable to follow the steps needed to derive the Fisher Information matrix and the CRLB of an autoregressive model from the observations $x$. The AR process is excited by a non-Gaussian sequence, $u$. The $p$th-order AR process is: $x(n) = \sum_{j=1}^p...
From: Stats Stack Exchange | By: Srishti M | Monday, January 26, 2015
First, some background information. The article "The Odds, Continually Updated" from the NY Times happened to catch my attention. In short, it states that [Bayesian statistics] is proving especially useful in approaching complex problems, including...
From: Stats Stack Exchange | By: Aaron Zeng | Sunday, January 25, 2015
So I have two models and I want to calculate these statistics. Is there any package to calculate them in Stata? PRESS statistic (wiki) And, if I am not mistaken. $$ R^2_{predicted} = 1 - \frac{RESET}{ESS} $$....
From: Stats Stack Exchange | By: Vladimir Yashin | Sunday, January 25, 2015
We have a data set where our outcome of interest varies over 10 years, but the explanatory variable of interest and all of the potential confounders are time-invariant. I am quite certain that a panel regression is not possible with this data set but...
From: Stats Stack Exchange | By: Lola | Sunday, January 25, 2015
My data has a binary outcome (attack or not attack), day (20 days in a repeated-measures design) and some covariates (nestling’s movement). The objectives of my experiment are testing the effect of time and other factors and selecting useful variables...
From: Stats Stack Exchange | By: sue | Sunday, January 25, 2015
Intuitively, it seems to me that, if one is able to make accurate predictions about a variable, then one has also (perhaps implicitly) produced a good estimate of its marginal or conditional distribution. Conversely, it seems that if one has fitted a...
From: Stats Stack Exchange | By: ssdecontrol | Sunday, January 25, 2015
I am trying to solve the following equation, \begin{equation} = \int_{-\infty}^{\infty} \frac{1}{\sqrt{ (2\pi)^{k_{Y}} | \Sigma |}} \cdot \mathrm{exp} \{ -\frac{1}{2} (Y - Xm)^{T} \Sigma^{-1} (Y - Xm) \} \times \delta(m - \beta) \mathrm{d} m \end{equation}...
From: Stats Stack Exchange | By: user4581 | Sunday, January 25, 2015
How can I perform a Monte Carlo simulation on an entire vector in R? I have a vector of 1000 values, and I would like to simulate the entire vector, say, 10000 times. I know that to simulate in R we do something like rnorm(10000, mean = 0, sd = 1). But I already...
From: Stats Stack Exchange | By: Alexandre | Sunday, January 25, 2015
The problem comes from reading this [0] paper but I think I can express it in a self contained question. Consider the implicit function $H(z)$ defined by the relation: $$F_z(z+H(z))-F_z(z-H(z))=0.5$$ The authors point out that when $f_z=\max(1-|z|,0)$...
From: Stats Stack Exchange | By: user603 | Sunday, January 25, 2015
My question is related to the thread Negative values for AIC in General Mixed Model. I often get negative AIC values from the software I use. I notice it most when I'm doing time series. But here is what I don't get. When defining the AIC like $$AIC...
From: Stats Stack Exchange | By: Zachary Blumenfeld | Sunday, January 25, 2015
I have done a multivariate meta analysis with R, with support from metafor package. I am using rma.mv-method which gives an R object of class c("rma.mv","rma"). My question is about looking for funnel plot asymmetry: Is it correct to use metafor's ranktest...
From: Stats Stack Exchange | By: bigbang | Saturday, January 24, 2015
I have a univariate discrete random variable and a histogram representing its PDF. Is there a known way to increase/decrease the SD of the distribution (i.e. scaling it on the x-axis), while retaining other shape characteristics as much as possible?...
From: Stats Stack Exchange | By: SkepticalEmpiricist | Sunday, January 25, 2015
Consider two Bayesian updates, where there are two observations. One updates with respect to $x_1$, and then uses the posterior of that as a prior to update with respect to $x_2$. In both cases, $x_1$ and $x_2$ are considered conditionally independent...
From: Stats Stack Exchange | By: bayesianlyconfused | Sunday, January 25, 2015
I want to ask whether a procedure to do the following job exists (or whether it makes sense for it to exist). First, assume we have $k$ functions $f_1,...f_k$ that have the same domain and range. Then we have $n>k$ inputs $x_1,...,x_n$. For each $x_i$,...
From: Stats Stack Exchange | By: zyl1024 | Sunday, January 25, 2015
I often have to do repeated-measures ANOVA with Greenhouse-Geisser or Huynh-Feldt corrections, so I use Anova (as described in http://www.r-bloggers.com/r-tutorial-series-two-way-repeated-measures-anova/, and the A "doubly multivariate" design with two...
From: Stats Stack Exchange | By: Stephen Politzer-Ahles | Saturday, January 24, 2015
I want to use regressionBF to run all subsets regression. Here is my code: fitness.bf = regressionBF(VO2 ~ ., data=fitnessdata) and here is the error it spits out when I try and run the code: Error in checkFormula(formula, data, analysis = "regression")...
From: Stats Stack Exchange | By: yodudeman | Sunday, January 25, 2015
It is straightforward to verify that for two random variables $X$ and $Y$ with variances $\sigma^2_X \neq \sigma^2_Y$, we have that $$\Big|{\rm Cov}(X, Y)\Big| \leq \max\{\sigma^2_X,\, \sigma^2_Y\}$$ On the other hand, it is not true in general that...
From: Stats Stack Exchange | By: Alecos Papadopoulos | Sunday, January 25, 2015
Assume we have an input of an email and we want to predict if it is spam or not spam. Without being a statistician, I would think one of the predictors takes the subject of the input email, compares it to known spam email subjects and generates a lev...
From: Stats Stack Exchange | By: user2827377 | Saturday, January 24, 2015
I'd like to ask for your help in understanding a statistical issue in my data set. I ran a GLM with proportional data, using a binomial distribution. However, I've found underdispersion in my model and I don't know how to deal with that. I'm aware that a...
From: Stats Stack Exchange | By: Mauricio | Saturday, January 24, 2015
I am building a predictor for $y = f(x)$ using training samples $\{(x_i, y_i)\}$ (assume) drawn i.i.d. from some distribution $p(x,y)$, by optimising the empirical L2-loss: $f(x) = \arg\min_f \; \sum_i ||f(x_i)-y_i||_2^2$. (Assume $f$ is suitably parameterised,...
From: Stats Stack Exchange | By: Vimal | Saturday, January 24, 2015
The object to be observed consists of $B$ cubes $(b_{1},\ldots,b_{B})$. The detector consists of $D$ parts, namely $(d_{1},\ldots,d_{D})$. Let $p(b_{i},d_{j})$ denote the probability of detecting a photon emission from cube $b_{i}$ in the detector tube $d_{j}$....
From: Stats Stack Exchange | By: ziT | Saturday, January 24, 2015
What criteria can be used to tell whether the prediction of a model will be reliable? Background: We have data on $N$ computers. However, prices are available only for approximately $N/2$ computers. I built a log-linear model using these $N/2$ observations....
From: Stats Stack Exchange | By: Vladimir Yashin | Saturday, January 24, 2015
please I have a list of 10 stocks, with each having a timeseries of log returns. (AIG,JPM..) I have calculated the log returns for each of the stocks in the following ######### PB29=as.numeric(unlist(AIG[2])) n31=length(PB29) R.AIG <- (log(PB29[-1]/PB29[-n31]))...
From: Stats Stack Exchange | By: Alexandre | Saturday, January 24, 2015
Is there any opportunity to create such interval where a variable ($\{\ln(X_i)\}^n_{i=1}$) is the fraction of prices for two periods? $$ X_i = \frac{price.new_i}{price.old_i} $$ Please, look at my attempt below. Is everything correct?...
From: Stats Stack Exchange | By: Vladimir Yashin | Friday, January 23, 2015
I would like to calculate d prime for a memory task that involves detecting old and new items. The problem I have is that some of the subjects have hit rate of 1 and/or false alarm rate of 0, which makes the probabilities 100% and 0%, respectively. The...
From: Stats Stack Exchange | By: A.Rainer | Saturday, January 24, 2015
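One commonly used way around hit rates of 1 and false-alarm rates of 0 is the log-linear correction, which adds 0.5 to each count and 1 to each total before converting to rates. The sketch below assumes that correction is acceptable for the task described; the function name `d_prime` and the example counts are hypothetical and not taken from the question.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' with the log-linear correction (an assumed choice, not the asker's method)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Adding 0.5 to the counts and 1 to the totals keeps both rates strictly
    # inside (0, 1), so the z-transform never receives 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical subject with a perfect hit rate and no false alarms.
perfect = d_prime(hits=20, misses=0, false_alarms=0, correct_rejections=20)
# Hypothetical subject responding at chance.
chance = d_prime(hits=10, misses=10, false_alarms=10, correct_rejections=10)
```

With the correction, the perfect subject gets a large but finite d' instead of an undefined one, and the chance-level subject gets a d' of zero.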
Let $A$ and $B$ be two constant matrices and let $x$ and $y$ be two random vectors, what is the general formula for $Var(Ax+By)$? I know the formula for when $x$ and $y$ are scalar random variables and $A$ and $B$ are constants, but what about the matrix...
From: Stats Stack Exchange | By: user67358 | Saturday, January 24, 2015
Are there techniques whereby I can apply a large data set to a very unrelated small data set and see if there are any patterns that can be identified?
From: Stats Stack Exchange | By: Sathya Atreyam | Saturday, January 24, 2015
I'm working with Shannon, Tsallis and Renyi entropies. I need to normalize these entropies for comparison purposes. In Shannon's entropy you need only to divide by the log of the number of bins. $$H(X) = -\sum_{i}\left({P(x_i) \log_b P(x_i)}\right)/\log_b(N)$$...
From: Stats Stack Exchange | By: Marco | Saturday, January 24, 2015
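The Shannon normalization quoted above (entropy divided by the log of the number of bins) can be sketched in a few lines of Python. The function name `normalized_shannon_entropy` and the example distributions are hypothetical, and the sketch assumes the input is already a list of bin probabilities that sum to 1.

```python
import math


def normalized_shannon_entropy(probs, base=2.0):
    """Shannon entropy divided by log_b(N), where N is the number of bins."""
    n = len(probs)
    if n < 2:
        return 0.0  # one bin carries no uncertainty; also avoids dividing by log(1) = 0
    # Bins with p == 0 are skipped, since p * log(p) tends to 0 as p -> 0.
    h = -sum(p * math.log(p, base) for p in probs if p > 0)
    return h / math.log(n, base)


# Hypothetical distributions over four bins.
uniform = normalized_shannon_entropy([0.25, 0.25, 0.25, 0.25])
peaked = normalized_shannon_entropy([1.0, 0.0, 0.0, 0.0])
```

Under this normalization a uniform distribution scores 1 and a distribution concentrated in a single bin scores 0, regardless of the logarithm base.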
My study is looking at attitudes towards a concept across four different professional groups: Physicians, Nursing, Pharmacy, and Allied Health. I want to see whether there are differences in attitudes between the groups (e.g. across the professions)...
From: Stats Stack Exchange | By: Kristen | Friday, January 23, 2015
We are trying to do a project to discover emerging topics in social networks via the link anomaly method, but we do not know how to implement this. If anyone knows the answer, please reply...
From: Stats Stack Exchange | By: abss IT | Saturday, January 24, 2015
I'm using R (factanal) to analyze some data. I know from reading that there are various ways of picking how many factors to use in the analysis. I don't know which to choose, or how to do any of them. Here's the data I have so far from factanal. I don't...
From: Stats Stack Exchange | By: David Shobe | Friday, January 23, 2015
I have encountered the following two versions of the Cobb-Douglas production function as an illustration of the differences between intrinsically non-linear and linearisable non-linear regression models (and their transformations): \begin{align} Y_i...
From: Stats Stack Exchange | By: Constantin | Friday, January 23, 2015
I am trying to validate a mixed effects logit regression model with a categorical dependent variable and categorical predictor variables - I have nothing that is continuous. One of my predictor variables is binary, and the other has three possible values...
From: Stats Stack Exchange | By: Chris | Friday, January 23, 2015
What is the difference between noiseless and AWGN channels in terms of channel capacity?
From: Stats Stack Exchange | By: golus | Friday, January 23, 2015
From a Google search, it seems Normal-Gamma is the conjugate prior for the univariate Gaussian. I am wondering if there is a systematic way to derive this (or to derive the conjugate prior for an exponential family in general)?
From: Stats Stack Exchange | By: aha | Friday, January 23, 2015
I need to print all the permutations of numbers in Python. So far I wrote this: def permutation(listNum, i): if i == len(listNum) - 1: print listNum else: for j in range(index, len(listNum)): listNum[i], listNum[j] = listNum[j], listNum[i] perm(listNum,...
From: Stats Stack Exchange | By: gali | Friday, January 23, 2015
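The swap-based recursion in the excerpt above can be sketched as a Python 3 generator. This is only one reading of what the truncated code was aiming for: the name `permutations`, the `start` parameter, and the swap-back step after each recursive call are assumptions, not a reconstruction of the asker's full function.

```python
def permutations(items, start=0):
    """Yield every permutation of items using in-place swaps from index start."""
    if start >= len(items) - 1:
        yield list(items)  # copy, because items keeps being mutated afterwards
        return
    for j in range(start, len(items)):
        # Put each remaining element at position start in turn.
        items[start], items[j] = items[j], items[start]
        yield from permutations(items, start + 1)
        # Swap back so the next iteration starts from the same arrangement.
        items[start], items[j] = items[j], items[start]


result = list(permutations([1, 2, 3]))
for perm in result:
    print(perm)
```

For everyday use, the standard library's `itertools.permutations` produces the same set of orderings without hand-written recursion.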
I have a set of data showing the dates of sick leave taken by several thousand people. It's been observed that some people have patterns that are unlikely to be by chance - in particular, you can see some people happen to be sick mainly on Fridays, which...
From: Stats Stack Exchange | By: Glinkot | Friday, January 23, 2015
I am quite unsure of the intuitive difference between a random variable converging in probability versus a random variable converging in distribution. I've read numerous definitions and mathematical equations but that doesn't really help. What I don't...
From: Stats Stack Exchange | By: nicefella | Friday, January 23, 2015
This may look like a silly question, but I am stuck in my work on this notation in one of the papers.
From: Stats Stack Exchange | By: user67252 | Friday, January 23, 2015
I have imported a dataset from Excel and I want to run a logistic regression, but SAS does not recognize continuous variables. This is the code I used: Proc logistic data=work.heart class famhist /param=ref ref=first model chd = tobacco ldl typea age...
From: Stats Stack Exchange | By: francesco | Friday, January 23, 2015
Some books seem to include an assumption for the normal linear model which I have never met before. They say that there must be no correlation between the explanatory variables and the errors. I was wondering if this assumption is true and if...
From: Stats Stack Exchange | By: John M | Friday, January 23, 2015
I am looking to tease out the significance and contribution of a particular variable to 2 different continuous responses. I have 7 continuous variables I know to be influential on the two responses (which have been considered by the literature). I also...
From: Stats Stack Exchange | By: Patricia Spellman | Friday, January 23, 2015
I am trying to build a predictive model on 30 million rows of customer data to predict which product type they will buy. I've looked at and tried out the ff package and the biglm packages, but these models aren't converging when I try to use a bigglm....
From: Stats Stack Exchange | By: Mike | Friday, January 23, 2015
I have a problem as follows. The life of tyres of a specific make is normally distributed, with mean = 24,000 km and sd = 2500 km. The question is: As a result of improvements in manufacture, the length of life is still normally distributed, but the proportion of tyres...
From: Stats Stack Exchange | By: arcomber | Friday, January 23, 2015
Say, we test an arbitrary regression or classification procedure on $n$ independent samples with leave-one-out cross-validation. This results in an estimate of the prediction error $e_n$ for each sample $n$. Can these $e_n$ be assumed to be independent...
From: Stats Stack Exchange | By: kazemakase | Friday, January 23, 2015