# Stats Stack Exchange

I have one question regarding standardized coefficients (beta) in linear models. I have already asked one question here. From the answers I assume that I should use R's scale() function on the dependent variable as well as on all independent variables...
From: Stats Stack Exchange | By: MagnusMetz | Tuesday, October 21, 2014
I need to describe what the difference between two groups (patients and normal controls) consists of in terms of latent variables that I can describe within each group. For instance, given this PCA variable map: how do I compare this with the same...
From: Stats Stack Exchange | By: Fredrik Karlsson | Wednesday, October 22, 2014
I know that $Var(\theta)\geq 1/I(\theta)$ where $I(\theta)$ is Fisher information. Let us take an example of a natural exponential family with density $f(x)=\lambda\exp(-\lambda x)$. In this case we have: $-E[\frac{\partial^2 \log(f)}{\partial\lambda^2}]=\frac{1}{\lambda^2}$...
From: Stats Stack Exchange | By: Ahmed | Wednesday, October 22, 2014
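The Fisher information quoted in this excerpt can be checked in a few lines. This is only a worked sketch for a single observation with density $f(x)=\lambda\exp(-\lambda x)$, $x \ge 0$; the rest of the question is truncated, so nothing here is meant to answer it.

$$
\log f(x) = \log\lambda - \lambda x, \qquad
\frac{\partial \log f}{\partial \lambda} = \frac{1}{\lambda} - x, \qquad
\frac{\partial^2 \log f}{\partial \lambda^2} = -\frac{1}{\lambda^2}
$$

$$
I(\lambda) = -E\left[\frac{\partial^2 \log f}{\partial \lambda^2}\right] = \frac{1}{\lambda^2}
$$

The second derivative does not depend on $x$, so the expectation is immediate.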
I'm trying to understand the theory of estimators. As I understand it now, if you have an r.v. $X$ and take $n$ i.i.d. samples then an estimator for $E[X^{2}]$ would be $\overline{X^{2}}$ since $E[\overline{X^{2}}] = E[X^{2}]$ (probably only true for...
From: Stats Stack Exchange | By: Addem | Wednesday, October 22, 2014
I am working on a research paper for diagnosis of cancer. List of known prognostic factors: age of patient, size of tumor, grade of tumor, lymph node involvement; and a list of unknown factors which are to be assessed with prognosis by correlating with known prognostic...
From: Stats Stack Exchange | By: made_in_india | Wednesday, October 22, 2014
I want to compare two profile likelihood curves and determine if they are significantly different from one another. For example are the following curves significantly different from one another: I realize I can find a 95% confidence interval for a value...
From: Stats Stack Exchange | By: Ben Haley | Tuesday, October 21, 2014
Attempting to understand a statistical concept which I'm positive is basic stats, but that I currently don't understand. Say that there's a one in ten million likelihood of an outcome happening during an event, that happens a given count of times,...
From: Stats Stack Exchange | By: blunders | Monday, October 20, 2014
In one use of k-fold cross-validation for evaluating classifiers, one trains k models, each on n(k-1)/k examples, and tests each on n/k examples. The average accuracy on those k test sets of size n/k is used as an estimate of the accuracy of a classifier...
From: Stats Stack Exchange | By: DavidDLewis | Wednesday, October 22, 2014
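The k-fold procedure described in this excerpt can be sketched in plain Python. This is a minimal illustration, not the poster's setup: `kfold_splits`, `majority_fit`, and `kfold_accuracy` are hypothetical names, and the "classifier" is a stand-in that always predicts the most common training label.

```python
import random
from collections import Counter
from statistics import mean


def kfold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each test fold holds about n/k indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all n indices
    for i, test_idx in enumerate(folds):
        # Training set: every index that is not in the held-out fold.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def majority_fit(labels):
    """Hypothetical stand-in classifier: the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def kfold_accuracy(labels, k, seed=0):
    """Average the k per-fold test accuracies, as described in the excerpt."""
    accs = []
    for train_idx, test_idx in kfold_splits(len(labels), k, seed):
        pred = majority_fit([labels[j] for j in train_idx])
        accs.append(mean(1.0 if labels[j] == pred else 0.0 for j in test_idx))
    return mean(accs)
```

When k divides n, each model is trained on n(k-1)/k examples and tested on n/k examples, matching the sizes in the excerpt.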
I want to find the dominant one among two theories which give predictions about the relationships between one dependent variable (Y) and eight independent variables (x1, x2 ... x8). The predictions of the two theories are mutually exclusive, i.e....
From: Stats Stack Exchange | By: Sandip Sinha | Wednesday, October 22, 2014
Suppose I have normally distributed data. For each element of the data I want to check how many SDs it is away from the mean. There might be an outlier in the data (likely only one, but might be also two or three) or not, but this outlier is basically...
From: Stats Stack Exchange | By: Oliver | Wednesday, October 22, 2014
Good morning everyone, I have a question regarding statistically significant sample sizes for a population. I am working with data in Excel and want to put the formula for this into the workbook as well, but have run into a roadblock of sorts. I...
From: Stats Stack Exchange | By: Filmore34 | Wednesday, October 22, 2014
Let's say I am trying to figure out whether two classes can be differentiated. My methods may not be perfect, but I would like to know whether my features "mean" anything that may possibly be added to reinforce another system (for instance). I know that...
From: Stats Stack Exchange | By: ido | Wednesday, October 22, 2014
Could you please answer some of my questions regarding SVM-RFE (SVM with recursive feature elimination). I am using SVM-RFE with a linear kernel for a binary classification and feature selection problem. All features are rescaled with mean 0 and standard...
From: Stats Stack Exchange | By: TmN | Wednesday, October 22, 2014
I know it might be trivial but does the density of daily values impact the forecast accuracy? For example, if a call center receives less than 50 calls for weekdays and less than 10 calls for weekend, is the forecast accuracy diminished compared with...
From: Stats Stack Exchange | By: user12 | Wednesday, October 22, 2014
Suppose I have 1000 draws each of two random variables X and Y. If I wanted to sample the sum of these variables, I would simply calculate 1000 samples, i.e. $$S_{i}=X_{i}+Y_{i}, i=1,2,…,1000$$ And that would give me draws from the pdf of the sum...
From: Stats Stack Exchange | By: mzuba | Wednesday, October 22, 2014
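The elementwise sum in this excerpt, $S_{i}=X_{i}+Y_{i}$, looks like this in code. It is a minimal sketch: the two distributions are assumptions chosen only for illustration, since the excerpt does not say what X and Y are. The pairing of draws matters only if X and Y are dependent, in which case each pair has to come from their joint distribution.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 1000

# Assumed distributions, for illustration only (not from the excerpt).
x = [rng.gauss(1.0, 1.0) for _ in range(n)]   # X ~ Normal(mean 1, sd 1)
y = [rng.expovariate(0.5) for _ in range(n)]  # Y ~ Exponential(rate 0.5)

# S_i = X_i + Y_i for i = 1, ..., 1000: one draw of the sum per pair.
s = [xi + yi for xi, yi in zip(x, y)]

# The sample mean of the sums equals the sum of the sample means.
print(mean(s), mean(x) + mean(y))
```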
I have a dataset with approximately 4000 rows and 150 columns. I want to predict the values of a single column (= target). The data is on cities (demography, social, economic, ... indicators). A lot of these are highly correlated, so I want to do a PCA...
From: Stats Stack Exchange | By: chrmar | Wednesday, October 22, 2014
I'm not well versed in statistics so I'm not sure if my question is worded exactly correctly but basically here's the problem I'm trying to solve: imagine you have two equal sized arrays of size n. Each array is filled with random numbers from 0 to 1....
From: Stats Stack Exchange | By: David Webb | Wednesday, October 22, 2014
Suppose n people go to a fancy restaurant. Each person is wearing a hat and checks his/her hat at the door as he/she arrives. The hat-check attendant gets tipsy throughout the evening, forgetting which hat belongs to whom, and returns a random hat to...
From: Stats Stack Exchange | By: Zhe Huang | Wednesday, October 22, 2014
I am trying to see differences in the feeding-rate of one bird species between big forest patches and small ones. I have several forest patches of both sizes, and three years of study. Some individuals have been recorded (to assess the feeding-rate)...
From: Stats Stack Exchange | By: Javiero84 | Wednesday, October 22, 2014
I want to find the dominant one among two theories. I have used a unique "t-test" as shown in the file below: https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxzYW5kaXBzaW5oYW5vd3xneDo1ZTU5ZWVkNWRmOTgzZDMx Kindly comment about...
From: Stats Stack Exchange | By: Sandip Sinha | Wednesday, October 22, 2014
I feel this is a simple problem, yet I cannot seem to solve it. Any help is greatly appreciated.
From: Stats Stack Exchange | By: confused | Wednesday, October 22, 2014
Let's say we have a dataset $\mathbf{X}$ of $m$ instances and $n$ features, and a target scalar variable $\mathbf{y}$ ($m$ instances). Now I want to do a regression, so I try to fit a hyperplane $y = \mathbf{x} \cdot \mathbf{w} + c$. Note: $\mathbf{w}$...
From: Stats Stack Exchange | By: user76170 | Wednesday, October 22, 2014
I am trying to do logistic regression in R. My data set contains more than 50 variables. Some of them are factors (qualitative variables) and others are quantitative independent variables. I would like to get the significance of the variables from their...
From: Stats Stack Exchange | By: Mohammad Saifullah | Wednesday, October 22, 2014
Imagine a hypothetical scenario in which a ball is thrown along a straight line. During flight, the position is continually sampled; however, at some distance, the sampling fails and only noise is detected. This distance is unknown and variable. One...
From: Stats Stack Exchange | By: user59071 | Wednesday, October 22, 2014
I am trying to understand how to correctly build a mixed-effects logistic regression model in R. I believe my model is pretty simple and straightforward, but I'm lacking in experience and uncertain I'm doing it correctly. Not being a statistician and struggling...
From: Stats Stack Exchange | By: Michael | Wednesday, October 22, 2014
I need to estimate a panel model. I have run the "normal" fixed effects model using plm in R and also wfe. I also wanted to try pggls considering its tolerance of heteroskedasticity and autocorrelation. However, the results I am getting with pggls are...
From: Stats Stack Exchange | By: jayhawk | Wednesday, October 22, 2014
In a mixed effects model the recommendation is to use a fixed effect to estimate a parameter if all possible levels are included (e.g., both males and females). It is further recommended to use a random effect to account for a variable if the levels...
From: Stats Stack Exchange | By: gung | Tuesday, October 21, 2014
I am aware of some nice examples of pairs of correlated random variables which are marginally normal but not jointly normal. See this answer by Dilip Sarwate, and this one by Cardinal. I am also aware of an example of two normal random variables whose...
From: Stats Stack Exchange | By: user65257 | Tuesday, October 21, 2014
Say I take 500 bootstraps of a population and calculate 95% confidence intervals (CIs) for each sample. I would expect 95% of the bootstrap sample CIs to contain the true population mean. However, I'm then asked what the probability is of 100% of...
From: Stats Stack Exchange | By: Makaira Murakami | Tuesday, October 21, 2014
Given input vector $x$, let the maximum of $x$ occur at index $i$ in the input vector. I am trying to quantify the peakedness of this maximum, and to do that I have thought of determining the following quantities. I am interested in determining numbers $l$ and...
From: Stats Stack Exchange | By: morpheus | Tuesday, October 21, 2014
Above are three plots of the Linear model I am trying to analyze. The first one is a basic plot of the linear data: LinearModel = read.csv(file= "C:/Users/Nikhil/Documents/LinearModelCase2.csv", header=TRUE, sep=",") plot(LinearModel$X,LinearModel$LinearModel)...
From: Stats Stack Exchange | By: Nikhil Agrawal | Tuesday, October 21, 2014
I have two series of trading profit results. I use the geometric mean to calculate the average in percent (CAGR). I would like to divide it by the standard deviation by combining the two series, but I'm having trouble calculating the combined standard...
From: Stats Stack Exchange | By: Macce | Tuesday, October 21, 2014
The 21,000 estimate for Oct. was certainly not via quad or power regression. I wonder how they got that number? http://www.telegraph.co.uk/news/worldnews/ebola/11121045/Graphic-how-Ebola-cases-have-grown-since-March.html...
From: Stats Stack Exchange | By: JackOfAll | Tuesday, October 21, 2014
I've seen a couple of seemingly unrelated notes about working with large volumes of data and it struck me that I couldn't find much content on problems specific to statistical analysis of Big Data. Is there a compiled list somewhere (or book, article)...
From: Stats Stack Exchange | By: ivanmp | Tuesday, October 21, 2014
I have a database with several continuous variables measured at two times. I searched for a change over time in my dependent variables in this way: difJS<-lmer(JS~Time+(Time|id)+(Time|occupation),dat,REML=T) If I detect a significant fixed effect of...
From: Stats Stack Exchange | By: Andrea Gragnano | Monday, October 20, 2014
As far as I know, both Gaussian mixtures as well as Gaussian processes can be used for regression. My question is: what is better and why? The answers might contain theoretic insights, practical experience or reference to further resources....
From: Stats Stack Exchange | By: Karel Macek | Tuesday, October 21, 2014
I'm a psychology PhD student doing analysis on a relatively large set of data, obtained via online surveys. The purpose of the study is largely to determine normative data for a population of adults, on a number of psychology scales. However, I'm not...
From: Stats Stack Exchange | By: Jim | Tuesday, October 21, 2014
(RStudio) Hi, I'm running LDA on a dataset with 250,000 observations, 2 classes and 30 variables. My goal is to create a classification model using the LDA function. After loading my variables I receive a warning that my X's are collinear. (should this...
From: Stats Stack Exchange | By: Marvin Crisostomo | Tuesday, October 21, 2014
I am interested in a model like: $y_{i} = \sum_{k\in K}{\beta_{k} z_{k}}$, with $z_{k} \sim N(\mu_{k}, \sigma_{k})$, where $\beta \equiv(\beta_{k})_{k\in K}$ is not known, but all else is. I assume that there is a prior $\pi (\beta)$ of an arbitrary...
From: Stats Stack Exchange | By: Pedro Forquesato | Tuesday, October 21, 2014
I have fitted a nonlinear asymptotic equation to a set of data, and my interest is in getting the standard deviations of the fitted parameters. Is this possible in nls?
From: Stats Stack Exchange | By: Azzy Azali Azal | Tuesday, October 21, 2014
I am trying to build a regressive, predictive model for a target time series that is heavily skewed. You could think of the target as being like earthquake magnitudes or heavy rainfall. Most of the time we sit in the relatively boring head of the distribution,...
From: Stats Stack Exchange | By: lb n-plus-1 | Tuesday, October 21, 2014
I am interested in finding the median absolute distance to quantiles. So, for $Q_\alpha$ the $0 \le \alpha \le 1$ quantile, I would like to find $Q_\gamma^*$ such that $Q_\gamma^*$ satisfies $\underset{\gamma}{\operatorname{median}}\,|Q_\alpha-Q_\gamma|=$...
From: Stats Stack Exchange | By: Deathkill14 | Tuesday, October 21, 2014
I have several questions concerning analysis of data, especially when there are replications and/or pseudoreplications. First, I read an example in « pseudoreplication is a pseudoproblem » where we wish to determine which of two urns contains the greater...
From: Stats Stack Exchange | By: user3866113 | Tuesday, October 21, 2014
I have 10 items in my store and I am running 3 promotions. To maximize my sales and profit, I want to decide the price for the items on a daily or weekly basis. Number of items: 10. Promotions running: 3. Other factors that can influence: competitor price of that area (...
From: Stats Stack Exchange | By: panda | Tuesday, October 21, 2014
I want to use some count data to train a classifier. The count data range from 0 to 400 something. There are a bunch of smaller counts (0's and 1's). I wonder what would be a good way to categorize it into 4 groups. Thanks in advance!...
From: Stats Stack Exchange | By: Alex | Tuesday, October 21, 2014
A survey found that the average communication of males is 35 and the average communication of females is 65. The data were obtained from 100 students in 2 samples, and the standard deviations were _ and _ respectively at a= 0.5. Can it be concluded that...
From: Stats Stack Exchange | By: Leah Young | Tuesday, October 21, 2014
I want to analyze the correlation between overall attitude and purchase intention. Both are continuous variables. Overall attitude uses a 7-point Likert scale, while purchase intention uses a 5-point Likert scale. So, can I compute a correlation between them even though they have different...
From: Stats Stack Exchange | By: user59001 | Tuesday, October 21, 2014
This question is based on Honglak Lee's paper "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations". In chapter "4.3 Handwritten digit classification", it is written: We trained 40 first layer bases from...
From: Stats Stack Exchange | By: Baptiste Wicht | Tuesday, October 21, 2014
I am trying to conduct an A/B(/C) test to compare the performance of 3 different website pages but I'm facing issues regarding zero inflated data. I have data for each page regarding 1.) the number of people who clicked on the page 2.) performed the...
From: Stats Stack Exchange | By: user3682157 | Tuesday, October 21, 2014
I attempted to build a deep network (e.g. a deep autoencoder) for some object classification; my results showed that the deep network is worse than a shallow network. However, from what I have read in lectures, deep networks perform well. This raises a...
From: Stats Stack Exchange | By: RockTheStar | Monday, October 20, 2014