## Welcome to Serendeputy!

Serendeputy is your personal news assistant. It:

- learns what you like and don't like,
- lovingly compiles a list of news and blogs for you.

How it works.

What to do:
2. Click smileys and frownies
3. Find favorite topics and sources
4. See how much better your deputy is getting at finding you good stuff.

# Stats Stack Exchange

I have run some machine learning experiments and have now created some ROC and Precision-Recall curves (with the help of a toolbox). Unfortunately, I'm not familiar with these two things. Of course, on the web there is plenty of material describing...
From: Stats Stack Exchange | By: machinery | Wednesday, July 27, 2016
I have never worked with dyadic data before but need to do that now. So my question touches upon the very structure of dyadic data. The subject of the study is countries and their ties to each other. I am particularly interested in 3 dimensions/types...
From: Stats Stack Exchange | By: FKG | Wednesday, July 27, 2016
I have a multivariate data set with about 4000 entities. Each entity is described by 9 properties (1 is nominal, 7 continuous, 1 integer). I need to develop a predictive model to predict the nominal attribute (can have 3 different values). More precisely,...
From: Stats Stack Exchange | By: beta | Tuesday, July 26, 2016
I am trying to understand the asymptotic distribution of the Wald test statistic, specifically under the alternative hypothesis which I've found little reference to. For clarity, the binary hypothesis for an unknown parameter vector $\theta$ of size...
From: Stats Stack Exchange | By: pellis | Thursday, July 28, 2016
Let's say that I have one population that can be divided by a 2-level factor. I run MDS twice (using prcomp() {stats} in R), using the 2-level factor to separate my subjects. If I reduce to two dimensions, I now have two plots. Let's say that on each...
From: Stats Stack Exchange | By: tluh | Thursday, July 28, 2016
I have this question, where there are a bunch of parameters. We don't know which parameters are related or how they are connected. But we know that some or all of the parameters are related to the probability of the outcome. So we need to develop an equation...
From: Stats Stack Exchange | By: rksh | Thursday, July 28, 2016
So for my research project I am investigating the effect a blood pressure diagnosis has on alcohol consumption frequency and physical activity. I have constructed 2 different models for both and I will be using the xtologit command as I have panel data. Please...
From: Stats Stack Exchange | By: user112638 | Thursday, July 28, 2016
I would like to know what type of statistical analysis needs to be used for this study. The study will involve 8 throwers and 8 cross-country runners. They will use a machine to determine the strength of their dominant and non-dominant sides. They will...
From: Stats Stack Exchange | By: ELF | Thursday, July 28, 2016
McCullagh & Nelder, 2nd edition, p. 91, claim that to make comparisons "fair", it's best to use a single estimate of the overdispersion parameter, usually derived from the most complex model. I noticed that the same thing is done in this SAS example. Does...
From: Stats Stack Exchange | By: Nik Tuzov | Thursday, July 28, 2016
I was in the process of analyzing the data below. The first 2 columns are correct. They represent a point in time and a value of a stock at that point. I decided to calculate the probability surrounding each value and calculated the average to be the...
From: Stats Stack Exchange | By: NoChance | Thursday, July 28, 2016
Consider the simple GAM fit as below: my.gam <- gam(y~s(x), data=mydata) Is there any way to return the estimated smoothing parameter (lambda) so that I can save it? How can I set lambda to a desired value?...
From: Stats Stack Exchange | By: M. Er | Thursday, July 28, 2016
In a question I asked recently, I was told that it was a big "no-no" to extrapolate with loess. But, in Nate Silver's most recent article on FiveThirtyEight.com he discussed using loess for making election predictions. He was discussing the specifics...
From: Stats Stack Exchange | By: a.powell | Thursday, July 28, 2016
This relates to the use of a continuous variable as a predictor in a multiple regression. If a continuous variable (e.g. age) was measured in a questionnaire but the datafile has placed 'cutoffs' on the variable (apologies this is probably not the correct...
From: Stats Stack Exchange | By: SwingingStrawberry | Thursday, July 28, 2016
I have a multiple linear model for time series data for which the regression residuals are autocorrelated and display seasonal behavior. This seasonal behavior is induced deterministically by a cyclic variable written into the model. In order to calculate...
From: Stats Stack Exchange | By: hirschme | Thursday, July 28, 2016
When evaluating a model, for example a binary classifier, should the train and test set have 50% + and 50% - label distribution or could the distribution be random? If the distribution is biased in the train/test sets e.g., 80% + and 20%-, the precision/recall...
From: Stats Stack Exchange | By: bla345 | Tuesday, July 26, 2016
I am trying to find additive and innovative outliers in the German Stock Index (DAX) using the method Doornik & Ooms explained in 2002: Step 1 Estimate the baseline GARCH model to obtain the log-likelihood (lb) and residuals. Step 2 Find the largest (in absolute...
From: Stats Stack Exchange | By: Mo Bro | Thursday, July 28, 2016
I have paired data. The response variable is categorical with levels 1-5. I have used both McNemar's test and the Wilcoxon signed-rank test, and I get different p-values: for McNemar's test a p-value of 0.0396, and for the Wilcoxon signed-rank test a p-value of...
From: Stats Stack Exchange | By: Mik meadow | Thursday, July 28, 2016
My dissertation is about funds seasonality and the model that I am using is an OLS with dummies to check if January has a return greater than the remaining period: $$R_t = B_0 + B_1 D_{mt} + U_t$$ $R_t$ is the return on funds $B_0$ is the intercept...
From: Stats Stack Exchange | By: Duarte | Thursday, July 28, 2016
I need to give a one-hour introductory lecture on statistics to gifted middle school students. I am a software engineer and data analyst but don't have an academic degree or solid background in statistics. Can someone suggest a resource and a case study...
From: Stats Stack Exchange | By: Pankaj Agarwal | Thursday, July 28, 2016
This question I think was already asked here but I can't fully understand the answer. I have a number of ordinal predictors that I'm transforming into dummy variables and I'm wondering whether the hierarchical multiple regression linear relationship...
From: Stats Stack Exchange | By: SwingingStrawberry | Thursday, July 28, 2016
I am trying to understand the statistical technique that can be used to solve this problem: I have a set of features (x1,x2,x3,x4,x5), both continuous and discrete. The dependent variable is continuous (for example sales). I am trying to find segments based...
From: Stats Stack Exchange | By: Naveenan | Thursday, July 28, 2016
I have two normal distributions fg and bg with mean (mu) and standard deviations (sd) as follows: set.seed(100) fg = rnorm(10000, mean=11.00, sd=3.77) bg = rnorm(10000, mean=-0.508, sd=1.04) If I fit an LDA model like this: library(MASS) mydata = data.frame(label...
From: Stats Stack Exchange | By: Omar | Thursday, July 28, 2016
I need to analyze 12 years of performance data from a waste water treatment plant in Sydney. The data has 250 parameters. As part of scheduled sampling, some of the parameters are measured 3 times a week, or only on weekdays, or even 4 times in a...
From: Stats Stack Exchange | By: Mahesh Aacas | Wednesday, July 27, 2016
Suppose you're trying to predict anomalies. That is, consider the case where you have a data set that has a column called result. Suppose the data set has 365 rows and result has a value of 1 in only 12 of those rows and 0 in the other rows. Now suppose...
From: Stats Stack Exchange | By: Paul Reiners | Wednesday, July 27, 2016
PREFACE: I don't care about the merits of using a cutoff or not, or how one should choose a cutoff. My question is purely mathematical...
From: Stats Stack Exchange | By: felix000 | Wednesday, July 27, 2016
I have a massive data set of about 25000 pancreatic cancer patients; extremely quick and depressing mortality rate. I'm interested in survival differences between three groups--no treatment, chemotherapy, and chemo-radiation. Below is the survival function...
From: Stats Stack Exchange | By: Ryan | Wednesday, July 27, 2016
From: Stats Stack Exchange | By: Moe | Wednesday, July 27, 2016
I am trying to build a multilinear regression with predictor variables that likely are correlated. I understand that this is a problem, due to overlapping explanations of data. I think I have a method that may get around this, but would like to see if...
From: Stats Stack Exchange | By: BarocliniCplusplus | Wednesday, July 27, 2016
I'm new to time series modeling and am trying to do seasonal ARIMA modeling here. I have figured out the p,d,q values but I'm not sure how to select the period in the formula below. There seem to be troughs in the data during summer months and winter...
From: Stats Stack Exchange | By: Arslán | Wednesday, July 27, 2016
I have two groups, "in" and "out," and item categories that can be split up among the groups. For example, I can have item category A that is 99% "in" and 1% "out," and item B that is 98% "in" and 2% "out." For each of these items, I actually have the...
From: Stats Stack Exchange | By: neelshiv | Wednesday, July 27, 2016
I am trying to quantify error of the MLE for the following model using Fisher's information: $Y_{j} \sim Binomial(n_{j}, p_{j})$ $logit(p_{j}) = \eta + \gamma_{j}$ where the $n_{j}$'s and $\gamma_{j}$'s are known. Looking at some old notes from my statistics...
From: Stats Stack Exchange | By: K23 | Tuesday, July 26, 2016
I am a teacher in a language program at a university, and I was interested in investigating whether or not there is a correlation between the number of sessions a student spends in our program and their performance on the TOEFL (an English proficiency...
From: Stats Stack Exchange | By: Tom | Wednesday, July 27, 2016
I am trying to solve the equation AX = B in R. I have two matrices, A and B: A = matrix(c(1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0, 0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,...
From: Stats Stack Exchange | By: ultimate8 | Wednesday, July 27, 2016
I have the following data-set. sexratio=c("0%male","0%male","0%male","0%male","25%male","25%male","50%male","50%male" ,"75%male","75%male","100%male","100%male","100%male", "100%male") Trainsize=c("Ts130","Ts260","Ts520","Ts1040","Ts130", "Ts1040","Ts130",...
From: Stats Stack Exchange | By: setye | Wednesday, July 27, 2016
I want to do a regression analysis based on an online questionnaire about how risk-averse people are with others' money. I have 80 respondents. My dependent variable is a lottery question: how much money are people willing to gamble with several...
From: Stats Stack Exchange | By: N-K Pham | Wednesday, July 27, 2016
I want to find the standard error and standard deviation: Min. = $8 \times 10^4$, Max. = $8 \times 10^7$, $\bar{x} = 9 \times 10^6$, $n = 36$
From: Stats Stack Exchange | By: Dr ashraf | Wednesday, July 27, 2016
I am trying to translate some logistic regressions from SAS to Mplus. Some are fairly straightforward, e.g., no random effects, and others are mixed models with random intercepts. The example here will be a fixed effects model and my question will involve...
From: Stats Stack Exchange | By: William Shakespeare | Tuesday, July 26, 2016
I have t and beta (the estimate of the standardized coefficient), but I didn't write down the coefficients or their standard errors. How can I work out the unstandardized coefficients?
From: Stats Stack Exchange | By: ruth | Tuesday, July 26, 2016
I'm working on an assignment about Eigenfaces for facial recognition: http://courses.cs.washington.edu/courses/cse455/10wi/projects/p2/ . Here is the dataset: http://courses.cs.washington.edu/courses/cse455/10wi/projects/p2/class_images.zip , there are...
From: Stats Stack Exchange | By: Dang Manh Truong | Wednesday, July 27, 2016
I am using scikit-learn to perform PCA and Feature Agglomeration on my survey data. I have done the following: #PCA pca = PCA(n_components=10) pca.fit(adf_nan_norm) hello = pca.get_covariance() adf_nan_norm is a dataframe. My data frame has 181 rows...
From: Stats Stack Exchange | By: LoveMeow | Wednesday, July 27, 2016
I get the following error message when trying to load pbkrtest: Error : object ‘sigma’ is not exported by 'namespace:stats' In addition: Warning messages: 1: package ‘pbkrtest’ was built under R version 3.3.1 2: replacing previous import by ‘stats::sigma’...
From: Stats Stack Exchange | By: Emily | Wednesday, July 27, 2016
In the textbooks I have access to (and that discuss hypothesis testing for correlation), I have only met examples where the null hypothesis was $\rho=0$ and the alternative hypothesis was $\rho\ne 0$. My question is about using a one-sided alternative hypothesis...
From: Stats Stack Exchange | By: Ferenc Beleznay | Wednesday, July 27, 2016
I am given $N$ observations of pairs of covariates and response $(\mathbf{x}_i, y_i)$. When the responses are non-negative integers, by doing Poisson regression I am modelling $y_i \sim \mathrm{Pois}(\mu_i)$ as Poisson random variables with mean $\mu_i$,...
From: Stats Stack Exchange | By: Alex | Wednesday, July 27, 2016
I have a data set with approximately 1 million records. Some of these have experienced a specific event and are flagged accordingly. Using the attributes associated with these records I would like to classify those which have not experienced this event,...
From: Stats Stack Exchange | By: Cost of Failure | Wednesday, July 27, 2016
I have a model for a biological experiment which has a typical bayesian structure, that is $\lambda \rightarrow X$. Now let's assume for control the parameters are $\lambda_1 \rightarrow X_1$ and $\lambda_2 \rightarrow X_2$. Here $X_1$ is the observed...
From: Stats Stack Exchange | By: Hirak Sarkar | Wednesday, July 27, 2016
I was wondering generally what to do if the different dimensions/features of your input data display vastly different distributions in comparison to each other? Is there some form of recipe about how to proceed in this case? My data has 4 features/dimensions...
From: Stats Stack Exchange | By: Carsten D | Wednesday, July 27, 2016
I have data collected on a meeting-by-meeting basis, where the change in time between two meetings is not constant, i.e., $\Delta t\ne1$. Are ordinary Augmented Dickey-Fuller and Phillips-Perron tests valid in this context?
From: Stats Stack Exchange | By: ts_highbury | Tuesday, July 26, 2016
I have a very small sample size of 4 or 5 within a treatment. Can I use the Mann-Whitney to draw any inference of differences between them? Online calculators seem to indicate a requirement of sample size >= 5 Sharada
From: Stats Stack Exchange | By: R Sharada | Tuesday, July 26, 2016
Let's say I have a set of binary random variables. Does there exist, for any such set, another set of binary random variables that carries approximately the same information, but whose members are all statistically independent of each other? By "carry...
From: Stats Stack Exchange | By: delta | Wednesday, July 27, 2016
I am using 2x3 repeated measures ANOVA. I am analyzing three years of test data and comparing two different groups. However, my data is negatively skewed. What is the best way to transform negatively skewed data? What shall I use for repeated measure...
From: Stats Stack Exchange | By: Landa | Wednesday, July 27, 2016