## Welcome to Serendeputy!

Serendeputy is your personal news assistant.

- learns what you like and don't like,
- lovingly compiles a list of news and blogs for you.

How it works.

What to do:
1. Click smileys and frownies
2. Find favorite topics and sources
3. See how much better your deputy is getting at finding you good stuff.

# Stats Stack Exchange

I have read the chapters related to the LASSO regression method in (1) *The Elements of Statistical Learning* (Tibshirani et al.) and (2) *Statistical Learning with Sparsity: The Lasso and Generalizations* (Tibshirani et al.). My questions: (I) I do not...
From: Stats Stack Exchange | By: jeza | Saturday, June 25, 2016
We have data that shows usernames and their IP addresses when they connect to a particular server. The data also contains IP address to geolocation mappings. So our data also contains fields that show the city, state and country pertaining to an IP address...
From: Stats Stack Exchange | By: learnerX | Friday, June 24, 2016
I want to complete the calculus prerequisites for a machine learning class. I am taking an online multivariable calculus course. Can someone please suggest which lectures after Lecture 13 are relevant? (Course link: http://ocw.mit.edu/courses/mathematics/18-02-multivariable-calculus-fall-2007/video-lectures/)...
From: Stats Stack Exchange | By: Abhishek Bhatia | Wednesday, June 22, 2016
I have a time series of logarithmic returns. After inspection of the ACF and PACF plots, I tried to fit AR(2), MA(2) and ARMA(1,1) models and eventually found out that the AR(2) version can possibly fit best. The AR(2) model has no constant, no trend,...
From: Stats Stack Exchange | By: msmna93 | Saturday, June 25, 2016
I'm trying to report means of levels given a model.

```r
xy <- data.frame(y = c(rnorm(50, mean = 50, sd = 10), rnorm(50, mean = 25, sd = 10)),
                 x = rep(c("A", "B"), each = 50))
head(xy)
library(ggplot2)
ggplot(xy, aes(x = x, y = y)) + theme_bw() + geom_jitter()
```
...
From: Stats Stack Exchange | By: Roman Luštrik | Saturday, June 25, 2016
I am learning about the Heckman selection model and am confused by the conditional expectation notation. If the equation that determines sample selection is: [equation not shown] The outcome equation is: [equation not shown] If the error terms have a bivariate normal distribution, then: [equation not shown] Can someone explain...
From: Stats Stack Exchange | By: quirik | Saturday, June 25, 2016
Can AdaBoost be used with multiple observations? If yes, how and why does it work? Example: let's say I want to predict whether it is going to rain (1) or not (-1), and I want to base my prediction on, let's say, the amount of clouds AND the average atmospheric...
From: Stats Stack Exchange | By: MeHigh | Saturday, June 25, 2016
I get four thresholds when I have five ordered responses (from strongly agree to strongly disagree). How should I interpret the thresholds?
From: Stats Stack Exchange | By: Soumya | Saturday, June 25, 2016
I have panel data on electricity consumption (including total, AC, and lighting consumption) across different buildings. There were categorical variables that I have already encoded (such as season, building type, etc.). Each...
From: Stats Stack Exchange | By: hwq729 | Saturday, June 25, 2016
I am using the packages {rugarch} for forecasting and {forecast} for the Diebold-Mariano test. As a first step, I am specifying the first AR-GARCH model for a financial time series (AAPL, Nasdaq) using ugarchspec {rugarch}: `spec1 <- ugarchspec( variance.model`...
From: Stats Stack Exchange | By: John | Thursday, June 23, 2016
Suppose we know that the population size is $n=1{,}000$ but for whatever reason, we only have the bottom $n_1=100$ observations and the top $n_2 = 200$ observations. Furthermore, suppose we know the data $X_i \overset{\text{iid}}{\sim} N(0,\sigma^2)$....
From: Stats Stack Exchange | By: Gene Burinsky | Saturday, June 25, 2016
I am currently building a genetic algorithm to tune n parameters where n will probably be in the range of 3 ≤ n ≤ 8 but could be up to 15. I would like my initial population N (let's say N=1000) to be evenly dispersed across the input space. This...
From: Stats Stack Exchange | By: Dónal Flanagan | Friday, June 24, 2016
I would like to get a better idea of stochastic gradient descent algorithms, especially, and most importantly, Adam, since I've experienced reasonable results with Adam and refuse to use something "just because it works". Sidenote: I'm familiar with basic...
From: Stats Stack Exchange | By: ascenator | Friday, June 24, 2016
I am working on a two-way ANOVA for an unbalanced design. Will Tukey multiple comparisons be the same for the different types (I, II and III) of ANOVA sums of squares in an unbalanced design? I am doing the ANOVA using the car package but the output of...
From: Stats Stack Exchange | By: cheedep | Saturday, June 25, 2016
The Dirichlet distribution allows you to generate a sample of numbers $x_i$ with a prescribed sum, say $\sum_i x_i = 1$. Moreover, the parameters $\alpha$ allow some degree of control on the means of the individual $x_i$. I also want to generate random...
From: Stats Stack Exchange | By: becko | Friday, June 24, 2016
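The Dirichlet property described above can be checked with NumPy's Dirichlet sampler. This is a minimal sketch, assuming NumPy is available; the concentration vector `alpha` and the target `total` are illustrative assumptions. Each draw sums to 1, the component means approach $\alpha_i / \sum_j \alpha_j$, and scaling a draw by a constant gives any other prescribed sum.

```python
import numpy as np

# Illustrative concentration parameters (an assumption, not taken from the question).
alpha = np.array([2.0, 3.0, 5.0])

rng = np.random.default_rng(0)
# Each row is one Dirichlet draw: non-negative components that sum to 1.
samples = rng.dirichlet(alpha, size=10_000)
row_sums = samples.sum(axis=1)

# Theoretical component means alpha_i / sum(alpha), here [0.2, 0.3, 0.5].
expected_means = alpha / alpha.sum()
empirical_means = samples.mean(axis=0)

total = 7.5  # hypothetical prescribed sum other than 1
scaled = total * samples  # each row now sums to `total`

print(row_sums.min(), row_sums.max())
print(expected_means, empirical_means)
```

Scaling keeps the relative proportions of the components, so the means of the scaled components are `total * expected_means`.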
I am currently doing a customer segmentation project in SAS. I have identified 2700 customers who have made a purchase in each of the 4 years I am analysing. For the cluster analysis, the more purchases per customer each year, the better the data quality...
From: Stats Stack Exchange | By: George | Friday, June 24, 2016
Okay, I'm just a bit hazy on a few things; any help would be much appreciated. It is my understanding that the linear regression model is predicted via a conditional expectation, E(Y|X)=b+Xb+e. Do we assume that both X and Y are random variables with some...
From: Stats Stack Exchange | By: William Carulli | Friday, June 24, 2016
I would like to estimate an angle $\theta\in\left(-\frac{\pi}{2},\frac{\pi}{2}\right)$ given the noisy observations of its sine and cosine (this is related to my earlier question). My estimator is the inverse tangent of the ratio of the means of the...
From: Stats Stack Exchange | By: M.B.M. | Friday, June 24, 2016
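The estimator described above can be sketched in Python, assuming the means are taken over the noisy sine and cosine observations (the excerpt is cut off there). The true angle, noise level, and Gaussian noise model below are illustrative assumptions. Because $\theta \in (-\pi/2, \pi/2)$ implies $\cos\theta > 0$, `math.atan` of the ratio recovers the angle as long as the cosine mean stays positive; `math.atan2(mean_s, mean_c)` gives the same value in that case.

```python
import math
import random

def estimate_angle(sin_obs, cos_obs):
    """Inverse tangent of the ratio of the mean sine to the mean cosine observation."""
    mean_s = sum(sin_obs) / len(sin_obs)
    mean_c = sum(cos_obs) / len(cos_obs)
    # Valid for theta in (-pi/2, pi/2) while mean_c > 0.
    return math.atan(mean_s / mean_c)

# Hypothetical simulation: true angle and additive Gaussian noise are assumptions.
theta = 0.7
sigma = 0.3
n = 5000
random.seed(2)
sin_obs = [math.sin(theta) + random.gauss(0, sigma) for _ in range(n)]
cos_obs = [math.cos(theta) + random.gauss(0, sigma) for _ in range(n)]

theta_hat = estimate_angle(sin_obs, cos_obs)
print(theta, theta_hat)
```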
I have two proportions I want to compare. They are from the same population but from different time periods, between which an intervention was carried out. H0: Proportion in Period #1 = Proportion in Period #2; H1: Proportion in Period #1 ≠ Proportion in...
From: Stats Stack Exchange | By: Paris Char | Friday, June 24, 2016
I have a modeling problem that can be reconceptualized in a simpler way like so: Imagine I have an infinite urn with red, green, and blue balls in it with the proportions $p_r$, $p_g$, and $p_b$ (these probabilities are known). Next, I hand the urn to another...
From: Stats Stack Exchange | By: user2048508 | Friday, June 24, 2016
I have read this and I am stuck on page 4. It says that, by definition, the normal pseudo-residual is precisely N(0,1)-distributed and its value is zero if Y is equal to the median of its distribution. Thus these residuals measure the deviations from the...
From: Stats Stack Exchange | By: F.F. | Friday, June 24, 2016
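The property quoted above matches the usual construction of normal (quantile) pseudo-residuals, $r = \Phi^{-1}(F(y))$, where $F$ is the model CDF of $Y$: if $F$ is the true continuous CDF, $F(Y)$ is uniform on $(0,1)$, so $r$ is exactly $N(0,1)$, and $r = 0$ exactly when $F(y) = 1/2$, i.e. at the median. The minimal Python sketch below assumes this is the construction used in the (unlinked) source; the exponential model and its rate `LAM` are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def normal_pseudo_residual(y, cdf):
    """Return Phi^{-1}(F(y)): map y through its model CDF, then the inverse standard-normal CDF."""
    return NormalDist().inv_cdf(cdf(y))

# Illustrative model (an assumption): Y ~ Exponential with rate LAM.
LAM = 2.0

def exp_cdf(y):
    return 1.0 - math.exp(-LAM * y)

random.seed(3)
ys = [random.expovariate(LAM) for _ in range(20_000)]
residuals = [normal_pseudo_residual(y, exp_cdf) for y in ys]

median = math.log(2) / LAM  # median of the exponential model
print(normal_pseudo_residual(median, exp_cdf))  # close to 0
print(fmean(residuals), pstdev(residuals))      # close to 0 and 1
```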
I have a dataset with several missing values. I know that the missingness is MNAR. I'm trying to use MICE to impute the data and then apply a survival model to the imputed data. The MICE paper advises that all variables in the full model (the survival model)...
From: Stats Stack Exchange | By: user90593 | Friday, June 24, 2016
In this post we ask a question about a natural phenomenon: humans attempting to reach a decision by counting votes. The specific instance of this phenomenon that the question is about is Brexit. Note: the question is not about politics....
From: Stats Stack Exchange | By: caveman | Friday, June 24, 2016
I'm modeling some repeated-measures presence-absence data using a binomial GLMM in lme4. I've been using the method suggested by Nakagawa and Schielzeth (2013) to calculate a marginal and conditional pseudo-$R^2$. According to their approach (if I'm...
From: Stats Stack Exchange | By: yeticrab | Friday, June 24, 2016
If all of the coefficients in my logistic model have very good t-statistics that show sufficiently high significance, but two coefficients have high VIFs (around 13-14), can I ignore the multicollinearity? I produce y values from my model...
From: Stats Stack Exchange | By: Eric | Friday, June 24, 2016
I have a couple of questions concerning the time trend (βt) in an augmented Dickey-Fuller test for panel data: 1) From what I understand there is no clear rule/standard or test as to when to include a time trend in a DF model or not. Do I understand...
From: Stats Stack Exchange | By: AppleCinnamon | Friday, June 24, 2016
Good morning. I would like to assess the impact of the level of environmental compliance (treatment variable: 0 = non-compliance; 1 = fair compliance; 2 = good compliance; 3 = excellent compliance) on firm profitability (outcome). Please, I'm looking for a Stata command...
From: Stats Stack Exchange | By: eke balla sophie | Friday, June 24, 2016
Let $X_1, X_2, \ldots$ be independent $U(0,2)$ random variables and let $$Y_n = \prod_{i=1}^n \, X_i \;.$$ How do I prove or disprove that $Y_n$ converges to $0$ almost surely?
From: Stats Stack Exchange | By: Dwaipayan Gupta | Friday, June 24, 2016
I am trying to maximize a Poisson likelihood function for an array of values, of the form $\mathrm{llh}_{ij} = -m_{ij} + k_{ij}\ln(m_{ij}/k_{ij})$ (where I use the numerical approximation $\ln(k!) \approx k\ln k$). For my purposes, the $m_{ij}$ are the number of events predicted by a model,...
From: Stats Stack Exchange | By: Danielle | Thursday, June 23, 2016
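The per-cell form in the question, $-m + k\ln(m/k)$, follows from the exact Poisson log-probability $k\ln m - m - \ln k!$ after substituting $\ln k! \approx k\ln k$. The exact-minus-approximate difference, $k\ln k - \ln k!$, depends only on the observed counts $k$, not on $m$, so the approximation does not move the maximizer over model parameters. A minimal Python sketch, assuming NumPy and same-shaped arrays of predicted counts `m` and observed counts `k` (the function names and sample values are hypothetical); cells with $k=0$ use the limit $k\ln(m/k) \to 0$.

```python
import math
import numpy as np

def approx_poisson_loglik(m, k):
    """Sum over cells of -m + k*ln(m/k), using ln(k!) ~ k*ln(k); k == 0 cells contribute -m."""
    m = np.asarray(m, dtype=float)
    k = np.asarray(k, dtype=float)
    ratio_term = np.zeros_like(m)
    pos = k > 0
    ratio_term[pos] = k[pos] * np.log(m[pos] / k[pos])
    return float(np.sum(-m + ratio_term))

def exact_poisson_loglik(m, k):
    """Exact sum over cells of ln P(k | m) = k*ln(m) - m - ln(k!)."""
    return float(sum(ki * math.log(mi) - mi - math.lgamma(ki + 1)
                     for mi, ki in zip(np.ravel(m), np.ravel(k))))

# Hypothetical observed counts and two candidate model predictions.
k = np.array([[3, 0], [7, 12]])
m1 = np.array([[2.5, 0.4], [8.0, 11.0]])
m2 = np.array([[3.5, 1.0], [6.0, 13.0]])

# The gap between exact and approximate values is the same for both predictions.
d1 = exact_poisson_loglik(m1, k) - approx_poisson_loglik(m1, k)
d2 = exact_poisson_loglik(m2, k) - approx_poisson_loglik(m2, k)
print(d1, d2)
```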
I've created a Monte Carlo simulation that randomly divides my data into "test" and "training" samples and then trains a neural network. The ratio of categories 0 and 1 (19.62%) is kept stable during sampling. My results show a highly fluctuating model accuracy...
From: Stats Stack Exchange | By: Krombopulos Michael | Friday, June 24, 2016
I'll use this picture to explain. What I want to do is define some patterns as trained patterns. Then, given data, I want to be able to determine whether the pattern exists in the dataset and, if it does, how many times it occurs. I have had success...
From: Stats Stack Exchange | By: Mike Sallese | Wednesday, June 22, 2016
Here are my questions: is there a difference between "VAR(1)" and "AR(1)"? Granger Causality inspects the direction of causality. In return, we receive a p-value on how much a time series is likely to contribute to a better prediction of the other. Choi...
From: Stats Stack Exchange | By: AverageZero | Friday, June 24, 2016
Is this notation accepted when I write $\text{Var}(X)=\mathbb{E}(X^2)-\mathbb{E}^2(X)$?
From: Stats Stack Exchange | By: StubbornAtom | Friday, June 24, 2016
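The identity in the question follows from expanding the definition of variance. Writing $\mu = \mathbb{E}(X)$ and keeping the square outside the expectation explicit:

$$
\begin{aligned}
\text{Var}(X) &= \mathbb{E}\big[(X-\mu)^2\big] \\
&= \mathbb{E}(X^2) - 2\mu\,\mathbb{E}(X) + \mu^2 \\
&= \mathbb{E}(X^2) - \big(\mathbb{E}(X)\big)^2 .
\end{aligned}
$$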
I would like to find the maximum likelihood estimates (MLE) of the parameters of the following distribution and density function: $F(x) = 1 - \exp\{-(ax)^b - (cx)^d\}$, $f(x) = \left(b\,a^b x^{b-1} + d\,c^d x^{d-1}\right)\exp\{-(ax)^b - (cx)^d\}$ for $x>0$, $a,c>0$, $b>1$, and $d<1$. I have written the...
From: Stats Stack Exchange | By: shany | Friday, June 24, 2016
From a car, I get a value every second for the accelerator pedal, from 0 to 100%. When you don't touch it, it reports 0%, and when you kick it to the max, it reports 100%. I want to calculate how much change there is in this value. It will be one of many indicators...
From: Stats Stack Exchange | By: Terje Kolderup | Friday, June 24, 2016
Can you please explain this in a simple way? Is it so important in logistic regression?
From: Stats Stack Exchange | By: Aby Mathew | Friday, June 24, 2016
I have a dataset with 163 observations (all countries in the world with a population above 1,000,000) and 290 variables related to their disease burden and performance. Because there are more variables than observations, I cannot run a standard linear regression....
From: Stats Stack Exchange | By: user3387899 | Friday, June 24, 2016
Say I have 4 random variables. $X^{(1)}$ and $X^{(2)}$ are jointly multivariate normal with mean 0 and covariance $\Sigma_X$, and $Y^{(1)}$ and $Y^{(2)}$ are jointly multivariate normal with mean 0 and covariance $\Sigma_Y$. There are no dependencies...
From: Stats Stack Exchange | By: Ruben van Bergen | Friday, June 24, 2016
I'm attempting to model and predict realized volatility, defined as the sum of squared intraday returns. Does it make sense to evaluate GARCH and GARCH variants? If yes, are there special considerations given that part of the model is modelling...
From: Stats Stack Exchange | By: TCopple | Friday, June 24, 2016
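The definition in the question (realized volatility as the sum of squared intraday returns) can be written as a short Python function. This is a sketch under two assumptions not stated in the excerpt: returns are log returns between consecutive intraday prices, and the price list covers one trading day. The sample prices are hypothetical.

```python
import math

def realized_volatility(prices):
    """Sum of squared intraday log returns, as defined in the question.

    Many authors call this sum the realized variance and report its square root
    as the realized volatility; use whichever convention your model expects.
    """
    if len(prices) < 2:
        raise ValueError("need at least two intraday prices")
    # Log return between consecutive intraday observations (assumed, not stated in the question).
    log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    return sum(r * r for r in log_returns)

# Hypothetical intraday price path for one trading day.
day_prices = [100.0, 101.0, 100.5, 102.0, 101.2]
print(realized_volatility(day_prices))
```

Applied day by day, this produces the daily series that a GARCH-type model would then be compared against.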
I am new to plotting learning curves and this is the first one I made, but now I need some help interpreting the plot... What does the score mean? And why does the training score decrease and the test score increase with an increasing training...
From: Stats Stack Exchange | By: Papie | Friday, June 24, 2016
So, I'm in a bit of trouble here. I am using R (I'm very new at this), and I'm trying to plot the probability effects of an interaction effect using the effects package. This is what the plot shows. However, when looking at the logistic regression model:...
From: Stats Stack Exchange | By: marc | Thursday, June 23, 2016
I'm doing an MBA dissertation on the topic "Impact of Customer Service Satisfaction on Business Performance". The hypothesis is "Customer service satisfaction can positively impact business performance", whereas the null hypothesis is "Customer...
From: Stats Stack Exchange | By: Sahaar Salim Kazi | Friday, June 24, 2016
I want to learn SAS Data Step 2 (DS2) and In Memory Statistics (IM Stats), but I am unable to find a good resource online. Besides this, I do not know what In Memory Statistics (IM Stats) is. Can someone help me with this?
From: Stats Stack Exchange | By: user86354 | Friday, June 24, 2016
A p-value is the probability of obtaining a statistic that is at least as extreme as the one observed in the sample data, assuming that the null hypothesis (H0) is true. Graphically this corresponds to the area defined by the sample statistic under...
From: Stats Stack Exchange | By: matti | Friday, June 24, 2016
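For a concrete case, assume the test statistic is standard normal under H0 (a z-test; the question itself does not name a test). The p-value is then the tail area of $N(0,1)$ at least as extreme as the observed statistic, which the Python standard library can compute through the complementary error function `math.erfc`. This is a minimal sketch; the function name and the `alternative` values are hypothetical.

```python
import math

def z_test_p_value(z, alternative="two-sided"):
    """Tail area of N(0, 1) at least as extreme as the observed statistic z."""
    if alternative == "two-sided":
        # P(|Z| >= |z|) = erfc(|z| / sqrt(2)): both tails beyond the observed value.
        return math.erfc(abs(z) / math.sqrt(2))
    if alternative == "greater":
        # P(Z >= z) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
        return 0.5 * math.erfc(z / math.sqrt(2))
    if alternative == "less":
        # P(Z <= z) = Phi(z) = 0.5 * erfc(-z / sqrt(2)).
        return 0.5 * math.erfc(-z / math.sqrt(2))
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

print(z_test_p_value(1.96))             # about 0.05
print(z_test_p_value(1.96, "greater"))  # about 0.025
```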
$R(t)$ = reliability = 1 - unreliability (the probability that an item is still operational at a given time $t$); unreliability = cumulative distribution function (CDF) = the probability that an item has failed by a given time $t$. So if an item's time-to-failure data...
From: Stats Stack Exchange | By: Andre Chenier | Thursday, June 23, 2016
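Those definitions, $R(t) = 1 - F(t)$ with $F$ the time-to-failure CDF, can be illustrated with a minimal Python sketch. The exponential time-to-failure model and its rate `lam` are assumptions, chosen because $R(t) = e^{-\lambda t}$ is then known in closed form; the empirical reliability is simply the fraction of items that have not failed by time $t$.

```python
import math
import random

def empirical_reliability(failure_times, t):
    """Fraction of items still operating at time t, i.e. 1 - empirical CDF(t)."""
    failed = sum(1 for x in failure_times if x <= t)
    return 1.0 - failed / len(failure_times)

# Hypothetical example: exponential time-to-failure with rate lam (an assumption).
lam = 0.5
random.seed(1)
times = [random.expovariate(lam) for _ in range(20_000)]

t = 2.0
# For the exponential model, R(t) = exp(-lam * t).
print(empirical_reliability(times, t), math.exp(-lam * t))
```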
I have panel data and am doing the analysis in Stata 13. When I run `xttest0` to check for random effects, I get the error "not sorted". Please help me understand what this error means and how I can resolve this problem.
From: Stats Stack Exchange | By: Tauseef Ali | Friday, June 24, 2016
I have a rather large dataset (1M training samples). Each epoch in my neural network takes about 12 hours. I'm wondering what is the best strategy to tune the hyperparameters (batch size, step size, etc.), as just performing a grid search (and a random...
From: Stats Stack Exchange | By: yoki | Friday, June 24, 2016
I have a dependency treebank which I divide into a training set and a test set. I extract some rules ((DS,PS) pairs) to convert the treebank to phrase structures. When I extract such rules from the training set, I can measure the percentage of rules they cover...
From: Stats Stack Exchange | By: Ahmad | Friday, June 24, 2016
I have a numerically calculated graph in unitless coordinates. I have experimental data which corresponds to a point on that graph. The data has units, and to make it unitless one has to divide it by a "constant" which determines where these points...
From: Stats Stack Exchange | By: Mr.WorshipMe | Thursday, June 23, 2016
I'm running a meta-regression/multi-level analysis that contains only categorical variables. The printout of the data is as follows:

```
res.fe

Multivariate Meta-Analysis Model (k = 19; method: REML)

Variance Components: none

Test for Residual Heterogeneity:
```
...
From: Stats Stack Exchange | By: statsguyz | Friday, June 24, 2016